When fit on a large dataset, ParametricUMAP raises an unknown-shape error. The code below reproduces the issue:
import numpy as np
from umap import ParametricUMAP
from sklearn.preprocessing import StandardScaler
#import tensorflow as tf
#tf.config.run_functions_eagerly(True)
n_samples = 50000
n_features = 1536
cluster1 = np.random.normal(0, 1, (n_samples, n_features))
cluster2 = np.random.normal(3, 1, (n_samples, n_features))
X = np.vstack((cluster1, cluster2))
np.random.shuffle(X)
print("Synthetic data shape:", X.shape)
scaler = StandardScaler()  # unused: scaling is skipped in this reproduction
X_scaled = X
pumap = ParametricUMAP(
    n_components=2,
    n_neighbors=30,
    verbose=True
)
embedding = pumap.fit_transform(X_scaled)
Here is the error message:
  File "/home/ubuntu/anaconda3/envs/wildchat/lib/python3.10/site-packages/umap/parametric_umap.py", line 152, in fit_transform
    return super().fit_transform(X, y)
  File "/home/ubuntu/anaconda3/envs/wildchat/lib/python3.10/site-packages/umap/umap_.py", line 2891, in fit_transform
    self.fit(X, y, force_all_finite)
  File "/home/ubuntu/anaconda3/envs/wildchat/lib/python3.10/site-packages/umap/parametric_umap.py", line 137, in fit
    return super().fit(X, y)
  File "/home/ubuntu/anaconda3/envs/wildchat/lib/python3.10/site-packages/umap/umap_.py", line 2784, in fit
    self.embedding_, aux_data = self._fit_embed_data(
  File "/home/ubuntu/anaconda3/envs/wildchat/lib/python3.10/site-packages/umap/parametric_umap.py", line 288, in _fit_embed_data
    history = self.parametric_model.fit(
  File "/home/ubuntu/anaconda3/envs/wildchat/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/ubuntu/anaconda3/envs/wildchat/lib/python3.10/site-packages/optree/ops.py", line 747, in tree_map
    return treespec.unflatten(map(func, *flat_args))
ValueError: as_list() is not defined on an unknown TensorShape.
The issue arises because construction of the edge_dataset switches to using tf.py_function when the input X is large enough:
umap/umap/parametric_umap.py, line 648 at commit c72ac2f
This issue can be worked around by enabling eager-mode TensorFlow (which is very slow):
import tensorflow as tf
tf.config.run_functions_eagerly(True)
Although I receive the following related warning:
/home/dev/.pyenv/versions/3.12.2/lib/python3.12/site-packages/tensorflow/python/data/ops/structured_function.py:258: UserWarning: Even though the `tf.config.experimental_run_functions_eagerly` option is set, this option does not apply to tf.data functions. To force eager execution of tf.data functions, please use `tf.data.experimental.enable_debug_mode()`.
warnings.warn(
Or by setting gather_indices_in_python = False, though I'm not sure whether that would cause other issues (at least it works for my case).
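As a rough check of what "large enough" means for this reproduction, here is a small sketch that estimates the in-memory size of X. It assumes the switch to tf.py_function happens when X occupies more than about 0.5 GB. That threshold is my reading of the linked line, not something confirmed in this report, so please verify it against the source. The helper names (array_nbytes, uses_python_gather) are made up for illustration and are not part of umap.

```python
# Hypothetical helpers for illustration; not part of the umap API.
# ASSUMPTION: the py_function path is taken when X is larger than ~0.5 GB.
ASSUMED_THRESHOLD_BYTES = 500_000_000
FLOAT64_ITEMSIZE = 8  # bytes per element for np.float64 arrays


def array_nbytes(n_rows, n_cols, itemsize=FLOAT64_ITEMSIZE):
    """Size in bytes of a dense (n_rows, n_cols) array."""
    return n_rows * n_cols * itemsize


def uses_python_gather(n_rows, n_cols, itemsize=FLOAT64_ITEMSIZE):
    """True if the array is above the assumed size threshold."""
    return array_nbytes(n_rows, n_cols, itemsize) > ASSUMED_THRESHOLD_BYTES


if __name__ == "__main__":
    # The reproduction stacks two clusters of 50000 rows with 1536 features.
    rows, cols = 2 * 50_000, 1536
    print("bytes:", array_nbytes(rows, cols))
    print("above assumed threshold:", uses_python_gather(rows, cols))
```

Under that assumption, the float64 array in the reproduction (100000 x 1536) is about 1.23 GB, well above the threshold, and even a float32 copy (about 0.61 GB) would still be above it.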