I am working on the growing datasets lab.
When I try to load the embeddings for the train set, the first 32% of the data loads very quickly (33 seconds), but then it gets stuck at that point every time.
Here is the slow part of the code (from OpenImagesDataset, _load_embeddings_df):
for filepath in wrapped_loader:
    df = pd.read_parquet(filepath)
    index_ls.append(df.index)
    values_ls.append(df['embedding'].tolist())
My RAM and CPU usage don't look too high.
Edit: I have noticed that my runtime restarts every time this happens (probably because I am using more than my 16 GB of RAM).
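In case memory really is the problem, here is a minimal sketch of a lower-memory variant of the loop that I am considering. It assumes every row of the 'embedding' column holds a vector of the same length, and that float32 precision is enough for the lab. I have not confirmed either of these. The names compact_embeddings and load_embeddings_compact are hypothetical helpers of mine, not part of the lab code.

```python
import numpy as np
import pandas as pd


def compact_embeddings(df, column="embedding", dtype=np.float32):
    """Stack the per-row vectors of one DataFrame into a single 2-D array.

    Assumes every row in `column` has the same length.
    """
    return np.stack(df[column].tolist()).astype(dtype)


def load_embeddings_compact(filepaths, column="embedding"):
    """Hypothetical replacement for the loop above (not the lab's own code)."""
    index_parts = []
    value_parts = []
    for filepath in filepaths:
        df = pd.read_parquet(filepath)
        index_parts.append(df.index.to_numpy())
        # Keep one dense array per file instead of one object per row.
        value_parts.append(compact_embeddings(df, column))
        del df  # drop the DataFrame before reading the next file
    # Note: the final concatenate briefly needs room for both the parts
    # and the combined array.
    index = np.concatenate(index_parts)
    values = np.concatenate(value_parts, axis=0)
    return index, values
```

With this layout the embeddings take roughly rows × dimensions × 4 bytes, which can be read back from values.nbytes to check whether the full train set can fit in 16 GB at all.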