Memory leak #349
Comments
Interesting. You're just visualizing an array with funlib show neuroglancer and it's using 500 GB?

No no, that graph is from a dacapo train job, but I mentioned neuroglancer as a side example.

Can you provide some (ideally simplified) config combination that leads to a similar memory profile?
I can't give you the exact code because I was using 200+ nrs crops, but this is the code I was running:

```python
# %%
import csv
import json
import math
import os

import dacapo

# %%
datasplit_path = "datasplit_v2.csv"
classes_to_be_used_path = "to_be_used_v2.json"

# %%
with open(classes_to_be_used_path, "r") as f:
    classes = ["bg"] + list(json.load(f).keys())

# %%
from funlib.geometry import Coordinate

from dacapo.experiments.datasplits import DataSplitGenerator
from dacapo.store.create_store import create_config_store

config_store = create_config_store()

# %%
input_resolution = Coordinate(8, 8, 8)
output_resolution = Coordinate(8, 8, 8)
datasplit_config = DataSplitGenerator.generate_from_csv(
    datasplit_path,
    input_resolution,
    output_resolution,
    # targets=classes,
    name="base_model_20241120_20_target_classes",
    # max_validation_volume_size=400**3,
).compute()

# %%
datasplit = datasplit_config.datasplit_type(datasplit_config)
config_store.store_datasplit_config(datasplit_config)

# %%
from dacapo.experiments.tasks import OneHotTaskConfig

simple_one_hot = OneHotTaskConfig(
    name="one_hot_task",
    classes=classes,
    kernel_size=1,
)
config_store.store_task_config(simple_one_hot)

# %%
from dacapo.experiments.architectures import CNNectomeUNetConfig

architecture_config = CNNectomeUNetConfig(
    name="simple_unet",
    input_shape=(2, 132, 132),
    eval_shape_increase=(8, 32, 32),
    fmaps_in=1,
    num_fmaps=8,
    fmaps_out=8,
    fmap_inc_factor=2,
    downsample_factors=[(1, 4, 4), (1, 4, 4)],
    kernel_size_down=[[(1, 3, 3)] * 2] * 3,
    kernel_size_up=[[(1, 3, 3)] * 2] * 2,
    constant_upsample=True,
    padding="valid",
)
config_store.store_architecture_config(architecture_config)

# %%
from dacapo.experiments.trainers import GunpowderTrainerConfig
from dacapo.experiments.trainers.gp_augments import (
    ElasticAugmentConfig,
    GammaAugmentConfig,
    IntensityAugmentConfig,
    IntensityScaleShiftAugmentConfig,
)

trainer_config = GunpowderTrainerConfig(
    name="default_v3",
    batch_size=2,
    learning_rate=0.0001,
    num_data_fetchers=20,
    augments=[
        ElasticAugmentConfig(
            control_point_spacing=[100, 100, 100],
            control_point_displacement_sigma=[10.0, 10.0, 10.0],
            rotation_interval=(0, math.pi / 2.0),
            subsample=8,
            uniform_3d_rotation=True,
        ),
        IntensityAugmentConfig(
            scale=(0.25, 1.75),
            shift=(-0.5, 0.35),
            clip=True,
        ),
        GammaAugmentConfig(gamma_range=(0.5, 2.0)),
        IntensityScaleShiftAugmentConfig(scale=2, shift=-1),
    ],
    snapshot_interval=100000,
    clip_raw=False,
)
config_store.store_trainer_config(trainer_config)

# %%
from dacapo.experiments import RunConfig
from dacapo.experiments.run import Run

iterations = 1000000
validation_interval = 10000
run_config = RunConfig(
    name="simple_base_model",
    datasplit_config=datasplit_config,
    task_config=simple_one_hot,
    architecture_config=architecture_config,
    trainer_config=trainer_config,
    num_iterations=iterations,
    validation_interval=validation_interval,
)
config_store.store_run_config(run_config)

# %%
# I submitted it as a separate job: $ dacapo train run_name
from dacapo import train

train(run_config.name)
```
Ah, so this might not be a memory leak, just lots of data.

Usually that is not a problem even if there is a lot of data, because of the lazy loading.
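To make the lazy-loading point concrete, here is a minimal sketch with plain dask (illustrative only, not DaCapo's actual data pipeline): slicing a large array only builds a task graph, and memory is spent only on the chunks that a compute() actually touches.

```python
import dask.array as da
import numpy as np

# A "large" array that is never materialized: dask only records a task graph.
a = da.zeros((10_000, 10_000, 10_000), chunks=(64, 64, 64), dtype=np.uint8)

# Slicing is also lazy; nothing is read or allocated yet.
block = a[:64, :64, :64]

# Only this call materializes data, and only the single chunk the slice touches (~256 KiB).
print(block.sum().compute())
```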
I submitted:
@pattonw I think this is the problem:
It could be the dask array, but we never call `persist`, and I'm pretty sure using
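For reference, a minimal sketch of the distinction being drawn here (again plain dask, not DaCapo's code path): computing a lazy result streams chunks through memory, whereas `persist()` materializes every chunk and keeps the whole array resident.

```python
import dask.array as da

# Lazy ~128 MB array (never fully in memory unless asked for).
a = da.random.random((4_000, 4_000), chunks=(500, 500))

# Lazy reduction: chunks are streamed, peak memory stays around a chunk or two.
print(a.sum().compute())

# persist() computes every chunk and pins the full ~128 MB in memory
# until `held` (and every reference to it) is dropped.
held = a.persist()
print(held.sum().compute())
```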
Now it is clear that it is related to the high number of crops, but I don't know how to narrow down the cause of the bug further.

My best guess is the masking.

Didn't work :/
I tried directly memory profiling this script:

```python
from funlib.persistence import Array, prepare_ds
from funlib.geometry import Coordinate, Roi
import dask.array as da
import random

a = prepare_ds(
    "scratch/test.zarr", (10_000, 10_000, 10_000), chunk_shape=(10, 10, 10)
)
for ii in range(1000):
    print(ii)
    roi = Roi(
        Coordinate(*[random.randint(0, a.shape[i] // 100) for i in range(3)]),
        (10, 10, 10),
    )
    x = a[roi]
    print(x.sum())
```

It doesn't seem like there's a memory leak from repeatedly accessing zarrs through funlib.persistence. Memory profiling I was using:
I adapted your script with a fairly basic data setup and tested 3 different modalities. See the script here. I ran each of the following modalities for about 10 minutes at around 50 its/sec, so about 30k iterations each. Each ran with 20 data-fetching workers. It looks like the memory cost stays fairly constrained across all runs. Here are the results:

Script: here, but with the 3D model the problem started:

Looks like the problem is present in both, but just on a smaller scale with the 2D model. Very strange that it is dependent on the model architecture.
Are these the 2 configs you are calling "2D" and "3D"?
One thing I notice is that the input shape is much larger for the 3D model than for the 2D one.
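To put rough numbers on that point (a back-of-the-envelope sketch; the isotropic 3D shape below is a hypothetical stand-in, since that config isn't shown in this thread): with batch_size=2 and num_data_fetchers=20 as in the trainer config above, the raw input data alone scales like this, before counting targets, masks, weights, or augmentation copies.

```python
import numpy as np


def raw_input_bytes(input_shape, batch_size=2, num_fetchers=20, dtype=np.float32):
    """Rough lower bound on raw-input memory the data pipeline can hold at once:
    one batch per fetcher worker, ignoring targets, masks and augment copies."""
    voxels = int(np.prod(input_shape))
    return voxels * np.dtype(dtype).itemsize * batch_size * num_fetchers


# "2D" config from the thread: input_shape=(2, 132, 132)
print(raw_input_bytes((2, 132, 132)) / 1e6, "MB")    # ~5.6 MB

# Hypothetical isotropic "3D" config (not the poster's actual shape)
print(raw_input_bytes((132, 132, 132)) / 1e6, "MB")  # ~368 MB
```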
Yes, that's what I mean by 2D / 3D.
After the new version there is a memory leak. I think it is coming from funlib.persistence, because funlib.show.neuroglancer is becoming really slow and buggy.
@pattonw