Want to know how to control the size of the output file (detailed)! #1177
Replies: 3 comments 4 replies
-
The chunksize parameter is for the input velocity fields. One suggestion would be to create a loop where you run the trajectories for a shorter time, output the file, then restart them by recreating your ParticleSet. You could also run subsets of your total number of particles for the complete integration time. I'm curious to see what the error is when you try to create the ~22Gb output file? Cheers,
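A minimal sketch of that looping approach, assuming the Parcels v2-style API where ParticleFile.close() converts the temporary .npy output at the end of each leg; the idealized fieldset, release positions, leg length, and file names are placeholders for your own setup:

```python
import numpy as np
from datetime import timedelta
from parcels import FieldSet, ParticleSet, JITParticle, AdvectionRK4

# Tiny idealized velocity field just to keep the sketch self-contained;
# replace with your own FieldSet (e.g. FieldSet.from_netcdf)
lon = np.linspace(0., 10., 20)
lat = np.linspace(0., 10., 20)
U = np.full((lat.size, lon.size), 0.1)
V = np.zeros((lat.size, lon.size))
fieldset = FieldSet.from_data({'U': U, 'V': V}, {'lon': lon, 'lat': lat}, mesh='flat')

# One ParticleSet that persists across legs; each pset.execute() call
# continues from the particles' current positions and times
pset = ParticleSet(fieldset=fieldset, pclass=JITParticle,
                   lon=[1.0, 2.0, 3.0], lat=[5.0, 5.0, 5.0])

n_legs = 5                         # split one long run into several shorter legs
leg_runtime = timedelta(days=2)

for leg in range(n_legs):
    # A fresh ParticleFile per leg keeps each individual output file small
    output_file = pset.ParticleFile(name=f"sumi_0601_leg{leg:02d}.nc",
                                    outputdt=timedelta(hours=24))
    pset.execute(AdvectionRK4, runtime=leg_runtime,
                 dt=timedelta(minutes=30), output_file=output_file)
    output_file.close()            # write this leg's trajectories to disk

```

The legs can then be concatenated (or analysed separately) afterwards, so no single file ever has to hold the whole run.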
-
Thanks for raising this Issue, @LiuJiao0408. Note that @JamiePringle has already done some work to avoid problems with large file sizes at the step where the individual *.npy files are converted to .nc format. See for example the discussion at #1159 and especially #1091 (comment). Does it help to use @JamiePringle's code at the bottom of that comment?
-
Chunksize won't help, for the reason @philippemiron mentions above. Just to add to this thread: a significant constraint on using OceanParcels for large problems is the conversion of the temporary files to the final netCDF or Zarr output. The issue is that there is no easy way to predict which particle is written, and when, to which temporary file. To write out the data in an orderly way, the code that creates the output file must figure out what is where. The existing code loads the entire output into memory before writing it out. I wrote my code to deal with output sizes for runs that range between 161Gb and 500Gb (compressed -- uncompressed, as is the default, would be several times larger). If your output is only 22Gb, you might want to try a machine with more memory, but that is not a real solution. My code to get around this, as given in #1091, reduces the memory needed to convert from the temporary .npy files, but it is not perfect. It keeps in memory roughly (the longest single drifter output record) * (the number of drifters in a single MPI rank). So if you use it, you can reduce memory usage by killing particles once they have drifted as long as you need them to, and by running your code on multiple cores using MPI. Note you can have more MPI processes than cores, if need be. Not super efficient, but it will work. Note as well that if you are running big runs, you should look at my code and think about doing things like 1) compressing the netCDF/Zarr output, and 2) reducing the precision of the lat/lon variables. Jamie
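For points 1) and 2), a minimal post-processing sketch with xarray (this is not Jamie's actual code from #1091; the file names are placeholders) that recompresses an existing Parcels netCDF output and stores lat/lon at single precision:

```python
import xarray as xr

# Open an existing (uncompressed) Parcels trajectory file -- placeholder name
ds = xr.open_dataset("sumi_0601.nc")

# zlib-compress every variable, and store lat/lon as float32
encoding = {var: {"zlib": True, "complevel": 4} for var in ds.data_vars}
for var in ("lat", "lon"):
    if var in ds:
        encoding.setdefault(var, {"zlib": True, "complevel": 4})["dtype"] = "float32"

ds.to_netcdf("sumi_0601_compressed.nc", encoding=encoding)
```

For Zarr output the same idea applies through the encoding passed to ds.to_zarr().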
-
@philippemiron
When I do a long simulation, the output file is often large.
This makes the system unable to create such a large .nc file (e.g. 22Gb).
I have tried this statement:
output_filename = "D:/Data/sumi_0601.nc"
out_interval = timedelta(hours = 24)
output_file = pset.ParticleFile(name = output_filename, outputdt = out_interval, chunksizes=[200,1])
But this statement raises an argument error: there is no 'chunksizes' keyword.
I wonder if there is any good way (other than reducing the output frequency)?
The error:
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15240/3313754045.py in
2 output_filename = "D:/Data/sumi_0601.nc"
3 out_interval = timedelta(hours = 24)
----> 4 output_file = pset.ParticleFile(name = output_filename, outputdt = out_interval, chunksizes=[200,1])
~\.conda\envs\parcel_formp\lib\site-packages\parcels\particleset\particlesetsoa.py in ParticleFile(self, *args, **kwargs)
    655         """Wrapper method to initialise a :class:`parcels.particlefile.ParticleFile`
    656         object from the ParticleSet"""
--> 657         return ParticleFile(*args, particleset=self, **kwargs)
    658
    659     def set_variable_write_status(self, var, write_status):
TypeError: __init__() got an unexpected keyword argument 'chunksizes'