Want to know how to control the size of the output file (detailed)! #1177
Replies: 3 comments 4 replies
-
The chunksize parameter is for the input velocity fields. One suggestion would be to create a loop where you run the trajectories for a shorter time, output the file, then restart them by recreating your ParticleSet. You could also run subsets of your total number of particles for the complete integration time. I'm curious to see what the error is when you try to create the ~22Gb output file? Cheers,
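A minimal sketch of that looping approach, assuming the Parcels v2-style API where ParticleFile.close() converts the temporary .npy output at the end of each leg; the idealized fieldset, release positions, leg length, and file names are placeholders for your own setup:

```python
import numpy as np
from datetime import timedelta
from parcels import FieldSet, ParticleSet, JITParticle, AdvectionRK4

# Tiny idealized velocity field just to keep the sketch self-contained;
# replace with your own FieldSet (e.g. FieldSet.from_netcdf)
lon = np.linspace(0., 10., 20)
lat = np.linspace(0., 10., 20)
U = np.full((lat.size, lon.size), 0.1)
V = np.zeros((lat.size, lon.size))
fieldset = FieldSet.from_data({'U': U, 'V': V}, {'lon': lon, 'lat': lat}, mesh='flat')

# One ParticleSet that persists across legs; each pset.execute() call
# continues from the particles' current positions and times
pset = ParticleSet(fieldset=fieldset, pclass=JITParticle,
                   lon=[1.0, 2.0, 3.0], lat=[5.0, 5.0, 5.0])

n_legs = 5                         # split one long run into several shorter legs
leg_runtime = timedelta(days=2)

for leg in range(n_legs):
    # A fresh ParticleFile per leg keeps each individual output file small
    output_file = pset.ParticleFile(name=f"sumi_0601_leg{leg:02d}.nc",
                                    outputdt=timedelta(hours=24))
    pset.execute(AdvectionRK4, runtime=leg_runtime,
                 dt=timedelta(minutes=30), output_file=output_file)
    output_file.close()            # write this leg's trajectories to disk

```

The legs can then be concatenated (or analysed separately) afterwards, so no single file ever has to hold the whole run.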
-
Thanks for raising this Issue, @LiuJiao0408. Note that @JamiePringle has already done some work to avoid problems with large file sizes at the step where the individual *.npy files are converted to .nc format. See for example the discussion at #1159 and especially #1091 (comment). Does it help to use @JamiePringle's code at the bottom of that comment?
-
Chunksize won't help, for the reason @philippemiron mentions above. Just to add to this thread: a significant constraint on using OceanParcels for large problems is the conversion of the temporary files to the final netCDF or Zarr output. The issue is that there is no easy way to predict which particle is written, and when, to which temporary file. To write out the data in an orderly way, the code that creates the output file must figure out what is where. The existing code loads the entire output into memory before writing it out. I wrote my code to deal with output sizes for runs that range between 161Gb and 500Gb (compressed -- uncompressed, as is the default, would be several times larger). If your output is only 22Gb, you might want to try a machine with more memory, but that is not a real solution. My code to get around this, as given in #1091, reduces the memory needed to convert from the temporary .npy files, but it is not perfect. It keeps in memory roughly (the longest single drifter output record) * (the number of drifters in a single MPI rank). So if you use it, you can reduce memory usage by killing particles once they have drifted as long as you need them to, and by running your code on multiple cores using MPI. Note you can have more MPI processes than cores, if need be. Not super efficient, but it will work. Note as well that if you are running big runs, you should look at my code and think about doing things like 1) compressing the netCDF/Zarr output, and 2) reducing the precision of the lat/lon variables. Jamie
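For points 1) and 2), a minimal post-processing sketch with xarray (this is not Jamie's actual code from #1091; the file names are placeholders) that recompresses an existing Parcels netCDF output and stores lat/lon at single precision:

```python
import xarray as xr

# Open an existing (uncompressed) Parcels trajectory file -- placeholder name
ds = xr.open_dataset("sumi_0601.nc")

# zlib-compress every variable, and store lat/lon as float32
encoding = {var: {"zlib": True, "complevel": 4} for var in ds.data_vars}
for var in ("lat", "lon"):
    if var in ds:
        encoding.setdefault(var, {"zlib": True, "complevel": 4})["dtype"] = "float32"

ds.to_netcdf("sumi_0601_compressed.nc", encoding=encoding)
```

For Zarr output the same idea applies through the encoding passed to ds.to_zarr().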
-
@philippemiron
When I do a long simulation, the output file is often large.
This makes the system unable to create such a large .nc file (e.g. 22Gb).
I have tried this statement:
output_filename = "D:/Data/sumi_0601.nc"
out_interval = timedelta(hours = 24)
output_file = pset.ParticleFile(name = output_filename, outputdt = out_interval, chunksizes=[200,1])
But this statement raises an argument error: there is no 'chunksizes' keyword.
I wonder if there is any good way (other than reducing the output frequency)?
The error:
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15240/3313754045.py in
2 output_filename = "D:/Data/sumi_0601.nc"
3 out_interval = timedelta(hours = 24)
----> 4 output_file = pset.ParticleFile(name = output_filename, outputdt = out_interval, chunksizes=[200,1])
~\.conda\envs\parcel_formp\lib\site-packages\parcels\particleset\particlesetsoa.py in ParticleFile(self, *args, **kwargs)
    655         """Wrapper method to initialise a :class:`parcels.particlefile.ParticleFile`
    656         object from the ParticleSet"""
--> 657         return ParticleFile(*args, particleset=self, **kwargs)
    658
    659     def set_variable_write_status(self, var, write_status):
TypeError: __init__() got an unexpected keyword argument 'chunksizes'