performance with nested fields #1595
-
Just in case, I removed the "repeatdt" and created all the particles from the start (using "time" in the ParticleSet to delay their activation).
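A minimal sketch of that delayed-release setup (the names `start_lons`, `start_lats`, `fieldset` and the number of releases are illustrative placeholders, not my exact script):

```python
import numpy as np
from datetime import timedelta
from parcels import ParticleSet, JITParticle

# Sketch of a delayed release: replicate the 13 seed positions for every
# 6-hourly release and give each copy its own release time, instead of repeatdt.
n_releases = 240                                    # e.g. 60 days of 6-hourly releases
release_times = [timedelta(hours=6 * i).total_seconds() for i in range(n_releases)]

lons = np.tile(start_lons, n_releases)              # start_lons/start_lats: the 13 seed positions
lats = np.tile(start_lats, n_releases)
times = np.repeat(release_times, len(start_lons))   # one release time (seconds) per particle

pset = ParticleSet(fieldset=fieldset, pclass=JITParticle,
                   lon=lons, lat=lats, time=times)
```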
-
Hi @lucvandenbulcke, thanks for the detailed write-up of your performance issues, and sorry for the somewhat slow reply (I was on holiday for the last two weeks). I think there are two somewhat separate issues here: the general performance of chunking, and the implementation of nesting in Parcels.
The more important issue is that nesting in Parcels is not as efficient as it could be (or as you would expect it to be). Very few users run with nesting on, so we never had much motivation to improve its performance. The way it is implemented is that, at every field interpolation call (so four(!) times per RK4 step), Parcels literally goes through the list of fields and tries to interpolate at the particle position in each one until it no longer gets an out-of-bounds error back. So for a particle that 'lives' in the outermost of three nests, that means twelve field interpolations per RK4 step. In other words, the cost grows linearly with the number of grids. Now, it is possible to come up with smarter algorithms/implementations for nested grids, but we simply haven't had the need yet. And to be honest, I don't think we have the resources in my team to majorly improve this in the coming months. How urgent/important is being able to run on four nested grids for you?
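To illustrate the point, here is a toy sketch of the behaviour described above (not the actual Parcels source; `field.eval` and the exception class are stand-ins):

```python
# Toy sketch: every interpolation call walks the list of nested fields,
# innermost first, until one does not report the position as out of bounds.

class OutOfBounds(Exception):
    """Stand-in for Parcels' out-of-bounds error."""

def nested_eval(fields, time, z, y, x):
    for field in fields:                       # e.g. [inner_nest, mid_nest, parent]
        try:
            return field.eval(time, z, y, x)   # hypothetical per-field interpolation
        except OutOfBounds:
            continue                           # outside this grid; try the next (coarser) one
    raise OutOfBounds("position not covered by any grid")

# A particle living in the outermost of three nests hits two OutOfBounds before
# succeeding, so each of the four RK4 field evaluations costs three attempts:
# twelve interpolations per RK4 step.
```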
-
Hello,
I am running a case using current data on 4 nested grids obtained from a NEMO simulation, together with Stokes drift data and wind data.
The currents come from NEMO files; these are probably badly conditioned for Parcels, as they have a single chunk covering the whole spatial domain (depth, lat, lon), while the chunk size in the time direction is 1. The daily files are also compressed.
I release just 13 particles at the start and then use "repeatdt" to re-release them every 6 hours. I use a larger chunk size in the output file:
output_file = pset.ParticleFile(name=out_filename, outputdt=delta(minutes=savedt), chunks=(10000,1))
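For reference, the release itself is set up roughly like this (a sketch; `start_lons`, `start_lats` and `fieldset` are placeholders defined elsewhere in my script):

```python
from datetime import timedelta as delta
from parcels import ParticleSet, JITParticle

# Sketch of the release described above: 13 seed positions,
# re-released every 6 hours via repeatdt.
pset = ParticleSet(fieldset=fieldset, pclass=JITParticle,
                   lon=start_lons, lat=start_lats,
                   repeatdt=delta(hours=6))
```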
The 13 particles start in the innermost of the 4 grids, but will flow into its parent and (eventually) into the parent's parent.
I did not use any chunking when loading each of the 4 fieldsets, as decompressing the netcdf variables would require loading the whole variable anyway:
fieldset=FieldSet.from_nemo(filenames, variables, dimensions, indices=indices, chunksize=False)
Apart from the NEMO currents, I also load Stokes drift and atmospheric wind; these netcdf files are chunked in the horizontal (and have no vertical dimension).
Performance is very bad: ~1000 it/s at the beginning, and it keeps dropping during the run. After simulating 2 months there are still only about 4000 particles, yet I get less than 100 it/s.
I then tried 4 MPI processes, but performance was slightly worse. I admit I did not simulate for very long, just a week or two, so maybe MPI would start to pay off after more time (once more particles offset the file-loading overhead).
Going back to 1 process, I used NCO (ncks) to re-chunk the netcdf files; they now have chunk sizes (time=1, depth=1, lat=O(200), lon=O(100)). O(200) means something like 200 (the exact value is different for each of the 4 grids). I suppose that now, when reading values from the files, only 1 vertical layer needs to be read and decompressed (and about half the size in the horizontal directions).
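(For reference, an equivalent rechunk can also be done from Python with xarray instead of ncks; this is only a sketch, with an illustrative file path, variable name and chunk sizes:)

```python
import xarray as xr

# Sketch: rewrite a NEMO U file so the on-disk chunks cover one time step,
# one depth level and a horizontal tile, instead of the whole 3D volume.
ds = xr.open_dataset("nemo_U_orig.nc")
ds.to_netcdf("nemo_U_rechunked.nc",
             encoding={"vozocrtx": {"chunksizes": (1, 1, 200, 100),  # (time, depth, y, x)
                                    "zlib": True, "complevel": 1}})
```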
However, my optimism was premature: re-chunking the files did not seem to improve the speed.
I also tried setting chunksizes for the fieldset (corresponding to the netcdf chunk sizes), but that did not help either:
cs = {'U': {'time': ('time_counter', chunksizes[3]), 'depth': ('depthu', chunksizes[2]), 'lat': ('y', chunksizes[1]), 'lon': ('x', chunksizes[0])},
      'V': {'time': ('time_counter', chunksizes[3]), 'depth': ('depthv', chunksizes[2]), 'lat': ('y', chunksizes[1]), 'lon': ('x', chunksizes[0])}}
fieldset = FieldSet.from_nemo(filenames, variables, dimensions, indices=indices, chunksize=cs)
Furthermore, this produces 2 warnings that get printed many times on screen. I suppose the first one has no consequences for now:
fieldfilebuffer.py:458: FutureWarning: The return type of 'Dataset.dims' will be changed to return a set of dimension names in future, in order to be more consistent with 'DataArray.dims'. To access a mapping from dimension names to lengths, please use 'Dataset.sizes'
The second warning is more worrying:
dataset.py:275: UserWarning: The specified chunks separate the stored chunks along dimension "x" starting at index 289. This could degrade performance. Instead, consider rechunking after loading.
(and the same for the y dimension).
Maybe the netcdf chunk sizes and the fieldset chunk sizes do not correspond after all? For the parent grid fieldset (the one with a chunk size of 289 in the x direction), I do not use indices. For the nested grids, I use indices to exclude the first and last row and column, because NEMO (or XIOS) stores zeros in those rows and columns.
Or maybe the warning is caused by something other than a mismatch of chunk sizes, but then I wouldn't know what it is.
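One way to check is to read the on-disk chunk sizes directly and build the Parcels chunksize dictionary from them (a sketch; the file path and variable name are illustrative, and the same would be done for V with 'depthv'):

```python
import xarray as xr

# Sketch: read the stored (HDF5) chunk sizes of the U file and reuse them,
# so the dask chunks requested by Parcels align with the chunks on disk.
ds = xr.open_dataset("nemo_U_rechunked.nc")
disk_chunks = ds["vozocrtx"].encoding.get("chunksizes")   # e.g. (1, 1, 200, 100) as (time, depth, y, x)
print(disk_chunks)

cs = {"U": {"time": ("time_counter", disk_chunks[0]),
            "depth": ("depthu", disk_chunks[1]),
            "lat": ("y", disk_chunks[2]),
            "lon": ("x", disk_chunks[3])}}
```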
Finally, using chunksize="auto" does not improve speed either.
In none of these cases does memory seem to be a problem (~4 GB used out of 64). I have read discussions here, among others #1404, #1485 and #1554, as well as Kehl et al., but I'm a bit lost as to what to try next.
Thank you!