How can I further optimize simulation performance? #1485

nicrie · 2023-12-13T23:02:12Z

nicrie
Dec 13, 2023

Dear Erik et al.

First of all, thank you for providing this software! This is my first time using Parcels, but the documentation greatly facilitates getting started.

Problem Description

I use Parcels to run a simulation but it feels relatively slow. Since it's my first time I don't have any experience to back up my feeling, only by reading through the issues and discussion here on Github, my application seems relatively small.

Could my specific implementation be suboptimal? Are there ways I could improve the performance?

Application

I am following the framework presented in van Duinen et al. 2022 to identify potential sources of beach litter. My case study and technical implementation follows very closely the original study, here are the key points summarized:

FieldSet

From reading the recent study by Kehl et al. 2023 the I/O of the spatio-temporal CMEMS data could be the bottleneck. The rest are only spatial/constant fields, so probably not the issue (?).

U, V (CMEMS Northwestern Shelf Reanalysis, hourly resolution)
Stokes drift (CMEMS Northwestern Shelf Wave Reanalysis, three-hourly resolution)
Coastal displacement field (spatial only)
Land mask (spatial only)
Distance to shore (spatial only)
Diffusion (constant)

I have saved the CMEMS data fields as daily files with dimensions {"time": 24, "latitutude": 355, "longitude": 287} (currents) and {"time": 8 , "latitude": 1240, "longitude":958} (Stokes). Following the advice mentioned by @CKehl in several discussions here on Github, I load the fields with chunksizes matching the netcdf chunking:

Surface currents

path_currents = "data/cmems/nwshelf_multiyear_phy_004_009_extended/*nws*"
variables_currents = {"U": "uo", "V": "vo"}
dimensions_currents = {
    "U": {"lon": "longitude", "lat": "latitude", "time": "time"},
    "V": {"lon": "longitude", "lat": "latitude", "time": "time"},
}
chunksize_currents = {
    "U": {
        "lon": ("longitude", 287),
        "lat": ("latitude", 355),
        "time": ("time", 24),
    },
    "V": {
        "lon": ("longitude", 287),
        "lat": ("latitude", 355),
        "time": ("time", 24),
    },
}

fieldset = FieldSet.from_netcdf(
    filenames=path_currents,
    variables=variables_currents,
    dimensions=dimensions_currents,
    deferred_load=True,
    chunksize=chunksize_currents,
)

Stokes

path_stokes = "data/cmems/nwshelf_reanalysis_wav_004_015/*.nc"
variables_stokes = {"U_Stokes": "VSDX", "V_Stokes": "VSDY"}
dimensions_stokes = {
    "U_Stokes": {"lon": "longitude", "lat": "latitude", "time": "time"},
    "V_Stokes": {"lon": "longitude", "lat": "latitude", "time": "time"},
}
chunksize_stokes = {
    "U_Stokes": {
        "lon": ("longitude", 958),
        "lat": ("latitude", 1240),
        "time": ("time", 8),
    },
    "V_Stokes": {
        "lon": ("longitude", 958),
        "lat": ("latitude", 1240),
        "time": ("time", 8),
    },
}
fieldset_Stokes = FieldSet.from_netcdf(
    filenames=path_stokes,
    variables=variables_stokes,
    dimensions=dimensions_stokes,
    allow_time_extrapolation=True,
    deferred_load=True,
    chunksize=chunksize_stokes,
)
fieldset_Stokes.U_Stokes.units = GeographicPolar()
fieldset_Stokes.V_Stokes.units = Geographic()

fieldset.add_field(fieldset_Stokes.U_Stokes)
fieldset.add_field(fieldset_Stokes.V_Stokes)

vectorField_Stokes = VectorField("UV_Stokes", fieldset.U_Stokes, fieldset.V_Stokes)
fieldset.add_vector_field(vectorField_Stokes)

ParticleSet

Release 100 particles daily, homogeneously distributed within a grid cell, for approximately 90 days, so in total ~ 9000 particles.

# Release time
release_start = datetime(2020, 12, 1, 0, 0)
release_end = datetime(2020, 9, 1, 0, 0)
runtime_release = release_start - release_end
release_time = xr.date_range(release_end, release_start, freq="D").values

# Number of released particles
n_particles_release = 9000
n_particles_per_day = n_particles_release / runtime_release.days
n_particles_per_day = np.round(n_particles_per_day).astype(int)

# Release locations
rx0, rx1, ry0, ry1 = 11.444, 11.555, 58.200, 58.268
nsqrt = np.sqrt(n_particles_per_day).round().astype(int)
re_lons = np.linspace(rx0, rx1, nsqrt)
re_lats = np.linspace(ry0, ry1, nsqrt)
release_xx, release_yy = np.meshgrid(re_lons, re_lats)

# Avoid repeatdt by including release locations with different release times
release_xx = np.tile(release_xx, (release_time.size, 1, 1))
release_yy = np.tile(release_yy, (release_time.size, 1, 1))
release_time = np.tile(release_time[:, None, None], (1, nsqrt, nsqrt))

pset = ParticleSet(
    fieldset=fieldset,
    pclass=PlasticParticle,
    lon=release_xx,
    lat=release_yy,
    time=release_time,
)

Kernels

Nothing special here, basically a collection of standard kernels from the documentation:

AdvectionRK4
CoastalDisplacement
StokesUV
DiffusionUniformKh
Age
SampleLand
DeleteParticle

Output File

Store the files daily in chunks of 10,000 particles and 1 observation.

chunks_particles = int(1e4)
chunks_obs = 1
outputdt = timedelta(days=1).total_seconds()

path_target = "parcels_trajs.zarr"
output_file = pset.ParticleFile(
    name=path_target,
    outputdt=outputdt,
    chunks=(chunks_particles, chunks_obs), 
)

Execution

Simulate the released particles for a maximum of 2 years with an integration time of 3 hours.

runtime_total = timedelta(days=365 * 2)
runtime_dt = -timedelta(hours=3).total_seconds()
pset.execute(
    kernel,
    dt=runtime_dt,
    runtime=runtime_total,
    output_file=output_file,
)

Parallelization

I run the simulation in parallel with 9 CPUs using

mpirun -np 9 python run_parcels.py

I initially get around 20,000 iterations per second with an estimated total runtime of 1 hour. However, the number of iterations quickly decreases, causing the total runtime to exceed 14 hours.

Operating system

Python: 3.10
parcels: 3.0.1.dev30
mpi: 1.0
mpi4py: 3.1.5
mpich: 4.1.2

In a nutshell

From what I gathered from the discussions here on Github and the paper by Kehl et al., performance issues in my case could arise from

I/O loading times
delayed releases

I tried to mitigate these effects by

using the netcdf chunksizes for the parcel fields
avoiding repeatdt by defining a prior the delayed starting times

Still, it seems that performance should be better. Any ideas what goes wrong here?

Thank you for your help!

Answered by erikvansebille

Dec 14, 2023

Thanks for this clear and well-documented question, @nicrie! It's clear that you have a good grasp of Parcels API and structure already, and that you've carefully gone through the discussions here. Nice!

But I'm afraid the answer isn't that simple

Normally, when we experience a slowdown in performance it's due to either

An increase in file-IO on the writing of the zarr-file; when very many files have to be written. A relatively simple solution is then to set the chunks_obs to a much higher number (100?). Although I don't suspect this is the problem here, since your outputdt is only two orders of magnitude smaller than your runtime
An increase in the number of input chunks that have to be…

View full answer

erikvansebille · 2023-12-14T08:15:29Z

erikvansebille
Dec 14, 2023
Maintainer

Thanks for this clear and well-documented question, @nicrie! It's clear that you have a good grasp of Parcels API and structure already, and that you've carefully gone through the discussions here. Nice!

But I'm afraid the answer isn't that simple

Normally, when we experience a slowdown in performance it's due to either

An increase in file-IO on the writing of the zarr-file; when very many files have to be written. A relatively simple solution is then to set the chunks_obs to a much higher number (100?). Although I don't suspect this is the problem here, since your outputdt is only two orders of magnitude smaller than your runtime
An increase in the number of input chunks that have to be loaded, as particles disperse over a larger area. Setting the chunk sizes to the netcdf output is one solution, but an alternative is to just remove chunking altogether (chunksize=False); if the memory of your system permits. In general, I would also not chunk the time dimension, since Parcels reads can only hold two timeframes in memory anyways

Option 2 is the more likely candidate to improve the performance, but option 1 is much easier to implement. So I would try both if I were you. Keen to hear what you find, so please report back!

For further background, see also @CKehl's article on Parcels performance at https://www.sciencedirect.com/science/article/pii/S0098300423000262

0 replies

nicrie · 2023-12-14T12:59:48Z

nicrie
Dec 14, 2023
Author

Thanks for the quick response and for the reference you provided. Here's what happened when I tried your suggestions:

Increasing chunks_obs from 1 to 100: As you mentioned, this didn't really make a difference in terms of performance.
Removing the chunking in time while keeping the spatial chunking: This didn't lead to any improvements either.
Setting chunksize=False: Now this actually made a difference! The iterations at the start jumped from around 20,000/s to about 60,000. It's tough to say what this means for the total computation time though, because the performance dropped back to around 20,000 iterations/s after just a minute.

I also noticed something interesting about running the simulation in parallel with mpirun. It doesn't always seem to boost performance. From what I understand from Kehl's paper and discussions here, each additional process adds its own I/O loading time, which might cancel out the benefits of running things in parallel. In my tests, using mpirun -np 2 seemed to help a bit by delaying the usual drop in performance that I see after a minute. But when I tried with more processes, going up to 5, there wasn't any further improvement in speed (but, at least it didn't get worse). I'm not really an expert in parallelization, but is there a sort of rule of thumb for figuring out how many extra processors will actually help speed things up?

PS: At the time of writing the most recent simulation finished 70 % in about 30 min. The same simulation which I started yesterday needed about 10h to compute the same fraction :) So that definitely seems lake a major performance gain! Thank you!

1 reply

erikvansebille Dec 14, 2023
Maintainer

Thanks for the quick and elaborate response, @nicrie. Very interesting that chunksize=False improves performance so much. The overhead of dask chunking is apparently pretty large, so that avoiding chunking if it's feasible in terms of memory seems the better choice

As for the mpirun, each process keeps its own copy of hydrodynamic data, so indeed more processors may not be faster at all when the ratio of particles to input-data is such that the simulation performance is bound by reading hydrodynamic input... Under the hood, the mpi-implementation of parcels is not much more than a way to split the particleset in N subsets, and run each subset as its own process.

JamiePringle · 2023-12-14T15:48:40Z

JamiePringle
Dec 14, 2023
Collaborator

Several additional thoughts @nicrie. I run global runs with 100s of millions of particles, and so have fiddled with efficiency a fair amount.

The fact that your runs get slower and slower suggest that the number of particles keeps increasing. If you keep adding them with time, the number of particles increases linearly with time, and the run speed increases quadratically with time (!!) (the integral of a linear function). If you don't need the particles to last forever, perhaps kill them after a finite lifetime. This can dramatically reduce run times, if you don't need the full integration length.

Your scaling with MPI jobs is worse than I expect. One thing to look for is the number of particles in each MPI task (this can be seen in the zarr output for each task). Are they very unequal in number? If so, each task could be asking for data, and causing IO, but not doing much work. If this is an issue, I can help further. Also, on chunking, some experimentation is often helpful. Try halving and doubling each chunk size alone and finding a better size. If you are doing this a lot, rechuncking your data can help.

Another parallel issue can arise if your IO system is bad at handling multiple parallel reads. Again, each MPI task will try to read data, often the same data at once. A local SSD is best, but not always practical (anyone want to give me 20k$ for a 40Tb SSD?). But if the data is mounted on an NFS or other network file system, and you don't have enough nfs-daemon's running, the IO can badly bottleneck. Also, if you are running something like ZFS, adding an SSD cache can make a huge difference. Even adding more RAM (since linux uses RAM as IO cache) can help a lot. Again, happy to chat about this.

Also, what is the chunking on your output file? If you start out by writing only one or two particles, it can coerce the zarr file to a deeply suboptimal chunking. This can be exacerbated with MPI, if one of the jobs only writes out a small number of particles in its first output time steps. The easiest way to diagnose this is to open each of the MPI job's zarr files, and see how they are chunked. Are any silly (e.g. (1,1)?).

Good luck,
Jamie

1 reply

nicrie Dec 15, 2023
Author

Thank you @JamiePringle for your detailed answer! 👍

I run global runs with 100s of millions of particles, and so have fiddled with efficiency a fair amount.

What is the order of computational time you typically need for such runs (I guess you work on a HPC)? ... just for me to get a better idea with the time scales we're dealing with.

The fact that your runs get slower and slower suggest that the number of particles keeps increasing. If you keep adding them with time, the number of particles increases linearly with time, and the run speed increases quadratically with time (!!) (the integral of a linear function). If you don't need the particles to last forever, perhaps kill them after a finite lifetime. This can dramatically reduce run times, if you don't need the full integration length.

Ok that's indeed the case! Since I want to analyze seasonal differences I have to distribute the particle release times over the entire season, so this part is really integral to the study. The maximal age of my particles is 2 years before they get deleted.

Your scaling with MPI jobs is worse than I expect. One thing to look for is the number of particles in each MPI task (this can be seen in the zarr output for each task). Are they very unequal in number? If so, each task could be asking for data, and causing IO, but not doing much work.

I think you're spot on. In fact, my number of simulated particles is not that high (order of 10^4) which is then distributed throughout the season, so each day I'm just releasing about ~100 particles. If these are partitioned by KMeans or any other clustering algorithm, I can imagine that at the end a processor deals with 10s of particles...which is probably not ideal.

Another parallel issue can arise if your IO system is bad at handling multiple parallel reads. Again, each MPI task will try to read data, often the same data at once. A local SSD is best, but not always practical (anyone want to give me 20k$ for a 40Tb SSD?). But if the data is mounted on an NFS or other network file system, and you don't have enough nfs-daemon's running, the IO can badly bottleneck.

Since I'm on the HPC of my institute there's not much that I can do when it comes to the architecture. But good to know that these things are also part of the equation.

Also, what is the chunking on your output file? If you start out by writing only one or two particles, it can coerce the zarr file to a deeply suboptimal chunking. This can be exacerbated with MPI, if one of the jobs only writes out a small number of particles in its first output time steps. The easiest way to diagnose this is to open each of the MPI job's zarr files, and see how they are chunked. Are any silly (e.g. (1,1)?).

I set the chunking to be (10_000, 1) but will double check the actual output once the current simulation finishes. Thank you for the hint!!

JamiePringle · 2023-12-15T14:48:35Z

JamiePringle
Dec 15, 2023
Collaborator

@nicrie, I suspect there are a couple of things that you are doing that are suboptimal, and that you can run much much faster. I run 558,043,454 particles, each for 60 days, in 1 year of daily global Mercator data, and it takes about 24 hours. I do not use an HPC system -- they are often not very good for this kind of work. Instead, I have a dual CPU epyc system with 32 real cores. I have optimized the IO so that my ZFS store has a 2TB fast SSD cache, so most of the duplicative IO is from a fast SSD. (You cans see the results of my work at https://github.com/JamiePringle/EZfate)

I suspect, if you can get the circulation data local (and if you are drifting in 2D, not 3D, this should be easy, otherwise, perhaps not), you could get good (and perhaps much better) performance on a hefty desktop. But even on an HPC system, you should be able to do significantly better.

The main thing to remember is that most HPC systems really, really do not want to deal with many small files. You mention that

my number of simulated particles is not that high (order of 10^4) which is then distributed throughout the season, so each day I'm just releasing about ~100 particles...If these are partitioned by KMeans or any other clustering algorithm, I can imagine that at the end a processor deals with 10s of particles...which is probably not ideal.

This is a deeply suboptimal pattern. There are two problems. Each MPI job needs to read a bunch of flow fields in, which is expensive, to handle a small number of particles, which is cheap. This is like renting 10 moving trucks to move 10 grapes. Now, you might get some speed up because dask will only read in the data it needs, so each process is probably only reading part of the flow fields... But even this improvement may get worse with time as particles spread over all of the domain. NOTE: if you had lots of particles, this would become more efficient -- 10 moving trucks to move 20 truckloads of stuff is much faster than 1 truck moving 20 truckloads).

But an even larger problem is likely to be your output chunking. REGARDLESS of what you specify, the chunking in trajectories (the X in a chunking of (X,Y)) cannot be larger than the initial number of particles at the first timestep particles are advected. So for each MPI processes, if you start with only a few 10s of particles, the chunking won't be the (10000,1) you expect, but closer to (10,1) or something like that. You will see this immediately when you open one of the zarr files for one of the MPI processes with something like theDataFile=zarr.open('path/to/data.zarr','r') and type theDataFile.time.info which will print out the chunking for the zarr file. You will also notice it if you try to delete one of the zarr directories and it takes absolutely forever to deal with the many small files.

There are perhaps many elegant ways to deal with this. But if you just want to make the problem go away with no real thought, on the first release time, start 1000 or 10000 particles. You can put them someplace that will inform your project or make nice visualizations, or you can just put them in randomly (but spread horizontally in space so that they don't end up in just one MPI process). If things get much faster, you know what happened....

For your small problems, IO is likely more of an issue than the computational load of tracking particles.

I hope this is helpful.

Jamie

0 replies

JamiePringle · 2023-12-15T14:50:34Z

JamiePringle
Dec 15, 2023
Collaborator

On another, more science oriented note. You seem to be using diffusion with your particles.... are you sure you don't want to launch more particles to get better statistics for whatever connectivity you are calculating?

0 replies

nicrie · 2023-12-15T15:35:00Z

nicrie
Dec 15, 2023
Author

👍 @JamiePringle thank you for sharing your experience! That really helps!

Your explanations make sense to me, that's why I'm a bit puzzled now that the output chunking of the individual zarr files is actually (10_000, 100) as specified. Here the output of one of myZarrFile.time.info:

Name               : /time
Type               : zarr.core.Array
Data type          : float64
Shape              : (10023, 601)
Chunk shape        : (10000, 100)
Order              : C
Read-only          : True
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 48190584 (46.0M)
No. bytes stored   : 532447 (520.0K)
Storage ratio      : 90.5
Chunks initialized : 14/14

Note that I looked into several of the mpirun zarr output files, also for different simulations runs I did and they all seem to have a consistent chunking.

I'd happy to increase the number of simulated particles as long as the computational time remains reasonable. Diffusion was added only to simulate sub-grid cell processes that were not resolved by the model like windage, Langmuir circulation etc.

0 replies

JamiePringle · 2023-12-15T18:46:26Z

JamiePringle
Dec 15, 2023
Collaborator

Huh.... Is this the output of a run made without MPI? I presume so.

If you look at myZarrFile.time[:,0] does it show that all the tracks are starting at the same time? or @erikvansebille , did you change how chunking works so it always respects the requested chunk size, even if only a few particles are written out at first?

It seems your release strategy has changed from the code above, since that has only 9200 release times and the above data file has 10023.

Jamie

2 replies

erikvansebille Dec 18, 2023
Maintainer

did you change how chunking works so it always respects the requested chunk size, even if only a few particles are written out at first?

Yes, since 8ec1105 Parcels supports chunks that are larger than the initial particleset

JamiePringle Dec 18, 2023
Collaborator

great!

nicrie · 2023-12-17T16:47:02Z

nicrie
Dec 17, 2023
Author

Hi @JamiePringle , apologies for the delay in responding; things have been quite busy here.

Regarding the output, it was actually generated from an MPI-enabled run, specifically using -np 5, which resulted in 5 files named proc0X.zarr. When examining myZarrFile.time[:, 0], I observed 93 unique time steps, counted in seconds from the starting date of my velocity field dataset, January 1, 2014. These time steps correspond precisely to the 93 release times I programmed, each one day apart. So that seems correct.

Note that the code I presented above is a more generalized version of my actual implementation. In actuality, I run several simulations involving varying numbers of particles released and different release schedules. The number of particles released typically follows an order of about $10^2$ per day, and the release schedule is around 90 days, give or take a few. This explains the minor differences in the values I've mentioned earlier.

However, there's an aspect of the zarr files' structure that I find perplexing. A significant portion of the trajectories in the zarr file contain only NaN values. When I open the zarr file with xarray and filter out these trajectories, I can determine the true count of released trajectories. The puzzling part is this: even when the total number of trajectories is fewer than my preset chunk size of 10,000, the final size of the zarr files exceeds this number, reaching, for example, 10,023. I had anticipated that if the total count of released particles was around 9,200, the remaining space in the chunk would be filled with NaN trajectories up to 10,000. But, surprisingly, each zarr file ends up with a slightly higher count, like 10,023. I don't really understand why that happens...

t appears that initial chunking may not be a performance issue (because not happening anymore in Parcels v3 ?).

Also, if I/O is the current bottleneck, increasing particle numbers by, say, tenfold might not significantly impact performance. I'll experiment with this.

Thanks for your time and looking forward to any thoughts you might share.

2 replies

erikvansebille Dec 18, 2023
Maintainer

The puzzling part is this: even when the total number of trajectories is fewer than my preset chunk size of 10,000, the final size of the zarr files exceeds this number, reaching, for example, 10,023.

Hmm, this indeed is very strange, and shouldn't happen. Something seems to be going wrong in creating the particle file. My hunch would be in ParticleFile._extend_zarr_dims, in interaction with the MPI running.

@nicrie, could you perhaps create an Issue (ideally with a minimal breaking example) that we can then use to fix this?

nicrie Dec 18, 2023
Author

Sure! The next few days are pretty busy, but I'll try to do it within this week.

JamiePringle · 2023-12-17T19:35:39Z

JamiePringle
Dec 17, 2023
Collaborator

Hi @JamiePringle , apologies for the delay in responding; things have been quite busy here.

No worries, I more than understand; it is end of term here.

Regarding the output, it was actually generated from an MPI-enabled run, specifically using -np 5, which resulted in 5 files named proc0X.zarr. When examining myZarrFile.time[:, 0],

Ok, this is a big point. What is important is NOT the chunking of the combined file, but the chunking of the smaller, proc0X.zarr, files. It is those files that are being written out by parcels, and could be leading to slowdowns. This would be where the chunking becomes an issue.

I observed 93 unique time steps, counted in seconds from the starting date of my velocity field dataset, January 1, 2014. These time steps correspond precisely to the 93 release times I programmed, each one day apart. So that seems correct.

Ok.

Note that the code I presented above is a more generalized version of my actual implementation. In actuality, I run several simulations involving varying numbers of particles released and different release schedules. The number of particles released typically follows an order of about per day, and the release schedule is around 90 days, give or take a few. This explains the minor differences in the values I've mentioned earlier.

ok, though we need to be careful that not all differences end up being minor... see chunking comment above.

However, there's an aspect of the zarr files' structure that I find perplexing. A significant portion of the trajectories in the zarr file contain only NaN values.

This is odd... you get nan's when the number of time steps written out for a particle is less Than the maximum number of time steps written out for any drifter (I think, this gets tricky). Could you have particles whose release time is not between the start and end times of the particle tracking run?

When I open the zarr file with xarray and filter out these trajectories, I can determine the true count of released trajectories. The puzzling part is this: even when the total number of trajectories is fewer than my preset chunk size of 10,000, the final size of the zarr files exceeds this number, reaching, for example, 10,023. I had anticipated that if the total count of released particles was around 9,200, the remaining space in the chunk would be filled with NaN trajectories up to 10,000.

This is not true. When you write out a partial chunk (say the chunking is (100,10) but your array is (80,10), Zarr will only show you (80,10) points. you do not see the padding.

But, surprisingly, each zarr file ends up with a slightly higher count, like 10,023. I don't really understand why that happens...

t appears that initial chunking may not be a performance issue (because not happening anymore in Parcels v3 ?).

I am not sure about this. I suspect the important chunking, for the proc0X.zarr files, is not behaving as you think it should

Also, if I/O is the current bottleneck, increasing particle numbers by, say, tenfold might not significantly impact performance. I'll experiment with this.

Depends if it is really the case that the writing out is not the bottle neck

Thanks for your time and looking forward to any thoughts you might share.

no worries

0 replies

nicrie · 2023-12-17T20:11:25Z

nicrie
Dec 17, 2023
Author

Sorry for the misunderstanding, the chunking i reported were for the individual proc0X.zarr files, not for the combined zarr file. Just in case, here is the output of zarrFile.time.info for the 5 individual proc0X.zarr files that the mpirun -np5 command produced. They have all the same chunking.

proc00.zarr

Name               : /time
Type               : zarr.core.Array
Data type          : float64
Shape              : (10041, 201)
Chunk shape        : (10000, 100)
Order              : C
Read-only          : True
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 16145928 (15.4M)
No. bytes stored   : 308469 (301.2K)
Storage ratio      : 52.3
Chunks initialized : 6/6

proc01.zarr

Name               : /time
Type               : zarr.core.Array
Data type          : float64
Shape              : (10049, 201)
Chunk shape        : (10000, 100)
Order              : C
Read-only          : True
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 16158792 (15.4M)
No. bytes stored   : 362952 (354.4K)
Storage ratio      : 44.5
Chunks initialized : 6/6

proc02.zarr

Name               : /time
Type               : zarr.core.Array
Data type          : float64
Shape              : (10051, 401)
Chunk shape        : (10000, 100)
Order              : C
Read-only          : True
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 32243608 (30.7M)
No. bytes stored   : 459228 (448.5K)
Storage ratio      : 70.2
Chunks initialized : 10/10

proc03.zarr

Name               : /time
Type               : zarr.core.Array
Data type          : float64
Shape              : (10038, 201)
Chunk shape        : (10000, 100)
Order              : C
Read-only          : True
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 16141104 (15.4M)
No. bytes stored   : 344115 (336.0K)
Storage ratio      : 46.9
Chunks initialized : 6/6

proc04.zarr

Name               : /time
Type               : zarr.core.Array
Data type          : float64
Shape              : (10046, 201)
Chunk shape        : (10000, 100)
Order              : C
Read-only          : True
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 16153968 (15.4M)
No. bytes stored   : 364044 (355.5K)
Storage ratio      : 44.4
Chunks initialized : 6/6

1 reply

JamiePringle Dec 17, 2023
Collaborator

Ah, I misunderstood. I agree, In this context I don't think that writing-out IO will be that important!

Cheers,
Jamie

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How can I further optimize simulation performance? #1485

{{title}}

Replies: 10 comments 7 replies

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

How can I further optimize simulation performance? #1485

nicrie Dec 13, 2023

Problem Description

Application

In a nutshell

Replies: 10 comments · 7 replies

erikvansebille Dec 14, 2023 Maintainer

nicrie Dec 14, 2023 Author

erikvansebille Dec 14, 2023 Maintainer

JamiePringle Dec 14, 2023 Collaborator

nicrie Dec 15, 2023 Author

JamiePringle Dec 15, 2023 Collaborator

JamiePringle Dec 15, 2023 Collaborator

nicrie Dec 15, 2023 Author

JamiePringle Dec 15, 2023 Collaborator

erikvansebille Dec 18, 2023 Maintainer

JamiePringle Dec 18, 2023 Collaborator

nicrie Dec 17, 2023 Author

erikvansebille Dec 18, 2023 Maintainer

nicrie Dec 18, 2023 Author

JamiePringle Dec 17, 2023 Collaborator

nicrie Dec 17, 2023 Author

JamiePringle Dec 17, 2023 Collaborator

nicrie
Dec 13, 2023

Replies: 10 comments 7 replies

erikvansebille
Dec 14, 2023
Maintainer

nicrie
Dec 14, 2023
Author

erikvansebille Dec 14, 2023
Maintainer

JamiePringle
Dec 14, 2023
Collaborator

nicrie Dec 15, 2023
Author

JamiePringle
Dec 15, 2023
Collaborator

JamiePringle
Dec 15, 2023
Collaborator

nicrie
Dec 15, 2023
Author

JamiePringle
Dec 15, 2023
Collaborator

erikvansebille Dec 18, 2023
Maintainer

JamiePringle Dec 18, 2023
Collaborator

nicrie
Dec 17, 2023
Author

erikvansebille Dec 18, 2023
Maintainer

nicrie Dec 18, 2023
Author

JamiePringle
Dec 17, 2023
Collaborator

nicrie
Dec 17, 2023
Author

JamiePringle Dec 17, 2023
Collaborator