From 65eb922103923435c76c2764ec0f82c036e6b57d Mon Sep 17 00:00:00 2001
From: Erik van Sebille
Date: Thu, 11 Jan 2024 09:39:26 +0100
Subject: [PATCH 1/3] Expanding explanation of output chunks parameter

---
 docs/examples/tutorial_parcels_structure.ipynb | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/examples/tutorial_parcels_structure.ipynb b/docs/examples/tutorial_parcels_structure.ipynb
index d1b7b593a..d18728048 100644
--- a/docs/examples/tutorial_parcels_structure.ipynb
+++ b/docs/examples/tutorial_parcels_structure.ipynb
@@ -395,7 +395,15 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Note the use of the `chunks` argument in the `pset.ParticleFile()` above. This controls the 'chunking' of the output file, which is a way to optimize the writing of the output file. See also [the advanced output in zarr format tutorial](https://docs.oceanparcels.org/en/latest/examples/documentation_advanced_zarr.html) for more information on this. It is worth to optimise this parameter in your runs, as it can significantly speed up the writing of the output file and thus the runtime of `pset.execution()`."
+    "### A note on output chunking\n",
+    "Note the use of the `chunks` argument in the `pset.ParticleFile()` above. This controls the 'chunking' of the output file, which is a way to optimize the writing of the output file. The default chunking for the output in Parcels is `(number of particles in initial particleset, 1)`. \n",
+    "Note that this default may not be very efficient if \n",
+    "1. you use `repeatdt` to release particles _many_ times during the simulation and/or\n",
+    "2. you expect to output _a lot of timesteps_ (e.g. more than 1000).\n",
+    "\n",
+    "In the first case, it is best to increase the first argument of `chunks`. In the second case, it is best to increase the second argument of `chunks`.\n",
+    "\n",
+    "See also [the advanced output in zarr format tutorial](https://docs.oceanparcels.org/en/latest/examples/documentation_advanced_zarr.html) for more information on this. It is worth to optimise this parameter in your runs, as it can significantly speed up the writing of the output file and thus the runtime of `pset.execution()`."
    ]
   },
   {

From b1d7c202e2b7633fa517c42e83f82a47f05f134d Mon Sep 17 00:00:00 2001
From: Erik van Sebille
Date: Thu, 11 Jan 2024 14:25:17 +0100
Subject: [PATCH 2/3] Moving chunking note to info-box

Also slightly changing the text of the chunk info to be more specific what to do
---
 docs/examples/tutorial_parcels_structure.ipynb | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/docs/examples/tutorial_parcels_structure.ipynb b/docs/examples/tutorial_parcels_structure.ipynb
index d18728048..dc790b770 100644
--- a/docs/examples/tutorial_parcels_structure.ipynb
+++ b/docs/examples/tutorial_parcels_structure.ipynb
@@ -395,15 +395,20 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "\n",
+    "\n",
     "### A note on output chunking\n",
+    "\n",
     "Note the use of the `chunks` argument in the `pset.ParticleFile()` above. This controls the 'chunking' of the output file, which is a way to optimize the writing of the output file. The default chunking for the output in Parcels is `(number of particles in initial particleset, 1)`. \n",
     "Note that this default may not be very efficient if \n",
-    "1. you use `repeatdt` to release particles _many_ times during the simulation and/or\n",
+    "1. you use `repeatdt` to release a relatively small number of particles _many_ times during the simulation and/or\n",
     "2. you expect to output _a lot of timesteps_ (e.g. more than 1000).\n",
     "\n",
-    "In the first case, it is best to increase the first argument of `chunks`. In the second case, it is best to increase the second argument of `chunks`.\n",
+    "In the first case, it is best to increase the first argument of `chunks` to 10 to 100 times the size of your initial particleset. In the second case, it is best to increase the second argument of `chunks` to 10 to 1000, depending a bit on the size of your initial particleset.\n",
+    "\n",
+    "See also [the advanced output in zarr format tutorial](https://docs.oceanparcels.org/en/latest/examples/documentation_advanced_zarr.html) for more information on this. It is worth to optimise this parameter in your runs, as it can significantly speed up the writing of the output file and thus the runtime of `pset.execution()`.\n",
     "\n",
-    "See also [the advanced output in zarr format tutorial](https://docs.oceanparcels.org/en/latest/examples/documentation_advanced_zarr.html) for more information on this. It is worth to optimise this parameter in your runs, as it can significantly speed up the writing of the output file and thus the runtime of `pset.execution()`."
+    ""
    ]
   },
   {

From a6fe0a5397d2f200d9146b7d6aec29ed88ab9ba5 Mon Sep 17 00:00:00 2001
From: Erik van Sebille
Date: Fri, 12 Jan 2024 08:16:08 +0100
Subject: [PATCH 3/3] Adding line with @JamiePringle's suggestion to output chunking note

---
 docs/examples/tutorial_parcels_structure.ipynb | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/examples/tutorial_parcels_structure.ipynb b/docs/examples/tutorial_parcels_structure.ipynb
index dc790b770..94ac9d9c0 100644
--- a/docs/examples/tutorial_parcels_structure.ipynb
+++ b/docs/examples/tutorial_parcels_structure.ipynb
@@ -406,7 +406,9 @@
     "\n",
     "In the first case, it is best to increase the first argument of `chunks` to 10 to 100 times the size of your initial particleset. In the second case, it is best to increase the second argument of `chunks` to 10 to 1000, depending a bit on the size of your initial particleset.\n",
     "\n",
-    "See also [the advanced output in zarr format tutorial](https://docs.oceanparcels.org/en/latest/examples/documentation_advanced_zarr.html) for more information on this. It is worth to optimise this parameter in your runs, as it can significantly speed up the writing of the output file and thus the runtime of `pset.execution()`.\n",
+    "In either case, it will generally be much more efficient if `chunks[0]*chunks[1]` is (much) greater than several thousand.\n",
+    "\n",
+    "See also [the advanced output in zarr format tutorial](https://docs.oceanparcels.org/en/latest/examples/documentation_advanced_zarr.html) for more information on this. The details will depend on the nature of the filesystem the data is being written to, so it is worth to optimise this parameter in your runs, as it can significantly speed up the writing of the output file and thus the runtime of `pset.execution()`.\n",
     "\n",
     ""
    ]
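
Reviewer aid (not part of the patch): the rules of thumb in the chunking note could be sketched as a small helper like the one below. This is only an illustration under stated assumptions. `suggest_chunks`, its parameter names, and the `min_product` threshold of 10,000 (one possible reading of "(much) greater than several thousand") are hypothetical and are not part of Parcels.

```python
def suggest_chunks(n_initial, many_releases=False, n_outputs=1, min_product=10_000):
    """Suggest a (first, second) `chunks` tuple from the note's rules of thumb.

    Hypothetical helper, not a Parcels API. `min_product` is an assumed
    stand-in for "(much) greater than several thousand".
    """
    # Default described in the note: (number of particles in initial particleset, 1).
    first = n_initial
    second = 1

    # Case 1: many repeated releases -> 10 to 100 times the initial particleset size.
    if many_releases:
        first = 10 * n_initial

    # Case 2: more than 1000 output timesteps -> second argument between 10 and 1000.
    if n_outputs > 1000:
        second = 10

    # "In either case": push chunks[0] * chunks[1] towards min_product,
    # staying within the ranges given in the note.
    if many_releases and first * second < min_product:
        first = 100 * n_initial
    while n_outputs > 1000 and first * second < min_product and second < 1000:
        second *= 10

    return first, second


chunks = suggest_chunks(50, n_outputs=5000)
print(chunks)
```

The resulting tuple would then be passed as the `chunks` argument of `pset.ParticleFile()`, as in the tutorial cell the note refers to. Since the note says the best values depend on the filesystem, any such suggestion is a starting point for timing experiments rather than a final setting.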