Merge pull request #887 from parthenon-hpc-lab/jmm/param-restarts
param restarts
Yurlungur authored Jun 21, 2023
2 parents 3596f29 + d431cec commit b8efc2d
Showing 21 changed files with 679 additions and 86 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -3,6 +3,7 @@
## Current develop

### Added (new features/APIs/variables/...)
- [[PR 887]](https://github.com/parthenon-hpc-lab/parthenon/pull/887) Add ability to dump more types of params and read them from restarts
- [[PR 884]](https://github.com/parthenon-hpc-lab/parthenon/pull/884) Add constant derivative BC and expose GenericBC
- [[PR 892]](https://github.com/parthenon-hpc-lab/parthenon/pull/892) Cost-based load balancing and memory diagnostics
- [[PR 889]](https://github.com/parthenon-hpc-lab/parthenon/pull/889) Add PreCommFillDerived
4 changes: 2 additions & 2 deletions CMakeLists.txt
@@ -70,9 +70,9 @@ include(cmake/Format.cmake)
include(cmake/Lint.cmake)

# regression test reference data
set(REGRESSION_GOLD_STANDARD_VER 17 CACHE STRING "Version of gold standard to download and use")
set(REGRESSION_GOLD_STANDARD_VER 18 CACHE STRING "Version of gold standard to download and use")
set(REGRESSION_GOLD_STANDARD_HASH
"SHA512=075658d5fe71e2216f3301d5d8873b2ba588120d2a3b8a2e4ac1940c50719640f4c26d99c94021b6c4f7499544ccaf58f57db5e07ee6c7f75a04ab8d88d9ec89"
"SHA512=193f205b5ebd8dc193dabc7f8e2d3f12bf4f9c34f84312aaaa201f8e19a07342d61f7f07d0119e6de1f2af00f067c20cb4d71edad076b583c102bf4dddb2ade2"
CACHE STRING "Hash of default gold standard file to download")
option(REGRESSION_GOLD_STANDARD_SYNC "Automatically sync gold standard files." ON)

20 changes: 10 additions & 10 deletions doc/sphinx/src/inputs.rst
@@ -31,16 +31,16 @@ General parthenon options such as problem name and parameter handling.

Options related to time-stepping and printing of diagnostic data.

+------------------------------+---------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Default | Type | Description |
+==============================+=========+========+=============================================================================================================================================================================================================================================+
|| tlim || none || float || Stop criterion on simulation time. |
|| nlim || -1 || int || Stop criterion on total number of steps taken. Ignored if < 0. |
|| perf_cycle_offset || 0 || int || Skip the first N cycles when calculating the final performance (e.g., zone-cycles/wall_second). Allows to hide the initialization overhead in Parthenon, which usually takes place in the first cycles when Containers are allocated, etc. |
|| ncycle_out || 1 || int || Number of cycles between short diagnostic output to standard out containing, e.g., current time, dt, zone-update/wsec. Default: 1 (i.e, every cycle). |
|| ncycle_out_mesh || 0 || int || Number of cycles between printing the mesh structure (e.g., total number of MeshBlocks and number of MeshBlocks per level) to standard out. Use a negative number to also print every time the mesh was modified. Default: 0 (i.e, off). |
|| ncrecv_bdry_buf_timeout_sec || -1.0 || Real || Timeout in seconds for the `ReceiveBoundaryBuffers` tasks. Disabed (negative) by default. Typically no need in production runs. Useful for debugging MPI calls. |
+------------------------------+---------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+------------------------------+---------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Default | Type | Description |
+==============================+=========+========+=======================================================================================================================================================================+
|| tlim || none || float || Stop criterion on simulation time. |
|| nlim || -1 || int || Stop criterion on total number of steps taken. Ignored if < 0. |
|| perf_cycle_offset || 0 || int || Skip the first N cycles when calculating the final performance (e.g., zone-cycles/wall_second). Allows hiding the initialization overhead in Parthenon. |
|| ncycle_out || 1 || int || Number of cycles between short diagnostic output to standard out containing, e.g., current time, dt, zone-update/wsec. Default: 1 (i.e., every cycle). |
|| ncycle_out_mesh || 0 || int || Number of cycles between printing the mesh structure to standard out. Use a negative number to also print every time the mesh was modified. Default: 0 (i.e., off). |
|| ncrecv_bdry_buf_timeout_sec || -1.0 || Real || Timeout in seconds for the `ReceiveBoundaryBuffers` tasks. Disabled (negative) by default. Typically not needed in production runs. Useful for debugging MPI calls. |
+------------------------------+---------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+


``<parthenon/mesh>``
4 changes: 2 additions & 2 deletions doc/sphinx/src/interface/sparse.rst
@@ -421,14 +421,14 @@ meshblock may vary with meshblock. Each ``MeshBlock`` object keeps a
running total of its memory footprint. You can get the footprint of an
individual meshblock by calling:

.. cpp:function::
.. code:: cpp

   std::uint64_t MeshBlock::ReportMemUsage();

If desired, you may also manually change the recorded memory footprint
of a given meshblock with the function:

.. cpp:function::
.. code:: cpp

   void MeshBlock::LogMemUsage(std::int64_t delta);

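The running-total bookkeeping described above can be sketched as follows. This is a minimal illustration, not Parthenon's implementation: ``ToyBlock`` is a hypothetical stand-in for ``MeshBlock``, and the underflow check is an assumption added here to keep the unsigned total well defined when a negative ``delta`` is logged.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a block that tracks its own memory footprint.
// The method names mirror the documented signatures; the body is a sketch.
class ToyBlock {
 public:
  // Current recorded footprint in bytes.
  std::uint64_t ReportMemUsage() const { return mem_usage_; }

  // Adjust the recorded footprint. A negative delta records released memory.
  // Assumption: releasing more than is recorded is treated as an error.
  void LogMemUsage(std::int64_t delta) {
    if (delta < 0) {
      // Compute |delta| without overflowing for the most negative value.
      const std::uint64_t released = static_cast<std::uint64_t>(-(delta + 1)) + 1;
      if (released > mem_usage_) {
        throw std::underflow_error("released more memory than was recorded");
      }
      mem_usage_ -= released;
    } else {
      mem_usage_ += static_cast<std::uint64_t>(delta);
    }
  }

 private:
  std::uint64_t mem_usage_ = 0;
};
```

The signed ``delta`` paired with an unsigned total is why the sketch branches on the sign: the same call can record both allocations and deallocations.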
23 changes: 19 additions & 4 deletions doc/sphinx/src/interface/state.rst
@@ -38,10 +38,25 @@ several useful features and functions.
all blocks just like dense variables, however, in a future upgrade, they
will only be allocated on those blocks where the user explicitly
allocates them or non-zero values are advected into.
- ``void AddParam<T>(const std::string& key, T& value, bool is_mutable)``
adds a parameter (e.g., a timestep control coefficient, refinement
tolerance, etc.) with name ``key`` and value ``value``. If
``is_mutable`` is true, parameters can be more easily modified.
- ``void AddParam<T>(const std::string& key, T& value, Mutability mutability)``
  adds a parameter (e.g., a timestep control
  coefficient, refinement tolerance, etc.) with name ``key`` and value
  ``value``. The enum ``mutability`` can take on three values:
  ``Mutability::Immutable``, ``Mutability::Mutable``, and
  ``Mutability::Restart``. Parameters that are ``Immutable`` cannot be
  modified. Parameters that are ``Mutable`` or ``Restart`` can be
  modified via the ``MutableParam`` and ``UpdateParam``
  methods. Parameters that are ``Restart`` will be re-read from the
  restart file and updated upon restart. In contrast, ``Mutable``
  params not marked ``Restart`` are updated only by user code, not
  automatically. Note that not all parameter types can be written to
  an HDF5 file. However, most common scalar, vector, and ``Kokkos`` view
  types are supported. Note also that if the value of a ``Param``
  differs between MPI ranks, the result is undefined
  behaviour.
- ``void AddParam<T>(const std::string& key, T& value, bool is_mutable=false)``
  is the same as above, but adds only ``Immutable`` or ``Mutable`` params,
  not ``Restart`` params.
- ``void UpdateParam<T>(const std::string& key, T& value)``\ updates a
parameter (e.g., a timestep control coefficient, refinement tolerance,
etc.) with name ``key`` and value ``value``. A parameter of the same
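The three mutability levels documented in ``state.rst`` above can be illustrated with a small, self-contained sketch. Everything here is hypothetical: ``ToyParams`` holds only ``double`` values, its ``Add``, ``Update``, and ``ReadFromRestart`` methods are simplified stand-ins rather than Parthenon's ``Params`` API, and a ``std::map`` plays the role of the restart file.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical mirror of the three documented mutability levels.
enum class Mutability { Immutable, Mutable, Restart };

// Toy parameter store for doubles only; not Parthenon's Params class.
class ToyParams {
 public:
  void Add(const std::string &key, double value,
           Mutability mutability = Mutability::Immutable) {
    if (values_.count(key) > 0) {
      throw std::invalid_argument("key already exists: " + key);
    }
    values_[key] = value;
    mutability_[key] = mutability;
  }

  // User-driven update: allowed for Mutable and Restart, rejected for Immutable.
  void Update(const std::string &key, double value) {
    if (mutability_.at(key) == Mutability::Immutable) {
      throw std::logic_error("parameter is immutable: " + key);
    }
    values_.at(key) = value;
  }

  // Restart-driven update: only params marked Restart take the stored value.
  // The map argument stands in for attributes read from a restart file.
  void ReadFromRestart(const std::map<std::string, double> &stored) {
    for (auto &kv : values_) {
      const auto it = stored.find(kv.first);
      if (it != stored.end() && mutability_.at(kv.first) == Mutability::Restart) {
        kv.second = it->second;
      }
    }
  }

  double Get(const std::string &key) const { return values_.at(key); }

 private:
  std::map<std::string, double> values_;
  std::map<std::string, Mutability> mutability_;
};
```

In this sketch, a ``Mutable`` parameter keeps whatever user code last assigned even when the stored data holds a different value, while a ``Restart`` parameter is overwritten by the stored value. That matches the distinction drawn in the documentation text.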
2 changes: 1 addition & 1 deletion doc/sphinx/src/load_balancing.rst
@@ -11,7 +11,7 @@ modified.
On a per ``MeshBlock`` basis, you call the
function

.. cpp:function::
.. code:: cpp

   void MeshBlock::SetCostForLoadBalancing(double cost);

119 changes: 119 additions & 0 deletions doc/sphinx/src/tests.rst
@@ -0,0 +1,119 @@
.. _tests:

How to add tests to Parthenon
==============================

Unit Tests
-----------

Unit tests are straightforward to implement. Open the ``tst/unit``
directory to see the current test suites. You may either add a new
file to this directory (and update the associated ``CMakeLists.txt``
file), or extend an existing file.

Parthenon uses the `Catch2`_ unit test framework. Tests are typically
written in the following format:

.. code:: cpp

   TEST_CASE("Name", "[category1][category2]") {
     GIVEN("Set up code") {
       // some code
       WHEN("Trigger") {
         THEN("Condition") {
           REQUIRE(some_bool_expression);
         }
       }
     }
   }

See the Catch documentation for more details.

.. _Catch2: https://github.com/catchorg/Catch2/tree/v2.x

Regression Tests
-----------------

The regression test infrastructure is more complicated. It is built
on a mix of Python and CMake.
Each test is defined by a *test suite*. You can find the test
suites in the ``tst/regression/test_suites`` directory. Each test
suite is a Python module that defines a ``TestCase`` class, which
inherits from the abstract base class provided by the
``utils.test_case`` module included in the test suite. A ``TestCase``
class must implement the following methods:

* ``Prepare(self, parameters, step)`` is the Python code which sets up
  a simulation run. It modifies an included input deck for a given
  test, based on the test design. The ``parameters`` input contains a
  list of command line arguments that should modify the parthenon
  run. These are passed in to the test infrastructure via CMake
  (described below). The ``step`` argument is an integer. It is used
  for regression tests that require multiple simulation runs, such as
  a convergence test.

* ``Analyze(self, parameters)`` is the post-processing step that
  checks whether or not the test passed. Some tests compare to gold
  files (described further below) and some simply compare to a known
  solution.

A test suite needs not only the Python file containing the
``TestCase`` class, but also an empty ``__init__.py`` file to match the
Python module API.

After adding a module, you must also modify the file
``tst/regression/CMakeLists.txt``. In particular for a new regression
test, you must add a set of arguments like these:

::

    list(APPEND TEST_DIRS name_of_folder )
    list(APPEND TEST_PROCS ${NUM_MPI_PROC_TESTING})
    list(APPEND TEST_ARGS args_to_pass_to_run_test.py )
    list(APPEND EXTRA_TEST_LABELS "my-awesome-test")

The first argument specifies the name of the folder containing the
Python module for the new test. The second argument specifies the number
of MPI ranks if the test should be run with MPI (specify 1 if
not). The third argument specifies arguments to pass to your test
suite, for example

::

    "--driver ${PROJECT_BINARY_DIR}/example/advection/advection-example --driver_input ${CMAKE_CURRENT_SOURCE_DIR}/test_suites/advection_performance/parthinput.advection_performance --num_steps 4"

which would specify which application to run for the test, as well as the input deck and the number of steps.

The final argument specifies labels attached to the test for use with
CTest.

Gold Files
-----------

Many tests use so-called *gold files*, which are files containing
known results to compare against. Parthenon bundles its gold files as
part of releases. These files are automatically downloaded and are
located in ``tst/regression/gold_standard``. To add a new gold file
(or update an old one), place it in this directory.

To make the new (or updated) test official, you must add it to the
official test suite. First update
``tst/regression/gold_standard/README.rst`` and add a new version of
the test suite, corresponding to the commit where you added the
relevant test code and an explanation for why the gold files needed to
change. Then run the script ``make_tarball.sh`` as

::

    bash make_tarball.sh NEW_VERSION

where ``NEW_VERSION`` is the new version of the gold files (not
necessarily tied to the version of the code release). You can then ask
a maintainer to create a new goldfile release and attach the resultant
tarball to the release.

As a sanity check, Parthenon checks against the ``sha512`` hash of the
tarball. The ``make_tarball.sh`` script will output the hash. The new
version and new hash must be set as the default values of
``REGRESSION_GOLD_STANDARD_VER`` and ``REGRESSION_GOLD_STANDARD_HASH``
in the top-level ``CMakeLists.txt`` file.
1 change: 1 addition & 0 deletions src/CMakeLists.txt
@@ -138,6 +138,7 @@ add_library(parthenon
interface/metadata.cpp
interface/metadata.hpp
interface/packages.hpp
interface/params.cpp
interface/params.hpp
interface/sparse_pack.hpp
interface/sparse_pack_base.cpp
1 change: 1 addition & 0 deletions src/interface/packages.hpp
@@ -33,6 +33,7 @@ class Packages_t {
  const Dictionary<std::shared_ptr<StateDescriptor>> &AllPackages() const {
    return packages_;
  }
  Dictionary<std::shared_ptr<StateDescriptor>> &AllPackages() { return packages_; }

 private:
  Dictionary<std::shared_ptr<StateDescriptor>> packages_;
129 changes: 129 additions & 0 deletions src/interface/params.cpp
@@ -0,0 +1,129 @@
//========================================================================================
// (C) (or copyright) 2020-2023. Triad National Security, LLC. All rights reserved.
//
// This program was produced under U.S. Government contract 89233218CNA000001 for Los
// Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC
// for the U.S. Department of Energy/National Nuclear Security Administration. All rights
// in the program are reserved by Triad National Security, LLC, and the U.S. Department
// of Energy/National Nuclear Security Administration. The Government is granted for
// itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide
// license in this material to reproduce, prepare derivative works, distribute copies to
// the public, perform publicly and display publicly, and to permit others to do so.
//========================================================================================

#include <string>

#include "utils/error_checking.hpp"

#include "kokkos_abstraction.hpp"
#include "parthenon_arrays.hpp"

#ifdef ENABLE_HDF5
#include "outputs/parthenon_hdf5.hpp"
#endif

#include "params.hpp"

namespace parthenon {

// JMM: This could probably be done with template magic but I think
// using a macro is honestly the simplest and cleanest solution here.
// Template solution would be to define a variadic class to contain the
// list of types and then a hierarchy of structs/functions to turn
// that into function calls. Preprocessor seems easier, given we're
// not manipulating this list in any way.
#define VALID_VEC_TYPES(T)                                                          \
  T, std::vector<T>, ParArray1D<T>, ParArray2D<T>, ParArray3D<T>, ParArray4D<T>,    \
      ParArray5D<T>, ParArray6D<T>, ParArray7D<T>, ParArray8D<T>, HostArray1D<T>,   \
      HostArray2D<T>, HostArray3D<T>, HostArray4D<T>, HostArray5D<T>, HostArray6D<T>, \
      HostArray7D<T>, Kokkos::View<T *>, Kokkos::View<T **>, ParArrayND<T>,         \
      ParArrayHost<T>

#ifdef ENABLE_HDF5

template <typename T>
void Params::WriteToHDF5AllParamsOfType(const std::string &prefix,
                                        const HDF5::H5G &group) const {
  for (const auto &p : myParams_) {
    const auto &key = p.first;
    const auto type = myTypes_.at(key);
    if (type == std::type_index(typeid(T))) {
      auto typed_ptr = dynamic_cast<Params::object_t<T> *>((p.second).get());
      HDF5::HDF5WriteAttribute(prefix + "/" + key, *typed_ptr->pValue, group);
    }
  }
}

template <typename... Ts>
void Params::WriteToHDF5AllParamsOfMultipleTypes(const std::string &prefix,
                                                 const HDF5::H5G &group) const {
  ([&] { WriteToHDF5AllParamsOfType<Ts>(prefix, group); }(), ...);
}

template <typename T>
void Params::WriteToHDF5AllParamsOfTypeOrVec(const std::string &prefix,
                                             const HDF5::H5G &group) const {
  WriteToHDF5AllParamsOfMultipleTypes<VALID_VEC_TYPES(T)>(prefix, group);
}

template <typename T>
void Params::ReadFromHDF5AllParamsOfType(const std::string &prefix,
                                         const HDF5::H5G &group) {
  for (auto &p : myParams_) {
    auto &key = p.first;
    auto type = myTypes_.at(key);
    auto mutability = myMutable_.at(key);
    if (type == std::type_index(typeid(T)) && mutability == Mutability::Restart) {
      auto typed_ptr = dynamic_cast<Params::object_t<T> *>((p.second).get());
      auto &val = *(typed_ptr->pValue);
      HDF5::HDF5ReadAttribute(group, prefix + "/" + key, val);
      Update(key, val);
    }
  }
}

template <typename... Ts>
void Params::ReadFromHDF5AllParamsOfMultipleTypes(const std::string &prefix,
                                                  const HDF5::H5G &group) {
  ([&] { ReadFromHDF5AllParamsOfType<Ts>(prefix, group); }(), ...);
}

template <typename T>
void Params::ReadFromHDF5AllParamsOfTypeOrVec(const std::string &prefix,
                                              const HDF5::H5G &group) {
  ReadFromHDF5AllParamsOfMultipleTypes<VALID_VEC_TYPES(T)>(prefix, group);
}

void Params::WriteAllToHDF5(const std::string &prefix, const HDF5::H5G &group) const {
  // views and vecs of scalar types
  WriteToHDF5AllParamsOfTypeOrVec<bool>(prefix, group);
  WriteToHDF5AllParamsOfTypeOrVec<int32_t>(prefix, group);
  WriteToHDF5AllParamsOfTypeOrVec<int64_t>(prefix, group);
  WriteToHDF5AllParamsOfTypeOrVec<uint32_t>(prefix, group);
  WriteToHDF5AllParamsOfTypeOrVec<uint64_t>(prefix, group);
  WriteToHDF5AllParamsOfTypeOrVec<float>(prefix, group);
  WriteToHDF5AllParamsOfTypeOrVec<double>(prefix, group);

  // strings
  WriteToHDF5AllParamsOfType<std::string>(prefix, group);
  WriteToHDF5AllParamsOfType<std::vector<std::string>>(prefix, group);
}

void Params::ReadFromRestart(const std::string &prefix, const HDF5::H5G &group) {
  // views and vecs of scalar types
  ReadFromHDF5AllParamsOfTypeOrVec<bool>(prefix, group);
  ReadFromHDF5AllParamsOfTypeOrVec<int32_t>(prefix, group);
  ReadFromHDF5AllParamsOfTypeOrVec<int64_t>(prefix, group);
  ReadFromHDF5AllParamsOfTypeOrVec<uint32_t>(prefix, group);
  ReadFromHDF5AllParamsOfTypeOrVec<uint64_t>(prefix, group);
  ReadFromHDF5AllParamsOfTypeOrVec<float>(prefix, group);
  ReadFromHDF5AllParamsOfTypeOrVec<double>(prefix, group);

  // strings
  ReadFromHDF5AllParamsOfType<std::string>(prefix, group);
  ReadFromHDF5AllParamsOfType<std::vector<std::string>>(prefix, group);
}

#endif // ifdef ENABLE_HDF5

} // namespace parthenon
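
The dispatch pattern in ``params.cpp`` above pairs a recorded ``std::type_index`` per key with a C++17 fold expression that repeats one templated pass per candidate type. The following standalone sketch shows that pattern without HDF5 or Kokkos. ``ToyStore``, ``Holder``, and the ``Dump*`` methods are hypothetical names invented for illustration, and output goes to a ``std::ostream`` instead of an HDF5 group. The sketch uses a plain comma fold where the original wraps each call in an immediately invoked lambda.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Toy type-erased store; a simplified stand-in, not Parthenon's Params.
class ToyStore {
 public:
  template <typename T>
  void Add(const std::string &key, T value) {
    values_[key] = std::make_shared<Holder<T>>(std::move(value));
    // std::type_index has no default constructor, so avoid operator[].
    types_.insert_or_assign(key, std::type_index(typeid(T)));
  }

  // Visit every entry whose recorded type is exactly T.
  template <typename T>
  void DumpAllOfType(std::ostream &out) const {
    for (const auto &kv : values_) {
      if (types_.at(kv.first) == std::type_index(typeid(T))) {
        // Safe downcast: the recorded type_index matched T.
        const auto *typed = static_cast<const Holder<T> *>(kv.second.get());
        out << kv.first << "=" << typed->value << ";";
      }
    }
  }

  // Comma fold over the pack: one DumpAllOfType<T> call per listed type,
  // evaluated left to right.
  template <typename... Ts>
  void DumpAllOfTypes(std::ostream &out) const {
    (DumpAllOfType<Ts>(out), ...);
  }

 private:
  struct Base {
    virtual ~Base() = default;
  };
  template <typename T>
  struct Holder : Base {
    explicit Holder(T v) : value(std::move(v)) {}
    T value;
  };
  std::map<std::string, std::shared_ptr<Base>> values_;
  std::map<std::string, std::type_index> types_;
};
```

Entries whose type is not in the list passed to ``DumpAllOfTypes`` are skipped silently. This is consistent with the documentation note in this commit that not all parameter types can be written to HDF5.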