update user manual with checkpointing
cnpetra committed Sep 22, 2024
1 parent adf30e6 commit 4e673d5
Showing 2 changed files with 30 additions and 7 deletions.
10 changes: 5 additions & 5 deletions doc/src/sections/solver_options.tex
@@ -423,16 +423,16 @@ \subsubsection{Problem preprocessing}
\medskip
\subsubsection{Checkpointing of the solver state and restarting}
\Hi can save/load its internal state to/from disk. This can be helphul when running a job on a cluster that enforces limits on the job's running time. This functionality is currently available only for the quasi-Newton algorithm. The checkpointing is done using Axom's scalable Sidre data manager and IO (see \url{https://axom.readthedocs.io/en/develop/axom/sidre/docs/sphinx/index.html}) and requires an Axom-enabled build (use ``-DHIOP_USE_AXOM=ON'' with cmake).
\subsubsection{Checkpointing of the solver state and restarting}\label{sec:checkpoint}
As detailed in Section~\ref{sec:checkpoint_API}, \Hi can save/load its internal state to/from disk. All the options in this section require an Axom-enabled build (use ``-DHIOP\_USE\_AXOM=ON'' with cmake) and are supported only by the quasi-Newton IPM solver (\texttt{hiopAlgFilterIPMQuasiNewton} class) for the \texttt{hiopInterfaceDenseConstraints} NLP formulation/interface.
\noindent \textbf{checkpoint\_save}: Save state of NLP solver to file indicated by value of option ``checkpoint\_file''. String values ``yes'' or ``no'', default ``no''.
\noindent \textbf{checkpoint\_load\_on\_start} On (re)start the NLP solver will load checkpoint file specified by ``checkpoint_file`` option. String values ``yes'' or ``no'', default ``no''.
\noindent \textbf{checkpoint\_load\_on\_start} On (re)start, the NLP solver will load the checkpoint file specified by the ``checkpoint\_file'' option. String values ``yes'' or ``no'', default ``no''.
\noindent \textbf{checkpoint\_file} Path to checkpoint file to load from or save to. If present, the character ``\#'' is replaced with the iteration number at which the checkpointing is saved (but \textit{not} when loaded). \Hi adds a ``.root'' extension internally if the value of the option is a directory. If this option is not specified and loading or saving checkpoints is enabled, \Hi will use a file named ``hiop_state_chk''.
\noindent \textbf{checkpoint\_file} Path to the checkpoint file to load from or save to. If present, the character ``\#'' is replaced with the iteration number at which the checkpoint is saved (but \textit{not} when it is loaded). \Hi adds a ``.root'' extension internally if the value of the option is a directory. If this option is not specified and loading or saving checkpoints is enabled, \Hi will use a file named ``hiop\_state\_chk''.
\noindent \textbf{checkpoint\_save\_every\_N\_iter} Iteration frequency of saving checkpoints to disk if ``checkpoint_save'' is ``yes''. Takes positive integer values with a default value $10$.
\noindent \textbf{checkpoint\_save\_every\_N\_iter} Iteration frequency at which checkpoints are saved to disk if ``checkpoint\_save'' is ``yes''. Takes positive integer values, with a default value of $10$.
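The interplay between ``checkpoint\_file'' and ``checkpoint\_save\_every\_N\_iter'' can be illustrated with a minimal, self-contained C++ sketch. This is not \Hi's implementation: the helper names \texttt{expand\_checkpoint\_path} and \texttt{checkpoint\_due} are hypothetical, and the sketch assumes that every ``\#'' in the path is replaced and that a checkpoint is written at positive multiples of $N$; the actual solver may differ on both points.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of HiOp): expands the '#' placeholder of a
// checkpoint_file value into the iteration number used when saving.
// Assumption: every '#' is replaced; HiOp may treat only one occurrence.
std::string expand_checkpoint_path(const std::string& path_template, int iter)
{
  const std::string iter_str = std::to_string(iter);
  std::string out;
  for(char c : path_template) {
    if(c == '#') {
      out += iter_str;
    } else {
      out += c;
    }
  }
  return out;
}

// Hypothetical helper (not part of HiOp): decides whether a checkpoint is due.
// Assumption: checkpoints are written at positive multiples of every_n.
bool checkpoint_due(int iter, int every_n)
{
  assert(every_n > 0);  // the option takes positive integer values
  return iter > 0 && iter % every_n == 0;
}
```

Under these assumptions, \texttt{expand\_checkpoint\_path("chk\_\#", 20)} yields \texttt{chk\_20}, while a path without ``\#'' is returned unchanged, so each save would overwrite the same file.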
\subsubsection{Miscellaneous options}
27 changes: 25 additions & 2 deletions doc/src/techrep_main.tex
@@ -133,7 +133,7 @@
\vspace{3cm}

{\huge\bfseries \Hi\ -- User Guide} \\[14pt]
{\large\bfseries version 1.03}
{\large\bfseries version 1.1.0}

\vspace{3cm}

@@ -155,7 +155,7 @@
\vspace{4.75cm}

\textcolor{violet}{{\large\bfseries Oct 15, 2017} \\
{\large\bfseries Updated Feb 5, 2024}}
{\large\bfseries Updated Sept 22, 2024}}

\vspace{0.75cm}

@@ -474,6 +474,29 @@ \subsubsection{Calling \Hi for a \texttt{hiopInterfaceDenseConstraints} formulat
\end{lstlisting}
The standalone drivers \texttt{NlpDenseConsEx1}, \texttt{NlpDenseConsEx2}, and \texttt{NlpDenseConsEx3} inside directory \texttt{src/Drivers/} under the \Hi's root directory contain more detailed examples of the use of \Hi.

\subsubsection{Checkpointing}\label{sec:checkpoint_API}
File checkpointing is available for \Hi's quasi-Newton IPM solver, which is used exclusively to solve \texttt{hiopInterfaceDenseConstraints} formulations. This can be helpful when running a job on
a cluster that enforces limits on the job's running time.
Later, this feature will also be provided for other solvers, such as the Newton IPM (used exclusively with sparse NLPs) and HiOp-PriDec.

The checkpointing I/O is based on Axom's scalable Sidre data manager (see \url{https://axom.readthedocs.io/en/develop/axom/sidre/docs/sphinx/index.html} for more information) and, thus, requires an Axom-enabled build (use ``-DHIOP\_USE\_AXOM=ON'' with cmake).

There are two ways to use \Hi's checkpointing. The first is via the quasi-Newton solver's API, namely, the methods
\begin{lstlisting}
void load_state_from_sidre_group(const ::axom::sidre::Group& group);
void save_state_to_sidre_group(::axom::sidre::Group& group);
\end{lstlisting}
of the \texttt{hiopAlgFilterIPMQuasiNewton} solver class. New Sidre views will be created (or reused) within the group passed as argument in order to load or save the state variables of the quasi-Newton solver. Alternatively, the \texttt{hiopAlgFilterIPMQuasiNewton} solver class offers similar methods that work directly with a file, namely,
\begin{lstlisting}
bool load_state_from_file(const ::std::string& path) noexcept;
bool save_state_to_file(const ::std::string& path) noexcept;
\end{lstlisting}
These two methods will create the Sidre group internally and checkpoint to/from it using the first two methods.
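A restart workflow built on the two file-based methods might look like the following minimal sketch. It is written as a template so that it compiles without \Hi or Axom; \texttt{FakeSolver}, \texttt{StartMode}, and \texttt{resume\_if\_possible} are hypothetical names introduced only for illustration. Only the two method signatures come from the listing above, and the sketch assumes that a \texttt{true} return value signals success, which is not confirmed here.

```cpp
#include <string>

// Stand-in type (hypothetical, not HiOp code) exposing the two file-based
// method signatures quoted above, so the sketch compiles on its own.
struct FakeSolver {
  bool has_checkpoint = false;
  bool load_state_from_file(const std::string&) noexcept { return has_checkpoint; }
  bool save_state_to_file(const std::string&) noexcept
  {
    has_checkpoint = true;
    return true;
  }
};

enum class StartMode { Resumed, Cold };

// Tries to restore the solver state from `path`; reports a cold start otherwise.
// Assumption: load_state_from_file returns true on success.
template<class Solver>
StartMode resume_if_possible(Solver& solver, const std::string& path)
{
  return solver.load_state_from_file(path) ? StartMode::Resumed : StartMode::Cold;
}
```

With an actual \texttt{hiopAlgFilterIPMQuasiNewton} object, the same call pattern would presumably be used before starting the optimization; the exact point at which the state may be loaded is not specified in this section.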

The second way to checkpoint is via user options, as detailed in Section~\ref{sec:checkpoint}.
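As a hedged illustration of the option-based route, the sketch below sets the four checkpointing options programmatically. It assumes that the options object exposes \texttt{SetStringValue} and \texttt{SetIntegerValue} setters, as used in \Hi's example drivers; that interface is not described in this section, so it should be treated as an assumption. \texttt{RecordingOptions} and \texttt{enable\_checkpointing} are hypothetical names, and the stand-in type exists only so that the sketch compiles without \Hi.

```cpp
#include <map>
#include <string>

// Stand-in options object (hypothetical, not HiOp code). It mimics, by
// assumption, the setter names of HiOp's options class and records values.
struct RecordingOptions {
  std::map<std::string, std::string> values;
  bool SetStringValue(const char* name, const char* value)
  {
    values[name] = value;
    return true;
  }
  bool SetIntegerValue(const char* name, int value)
  {
    values[name] = std::to_string(value);
    return true;
  }
};

// Enables periodic checkpoint saving and, optionally, loading on (re)start.
// Option names are the ones documented in the solver options section.
template<class Options>
void enable_checkpointing(Options& opts,
                          const std::string& file,
                          int every_n,
                          bool load_on_start)
{
  opts.SetStringValue("checkpoint_save", "yes");
  opts.SetStringValue("checkpoint_file", file.c_str());
  opts.SetIntegerValue("checkpoint_save_every_N_iter", every_n);
  opts.SetStringValue("checkpoint_load_on_start", load_on_start ? "yes" : "no");
}
```

Because the ``\#'' placeholder is substituted only when saving, a job that both saves and loads through the same ``checkpoint\_file'' value would typically use a path without ``\#''.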

\warningcp{Note:} A few particularities stemming from the use of Sidre must be acknowledged. First, a checkpoint file should be loaded using HiOp with the same number of MPI ranks as when it was saved. Second, checkpointing is not available for non-MPI builds because Axom has MPI as a dependency. Finally, when loading from or saving to a checkpoint file, the sizes of the file's variables (Sidre views) must match the sizes of the HiOp variables to which the data is loaded or from which it is saved; \Hi will throw an exception if an existing file is (re)used to load or save an algorithm state for a problem whose sizes have changed since the file was created.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% NLP Sparse
