examples for getting started with omega_h
The following instructions apply to the SCOREC RedHat 7 systems with NVIDIA
GPUs. Specific instructions are given for Pascal GPUs (e.g., in the `blockade`
and `pachisi` systems).
#### setup
mkdir ohExamplesTest
cd ohExamplesTest
export root=$PWD # This is the directory containing the kokkos and omega_h repos as subdirs
module load gcc mpich cmake
function getname() {
name=$1
buildSuffix=blockade-cuda
echo "build-${name}-${buildSuffix}"
}
export kk=$root/`getname kokkos`/install # This is where kokkos will be (or is) installed
export oh=$root/`getname omegah`/install # This is where omega_h will be (or is) installed
export pumipic=$root/`getname pumipic`/install # This is where PumiPIC will be (or is) installed
module load cuda/10.2
export LD_LIBRARY_PATH=$cuda/lib64:$LD_LIBRARY_PATH
export CMAKE_PREFIX_PATH=$engpar:$kk:$oh:$CMAKE_PREFIX_PATH
export MPICH_CXX=$root/kokkos/bin/nvcc_wrapper
#### end setup
cd $root
git clone [email protected]:kokkos/kokkos.git
cd kokkos
git checkout 2.9.00
cd -
mkdir -p $kk
cd $_/..
cmake ../kokkos -DCMAKE_CXX_COMPILER=$root/kokkos/bin/nvcc_wrapper \
-DCMAKE_CXX_FLAGS= -DKOKKOS_ARCH=Turing75 \
-DKOKKOS_ENABLE_SERIAL=ON -DKOKKOS_ENABLE_OPENMP=off -DKOKKOS_ENABLE_CUDA=on \
-DKOKKOS_ENABLE_CUDA_LAMBDA=on -DKOKKOS_ENABLE_DEBUG=on \
-DKOKKOS_ENABLE_PROFILING=on -DCMAKE_INSTALL_PREFIX=$kk
make install -j8
cd $root
git clone [email protected]:SCOREC/omega_h.git
cd omega_h
cd -
mkdir -p $oh
cd $_/..
cmake ../omega_h/ -DCMAKE_INSTALL_PREFIX=$oh -DBUILD_SHARED_LIBS=OFF -DOmega_h_USE_CUDA=on -DOmega_h_CUDA_ARCH=sm_75 -DOmega_h_USE_MPI=on -DCMAKE_CXX_COMPILER=`which mpicxx` -DOmega_h_USE_Kokkos=ON -DKokkos_PREFIX=$kk/lib/CMake/
make install -j8 # this takes a few minutes
cd $root
git clone [email protected]:SCOREC/omegah_examples.git
mkdir build-ohExamples-cuda
cd !$
cmake ../omegah_examples -DCMAKE_CXX_COMPILER=mpicxx
make
To resume work:
ssh blockade
cd ohExamplesTest
### run the commands between `#### setup` and `#### end setup`, then:
cd build-ohExamples-cuda
make
Alternatively, for debugging purposes, you can build Omega_h in serial without Kokkos.
Note that you will still need to set `export oh=/path/to/omega_h/install`
accordingly for the omegah_examples build.
Set up the environment.
module load gcc cmake
# use the environment variable set in the omega_h build
export CMAKE_PREFIX_PATH=$oh:$CMAKE_PREFIX_PATH
Clone the repo, run cmake, then build.
git clone [email protected]:SCOREC/omegah_examples.git
mkdir build-ohExamples-serial
cd !$
cmake ../omegah_examples -DCMAKE_CXX_COMPILER=g++
make
Run `ctest` to run the examples:
ctest
Note that any example using MPI will not compile with this configuration.
Create a new source file from an existing one:
cd omegah_examples
cp adjacencies.cpp foo.cpp
Add the new file to the build and test system cmake files:
- `testing.cmake` - execution commands for examples; add `mpi_test(foo_2D 1 ./foo ${CMAKE_SOURCE_DIR}/meshes/tri8.osh)` to start with, then modify it as the new example is implemented
- `CMakeLists.txt` - defines compile and link commands for examples; add `example(foo foo.cpp)`
Ensure that the new example compiles and runs:
# set up the environment as described above
cd build-ohExamples-cuda
make
ctest
- Topological Complex - A breakdown of a Cartesian domain into topological entities
- Mesh - A topological complex whose entities have simple shape
- Entity - A topological entity of a mesh
- Vertex - A 0-dimensional entity
- Edge - A 1-dimensional entity
- Face - A 2-dimensional entity
- Region - A 3-dimensional entity
- Element - An entity not bounding another entity
- Mesh discretization - Subdivision of a continuous geometric space into discrete geometric and topological cells
- Model(CAD) - A topological complex consisting of vertices, edges, faces and regions from which a mesh is generated via mesh discretization
- Adjacency - If a point set b bounds a point set a, then there is a downward adjacency (a, b) and an upward adjacency (b, a)
- Barycentric coordinate system - A coordinate system in which the location of a point of a simplex (a triangle, tetrahedron, etc.) is specified as the center of mass, or barycenter, of usually unequal masses placed at its vertices (from Wikipedia)
- Classification - A direct mapping from mesh entities to geometric (CAD) model entities
We aim to develop functions and tests that aid in mesh operations and other common related tasks, using the functionality and API of Omega_h. Tests have been developed for adjacency queries, modifying data associated with mesh entities, calculating barycentric coordinates, reverse classification, and data reduction in parallel. More examples focusing on parallel execution, such as synchronizing data on the process boundary, mesh repartitioning, and creating ghost elements, are desired. Details of the current examples can be found below.
The following tests, which require a mesh file as input, have been developed:
- adjacencies - Given a mesh, query the first order downward (edge to vertex, face to edge, element to face) and upward (vertex to edge, edge to face, and face to element) adjacencies, and the second order (vertex to vertex via edge) adjacencies.
- tags - Given a mesh, create, modify, and delete data associated with mesh entities and write mesh files for visualization.
- interpolation - Given a mesh, a function is written to compute barycentric coordinates. The calculations are tested for two cases, first for the centroid and second for a vertex of the element, for all the elements.
- classification - For a given mesh and a particular CAD model face, find all mesh elements with at least one face classified on it.
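The relationship between a downward and an upward adjacency can be illustrated without Omega_h. The following is a minimal sketch in plain C++, assuming a hypothetical `Graph` struct in offset/value form and a hypothetical helper `invert_edge_to_vertex`; neither is part of the Omega_h API. It inverts an edge to vertex list into a vertex to edge graph, so the value array of the upward graph has the same length as the downward list (two entries per edge).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an adjacency graph in offset/value form.
// This is not an Omega_h type; the names are chosen for this sketch only.
struct Graph {
  std::vector<int> offsets;  // size = number of source entities + 1
  std::vector<int> values;   // concatenated lists of adjacent entity ids
};

// Invert a downward edge-to-vertex list (two vertex ids per edge) into
// an upward vertex-to-edge graph.
Graph invert_edge_to_vertex(const std::vector<int>& edge_verts, int nverts) {
  const int nedges = static_cast<int>(edge_verts.size() / 2);
  Graph up;
  up.offsets.assign(nverts + 1, 0);
  // Count the edges that touch each vertex.
  for (int v : edge_verts) ++up.offsets[v + 1];
  // A prefix sum turns the counts into offsets.
  for (int v = 0; v < nverts; ++v) up.offsets[v + 1] += up.offsets[v];
  up.values.assign(edge_verts.size(), -1);
  // next[v] is the position where the next edge of vertex v is written.
  std::vector<int> next(up.offsets.begin(), up.offsets.end() - 1);
  for (int e = 0; e < nedges; ++e) {
    for (int k = 0; k < 2; ++k) {
      const int v = edge_verts[2 * e + k];
      up.values[next[v]++] = e;
    }
  }
  return up;
}

// Edge-to-vertex list of a square split into two triangles:
// vertices 0..3, edges 0-1, 1-2, 2-3, 3-0, and the diagonal 0-2.
std::vector<int> two_triangle_edges() {
  return {0, 1, 1, 2, 2, 3, 3, 0, 0, 2};
}
```

Calling `invert_edge_to_vertex(two_triangle_edges(), 4)` yields a graph whose `values` array has ten entries, twice the number of edges, which is the same size check the adjacency example applies to the arrays returned by Omega_h.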
parallel:
- reduction - Given a partitioned mesh, compute the maximum value of a tag across all processes.
- synchronization - Synchronize a vertex tag such that owners of vertices on the part boundary send their values to non-owned copies on other processes.
- partitioning - Define weights associated with mesh elements and repartition (balance) the mesh according to those weights.
- ghosting - Create a single layer of ghost elements with mesh vertices (in 3D) defined as the bridge entity.
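The ideas behind the reduction and synchronization examples can be sketched serially. The code below is a simulation under stated assumptions, not Omega_h or MPI code: the `Part` struct and the functions `sync_vertex_tag`, `global_max`, and `make_two_parts` are hypothetical names, and all parts live in one vector instead of on separate processes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical per-process view of a vertex tag. In a real run each Part
// would live on its own MPI rank; here all parts sit in one vector so the
// sketch runs without MPI.
struct Part {
  std::vector<int> global_ids;  // global id of each local vertex
  std::vector<int> owners;      // index of the part that owns each vertex
  std::vector<double> values;   // tag value stored for each local vertex
};

// Copy the owner's value of every vertex to the non-owned copies.
void sync_vertex_tag(std::vector<Part>& parts) {
  std::map<int, double> owned;  // global id -> value held by the owner
  for (std::size_t p = 0; p < parts.size(); ++p) {
    const Part& part = parts[p];
    for (std::size_t i = 0; i < part.global_ids.size(); ++i) {
      if (part.owners[i] == static_cast<int>(p)) {
        owned[part.global_ids[i]] = part.values[i];
      }
    }
  }
  for (Part& part : parts) {
    for (std::size_t i = 0; i < part.global_ids.size(); ++i) {
      part.values[i] = owned.at(part.global_ids[i]);
    }
  }
}

// Maximum of the tag over all parts: a local maximum per part, then the
// maximum of those. Assumes every part holds at least one value.
double global_max(const std::vector<Part>& parts) {
  double result = parts.front().values.front();
  for (const Part& part : parts) {
    result = std::max(
        result, *std::max_element(part.values.begin(), part.values.end()));
  }
  return result;
}

// Two parts sharing vertices 1 and 2; part 0 owns both shared vertices.
// The copies on part 1 start with the placeholder value -1.
std::vector<Part> make_two_parts() {
  Part a{{0, 1, 2}, {0, 0, 0}, {1.0, 5.0, 3.0}};
  Part b{{1, 2, 3}, {0, 0, 1}, {-1.0, -1.0, 7.0}};
  return {a, b};
}
```

After `sync_vertex_tag` runs on the result of `make_two_parts()`, the copies of vertices 1 and 2 on part 1 hold the values stored by their owner on part 0, and `global_max` returns the largest tag value across both parts.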
Assuming Omega_h was built with CUDA or Kokkos, all `Write` and `Read` array objects are created in device (aka GPU) memory and can only be read and modified through device kernels (`parallel_for`, `parallel_reduce`, etc.). The `HostWrite` and `HostRead` objects support creating arrays in host (aka CPU) memory. One of the `Write` constructors in `Omega_h_array.hpp` supports conversion from a `HostWrite`. Similarly, `HostRead` (`HostWrite`) has a constructor for conversion from a `Read` (`Write`). Note, there is no `Read` constructor that supports conversion from a `HostRead`.
- adjacencies - The user input is the mesh file. Calling 'get_adj' from the host returns the downward adjacencies, and 'ask_up' can be called from the host for upward adjacencies. 'ask_star' can be called from the host to query the second order vertex to vertex adjacency. Furthermore, in this test, the function 'dim()' can be called from the host to get the dimension (2D/3D) of the mesh, and 'nglobal_ents(arg)' can be called from the host to get the global number of entities of dimension 'arg', similar to 'nverts()', 'nfaces()', etc. To test the queries, assertions compare the size of the list array returned by the adjacency query to the expected size. For instance, in an edge to vertex downward adjacency query, each edge has two vertices, so the size of the list array that stores the vertices of each edge should be twice the total number of edges. When we query the upward adjacency vertex to edge, the size of the list array will also be twice the number of edges, because the upward graph is the inverse of the edge to vertex graph and every edge is bounded by two vertices. A similar argument underlies the assertion implemented for the second order vertex to vertex adjacency query. Each Omega_h function, API, or class used is called either from the host (outside a kernel) or on the device (within a kernel, e.g., in the body of a parallel_for). The host functions operate on large arrays of data, typically all entities of a specified dimension stored on the local process, while device functions operate on small arrays of data, typically associated with a single mesh entity.
- tags - The user input is the mesh file. The 'Read' and 'Write' array structures of Omega_h, called from the host, are used to set the values of the tag. 'Write' can be called from the host to declare and/or initialize arrays of integer (LO) or double (Real) type. For example, a 'Write' can create an array 'gravityArray' of type Real (double) and size 'nvert' initialized to '9.81'; a 'Read' then converts this same array to a read-only 'gravityArray_r'. 'add_tag' can be called from the host to create a new variable, or tag, that is associated with each of the mesh entities (all vertices, all edges, etc.) of a dimension specified via arguments. The values of this tag are specified by calling 'set_tag' from the host, which takes as an input argument an array holding the values to be assigned to all the entities for which that particular tag exists. The tag can be deleted by calling 'remove_tag' from the host. The mesh file is written to visualize the results by calling 'binary::write' from the host. This writes a .osh file, which needs to be converted to .vtk for visualization.
- interpolation - This test is implemented for a 3D mesh only. The 'get_barycentric' function is called from the main function and returns barycentric coordinates given the mesh, the ID of the element with respect to which the calculations are performed, and the coordinates of the point for which barycentric coordinates are calculated. This function makes use of 'coords()', which is called from the host to return the coordinates of all vertices. Since all data resides in GPU memory and can be accessed only from the device, a 'parallel_for' kernel is used to access and work on it. 'vector_3' is used on the device to convert values to a three dimensional vector, 'get_triangle_normal' is called on the device to calculate the normal vectors of all four faces of the tetrahedron, and 'normalize' is called on the device to normalize them, along with some more vector operators. The method of calculation follows this reference: "https://people.sc.fsu.edu/~jburkardt/presentations/cg_lab_barycentric_tetrahedrons.pdf". 'ask_elem_verts' is called from the host to return the vertex ids of all elements. The calculations in the function are structured such that the face normals point outside the tetrahedron. For this, the vertex ordering information shown in the figure is used when creating the face normals. As the test is implemented for the centroid and the first vertex of the element as per the ordering, assertions on the return value of the function compare the results with the expected values.
- classification - This test can also be called reverse classification. A function is written which returns a flag representing whether or not a cell of a given mesh is classified on the model entity with a certain ID and dimension. 'mark_by_class' is called from the host and returns all vertices classified on a particular model entity, and the corresponding elements are flagged using upward adjacency relations.
- reduction - Each process loads one part of the mesh. The 'get_max' function is called from the host and returns the global max of the input argument array.
- synchronization - 'sync_tag' can be called from the host to synchronize the values of tags for the entities lying on the process boundary. 'vtk::write_parallel' can be called to write the mesh from parallel code.
- partitioning - 'balance' can be called from the host to partition the mesh using RIB (recursive inertial bisection). For predictive load balancing, a mesh 'metric' tag has to be associated with the vertices, and then 'balance(1)' can be used. Note, when increasing the number of parts ('splitting'), RIB requires the number of processes to be a power of 2.
- ghosting - 'set_parting' can be called from the host with the required number of layers of ghost vertices as an argument.
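The face normal calculation described for the interpolation example can be illustrated on the host in plain C++. This is a minimal sketch, not the example's 'get_barycentric' implementation and not Omega_h code: the `Vec3` alias and the functions `sub`, `cross`, `dot`, `barycentric`, `centroid`, `unit_tet`, and `close` are hypothetical names. It assumes that coordinate i is the distance of the point from the face opposite vertex i divided by the distance of vertex i from that face. The normal is left unnormalized because its length and orientation cancel in the ratio.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

Vec3 sub(const Vec3& a, const Vec3& b) {
  return {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
}

Vec3 cross(const Vec3& a, const Vec3& b) {
  return {a[1] * b[2] - a[2] * b[1],
          a[2] * b[0] - a[0] * b[2],
          a[0] * b[1] - a[1] * b[0]};
}

double dot(const Vec3& a, const Vec3& b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Barycentric coordinates of point p in the tetrahedron with vertices v[0..3].
// For each vertex i, n is a normal of the opposite face (a, b, c); the ratio
// of the two dot products is the ratio of distances from that face.
std::array<double, 4> barycentric(const std::array<Vec3, 4>& v, const Vec3& p) {
  std::array<double, 4> bc{};
  for (int i = 0; i < 4; ++i) {
    const Vec3& a = v[(i + 1) % 4];
    const Vec3& b = v[(i + 2) % 4];
    const Vec3& c = v[(i + 3) % 4];
    const Vec3 n = cross(sub(b, a), sub(c, a));
    bc[i] = dot(n, sub(p, a)) / dot(n, sub(v[i], a));
  }
  return bc;
}

// Average of the four vertices.
Vec3 centroid(const std::array<Vec3, 4>& v) {
  Vec3 c = {0.0, 0.0, 0.0};
  for (const Vec3& x : v) {
    for (int d = 0; d < 3; ++d) c[d] += 0.25 * x[d];
  }
  return c;
}

// Reference tetrahedron used for the checks in this sketch.
std::array<Vec3, 4> unit_tet() {
  std::array<Vec3, 4> v;
  v[0] = {0.0, 0.0, 0.0};
  v[1] = {1.0, 0.0, 0.0};
  v[2] = {0.0, 1.0, 0.0};
  v[3] = {0.0, 0.0, 1.0};
  return v;
}

// Floating point comparison with a small absolute tolerance.
bool close(double a, double b) { return std::fabs(a - b) < 1e-12; }
```

As in the two cases tested by the interpolation example, evaluating `barycentric` at the centroid of `unit_tet()` gives four equal coordinates of 0.25, and evaluating it at the first vertex gives 1 for that vertex and 0 for the other three.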