
Create benchmarks directory and move babelstream into it #2237

Merged
merged 1 commit into from
Mar 20, 2024

Conversation

mehmetyusufoglu
Contributor

@mehmetyusufoglu mehmetyusufoglu commented Jan 31, 2024

A simple PR: a directory called "benchmarks" is created and the babelstream example is copied into it. There is a new CMake flag, alpaka_BUILD_BENCHMARKS. If this flag is ON, alpaka_ACC_CPU_B_SEQ_T_SEQ_ENABLE is turned ON as well (like the alpaka_BUILD_EXAMPLES flag).

The code under the benchmarks directory is compiled, but the babelstream example is not run in the CI, the same as when it was in the examples directory.

@mehmetyusufoglu mehmetyusufoglu marked this pull request as draft January 31, 2024 09:50
@mehmetyusufoglu mehmetyusufoglu force-pushed the benchmarkDir branch 2 times, most recently from d1769ae to ac3b661 January 31, 2024 10:04
@mehmetyusufoglu mehmetyusufoglu marked this pull request as ready for review January 31, 2024 10:10
@SimeonEhrig
Member

Maybe we should add a CMake target all_benchmarks where we can register and execute all benchmarks.

The benchmarks need to be built in the CI. I see no reason why we should not enable the benchmarks for all builds. Therefore we can add the CMake argument here:

"${ALPAKA_CI_CMAKE_EXECUTABLE}" --log-level=VERBOSE -G "${ALPAKA_CI_CMAKE_GENERATOR}" ${ALPAKA_CI_CMAKE_GENERATOR_PLATFORM}\
-Dalpaka_BUILD_EXAMPLES=ON -DBUILD_TESTING=ON "$(env2cmake alpaka_ENABLE_WERROR)" \
"$(env2cmake BOOST_ROOT)" -DBOOST_LIBRARYDIR="${ALPAKA_CI_BOOST_LIB_DIR}/lib" -DBoost_USE_STATIC_LIBS=ON -DBoost_USE_MULTITHREADED=ON -DBoost_USE_STATIC_RUNTIME=OFF -DBoost_ARCHITECTURE="-x64" \
"$(env2cmake CMAKE_BUILD_TYPE)" "$(env2cmake CMAKE_CXX_FLAGS)" "$(env2cmake CMAKE_C_COMPILER)" "$(env2cmake CMAKE_CXX_COMPILER)" "$(env2cmake CMAKE_EXE_LINKER_FLAGS)" "$(env2cmake CMAKE_CXX_EXTENSIONS)"\
"$(env2cmake alpaka_ACC_CPU_B_SEQ_T_SEQ_ENABLE)" "$(env2cmake alpaka_ACC_CPU_B_SEQ_T_THREADS_ENABLE)" \
"$(env2cmake alpaka_ACC_CPU_B_TBB_T_SEQ_ENABLE)" \
"$(env2cmake alpaka_ACC_CPU_B_OMP2_T_SEQ_ENABLE)" "$(env2cmake alpaka_ACC_CPU_B_SEQ_T_OMP2_ENABLE)" \
"$(env2cmake TBB_DIR)" \
"$(env2cmake alpaka_RELOCATABLE_DEVICE_CODE)" \
"$(env2cmake alpaka_ACC_GPU_CUDA_ENABLE)" "$(env2cmake alpaka_ACC_GPU_CUDA_ONLY_MODE)" "$(env2cmake CMAKE_CUDA_ARCHITECTURES)" "$(env2cmake CMAKE_CUDA_COMPILER)" "$(env2cmake CMAKE_CUDA_FLAGS)" \
"$(env2cmake alpaka_CUDA_FAST_MATH)" "$(env2cmake alpaka_CUDA_FTZ)" "$(env2cmake alpaka_CUDA_SHOW_REGISTER)" "$(env2cmake alpaka_CUDA_KEEP_FILES)" "$(env2cmake alpaka_CUDA_EXPT_EXTENDED_LAMBDA)" \
"$(env2cmake alpaka_ACC_GPU_HIP_ENABLE)" "$(env2cmake alpaka_ACC_GPU_HIP_ONLY_MODE)" "$(env2cmake CMAKE_HIP_ARCHITECTURES)" "$(env2cmake CMAKE_HIP_COMPILER)" "$(env2cmake CMAKE_HIP_FLAGS)" \
"$(env2cmake alpaka_ACC_SYCL_ENABLE)" "$(env2cmake alpaka_SYCL_ONEAPI_CPU)" "$(env2cmake alpaka_SYCL_ONEAPI_CPU_ISA)" \
"$(env2cmake alpaka_DEBUG)" "$(env2cmake alpaka_CI)" "$(env2cmake alpaka_CHECK_HEADERS)" "$(env2cmake alpaka_CXX_STANDARD)" "$(env2cmake alpaka_USE_MDSPAN)" "$(env2cmake CMAKE_INSTALL_PREFIX)" \
".."
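A minimal sketch of how the suggested all_benchmarks aggregate target and the alpaka_BUILD_BENCHMARKS flag from this PR could fit together. The exact layout is illustrative, not the existing alpaka CMake code:

```cmake
# Sketch only: alpaka_BUILD_BENCHMARKS exists in this PR; the
# all_benchmarks target and directory layout are hypothetical.
option(alpaka_BUILD_BENCHMARKS "Build the benchmarks" OFF)

if(alpaka_BUILD_BENCHMARKS)
    # Aggregate target: building it builds every registered benchmark.
    add_custom_target(all_benchmarks)
    add_subdirectory(benchmarks)
    # Each benchmark's CMakeLists.txt would then register itself via:
    #   add_dependencies(all_benchmarks <benchmark_target>)
endif()
```

The CI script above would then pass -Dalpaka_BUILD_BENCHMARKS=ON and build the all_benchmarks target.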

@mehmetyusufoglu mehmetyusufoglu force-pushed the benchmarkDir branch 2 times, most recently from 8d30818 to 1e4d0ae February 1, 2024 08:22
@bernhardmgruber
Member

I generally like the idea of separating benchmarks and examples, but could you please elaborate a bit on your motivation for doing this? Specifically, are you going to add more benchmarks? Are you planning to build the benchmarks differently than the examples? Thx!

@mehmetyusufoglu
Contributor Author

In my opinion, examples can be written for any reason: pedagogy, demonstrating the implementation of a new feature, etc. Benchmarks mainly focus on performance; visualising their change over time in the CI will show the general performance effect of each merged PR.

@bernhardmgruber
Member

Benchmarks mainly focus on performance; visualising their change over time in the CI will show the general performance effect of each merged PR.

Alright, so you are preparing for some kind of performance CI? Here is a ticket for that: #1264

@mehmetyusufoglu mehmetyusufoglu marked this pull request as draft February 2, 2024 09:51
@SimeonEhrig
Member

Benchmarks mainly focus on performance; visualising their change over time in the CI will show the general performance effect of each merged PR.

Alright, so you are preparing for some kind of performance CI? Here is a ticket for that: #1264

We discussed it last week. At the moment, a performance CI is not possible because of a lack of resources, but we want to have benchmarks so we can run regression benchmarks locally on laptops, workstations, or servers.

For example, we thought about using mdspan for tensors in kernels. This makes the usage easier than raw pointers, but maybe the performance overhead is too high, in which case we would also need to implement an interface with raw pointers.

@sliwowitz
Contributor

There's also #1723, which I've just rebased on top of the current develop. It uses Catch2 for the benchmarking infrastructure (and is thus integrated with e.g. ctest). I tried to implement a generic fixture for benchmarking kernels that would allow us to write simple benchmarks for basic features, but I only implemented the random-generator use case, so I don't know what the actual requirements for such a fixture would be.

Using Catch2 to handle the benchmarks is IMHO still a good idea, since we're already using it for the tests.

@psychocoderHPC psychocoderHPC merged commit 7a8b205 into alpaka-group:develop Mar 20, 2024
22 checks passed
5 participants