
Add graph support #311

Closed
wants to merge 76 commits into from

Conversation

matmanc
Contributor

@matmanc commented Jan 24, 2025

The CUDA Graph API seems a promising addition to the already existing CUDA streams, especially for mixing host and device functions in a complex order. Moreover, HIP also supports CUDA graphs, so it feels natural to extend cudawrappers to support graphs.

In this pull request the basic functionality of the CUDA Graph API is introduced.
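
For context, the functionality wrapped here corresponds to the stream-capture flow of the underlying CUDA runtime API. The sketch below uses plain CUDA calls and a made-up kernel, not the new cudawrappers classes from this branch; it only illustrates the build-once/launch-many pattern that the wrapper is meant to expose:

#include <cuda_runtime.h>

__global__ void scale(float *data) { data[threadIdx.x] *= 2.0f; }

void run_as_graph(float *d_data, cudaStream_t stream) {
  cudaGraph_t graph;
  cudaGraphExec_t exec;

  // Record the work submitted to the stream into a graph instead of executing it.
  cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
  scale<<<1, 32, 0, stream>>>(d_data);
  cudaStreamEndCapture(stream, &graph);

  // Instantiate once, then launch the whole graph as a single unit (CUDA 12 signature).
  cudaGraphInstantiate(&exec, graph, 0);
  cudaGraphLaunch(exec, stream);
  cudaStreamSynchronize(stream);

  cudaGraphExecDestroy(exec);
  cudaGraphDestroy(graph);
}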

Clone and verify on a machine with an NVIDIA GPU

cd $(mktemp -d --tmpdir cudawrappers-XXXXXX)
git clone https://github.com/nlesc-recruit/cudawrappers .
git checkout <this-branch>
cmake -S . -B build -DCUDAWRAPPERS_BUILD_TESTING=True
make -C build
make -C build format # Nothing should change
make -C build lint # linting is broken until we fix the issues (see #92)
build/tests/test_graph

Clone and verify on a machine with HIP and an AMD GPU

cd $(mktemp -d --tmpdir cudawrappers-XXXXXX)
git clone https://github.com/nlesc-recruit/cudawrappers .
git checkout <this-branch>
cmake -S . -B build -DCUDAWRAPPERS_BUILD_TESTING=True -DCUDAWRAPPERS_BACKEND_HIP=True
make -C build
make -C build format # Nothing should change
build/tests/test_graph
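
On the HIP backend the same capture flow maps onto HIP's graph API, which is why the extension carries over naturally to AMD GPUs. Again a plain-HIP sketch for reference, not the wrapper introduced in this branch:

#include <hip/hip_runtime.h>

__global__ void scale(float *data) { data[threadIdx.x] *= 2.0f; }

void run_as_graph(float *d_data, hipStream_t stream) {
  hipGraph_t graph;
  hipGraphExec_t exec;

  // Capture the stream's work into a graph, mirroring the CUDA flow.
  hipStreamBeginCapture(stream, hipStreamCaptureModeGlobal);
  hipLaunchKernelGGL(scale, dim3(1), dim3(32), 0, stream, d_data);
  hipStreamEndCapture(stream, &graph);

  hipGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
  hipGraphLaunch(exec, stream);
  hipStreamSynchronize(stream);

  hipGraphExecDestroy(exec);
  hipGraphDestroy(graph);
}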

csbnw and others added 30 commits January 20, 2025 15:34
* Update changelog
* Bump version number to 0.6.0
Co-authored-by: Bram Veenboer <[email protected]>
* Make Function::getAttribute const
* Add Function::name
* Add HostMemory::size
* Add DeviceMemory::size
* Add Module constructor with CUjit_option map
* Update CHANGELOG
* Remove <T> for Wrapper constructors
* Update changelog
updates:
- [github.com/pre-commit/mirrors-clang-format: v16.0.6 → v17.0.6](pre-commit/mirrors-clang-format@v16.0.6...v17.0.6)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Support size = 0 for DeviceMemory constructor
* Fix cu::HostMemory constructor
* Add missing checkCudaCall around free/unregister calls
* Pass the correct pointer to cuMemHostRegister
* Use int for return type of CU_POINTER_ATTRIBUTE_IS_MANAGED query
* Initialize size in Stream::memAllocAsync
This is indeed more accurate.

Co-authored-by: Bram Veenboer <[email protected]>
* Update CHANGELOG.md
* Add mdformat to pre-commit configuration
* Change Unreleased to 0.7.0
* Cleanup of changelog
csbnw and others added 28 commits January 20, 2025 15:35
* Add nvml::Device::getClock
* Update CHANGELOG
* Add test
updates:
- [github.com/pre-commit/mirrors-clang-format: v18.1.5 → v18.1.8](pre-commit/mirrors-clang-format@v18.1.5...v18.1.8)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Update CHANGELOG
* Update version number to 0.8.0
* Changes to inline local includes:
- #include must be at the start of a line
- the inlined include is placed on the same line as the original
  #include

* Update cmake/cudawrappers-helper.cmake

Co-authored-by: Bram Veenboer <[email protected]>

* Update cmake/cudawrappers-helper.cmake

Co-authored-by: Bram Veenboer <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update cmake/cudawrappers-helper.cmake

Co-authored-by: Bram Veenboer <[email protected]>

* Add updates to inline_local_includes to CHANGELOG

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Bram Veenboer <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Hanno Spreeuw <[email protected]>
Co-authored-by: Leon Oostrum <[email protected]>
Co-authored-by: John Romein <[email protected]>
* Add option to create slice of device memory

* Add new DeviceMemory constructor to changelog

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Update C++ standard to C++14
* Upgrade to Catch 3.6.0
* include cuda_runtime in cu.hpp
---------

Co-authored-by: Bram Veenboer <[email protected]>
* Cleanup test of cu::DeviceMemory
* Add DeviceMemory::memset methods + tests
* Add Stream::memsetAsync methods + tests
updates:
- [github.com/pre-commit/mirrors-clang-format: v18.1.8 → v19.1.1](pre-commit/mirrors-clang-format@v18.1.8...v19.1.1)
- [github.com/executablebooks/mdformat: 0.7.17 → 0.7.18](hukkin/mdformat@0.7.17...0.7.18)
updates:
- [github.com/pre-commit/mirrors-clang-format: v19.1.1 → v19.1.3](pre-commit/mirrors-clang-format@v19.1.1...v19.1.3)
* Added `cu::Stream::memcpyHtoD2DAsync()`, `cu::Stream::memcpyDtoHD2Async()`,
  and `cu::Stream::memcpyDtoD2DAsync()`
* Added `cu::DeviceMemory::memset2D()` and `cu::Stream::memset2DAsync()`
* Added `cufft::FFT1DR2C` and `cufft::FFT1DC2R`
* Added `cu::Device::getOrdinal()`
* Allow non-managed memory dereferencing in `cu::DeviceMemory`  

---------

Co-authored-by: Bram Veenboer <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Update cuda-Jimver/toolkit to v0.2.19
* Change CUDA version to 12.6.1
* Remove unused context argument from nvml::Device
* Add missing checkNvmlCall
* Make nvml::Device functions const
* Add pass_filenames option to workaround cppcheck error
@matmanc requested a review from csbnw January 24, 2025 12:00
@matmanc closed this Jan 24, 2025