[BREAKING] Rust representation of GPU memory re-worked #412

Closed
wants to merge 8 commits into main

Conversation

DmytroTym (Contributor) commented Mar 1, 2024

Overview of the PR

Currently, we use a pretty awkward representation of on-device memory. This PR replaces it with a more idiomatic pair: DeviceVec (which is not really a vector but a boxed slice), which allocates, owns and deallocates device memory, and DeviceSlice, which provides mutable and immutable views into device memory.
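To make the ownership split concrete, here is a minimal host-only sketch of the pattern (the `Box<[T]>` backing, the `len` helper and the construction in `main` are illustrative stand-ins, not the CUDA-backed types this PR adds):

```rust
use std::ops::Deref;

/// Non-owning view into device memory, analogous to `&[T]`: dropping it never frees anything.
#[repr(transparent)]
pub struct DeviceSlice<T>([T]);

/// Owning handle, analogous to `Box<[T]>`: allocates once, frees exactly once on drop,
/// and hands out `DeviceSlice` views through `Deref`.
pub struct DeviceVec<T>(Box<[T]>); // stand-in backing; the real type wraps a raw device pointer

impl<T> Deref for DeviceVec<T> {
    type Target = DeviceSlice<T>;
    fn deref(&self) -> &DeviceSlice<T> {
        // Same trick std uses for Path/OsStr: reinterpret the slice as its transparent wrapper.
        unsafe { &*(&*self.0 as *const [T] as *const DeviceSlice<T>) }
    }
}

impl<T> DeviceSlice<T> {
    pub fn len(&self) -> usize {
        self.0.len()
    }
}

fn main() {
    let owned: DeviceVec<u32> = DeviceVec(vec![0u32; 8].into_boxed_slice());
    let view: &DeviceSlice<u32> = &owned; // borrow: no new allocation, no extra free
    assert_eq!(view.len(), 8);
}
```

The point of the split is that only `DeviceVec` carries allocation responsibility, while `DeviceSlice` behaves like a borrow.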

@ChickenLover I also changed vector operations a little bit: Montgomery conversion is removed, as it worked differently for multiplication versus addition/subtraction, and device id checks have been added. Also, I wasn't sure how to change the Poseidon-related data that is currently held in raw slices; is it correct to say that we should use DeviceVec there (which should help with the memory leak, at least on the Rust side), and that the digest should be a host slice?

Unresolved questions

One potential improvement for H2D/D2H memory operations would be to pin memory inside the HostSlice::from_slice method. But I'm still not sure how effective pinned memory is on modern GPUs, and since we don't really own the host data it's hard to make sure it gets pinned and unpinned exactly once (which may or may not be a real issue). If someone has a good understanding of pinned memory, please comment. Alternatively, we could simply provide a flag so the user can decide whether or not to pin memory.
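For reference, here is a rough sketch of what an opt-in pinning guard could look like, assuming direct FFI to the CUDA runtime's `cudaHostRegister`/`cudaHostUnregister`; the `PinnedGuard` type and its API are hypothetical, not something this PR adds. Tying register/unregister to an RAII guard is one way to get the "pinned and unpinned exactly once" property without owning the host data:

```rust
use std::ffi::c_void;

// Minimal FFI to the CUDA runtime's page-locking calls (requires linking against cudart).
extern "C" {
    fn cudaHostRegister(ptr: *mut c_void, size: usize, flags: u32) -> i32;
    fn cudaHostUnregister(ptr: *mut c_void) -> i32;
}

const CUDA_HOST_REGISTER_DEFAULT: u32 = 0;

/// RAII guard that pins a borrowed host slice for the duration of the borrow,
/// guaranteeing register/unregister each happen exactly once.
pub struct PinnedGuard<'a, T> {
    slice: &'a [T],
}

impl<'a, T> PinnedGuard<'a, T> {
    pub fn new(slice: &'a [T]) -> Result<Self, i32> {
        let bytes = std::mem::size_of_val(slice);
        let err = unsafe {
            cudaHostRegister(slice.as_ptr() as *mut c_void, bytes, CUDA_HOST_REGISTER_DEFAULT)
        };
        if err == 0 { Ok(Self { slice }) } else { Err(err) }
    }

    pub fn as_slice(&self) -> &[T] {
        self.slice
    }
}

impl<'a, T> Drop for PinnedGuard<'a, T> {
    fn drop(&mut self) {
        // Unpin exactly once when the guard goes out of scope.
        unsafe { cudaHostUnregister(self.slice.as_ptr() as *mut c_void) };
    }
}
```

Whether the registration cost pays off for a given transfer size would still need benchmarking on the target GPUs.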

@DmytroTym DmytroTym self-assigned this Mar 3, 2024
@jeremyfelder jeremyfelder changed the base branch from dev to main March 4, 2024 09:13
@DmytroTym DmytroTym marked this pull request as ready for review March 5, 2024 09:08
Comment on lines +268 to +276
pub fn cuda_malloc_for_device(count: usize, device_id: usize) -> CudaResult<Self> {
check_device(device_id);
Self::cuda_malloc(count)
}

pub fn cuda_malloc_async_for_device(count: usize, stream: &CudaStream, device_id: usize) -> CudaResult<Self> {
check_device(device_id);
Self::cuda_malloc_async(count, stream)
}
Collaborator
What's the use case here for passing a specific device_id if we require it to match the current device's id?

Contributor Author

This version is preferred by @vhnatyk over an implicit choice of device id in malloc.

Contributor

tl;dr: it's not that it's preferred, it just seems aligned with everything else. Longer version: the check ensures we maintain a correct device_id through our calls, since we don't have implicit device_id management anywhere except here (left over from my legacy WIP implementation and missed in review?). We could consider fully implicit, or rather automated, device_id management derived from calling get_device() on the current thread, but since we already have the field in DeviceContext, that would make it redundant and feels like it could cause multiple bugs. I probably should have provided a basic description in the doc, but I somehow hoped it would emerge from everyone's edits since it was discussed so intensively :) I still have to update the doc with the current version of the API and diagrams.

Collaborator

I think internal management of device_id by thread makes the most sense. DeviceContext could also be set this way, using get_device(). It might be more bug-prone on our end, but I think it will be less error-prone on the user end, which we don't have control over.
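For illustration, fully implicit per-thread device management could be built directly on the CUDA runtime's `cudaGetDevice`/`cudaSetDevice`, which are already per-thread; the helper names below are hypothetical, not an API proposed in this PR:

```rust
// Minimal FFI to the CUDA runtime's per-thread device selection (requires linking against cudart).
extern "C" {
    fn cudaGetDevice(device: *mut i32) -> i32;
    fn cudaSetDevice(device: i32) -> i32;
}

/// Returns the device currently bound to the calling thread.
pub fn current_device() -> Result<i32, i32> {
    let mut device = 0i32;
    let err = unsafe { cudaGetDevice(&mut device) };
    if err == 0 { Ok(device) } else { Err(err) }
}

/// Binds `device_id` to the calling thread, so subsequent allocations and
/// kernel launches implicitly target it instead of threading the id through every call.
pub fn set_current_device(device_id: i32) -> Result<(), i32> {
    let err = unsafe { cudaSetDevice(device_id) };
    if err == 0 { Ok(()) } else { Err(err) }
}
```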

@@ -205,9 +227,8 @@ impl<T, const D_ID: usize> DeviceSlice<T, D_ID> {
     }
 }

-impl<T, const D_ID: usize> DeviceVec<T, D_ID> {
+impl<T> DeviceVec<T> {
     pub fn cuda_malloc(count: usize) -> CudaResult<Self> {
Collaborator

Should this and cuda_malloc_async be private now that we have cuda_malloc_for_device<_async>?

Contributor Author

I don't know... If you're only using one card, there's no need for the for_device version of these methods. I personally would use cuda_malloc_async in multi-device settings too.

vhnatyk (Contributor) left a comment

Looks good, just one minor thing: move the similar error strings to a const. The PR is a pretty big one 😄 - maybe it's worth one more quick look.

wrappers/rust/icicle-core/src/ntt/tests.rs (resolved)
if let Some(device_id) = input.device_id() {
    assert_eq!(
        device_id, ctx_device_id,
        "Device ids in input and context are different"
    );
}
Contributor

Maybe worth making this a string const.

alxiong (Contributor) commented Mar 7, 2024

I like the distinction made between DeviceVec and DeviceSlice, and the usage of ManuallyDrop.

Would like to confirm one thing:

The updated msm() API now accepts either a DeviceVec or a DeviceSlice, so I just want to make sure the following flow won't cause a double free:

  • allocate a DeviceVec for some base points (say 2^20)
  • get a DeviceSlice for a sub-slice of it (say 2^15 out of it)
  • pass this slice to msm(), and for some reason the CUDA operation panics and fails (say due to GPU out of memory)

(Currently I'm using mem::forget() on the subslice so its destructor never runs, to prevent double-freeing. But double-freeing would still be a risk if msm() failed in the middle, before I got a chance to call forget(), since currently there's no difference between a subslice on device and the original owned vec on device, so both have the same cudaFree logic in their destructor.
So I just want to double-check that the changes introduced here obviate my concern, since DeviceSlice's destructor won't do anything, correct?)

DmytroTym (Contributor Author)
@alxiong yeah, the new functions accept either a DeviceSlice, a HostSlice or a DeviceVec. The latter is accepted via Deref to DeviceSlice (similar to how other Rust functions that accept slices can be used with vectors). And like normal Rust slices, DeviceSlice doesn't own any data, so it won't call cudaFree; only DeviceVec frees data :)
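To illustrate the destructor point, here is a host-only sketch in which only the owner "frees" (an atomic counter stands in for `cudaFree`; the types, the `range` method and the `msm` stub are illustrative, not the real icicle implementations):

```rust
use std::ops::Deref;
use std::sync::atomic::{AtomicUsize, Ordering};

static FREES: AtomicUsize = AtomicUsize::new(0);

/// Non-owning view: it has no Drop impl, so holding or dropping it
/// (even across a panic) can never trigger a second free.
#[repr(transparent)]
struct DeviceSlice([u64]);

/// Owning handle: "frees" its allocation exactly once, on drop.
struct DeviceVec(Box<[u64]>);

impl Drop for DeviceVec {
    fn drop(&mut self) {
        FREES.fetch_add(1, Ordering::SeqCst); // stand-in for cudaFree
    }
}

impl Deref for DeviceVec {
    type Target = DeviceSlice;
    fn deref(&self) -> &DeviceSlice {
        unsafe { &*(&*self.0 as *const [u64] as *const DeviceSlice) }
    }
}

impl DeviceSlice {
    /// Borrow a sub-range as another non-owning view.
    fn range(&self, start: usize, end: usize) -> &DeviceSlice {
        unsafe { &*(&self.0[start..end] as *const [u64] as *const DeviceSlice) }
    }
}

fn msm(_bases: &DeviceSlice) { /* even if this failed mid-way, the borrow frees nothing */ }

fn main() {
    {
        let bases = DeviceVec(vec![0u64; 1 << 20].into_boxed_slice());
        msm(bases.range(0, 1 << 15)); // sub-view is just a borrow, no mem::forget needed
    } // the single free happens here, in DeviceVec::drop only
    assert_eq!(FREES.load(Ordering::SeqCst), 1);
}
```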

jeremyfelder (Collaborator)

Closing in favor of #443, which includes these changes.
