Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Feature request for SYCL backend #161

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

abagusetty
Copy link

Towards Intel PVC GPUs

@abagusetty
Copy link
Author

Thanks for running the CI, the builds should now be fixed

@jyoung3131
Copy link
Contributor

Hi @abagusetty - do you just want this backend to target GPU or CPU as well? We'll see if we can review this backend addition and get back with you on next steps to merge it.

@abagusetty
Copy link
Author

Hi @abagusetty - do you just want this backend to target GPU or CPU as well? We'll see if we can review this backend addition and get back with you on next steps to merge it.

Hi @jyoung3131 for the first pass, GPU backend is targeted. Thanks for taking a peek.

@plavin
Copy link
Contributor

plavin commented Oct 20, 2023

Thanks for this PR, @abagusetty.

I haven't worked with the oneAPI tools very much. Could you share the arguments you passed to cmake?

@abagusetty
Copy link
Author

@plavin This is my cmake

abagusetty@uan-0001 ~/spatter/build-10-20-2023 (sycl) $ cmake -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DBACKEND=sycl -DCOMPILER=intel -DCMAKE_BUILD_TYPE=Release ../
No SYCL architecture specified, default set to PVC
-- The C compiler identification is IntelLLVM 2024.0.0
-- The CXX compiler identification is IntelLLVM 2024.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /soft/restricted/CNDA/updates/2023.05.15.001/oneapi/compiler/eng-20230614/compiler/linux/bin/icx - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /soft/restricted/CNDA/updates/2023.05.15.001/oneapi/compiler/eng-20230614/compiler/linux/bin/icpx - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
Using SYCL backend
-- compiler version is 2024.0.0
-- compiler is /soft/restricted/CNDA/updates/2023.05.15.001/oneapi/compiler/eng-20230614/compiler/linux/bin/icx
-- Configuring done (19.8s)
-- Generating done (0.3s)
-- Build files have been written to: /home/abagusetty/spatter/build-10-20-2023

@plavin
Copy link
Contributor

plavin commented Oct 20, 2023

It looks like you have a newer version of oneAPI than what is available on the Intel website. Do you know if this works with oneAPI 2023.2.0?

I'm having some issues compiling. I don't have oneapi/mkl/rng/device.hpp, but that could be an issue with my installation.

@abagusetty
Copy link
Author

abagusetty commented Oct 20, 2023

It looks like you have a newer version of oneAPI than what is available on the Intel website. Do you know if this works with oneAPI 2023.2.0?

I'm having some issues compiling. I don't have oneapi/mkl/rng/device.hpp, but that could be an issue with my installation.

It should work with the release versions of compilers too. The PR didn't have any version specific changes. MKLROOT should have been set and also MKL is shipped with oneAPI basetoolkit IIRC.

@plavin
Copy link
Contributor

plavin commented Oct 23, 2023

Got it building - I didn't realize the mkl module on my machine was loaded separately from the compiler module.

When I run ./spatter -pUNIFORM:8:1, or when I use the gpu-ustride.json input file, I get the following error:

Run Configurations
[ {'name':'UNIFORM', 'kernel':'Gather', 'pattern':[0,1,2,3,4,5,6,7], 'pattern_gather':[], 'pattern_scatter':[], 'delta':8, 'deltas_gather':[], 'deltas_scatter':[], 'length':1024, 'agg':10, 'wrap':1, } ]

config  time(s)      bw(MB/s)
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what():  NULL pointer argument in memory copy operation. -30 (PI_ERROR_INVALID_VALUE)
Aborted (core dumped)

@abagusetty
Copy link
Author

Got it building - I didn't realize the mkl module on my machine was loaded separately from the compiler module.

When I run ./spatter -pUNIFORM:8:1, or when I use the gpu-ustride.json input file, I get the following error:

Run Configurations
[ {'name':'UNIFORM', 'kernel':'Gather', 'pattern':[0,1,2,3,4,5,6,7], 'pattern_gather':[], 'pattern_scatter':[], 'delta':8, 'deltas_gather':[], 'deltas_scatter':[], 'length':1024, 'agg':10, 'wrap':1, } ]

config  time(s)      bw(MB/s)
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what():  NULL pointer argument in memory copy operation. -30 (PI_ERROR_INVALID_VALUE)
Aborted (core dumped)

Will give it a try with exact above example and get back to you. Thanks

@abagusetty
Copy link
Author

@plavin I was able to bt some and noticed that the pointers are the null as passed to sycl::memcpy. Because max_ro_len was 0 here:

order_dev = sycl::malloc_device<uint32_t>(max_ro_len, *que);

Any pointers where I can dig further

@plavin
Copy link
Contributor

plavin commented Nov 8, 2023

Sorry I missed your second comment.

I believe max_ro_len is only used for a feature that would re-order the gathers (hence the "ro") based on Morton ordering. It's not a feature that has much use to be honest There are special kernels that are used when you do the Morton stuff and they shouldn't be called for the inputs I mentioned above.

@abagusetty
Copy link
Author

Sorry I missed your second comment.

I believe max_ro_len is only used for a feature that would re-order the gathers (hence the "ro") based on Morton ordering. It's not a feature that has much use to be honest There are special kernels that are used when you do the Morton stuff and they shouldn't be called for the inputs I mentioned above.

@plavin Thanks for clarifying. Would you recommend an other input/way to verify the tests are working as expected. As it looks like I was hitting the error with this launch ./spatter -pUNIFORM:8:1

@plavin
Copy link
Contributor

plavin commented Nov 8, 2023

Where does the error happen? If it is on that line, then I would suggest just not doing the copy when the max_ro_len is zero.

Or is it segfaulting later on trying to access that order_dev buffer?

@abagusetty
Copy link
Author

Where does the error happen? If it is on that line, then I would suggest just not doing the copy when the max_ro_len is zero.

Or is it segfaulting later on trying to access that order_dev buffer?

The stack trace indicate the sycl.memcpy is the one segfaulting because both the src and dst pointers (i.e., order & order_dev args) passed are both nullptrs. The allocation of the buffers (order & order_dev) seems to be fine with size=0. I guess the solution is to skip the line for the memcpy and give it a try.

Can you confirm for CUDA/HIPmemcpy, this is not an issue when passing nullptrs.

@plavin
Copy link
Contributor

plavin commented Nov 8, 2023

This input works fine on the CUDA backend. I haven't used HIP.

@plavin
Copy link
Contributor

plavin commented Nov 9, 2023

I have been adding more error checking to the CUDA backend and it seems this is also an issue for CUDA. Solution is definitely to skip that line when the buffer is null

@abagusetty
Copy link
Author

I have been adding more error checking to the CUDA backend and it seems this is also an issue for CUDA. Solution is definitely to skip that line when the buffer is null

Thanks so much for checking, I will push the necessary changes & comment.

@abagusetty
Copy link
Author

@plavin Sorry about the delay, I was able to fix the above nullptr issue and saw some sensible output on my end. Could you please confirm when you get a chance. Thanks again

@jyoung3131
Copy link
Contributor

@abagusetty we are revisiting this backend since we made some major changes with the refactor. However, we may need to do a separate PR and merge a few of your changes in since the backend process has changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants