[FEA]: Introduce Python module with CCCL headers #3201

Open · wants to merge 23 commits into base: main

Changes from 1 commit (23 commits total):
daab580
Add cccl/python/cuda_cccl directory and use from cuda_parallel, cuda_…
rwgk Dec 12, 2024
ef9d5f4
Run `copy_cccl_headers_to_aude_include()` before `setup()`
rwgk Dec 20, 2024
bc116dc
Create python/cuda_cccl/cuda/_include/__init__.py, then simply import…
rwgk Dec 20, 2024
2913ae0
Add cuda.cccl._version exactly as for cuda.cooperative and cuda.parallel
rwgk Dec 20, 2024
7dbb82b
Bug fix: cuda/_include only exists after shutil.copytree() ran.
rwgk Dec 20, 2024
0703901
Use `f"cuda-cccl @ file://{cccl_path}/python/cuda_cccl"` in setup.py
rwgk Dec 20, 2024
fc0e543
Remove CustomBuildCommand, CustomWheelBuild in cuda_parallel/setup.py…
rwgk Dec 20, 2024
2e64345
Replace := operator (needs Python 3.8+)
rwgk Dec 20, 2024
82467cd
Merge branch 'main' into pip-cuda-cccl
rwgk Dec 20, 2024
f13a96b
Fix oversights: remove `pip3 install ./cuda_cccl` lines from README.md
rwgk Dec 20, 2024
9ed6036
Restore original README.md: `pip3 install -e` now works on first pass.
rwgk Dec 20, 2024
c9a4d96
cuda_cccl/README.md: FOR INTERNAL USE ONLY
rwgk Dec 20, 2024
df943c0
Remove `$pymajor.$pyminor.` prefix in cuda_cccl _version.py (as sugge…
rwgk Dec 20, 2024
40c8389
Modernize pyproject.toml, setup.py
rwgk Dec 21, 2024
e3c7867
Install CCCL headers under cuda.cccl.include
rwgk Dec 21, 2024
acbd477
Merge branch 'main' into pip-cuda-cccl
rwgk Dec 21, 2024
06f575f
Factor out cuda_cccl/cuda/cccl/include_paths.py
rwgk Dec 21, 2024
e747768
Reuse cuda_cccl/cuda/cccl/include_paths.py from cuda_cooperative
rwgk Dec 21, 2024
499b191
Merge branch 'main' into pip-cuda-cccl
rwgk Dec 21, 2024
62ce2d3
Add missing Copyright notice.
rwgk Dec 21, 2024
65c5a15
Add missing __init__.py (cuda.cccl)
rwgk Dec 21, 2024
bffece6
Add `"cuda.cccl"` to `autodoc.mock_imports`
rwgk Dec 21, 2024
585447c
Move cuda.cccl.include_paths into function where it is used. (Attempt…
rwgk Dec 22, 2024
2 changes: 2 additions & 0 deletions python/cuda_cccl/.gitignore
Member:
Q: Is it possible that we consolidate .gitignore files at the root directory and not have independent ones per sub dir...?

Contributor Author:

I created #3212 to look into this later.

Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
cuda/_include
*egg-info
File renamed without changes.
11 changes: 11 additions & 0 deletions python/cuda_cccl/README.md
@@ -0,0 +1,11 @@
# `cuda.cccl`: Experimental CUDA Core Compute Library Python module with CCCL headers
Contributor:

Nit: we should consider the name cuda.cccl_headers for clarity.

Member:

I have reservations about this, considering the mirroring to conda packages (#3201 (comment)).

Contributor:

Ah, I wasn't aware there's already a conda package called cuda-cccl. Agreed, we should be consistent with that.


## Documentation

Please visit the documentation here: https://nvidia.github.io/cccl/python.html.

## Local development

```bash
pip3 install .
```
Contributor:

Perhaps it's appropriate to document that this package is currently for internal use only and not meant to be used/installed explicitly.

Contributor Author:

Done: c9a4d96

7 changes: 7 additions & 0 deletions python/cuda_cccl/pyproject.toml
@@ -0,0 +1,7 @@
# Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
#
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

[build-system]
requires = ["packaging", "setuptools>=61.0.0", "wheel"]
build-backend = "setuptools.build_meta"
75 changes: 75 additions & 0 deletions python/cuda_cccl/setup.py
@@ -0,0 +1,75 @@
# Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
#
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

import os
import shutil

from setuptools import Command, setup, find_namespace_packages
from setuptools.command.build_py import build_py
from wheel.bdist_wheel import bdist_wheel


project_path = os.path.abspath(os.path.dirname(__file__))
cccl_path = os.path.abspath(os.path.join(project_path, "..", ".."))
cccl_headers = [["cub", "cub"], ["libcudacxx", "include"], ["thrust", "thrust"]]
ver = "0.1.2.8.0"
Member:

I think we need to use the CCCL version here, not the CCCL Python modules' version. We should also not hard-code it, but instead read it from CMakeLists.txt, which is the source of truth AFAIK; for that, setuptools might not do the job. @vyasr might have a simple example of how this can be done with scikit-build-core.

Contributor Author:

Ack. I added this as a bullet to the PR description.

Contributor:

Check out the dynamic metadata section, specifically the Regex tab.
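Under that approach, the version could be derived from CMakeLists.txt with a small regex helper. A minimal sketch, assuming hypothetical `CCCL_VERSION_*` variable names (the real CMakeLists.txt may spell these differently):

```python
import re


def read_cccl_version(cmakelists_text: str) -> str:
    """Extract MAJOR.MINOR.PATCH from set(CCCL_VERSION_*) stanzas."""
    parts = []
    for key in ("MAJOR", "MINOR", "PATCH"):
        m = re.search(rf"set\(CCCL_VERSION_{key}\s+(\d+)\)", cmakelists_text)
        if m is None:
            raise ValueError(f"CCCL_VERSION_{key} not found in CMakeLists.txt")
        parts.append(m.group(1))
    return ".".join(parts)


example = """\
set(CCCL_VERSION_MAJOR 2)
set(CCCL_VERSION_MINOR 8)
set(CCCL_VERSION_PATCH 0)
"""
print(read_cccl_version(example))  # 2.8.0
```

scikit-build-core's dynamic-metadata regex provider does essentially this declaratively, without hand-written parsing code.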

Contributor:

You would need to rewrite everything here to use CMake instead of setuptools. Depending on what this module is trying to do, that may or may not be beneficial. Do you need to compile cuda_cccl/cooperative/parallel against the CCCL headers? In that case it is almost certainly worthwhile; I wouldn't want to orchestrate that compilation with setuptools.

Member (@leofang, Dec 19, 2024):

Do you need to run compilation of cuda_cccl/cooperative/parallel against CCCL headers?

  • cuda_cccl would just be nvidia-cuda-cccl-cuXX containing the headers but owned/maintained by the CCCL team for faster release cycles (think of it as cccl vs cuda-cccl on conda-forge)
  • cuda_cooperative JIT-compiles CCCL headers at run time, so for all practical purposes the headers can be thought of as shared libraries; no AOT compilation is needed
  • cuda_parallel is the most interesting case, because it does need to build the CCCL C shared library and include it in the wheel, but I dunno if building it requires NVCC + CCCL headers, or GCC/MSVC alone is enough

Contributor Author:

but I dunno if building it requires NVCC + CCCL headers, or GCC/MSVC alone is enough

Based on adding `-DCMAKE_VERBOSE_MAKEFILE=ON` and looking at the output of `pip install --verbose ./cuda_parallel[test]`, nvcc is required for compiling cccl/c/parallel/src/for.cu and reduce.cu:

  cd /home/coder/cccl/python/cuda_parallel/build/temp.linux-x86_64-cpython-312/c/parallel && /usr/bin/sccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/g++ -DCCCL_C_EXPERIMENTAL=1 -DNVRTC_GET_TYPE_NAME=1 -D_CCCL_NO_SYSTEM_HEADER -Dcccl_c_parallel_EXPORTS --options-file CMakeFiles/cccl.c.parallel.dir/includes_CUDA.rsp -O3 -DNDEBUG -std=c++20 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" -Xcompiler=-fPIC -Xcudafe=--display_error_number -Wno-deprecated-gpu-targets -Xcudafe=--promote_warnings -Wreorder -Xcompiler=-Werror -Xcompiler=-Wall -Xcompiler=-Wextra -Xcompiler=-Wreorder -Xcompiler=-Winit-self -Xcompiler=-Woverloaded-virtual -Xcompiler=-Wcast-qual -Xcompiler=-Wpointer-arith -Xcompiler=-Wvla -Xcompiler=-Wno-gnu-line-marker -Xcompiler=-Wno-gnu-zero-variadic-macro-arguments -Xcompiler=-Wno-unused-function -Xcompiler=-Wno-noexcept-type -MD -MT c/parallel/CMakeFiles/cccl.c.parallel.dir/src/for.cu.o -MF CMakeFiles/cccl.c.parallel.dir/src/for.cu.o.d -x cu -c /home/coder/cccl/c/parallel/src/for.cu -o CMakeFiles/cccl.c.parallel.dir/src/for.cu.o
  cd /home/coder/cccl/python/cuda_parallel/build/temp.linux-x86_64-cpython-312/c/parallel && /usr/bin/sccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/g++ -DCCCL_C_EXPERIMENTAL=1 -DNVRTC_GET_TYPE_NAME=1 -D_CCCL_NO_SYSTEM_HEADER -Dcccl_c_parallel_EXPORTS --options-file CMakeFiles/cccl.c.parallel.dir/includes_CUDA.rsp -O3 -DNDEBUG -std=c++20 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" -Xcompiler=-fPIC -Xcudafe=--display_error_number -Wno-deprecated-gpu-targets -Xcudafe=--promote_warnings -Wreorder -Xcompiler=-Werror -Xcompiler=-Wall -Xcompiler=-Wextra -Xcompiler=-Wreorder -Xcompiler=-Winit-self -Xcompiler=-Woverloaded-virtual -Xcompiler=-Wcast-qual -Xcompiler=-Wpointer-arith -Xcompiler=-Wvla -Xcompiler=-Wno-gnu-line-marker -Xcompiler=-Wno-gnu-zero-variadic-macro-arguments -Xcompiler=-Wno-unused-function -Xcompiler=-Wno-noexcept-type -MD -MT c/parallel/CMakeFiles/cccl.c.parallel.dir/src/reduce.cu.o -MF CMakeFiles/cccl.c.parallel.dir/src/reduce.cu.o.d -x cu -c /home/coder/cccl/c/parallel/src/reduce.cu -o CMakeFiles/cccl.c.parallel.dir/src/reduce.cu.o

Member (@leofang, Dec 19, 2024):

I skimmed over the code and I am actually confused, because my impression is that the kernel compilation is still done at run time (JIT), and that the host logic can just be handled by a host compiler. @gevtushenko IIRC you built the prototype, any reason we have to use .cu files here and use NVCC to compile?

Contributor Author:

Commit 2913ae0 adopts the established _version.py handling.

Contributor:

tl;dr I would suggest that if you have to do any compilation whatsoever beyond pure Cython you switch away from setuptools, but if you don't have any compiled modules at build time then stick to setuptools or use another backend that isn't designed for compilation (hatchling would be a great choice).

Member:

@gevtushenko IIRC you built the prototype, any reason we have to use .cu files here and use NVCC to compile?

In the offline call Georgii reminded me that there are some CUB structs that we need to pre-compile to pass around. Since generally CUB headers are not host compilable, NVCC has to be used, but we don't generate any GPU-specific code.



with open("README.md") as f:
long_description = f.read()
Member (@leofang, Dec 20, 2024):

nit: this can be moved to pyproject.toml too, e.g.
https://github.com/NVIDIA/cuda-python/blob/33b7366e308201f3bca8206ae331e399ac1b3379/cuda_core/pyproject.toml#L65
(in pyproject.toml, readme is the new preferred name over long_description)

Contributor Author:

Done: commit 40c8389



class CustomBuildCommand(build_py):
def run(self):
self.run_command("package_cccl")
build_py.run(self)


class CustomWheelBuild(bdist_wheel):
def run(self):
self.run_command("package_cccl")
super().run()


class PackageCCCLCommand(Command):
description = "Generate additional files"
user_options = []

def initialize_options(self):
pass

def finalize_options(self):
pass

def run(self):
for proj_dir, header_dir in cccl_headers:
src_path = os.path.abspath(os.path.join(cccl_path, proj_dir, header_dir))
dst_path = os.path.join(project_path, "cuda", "_include", proj_dir)
if os.path.exists(dst_path):
shutil.rmtree(dst_path)
shutil.copytree(src_path, dst_path)


setup(
name="cuda-cccl",
version=ver,
description="Experimental Package with CCCL headers to support JIT compilation",
long_description=long_description,
long_description_content_type="text/markdown",
author="NVIDIA Corporation",
classifiers=[
"Programming Language :: Python :: 3 :: Only",
"Environment :: GPU :: NVIDIA CUDA",
],
packages=find_namespace_packages(include=["cuda.*"]),
python_requires=">=3.9",
cmdclass={
"package_cccl": PackageCCCLCommand,
"build_py": CustomBuildCommand,
"bdist_wheel": CustomWheelBuild,
},
include_package_data=True,
license="Apache-2.0 with LLVM exception",
license_files=("../../LICENSE",),
)
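For orientation, the header layout produced by `PackageCCCLCommand` above can be resolved on the consumer side roughly as follows. This is an illustrative sketch assuming the `cuda/_include/<proj_dir>` layout from the `cccl_headers` table; the real helper is cuda_cccl/cuda/cccl/include_paths.py (added later in this PR) and may differ:

```python
from pathlib import Path

# Mirrors the cccl_headers table in setup.py: each CCCL subproject's
# headers are copied under cuda/_include/<proj_dir> at build time.
CCCL_PROJECTS = ("cub", "libcudacxx", "thrust")


def cccl_include_paths(package_root: Path) -> dict:
    """Return per-project header directories under an installed cuda-cccl."""
    include_root = Path(package_root) / "cuda" / "_include"
    return {proj: include_root / proj for proj in CCCL_PROJECTS}
```

A JIT consumer such as cuda_cooperative would then pass these directories as `-I` include flags to NVRTC at run time.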
1 change: 0 additions & 1 deletion python/cuda_cooperative/.gitignore
@@ -1,3 +1,2 @@
cuda/_include
env
*egg-info
13 changes: 11 additions & 2 deletions python/cuda_cooperative/README.md
@@ -6,7 +6,16 @@ Please visit the documentation here: https://nvidia.github.io/cccl/python.html.

## Local development

First-time installation:

```bash
pip3 install ./cuda_cccl
pip3 install ./cuda_cooperative[test]
pytest -v ./cuda_cooperative/tests/
```

For faster iterative development:

```bash
pip3 install -e .[test]
pytest -v ./tests/
pip3 install -e ./cuda_cooperative[test]
```
26 changes: 2 additions & 24 deletions python/cuda_cooperative/setup.py
@@ -3,9 +3,8 @@
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

import os
import shutil

from setuptools import Command, setup, find_namespace_packages
from setuptools import setup, find_namespace_packages
from setuptools.command.build_py import build_py
from wheel.bdist_wheel import bdist_wheel

@@ -27,35 +26,14 @@

class CustomBuildCommand(build_py):
def run(self):
self.run_command("package_cccl")
build_py.run(self)


class CustomWheelBuild(bdist_wheel):
def run(self):
self.run_command("package_cccl")
super().run()


class PackageCCCLCommand(Command):
description = "Generate additional files"
user_options = []

def initialize_options(self):
pass

def finalize_options(self):
pass

def run(self):
for proj_dir, header_dir in cccl_headers:
src_path = os.path.abspath(os.path.join(cccl_path, proj_dir, header_dir))
dst_path = os.path.join(project_path, "cuda", "_include", proj_dir)
if os.path.exists(dst_path):
shutil.rmtree(dst_path)
shutil.copytree(src_path, dst_path)


setup(
name="cuda-cooperative",
version=ver,
@@ -70,6 +48,7 @@ def run(self):
packages=find_namespace_packages(include=["cuda.*"]),
python_requires=">=3.9",
install_requires=[
"cuda-cccl",
"numba>=0.60.0",
"pynvjitlink-cu12>=0.2.4",
"cuda-python",
@@ -82,7 +61,6 @@ def run(self):
]
},
cmdclass={
"package_cccl": PackageCCCLCommand,
"build_py": CustomBuildCommand,
"bdist_wheel": CustomWheelBuild,
},
1 change: 0 additions & 1 deletion python/cuda_parallel/.gitignore
@@ -1,4 +1,3 @@
cuda/_include
env
*egg-info
*so
1 change: 0 additions & 1 deletion python/cuda_parallel/MANIFEST.in

This file was deleted.

13 changes: 11 additions & 2 deletions python/cuda_parallel/README.md
@@ -6,7 +6,16 @@ Please visit the documentation here: https://nvidia.github.io/cccl/python.html.

## Local development

First-time installation:

```bash
pip3 install ./cuda_cccl
pip3 install ./cuda_parallel[test]
Contributor:

Can this be editable if necessary? Wouldn't a regular install here and then an editable install below lead to two copies of the package in the environment?

Contributor Author:

It can! Thanks for asking. Previously I incorrectly thought that one pass without -e was required. I tried again from a fresh state, and it turns out the editable install does work on the first pass.

I restored the original README.md: commit 9ed6036

Wouldn't a regular install here and then an editable install below lead to two copies of the package in the environment?

From what I can tell, the 2nd install clobbers the previous one.

pytest -v ./cuda_parallel/tests/
```

For faster iterative development:

```bash
pip3 install -e .[test]
pytest -v ./tests/
pip3 install -e ./cuda_parallel[test]
```
33 changes: 7 additions & 26 deletions python/cuda_parallel/setup.py
@@ -3,10 +3,9 @@
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

import os
import shutil
import subprocess

from setuptools import Command, Extension, setup, find_namespace_packages
from setuptools import Extension, setup, find_namespace_packages
from setuptools.command.build_py import build_py
from setuptools.command.build_ext import build_ext
from wheel.bdist_wheel import bdist_wheel
@@ -29,36 +28,14 @@

class CustomBuildCommand(build_py):
def run(self):
self.run_command("package_cccl")
build_py.run(self)


class CustomWheelBuild(bdist_wheel):
def run(self):
self.run_command("package_cccl")
super().run()


class PackageCCCLCommand(Command):
description = "Generate additional files"
user_options = []

def initialize_options(self):
pass

def finalize_options(self):
pass

def run(self):
for proj_dir, header_dir in cccl_headers:
src_path = os.path.abspath(os.path.join(cccl_path, proj_dir, header_dir))
# TODO Extract cccl headers into a standalone package
dst_path = os.path.join(project_path, "cuda", "_include", proj_dir)
if os.path.exists(dst_path):
shutil.rmtree(dst_path)
shutil.copytree(src_path, dst_path)


class CMakeExtension(Extension):
def __init__(self, name):
super().__init__(name, sources=[])
@@ -100,7 +77,12 @@ def build_extension(self, ext):
],
packages=find_namespace_packages(include=["cuda.*"]),
python_requires=">=3.9",
install_requires=["numba>=0.60.0", "cuda-python", "jinja2"],
install_requires=[
"cuda-cccl",
"numba>=0.60.0",
"cuda-python",
"jinja2",
],
extras_require={
"test": [
"pytest",
Expand All @@ -109,7 +91,6 @@ def build_extension(self, ext):
]
},
cmdclass={
"package_cccl": PackageCCCLCommand,
"build_py": CustomBuildCommand,
"bdist_wheel": CustomWheelBuild,
"build_ext": BuildCMakeExtension,