Software Path Tracing renderer implemented in CUDA, from scratch.
The repo contains several external dependencies (git submodules), so clone it recursively with the following command:
git clone https://github.com/Enigmatisms/cuda-pt.git --recursive
The interactive viewer (./build/xx/cpt) depends on GLEW. If GLEW is not installed, only the offline application (./build/xx/pt) is available. The code base originally ran on Linux (tested on Ubuntu 22.04), but I haven't tried that since my Ubuntu machine broke down. Currently, I build it with MSVC (VS2022) and CMake:
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
cmake --build . --config Release
The executables will be ./build/xx/cpt.exe and ./build/xx/pt.exe. For example, to run the interactive viewer on the bunny scene:
cd build/Release
./cpt.exe ../../scene/xml/bunny.xml
This repo currently has no plans for OptiX, since I am exploring how to build the wheel myself and make it fast, rather than implementing lots of useful features. Useful features go into my experimental path tracer AdaPT; check my GitHub homepage for more information.
The scalability of this repo might be worse than that of AdaPT, but it will improve over time, since I plan to migrate from Taichi Lang to a pure-CUDA code base. Currently, this repo supports:
- Toy CUDA depth renderer with profiling
- Megakernel unidirectional path tracing. Two major ray-scene intersection schemes are employed: shared-memory based AABB culling, and GPU BVH (see below).
- Wavefront unidirectional path tracing with stream compaction. Currently, WFPT is not as fast as megakernel PT, due to the simplicity of the test scenes (and possibly poorly coalesced global memory (GMEM) accesses, which I am working on).
- GPU BVH: a stackless GPU BVH built with the surface area heuristic (SAH). The current implementation is not optimal (the traversal order of the left and right children is not taken into account, and there is no 'look-back' op), but it is fast enough. Profiling for this part is not yet complete. A 1D CUDA texture is used to store the BVH nodes for better cache performance.
- CUDA pitched textures for environment maps, normal, roughness, index of refraction and albedo.
- Online modification of the scene. Check out the video down below.
  [Video: CUDA-PT.2024-12-22.20-54-16.mp4, online scene modification demo]
- (Recent) An imgui-based interactive UI.
- (Around 2025.01, stay tuned) Benchmarking against AdaPT (a Taichi Lang based renderer) and OptiX (optional), plus more profiling. Finally, I plan to write several blog posts on "How to implement an efficient software path tracing renderer with CUDA". The posts will focus on analysis-driven optimization at both the software and hardware level, so they will effectively summarize (and teach) some best practices for programming tasks with extremely imbalanced workloads.
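The BVH bullet above says the traversal is stackless but does not say how. One common way to get there, assumed here and not necessarily what this repo does, is to store the nodes in depth-first (pre-order) order, so that an interior node's left child is always the next node in the array, and to give every node a "skip" (escape) index pointing to the next node outside its subtree. The CPU-side C++ sketch below uses hypothetical names (FlatNode, skip, traverse_stackless); in the renderer itself the nodes would be fetched from the 1D CUDA texture and each thread would run this loop for its own ray.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>   // std::numeric_limits, handy for building inv_d
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical flattened node. Nodes are stored in pre-order, so an interior
// node's left child is always at index + 1. `skip` is the next node (in
// pre-order) outside this node's subtree; -1 ends traversal.
struct FlatNode {
    Vec3 lo, hi;     // AABB bounds
    int skip;        // escape index taken on a miss or after a leaf
    int prim_begin;  // first primitive of a leaf
    int prim_count;  // 0 marks an interior node
};

// Slab test: does the ray o + t * d overlap the box for some t in [0, t_max]?
// inv_d holds 1 / d per component (+-infinity for axis-parallel rays).
inline bool hit_aabb(const FlatNode& n, Vec3 o, Vec3 inv_d, float t_max) {
    const float lo[3]  = {n.lo.x, n.lo.y, n.lo.z};
    const float hi[3]  = {n.hi.x, n.hi.y, n.hi.z};
    const float org[3] = {o.x, o.y, o.z};
    const float inv[3] = {inv_d.x, inv_d.y, inv_d.z};
    float t0 = 0.f, t1 = t_max;
    for (int a = 0; a < 3; ++a) {
        float tn = (lo[a] - org[a]) * inv[a];
        float tf = (hi[a] - org[a]) * inv[a];
        if (tn > tf) std::swap(tn, tf);
        t0 = std::max(t0, tn);
        t1 = std::min(t1, tf);
        if (t0 > t1) return false;
    }
    return true;
}

// Stackless traversal: the only per-ray state is the current node index.
// visit_leaf stands in for the ray-primitive tests and returns the updated
// closest-hit distance, which then culls boxes that lie farther away.
template <typename LeafFn>
float traverse_stackless(const std::vector<FlatNode>& nodes, Vec3 o,
                         Vec3 inv_d, float t_max, LeafFn visit_leaf) {
    int idx = nodes.empty() ? -1 : 0;
    while (idx != -1) {
        const FlatNode& n = nodes[idx];
        if (!hit_aabb(n, o, inv_d, t_max)) {
            idx = n.skip;                  // miss: skip the whole subtree
        } else if (n.prim_count > 0) {
            t_max = visit_leaf(n, t_max);  // leaf: test its primitives
            idx = n.skip;                  // then leave the leaf
        } else {
            idx = idx + 1;                 // interior hit: go to left child
        }
    }
    return t_max;
}
```

Like the implementation described above, this sketch always visits the left child first: a single skip index fixes the visiting order when the tree is flattened, so ordering children per ray needs extra per-node data or a different layout.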
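Stream compaction in a wavefront tracer removes terminated paths so that the next bounce launches only as many threads as there are live paths. The usual formulation is an exclusive prefix sum over "alive" flags followed by a scatter. The sequential C++ sketch below (hypothetical function compact_alive_paths) shows that logic on the host; on the GPU the scan runs in parallel, and library routines such as thrust::copy_if or cub::DeviceSelect::Flagged perform the same selection. Whether this repo uses them is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side sketch of stream compaction for a wavefront path tracer.
// alive[i] != 0 means path i is still active after this bounce.
// Returns the indices of the surviving paths, packed densely and in their
// original order, so the next bounce can launch compacted.size() threads.
std::vector<int> compact_alive_paths(const std::vector<int>& alive) {
    // Step 1: exclusive prefix sum of the 0/1 flags = output slot per path.
    std::vector<int> slot(alive.size());
    int running = 0;
    for (std::size_t i = 0; i < alive.size(); ++i) {
        slot[i] = running;
        running += alive[i] != 0 ? 1 : 0;
    }
    // Step 2: scatter each surviving index into its slot. On the GPU, both
    // steps run in parallel (e.g. a block-level scan plus a global offset).
    std::vector<int> compacted(static_cast<std::size_t>(running));
    for (std::size_t i = 0; i < alive.size(); ++i) {
        if (alive[i] != 0) {
            compacted[static_cast<std::size_t>(slot[i])] = static_cast<int>(i);
        }
    }
    return compacted;
}
```

Keeping survivors in their original order is a design choice that makes the output deterministic; it does not by itself regroup paths by material or direction, which is the kind of coherence problem SER targets.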
I've tried a handful of tricks, but due to limited time I haven't documented any of them (including the statistical profiling and analysis), and I currently have only a (somewhat) vague idea of the DOs and DON'Ts. Emmm... I really want to summarize all of them in November, after landing a good job. So wish me luck.
- Divergence control part I (loop 'pre-converge')
- Divergence control part II: megakernel or wavefront?
- Stream compaction for WFPT, and Shader Execution Reordering (SER) on the Ada Lovelace architecture (NVIDIA RTX 40-series GPUs). (This needs more in-depth reading, since NVIDIA said almost nothing important in their SER white paper.)
- Coalesced access: SoA in WFPT, and the LG-throttle stall problem with AoS
- Local memory: dynamic indexing considered harmful
- Dynamic polymorphism: a GPU-based variant, or device-side inheritance (virtual functions and their pointers)?
- Avoiding bank conflicts & using vectorized loads/stores
- IMC miss (constant cache miss): when should you use the constant cache?
- CPU multi-threading and GPU stream-based concurrency (maybe Hyper-Q).
- (Needs more in-depth reading) What makes a good GPU-based spatial partitioning data structure (like a BVH)? Well, I am no expert in this, and I should read more papers on the topic.
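The SoA-versus-AoS item can be made concrete by counting memory sectors. The sketch below assumes a hypothetical 40-byte path state (PathStateAoS), 32-byte memory sectors, a 32-byte aligned base address, and a warp in which thread i reads the ox field of path i; it computes how many distinct sectors one warp-wide load touches under each layout. It is an illustration of the access pattern, not a measurement from this repo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-path state stored as an array of structures (AoS).
struct PathStateAoS {
    float ox, oy, oz;  // ray origin
    float dx, dy, dz;  // ray direction
    float tr, tg, tb;  // path throughput (RGB)
    int   pixel;       // target pixel
};

// The same state as a structure of arrays (SoA): one array per field.
struct PathStateSoA {
    std::vector<float> ox, oy, oz, dx, dy, dz, tr, tg, tb;
    std::vector<int>   pixel;
};

// Byte offset (from a 32-byte aligned base) read by thread `tid` when every
// thread of a warp loads the `ox` field of its own path.
std::size_t aos_offset_of_ox(std::size_t tid) {
    return tid * sizeof(PathStateAoS) + offsetof(PathStateAoS, ox);
}
std::size_t soa_offset_of_ox(std::size_t tid) {
    return tid * sizeof(float);  // consecutive floats in the `ox` array
}

// Counts the distinct 32-byte sectors touched by one 32-thread warp load.
// Offsets grow monotonically with tid, so comparing against the previous
// sector is enough to deduplicate.
template <typename OffsetFn>
std::size_t sectors_per_warp_load(OffsetFn offset) {
    const std::size_t kWarpSize = 32, kSectorBytes = 32;
    std::size_t count = 0, last = 0;
    for (std::size_t t = 0; t < kWarpSize; ++t) {
        const std::size_t s = offset(t) / kSectorBytes;
        if (t == 0 || s != last) { ++count; last = s; }
    }
    return count;
}
```

With the AoS layout every thread lands in its own sector (32 sectors, most of each one unused), while the SoA layout needs only 4. Issuing many more memory transactions per warp is one plausible route to the LG-throttle stalls mentioned in the list, though no profiling data is given here.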
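For the polymorphism question, a GPU "variant" typically reduces to a tagged union dispatched with a switch, which avoids vtable pointers entirely. That matters because, in CUDA, invoking a virtual function in device code on an object created in host code is undefined behavior, so the inheritance route requires constructing the objects on the device. The sketch below is a minimal host-compilable version with hypothetical types (Lambertian, Mirror, BSDFVariant); it is not this repo's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical material alternatives (toy parameters only).
struct Lambertian { float albedo; };
struct Mirror     { float reflectance; };

// Tagged union: a type tag plus storage for exactly one alternative.
// Dispatch is a switch on the tag, so no vtable pointer is involved.
struct BSDFVariant {
    enum class Tag : std::uint8_t { Lambertian, Mirror };
    Tag tag;
    union {
        Lambertian lambertian;
        Mirror     mirror;
    };
};

BSDFVariant make_lambertian(float albedo) {
    BSDFVariant b;
    b.tag = BSDFVariant::Tag::Lambertian;
    b.lambertian = Lambertian{albedo};
    return b;
}

BSDFVariant make_mirror(float reflectance) {
    BSDFVariant b;
    b.tag = BSDFVariant::Tag::Mirror;
    b.mirror = Mirror{reflectance};
    return b;
}

// Returns a toy reflectance factor for whichever alternative is stored.
float eval_reflectance(const BSDFVariant& b) {
    switch (b.tag) {
        case BSDFVariant::Tag::Lambertian: return b.lambertian.albedo;
        case BSDFVariant::Tag::Mirror:     return b.mirror.reflectance;
    }
    return 0.f;  // not reached for valid tags
}
```

Because every alternative lives in the same fixed-size, trivially copyable struct, an array of BSDFVariant can be copied to the device as plain bytes. The trade-off is that the struct is as large as its largest alternative, and the switch still diverges when neighboring threads hit different materials.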
- imgui has no CMakeLists.txt, so we have to write one ourselves.
- Using GLEW on Windows is painful: after compilation, glew32.dll has to be copied manually to Windows/System32, and GLEW itself also has to be built manually.
This repo originated from w3ntao/smallpt-megakernel, but it is now very different. I answered his question on Computer Graphics Stack Exchange and tweaked his code, so I thought to myself... why not build on that repo and try to make it better (though I won't call it small-pt, since it definitely won't be small after I heavily optimize the code).