Caveat: the overheads of Python/PyTorch can be substantial. For example, the bundled mlp_learning_an_image example runs ~2x slower through PyTorch than through native CUDA. (This is still faster than implementing everything from scratch in Python, but it is something to be aware of.)
Major Changes
Added a GPU memory arena that permits efficient, stream-ordered allocation and de-allocation of temporary buffers. This removes the need for pre-allocation and often results in ~3x lower memory consumption.
The memory arena uses the GPU's virtual memory mapper to achieve this performance without invalidating pointers or shuffling memory around.
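The release notes don't detail the arena's implementation, but the mechanism they allude to — reserving a large virtual address range once and mapping physical memory into it on demand, so existing pointers stay valid as the arena grows — can be sketched with CUDA's virtual memory management API. The snippet below is a minimal, self-contained illustration of that general pattern, not tiny-cuda-nn's actual code; the 1 GiB reservation size and all names are illustrative.

```cpp
// Illustrative sketch only: arena-style growth via CUDA's virtual memory API.
// Reserve a large virtual range up front, then commit and map physical memory
// on demand; the base pointer never changes as more memory is mapped.
#include <cuda.h>
#include <cstdio>

#define CU_CHECK(x)                                                              \
    do {                                                                         \
        CUresult res_ = (x);                                                     \
        if (res_ != CUDA_SUCCESS) {                                              \
            const char* msg;                                                     \
            cuGetErrorString(res_, &msg);                                        \
            std::printf("CUDA error: %s (%s:%d)\n", msg, __FILE__, __LINE__);    \
            return 1;                                                            \
        }                                                                        \
    } while (0)

int main() {
    CU_CHECK(cuInit(0));
    CUdevice dev;
    CU_CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CU_CHECK(cuCtxCreate(&ctx, 0, dev));

    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;

    size_t granularity;
    CU_CHECK(cuMemGetAllocationGranularity(&granularity, &prop, CU_MEM_ALLOC_GRANULARITY_MINIMUM));

    // Reserve a large virtual address range (1 GiB here); no physical memory is committed yet.
    const size_t reserved_size = 1ull << 30;
    CUdeviceptr base;
    CU_CHECK(cuMemAddressReserve(&base, reserved_size, 0, 0, 0));

    // Commit one granule of physical memory and map it at the start of the range.
    // Growing later means mapping further granules; `base` stays valid throughout.
    CUmemGenericAllocationHandle handle;
    CU_CHECK(cuMemCreate(&handle, granularity, &prop, 0));
    CU_CHECK(cuMemMap(base, granularity, 0, handle, 0));

    CUmemAccessDesc access = {};
    access.location = prop.location;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    CU_CHECK(cuMemSetAccess(base, granularity, &access, 1));

    std::printf("arena base %p: %zu of %zu bytes mapped\n", (void*)base, granularity, reserved_size);

    // Teardown in reverse order.
    CU_CHECK(cuMemUnmap(base, granularity));
    CU_CHECK(cuMemRelease(handle));
    CU_CHECK(cuMemAddressFree(base, reserved_size));
    CU_CHECK(cuCtxDestroy(ctx));
    return 0;
}
```

Compile against the CUDA driver library (e.g. `nvcc sketch.cu -lcuda`); the virtual memory management API requires a driver and GPU that support it.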
All neural networks in tiny-cuda-nn now additionally support a row-major input memory layout. This affords higher performance and lower memory usage in cases where a transposition would otherwise be required (illustrated below).
GridEncoding naturally outputs row-major data and is thus sped up by ~20% when followed by a neural network.
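To make the layout distinction concrete, here is a small, generic host-side illustration (not tiny-cuda-nn's API) of the same B x D batch stored row-major versus column-major, and the extra transposition pass needed when producer and consumer disagree on layout — the pass that direct row-major support avoids. The sizes and names are made up for the example.

```cpp
// Generic illustration: row-major vs. column-major storage of a B x D input batch.
#include <cstdio>
#include <vector>

int main() {
    const int B = 4; // batch size (number of input elements)
    const int D = 3; // dimensions per element

    // Row-major: the D values of one element are contiguous.
    std::vector<float> row_major(B * D);
    for (int b = 0; b < B; ++b)
        for (int d = 0; d < D; ++d)
            row_major[b * D + d] = static_cast<float>(b * 10 + d);

    // If the consumer expects column-major data (the B values of one dimension
    // contiguous), an extra pass and an extra buffer are required -- this is the
    // transposition that accepting row-major inputs directly avoids.
    std::vector<float> col_major(B * D);
    for (int b = 0; b < B; ++b)
        for (int d = 0; d < D; ++d)
            col_major[d * B + b] = row_major[b * D + d];

    std::printf("element 2, dim 1: row-major %.0f, column-major %.0f\n",
                row_major[2 * D + 1], col_major[1 * B + 2]);
    return 0;
}
```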
tiny-cuda-nn now runs on older GPUs, down to compute capability 3.7.
Minor Changes
Sped up the input gradient computation of GridEncoding by ~3x.
Sped up SyncedMultiStream.
Fixed incorrect gradients of SphericalHarmonicsEncoding.
Fixed incorrect gradients of GridEncoding when max_level arguments were provided or Interpolation::Nearest was used.