
Version 1.4

@Tom94 Tom94 released this 14 Feb 14:53
· 343 commits to master since this release

Changes Since Last Release

Major Changes

  • Added a PyTorch extension for using tiny-cuda-nn from within Python.
    • This functionality is in a beta state; please report any issues you come across!
    • See this section of the README for installation and usage instructions.
    • Caveat: the overheads of Python/PyTorch can be extensive. For example, the bundled mlp_learning_an_image example is ~2x slower through PyTorch than native CUDA. (This is still faster than implementing everything from scratch in Python, but something to be aware of.)
  • Significantly reduced memory usage (sometimes 3x lower)
    • Added a GPU memory arena that permits efficient, stream-ordered allocation and de-allocation of temporary buffers. This circumvents the need for pre-allocation and often results in 3x lower memory consumption.
    • The memory arena uses the GPU's virtual memory mapper to get its performance without invalidating pointers or shuffling memory around.
  • All neural networks in tiny-cuda-nn now additionally support a row-major input memory layout. This affords higher performance and lower memory usage in cases where a transposition would otherwise be required.
    • GridEncoding naturally outputs row-major data and is thus sped up by ~20% when followed by a neural network.
  • tiny-cuda-nn now runs on older GPUs, down to compute capability 3.7.
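
The new PyTorch extension is used roughly as follows. This is a hedged sketch: the module name `tinycudann`, the `NetworkWithInputEncoding` constructor, and the config keys follow the README's example, while the hyperparameter values are illustrative rather than shipped defaults.

```python
import json

# Illustrative encoding/network configuration for the PyTorch extension.
# The keys mirror tiny-cuda-nn's JSON config format; the values are
# example hyperparameters, not defaults.
config = {
    "encoding": {
        "otype": "HashGrid",
        "n_levels": 16,
        "n_features_per_level": 2,
        "log2_hashmap_size": 19,
        "base_resolution": 16,
    },
    "network": {
        "otype": "FullyFusedMLP",
        "activation": "ReLU",
        "output_activation": "None",
        "n_neurons": 64,
        "n_hidden_layers": 2,
    },
}

# Intended call pattern (requires a CUDA GPU and the compiled extension;
# see the README for installation):
#
#   import torch
#   import tinycudann as tcnn
#   model = tcnn.NetworkWithInputEncoding(
#       n_input_dims=2, n_output_dims=3,
#       encoding_config=config["encoding"],
#       network_config=config["network"])
#   y = model(torch.rand(1024, 2, device="cuda"))  # shape (1024, 3)
#   y.sum().backward()

print(json.dumps(config, indent=2))
```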
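
The stream-ordered memory arena can be illustrated with a deliberately simplified, CPU-only sketch (plain Python; this is not tiny-cuda-nn's implementation, which additionally relies on the GPU's virtual memory mapper to grow the arena without invalidating pointers):

```python
class ArenaSketch:
    """Toy bump allocator with block reuse, mimicking the arena's semantics.

    Allocations return stable offsets into one reserved region, so "growing"
    never moves live data -- mirroring how the real arena backs a fixed
    virtual address range with physical memory on demand.
    """

    def __init__(self):
        self.top = 0    # high-water mark of the reserved region
        self.free = []  # (offset, size) blocks returned to the arena

    def allocate(self, size):
        # Reuse a previously freed block when one fits (first fit);
        # otherwise bump the top pointer. No pre-allocation is needed.
        for i, (off, sz) in enumerate(self.free):
            if sz >= size:
                del self.free[i]
                return off
        off = self.top
        self.top += size
        return off

    def deallocate(self, offset, size):
        # In the real arena this is stream-ordered: the block only becomes
        # reusable once prior work enqueued on the stream has completed.
        self.free.append((offset, size))

arena = ArenaSketch()
a = arena.allocate(256)
arena.deallocate(a, 256)
b = arena.allocate(128)   # reuses the freed block instead of growing
assert b == a
assert arena.top == 256   # the arena never grew past the first allocation
```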
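
The row-major change amounts to a different linearization of the batch matrix. A minimal sketch of the index arithmetic (plain Python, illustrative only; not tiny-cuda-nn code):

```python
def row_major_index(row, col, n_cols):
    # Element (row, col) when each row (e.g. one input vector) is contiguous.
    return row * n_cols + col

def col_major_index(row, col, n_rows):
    # Element (row, col) when each column is contiguous instead.
    return col * n_rows + row

# With row-major inputs, sample i's features occupy the contiguous range
# [i * n_features, (i + 1) * n_features), which is what a per-sample encoding
# such as GridEncoding naturally writes -- so no transposition pass is needed
# before the network consumes the data.
n_features = 3
start = row_major_index(1, 0, n_features)
end = row_major_index(1, n_features - 1, n_features)
assert (start, end) == (3, 5)
```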

Minor Changes

  • Sped up the input gradient computation of GridEncoding by ~3x.
  • Sped up SyncedMultiStream.
  • Fixed incorrect gradients of SphericalHarmonicsEncoding.
  • Fixed incorrect gradients of GridEncoding when max_level arguments were provided or Interpolation::Nearest was used.