Version 1.6

Tom94 released this 15 Dec 14:59 · 8e6e242

Given how many improvements have accumulated since April, and how long tiny-cuda-nn's current state has been stable, I think it's about time for another release.

Changes Since Last Release

  • Multi-GPU support: tiny-cuda-nn can now run on multiple GPUs simultaneously. It is the user's responsibility to ensure that parameters, inputs, outputs, and streams reside on the currently active CUDA device; a minimal per-device setup is sketched after this list.
    • PyTorch multi-GPU operation works out of the box.
  • CMake improvements: When using tiny-cuda-nn as a CMake submodule, its include folders and libraries are now tracked as part of its PUBLIC interface. This means the following two lines of CMake are sufficient for a parent project to be able to use tiny-cuda-nn in its CUDA code:
    add_subdirectory(dependencies/tiny-cuda-nn)
    target_link_libraries(<parent project> PUBLIC tiny-cuda-nn)
  • Assorted functionality upgrades:
    • AdamOptimizer can now perform weight clipping.
    • A new CompositeOptimizer has been added (courtesy of @Solonets). It can optimize different parts of the model (such as the encoding and the neural network) with different optimizers, e.g. with different learning rates; both features appear in the optimizer sketch after this list.
    • CompositeEncoding can now perform sum or product reduction over its nested encodings; see the encoding sketch after this list.
    • Alignment of Encoding's input and output matrices has been simplified and should work automatically in all cases now.
    • Many situations that used to cause undefined behavior are now checked and throw descriptive exceptions.
    • Parameter initialization (model->initialize_params(...)) and parameter setting (model->set_params(...)) have been decoupled. Calling set_params is now required before a model can be used. Calling initialize_params no longer influences the model's parameters; it merely returns a set of parameters that serves as a good initial state for training.
    • Snapshots are now compatible across CutlassMLP and FullyFusedMLP, as well as across float and __half precision, which means a snapshot generated on any GPU can be loaded by any other GPU.
    • The hash function of GridEncoding can now be configured; see the grid config sketch after this list.
  • Countless bug fixes and performance improvements.
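
To illustrate the multi-GPU item above, here is a minimal sketch based on the C++ API shown in the README (tcnn::create_from_config and a config JSON). The header name and the toy config values are assumptions; the point is simply that every per-GPU object is created, and every kernel is launched, while the matching CUDA device is active.

    #include <tiny-cuda-nn/config.h> // assumed header for tcnn::create_from_config

    #include <cuda_runtime.h>

    #include <vector>

    int main() {
        // Toy model configuration; see DOCUMENTATION.md for the full set of options.
        nlohmann::json config = {
            {"loss", {{"otype", "L2"}}},
            {"optimizer", {{"otype", "Adam"}, {"learning_rate", 1e-3}}},
            {"encoding", {{"otype", "HashGrid"}}},
            {"network", {{"otype", "FullyFusedMLP"}, {"n_neurons", 64}, {"n_hidden_layers", 2}}},
        };

        int n_devices = 0;
        cudaGetDeviceCount(&n_devices);

        std::vector<tcnn::TrainableModel> models;
        std::vector<cudaStream_t> streams(n_devices);

        for (int i = 0; i < n_devices; ++i) {
            // Everything belonging to GPU i (parameters, its stream, and, not shown,
            // its input/output matrices) is created while GPU i is the active device.
            cudaSetDevice(i);
            cudaStreamCreate(&streams[i]);
            models.push_back(tcnn::create_from_config(3 /*input dims*/, 1 /*output dims*/, config));
        }

        // Training and inference then follow the single-GPU README sample per device:
        // cudaSetDevice(i) must again be active whenever work for model i is launched
        // on streams[i], e.g. via models[i].trainer->training_step(...).
    }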
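
Next, a hedged configuration sketch for the CompositeOptimizer together with the new Adam weight clipping. The "n_params_to_optimize" and "weight_clipping_magnitude" keys are assumptions modeled on tiny-cuda-nn's config conventions, not confirmed names; DOCUMENTATION.md has the authoritative spelling.

    #include <nlohmann/json.hpp>

    #include <cstdint>

    // Hedged sketch: one optimizer per model component, e.g. a larger learning rate
    // for the encoding than for the neural network.
    nlohmann::json make_composite_optimizer_config(uint32_t n_encoding_params) {
        nlohmann::json encoding_optimizer = {
            {"otype", "Adam"},
            {"learning_rate", 1e-2},
            {"n_params_to_optimize", n_encoding_params}, // assumed key: how many leading parameters this optimizer owns
        };

        nlohmann::json network_optimizer = {
            {"otype", "Adam"},
            {"learning_rate", 1e-3},
            {"weight_clipping_magnitude", 1.0}, // assumed key for the new weight clipping
        };

        // Passed as the "optimizer" field of the config handed to tcnn::create_from_config.
        return {
            {"otype", "Composite"},
            {"nested", nlohmann::json::array({encoding_optimizer, network_optimizer})},
        };
    }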
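
A hedged sketch of a CompositeEncoding with sum reduction. The "reduction" values and the assumption that nested encodings must produce outputs of equal width for Sum/Product reduction are not confirmed here; consult DOCUMENTATION.md.

    #include <nlohmann/json.hpp>

    // Hedged sketch: sum, rather than concatenate, the outputs of two grid encodings
    // with matching output widths (8 levels x 2 features each).
    nlohmann::json make_summed_encoding_config() {
        return {
            {"otype", "Composite"},
            {"reduction", "Sum"}, // assumed alternatives: "Concatenation" (default), "Product"
            {"nested", nlohmann::json::array({
                {
                    {"otype", "Grid"},
                    {"type", "Dense"},
                    {"n_dims_to_encode", 3},
                    {"n_levels", 8},
                    {"n_features_per_level", 2},
                },
                {
                    {"otype", "Grid"},
                    {"type", "Hash"},
                    {"n_dims_to_encode", 3},
                    {"n_levels", 8},
                    {"n_features_per_level", 2},
                    {"log2_hashmap_size", 19},
                },
            })},
        };
    }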
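
Finally, a hedged sketch of configuring the GridEncoding hash function. The standard grid parameters are the documented ones; the "hash" key and its values are assumptions and may be spelled differently in your version.

    #include <nlohmann/json.hpp>

    // Hedged sketch: selecting the grid's hash function via the config.
    nlohmann::json make_grid_encoding_config() {
        return {
            {"otype", "Grid"},
            {"type", "Hash"},
            {"n_levels", 16},
            {"n_features_per_level", 2},
            {"log2_hashmap_size", 19},
            {"base_resolution", 16},
            {"per_level_scale", 2.0},
            {"hash", "CoherentPrime"}, // assumed key; assumed values include "Prime", "CoherentPrime", "ReversedPrime", "Rng"
        };
    }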