Releases: NVIDIA/DALI
DALI v1.16.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements:
- Added GPU non-silent region detection operator (#3944, #4001).
- Added experimental support for the eager execution of stateful operators and arithmetic operators (#4016, #3952, #3969, #3990).
- Added antialias flag to Resize operator for improved control over resampling mode used (#4032).
- Added experimental support for custom GPU Numba operators (#3891, #3998, #4006, #4013).
- Added support for processing video and handling of temporal arguments to color-manipulation operators and affine transform operators (#3937, #3946, #3917).
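As a rough illustration of what non-silent region detection (the first item above) computes, here is a minimal pure-Python sketch. It is not the DALI GPU kernel: the function name is hypothetical, the argument names cutoff_db, window_length and reference_power are assumed to mirror the operator's arguments, and the trailing-window power estimate is a simplifying assumption.

```python
def nonsilent_region(samples, cutoff_db=-60.0, window_length=4, reference_power=None):
    """Toy detector returning (begin, length) of the non-silent region.

    Hypothetical helper for illustration only; the windowing details are
    an assumption, not the library's implementation.
    """
    n = len(samples)
    if n == 0:
        return 0, 0

    # Short-term power: mean of squares over a trailing window.
    powers = []
    for i in range(n):
        window = samples[max(0, i - window_length + 1): i + 1]
        powers.append(sum(x * x for x in window) / len(window))

    # Without an explicit reference, use the loudest window as 0 dB.
    ref = reference_power if reference_power is not None else max(powers)
    if ref <= 0.0:
        return 0, 0  # completely silent input

    # A window counts as non-silent if 10*log10(power / ref) >= cutoff_db.
    threshold = ref * 10.0 ** (cutoff_db / 10.0)
    loud = [i for i, p in enumerate(powers) if p >= threshold]
    if not loud:
        return 0, 0
    begin = loud[0]
    return begin, loud[-1] - begin + 1


samples = [0.0] * 5 + [0.5, -0.5, 0.5] + [0.0] * 5
region = nonsilent_region(samples, window_length=1)
```

The returned pair can be fed to a slicing step to trim leading and trailing silence.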
Fixed Issues
The following issues were fixed in this release:
- Fixed DALI + PyTorch Lightning iterator issue resulting in subsequent epochs terminating too early (#3923, #4048).
- Fixed scalars handling by the readers.tfrecord operator (#4024).
- Fixed variable batch size handling by the crop and coord_transform operators (#4045, #3958).
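The iterator fix in the first item above (reset when iter() is called) can be pictured with a toy iterator. The class below is a hypothetical stand-in, not the DALI framework iterator; it only shows why resetting inside `__iter__` prevents a later epoch from ending early after a partly consumed one.

```python
class ToyEpochIterator:
    """Hypothetical stand-in for a framework iterator; not DALI's class."""

    def __init__(self, data):
        self._data = list(data)
        self._pos = 0

    def reset(self):
        # Rewind to the start of the epoch.
        self._pos = 0

    def __iter__(self):
        # Calling iter() starts a fresh epoch, so a partly consumed
        # iterator does not make the next epoch terminate too early.
        self.reset()
        return self

    def __next__(self):
        if self._pos >= len(self._data):
            raise StopIteration
        item = self._data[self._pos]
        self._pos += 1
        return item


it = ToyEpochIterator([1, 2, 3])
next(iter(it))            # consume one batch, then abandon the epoch
second_epoch = list(it)   # list() calls iter(), which resets first
```

Without the reset in `__iter__`, the second pass would start from the second element and yield a short epoch.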
Improvements
- Add little-endian and big-endian read functions for InputStreams (#4038)
- Add antialias flag to Resize (#4032)
- Reformat python files (#4026)
- Python formatting (#4035)
- Enable nose2 in Python Tests (#4033)
- Imgcodec module boilerplate (interfaces/placeholders/basic logic) (#4029)
- Remove deprecated option options.experimental_optimization.map_vectorization.enabled (#4027)
- Guided contribution tutorial (#4011)
- Fix python formatting (#3982)
- Add eager mode stateful operators (#4016)
- Disable Numba GPU op for incompatible Numba versions (#4025)
- Add missing quote marks to the DALI_AFFINITY_MASK usage example (#4020)
- Add abstract InputStream. Refactor existing FileStreams to use it. (#4019)
- Make DALI iterator call reset() when iter() is called upon it (#3923)
- Add eager mode operators coverage test (#3952)
- Add ack for Numba GPU op (#3998)
- Add eager mode arithm ops (#3969)
- Reduce DALI conda package installation time (#3995)
- Add Non-silent region GPU operator (#3944)
- Workaround for nosetests in Python 3.10 (#3986)
- Numba cuda operator (#3891)
- Fix Python formatting (#3992)
- Fix Python formatting (#3988)
- Add examples of processing video that utilize per-frame operator (#3917)
- Per frame affine transforms (#3946)
- Handle partially pruned multi-output external sources (#3975)
- Dependencies update (#3979)
- Doxygen typo (#3989)
- Add per frame parameters support to brightness_contrast and color_twist families (#3937)
- Fix missing return (#3985)
- Support vector alike output for OpSpec::TryGetRepeatedArgument (#3851)
- Fix Python formatting (#3962)
- Fix and reenable optimized Cast kernel (#3976)
Bug Fixes
- Fix lack of reset when iter() is called on the DALI framework iterator (#4048)
- Use actual batch size instead of max batch size in crop_attr.h (#4045)
- Support scalars in readers.tfrecord (#4024)
- Add const char* ctor to ThreadPool (#4005)
- Remove unconditional float16 type mapping in Numba GPU op (#4013)
- Change flake8 config (#4004)
- Fix Numba CI issues (#4006)
- Fix and simplify moving mean squares CPU kernel. (#4001)
- Fix nan check and unused external source arguments in debug mode (#3990)
- Fix fn.coord_transform handling of a default matrix in variable batch case (#3958)
- Fix test_dali_tf_dataset_mnist_eager test (#3991)
- Fix test_dali_tf_dataset_mnist_eager.py and test_dali_tf_dataset_mnist_graph.py tests (#3987)
- Improve handling of "dtype" arguments in OpSchema/OpSpec (#3981)
Breaking API changes
- The shape of scalars read by the readers.tfrecord operator is now () instead of (1,).
- For cubic and linear interpolation modes, the resize operator applies the antialiasing filter by default now. The antialiasing can be turned off with the antialias flag.
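Code that consumes tfrecord scalars may need to tolerate both the old (1,) shape and the new () shape while moving between versions. A minimal sketch, assuming the consumer sees a flat list of values plus a shape tuple; both helper names are hypothetical:

```python
def is_scalar_shape(shape):
    """True for the new scalar shape () and the legacy shape (1,)."""
    return tuple(shape) in ((), (1,))


def scalar_value(data, shape):
    """Return the single value from flat `data`, whichever shape was reported.

    `data` and `shape` are hypothetical inputs used only for illustration.
    """
    if not is_scalar_shape(shape):
        raise ValueError(f"expected a scalar, got shape {tuple(shape)}")
    return data[0]


new_style = scalar_value([7], ())      # shape reported by this release
old_style = scalar_value([7], (1,))    # shape reported by earlier releases
```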
Deprecated features
- The triangular interpolation for resize operator has been deprecated as it is equivalent to linear interpolation with antialiasing on.
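The equivalence noted above can be seen in a toy 1-D resampler: a triangular kernel whose support is widened by the downscale factor is linear interpolation with antialiasing, while the same kernel at fixed width is plain linear interpolation. This is a sketch under simplifying assumptions (clamped borders, hypothetical function name), not DALI's resampling kernel.

```python
import math


def resample_linear_1d(signal, out_size, antialias=True):
    """Toy 1-D resize with a triangular (linear) kernel.

    With antialias=True the kernel radius grows with the downscale factor;
    with antialias=False it always spans the two nearest input samples.
    """
    in_size = len(signal)
    scale = in_size / out_size
    radius = max(scale, 1.0) if antialias else 1.0
    out = []
    for i in range(out_size):
        # Centre of output sample i expressed in input coordinates.
        center = (i + 0.5) * scale - 0.5
        lo = int(math.floor(center - radius))
        hi = int(math.ceil(center + radius))
        total = 0.0
        weight_sum = 0.0
        for j in range(lo, hi + 1):
            w = max(0.0, 1.0 - abs(j - center) / radius)
            if w == 0.0:
                continue
            jj = min(max(j, 0), in_size - 1)  # clamp at the borders
            total += w * signal[jj]
            weight_sum += w
        out.append(total / weight_sum)
    return out


signal = [1.0, 0.0, 0.0] * 4
plain = resample_linear_1d(signal, 4, antialias=False)
smooth = resample_linear_1d(signal, 4, antialias=True)
```

For this input the non-antialiased pass lands only on zero-valued samples, whereas the interior antialiased outputs come out close to the local mean of 1/3.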
Known issues:
- The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
If the key frames occur at a frequency that is less than 10-15 frames, the returned frames might be out of sync.
- Experimental VideoReaderDecoder does not support open GOP.
It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- In experimental debug and eager modes, GPU external source is not properly synchronized with DALI internal streams. As a workaround, the user may manually synchronize the device before returning the data from the callback.
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.16.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.16.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.16.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.16.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.16.0-5323000-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.16.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.16.0-5322998-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.16.0-5322998-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.16.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.15.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements:
- Added the GPU audio resampling operator (#3884, #3914 and #3911).
- Improved the performance of the GPU fn.readers.numpy by custom GDS staging (#3894, #3905).
- Added support for video processing and per-frame (temporal) arguments to the warp_affine operator (#3879, #3900).
- Added HEVC support to the GPU frames decoder (#3896).
- Added experimental support for the eager execution of stateless operators as Python functions and readers as iterators (#3887, #3930).
- Added CUDA 11.7 support (#3906).
- Profiling improvements:
Fixed Issues
The following issues were fixed in this release:
- Added the missing device/device synchronization when copying pipeline outputs with copy_to_external (#3953).
- Fixed the buffer synchronization between default and custom stream in a multi-GPU case (#3957).
Improvements
- Fix Python formatting (#3961)
- Fix coverity issues (#3974)
- Add FindReduceGPU and FindRegionGPU kernels (#3951)
- Fix Python formatting (#3965)
- Add .style.yapf file (#3970)
- Update Optical Flow example (#3971)
- Fix per frame pass through (#3959)
- Fixing Python code formatting (#3948)
- Suppress the use of a staging buffer for nvJPEG input if it's already pinned. (#3956)
- Fix cyclic dependency import problem in fn.py in python 3.6 (#3955)
- Refactor qa test scripts (#3933)
- Change thread pool creation for eager operators to lazy (#3931)
- Fix sequence shape test (#3949)
- Expose readers as iterators in eager mode (#3930)
- Add Python linter (#3929)
- Remove redundant quote marks from the protobuf version specifier (#3945)
- Skip GDS tests when the GPU is incompatible. (#3941)
- Add sequence processing to warp operator (#3879)
- Add MovingMeanSquareGpu kernel (#3922)
- Pin protobuf to <4 for Paddle Paddle (#3940)
- Update compilation flags for the DALI TensorFlow plugin (#3943)
- Change MultiDevice to MultiGpu test suffix (#3942)
- Bump up the nvidia-tensorflow version to 20.05 in tests (#3938)
- Add FindFirstLastGPU kernel (#3932)
- Adjust PR template to ask for listing existing tests that apply (#3939)
- Pin protobuf to <4 (#3934)
- Add VFR detection (#3921)
- Fix CVE-2022-0562 in libtiff (#3925)
- Update RNN-T pipeline tests to include GPU resampling and silence detection (#3920)
- Add more NVTX ranges to the executor (#3928)
- Add HEVC support for FramesDecoderGpu (#3896)
- Add a thread name to all DALI threads (#3912)
- Add dataclasses pip package to tests deps to fix Python3.6 operator tests (#3926)
- Add fn.experimental.audio_resample GPU (#3911)
- Custom staging for GDS (#3894)
- Update the readme roadmap link to use 2022 one (#3918)
- Support specifying per-frame positional arguments in sequence processing test utility (#3901)
- Move audio resampler CPU implementation to a single compilation unit (#3914)
- Add stateless CPU eager operators (#3887)
- Add CUDA 11.7 support (#3906)
- Add VideoReaderDecoder test for missing labels (#3908)
- Add signal resampling GPU kernel (#3884)
- Optimize parameter passing for ScatterGather GPU (#3905)
- Add references to ops documentation in the tutorials (#3904)
- Enable per-frame operator on GPU (#3900)
Bug Fixes
- Fix dltensor operator tests (#3984)
- Prevent clobbering of outputs before non-blocking copy_to_external finishes. (#3953)
- Fix a bug in AccessOrder when synchronizing with a default stream on the same device, which is not the current device. (#3957)
- Workaround GDS memory leak in GDSMem tests. (#3936)
- Fix circular imports in eager mode (#3919)
- Remove intermediate Tensor and use DynamicScratchpad for op tile descriptors. (#3915)
- Add missing moving of order in TensorVector's move assignment/constructor (#3899)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
If the key frames occur at a frequency that is less than 10-15 frames, the returned frames might be out of sync.
- Experimental VideoReaderDecoder does not support open GOP.
It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.15.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.15.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.15.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.15.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.15.0-5080387-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.15.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.15.0-5080390-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.15.0-5080390-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.15.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.14.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- Added HEVC support to the CPU frames decoder (#3885).
- Added the CPU audio resampling operator (#3840).
- Added support for video processing and per-frame (temporal) arguments to the rotate operator (#3820).
- Added support for variable batch size in the debug mode (#3799).
- Performance optimizations:
Fixed Issues
- Fixed the compatibility with TensorFlow 2.9 by adding type propagation to DALIDataset (#3875).
- Added a missing check that the number of files and labels match in the experimental video reader (#3903).
- Added a missing check that the number of samples is greater than or equal to the number of shards in readers (#3856).
- Fixed scalars handling in the GPU cast operator (#3924).
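The samples-versus-shards check mentioned above guards against shards that would otherwise be empty. A minimal sketch, assuming a floor-based split of the dataset across shards; the helper name and the exact partitioning formula are assumptions for illustration, not taken from the reader implementation:

```python
def shard_range(num_samples, shard_id, num_shards):
    """Return the half-open [start, end) sample range for one shard.

    Hypothetical helper; the floor-based split is an assumption.
    """
    if num_shards <= 0 or not 0 <= shard_id < num_shards:
        raise ValueError("invalid shard configuration")
    # The kind of check the fix adds: every shard must get at least one sample.
    if num_samples < num_shards:
        raise ValueError(
            f"number of samples ({num_samples}) is smaller than "
            f"the number of shards ({num_shards})"
        )
    start = num_samples * shard_id // num_shards
    end = num_samples * (shard_id + 1) // num_shards
    return start, end


ranges = [shard_range(10, s, 3) for s in range(3)]
```

With 10 samples and 3 shards the ranges tile the dataset without gaps or overlap; with fewer samples than shards the helper raises instead of silently producing an empty shard.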
Improvements
- Add support for TensorFlow 2.9. (#3909)
- Remove deprecated usage of numpy types int and long (#3898)
- Add output_dtype and output_ndim arguments to Pipeline constructor (#3877)
- Add hevc support cpu frames decoder (#3885)
- Add a C API call to get the max batch size (#3890)
- Add bool to Pad supported types (#3895)
- Adjust eps in test comparing readers (#3892)
- Fix coverity issues. Do not re-throw worker thread error in the destructor. (#3886)
- Fix memory leak in C API test (#3889)
- Add tutorials references to ops docs - general section (#3869)
- Refactor video tests (#3864)
- Add NonsilentRegion GPU, implemented in terms of the CPU version (#3874)
- Add a check of the decoding progress in the VideoReader (#3858)
- Reduce libaviutils log verbosity to errors and above (#3871)
- Extend C Api to fetch the layout and ndim from External Source (#3862)
- Updated PyTorch-Lightning example with new strategy keyword for Trainer. (#3867)
- Update clang version to 14.02 (#3863)
- Improve cast operator performance (#3783)
- Update CUTLASS to v2.9.0 (#3860)
- Change the way how CUDA pub key is installed (#3866)
- Audio resampling operator for CPU backend (#3840)
- Dependencies update (#3831)
- Optimization of tiled transposition algorithm on small data types (#3730)
- Improve CropMirrorNormalize operator performance (#3771)
- Fix typo (model -> module) (#3848)
- Add a check against changing layout in ES (#3839)
- Add cpu only and variable batch size tests to per-frame operator (#3850)
- Missing f prefix on f-strings fix #3847
- Fix handling of arguments with trailing newlines when generating operator docs (#3841)
- Add support for sequence processing to rotate (#3820)
- Fix TF DALIDataset tests that changed layout between iterations (#3836)
- Add ndim argument to the external source operator (#3755)
- Add operators cross-referencing to data loading index (#3823)
- Features required for autoserialization in DALI Backend (#3795)
- Remove gtest RandomBBoxCropTest tests (#3822)
- Update user documentation footer copyright date (#3819)
- Add operator cross-referencing to custom operators tutorials (#3818)
- Fix the default value of resize min_filter in the documentation (#3816)
- Benchmark for Transpose operator (#3785)
- Add operator cross-referencing to data loading section (#3809)
- Update [shields.io](http://shields.io/) badges in README.rst. (#3815)
- Add operator cross-referencing to audio processing tutorials (#3806)
- Add operator cross-referencing to video processing tutorials (#3808)
- Add support for variable batch size and NVTX ranges in debug mode (#3799)
- Shutdown() a WorkerThread in the destructor (#3810)
- Improve the redirect (#3801)
Bug Fixes
- Add tests for operator cast. Revert to plain batched cast kernel until the optimized one is fixed. (#3927)
- Fix scalar handling in GPU cast. (#3924)
- Adds check to the experimental video reader if the number of files and labels match (#3903)
- Add type propagation implementation introduced in TF 2.8 (#3875)
- Fix corruption: Change bool to int when querying pointer attributes. (#3873)
- Make libtar and libsnd root paths customizable. (#3872)
- Add check if the number of samples is greater or equal to the number of shards in readers (#3856)
- Fix transposition kernel tests (#3859)
- Fix default argument handling in cuda_vm_resource constructor (#3857)
- Fixes test_coverage case in test_dali_cpu_only.py and test_dali_variable_batch_size.py (#3849)
- Fix rotate assertion warning (#3852)
- Make failure in curl to fail Dockerfile.build.aarch64-linux image build (#3821)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
If the key frames occur at a frequency that is less than 10-15 frames, the returned frames might be out of sync.
- Experimental VideoReaderDecoder does not support open GOP.
It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
- The DALI TensorFlow plug-in might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have the prebuilt plug-in binary that is shipped with DALI, ensure that the compiler that is used to build TensorFlow exists on the system during the plug-in installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when running in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.14.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.14.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.14.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.14.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.14.0-4921279-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.14.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.14.0-4921308-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.14.0-4921308-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.14.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.13.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- Added support for per-frame (temporal) arguments to the Gaussian Blur and Laplacian operators (#3715 and #3723).
- Optimized audio decoder resampling for ARM (#3745).
- Improved the debug (immediate execution) mode:
- Added support for GPU positional arguments in the Slice operator (#3741).
- Documentation improvements:
- Split the operator documentation into separate pages (#3794).
- Added a mechanism for cross-referencing examples and operators (#3748).
- Added an FAQ section to the DALI user guide (#3761).
- Added new GTC talks (#3757).
- Added shuffling and shards handling snippets to the parallel external source examples (#3744).
Fixed Issues
- Fixed the handling of samples that exceed 2GBs in the parallel external source (#3768).
Improvements
- Add per-frame operator (#3723)
- Add support for per-frame arguments to Gaussian Blur and Laplacian operators (#3715)
- Separate the documentation pages! (#3794)
- Update zlib to 1.2.12 version (#3787)
- Trim TL0_tensorflow_plugin and TL0_python-self-test-readers-decoders tests (#3796)
- Add _schema_name attribute in fn API (#3798)
- Add resize checkerboard tests, comparing to ONNX reference precomputed data (#3792)
- Update nvJPEG2000 to 0.5.0 version (#3791)
- Fix header in parallel external source notebook (#3790)
- Update documentation link to the '22 roadmap (#3786)
- Bump Nvidia TF1 version used in tests to 22.03 (#3769)
- Add mechanism for crossreferencing examples and operators (#3748)
- Add direct operator calls in debug mode (#3734)
- Make number of samples in batch signed (#3789)
- Add debug mode benchmark (#3762)
- Fix the cuBLAS version to one compatible with nvTF 22.01 (#3781)
- Apply changes from TV sample encapsulation in NVJPEG2K (#3780)
- Ensure sample encapsulation in Tensor Vector (#3701)
- Add a TL0 test that runs on more than 1 GPU (#3772)
- Add FAQ section to the DALI documentation (#3761)
- Remove the compose operator from the fn API table (#3767)
- Add new GTC talks. Update old link (#3757)
- Update to CUDA 11.6u2 (#3764)
- RNG to use pinned memory for kernel launch args (#3765)
- Revert "Pin webdataset version to the last compatible with python 3.6 (#3746)" (#3763)
- Fix the wrong patch for CVE-2022-0907 which by mistake duplicated CVE-2022-0909 (#3760)
- Quantize GDS chunk size to 1 MB. (#3759)
- Add GDS-compatible allocator with 4k alignment. (#3754)
- Update error messaging of nvJPEG (#3756)
- Allow GPU slice arguments (#3741)
- Add filename to the error message in the numpy reader (#3753)
- Fix libtiff vulnerabilities (#3752)
- Update parallel external source notebook and include shuffling example. (#3744)
- Add supported python version classifier to DALI TF plugin setup.py (#3751)
- Vectorize audio resampling for ARM NEON. (#3745)
- Remove prints from the regular DALI execution flow (#3740)
- Pin webdataset version to the last compatible with python 3.6 (#3746)
- Align test expectations with slice implementation rounding logic (#3738)
- Update RapidJSON (#3737)
- Regenerate getting started jupyter examples (#3732)
- Improve documentation for AccessOrder wait and set_order. (#3736)
Bug Fixes
- Add missing copying of pinned prop when sharing buffer (#3797)
- Disable PES large sample test on Xavier runner (#3788)
- Fix source device in PyTorch cross-device test. (#3775)
- Fix large mini-batch handling in parallel external source (#3768)
- Fix Yolo v4 example non-fatal teardown error (#3739)
- Rework Image Decoder example (#3731)
- Check return value of a CUDA function call. (#3733)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
If the key frames occur at a frequency that is less than 10-15 frames, the returned frames might be out of sync.
- The DALI TensorFlow plug-in might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have the prebuilt plug-in binary that is shipped with DALI, ensure that the compiler that is used to build TensorFlow exists on the system during the plug-in installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when running in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.13.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.13.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.13.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.13.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.13.0-4481322-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.13.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.13.0-4481327-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.13.0-4481327-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.13.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.12.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- Added support for the GPU-accelerated decoding of videos with a variable frame rate (experimental.readers.video) (#3668).
- Reduced the binary size (#3680 and #3682).
- Improved the TensorFlow plug-in installation even when none of the prebuilt binaries matches the exact TensorFlow version (#3720).
- Improved performance by increasing the usage of pinned memory in argument input buffers (#3728).
- Documentation improvements (#3722, #3684, and #3674).
Fixed Issues
- Fixed the TensorFlow plug-in issue that prevented it from working in the CPU-only mode (#3719).
Improvements
- [DALI TF] Try building from source when TF version doesn't match exactly. Add test step to installation script. (#3720)
- Add supported layouts to Crop, CropMirrorNormalize (#3722)
- Make output buffers for argument inputs to GPU operators pinned. (#3728)
- Bump up TensorFlow version used in tests (#3688)
- Fix coverity issues (#3679)
- Bump up CUDA to 11.6U1 (#3709)
- Add test to check if importing DALI doesn't break Torch process forking (#3669)
- Add non-owning SampleView (#3706)
- Use pinned buffers for kernel parameters and for ToContiguousGPU. (#3689)
- Update deps version for libtiff-CVE-2022-0561 fix (#3693)
- Update documentation regarding GDS being part of CUDA toolkit (#3684)
- Add VideoReaderDecoder GPU (#3668)
- Custom build: subset of file patterns for kernel and operators (#3672)
- Remove lineinfo from RelWithDebInfo DALI builds (#3680)
- Build DALI only for major arch versions (#3682)
- Remove mpiexec affinity binding in TensorFlow TL1 and TL3 RN50 test (#3681)
- Remove Scratchpad from KernelManager (#3678)
- Update dependencies (#3677)
- Use DynamicScratchpad in KernelManager. (#3670)
- Add info about fill_values being used by pad_output in crop_mirror_normalize (#3674)
Bug Fixes
- Fix CVE-2022-0626 in libtiff (#3727)
- Fix TensorFlow plugin operation without GPU (#3719)
- Synchronize at the end of BoxEncoder's constructor. (#3724)
- Fix ES debug mode test failing with missing batch (#3712)
- Add missing import nose.SkipTest in optical flow tests (#3707)
- Fix stream handling in video loader and nvdecoder. (#3705)
- Fix typos found in tensor_shape.h docs (#3695)
- Fix optical flow tests for Turing (#3685)
- Fix Slice's adaptive tiling for smaller output types (#3687)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.12.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.12.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.12.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.12.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.12.0-4144186-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.12.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.12.0-4144197-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.12.0-4144197-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.12.0.tar.gz
FFmpeg source code:
Libsndfile source code:
v1.11.1: Fix stream usage in C API (#3713)
Key Features and Enhancements
This is a patch release.
Fixed Issues
- Fixed wrong handling of input data by GPU external source in multi-GPU scenario
- Fixed wrong usage of streams in C API
Improvements
- None
Bug Fixes
- Fix multi-device GPU external source. (#3710)
- Fix constructing GPU Tensor from DLPack capsule (#3711)
- Fix stream usage in C API (#3713)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
- The experimental.readers.video operator causes a crash during the process teardown with driver versions 460 to 470.21
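The key-frame requirement in the known issues above can be checked before a video is handed to the loader. The following is a minimal sketch in plain Python, not part of DALI: `max_keyframe_gap` and `keyframes_dense_enough` are hypothetical helpers, the key-frame indices are assumed to come from an external inspection tool, the stream is assumed to start with a key frame, and the default threshold of 15 is taken from the upper end of the "10 to 15 frames" range stated above.

```python
def max_keyframe_gap(keyframe_indices, total_frames):
    """Return the largest distance, in frames, between consecutive key frames.

    Assumes the stream starts with a key frame. The end of the stream is
    treated as a boundary so that the last segment is also measured.
    """
    if not keyframe_indices:
        return total_frames
    boundaries = sorted(keyframe_indices) + [total_frames]
    return max(b - a for a, b in zip(boundaries, boundaries[1:]))


def keyframes_dense_enough(keyframe_indices, total_frames, max_gap=15):
    """True when no gap between key frames exceeds max_gap frames."""
    return max_keyframe_gap(keyframe_indices, total_frames) <= max_gap


# Hypothetical stream of 72 frames with one long stretch without key frames.
print(max_keyframe_gap([0, 12, 24, 60], total_frames=72))        # 36
print(keyframes_dense_enough([0, 12, 24, 60], total_frames=72))  # False
```

A stream that fails this check may need to be re-encoded with a shorter key-frame interval before the returned frames can be relied on to be in sync.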
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.11.1
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.11.1
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.11.1
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.11.1
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.11.1-4069476-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.11.1.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.11.1-4069477-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.11.1-4069477-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.11.1.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.11.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- Added the GPU laplacian operator (#3644, #3618).
- Updated the optical_flow operator to use the latest SDK capabilities (#3625).
- Extended the readers.webdataset operator to support pax POSIX.1-2001 tar format (#3645).
- Improved the performance of the slice operator (#3604, #3600).
- Improved the debug (immediate execution) mode:
Fixed Issues
- Fixed the incorrect construction of TensorList from a list of tensors (#3626).
- Fixed an issue in the CPU readers.video operator that prevented it from working in the CPU-only mode (#3660).
Improvements
- Improve checking if it is safe to fork the DALI process (#3671)
- Add debug mode tutorial notebook (#3648)
- Dynamic & stream-aware scratchpad (#3667)
- Use fn API in non-silent tests (#3666)
- Frames decoder gpu (#3615)
- Add Laplacian GPU operator (#3644)
- Update third party (#3632)
- Improve the documentation about CPU tensors and named arguments (#3655)
- Update docs for the parallel option in external source (#3654)
- Update optical flow operator to use the latest OF SDK capabilities (#3625)
- Remove deprecated usage of .dtype() method (#3650)
- Update pattern used to generate TFRecord idx files (#3653)
- Add one_hot benchmark (#3553)
- Add str and repr for Tensor, TensorList and DataNode[Debug] (#3647)
- Relax test tolerance in DisplacementTest/Sphere and Water (#3649)
- Update warp_affine test and docs (#3639)
- Remove unnecessary Dockerfile.cuda116.x86_64deps file (#3642)
- Updates FindNVJPEG.cmake (#3643)
- Add JPEG compression distortion to augmentation gallery (#3633)
- Use index slicing in geometric transformation notebook (#3635)
- Add support for tar pax POSIX.1-2001 WebDataset (#3645)
- Remove redundant tests (#3634)
- Add dtype member for TensorList and modify dtype for Tensor (#3628)
- Remove dependency between dali_test.bin and dali_operators lib (#3637)
- Add Laplacian GPU kernel (#3618)
- Updated PR template (#3619)
- Remove synchronization from deallocate. (#3497)
- ArgHelper tests to not depend on operators from dali_operators lib (#3631)
- Add dtype argument to ExternalSource in examples (#3611)
- Add CUDA 11.6 support (#3623)
- Make data objects stream-aware (#3536)
- Changing WDS Reader source_info property (#3614)
- Relax test tolerance in DisplacementTest/Sphere (#3621)
- Video tests utils and refactor (#3620)
- Debug mode direct ExternalSource (#3605)
- Remove Buffer inheritance from TensorList (#3576)
- Relax test tolerance in DisplacementTest/Water (#3616)
- Improve Slice's adaptive tiling (#3604)
- Explicitly coalesce stores in Slice for smaller output types (#3600)
- Add an upper bound for the video decoder workaround (#3609)
- Deterministic seeds in debug mode (#3589)
- Move from zlib to zlib-ng optimized fork (#3570)
- TensorList shape (#3591)
Bug Fixes
- Fix frames decoder destruction (#3662)
- Removes check of CUDA runtime and linked libs from the backend (#3664)
- Remove CUDA call from CUDAStreamPool's constructor (#3663)
- Fix librosa bugs after 0.9 release (#3665)
- Fix VideoReader CPU only variant (#3660)
- Add a separate initialization method to OpticalFlowAdapter (#3657)
- Fix get-pip.py for python 3.6 (#3652)
- Fix sphinx warnings in the docs (#3651)
- Fix synchronization bug in operator benchmark (#3638)
- Replace calls to exp2 with std::exp2f (#3646)
- Fix null_stream constant evaluation fallback (#3630)
- Fix CVE-2021-4156 in libsnd (#3624)
- Fix TensorList constructor from list of tensors. (#3626)
- Fix CVE-2022-22844 in libtiff (#3612)
- Fix dtype in external_source with multiple outputs. (#3608)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
- The experimental.readers.video operator causes a crash during the process teardown with driver versions 460 to 470.21
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.11.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.11.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.11.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.11.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.11.0-3985923-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.11.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.11.0-3985922-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.11.0-3985922-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.11.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.10.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- New operators:
- Color-based augmentations were extended to support video data (#3580).
- Improved performance of the slice operator (#3584, #3573, and #3568).
- Added an experimental debug (immediate execution) mode (#3586 and #3531).
Fixed Issues
No major issues were fixed in this release.
Improvements
- Adds video support to color based augmentations (#3580)
- Fixed cmake error (#3601)
- Fix debug build failures in benchmark code (#3585)
- Make sanitizers tests fail when it encounters the first issue (#3583)
- Use proper attribute filters for nosetests (#3592)
- Fix wrong parameter name in Laplacian docs (#3593)
- QA script fix: Add an empty negative branch to a conditional to prevent automatic error (#3588)
- Small refactoring in Slice GPU kernel (#3584)
- GetProperty operator CPU+GPU (#3572)
- Add comments about scale argument (#3581)
- Fix coverity issues (#3579)
- Check when using ES source and feed_input (#3574)
- Prototype of the debug mode (#3531)
- Enable tests for dynamically loaded cuda libraries (#3540)
- Add Laplacian operator [CPU] (#3563)
- Add CUDAStreamPool & CUDAStreamLease. (#3569)
- Coalesce stores in Slice for smaller output types (#3568)
- Turn off OpticalFlow test on aarch64 platform for driver r495.x and newer (#3566)
Bug Fixes
- Fixing typos in WDS's source_info (#3602)
- Fix handling of scalar argument in slice operator (#3596)
- Use the same device for debug mode test and baseline (#3594)
- Fix JPEG distortion GPU quality argument handling for sequences (#3590)
- Use current device in _as_gpu (#3586)
- Fix version_ge: command not found error in TL0_python-self-test-base-cuda (#3582)
- Disable coalescing values in Slice for CUDA 10 (#3573)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.10.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.10.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.10.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.10.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.10.0-3728184-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.10.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.10.0-3728186-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.10.0-3728186-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.10.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.9.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- Extended the jpeg_compression_distortion operator to support video inputs (#3482 and #3447).
- Added the file_filters argument to the readers.file operator that allows you to filter files by names (#3459).
- Extended the slice operator to support per-sample axes arguments and negative axis indexing (#3516).
- Extended the pad operator to support per-sample axes, fill_value arguments, and negative axis indexing (#3534).
- Improved the performance of the slice operator for small batch sizes (#3557).
- Added the Laplacian CPU kernel (#3565, #3535, and #3518).
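Negative axis indexing, as listed above for the slice and pad operators, counts dimensions from the back, so -1 refers to the last dimension. The following is a minimal sketch of that convention in plain Python. It is not DALI's implementation, and `normalize_axes` is a hypothetical helper introduced only for illustration.

```python
def normalize_axes(axes, ndim):
    """Map possibly negative axis indices to the range [0, ndim)."""
    result = []
    for axis in axes:
        if not -ndim <= axis < ndim:
            raise ValueError(f"axis {axis} is out of range for {ndim} dimensions")
        # A negative index counts from the last dimension.
        result.append(axis + ndim if axis < 0 else axis)
    return result


# For an HWC image (3 dimensions), -1 refers to the channel axis.
print(normalize_axes([-1, 0], ndim=3))  # [2, 0]
```

With per-sample axes, each sample in a batch may carry its own list of axes, and the same mapping would apply to each list independently.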
Fixed Issues
This DALI release includes the following fixes:
- Fixed a race condition that randomly caused incorrect outputs in the TensorFlow plugin (#3547).
- Fixed synchronization issues in the PaddlePaddle plugin that may have caused incorrect results (#3498 and #3487).
Improvements
- Make Slice kernel tiling adaptive (#3557)
- Add Laplacian CPU kernel (#3518)
- Allows DALI to dlopen dependent CUDA toolkit libraries: NPP, cuFFT and nvJPEG (#3519)
- Fix test code to be compatible with python 3.6 (#3550)
- Fix a typo in warp jupyter notebook. (#3554)
- Add Cast and CoinFlip GPU benchmarks (#3541)
- Fix DALI TL3 test for 21.11 (#3529)
- Pad operator: Add support for per-sample axes and fill_value arguments, and negative axes (#3534)
- Add FlipGPU and GaussianBlurGPU benchmarks (#3538)
- Make bundle-wheel.sh more configurable (#3539)
- Enable DALI test on python 3.9 and add 3.10 support (#3522)
- Add transform parameter to convolution cpu (#3535)
- Remove nvJPEG leak sanitizer workaround in tests (#3532)
- Dependency update Nov 2021 (#3523)
- Add support for per-sample axes and negative axes in Slice (#3516)
- Refactor ArgValue to support empty samples and batch shape expectations (#3528)
- Move to CUDA 11.5 update 1 (#3526)
- Add Copy GPU benchmark (#3517)
- Move to CUDA_CALL for nvJPEG, nvJPEG2k, and NPP (#3521)
- Silence warning in LookupTable (#3508)
- Move unfold_outer_dim to common utilities. (#3486)
- Remove Context from memory resources. (#3485)
- Set minimum python version to 3.7 for TF 2.7 (#3489)
- Allow video inputs to JpegCompressionDistortion (#3482)
- Bump up TensorFlow version to 2.7 in tests (#3475)
- Change the way how NVML wrapper is linked internally (#3481)
- Add support for file_filters in FileReader (#3459)
- Allow video inputs to JpegCompressionDistortion (#3447)
- Move to Ubuntu 20.04 for cuda 10.2 toolkit image (#3477)
- Move to Ubuntu 20.04 for cuda toolkit image (#3476)
- Pin Keras version for TensorFlow 2.6 (#3474)
- Add support for BatchInfo in experimental TF DALI Dataset (#3468)
Bug Fixes
- Replace equality with EqualEpsRel in Laplacian kernel tests (#3565)
- Synchronize CUDA stream once in operator benchmark (#3525)
- Ensure that num_devices and device are stored in correct order. (#3560)
- Fix conda test for CUDA 10.x (#3556)
- Fix race condition when initializing per-device default memory resources (#3555)
- Fix data race when copying outputs in TF plugin (#3547)
- CUDA VM resource bugfixes (#3545)
- Fix build of DALI TensorFlow plugin during installation (#3546)
- Fix issues found during static analysis (#3524)
- Fix lack of proper device id used to obtain relevant cuda stream in paddle plugin (#3498)
- Add type check to last_batch_policy argument (#3490)
- Fix DALI paddle plugin stream synchronization error (#3487)
- Reuse GaussianBlur windows between iterations (#3484)
- Add synchronization when destroying the Executor. Make all destructors noexcept. (#3492)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.9.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.9.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.9.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.9.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.9.0-3647996-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.9.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.9.0-3647997-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.9.0-3647997-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.9.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.8.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- Added batch mode support to external_source operator with parallel callback. (#3420 and #3397)
- Extended crop_mirror_normalize operator to support per-sample normalization parameters. (#3455)
- Improved error messages when trying to decode images with unsupported format. (#3445)
- Documentation improvements. (#3448 and #3439)
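Per-sample normalization parameters, as mentioned above for crop_mirror_normalize, mean that each sample in a batch can be normalized with its own mean and standard deviation instead of one value shared by the whole batch. The following is a minimal sketch of that idea in plain Python. It does not call DALI, `normalize` is a hypothetical helper, and the data values are made up for illustration.

```python
def normalize(sample, mean, std):
    """Apply (value - mean) / std to every value of one sample."""
    return [(value - mean) / std for value in sample]


# Two samples with very different value ranges, each with its own parameters.
batch = [[0.0, 2.0, 4.0], [10.0, 20.0, 30.0]]
means = [2.0, 20.0]
stds = [2.0, 10.0]

normalized = [normalize(s, m, d) for s, m, d in zip(batch, means, stds)]
print(normalized)  # [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
```

With a single shared mean and standard deviation, the two samples above would end up in different value ranges. Per-sample parameters bring both to the same range.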
Fixed Issues
This DALI release includes the following fixes:
- Fixed unsound interpretation of the aspect ratio parameter in the random_bbox_crop operator, when input shape is provided. (#3425)
- Fixed incorrect output shape in the experimental.readers.video operator. (#3460)
Improvements
- Remove reseeding of numpy in RandomlyShapedDataIterator (#3466)
- Add indexing information to TF external source tests (#3467)
- Extend setup_packages.py to bring package with its dependencies (#3464)
- Update dependency versions (#3457)
- Optionally load plugins global symbols. (#3462)
- Add NVIDIA Video Codec SDK - NVDECODE API (#3458)
- CropMirrorNormalize: Add support for per-sample normalization arguments (#3455)
- Support batch mode in parallel external source (#3397)
- Turn off part of TL0_FW_iterators tests when sanitizers are enabled (#3456)
- Read ArgValue constant arguments only once (#3453)
- Rename InputRef/OutputRef to Input/Output in workspace API (#3451)
- Reduce number of Workspace Input/Output APIs (#3446)
- Fix error reporting in image factory (#3445)
- Update custom op example for newer CMake (#3448)
- Update TF dataset to 2.8 (#3442)
- Fix documentation of CropMirrorNormalize dtype argument (#3439)
- Bump up nvJPEG2k version to 0.4 (#3440)
- Enable CUDA 11.5 builds (#3436)
- Enable sanitizers in regular CI runs (#3422)
- Improve the way how available python version is available (#3438)
- RandomBBoxCrop: Fix interpretation of aspect ratio, when input shape is provided (#3425)
- Change the permute function to infer the output size from the indices. (#3434)
- Move to the upstream deb packages for JetPack compilation (#3432)
- Change C++ standard to c++17 for non-CUDA sources (#3423)
- Add epoch number to SampleInfo and introduce BatchInfo (#3420)
- Separate type setting from data access in Buffer (#3414)
- Make SBSA build compatible with all armv8-a CPUs (#3417)
- Update TF plugin for future API change (#3415)
- Replace pointers with references for ShareData parameter (#3408)
- Code cleanup: remove unused variables, fix buffer overflow (#3410)
- Enable usage of sanitizers in tests (#3377)
Bug Fixes
- Update tensorflow version in conda build (#3471)
- Fix STRING_VEC default arguments presentation in docs (#3470)
- Remove broken class method from DALI Dataset (#3465)
- Fix experimental.readers.video output shape (#3460)
- Fix static analysis detected issues (#3444)
- Silence output from build_per_python_lib cmake utility (#3454)
- Make Workspace::Input return const reference (#3452)
- Update imports from collections to collections.abc where needed (#3429)
- Install boost/preprocessor headers (#3443)
- Fix ShareData for TensorVector with no elements (#3435)
- Update GCC version in conda recipe to 7.5 to workaround GCC bug 82461. (#3431)
- Add a missing state destruction for the NVJPEG HW decoder (#3416)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.8.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.8.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.8.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.8.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.8.0-3362432-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.8.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.8.0-3362434-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.8.0-3362434-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.8.0.tar.gz
FFmpeg source code:
Libsndfile source code: