- CUDA-based resize kernels
- Perform linear or letterbox resizing
- `core.create_kernel_args` and `core.Kernel`
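
For readers unfamiliar with the letterbox mode named above, here is a rough sketch of the arithmetic involved: a single scale factor preserves the aspect ratio, and the leftover border is filled with padding. The helper name `letterbox_params` and the centered padding are assumptions made for illustration; this is not the library's API or its CUDA kernel.

```python
def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Return (new_w, new_h, pad_left, pad_top) for a letterbox resize.

    Hypothetical helper for illustration only.
    """
    # One scale for both axes keeps the aspect ratio intact.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    # Assumption: the resized image is centered; the remainder is padding.
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top


# A 1920x1080 frame fitted into a 640x640 network input.
print(letterbox_params(1920, 1080, 640, 640))  # (640, 360, 0, 140)
```

A plain resize to the target shape would instead scale each axis independently, which fills the whole input but distorts the aspect ratio.
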
- CUDA-based preprocessing for YOLO:
  - Introduced `CUDAPreprocessor` and `CPUPreprocessor`.
  - Additional parameters in YOLO constructor and methods: `conf_thres`, `extra_nms`, `agnostic_nms`, `resize_method`, `preprocessing_unit`.
- Runtime CUDA kernel generation with NVRTC:
  - Final transform (transpose from HWC to BCHW) reduced from 50 ms to 5 ms for 1280x1280 inputs, a 10x speedup.
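
The final transform mentioned above reorders pixels from the interleaved height-width-channel (HWC) layout into a planar, channel-first layout. The sketch below shows only the index mapping, on a flat Python list; it is a CPU illustration, not the NVRTC kernel itself, and `hwc_to_chw` is a hypothetical name.

```python
def hwc_to_chw(pixels, height, width, channels):
    """Reorder a flat HWC buffer into a flat CHW buffer."""
    out = [0] * (height * width * channels)
    for y in range(height):
        for x in range(width):
            for c in range(channels):
                src = (y * width + x) * channels + c  # interleaved index
                dst = (c * height + y) * width + x    # planar index
                out[dst] = pixels[src]
    return out


# A 1x2 RGB image: pixel 0 is (1, 2, 3), pixel 1 is (4, 5, 6).
print(hwc_to_chw([1, 2, 3, 4, 5, 6], height=1, width=2, channels=3))
# [1, 4, 2, 5, 3, 6]
```

With a batch size of one, the BCHW buffer holds the same data with a leading dimension of 1. On a GPU, each thread would typically compute one `dst` index, so the three loops collapse into a single index calculation per thread.
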
- Multi-threading safety:
  - `ParallelYOLO` enforces serial deserialization of engine files.
  - `CUDAProcessor` now serializes initialization.
  - Core CUDA/NVRTC calls use mutexes.
- `impls.yolo.YOLO`:
  - Added `input_range` parameter for specifying the input range. YOLOX uses `[0:255]`; all others use `[0:1]`.
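
As a rough illustration of what an input range means for preprocessing, the sketch below rescales 8-bit pixel values into a target `(low, high)` interval. The function `scale_pixels` and its tuple argument are assumptions made for this example; they are not the signature of `impls.yolo.YOLO`.

```python
def scale_pixels(pixels, input_range=(0.0, 1.0)):
    """Map 8-bit pixel values (0..255) into input_range."""
    low, high = input_range
    span = high - low
    return [low + (p / 255.0) * span for p in pixels]


print(scale_pixels([0, 255]))                # [0.0, 1.0]
print(scale_pixels([0, 255], (0.0, 255.0)))  # [0.0, 255.0]
```
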
- Variations of `impls.yolo.YOLO`: YOLO7, YOLO8, YOLO9, YOLO10, and YOLOX.
- `impls.yolo.YOLO`:
  - Version inference is now automatic.
  - Postprocessing is determined from the model outputs.
- Outputs from `impls.yolo.YOLO` now use standard Python types:
  - Improved compatibility with JIT compilers like `numba`.
- `impls.yolo.ParallelYOLO`: Enables running multiple YOLO models simultaneously.
- `TRTEngine`:
  - Uses async memory copies and execution.
  - Uses page-locked memory on the host.
- `backend` submodule: Deprecated in favor of CUDA Python engines.
- `jetson.benchmark_engine` integrated with `jetsontools > 0.0.3`.
- `TRTEngine`: Enhanced threading documentation.
- `trtexec.build_engine`: Correctly builds for DLA core 0.
- `TRTEngine`:
  - Uses `execute_async_v2` for inference. `core.create_engine` now creates a `cudaStream`.
- Locks for TensorRT engine creation and CUDA memory allocation.
- `benchmark_engine`: Measures engine latency.
- Submodules:
  - `jetson`
  - `impls`
  - `impls.yolo`: Supports YOLO variants (V7 to V10).
- `trtexec.build_from_onnx` renamed to `trtexec.build_engine`.
- Async and parallel execution classes:
  - `QueuedTRTEngine`, `QueuedTRTModel`
  - `ParallelTRTEngine`, `ParallelTRTModel`
- Resolved `AttributeError` crashes during deallocation.
- Default `TRTEngine` now uses CUDA Python:
  - Improved stability and compatibility.
  - Legacy PyCUDA version available via `trtutils.backends.PyCudaTRTEngine`.
- `trtexec` submodule:
  - Locate and run `trtexec` commands programmatically.
- Package is now correctly detected as fully typed.
- Examples, documentation, and stricter linting/typing.
- PyCUDA install script for Linux.