Skip to content

Latest commit

 

History

History
162 lines (107 loc) · 3.15 KB

CHANGELOG.md

File metadata and controls

162 lines (107 loc) · 3.15 KB

Changelog

0.4.0 (2024-12-05)

Added

  • CUDA-based resize kernels
    • Peform linear or letterbox resizing
  • core.create_kernel_args nad core.Kernel

0.3.4 (2024-11-12)

Added

  • CUDA-based preprocessing for YOLO:
    • Introduced CUDAPreprocessor and CPUPreprocessor.
    • Additional parameters in YOLO constructor and methods:
      • conf_thres
      • extra_nms, agnostic_nms
      • resize_method, preprocessing_unit.
  • Runtime CUDA kernel generation with NVRTC:
    • Final transform (transpose from HWC to BCHW) reduced from 50ms to 5ms for 1280x1280, achieving a 10x speedup.

Improved

  • Multi-threading safety:
    • ParallelYOLO enforces serial deserialization of engine files.
    • CUDAProcessor now serializes initialization.
    • Core CUDA/NVRTC calls use mutexes.

0.3.3 (2024-10-31)

Added

  • 'impls.yolo.YOLO':
    • Added input_range parameter for specifying the input range.
    • YOLOX uses [0:255], all others use [0:1].

0.3.2 (2024-10-31)

Added

  • Variations of impls.yolo.YOLO: YOLO7, YOLO8, YOLO9, YOLO10, and YOLOX.

Changed

  • impls.yolo.YOLO:
    • Version inference is now automatic.
    • Postprocessing determined from outputs.

0.3.1 (2024-10-29)

Improved

  • Outputs from impls.yolo.YOLO now use standard Python types:
    • Improved compatibility with JIT compilers like numba.

0.3.0 (2024-10-25)

Added

  • impls.yolo.ParallelYOLO: Enables running multiple YOLO models simultaneously.

Improved

  • TRTEngine:
    • Uses async memory copies and execution.
    • Implements pagelocked memory on host.

Removed

  • backend submodule: Deprecated in favor of CUDA Python engines.

0.2.3 (2024-10-17)

Added

  • jetson.benchmark_engine integrated with jetsontools > 0.0.3.

Improved

  • TRTEngine: Enhanced threading documentation.

Fixed

  • trtexec.build_engine: Correctly builds for DLA core 0.

0.2.2 (2024-10-17)

Changed

  • TRTEngine:
    • Uses execute_async_v2 for inference.
    • core.create_engine now creates a cudaStream.

0.2.1 (2024-10-16)

Added

  • Locks for TensorRT engine creation and CUDA memory allocation.

0.2.0 (2024-10-02)

Added

  • benchmark_engine: Measures engine latency.
  • Submodules:
    • jetson
    • impls
    • impls.yolo: Supports YOLO variants (V7 to V10).

Changed

  • trtexec.build_from_onnx renamed to trtexec.build_engine.

0.1.2 (2024-10-10)

Added

  • Async and parallel execution classes:
    • QueuedTRTEngine, QueuedTRTModel
    • ParallelTRTEngine, ParallelTRTModel

0.1.1 (2024-07-30)

Fixed

  • Resolved AttributeError during deallocation crashes.

0.1.0 (2024-07-30)

Changed

  • Default TRTEngine now uses CUDA Python:
    • Improved stability and compatibility.
    • Legacy PyCUDA version available via trtutils.backends.PyCudaTRTEngine.

0.0.8 (2024-07-21)

Added

  • trtexec submodule:
    • Locate and run trtexec commands programmatically.

0.0.3 (2024-02-22)

Fixed

  • Correct package detection as fully typed.

Improved

  • Examples, documentation, and stricter linting/typing.

Added

  • PyCUDA install script for Linux.