Intel® Neural Compressor v2.3 Release

@chensuyue chensuyue released this 15 Sep 07:56
· 704 commits to master since this release
3e1b9d4
  • Highlights
  • Features
  • Improvement
  • Productivity
  • Bug Fixes
  • Examples
  • Validated Configurations

Highlights

  • Integrated Intel Neural Compressor into Microsoft ONNX Runtime (#16288) and Olive (#411, #412, #469).
  • Supported low precision (INT4, NF4, FP4) and Weight-Only Quantization algorithms including RTN, AWQ, GPTQ and TEQ on ONNX Runtime and PyTorch for LLM optimization.
  • Supported SparseGPT pruner (88adfc).
  • Supported quantization for ONNX Runtime DML EP and DNNL EP, and verified inference on Intel NPU (e.g., Meteor Lake) and Intel CPU (e.g., Sapphire Rapids).

Features

  • [Quantization] Support ONNX Runtime quantization and inference for DNNL EP (79be8b)
  • [Quantization] [Experimental] Support ONNX Runtime quantization and inference for DirectML EP (750bb9)
  • [Quantization] Support low precision and Weight-Only Quantization (WOQ) algorithms, including RTN (501440, 19ab16, 859315), AWQ (2562f2, 641d42), GPTQ (b5ac3c, 6ba783) and TEQ (d2f995, 9ff7f0) for PyTorch
  • [Quantization] Support NF4 and FP4 data types for PyTorch Weight-Only Quantization (3d11b5)
  • [Quantization] Support low precision and Weight-Only Quantization algorithms, including RTN, AWQ and GPTQ for ONNX Runtime (da4c92)
  • [Quantization] Support layer-wise quantization (d9d1fc) and enable with SmoothQuant (ec9ae9)
  • [Pruning] Add SparseGPT pruner and refactor pruning class (88adfc)
  • [Pruning] Add Hyper-parameter Optimization algorithm for pruning (6613cf)
  • [Model Export] Support PT2ONNX dynamic quantization export (165532)
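
To make the RTN (round-to-nearest) entry above concrete, here is a minimal sketch of group-wise symmetric weight-only quantization in plain Python. It is an illustration of the general technique, not Intel Neural Compressor's implementation: the function names (`rtn_quantize_group`, `rtn_quantize`, `dequantize`), the flat weight list, and the default group size are assumptions made for this example.

```python
from typing import List, Tuple

QuantGroup = Tuple[List[int], float]  # (integer codes, scale) for one group


def rtn_quantize_group(weights: List[float], bits: int = 4) -> QuantGroup:
    """Symmetric round-to-nearest quantization of a single weight group."""
    qmax = 2 ** (bits - 1) - 1   # 7 for 4-bit
    qmin = -(2 ** (bits - 1))    # -8 for 4-bit
    max_abs = max(abs(w) for w in weights)
    # One scale per group; fall back to 1.0 for an all-zero group.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest integer code and clamp to the range.
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def rtn_quantize(weights: List[float], group_size: int = 4,
                 bits: int = 4) -> List[QuantGroup]:
    """Split a flat weight list into groups and quantize each one."""
    return [
        rtn_quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups: List[QuantGroup]) -> List[float]:
    """Reconstruct approximate float weights from codes and scales."""
    return [code * scale for codes, scale in groups for code in codes]


if __name__ == "__main__":
    w = [0.9, -0.31, 0.12, 0.0, 1.4, -1.37, 0.2, 0.66]
    packed = rtn_quantize(w, group_size=4, bits=4)
    print(packed)
    print(dequantize(packed))
```

Smaller groups give each scale fewer weights to cover, which usually lowers the reconstruction error at the cost of storing more scales. Algorithms such as AWQ, GPTQ and TEQ refine this baseline and are not shown here.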

Improvement

  • [Common] Clean up dataloader usage in examples (1044d8, a2931e, 447cc7)
  • [Common] Enhance ONNX Runtime backend check (4ce9de)
  • [Strategy] Add block-wise distributed fallback in basic strategy (ea309f)
  • [Strategy] Enhance strategy exit policy (d19b42)
  • [Quantization] Add WeightOnlyLinear for Weight-Only approach to allow low memory inference (00bbf8)
  • [Quantization] Support more ONNX Runtime direct INT8 ops (b9ce61)
  • [Quantization] Support TensorFlow per-channel MatMul quantization (cf5589)
  • [Quantization] Implement a new method to perform alpha auto-tuning in SmoothQuant (084eda)
  • [Quantization] Enhance ONNX SmoothQuant tuning structure (f0d51c)
  • [Quantization] Enhance PyTorch SmoothQuant tuning structure (81da40)
  • [Quantization] Update PyTorch examples dataloader to support transformers 4.31.x (59371f)
  • [Quantization] Enhance ONNX Runtime backend setting for GPU EP support (295535)
  • [Pruning] Refactor pruning (92d14d)
  • [Mixed Precision] Update the list of supported layers for Keras mixed precision (692c8b)
  • [Mixed Precision] Introduce quant_level into mixed precision (0dc6a9)
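
For readers unfamiliar with the alpha parameter mentioned in the SmoothQuant entries above, the sketch below shows the per-channel smoothing step from the SmoothQuant method in plain Python: each input channel j gets a scale s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), activations are divided by it, and the matching weight rows are multiplied by it. This is not the tuning code from this release, and it does not reproduce the new auto-tuning method; all function names and the nested-list matrix layout are assumptions made for the example.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def smoothquant_scales(x: Matrix, w: Matrix, alpha: float = 0.5,
                       eps: float = 1e-5) -> List[float]:
    """Per-input-channel scales: max|X_j|**alpha / max|W_j|**(1 - alpha)."""
    channels = len(w)
    act_absmax = [max(abs(row[j]) for row in x) for j in range(channels)]
    wgt_absmax = [max(abs(v) for v in w[j]) for j in range(channels)]
    return [
        max(a, eps) ** alpha / max(g, eps) ** (1.0 - alpha)
        for a, g in zip(act_absmax, wgt_absmax)
    ]


def apply_smoothing(x: Matrix, w: Matrix, scales: List[float]):
    """Divide activation columns by the scales, multiply weight rows by them."""
    x_s = [[v / s for v, s in zip(row, scales)] for row in x]
    w_s = [[v * s for v in row] for row, s in zip(w, scales)]
    return x_s, w_s


if __name__ == "__main__":
    # Channel 1 of the activations carries outliers (40, 60).
    x = [[1.0, 40.0], [-2.0, 60.0]]
    w = [[0.5, -0.25], [0.1, 0.2]]
    s = smoothquant_scales(x, w, alpha=0.5)
    x_s, w_s = apply_smoothing(x, w, s)
    print(matmul(x, w))      # original output
    print(matmul(x_s, w_s))  # same output up to floating-point error
```

The product is unchanged by the rescaling, so the only effect is to move quantization difficulty from activations to weights. A larger alpha shifts more of the range onto the weights, and an alpha tuner searches for the value that gives the best accuracy after quantization.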

Productivity

  • [Ecosystem] Microsoft Olive integrated SmoothQuant and 3 LLM examples (#411, #412, #469)
  • [Ecosystem] Microsoft ONNX Runtime integrated SmoothQuant static quantization (#16288)
  • [Neural Insights] Support PyTorch FX inspect tensor and integrate with Neural Insights (775def, 74a785)
  • [Neural Insights] Add step-by-step diagnosis cases (99c3b0)
  • [Neural Solution] Resource management and user-facing API enhancement (fbba10)
  • [Auto CI] Integrate auto CI code scan bug fix tools (f77a2c, 06cc38)

Bug Fixes

  • Fix bugs in PyTorch SmoothQuant (0349b9, 8f3645)
  • Fix PyTorch dataloader batch size issue (6a98d0)
  • Fix bugs for ONNX Runtime CUDA EP (a1b566, d1f315)
  • Fix bug in ONNX Runtime adapter where _rename_node function fails with model size > 2 GB (1f6b1a)
  • Fix ONNX Runtime diagnosis bug (f10e26)
  • Update Neural Solution example and fix grpc port issue (528868)
  • Fix the objective initialization issue (9d7546)
  • Fix reshape issue for Bayesian strategy (77cb83)
  • Fix CVEs (d86922, 2bbfcd, fc71fa)

Examples

  • Add Weight-Only LLM examples for PyTorch (4b24be, 66f7c1, aa457a)
  • Add Weight-Only LLM examples for ONNX Runtime (10c133)
  • Enable 3 ONNX Runtime examples, CodeBert (5e584e), LayoutLMv2 FUNSD (5f0b17), Table Transformer (eb8a95)
  • Add ONNX Runtime LLM SmoothQuant example Llama-7B (7fbcf5)
  • Enable 2 TensorFlow examples, ViT (94df99), GraphSage (29ec82)
  • Add easy getting-started notebooks (d7b608, 6ee846)
  • Add multi-card magnitude pruning use case (909618)
  • Unify ONNX Runtime prepare model scripts (5ecb13)

Validated Configurations

  • CentOS 8.4 & Ubuntu 22.04
  • Python 3.7, 3.8, 3.9, 3.10, 3.11
  • TensorFlow 2.11, 2.12, 2.13
  • ITEX 1.1.0, 1.2.0, 2.13.0.0
  • PyTorch/IPEX 1.12.1+cpu, 1.13.0+cpu, 2.0.1+cpu
  • ONNX Runtime 1.13.1, 1.14.1, 1.15.1
  • MXNet 1.9.1