Intel® Neural Compressor v2.4.1 Release

@chensuyue chensuyue released this 29 Dec 13:12
b8c7f1a
  • Improvement
  • Bug Fixes
  • Examples
  • Validated Configurations

Improvement

  • Narrow down the tuning space of SmoothQuant auto-tune (9600e1)
  • Support ONNXRT Weight-Only Quantization with different dtypes (5119fc)
  • Add progress bar for ONNXRT Weight-Only Quantization and SmoothQuant (4d26e3)

Bug Fixes

  • Fix SmoothQuant alpha-space generation (33ece9)
  • Fix inputs error for SmoothQuant example_inputs (39f63a)
  • Fix LLMs accuracy regression with IPEX 2.1.100 (3cb6d3)
  • Fix quantizable add ops detection on IPEX backend (4c004d)
  • Fix range step bug in ORTSmoothQuant (40275c)
  • Fix unit test bugs and update CI versions (6c78df, 835805)
  • Fix notebook issues (08221e)
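Several of the fixes above concern how the SmoothQuant alpha search space is enumerated from a range and a step. The release notes do not show the patched code, so the following is only a minimal sketch of one way to build such a space without floating-point drift. The function name `alpha_space` and its parameters are hypothetical and are not part of the Intel® Neural Compressor API.

```python
import math


def alpha_space(alpha_min=0.0, alpha_max=1.0, alpha_step=0.1):
    """Return evenly spaced alpha candidates from alpha_min up to alpha_max.

    Hypothetical helper for illustration only; not the library's implementation.
    """
    if alpha_step <= 0:
        raise ValueError("alpha_step must be positive")
    if alpha_max < alpha_min:
        raise ValueError("alpha_max must be >= alpha_min")

    # Derive the step count once as an integer instead of repeatedly adding
    # the step, so rounding error does not accumulate or drop the endpoint.
    n_steps = math.floor((alpha_max - alpha_min) / alpha_step + 1e-9)

    # Round each candidate so values such as 0.30000000000000004 become 0.3.
    return [round(alpha_min + i * alpha_step, 4) for i in range(n_steps + 1)]


if __name__ == "__main__":
    print(alpha_space(0.3, 0.7, 0.05))
```

Computing each candidate as `alpha_min + i * alpha_step` keeps the upper bound in the list when the range is an exact multiple of the step, which is the kind of off-by-one that repeated float addition can introduce.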

Examples

  • Add verified LLMs list and recipes for SmoothQuant and Weight-Only Quantization (f19cc9)
  • Add code-generation evaluation for Weight-Only Quantization GPTQ (763440)

Validated Configurations

  • CentOS 8.4 & Ubuntu 22.04
  • Python 3.10
  • TensorFlow 2.14
  • ITEX 2.14.0.1
  • PyTorch/IPEX 2.1.0
  • ONNX Runtime 1.16.3