Intel® Neural Compressor v2.4.1 Release

chensuyue released this 29 Dec 13:12

b8c7f1a

Improvement
Bug Fixes
Examples
Validated Configurations

Improvement

Narrow down the tuning space of SmoothQuant auto-tune (9600e1)
Support ONNXRT Weight-Only Quantization with different dtypes (5119fc)
Add progress bar for ONNXRT Weight-Only Quantization and SmoothQuant (4d26e3)

Bug Fixes

Fix SmoothQuant alpha-space generation (33ece9)
Fix inputs error for SmoothQuant example_inputs (39f63a)
Fix LLMs accuracy regression with IPEX 2.1.100 (3cb6d3)
Fix quantizable add ops detection on IPEX backend (4c004d)
Fix range step bug in ORTSmoothQuant (40275c)
Fix unit test bugs and update CI versions (6c78df, 835805)
Fix notebook issues (08221e)
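
For context on the alpha-space and range-step fixes listed above, here is a minimal sketch of one way to build a SmoothQuant alpha grid without floating-point drift. It is not the library's implementation: the function name `alpha_space`, its default bounds, and the 4-digit rounding are assumptions chosen for illustration.

```python
import math


def alpha_space(start=0.0, stop=1.0, step=0.1):
    """Return alpha values from start up to stop (never exceeding it).

    Hypothetical helper for illustration; not part of Neural Compressor.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if stop < start:
        raise ValueError("stop must be >= start")
    # Count the steps once and derive each value from its index, so rounding
    # error does not accumulate as it would with repeated `value += step`.
    # The small epsilon keeps an endpoint such as 0.7 from being dropped when
    # the division lands just below a whole number.
    count = int(math.floor((stop - start) / step + 1e-9))
    return [round(start + i * step, 4) for i in range(count + 1)]
```

Calling `alpha_space(0.3, 0.7, 0.05)` yields nine candidates from 0.3 to 0.7 inclusive. Deriving values from the index is a common way to avoid the off-by-one endpoints that accumulating a float step can produce.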

Examples

Add verified LLMs list and recipes for SmoothQuant and Weight-Only Quantization (f19cc9)
Add code-generation evaluation for Weight-Only Quantization GPTQ (763440)

Validated Configurations

CentOS 8.4 & Ubuntu 22.04
Python 3.10
TensorFlow 2.14
ITEX 2.14.0.1
PyTorch/IPEX 2.1.0
ONNX Runtime 1.16.3
