v1.2.1

@jianfeifeng jianfeifeng released this 01 Oct 15:45

Added

  • Support more graph optimizations: Convolution+Convolution, LayerNorm
  • Support more operators: ROIAlign, GenerateProposals, Reciprocal, Not, Log, ReductionL2, InstanceNorm, Expand, Gather, Scatter
  • Support NCHW input data processing for more operators (PReLU)
  • Support ONNX weight sharing between Linear, MatMul, Gemm, and Gather
  • Support more networks on CPU: vision transformers (ViT, TNT), recommendation networks
  • Support more networks on GPU: ASR, Faster_RCNN
  • Support Armv7 int8 to accelerate NLP networks (50%+ speed-up)
  • Support X86 AVX512 int8 to accelerate NLP networks (3x+ speed-up)
  • Support using images on the Qualcomm GPU, and add GPU image management methods
  • Improve inference performance on the Qualcomm GPU
  • Add more Android/iOS kit demos: Chinese ASR, Face Detection, Sentiment Analysis
  • Try to bind CPU cores when using the GPU

Changed

  • Replace the mali option with gpu in the install shell script, and remove the default target option setting
  • Change the GPU data format from NCWHC4 to NCHWC4
  • Simplify the tensor padding method with OclMemory for GPU
  • The preprocess_ocl tool previously produced an algofile and xxxlib.so; the algofile is now packaged into xxxlib.so
  • Add a BNN_FP16 option to the X2bolt tool to convert ONNX 1-bit models
  • Replace the original INT8 option with INT8_FP16 in the post_training_quantization tool to convert int8+float16 hybrid inference models, and add an INT8_FP32 option to convert int8+float32 hybrid inference models
  • Add the shell environment variable BOLT_INT8_STORAGE_ERROR_THRESHOLD (default 0.002) to control int8 model conversion in post_training_quantization; int8 storage is used when the quantization error is lower than BOLT_INT8_STORAGE_ERROR_THRESHOLD
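The threshold rule above can be sketched as follows. This is an illustrative re-implementation, not the tool's actual code: post_training_quantization is a native binary, and the function name and "float16" fallback (the storage used by the INT8_FP16 mode) are assumptions for the sketch; only the environment variable name, the default of 0.002, and the "error lower than threshold" comparison come from the notes above.

```python
import os

# Default documented for BOLT_INT8_STORAGE_ERROR_THRESHOLD.
DEFAULT_THRESHOLD = 0.002

def choose_storage(quantization_error: float) -> str:
    """Pick int8 storage when the measured quantization error is below the
    threshold; otherwise keep the higher-precision (here: float16) storage.
    Illustrative only -- mirrors the rule described in the release notes."""
    threshold = float(
        os.environ.get("BOLT_INT8_STORAGE_ERROR_THRESHOLD", DEFAULT_THRESHOLD)
    )
    return "int8" if quantization_error < threshold else "float16"

# A tensor with small quantization error is stored as int8:
print(choose_storage(0.001))  # int8
# A tensor with larger error keeps higher-precision storage:
print(choose_storage(0.01))   # float16
```

Exporting a larger threshold before running the tool makes int8 storage more aggressive; a smaller one makes it more conservative.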

Fixed

  • Fix PReLU 2D and 3D support
  • Fix a Resize bug in some modes
  • Fix an ONNX converter bug when reading Squeeze, UnSqueeze, and Deconv parameters
  • Fix Arm Sigmoid precision
  • Fix the ONNX RNN optimizer, and add support for NCHWC8 input data
  • Fix Concat with a weight tensor in the ONNX converter
  • Simplify C API example