v1.2.1

@jianfeifeng jianfeifeng released this 01 Oct 15:45

Added

  • Support more graph optimizations: Convolution+Convolution, LayerNorm
  • Support more operators: ROIAlign, GenerateProposals, Reciprocal, Not, Log, ReductionL2, InstanceNorm, Expand, Gather, Scatter
  • Support NCHW input data processing for more operators (PReLU)
  • Support ONNX weight sharing between Linear, MatMul, Gemm, and Gather
  • Support more networks on CPU: vision transformers (ViT, TNT), recommendation networks
  • Support more networks on GPU: ASR, Faster_RCNN
  • Support Armv7 int8 to accelerate NLP networks (50%+ speed-up)
  • Support X86 AVX512 int8 to accelerate NLP networks (3x+ speed-up)
  • Support using images on the Qualcomm GPU, and add GPU image management methods
  • Improve inference performance on the Qualcomm GPU
  • Add more Android/iOS kit demos: Chinese ASR, Face Detection, Sentiment Analysis
  • Try to bind CPU cores when using the GPU

Changed

  • Replace the mali option with gpu in the install shell script, and remove the default target option setting
  • Change the GPU data format from NCWHC4 to NCHWC4
  • Simplify the tensor padding method with OclMemory for GPU
  • The preprocess_ocl tool previously produced an algofile and xxxlib.so; the algofile is now packaged into xxxlib.so
  • Add a BNN_FP16 option to the X2bolt tool to convert ONNX 1-bit models
  • Replace the original INT8 option with INT8_FP16 in the post_training_quantization tool to convert int8+float16 hybrid inference models, and add an INT8_FP32 option to convert int8+float32 hybrid inference models
  • Add the shell environment variable BOLT_INT8_STORAGE_ERROR_THRESHOLD (default 0.002) to control int8 model conversion in post_training_quantization; int8 storage is used when the quantization error is lower than BOLT_INT8_STORAGE_ERROR_THRESHOLD
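The threshold rule above can be sketched as follows. This is an illustrative re-implementation, not the tool's actual code: post_training_quantization is a native binary, and the function name and "float16" fallback (the storage used by the INT8_FP16 mode) are assumptions for the sketch; only the environment variable name, the default of 0.002, and the "error lower than threshold" comparison come from the notes above.

```python
import os

# Default documented for BOLT_INT8_STORAGE_ERROR_THRESHOLD.
DEFAULT_THRESHOLD = 0.002

def choose_storage(quantization_error: float) -> str:
    """Pick int8 storage when the measured quantization error is below the
    threshold; otherwise keep the higher-precision (here: float16) storage.
    Illustrative only -- mirrors the rule described in the release notes."""
    threshold = float(
        os.environ.get("BOLT_INT8_STORAGE_ERROR_THRESHOLD", DEFAULT_THRESHOLD)
    )
    return "int8" if quantization_error < threshold else "float16"

# A tensor with small quantization error is stored as int8:
print(choose_storage(0.001))  # int8
# A tensor with larger error keeps higher-precision storage:
print(choose_storage(0.01))   # float16
```

Exporting a larger threshold before running the tool makes int8 storage more aggressive; a smaller one makes it more conservative.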

Fixed

  • Fix PReLU 2D and 3D support
  • Fix a Resize bug in some modes
  • Fix an ONNX converter bug when reading Squeeze, UnSqueeze, and Deconv parameters
  • Fix Arm Sigmoid precision
  • Fix the ONNX RNN optimizer, and add support for NCHWC8 input data
  • Fix Concat with a weight tensor in the ONNX converter
  • Simplify C API example