docs(blog): Add blog post on how to use DirectML to accelerate inference

RapidAI · Jul 13, 2024 · ee309af
1 parent 2638955

Showing 18 changed files with 167 additions and 64 deletions.
diff --git a/docs/blog/posts/about_model/convert_model.md b/docs/blog/posts/about_model/convert_model.md
@@ -17,9 +17,11 @@ comments: true
     Conversion of <strong>slim quantized</strong> models is not supported
 
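 ### Introduction
+
 - Model conversion for the PaddleOCR project relies mainly on the [`paddle2onnx`](https://github.com/PaddlePaddle/Paddle2ONNX) library. Converting the PaddleOCR models with it directly is not very convenient, so we released the [PaddleOCRModelConverter](https://github.com/RapidAI/PaddleOCRModelConverter) conversion tool.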
 
 ### Online Conversion
+
 - [ModelScope](https://www.modelscope.cn/studios/liekkas/PaddleOCRModelConverter/summary)
 - [Hugging Face](https://huggingface.co/spaces/SWHL/PaddleOCRModelConverter)
 
@@ -28,4 +30,3 @@ comments: true
 </div>
 
 ### [Offline Conversion](https://github.com/RapidAI/PaddleOCRModelConverter)
-
diff --git a/docs/blog/posts/about_model/custom_different_model.md b/docs/blog/posts/about_model/custom_different_model.md
@@ -10,14 +10,14 @@ comments: true
 
 > This post is a detailed, step-by-step tutorial on swapping in other detection and recognition models.
 
-
 <!-- more -->
 
 !!! note
 
     We recommend using `rapidocr_onnxruntime>=1.3.x` to load models trained with PaddleOCR v3/v4.
 
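 #### Introduction
+
 The `rapidocr` family of libraries bundles lightweight Chinese/English detection and recognition models by default, which covers most scenarios. Some scenarios, however, call for other detection and recognition models.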
 
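 The design already anticipates this and leaves an interface for it. This post uses one example, replacing the recognition model of `rapidocr_onnxruntime` with an **English-and-digits recognition model**. Other models work the same way.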
@@ -26,6 +26,7 @@ comments: true
     Detection model: the model path parameter is `det_model_path`<br/>Recognition model: the model path parameter is `rec_model_path`<br/>For details, see: [link](../../../install_usage/rapidocr/usage.md)
 
 #### 1. Install `rapidocr_onnxruntime`
+
 First install the `rapidocr_onnxruntime` library by following the tutorial: [link](../../../install_usage/rapidocr/install.md)
 
 #### 2. Obtain the English-and-digits ONNX recognition model
@@ -47,6 +48,7 @@ comments: true
 **Online conversion:** Use the [PaddleOCRModelConverter](https://huggingface.co/spaces/SWHL/PaddleOCRModelConverter) tool to obtain the `en_PP-OCRv4_rec_infer.onnx` model.
 
 #### 3. Use the model
+
 ```python linenums="1"
 from rapidocr_onnxruntime import RapidOCR
 
@@ -57,4 +59,3 @@ result, elapse = model(img_path)
 print(result)
 print(elapse)
 ```
-
diff --git a/docs/blog/posts/about_model/download_onnx.md b/docs/blog/posts/about_model/download_onnx.md
@@ -10,12 +10,11 @@ comments: true
 <!-- more -->
 
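 ### Introduction
+
 This section collects the ONNX models that have already been converted. They can be downloaded from three sources: Hugging Face, Google Drive, and Baidu Netdisk.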
 
 ### [Hugging Face Models](https://huggingface.co/SWHL/RapidOCR/tree/main)
 
 ### [Google Drive](https://drive.google.com/drive/folders/1x_a9KpCo_1blxH1xFOfgKVkw1HYRVywY?usp=sharing)
 
 ### [Baidu Netdisk](https://pan.baidu.com/s/1CHOXNJLZundoV_8bNpcpWQ?pwd=9h6g)
-
-
diff --git a/docs/blog/posts/about_model/model_summary.md b/docs/blog/posts/about_model/model_summary.md
@@ -12,6 +12,7 @@ comments: true
 <!-- more -->
 
 #### Introduction
+
 Open-source projects offer many OCR models, but there is no unified benchmark for judging which one is better.
 
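 With so many models to choose from, it is hard to know where to start. For some time now I have wanted to build such a benchmark, and it has begun to take shape.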
@@ -34,8 +35,8 @@ comments: true
 
 For the model download links, see: [link](./download_onnx.md).
 
-
 #### Known Open-Source OCR Projects
+
 - [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
 - [EasyOCR](https://github.com/JaidedAI/EasyOCR)
 - [MMOCR](https://github.com/open-mmlab/mmocr/blob/main/README_zh-CN.md)
@@ -44,8 +45,8 @@ comments: true
 - [mindocr](https://github.com/mindspore-lab/mindocr)
 - [surya](https://github.com/VikParuchuri/surya)
 
-
 #### Text Detection Models
+
+Evaluation dependencies:
 
 - `rapidocr_onnxruntime==1.3.16`: [link](https://github.com/RapidAI/RapidOCR)
@@ -67,7 +68,6 @@ comments: true
 |     [读光 (DuGuang) Text Detection: DBNet line-level model, Chinese/English, general domain](https://www.modelscope.cn/models/iic/cv_resnet18_ocr-detection-db-line-level_damo/summary)      |     47.2M      |  0.7749  | 0.8167 | 0.7952 |   0.4121   |
 |     [读光 (DuGuang) Text Detection: line-level model, Chinese/English, general domain](https://modelscope.cn/models/iic/cv_resnet18_ocr-detection-line-level_damo/summary) (failed to run)     |     312M      |  -  | - | - |   -  |
 
-
 Comparison across inference engines:
 
 | Inference engine | Model | Model size | Precision | Recall | H-mean | Speed (s/img) |
@@ -77,7 +77,9 @@ comments: true
 |rapidocr_paddle==1.3.18 | ch_PP-OCRv4_det_infer.onnx|   4.5M   |  0.8301   | 0.8659 | 0.8476 | 0.9924       |
 
 #### Text Recognition Models
+
+Evaluation dependencies:
+
 - `rapidocr_onnxruntime==1.3.16`: [link](https://github.com/RapidAI/RapidOCR)
 - Metric library TextRecMetric: [link](https://github.com/SWHL/TextRecMetric)
 - Test set text_rec_test_dataset: [link](https://huggingface.co/datasets/SWHL/text_rec_test_dataset)
@@ -96,7 +98,6 @@ comments: true
 |[读光 (DuGuang) Text Recognition: CRNN model, Chinese/English, general domain](https://www.modelscope.cn/models/iic/cv_crnn_ocr-recognition-general_damo/summary)  |  -    |  46M  |       0.5935      |     0.7671     | - |
 |[OFA Text Recognition: Chinese, general scenes, base](https://www.modelscope.cn/models/iic/ofa_ocr-recognition_general_base_zh/summary) (failed to run) |  -    |  -  |       -      | -  | - |
 
-
 Comparison across inference engines:
 
 | Inference engine | Model | Model size | Exact Match | Char Match | Speed (s/img) |
@@ -106,10 +107,11 @@ comments: true
 |   rapidocr_paddle==1.3.18    | ch_PP-OCRv4_rec_infer.onnx |   10M   |  0.8323   | 0.9355 | 0.6836 |
 
 - Input shape:
-  - v2: `[3, 32, 320]`
-  - v3~v4: `[3, 48, 320]`
+    - v2: `[3, 32, 320]`
+    - v3~v4: `[3, 48, 320]`
 
 - Instantiation examples for the different models:
+
   ```python  linenums="1"
   from rapidocr_onnxruntime import RapidOCR
 
@@ -124,4 +126,3 @@ comments: true
     rec_img_shape=[3, 32, 320],
   )
   ```
-
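The H-mean column in the detection tables follows the usual convention for detection metrics: it is the harmonic mean of Precision and Recall, `H = 2PR / (P + R)`. The snippet below is a minimal sketch under that assumption and does not reproduce the metric library's own code. `h_mean` is a hypothetical helper name. Plugging in two Precision/Recall pairs from the tables gives values that match the reported H-mean to within rounding of the last digit. The tables were presumably computed from unrounded Precision/Recall values.

```python
def h_mean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (equivalent to the F1 score)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Precision/Recall pairs copied from the detection tables, with the reported H-mean.
rows = [
    ("DBNet line-level (ModelScope)", 0.7749, 0.8167, 0.7952),
    ("ch_PP-OCRv4_det_infer.onnx (rapidocr_paddle)", 0.8301, 0.8659, 0.8476),
]

if __name__ == "__main__":
    for name, p, r, reported in rows:
        print(f"{name}: computed {h_mean(p, r):.4f}, reported {reported:.4f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot score a high H-mean by maximizing only Precision or only Recall.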
diff --git a/docs/blog/posts/about_model/support_language.md b/docs/blog/posts/about_model/support_language.md
@@ -10,16 +10,17 @@ comments: true
 <!-- more -->
 
 ### Introduction
+
 - Because this project builds on PaddleOCR, RapidOCR in principle supports every model that PaddleOCR supports.
 
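 ### Chinese/English Detection and Recognition (Ready to Use)
+
 - Chinese/English models are the most commonly used, so they are bundled into `rapidocr_onnxruntime` and `rapidocr_openvino` by default and work right after `pip install`.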
 
 ### Detection and Recognition for Other Languages (Conversion Required)
+
 - List of existing PaddleOCR text detection models: [link](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/models_list.md#1-%E6%96%87%E6%9C%AC%E6%A3%80%E6%B5%8B%E6%A8%A1%E5%9E%8B)
 - List of existing PaddleOCR text recognition models: [link](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/models_list.md#2-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E6%A8%A1%E5%9E%8B)
 - Except for the slim quantized models, all models in the lists above can be converted to ONNX and deployed quickly with RapidOCR.
 
 ### [Conversion Tutorial](./convert_model.md)
-
-
diff --git a/docs/blog/posts/bath_inference.md b/docs/blog/posts/bath_inference.md
@@ -11,7 +11,6 @@ draft: true
 
 <!-- more -->
 
-
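 ### Introduction
-Recently, some users have repeatedly asked whether batch inference can speed things up. They ask because single-image inference is still too slow.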
 
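+Recently, some users have repeatedly asked whether batch inference can speed things up. They ask because single-image inference is still too slow.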
diff --git a/docs/blog/posts/faq.md b/docs/blog/posts/faq.md
@@ -11,58 +11,75 @@ comments: true
 <!-- more -->
 
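 #### Q: Why is my model slower on the ONNXRuntime GPU build than on the CPU?
+
 **A:** Input image shapes in OCR tasks are dynamic. Every time the shape changes, the GPU has to discard the cached results from the previous shape. When the input shape stays fixed, the ONNXRuntime GPU build is generally faster than the CPU. An issue has been filed for this ([issue #13198](https://github.com/microsoft/onnxruntime/issues/13198)).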
 
 For CPU inference we recommend `rapidocr_onnxruntime` or `rapidocr_openvino`; for GPU inference, `rapidocr_paddle`. For an inference speed comparison between `rapidocr_onnxruntime` and `rapidocr_paddle`, see: [docs](../../install_usage/rapidocr_paddle.md#推理速度比较)
 
 #### Q: Can this be used from 32-bit C#?
+
 **A:** Yes. 32-bit C# works with the 32-bit DLL, but the onnxruntime package on NuGet does not support Windows 7.
 
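 #### Q: On Windows, after setting up the environment, running the example program fails with OSError: [WinError 126] 找不到指定的模組 (The specified module could not be found)
+
 **A:** The Shapely library was not installed correctly. On Windows, download the matching wheel from [Shapely whl](https://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely) and install it offline. Alternatively, install Shapely with conda.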
 
 #### Q: When deploying a Python program on Linux, `import cv2` raises `ImportError: libGL.so.1: cannot open shared object file: No such file or directory`?
+
 **A:** There are two [solutions](https://stackoverflow.com/a/63978454/3335415) (from community member ddeef):
+
   1. Install `opencv-python-headless` instead of `opencv-python`;
   2. Run `sudo apt-get install -y libgl1-mesa-dev`
 
 #### Q: The executable I compiled crashes when invoked from cmd on Windows 7. Why?
+
 **A:** Windows 7 is not supported (by @如果我有時光機)
 
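 #### Q: Could you add information extraction like OpenMMLab's?
+
 **A:** We are currently researching and testing this. If key information extraction in MMOCR works well enough, we will consider integrating it later.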
 
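 #### Q: What is the relationship between RapidOCR and PaddleOCR?
+
 **A:** RapidOCR converts PaddleOCR's pretrained models to ONNX. The models then run without the Paddle framework, which makes them easy to deploy on many platforms.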
 
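 #### Q: Has anyone built onnxruntime for arm32? My build succeeded, but using it fails with `libonnxruntime.so:-1: error: file not recognized: File format not recognized`. Could this be a version mismatch?
+
 **A:** We have not run into this. We build directly on the target platform (ARM). The error is most likely a platform incompatibility, which usually comes from cross-compiling. We recommend building natively on the target platform.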
 
 #### Q: Does the C++ demo require Visual Studio 2017 or later?
+
 **A:** Visual Studio 2019 is recommended.
 
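 #### Q: Can it match the accuracy of Baidu's EasyEdge Free App?
+
 **A:** The EasyEdge models are probably not open source. Among Baidu's open-source models, the server detection model can reach comparable accuracy, but it is fairly large.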
 
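 #### Q: ONNX inference from C++ seems to run on the CPU, and the GPU shows no activity. Why?
+
 **A:** To use the GPU, install the onnxruntime-gpu build and add the EP (execution provider) in the onnxruntime code yourself. The project aims to be general-purpose, so it uses CPU inference only.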
 
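 #### Q: I would like to deploy the OCR. Is there a Linux deployment package?
-**A:** Build the Linux version yourself; the scripts in our GitHub Actions workflows show how. Building is actually easy: install OpenCV, then change the onnxruntime path in CMakeLists.txt. For details, see: https://github.com/RapidOCR/RapidOCR/blob/main/.github/workflows/make-linux.yml
+
+**A:** Build the Linux version yourself; the scripts in our GitHub Actions workflows show how. Building is actually easy: install OpenCV, then change the onnxruntime path in CMakeLists.txt. For details, see: <https://github.com/RapidOCR/RapidOCR/blob/main/.github/workflows/make-linux.yml>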
 
 #### Q: Where can I download prebuilt onnxruntime C++ libraries?
-**A:** From here: https://github.com/RapidOCR/OnnxruntimeBuilder/releases/tag/1.7.0
+
+**A:** From here: <https://github.com/RapidOCR/OnnxruntimeBuilder/releases/tag/1.7.0>
 
 #### Q: My test environment is Win10 + Cygwin + gcc + plain C. Can I add basic OCR directly to a C program?
+
 **A:** Yes, call the API directly; it is exported as a C API.
 
 #### Q: Where can I download the models?
+
 **A:** [Baidu Netdisk](https://pan.baidu.com/s/1PTcgXG2zEgQU6A_A3kGJ3Q?pwd=jhai) | [Google Drive](https://drive.google.com/drive/folders/1x_a9KpCo_1blxH1xFOfgKVkw1HYRVywY?usp=sharing)
 
 #### Q: Error under onnxruntime 1.7: onnxruntime::SequentialExecutor::Execute] Non-zero status code returned while running ScatterND node. Name:'ScatterND@1' Status Message: updates
+
 **A:** The model only supports `onnxruntime=1.5.0`. Please update the models; the download links are in the model download question above.
 
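 #### Q: A line of text at the edge of the image is never recognized. What can I do?
-**A:** Set the padding parameter to a value. The default is 0; try 5 or 10, or even larger, until the line is recognized. Do not make it too large, because that wastes memory.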
 
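+**A:** Set the padding parameter to a value. The default is 0; try 5 or 10, or even larger, until the line is recognized. Do not make it too large, because that wastes memory.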
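Conceptually, the padding answer above means adding a border of `padding` pixels around the image before detection. Text that touches the edge is then no longer cut off at the image boundary, and the detected boxes must be shifted back by the same amount. The NumPy sketch below illustrates this under stated assumptions. The white (255) fill value is an assumption, and `pad_image` and `unpad_box` are hypothetical helper names, not part of RapidOCR's API. The sketch also shows why very large values waste memory: an H × W image grows to (H + 2p) × (W + 2p) pixels.

```python
import numpy as np


def pad_image(img: np.ndarray, padding: int, value: int = 255) -> np.ndarray:
    """Add a constant-colour border of `padding` pixels on every side.

    Assumption: a white border (255); RapidOCR's own fill value may differ.
    """
    if padding <= 0:
        return img  # padding=0 (the default) leaves the image untouched
    # Pad height and width only; leave any channel axis unpadded.
    pad_width = ((padding, padding), (padding, padding)) + ((0, 0),) * (img.ndim - 2)
    return np.pad(img, pad_width, mode="constant", constant_values=value)


def unpad_box(box, padding):
    """Shift box corners found on the padded image back to original coordinates."""
    return [(x - padding, y - padding) for x, y in box]


if __name__ == "__main__":
    img = np.zeros((32, 100, 3), dtype=np.uint8)  # dummy dark image
    padded = pad_image(img, 10)
    print(padded.shape)  # (52, 120, 3)
    print(unpad_box([(15, 12), (110, 40)], 10))  # [(5, 2), (100, 30)]
```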
diff --git a/docs/blog/posts/how_to_use_directml.md b/docs/blog/posts/how_to_use_directml.md
@@ -0,0 +1,73 @@
+---
+title: How to Use DirectML to Accelerate Inference
+date: 2024-07-13
+authors: [SWHL]
+categories:
+  - General
+comments: true
+---
+
+This post introduces DirectML and shows how to use it for OCR inference.
+
+<!-- more -->
+
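+### What Is DirectML?[^microsoft]
+
+Direct Machine Learning (DirectML) is a low-level API for machine learning (ML). It has a familiar (native C++, nano-COM) programming interface and a DirectX 12-style workflow. You can integrate machine learning inference workloads into games, engines, middleware, backends, or other applications. DirectML is supported by all DirectX 12-compatible hardware.
+
+Hardware-accelerated machine learning primitives (called "operators") are the building blocks of DirectML. From these building blocks you can develop machine learning techniques such as upscaling, anti-aliasing, and style transfer. For example, denoising and super-resolution let you achieve impressive ray-traced effects with fewer rays per pixel.
+
+For DirectML sample applications, including a minimal DirectML application, see [DirectML sample applications](https://learn.microsoft.com/zh-cn/windows/ai/directml/dml-min-app).
+
+**DirectML was introduced in Windows 10, version 1903, and in the corresponding version of the Windows SDK.**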
+
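+### How Do I Use DirectML Acceleration in RapidOCR?
+
+`rapidocr_onnxruntime>=1.3.23` provides switches for enabling DirectML. Once the requirements below are met, DirectML can accelerate OCR inference.
+
+To use DirectML acceleration, the following requirements must be met:
+
+- [x] The device runs Windows 10, version 1903 or later
+- [x] `rapidocr_onnxruntime>=1.3.23` is installed
+- [x] The `onnxruntime-directml` package is installed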
+
+### Step-by-Step Usage
+
+First, confirm that your device runs Windows 10, version 1903 or later.
+
+#### Install `rapidocr_onnxruntime>=1.3.23`
+
+```bash
+pip install "rapidocr_onnxruntime>=1.3.23"
+```
+
+#### Install `onnxruntime-directml`
+
+```bash
+# First uninstall the onnxruntime that the previous step installed by default
+pip uninstall onnxruntime
+
+# Install onnxruntime-directml
+pip install onnxruntime-directml
+```
+
+#### Usage in Python
+
+```python linenums="1"
+from rapidocr_onnxruntime import RapidOCR
+
+engine = RapidOCR()
+
+img_path = 'tests/test_files/ch_en_num.jpg'
+
+# All three switches default to False
+result, elapse = engine(img_path, det_use_dml=True, cls_use_dml=True, rec_use_dml=True)
+print(result)
+print(elapse)
+```
+
+### Discussions About DirectML
+
+- [Discussions #175](https://github.com/RapidAI/RapidOCR/discussions/175)
+
+[^microsoft]: <https://learn.microsoft.com/zh-cn/windows/ai/directml/dml>
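Before turning on `det_use_dml`, `cls_use_dml`, and `rec_use_dml`, it can help to check the requirements listed above in code. The sketch below is only an illustration and does not mirror RapidOCR's internal logic. It assumes that Windows 10, version 1903, corresponds to OS build 18362. It also assumes that `onnxruntime-directml` registers its execution provider as `DmlExecutionProvider`. The helpers `os_supports_dml`, `current_os_supports_dml`, and `choose_providers` are hypothetical names. Only `onnxruntime.get_available_providers()` is a real onnxruntime call.

```python
import platform
import sys
from typing import List, Sequence

# Assumption: Windows 10, version 1903 == OS build 18362 (Windows 11 builds are higher).
MIN_DML_BUILD = 18362
DML_PROVIDER = "DmlExecutionProvider"
CPU_PROVIDER = "CPUExecutionProvider"


def os_supports_dml(system: str, build: int) -> bool:
    """True if the OS is Windows at build 18362 (Windows 10 1903) or later."""
    return system == "Windows" and build >= MIN_DML_BUILD


def current_os_supports_dml() -> bool:
    """Check the machine this code runs on."""
    system = platform.system()
    if system != "Windows":
        return False
    # sys.getwindowsversion() only exists on Windows, hence the guard above.
    return os_supports_dml(system, sys.getwindowsversion().build)


def choose_providers(available: Sequence[str], want_dml: bool) -> List[str]:
    """Put DirectML first when requested and available; always keep CPU as a fallback."""
    providers = []
    if want_dml and DML_PROVIDER in available:
        providers.append(DML_PROVIDER)
    providers.append(CPU_PROVIDER)
    return providers


if __name__ == "__main__":
    try:
        import onnxruntime as ort
        available = ort.get_available_providers()
    except ImportError:
        available = []  # onnxruntime (or onnxruntime-directml) is not installed
    use_dml = current_os_supports_dml()
    print("OS supports DirectML:", use_dml)
    print("Providers to use:", choose_providers(available, use_dml))
```

If you use onnxruntime directly rather than through RapidOCR, a list like this is what you pass as `providers=` to `onnxruntime.InferenceSession`. Keeping `CPUExecutionProvider` last lets nodes that DirectML cannot run fall back to the CPU.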
diff --git a/docs/blog/posts/inference_engine/onnxruntime/infer_optim.md b/docs/blog/posts/inference_engine/onnxruntime/infer_optim.md
@@ -35,11 +35,11 @@ sess_options.enable_cpu_mem_arena = False
 
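 - Purpose: enables the **memory arena** on the CPU. The arena may preallocate a lot of memory for future use. To disable it, set `enable_cpu_mem_arena=False`; the default is `True`.
 - Conclusion: disabling it is recommended
-  - When enabled, memory usage soars (5618.3M >> 5.3M) and stays allocated without being released, while inference time improves by only about 13%
+    - When enabled, memory usage soars (5618.3M >> 5.3M) and stays allocated without being released, while inference time improves by only about 13%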
 
 - Test environment:
-  - Python: 3.7.13
-  - ONNXRuntime: 1.14.1
+    - Python: 3.7.13
+    - ONNXRuntime: 1.14.1
 - Test code (from [issue 11627](https://github.com/microsoft/onnxruntime/issues/11627), [enable_cpu_memory_area_example.zip](https://github.com/microsoft/onnxruntime/files/8772315/enable_cpu_memory_area_example.zip))
 
     ```python linenums="1"
@@ -72,7 +72,7 @@ sess_options.enable_cpu_mem_arena = False
 - Results are roughly the same on Windows, Mac, and Linux
     <details>
 
-  - `enable_cpu_mem_arena=True`
+    - `enable_cpu_mem_arena=True`
 
         ```bash linenums="1"
         (demo) PS G:> python .\test_enable_cpu_mem_arena.py
@@ -90,7 +90,7 @@ sess_options.enable_cpu_mem_arena = False
             12   5695.5 MiB      0.0 MiB           1       return preds
         ```
 
-  - `enable_cpu_mem_arena=False`
+    - `enable_cpu_mem_arena=False`
 
         ```bash linenums="1"
         (demo) PS G:> python .\test_enable_cpu_mem_arena.py

diff --git a/docs/blog/posts/inference_engine/openvino/infer.md b/docs/blog/posts/inference_engine/openvino/infer.md
@@ -46,7 +46,7 @@ $ pip install openvino-dev
 
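 - When running the same ONNX model, OpenVINO infers faster than ONNXRuntime
 - However, the comparison shows that OpenVINO uses more memory, because it trades space for time
-  - Specifying `input_shape` as a range reduces memory usage during inference somewhat. Example command:
+    - Specifying `input_shape` as a range reduces memory usage during inference somewhat. Example command: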
 
     ```bash linenums="1"
     mo --input_model models/ch_PP-OCRv2_det_infer.onnx \