Extra character in recognition result #317

hbh112233abc · 2025-01-02T10:30:30Z

Problem Description

Recognizing this image returns one extra '2' character: the first '2' in the output below is spurious.

time used:3.3362183570861816s
(
 [
    [[[0.0, 9.0], [23.0, 7.0], [25.0, 43.0], [3.0, 44.0]], '2', 0.9986347556114197], 
    [[[4.0, 0.0], [108.0, 0.0], [108.0, 56.0], [4.0, 56.0]], '2025', 0.9997337460517883], 
    [[[143.0, 10.0], [196.0, 1.0], [203.0, 42.0], [150.0, 51.0]], '01', 0.9996651709079742], 
    [[[221.0, 3.0], [278.0, 3.0], [278.0, 48.0], [221.0, 48.0]], '02', 0.9998335540294647]
], 
[2.569706, 0.08697724342346191, 0.5999832153320312]
)
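
In the output above, the box of the spurious '2' lies almost entirely inside the box of '2025'. As a possible post-processing workaround (not part of RapidOCR, and not a fix for the detection itself), results whose box is mostly covered by a larger box could be dropped. The sketch below is a heuristic under stated assumptions: the helper names and the 0.8 threshold are hypothetical, it compares axis-aligned bounding rectangles rather than the exact quadrilaterals, and it could also discard legitimate text that really does sit inside another box.

```python
def bounding_rect(quad):
    """Axis-aligned bounding rectangle (x0, y0, x1, y1) of a 4-point box."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return min(xs), min(ys), max(xs), max(ys)


def rect_area(quad):
    """Area of the bounding rectangle of a 4-point box."""
    x0, y0, x1, y1 = bounding_rect(quad)
    return (x1 - x0) * (y1 - y0)


def contained_fraction(inner, outer):
    """Fraction of inner's bounding-rect area that overlaps outer's."""
    ax0, ay0, ax1, ay1 = bounding_rect(inner)
    bx0, by0, bx1, by1 = bounding_rect(outer)
    w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    h = max(0.0, min(ay1, by1) - max(ay0, by0))
    area = rect_area(inner)
    return (w * h) / area if area > 0 else 0.0


def drop_nested_boxes(results, threshold=0.8):
    """Drop results whose box is mostly covered by a larger box.

    `results` is a list of [box, text, score] entries, as in the output
    above. `threshold` is an assumed value, not a RapidOCR setting.
    """
    kept = []
    for i, (box, text, score) in enumerate(results):
        nested = any(
            j != i
            and rect_area(other) > rect_area(box)
            and contained_fraction(box, other) >= threshold
            for j, (other, _, _) in enumerate(results)
        )
        if not nested:
            kept.append([box, text, score])
    return kept


# The four detections reported in this issue.
sample = [
    [[[0.0, 9.0], [23.0, 7.0], [25.0, 43.0], [3.0, 44.0]], '2', 0.9986347556114197],
    [[[4.0, 0.0], [108.0, 0.0], [108.0, 56.0], [4.0, 56.0]], '2025', 0.9997337460517883],
    [[[143.0, 10.0], [196.0, 1.0], [203.0, 42.0], [150.0, 51.0]], '01', 0.9996651709079742],
    [[[221.0, 3.0], [278.0, 3.0], [278.0, 48.0], [221.0, 48.0]], '02', 0.9998335540294647],
]
filtered = drop_nested_boxes(sample)
print([text for _, text, _ in filtered])  # ['2025', '01', '02']
```

On this sample, about 84% of the '2' box's bounding rectangle falls inside the '2025' box, so it is dropped at the 0.8 threshold while the other three results are kept.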

Runtime Environment

win10
python3.9
rapidocr-onnxruntime 1.4.3

Reproduction Code

import time
from rapidocr_onnxruntime import RapidOCR

rapid_ocr = RapidOCR()
img = "path/to/image"  # replace with the actual image path
st = time.time()
res = rapid_ocr(img)
et = time.time()
print(f"time used:{et-st}s")
print(res)

Possible Solutions

Recognizing the same image directly with paddleocr v2.9.1 and the v4 mobile model works without this problem.

hbh112233abc · 2025-01-04T07:04:16Z

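The problem may be that the image is too small; padding it with a white border solves it.
I saw that RapidOCR accepts a min_side_len parameter and thought that would fix it: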

rapid_ocr = RapidOCR(None,min_side_len=640)

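The extra character is still recognized.

I now preprocess the image first: when either dimension is smaller than 640, I pad it with the background color up to 640 and then run OCR: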

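import cv2
from rapidocr_onnxruntime import RapidOCR

min_size = 640
im = cv2.imread("path/to/image")  # replace with the actual image path

# Check whether either image dimension is smaller than min_size
height, width = im.shape[:2]
if height < min_size or width < min_size:
    # Compute how much border to add on each side
    top = max(0, (min_size - height) // 2)
    bottom = max(0, min_size - height - top)
    left = max(0, (min_size - width) // 2)
    right = max(0, min_size - width - left)

    # Take the background color from the top-left pixel
    # (assumes that pixel belongs to the background)
    background_color = [int(x) for x in im[0, 0]]

    # Add a border in the same color as the background
    im = cv2.copyMakeBorder(
        im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=background_color
    )
rapid_ocr = RapidOCR()
res = rapid_ocr(im)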

With this preprocessing, small images are now recognized correctly in almost all cases.

SWHL self-assigned this Jan 3, 2025
