PDFs containing many figures and tables parse very slowly or hang #1315

Open
darvsum opened this issue Dec 18, 2024 · 1 comment
Labels
bug Something isn't working

Comments


darvsum commented Dec 18, 2024

Description of the bug

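I uploaded a standard (non-scanned) PDF file, a little over 20 MB and more than 300 pages long, containing a large number of figures and tables. In both online and offline tests, parsing produced no result after more than half an hour.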
[Screenshot: 1734490076642]
[Screenshot: 1734490155027]

How to reproduce the bug

Upload a prospectus document that contains many figures and tables.

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.10.x

Device mode

cuda

darvsum added the bug label Dec 18, 2024
Collaborator

myhloli commented Dec 18, 2024

Try parsing 10 pages with ModelScope or Hugging Face and see how long that takes.
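A rough way to act on this suggestion locally is to time one run on a 10-page sample and scale the result to the full page count. The sketch below is only an illustration: the helper names `time_command` and `extrapolate`, the file name `sample_10_pages.pdf`, and the `magic-pdf -p ... -o ... -m auto` invocation in the comment are assumptions, not something confirmed in this thread, and linear scaling is a simplification because pages dense with figures and tables may take longer than average.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def extrapolate(elapsed_s, sample_pages, total_pages):
    """Linearly scale a sample timing to the full document (a rough estimate only)."""
    return elapsed_s / sample_pages * total_pages


# Hypothetical usage; the command line and file name are assumptions:
# elapsed = time_command(
#     ["magic-pdf", "-p", "sample_10_pages.pdf", "-o", "out", "-m", "auto"]
# )
# print(f"10 pages: {elapsed:.1f} s, "
#       f"estimate for 300 pages: {extrapolate(elapsed, 10, 300):.0f} s")
```

If the 10-page sample finishes in reasonable time but the estimate for the full file is far below the half hour observed, the slowdown is likely tied to specific pages rather than to overall throughput.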
