Description of the bug | 错误描述

I am using MinerU 1.0.5 for PDF conversion (including formulas, tables, etc.). On a single A100 GPU with workers_per_device=4, throughput is about 2 documents per minute. However, when I try to use multiple GPUs by opening multiple ports (one server per card; a sketch of this launch pattern follows below), or when I set workers_per_device to a larger value, and then have a separate client program send files to each port, throughput does not improve; it actually drops to about 1 document per minute across all cards combined (not 1 per minute per card).
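For illustration, a per-GPU, per-port launch of the kind described above might look like the following. The GPU ids, ports, and the SERVER_PORT environment variable are assumptions for this sketch; the server.py posted below hardcodes its port and CUDA_VISIBLE_DEVICES, so it would need a small change to read them from the environment:

```python
# Hypothetical launcher sketch: one server process per GPU, each on its own port.
# Assumes server.py is adapted to read SERVER_PORT and CUDA_VISIBLE_DEVICES
# from the environment instead of hardcoding them.
import os
import subprocess

for gpu_id, port in [(0, 8000), (1, 8001), (2, 8002), (3, 8003)]:
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id), SERVER_PORT=str(port))
    subprocess.Popen(['python', 'server.py'], env=env)
```

A separate client then posts files to its assigned port, as described above.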
Looking at the detailed logs, I found that opening multiple ports during batch processing slows down per-page processing. From earlier issues, it appears that the 1.0.x releases do not support server-side parallel processing across multiple GPUs (once started, the server only runs on a single card), while 0.9.x did support this.

Is there any way to optimize the current setup to make full use of the multi-GPU resources and improve throughput? Thanks!
How to reproduce the bug | 如何复现

My server.py:
```python
import os
import fitz  # PyMuPDF
import torch
import base64
import litserve as ls
from uuid import uuid4
from fastapi import HTTPException
from filetype import guess_extension
from magic_pdf.tools.common import do_parse
from magic_pdf.model.doc_analyze_by_custom_model import ModelSingleton


class MinerUAPI(ls.LitAPI):
    def __init__(self, output_dir='/tmp'):
        self.output_dir = output_dir

    def setup(self, device):
        model_manager = ModelSingleton()
        model_manager.get_model(True, False)
        model_manager.get_model(False, False)
        print(f'Model initialization complete on {device}!')

    def decode_request(self, request):
        # Decode the file and extract the file name
        file = request['file']
        file = self.to_pdf(file)
        file_name = request.get('file_name', str(uuid4()))  # take the file name from the request; fall back to a UUID
        opts = request.get('kwargs', {})
        opts.setdefault('debug_able', False)
        opts.setdefault('parse_method', 'auto')
        return file, opts, file_name

    def predict(self, inputs):
        try:
            pdf_bytes, parse_opts, pdf_name = inputs
            do_parse(self.output_dir, pdf_name, pdf_bytes, [], **parse_opts)  # use pdf_name as the output subdirectory and file name
            return pdf_name
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))
        finally:
            self.clean_memory()

    def encode_response(self, response):
        # Return the directory where the output is stored
        return {'output_dir': os.path.join(self.output_dir, response)}

    def clean_memory(self):
        import gc
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()
        gc.collect()

    def to_pdf(self, file_base64):
        try:
            file_bytes = base64.b64decode(file_base64)
            file_ext = guess_extension(file_bytes)
            with fitz.open(stream=file_bytes, filetype=file_ext) as f:
                if f.is_pdf:
                    return f.tobytes()
                return f.convert_to_pdf()
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))


if __name__ == '__main__':
    os.environ['CUDA_VISIBLE_DEVICES'] = '2'
    print(f"CUDA_VISIBLE_DEVICES set to: {os.environ['CUDA_VISIBLE_DEVICES']}")
    server = ls.LitServer(
        MinerUAPI(output_dir='./tmp1'),
        accelerator='cuda',
        devices=[0],  # logical device 0, which maps to physical GPU 2
        workers_per_device=4,
        timeout=False
    )
    server.run(port=8000)
```
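For completeness, here is a minimal client sketch matching the request format that decode_request above expects; the file path and host are examples, and /predict is LitServe's default endpoint path:

```python
import base64
import requests

# Base64-encode the PDF, matching the 'file' field decode_request expects.
with open('sample.pdf', 'rb') as f:  # example input file
    payload = {
        'file': base64.b64encode(f.read()).decode(),
        'file_name': 'sample',               # optional; the server falls back to a UUID
        'kwargs': {'parse_method': 'auto'},  # forwarded to do_parse
    }

resp = requests.post('http://localhost:8000/predict', json=payload)
print(resp.json())  # e.g. {'output_dir': './tmp1/sample'}
```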
Operating system | 操作系统

Linux

Python version | Python 版本

3.10

Software version | 软件版本 (magic-pdf --version)

0.10.x

Device mode | 设备模式

cuda