Description of the bug | 错误描述

I am using MinerU 1.0.5 for PDF conversion (including formulas, tables, etc.). On a single A100 GPU with workers_per_device=4, throughput is about 2 documents per minute. However, when I try to use multiple GPUs by opening multiple ports (one server per card; a sketch of this launch pattern follows below), or when I set workers_per_device to a larger value, and then have a separate client program send files to each port, throughput does not improve; it actually drops to about 1 document per minute across all cards combined (not 1 per minute per card).
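For illustration, a per-GPU, per-port launch of the kind described above might look like the following. The GPU ids, ports, and the SERVER_PORT environment variable are assumptions for this sketch; the server.py posted below hardcodes its port and CUDA_VISIBLE_DEVICES, so it would need a small change to read them from the environment:

```python
# Hypothetical launcher sketch: one server process per GPU, each on its own port.
# Assumes server.py is adapted to read SERVER_PORT and CUDA_VISIBLE_DEVICES
# from the environment instead of hardcoding them.
import os
import subprocess

for gpu_id, port in [(0, 8000), (1, 8001), (2, 8002), (3, 8003)]:
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id), SERVER_PORT=str(port))
    subprocess.Popen(['python', 'server.py'], env=env)
```

A separate client then posts files to its assigned port, as described above.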
Looking at the detailed logs, I found that opening multiple ports during batch processing slows down per-page processing. From earlier issues, it appears that the 1.0.x releases do not support server-side parallel processing across multiple GPUs (once started, the server only runs on a single card), while 0.9.x did support this.

Is there any way to optimize the current setup to make full use of the multi-GPU resources and improve throughput? Thanks!
How to reproduce the bug | 如何复现

My server.py:
```python
import os
import fitz  # PyMuPDF
import torch
import base64
import litserve as ls
from uuid import uuid4
from fastapi import HTTPException
from filetype import guess_extension
from magic_pdf.tools.common import do_parse
from magic_pdf.model.doc_analyze_by_custom_model import ModelSingleton


class MinerUAPI(ls.LitAPI):
    def __init__(self, output_dir='/tmp'):
        self.output_dir = output_dir

    def setup(self, device):
        model_manager = ModelSingleton()
        model_manager.get_model(True, False)
        model_manager.get_model(False, False)
        print(f'Model initialization complete on {device}!')

    def decode_request(self, request):
        # Decode the file and extract the file name
        file = request['file']
        file = self.to_pdf(file)
        file_name = request.get('file_name', str(uuid4()))  # take the file name from the request; fall back to a UUID
        opts = request.get('kwargs', {})
        opts.setdefault('debug_able', False)
        opts.setdefault('parse_method', 'auto')
        return file, opts, file_name

    def predict(self, inputs):
        try:
            pdf_bytes, parse_opts, pdf_name = inputs
            do_parse(self.output_dir, pdf_name, pdf_bytes, [], **parse_opts)  # use pdf_name as the output subdirectory and file name
            return pdf_name
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))
        finally:
            self.clean_memory()

    def encode_response(self, response):
        # Return the directory where the output is stored
        return {'output_dir': os.path.join(self.output_dir, response)}

    def clean_memory(self):
        import gc
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()
        gc.collect()

    def to_pdf(self, file_base64):
        try:
            file_bytes = base64.b64decode(file_base64)
            file_ext = guess_extension(file_bytes)
            with fitz.open(stream=file_bytes, filetype=file_ext) as f:
                if f.is_pdf:
                    return f.tobytes()
                return f.convert_to_pdf()
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))


if __name__ == '__main__':
    os.environ['CUDA_VISIBLE_DEVICES'] = '2'
    print(f"CUDA_VISIBLE_DEVICES set to: {os.environ['CUDA_VISIBLE_DEVICES']}")
    server = ls.LitServer(
        MinerUAPI(output_dir='./tmp1'),
        accelerator='cuda',
        devices=[0],  # logical device 0, which maps to physical GPU 2
        workers_per_device=4,
        timeout=False
    )
    server.run(port=8000)
```
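For completeness, here is a minimal client sketch matching the request format that decode_request above expects; the file path and host are examples, and /predict is LitServe's default endpoint path:

```python
import base64
import requests

# Base64-encode the PDF, matching the 'file' field decode_request expects.
with open('sample.pdf', 'rb') as f:  # example input file
    payload = {
        'file': base64.b64encode(f.read()).decode(),
        'file_name': 'sample',               # optional; the server falls back to a UUID
        'kwargs': {'parse_method': 'auto'},  # forwarded to do_parse
    }

resp = requests.post('http://localhost:8000/predict', json=payload)
print(resp.json())  # e.g. {'output_dir': './tmp1/sample'}
```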
Operating system | 操作系统

Linux

Python version | Python 版本

3.10

Software version | 软件版本 (magic-pdf --version)

0.10.x

Device mode | 设备模式

cuda