
A service-oriented solution for multi-GPU parallel processing (based on LitServe) #667

Closed
randydl opened this issue Sep 27, 2024 · 48 comments
Labels
enhancement New feature or request

Comments

@randydl
Contributor

randydl commented Sep 27, 2024

Supports passing in jpg, png, and pdf paths. For batch processing, you only need a simple multi-threaded call to the client's do_parse function; the server will automatically process in parallel across multiple GPUs.

pip install -U litserve python-multipart filetype
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1
pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com
pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118

server.py

import torch
import filetype
import json, uuid
import litserve as ls
from unittest.mock import patch
from fastapi import HTTPException
from magic_pdf.tools.common import do_parse
from magic_pdf.model.doc_analyze_by_custom_model import ModelSingleton


class MinerUAPI(ls.LitAPI):
    def __init__(self, output_dir='/tmp'):
        super().__init__()  # initialize LitAPI base state
        self.output_dir = output_dir

    @staticmethod
    def clean_memory(device):
        import gc
        if torch.cuda.is_available():
            with torch.cuda.device(device):
                torch.cuda.empty_cache()
                torch.cuda.ipc_collect()
        gc.collect()

    def setup(self, device):
        with patch('magic_pdf.model.doc_analyze_by_custom_model.get_device') as mock_obj:
            mock_obj.return_value = device
            model_manager = ModelSingleton()
            model_manager.get_model(True, False)
            model_manager.get_model(False, False)
            mock_obj.assert_called()
            print('Model initialization complete!')

    def decode_request(self, request):
        file = request['file'].file.read()
        kwargs = json.loads(request['kwargs'])
        assert filetype.guess_mime(file) == 'application/pdf'
        return file, kwargs

    def predict(self, inputs):
        try:
            pdf_name = str(uuid.uuid4())
            do_parse(self.output_dir, pdf_name, inputs[0], [], **inputs[1])
            return pdf_name
        except Exception as e:
            raise HTTPException(status_code=500, detail=f'{e}')
        finally:
            self.clean_memory(self.device)

    def encode_response(self, response):
        return {'output_dir': response}


if __name__ == '__main__':
    server = ls.LitServer(MinerUAPI(), accelerator='gpu', devices=[0, 1], timeout=False)
    server.run(port=8000)
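For readers unfamiliar with LitServe: with `devices=[0, 1]` it spawns one worker process per listed GPU and hands incoming requests to them, so concurrent client calls spread across the cards. A stdlib-only sketch of that dispatch idea (purely illustrative, not LitServe internals):

```python
from itertools import cycle

# Round-robin dispatch sketch: each "worker" is pinned to one GPU index,
# and successive requests are assigned to workers in turn.
workers = cycle([0, 1])  # mirrors devices=[0, 1] above
assignments = [next(workers) for _ in range(5)]
print(assignments)  # [0, 1, 0, 1, 0]
```

The actual scheduling inside LitServe may differ (e.g. it can queue to whichever worker is free), but the net effect with enough concurrent clients is the same: all listed GPUs receive work.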

client.py

import json
import pymupdf
import requests
import numpy as np
from loguru import logger
from joblib import Parallel, delayed


def to_pdf(file_path):
    with pymupdf.open(file_path) as f:
        if f.is_pdf:
            pdf_bytes = f.tobytes()
        else:
            pdf_bytes = f.convert_to_pdf()
        return pdf_bytes


def do_parse(file_path, url='http://127.0.0.1:8000/predict', **kwargs):
    try:
        kwargs.setdefault('parse_method', 'auto')
        kwargs.setdefault('debug_able', False)

        response = requests.post(url,
            data={'kwargs': json.dumps(kwargs)},
            files={'file': to_pdf(file_path)}
        )

        if response.status_code == 200:
            output = response.json()
            output['file_path'] = file_path
            return output
        else:
            raise Exception(response.text)
    except Exception as e:
        logger.error(f'File: {file_path} - Info: {e}')


if __name__ == '__main__':
    files = ['/tmp/small_ocr.pdf']
    n_jobs = np.clip(len(files), 1, 4)
    results = Parallel(n_jobs, prefer='threads', verbose=10)(
        delayed(do_parse)(p) for p in files
    )
    print(results)
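The `np.clip(len(files), 1, 4)` call above just caps the client thread count between 1 and 4. If you want a thin client without the numpy dependency, the same clamp can be written with the stdlib (hypothetical helper name):

```python
def clamp_jobs(n_files: int, lo: int = 1, hi: int = 4) -> int:
    # Same effect as np.clip(n_files, 1, 4): at least one worker thread,
    # at most four, regardless of how many input files there are.
    return max(lo, min(hi, n_files))

print(clamp_jobs(0), clamp_jobs(2), clamp_jobs(10))  # 1 2 4
```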
@randydl randydl added the enhancement New feature or request label Sep 27, 2024
@myhloli myhloli pinned this issue Sep 27, 2024
@BlackMoki-bot

BlackMoki-bot commented Sep 28, 2024

Hi, when I run the code the server keeps raising Exception: Parsing error: 'Layoutlmv3_Predictor' object has no attribute 'parameters', and the client keeps raising requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/predict
http://127.0.0.1 itself is reachable. What could be the cause? Any pointers would be appreciated!

@randydl
Contributor Author

randydl commented Sep 30, 2024

Hi, when I run the code the server keeps raising Exception: Parsing error: 'Layoutlmv3_Predictor' object has no attribute 'parameters', and the client keeps raising requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/predict. http://127.0.0.1 itself is reachable. What could be the cause?

Judging from the error, the problem is in your processing code, not in the service.

@randydl randydl changed the title A multi-GPU parallel API solution for everyone, based on LitServe (FastAPI) A service-oriented implementation for multi-GPU parallel processing (based on LitServe) Oct 8, 2024
@randydl randydl changed the title A service-oriented implementation for multi-GPU parallel processing (based on LitServe) A service-oriented solution for multi-GPU parallel processing (based on LitServe) Oct 8, 2024
@flow3rdown

After using this code, table recognition becomes extremely slow. What could be the reason?

@randydl
Contributor Author

randydl commented Oct 12, 2024

After using this code, table recognition becomes extremely slow. What could be the reason?

Is it also slow when you run it via the magic-pdf CLI instead of the service?

@flow3rdown

After using this code, table recognition becomes extremely slow. What could be the reason?

Is it also slow when you run it via the magic-pdf CLI instead of the service?

With the CLI the speed is normal; table recognition uses TableMaster.

@PoisonousBromineChan

I couldn't quite work out how to use the code, so out of habit I started server.py first, changed the file path in client.py to my own, and ran it. The error I then got referred to small_ocr.pdf, even though none of the files I want to process is small_ocr.pdf, and I don't know how to resolve it.
Is there a simpler approach, such as editing magic-pdf.json directly and setting its device field to multiple CUDA devices?

@randydl
Contributor Author

randydl commented Oct 16, 2024

You have probably modified the code incorrectly; it runs fine on my side. If you changed the file path, small_ocr.pdf cannot still show up: it is just an example file. @PoisonousBromineChan

@flow3rdown

You have probably modified the code incorrectly; it runs fine on my side. If you changed the file path, small_ocr.pdf cannot still show up: it is just an example file. @PoisonousBromineChan

Is table recognition speed normal when you run it?

@ywh-my

ywh-my commented Oct 18, 2024

Thanks, I got it running. After additionally installing pip install python-multipart, requests to the server succeeded.
Also, if you only want the .md output, to save storage space and time, you can add:
from magic_pdf.libs.MakeContentConfig import MakeMode  # add this line

and modify the do_parse call:

        do_parse(self.output_dir, pdf_name, inputs[0], [],
                 **inputs[1],
                 f_draw_span_bbox=False,
                 f_draw_layout_bbox=False,
                 f_dump_md=True,
                 f_dump_middle_json=False,
                 f_dump_model_json=False,
                 f_dump_orig_pdf=False,
                 f_dump_content_list=False,
                 f_make_md_mode=MakeMode.MM_MD,
                 f_draw_model_bbox=False)

@randydl
Contributor Author

randydl commented Oct 18, 2024

You have probably modified the code incorrectly; it runs fine on my side. If you changed the file path, small_ocr.pdf cannot still show up: it is just an example file. @PoisonousBromineChan

Is table recognition speed normal when you run it?

I haven't tested tables yet; I'll give it a try when I have time.

@234687552

Problem description:

Using LitServe as in server.py, table recognition is extremely slow.

System & environment:

PRETTY_NAME="Ubuntu 24.04 LTS"

Python 3.10.14

magic-pdf version 0.7.1

paddlepaddle-gpu 3.0.0b1

magic-pdf.json configuration

{
    "bucket_info":{
        "bucket-name-1":["ak", "sk", "endpoint"],
        "bucket-name-2":["ak", "sk", "endpoint"]
    },
    "models-dir":"/opt/models",
    "device-mode":"cuda",
    "table-config": {
        "model": "TableMaster",
        "is_table_recog_enable": true,
        "max_time": 400
    }
}

Test PDF link:

https://github.com/opendatalab/MinerU/blob/master/demo/demo1.pdf

Using LitServe

Output log:

2024-10-19 21:10:57.105 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 0, text_len: 1501, cid_chars_radio: 0.0
2024-10-19 21:10:57.861 | INFO | magic_pdf.model.pdf_extract_kit:__call__:170 - layout detection cost: 0.68
Model initialization complete!
Setup complete for worker 3.

0: 1888x1344 4 embeddings, 92.2ms
Speed: 12.7ms preprocess, 92.2ms inference, 13.2ms postprocess per image at shape (1, 3, 1888, 1344)
2024-10-19 21:10:58.633 | INFO | magic_pdf.model.pdf_extract_kit:__call__:200 - formula nums: 4, mfr time: 0.2
2024-10-19 21:10:58.640 | INFO | magic_pdf.model.pdf_extract_kit:__call__:291 - ------------------table recognition processing begins-----------------
2024-10-19 21:14:13.524 | INFO | magic_pdf.model.pdf_extract_kit:__call__:300 - ------------table recognition processing ends within 194.88404989242554s-----
2024-10-19 21:14:13.525 | INFO | magic_pdf.model.pdf_extract_kit:__call__:317 - table cost: 194.89
2024-10-19 21:14:13.525 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:124 - doc analyze cost: 196.3451521396637
2024-10-19 21:14:13.567 | INFO | magic_pdf.pdf_parse_union_core:pdf_parse_union:221 - page_id: 0, last_page_cost_time: 0.0
2024-10-19 21:14:13.663 | INFO | magic_pdf.para.para_split_v2:__detect_list_lines:143 - 发现了列表,列表行数:[(0, 1)], [[0]]
2024-10-19 21:14:13.663 | INFO | magic_pdf.para.para_split_v2:__detect_list_lines:156 - 列表行的第0到第1行是列表
2024-10-19 21:14:13.797 | INFO | magic_pdf.pipe.UNIPipe:pipe_mk_markdown:48 - uni_pipe mk mm_markdown finished
2024-10-19 21:14:13.805 | INFO | magic_pdf.pipe.UNIPipe:pipe_mk_uni_format:43 - uni_pipe mk content list finished
2024-10-19 21:14:13.805 | INFO | magic_pdf.tools.common:do_parse:119 - local output dir is /tmp/91dc2fda-fb5c-431f-bbce-9dcdc8ce3596/auto

Using the CLI

/opt/mineru_venv/bin/magic-pdf -p origin.pdf -m auto

Output log:

[10/19 21:41:53 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /opt/models/Layout/model_final.pth ...
[10/19 21:41:53 fvcore.common.checkpoint]: [Checkpointer] Loading from /opt/models/Layout/model_final.pth ...
2024-10-19 21:41:56.518 | INFO     | magic_pdf.model.pdf_extract_kit:__init__:159 - DocAnalysis init done!
2024-10-19 21:41:56.518 | INFO     | magic_pdf.model.doc_analyze_by_custom_model:custom_model_init:98 - model init cost: 21.35542368888855
2024-10-19 21:41:57.207 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:170 - layout detection cost: 0.61

0: 1888x1344 4 embeddings, 91.9ms
Speed: 9.7ms preprocess, 91.9ms inference, 1.1ms postprocess per image at shape (1, 3, 1888, 1344)
2024-10-19 21:41:57.948 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:200 - formula nums: 4, mfr time: 0.19
2024-10-19 21:41:57.956 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:291 - ------------------table recognition processing begins-----------------
[2024/10/19 21:41:59] ppocr DEBUG: dt_boxes num : 18, elapse : 0.045398712158203125
[2024/10/19 21:41:59] ppocr DEBUG: dt_boxes num : 18, elapse : 0.045398712158203125
[2024/10/19 21:41:59] ppocr DEBUG: rec_res num  : 18, elapse : 0.047318220138549805
[2024/10/19 21:41:59] ppocr DEBUG: rec_res num  : 18, elapse : 0.047318220138549805
2024-10-19 21:41:59.425 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:300 - ------------table recognition processing ends within 1.4687747955322266s-----
2024-10-19 21:41:59.425 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:317 - table cost: 1.47
2024-10-19 21:41:59.425 | INFO     | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:124 - doc analyze cost: 2.828835964202881
2024-10-19 21:41:59.467 | INFO     | magic_pdf.pdf_parse_union_core:pdf_parse_union:221 - page_id: 0, last_page_cost_time: 0.0
2024-10-19 21:42:00.020 | INFO     | magic_pdf.para.para_split_v2:__detect_list_lines:143 - 发现了列表,列表行数:[(0, 1)], [[0]]
2024-10-19 21:42:00.020 | INFO     | magic_pdf.para.para_split_v2:__detect_list_lines:156 - 列表行的第0到第1行是列表
2024-10-19 21:42:00.154 | INFO     | magic_pdf.pipe.UNIPipe:pipe_mk_markdown:48 - uni_pipe mk mm_markdown finished
2024-10-19 21:42:00.162 | INFO     | magic_pdf.pipe.UNIPipe:pipe_mk_uni_format:43 - uni_pipe mk content list finished
2024-10-19 21:42:00.162 | INFO     | magic_pdf.tools.common:do_parse:119 - local output dir is output/origin/auto

@234687552

234687552 commented Oct 22, 2024

I wonder if this is what makes table recognition so slow:

https://github.com/opendatalab/MinerU/blob/master/magic_pdf/model/ppTableModel.py#L46


@myhloli
Collaborator

myhloli commented Oct 22, 2024

I wonder if this is what makes table recognition so slow:

https://github.com/opendatalab/MinerU/blob/master/magic_pdf/model/ppTableModel.py#L46

That is indeed the cause: the device-matching rule is hard-coded there, and we will fix it.
For now you can temporarily change it to:

use_gpu = True if device.startswith("cuda") else False
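The point of the `startswith` form is that worker devices arrive as indexed strings like `cuda:1`, which a plain equality test against `"cuda"` rejects. A minimal check of the predicate (hypothetical wrapper name, same logic as the one-liner above):

```python
def use_gpu_flag(device: str) -> bool:
    # "cuda", "cuda:0", "cuda:1" all enable the GPU path; an equality
    # test (device == "cuda") would wrongly fall back to CPU for the
    # indexed device strings handed out by the multi-GPU server.
    return device.startswith("cuda")

print(use_gpu_flag("cuda:1"), use_gpu_flag("cpu"))  # True False
```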

@234687552

Problem description:

Serving the API as in server.py and load-testing with 15 concurrent requests on 4 GPUs, gpu[0] is always saturated while the other GPUs stay relatively idle.

Expected result:

The GPU load is spread evenly.

Command run during the experiment:

nvidia-smi --loop=1

Output log:

                                                                                   
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Wed Oct 23 19:59:02 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:D3:00.0 Off |                    0 |
| N/A   68C    P0            228W /  350W |   19876MiB /  46068MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:D4:00.0 Off |                    0 |
| N/A   50C    P0            146W /  350W |    9629MiB /  46068MiB |     38%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:D6:00.0 Off |                    0 |
| N/A   45C    P0            154W /  350W |    9629MiB /  46068MiB |     46%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:D7:00.0 Off |                    0 |
| N/A   45C    P0             90W /  350W |    9629MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Wed Oct 23 19:59:04 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:D3:00.0 Off |                    0 |
| N/A   68C    P0            246W /  350W |   20234MiB /  46068MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:D4:00.0 Off |                    0 |
| N/A   51C    P0            155W /  350W |    9629MiB /  46068MiB |     43%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:D6:00.0 Off |                    0 |
| N/A   43C    P0            130W /  350W |    9629MiB /  46068MiB |      5%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:D7:00.0 Off |                    0 |
| N/A   45C    P0             93W /  350W |    9629MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Wed Oct 23 19:59:05 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:D3:00.0 Off |                    0 |
| N/A   68C    P0            217W /  350W |   20234MiB /  46068MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:D4:00.0 Off |                    0 |
| N/A   50C    P0            158W /  350W |    9629MiB /  46068MiB |     34%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:D6:00.0 Off |                    0 |
| N/A   43C    P0             88W /  350W |    9629MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:D7:00.0 Off |                    0 |
| N/A   45C    P0             90W /  350W |    9629MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+


@randydl
Contributor Author

randydl commented Oct 24, 2024

@234687552 Do you have table recognition enabled? If so, try disabling it and re-running the load-balancing test; that will tell us whether table recognition is the cause.

@randydl
Contributor Author

randydl commented Oct 24, 2024

Thanks, I got it running. After additionally installing pip install python-multipart, requests to the server succeeded. Also, if you only want the .md output, to save storage space and time, you can add: from magic_pdf.libs.MakeContentConfig import MakeMode  # add this line

and modify the do_parse call:

        do_parse(self.output_dir, pdf_name, inputs[0], [],
                 **inputs[1],
                 f_draw_span_bbox=False,
                 f_draw_layout_bbox=False,
                 f_dump_md=True,
                 f_dump_middle_json=False,
                 f_dump_model_json=False,
                 f_dump_orig_pdf=False,
                 f_dump_content_list=False,
                 f_make_md_mode=MakeMode.MM_MD,
                 f_draw_model_bbox=False)

A simpler way is to pass these parameters when calling do_parse in the client; there is no need to modify the server code.
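Concretely: the client's do_parse forwards extra keyword arguments into the `kwargs` JSON field, which the server decodes and splats into magic_pdf's do_parse. A sketch of the payload this produces (flag names taken from the snippet quoted above):

```python
import json

# What the client puts in data={'kwargs': ...} when you call, e.g.,
# do_parse(path, f_dump_md=True, f_dump_middle_json=False):
kwargs = {'parse_method': 'auto', 'debug_able': False,
          'f_dump_md': True, 'f_dump_middle_json': False}
encoded = json.dumps(kwargs)

# The server side does json.loads(request['kwargs']) and then **kwargs.
decoded = json.loads(encoded)
print(decoded['f_dump_md'])  # True
```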

@234687552

234687552 commented Oct 24, 2024

@234687552 Do you have table recognition enabled? If so, try disabling it and re-running the load-balancing test; that will tell us whether table recognition is the cause.

Status:

Table recognition was previously enabled: "is_table_recog_enable": true,

After disabling it and re-testing: gpu[0] no longer stays saturated, and the GPUs run relatively evenly.

Table recognition disabled:

cat ~/magic-pdf.json

{
    "bucket_info":{
        "bucket-name-1":["ak", "sk", "endpoint"],
        "bucket-name-2":["ak", "sk", "endpoint"]
    },
    "models-dir":"/opt/models",
    "device-mode":"cuda",
    "table-config": {
        "model": "TableMaster",
        "is_table_recog_enable": false,
        "max_time": 400
    }
}

GPU usage:

nvidia-smi --loop=1

                                                                                        
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Thu Oct 24 10:07:57 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:D3:00.0 Off |                    0 |
| N/A   58C    P0            169W /  350W |   15238MiB /  46068MiB |     47%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:D4:00.0 Off |                    0 |
| N/A   59C    P0            165W /  350W |    9627MiB /  46068MiB |     43%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:D6:00.0 Off |                    0 |
| N/A   54C    P0            154W /  350W |    9627MiB /  46068MiB |     22%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:D7:00.0 Off |                    0 |
| N/A   53C    P0            109W /  350W |    9619MiB /  46068MiB |     15%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Thu Oct 24 10:07:58 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:D3:00.0 Off |                    0 |
| N/A   62C    P0            193W /  350W |   15238MiB /  46068MiB |     76%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:D4:00.0 Off |                    0 |
| N/A   60C    P0            175W /  350W |    9627MiB /  46068MiB |     48%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:D6:00.0 Off |                    0 |
| N/A   52C    P0            176W /  350W |    9627MiB /  46068MiB |     56%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:D7:00.0 Off |                    0 |
| N/A   60C    P0            192W /  350W |    9629MiB /  46068MiB |     79%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Thu Oct 24 10:08:00 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:D3:00.0 Off |                    0 |
| N/A   57C    P0            204W /  350W |   15238MiB /  46068MiB |     42%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:D4:00.0 Off |                    0 |
| N/A   59C    P0            189W /  350W |    9627MiB /  46068MiB |     86%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:D6:00.0 Off |                    0 |
| N/A   51C    P0            114W /  350W |    9627MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:D7:00.0 Off |                    0 |
| N/A   54C    P0            114W /  350W |    9629MiB /  46068MiB |     19%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+


@234687552

In our real deployment table recognition must stay enabled; I don't yet know how to make table recognition balance across multiple GPUs on one machine as well.

@randydl
Contributor Author

randydl commented Oct 24, 2024

In our real deployment table recognition must stay enabled; I don't yet know how to make table recognition balance across multiple GPUs on one machine as well.

It seems my guess was right: this is still caused by the table-recognition bug. Somewhere in the code the table model is probably still loaded with .cuda(), which does not recognize devices like cuda:1, so every table model lands on GPU 0 and GPU 0 is saturated.

@randydl
Contributor Author

randydl commented Oct 24, 2024

For the TableMaster table-recognition model, the bug is here:
https://github.com/opendatalab/MinerU/blob/master/magic_pdf/model/ppTableModel.py#L55
Changing only use_gpu = True if device == "cuda" else False is not enough; the use_gpu variable needs further investigation.

For the struct_eqtable table model, the bug is here:
https://github.com/opendatalab/MinerU/blob/master/magic_pdf/model/pek_sub_modules/structeqtable/StructTableModel.py#L9
This one should be easy to fix: changing it to self.model = StructTable(self.model_path, self.max_new_tokens, self.max_time).to(device) should work.

@myhloli @234687552

@myhloli
Collaborator

myhloli commented Oct 24, 2024

For the TableMaster table-recognition model, the bug is here: https://github.com/opendatalab/MinerU/blob/master/magic_pdf/model/ppTableModel.py#L55 Changing only use_gpu = True if device == "cuda" else False is not enough; the use_gpu variable needs further investigation.

For the struct_eqtable table model, the bug is here: https://github.com/opendatalab/MinerU/blob/master/magic_pdf/model/pek_sub_modules/structeqtable/StructTableModel.py#L9 This one should be easy to fix: changing it to self.model = StructTable(self.model_path, self.max_new_tokens, self.max_time).to(device) should work.

@myhloli @234687552

Paddle specifies GPUs differently from torch, and at the moment paddle always runs on the first card. Our development focus is still on improving parsing quality, so we cannot spare the manpower to optimize the multi-GPU assignment logic right now. PRs from developers able to solve the multi-GPU assignment problem are welcome.

@randydl
Contributor Author

randydl commented Oct 24, 2024

server.py

import os
import torch
import filetype
import json, uuid
import litserve as ls
from fastapi import HTTPException
from magic_pdf.tools.common import do_parse
from magic_pdf.model.doc_analyze_by_custom_model import ModelSingleton


class MinerUAPI(ls.LitAPI):
    def __init__(self, output_dir='/tmp'):
        self.output_dir = output_dir

    @staticmethod
    def clean_memory(device):
        import gc
        if torch.cuda.is_available():
            with torch.cuda.device(device):
                torch.cuda.empty_cache()
                torch.cuda.ipc_collect()
        gc.collect()

    def setup(self, device):
        device = torch.device(device)
        os.environ['CUDA_VISIBLE_DEVICES'] = str(device.index)
        model_manager = ModelSingleton()
        model_manager.get_model(True, False)
        model_manager.get_model(False, False)
        print(f'Model initialization complete!')

    def decode_request(self, request):
        file = request['file'].file.read()
        kwargs = json.loads(request['kwargs'])
        assert filetype.guess_mime(file) == 'application/pdf'
        return file, kwargs

    def predict(self, inputs):
        try:
            pdf_name = str(uuid.uuid4())
            do_parse(self.output_dir, pdf_name, inputs[0], [], **inputs[1])
            return pdf_name
        except Exception as e:
            raise HTTPException(status_code=500, detail=f'{e}')
        finally:
            self.clean_memory(self.device)

    def encode_response(self, response):
        return {'output_dir': response}


if __name__ == '__main__':
    server = ls.LitServer(MinerUAPI(), accelerator='gpu', devices=[0, 1], timeout=False)
    server.run(port=8000)

magic-pdf.json

{
    "bucket_info":{
        "bucket-name-1":["ak", "sk", "endpoint"],
        "bucket-name-2":["ak", "sk", "endpoint"]
    },
    "models-dir":"/opt/models",
    "device-mode":"cuda",
    "table-config": {
        "model": "TableMaster",
        "is_table_recog_enable": true,
        "max_time": 400
    }
}

Try replacing server.py with the new code I provided, enable table recognition, and run the stress test again; it should work now @234687552
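For the stress test, the matching client request can be sketched as follows. This is a hypothetical helper, not part of MinerU: the field names mirror the server's decode_request above, which expects a 'file' upload plus a 'kwargs' form field holding JSON-encoded do_parse options.

```python
import json
import requests

def build_parse_request(pdf_bytes, url='http://127.0.0.1:8000/predict', **kwargs):
    """Prepare the multipart/form-data POST that decode_request expects:
    a 'file' part with the PDF bytes and a 'kwargs' part with JSON options."""
    req = requests.Request(
        'POST', url,
        files={'file': ('doc.pdf', pdf_bytes, 'application/pdf')},
        data={'kwargs': json.dumps(kwargs)},
    )
    return req.prepare()

# To send: requests.Session().send(build_parse_request(open(path, 'rb').read()))
# Batch processing is then just a thread pool mapping this over PDF paths;
# LitServe routes concurrent requests to different GPU workers.
```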

@234687552

(quoting the previous comment's updated server.py, magic-pdf.json, and the suggestion to rerun the stress test with table recognition enabled)

Status report
@randydl

GPU usage is now balanced across the cards (see the logs and nvidia-smi output below), but clean_memory throws an exception.

The change applied, for reference:

  def setup(self, device):
        device = torch.device(device)
        os.environ['CUDA_VISIBLE_DEVICES'] = str(device.index)
        model_manager = ModelSingleton()
        model_manager.get_model(True, False)
        model_manager.get_model(False, False)
        print(f'Model initialization complete!')

异常堆栈:

Please check the error trace for more details.
Traceback (most recent call last):
  File "/opt/mineru_venv/lib/python3.10/site-packages/litserve/loops.py", line 134, in run_single_loop
    y = _inject_context(
  File "/opt/mineru_venv/lib/python3.10/site-packages/litserve/loops.py", line 55, in _inject_context
    return func(*args, **kwargs)
  File "/app/app.py", line 144, in predict
    self.clean_memory(self.device)
  File "/app/app.py", line 83, in clean_memory
    with torch.cuda.device(device):
  File "/opt/mineru_venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 365, in __enter__
    self.prev_idx = torch.cuda._exchange_device(self.idx)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

GPU usage

nvidia-smi --loop=1

                                                                                        
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Thu Oct 24 20:54:03 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:D3:00.0 Off |                    0 |
| N/A   51C    P0            135W /  350W |   11611MiB /  46068MiB |     18%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:D4:00.0 Off |                    0 |
| N/A   54C    P0            124W /  350W |   11435MiB /  46068MiB |     23%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:D6:00.0 Off |                    0 |
| N/A   48C    P0            112W /  350W |   12227MiB /  46068MiB |     20%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:D7:00.0 Off |                    0 |
| N/A   51C    P0            124W /  350W |   11435MiB /  46068MiB |     26%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
(two further, nearly identical nvidia-smi snapshots omitted; all four GPUs stay balanced at roughly 11-12 GiB and 18-36% utilization)


@randydl
Contributor Author

randydl commented Oct 24, 2024

Thanks, that looks like progress! Try deleting the `with torch.cuda.device(device):` line (and dedenting the two calls inside it) @234687552
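Applied to the clean_memory above, the suggestion amounts to the sketch below. The torch import is guarded only so the snippet also loads on a CPU-only machine; the substance is that no device context is needed once setup() has pinned CUDA_VISIBLE_DEVICES, because each worker then sees exactly one GPU (always device 0 from its own point of view).

```python
import gc

try:
    import torch
except ImportError:  # CPU-only sketch still degrades gracefully
    torch = None

def clean_memory():
    """Free cached CUDA allocations on this worker's sole visible GPU.

    No `torch.cuda.device(...)` context: passing the original global index
    into a process that only sees one card raises 'invalid device ordinal'.
    """
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()
    gc.collect()
```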

@234687552

Thanks, that looks like progress! Try deleting the `with torch.cuda.device(device):` line @234687552

Thanks for the support; multi-GPU is now working correctly.

@randydl
Contributor Author

randydl commented Oct 25, 2024

(quoting the earlier TableMaster / struct_eqtable bug analysis and the reply about Paddle's GPU selection)

After yesterday's debugging we have basically solved it. I'll test a bit more, and if it holds up I'll open a PR.

@myhloli
Collaborator

myhloli commented Oct 25, 2024

@randydl You can submit it under the project directory on the dev branch; follow the other projects there and create a directory containing the code files and a README.

@randydl
Contributor Author

randydl commented Oct 25, 2024

@randydl You can submit it under the project directory on the dev branch; follow the other projects there and create a directory containing the code files and a README.

Sounds good.

@myhloli
Collaborator

myhloli commented Nov 5, 2024

@myhloli myhloli closed this as completed Nov 5, 2024
@Sakura4036

Sakura4036 commented Nov 8, 2024

@myhloli @234687552 Hi, could you please take a look at this pip installation issue? #897

@hzspyy

hzspyy commented Nov 28, 2024

On 0.10.x, starting the server fails with: ModuleNotFoundError: No module named 'paddleocr.paddleocr'; 'paddleocr' is not a package

@myhloli
Collaborator

myhloli commented Nov 29, 2024

On 0.10.x, starting the server fails with: ModuleNotFoundError: No module named 'paddleocr.paddleocr'; 'paddleocr' is not a package

Fixed in 0.10.3.

@xcvil

xcvil commented Dec 1, 2024

Could anyone explain why, when I submit a single-node multi-GPU job via BSUB on an LSF cluster, the parallel work runs on only one card while the other cards sit idle?

LSF Shell

#!/bin/bash
#BSUB -q long
#BSUB -n 4                           # Number of cores
#BSUB -o ./lsf_log/gpu_job_%J.out              # Output file name (%J is the job ID)
#BSUB -e ./lsf_log/gpu_job_%J.err              # Error file name
#BSUB -gpu num=4:gmem=80000:mode=shared
#BSUB -R "span[hosts=1]"
#BSUB -R "rusage[mem=64GB]"          # Memory requirement (specify amount needed)
#BSUB -W 24:00                        # Wall clock limit (hours:minutes)

server.py

if __name__ == '__main__':
    server = ls.LitServer(
        MinerUAPI(output_dir='...'),
        accelerator='cuda',
        devices=[0,1,2,3],
        workers_per_device=1,
        timeout=False
    )
    server.run(port=6000)

client.py

import numpy as np
from joblib import Parallel, delayed

# do_parse is the client-side helper that POSTs one file to the server
n_jobs = np.clip(len(files), 1, 8)
results = Parallel(n_jobs, prefer='threads', verbose=10)(
    delayed(do_parse)(p) for p in files
)

GPU usage

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB           On | 00000000:07:00.0 Off |                    0 |
| N/A   32C    P0               72W / 400W|    831MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB           On | 00000000:0B:00.0 Off |                    0 |
| N/A   38C    P0              155W / 400W|   8805MiB / 81920MiB |     55%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM4-80GB           On | 00000000:48:00.0 Off |                    0 |
| N/A   30C    P0               75W / 400W|   2109MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM4-80GB           On | 00000000:4C:00.0 Off |                    0 |
| N/A   31C    P0               67W / 400W|      0MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM4-80GB           On | 00000000:88:00.0 Off |                    0 |
| N/A   35C    P0              157W / 400W|  41173MiB / 81920MiB |     11%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-SXM4-80GB           On | 00000000:8B:00.0 Off |                    0 |
| N/A   33C    P0               68W / 400W|      3MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-SXM4-80GB           On | 00000000:C8:00.0 Off |                    0 |
| N/A   31C    P0               65W / 400W|      3MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-SXM4-80GB           On | 00000000:CB:00.0 Off |                    0 |
| N/A   32C    P0               67W / 400W|      3MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    4   N/A  N/A    18***3     C   .../conda/envs/parser/bin/python     9230MiB |
|    4   N/A  N/A    18***2      C   .../conda/envs/parser/bin/python    10326MiB |
|    4   N/A  N/A    18***1      C   .../conda/envs/parser/bin/python     9970MiB |
|    4   N/A  N/A    18***0      C   .../conda/envs/parser/bin/python    11644MiB |
+---------------------------------------------------------------------------------------+

Thanks a lot, everyone!

@randydl
Contributor Author

randydl commented Dec 2, 2024

@xcvil Make sure CUDA_VISIBLE_DEVICES is set correctly.

@PoleGeogry

I tried it, and it doesn't look like one service running distributed across multiple cards; it deploys one service per card. Is there a way around that?

@xcvil

xcvil commented Dec 2, 2024

@xcvil Make sure CUDA_VISIBLE_DEVICES is set correctly.

Thanks a lot for replying. Could you please give me some hints about how to set CUDA_VISIBLE_DEVICES?

In my scenario, CUDA_VISIBLE_DEVICES=0,1,2,3 (I requested 4 GPUs). I noticed that server.py contains this code:

def setup(self, device):
        if device.startswith('cuda'):
            os.environ['CUDA_VISIBLE_DEVICES'] = device.split(':')[-1]
            if torch.cuda.device_count() > 1:
                raise RuntimeError("Remove any CUDA actions before setting 'CUDA_VISIBLE_DEVICES'.")

These lines, specifically the `if torch.cuda.device_count() > 1:` check, raise an error in my setup.

@xcvil

xcvil commented Dec 2, 2024

I tried it, and it doesn't look like one service running distributed across multiple cards; it deploys one service per card. Is there a way around that?

I have a similar problem: after I submit a multi-GPU job, the server runs only on the first card, and all the worker processes pile onto that one card.

@randydl
Contributor Author

randydl commented Dec 3, 2024

@xcvil Make sure CUDA_VISIBLE_DEVICES is set correctly.

(quoting the question above about the `torch.cuda.device_count() > 1` check)

The torch.cuda.device_count() > 1 check ensures CUDA_VISIBLE_DEVICES takes effect, by detecting whether any CUDA operation ran before the variable was set. This matters because once a CUDA context exists, changing CUDA_VISIBLE_DEVICES does nothing. Raising early when more than one device is still visible catches exactly that misconfiguration, so each worker process ends up selecting and using only its assigned GPU.
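The ordering constraint can be distilled into a small helper (hypothetical name; setup() in the servers above inlines the same logic):

```python
import os

def pin_single_gpu(device: str) -> str:
    """Pin this worker process to one GPU before any CUDA work happens.

    Must run before torch/paddle creates a CUDA context: once the driver
    is initialized, changing CUDA_VISIBLE_DEVICES has no effect, which is
    exactly the misconfiguration the device_count() guard detects.
    """
    index = device.split(':')[-1] if ':' in device else '0'
    os.environ['CUDA_VISIBLE_DEVICES'] = index
    return index
```

After this call, the worker sees a single GPU that torch addresses as `cuda:0`, regardless of its global index on the host.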

@zxwsd

zxwsd commented Dec 3, 2024

@randydl I modified the code as follows; can it run on multiple GPUs together?

import os
import fitz
import torch
import base64
import litserve as ls
from uuid import uuid4
from fastapi import HTTPException
from filetype import guess_extension
from magic_pdf.tools.common import do_parse
from magic_pdf.model.doc_analyze_by_custom_model import ModelSingleton


class MinerUAPI(ls.LitAPI):
    def __init__(self, output_dir='/tmp'):
        self.output_dir = output_dir

    def setup(self, device):
        if device.startswith('cuda'):
            os.environ['CUDA_VISIBLE_DEVICES'] = device.split(':')[-1]

        model_manager = ModelSingleton()
        model_manager.get_model(True, False)
        model_manager.get_model(False, False)
        print(f'Model initialization complete on {device}!')

    def decode_request(self, request):
        file = request['file']
        file = self.to_pdf(file)
        opts = request.get('kwargs', {})
        opts.setdefault('debug_able', False)
        opts.setdefault('parse_method', 'auto')
        return file, opts

    def predict(self, inputs):
        try:
            do_parse(self.output_dir, pdf_name := str(uuid4()), inputs[0], [], **inputs[1])
            return pdf_name
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))
        finally:
            self.clean_memory()

    def encode_response(self, response):
        return {'output_dir': response}

    def clean_memory(self):
        import gc
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()
        gc.collect()

    def to_pdf(self, file_base64):
        try:
            file_bytes = base64.b64decode(file_base64)
            file_ext = guess_extension(file_bytes)
            with fitz.open(stream=file_bytes, filetype=file_ext) as f:
                if f.is_pdf: return f.tobytes()
                return f.convert_to_pdf()
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))


if __name__ == '__main__':
    server = ls.LitServer(
        MinerUAPI(output_dir='/tmp'),
        accelerator='cuda',
        devices='auto',
        workers_per_device=1,
        timeout=False
    )
    server.run(port=8000)

@xcvil

xcvil commented Dec 3, 2024

(quoting the modified server.py above)

You can try this one:
#1157 (comment)
I tweaked it so it runs on LSF/Slurm; if it doesn't work for you, post the error message and we can look at it together!

@zxwsd

zxwsd commented Dec 4, 2024

@xcvil I tried it and it still runs on only one GPU. Whichever cards I specify, it only runs on the first one listed: with 1,3 it uses only 1; with 0,1 it uses only 0.

@zxwsd

zxwsd commented Dec 4, 2024

@234687552 Hi, I tried the code you posted; why does it still run on only one GPU for me?

@xcvil

xcvil commented Dec 4, 2024

@xcvil I tried it and it still runs on only one GPU. Whichever cards I specify, it only runs on the first one listed: with 1,3 it uses only 1; with 0,1 it uses only 0.

Show us your code; if you're on Slurm/LSF, post the shell script as well.

@zxwsd

zxwsd commented Dec 5, 2024

@xcvil

Server side:

import os
import fitz
import torch
import base64
import litserve as ls
from uuid import uuid4
from fastapi import HTTPException
from filetype import guess_extension
from magic_pdf.tools.common import do_parse
from magic_pdf.model.doc_analyze_by_custom_model import ModelSingleton


class MinerUAPI(ls.LitAPI):
    def __init__(self, output_dir='/tmp'):
        self.output_dir = output_dir

    def setup(self, device):
        os.environ['CUDA_VISIBLE_DEVICES'] = device.split(':')[-1]
        # torch.cuda.set_device(device)
        model_manager = ModelSingleton()
        model_manager.get_model(True, False)
        model_manager.get_model(False, False)
        print(f'Model initialization complete on {device}!')

    def decode_request(self, request):
        file = request['file']
        file = self.to_pdf(file)
        opts = request.get('kwargs', {})
        opts.setdefault('debug_able', False)
        opts.setdefault('parse_method', 'auto')
        return file, opts

    def predict(self, inputs):
        try:
            do_parse(self.output_dir, pdf_name := str(uuid4()), inputs[0], [], **inputs[1])
            return pdf_name
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))
        finally:
            self.clean_memory()

    def encode_response(self, response):
        return {'output_dir': response}

    def clean_memory(self):
        import gc
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()
        gc.collect()

    def to_pdf(self, file_base64):
        try:
            file_bytes = base64.b64decode(file_base64)
            file_ext = guess_extension(file_bytes)
            with fitz.open(stream=file_bytes, filetype=file_ext) as f:
                if f.is_pdf: return f.tobytes()
                return f.convert_to_pdf()
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))


if __name__ == '__main__':
    server = ls.LitServer(
        MinerUAPI(output_dir='/data/cuixk/output'),
        accelerator='cuda',
        devices=[0, 1],
        workers_per_device=1,
        timeout=False
    )
    server.run(port=8000)

Client side:

import os
import base64
import requests
import numpy as np
from loguru import logger
from joblib import Parallel, delayed


def to_b64(file_path):
    try:
        with open(file_path, 'rb') as f:
            return base64.b64encode(f.read()).decode('utf-8')
    except Exception as e:
        raise Exception(f'File: {file_path} - Info: {e}')


def do_parse(file_path, url='http://127.0.0.1:8000/predict', **kwargs):
    try:
        response = requests.post(url, json={
            'file': to_b64(file_path),
            'kwargs': kwargs
        })
        if response.status_code == 200:
            output = response.json()
            output['file_path'] = file_path
            return output
        else:
            raise Exception(response.text)
    except Exception as e:
        logger.error(f'File: {file_path} - Info: {e}')


def process_pdf_files_concurrently(pdf_files):
    n_jobs = np.clip(len(pdf_files), 1, 2)
    results = Parallel(n_jobs=n_jobs, prefer='threads', verbose=10)(
        delayed(do_parse)(p) for p in pdf_files
    )
    print(results)


def process_files_in_batches(directory, batch_size=20):
    pdf_files = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.lower().endswith('.pdf'):
                pdf_files.append(os.path.join(root, file))
                if len(pdf_files) >= batch_size:
                    print(f'the pdf files are: {pdf_files}')
                    process_pdf_files_concurrently(pdf_files)
                    pdf_files = []


if __name__ == '__main__':
    directory = "/data/cuixk/test1/knowleges"
    batch_size = 20
    process_files_in_batches(directory, batch_size=batch_size)

When it runs, it only shows: (screenshot omitted)

@xcvil

xcvil commented Dec 7, 2024

@zxwsd Did you try this one?
#1157 (comment)

@zxwsd

zxwsd commented Dec 8, 2024

@xcvil I switched versions, downgrading MinerU from 0.10.5 to 0.9.0, and multi-GPU parallelism now works normally.

@234687552

@xcvil @zxwsd

It's worth studying LitServe's single-node multi-GPU deployment: a single incoming request will saturate at most one card and will not touch the others.

@Ronass

Ronass commented Dec 20, 2024

@randydl Could you take a look? How can the service accept multipart/form-data? I want to deploy an API service using serve.py directly, but decode_request never receives the data.

curl -i -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "file=@\"./10-1.pdf\";type=application/pdf;filename=\"10-1.pdf\"" \
  -F "parse_method=auto" \
  'http://0.0.0.0:8000/predict'

This is the request format; I don't know how the decode_request method can receive these two fields.
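One possible shape for a multipart-aware decode_request is sketched below. This is an assumption-laden illustration, not MinerU code: it presumes LitServe (with python-multipart installed) delivers the parsed form as a mapping whose file parts expose a Starlette UploadFile-style `.file` attribute and whose plain fields arrive as strings.

```python
import json

def decode_multipart(request):
    """Sketch: pull the PDF bytes and the form options out of a parsed
    multipart/form-data request (field names match the curl call above)."""
    pdf_bytes = request['file'].file.read()
    opts = json.loads(request['kwargs']) if 'kwargs' in request else {}
    if 'parse_method' in request:
        opts.setdefault('parse_method', request['parse_method'])
    return pdf_bytes, opts
```

Inside a LitAPI subclass the same body would live in `decode_request(self, request)` and its return value would flow into `predict` unchanged.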
