[Bug] disk cache io error when simultaneously loading lots of sglang offline engine #2090

Open
LeeSureman opened this issue Nov 19, 2024 · 1 comment

LeeSureman commented Nov 19, 2024

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

When I use Slurm to launch 32 or 192 jobs for offline batch inference that simultaneously load sgl.Engine, I hit the following error even though I set disable_disk_cache=True. If I run only a single job, the error does not occur.

The error is as follows:

Traceback (most recent call last):
  File "/home/xiaonan/mycode/code_data_synthesis/generate_python_docstring_slurm_task.py", line 64, in <module>
    llm = sgl.Engine(model_path=args.model_name, tp_size=args.tp_size, disable_disk_cache=True)
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/api.py", line 48, in Engine
    from sglang.srt.server import Engine
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/srt/server.py", line 49, in <module>
    from sglang.srt.managers.data_parallel_controller import (
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/srt/managers/data_parallel_controller.py", line 24, in <module>
    from sglang.srt.managers.io_struct import (
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/srt/managers/io_struct.py", line 26, in <module>
    from sglang.srt.managers.schedule_batch import BaseFinishReason
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/srt/managers/schedule_batch.py", line 40, in <module>
    from sglang.srt.constrained.grammar import Grammar
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/sglang/srt/constrained/__init__.py", line 24, in <module>
    from outlines.caching import cache as disk_cache
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/__init__.py", line 2, in <module>
    import outlines.generate
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/generate/__init__.py", line 2, in <module>
    from .cfg import cfg
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/generate/cfg.py", line 3, in <module>
    from outlines.fsm.guide import CFGGuide
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/fsm/guide.py", line 109, in <module>
    def create_states_mapping(
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/caching.py", line 93, in decorator
    memory = get_cache()
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/outlines/caching.py", line 55, in get_cache
    memory = Cache(
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/diskcache/core.py", line 499, in __init__
    sql(query, (key, value))
  File "/home/xiaonan/miniconda3/envs/llm_training_39/lib/python3.9/site-packages/diskcache/core.py", line 666, in _execute_with_retry
    return sql(statement, *args, **kwargs)
sqlite3.OperationalError: disk I/O error
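
The traceback shows the failure happens while sglang's modules are still being imported: outlines.caching opens its disk cache (a sqlite database under the shared user cache directory) as a side effect of decorating create_states_mapping, so disable_disk_cache=True never gets a chance to apply. Below is a hypothetical diagnostic sketch (not part of the original report; the script name and the shared-home-directory assumption are mine) that isolates this:

# outlines_cache_probe.py - hypothetical diagnostic sketch
# Launching many copies of this at the same time from a shared home directory
# should hit the same sqlite3.OperationalError without involving sglang at all,
# because every process opens the same diskcache-backed sqlite database.
from outlines.caching import get_cache

memory = get_cache()  # returns a diskcache.Cache backed by a sqlite file
print("outlines cache directory:", memory.directory)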

Reproduction

Python (slurm_task.py):

import sglang as sgl
llm = sgl.Engine(model_path='Qwen/Qwen2.5-Coder-32B-Instruct', tp_size=2, disable_disk_cache=True)

Sbatch Script:

#!/bin/bash
#SBATCH --job-name=task1  # job name
#SBATCH --output=slurm_logs/%A_%a/output.txt     # output file 
#SBATCH --error=slurm_logs/%A_%a/error.txt       # error file
#SBATCH --array=0-191%192
#SBATCH --ntasks=1
#SBATCH --gres=gpu:2
#SBATCH --cpus-per-task=16            

python slurm_task.py

Environment

Python: 3.9.19 (main, May 6 2024, 19:43:03) [GCC 11.2.0]
CUDA available: False
PyTorch: 2.4.0
sglang: 0.3.5
flashinfer: 0.1.6+cu124torch2.4
triton: 3.0.0
transformers: 4.46.2
requests: 2.32.3
tqdm: 4.67.0
numpy: 1.23.0
aiohttp: 3.10.5
fastapi: 0.115.4
hf_transfer: 0.1.8
huggingface_hub: 0.24.6
interegular: 0.3.3
packaging: 24.1
PIL: 10.4.0
psutil: 6.0.0
pydantic: 2.9.2
uvicorn: 0.32.0
uvloop: 0.21.0
zmq: 26.2.0
vllm: 0.6.3.post1
multipart: 0.0.17
openai: 1.54.4
anthropic: 0.39.0
Hypervisor vendor: KVM
ulimit soft: 1024

LeeSureman (Author) commented:

I found that this issue is caused by the outlines disk cache, similar to vllm-project/vllm#7831. Maybe you can follow that approach to address the outlines-related issue in sglang.
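
In the meantime, a possible workaround sketch (assuming the installed outlines version honors the OUTLINES_CACHE_DIR environment variable; the per-task directory layout below is my own choice): point each Slurm array task at its own cache directory before importing sglang, so concurrent tasks never write to the same sqlite file.

# slurm_task.py - workaround sketch, not the original script
import os

# Give every array task a private outlines cache so the sqlite databases cannot collide.
# SLURM_ARRAY_TASK_ID is set by Slurm for array jobs; fall back to the PID otherwise.
task_id = os.environ.get("SLURM_ARRAY_TASK_ID", str(os.getpid()))
os.environ["OUTLINES_CACHE_DIR"] = os.path.expanduser(f"~/.cache/outlines_{task_id}")

# Import sglang only after the variable is set, because outlines opens its cache at import time.
import sglang as sgl

llm = sgl.Engine(model_path='Qwen/Qwen2.5-Coder-32B-Instruct', tp_size=2, disable_disk_cache=True)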
