Does sglang Support Inference for Multiple Models at the Compilation Stage? #1263
-
Hello, I have a question. Rather than parallelizing kernels or servers and then aggregating the results of multiple models, I would like to know whether sglang supports ensembling the results of multiple models at the compilation stage, e.g. via multiprocessing. Specifically, is it possible to run multiple quantized models (in GGUF, GPTQ, or AWQ format) in parallel within a single program using multiprocessing? Thank you.
Replies: 1 comment
-
I think it is possible. You can use `--mem-fraction-static` to control the memory usage of each server, and use multiprocessing/popen to launch many servers. See sglang/test/srt/test_eval_accuracy_mini.py, line 19 at commit 13f1357, for an example.
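As a rough illustration of the suggestion above, here is a minimal sketch that launches several sglang servers with `subprocess.Popen`, each capped with `--mem-fraction-static` so they can share one GPU. The model paths and ports are placeholders, not values from this thread; adjust the memory fraction to fit your hardware.

```python
import subprocess

def build_server_cmd(model_path, port, mem_fraction):
    # Build the launch command for one sglang server instance.
    # --mem-fraction-static limits the fraction of GPU memory this
    # server pre-allocates, so multiple servers can share one device.
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--port", str(port),
        "--mem-fraction-static", str(mem_fraction),
    ]

if __name__ == "__main__":
    # Hypothetical quantized models; replace with your own paths.
    models = ["my-model-awq", "my-model-gptq"]
    procs = [
        subprocess.Popen(build_server_cmd(m, 30000 + i, 0.4))
        for i, m in enumerate(models)
    ]
    for p in procs:
        p.wait()
```

Each server then exposes its own HTTP endpoint (one per port), and you can query them concurrently from client code and ensemble the responses yourself.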