I deployed the meta-llama/Llama-3.3-70B-Instruct model with vLLM on a Ray cluster of Tesla V100-SXM2-32GB GPUs. However, when I send a query, the server aborts with the following error:
```
python: /project/lib/Analysis/Allocation.cpp:47: std::pair<llvm::SmallVector, llvm::SmallVector > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout && !srcMmaLayout.isAmpere()) && "mma -> mma layout conversion is only supported on Ampere"' failed.
*** SIGABRT received at time=1736148493 on cpu 1 ***
PC: @ 0x7fce07e9eb1c (unknown) pthread_kill
    @ 0x7fce07e45320 (unknown) (unknown)
    @ 0x7fce07e4526e 32 raise
    @ 0x7fce07e288ff 192 abort
    @ 0x7fce07e2881b 96 (unknown)
    @ 0x7fce07e3b507 48 __assert_fail
    @ 0x7fccc4cbe42a (unknown) mlir::triton::getScratchConfigForCvtLayout()
    @ 0x300000001 (unknown) (unknown)
    @ 0x27106e90 (unknown) (unknown)
    @ 0x7fccc8df9fd7 (unknown) (unknown)
    @ 0x7fccc4d6b9c0 (unknown) (unknown)
    @ 0x9000623991e90789 (unknown) (unknown)
[2025-01-06 10:28:13,840 E 173516 173516] logging.cc:447: *** SIGABRT received at time=1736148493 on cpu 1 ***
[2025-01-06 10:28:13,840 E 173516 173516] logging.cc:447: PC: @ 0x7fce07e9eb1c (unknown) pthread_kill
...
Fatal Python error: Aborted
```
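The assertion message itself narrows things down: Triton's `mma -> mma` layout conversion is only implemented for Ampere-class GPUs (compute capability 8.0+), while the Tesla V100 is Volta (compute capability 7.0). Here is a minimal sketch using stock PyTorch to confirm what the runtime sees (assuming PyTorch is available in the vLLM environment):

```python
import torch

# The Triton assertion above requires Ampere (compute capability 8.0+);
# a Tesla V100 reports 7.0 (Volta), so any kernel taking that path aborts.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)}, compute capability {major}.{minor}")

# V100 also lacks native bfloat16 support, which matters because the
# Llama 3.x checkpoints ship with bfloat16 weights by default.
print("bf16 supported:", torch.cuda.is_bf16_supported())
```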
Is the meta-llama/Llama-3.3-70B-Instruct model compatible with Tesla V100-SXM2-32GB GPUs? If so, what configurations or optimizations might resolve this issue?
I also tried the meta-llama/Llama-3.1-8B-Instruct and meta-llama/Llama-3.2-1B models, and they fail with the same error.
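For reference, below is a hedged sketch of the kind of configuration usually suggested for pre-Ampere GPUs, using vLLM's offline `LLM` API. The `VLLM_ATTENTION_BACKEND=XFORMERS` environment variable, `dtype="half"`, and `enforce_eager=True` are candidate workarounds that steer vLLM away from bfloat16 and some Triton/CUDA-graph code paths on Volta; they are not a confirmed fix for this exact assertion:

```python
import os

# Candidate workaround: select the xFormers attention backend instead of
# Triton-based kernels. Must be set before vLLM is imported.
os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # small enough for one 32 GB V100
    dtype="half",        # V100 has no bfloat16; fp16 is the usual substitute
    enforce_eager=True,  # skip CUDA graph capture, often recommended on older GPUs
)

outputs = llm.generate(
    ["Hello, who are you?"],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

For the 70B model, memory alone would also require sharding across several V100s (e.g. `tensor_parallel_size=8`), independent of the Triton issue.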