I deployed the meta-llama/Llama-3.3-70B-Instruct model with vLLM on a Ray cluster of Tesla V100-SXM2-32GB GPUs. However, when I send a query, the server aborts with the following error:
```
python: /project/lib/Analysis/Allocation.cpp:47: std::pair<llvm::SmallVector, llvm::SmallVector > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout && !srcMmaLayout.isAmpere()) && "mma -> mma layout conversion is only supported on Ampere"' failed.
*** SIGABRT received at time=1736148493 on cpu 1 ***
PC: @ 0x7fce07e9eb1c (unknown) pthread_kill
    @ 0x7fce07e45320 (unknown) (unknown)
    @ 0x7fce07e4526e 32 raise
    @ 0x7fce07e288ff 192 abort
    @ 0x7fce07e2881b 96 (unknown)
    @ 0x7fce07e3b507 48 __assert_fail
    @ 0x7fccc4cbe42a (unknown) mlir::triton::getScratchConfigForCvtLayout()
    @ 0x300000001 (unknown) (unknown)
    @ 0x27106e90 (unknown) (unknown)
    @ 0x7fccc8df9fd7 (unknown) (unknown)
    @ 0x7fccc4d6b9c0 (unknown) (unknown)
    @ 0x9000623991e90789 (unknown) (unknown)
[2025-01-06 10:28:13,840 E 173516 173516] logging.cc:447: *** SIGABRT received at time=1736148493 on cpu 1 ***
[2025-01-06 10:28:13,840 E 173516 173516] logging.cc:447: PC: @ 0x7fce07e9eb1c (unknown) pthread_kill
...
Fatal Python error: Aborted
```
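The assertion message itself narrows things down: Triton's `mma -> mma` layout conversion is only implemented for Ampere-class GPUs (compute capability 8.0+), while the Tesla V100 is Volta (compute capability 7.0). Here is a minimal sketch using stock PyTorch to confirm what the runtime sees (assuming PyTorch is available in the vLLM environment):

```python
import torch

# The Triton assertion above requires Ampere (compute capability 8.0+);
# a Tesla V100 reports 7.0 (Volta), so any kernel taking that path aborts.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)}, compute capability {major}.{minor}")

# V100 also lacks native bfloat16 support, which matters because the
# Llama 3.x checkpoints ship with bfloat16 weights by default.
print("bf16 supported:", torch.cuda.is_bf16_supported())
```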
Is the meta-llama/Llama-3.3-70B-Instruct model compatible with Tesla V100-SXM2-32GB GPUs? If so, what configurations or optimizations might resolve this issue?
I also tried the meta-llama/Llama-3.1-8B-Instruct and meta-llama/Llama-3.2-1B models, and they fail with the same error.
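For reference, below is a hedged sketch of the kind of configuration usually suggested for pre-Ampere GPUs, using vLLM's offline `LLM` API. The `VLLM_ATTENTION_BACKEND=XFORMERS` environment variable, `dtype="half"`, and `enforce_eager=True` are candidate workarounds that steer vLLM away from bfloat16 and some Triton/CUDA-graph code paths on Volta; they are not a confirmed fix for this exact assertion:

```python
import os

# Candidate workaround: select the xFormers attention backend instead of
# Triton-based kernels. Must be set before vLLM is imported.
os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # small enough for one 32 GB V100
    dtype="half",        # V100 has no bfloat16; fp16 is the usual substitute
    enforce_eager=True,  # skip CUDA graph capture, often recommended on older GPUs
)

outputs = llm.generate(
    ["Hello, who are you?"],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

For the 70B model, memory alone would also require sharding across several V100s (e.g. `tensor_parallel_size=8`), independent of the Triton issue.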