Llama 3.3 70B Compilation Error on Trainium (trn1) with Batch Size 4 #1075

Open · wxnfifth5 opened this issue Dec 30, 2024 · 1 comment

@wxnfifth5
I am following the tutorial at https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn2-llama3.3-70b-tutorial.html#scenario-1-run-llama3-3-70b-on-trn2 to compile and run the Llama 3.3 70B model on trn1. Compilation succeeds with batch sizes 1 and 2, but it fails with batch size 4.

The compiler (neuronx-cc) terminates with the following error:

2024-12-27T04:26:59Z [F134] neuronx-cc terminated abnormally - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new

Environment:

  • AMI: Deep Learning AMI Neuron (Ubuntu 22.04)
  • Python virtual environment: /opt/aws_neuronx_venv_pytorch_2_5_nxd_inference/
  • Package versions:
    • libneuronxla: 2.1.681.0
    • neuronx-cc: 2.16.345.0+69131dd3
    • neuronx-distributed: 0.10.0
    • neuronx-distributed-inference: 0.1.0
    • torch-neuronx: 2.5.1.2.4.0
    • vllm: 0.1.dev2830+g22c56ee.neuron216 (from /home/ubuntu/upstreaming-to-vllm)
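
For reference, these versions can be re-checked from inside the tutorial virtual environment with something like the following (the grep filter is just illustrative):

source /opt/aws_neuronx_venv_pytorch_2_5_nxd_inference/bin/activate
pip list | grep -iE 'neuron|torch|vllm'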

Command Configuration:

MODEL_PATH="meta-llama/Llama-3.3-70B-Instruct"
BATCH_SIZE=4
SEQ_LEN=2048
COMPILED_MODEL_PATH="./traced_model"
TP_DEGREE=32
LNC=1

# Environment variables
export NEURON_RT_EXEC_TIMEOUT=1200
export XLA_DENSE_GATHER_FACTOR=0
export NEURON_RT_INSPECT_ENABLE=0

# Command
inference_demo \
    --model-type llama \
    --task-type causal-lm \
    run \
    --model-path $MODEL_PATH \
    --compiled-model-path $COMPILED_MODEL_PATH \
    --torch-dtype bfloat16 \
    --start_rank_id 0 \
    --local_ranks_size $TP_DEGREE \
    --tp-degree $TP_DEGREE \
    --batch-size $BATCH_SIZE \
    --seq-len $SEQ_LEN \
    --on-device-sampling \
    --top-k 1 \
    --do-sample \
    --fused-qkv \
    --sequence-parallel-enabled \
    --qkv-kernel-enabled \
    --attn-kernel-enabled \
    --mlp-kernel-enabled \
    --cc-pipeline-tiling-factor 1 \
    --pad-token-id 2 \
    --enable-bucketing \
    --logical-neuron-cores $LNC \
    --prompt "What is annapurna labs?"

Full error traceback:

Traceback (most recent call last):
  File "/opt/aws_neuronx_venv_pytorch_2_5_nxd_inference/bin/inference_demo", line 8, in <module>
    sys.exit(main())
  File "/opt/aws_neuronx_venv_pytorch_2_5_nxd_inference/lib/python3.10/site-packages/neuronx_distributed_inference/inference_demo.py", line 486, in main
    run_inference(model_cls, args)
  File "/opt/aws_neuronx_venv_pytorch_2_5_nxd_inference/lib/python3.10/site-packages/neuronx_distributed_inference/inference_demo.py", line 296, in run_inference
    model.compile(args.compiled_model_path, debug=args.hlo_debug)
  File "/opt/aws_neuronx_venv_pytorch_2_5_nxd_inference/lib/python3.10/site-packages/neuronx_distributed_inference/models/application_base.py", line 145, in compile
    traced_model = self.get_builder(debug).trace(initialize_model_weights=False)
  File "/opt/aws_neuronx_venv_pytorch_2_5_nxd_inference/lib/python3.10/site-packages/neuronx_distributed/trace/model_builder.py", line 310, in trace
    key, bucket_rank, neff_artifacts = future.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_5_nxd_inference/lib/python3.10/site-packages/neuronx_distributed/trace/model_builder.py", line 254, in submit_compilation_job
    return key, bucket_rank, torch_neuronx.xla_impl.trace.generate_neff(*args)
  File "/opt/aws_neuronx_venv_pytorch_2_5_nxd_inference/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 506, in generate_neff
    neff_filename = hlo_compile(
  File "/opt/aws_neuronx_venv_pytorch_2_5_nxd_inference/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 396, in hlo_compile
    raise RuntimeError(f"neuronx-cc failed with {status}")
RuntimeError: neuronx-cc failed with 70

The error occurs during the model compilation phase when the neuronx-cc compiler attempts to generate the NEFF (Neuron Executable File Format) file for the model with batch size 4. The same configuration works successfully with batch sizes 1 and 2.
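
To make the boundary easy to reproduce, the compile step can be swept across batch sizes. The sketch below assumes the variables from the configuration above (MODEL_PATH, SEQ_LEN, TP_DEGREE, LNC) are still set; run_compile is a hypothetical wrapper around the exact inference_demo invocation above, with a per-batch-size output directory so compiled artifacts are not mixed:

# Hypothetical wrapper around the inference_demo invocation above;
# only --batch-size and --compiled-model-path vary per run.
run_compile() {
    local bs="$1"
    inference_demo \
        --model-type llama \
        --task-type causal-lm \
        run \
        --model-path $MODEL_PATH \
        --compiled-model-path "./traced_model_bs${bs}" \
        --torch-dtype bfloat16 \
        --start_rank_id 0 \
        --local_ranks_size $TP_DEGREE \
        --tp-degree $TP_DEGREE \
        --batch-size "$bs" \
        --seq-len $SEQ_LEN \
        --on-device-sampling \
        --top-k 1 \
        --do-sample \
        --fused-qkv \
        --sequence-parallel-enabled \
        --qkv-kernel-enabled \
        --attn-kernel-enabled \
        --mlp-kernel-enabled \
        --cc-pipeline-tiling-factor 1 \
        --pad-token-id 2 \
        --enable-bucketing \
        --logical-neuron-cores $LNC \
        --prompt "What is annapurna labs?"
}

# Sweep: with this setup, 1 and 2 compile cleanly and 4 aborts with [F134].
for bs in 1 2 4; do
    if run_compile "$bs"; then
        echo "batch size $bs: compiled OK"
    else
        echo "batch size $bs: neuronx-cc failed"
    fi
done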

@jluntamazon (Contributor)

Hi @wxnfifth5,

We are actively working on improving batching support at the moment and will see whether we can resolve this issue as part of that work.
