add llama benchmarking #207

Merged
wdbaruni merged 3 commits into main from llama-benchamrking on Jan 16, 2025

Conversation


@wdbaruni wdbaruni commented Jan 16, 2025

Llama 3.1 8B Training Example for Bacalhau

This repository contains a single-node training example using NVIDIA's Llama 3.1 8B model, adapted for running on Bacalhau. This is a simplified version that demonstrates basic LLM training capabilities using 8 GPUs on a single node.

Based on https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dgxc-benchmarking/resources/llama31-8b-dgxc-benchmarking-a

Overview

  • Single-node training of Llama 3.1 8B model
  • Uses NVIDIA's NeMo framework
  • Supports 8 GPUs on a single node
  • Uses synthetic data by default
  • Supports both FP8 and BF16 data types

Structure

.
├── Dockerfile                  # Container definition using NeMo base image
├── llama3.1_24.11.1/          # Configuration files
│   └── llama3.1_8b.yaml       # 8B model configuration
├── run_training.sh            # Main training script
└── sample-jobs.yaml           # Bacalhau job definition

Building and Pushing the Image

  1. Log in to the GitHub Container Registry:
echo $GITHUB_PAT | docker login ghcr.io -u YOUR_GITHUB_USERNAME --password-stdin
  2. Build and push the multi-architecture image:
docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64 \
  -t ghcr.io/bacalhau-project/llama3-benchmark:24.12 \
  -t ghcr.io/bacalhau-project/llama3-benchmark:latest \
  --push .
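
To sanity-check the push, the multi-architecture manifest can be inspected with:

docker buildx imagetools inspect ghcr.io/bacalhau-project/llama3-benchmark:latest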

Running on Bacalhau

Basic training job (10 steps with synthetic data):

bacalhau job run sample-jobs.yaml -V "steps=10"
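
For reference, below is a minimal sketch of what a Bacalhau job spec along the lines of sample-jobs.yaml could look like. The field values, the templated steps variable, and the S3 key are illustrative assumptions, not the committed file:

Name: llama3-benchmark
Type: batch
Count: 1
Tasks:
  - Name: train
    Engine:
      Type: docker
      Params:
        Image: ghcr.io/bacalhau-project/llama3-benchmark:latest
        EnvironmentVariables:
          - MAX_STEPS={{ .steps }}        # filled in by -V "steps=10"
    Publisher:
      Type: s3
      Params:
        Bucket: bacalhau-nvidia-job-results
        Key: llama3-benchmark/{jobID}
    ResultPaths:
      - Name: results
        Path: /results

The Resources block for this job is shown under Resources Required below.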

Environment variables for customization (defaults sketched below):

  • DTYPE: Data type (fp8 or bf16)
  • MAX_STEPS: Number of training steps
  • USE_SYNTHETIC_DATA: Whether to use synthetic data (default: true)
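
A plausible way for run_training.sh to consume these is with shell-style defaults, sketched below; the actual script and its default values may differ:

# Hypothetical excerpt from run_training.sh
DTYPE="${DTYPE:-fp8}"                              # fp8 or bf16
MAX_STEPS="${MAX_STEPS:-10}"                       # number of training steps
USE_SYNTHETIC_DATA="${USE_SYNTHETIC_DATA:-true}"   # synthetic data needs no preparation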

Output

Training results and logs are saved to the /results directory, which is then:

  1. Published to S3 (bacalhau-nvidia-job-results bucket)
  2. Available in the job outputs

The results include:

  • Training logs
  • Performance metrics
  • TensorBoard logs (viewable locally as shown below)
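
Once the job output has been downloaded, the TensorBoard logs can be browsed locally (requires the tensorboard package):

tensorboard --logdir <downloaded-results>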

Resources Required

Fixed requirements (see the job-spec sketch after this list):

  • 8x NVIDIA H100 GPUs (80GB each)
  • 32 CPU cores
  • 640GB system memory
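
In a Bacalhau declarative job spec, these requirements would map to a Resources block roughly like the following; pinning specifically to H100 nodes would typically be handled with node labels or constraints rather than this block, so treat the exact syntax as an assumption:

Resources:
  GPU: "8"         # 8x NVIDIA H100 (80GB each)
  CPU: "32"        # CPU cores
  Memory: "640GB"  # system memory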

Notes

  • Uses synthetic data by default - no data preparation needed
  • Training script is optimized for H100 GPUs
  • All settings are tuned for single-node performance

@wdbaruni wdbaruni merged commit ad25b69 into main Jan 16, 2025
6 checks passed
@wdbaruni wdbaruni deleted the llama-benchamrking branch January 16, 2025 18:17