111 changes: 111 additions & 0 deletions benchmark/locust/README.md
@@ -0,0 +1,111 @@
# Locust Load Testing for NeMo Guardrails

This directory contains a Locust-based load testing framework for the NeMo Guardrails OpenAI-compatible server.

## Introduction

The [Locust](https://locust.io/) stress-testing tool ramps up concurrent users making API calls to the `/v1/chat/completions` endpoint of an OpenAI-compatible LLM with configurable parameters.
This complements [ai-perf](https://github.com/ai-dynamo/aiperf), which measures steady-state performance; Locust instead focuses on ramping load up to, and potentially beyond, what a system can handle, and measures how gracefully the system degrades under higher-than-expected load.

## Getting Started

### Prerequisites

These steps have been tested with Python 3.11.11.

1. **Create a virtual environment in which to install Locust and other benchmarking tools**

```bash
$ mkdir ~/env
$ python -m venv ~/env/benchmark_env
```

2. **Activate environment and install dependencies in the virtual environment**

```bash
$ source ~/env/benchmark_env/bin/activate
(benchmark_env) $ pip install -r benchmark/requirements.txt
```

## Running Benchmarks

The Locust benchmark uses a YAML configuration file to configure load-testing parameters.
To get started and load-test a model hosted at `http://localhost:8000`, use the following command.
Set `headless: false` in your YAML config to use Locust's interactive web UI. Then open http://localhost:8089 to control the test and view real-time metrics.

```bash
(benchmark_env) $ python -m benchmark.locust benchmark/locust/configs/local.yaml
```

### CLI Options

The `benchmark.locust` CLI supports the following options:

```bash
python -m benchmark.locust [OPTIONS] CONFIG_FILE
```

**Arguments:**
- `CONFIG_FILE`: Path to YAML configuration file (required)

**Options:**
- `--dry-run`: Print commands without executing them
- `--verbose`: Enable verbose logging and debugging information

## Configuration Options

All configuration is done via YAML files. The following fields are supported:

### Required Fields

- `config_id`: Guardrails configuration ID to use
- `model`: Model name to send in requests

### Optional Fields

- `host`: Server base URL (default: `http://localhost:8000`)
- `users`: Maximum concurrent users (default: `256`, minimum: `1`)
- `spawn_rate`: Users spawned per second (default: `10`, minimum: `0.1`)
- `run_time`: Test duration in seconds (default: `60`, minimum: `1`)
- `message`: Message content to send (default: `"Hello, what can you do?"`)
- `headless`: Run without web UI (default: `true`)
- `output_base_dir`: Directory for test results (default: `"locust_results"`)

## Load Test Behavior

- **Request Type**: 100% POST `/v1/chat/completions` requests
- **Wait Time**: Zero wait time between requests (continuous hammering)
- **Ramp-up**: Users spawn gradually at the specified `spawn_rate`
- **Message Content**: Static message content (configurable via `message` field)
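
Each request body follows the OpenAI chat-completions shape, with the guardrails configuration attached under a `guardrails` key. This mirrors the payload built in `locustfile.py`; the `model` and `config_id` values here are illustrative placeholders:

```python
# Request payload sent by each simulated user (see locustfile.py).
# The model and config_id values are illustrative placeholders.
import json

payload = {
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello, what can you do?"}],
    "guardrails": {"config_id": "my-guardrails-config"},
}

print(json.dumps(payload, indent=2))
```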

## Output

### Headless Mode

When run in headless mode, results are saved to timestamped directories:

```
locust_results/
└── YYYYMMDD_HHMMSS/
├── report.html # HTML report with charts
├── run_metadata.json # Test configuration metadata
├── stats.csv # Request statistics
├── stats_failures.csv # Failure statistics
└── stats_history.csv # Statistics over time
```
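
The timestamped directory name follows a `YYYYMMDD_HHMMSS` pattern. A sketch of how such a path can be derived is below; the actual helper lives in `run_locust.py`, which is not shown in this diff, so treat this as illustrative:

```python
# Illustrative sketch of the timestamped results path; the real
# implementation is in run_locust.py, which is not part of this diff.
from datetime import datetime
from pathlib import Path

output_base_dir = "locust_results"  # matches the config default
run_dir = Path(output_base_dir) / datetime.now().strftime("%Y%m%d_%H%M%S")
print(run_dir)
```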

### Web UI Mode

Real-time metrics are displayed in the web interface at http://localhost:8089, including:
- Requests per second (RPS)
- Response time percentiles (50th, 95th, 99th)
- Failure rate
- Number of users

### Troubleshooting

If you see validation errors:
- Ensure all required fields (`config_id`, `model`) are present in your YAML config
- Check that the `config_id` matches a configuration on your server
- Verify that numeric values meet minimum requirements (e.g., `users >= 1`, `spawn_rate >= 0.1`)
- Ensure `host` starts with `http://` or `https://`
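
The constraints above can also be checked before launching a run. Below is a stdlib-only sketch of the same rules; the project itself enforces them with a Pydantic model (`LocustConfig` in `locust_models.py`), so this is a simplified stand-in:

```python
# Stdlib-only sketch of the validation rules; the project itself uses
# a Pydantic model (LocustConfig in locust_models.py) for this.
def validate_config(cfg: dict) -> list[str]:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    for required in ("config_id", "model"):
        if required not in cfg:
            errors.append(f"missing required field: {required}")
    host = cfg.get("host", "http://localhost:8000")
    if not host.startswith(("http://", "https://")):
        errors.append("host must start with http:// or https://")
    if cfg.get("users", 256) < 1:
        errors.append("users must be >= 1")
    if cfg.get("spawn_rate", 10) < 0.1:
        errors.append("spawn_rate must be >= 0.1")
    if cfg.get("run_time", 60) < 1:
        errors.append("run_time must be >= 1")
    return errors

print(validate_config({"model": "mock-llm", "users": 0, "host": "localhost"}))
```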
15 changes: 15 additions & 0 deletions benchmark/locust/__init__.py
@@ -0,0 +1,15 @@
#!/usr/bin/env python3
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
22 changes: 22 additions & 0 deletions benchmark/locust/__main__.py
@@ -0,0 +1,22 @@
#!/usr/bin/env python3
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Entry point for running the Locust load test CLI as a module: python -m benchmark.locust"""

from benchmark.locust.run_locust import app

if __name__ == "__main__":
app()
18 changes: 18 additions & 0 deletions benchmark/locust/configs/local.yaml
@@ -0,0 +1,18 @@
# Example Locust load test configuration for NeMo Guardrails

# Server details
host: "http://localhost:8000"
config_id: "my-guardrails-config"
model: "meta/llama-3.3-70b-instruct"

# Load test parameters
users: 1024 # Maximum number of concurrent users
spawn_rate: 16 # Users spawned per second
run_time: 120 # Test duration in seconds

# Request configuration
message: "Hello, what can you do?"

# Output configuration
headless: true # Set to true for headless mode, false for web UI
output_base_dir: "locust_results" # Directory for test results
82 changes: 82 additions & 0 deletions benchmark/locust/locust_models.py
@@ -0,0 +1,82 @@
#!/usr/bin/env python3
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Pydantic models for Locust load test configuration validation.
"""

from pathlib import Path

from pydantic import BaseModel, Field, field_validator


class LocustConfig(BaseModel):
"""Configuration for a Locust load-test run"""

# Server details
host: str = Field(
default="http://localhost:8000",
description="Base URL of the NeMo Guardrails server to test",
)
config_id: str = Field(..., description="Guardrails configuration ID to use")
model: str = Field(..., description="Model name to use in requests")

# Load test parameters
users: int = Field(
default=256,
ge=1,
description="Maximum number of concurrent users",
)
spawn_rate: float = Field(
default=10,
ge=0.1,
description="Rate at which users are spawned (users/second)",
)
run_time: int = Field(
default=60,
ge=1,
description="Test duration in seconds",
)

# Request configuration
message: str = Field(
default="Hello, what can you do?",
description="Message content to send in chat completion requests",
)

# Output configuration
> **Reviewer (Collaborator):** `get_output_base_path()` is never called anywhere; it could be removed:
>
> ```diff
> -    def get_output_base_path(self) -> Path:
> -        """Get the base output directory as a Path object."""
> -        return Path(self.output_base_dir)
> ```

headless: bool = Field(
default=True,
description="Run in headless mode without web UI",
)

output_base_dir: str = Field(
default="locust_results",
description="Base directory for load test results",
)

@field_validator("host")
@classmethod
def validate_host(cls, v: str) -> str:
"""Ensure host starts with http:// or https://"""
if not v.startswith(("http://", "https://")):
raise ValueError("Host must start with http:// or https://")
# Remove trailing slash if present
return v.rstrip("/")

def get_output_base_path(self) -> Path:
"""Get the base output directory as a Path object."""
return Path(self.output_base_dir)
73 changes: 73 additions & 0 deletions benchmark/locust/locustfile.py
@@ -0,0 +1,73 @@
#!/usr/bin/env python3
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Locust load test file for NeMo Guardrails OpenAI-compatible server.

This file defines the load test behavior. It can be run directly with:
locust -f locustfile.py --host http://localhost:8000

Or via the Typer CLI wrapper:
    python -m benchmark.locust config.yaml
"""

import os

from locust import HttpUser, constant, task


class GuardrailsUser(HttpUser):
"""
Simulated user that continuously sends chat completion requests to the
NeMo Guardrails server.

Each user will continuously send requests with no wait time between them
(continuous hammering). The load is distributed such that 99% of requests
go to chat completions.
> **Reviewer (Collaborator), on lines +37 to +39:** a question: isn't there only one `@task`, making it 100%?
"""

# No wait time between requests (continuous hammering)
wait_time = constant(0)

def on_start(self):
"""Called when a simulated user starts.
Uses environment variables to pass the Guardrails config_id, model, and message"""
# Get configuration from environment variables set by the CLI wrapper
self.config_id = os.getenv("LOCUST_CONFIG_ID", "default")
self.model = os.getenv("LOCUST_MODEL", "mock-llm")
self.message = os.getenv("LOCUST_MESSAGE", "Hello, what can you do?")

@task
def chat_completion(self):
"""
Send a Guardrails chat completion request (/v1/chat/completions)
"""
payload = {
"model": self.model,
"messages": [{"role": "user", "content": self.message}],
"guardrails": {"config_id": self.config_id},
}

with self.client.post(
"/v1/chat/completions",
json=payload,
catch_response=True,
name="/v1/chat/completions",
) as response:
if response.status_code == 200:
response.success()
else:
response.failure(f"Got status code {response.status_code}: {response.text}")
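
`on_start` reads its settings from the `LOCUST_CONFIG_ID`, `LOCUST_MODEL`, and `LOCUST_MESSAGE` environment variables, so the CLI wrapper has to export those before invoking Locust. A hedged sketch of how such an invocation could be assembled is below; `run_locust.py` is not part of this diff, so the exact implementation may differ:

```python
# Sketch of how a wrapper could launch Locust with the env vars that
# locustfile.py reads in on_start(). Illustrative only; the actual
# run_locust.py implementation is not shown in this diff.
import os


def build_locust_invocation(cfg: dict) -> tuple[list, dict]:
    """Build the locust command line and environment for a config dict."""
    env = dict(os.environ)
    env["LOCUST_CONFIG_ID"] = cfg["config_id"]
    env["LOCUST_MODEL"] = cfg["model"]
    env["LOCUST_MESSAGE"] = cfg.get("message", "Hello, what can you do?")
    cmd = [
        "locust",
        "-f", "benchmark/locust/locustfile.py",
        "--host", cfg.get("host", "http://localhost:8000"),
        "--users", str(cfg.get("users", 256)),
        "--spawn-rate", str(cfg.get("spawn_rate", 10)),
        "--run-time", f"{cfg.get('run_time', 60)}s",
        "--headless",
    ]
    return cmd, env


cmd, env = build_locust_invocation({"config_id": "my-config", "model": "mock-llm"})
print(" ".join(cmd))
```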