feat(benchmark): Add Locust stress-test #1629
Open: tgasser-nv wants to merge 10 commits into develop from feat/add-locust-load-test
Commits:
- 12015ff Initial checkin of locust code and models (tgasser-nv)
- 86ecc8c Add tests and updated README (tgasser-nv)
- d8cf48f Update readme and clean up logging (tgasser-nv)
- b0632ca Revert concurrency changes in Guardrails (tgasser-nv)
- 751e9cb Clean up service health check (tgasser-nv)
- eb000fa Update locust run script and tests (tgasser-nv)
- a1ef1b4 Final cleanups on locust script and README (tgasser-nv)
- c6d3cbc Remove inconsistency in docs now run_time can't be null (tgasser-nv)
- 2a1e75e Removed null run_time description in local.yaml file (tgasser-nv)
- d0ab724 Add timeout and JSON decode exception catching (tgasser-nv)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| # Locust Load Testing for NeMo Guardrails | ||
|
|
||
| This directory contains a Locust-based load testing framework for the NeMo Guardrails OpenAI-compatible server. | ||
|
|
||
| ## Introduction | ||
|
|
||
| The [Locust](https://locust.io/) stress-testing tool ramps up concurrent users making API calls to the `/v1/chat/completions` endpoint of an OpenAI-compatible LLM with configurable parameters. | ||
| This complements [ai-perf](https://github.com/ai-dynamo/aiperf), which measures steady-state performance. Locust instead ramps load up, potentially beyond what the system can handle, and measures how gracefully the system degrades under higher-than-expected load. | ||
|
|
||
| ## Getting Started | ||
|
|
||
| ### Prerequisites | ||
|
|
||
| These steps have been tested with Python 3.11.11. | ||
|
|
||
| 1. **Create a virtual environment in which to install Locust and other benchmarking tools** | ||
|
|
||
| ```bash | ||
| $ mkdir ~/env | ||
| $ python -m venv ~/env/benchmark_env | ||
| ``` | ||
|
|
||
| 2. **Activate environment and install dependencies in the virtual environment** | ||
|
|
||
| ```bash | ||
| $ source ~/env/benchmark_env/bin/activate | ||
| (benchmark_env) $ pip install -r benchmark/requirements.txt | ||
| ``` | ||
|
|
||
| ## Running Benchmarks | ||
|
|
||
| The Locust benchmark uses a YAML configuration file to set load-testing parameters. | ||
| To get started and load-test a model hosted at `http://localhost:8000`, use the following command. | ||
| Set `headless: false` in your YAML config to use Locust's interactive web UI. Then open http://localhost:8089 to control the test and view real-time metrics. | ||
|
|
||
| ```bash | ||
| (benchmark_env) $ python -m benchmark.locust benchmark/locust/configs/local.yaml | ||
| ``` | ||
|
|
||
| ### CLI Options | ||
|
|
||
| The `benchmark.locust` CLI supports the following options: | ||
|
|
||
| ```bash | ||
| python -m benchmark.locust [OPTIONS] CONFIG_FILE | ||
| ``` | ||
|
|
||
| **Arguments:** | ||
| - `CONFIG_FILE`: Path to YAML configuration file (required) | ||
|
|
||
| **Options:** | ||
| - `--dry-run`: Print commands without executing them | ||
| - `--verbose`: Enable verbose logging and debugging information | ||
|
|
||
| ## Configuration Options | ||
|
|
||
| All configuration is done via YAML files. The following fields are supported: | ||
|
|
||
| ### Required Fields | ||
|
|
||
| - `config_id`: Guardrails configuration ID to use | ||
| - `model`: Model name to send in requests | ||
|
|
||
| ### Optional Fields | ||
|
|
||
| - `host`: Server base URL (default: `http://localhost:8000`) | ||
| - `users`: Maximum concurrent users (default: `256`, minimum: `1`) | ||
| - `spawn_rate`: Users spawned per second (default: `10`, minimum: `0.1`) | ||
| - `run_time`: Test duration in seconds (default: `60`, minimum: `1`) | ||
| - `message`: Message content to send (default: `"Hello, what can you do?"`) | ||
| - `headless`: Run without web UI (default: `true`) | ||
| - `output_base_dir`: Directory for test results (default: `"locust_results"`) | ||
|
|
||
| ## Load Test Behavior | ||
|
|
||
| - **Request Type**: 100% POST `/v1/chat/completions` requests | ||
| - **Wait Time**: Zero wait time between requests (continuous hammering) | ||
| - **Ramp-up**: Users spawn gradually at the specified `spawn_rate` | ||
| - **Message Content**: Static message content (configurable via `message` field) | ||
|
|
||
| ## Output | ||
|
|
||
| ### Headless Mode | ||
|
|
||
| When run in headless mode, results are saved to timestamped directories: | ||
|
|
||
| ``` | ||
| locust_results/ | ||
| └── YYYYMMDD_HHMMSS/ | ||
| ├── report.html # HTML report with charts | ||
| ├── run_metadata.json # Test configuration metadata | ||
| ├── stats.csv # Request statistics | ||
| ├── stats_failures.csv # Failure statistics | ||
| └── stats_history.csv # Statistics over time | ||
| ``` | ||
|
|
||
| ### Web UI Mode | ||
|
|
||
| Real-time metrics are displayed in the web interface at http://localhost:8089, including: | ||
| - Requests per second (RPS) | ||
| - Response time percentiles (50th, 95th, 99th) | ||
| - Failure rate | ||
| - Number of users | ||
|
|
||
| ### Troubleshooting | ||
|
|
||
| If you see validation errors: | ||
| - Ensure all required fields (`config_id`, `model`) are present in your YAML config | ||
| - Check that the `config_id` matches a configuration on your server | ||
| - Verify that numeric values meet minimum requirements (e.g., `users >= 1`, `spawn_rate >= 0.1`) | ||
| - Ensure `host` starts with `http://` or `https://` | ||
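The README's Output section describes headless results landing in `YYYYMMDD_HHMMSS` directories under `output_base_dir`. A minimal sketch of locating the most recent run follows; `latest_run_dir` is a hypothetical helper that is not part of this PR, and it assumes only the directory naming shown in the README.

```python
import re
from pathlib import Path
from typing import Optional

# Directory names follow the YYYYMMDD_HHMMSS pattern shown in the README.
_RUN_DIR_PATTERN = re.compile(r"^\d{8}_\d{6}$")


def latest_run_dir(output_base_dir: str = "locust_results") -> Optional[Path]:
    """Return the newest timestamped run directory, or None if there is none.

    Hypothetical helper for illustration; not part of the PR.
    """
    base = Path(output_base_dir)
    if not base.is_dir():
        return None
    runs = [
        p for p in base.iterdir()
        if p.is_dir() and _RUN_DIR_PATTERN.match(p.name)
    ]
    # Zero-padded timestamps sort chronologically as plain strings.
    return max(runs, key=lambda p: p.name, default=None)


if __name__ == "__main__":
    run = latest_run_dir()
    if run is None:
        print("No headless runs found")
    else:
        print(f"Latest report: {run / 'report.html'}")
```

Sorting by name rather than by modification time keeps the result stable if a results directory is copied or restored from a backup.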
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| #!/usr/bin/env python3 | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| #!/usr/bin/env python3 | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| """Entry point for running the Locust load test CLI as a module: python -m benchmark.locust""" | ||
|
|
||
| from benchmark.locust.run_locust import app | ||
|
|
||
| if __name__ == "__main__": | ||
| app() | ||
tgasser-nv marked this conversation as resolved.
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| # Example Locust load test configuration for NeMo Guardrails | ||
|
|
||
| # Server details | ||
| host: "http://localhost:8000" | ||
| config_id: "my-guardrails-config" | ||
| model: "meta/llama-3.3-70b-instruct" | ||
|
|
||
| # Load test parameters | ||
| users: 1024 # Maximum number of concurrent users | ||
| spawn_rate: 16 # Users spawned per second | ||
| run_time: 120 # Test duration in seconds | ||
|
|
||
| # Request configuration | ||
| message: "Hello, what can you do?" | ||
|
|
||
| # Output configuration | ||
| headless: true # Set to true for headless mode, false for web UI | ||
| output_base_dir: "locust_results" # Directory for test results |
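With the example values above, the ramp-up alone takes a sizeable share of the run. The sketch below works through that arithmetic. It assumes `run_time` is counted from the start of the test, ramp-up included, and that users spawn at exactly `spawn_rate`; both function names are hypothetical.

```python
def ramp_up_seconds(users: int, spawn_rate: float) -> float:
    """Time needed to spawn all users at a constant spawn rate."""
    return users / spawn_rate


def full_load_seconds(users: int, spawn_rate: float, run_time: int) -> float:
    """Time left at full concurrency once ramp-up completes.

    Assumes run_time includes the ramp-up period (an assumption, not
    something this config file states).
    """
    return max(0.0, run_time - ramp_up_seconds(users, spawn_rate))


# Values from the example config: users=1024, spawn_rate=16, run_time=120
print(ramp_up_seconds(1024, 16))         # 64.0
print(full_load_seconds(1024, 16, 120))  # 56.0
```

Under these assumptions, a `run_time` shorter than `users / spawn_rate` would end the test before the full user count is ever reached.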
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,82 @@ | ||
| #!/usr/bin/env python3 | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| """ | ||
| Pydantic models for Locust load test configuration validation. | ||
| """ | ||
|
|
||
| from pathlib import Path | ||
|
|
||
| from pydantic import BaseModel, Field, field_validator | ||
|
|
||
|
|
||
| class LocustConfig(BaseModel): | ||
| """Configuration for a Locust load-test run""" | ||
|
|
||
| # Server details | ||
| host: str = Field( | ||
| default="http://localhost:8000", | ||
| description="Base URL of the NeMo Guardrails server to test", | ||
| ) | ||
| config_id: str = Field(..., description="Guardrails configuration ID to use") | ||
| model: str = Field(..., description="Model name to use in requests") | ||
|
|
||
| # Load test parameters | ||
| users: int = Field( | ||
| default=256, | ||
| ge=1, | ||
| description="Maximum number of concurrent users", | ||
| ) | ||
| spawn_rate: float = Field( | ||
| default=10, | ||
| ge=0.1, | ||
| description="Rate at which users are spawned (users/second)", | ||
| ) | ||
| run_time: int = Field( | ||
| default=60, | ||
| ge=1, | ||
| description="Test duration in seconds", | ||
| ) | ||
|
|
||
| # Request configuration | ||
| message: str = Field( | ||
| default="Hello, what can you do?", | ||
| description="Message content to send in chat completion requests", | ||
| ) | ||
|
|
||
| # Output configuration | ||
|
Collaborator (review comment): `get_output_base_path()` is never called anywhere and could be removed.
- def get_output_base_path(self) -> Path:
- """Get the base output directory as a Path object."""
- return Path(self.output_base_dir)
|
||
| headless: bool = Field( | ||
| default=True, | ||
| description="Run in headless mode without web UI", | ||
| ) | ||
|
|
||
| output_base_dir: str = Field( | ||
| default="locust_results", | ||
| description="Base directory for load test results", | ||
| ) | ||
|
|
||
| @field_validator("host") | ||
| @classmethod | ||
| def validate_host(cls, v: str) -> str: | ||
| """Ensure host starts with http:// or https://""" | ||
| if not v.startswith(("http://", "https://")): | ||
| raise ValueError("Host must start with http:// or https://") | ||
| # Remove trailing slash if present | ||
| return v.rstrip("/") | ||
|
|
||
| def get_output_base_path(self) -> Path: | ||
| """Get the base output directory as a Path object.""" | ||
| return Path(self.output_base_dir) | ||
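The constraints declared on `LocustConfig` can be restated without Pydantic to see which inputs would trigger a validation error. The following is a rough stdlib-only sketch, not the PR's validation path: `check_config` and its error strings are hypothetical, and only the field names, defaults, and minimums are taken from the model above.

```python
from typing import Any, Dict, List

# Field names, defaults, and minimums mirror the LocustConfig model.
REQUIRED_FIELDS = ("config_id", "model")
MINIMUMS = {"users": 1, "spawn_rate": 0.1, "run_time": 1}
DEFAULTS = {
    "host": "http://localhost:8000",
    "users": 256,
    "spawn_rate": 10,
    "run_time": 60,
}


def check_config(raw: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means the dict passes.

    Hypothetical helper; it assumes numeric fields already hold numbers.
    """
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in raw
    ]
    merged = {**DEFAULTS, **raw}  # user-supplied values override defaults
    for name, minimum in MINIMUMS.items():
        if merged[name] < minimum:
            problems.append(f"{name} must be >= {minimum}")
    if not str(merged["host"]).startswith(("http://", "https://")):
        problems.append("host must start with http:// or https://")
    return problems


print(check_config({"config_id": "my-guardrails-config", "model": "m"}))  # []
print(check_config({"users": 0, "host": "localhost:8000"}))
```

Unlike the Pydantic model, this sketch collects problems instead of raising, and it does not strip a trailing slash from `host`.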
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| #!/usr/bin/env python3 | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| """ | ||
| Locust load test file for NeMo Guardrails OpenAI-compatible server. | ||
|
|
||
| This file defines the load test behavior. It can be run directly with: | ||
| locust -f locustfile.py --host http://localhost:8000 | ||
|
|
||
| Or via the Typer CLI wrapper: | ||
| python -m benchmark.locust config.yaml | ||
| """ | ||
|
|
||
| import os | ||
|
|
||
| from locust import HttpUser, constant, task | ||
|
|
||
|
|
||
| class GuardrailsUser(HttpUser): | ||
| """ | ||
| Simulated user that continuously sends chat completion requests to the | ||
| NeMo Guardrails server. | ||
|
|
||
| Each user will continuously send requests with no wait time between them | ||
| (continuous hammering). The load is distributed such that 99% of requests | ||
| go to chat completions. | ||
|
Collaborator (review comment on lines +37 to +39): a question: isn't there only one
|
||
| """ | ||
|
|
||
| # No wait time between requests (continuous hammering) | ||
| wait_time = constant(0) | ||
|
|
||
| def on_start(self): | ||
| """Called when a simulated user starts. | ||
| Uses environment variables to pass the Guardrails config_id, model, and message""" | ||
| # Get configuration from environment variables set by the CLI wrapper | ||
| self.config_id = os.getenv("LOCUST_CONFIG_ID", "default") | ||
| self.model = os.getenv("LOCUST_MODEL", "mock-llm") | ||
| self.message = os.getenv("LOCUST_MESSAGE", "Hello, what can you do?") | ||
|
|
||
| @task | ||
| def chat_completion(self): | ||
| """ | ||
| Send a Guardrails chat completion request (/v1/chat/completions) | ||
| """ | ||
| payload = { | ||
| "model": self.model, | ||
| "messages": [{"role": "user", "content": self.message}], | ||
| "guardrails": {"config_id": self.config_id}, | ||
| } | ||
|
|
||
| with self.client.post( | ||
| "/v1/chat/completions", | ||
| json=payload, | ||
| catch_response=True, | ||
| name="/v1/chat/completions", | ||
| ) as response: | ||
| if response.status_code == 200: | ||
| response.success() | ||
| else: | ||
| response.failure(f"Got status code {response.status_code}: {response.text}") | ||
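The locustfile reads `LOCUST_CONFIG_ID`, `LOCUST_MODEL`, and `LOCUST_MESSAGE` from the environment, so a wrapper has to set those variables and translate the remaining config fields into Locust command-line flags. One way that could look is sketched below. `build_locust_invocation`, the `results_dir` default, and the choice to pass run-time and output flags only in headless mode are assumptions for illustration, not the PR's `run_locust` implementation; the Locust flags themselves are standard.

```python
import os
from typing import Any, Dict, List, Tuple


def build_locust_invocation(
    config: Dict[str, Any],
    locustfile: str = "locustfile.py",
    results_dir: str = "locust_results/run",  # hypothetical output location
) -> Tuple[List[str], Dict[str, str]]:
    """Build the locust command and environment for one load-test run."""
    cmd = [
        "locust",
        "-f", locustfile,
        "--host", config["host"],
        "--users", str(config["users"]),
        "--spawn-rate", str(config["spawn_rate"]),
    ]
    if config.get("headless", True):
        cmd += [
            "--headless",
            "--run-time", f"{config['run_time']}s",
            "--csv", f"{results_dir}/stats",  # --csv takes a file-name prefix
            "--html", f"{results_dir}/report.html",
        ]
    # Request settings travel via the environment variables the locustfile reads.
    env = dict(os.environ)
    env["LOCUST_CONFIG_ID"] = config["config_id"]
    env["LOCUST_MODEL"] = config["model"]
    env["LOCUST_MESSAGE"] = config.get("message", "Hello, what can you do?")
    return cmd, env


if __name__ == "__main__":
    example = {
        "host": "http://localhost:8000",
        "config_id": "my-guardrails-config",
        "model": "meta/llama-3.3-70b-instruct",
        "users": 1024,
        "spawn_rate": 16,
        "run_time": 120,
        "headless": True,
    }
    command, environment = build_locust_invocation(example)
    # Printing instead of executing is roughly what a dry run would show.
    print(" ".join(command))
```

Returning the command and environment instead of running them keeps the function easy to test and leaves execution, for example via `subprocess.run(command, env=environment)`, to the caller.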