Feature Request
Currently, our API scaling is either static or based on CPU utilization. CPU is not always an accurate proxy for traffic load, especially for I/O-bound requests (waiting on DB/Network).
I want to implement Traffic-Based Autoscaling to ensure low latency during traffic spikes. The system should scale the number of API pods based on the incoming Requests Per Second (RPS).
Acceptance Criteria
- Expose Prometheus metrics from the FastAPI application (using `prometheus-fastapi-instrumentator`); a minimal sketch follows this list.
- Deploy Prometheus in the cluster to scrape these metrics.
- Configure a KEDA ScaledObject (or HPA custom metric) to target the API deployment.
- Scaling rule: target 100 requests per second per pod.
  - If traffic > 100 RPS, scale up.
  - If traffic drops, scale down.
- Verify: run a load test (e.g., with k6 or Locust) to confirm that pods increase as traffic increases; a sample Locust script is included under Notes below.
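A minimal sketch of the instrumentation step, using the standard `prometheus-fastapi-instrumentator` setup (the `/health` route is a placeholder for illustration):

```python
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

# Instrument all routes and expose a /metrics endpoint for Prometheus to
# scrape. The library's default metrics include the http_requests_total
# counter referenced by the scaling query below.
Instrumentator().instrument(app).expose(app)

@app.get("/health")
async def health():
    # Placeholder route so the app has something to instrument.
    return {"status": "ok"}
```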
Technical Implementation Details
- Metric source: Prometheus query `rate(http_requests_total[2m])`.
- Scaler: KEDA Prometheus scaler; a sample ScaledObject manifest follows this list.
- Min replicas: 2 (for high availability).
- Max replicas: 10 (to prevent runaway costs).
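A sketch of the KEDA side, assuming the target Deployment is named `api` and Prometheus is reachable at the in-cluster address shown (both are placeholders to adjust):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-rps-scaler
spec:
  scaleTargetRef:
    name: api                # placeholder: the API Deployment name
  minReplicaCount: 2         # high-availability floor
  maxReplicaCount: 10        # cost ceiling
  triggers:
    - type: prometheus
      metadata:
        # Placeholder: adjust to the in-cluster Prometheus service address.
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        # Total RPS across all pods; KEDA divides this by the threshold to
        # pick a replica count, which gives ~100 RPS per pod.
        query: sum(rate(http_requests_total[2m]))
        threshold: "100"
```

Note that the query sums across pods on purpose: KEDA's Prometheus scaler treats the threshold as a per-replica average, which matches the 100 RPS per pod rule above.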
Notes
This decouples our scaling logic from CPU limits, allowing the API to remain responsive even if requests are lightweight but high-volume.
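For the verification step in the acceptance criteria, a minimal Locust sketch (the target path is a placeholder; point it at a representative API route):

```python
from locust import HttpUser, task, between

class ApiUser(HttpUser):
    # Short wait time so a few hundred users comfortably exceed 100 RPS/pod.
    wait_time = between(0.1, 0.5)

    @task
    def hit_api(self):
        # Placeholder endpoint for the load test.
        self.client.get("/health")
```

Run it with something like `locust -f loadtest.py --host http://<api-host> --headless -u 500 -r 50` and watch `kubectl get hpa,pods -w` to confirm replicas ramp up with traffic (KEDA creates the underlying HPA).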