
feat: implement traffic-based autoscaling (RPS) for API using Prometheus/KEDA #34

@dhruvkshah75

Description


Feature Request

Currently, our API scaling is either static or based on CPU utilization. CPU is not always an accurate proxy for traffic load, especially for I/O-bound requests that spend most of their time waiting on the database or network.

I want to implement Traffic-Based Autoscaling to ensure low latency during traffic spikes. The system should scale the number of API pods based on the incoming Requests Per Second (RPS).

Acceptance Criteria

  • Expose Prometheus metrics from the FastAPI application using prometheus-fastapi-instrumentator (a sketch follows this list).
  • Deploy Prometheus in the cluster to scrape these metrics.
  • Configure a KEDA ScaledObject (or HPA Custom Metric) to target the API deployment.
  • Scaling rule: target 100 requests per second per pod.
    • If aggregate traffic exceeds 100 RPS × the current replica count, scale up.
    • When traffic drops back, scale down toward the minimum replica count.
  • Verify: run a load test (e.g., with k6 or Locust) to confirm that pods increase as traffic increases (a Locust sketch follows the Notes below).
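
For the first criterion, wiring the instrumentator into the app takes a couple of lines at startup. A minimal sketch, assuming the library's default metric set (which includes the http_requests_total counter that the scaling query rates over):

```python
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

# Instrument all routes and expose the collected metrics at GET /metrics,
# the endpoint Prometheus will scrape. The default metric set includes
# the http_requests_total counter used by the scaling query below.
Instrumentator().instrument(app).expose(app)

@app.get("/health")
def health():
    # Cheap endpoint, also a convenient load-test target.
    return {"status": "ok"}
```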

Technical Implementation Details

  • Metric source: Prometheus query sum(rate(http_requests_total[2m])), summed across pods so KEDA can divide the aggregate RPS by the per-pod target.
  • Scaler: KEDA Prometheus scaler.
  • Min Replicas: 2 (for high availability).
  • Max Replicas: 10 (to prevent runaway costs).
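
Putting these details together, a sketch of the ScaledObject. The Deployment name (api) and the in-cluster Prometheus address are assumptions; adjust them to the actual manifests. KEDA scales to ceil(query result / threshold), which is why the query sums the rate across all pods:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-rps-scaler
spec:
  scaleTargetRef:
    name: api                 # assumed Deployment name
  minReplicaCount: 2          # high-availability floor
  maxReplicaCount: 10         # cost ceiling
  triggers:
    - type: prometheus
      metadata:
        # Assumed address of the in-cluster Prometheus service.
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        # Aggregate RPS across all API pods over the last 2 minutes.
        query: sum(rate(http_requests_total[2m]))
        # Target RPS per pod; KEDA scales to ceil(query / threshold).
        threshold: "100"
```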

Notes

This decouples our scaling logic from CPU limits, allowing the API to remain responsive even if requests are lightweight but high-volume.
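
For the verification step in the acceptance criteria, a minimal Locust sketch that generates exactly this kind of lightweight, high-volume traffic (the /health endpoint and the numbers below are placeholders):

```python
from locust import HttpUser, between, task

class ApiUser(HttpUser):
    # Short think time so each simulated user produces high request volume.
    wait_time = between(0.1, 0.5)

    @task
    def hit_lightweight_endpoint(self):
        # Any cheap GET route works; /health matches the sketch above.
        self.client.get("/health")
```

Run it headless with something like locust -f loadtest.py --host http://<api-service> --users 500 --spawn-rate 50 --headless, then watch kubectl get hpa,pods -w to confirm replicas climb toward the max and settle back once traffic stops (KEDA manages scaling through an HPA it creates).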

Labels

enhancement (New feature or request), help wanted (Extra attention is needed)
