
feat: implement traffic-based autoscaling (RPS) for API using Prometheus/KEDA #34

@dhruvkshah75

Description


Feature Request

Currently, our API scaling is either static or based on CPU utilization. CPU is not always an accurate proxy for traffic load, especially for I/O-bound requests that spend most of their time waiting on the database or network.

I want to implement Traffic-Based Autoscaling to ensure low latency during traffic spikes. The system should scale the number of API pods based on the incoming Requests Per Second (RPS).

Acceptance Criteria

  • Expose Prometheus metrics from the FastAPI application using prometheus-fastapi-instrumentator (a sketch follows this list).
  • Deploy Prometheus in the cluster to scrape these metrics.
  • Configure a KEDA ScaledObject (or HPA Custom Metric) to target the API deployment.
  • Scaling rule: target 100 requests per second per pod.
    • If aggregate traffic exceeds 100 RPS × the current replica count, scale up.
    • When traffic drops back, scale down toward the minimum replica count.
  • Verify: run a load test (e.g., with k6 or Locust) to confirm that pods increase as traffic increases (a Locust sketch follows the Notes below).
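
For the first criterion, wiring the instrumentator into the app takes a couple of lines at startup. A minimal sketch, assuming the library's default metric set (which includes the http_requests_total counter that the scaling query rates over):

```python
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()

# Instrument all routes and expose the collected metrics at GET /metrics,
# the endpoint Prometheus will scrape. The default metric set includes
# the http_requests_total counter used by the scaling query below.
Instrumentator().instrument(app).expose(app)

@app.get("/health")
def health():
    # Cheap endpoint, also a convenient load-test target.
    return {"status": "ok"}
```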

Technical Implementation Details

  • Metric source: Prometheus query sum(rate(http_requests_total[2m])), summed across pods so KEDA can divide the aggregate RPS by the per-pod target.
  • Scaler: KEDA Prometheus scaler.
  • Min Replicas: 2 (for high availability).
  • Max Replicas: 10 (to prevent runaway costs).
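
Putting these details together, a sketch of the ScaledObject. The Deployment name (api) and the in-cluster Prometheus address are assumptions; adjust them to the actual manifests. KEDA scales to ceil(query result / threshold), which is why the query sums the rate across all pods:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-rps-scaler
spec:
  scaleTargetRef:
    name: api                 # assumed Deployment name
  minReplicaCount: 2          # high-availability floor
  maxReplicaCount: 10         # cost ceiling
  triggers:
    - type: prometheus
      metadata:
        # Assumed address of the in-cluster Prometheus service.
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        # Aggregate RPS across all API pods over the last 2 minutes.
        query: sum(rate(http_requests_total[2m]))
        # Target RPS per pod; KEDA scales to ceil(query / threshold).
        threshold: "100"
```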

Notes

This decouples our scaling logic from CPU limits, allowing the API to remain responsive even if requests are lightweight but high-volume.
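
For the verification step in the acceptance criteria, a minimal Locust sketch that generates exactly this kind of lightweight, high-volume traffic (the /health endpoint and the numbers below are placeholders):

```python
from locust import HttpUser, between, task

class ApiUser(HttpUser):
    # Short think time so each simulated user produces high request volume.
    wait_time = between(0.1, 0.5)

    @task
    def hit_lightweight_endpoint(self):
        # Any cheap GET route works; /health matches the sketch above.
        self.client.get("/health")
```

Run it headless with something like locust -f loadtest.py --host http://<api-service> --users 500 --spawn-rate 50 --headless, then watch kubectl get hpa,pods -w to confirm replicas climb toward the max and settle back once traffic stops (KEDA manages scaling through an HPA it creates).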

Labels

enhancement (New feature or request), help wanted (Extra attention is needed)
