Token Rate Limit Policy #724
Tharsanan1 started this conversation in General
Replies: 2 comments
- To support multiple models and multiple LLM providers, I guess the better UX is to get the JSON paths from the LLM provider template, so the API developer doesn't need to specify them.
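For context, here is a minimal sketch of the extraction such a JSON path would drive, assuming an OpenAI-style usage object (the field names below follow OpenAI's chat completion response; other providers nest or name them differently, which is why sourcing the paths from the provider template helps):

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Truncated OpenAI-style chat completion response; only the usage
	// object matters for token-based rate limiting.
	body := []byte(`{"usage":{"prompt_tokens":12,"completion_tokens":34,"total_tokens":46}}`)

	var resp struct {
		Usage struct {
			PromptTokens     int `json:"prompt_tokens"`
			CompletionTokens int `json:"completion_tokens"`
			TotalTokens      int `json:"total_tokens"`
		} `json:"usage"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		panic(err)
	}

	// These counts are what the policy would charge against its quotas.
	// A provider template could map each provider's response shape to
	// these three values so users never write the JSON path themselves.
	fmt.Println(resp.Usage.PromptTokens, resp.Usage.CompletionTokens, resp.Usage.TotalTokens)
}
```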
- +1 to remove JSON path from user params.
Token Rate Limit Policy
Summary
Introduce a token-ratelimit policy that limits API usage based on LLM token consumption (prompt tokens, completion tokens, total tokens), providing a simplified interface for token-based rate limiting while delegating to the existing advanced-ratelimit engine.

Motivation
Problem Statement
LLM-powered APIs need rate limiting based on token consumption rather than simple request counts. Currently, users must configure the complex advanced-ratelimit policy directly, with cost extraction and multiple quotas, to achieve token-based limiting.

Goals
Non-Goals
Proposed Solution
Architecture
Use the delegation pattern (like basic-ratelimit): translate the simplified token-ratelimit configuration into advanced-ratelimit quotas internally.

User-Facing Configuration
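As an illustrative sketch only (Config, Quota, toAdvancedQuotas, and every field name below are assumptions, not the actual token-ratelimit schema or the advanced-ratelimit engine's types), the simplified policy and its internal translation into advanced-ratelimit quotas might look like:

```go
package tokenratelimit

// Config is a hypothetical shape for the simplified user-facing policy.
type Config struct {
	Window           string // quota window, e.g. "1m" or "1h"
	PromptTokens     int    // max prompt tokens per window (0 = not limited)
	CompletionTokens int    // max completion tokens per window (0 = not limited)
	TotalTokens      int    // max total tokens per window (0 = not limited)
}

// Quota stands in for an advanced-ratelimit quota; the real engine's
// types live under gateway/policies/advanced-ratelimit/.
type Quota struct {
	Key    string // which token counter this quota meters
	Limit  int
	Window string
}

// toAdvancedQuotas shows the delegation pattern: each configured token
// dimension becomes one internal advanced-ratelimit quota whose cost is
// the token count extracted from the LLM response.
func toAdvancedQuotas(c Config) []Quota {
	var quotas []Quota
	add := func(key string, limit int) {
		if limit > 0 {
			quotas = append(quotas, Quota{Key: key, Limit: limit, Window: c.Window})
		}
	}
	add("prompt_tokens", c.PromptTokens)
	add("completion_tokens", c.CompletionTokens)
	add("total_tokens", c.TotalTokens)
	return quotas
}
```

Under this sketch, a config that sets only TotalTokens would yield a single internal quota keyed on total_tokens, consistent with the delegation pattern basic-ratelimit already uses.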
Key Features
References
- gateway/policies/advanced-ratelimit/: Core rate limiting engine
- gateway/policies/basic-ratelimit/: Similar delegation pattern example