Token Rate Limit Policy #724
Tharsanan1 started this conversation in General
Replies: 2 comments
- To support multiple models and multiple LLM providers, I guess the better UX is to get the JSON paths from the LLM provider template, so the API developer doesn't need to specify them.
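For context, here is a minimal sketch of the extraction such a JSON path would drive, assuming an OpenAI-style usage object (the field names below follow OpenAI's chat completion response; other providers nest or name them differently, which is why sourcing the paths from the provider template helps):

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Truncated OpenAI-style chat completion response; only the usage
	// object matters for token-based rate limiting.
	body := []byte(`{"usage":{"prompt_tokens":12,"completion_tokens":34,"total_tokens":46}}`)

	var resp struct {
		Usage struct {
			PromptTokens     int `json:"prompt_tokens"`
			CompletionTokens int `json:"completion_tokens"`
			TotalTokens      int `json:"total_tokens"`
		} `json:"usage"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		panic(err)
	}

	// These counts are what the policy would charge against its quotas.
	// A provider template could map each provider's response shape to
	// these three values so users never write the JSON path themselves.
	fmt.Println(resp.Usage.PromptTokens, resp.Usage.CompletionTokens, resp.Usage.TotalTokens)
}
```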
- +1 to remove JSON path from user params.
Token Rate Limit Policy
Summary
Introduce a token-ratelimit policy that limits API usage based on LLM token consumption (prompt tokens, completion tokens, total tokens), providing a simplified interface for token-based rate limiting while delegating to the existing advanced-ratelimit engine.

Motivation
Problem Statement
LLM-powered APIs need rate limiting based on token consumption rather than simple request counts. Currently, users must configure the complex advanced-ratelimit policy directly, with cost extraction and multiple quotas, to achieve token-based limiting.

Goals
Non-Goals
Proposed Solution
Architecture
Use the delegation pattern (like basic-ratelimit): translate the simplified token-ratelimit configuration into advanced-ratelimit quotas internally.

User-Facing Configuration
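As an illustrative sketch only (Config, Quota, toAdvancedQuotas, and every field name below are assumptions, not the actual token-ratelimit schema or the advanced-ratelimit engine's types), the simplified policy and its internal translation into advanced-ratelimit quotas might look like:

```go
package tokenratelimit

// Config is a hypothetical shape for the simplified user-facing policy.
type Config struct {
	Window           string // quota window, e.g. "1m" or "1h"
	PromptTokens     int    // max prompt tokens per window (0 = not limited)
	CompletionTokens int    // max completion tokens per window (0 = not limited)
	TotalTokens      int    // max total tokens per window (0 = not limited)
}

// Quota stands in for an advanced-ratelimit quota; the real engine's
// types live under gateway/policies/advanced-ratelimit/.
type Quota struct {
	Key    string // which token counter this quota meters
	Limit  int
	Window string
}

// toAdvancedQuotas shows the delegation pattern: each configured token
// dimension becomes one internal advanced-ratelimit quota whose cost is
// the token count extracted from the LLM response.
func toAdvancedQuotas(c Config) []Quota {
	var quotas []Quota
	add := func(key string, limit int) {
		if limit > 0 {
			quotas = append(quotas, Quota{Key: key, Limit: limit, Window: c.Window})
		}
	}
	add("prompt_tokens", c.PromptTokens)
	add("completion_tokens", c.CompletionTokens)
	add("total_tokens", c.TotalTokens)
	return quotas
}
```

Under this sketch, a config that sets only TotalTokens would yield a single internal quota keyed on total_tokens, consistent with the delegation pattern basic-ratelimit already uses.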
Key Features
References
- gateway/policies/advanced-ratelimit/: Core rate limiting engine
- gateway/policies/basic-ratelimit/: Similar delegation pattern example