Quota Scheduler Reference
The Quota Scheduler schedules requests by importance while keeping the application within third-party API rate limits or inter-service API quotas.
The Quota Scheduler can be thought of as a combination of a Scheduler and a Rate Limiter: it provides scheduling capabilities on top of a Rate Limiter. In the policy circuit, this component takes the same input ports as a Rate Limiter, including bucket_capacity. These ports adjust the global token bucket, which can be used to model an API quota or rate limit.
The token bucket represents a fixed quota that is divided among the Agents. It serves as a shared ledger for the Agents in an agent group: the ledger records the total tokens available for distribution across those Agents, and tokens are consumed from it whenever a request is admitted. If the ledger runs out of tokens, new requests are queued until more tokens become available or until they time out.
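The ledger behavior described above can be sketched as a small token bucket. This is a minimal illustration, not the scheduler's actual implementation: the class TokenBucket, the function admit, the fill_rate parameter, and the returned labels are hypothetical names chosen for this example, and time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical shared ledger: refills at fill_rate tokens per second, capped at capacity."""
    capacity: float
    fill_rate: float
    tokens: float = 0.0
    last_refill: float = 0.0

    def refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding the bucket capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.fill_rate)
        self.last_refill = now

    def try_take(self, now: float, amount: float = 1.0) -> bool:
        # Consume tokens if enough are available; otherwise leave the ledger untouched.
        self.refill(now)
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False

    def wait_time(self, now: float, amount: float = 1.0) -> float:
        # Seconds until the bucket would hold `amount` tokens, assuming no other consumers.
        self.refill(now)
        return max(0.0, (amount - self.tokens) / self.fill_rate)


def admit(bucket: TokenBucket, now: float, timeout: float, amount: float = 1.0) -> str:
    """Report the decision for one request (assumed labels, for illustration only)."""
    if bucket.try_take(now, amount):
        return "admitted"
    # Out of tokens: the request waits if tokens would arrive before its timeout.
    # A real scheduler would hold the request; this sketch only reports the outcome.
    if bucket.wait_time(now, amount) <= timeout:
        return "queued"
    return "timed_out"
```

With a capacity of 2 and a fill rate of 1 token per second, two requests arriving together are admitted immediately, and a third is queued or timed out depending on whether its timeout covers the one-second wait for the next token.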
When the token fill rate and bucket capacity (the API quota) are known upfront, the Quota Scheduler is particularly useful for enforcing client-side rate limits.
The Quota Scheduler also supports workloads, a property of the scheduler that enables strategic prioritization of requests under quota constraints. As a result, the Quota Scheduler both ensures adherence to the API's rate limits and offers a mechanism to prioritize requests based on their importance.
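To illustrate how workload priorities can decide which requests are served when the quota is short, here is a minimal sketch using strict priority ordering. Strict ordering is a simplifying assumption made for this example; the actual scheduler may use a different algorithm. The function admit_by_priority and the (name, priority, tokens) request shape are hypothetical.

```python
import heapq
from itertools import count


def admit_by_priority(requests, available_tokens):
    """Admit requests in descending priority until the available tokens run out.

    requests: iterable of (name, priority, tokens) tuples; a higher priority is served first.
    Returns (admitted, queued) lists of request names.
    """
    seq = count()  # tie-breaker so equal priorities keep arrival order
    heap = [(-priority, next(seq), name, tokens) for name, priority, tokens in requests]
    heapq.heapify(heap)

    admitted, queued = [], []
    while heap:
        _, _, name, tokens = heapq.heappop(heap)
        if tokens <= available_tokens:
            available_tokens -= tokens
            admitted.append(name)
        else:
            # Not enough quota left: the request waits for the next refill.
            queued.append(name)
    return admitted, queued
```

For example, with two tokens available and three single-token requests at priorities 1, 10, and 5, the two higher-priority requests are admitted and the lowest-priority one is left waiting.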