Quota Scheduler Reference
The Quota Scheduler schedules requests by importance while ensuring that the application adheres to third-party API rate limits or inter-service API quotas.
This diagram illustrates the working of a quota scheduler.
The Quota Scheduler can be thought of as a combination of a Scheduler and a Rate Limiter: it provides scheduling capabilities on top of a Rate Limiter. In the policy circuit, this component takes the same input ports as a Rate Limiter, including bucket_capacity. These ports adjust the global token bucket, which can be used to model an API quota or rate limit.
The token bucket serves as a shared ledger for the Agents in an agent group. This ledger records the total number of available tokens that can be distributed across the Agents.
When the token fill rate and bucket capacity (the API quota) are known upfront, the Quota Scheduler is particularly useful for enforcing client-side rate limits. The tokens represent a fixed quota that is divided among the Agents. Each Agent has access to the global ledger and consumes tokens from it when admitting requests. If the ledger runs out of tokens, new requests are queued until more tokens become available or until they time out.
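The ledger behavior described above can be sketched as a small token bucket. This is a minimal illustration under stated assumptions, not the product's implementation: the class name TokenBucket, the fields capacity, fill_rate, tokens, and updated_at, and the request method are all hypothetical. It also assumes that a queued request reserves its tokens in advance, so that later requests wait behind it, and it takes time as an explicit argument so the behavior is deterministic.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TokenBucket:
    """Shared ledger of available tokens (hypothetical names, for illustration only)."""

    capacity: float           # maximum number of tokens the bucket can hold (the quota)
    fill_rate: float          # tokens added per second
    tokens: float = 0.0       # tokens currently available; negative means reserved debt
    updated_at: float = 0.0   # time of the last refill, in seconds

    def refill(self, now: float) -> None:
        """Add tokens for the elapsed time, never exceeding capacity."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.fill_rate)
        self.updated_at = max(self.updated_at, now)

    def request(self, now: float, cost: float, timeout: float) -> Tuple[bool, float]:
        """Return (admitted, wait_seconds) for a request arriving at time `now`.

        The request is admitted immediately if enough tokens are available.
        Otherwise it is queued if the tokens it needs will arrive within
        `timeout` seconds, and rejected if they will not.
        """
        self.refill(now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True, 0.0
        if self.fill_rate <= 0:
            # No tokens will ever arrive, so waiting cannot help.
            return False, 0.0
        wait = (cost - self.tokens) / self.fill_rate
        if wait <= timeout:
            # Assumption: reserve the tokens now (the balance may go negative)
            # so that later requests queue behind this one.
            self.tokens -= cost
            return True, wait
        return False, 0.0


# Every Agent in the group would call request() on the same ledger.
ledger = TokenBucket(capacity=2, fill_rate=1, tokens=2)
first = ledger.request(now=0.0, cost=1, timeout=0.5)      # admitted immediately
second = ledger.request(now=0.0, cost=1, timeout=0.5)     # admitted immediately
impatient = ledger.request(now=0.0, cost=1, timeout=0.5)  # would wait 1s, times out
patient = ledger.request(now=0.0, cost=1, timeout=2.0)    # queued for 1s
```

In a real deployment the ledger would be shared across processes rather than held in one object; the sketch only shows the accounting.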
The Quota Scheduler also supports workloads, a property of the scheduler that lets requests be prioritized strategically when the quota is constrained. As a result, the Quota Scheduler enforces the API's rate limits while also prioritizing requests by importance.
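To make workload prioritization concrete, the following sketch admits queued requests in priority order as tokens become available. The scheduling algorithm itself is not described in this section, so strict priority ordering is assumed here purely for illustration; the class WorkloadQueue, its methods, and the workload names are hypothetical.

```python
import heapq
from itertools import count
from typing import List


class WorkloadQueue:
    """Queue of waiting requests, admitted highest priority first (illustrative only)."""

    def __init__(self) -> None:
        self._heap: list = []
        self._seq = count()  # tie-breaker that keeps FIFO order within a priority

    def enqueue(self, request_id: str, priority: int, tokens: float) -> None:
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), request_id, tokens))

    def admit(self, available: float) -> List[str]:
        """Admit requests in priority order until the next one no longer fits."""
        admitted = []
        while self._heap and self._heap[0][3] <= available:
            _, _, request_id, tokens = heapq.heappop(self._heap)
            available -= tokens
            admitted.append(request_id)
        return admitted


queue = WorkloadQueue()
queue.enqueue("batch-1", priority=1, tokens=1)
queue.enqueue("checkout-1", priority=10, tokens=1)
queue.enqueue("batch-2", priority=1, tokens=1)
queue.enqueue("checkout-2", priority=10, tokens=1)

# Only three tokens are available, so one low-priority request keeps waiting.
admitted_now = queue.admit(available=3)
```

With three tokens available, both requests from the higher-priority workload are admitted before any from the lower-priority one, and the remaining request stays queued until more tokens arrive.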