Schedulers provide a mechanism for throttling and scheduling requests based on importance when service resources are limited. The throttling is achieved through token buckets. To gain admittance, each request must obtain tokens from the bucket. When tokens are depleted, incoming requests enter a queue, awaiting admittance based on a weighted fair queuing algorithm. This algorithm ensures equitable resource allocation across workloads, factoring in the priority and weight (tokens) of each request.
This diagram illustrates the working of a scheduler for workload prioritization.
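The token bucket admission step can be sketched as follows. This is a minimal illustration, not Aperture's implementation: the `TokenBucket` class, its field names, and the numbers used are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Minimal token bucket (hypothetical); fill_rate is in tokens per second."""
    capacity: float
    fill_rate: float
    available: float = 0.0
    last_refill: float = 0.0

    def refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.available = min(self.capacity, self.available + elapsed * self.fill_rate)
        self.last_refill = now

    def try_take(self, tokens: float, now: float) -> bool:
        # Admit only if enough tokens are available; otherwise the caller
        # would place the request in the queue.
        self.refill(now)
        if tokens <= self.available:
            self.available -= tokens
            return True
        return False


bucket = TokenBucket(capacity=10.0, fill_rate=5.0, available=10.0)
first = bucket.try_take(8.0, now=0.0)   # enough tokens: admitted
second = bucket.try_take(8.0, now=0.0)  # bucket depleted: would be queued
third = bucket.try_take(8.0, now=2.0)   # refilled after 2 seconds: admitted
```

Passing the clock in as `now` keeps the sketch deterministic; a real scheduler would read a monotonic clock instead.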
Aperture offers two variants of the scheduler: the Load Scheduler and the Quota Scheduler. Both use the same weighted fair queuing-based scheduling algorithm, but they differ in the throttling mechanism by employing distinct types of token buckets. The Load Scheduler uses a token bucket local to each agent, which is adjusted based on the past token rate at that agent. This is useful for service protection scenarios because the token rate is adjusted relative to what the agent recently observed. The Quota Scheduler uses a centralized token bucket within an agent group. This is useful for scenarios involving known limits, such as third-party API rate limits or inter-service API quotas.
Workloads are groups of requests based on common Flow Labels. Workloads are expressed by label matcher rules in the Scheduler definition. Aperture Agents schedule workloads based on their priorities and tokens.
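As an illustration of grouping requests into workloads by their flow labels, the sketch below uses exact-match rules only. The `Workload` class, the `classify` function, and the label names are hypothetical; the label matchers in an actual Scheduler definition may be more expressive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workload:
    name: str
    priority: int
    match_labels: dict  # exact-match rules (assumed); real matchers may be richer


def classify(flow_labels: dict, workloads: list) -> Optional[Workload]:
    """Return the first workload whose rules all match the request's flow labels."""
    for workload in workloads:
        if all(flow_labels.get(k) == v for k, v in workload.match_labels.items()):
            return workload
    return None  # no workload matched


workloads = [
    Workload("subscribers", priority=200, match_labels={"user_type": "subscriber"}),
    Workload("guests", priority=50, match_labels={"user_type": "guest"}),
]
matched = classify({"user_type": "guest", "http.method": "GET"}, workloads)
unmatched = classify({"user_type": "crawler"}, workloads)
```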
Priority represents the importance of a request compared to the other requests in the queue. It ranges from 0 to any positive integer, with higher numbers denoting higher priority. The position of a flow in the queue is determined by its virtual finish time, which is computed from the request's tokens and priority.
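In weighted fair queuing, a virtual finish time is commonly derived from the current virtual time, the cost of the request, and the inverse of its weight. One such formulation is given below as a sketch; the symbol names are assumptions, and it presumes a strictly positive priority.

```latex
\text{inverted\_priority} = \frac{1}{\text{priority}}, \qquad
\text{virtual\_finish\_time} = \text{virtual\_time} + \text{tokens} \cdot \text{inverted\_priority}
```

Under this formulation, requests with a smaller virtual finish time are served first, so a higher priority or a smaller token cost moves a request forward in the queue.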
To manage prioritized requests, the scheduler requests tokens from the token
bucket. If tokens are available, the request is admitted. If tokens are not
available, the request is queued until either tokens become available or a
timeout occurs. The timeout depends on the workload or on the
flowcontrol.v1.Check call timeout.
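The queueing behavior described above can be sketched with a heap ordered by virtual finish time. This is an illustration under stated assumptions, not Aperture's code: `FairQueue`, `QueuedRequest`, and the finish-time calculation (`tokens / priority`) are hypothetical, and expired requests are only detected when they reach the head of the queue.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedRequest:
    virtual_finish_time: float
    seq: int  # tie-breaker: arrival order
    name: str = field(compare=False)
    tokens: float = field(compare=False)
    deadline: float = field(compare=False)


class FairQueue:
    """Hypothetical queue ordered by virtual finish time."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()
        self.virtual_time = 0.0

    def enqueue(self, name, tokens, priority, now, timeout):
        # Assumed formula: a higher priority shrinks the virtual cost of the tokens.
        vft = self.virtual_time + tokens / priority
        item = QueuedRequest(vft, next(self._seq), name, tokens, now + timeout)
        heapq.heappush(self._heap, item)

    def release(self, available_tokens, now):
        """Admit requests from the head while tokens last; reject expired heads."""
        admitted, rejected = [], []
        while self._heap:
            head = self._heap[0]
            if head.deadline < now:
                rejected.append(heapq.heappop(self._heap).name)
            elif head.tokens <= available_tokens:
                available_tokens -= head.tokens
                self.virtual_time = head.virtual_finish_time
                admitted.append(heapq.heappop(self._heap).name)
            else:
                break  # the head must wait for more tokens
        return admitted, rejected


queue = FairQueue()
queue.enqueue("batch", tokens=4.0, priority=1, now=0.0, timeout=5.0)
queue.enqueue("checkout", tokens=4.0, priority=4, now=0.0, timeout=5.0)
queue.enqueue("stale", tokens=1.0, priority=8, now=0.0, timeout=0.5)
admitted, rejected = queue.release(available_tokens=4.0, now=1.0)
```

In this run the "stale" request has already timed out, "checkout" is admitted ahead of "batch" because of its higher priority, and "batch" stays queued until more tokens arrive.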
Tokens represent the cost for admitting a specific request. Typically, tokens are based on the estimated response time of a request. Estimating the number of tokens for each request within a workload is critical for making effective flow control decisions.
Aperture can automatically estimate the tokens for each workload based on
historical latency measurements. See the
more details. The latency-based token calculation is aligned with
Little's Law, which relates
response times, arrival rate, and the system concurrency (number of in-flight
requests).
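Little's Law is usually stated as follows, where L is the average number of in-flight requests, λ is the average arrival rate, and W is the average response time. The symbols are the conventional ones and are not taken from Aperture.

```latex
L = \lambda \cdot W
```

For example, at an arrival rate of 100 requests per second and an average response time of 0.2 seconds, about 20 requests are in flight on average. When tokens are measured in units of response time, a token rate therefore corresponds to a level of concurrency.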
Alternatively, tokens can represent the number of requests instead of response times. This is useful, for example, when scheduling access to external APIs that have strict rate limits (a global quota). In this case, the number of tokens represents the number of requests that can be made to the API within a given time window.
Tokens are determined in the following order of precedence:
- Specified in the flow labels.
- Estimated tokens (see
- Specified in the
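The order of precedence can be sketched as a simple fallback chain. The function and parameter names are hypothetical: `configured_tokens` stands in for the third source in the list, and the final `default` is an assumption that the text does not state.

```python
from typing import Optional


def resolve_tokens(
    label_tokens: Optional[float],
    estimated_tokens: Optional[float],
    configured_tokens: Optional[float],
    default: float = 1.0,
) -> float:
    """Return the first available value, checked in order of precedence."""
    for candidate in (label_tokens, estimated_tokens, configured_tokens):
        if candidate is not None:
            return candidate
    return default  # assumed fallback, not stated in the text


from_labels = resolve_tokens(5.0, 2.0, 1.5)
from_estimate = resolve_tokens(None, 2.0, 1.5)
from_config = resolve_tokens(None, None, 1.5)
```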
The queue timeout is determined by the gRPC timeout provided on the
flowcontrol.v1.Check call. When a request is made, it
includes a timeout value that specifies the maximum duration the request can
wait in the queue. If the request receives the necessary tokens within this
timeout duration, it is admitted. Otherwise, if the timeout expires before the
tokens are available, the request is rejected.
The gRPC timeout on the
flowcontrol.v1.Check call is set
in the Envoy filter and the SDK during initialization. It serves as an upper
bound on the queue timeout, preventing requests from waiting excessively long.
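The relationship between the two timeouts can be sketched as below. This is an assumption-based illustration: the function names are hypothetical, timeouts are in seconds, and taking the minimum is one reading of the gRPC timeout acting as an upper bound.

```python
from typing import Optional


def effective_queue_timeout(
    grpc_timeout: float, workload_timeout: Optional[float] = None
) -> float:
    """Queue timeout in seconds, capped by the gRPC timeout on the Check call."""
    if workload_timeout is None:
        return grpc_timeout
    return min(workload_timeout, grpc_timeout)


def decide(wait_for_tokens: float, timeout: float) -> str:
    # Admitted if tokens arrive within the timeout, rejected otherwise.
    return "admitted" if wait_for_tokens <= timeout else "rejected"


capped = effective_queue_timeout(grpc_timeout=0.5, workload_timeout=2.0)
shorter = effective_queue_timeout(grpc_timeout=0.5, workload_timeout=0.2)
outcome = decide(wait_for_tokens=0.3, timeout=shorter)
```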