Per-user Rate Limiting
Overview
Rate limiting is a critical strategy for managing the load on an API. By imposing restrictions on the number of requests a unique consumer can make within a specific time frame, rate limiting prevents a small set of users from monopolizing the majority of resources on a service, ensuring fair access for all API consumers.
Aperture implements this strategy through its high-performance, distributed rate limiter. The rate limiter enforces per-key limits based on fine-grained labels, offering precise control over API usage. For each unique key, Aperture maintains a token bucket with a specified bucket capacity and fill rate. The fill rate dictates the sustained requests per second (RPS) permitted for a key, while the bucket capacity determines how large a transient overage above the fill rate can be accommodated for brief periods.
The diagram shows how the Aperture SDK interacts with a global token bucket to determine whether to allow or reject a request. Each call decrements tokens from the bucket and if the bucket runs out of tokens, indicating that the rate limit has been reached, the incoming request is rejected. Conversely, if tokens are available in the bucket, the request is accepted. The token bucket is continually replenished at a predefined fill rate, up to the maximum number of tokens specified by the bucket capacity.
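The token bucket mechanism described above can be sketched in a few lines of TypeScript. This is an illustrative, single-process model only, not Aperture's distributed implementation; the `TokenBucket` class and its method names are hypothetical, and the capacity and fill-rate values mirror the policy configured later in this guide:

```typescript
// Illustrative token bucket model (not Aperture's distributed implementation).
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number, // maximum tokens (burst size)
    private fillRate: number, // tokens added per second (sustained RPS)
    now: number = Date.now(),
  ) {
    this.tokens = capacity; // the bucket starts full
    this.lastRefill = now;
  }

  // Returns true if the request is allowed, false if it is rate limited.
  tryAcquire(now: number = Date.now()): boolean {
    // Replenish tokens at the fill rate, capped at the bucket capacity.
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.fillRate);
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1; // each admitted request costs one token
      return true;
    }
    return false; // bucket empty: the rate limit has been reached
  }
}

// With capacity 40 and fill rate 2/s, the first 40 back-to-back requests
// succeed and the rest are rejected until the bucket refills.
const bucket = new TokenBucket(40, 2, 0);
let accepted = 0;
for (let i = 0; i < 50; i++) {
  if (bucket.tryAcquire(0)) accepted++; // all 50 requests arrive at t=0
}
console.log(accepted); // 40
```

After the initial burst drains the bucket, each elapsed second adds two tokens back, which is what produces the sustained 2 RPS behavior described above.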
The following policy is based on the Rate Limiting blueprint.
Rate Limiting with Aperture SDK
The first step to using the Aperture SDK is to import and set up the Aperture Client:
- TypeScript
Start the flow with StartFlow by passing in a Control Point and the labels needed to determine whether a request should proceed. The function Flow.ShouldRun() checks whether the flow allows the request. The Flow.End() function is responsible for sending telemetry and updating the specified cache entry within Aperture.
- TypeScript
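The StartFlow / ShouldRun / End pattern above can be sketched as follows. The `Flow` interface and `startFlow` function here are simplified stand-ins for the Aperture SDK's client (in the real TypeScript SDK these are methods on an `ApertureClient` instance and involve a network call); the control point and label names mirror the policy in this guide, while the stub logic is purely illustrative:

```typescript
// Simplified stand-ins for the Aperture SDK types (illustration only).
interface Flow {
  shouldRun(): boolean; // was the request admitted by the rate limiter?
  end(): void;          // sends telemetry and updates cache entries in Aperture
}

// Hypothetical startFlow: the real SDK consults the global token bucket
// keyed by the flow labels; this stub always admits the request.
function startFlow(controlPoint: string, labels: Record<string, string>): Flow {
  return {
    shouldRun: () => true,
    end: () => { /* telemetry would be reported here */ },
  };
}

// Usage pattern mirroring the steps described above:
function handleRequest(userId: string): string {
  const flow = startFlow("awesomeFeature", { user_id: userId });
  try {
    if (flow.shouldRun()) {
      return "handled";      // execute the protected feature
    }
    return "rate limited";   // reject: the bucket for this user_id is empty
  } finally {
    flow.end();              // always report the flow's outcome
  }
}

console.log(handleRequest("user-1")); // "handled"
```

Note the `try`/`finally`: ending the flow even on rejection or error is what keeps Aperture's telemetry accurate.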
Configuration
This policy is based on the Rate Limiting blueprint. It applies a rate limiter to the awesomeFeature control point and identifies unique users by referencing the user_id label.
Each user is allowed 2 requests every 1s (1 second) period, with a burst of up to 40 requests. This means that a user can send up to 40 requests in the first second, and then 2 requests every second after that, as the bucket is replenished at the rate of 2 requests per second (the fill rate).
The values.yaml file below can be generated by following the steps in the Installation section.
- aperturectl values.yaml
# yaml-language-server: $schema=../../../../../blueprints/rate-limiting/base/gen/definitions.json
blueprint: rate-limiting/base
uri: ../../../../../blueprints
policy:
  policy_name: "static-rate-limiting"
  rate_limiter:
    bucket_capacity: 40
    fill_amount: 2
    selectors:
      - control_point: "awesomeFeature"
    parameters:
      limit_by_label_key: "user_id"
      interval: 1s
Generated Policy
apiVersion: fluxninja.com/v1alpha1
kind: Policy
metadata:
  labels:
    fluxninja.com/validate: "true"
  name: static-rate-limiting
spec:
  circuit:
    components:
      - flow_control:
          rate_limiter:
            in_ports:
              bucket_capacity:
                constant_signal:
                  value: 40
              fill_amount:
                constant_signal:
                  value: 2
            out_ports:
              accept_percentage:
                signal_name: ACCEPT_PERCENTAGE
            parameters:
              interval: 1s
              limit_by_label_key: user_id
            request_parameters: {}
            selectors:
              - control_point: awesomeFeature
      - decider:
          in_ports:
            lhs:
              signal_name: ACCEPT_PERCENTAGE
            rhs:
              constant_signal:
                value: 90
          operator: gte
          out_ports:
            output:
              signal_name: ACCEPT_PERCENTAGE_ALERT
      - alerter:
          in_ports:
            signal:
              signal_name: ACCEPT_PERCENTAGE_ALERT
          parameters:
            alert_name: More than 90% of requests are being rate limited
    evaluation_interval: 1s
  resources:
    flow_control:
      classifiers: []
Circuit Diagram for this policy.
Installation
Generate a values file specific to the policy using the command below.
aperturectl blueprints values --name=rate-limiting/base --version=main --output-file=values.yaml
Apply the policy using the aperturectl CLI or kubectl.
- aperturectl (Aperture Cloud)
aperturectl cloud blueprints apply --values-file=values.yaml
Policy in Action
Once the policy is applied to a service, no more than 2 requests per second (after an initial burst of up to 40 requests) are accepted for each user.