Per-user Rate Limiting
Overview
To implement rate limiting with the Aperture SDKs, refer to the developer-centric Rate Limiting Guide.
Rate limiting is a critical strategy for managing the load on an API. By imposing restrictions on the number of requests a unique consumer can make within a specific time frame, rate limiting prevents a small set of users from monopolizing the majority of resources on a service, ensuring fair access for all API consumers.
Aperture implements this strategy through its high-performance, distributed rate limiter. This system enforces per-key limits based on fine-grained labels, offering precise control over API usage. For each unique key, Aperture maintains a token bucket with a specified bucket capacity and fill rate. The fill rate sets the sustained requests per second (RPS) permitted for a key, while the bucket capacity determines how large a short burst above the fill rate can be before requests are rejected.
The diagram depicts the distribution of tokens across Agents through a global token bucket. Each incoming request prompts the Agents to decrement tokens from the bucket. If the bucket has run out of tokens, indicating that the rate limit has been reached, the incoming request is rejected. Conversely, if tokens are available in the bucket, the request is accepted. The token bucket is continually replenished at a predefined fill rate, up to the maximum number of tokens specified by the bucket capacity.
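The accept/reject behavior described above can be sketched in a few lines of Python. This is a minimal single-process illustration, not Aperture's distributed implementation: the class names `TokenBucket` and `PerKeyLimiter` are hypothetical, each request is assumed to cost one token, and the current time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Illustrative token bucket; names and structure are assumptions."""

    def __init__(self, capacity, fill_rate, now=0.0):
        self.capacity = capacity        # maximum tokens (burst size)
        self.fill_rate = fill_rate      # tokens added per second
        self.tokens = float(capacity)   # the bucket starts full
        self.last = now                 # time of the last refill

    def allow(self, now):
        # Replenish for the elapsed time, never exceeding the capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.fill_rate)
        self.last = now
        # Accept the request only if a whole token is available.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerKeyLimiter:
    """Keeps one independent bucket per key (for example, per user ID)."""

    def __init__(self, capacity, fill_rate):
        self.capacity = capacity
        self.fill_rate = fill_rate
        self.buckets = {}

    def allow(self, key, now):
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = self.buckets[key] = TokenBucket(self.capacity, self.fill_rate, now)
        return bucket.allow(now)
```

Because every key has its own bucket, one user exhausting their tokens does not affect the requests of any other user.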
Configuration
This policy is based on the Rate Limiting blueprint. It applies a rate limiter to the ingress control point on the service catalog-service.prod.svc.cluster.local and identifies unique users by referencing the user_id header present in the HTTP traffic. Provided by the Envoy proxy, this header can be located under the label key http.request.header.user_id (see Flow Labels for more information).
Each user is allowed 2 requests per 1s (1 second) interval, with a burst of up to 40 requests. This means that a user with a full bucket can send up to 40 requests in a burst, and then 2 requests every second after that. The bucket is replenished at a rate of 2 requests per second (the fill rate).
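To make these numbers concrete, the sketch below steps through a few seconds of traffic for a single user. It is a simplified discrete-time model, not Aperture's behavior in detail: the function name `accepted_per_second` is hypothetical, the user is assumed to send all of their requests at the start of each second, and the bucket is assumed to be refilled once per interval.

```python
def accepted_per_second(capacity, fill_amount, offered, seconds):
    """Return how many requests are accepted in each one-second step.

    capacity:    bucket capacity (40 in this policy)
    fill_amount: tokens added per interval (2 in this policy)
    offered:     requests the user sends in every step (an assumption)
    """
    tokens = capacity  # the bucket starts full
    accepted = []
    for second in range(seconds):
        if second > 0:
            # Refill once per interval, capped at the bucket capacity.
            tokens = min(capacity, tokens + fill_amount)
        granted = min(offered, tokens)
        tokens -= granted
        accepted.append(granted)
    return accepted


# A user sending 50 requests per second against this policy's settings.
print(accepted_per_second(40, 2, 50, 5))  # [40, 2, 2, 2, 2]
```

A user who stays at or below the fill rate never drains the bucket, so none of their requests are rejected under this model.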
The values.yaml file below can be generated by following the steps in the Installation section.
- aperturectl values.yaml
# yaml-language-server: $schema=../../../../../blueprints/rate-limiting/base/gen/definitions.json
blueprint: rate-limiting/base
uri: ../../../../../../blueprints
policy:
  policy_name: "static-rate-limiting"
  rate_limiter:
    bucket_capacity: 40
    fill_amount: 2
    selectors:
      - service: "catalog-service.prod.svc.cluster.local"
        control_point: "ingress"
        agent_group: "default"
    parameters:
      limit_by_label_key: "http.request.header.user_id"
      interval: 1s
Generated Policy
Circuit Diagram for this policy.
Installation
Generate a values file specific to the policy using the command below.
aperturectl blueprints values --name=rate-limiting/base --version=v2.32.2 --output-file=values.yaml
Apply the policy using the aperturectl CLI or kubectl.
- aperturectl (Aperture Cloud)
aperturectl cloud blueprints apply --values-file=values.yaml
- aperturectl (self-hosted controller)
Pass the --kube flag with aperturectl to directly apply the generated policy on a Kubernetes cluster in the namespace where the Aperture Controller is installed.
aperturectl blueprints generate --values-file=values.yaml --output-dir=policy-gen
aperturectl apply policy --file=policy-gen/policies/static-rate-limiting.yaml --kube
- kubectl (self-hosted controller)
Apply the generated policy YAML (Kubernetes Custom Resource) with kubectl.
aperturectl blueprints generate --values-file=values.yaml --output-dir=policy-gen
kubectl apply -f policy-gen/policies/static-rate-limiting-cr.yaml -n aperture-controller
Policy in Action
When the policy is applied to a service, each user can make an initial burst of up to 40 requests; after that, no more than 2 requests per second are accepted for that user.