
Per-user Rate Limiting

Overview

Note to Developers

For implementing rate limiting using Aperture SDKs, refer to the developer-centric Rate Limiting Guide.

Rate limiting is a critical strategy for managing the load on an API. By imposing restrictions on the number of requests a unique consumer can make within a specific time frame, rate limiting prevents a small set of users from monopolizing the majority of resources on a service, ensuring fair access for all API consumers.

Aperture implements this strategy through its high-performance, distributed rate limiter. This system enforces per-key limits based on fine-grained labels, thereby offering precise control over API usage. For each unique key, Aperture maintains a token bucket of a specified bucket capacity and fill rate. The fill rate dictates the sustained requests per second (RPS) permitted for a key, while transient overages over the fill rate are accommodated for brief periods, as determined by the bucket capacity.

flowchart LR
    classDef TokenBucket fill:#F8773D,stroke:#000000,stroke-width:2px;
    classDef Agent fill:#56AE89,stroke:#000000,stroke-width:2px;
    classDef Signal fill:#EFEEED,stroke:#000000,stroke-width:1px;
    classDef Service fill:#56AE89,stroke:#000000,stroke-width:2px;
    Forward("Bucket Capacity") --> TB
    Reset("Fill Amount") --> TB
    TB[\Token Bucket/]
    class TB TokenBucket
    TB <-- "Counting" --> Agents
    subgraph " "
        Client -- "req/s" --> Agents
        class Client Service
        subgraph "Agents"
        end
        class Agents Agent
        Agents --> Server
        Agents --> Server
        class Server Service
    end

The diagram depicts the distribution of tokens across Agents through a global token bucket. Each incoming request prompts the Agents to decrement tokens from the bucket. If the bucket has run out of tokens, indicating that the rate limit has been reached, the incoming request is rejected. Conversely, if tokens are available in the bucket, the request is accepted. The token bucket is continually replenished at a predefined fill rate, up to the maximum number of tokens specified by the bucket capacity.
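The token bucket mechanics described above can be sketched in a few lines. This is a single-process, illustrative version only: Aperture's real limiter is distributed across Agents and shares one global bucket per label key. The parameter names mirror the blueprint fields (`bucket_capacity`, `fill_amount`, `interval`).

```python
class TokenBucket:
    """Local sketch of the token-bucket algorithm described above.
    `fill_amount` tokens are added per `interval` seconds, capped at
    `bucket_capacity`. Not Aperture's implementation, just the idea."""

    def __init__(self, bucket_capacity: float, fill_amount: float, interval: float = 1.0):
        self.capacity = bucket_capacity
        self.fill_rate = fill_amount / interval  # sustained tokens per second
        self.tokens = bucket_capacity            # a fresh bucket starts full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Replenish tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.fill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # each accepted request spends one token
            return True
        return False          # bucket empty: rate limit reached, reject

# 50 requests arriving at t=0 exhaust the 40-token burst;
# two more tokens become available one second later.
bucket = TokenBucket(bucket_capacity=40, fill_amount=2, interval=1.0)
burst = sum(bucket.allow(0.0) for _ in range(50))
print(burst)              # → 40
print(bucket.allow(1.0))  # → True (2 tokens refilled at t=1s)
```

Passing the clock in explicitly keeps the sketch deterministic; a production limiter would read a monotonic clock instead.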

note

The following policy is based on the Rate Limiting blueprint.

Configuration

This policy applies a rate limiter to the ingress control point of the service catalog-service.prod.svc.cluster.local and identifies unique users by the user_id header in the HTTP traffic. Provided by the Envoy proxy, this header is available under the flow-label key http.request.header.user_id (see Flow Labels for more information).

Each user is allowed 2 requests per 1-second period, with a burst allowance of up to 40 requests. This means a user can send up to 40 requests in the first second, and then 2 requests every second after that, as the bucket is replenished at the fill rate of 2 requests per second.
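The arithmetic above can be expressed as a simple upper bound, using the values from this policy (illustrative only; the variable names mirror the blueprint fields):

```python
# Upper bound on how many requests one user can have accepted within the
# first t seconds under this policy.
bucket_capacity = 40  # burst allowance (tokens in a full bucket)
fill_amount = 2       # tokens added per interval
interval = 1          # interval length in seconds

def max_accepted(t_seconds: int) -> int:
    # A full bucket, plus everything replenished while t_seconds elapse.
    return bucket_capacity + fill_amount * t_seconds // interval

print(max_accepted(0))   # → 40: the initial burst
print(max_accepted(10))  # → 60: 40 burst + 2 rps sustained for 10 s
```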

The values.yaml file below can be generated by following the steps in the Installation section.

# yaml-language-server: $schema=../../../../../blueprints/rate-limiting/base/gen/definitions.json
blueprint: rate-limiting/base
uri: ../../../../../../blueprints
policy:
  policy_name: "static-rate-limiting"
  rate_limiter:
    bucket_capacity: 40
    fill_amount: 2
    selectors:
      - service: "catalog-service.prod.svc.cluster.local"
        control_point: "ingress"
        agent_group: "default"
    parameters:
      limit_by_label_key: "http.request.header.user_id"
      interval: 1s

Generated Policy

apiVersion: fluxninja.com/v1alpha1
kind: Policy
metadata:
  labels:
    fluxninja.com/validate: "true"
  name: static-rate-limiting
spec:
  circuit:
    components:
      - flow_control:
          rate_limiter:
            in_ports:
              bucket_capacity:
                constant_signal:
                  value: 40
              fill_amount:
                constant_signal:
                  value: 2
            out_ports:
              accept_percentage:
                signal_name: ACCEPT_PERCENTAGE
            parameters:
              interval: 1s
              limit_by_label_key: http.request.header.user_id
            request_parameters: {}
            selectors:
              - agent_group: default
                control_point: ingress
                service: catalog-service.prod.svc.cluster.local
      - decider:
          in_ports:
            lhs:
              signal_name: ACCEPT_PERCENTAGE
            rhs:
              constant_signal:
                value: 90
          operator: gte
          out_ports:
            output:
              signal_name: ACCEPT_PERCENTAGE_ALERT
      - alerter:
          in_ports:
            signal:
              signal_name: ACCEPT_PERCENTAGE_ALERT
          parameters:
            alert_name: More than 90% of requests are being rate limited
            evaluation_interval: 1s
  resources:
    flow_control:
      classifiers: []

info

Circuit Diagram for this policy.

Installation

Generate a values file specific to the policy using the command below.

aperturectl blueprints values --name=rate-limiting/base --version=main --output-file=values.yaml

Apply the policy using the aperturectl CLI or kubectl.

aperturectl cloud blueprints apply --values-file=values.yaml

Policy in Action

When the policy is applied to a service, no more than 2 requests per second are accepted for a given user after the initial burst of 40 requests.
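This behavior can be illustrated with a self-contained local simulation (not Aperture itself; it refills the bucket once per interval for simplicity, whereas the real limiter replenishes continuously). A user sending 10 requests per second first drains the 40-token burst, then settles at the 2 rps fill rate:

```python
# Simulate 30 one-second intervals of a user sending 10 req/s against a
# bucket with capacity 40 and a fill of 2 tokens per second.
capacity, fill_per_sec, demand_per_sec = 40, 2, 10

tokens = capacity          # a fresh bucket starts full
accepted_per_sec = []
for second in range(30):
    if second > 0:
        tokens = min(capacity, tokens + fill_per_sec)  # per-interval refill
    accepted = 0
    for _ in range(demand_per_sec):
        if tokens >= 1:    # a token is available: accept and spend it
            tokens -= 1
            accepted += 1
    accepted_per_sec.append(accepted)

print(accepted_per_sec[:6])  # → [10, 10, 10, 10, 8, 2]: the burst drains
print(accepted_per_sec[-1])  # → 2: steady state equals the fill rate
```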

[Figure: Static Rate Limiting]