Version: 2.16.0

Rate Limiting


The following policy is based on the Rate Limiting blueprint.


Rate limiting is a critical strategy for managing the load on an API. By imposing restrictions on the number of requests a unique consumer can make within a specific time frame, rate limiting prevents a small set of users from monopolizing the majority of resources on a service, ensuring fair access for all API consumers.

Aperture implements this strategy through its high-performance, distributed rate limiter. This system enforces per-key limits based on fine-grained labels, thereby offering precise control over API usage. For each unique key, Aperture maintains a token bucket of a specified bucket capacity and fill rate. The fill rate dictates the sustained requests per second (RPS) permitted for a key, while transient overages over the fill rate are accommodated for brief periods, as determined by the bucket capacity.
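The interaction between fill rate and bucket capacity determines how long a burst can be sustained. As a quick illustration with hypothetical numbers (not taken from any Aperture default), a full bucket absorbs an overage until it drains:

```python
# Illustrative numbers: how long can a key burst above its fill rate?
fill_rate = 10        # sustained RPS permitted for the key
bucket_capacity = 40  # maximum accumulated tokens
burst_rps = 20        # transient request rate during the burst

# The bucket drains at (burst_rps - fill_rate) tokens per second, so a full
# bucket absorbs the overage for:
burst_seconds = bucket_capacity / (burst_rps - fill_rate)
print(burst_seconds)  # 4.0 seconds before requests start being rejected
```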

This rate-limiting mechanism plays a pivotal role in maintaining the integrity of a service: it safeguards against excessive usage and API abuse while preserving performance and fair resource allocation.

```mermaid
flowchart LR
  classDef TokenBucket fill:#F8773D,stroke:#000000,stroke-width:2px;
  classDef Agent fill:#56AE89,stroke:#000000,stroke-width:2px;
  classDef Signal fill:#EFEEED,stroke:#000000,stroke-width:1px;
  classDef Service fill:#56AE89,stroke:#000000,stroke-width:2px;
  Forward("Bucket Capacity") --> TB
  Reset("Fill Amount") --> TB
  TB[\Token Bucket/]
  class TB TokenBucket
  TB <-- "Counting" --> Agents
  subgraph " "
    Client -- "req/s" --> Agents
    class Client Service
    subgraph "Agents"
    end
    class Agents Agent
    Agents --> Server
    Agents --> Server
    class Server Service
  end
```

The diagram depicts the distribution of tokens across Agents through a global token bucket. Each incoming request prompts the Agents to decrement tokens from the bucket. If the bucket has run out of tokens, indicating that the rate limit has been reached, the incoming request is rejected. Conversely, if tokens are available in the bucket, the request is accepted. The token bucket is continually replenished at a predefined fill rate, up to the maximum number of tokens specified by the bucket capacity.
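The accept/reject decision described above can be sketched as a minimal single-process token bucket. This is only an illustration of the algorithm, not Aperture's distributed implementation (which shares the bucket state across Agents); the class and method names are invented for this example:

```python
import time

class TokenBucket:
    """Minimal illustrative token bucket (not Aperture's implementation)."""

    def __init__(self, capacity: float, fill_rate: float):
        self.capacity = capacity    # maximum tokens the bucket can hold
        self.fill_rate = fill_rate  # tokens added per second (sustained RPS)
        self.tokens = capacity      # start full, allowing an initial burst
        self.last = time.monotonic()

    def _refill(self) -> None:
        # Replenish continuously at fill_rate, capped at bucket capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.fill_rate)
        self.last = now

    def try_take(self, tokens: float = 1.0) -> bool:
        """Accept the request (deduct tokens) or reject it."""
        self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

bucket = TokenBucket(capacity=5, fill_rate=2)    # 2 RPS sustained, bursts up to 5
results = [bucket.try_take() for _ in range(8)]  # 8 back-to-back requests
print(results)  # the first 5 are accepted; the rest are rejected until refill
```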

Example Scenario

Consider a social media platform that implements rate limits to prevent abuse of its APIs. Each user of the platform is identified by a unique key and assigned a specific rate limit, enforced by Aperture's distributed rate limiter. For instance, the platform might allow a user to make a certain number of requests per minute to post content, retrieve posts, or send messages.
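Per-key limiting means each user draws from an independent bucket, so one user exhausting their limit does not affect others. A minimal sketch of this idea, with invented names and limits (one bucket lazily created per user key):

```python
import time
from collections import defaultdict

class TokenBucket:
    """Minimal illustrative token bucket (not Aperture's implementation)."""

    def __init__(self, capacity: float, fill_rate: float):
        self.capacity, self.fill_rate = capacity, fill_rate
        self.tokens, self.last = capacity, time.monotonic()

    def try_take(self) -> bool:
        # Refill at fill_rate (capped at capacity), then accept or reject.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.fill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per user key: 2 posts/sec sustained, bursts of up to 3.
buckets = defaultdict(lambda: TokenBucket(capacity=3, fill_rate=2))

def allow_post(user_id: str) -> bool:
    return buckets[user_id].try_take()

alice = [allow_post("alice") for _ in range(4)]  # alice drains her own bucket
bob = allow_post("bob")                          # bob's bucket is untouched
print(alice, bob)
```

In Aperture itself, the key would come from a request label (such as a user ID header) selected in the rate-limiting policy, and the buckets would be shared across all Agents rather than held in one process.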