Version: development

Rate Limiter

The Rate Limiter component can be used to prevent recurring overloads by proactively regulating heavy hitters. It achieves this by accepting or rejecting incoming flows based on per-label limits, which are enforced using the token bucket algorithm.

The Rate Limiter is a component of Aperture's policy system, and it can be configured to work with different labels and limits depending on the needs of an application.

The following example creates a Rate Limiter at the ingress control point for the service checkout.default.svc.cluster.local. A rate limit of 2 requests per second with a burst capacity of 40 is applied per unique value of the http.request.header.user_id flow label:

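circuit:
  components:
    - flow_control:
        rate_limiter:
          in_ports:
            bucket_capacity:
              constant_signal:
                value: 40
            fill_amount:
              constant_signal:
                value: 2
          parameters:
            interval: 1s
            label_key: http.request.header.user_id
          selectors:
            - control_point: ingress
              service: checkout.default.svc.cluster.local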

Distributed Counters

For each configured Rate Limiter component, every matching Aperture Agent instantiates a copy of the Rate Limiter. Although each Agent has its own copy of the component, all copies share counters through a distributed cache. As a result, they behave as a single Rate Limiter, with limits coordinated across Agents. The Agents within an agent group continuously share state and detect failures using a gossip protocol.
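The effect of shared counters can be illustrated with a minimal sketch. It assumes a plain in-process dictionary as a stand-in for the distributed cache; the names `SharedCounters` and `Agent` are hypothetical and are not part of Aperture's API, and neither gossip nor token refill is modeled.

```python
class SharedCounters:
    """Stand-in for the distributed cache: one count per label value."""

    def __init__(self):
        self._counts = {}

    def take(self, key, limit):
        # Accept only while the shared count for this key is below the limit.
        used = self._counts.get(key, 0)
        if used >= limit:
            return False
        self._counts[key] = used + 1
        return True


class Agent:
    """Hypothetical agent holding its own limiter copy but shared counters."""

    def __init__(self, name, counters, limit):
        self.name = name
        self.counters = counters  # the same object is handed to every agent
        self.limit = limit

    def check(self, label_value):
        return self.counters.take(label_value, self.limit)


counters = SharedCounters()
agent_a = Agent("agent-a", counters, limit=3)
agent_b = Agent("agent-b", counters, limit=3)

# Requests for the same user alternate between the two agents.
decisions = [
    agent.check("user-1")
    for agent in (agent_a, agent_b, agent_a, agent_b, agent_a)
]
```

Because both agents draw from the same counter, the limit of 3 applies to their combined traffic rather than to each agent separately.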

Token Bucket Algorithm

This algorithm allows users to send a substantial number of requests in a burst and then continue at a steady rate. Here are the key points to understand about the token bucket algorithm:

  • Each user (or, more generally, each unique value of the chosen flow label) has access to a bucket, which can hold, say, 60 "tokens".
  • Every second, a token is added to the bucket (if there's room). In this way, the bucket is steadily refilled over time.
  • Each API request removes a token from the user's bucket.
  • If the bucket is empty, the request is rejected, and the user has to wait for new tokens to be added to the bucket before making more requests.

This model ensures that clients that pace their API calls always have a reserve of tokens for an occasional burst. For example, with a refill rate of one token per second, a user who normally sends a request every few seconds can suddenly make 30 requests at once, provided at least 30 tokens have accumulated in the bucket.
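The bucket behavior described above can be sketched in a few lines. This is an illustrative implementation, not Aperture's code: the class name `TokenBucket`, its parameters, and the explicit `now` argument (used here so the example does not depend on a real clock) are all assumptions. The numbers mirror the earlier configuration example: a capacity of 40 and a fill of 2 tokens per 1-second interval.

```python
class TokenBucket:
    """Illustrative token bucket; names are hypothetical, not Aperture's API."""

    def __init__(self, capacity, fill_amount, interval, start=0.0):
        self.capacity = capacity        # maximum tokens held (burst size)
        self.fill_amount = fill_amount  # tokens added per interval
        self.interval = interval        # refill interval in seconds
        self.tokens = float(capacity)   # assumption: the bucket starts full
        self.last = start               # time of the last refill

    def allow(self, now, cost=1.0):
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = max(0.0, now - self.last)
        refill = elapsed * self.fill_amount / self.interval
        self.tokens = min(self.capacity, self.tokens + refill)
        self.last = now
        # Spend tokens if enough are available; otherwise reject.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=40, fill_amount=2, interval=1.0)

# A burst of 45 requests at t=0: only the first 40 find a token.
accepted_in_burst = sum(bucket.allow(now=0.0) for _ in range(45))

# One second later, 2 tokens have been added, so 2 more requests pass.
after_one_second = [bucket.allow(now=1.0) for _ in range(3)]
```

This sketch refills continuously in proportion to elapsed time; a variant that adds tokens only at interval boundaries would admit the same long-run rate but release tokens in steps.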

Lazy Syncing

When lazy syncing is enabled, rate-limiting counters are kept in memory on each Agent and are synchronized between Aperture Agent instances only on demand. This allows fast, low-latency rate-limiting decisions at the cost of slight inaccuracy within a small time window (the sync interval).
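The trade-off can be made concrete with a small simulation. It is a sketch under stated assumptions: a dictionary stands in for the shared state, synchronization is triggered by an explicit `sync_round` call, and the names `LazyAgent`, `push`, `pull`, and `sync_round` are hypothetical rather than Aperture's API.

```python
class LazyAgent:
    """Hypothetical agent that decides locally and syncs counters later."""

    def __init__(self, shared, limit):
        self.shared = shared  # stand-in for the synchronized global counts
        self.limit = limit
        self.pending = {}     # local increments not yet pushed
        self.cached = {}      # global counts as of the last pull

    def check(self, key):
        # Decide from local knowledge only: no network round trip.
        seen = self.cached.get(key, 0) + self.pending.get(key, 0)
        if seen >= self.limit:
            return False
        self.pending[key] = self.pending.get(key, 0) + 1
        return True

    def push(self):
        # Publish local increments to the shared state.
        for key, count in self.pending.items():
            self.shared[key] = self.shared.get(key, 0) + count
        self.pending.clear()

    def pull(self):
        # Refresh the local view of the global counts.
        self.cached = dict(self.shared)


def sync_round(agents):
    """One synchronization round: everyone pushes, then everyone pulls."""
    for agent in agents:
        agent.push()
    for agent in agents:
        agent.pull()


shared = {}
a = LazyAgent(shared, limit=4)
b = LazyAgent(shared, limit=4)

# Before any sync, each agent sees only its own 3 requests for "user-1",
# so all 6 are accepted even though the intended limit is 4.
before_sync = [a.check("user-1") for _ in range(3)]
before_sync += [b.check("user-1") for _ in range(3)]

sync_round([a, b])

# After the sync, both agents know the limit is used up and reject.
after_sync = [a.check("user-1"), b.check("user-1")]
```

The over-admission is bounded by what agents can accept between two syncs, which is why the inaccuracy stays small when the sync interval is short.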


The Rate Limiter component accepts or rejects incoming flows based on per-label limits, configured as the maximum number of requests per given period of time. The rate-limiting label is selected by a specific flow label key, so that each unique value of that label (for example, each user) gets its own limit.
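The per-label selection can be sketched as follows. For brevity, this example uses a simple fixed-window counter (a maximum number of requests per period) instead of a token bucket. The class name `PerLabelLimiter` is hypothetical, and the choice to accept flows that lack the label is an assumption made for illustration, not documented behavior.

```python
class PerLabelLimiter:
    """Hypothetical limiter keeping one counter per value of a flow label."""

    def __init__(self, label_key, limit, period):
        self.label_key = label_key  # flow label key that selects the counter
        self.limit = limit          # maximum requests per period
        self.period = period        # period length in seconds
        self.windows = {}           # label value -> (window index, count)

    def accept(self, flow_labels, now):
        value = flow_labels.get(self.label_key)
        if value is None:
            # Assumption: flows without the label are not rate limited.
            return True
        window = int(now // self.period)
        seen_window, count = self.windows.get(value, (window, 0))
        if seen_window != window:
            count = 0  # a new period has started for this label value
        if count >= self.limit:
            return False
        self.windows[value] = (window, count + 1)
        return True


limiter = PerLabelLimiter("http.request.header.user_id", limit=2, period=1.0)
alice = {"http.request.header.user_id": "alice"}
bob = {"http.request.header.user_id": "bob"}

alice_first_second = [limiter.accept(alice, now=t) for t in (0.1, 0.2, 0.3)]
bob_first_second = limiter.accept(bob, now=0.3)     # separate counter
alice_next_second = limiter.accept(alice, now=1.1)  # new period
```

Each unique label value gets an independent counter, so one user exhausting their limit does not affect other users.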


The limit value is provided as a signal within the circuit. It can be set dynamically based on the circuit's logic.