Latency AIMD Concurrency Limiting Policy
Introduction
This policy detects overloads and cascading failures by comparing the real-time latency with its exponential moving average (EMA). A gradient controller is then used to calculate a proportional response that limits the accepted concurrency. Concurrency is increased additively when the overload is no longer detected.
AIMD stands for Additive Increase, Multiplicative Decrease. That is, the concurrency is reduced by a multiplicative factor when the service is overloaded and increased by an additive factor when the service is no longer overloaded.
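The AIMD rule described above can be sketched in a few lines. This is a minimal illustration, not the blueprint's implementation: the function name `aimd_step`, the fixed `decrease_factor`, the `increase_step`, and the `floor` are all assumptions made for the example. In this policy the size of the decrease is derived from a gradient controller rather than from a fixed factor.

```python
def aimd_step(limit, overloaded, decrease_factor=0.5, increase_step=1.0, floor=1.0):
    """Return the next concurrency limit for one control tick.

    All parameter names and defaults here are hypothetical, chosen only
    to illustrate the AIMD shape.
    """
    if overloaded:
        # Multiplicative decrease: back off quickly while overloaded.
        return max(floor, limit * decrease_factor)
    # Additive increase: probe for spare capacity slowly.
    return limit + increase_step


limit = 100.0
history = []
for overloaded in [False, False, True, False]:
    limit = aimd_step(limit, overloaded)
    history.append(limit)

print(history)
```

The asymmetry is the point of the design: a single overloaded tick undoes many ticks of additive growth, so the limit retreats quickly and recovers cautiously.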
Please see the reference for the AIMDConcurrencyController component that is used within this blueprint.
See the tutorials on Basic Concurrency Limiting and Workload Prioritization for examples of this blueprint in use.
Configuration
Code: policies/latency-aimd-concurrency-limiting
Common
Parameter | common.policy_name |
Type | string |
Default Value | __REQUIRED_FIELD__ |
Description | Name of the policy. |
Policy
Parameter | policy.flux_meter |
Type | aperture.spec.v1.FluxMeter |
Default Value | {'flow_selector': {'flow_matcher': {'control_point': '__REQUIRED_FIELD__'}, 'service_selector': {'service': '__REQUIRED_FIELD__'}}} |
Description | Flux Meter. |
Parameter | policy.flux_meter.flow_selector.service_selector.service |
Type | string |
Default Value | __REQUIRED_FIELD__ |
Description | Service Name. |
Parameter | policy.flux_meter.flow_selector.flow_matcher.control_point |
Type | string |
Default Value | __REQUIRED_FIELD__ |
Description | Control Point Name. |
Parameter | policy.classifiers |
Type | []aperture.spec.v1.Classifier |
Default Value | [] |
Description | List of classification rules. |
Parameter | policy.components |
Type | []aperture.spec.v1.Component |
Default Value | [] |
Description | List of additional circuit components. |
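Several parameters above default to `__REQUIRED_FIELD__` and must be set before the policy can be generated. The sketch below shows one way to check a values mapping for leftover placeholders. The helper `missing_required` and the sample values (`service1-demo-app`, `ingress`) are hypothetical and are not part of the blueprint tooling.

```python
REQUIRED = "__REQUIRED_FIELD__"


def missing_required(values, prefix=""):
    """Return dotted paths of all entries still set to the placeholder."""
    missing = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections such as policy.flux_meter.
            missing.extend(missing_required(value, path))
        elif value == REQUIRED:
            missing.append(path)
    return missing


# Example values: the service name has been left unset on purpose.
values = {
    "common": {"policy_name": "service1-demo-app"},
    "policy": {
        "flux_meter": {
            "flow_selector": {
                "service_selector": {"service": REQUIRED},
                "flow_matcher": {"control_point": "ingress"},
            }
        },
        "classifiers": [],
        "components": [],
    },
}

unset = missing_required(values)
print(unset)
```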
Latency Baseliner
Parameter | policy.latency_baseliner.ema |
Type | aperture.spec.v1.EMAParameters |
Default Value | {'correction_factor_on_max_envelope_violation': 0.95, 'ema_window': '1500s', 'warmup_window': '60s'} |
Description | EMA parameters. |
Parameter | policy.latency_baseliner.latency_tolerance_multiplier |
Type | float64 |
Default Value | 1.1 |
Description | Tolerance factor beyond which the service is considered overloaded. For example, if the EMA of latency is 50ms and the tolerance is 1.1, the service is considered overloaded when the current latency exceeds 55ms. |
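The tolerance check in this description is simple arithmetic and can be sketched as follows. The function name `is_overloaded` and its argument names are assumptions made for the example.

```python
def is_overloaded(current_latency_ms, latency_ema_ms, tolerance_multiplier=1.1):
    """Compare the current latency against the tolerated latency.

    The threshold is the latency EMA scaled by the tolerance multiplier.
    """
    threshold_ms = latency_ema_ms * tolerance_multiplier
    return current_latency_ms > threshold_ms


# With an EMA of 50ms and a tolerance of 1.1, the threshold is about 55ms.
print(is_overloaded(60.0, 50.0))
print(is_overloaded(54.0, 50.0))
```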
Parameter | policy.latency_baseliner.latency_ema_limit_multiplier |
Type | float64 |
Default Value | 2 |
Description | The current latency value is multiplied by this factor to calculate the maximum envelope of the latency EMA. |
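The sketch below shows how the maximum envelope might interact with `correction_factor_on_max_envelope_violation`. It assumes that when the EMA exceeds the envelope, the EMA is scaled down by the correction factor. That behavior is an assumption inferred from the parameter names and should be checked against the EMA component reference. The function name `apply_max_envelope` is hypothetical.

```python
def apply_max_envelope(latency_ema_ms, current_latency_ms,
                       limit_multiplier=2.0, correction_factor=0.95):
    """Pull the EMA down when it rises above the maximum envelope.

    Assumed behavior: the envelope is current latency * limit_multiplier,
    and a violating EMA is multiplied by correction_factor.
    """
    max_envelope_ms = current_latency_ms * limit_multiplier
    if latency_ema_ms > max_envelope_ms:
        return latency_ema_ms * correction_factor
    return latency_ema_ms


# EMA of 100ms against a current latency of 40ms: the envelope is 80ms,
# so the EMA is in violation and gets corrected downward.
corrected = apply_max_envelope(100.0, 40.0)
# EMA of 50ms is inside the 80ms envelope and is left unchanged.
unchanged = apply_max_envelope(50.0, 40.0)
print(corrected, unchanged)
```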
Concurrency Controller
Parameter | policy.concurrency_controller.flow_selector |
Type | aperture.spec.v1.FlowSelector |
Default Value | {'flow_matcher': {'control_point': '__REQUIRED_FIELD__'}, 'service_selector': {'service': '__REQUIRED_FIELD__'}} |
Description | Concurrency Limiter flow selector. |
Parameter | policy.concurrency_controller.flow_selector.service_selector.service |
Type | string |
Default Value | __REQUIRED_FIELD__ |
Description | Service Name. |
Parameter | policy.concurrency_controller.flow_selector.flow_matcher.control_point |
Type | string |
Default Value | __REQUIRED_FIELD__ |
Description | Control Point Name. |
Parameter | policy.concurrency_controller.scheduler |
Type | aperture.spec.v1.SchedulerParameters |
Default Value | {'auto_tokens': True} |
Description | Scheduler parameters. |
Parameter | policy.concurrency_controller.scheduler.auto_tokens |
Type | bool |
Default Value | true |
Description | Automatically estimate cost (tokens) for workload requests. |
Parameter | policy.concurrency_controller.gradient |
Type | aperture.spec.v1.GradientControllerParameters |
Default Value | {'max_gradient': 1, 'min_gradient': 0.1, 'slope': -1} |
Description | Gradient Controller parameters. |
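To make the default gradient parameters concrete, the sketch below uses a common gradient formulation, `(signal / setpoint) ** slope`, clamped to `[min_gradient, max_gradient]`. This formula is an assumption for illustration and the AIMDConcurrencyController reference is the authority on the exact computation. Here the signal is taken to be the current latency and the setpoint the tolerated latency.

```python
def gradient(signal, setpoint, slope=-1.0, min_gradient=0.1, max_gradient=1.0):
    """Compute a clamped gradient from a signal and its setpoint.

    Assumed formulation: (signal / setpoint) ** slope, then clamp.
    With slope = -1 this reduces to setpoint / signal.
    """
    raw = (signal / setpoint) ** slope
    return min(max_gradient, max(min_gradient, raw))


# Latency at twice the setpoint: accept roughly half the concurrency.
print(gradient(110.0, 55.0))
# Latency below the setpoint: clamped at max_gradient, so no reduction.
print(gradient(40.0, 55.0))
# Extreme latency: clamped at min_gradient, so some traffic still passes.
print(gradient(1000.0, 55.0))
```

Under this formulation, a `max_gradient` of 1 means the controller never raises concurrency multiplicatively, and a `min_gradient` of 0.1 bounds how sharply it can cut in a single step.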
Parameter | policy.concurrency_controller.alerter |
Type | aperture.spec.v1.AlerterParameters |
Default Value | {'alert_name': 'Load Shed Event'} |
Description | Alerter parameters for the concurrency controller. |
Parameter | policy.concurrency_controller.max_load_multiplier |
Type | float64 |
Default Value | 2 |
Description | The current accepted concurrency is multiplied by this number to dynamically calculate the upper concurrency limit of a service during the normal (non-overload) state. This protects the service from sudden spikes. |
Parameter | policy.concurrency_controller.load_multiplier_linear_increment |
Type | float64 |
Default Value | 0.0025 |
Description | Linear increment to the load multiplier in each execution tick (0.5s) when the system is not in an overloaded state. |
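The default increment is small, so it helps to work out how long recovery takes. The sketch below assumes the load multiplier ramps linearly by the increment on every 0.5s tick until it reaches a target such as `max_load_multiplier`. The function name `seconds_to_reach` and the ramp model are assumptions made for the example.

```python
import math

TICK_SECONDS = 0.5  # execution tick stated in the parameter description


def seconds_to_reach(start_multiplier, target_multiplier, increment=0.0025):
    """Estimate the time for a linear ramp from start to target."""
    # Subtract a tiny epsilon so float rounding cannot add a spurious tick.
    ticks = math.ceil((target_multiplier - start_multiplier) / increment - 1e-9)
    return max(0, ticks) * TICK_SECONDS


# From 1.0 up to the default max_load_multiplier of 2: 400 ticks.
print(seconds_to_reach(1.0, 2.0))
# From a heavily reduced 0.1 back up to 1.0: 360 ticks.
print(seconds_to_reach(0.1, 1.0))
```

At the defaults the multiplier grows by 0.005 per second, which is what makes the increase side of AIMD deliberately slow compared with the decrease.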
Parameter | policy.concurrency_controller.default_config |
Type | aperture.spec.v1.LoadActuatorDynamicConfig |
Default Value | {'dry_run': False} |
Description | Default configuration for the concurrency controller that can be updated at runtime without shutting down the policy. |
Dashboard
Parameter | dashboard.refresh_interval |
Type | string |
Default Value | '5s' |
Description | Refresh interval for dashboard panels. |
Parameter | dashboard.time_from |
Type | string |
Default Value | 'now-15m' |
Description | From time of dashboard. |
Parameter | dashboard.time_to |
Type | string |
Default Value | 'now' |
Description | To time of dashboard. |
Datasource
Parameter | dashboard.datasource.name |
Type | string |
Default Value | '$datasource' |
Description | Datasource name. |
Parameter | dashboard.datasource.filter_regex |
Type | string |
Default Value | '' |
Description | Datasource filter regex. |
Dynamic Configuration
The following configuration parameters can be dynamically configured at runtime, without reloading the policy.
Parameter | concurrency_controller |
Type | aperture.spec.v1.LoadActuatorDynamicConfig |
Default Value | __REQUIRED_FIELD__ |
Description | Default configuration for the concurrency controller that can be updated at runtime without shutting down the policy. |