Consul capacity planning

This page describes our capacity planning recommendations when deploying and maintaining a Consul cluster in production. When your organization designs a production environment, you should consider your available resources and their impact on network capacity.

Introduction

It is important to select the correct size for your server instances. Consul server environments have a standard set of minimum requirements. However, these requirements may vary depending on what you are using Consul for.

Insufficient resource allocations may cause network issues or degraded performance in general. When a slowdown in performance results in a Consul leader node that is unable to respond to requests in sufficient time, the Consul cluster triggers a new leader election. Consul pauses all network requests and Raft updates until the election ends.

Hardware requirements

The minimum hardware requirements for Consul servers in production clusters as recommended by the reference architecture are:

CPU	Memory	Disk Capacity	Disk IO	Disk Throughput	Avg Round-Trip-Time	99% Round-Trip-Time
8-16 core	32-64 GB RAM	200+ GB	7500+ IOPS	250+ MB/s	Lower than 50ms	Lower than 100ms
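
As a rough illustration, the lower bounds in the table above can be turned into a simple pre-deployment check. The sketch below is only an assumption about how you might encode them: the `ServerSpec` type, its field names, and the `shortfalls` helper are hypothetical and are not part of Consul.

```python
from dataclasses import dataclass


@dataclass
class ServerSpec:
    """Hypothetical description of a candidate Consul server."""
    cpu_cores: int
    memory_gb: int
    disk_gb: int
    disk_iops: int
    disk_throughput_mb_s: int
    avg_rtt_ms: float
    p99_rtt_ms: float


def shortfalls(spec):
    """Return the names of the table's minimums that the spec does not meet."""
    checks = {
        "cpu_cores": spec.cpu_cores >= 8,
        "memory_gb": spec.memory_gb >= 32,
        "disk_gb": spec.disk_gb >= 200,
        "disk_iops": spec.disk_iops >= 7500,
        "disk_throughput_mb_s": spec.disk_throughput_mb_s >= 250,
        "avg_rtt_ms": spec.avg_rtt_ms < 50,
        "p99_rtt_ms": spec.p99_rtt_ms < 100,
    }
    return [name for name, ok in checks.items() if not ok]


# Made-up candidate: everything at the minimum except disk IOPS.
candidate = ServerSpec(8, 32, 200, 5000, 250, 12.0, 40.0)
print(shortfalls(candidate))  # ['disk_iops']
```

An empty list means the candidate meets every lower bound in the table. It does not mean the instance is large enough for your workload.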

For the major cloud providers, we recommend starting with one of the following instances that meet the minimum requirements, then scaling up as needed. We also recommend avoiding "burstable" CPU and storage options, where performance may drop under sustained load.

Provider	Size	Instance/VM Types	Disk Volume Specs
AWS	Large	`m5.2xlarge`, `m5.4xlarge`	200+GB `gp3`, 10000 IOPS, 250MB/s
Azure	Large	`Standard_D8s_v3`, `Standard_D16s_v3`	2048GB `Premium SSD`, 7500 IOPS, 200MB/s
GCP	Large	`n2-standard-8`, `n2-standard-16`	1000GB `pd-ssd`, 30000 IOPS, 480MB/s

For HCP Consul Dedicated, cluster size is measured in the number of service instances supported. Find out more information in the HCP Consul Dedicated pricing page.

Workload input and output requirements

Workloads are any actions that interact with the Consul cluster. These actions consist of key/value reads and writes, service registrations and deregistrations, adding or removing Consul client agents, and more.

Input/output operations per second (IOPS) is a unit of measurement for the number of reads and writes to non-adjacent storage locations. For high workloads, ensure that the Consul server disks support a high number of IOPS to keep up with the rapid Raft log update rate. Unlike bare-metal environments, IOPS for virtual instances in cloud environments is often tied to storage sizing: a larger volume typically provides more IOPS. Therefore, we recommend deploying on IOPS-optimized instances.
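
To make the link between volume size and IOPS concrete, the following sketch estimates the smallest volume that reaches a target IOPS figure. It assumes IOPS scale linearly with provisioned size, which real volume types only approximate because they also apply minimums and caps. The ratio is derived from the GCP row in the provider table (30000 IOPS on 1000GB), and the helper name is hypothetical.

```python
import math


def min_disk_gb_for_iops(target_iops, iops_per_gb):
    """Smallest volume size (GB) that reaches target_iops under linear scaling."""
    if iops_per_gb <= 0:
        raise ValueError("iops_per_gb must be positive")
    return math.ceil(target_iops / iops_per_gb)


# Ratio implied by the GCP row: 30000 IOPS / 1000 GB = 30 IOPS per GB.
ratio = 30000 / 1000
print(min_disk_gb_for_iops(7500, ratio))  # 250
```

Check your provider's documentation for the actual scaling rules before you rely on a figure like this.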

Consul server agents are generally I/O bound for writes and CPU bound for reads. For additional tuning recommendations, refer to raft tuning.

Memory requirements

Allocate enough RAM for server agents to hold 2 to 4 times the working set size. You can determine the working set size of a running cluster by noting the value of consul.runtime.alloc_bytes in the leader node's telemetry data. Inspect your monitoring solution for the telemetry value, or run the following command on your Consul leader instance with the jq tool installed.

Tip

For Kubernetes, execute the command from the leader pod. jq is available in the Consul server containers.

Set $CONSUL_HTTP_TOKEN to an ACL token with valid permissions, then retrieve the working set size.

$ curl --silent --header "X-Consul-Token: $CONSUL_HTTP_TOKEN" http://127.0.0.1:8500/v1/agent/metrics | jq '.Gauges[] | select(.Name=="consul.runtime.alloc_bytes") | .Value'
616017920
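
The sizing rule can be applied to that value directly. The sketch below mirrors the jq filter in Python against a trimmed, hand-written sample of the metrics payload, then multiplies the working set by 2 and 4. The helper names are hypothetical, and the sample only contains the fields that the jq filter relies on.

```python
import json

GIB = 1024 ** 3


def gauge_value(metrics, name):
    """Return the value of the named gauge, like the jq filter above."""
    for gauge in metrics.get("Gauges", []):
        if gauge.get("Name") == name:
            return gauge["Value"]
    raise KeyError(name)


def ram_range_bytes(working_set_bytes, low=2, high=4):
    """Recommended RAM range: 2 to 4 times the working set size."""
    return working_set_bytes * low, working_set_bytes * high


# Trimmed sample payload; a real response contains many more fields.
sample = json.loads(
    '{"Gauges": [{"Name": "consul.runtime.alloc_bytes", "Value": 616017920}]}'
)
working_set = gauge_value(sample, "consul.runtime.alloc_bytes")
low, high = ram_range_bytes(working_set)
print(f"{low / GIB:.2f} GiB to {high / GIB:.2f} GiB")  # 1.15 GiB to 2.29 GiB
```

For the example output above, the rule suggests roughly 1.15 to 2.29 GiB for the working set alone. The hardware minimums earlier on this page still apply.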

Kubernetes storage requirements

When you set up persistent volume (PV) resources, define the server storage class parameter explicitly, because the defaults are unlikely to provide sufficient performance. To set the storageClass Helm chart parameter, refer to the Kubernetes documentation on storageClasses for more information about your specific cloud provider.

Read and write heavy workload recommendations

In production, your use case may lead to Consul performing read-heavy workloads, write-heavy workloads, or both. Refer to the following table for specific resource recommendations for these types of workloads.

Workload type	Instance Recommendations	Workload element examples	Enterprise Feature Recommendations
Read-heavy	Instances of type `m5.4xlarge` (AWS), `Standard_D16s_v3` (Azure), `n2-standard-16` (GCP)	Raft RPC calls, DNS queries, key/value retrieval	Read replicas
Write-heavy	IOPS performance of `10000+`	Consul agent joins and leaves, service registration and deregistration, key/value writes	Network segments

For recommendations on troubleshooting issues with read-heavy or write-heavy workloads, refer to Consul at Scale.

Monitor performance

Monitoring is critical to ensure that your Consul datacenter has sufficient resources to continue operations. A proactive monitoring strategy helps you find problems in your network before they impact your deployments.

We recommend completing the Monitor Consul server health and performance with metrics and logs tutorial as a starting point for Consul metrics and telemetry. The following tutorials guide you through specific monitoring solutions for your Consul cluster.

Important metrics

In production environments, create baselines for your Consul cluster's metrics. After you discover the baselines, you will be able to define alerts and receive notifications when there are unexpected values. For a detailed explanation on the metrics and their values, refer to Consul Agent telemetry.
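
One simple way to turn a baseline into an alert condition is to flag values that fall far outside the historical spread. The sketch below uses the mean and sample standard deviation with a threshold of three standard deviations. That threshold, the helper names, and the sample values are all assumptions for illustration, and your monitoring solution may offer better-suited anomaly detection.

```python
from statistics import mean, stdev


def baseline(samples):
    """Return (mean, sample standard deviation) of historical values."""
    return mean(samples), stdev(samples)


def is_unexpected(value, samples, k=3.0):
    """True when value is more than k standard deviations from the mean."""
    mu, sigma = baseline(samples)
    return abs(value - mu) > k * sigma


# Made-up history of a timing metric, in milliseconds.
history_ms = [10.0, 12.0, 11.0, 9.0, 10.0, 11.0, 12.0, 10.0]
print(is_unexpected(11.5, history_ms))  # False
print(is_unexpected(40.0, history_ms))  # True
```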

Transaction metrics

These metrics indicate how long it takes to complete write operations in various parts of the Consul cluster.

consul.kvs.apply measures the time it takes to complete an update to the KV store.
consul.txn.apply measures the time spent applying a transaction operation.
consul.raft.apply counts the number of Raft transactions applied during the measurement interval. This metric is only reported on the leader.
consul.raft.commitTime measures the time it takes to commit a new entry to the Raft log on disk on the leader.

Memory metrics

These performance indicators can help you diagnose if the current instance sizing is unable to handle the workload.

consul.runtime.alloc_bytes measures the number of bytes allocated by the Consul process.
consul.runtime.sys_bytes measures the total number of bytes of memory obtained from the OS.
consul.runtime.heap_objects measures the number of objects allocated on the heap and is a general memory pressure indicator.

Leadership metrics

Leadership changes are not a cause for concern on their own, but frequent changes may be a symptom of a deeper problem. Frequent elections or leadership changes may indicate network issues between the Consul servers, or that the Consul servers are unable to keep up with the load.

consul.raft.leader.lastContact measures the time since the leader was last able to contact the follower nodes when checking its leader lease.
consul.raft.state.candidate increments whenever a Consul server starts an election.
consul.raft.state.leader increments whenever a Consul server becomes a leader.
consul.server.isLeader tracks whether a server is a leader.
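
To judge whether leadership changes are frequent, you can count how often a server gains leadership within an observation window. The sketch below assumes you have exported a series of consul.server.isLeader samples for one server, with 1 meaning leader and 0 meaning not leader. The helper name and the sample series are hypothetical.

```python
def leadership_gains(is_leader_samples):
    """Count transitions from 0 (not leader) to 1 (leader) in a sample series."""
    pairs = zip(is_leader_samples, is_leader_samples[1:])
    return sum(1 for prev, cur in pairs if prev == 0 and cur == 1)


# Made-up series for one server over an observation window.
samples = [0, 1, 1, 0, 0, 1, 1, 1]
print(leadership_gains(samples))  # 2
```

Summing this count across all servers for the same window gives the number of leadership changes in the cluster, which you can compare against your own baseline.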

Network metrics

Network activity and RPC count measurements indicate the current load created from a Consul agent, including when the load becomes high enough to be rate limited. If an unusually high RPC count occurs, you should investigate before it overloads the cluster.

consul.client.rpc increments whenever a Consul agent in client mode makes an RPC request to a Consul server.
consul.client.rpc.exceeded increments whenever a Consul agent in client mode makes an RPC request to a Consul server and gets rate limited by that agent's limits configuration.
consul.client.rpc.failed increments whenever a Consul agent in client mode makes an RPC request to a Consul server and fails.
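
The three counters are easier to interpret as ratios than as raw counts. The following sketch computes the share of rate-limited and failed requests, assuming all three counters were read over the same interval. The function name and the example counts are hypothetical.

```python
def rpc_ratios(total, exceeded, failed):
    """Return (rate-limited share, failed share) of client RPC requests."""
    if total == 0:
        return 0.0, 0.0
    return exceeded / total, failed / total


# Made-up counts for one measurement interval.
limited, failing = rpc_ratios(total=2000, exceeded=50, failed=10)
print(f"rate limited: {limited:.1%}, failed: {failing:.1%}")
# rate limited: 2.5%, failed: 0.5%
```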

Network constraints and alternate approaches

If it is impossible for you to allocate the required resources, you can make changes to Consul's performance so that it operates with lower speed or resilience. These changes ensure that your cluster remains within its resource capacity.

Soft limits prevent your cluster from degrading due to overload.
Raft tuning lets you compensate for unfavorable environments.

Soft limits

The recommended maximum size for a single datacenter is 5,000 Consul client agents. This recommendation is based on a standard, non-tuned environment and accounts for the need to limit the blast radius of a failure. The maximum number of agents may be lower, depending on how you use Consul.

If you require more than 5,000 client agents, you should break up the single Consul datacenter into multiple smaller datacenters.

When the nodes are spread across separate physical locations such as different regions, you can model multiple datacenter structures based on physical locations.
Use network segments in a single availability zone or region to lower overall resource usage in a single datacenter.
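
As a back-of-the-envelope check, the 5,000-agent recommendation gives a lower bound on the number of datacenters for a given fleet. The sketch below assumes agents can be spread evenly, which physical topology rarely allows, so treat the result as a minimum. The helper name is hypothetical.

```python
import math

RECOMMENDED_MAX_AGENTS = 5000  # soft limit from the recommendation above


def datacenters_needed(client_agents, max_per_dc=RECOMMENDED_MAX_AGENTS):
    """Minimum number of datacenters that keeps each one at or under the limit."""
    if client_agents < 0:
        raise ValueError("client_agents must not be negative")
    return max(1, math.ceil(client_agents / max_per_dc))


print(datacenters_needed(12000))  # 3
```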

When deploying Consul in Kubernetes, we recommend you set both requests and limits in the Helm chart. Refer to the Helm chart documentation for more information.

Requests allocate the required resources for your Consul workloads.
Limits prevent your pods from being terminated and restarted if they consume more resources than requested and Kubernetes needs to reclaim these resources. Limits can prevent outage situations where the Consul leader's container gets terminated and redeployed due to resource constraints.

The following is an example Helm configuration that allocates 16 CPU cores and 64 gigabytes of memory:

global:
  image: "hashicorp/consul"
## ...
resources:
  requests:
    memory: '64G'
    cpu: '16000m'
  limits:
    memory: '64G'
    cpu: '16000m'
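
The quantities in this configuration use Kubernetes resource units: the `m` suffix on CPU means thousandths of a core, and `G` is a decimal gigabyte (10^9 bytes), which is smaller than the binary `Gi` (2^30 bytes). The following sketch converts a few common suffixes to make the values explicit. It is a simplified, hypothetical parser that handles only the suffixes listed, not the full Kubernetes quantity grammar.

```python
# Memory suffixes handled by this sketch (binary first, then decimal).
MEMORY_FACTORS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_cores(quantity):
    """Convert a CPU quantity such as '16000m' or '16' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_bytes(quantity):
    """Convert a memory quantity such as '64G' or '64Gi' to bytes."""
    for suffix, factor in MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


print(cpu_cores("16000m"))   # 16.0
print(memory_bytes("64G"))   # 64000000000
print(memory_bytes("64Gi"))  # 68719476736
```

If you size memory against figures reported in binary units, `64Gi` may be the quantity you intend instead of `64G`.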

Raft tuning

Consul uses the Raft consensus algorithm to provide consistency. You may need to adjust Raft to suit your specific environment. Adjust the raft_multiplier configuration to define the trade-off between leader stability and time to recover from a leader failure.

A lower multiplier minimizes failure detection and election time, but it may trigger frequent leader elections in high-latency situations.
A higher multiplier reduces the chances that failures cause leadership churn, but your cluster takes longer to detect real failures and restore availability.

The default value of raft_multiplier is 5. It is a scaling factor that directly affects the following parameters:

Parameter name	Default value	Derived from
HeartbeatTimeout	5000ms	5 x 1000ms
ElectionTimeout	5000ms	5 x 1000ms
LeaderLeaseTimeout	2500ms	5 x 500ms
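
The table follows a single rule: each timeout is its base value multiplied by raft_multiplier. The sketch below applies that rule to other multiplier values so you can compare candidate settings. The base values are taken from the "Derived from" column, and the function name is hypothetical.

```python
# Base timeouts in milliseconds, from the "Derived from" column above.
BASE_TIMEOUTS_MS = {
    "HeartbeatTimeout": 1000,
    "ElectionTimeout": 1000,
    "LeaderLeaseTimeout": 500,
}


def raft_timeouts_ms(raft_multiplier):
    """Scale every base timeout by the given multiplier."""
    if raft_multiplier < 1:
        raise ValueError("raft_multiplier must be at least 1")
    return {name: base * raft_multiplier for name, base in BASE_TIMEOUTS_MS.items()}


for multiplier in (1, 5, 8):
    print(multiplier, raft_timeouts_ms(multiplier))
```

With a multiplier of 5, the result matches the default values in the table.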

You can use the telemetry from consul.raft.leader.lastContact to observe Raft timing performance.

Wide-area networks with higher latency perform better with larger values of raft_multiplier, but cluster failure detection will take longer. If your network operates with low latency, we recommend that you do not set the Raft multiplier higher than 5. Instead, you should either replace the servers with more powerful ones or minimize the network latency between nodes.

We recommend you start from a baseline and perform chaos engineering testing with different values for the Raft multiplier to find the acceptable time for problem detection and recovery for the cluster. Then scale the cluster and its dedicated resources with the number of workloads handled. This approach gives you the best balance between pure resource growth and pure Raft tuning strategies because it lets you use Raft tuning as a backup plan if you cannot scale your resources.

The types of workloads the Consul cluster handles also play an important role in Raft tuning. For example, if your Consul clusters are mostly static and do not handle many events, you should increase your Raft multiplier instead of scaling your resources because the risk of an important event happening while the cluster is converging or re-electing a leader is lower.