Failover overview

Services in your mesh may become unhealthy or unreachable for many reasons, but you can mitigate some of the effects associated with infrastructure issues by configuring Consul to automatically route traffic to and from failover service instances. This topic provides an overview of the failover strategies you can implement with Consul.

Service failover strategies in Consul

There are several methods for implementing failover strategies between datacenters in Consul. You can adopt one of the following strategies based on your deployment configuration and network requirements:

Configure the Failover stanza in a service resolver configuration entry to explicitly define which services should failover and the targeting logic they should follow.
Make a prepared query for each service that you can use to automate geo-failover.
Create a sameness group to identify partitions with identical namespaces and service names to establish default failover targets.

The following table compares these strategies in deployments with multiple datacenters to help you determine the best approach for your service:

	`Failover` stanza	Prepared query	Sameness groups
Supports WAN federation	✅	✅	❌
Supports cluster peering	✅	❌	✅
Supports locality-aware routing	✅	❌	✅
Multi-datacenter failover strength	✅	❌	✅
Multi-datacenter usage scenario	Enables more granular logic for failover targeting.	Central policies that can automatically target the nearest datacenter.	Group size changes without edits to existing member configurations.
Multi-datacenter usage scenario	Configuring failover for a single service or service subset, especially for testing or debugging purposes	WAN-federated deployments where a primary datacenter is configured. Prepared queries are not replicated over peer connections.	Cluster peering deployments with consistently named services and namespaces.

Although cluster peering connections support the Failover field of the prepared query request schema when using Consul's service discovery features to perform dynamic DNS queries, they do not support prepared queries for service mesh failover scenarios.

Failover configurations for a service mesh with a single datacenter

You can implement a service resolver configuration entry and specify a pool of failover service instances that other services can exchange messages with when the primary service becomes unhealthy or unreachable. We recommend adopting this strategy as a minimum baseline when implementing Consul service mesh and layering additional failover strategies to build resilience into your application network.

Refer to the Failover configuration for examples of how to configure failover services in the service resolver configuration entry on both VMs and Kubernetes deployments.

Failover configuration for WAN-federated datacenters

If your network has multiple Consul datacenters that are WAN-federated, you can configure your applications to look for failover services with prepared queries. Prepared queries are configurations that enable you to define complex service discovery lookups. This strategy hinges on the secondary datacenter containing service instances that have the same name and residing in the same namespace as their counterparts in the primary datacenter.

Refer to the Automate geo-failover with prepared queries tutorial for additional information.

Failover configuration for peered clusters and partitions

In networks with multiple datacenters or partitions that share a peer connection, each datacenter or partition functions as an independent unit. As a result, Consul does not correlate services that have the same name, even if they are in the same namespace.

You can configure sameness groups for this type of network. Sameness groups allow you to define a group of admin partitions where identical services are deployed in identical namespaces. After you configure the sameness group, you can reference the SamenessGroup parameter in service resolver, exported service, and service intention configuration entries, enabling you to add or remove cluster peers from the group without making changes to every cluster peer every time.

You can configure a sameness group so that it functions as the default for failover behavior. You can also reference sameness groups in a service resolver's Failover stanza or in a prepared query. Refer to Failover with sameness groups for more information.

Locality-aware routing

By default, Consul balances traffic to all healthy upstream instances in the cluster, even if the instances are in different network regions and zones. You can configure Consul to route requests to upstreams in the same region and zone, which reduces latency and transfer costs. Refer to Route traffic to local upstreams for additional information.