Autopilot
Autopilot enables automated workflows for managing Raft clusters. It currently provides three main features: Server Stabilization, Dead Server Cleanup, and State API. All three were introduced in Vault 1.7.
Server Stabilization
Server stabilization helps maintain the stability of the Raft cluster by safely
joining new voting nodes to the cluster. When a new voter node joins an
existing cluster, Autopilot adds it as a non-voter instead and monitors its
health for a pre-configured amount of time. If the node remains healthy for the
entire stabilization period, it is promoted to a voter. The server
stabilization period can be tuned using server_stabilization_time (see below).
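The promotion rule described above can be sketched as a small state update. This is an illustrative model only, not Vault's implementation: the Node class, the observe function, and the use of plain seconds for time are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical model of a newly joined server (not a Vault type)."""
    node_id: str
    voter: bool = False                    # new nodes start as non-voters
    healthy_since: Optional[float] = None  # start of the current healthy streak


def observe(node: Node, healthy: bool, now: float, stabilization_time: float) -> None:
    """Record one health observation; promote once the node has stayed healthy
    for the whole stabilization period (assumed to be given in seconds)."""
    if not healthy:
        # Any unhealthy observation resets the streak.
        node.healthy_since = None
        return
    if node.healthy_since is None:
        node.healthy_since = now
    if not node.voter and now - node.healthy_since >= stabilization_time:
        node.voter = True
```

In this sketch a single unhealthy observation restarts the waiting period, which matches the requirement that the node stay healthy for the entire duration.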
Dead Server Cleanup
Dead server cleanup automatically removes nodes deemed unhealthy from the
Raft cluster, avoiding the need for manual operator intervention. This feature
can be tuned using cleanup_dead_servers, dead_server_last_contact_threshold,
and min_quorum (see below).
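As a rough illustration of how the three settings could interact, the sketch below picks servers for removal. It assumes that a server counts as dead once its last contact is older than dead_server_last_contact_threshold, and that cleanup never shrinks the cluster below min_quorum servers. The function name, the input shape, and the use of seconds are hypothetical, and Vault's actual selection logic may differ.

```python
from typing import Dict, List


def servers_to_remove(
    last_contact_s: Dict[str, float],  # server ID -> seconds since last contact (assumed input)
    cleanup_dead_servers: bool,
    dead_threshold_s: float,
    min_quorum: int,
) -> List[str]:
    """Return the IDs this sketch would prune, stalest first."""
    if not cleanup_dead_servers:
        return []
    dead = sorted(
        (sid for sid, age in last_contact_s.items() if age > dead_threshold_s),
        key=lambda sid: last_contact_s[sid],
        reverse=True,
    )
    # Assumption: never prune so many servers that fewer than min_quorum remain.
    removable = max(0, len(last_contact_s) - min_quorum)
    return dead[:removable]
```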
State API
The State API provides detailed information about all the nodes in the Raft cluster in a single call. This API can be used to monitor cluster health.
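A monitoring script might reduce such a state response to the list of nodes that need attention. The sketch below works on an already-parsed response; the payload layout (a "servers" mapping whose entries carry a boolean "healthy" field) is an assumption for illustration and should be checked against the actual API response.

```python
from typing import Any, Dict, List


def unhealthy_nodes(state: Dict[str, Any]) -> List[str]:
    """List server IDs not reported healthy in a hypothetical state payload."""
    servers = state.get("servers", {})
    # Treat a missing "healthy" field as unhealthy so gaps are not silently ignored.
    return sorted(sid for sid, info in servers.items() if not info.get("healthy", False))
```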
Follower Health
Follower node health is determined by two factors:
- Its ability to send heartbeats to the leader node at regular intervals. Tuned using last_contact_threshold (see below).
- Its ability to keep up with data replication from the leader node. Tuned using max_trailing_logs (see below).
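The two factors can be combined into a single check, as in the following minimal sketch. The function and its inputs are hypothetical, the defaults mirror the values listed under Default Configuration, and whether the real comparisons are inclusive at the boundary is an assumption.

```python
def follower_is_healthy(
    last_contact_s: float,                  # seconds since the last leader contact
    leader_index: int,                      # leader's latest Raft log index
    follower_index: int,                    # follower's latest Raft log index
    last_contact_threshold_s: float = 10.0,
    max_trailing_logs: int = 1000,
) -> bool:
    """Healthy only if heartbeats are recent AND replication is keeping up."""
    heartbeat_ok = last_contact_s <= last_contact_threshold_s
    replication_ok = (leader_index - follower_index) <= max_trailing_logs
    return heartbeat_ok and replication_ok
```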
Default Configuration
By default, Autopilot is enabled on clusters running Vault 1.7+, although dead server cleanup is not enabled by default. Raft clusters deployed with older versions of Vault also transition to Autopilot automatically when they are upgraded.
Autopilot exposes a configuration API to manage its behavior. Autopilot is initialized with the following default values:
- cleanup_dead_servers - false
- min_quorum - No default value. It must be set to at least 3 when cleanup_dead_servers is set to true.
- max_trailing_logs - 1000
- last_contact_threshold - 10s
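The defaults and the min_quorum constraint can be captured in a small configuration object. This is a sketch for illustration only: the AutopilotConfig class and its validate method are hypothetical and not part of any Vault client library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutopilotConfig:
    """Hypothetical container mirroring the default values listed above."""
    cleanup_dead_servers: bool = False
    min_quorum: Optional[int] = None      # no default; must be set explicitly
    max_trailing_logs: int = 1000
    last_contact_threshold: str = "10s"

    def validate(self) -> None:
        """Reject dead server cleanup without a min_quorum of at least 3."""
        if self.cleanup_dead_servers and (self.min_quorum is None or self.min_quorum < 3):
            raise ValueError(
                "min_quorum must be at least 3 when cleanup_dead_servers is true"
            )
```

Calling validate() on the defaults succeeds, because dead server cleanup is off and min_quorum is therefore not required.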
Replication
Performance secondary clusters have their own Autopilot configuration, managed independently of their primary.
DR secondary clusters do not currently support Autopilot.
Learn
Refer to Integrated Storage Autopilot for a step-by-step tutorial.