Automate upgrades with Vault Enterprise
Enterprise Only
The functionality described in this tutorial is available only in Vault Enterprise.
Challenge
Upgrading Vault is a delicate operation in any production environment, so it is important to have best practices in place that simplify the process where possible.
Solution
Vault Enterprise provides automated version upgrades with the autopilot feature when using Integrated Storage. The feature allows you to start new Vault nodes alongside the older version ones and automatically switch to the new nodes after they reach quorum.
Autopilot also handles the leadership change: it transfers leadership to one of the new nodes, so removing the older version nodes from the cluster does not trigger a leader election.
Prerequisites
To test the automated upgrades feature explained in this tutorial you will need:
- A Vault Enterprise cluster with three nodes running Vault Enterprise 1.11.0 or later.
- Three extra nodes with Vault Enterprise 1.11.0 or later to use as the new servers after the upgrade.
You will also need a text editor, the curl executable to test the API endpoints, and optionally the jq command to format the output of curl.
Scenario introduction
To learn about the new autopilot behavior, start an initial three-node cluster (see the Step 1 diagram). Then start three additional nodes that specify an automatic upgrade version, and add them to the cluster (see the Step 2 diagram).
You will run a script to start a cluster.
- Initialize and unseal vault_1 (http://127.0.0.1:8100). The root token creates a transit key that enables auto-unseal for the other Vault servers. This Vault server is not a part of the cluster.
- Initialize and unseal vault_2 (http://127.0.0.1:8200). This Vault starts as the cluster leader.
- Start vault_3 (http://127.0.0.1:8300). It automatically joins the cluster via retry_join.
- Start vault_4 (http://127.0.0.1:8400). It automatically joins the cluster via retry_join.
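Once the script has run, it can help to confirm that each server is reachable and unsealed. The following is a minimal sketch, not part of the tutorial's scripts: it uses the addresses listed above and Vault's unauthenticated sys/seal-status endpoint. The helper names (NODES, seal_status_url, is_unsealed) are made up for illustration.

```python
import json
import urllib.request

# Addresses taken from the scenario list above.
NODES = {
    "vault_1": "http://127.0.0.1:8100",  # transit auto-unseal provider, not a cluster member
    "vault_2": "http://127.0.0.1:8200",  # initial cluster leader
    "vault_3": "http://127.0.0.1:8300",  # joins via retry_join
    "vault_4": "http://127.0.0.1:8400",  # joins via retry_join
}


def seal_status_url(address: str) -> str:
    """Build the URL of the seal-status endpoint for one server."""
    return address.rstrip("/") + "/v1/sys/seal-status"


def is_unsealed(address: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers and reports that it is not sealed."""
    try:
        with urllib.request.urlopen(seal_status_url(address), timeout=timeout) as resp:
            return json.load(resp).get("sealed") is False
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or malformed JSON:
        # treat the server as not ready.
        return False
```

With the cluster running, a loop such as `for name, addr in NODES.items(): print(name, is_unsealed(addr))` gives a quick readiness overview.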
If this is your first time setting up a Vault cluster with integrated storage, go through the Vault HA Cluster with Integrated Storage tutorial.
Set up an initial cluster
- Retrieve the configuration by cloning the hashicorp/learn-vault-raft repository from GitHub. This repository holds supporting content for all the Vault learn tutorials. The content specific to this tutorial is in a sub-directory.
- Change the working directory to learn-vault-raft/raft-auto-upgrade/local.
- Set the setup_1.sh file to executable.
- Execute the setup_1.sh script to spin up a Vault cluster. You can find the server configuration files and the log files in the working directory.
- Use your preferred text editor and open the config-vault_2.hcl file to examine the generated server configuration for vault_2.
- Review the generated server configuration for vault_3 in config-vault_3.hcl. The retry_join configuration block has the vault_3 and vault_4 nodes automatically joining the cluster.
- Export an environment variable for the vault CLI to address the vault_2 server.
- Verify the cluster members.
- View the autopilot's upgrade state information. Notice that the Status field under Upgrade Info is idle.
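The same state can be read over the HTTP API, which is convenient for scripting. This is a minimal sketch under stated assumptions: it calls the sys/storage/raft/autopilot/state endpoint with a token you supply, and it assumes the upgrade status appears in the response under data.upgrade_info.status. Check the field names against the response from your own cluster. The function names are illustrative.

```python
import json
import urllib.request


def fetch_autopilot_state(addr: str, token: str) -> dict:
    """GET the raft autopilot state from a Vault server and decode the JSON body."""
    req = urllib.request.Request(
        addr.rstrip("/") + "/v1/sys/storage/raft/autopilot/state",
        headers={"X-Vault-Token": token},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def upgrade_status(state: dict) -> str:
    """Return the upgrade status, or "unknown" if the assumed fields are absent."""
    # Assumption: the status is exposed at data.upgrade_info.status.
    data = state.get("data") or {}
    upgrade_info = data.get("upgrade_info") or {}
    return upgrade_info.get("status", "unknown")
```

For example, `upgrade_status(fetch_autopilot_state("http://127.0.0.1:8200", token))` should report idle before any new nodes join.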
If you have the watch command (or similar), you can use it to follow the upgrade status as you add more nodes, for example by checking the autopilot state every half second.
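If watch is not available, a small polling loop does the same job. The sketch below is an illustration rather than part of the tutorial: wait_for_status is a hypothetical helper that calls any function returning the current status string, prints each change, and stops when a target status appears. The half-second default mirrors the interval mentioned above.

```python
import time
from typing import Callable, Optional


def wait_for_status(
    get_status: Callable[[], str],
    target: str,
    interval: float = 0.5,
    max_polls: int = 120,
) -> bool:
    """Poll get_status until it returns target; give up after max_polls attempts."""
    last: Optional[str] = None
    for _ in range(max_polls):
        current = get_status()
        if current != last:
            # Print only when the status changes, to keep the log short.
            print(f"upgrade status: {current}")
            last = current
        if current == target:
            return True
        time.sleep(interval)
    return False
```

The get_status argument can wrap whatever you use to read the autopilot state, such as a CLI call or an HTTP request.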
Add new nodes
When autopilot detects that the count of nodes on the new version equals or exceeds the count of older version nodes, it begins promoting the new nodes to voters and demoting the older version nodes to non-voters.
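The trigger condition can be expressed as a simple count comparison. The sketch below only models the rule as stated in this tutorial; it is not Vault's implementation, and the function name and version strings are illustrative.

```python
from typing import Iterable


def upgrade_can_begin(node_versions: Iterable[str], target_version: str) -> bool:
    """Return True when nodes on target_version equal or outnumber the others."""
    versions = list(node_versions)
    new_nodes = sum(1 for v in versions if v == target_version)
    old_nodes = len(versions) - new_nodes
    # At least one new node must exist, and new nodes must not be outnumbered.
    return new_nodes > 0 and new_nodes >= old_nodes
```

With three older nodes, the condition is first met when the third new node joins: two new nodes against three old ones is not enough.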
- Use your preferred text editor and open the config-vault_5.hcl file to examine the generated server configuration for vault_5. To specify an automatic upgrade target version, add the autopilot_upgrade_version parameter in the storage stanza, where its value is a SemVer compatible version string of your choosing.
  Vault Configuration: The vault_5, vault_6 and vault_7 nodes have the autopilot_upgrade_version parameter configured. This implies that those nodes have a specific target Vault version.
- Set the setup_2.sh file to executable.
- Execute the setup_2.sh script to add three additional nodes to the cluster.
- Follow the autopilot's upgrade status as it progresses, either by viewing the autopilot state again or by using the watch command.
  The Status changes from idle to await-new-voters.
  The status will change to promoting as autopilot promotes the 3 new nodes to be voters. Then the status will change to demoting, as autopilot demotes 2 out of the 3 older version nodes to be non-voters. Then, the leader will change from vault_2 to vault_5.
  The status changes to await-server-removal.
Autopilot Status
The progression of autopilot statuses during an upgrade looks like: idle → await-new-voters → promoting → demoting → leader-transfer → await-server-removal → idle.
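When scripting around an upgrade, it can be useful to check that the statuses you observe only ever move forward. The sketch below encodes the order described in the steps above (promoting before demoting) for a single upgrade cycle, up to await-server-removal. Because polling can miss short-lived statuses, it allows skipped and repeated entries. The names are illustrative.

```python
from typing import Iterable

# One upgrade cycle, in the order described in this tutorial.
UPGRADE_STATUSES = [
    "idle",
    "await-new-voters",
    "promoting",
    "demoting",
    "leader-transfer",
    "await-server-removal",
]


def follows_upgrade_order(observed: Iterable[str]) -> bool:
    """Check that observed statuses never move backwards within one cycle."""
    position = 0
    for status in observed:
        if status not in UPGRADE_STATUSES:
            return False
        index = UPGRADE_STATUSES.index(status)
        if index < position:
            return False
        position = index
    return True
```

The final return to idle after the old nodes are removed starts a new cycle, so it is deliberately left out of the check.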
Remove non-voter nodes
Once the autopilot upgrade status changes to await-server-removal, you can remove the older version non-voting nodes from the cluster.
- List the current peers before removing any nodes.
- Export an environment variable for the vault CLI to address the server.
- Remove vault_2 from the cluster.
- Remove vault_3 from the cluster.
- Remove vault_4 from the cluster.
- Verify non-voter node removal from the cluster.
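The removal steps above can be scripted. The sketch below builds one vault operator raft remove-peer command per older node and, by default, only prints the commands. It assumes the vault CLI is on your PATH and that the environment already points at the cluster with a valid token. The helper names are illustrative.

```python
import subprocess
from typing import Iterable, List

# The older version nodes from this tutorial's scenario.
OLD_NODES = ["vault_2", "vault_3", "vault_4"]


def remove_peer_command(node_id: str) -> List[str]:
    """Build the CLI invocation that removes one node from the raft cluster."""
    return ["vault", "operator", "raft", "remove-peer", node_id]


def remove_old_nodes(node_ids: Iterable[str], dry_run: bool = True) -> List[List[str]]:
    """Print the removal commands, or execute them when dry_run is False."""
    commands = [remove_peer_command(node_id) for node_id in node_ids]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failed removal.
            subprocess.run(cmd, check=True)
    return commands
```

Keeping dry_run as the default makes it harder to remove a voter by accident; confirm the status is await-server-removal before running with dry_run=False.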
Autopilot configuration
Vault Enterprise enables automated upgrade migrations by default.
To disable automated upgrade migrations, set the -disable-upgrade-migration parameter to true.
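As a sketch of how this could be automated, the helper below builds the vault operator raft autopilot set-config invocation with the -disable-upgrade-migration flag named above, alongside the get-config command for reading the current settings. Running the commands is left to the caller, and the Python names are illustrative.

```python
from typing import List

# Reads the current autopilot configuration.
GET_CONFIG_COMMAND = ["vault", "operator", "raft", "autopilot", "get-config"]


def set_upgrade_migration_command(disable: bool) -> List[str]:
    """Build the CLI invocation that disables or re-enables upgrade migrations."""
    value = "true" if disable else "false"
    return [
        "vault", "operator", "raft", "autopilot", "set-config",
        f"-disable-upgrade-migration={value}",
    ]
```

Passing the resulting list to subprocess.run with check=True executes it with the VAULT_ADDR and VAULT_TOKEN values from your environment.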
Clean up
The cluster.sh script provides a clean operation that removes all services, configuration, and modifications to your local system.
Clean up your local workstation.
Next steps
In this tutorial you upgraded your Vault cluster by using autopilot's automated upgrades functionality. Automated upgrades let you automatically upgrade a cluster of Vault nodes to a new version as updated server nodes join the cluster. Once the number of nodes on the new version is equal to or greater than the number of nodes on the older version, autopilot promotes the newer version nodes to voters, demotes the older version nodes to non-voters, and begins a leadership transfer from the older version leader to one of the newer version nodes. After the leadership transfer completes, you can remove the older version non-voting nodes from the cluster.