Vault HA cluster with integrated storage
Challenge
Vault supports many storage backends to persist its encrypted data (e.g., Consul, MySQL, DynamoDB).
These backends require:
- Their own administration, which increases complexity and total administrative effort.
- Configuration to allow Vault as a client.
- Vault configuration to connect to the provider as a client.
Solution
Use Vault's Integrated Storage to persist the encrypted data. Integrated Storage has the following benefits:
- Integrated into Vault (reducing total administration).
- All configuration within Vault.
- Supports failover and multi-cluster replication.
- Eliminates additional network requests.
- Lowers complexity when diagnosing issues (leading to faster time to recovery).
Tip
HashiCorp Cloud Platform (HCP) Vault clusters use Integrated Storage. To learn more about the managed Vault clusters, refer to the Getting Started with HCP Vault Dedicated tutorials. If you are a Kubernetes user, visit the Vault Installation to Minikube via Helm with Integrated Storage tutorial.
Prerequisites
This tutorial requires Vault, sudo access, and additional configuration to create the cluster.
- Install Vault v1.4.0 or later
Setup
The cluster.sh script configures and starts four Vault servers. The following describes the architecture:
- vault_1 (http://127.0.0.1:8200) is initialized and unsealed. The root token creates a transit key that enables auto-unseal for the other Vaults. This Vault does not join the cluster.
- vault_2 (http://127.0.0.2:8200) is initialized and unsealed. This Vault starts as the cluster leader. An example K/V-V2 secret is created.
- vault_3 (http://127.0.0.3:8200) is only started. You will join it to the cluster.
- vault_4 (http://127.0.0.4:8200) is only started. You will join it to the cluster.
Open a terminal, create a directory named $HOME/vault-tutorial, and set it as the working directory.

Retrieve the configuration by cloning the hashicorp-education/learn-vault-raft repository from GitHub.

Change the working directory to learn-vault-raft/raft-storage/local.

Make the cluster.sh script executable.

Set up the local loopback addresses for each Vault.
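These steps correspond to commands along the following lines. The clone URL is the standard GitHub address for the named repository, and the cluster.sh subcommand for the network setup is an assumption based on the helper script's interface; verify it by reading the script if your copy differs.

```shell
# Create the working directory and enter it
mkdir -p $HOME/vault-tutorial && cd $HOME/vault-tutorial

# Clone the tutorial repository
git clone https://github.com/hashicorp-education/learn-vault-raft.git

# Move into the local raft-storage scenario
cd learn-vault-raft/raft-storage/local

# Make the helper script executable
chmod +x cluster.sh

# Create the loopback address aliases (prompts for your sudo password)
./cluster.sh create network
```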
Note
This operation requires a user with sudo access. You will be prompted to enter that user's password. The 127.0.0.0/8 address block is assigned for use as the Internet host loopback address (RFC 3330).

Create the configuration for each Vault node.
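A sketch of the command, under the same cluster.sh assumption:

```shell
# Generate the configuration files for all four nodes
./cluster.sh create config
```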
Set up vault_1:

Watch out for VAULT_TOKEN
Before proceeding, make sure that you do not already have a VAULT_TOKEN environment variable exported in your shell session. If checking for it with a command like printenv | grep VAULT_TOKEN returns a result, unset it with unset VAULT_TOKEN before proceeding, or subsequent steps will likely not succeed.
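A sketch of the setup command, under the same cluster.sh assumption:

```shell
./cluster.sh setup vault_1
```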
vault_1 (http://127.0.0.1:8200) is initialized and unsealed. The transit secrets engine is enabled and a key is created; this key will be used to auto-unseal vault_2. The initial root token is stored in the root_token-vault_1 file.

Set up vault_2.
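A sketch of the corresponding command:

```shell
./cluster.sh setup vault_2
```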
vault_2 (http://127.0.0.2:8200) is initialized and unsealed. The K/V-V2 secrets engine is enabled with some test data at kv/apikey. The recovery key is stored in the recovery_key-vault_2 file, and the initial root token is stored in the root_token-vault_2 file.

Set up vault_3.
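A sketch of the corresponding command:

```shell
./cluster.sh setup vault_3
```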
Create an HA cluster
Currently vault_2 is initialized, unsealed, and has HA enabled. It is the only node in a cluster. The remaining nodes, vault_3 and vault_4, have not joined its cluster.
Examine the leader
Let's discover more about the configuration of vault_2 and how it describes the current state of the cluster.
Open the vault_2 server configuration file (config-vault_2.hcl) in a text editor.
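The file listing was not captured here; the following is a sketch of what it looks like, reconstructed from the description below. The seal stanza details and the exact expanded path are assumptions (the setup script writes the absolute path for your machine).

```hcl
storage "raft" {
  path    = "$HOME/vault-tutorial/learn-vault-raft/raft-storage/local/raft-vault_2/"
  node_id = "vault_2"
}

listener "tcp" {
  address         = "127.0.0.2:8200"
  cluster_address = "127.0.0.2:8201"
  tls_disable     = true
}

# Auto-unseal via the transit secrets engine on vault_1 (assumed configuration)
seal "transit" {
  address    = "http://127.0.0.1:8200"
  token      = "<vault_1-root-token>"
  key_name   = "unseal_key"
  mount_path = "transit/"
}

api_addr     = "http://127.0.0.2:8200"
cluster_addr = "http://127.0.0.2:8201"
```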
To use Integrated Storage, the storage stanza is set to raft. The path parameter specifies where Vault data will be stored ($HOME/vault-tutorial/learn-vault-raft/raft-storage/local/raft-vault_2/).

Set the VAULT_ADDR to point to vault_2, and examine the current raft peer set.
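For example:

```shell
export VAULT_ADDR=http://127.0.0.2:8200
vault operator raft list-peers
```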
The cluster reports that vault_2 is the only node and is currently the leader.
Examine the vault_2 root token.
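For example, assuming the root token file name from the setup step:

```shell
cat root_token-vault_2
```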
The cluster.sh script captured the root token of vault_2 during its setup and stored it in the root_token-vault_2 file. This root token has privileged access to all nodes within the cluster.
Although the listener stanza disables TLS for this tutorial, Vault should always be used with TLS in production to provide secure communication between clients and the Vault server. It requires a certificate file and key file on each Vault host.
Join nodes to the cluster
Add vault_3 to the cluster using the vault operator raft join
command.
Open a new terminal and set the working directory to the learn-vault-raft/raft-storage/local directory. Set the VAULT_ADDR to the vault_3 API address.
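For example:

```shell
cd $HOME/vault-tutorial/learn-vault-raft/raft-storage/local
export VAULT_ADDR=http://127.0.0.3:8200
```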
Join vault_3 to the vault_2 cluster.
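For example:

```shell
vault operator raft join http://127.0.0.2:8200
```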
Here, http://127.0.0.2:8200 is the address of the vault_2 server, which has already been initialized and auto-unsealed. This makes vault_2 the active node, and its storage acts as the leader in this cluster.

Tip
In this scenario, Transit auto-unseal is used; therefore, vault_3 is automatically unsealed once it successfully joins the cluster.
Next, configure the vault CLI to use the vault_2 root token for requests, and examine the current raft peer set.
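For example, assuming the root token file name from the setup step:

```shell
export VAULT_TOKEN=$(cat root_token-vault_2)
vault operator raft list-peers
```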
Now, vault_3 is listed as a follower node.
Examine the vault_3 log file (vault_3.log). The log describes the cluster joining operations.
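For example:

```shell
tail vault_3.log
```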
Finally, verify that you can read the secret at kv/apikey.
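For example:

```shell
vault kv get kv/apikey
```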
Retry join
You can use the vault operator raft join command to join vault_4 to the cluster in the same way you joined vault_3. However, if the connection details of all the nodes are known beforehand, you can configure the retry_join stanza in the server configuration file to automatically join the cluster.
Modify the server configuration file config-vault_4.hcl by adding retry_join blocks inside the storage stanza. The resulting config-vault_4.hcl file should look like the sketch below.
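The original listing was not captured here; the following is a reconstruction based on the description. The seal stanza details and the exact expanded path are assumptions.

```hcl
storage "raft" {
  path    = "$HOME/vault-tutorial/learn-vault-raft/raft-storage/local/raft-vault_4/"
  node_id = "vault_4"

  retry_join {
    leader_api_addr = "http://127.0.0.2:8200"
  }

  retry_join {
    leader_api_addr = "http://127.0.0.3:8200"
  }
}

listener "tcp" {
  address         = "127.0.0.4:8200"
  cluster_address = "127.0.0.4:8201"
  tls_disable     = true
}

# Auto-unseal via the transit secrets engine on vault_1 (assumed configuration)
seal "transit" {
  address    = "http://127.0.0.1:8200"
  token      = "<vault_1-root-token>"
  key_name   = "unseal_key"
  mount_path = "transit/"
}

api_addr     = "http://127.0.0.4:8200"
cluster_addr = "http://127.0.0.4:8201"
```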
Since the addresses of vault_2 and vault_3 are known, you can predefine the possible cluster leader addresses in the retry_join blocks.

Start vault_4.
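A sketch of the command, under the same cluster.sh assumption:

```shell
./cluster.sh start vault_4
```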
Open a new terminal and set the working directory to the $HOME/vault-tutorial/learn-vault-raft/raft-storage/local directory. Set the VAULT_ADDR to the vault_4 API address.
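For example:

```shell
cd $HOME/vault-tutorial/learn-vault-raft/raft-storage/local
export VAULT_ADDR=http://127.0.0.4:8200
```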
List the peers and notice that vault_4 is listed as a follower node.
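For example:

```shell
vault operator raft list-peers
```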
Configure the vault CLI in this terminal to use the vault_2 root token for requests.

Patch the secret at kv/apikey, then return to the terminal you used to configure vault_3 and read the secret again.
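For example (the patched field and value are illustrative):

```shell
export VAULT_TOKEN=$(cat root_token-vault_2)

# Patch the secret with a new field
vault kv patch kv/apikey expiration="365 days"

# Then, in the vault_3 terminal:
vault kv get kv/apikey
```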
Tip
If you are running Vault in a public cloud, the retry_join stanza supports an auto_join parameter, which takes cloud-provider-specific configuration as a string (e.g. provider=aws tag_key=... tag_value=...). Refer to the Vault documentation to learn more.
Data snapshots for recovery
Integrated Storage provides an interface to take snapshots of its data. These snapshots can be used later to restore data if it ever becomes necessary.
Take a snapshot
Return to the terminal where VAULT_ADDR is set to the vault_2 address (http://127.0.0.2:8200), and then execute the following command to take a snapshot of the data.
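For example:

```shell
vault operator raft snapshot save demo.snapshot
```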
Automated snapshots
Vault Enterprise feature
Automated snapshots require Vault Enterprise 1.6.0 or later.
Instead of taking a snapshot manually, you can schedule snapshots to be taken automatically at your desired interval. You can create multiple automatic snapshot configurations.
Create an automatic snapshot configuration named daily, which takes a snapshot every 24 hours. The snapshots are stored locally in a directory named raft-backup, and 5 snapshots are retained before the oldest is deleted to make room for the next snapshot. The local disk space available to store the snapshots is 1 GB. This means that raft-backup retains up to 5 snapshots or 1 GB of data, whichever condition is met first.
In the absence of a specific file_prefix value, the snapshot files will have a prefix of vault-snapshot.
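A sketch of the command; the parameter names follow the Vault Enterprise sys/storage/raft/snapshot-auto API, and 1073741824 bytes is 1 GB:

```shell
vault write sys/storage/raft/snapshot-auto/config/daily \
    interval=24h \
    retain=5 \
    storage_type=local \
    path_prefix=raft-backup/ \
    local_max_space=1073741824
```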
Read and verify the automatic snapshot configuration.
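For example:

```shell
vault read sys/storage/raft/snapshot-auto/config/daily
```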
Available snapshot storage types are local, aws-s3, azure-blob, and google-gcs. Depending on the target location, the configuration parameters differ.
View the path help on the sys/storage/raft/snapshot-auto/config endpoint.
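For example:

```shell
vault path-help sys/storage/raft/snapshot-auto/config
```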
Simulate loss of data
First, verify that a secret exists at kv/apikey.
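For example:

```shell
vault kv get kv/apikey
```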
Next, delete the secret at kv/apikey.
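One way to remove the secret along with its version history (so the restore below has something to recover) is to delete its metadata:

```shell
vault kv metadata delete kv/apikey
```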
Finally, verify that the data has been deleted.
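For example; the command should now report that no value is found:

```shell
vault kv get kv/apikey
```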
Restore data from a snapshot
First, recover the data by restoring the snapshot found in demo.snapshot.
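For example:

```shell
vault operator raft snapshot restore demo.snapshot
```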
Optional: You can tail the server log of the active node (vault_2).
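For example, assuming the setup script writes the server log to vault_2.log:

```shell
tail -f vault_2.log
```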
Verify that the data has been recovered.
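For example:

```shell
vault kv get kv/apikey
```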
Resign from active duty
Currently, vault_2 is the active node. Experiment to see what happens if vault_2 steps down from its active node duty.
In the terminal where VAULT_ADDR is set to http://127.0.0.2:8200, execute the step-down command.
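For example:

```shell
vault operator step-down
```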
In the terminal where VAULT_ADDR is set to http://127.0.0.3:8200, examine the raft peer set.
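For example:

```shell
vault operator raft list-peers
```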
Notice that vault_3 has been promoted to leader and vault_2 is now a follower.
Remove a cluster member
It may become important to remove nodes from the cluster for maintenance, upgrades, or to preserve compute resources.
Remove vault_4 from the cluster.
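For example, referencing the node by its node_id:

```shell
vault operator raft remove-peer vault_4
```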
Verify that vault_4 has been removed from the cluster by viewing the raft cluster peers.
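For example:

```shell
vault operator raft list-peers
```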
Add vault_4 back to the cluster
If you wish to add vault_4 back to the HA cluster, return to the terminal where VAULT_ADDR is set to the vault_4 API address (http://127.0.0.4:8200), and stop vault_4.
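A sketch of the command, under the same cluster.sh assumption:

```shell
./cluster.sh stop vault_4
```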
Delete the data directory, then create a raft-vault_4 directory again, because the raft storage destination must exist before you can start the server.
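For example:

```shell
rm -rf raft-vault_4
mkdir raft-vault_4
```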
Start the vault_4 server.
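A sketch of the command:

```shell
./cluster.sh start vault_4
```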
You can again examine the peer set to confirm that vault_4 successfully joined the cluster as a follower.
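For example:

```shell
vault operator raft list-peers
```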
Recovery mode for troubleshooting
In the case of an outage caused by corrupt entries in the storage backend, an operator might need to start Vault in recovery mode. In this mode, Vault runs with minimal capabilities and exposes a subset of its API.
Start in recovery mode
Use the setup script to stop all remaining cluster members to simulate an outage.
Stop vault_2.
Stop vault_4 if you added it back to the cluster.
Stop vault_3.
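Sketches of the commands, under the same cluster.sh assumption:

```shell
./cluster.sh stop vault_2
./cluster.sh stop vault_4
./cluster.sh stop vault_3
```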
Start vault_3 in recovery mode.
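A sketch of the command; the -recovery flag is Vault's documented recovery-mode switch, while the VAULT_API_ADDR value mirrors this scenario's vault_3 address:

```shell
VAULT_API_ADDR=http://127.0.0.3:8200 vault server -recovery -config=config-vault_3.hcl
```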
Create a recovery operational token
Open a new terminal and set the working directory to the $HOME/vault-tutorial/learn-vault-raft/raft-storage/local directory. Set the VAULT_ADDR to the vault_3 API address.
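For example:

```shell
cd $HOME/vault-tutorial/learn-vault-raft/raft-storage/local
export VAULT_ADDR=http://127.0.0.3:8200
```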
Generate a temporary one-time password (OTP).
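For example:

```shell
vault operator generate-root -generate-otp -recovery-token
```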
Start the generation of the recovery token with the OTP.
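For example, substituting the OTP returned by the previous command:

```shell
vault operator generate-root -init -otp=<otp> -recovery-token
```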
View the recovery key that was generated during the setup of vault_2.
Note
A recovery key is used instead of an unseal key because this cluster has Transit auto-unseal configured.
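For example, assuming the recovery key file name from the setup step:

```shell
cat recovery_key-vault_2
```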
Create an encoded token.
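For example:

```shell
vault operator generate-root -recovery-token
```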
Enter the recovery key when prompted.
Finally, complete the creation of a recovery token with the encoded token and OTP.
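For example, substituting the encoded token and the OTP generated earlier:

```shell
vault operator generate-root -decode=<encoded-token> -otp=<otp> -recovery-token
```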
In recovery mode, Vault launches with a minimal API enabled. In this mode, you are able to interact with the raw system backend. Use the recovery operational token to list the contents at sys/raw/sys.
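For example, substituting the recovery token created above:

```shell
VAULT_TOKEN=<recovery-token> vault list sys/raw/sys
```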
Fix the issue using the recovery token.
Resume normal operations
First, stop the Vault server running in recovery mode by pressing Ctrl+C in the terminal where vault_3 is started in recovery mode.
Start the Vault service for vault_3.
Start vault_2.
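Sketches of the commands, under the same cluster.sh assumption:

```shell
./cluster.sh start vault_3
./cluster.sh start vault_2
```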
Cluster reset
When a node is brought up in recovery mode, it resets the list of cluster members. This means that when resuming normal operations, each node will need to rejoin the cluster.
Clean up
When you are done, you can quickly stop all services, remove all configuration, and remove all modifications to your local system with the same cluster.sh script you used for the setup.
Clean up your local workstation.
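A sketch of the command:

```shell
./cluster.sh clean
```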
Help and reference
- Integrated Storage
- Recovery Mode
- Raft GitHub repository
- High Availability with Consul