Enable disaster recovery replication

Enterprise Only

Disaster Recovery Replication requires a Vault Enterprise Standard license.

Every organization needs a disaster recovery (DR) strategy to protect its Vault deployment against the catastrophic failure of an entire cluster. Vault Enterprise supports multi-datacenter deployments in which you can replicate data across datacenters for performance as well as disaster recovery.

A cluster is the basic unit of Vault Enterprise replication, which follows a leader-follower model. The leader cluster is referred to as the primary cluster and is considered the system of record. Data is streamed from the primary cluster to all secondary (follower) clusters. A primary cluster can stream data to both DR secondary clusters and performance replication secondary clusters.

In DR replication, secondary clusters do not service client read or write requests until they are promoted to become the new primary. They act as warm standby clusters.

In this tutorial, you will set up disaster recovery replication and simulate a failure of the primary cluster.

Note

The Performance Replication tutorial provides step-by-step instructions on setting up performance replication. This tutorial focuses on DR replication setup.

Prerequisites

This intermediate Vault operations tutorial assumes that you have some working knowledge of Vault.

You need two Vault Enterprise clusters: one behaves as the primary cluster, and another becomes the secondary.

Note

This procedure requires both Vault clusters to run the same version of Vault.

Once the original DR primary cluster is demoted, you cannot replicate to it from a promoted cluster running a higher version of Vault.

For example, if Cluster A (the DR primary) runs Vault 1.11.x and Cluster B (a new DR secondary) runs Vault 1.15.x, you can promote Cluster B, but you cannot replicate to Cluster A until Cluster A is upgraded to 1.15.x or above.

This limitation exists because Vault does not make backward-compatibility guarantees for its data store.
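Before planning a failback, you can check this rule in a script. The following Python sketch is illustrative only: the helpers `major_minor` and `can_rejoin_as_secondary` are hypothetical, and the check compares only major and minor versions, following the 1.15.x example above. Running identical versions on both clusters, as the Note recommends, remains the safest choice.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from strings such as '1.15.2', 'v1.15.2', or '1.15.2+ent'."""
    core = version.strip().lstrip("v").split("+", 1)[0]  # drop build metadata like "+ent"
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def can_rejoin_as_secondary(demoted_version: str, new_primary_version: str) -> bool:
    """Hypothetical check: a demoted cluster can only replicate from a promoted
    cluster whose major.minor version is not higher than its own."""
    return major_minor(demoted_version) >= major_minor(new_primary_version)


# Cluster A (demoted) on 1.11.x and Cluster B (promoted) on 1.15.x, as in the example above.
print(can_rejoin_as_secondary("1.11.4", "1.15.0"))  # False: upgrade Cluster A first
print(can_rejoin_as_secondary("1.15.2", "1.15.0"))  # True
```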

Policy requirements

Setting up Vault Enterprise replication requires highly privileged policies such as root, and some of the API endpoints require the sudo capability. If you are not using the root token, the following policies are required to perform the operations described in this tutorial.

Note

If you are not familiar with policies, complete the policies tutorial.

Minimum policy requirements

```
# To enable DR primary
path "sys/replication/dr/primary/enable" {
  capabilities = ["create", "update"]
}

# To generate a secondary token required to add a DR secondary
path "sys/replication/dr/primary/secondary-token" {
  capabilities = ["create", "update", "sudo"]
}

# To create ACL policies
path "sys/policies/acl/*" {
  capabilities = ["create", "update", "list"]
}

# Create a token role for batch DR operation token
path "auth/token/roles/*" {
  capabilities = ["create", "update"]
}

# Create a token
path "auth/token/create" {
  capabilities = ["create", "update"]
}

# To demote the primary to secondary
path "sys/replication/dr/primary/demote" {
  capabilities = ["create", "update"]
}

# To enable DR secondary
path "sys/replication/dr/secondary/enable" {
  capabilities = ["create", "update"]
}

# To generate an operation token
path "sys/replication/dr/secondary/generate-operation-token/*" {
  capabilities = ["create", "update"]
}

# To promote the secondary cluster to be primary
path "sys/replication/dr/secondary/promote" {
  capabilities = ["create", "update"]
}

# To update the assigned primary cluster
path "sys/replication/dr/secondary/update-primary" {
  capabilities = ["create", "update"]
}

# If you choose to disable the original primary cluster post-recovery
path "sys/replication/dr/primary/disable" {
  capabilities = ["create", "update"]
}
```

Workflow

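The basic steps to configure DR replication are to enable DR replication on the primary cluster (Cluster A), generate a secondary activation token on Cluster A, and then enable DR replication on the secondary cluster (Cluster B) with that token.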

Cluster failure

When a catastrophic failure makes the primary cluster (Cluster A) inoperable (it cannot serve requests through the API, CLI, or UI), promote the DR secondary (Cluster B) to become the new primary.

Promote DR secondary to primary

Update Vault clients

Do not forget to update the Vault clients to point to the new primary (Cluster B) so that they can resume normal operations.

Multiple secondaries:

If you have more than one DR secondary cluster, you need to update the remaining secondary clusters to point to the new primary.

Update the assigned primary

Post-recovery of the original DR primary:

If the original primary cluster (Cluster A) becomes operational again after you successfully promoted a DR secondary cluster (Cluster B) to be the new primary, perform one of the following options: Option 1, demote Cluster A to a DR secondary; or Option 2, disable DR replication on Cluster A.

After you fail over to Cluster B, all traffic is routed to Cluster B. If you chose Option 1 and your goal is to make Cluster A the primary again, you can reverse the steps to restore the original topology.

DR failback

Avoid split-brain situation

Keep in mind that only one cluster should behave as the primary. If the cluster failure is temporary and the DR primary (Cluster A) becomes operational shortly after you promoted the DR secondary (Cluster B), it could result in a split-brain situation.

To avoid having two primaries, perform Option 1 or Option 2 as soon as Cluster A is operational again and able to accept requests.

If you need to promote a DR secondary while the DR primary is still operational, you should demote the DR primary before promoting a DR secondary.

The workflow is: demote the DR primary (Cluster A), and then promote the DR secondary (Cluster B).

Make sure the time window between those operations is as small as possible.
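A planned failover therefore comes down to two calls issued back to back in a fixed order. The following Python sketch only builds that ordered plan as data and sends nothing. The `Step` type and `planned_failover` function are hypothetical, the hostnames are the example addresses used in this tutorial, and the DR operation token is a placeholder; the two paths are the demote and promote endpoints used later in this tutorial.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Step:
    cluster: str                      # base address of the cluster that receives the call
    path: str                         # API path, relative to /v1/
    body: Dict[str, str] = field(default_factory=dict)


def planned_failover(primary: str, secondary: str, dr_operation_token: str) -> List[Step]:
    """Return the calls for a planned failover, in the order they must run."""
    return [
        # 1. Demote the current DR primary first, so two primaries never coexist.
        Step(primary, "sys/replication/dr/primary/demote"),
        # 2. Promote the DR secondary immediately afterwards.
        Step(secondary, "sys/replication/dr/secondary/promote",
             {"dr_operation_token": dr_operation_token}),
    ]


for step in planned_failover("https://cluster-A.example.com:8200",
                             "https://cluster-B.example.com:8200",
                             "<DR_OPERATION_TOKEN>"):
    print("POST", f"{step.cluster}/v1/{step.path}", step.body)
```

Each step corresponds to one of the vault write commands shown in this tutorial; keeping them in a single scripted plan helps keep the gap between them short.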

Enable DR primary replication

Enable DR replication on the primary cluster (Cluster A).

```
$ vault write -f sys/replication/dr/primary/enable
WARNING! The following warnings were returned from Vault:

* This cluster is being enabled as a primary for replication. Vault will be
unavailable for a brief period and will resume service shortly.
```

Generate a secondary token.

```
$ vault write sys/replication/dr/primary/secondary-token id="dr-secondary"
```

The output should look similar to:

```
Key                              Value
---                              -----
wrapping_token:                  eyJhbGciOiJFUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3NvciI6IiIsImFkZHIiOiJodHRwOi8vMTI3LjAuMC4xOjgyMDAiLCJleHAiOjE2NTczMDY2NTgsImlhdCI6MTY1NzMwNDg1OCwianRpIjoiaHZzLnhCVThnZWtpYTBvRnExUVQ3ckpQUUVxcCIsIm5iZiI6MTY1NzMwNDg1MywidHlwZSI6IndyYXBwaW5nIn0.AEC_LzJST00bukWRNAaQvejLeZqeHCcwKZL0izjjywgMOm6d0qGCw9PuMT88b649HaYxPqfc6zL4rZTHIKExQLLhAHZdY_BjEPj-0CzGMuXVWApwPao8uOVaBV2ZCcestYc151xtoTZ63m8Jj8NrZafudWhhAK1oCSx6Omk9J3yrFMsT
wrapping_accessor:               odeEh1YLertHAmr2toQ6zY5O
wrapping_token_ttl:              30m
wrapping_token_creation_time:    2022-07-08 11:27:38.660697 -0700 PDT
wrapping_token_creation_path:    sys/replication/dr/primary/secondary-token
```

Copy the generated wrapping_token which you will need to enable the DR secondary cluster.
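The wrapping token in the sample output above is a JWT, so you can inspect it before pasting it into Cluster B. Decoding the sample token shows an addr claim (the primary's API address) and exp and iat timestamps exactly 1800 seconds apart, which matches the 30m wrapping_token_ttl. The following Python sketch decodes only the payload and does not verify the signature. The helper names are hypothetical, and the claim names come from the sample token rather than from a documented contract.

```python
import base64
import json
import time
from typing import Any, Dict, Optional


def token_claims(wrapping_token: str) -> Dict[str, Any]:
    """Decode the JWT payload (second dot-separated segment). No signature check."""
    payload = wrapping_token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # JWTs strip base64 padding; restore it
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(claims: Dict[str, Any], now: Optional[float] = None) -> int:
    """Seconds left before the activation token expires (negative once expired)."""
    current = time.time() if now is None else now
    return int(claims["exp"] - current)
```

For the sample token above, `token_claims(token)["addr"]` is `http://127.0.0.1:8200`; a negative `seconds_until_expiry` means you need to generate a new secondary token.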

Enable DR replication on the primary cluster (Cluster A) by invoking the /sys/replication/dr/primary/enable endpoint.

Example:

```
$ curl --request POST --header "X-Vault-Token: $VAULT_TOKEN" \
   --data '{}' https://cluster-A.example.com:8200/v1/sys/replication/dr/primary/enable
{
  "request_id": "ef38af20-9c1f-138a-2d03-bbb6410fb0fc",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": null,
  "wrap_info": null,
  "warnings": [
    "This cluster is being enabled as a primary for replication. Vault will be unavailable for a brief period and will resume service shortly."
  ],
  "auth": null
}
```

Generate a secondary token by invoking the /sys/replication/dr/primary/secondary-token endpoint.

Example:

```
$ curl --request POST --header "X-Vault-Token: $VAULT_TOKEN" \
   --data '{"id": "dr-secondary"}' \
   https://cluster-A.example.com:8200/v1/sys/replication/dr/primary/secondary-token | jq
{
  "request_id": "",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": null,
  "wrap_info": {
    "token": "eyJhbGciOiJFUzUxMiIsInR...",
    "accessor": "7e56e9da-178c-119d-1d01-807a203fa0b3",
    "ttl": 1800,
    "creation_time": "2018-06-18T17:22:07.129747708Z",
    "creation_path": "sys/replication/dr/primary/secondary-token"
  },
  "warnings": null,
  "auth": null
}
```

Copy the generated token (the wrap_info.token value); you will need it to enable the DR secondary cluster.

Open a web browser, launch the Vault UI (for example, https://cluster-A.example.com:8200/ui), and then log in.

Select the arrow next to Status and click Enable under REPLICATION.
Select the Disaster Recovery (DR) radio button.
Click Enable replication.
In the Known Secondaries section, click Add secondary.
Populate the Secondary ID field, and click Generate token.
Click Copy & Close to copy the token which you will need to enable the DR secondary cluster.

Enable DR secondary replication

The following operations must be performed on the DR secondary cluster (Cluster B).

Enable DR replication on the secondary cluster.

```
$ vault write sys/replication/dr/secondary/enable token="eyJhbGciOiJFUzUxMiIsInR5cCI6Ik..."
```

Here, token is the wrapping_token obtained from the primary cluster.

Expected output:

```
WARNING! The following warnings were returned from Vault:

* Vault has successfully found secondary information; it may take a while to
perform setup tasks. Vault will be unavailable until these tasks and initial
sync complete.
```

Warning

This immediately clears all data in the secondary cluster.

Create an API request payload containing the token obtained from the primary cluster.

```
$ tee payload.json <<EOF
{
  "token": "eyJhbGciOiJFUzUxMiIsInR5cCI6Ik..."
}
EOF
```

Enable DR replication on the secondary cluster.

```
$ curl --request POST --header "X-Vault-Token: $VAULT_TOKEN" \
   --data @payload.json \
   https://cluster-B.example.com:8200/v1/sys/replication/dr/secondary/enable | jq
```

Example output:

```
{
  "request_id": "7a9730c1-b6fc-6557-5c0a-081e1f89ed2d",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": null,
  "wrap_info": null,
  "warnings": [
    "Vault has successfully found secondary information; it may take a while to perform setup tasks. Vault will be unavailable until these tasks and initial sync complete."
  ],
  "auth": null
}
```

Warning

This immediately clears all data in the secondary cluster.
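If you script this step instead of using curl, the same request can be built with Python's standard library. In this minimal sketch, the `build_vault_post` helper is hypothetical and the token values are placeholders. The demo only constructs the request, so nothing is sent and Cluster B's data is not wiped; pass the returned object to `urllib.request.urlopen()` only when you intend to run it.

```python
import json
import urllib.request
from typing import Any, Dict


def build_vault_post(addr: str, vault_token: str, path: str,
                     payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST to Vault's HTTP API, mirroring the curl call above."""
    return urllib.request.Request(
        url=f"{addr.rstrip('/')}/v1/{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Vault-Token": vault_token, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; the activation token is the wrapping token copied from Cluster A.
req = build_vault_post(
    "https://cluster-B.example.com:8200",
    "<VAULT_TOKEN>",
    "sys/replication/dr/secondary/enable",
    {"token": "<SECONDARY_ACTIVATION_TOKEN>"},
)
print(req.get_method(), req.full_url)
```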

Now, launch the Vault UI for the secondary cluster (e.g. https://cluster-B.example.com:8200/ui).
Select the arrow next to Status and click Enable under REPLICATION.
Check the Disaster Recovery (DR) radio button and select secondary under the Cluster mode. Paste the token you copied from the primary in the Secondary activation token field.
Click Enable replication. (Warning: This immediately clears all data in the secondary cluster.)
Click the Details tab to see replication details.

DR replication setup is now complete, and no further action is required.

Recommendation

You have successfully configured DR replication.

The remainder of this tutorial guides you through common scenarios when managing DR replication.

Refer to the Monitoring Vault Replication tutorial to learn about the replication health check.
Read the DR operation token strategy section so that you have an operation token ready in case of an unexpected loss of the primary cluster.

DR operation token strategy

To promote a DR secondary cluster (Cluster B) to be the new primary, you need a DR operation token. However, generating a DR operation token requires a threshold of unseal keys (or recovery keys if auto-unseal is enabled). This can be troublesome: a cluster failure is usually caused by an unexpected incident, and you may not be able to coordinate the key holders quickly enough to generate the token, while an immediate failover to the healthy cluster is crucial to your business continuity.

Starting with Vault 1.4, you can create a batch DR operation token on the DR primary cluster and later use it to promote the DR secondary cluster. Creating this token in advance is a strategic step that the Vault administrator can take to prepare for an unexpected loss of the DR primary.

A DR operation token generated with the unseal keys or recovery keys does not have a TTL, so delete it when it is no longer needed by using the /sys/replication/dr/secondary/operation-token/delete endpoint.

Vault version

The following steps require Vault 1.4 or later. If you are running an earlier version of Vault, follow the DR operation token generation steps in the Promote DR Secondary to Primary section.

On the DR primary cluster (Cluster A), create a policy named "dr-secondary-promotion" that allows the update operation on the sys/replication/dr/secondary/promote path. In addition, you can grant the update capability on the sys/replication/dr/secondary/update-primary path so that the same DR operation token can also update the primary cluster that the secondary cluster points to.
```
$ vault policy write dr-secondary-promotion - <<EOF
path "sys/replication/dr/secondary/promote" {
  capabilities = [ "update" ]
}

# To update the primary to connect
path "sys/replication/dr/secondary/update-primary" {
    capabilities = [ "update" ]
}

# Only if using integrated storage (raft) as the storage backend
# To read the current autopilot status
path "sys/storage/raft/autopilot/state" {
    capabilities = [ "update" , "read" ]
}
EOF
```
Note
The policy on the sys/storage/raft/autopilot/state path is only required if your cluster is configured with Integrated Storage as its persistence layer. Refer to the Integrated Storage Autopilot tutorial to learn more about Autopilot.

Verify that the policy was created.

```
$ vault policy list

default
dr-secondary-promotion
root
```

Create a token role named "failover-handler" with the dr-secondary-promotion policy attached and the token type set to batch. Batch tokens cannot be renewed, so set the renewable parameter to false. Also, set the orphan parameter to true.
```
$ vault write auth/token/roles/failover-handler \
    allowed_policies=dr-secondary-promotion \
    orphan=true \
    renewable=false \
    token_type=batch
```

Create a token for the "failover-handler" role with a time-to-live (TTL) of 8 hours.

```
$ vault token create -role=failover-handler -ttl=8h

WARNING! The following warnings were returned from Vault:

  * Endpoint ignored these unrecognized parameters: [display_name entity_alias
  explicit_max_ttl num_uses period renewable ttl type]

Key                  Value
---                  -----
token                hvb.AAAAAQIL13E0DXo9sq-1vaLJD1_69nfgb-qojEets9EkDTvWs_L_lXVFplXpIpXivr1qlrppxA0tqM4ckYG4dzISLMWqwpqLSQBgfllkd7N-xzsS0Bg7lVO7sb7B2FcWogM64SJshk9VDQajRskb3jZutWW_FLrUhWl5lKfL3j9CxIe5I73NRoYLbtLDEVsGOwQIYgTdNg
token_accessor       n/a
token_duration       8h
token_renewable      false
token_policies       ["default" "dr-secondary-promotion"]
identity_policies    []
policies             ["default" "dr-secondary-promotion"]
```

On the DR primary cluster (Cluster A), create a policy named "dr-secondary-promotion" allowing the update operation against the sys/replication/dr/secondary/promote path.

First, create a JSON API payload that contains the policy as an HCL string.

```
$ tee policy-payload.json <<EOF
{
 "policy": "path \"sys/replication/dr/secondary/promote\" {\n  capabilities = [ \"update\" ]\n}\n\n# To update the primary to connect\npath \"sys/replication/dr/secondary/update-primary\" {\n    capabilities = [ \"update\" ]\n}\n\n# Only if using integrated storage (raft) as the storage backend\n# To read the current autopilot status\npath \"sys/storage/raft/autopilot/state\" {\n    capabilities = [ \"update\" , \"read\" ]\n}\n"
}
EOF
```

Note

The policy on the sys/storage/raft/autopilot/state path is only required if your cluster is configured with Integrated Storage as its persistence layer. Refer to the Integrated Storage Autopilot tutorial to learn more about Autopilot.
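Escaping HCL by hand inside JSON is error-prone. As an alternative, a short Python sketch can produce the same policy-payload.json content with `json.dumps`, which escapes the quotes and newlines for you. The `POLICY_HCL` name is illustrative; redirect the script's output to policy-payload.json (for example, `python3 make_payload.py > policy-payload.json`, where the script name is hypothetical).

```python
import json

# Same policy as in the CLI tab; json.dumps handles quoting and newline escaping.
POLICY_HCL = '''path "sys/replication/dr/secondary/promote" {
  capabilities = [ "update" ]
}

# To update the primary to connect
path "sys/replication/dr/secondary/update-primary" {
    capabilities = [ "update" ]
}

# Only if using integrated storage (raft) as the storage backend
# To read the current autopilot status
path "sys/storage/raft/autopilot/state" {
    capabilities = [ "update" , "read" ]
}
'''

payload = json.dumps({"policy": POLICY_HCL}, indent=1)
print(payload)
```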

Create a new policy named dr-secondary-promotion.

```
$ curl --request PUT --header "X-Vault-Token: $VAULT_TOKEN" -d @policy-payload.json http://127.0.0.1:8200/v1/sys/policies/acl/dr-secondary-promotion
```

Create a token role named "failover-handler" with the dr-secondary-promotion policy attached and the token type set to batch. Batch tokens cannot be renewed, so set the renewable parameter to false. Also, set the orphan parameter to true.
First, create a JSON API payload that defines the role.
```
$ tee role-payload.json <<EOF
{
 "allowed_policies": [ "dr-secondary-promotion" ],
 "orphan": true,
 "renewable": false,
 "token_type": "batch"
}
EOF
```
Make the API request using the role-payload.json payload created above.
```
$ curl --request PUT --header "X-Vault-Token: $VAULT_TOKEN" -d @role-payload.json http://127.0.0.1:8200/v1/auth/token/roles/failover-handler
```

Now, create a token for the "failover-handler" role.

```
$ curl --request POST --header "X-Vault-Token: $VAULT_TOKEN" http://127.0.0.1:8200/v1/auth/token/create/failover-handler | jq ".auth"
{
 "client_token": "b.AAAAAQIJ3AB4lZSSkNI20...",
 "accessor": "",
 "policies": [
   "default",
   "dr-secondary-promotion"
 ],
 "token_policies": [
   "default",
   "dr-secondary-promotion"
 ],
 "metadata": null,
 "lease_duration": 2764800,
 "renewable": false,
 "entity_id": "",
 "token_type": "batch",
 "orphan": true
}
```

Launch the Vault UI for the primary cluster (e.g. https://cluster-A.example.com:8200/ui) and sign in.
Click the Policies tab, and then select Create ACL policy.

Enter dr-secondary-promotion in the Name text field, and then enter the following policy in the Policy text field.

```
path "sys/replication/dr/secondary/promote" {
  capabilities = [ "update" ]
}

# To update the primary to connect
path "sys/replication/dr/secondary/update-primary" {
    capabilities = [ "update" ]
}

# Only if using integrated storage (raft) as the storage backend
# To read the current autopilot status
path "sys/storage/raft/autopilot/state" {
    capabilities = [ "update" , "read" ]
}
```

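Note

The policy on the sys/storage/raft/autopilot/state path is only required if your cluster is configured with Integrated Storage as its persistence layer. Refer to the Integrated Storage Autopilot tutorial to learn more about Autopilot.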

Click Create Policy to complete.
Click the Vault CLI shell icon (>_) to open a command shell.

Execute the following command to create a token role named "failover-handler" with the dr-secondary-promotion policy attached.

```
vault write auth/token/roles/failover-handler \
    allowed_policies=dr-secondary-promotion \
    orphan=true \
    renewable=false \
    token_type=batch
```

Now, create a token for the "failover-handler" role by executing the following command in the Vault CLI shell.
```
> vault write -force auth/token/create/failover-handler
```

Securely store this batch token. If the DR secondary cluster needs to be promoted, you can use this batch token to perform the necessary operation. The batch token works on both the primary and secondary clusters, even though it was generated by the primary cluster.

This eliminates the need for the unseal keys (or recovery keys if an auto-unseal is enabled).

Note

Batch tokens have a fixed TTL, and the Vault server automatically deletes them after they expire. You can take advantage of this: when a Vault operator comes on shift, the operator generates a batch DR operation token with a TTL equal to the duration of the shift.
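As a minimal sketch of this shift-based approach, the following Python computes the time remaining in a shift and prints the matching token command. The shift times are hypothetical inputs and the helper names are illustrative; the TTL is rendered as a Go-style duration string (such as 8h or 7h30m), which the -ttl flag accepts. For reference, the lease_duration of 2764800 seconds in the earlier API output, where no TTL was requested, works out to 768h (32 days).

```python
from datetime import datetime, timezone


def to_vault_duration(seconds: float) -> str:
    """Render whole seconds as a duration string such as '8h' or '7h30m'."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    parts = [f"{value}{unit}" for value, unit in ((hours, "h"), (minutes, "m"), (secs, "s")) if value]
    return "".join(parts) or "0s"


def shift_token_command(now: datetime, shift_end: datetime) -> str:
    """Build the token-create command whose TTL ends when the shift ends."""
    remaining = (shift_end - now).total_seconds()
    if remaining <= 0:
        raise ValueError("the shift has already ended")
    return f"vault token create -role=failover-handler -ttl={to_vault_duration(remaining)}"


shift_start = datetime(2024, 1, 8, 9, 0, tzinfo=timezone.utc)  # hypothetical shift
shift_end = datetime(2024, 1, 8, 17, 0, tzinfo=timezone.utc)
print(shift_token_command(shift_start, shift_end))  # vault token create -role=failover-handler -ttl=8h
print(to_vault_duration(2764800))                   # 768h
```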

Promote DR secondary to primary

This step walks you through the promotion of the secondary cluster (Cluster B) to become the new primary when a catastrophic failure causes the primary cluster (Cluster A) to become inoperable.

Read the Important Notes section for more information on seals, leader changes, and automated DR failover.

Note

If you don't have an environment to test this feature, the Disaster Recovery Replication Failover and Failback tutorial demonstrates the cluster failover using Docker containers.

Generate a DR operation token

You need a DR operation token to perform this task. If you do not have a batch DR operation token, you must generate a DR operation token before you can promote Cluster B. The process below is similar to Generating a Root Token (via CLI), where a threshold of unseal keys is required (or recovery keys if auto-unseal is enabled). These are the unseal keys or recovery keys generated when you initialized the primary cluster (Cluster A).

Note

If you have a batch DR operation token, skip to Promote a DR secondary cluster and use the batch DR operation token.

Perform this operation on the DR secondary cluster (Cluster B).

Start the DR operation token generation process.

```
$ vault operator generate-root -dr-token -init
```

The output looks similar to:

```
A One-Time-Password has been generated for you and is shown in the OTP field.
You will need this value to decode the resulting root token, so keep it safe.
Nonce         b4738404-0a11-63aa-2cb6-e77dfd96946f
Started       true
Progress      0/3
Complete      false
OTP           EYHAkPQYvvz93e8iI3pg1maQ
OTP Length    24
```

Distribute the generated nonce to each unseal key holder.

In order to generate a DR operation token, the following operation must be executed by each unseal key holder.

Example:

```
$ vault operator generate-root -dr-token \
    -nonce=b4738404-0a11-63aa-2cb6-e77dfd96946f \
    UNSEAL_KEY_OF_ORIGINAL_DR_PRIMARY_1

Nonce            b4738404-0a11-63aa-2cb6-e77dfd96946f
Started          true
Progress         1/3
Complete         false
```

Once the threshold has been reached, the output contains the encoded DR operation token.

Example:

```
$ vault operator generate-root -dr-token \
    -nonce=b4738404-0a11-63aa-2cb6-e77dfd96946f \
    UNSEAL_KEY_OF_ORIGINAL_DR_PRIMARY_3

Nonce            b4738404-0a11-63aa-2cb6-e77dfd96946f
Started          true
Progress         3/3
Complete         true
Encoded Token    djw4BR1iaDUFIBxaAwpiCC1YGhQHHDMf
```

Decode the generated DR operation token (Encoded Token).

Example:

```
$ vault operator generate-root -dr-token \
     -decode="djw4BR1iaDUFIBxaAwpiCC1YGhQHHDMf" \
     -otp="EYHAkPQYvvz93e8iI3pg1maQ"

hvs.5xsAyncmt1OPEHhMFPMKcYAG
```

In the Manage tab of your DR secondary (Cluster B), click on Generate token.
In the resulting modal, notice the option to encrypt your token with a PGP key. For this tutorial, select Generate operation token.
A quorum of unseal keys must be entered to create a new operation token for the DR secondary.
This operation must be performed by each unseal-key holder.
Once the threshold has been reached, the output displays the encoded DR operation token. If the OTP is still available, it is displayed; if it is not, retrieve the value from the earlier step. Click the Copy icon to copy the DR Operation Token Command.

Execute the copied CLI command from a terminal to generate a DR operation token. Make sure to include the OTP from the earlier step.

Example:

```
$ vault operator generate-root -dr-token \
     -otp="I4BbXfN0F2biXY53bXx4bKPwU0" \
     -decode="OhobGjUifglzc1oPEwtyfSUWEUAHAT4yPHU"

hvs.5Jw2qwxzwnYrgswSoNLoYqQC
```

Promote a DR secondary cluster

Use the generated DR operation service token or the batch token to perform this step.

Promote the DR secondary (Cluster B) to become the new primary. The request must pass the DR operation token using the sys/replication/dr/secondary/promote endpoint.

```
$ vault write sys/replication/dr/secondary/promote dr_operation_token=<DR_OPERATION_TOKEN>
```

Example:

```
$ vault write sys/replication/dr/secondary/promote \
     dr_operation_token=hvs.5xsAyncmt1OPEHhMFPMKcYAG

WARNING! The following warnings were returned from Vault:

* This cluster is being promoted to a replication primary. Vault will be
unavailable for a brief period and will resume service shortly.
```

Do not forget to update all Vault clients to point to the new primary (Cluster B) to send requests to resume operations. If your DR replication group has more than one DR secondary, you need to update the remaining DR secondary clusters to point to the new primary (Cluster B).
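One way to confirm where clients should now send requests is to compare each cluster's health status. The following Python sketch is illustrative: it assumes you have already fetched and parsed the JSON body of GET /v1/sys/health from each cluster, and that the active DR primary reports replication_dr_mode as primary, is unsealed, and is not a standby. The `pick_dr_primary` helper and the trimmed responses are hypothetical. Refusing to choose when two clusters both claim to be primary surfaces the split-brain situation described earlier.

```python
from typing import Any, Dict, List


def pick_dr_primary(health_by_addr: Dict[str, Dict[str, Any]]) -> str:
    """Return the single address whose /v1/sys/health body looks like an active DR primary."""
    primaries: List[str] = [
        addr for addr, health in health_by_addr.items()
        if health.get("replication_dr_mode") == "primary"
        and health.get("sealed") is False
        and health.get("standby") is False
    ]
    if len(primaries) != 1:
        # Zero primaries: failover is incomplete. Two or more: possible split brain.
        raise RuntimeError(f"expected exactly one active DR primary, found {primaries}")
    return primaries[0]


# Hypothetical, trimmed health responses after promoting Cluster B.
health = {
    "https://cluster-A.example.com:8200": {"replication_dr_mode": "secondary", "sealed": False, "standby": False},
    "https://cluster-B.example.com:8200": {"replication_dr_mode": "primary", "sealed": False, "standby": False},
}
print(pick_dr_primary(health))
```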

Authentication

Once Cluster B is successfully promoted, you should be able to log in using the configured authentication methods to operate Cluster B. If desired, generate a new root token.

Update the assigned primary

If you have more than one DR secondary cluster, you need to update the remaining DR secondaries so that they point to the new primary cluster.

Requirement

This task also requires a DR operation token. Similar to the DR secondary promotion operation, use the batch DR operation token or generate a DR operation service token on the secondary cluster.

On the new primary cluster (Cluster B), generate a secondary activation token similar to what you have done in Enable DR Primary Replication.

```
$ vault write sys/replication/dr/primary/secondary-token id=dr-secondary
```

Example output:

```
Key                              Value
---                              -----
wrapping_token:                  eyJhbGciOiJFUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3NvciI6IiIsImFkZHIiOiJodHRwOi8vMTI3LjAuMC4xOjgzMDAiLCJleHAiOjE2NTczMjgxODAsImlhdCI6MTY1NzMyNjM4MCwianRpIjoiaHZzLk5lRFNBdzVyNG1BR3VlWjJDemN6VUFSdCIsIm5iZiI6MTY1NzMyNjM3NSwidHlwZSI6IndyYXBwaW5nIn0.AUDkvXgLusBk9wqrMz6BA6W8B1OJ8-TUDXSrBd0JsbZEy7tprKrP3lREB0S2vqDyeFJFhVUA_Uv2SlbRWR-Z6oIDALWyNwe8GRE-bZKuheTNPO2s6_cKgB7EwFp5C9bUYmj7ru2B4oyYE3RVk_DbYG7KhD-2k1fq-g68333OUovTccuM
wrapping_accessor:               uZ2ODJ0zBXZDMgXs34a1AXT1
wrapping_token_ttl:              30m
wrapping_token_creation_time:    2022-07-08 17:26:20.651075 -0700 PDT
wrapping_token_creation_path:    sys/replication/dr/primary/secondary-token
```

Copy the generated wrapping_token value.

On the DR secondary cluster (Cluster E) you wish to update, invoke the sys/replication/dr/secondary/update-primary endpoint where <SECONDARY_ACTIVATION_TOKEN> is the wrapping_token you copied from Cluster B.

```
$ vault write sys/replication/dr/secondary/update-primary \
    dr_operation_token=<DR_OPERATION_TOKEN> \
    token=<SECONDARY_ACTIVATION_TOKEN>
```

Example:

```
$ vault write sys/replication/dr/secondary/update-primary \
    dr_operation_token="hvb.AAAAAQIL13E0DXo9sq-1vaLJD1_69nfgb-qojEets9EkDTvWs_L_lXVFplXpIpXivr1qlrppxA0tqM4ckYG4dzISLMWqwpqLSQBgfllkd7N-xzsS0Bg7lVO7sb7B2FcWogM64SJshk9VDQajRskb3jZutWW_FLrUhWl5lKfL3j9CxIe5I73NRoYLbtLDEVsGOwQIYgTdNg" \
    token="eyJhbGciOiJFUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3NvciI6IiIsImFkZHIiOiJodHRwOi8vMTI3LjAuMC4xOjgzMDAiLCJleHAiOjE2NTczMjgxODAsImlhdCI6MTY1NzMyNjM4MCwianRpIjoiaHZzLk5lRFNBdzVyNG1BR3VlWjJDemN6VUFSdCIsIm5iZiI6MTY1NzMyNjM3NSwidHlwZSI6IndyYXBwaW5nIn0.AUDkvXgLusBk9wqrMz6BA6W8B1OJ8-TUDXSrBd0JsbZEy7tprKrP3lREB0S2vqDyeFJFhVUA_Uv2SlbRWR-Z6oIDALWyNwe8GRE-bZKuheTNPO2s6_cKgB7EwFp5C9bUYmj7ru2B4oyYE3RVk_DbYG7KhD-2k1fq-g68333OUovTccuM"

WARNING! The following warnings were returned from Vault:

  * Vault has successfully found secondary information; it may take a while to
  perform setup tasks. Vault will be unavailable until these tasks and initial
  sync complete.
```

From the new primary cluster (Cluster B), select the Secondaries tab, and then click Add secondary. Populate the Secondary ID field, and click Generate token.
Click Copy to copy the token which you will need to enable the DR secondary cluster.
For the DR secondary cluster (Cluster E) you wish to update, click the Update button from the Disaster Recovery manage page.
Enter the DR operation token in the DR operation token field, and paste the secondary activation token you copied from Cluster B.
Click Update. This updates the primary information.

Option 1 - Demote DR primary to secondary

If the original DR primary cluster (Cluster A) becomes operational again after Cluster B was promoted, you can demote Cluster A to become a secondary.

Remember that there is only one primary cluster in DR replication. At this point, Cluster A's data is outdated due to its outage. Demoting it to a DR secondary lets it replicate current data from the new DR primary cluster (Cluster B).

Because Cluster A still considers itself a DR primary, you should be able to log in to it with a root token. Execute the following command to demote Cluster A to a secondary.
```
$ vault write -f sys/replication/dr/primary/demote
```
Cluster A does not attempt to connect to a primary, but it maintains the knowledge of its cluster ID and can be reconnected to the same DR replication set without wiping local storage. Perform the following steps to complete the update-primary operation.
On the new primary cluster (Cluster B), generate a secondary activation token similar to what you have done in Enable DR Primary Replication.
```
$ vault write sys/replication/dr/primary/secondary-token id=new-secondary
```
Copy the generated wrapping_token which you will need when you invoke the sys/replication/dr/secondary/update-primary endpoint later.

On Cluster A, generate the DR operation token similar to Promote DR Secondary to Primary.

Example:

```
$ vault operator generate-root -dr-token -init

A One-Time-Password has been generated for you and is shown in the OTP field.
You will need this value to decode the resulting root token, so keep it safe.
Nonce 829b8057-a486-cd02-6ce0-0a2c5d5ab0ce
Started true
...
```

Distribute the generated nonce to each unseal key holder so that they can execute the generate-root command with their unseal key.

```
$ vault operator generate-root -dr-token \
       -nonce=829b8057-a486-cd02-6ce0-0a2c5d5ab0ce \
       UNSEAL_KEY_OF_ORIGINAL_DR_PRIMARY_HERE
```

Once the threshold has been reached, the output contains the encoded DR operation token which you need to decode first.

```
$ vault operator generate-root -dr-token \
       -decode=JGsAeTApUAQsIGJTAxAIYgobcRo9TCY3IwA \
       -otp=WEaATFbgIi01meg1AUGNNySFle

s.a8do2ceIRbnuoSKN6Ts5uqOe
```
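For the curious, this decode step can be reproduced for the example values above. The following Python sketch assumes the encoded token is the unpadded standard base64 encoding of the token XORed byte by byte with the OTP; under that assumption it recovers s.a8do2ceIRbnuoSKN6Ts5uqOe from this example. `decode_operation_token` is a hypothetical helper that may not cover other token formats, so treat `vault operator generate-root -decode` as the authoritative decoder.

```python
import base64


def decode_operation_token(encoded: str, otp: str) -> str:
    """XOR the base64-decoded token with the OTP (sketch of the CLI's -decode step)."""
    raw = base64.b64decode(encoded + "=" * (-len(encoded) % 4))  # restore stripped padding
    key = otp.encode("ascii")
    if len(raw) != len(key):
        raise ValueError("encoded token and OTP must have the same length")
    return bytes(a ^ b for a, b in zip(raw, key)).decode("ascii")


print(decode_operation_token("JGsAeTApUAQsIGJTAxAIYgobcRo9TCY3IwA",
                             "WEaATFbgIi01meg1AUGNNySFle"))
# s.a8do2ceIRbnuoSKN6Ts5uqOe
```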

Finally, invoke the sys/replication/dr/secondary/update-primary endpoint.

```
$ vault write sys/replication/dr/secondary/update-primary \
       dr_operation_token=s.a8do2ceIRbnuoSKN6Ts5uqOe \
       token="eyJhbGciOiJFUzUxMiIsImt..."
```

The token value is the wrapping_token you copied from Cluster B.

From the Cluster A web UI, select the arrow next to Status and click Disaster Recovery Primary under REPLICATION.
Select the Manage tab.
Click Demote cluster.
The Vault UI displays a modal describing the outcomes of demotion. To proceed, type Disaster Recovery in the confirmation box, then click Confirm.
Cluster A does not attempt to connect to a primary, but it maintains the knowledge of its cluster ID and can be reconnected to the same DR replication set without wiping local storage. Continue to perform the update primary operation.
Back in the Manage tab, in the Generate operation token box, click on Generate Token.
In the resulting modal, notice the option to encrypt your token with a PGP key. For this tutorial, choose Generate operation token.
Click the Copy icon and save the generated One Time Password (OTP). You will need it later to decode the Operation Token.
A quorum of unseal keys must be entered to create a new operation token for the DR secondary.
This operation must be performed by each unseal-key holder.
Once the threshold has been reached, the output displays the encoded DR operation token. If the OTP is still available, it is displayed; if it is not, retrieve the value from the earlier step. Click the Copy icon to copy the DR Operation Token Command.

Execute the copied CLI command from a terminal to generate a DR operation token.

```
$ vault operator generate-root -dr-token -otp="h0wSwoA6vP6tnN4f9JEPUFJWzy" \
        -decode="Gx4fYAMjOGM0aXcgOiFFHHQaDBZkDwwgFws"

s.h3tLyUB9ATToqzMPIF1IFwmr
```

From the new primary cluster (Cluster B), select the Secondaries tab, and then click Add secondary. Populate the Secondary ID field, and click Generate token.
Click Copy to copy the token which you will need to enable the DR secondary cluster.
Return to the Cluster A web UI.
Go to the Manage tab and click Update in the Update primary box. In the resulting modal, enter the generated DR operation token in the DR operation token field, and paste the secondary activation token you copied from Cluster B.
Click Update. This updates the primary information.

Note

Refer to the Monitoring Vault Replication tutorial to check the DR replication status.

Option 2 - Disable the original DR primary

Once the DR secondary cluster (Cluster B) is promoted to be the new primary, you may want to disable DR replication on the original primary (Cluster A) when it becomes operational again.

Remember that a DR replication group can have only one primary cluster, and Cluster A's data is out of date because of its outage.

Execute the following command to disable DR replication.

$ vault write -f sys/replication/dr/primary/disable

WARNING! The following warnings were returned from Vault:

* This cluster is having replication disabled. Vault will be unavailable for
  a brief period and will resume service shortly.

Any secondaries will no longer be able to connect.

Invoke the sys/replication/dr/primary/disable endpoint to disable DR replication.

$ curl --request POST --header "X-Vault-Token: $VAULT_TOKEN" \
    https://cluster-A.example.com:8200/v1/sys/replication/dr/primary/disable | jq
{
   "request_id": "92a5f57a-2f7b-11be-b9dd-0f028396fba8",
   "lease_id": "",
   "renewable": false,
   "lease_duration": 0,
   "data": null,
   "wrap_info": null,
   "warnings": [
     "This cluster is having replication disabled. Vault will be unavailable for a brief period and will resume service shortly."
   ],
   "auth": null
}

Any secondaries will no longer be able to connect.

In the Manage tab of the primary's UI, click the Disable replication button.
The resulting modal describes the consequences of disabling replication. To continue, type Disaster Recovery in the input field to confirm.

Any secondaries will no longer be able to connect.
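Vault is briefly unavailable while replication is being disabled, so a script that runs further steps may need to wait until the change takes effect. The Bash sketch below polls the `mode` field of `sys/replication/dr/status` on whichever cluster `VAULT_ADDR` points to. That field reports `disabled` once DR replication is off. The function name and the `POLL_INTERVAL` variable are hypothetical. The same helper could be reused after a promote (`primary`) or a demote (`secondary`).

```shell
#!/usr/bin/env bash
# wait_for_dr_mode EXPECTED_MODE [MAX_ATTEMPTS] (hypothetical helper)
# Polls sys/replication/dr/status on $VAULT_ADDR until its "mode" field
# equals EXPECTED_MODE, for example "disabled".
wait_for_dr_mode() {
  local expected="$1" max_attempts="${2:-30}" mode="" attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    # Ignore errors while Vault is briefly unavailable.
    mode="$(vault read -field=mode sys/replication/dr/status 2>/dev/null)" || mode=""
    if [[ "$mode" == "$expected" ]]; then
      echo "DR replication mode: $mode"
      return 0
    fi
    sleep "${POLL_INTERVAL:-2}"
  done
  echo "timed out waiting for DR mode '$expected' (last seen: '${mode:-unavailable}')" >&2
  return 1
}
```

Example: `VAULT_ADDR=https://cluster-A.example.com:8200 wait_for_dr_mode disabled`.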

Warning

Once this is done, re-enabling DR replication on this cluster as a primary changes its cluster ID. Any secondaries that connect to it then require a wipe of their underlying storage, even if they have connected before. If you instead re-enable DR replication on it as a secondary, its own underlying storage is wiped when it connects to a primary.

DR failback

Currently, Cluster B is the active primary.

Once Cluster A is back to a healthy state, you may wish to revert it to being the primary. To achieve this, you must promote Cluster A back to be the DR primary (perform Promote DR Secondary to Primary on Cluster A) and then demote Cluster B to DR secondary (refer to Option 1).

You need a DR operation token to perform this task. If you do not have a batch DR operation token, you must generate a DR operation token first.

Note

If you have a batch DR operation token, skip the token generation steps and use your batch DR operation token instead.

On Cluster A, start the DR operation token generation process.

$ vault operator generate-root -dr-token -init

The generated output would look like:

A One-Time-Password has been generated for you and is shown in the OTP field.
You will need this value to decode the resulting root token, so keep it safe.
Nonce         b4738404-0a11-63aa-2cb6-e77dfd96946f
Started       true
Progress      0/3
Complete      false
OTP           EYHAkPQYvvz93e8iI3pg1maQ
OTP Length    24

Distribute the generated nonce to each unseal key holder.

In order to generate a DR operation token, the following operation must be executed by each unseal key holder.

Example:

$ vault operator generate-root -dr-token \
       -nonce=b4738404-0a11-63aa-2cb6-e77dfd96946f \
       $PRIMARY_UNSEAL_KEY_1

Once the threshold has been reached, the output contains the encoded DR operation token.

$ vault operator generate-root -dr-token \
     -nonce=b4738404-0a11-63aa-2cb6-e77dfd96946f \
     $PRIMARY_UNSEAL_KEY_3

Nonce            b4738404-0a11-63aa-2cb6-e77dfd96946f
Started          true
Progress         3/3
Complete         true
Encoded Token    djw4BR1iaDUFIBxaAwpiCC1YGhQHHDMf
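In a lab where a single operator holds every unseal key (which should not be the case in production), the key submissions above can be run in a loop. This Bash sketch assumes the keys are already in shell variables such as `$PRIMARY_UNSEAL_KEY_1`. Passing keys as command-line arguments can expose them in shell history and process listings, so treat this as a test-environment convenience only. The function name `submit_dr_token_keys` is hypothetical.

```shell
#!/usr/bin/env bash
# submit_dr_token_keys NONCE KEY [KEY...] (hypothetical, lab-only helper)
# Submits each unseal key share for an in-progress DR operation token
# generation, stopping at the first failure.
submit_dr_token_keys() {
  local nonce="$1" key
  shift
  if (( $# == 0 )); then
    echo "usage: submit_dr_token_keys NONCE KEY [KEY...]" >&2
    return 1
  fi
  for key in "$@"; do
    vault operator generate-root -dr-token -nonce="$nonce" "$key" || return 1
  done
}
```

Example: `submit_dr_token_keys b4738404-0a11-63aa-2cb6-e77dfd96946f "$PRIMARY_UNSEAL_KEY_1" "$PRIMARY_UNSEAL_KEY_2" "$PRIMARY_UNSEAL_KEY_3"`. The output of the last submission contains the Encoded Token.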

Decode the generated DR operation token (Encoded Token).

Example:

$ vault operator generate-root -dr-token \
     -decode="djw4BR1iaDUFIBxaAwpiCC1YGhQHHDMf" \
     -otp="EYHAkPQYvvz93e8iI3pg1maQ"

s.3epDv29lsVfc0oZadkjs6qRN
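The `-decode` step is simple arithmetic. The example values in this tutorial show that, for OTP-based (non-PGP) generation, the encoded token is the DR operation token XORed byte by byte with the OTP and then base64-encoded without padding. The following Bash sketch reproduces that decoding purely as a learning aid. `decode_dr_token` is a hypothetical name, and the sketch assumes GNU coreutils `base64` and `od`. In practice, keep using `vault operator generate-root -decode`.

```shell
#!/usr/bin/env bash
# Hypothetical learning aid: reproduce the OTP-based decoding performed by
# `vault operator generate-root -dr-token -decode=... -otp=...`.
# Assumes GNU coreutils `base64` and `od`.

# decode_dr_token ENCODED_TOKEN OTP
decode_dr_token() {
  local encoded="$1" otp="$2"

  # The encoded token is base64 without '=' padding; add the padding back
  # so that `base64 -d` accepts it.
  while (( ${#encoded} % 4 != 0 )); do
    encoded+="="
  done

  # Decode into an array of unsigned byte values (0-255).
  local -a bytes=()
  read -r -a bytes <<< "$(printf '%s' "$encoded" | base64 -d | od -An -v -tu1 | tr '\n' ' ')"

  if (( ${#bytes[@]} != ${#otp} )); then
    echo "decoded length ${#bytes[@]} does not match OTP length ${#otp}" >&2
    return 1
  fi

  # XOR each decoded byte with the character code of the matching OTP byte.
  local out="" ch code i
  for (( i = 0; i < ${#bytes[@]}; i++ )); do
    printf -v code '%d' "'${otp:i:1}"
    printf -v ch "\\$(printf '%03o' $(( bytes[i] ^ code )))"
    out+="$ch"
  done
  printf '%s\n' "$out"
}
```

With the 26-character OTP from Option 1 (`h0wSwoA6vP6tnN4f9JEPUFJWzy`) and its encoded token, the function returns `s.h3tLyUB9ATToqzMPIF1IFwmr`, matching the CLI. With the 24-character OTP in the example above, it returns `3epDv29lsVfc0oZadkjs6qRN`. The decoded bytes omit the `s.` prefix that the CLI prints, which is one more reason to rely on the CLI output.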

Execute the following command on Cluster A to promote it back to be the DR primary, using the DR operation token you just generated.

Example:

$ vault write sys/replication/dr/secondary/promote \
     dr_operation_token="s.3epDv29lsVfc0oZadkjs6qRN"

WARNING! The following warnings were returned from Vault:

* This cluster is being promoted to a replication primary. Vault will be
unavailable for a brief period and will resume service shortly.

Execute the following command on Cluster B to demote it to a secondary.

$ vault write -f sys/replication/dr/primary/demote

WARNING! The following warnings were returned from Vault:

* This cluster is being demoted to a replication secondary. Vault will be
unavailable for a brief period and will resume service shortly.

Now, on Cluster A, generate a secondary activation token, as you did in Enable DR Primary Replication.
```
$ vault write sys/replication/dr/primary/secondary-token id=secondary
```
Copy the generated wrapping_token which you will need when you invoke the sys/replication/dr/secondary/update-primary endpoint later.
On Cluster B, invoke the sys/replication/dr/secondary/update-primary endpoint using the wrapping_token you just generated on Cluster A, and the DR Operation Token that you generated in Promote DR Secondary to Primary.
If you don't have the DR operation token anymore, you can create a new one by following the steps described in Promote DR Secondary to Primary.
Example:
```
$ vault write sys/replication/dr/secondary/update-primary \
     dr_operation_token=s.YxmD095A8fKRGNGNiteJnEiE \
     token="eyJhbGciOiJFUzUxMiIsImt..."
```
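The CLI failback steps above can be collected into two Bash functions. They are split at the point where you obtain the DR operation token for Cluster B's update-primary call. This is a hedged sketch:

- `CLUSTER_A_ADDR` and `CLUSTER_B_ADDR` are placeholder addresses.
- `VAULT_TOKEN` is assumed to be valid on both clusters for the demote and secondary-token calls.
- `secondary` is the example secondary ID used above.
- Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Placeholder addresses; adjust for your environment.
CLUSTER_A_ADDR="${CLUSTER_A_ADDR:-https://cluster-A.example.com:8200}"
CLUSTER_B_ADDR="${CLUSTER_B_ADDR:-https://cluster-B.example.com:8200}"

# failback_swap_roles CLUSTER_A_DR_OPERATION_TOKEN (hypothetical helper)
# Promote Cluster A back to DR primary, then demote Cluster B.
failback_swap_roles() {
  local cluster_a_dr_token="$1"
  VAULT_ADDR="$CLUSTER_A_ADDR" vault write sys/replication/dr/secondary/promote \
    dr_operation_token="$cluster_a_dr_token" || return 1
  VAULT_ADDR="$CLUSTER_B_ADDR" vault write -f sys/replication/dr/primary/demote
}

# failback_reconnect CLUSTER_B_DR_OPERATION_TOKEN [SECONDARY_ID] (hypothetical)
# Issue an activation token on Cluster A and point Cluster B at it.
failback_reconnect() {
  local cluster_b_dr_token="$1" secondary_id="${2:-secondary}" activation_token
  activation_token="$(VAULT_ADDR="$CLUSTER_A_ADDR" vault write \
    -field=wrapping_token \
    sys/replication/dr/primary/secondary-token id="$secondary_id")" || return 1
  VAULT_ADDR="$CLUSTER_B_ADDR" vault write \
    sys/replication/dr/secondary/update-primary \
    dr_operation_token="$cluster_b_dr_token" \
    token="$activation_token"
}
```

Run `failback_swap_roles "$CLUSTER_A_DR_TOKEN"` first. Once you have the DR operation token for Cluster B, run `failback_reconnect "$CLUSTER_B_DR_TOKEN"`. Both variable names are placeholders. As in the manual steps, the promote on Cluster A happens before the demote on Cluster B.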

On Cluster A's Manage tab, click Generate token in the Generate operation token box.
In the resulting modal, notice the option to encrypt your token with a PGP key. For this tutorial, choose Generate operation token.
A quorum of unseal keys must be entered to create a new operation token for the DR secondary.
This operation must be performed by each unseal-key holder.
Each key holder enters their portion of the master key and clicks Generate Token.
Once the threshold is reached, copy the DR operation token command, which contains the encoded operation token and the one-time password (OTP), and run it with the Vault CLI to obtain your operation token. Save this value.

Run the copied CLI command in a terminal to obtain the decoded DR operation token.

Example:

$ vault operator generate-root -dr-token \
     -otp="I4BbXfN0F2biXY53bXx4bKPwU0" \
     -decode="OhobGjUifglzc1oPEwtyfSUWEUAHAT4yPHU"

s.YxmD095A8fKRGNGNiteJnEiE

Now that you have the operation token, head back to the Manage tab and click Promote in the Promote cluster box.
In the resulting modal, enter the DR operation token you just generated.
Click Promote.
Open the UI for Cluster B.
From the Status dropdown, click Disaster Recovery Primary under REPLICATION.
In the Manage tab, click Demote in the Demote cluster box.
The Vault UI displays a modal describing the outcomes of demotion. To proceed, type Disaster Recovery in the confirmation box, then click Demote.
Return to Cluster A.
Select the Secondaries tab, and then click Add. Populate the Secondary ID field (e.g. secondary), and click Generate token.
Click Copy to copy the token which you will need to enable the DR secondary cluster.
Return to Cluster B.
In the Manage tab, click Update in the Update primary box.
In the resulting modal, enter the DR operation token that you generated in Promote DR Secondary to Primary in the DR operation token field, and paste the secondary activation token you copied from Cluster A.
If you don't have the DR operation token anymore, you can create a new one by following the steps described in Promote DR Secondary to Primary.
Click Update. This completes the failback process.

Important notes

Seal and leader changes

A change in leader may occur when performing a promote or demote on a cluster.

Depending on the seal type used, differences between the clusters' auto-unseal configurations may require an additional unseal step after a promotion. This is typically evident when your Vault standby nodes seal and the Vault system log includes the message:

[WARN] core: encryption keys have changed out from underneath us (possibly due to replication enabling), must be unsealed again
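To spot this condition after a promote, check the seal status of each node. `vault status` exits with code 0 when the node is unsealed, 2 when it is sealed, and 1 on error. The Bash sketch below loops over node addresses that you supply and reports any node that is sealed or unreachable. The function name is hypothetical, and the sketch checks only the exit code rather than parsing the output.

```shell
#!/usr/bin/env bash
# report_sealed_nodes ADDR [ADDR...] (hypothetical helper)
# Runs `vault status` against each node and classifies it by exit code
# (0 = unsealed, 2 = sealed, anything else = error).
# Returns non-zero if any node is sealed or could not be checked.
report_sealed_nodes() {
  local addr rc problems=0
  for addr in "$@"; do
    VAULT_ADDR="$addr" vault status >/dev/null 2>&1
    rc=$?
    case "$rc" in
      0) echo "$addr: unsealed" ;;
      2) echo "$addr: SEALED - unseal this node again"; problems=1 ;;
      *) echo "$addr: error (exit code $rc)"; problems=1 ;;
    esac
  done
  return "$problems"
}
```

Example (placeholder addresses): `report_sealed_nodes https://vault-a1.example.com:8200 https://vault-a2.example.com:8200 https://vault-a3.example.com:8200`.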

Automated DR failover

Vault does not support automatic failover (promotion) of a DR secondary cluster. This is a deliberate choice, because it is difficult to accurately evaluate whether a failover should or shouldn't happen. For example, imagine a DR secondary loses its connection to the primary. Is it because the primary is down, or because the network between the two has failed?

If the DR secondary promotes itself and clients start connecting to it, you now have two active clusters whose data sets immediately start diverging. From either cluster's perspective alone, there is no way to determine which one is right.

Vault's API supports performing replication operations programmatically, which lets you write your own automation logic based on experience within your own environment. The available replication endpoints are described in the Vault API documentation under sys/replication.
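The following Bash sketch illustrates that kind of custom logic, but it only reports and never promotes. It checks whether the primary answers `vault status` and which replication `state` the DR secondary reports through `sys/replication/dr/status`. Both addresses are placeholders, and the function name is hypothetical. Treating `stream-wals` as the healthy streaming state of a DR secondary is an assumption; verify it against the status output of your Vault version. The decision to promote stays with an operator, for the reasons given above.

```shell
#!/usr/bin/env bash
# Placeholder addresses; adjust for your environment.
PRIMARY_ADDR="${PRIMARY_ADDR:-https://cluster-A.example.com:8200}"
DR_SECONDARY_ADDR="${DR_SECONDARY_ADDR:-https://cluster-B.example.com:8200}"

# assess_dr_health (hypothetical helper)
# Prints a recommendation and returns non-zero on an alert. It never promotes:
# from one vantage point it cannot tell a failed primary from a network split.
assess_dr_health() {
  local primary_ok=0 secondary_state

  # Exit code 0 from `vault status` means reachable and unsealed.
  if VAULT_ADDR="$PRIMARY_ADDR" vault status >/dev/null 2>&1; then
    primary_ok=1
  fi

  # Assumption: "stream-wals" is the normal state of a streaming DR secondary.
  secondary_state="$(VAULT_ADDR="$DR_SECONDARY_ADDR" \
    vault read -field=state sys/replication/dr/status 2>/dev/null)" || secondary_state="unknown"

  if (( primary_ok )) && [[ "$secondary_state" == "stream-wals" ]]; then
    echo "OK: primary reachable, DR secondary streaming"
  elif (( primary_ok )); then
    echo "WARN: primary reachable, DR secondary state is '$secondary_state'"
  else
    echo "ALERT: primary unreachable from this host (DR secondary state: '$secondary_state'); page an operator, do not auto-promote"
    return 1
  fi
}
```

Running this check from more than one location can give an operator additional context, but it still cannot settle on its own which cluster should serve traffic.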

Additional discussion

This tutorial focused on the DR replication workflow. In production, you may deploy additional Vault clusters across multiple datacenters and configure both DR replication and performance replication (PR).

Note

Before you configure DR replication in Data Center 2, first set up performance replication on Cluster C as a performance secondary, and then configure Cluster D as a DR secondary. This is because any existing data is immediately cleared when you enable performance replication on the PR secondary cluster (Cluster C).

When you use both DR and PR replication, a failure of Cluster A also disconnects performance replication.

Fail over from Cluster A to Cluster B.

Once the failover completes, you can re-enable performance replication between Cluster B (the new primary) and Cluster C (the performance secondary) by invoking the update-primary endpoint on Cluster C.
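From the CLI, re-enabling performance replication follows the same pattern as the DR steps. The difference is that the performance secondary's update-primary call takes the activation token and is authorized with a Vault token on Cluster C rather than a DR operation token. A minimal Bash sketch, assuming the following:

- `CLUSTER_B_ADDR` and `CLUSTER_C_ADDR` hold placeholder addresses.
- The hypothetical variables `CLUSTER_B_TOKEN` and `CLUSTER_C_TOKEN` hold tokens for each cluster. A performance secondary keeps its own token store, so one token is unlikely to work on both.

```shell
#!/usr/bin/env bash
# Placeholder addresses; adjust for your environment.
CLUSTER_B_ADDR="${CLUSTER_B_ADDR:-https://cluster-B.example.com:8200}"
CLUSTER_C_ADDR="${CLUSTER_C_ADDR:-https://cluster-C.example.com:8200}"

# reconnect_performance_secondary SECONDARY_ID (hypothetical helper)
# CLUSTER_B_TOKEN: token on the new primary (Cluster B).
# CLUSTER_C_TOKEN: token on the performance secondary (Cluster C).
reconnect_performance_secondary() {
  local secondary_id="$1" activation_token

  # On Cluster B, create a performance secondary activation token.
  activation_token="$(VAULT_ADDR="$CLUSTER_B_ADDR" VAULT_TOKEN="$CLUSTER_B_TOKEN" \
    vault write -field=wrapping_token \
    sys/replication/performance/primary/secondary-token id="$secondary_id")" || return 1

  # On Cluster C, point the performance secondary at the new primary.
  VAULT_ADDR="$CLUSTER_C_ADDR" VAULT_TOKEN="$CLUSTER_C_TOKEN" \
    vault write sys/replication/performance/secondary/update-primary \
    token="$activation_token"
}
```

Example: `reconnect_performance_secondary cluster-c`, where `cluster-c` is a placeholder secondary ID.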

You can learn more about performance replication in the Setting up performance replication tutorial.
