Key rotation
Vault has multiple encryption keys that are used for various purposes. These keys support rotation so that they can be changed periodically or in response to a potential leak or compromise. It is useful to first understand the high-level architecture before learning about key rotation.
As a review, Vault starts in a sealed state. Vault is unsealed by providing the unseal keys. By default, Vault uses a technique known as Shamir's secret sharing algorithm to split the root key into 5 shares, any 3 of which are required to reconstruct the root key. The root key is used to protect the encryption key, which is ultimately used to protect data written to the storage backend.
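To make the 3-of-5 split concrete, the following is a minimal sketch of Shamir's secret sharing over a prime field. It is illustrative only: the prime, the function names split and combine, and the encoding of the secret as an integer are assumptions made for this example and are not taken from Vault's implementation.

```python
import secrets

# Assumption for illustration: a toy field of integers modulo a Mersenne prime.
PRIME = 2**127 - 1


def split(secret, shares=5, threshold=3):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be a non-negative integer below PRIME")
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and the number of shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, highest coefficient first
            y = (y * x + c) % PRIME
        points.append((x, y))
    return points


def combine(points):
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Usage: a hypothetical root key, split 3-of-5 and rebuilt from three shares.
root_key = secrets.randbelow(PRIME)
unseal_shares = split(root_key, shares=5, threshold=3)
assert combine(unseal_shares[:3]) == root_key
```

Any three of the five points determine the degree-2 polynomial, and therefore its constant term; fewer than three leave the secret undetermined.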
To support key rotation, we need to support changing the unseal keys, root key, and the
backend encryption key. We split this into two separate operations, rekey and rotate.
The rekey operation is used to generate a new root key. When this is being done,
it is possible to change the parameters of the key splitting, so that the number of shares
and the threshold required to unseal can be changed. To perform a rekey, a threshold of the
current unseal keys must be provided. This is to prevent a single malicious operator from
performing a rekey and invalidating the existing root key.
Performing a rekey is fairly straightforward. The rekey operation must be initialized with the new parameters for the split and threshold. Once initialized, the current unseal keys must be provided until the threshold is met. Once met, Vault will generate the new root key, perform the splitting, and re-encrypt the encryption key with the new root key. The new unseal keys are then provided to the operator, and the old unseal keys are no longer usable.
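The initialize-then-provide sequence can be sketched against Vault's HTTP API. The snippet below only builds the requests and does not send them. The address, the token, and the helper names are assumptions for illustration, and whether a token is needed for these endpoints may depend on the Vault version and configuration, so check the API documentation for the version in use.

```python
import json
import urllib.request

VAULT_ADDR = "http://127.0.0.1:8200"  # assumption: a local Vault address


def rekey_init_request(new_shares, new_threshold, token):
    """Build the request that starts a rekey with the new split parameters."""
    body = {"secret_shares": new_shares, "secret_threshold": new_threshold}
    return _put("/v1/sys/rekey/init", body, token)


def rekey_update_request(unseal_key, nonce, token):
    """Build the request that submits one current unseal key to the rekey."""
    body = {"key": unseal_key, "nonce": nonce}
    return _put("/v1/sys/rekey/update", body, token)


def _put(path, body, token):
    # Hypothetical helper: wraps a JSON body in a PUT request.
    return urllib.request.Request(
        VAULT_ADDR + path,
        data=json.dumps(body).encode(),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="PUT",
    )
```

Against a running server, each request would be sent with urllib.request.urlopen. The response to the init call carries a nonce that every update call repeats, and update calls continue, one per key holder, until the current threshold is met.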
The rotate operation is used to change the encryption key used to protect data written
to the storage backend. This key is never provided or visible to operators, who only
have unseal keys. This simplifies the rotation, as it does not require the current key
holders unlike the rekey operation. When rotate is triggered, a new encryption key
is generated and added to a keyring. All new values written to the storage backend are
encrypted with the new key. Old values written with previous encryption keys can still
be decrypted since older keys are saved in the keyring. This allows key rotation to be
done online, without an expensive re-encryption process.
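The keyring behavior can be illustrated with a small model: new writes use the newest key, and each stored value records which key encrypted it so that older values stay readable. This is a sketch under stated assumptions. The Keyring class, the term prefix layout, and the stand-in cipher (an HMAC-derived keystream, not AES-256-GCM and without authentication) are all hypothetical and chosen only so the example runs with the standard library.

```python
import hashlib
import hmac
import secrets
import struct


def _keystream_xor(key, nonce, data):
    """Stand-in cipher for illustration only: XOR with an HMAC-SHA256 keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = struct.pack(">I", offset // 32)
        pad = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class Keyring:
    def __init__(self):
        self.keys = {}        # term number -> key bytes
        self.active_term = 0
        self.rotate()         # install the first key as term 1

    def rotate(self):
        """Add a new key as the next term; earlier terms are kept for reads."""
        self.active_term += 1
        self.keys[self.active_term] = secrets.token_bytes(32)

    def encrypt(self, plaintext):
        nonce = secrets.token_bytes(12)
        body = _keystream_xor(self.keys[self.active_term], nonce, plaintext)
        # Prefix the term so a later read can select the matching key.
        return struct.pack(">I", self.active_term) + nonce + body

    def decrypt(self, blob):
        (term,) = struct.unpack(">I", blob[:4])
        nonce, body = blob[4:16], blob[16:]
        return _keystream_xor(self.keys[term], nonce, body)


# Usage: a value written before rotation is still readable afterwards.
ring = Keyring()
old_value = ring.encrypt(b"written before rotation")
ring.rotate()
new_value = ring.encrypt(b"written after rotation")
assert ring.decrypt(old_value) == b"written before rotation"
assert ring.decrypt(new_value) == b"written after rotation"
```

Because the old key stays in the keyring, nothing already in storage has to be rewritten when a rotation happens.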
Both the rekey and rotate operations can be done online and in a highly available
configuration. Only the active Vault instance can perform either of the operations
but standby instances can still assume an active role after either operation. This is
done by providing an online upgrade path for standby instances. If the current encryption
key is N and a rotation installs N+1, Vault creates a special "upgrade" key, which
provides the N+1 encryption key protected by the N key. This upgrade key is only available
for a few minutes, enabling standby instances to do a periodic check for upgrades.
This allows standby instances to update their keys and stay in-sync with the active Vault
without requiring operators to perform another unseal.
The rotate/config endpoint is used to configure the number of operations or time interval
between automatic rotations of the backend encryption key.
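As a rough sketch, a configuration request for this endpoint could be built as follows. The snippet assumes the endpoint is served at /v1/sys/rotate/config and accepts max_operations and interval fields, as described in the HTTP API documentation. The helper name, address, token, and example values are hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request


def rotate_config_request(addr, token, max_operations=None, interval=None):
    """Build a request that sets the automatic rotation policy."""
    body = {}
    if max_operations is not None:
        body["max_operations"] = max_operations  # rotate after this many encryptions
    if interval is not None:
        body["interval"] = interval              # rotate once the key reaches this age
    return urllib.request.Request(
        addr.rstrip("/") + "/v1/sys/rotate/config",  # assumed path under sys/
        data=json.dumps(body).encode(),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Usage with hypothetical values: rotate after 2 billion operations or 30 days.
request = rotate_config_request(
    "http://127.0.0.1:8200", "example-token",
    max_operations=2_000_000_000, interval="720h",
)
```

Sending the request with urllib.request.urlopen requires a reachable Vault server and a token with sufficient privileges.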
NIST rotation guidance
Periodic rotation of the encryption keys is recommended, even in the absence of compromise. Due to the nature of the AES-256-GCM encryption used, keys should be rotated before approximately 2^32 encryptions have been performed, following the guidelines of NIST publication 800-38D.
As of Vault 1.7, Vault will automatically rotate the backend encryption key prior to reaching 2^32 encryption operations by default.
Operators can estimate the number of encryptions by summing the following:
- The vault.barrier.put telemetry metric.
- The vault.token.creation metric where the token_type label is batch.
- The merkle.flushDirty.num_pages metric.
- The WAL index.
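The sum above can be worked through in a few lines. This is a minimal sketch: the function names are hypothetical, the sample values are made up for illustration, and collecting the real numbers from a telemetry system is left out.

```python
ROTATION_LIMIT = 2**32  # approximate number of encryptions allowed per key


def estimated_encryptions(barrier_puts, batch_token_creations,
                          merkle_flushed_pages, wal_index):
    """Sum the four counters listed above into one estimate."""
    return barrier_puts + batch_token_creations + merkle_flushed_pages + wal_index


def fraction_of_limit(total):
    """Share of the 2^32 budget that the estimate has consumed."""
    return total / ROTATION_LIMIT


# Usage with made-up sample values.
total = estimated_encryptions(
    barrier_puts=1_200_000_000,
    batch_token_creations=300_000_000,
    merkle_flushed_pages=50_000_000,
    wal_index=450_000_000,
)
print(f"{total} encryptions, {fraction_of_limit(total):.1%} of the limit")
```

An estimate that approaches the limit is a signal to rotate, or to check that automatic rotation is enabled.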
Vault periodically persists the number of encryptions to support rotation.
This save operation has a 1 second timeout to prevent impact to performance
if Vault is under heavy load. Because persisting the encryption count involves the
seal backend (if seal wrap is enabled), some seals (such as HSMs) may
regularly take longer than 1 second to respond. If this is the case, operators
may override that timeout by setting the environment variable
VAULT_ENCRYPTION_COUNT_PERSIST_TIMEOUT to a larger value, such as "5s".