Vault 1.13.0 release notes

Software Release date: March 1, 2023

Summary: Vault Release 1.13.0 offers features and enhancements that improve the user experience while solving critical issues previously encountered by our customers. We are providing an overview of improvements in this set of release notes.

We encourage you to upgrade to the latest release of Vault to take advantage of the new benefits provided. With this latest release, we offer solutions to critical feature gaps that were identified previously. Please refer to the Changelog within the Vault release for further information on product improvements, including a comprehensive list of bug fixes.

Some of these enhancements and changes in this release include the following:

PKI improvements:
- Cross Cluster PKI Certificate Revocation: Introducing a new unified OCSP responder and CRL builder that enables a certificate revocations and CRL view across clusters for a given PKI mount.
- PKI UI Beta: New UI introducing cross-signing flow, overview page, roles and keys view.
- Health Checks: Provide a health overview of PKI mounts for proactive actions and troubleshooting.
- Command Line: Simplified CLI to discover, rotate issuers and related commands for PKI mounts
Azure Auth Improvements:
- Rotate-root support: Add the ability to rotate the root account's client secret defined in the auth method's configuration via the new rotate-root endpoint.
- Managed Identities authentication: The auth method now allows any Azure resource that supports managed identities to authenticate with Vault.
- VMSS Flex authentication: Add support for Virtual Machine Scale Set (VMSS) Flex authentication.
GCP Secrets Impersonated Account Support: Add support for GCP service account impersonation, allowing callers to generate a GCP access token without requiring Vault to store or retrieve a GCP service account key for each role.
Managed Keys in Transit Engine: Support for offloading Transit Key operations to HSMs/external KMS.
KMIP Secret Engine Enhancements: Implemented Asymmetric Key Lifecycle Server and Advanced Cryptographic Server profiles. Added support for RSA keys and operations such as: MAC, MAC Verify, Sign, Sign Verify, RNG Seed and RNG Retrieve.
Vault as a SSM: Support is planned for an upcoming Vault PKCS#11 Provider version to include mechanisms for encryption, decryption, signing and signature verification for AES and RSA keys.
Replication (enterprise): We fixed a bug that could cause a cluster to wind up in a permanent merkle-diff/merkle-sync loop and never enter stream-wals, particularly in cases of high write loads on the primary cluster.
Share Secrets in Independent Namespaces (enterprise): You can now add users from namespaces outside a namespace hierarchy to a group in a given namespace hierarchy. For Vault Agent, you can now grant it access to secrets outside the namespace where it authenticated, and reduce the number of Agents you need to run.
User Lockout: Vault now supports configuration to lock out users when they have consecutive failed login attempts. This feature is enabled by default in 1.13 for the userpass, ldap, and approle auth methods.
Event System (Alpha): Vault has a new experimental event system. Events are currently only generated on writes to the KV secrets engine, but external plugins can also be updated to start generating events.
Kubernetes authentication plugin bug fix: Ensures a consistent TLS configuration for all k8s API requests. This fixes a bug where it was possible for the http.Client's Transport to be missing the necessary root CAs to ensure that all TLS connections between the auth engine and the Kubernetes API were validated against the configured set of CA certificates.
Kubernetes Secretes Engine on Vault UI: Introducing Kubernetes secret engine support on the UI
Client Count UI improvements: Combining current month and previous history into one dashboard
OCSP Support in the TLS Certificate Auth Method: The auth method now can check for revoked certificates using the OCSP protocol.
UI Wizard removal: The UI Wizard has been removed from the UI since the information was occasionally out-of-date and did not align with the latest changes. A new and enhanced UI experience is planned in a future release.
Vault Agent improvements:
- Auto-auth introduced token_file method which reads an existing token from a file. The token file method is designed for development and testing. It is not suitable for production deployment.
- Listeners for the Vault Agent can define a role set to metrics_only so that a service can be configured to listen on a particular port to collect metrics.
- Vault Agent can read configurations from multiple files.
- Users can specify the log file path using the -log-file command flag or VAULT_LOG_FILE environment variable. This is particularly useful when Vault Agent is running as a Windows service.
OpenAPI-based Go & .NET Client Libraries (Public Beta): Use the new Go & .NET client libraries to interact with the Vault API from your applications.
- OpenAPI-based Go client library
- OpenAPI-based .NET client library

Known issues

When Vault is configured without a TLS certificate on the TCP listener, the Vault UI may throw an error that blocks you from performing operational tasks.

The error message: Q.randomUUID is not a function

Note

Refer to this Knowledge Base article for more details and a workaround.

The fix for this UI issue is coming in the Vault 1.13.1 release.

Token creation with a new entity alias could silently fail

A regression caused token creation requests under specific circumstances to be forwarded from perf standbys (Enterprise only) to the active node incorrectly. They would appear to succeed, however no lease was created. The token would then be revoked on first use causing a 403 error.

This only happened when all of the following conditions were met:

the token is being created against a role
the request specifies an entity alias which has never been used before with the same role (for example for a brand new role or a unique alias)
the request happens to be made to a perf standby rather than the active node

Retrying token creation after the affected token is rejected would work since the entity alias has already been created.

Affected versions

Affects Vault 1.13.0 to 1.13.3. Fixed in 1.13.4.

API calls to update-primary may lead to data loss

Affected versions

All versions of Vault before 1.14.1, 1.13.5, 1.12.9, and 1.11.12.

Issue

The update-primary endpoint temporarily removes all mount entries except for those that are managed automatically by vault (e.g. identity mounts). In certain situations, a race condition between mount table truncation replication repairs may lead to data loss when updating secondary replication clusters.

Situations where the race condition may occur:

When the cluster has local data (e.g., PKI certificates, app role secret IDs) in shared mounts. Calling update-primary on a performance secondary with local data in shared mounts may corrupt the merkle tree on the secondary. The secondary still contains all the previously stored data, but the corruption means that downstream secondaries will not receive the shared data and will interpret the update as a request to delete the information. If the downstream secondary is promoted before the merkle tree is repaired, the newly promoted secondary will not contain the expected local data. The missing data may be unrecoverable if the original secondary is is lost or destroyed.
When the cluster has an Allow paths defined. As of Vault 1.0.3.1, startup, unseal, and calling update-primary all trigger a background job that looks at the current mount data and removes invalid entries based on path filters. When a secondary has Allow path filters, the cleanup code may misfire in the windown of time after update-primary truncats the mount tables but before the mount tables are rewritten by replication. The cleanup code deletes data associated with the missing mount entries but does not modify the merkle tree. Because the merkle tree remains unchanged, replication will not know that the data is missing and needs to be repaired.

Workaround 1: PR secondary with local data in shared mounts

Watch for cleaning key in merkle tree in the TRACE log immediately after an update-primary call on a PR secondary to indicate the merkle tree may be corrupt. Repair the merkle tree by issuing a replication reindex request to the PR secondary.

If TRACE logs are no longer available, we recommend pre-emptively reindexing the PR secondary as a precaution.

Workaround 2: PR secondary with "Allow" path filters

Watch for deleted mistakenly stored mount entry from backend in the INFO log. Reindex the performance secondary to update the merkle tree with the missing data and allow replication to disseminate the changes. You will not be able to recover local data on shared mounts (e.g., PKI certificates).

If INFO logs are no longer available, query the shared mount in question to confirm whether your role and configuration data are present on the primary but missing from the secondary.

Internal error when vault policy in namespace does not exist

If a user is a member of a group that gets a policy from a namespace other than the one they’re trying to log into, and that policy doesn’t exist, Vault returns an internal error. This impacts all auth methods.

Affected versions

1.13.8 and 1.13.9
1.14.4 and 1.14.5
1.15.0 and 1.15.1

A fix has been released in Vault 1.13.10, 1.14.6, and 1.15.2.

Workaround

During authentication, Vault derives inherited policies based on the groups an entity belongs to. Vault returns an internal error when attaching the derived policy to a token when:

the token belongs to a different namespace than the one handling authentication, and
the derived policy does not exist under the namespace.

You can resolve the error by adding the policy to the relevant namespace or deleting the group policy mapping that uses the derived policy.

As an example, consider the following userpass auth method failure. The error is due to the fact that Vault expects a group policy under the namespace that does not exist.

# Failed login
$ vault login -method=userpass username=user1 password=123
Error authenticating: Error making API request.

URL: PUT http://127.0.0.1:8200/v1/auth/userpass/login/user1
Code: 500. Errors:

* internal error

To confirm the problem is a missing policy, start by identifying the relevant entity and group IDs:

$ vault read -format=json identity/entity/name/user1 | \
  jq '{"entity_id": .data.id, "group_ids": .data.group_ids} '
{
  "entity_id": "420c82de-57c3-df2e-2ef6-0690073b1636",
  "group_ids": [
    "6cb152b7-955d-272b-4dcf-a2ed668ca1ea"
  ]
}

Use the group ID to fetch the relevant policies for the group under the ns1 namespace:

$ vault read -format=json -namespace=ns1 \
  identity/group/id/6cb152b7-955d-272b-4dcf-a2ed668ca1ea | \
  jq '.data.policies'
[
  "group_policy"
]

Now that we know Vault is looking for a policy called group_policy, we can check whether that policy exists under the ns1 namespace:

$ vault policy list -namespace=ns1
default

The only policy in the ns1 namespace is default, which confirms that the missing policy (group_policy) is causing the error.

To fix the problem, we can either remove the missing policy from the 6cb152b7-955d-272b-4dcf-a2ed668ca1ea group or create the missing policy under the ns1 namespace.

To remove group_policy from group ID 6cb152b7-955d-272b-4dcf-a2ed668ca1ea, use the vault write command to set the applicable policies to just include default:

$ vault write                                             \
  -namespace=ns1                                          \
  identity/group/id/6cb152b7-955d-272b-4dcf-a2ed668ca1ea  \
  name="test"                                             \
  policies="default"

To create the missing policy, use vault policy write and define the appropriate capabilities:

$ vault policy write -namespace=ns1 group_policy - << EOF
    path "secret/data/*" {
        capabilities = ["create", "update"]
    }
EOF

Verify the fix by re-running the login command:

$ vault login -method=userpass username=user1 password=123

Vault is storing references to ephemeral sub-loggers leading to unbounded memory consumption

Affected versions

This memory consumption bug affects Vault Community and Enterprise versions:

1.13.7 - 1.13.9
1.14.3 - 1.14.5
1.15.0 - 1.15.1

This change that introduced this bug has been reverted as of 1.13.10, 1.14.6, and 1.15.2

Issue

Vault is unexpectedly storing references to ephemeral sub-loggers which prevents them from being cleaned up, leading to unbound memory consumption for loggers. This came about from a change to address a previously known issue around sub-logger levels not being adjusted on reload. This impacts many areas of Vault, but primarily logins in Enterprise.

Workaround

There is no workaround.

Sublogger levels not adjusted on reload

Affected versions

This issue affects all Vault Community and Vault Enterprise versions.

Issue

Vault does not honor a modified log_level configuration for certain subsystem loggers on SIGHUP.

The issue is known to specifically affect resolver.watcher and replication.index.* subloggers.

After modifying the log_level and issuing a reload (SIGHUP), some loggers are updated to reflect the new configuration, while some subsystem logger levels remain unchanged.

For example, after starting a server with log_level: "trace" and modifying it to log_level: "info" the following lines appear after reload:

[TRACE] resolver.watcher: dr mode doesn't have failover support, returning
...
[DEBUG] replication.index.perf: saved checkpoint: num_dirty=5
[DEBUG] replication.index.local: saved checkpoint: num_dirty=0
[DEBUG] replication.index.periodic: starting WAL GC: from=2531280 to=2531280 last=2531536

Workaround

The workaround is to restart the Vault server.

Fatal error during expiration metrics gathering causing Vault crash

Affected versions

This issue affects Vault Community and Enterprise versions:

1.13.9
1.14.5
1.15.1

A fix has been issued in Vault 1.13.10, 1.14.6, and 1.15.2.

Issue

A recent change to Vault to improve state change speed (e.g. becoming active or standby) introduced a concurrency issue which can lead to a concurrent iteration and write on a map, causing a fatal error and crashing Vault. This error occurs when gathering lease and token metrics from the expiration manager. These metrics originate from the active node in a HA cluster, as such a standby node will take over active duties and the cluster will remain functional should the original active node encounter this bug. The new active node will be vulnerable to the same bug, but may not encounter it immediately.

There is no workaround.

Deadlock can occur on performance secondary clusters with many mounts

Affected versions

1.15.0 - 1.15.5
1.14.5 - 1.14.9
1.13.9 - 1.13.13

Issue

Vault 1.15.0, 1.14.5, and 1.13.9 introduced a worker pool to schedule periodic rollback operations on all mounts. This worker pool defaulted to using 256 workers. The worker pool introduced a risk of deadlocking on the active node of performance secondary clusters, leaving that cluster unable to service any requests.

The conditions required to cause the deadlock on the performance secondary:

Performance replication is enabled
The performance primary cluster has more than 256 non-local mounts. The more mounts the cluster has, the more likely the deadlock becomes
One of the following occurs:
- A replicated mount is unmounted or remounted OR
- A replicated namespace is deleted OR
- Replication paths filters are used to filter at least one mount or namespace

Workaround

Set the VAULT_ROLLBACK_WORKERS environment variable to a number larger than the number of mounts in your Vault cluster and restart Vault:

$ export VAULT_ROLLBACK_WORKERS=1000

Sending SIGHUP to vault standby node causes panic

Affected versions

1.13.4+
1.14.0+
1.15.0+
1.16.0+

Issue

Sending a SIGHUP to a vault standby node running an enterprise build can cause a panic if there is a change to the license, or reporting configuration. Active and performance standby nodes will perform fine. It is recommended that operators stop and restart vault nodes individually if configuration changes are required.

Workaround

Instead of issuing a SIGHUP, users should stop individual vault nodes, update the configuration or license and then restart the node.

Feature deprecations and EOL

Please refer to the Deprecation Plans and Notice page for up-to-date information on feature deprecations and plans. A Feature Deprecation FAQ page addresses questions about decisions made about Vault feature deprecations.