Inspect data in Integrated Storage
Note
This tutorial focuses on inspecting Vault data stored in Integrated Storage. Also, refer to Vault Limits and Maximums for known upper limits on the size of certain fields and objects, and configurable limits on others.
In production deployments, Vault persists critical operational data to its configured durable storage. Gathering key facts about these operational data can be helpful when engaged in advanced troubleshooting.
During advanced troubleshooting for Vault servers which use the Integrated Storage backend, you can activate Recovery Mode on a server. This helps you inspect Vault data in an offline manner using a limited API that prevents general use of Vault while troubleshooting is in progress.
One approach available for inspecting data in a server operating in recovery mode is to use the /sys/raw API endpoint.
This tutorial provides a detailed workflow for inspecting Vault data with example commands and responses to help familiarize you with this approach.
Note
A separate, but related tutorial for inspecting Vault data in Consul describes a different approach for inspecting data in Vault installations which use the Consul storage backend.
Notes and prerequisites
Please pay attention to the following list of important notes and prerequisites before you begin with this tutorial.
- This tutorial is not intended to be comprehensive or cover all possible types of data in Vault. Instead, it's a means to help you get started in inspecting your Vault data so that you can familiarize yourself with the process.
- Vault version 1.4.0 or later using the Integrated Storage backend.
- To inspect Vault data using the vault CLI and HTTP API, this tutorial uses tools such as cURL and jq to fetch the information and process it.
- The HTTP API examples expect that the environment variable VAULT_TOKEN has the Recovery Token as its value.
- This tutorial uses examples from the root namespace; for details about inspecting data within other namespaces created using the Enterprise Namespaces feature, please see the What about Vault Enterprise Namespaces? section.
Note
All examples shown in this tutorial are read-only in nature, and use just GET and LIST operations. Do not try any operations not demonstrated in this tutorial while the Vault server is in recovery mode.
Problem
Some advanced Vault troubleshooting situations are easier to resolve once you identify issues with the data stored in Vault. Asking questions about Vault data can help you identify problematic use cases that generate excessive leases, find the root cause of unexplained growth in write ahead logs (WALs), and more.
As Vault does not expose these kinds of metrics for the data in storage directly to the user, you must query the storage directly using available tooling and techniques.
In these situations, isolating Vault data from users during inspection can help prevent further state changes by active clients. You can start a Vault server in Recovery Mode with a restricted API for data inspection and troubleshooting while also preventing general use of the Vault server.
Solution
You can get details about data stored in Integrated Storage from the /sys/raw API endpoint while Vault is operating in recovery mode.
This tutorial explains Vault data in detail, and shares an example of a practical and safe workflow for inspecting the data in Integrated Storage.
Before beginning with the practical tutorial, take some time to learn about Vault data in durable storage.
Workflow
The workflow for examining data in Integrated Storage is as follows.
- Stop all Vault cluster servers.
- Start the former active server in recovery mode.
- Generate a recovery mode operation token.
- Inspect Vault data as required on this server with the /sys/raw API; the majority of examples use the List Raw API.
- Stop the server.
- Start the server without recovery mode.
- Rejoin servers in a new state (that is, with no existing data) to the active server, using one of the following strategies:
  - Join 4 new standby servers with no existing data to the active server to form the cluster, OR
  - Wipe Vault data on the existing 4 standby servers and join them as new servers to the active server.
The goal is to restore cluster operations and original size after inspecting data, so choose which approach makes most sense for your environment and the circumstances.
For more details, please refer to the recovery mode documentation.
Stop all Vault cluster members
The first step in this workflow is to stop all Vault servers in the cluster, beginning with the standby servers.
Use the configured operating system service or startup script to stop the Vault service on each server node in the cluster. Stop the standby servers first, and then stop the active server.
After stopping all the servers, start the active server in recovery mode.
Start former active server in recovery mode
Note
When you start Vault in recovery mode, just a subset of its API is available for generating recovery tokens and using the /sys/raw API. This means that Vault is entirely unavailable for general use while operating in recovery mode.
The /sys/raw API endpoint is not enabled by default. You must start a single Vault server in recovery mode, then generate a recovery mode operation token to access the /sys/raw endpoint used in this tutorial.
Review the Recovery Mode documentation, which describes the required -recovery runtime configuration flag. You should refer to that documentation before configuring your Vault server's startup script to start Vault in recovery mode.
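As a minimal sketch of this step, assuming Vault is launched directly from a shell rather than through a service manager, the function below wraps vault server with the -recovery flag. The function name and the configuration file path are hypothetical; substitute the configuration file your server already uses.

```shell
# Start a single Vault server in recovery mode.
# $1: path to the server configuration file (hypothetical; use your own).
start_vault_recovery() {
  vault server -recovery -config="$1"
}

# Example (runs in the foreground until interrupted):
#   start_vault_recovery /etc/vault.d/vault.hcl
```

If a service manager starts Vault in your environment, the same flag would instead go into that service's startup command.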
When you have one Vault server operating in recovery mode, generate a recovery mode operation token, and then use that token for all operations in this tutorial.
Generate recovery mode operation token
All examples of querying the /sys/raw endpoint demonstrated in this tutorial require the use of a recovery mode operation token. You will generate one here, as an example of the process, with the vault CLI using vault operator generate-root.
Generate a one-time password (OTP).
Use the OTP value to initialize the token generation process.
Example output:
You must pass in a quorum of unseal or recovery keys as necessary to generate the encoded token.
Enter the unseal key (or recovery key if you use auto-unseal) when prompted. The successful output resembles this example and includes the encoded token.
Decode the encoded token to produce the recovery mode operation token.
Example output:
Note the r prefix designating this as a recovery mode operation token.
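The four steps above can be sketched as follows. This is a minimal sketch, assuming the -recovery-token flag of vault operator generate-root as described in the recovery mode documentation. The function names are hypothetical and exist only to label each step; the OTP and encoded token values come from your own command output.

```shell
# Step 1: print a one-time password (OTP) for recovery token generation.
recovery_generate_otp() {
  vault operator generate-root -generate-otp -recovery-token
}

# Step 2: initialize the generation process with that OTP ($1).
recovery_init() {
  vault operator generate-root -init -otp="$1" -recovery-token
}

# Step 3: submit one unseal or recovery key; Vault prompts for the key.
# Repeat until the quorum is met and Vault returns an encoded token.
recovery_submit_key() {
  vault operator generate-root -recovery-token
}

# Step 4: decode the encoded token ($1) with the OTP ($2).
recovery_decode() {
  vault operator generate-root -decode="$1" -otp="$2" -recovery-token
}
```

Run the functions in order on the recovery mode server, passing the OTP from step 1 into steps 2 and 4.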
Use the value of the recovery mode operation token that you generate for all examples of listing and reading /sys/raw/... paths throughout the tutorial.
To avoid passing the recovery mode operation token with each command, set it as the value of the VAULT_TOKEN environment variable.
Note
Be sure to use this recovery mode operation token to inspect data as you follow this tutorial.
Inspect Vault data
After generating a recovery mode operation token, you are ready to begin inspecting data with the /sys/raw API endpoint.
Note
When inspecting Vault data in the Integrated Storage backend via recovery mode, you must prefix any paths you use to directly access items from other storage backends with /sys/raw.
New Vault server data example
Get to know some Vault data by viewing an actual example with descriptions of each element. When Vault is first initialized and unsealed, the persisted data will resemble this example.
A total of 73 key/value pairs are present in this example, representing all the data necessary for Vault to begin operations. A Vault server that is in production will have considerably more data and key/value pairs related to its specific auth methods and secrets engines.
Here is a brief explanation of each major branch and the elements within them from the example.
- core: Items contained in core are critical and internal to Vault operations; these include data about internal auditing, authentication, keyring, mounts, the root key, the seal configuration, cluster information, HSM barrier unseal keys, seal wrapping, and more.
- index: This is local index data used by the Performance Standby feature.
- index-dr: This is index data for the Disaster Recovery mode of Vault Enterprise Replication.
- logical: Dynamic secret configuration and static secrets are here.
- sys: System data includes policy configuration along with tokens and their accessors.
- wal: Write ahead logs (WAL) are present in Vault Enterprise installations to support the Performance Standby feature and assist with enabling Enterprise Replication.
Those are the basics for now.
You will continue by inspecting secrets engine data.
Secret engine data example
A common question about Vault secret data during support and operations troubleshooting scenarios is: "What is the number of secrets in Vault for a given secrets engine?"
To answer this question, first develop some understanding of the secrets engine data storage structure with further examples.
Vault stores the secrets engine data at /sys/raw/logical/<UUID>/ where <UUID> represents a unique identifier for each secrets engine enabled.
When you initialize a new Vault and unseal it, just the identity secrets engine gets configured in the storage as shown in the example data:
After Vault is further configured and additional secrets engines are enabled, the logical path holds more secrets engine data.
For example, here is a tree view of example secrets engine data with detailed explanation of each element.
The preceding example shows paths for several secrets engines in the root namespace; here are details on each secrets engine and its associated elements:
- e2b7c3e2-3e21-3391-b73c-8a991a65789d: A KV Secrets Engine - Version 2 containing internal configuration and metadata along with the secret data versions found under the versions key.
- 2788376d-7042-4737-1ebd-9f6391a01f4e: A PKI secrets engine which represents the root Certificate Authority (CA). It has the CA information, the Certificate Revocation List (CRL) data, the URL configuration, internal configuration (with a CA bundle), and a role, in this case called tacobot-root.
- b7183aba-6e64-e001-fe57-3e7e4508fc0c: A PKI secrets engine which represents the intermediate CA. It has the CA information, the CRL data, the URL configuration, internal configuration (with a CA bundle), and a role, in this case called tacobot-int. Note also that it has a certs key with some certificate serial numbers present, which represent the certificates issued from the tacobot-int role.
- cb1bfb31-3ccb-ef29-6352-874902c3a021: A Database Secrets Engine with configuration and roles for MongoDB and MySQL.
- d1689597-4f78-a30b-7532-e7806be9fcba: The Identity Secrets Engine, which is the identity management solution for Vault and is enabled by default. You cannot disable or move this secrets engine.
- fbd73ad9-4f9c-45be-5be2-3758d04808af: A Cubbyhole Secrets Engine, which Vault enables by default. You cannot disable or move this secrets engine, or enable it more than once.
Now that you are familiar with the shape of data in Vault, try a workflow where you inspect some data in a Vault cluster that uses the Integrated Storage backend.
Here are some examples of different data points available from inspecting Vault Integrated Storage, along with CLI and HTTP API command examples.
Auth method data
The following are examples for getting information about enabled auth methods and their associated users.
List enabled auth methods
This is like the vault auth list command or using the List Auth Methods API. The output has auth methods described by their internally assigned UUIDs instead of their human friendly names.
Use vault list in combination with jq to list enabled auth methods.
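A minimal sketch of such a listing, assuming that auth method data sits under sys/raw/auth (consistent with the sys/raw/auth/$UUID/user path used in this tutorial) and that VAULT_TOKEN holds the recovery mode operation token. The function name list_raw_keys is hypothetical.

```shell
# List the keys stored directly under a raw storage path, one per line.
# $1: raw storage path, for example sys/raw/auth
list_raw_keys() {
  vault list -format=json "$1" | jq -r '.[]'
}

# Example:
#   list_raw_keys sys/raw/auth
```

Each line of output is the UUID of one enabled auth method, followed by a trailing slash because it is a folder in storage.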
Count auth method users
You can get a count of existing users for a given auth method, such as the Username and Password auth method, from the sys/raw/auth/$UUID/user path.
You should replace the $UUID part (b8acd19c-875d-8e19-3252-ebc1ca1ea936) of the example path with the value of an actual auth method UUID in your own Vault data.
Use vault list in combination with jq to list the number of users configured in the specified auth method.
32 users appear configured for this username and password auth method.
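A count like this can be produced with a command along the following lines. This is a sketch, assuming the sys/raw/auth/$UUID/user path described above; the function name count_auth_users is hypothetical, and the UUID in the usage comment is the example value from this tutorial, not one from your data.

```shell
# Count the users stored for a Username and Password auth method.
# $1: UUID of the auth method, taken from your own auth method listing.
count_auth_users() {
  vault list -format=json "sys/raw/auth/$1/user" | jq 'length'
}

# Example:
#   count_auth_users b8acd19c-875d-8e19-3252-ebc1ca1ea936
```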
Secrets engines data
Secrets engine data are under the path sys/raw/logical.
The following are examples for getting information about enabled secrets engines and their associated secrets.
List enabled secrets engines
This is like the vault secrets list command or using the List Mounted Secrets Engines API. The output has secrets engines described by their internally assigned UUIDs instead of their human friendly names.
Use vault list in combination with jq to list enabled secrets engines.
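The same listing is available over the HTTP API. The following is a minimal sketch, assuming the /sys/raw/logical/<UUID>/ layout described earlier, that VAULT_ADDR points at the recovery mode server, and that VAULT_TOKEN holds the recovery mode operation token. The function name is hypothetical.

```shell
# List secrets engine UUIDs with the List Raw API, one per line.
list_secrets_engine_uuids() {
  curl --silent --header "X-Vault-Token: $VAULT_TOKEN" \
    --request LIST "$VAULT_ADDR/v1/sys/raw/logical" | jq -r '.data.keys[]'
}

# Example:
#   list_secrets_engine_uuids
```

Unlike the CLI, which prints the keys as a bare JSON array, the HTTP API wraps them in a data.keys field, which is why the jq filter differs.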
Token and accessor data
Active tokens and their accessors are under the path sys/raw/sys/token.
NOTE: If you are familiar with using the consul kv command or Consul HTTP API to inspect Consul data, you might know that those APIs offer recursive key listing. The Vault API does not support this kind of recursive listing, so examples are more focused in this tutorial. You need to manually total counts for all auth methods, secrets engines, and leases when attempting to recursively list them yourself.
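Manual totals like these can be scripted. The following is a minimal sketch, assuming that folder-like keys in list output end with a trailing slash and that key names contain no whitespace or glob characters. The function name count_raw_leaves is hypothetical. It issues one list request per folder, so expect it to be slow on large trees.

```shell
# Recursively count leaf keys under a raw storage path.
# Keys ending in "/" are treated as folders and listed in turn.
# The body runs in a subshell so that each level keeps its own variables.
# $1: raw storage path without a trailing slash
count_raw_leaves() (
  total=0
  for key in $(vault list -format=json "$1" | jq -r '.[]'); do
    case "$key" in
      */) total=$((total + $(count_raw_leaves "$1/${key%/}"))) ;;
      *)  total=$((total + 1)) ;;
    esac
  done
  echo "$total"
)

# Example:
#   count_raw_leaves sys/raw/sys/token
```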
Here is an example for counting active tokens.
Use vault list in combination with jq to count active tokens.
Here is an example for counting active token accessors.
Use vault list in combination with jq to count active token accessors.
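Both counts can be sketched together. This sketch assumes that token entries are stored under sys/raw/sys/token/id and accessor entries under sys/raw/sys/token/accessor; confirm both sub-paths by listing sys/raw/sys/token on your own server first. The function names are hypothetical.

```shell
# Count the keys stored directly under a raw storage path.
count_raw_keys() {
  vault list -format=json "$1" | jq 'length'
}

# Print the number of active tokens and token accessors.
# The id and accessor sub-paths are assumptions; verify them first.
count_tokens_and_accessors() {
  echo "tokens: $(count_raw_keys sys/raw/sys/token/id)"
  echo "accessors: $(count_raw_keys sys/raw/sys/token/accessor)"
}
```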
Lease data
The following are some examples for getting information about leases associated with auth methods. As mentioned before in this tutorial, it's impossible to recursively list keys with the /sys/raw API, so you must be specific and manually total all paths when necessary.
Count auth method leases
Here is an example of listing the leases for an existing AppRole auth method enabled at the default path approle.
Use vault list in combination with jq to count leases in the specified auth method.
Vault stored 10 active leases for this approle auth method. You can use the different names of your auth method paths from your own auth method list output to check leases in other auth methods.
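A lease count of this kind might be obtained as sketched below. The storage location sys/raw/sys/expire/id/auth/<mount>/login is an assumption about where Vault keeps login leases for an auth method; verify it by listing sys/raw/sys/expire/id on your own server. The function name is hypothetical.

```shell
# Count login leases for an auth method mount.
# $1: auth method mount path, for example approle
# The sys/expire/id prefix is an assumption; verify it on your server.
count_auth_leases() {
  vault list -format=json "sys/raw/sys/expire/id/auth/$1/login" | jq 'length'
}

# Example:
#   count_auth_leases approle
```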
Write ahead log data
Write Ahead Logs (WALs) are under the path sys/raw/wal/logs/.
Count WAL logs
First, here is a plain list output example.
In this case, the output is a single containing key named 00000000, in which each individual WAL object resides.
If you get the length of this key, the value should represent the count of WALs in 00000000.
Use vault list to get a count of Write Ahead Logs (WAL) from the storage.
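A sketch of such a count, assuming the containing key is 00000000 as in the description above; pass a different key name if your own listing of sys/raw/wal/logs/ shows one. The function name count_wals is hypothetical.

```shell
# Count WAL objects stored under one containing key.
# $1: containing key name (optional; defaults to 00000000)
count_wals() {
  vault list -format=json "sys/raw/wal/logs/${1:-00000000}" | jq 'length'
}

# Example:
#   count_wals
```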
These examples should be enough to get you started in inspecting your own Vault data when it becomes necessary to get specific answers to aid in troubleshooting.
What about Vault Enterprise Namespaces?
Vault Enterprise version 0.11.0 introduced the Enterprise Namespaces feature.
This changes the earlier procedures slightly, in that each namespace will encapsulate its own leases and tokens in paths under the namespace internal storage path name.
This is an example tree of paths from a minimal Vault instance for purposes of illustration:
The actual user-configured namespace name is example-namespace, but Vault stores it internally as a short unique identifier instead; in the above example, it's 5Gsx8.
Once you have determined the storage path for the namespace, you can compose commands similar to those shown earlier for the root namespace and run them against your namespaces.
Here is an example of listing the leases for an existing AppRole auth method enabled at the default path approle in the example-namespace (5Gsx8) namespace.
Use vault list in combination with jq to count leases in the specified auth method.
Likewise for a count of active tokens, use the following example as a starting point.
After inspecting data, you can move on to stopping the recovery mode server, starting it again without recovery mode, and rejoining other servers in the cluster.
Stop recovery mode server and restart server
Starting Vault in recovery mode with Integrated Storage resets the cluster member list, effectively reducing cluster members to 1.
Once you finish inspecting the Vault data, you can stop the recovery mode server and then start it again without the -recovery flag. Then you can rejoin the other cluster servers to it, and re-establish a highly available cluster.
After you start Vault, verify that its status shows unsealed and active with vault status or the /sys/seal-status API before joining the standby servers.
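The seal check can be scripted as in this minimal sketch, which assumes the sealed field in the JSON output of vault status. The function name is hypothetical. It checks the seal state only, so still confirm that the server is active from the vault status output.

```shell
# Succeed (exit status 0) only when Vault reports that it is unsealed.
vault_is_unsealed() {
  [ "$(vault status -format=json | jq -r '.sealed')" = "false" ]
}

# Example:
#   vault_is_unsealed && echo "unsealed"
```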
Start and join standby servers
With one Vault server active and unsealed, you can join the standby servers. Due to the cluster size change from recovery mode, you must ensure all standby servers have no existing Vault data before you join them to the active server.
You have two choices for achieving this goal.
- Join all new standby servers; this is most helpful if there are irrecoverable issues with the servers, for example.
- Remove the contents of the path directory that holds data for each of the standby Vault servers before starting them.
Once you have decided and implemented a strategy, go ahead and start the standby Vault servers and join each of them to the active server.
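Joining can be sketched as follows, assuming you run vault operator raft join on each freshly started standby server. The function name and the address in the usage comment are hypothetical; use the API address of your own active server.

```shell
# Join this standby server to the cluster led by the active server.
# $1: API address of the active server
join_raft_cluster() {
  vault operator raft join "$1"
}

# Example:
#   join_raft_cluster http://vault-0:8200
```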
Note
If you know the connection details of all Vault servers beforehand, you can configure retry_join inside the storage stanza to automatically join the cluster.
Check cluster status
After joining all standby servers to the active server, you can check your cluster health with vault operator raft list-peers.
Example:
Here in the example output, you can learn that all 5 servers are up, and that vault-0 is the active leader.
Summary
In this tutorial, you learned about how Vault stores its operational data in the Integrated Storage backend. You also learned how to access these data through the /sys/raw API endpoint with Vault operating in recovery mode.
You should be able to use what you have learned here as a starting point in inspecting and measuring the characteristics most important to you about your Vault data.