Use hcdiag with Vault
HashiCorp Diagnostics (hcdiag) is a troubleshooting data-gathering tool that you can use to collect and archive important data from Consul, Nomad, Vault, and Terraform Enterprise (TFE) server environments. The information gathered by hcdiag is well-suited for sharing with teams during incident response and troubleshooting.
In this tutorial, you will:
- Run a Vault server in "dev" mode, inside an Ubuntu Docker container.
- Install hcdiag from the official HashiCorp Ubuntu package repository.
- Execute basic hcdiag commands against this Vault service.
- Explore the contents of files created by the hcdiag tool.
- Learn about additional hcdiag features and how to use a custom configuration file with hcdiag.
Prerequisites
You need Docker installed and running on your local machine for this tutorial. Installation instructions are available in the official Docker documentation.
Scenario introduction
In this tutorial, you will run a Docker container, and access that container through a shell. Inside the container environment, you will start the Vault service in dev-mode, install the hcdiag tool, and use it to gather data from Vault.
You will then unpack the file archive created by hcdiag and examine its contents to learn about what hcdiag gathers by default.
You can explore some example production outputs along with a deep dive explanation of the output.
You will also learn about some useful hcdiag options, including how to use a custom configuration file.
Set up the environment
Run an ubuntu Docker container in detached mode with the -d flag. The --rm flag instructs Docker to remove the container when it stops, and the -t flag allocates a pseudo-tty, which keeps the container running until you stop it.
Open an interactive shell session in the container with the -it flags.
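The two steps above might look like the following sketch. The container name vault-hcdiag and the ubuntu:22.04 image tag are assumptions chosen for illustration; the flags are the ones described above. The commands are wrapped in small shell functions so that you can define them first and call them from your host terminal when you are ready.

```shell
# Hypothetical container name and image tag; adjust to suit your setup.
CONTAINER_NAME="vault-hcdiag"
IMAGE="ubuntu:22.04"

# Start the container detached (-d), removed automatically on stop (--rm),
# with a pseudo-tty (-t) so that it keeps running.
start_container() {
  docker run -d --rm -t --name "$CONTAINER_NAME" "$IMAGE"
}

# Open an interactive shell (-it) inside the running container.
open_shell() {
  docker exec -it "$CONTAINER_NAME" /bin/bash
}
```

Run start_container first, then open_shell. Naming the container is a convenience so that later commands do not need the generated container ID.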
Tip
Your terminal prompt now looks different to show that you are in a shell inside the Ubuntu container; for example, it could resemble root@a931b3c8ca00:/#. Run the rest of the tutorial commands in this Ubuntu container shell.
Install dependencies
Update apt-get and install the necessary dependencies.
Create a working directory and change into it.
Install and start Vault
Add the HashiCorp repository:
Install the vault package.
Run the vault server in dev mode as a background process.
Vault operational log output will scroll in the standard output -- when you see the log stop scrolling, you'll be able to type commands at the prompt again.
Root token
At the end of Vault's start-up output, you'll see the Unseal Key and Root Token displayed, like this:
Set up the environment
Before you run hcdiag, you must first perform some initial environment configuration so that the tool knows how to communicate with and authenticate to Vault.
Set the VAULT_ADDR environment variable for use by hcdiag.
Set the VAULT_TOKEN environment variable. Use the Root Token that Vault generated during startup (review the Install and start Vault section).
Example:
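A minimal sketch of both exports, assuming the default dev-mode listener address. The token value shown is a placeholder, not a real token; substitute the Root Token from your own startup output.

```shell
# Dev-mode Vault listens on 127.0.0.1:8200 over plain HTTP by default.
export VAULT_ADDR="http://127.0.0.1:8200"

# Placeholder only: replace with the Root Token printed by your Vault server.
export VAULT_TOKEN="<root-token-from-startup-output>"
```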
You should now be able to authenticate to Vault and get some information:
Getting a "permission denied" or 403 error? If so, your VAULT_TOKEN is not set correctly. Review Vault's startup logs to find the root token to use as VAULT_TOKEN. If you can't find the startup logs, stop the vault process you started and repeat the earlier section of this tutorial.
Insecure operation
In this scenario, you use a dev mode Vault server and its initial root token. For production hcdiag use, you must use a token with enough capabilities to execute the vault CLI commands used by hcdiag. You can examine the output of Results.json from an hcdiag archive to discover the commands used, and create a suitable production policy that limits a token to the required commands.
Install and run the hcdiag tool
Install the latest hcdiag release from the HashiCorp repository.
This is a minimal environment, so make sure to set the SHELL environment variable:
Run hcdiag to collect all available environment information for Vault.
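The two steps above can be sketched as follows. The value /bin/sh is an assumption about which shell the minimal container provides, and the guard around hcdiag is only there so that the snippet is harmless on a machine where the tool is not installed.

```shell
# Minimal containers may not define SHELL; /bin/sh is an assumed value.
export SHELL=/bin/sh

# Collect Vault diagnostics; skipped when hcdiag is not on the PATH.
if command -v hcdiag >/dev/null 2>&1; then
  hcdiag -vault
fi
```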
Tip
This is a minimal environment which doesn't use some system services that hcdiag uses to gather information; you can expect to observe errors related to those services.
Tip
You can also invoke hcdiag without options to gather all available environment and product information. To learn about all executable options, run hcdiag -h.
Examine results
What did hcdiag produce in the brief moments while running?
List the directory for tar+gzip archive files to discover the file that hcdiag created.
You can unpack the archive and further examine its contents:
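Listing and unpacking can be sketched together as below. The hcdiag-*.tar.gz filename pattern is an assumption inferred from the directory name used in this tutorial; adjust it if your archive is named differently.

```shell
# List and unpack every hcdiag bundle found in the current directory.
unpack_bundles() {
  for archive in hcdiag-*.tar.gz; do
    # Skip the literal pattern when no archive matches.
    [ -e "$archive" ] || continue
    ls -l "$archive"
    tar -xzf "$archive"
  done
}

unpack_bundles
```

Each archive unpacks into a directory of the same name without the .tar.gz suffix.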
Use the hcdiag redaction feature to ensure that this bundle holds information that is appropriate to share based on your specific use cases.
Vault Enterprise users
You can share the output from hcdiag runs with HashiCorp Customer Support to greatly reduce the amount of information gathering needed in a support request.
The tool works locally, and does not export or share the diagnostic bundle with anyone. You must use other tools to transfer it to a secure location so you can share it with specific support staff who need to view it.
After you unpack the archive, the directory hcdiag-2022-08-18T170538Z contains three files, which the following section describes.
Example production output
Here is a deeper dive into the output files and their contents for further clarification.
Manifest.json
The manifest has JSON data representing details about the hcdiag run.
Here is an example.
From this output, you can learn things like products queried, the duration of the run, and the presence of errors.
Results.json
The results file has detailed information about the host and Vault environment. Because the file is large, it is best parsed and queried for specific answers with a tool like jq.
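As a cautious starting point, the sketch below only lists the top-level keys of the file, which makes no assumption about the deeper layout of Results.json beyond its top level being a JSON object. It assumes jq is installed; results_keys is a hypothetical helper name.

```shell
# Print the top-level keys of a Results.json file, one per line.
# Assumes jq is installed and the file's top level is a JSON object.
results_keys() {
  jq -r 'keys[]' "$1"
}
```

For example, results_keys hcdiag-2022-08-18T170538Z/Results.json shows which sections exist before you write more specific queries.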
The debug file
The debug tarball holds the results of invoking the vault debug command.
The following is a tree of output files produced by unpacking the hcdiag-2022-08-18T170538Z/VaultDebug.tar.gz file.
The first entry is a timestamped directory (names take the form 2022-02-24T14-50-47Z) containing runtime profiling information and goroutine data, gathered from the running Vault processes with the Go pprof utility.
These profiles are essentially collections of stack traces and their associated metadata. They are most useful to engineers who are familiar with the related Vault source code and are debugging an issue.
Here is a breakdown on the contents of each file.
- allocs.prof: All past memory allocations.
- block.prof: Stack traces which led to blocking on synchronization primitives.
- goroutine.prof: Traces on all current goroutines.
- goroutines.txt: Listing of all goroutines.
- heap.prof: Memory allocation of live objects.
- mutex.prof: Stack traces for holders of contended mutexes.
- profile.prof: CPU profile information.
- threadcreate.prof: Stack traces that led to creation of new OS threads.
- trace.out: CPU trace information.
Visualizing profile information is typically performed with the pprof command by passing in the filename of a .prof file. If you have an established Go environment, you can use it to examine these files.
You can use the pprof tool in both interactive and non-interactive modes. Here are some example non-interactive invocations of the tool against the example data to familiarize you with some of its outputs.
The first example lists the top 10 entries from the 2022-02-10T16-41-35Z CPU profile:
This shows CPU time and usage for functions in use by Vault.
Another example for examining memory usage would be to use the same command against the heap file instead.
You can also generate SVG based call graphs. For example, to generate a graph of goroutines, you would use a command like this.
This will generate an SVG image and open it in the default handler for such images on your system. You can learn more about interpreting call graphs in the pprof documentation.
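The invocations described above can be sketched as two helpers. This assumes a working Go toolchain, plus Graphviz for the call graph; the function names are hypothetical, and the path you pass depends on where you unpacked the debug archive.

```shell
# Print the top 10 entries of a profile, such as profile.prof or heap.prof.
pprof_top10() {
  go tool pprof -top -nodecount=10 "$1"
}

# Render a call graph as SVG and open it in the default viewer.
pprof_graph() {
  go tool pprof -web "$1"
}
```

For example, call pprof_top10 with the path to profile.prof for CPU data or heap.prof for memory data, and pprof_graph with the path to goroutine.prof for a goroutine call graph.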
Tip
If you are new to pprof, there is an excellent article on pprof that explains it thoroughly.
The second directory, 2022-02-24T14-50-47Z, has the same information gathered 10 seconds later.
These are the remaining files which form the debug archive:
- config.json: A JSON representation of the current Vault server configuration.
- host_info.json: Detailed host resource information about CPU, filesystems, memory, etc.
- index.json: A summary of the hcdiag run and files gathered.
- metrics.json: Vault Telemetry metrics data.
- replication_status.json: The current Vault server's Enterprise Replication status.
- server_status.json: The output from the seal-status API.
- vault.log: Entries from the Vault server operational log captured during the hcdiag run.
Configuration file
You can configure hcdiag's behavior with a HashiCorp Configuration Language (HCL) formatted file. Using this file, you can configure behavior by adding your own custom runners, redacting sensitive content using regular expressions, excluding commands, and more.
To run hcdiag with a custom configuration file, create the file and point hcdiag at it with the -config flag:
Tip
This minimal environment doesn't ship with most common command-line text editors, so you'll want to install one with apt-get install nano or apt-get install vim, depending on which one you prefer.
Here is a minimal configuration file, which does two things:
- It adds an agent-level (global) redaction, which instructs hcdiag to redact all sensitive content relating to Vault tokens when they occur in the format you saw earlier in this tutorial while starting the Vault service. This is a slightly contrived example; refer to the official hcdiag documentation for more detailed information about how to redact sensitive content.
- It instructs hcdiag to exclude the vault debug command, shown as an example.
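A file with those two behaviors might be written as in the sketch below, which creates diag.hcl without needing a text editor. Treat every detail as an assumption to verify against the official hcdiag documentation: the agent, redact, and product block names, the match, replace, and excludes attributes, the replacement string, and the regular expression, which assumes tokens of the form hvs. followed by letters, digits, underscores, or hyphens.

```shell
# Write a hypothetical diag.hcl. Block and attribute names are assumptions
# about hcdiag's configuration format; verify them before relying on this.
cat > diag.hcl <<'EOF'
agent {
  # Agent-level (global) redaction of strings shaped like "hvs.<characters>".
  redact "regex" {
    match   = "hvs\\.[A-Za-z0-9_-]+"
    replace = "<REDACTED>"
  }
}

product "vault" {
  # Skip the runner that invokes the vault debug command.
  excludes = ["vault debug"]
}
EOF
```

The quoted heredoc delimiter keeps the shell from interpreting the backslashes, so the file receives the regular expression exactly as written.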
If you created this file as diag.hcl and executed hcdiag as follows, then you could expect output like this:
If you compare this output to that of the hcdiag invocation you ran earlier, you'll notice that the Vault Debug information is not present in this example.
Cleanup
Exit the Ubuntu container to return to your terminal prompt.
Stop the Docker container. Docker will automatically delete it because the --rm flag instructs it to do so when the container stops.
Production usage tips
By default, the hcdiag tool includes files for up to 72 hours back from the current time. You can specify the desired time range using the -include-since flag.
If you have concerns about impacting performance of your Vault servers, you can ensure that runners run serially, instead of concurrently, by invoking hcdiag with the -serial flag.
Deploying hcdiag in production involves a workflow like the following:
- Place the hcdiag binary on a system that is capable of connecting to the Vault server targeted by hcdiag, such as a bastion host or the host itself.
- When running with a configuration file and the -config flag, ensure that the specified configuration file is readable by the user that executes hcdiag.
- Ensure that the current directory, or the directory specified by the -dest flag, is writable by the user that executes hcdiag.
- Ensure connectivity to the HashiCorp products that hcdiag needs to connect to during the run. Export any required environment variables for establishing connections or passing authentication tokens as necessary.
- Decide on a duration for information gathering, noting that the default is to gather up to 72 hours back in server log output. Adjust the window as necessary with the -include-since flag. For example, -include-since 24h includes 24 hours of log output.
- Limit what is gathered with the -includes flag. For example, -includes /var/log/vault-*,/var/log/nomad-* instructs hcdiag to only gather logs matching the specified Vault and Nomad filename patterns.
- Use redaction to prevent sensitive information like keys or passwords from reaching hcdiag's output or the generated bundle files.
- Use the -dryrun flag to observe what hcdiag would do, without actually doing anything, when testing configuration and options.
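Several of the flags from this section can be combined in one invocation. The sketch below is one possible combination rather than a recommended default: it previews a serial Vault run limited to the last 24 hours. HCDIAG_ARGS is a hypothetical variable name, and the guard only keeps the snippet harmless where hcdiag is not installed.

```shell
# Preview (-dryrun) a Vault run that executes runners serially (-serial)
# and limits gathered logs to the last 24 hours (-include-since).
HCDIAG_ARGS="-vault -serial -include-since 24h -dryrun"

if command -v hcdiag >/dev/null 2>&1; then
  # Word splitting of HCDIAG_ARGS into separate flags is intentional here.
  hcdiag $HCDIAG_ARGS
fi
```

Once the preview looks right, remove -dryrun from the argument list to perform the actual collection.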
Summary
In this tutorial, you learned about the hcdiag tool, and used it to gather information from a running Vault server environment. You also learned about some of hcdiag's configuration flags, the configuration file, and production specific tips for using hcdiag.
Next Steps
For additional information about the tool, check out the hcdiag GitHub repository.
There are also hcdiag guides for other HashiCorp tools including Nomad, Terraform, and Consul.