Query audit device logs
Production installations of Vault typically operate with one or more enabled audit devices. Audit devices log details of all external requests and responses for the specific purpose of performing audits on those data.
Note
Audit device logs are separate and unrelated to Vault operational logs. Operational logs are typically gathered by the operating system journal from standard output and standard error while Vault is running, and hold a different set of information.
You can enable an audit device for output to a variety of destinations including static files, TCP, UDP, or Unix sockets, and syslog. Regardless of the configured destination, the resulting output is always in JSON format.
More details, including an example audit device log entry, are in the Troubleshooting Vault tutorial.
Challenge
Audit device logs offer detailed information for troubleshooting, such as counts for operations against a specific Vault endpoint or the IP addresses of hosts responsible for generating specific errors.
You can ingest these logs into a log aggregation solution that provides dashboards and querying.
Without access to such a solution, you might find it useful to have an alternative way to query audit device logs in an ad-hoc manner during troubleshooting scenarios.
Solution
You can query the log file from an enabled File Audit Device in an ad-hoc manner using a common command line tool. You can access important details which could help resolve the troubleshooting scenario from the audit device log when other solutions for querying aggregated logs are unavailable.
Notes and prerequisites
This tutorial explains how to query the logs written by a File Audit Device using the popular command line utility jq. You will examine data points found in these logs, which are useful troubleshooting examples.
To complete the scenario in this tutorial you need the following.
- Download the example log file from hashicorp-education
- Install jq if not already installed
This tutorial includes examples validated for Linux, but you can apply the same principles on other operating systems. jq is a cross-platform tool, so these examples will work for you on other platforms as well.
Audit device filters
Starting in Vault 1.16.0, you can enable audit devices with a filter option that Vault uses to evaluate audit entries to determine whether it writes them to the log. You should determine if your own audit devices are filtered and make necessary changes to expose the log fields which you need to monitor for your use case.
To learn about Vault filtering concepts, filtering audit entries, and how to enable audit filters, refer to the documentation.
Example log file preparation
Whenever you reference an audit device log filename in the examples, it will appear as $AUDIT_LOG_FILE. Replace this value with the actual filename of the audit device log so that examples work as-is.
Clone the repository.
Export the AUDIT_LOG_FILE environment variable to the example log file path.
You are now prepared to try the examples shown in this tutorial.
All errors and their timestamps
For a simple example, list all non-null error fields along with their corresponding timestamps. This helps you gain some insight into the volume and error types logged by the audit device.
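A minimal sketch of such a query, assuming each log line is a JSON object with the top-level time and error fields that Vault audit devices write. The function name errors_with_timestamps is hypothetical and is not part of the original tutorial.

```shell
# Print the timestamp and error of every entry whose top-level
# error field is present and not null, one compact JSON object per line.
# $1 is the log file; it defaults to $AUDIT_LOG_FILE.
errors_with_timestamps() {
  jq -c 'select(.error != null) | {time, error}' "${1:-$AUDIT_LOG_FILE}"
}

# Usage: errors_with_timestamps "$AUDIT_LOG_FILE"
```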
If this command returns nothing, there are no errors in the log file. Otherwise, the results resemble this example.
In this simple example, you can observe that there are some errors, which break down as follows:
- "permission denied": The client token used in the request lacks the required capabilities for the requested endpoint; this is perhaps the most common error you will find in audit device logs.
- "missing client token": A client made a request to an authenticated endpoint without providing a client token.
- "unsupported path": A client made a request against an auth method or secrets engine using an unsupported endpoint path (possibly due to a typo).
Tip
While this is just a simple and compact file for this tutorial, you can expect to find a larger volume of more varied errors in a heavily used Vault environment.
HMAC hashed errors and their timestamps
Sensitive information, including details returned in errors, is hashed with a salt using HMAC-SHA256, as described in the Sensitive Information section of the Audit Devices documentation.
To locate these hashed errors and their timestamps, use a query like this example.
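One way to write such a query is sketched here, assuming the hashed error text sits in the .response.data.error field of response entries. The function name hashed_errors_with_timestamps is hypothetical.

```shell
# Print the timestamp and hashed error of every entry that carries
# a non-null .response.data.error value.
# $1 is the log file; it defaults to $AUDIT_LOG_FILE.
hashed_errors_with_timestamps() {
  jq -c 'select(.response.data.error != null)
         | {time, error: .response.data.error}' "${1:-$AUDIT_LOG_FILE}"
}

# Usage: hashed_errors_with_timestamps "$AUDIT_LOG_FILE"
```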
The results show some HMAC hashed errors, which appear as the hash algorithm type and the actual hashed information separated by a colon.
You can use the hashed values compared against known error payloads to find matches.
For a simple example, suppose you have AppRole auth method login failures with the error "invalid secret id". These appear in the .response.data.error field and Vault hashes the value.
If you want to find the corresponding entries in the audit device log, you can use the /sys/audit-hash API to compare a known value with a hashed value. Review the API documentation for more details.
HMAC hash calculation example
Note
The HMAC used by an audit device is unique to that device. The example provided here is a reference for the process involved to calculate a hash using the sys/audit-hash API.
From the earlier example output, note that the second and third entries have the same hash value (hmac-sha256:8c436a490d2dd8e2410c5a67d2e2663a09f2e0e861cb4dbf6c224d02cc84f2e3).
Presuming you have enabled a file audit device at the path file, you can use this command to compare the hash value with the string to learn if they match.
You specify the file audit device as the last part of the path in the API request to select the correct audit device for calculating hashes against. Pass in the string "invalid secret id" as the value of input to compare its hash.
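The request can be sketched as follows, assuming VAULT_ADDR and VAULT_TOKEN are set in your environment and the token is allowed to use the sys/audit-hash endpoint. The function names audit_hash_payload and audit_hash are hypothetical helpers, not part of Vault.

```shell
# Build the JSON request body; jq handles quoting of the input string.
audit_hash_payload() {
  jq -cn --arg input "$1" '{input: $input}'
}

# POST the payload to sys/audit-hash/<device path> and print the raw response.
# $1 is the audit device path (for example: file), $2 is the plaintext string.
# Assumes VAULT_ADDR and VAULT_TOKEN are exported.
audit_hash() {
  curl --silent \
    --header "X-Vault-Token: $VAULT_TOKEN" \
    --request POST \
    --data "$(audit_hash_payload "$2")" \
    "$VAULT_ADDR/v1/sys/audit-hash/$1"
}

# Usage: audit_hash file "invalid secret id"
```

Compare the hash in the response with the hashed value from the log. If they are identical, the log entry contains the string you passed in.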
Successful response:
In this example, there is a match. You can conclude that the last 2 of the 3 example HMAC hashed error lines indeed contain the error "invalid secret id".
Keep in mind that this technique works only for logs whose originating audit device you can still access, because Vault needs that device to calculate the hashes.
Counts and specific details
Sometimes being able to group related items from the audit device logs by their count is helpful to spot outliers and other problems.
Count all requests and responses
You can count the occurrences of requests and responses like this.
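A minimal sketch, assuming every entry has a top-level type field set to request or response. The function name count_by_type is hypothetical. The --slurp (-s) option reads the whole file into memory as one array, which is fine for ad-hoc work but may be slow on very large logs.

```shell
# Group all entries by their type field and report a count per type.
# $1 is the log file; it defaults to $AUDIT_LOG_FILE.
count_by_type() {
  jq -c -s 'group_by(.type) | map({type: .[0].type, count: length})' \
    "${1:-$AUDIT_LOG_FILE}"
}

# Usage: count_by_type "$AUDIT_LOG_FILE"
```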
This combined number of requests and responses (1397) should equal the total number of lines in the file, which you can confirm by counting lines, for example with the wc command.
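As a quick cross-check, this sketch prints the number of JSON entries that jq parses next to the line count from wc. It assumes one JSON object per line, and the function name entry_and_line_counts is hypothetical.

```shell
# Print the number of parsed JSON entries and the number of lines in the file.
# The two values should match when the log holds one entry per line.
# $1 is the log file; it defaults to $AUDIT_LOG_FILE.
entry_and_line_counts() {
  file="${1:-$AUDIT_LOG_FILE}"
  entries=$(jq -s 'length' "$file")
  # tr strips the padding that some wc implementations add.
  lines=$(wc -l < "$file" | tr -d ' ')
  printf 'entries=%s lines=%s\n' "$entries" "$lines"
}

# Usage: entry_and_line_counts "$AUDIT_LOG_FILE"
```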
Successful output:
Response display names
To break out the authentication display name counts for responses, try this example query, which sets up a map of display names and counts where the values are not null.
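One possible form of this query, assuming the display name is stored in .auth.display_name on response entries. The function name response_display_names is hypothetical.

```shell
# Count response entries per non-null auth display name, busiest first.
# $1 is the log file; it defaults to $AUDIT_LOG_FILE.
response_display_names() {
  jq -c -s '
    map(select(.type == "response" and .auth.display_name != null)
        | .auth.display_name)
    | group_by(.)
    | map({display_name: .[0], count: length})
    | sort_by(.count) | reverse
  ' "${1:-$AUDIT_LOG_FILE}"
}

# Usage: response_display_names "$AUDIT_LOG_FILE"
```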
Successful output:
It's evident from this output that the busiest display names accessing this Vault are token, token-lab-admins, and userpass-lab-user-7.
Request operations
This query breaks out all request operation types by count.
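A sketch of such a query, assuming the operation type is stored in .request.operation on request entries. The function name request_operations is hypothetical.

```shell
# Count request entries per operation type, most frequent first.
# $1 is the log file; it defaults to $AUDIT_LOG_FILE.
request_operations() {
  jq -c -s '
    map(select(.type == "request") | .request.operation)
    | group_by(.)
    | map({operation: .[0], count: length})
    | sort_by(.count) | reverse
  ' "${1:-$AUDIT_LOG_FILE}"
}

# Usage: request_operations "$AUDIT_LOG_FILE"
```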
Successful output:
From this output, you can observe that most operations against this Vault server during the period covered by the audit device logs are update and read, followed by create and list. In contrast, fewer delete operations occur during the same period.
Request paths
Counting requests per API endpoint shows you which endpoints are hot, which can be helpful for troubleshooting. Display the 5 busiest endpoints based on their request counts.
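A sketch of a top-N query, assuming the endpoint is stored in .request.path on request entries. The function name top_request_paths and its optional second argument are hypothetical conveniences.

```shell
# Count request entries per path and keep the N busiest paths.
# $1 is the log file (defaults to $AUDIT_LOG_FILE); $2 is N (defaults to 5).
top_request_paths() {
  jq -c -s --argjson n "${2:-5}" '
    map(select(.type == "request") | .request.path)
    | group_by(.)
    | map({path: .[0], count: length})
    | sort_by(.count) | reverse
    | .[:$n]
  ' "${1:-$AUDIT_LOG_FILE}"
}

# Usage: top_request_paths "$AUDIT_LOG_FILE" 5
```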
Successful output:
From this output you know that the hottest path represented by these logs is auth/userpass/login/lab-user-7 with 204 requests. This represents the user lab-user-7 authenticating with the username and password auth method.
Errors by count
You can query for errors and get their counts like this.
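A minimal sketch, assuming the top-level error field is absent or null on entries without an error, so those entries group together under null. The function name errors_by_count is hypothetical.

```shell
# Group all entries by their top-level error value (null when there is
# no error) and report a count per value, most frequent first.
# $1 is the log file; it defaults to $AUDIT_LOG_FILE.
errors_by_count() {
  jq -c -s '
    group_by(.error)
    | map({error: .[0].error, count: length})
    | sort_by(.count) | reverse
  ' "${1:-$AUDIT_LOG_FILE}"
}

# Usage: errors_by_count "$AUDIT_LOG_FILE"
```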
Successful output:
Of note here, there are 1386 successful requests (where the error value is null) and 7 error occurrences. The "permission denied" errors are the result of using a token with insufficient capabilities to access an endpoint. You can view the complete request and response example to learn more about them.
The errors containing an asterisk (*) character originate from the logical backend (auth method or secrets engine).
The first, "permission denied", results from an attempt to list the enabled secrets engines. The second, "unsupported path", results from an attempt to list an unsupported endpoint in the AppRole auth method.
The "missing client token" error results from an attempt to access an authenticated endpoint without providing a valid client token.
Remote address by count
It can be handy to know the request frequency by the value of the remote_address field, for example when inexplicable activity occurs at a high volume. This can help you identify the responsible hosts so that you can correct or mitigate the issue.
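A sketch of such a query, assuming the client address is stored in .request.remote_address on request entries. The function name requests_by_remote_address is hypothetical.

```shell
# Count request entries per remote address, busiest host first.
# $1 is the log file; it defaults to $AUDIT_LOG_FILE.
requests_by_remote_address() {
  jq -c -s '
    map(select(.type == "request") | .request.remote_address)
    | group_by(.)
    | map({remote_address: .[0], count: length})
    | sort_by(.count) | reverse
  ' "${1:-$AUDIT_LOG_FILE}"
}

# Usage: requests_by_remote_address "$AUDIT_LOG_FILE"
```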
Successful output:
In the preceding example, a clear outlier is 10.10.42.222, as it has generated the majority of requests.
Path access by remote address
For a more complex query that breaks down API path access by count over each remote address, try this example. You can use what you have learned here to return just the top 5 hottest paths for each host, for example.
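One way to nest the two groupings is sketched here, assuming request entries carry both .request.remote_address and .request.path. The function name paths_by_remote_address is hypothetical.

```shell
# For each remote address, count request entries per path,
# listing the busiest paths first within each address.
# $1 is the log file; it defaults to $AUDIT_LOG_FILE.
paths_by_remote_address() {
  jq -c -s '
    map(select(.type == "request")
        | {addr: .request.remote_address, path: .request.path})
    | group_by(.addr)
    | map({
        remote_address: .[0].addr,
        paths: (group_by(.path)
                | map({path: .[0].path, count: length})
                | sort_by(.count) | reverse)
      })
  ' "${1:-$AUDIT_LOG_FILE}"
}

# Usage: paths_by_remote_address "$AUDIT_LOG_FILE"
```

To keep only the 5 hottest paths per host, you could append a slice such as .[:5] after reverse inside the paths expression.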
Successful output:
Tips
- Feel free to browse the example log file in your favorite JSON exploration tool to get an idea of the shape and structure of audit device data. Developing an idea of the fields present in the data helps you to later identify key information in the logs.
- Explore the jq manual to learn about building more advanced and specific queries.
Summary
In this tutorial, you learned how to use a common command-line utility to query a file audit device log. You can use what you have learned here to expand your understanding of the Vault audit device logs and as a tool for troubleshooting scenarios.