HCP Vault Dedicated logs and metrics overview

Audit log and metrics observability is essential for ensuring the performance and security of your HCP Vault Dedicated cluster. It's also useful for business operations, like understanding client-related usage. HCP Vault Dedicated metrics provide operational insights into:

Whether your cluster is adequately provisioned to handle existing and predicted workloads
Client access patterns and anomalies
Opportunities for optimizing client usage patterns to reduce cost

HCP Vault Dedicated metrics include critical Vault performance and usage metrics from the Vault telemetry endpoint, as well as host performance metrics. To reduce noise, the metrics available for HCP Vault Dedicated are scoped to best practice metrics that are actionable to users in a managed service context. This document details the metrics available to HCP Vault Dedicated production clusters, and provides guidance on detecting and addressing anomalous conditions.

Availability

Audit log and metrics streaming is not available for Development tier clusters.

For detailed instructions on how to configure HCP Vault Dedicated audit log and metrics streaming, refer to the specific provider documentation in the left navigation menu. Unless otherwise noted, the HCP Vault Dedicated sample dashboards average all gauge metrics and aggregate per-node metrics across the cluster.

Connectivity considerations

Metrics and audit logs are streamed directly from each Vault node in a cluster. When you configure a peering (AWS and Azure) or transit gateway (AWS only) connection, you can stream metrics and audit logs using a supported integration such as the generic HTTP sink to a private address in the connected network.

For external services, streaming is performed over the internet.

Audit log availability

Audit logs are available for download from the HashiCorp Cloud Platform (HCP) for 30 days. When you stream audit logs to a third-party service, their availability in that service depends on how the service is configured, but the logs remain available for download from HCP.

Vault system metrics

Due to differences in third-party observability platform metrics naming conventions, there may be slight differences in the metrics name formatting depending on the third-party integration. This document will use the metrics naming convention that reflects metrics exported by the Vault telemetry endpoint to align with existing Vault reference documentation.

Sealed status (vault.core.unsealed)

Metric source	Description	Unit	Type
Vault	This Boolean metric indicates whether a cluster node has been sealed by a user or during startup.	bool	gauge

For this metric, a value of 1 indicates Vault is unsealed, whereas 0 means that Vault is sealed.

Why it is important:

By default, Vault is sealed on startup, so if this value changes to 0 unexpectedly, Vault has restarted. Vault won't respond to client requests until it is unsealed.

What to look for:

The HCP Vault Dedicated sample dashboards will display "Unsealed" if at least one node is accepting requests. HashiCorp operations also monitors sealed status in the background, and will be alerted if one or more of a cluster's nodes unexpectedly report as sealed.
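The display rule above can be sketched in a few lines of Python. This is only an illustration, assuming you have already collected one vault.core.unsealed sample per node. The function name, node names, and sample values are hypothetical and are not part of any HCP or Vault API.

```python
def cluster_seal_status(node_values):
    """Summarize per-node vault.core.unsealed samples (1 = unsealed, 0 = sealed)."""
    # Nodes reporting 0 are sealed and are worth investigating individually.
    sealed_nodes = sorted(node for node, value in node_values.items() if value == 0)
    # Display rule: "Unsealed" when at least one node is accepting requests.
    any_unsealed = any(value == 1 for value in node_values.values())
    display = "Unsealed" if any_unsealed else "Sealed"
    return display, sealed_nodes


# Hypothetical samples; node names and values are illustrative only.
samples = {"node-0": 1, "node-1": 0, "node-2": 1}
display, sealed_nodes = cluster_seal_status(samples)
print(display, sealed_nodes)
```

Returning the sealed nodes alongside the display value keeps a partially sealed cluster visible even though the dashboard-style summary still reads "Unsealed".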

CPU utilization

These metrics represent system level CPU measurements. In the HCP Vault Dedicated sample dashboards, CPU Utilization is calculated as the ratio of CPU used (rate of CPU time - rate of CPU idle time) over the rate of CPU total time, as calculated from the following metrics. In the sample dashboards, all rates are calculated over 5 minute intervals.

host_cpu_seconds_total

Metric source	Description
host	This metric represents the total CPU time.

host_cpu_seconds_total (idle mode)

Metric source	Description
host	This metric represents the time the CPU was in an idle state.

Why it is important:

Encryption can place a heavy demand on the CPU. If the CPU is too busy, Vault may have trouble keeping up with the incoming request load. It is useful to compare requests and request latency metrics in context with CPU utilization to guide capacity planning.
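As a rough illustration of the CPU utilization ratio described in this section, the following Python sketch derives it from two readings of each counter taken 5 minutes apart. It assumes the total and idle values are cumulative CPU seconds already summed for one node. The function names and sample numbers are hypothetical.

```python
def rate(start_value, end_value, interval_seconds):
    """Per-second rate of increase of a cumulative counter over an interval."""
    return (end_value - start_value) / interval_seconds


def cpu_utilization(total_start, total_end, idle_start, idle_end, interval_seconds=300):
    """Ratio of (rate of CPU time - rate of CPU idle time) over rate of CPU total time."""
    total_rate = rate(total_start, total_end, interval_seconds)
    idle_rate = rate(idle_start, idle_end, interval_seconds)
    if total_rate <= 0:
        # No CPU time elapsed (or a counter reset); report zero rather than divide by zero.
        return 0.0
    return (total_rate - idle_rate) / total_rate


# Hypothetical counter readings taken 300 seconds apart.
utilization = cpu_utilization(10_000.0, 10_600.0, 8_000.0, 8_450.0)
print(f"{utilization:.0%}")
```

Most observability platforms provide a built-in rate function for counters, so in practice the same ratio is usually written as a dashboard query rather than computed by hand.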

Memory utilization

The following metrics represent host memory measurements. In the HCP Vault Dedicated sample dashboards, Memory Utilization is the ratio of used memory (memory capacity - unused memory) over the memory capacity, as calculated from the following metrics:

host_memory_total_bytes

Metric source	Description
host	This metric represents the total amount of physical memory (RAM) capacity on the server.

host_memory_available_bytes

Metric source	Description
host	This metric represents the total amount of unused physical memory (RAM) on the server.

Why it is important:

Vault requires sufficient memory to hold its working data set and if it exhausts available memory it can crash.
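The memory utilization ratio described above reduces to a one-line calculation. The sketch below assumes a single pair of samples from host_memory_total_bytes and host_memory_available_bytes for one node. The function name and values are illustrative only.

```python
def memory_utilization(total_bytes, available_bytes):
    """Ratio of used memory (capacity - unused) over memory capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used_bytes = total_bytes - available_bytes
    return used_bytes / total_bytes


GIB = 1024 ** 3
# Hypothetical node with 16 GiB of RAM and 4 GiB unused.
print(memory_utilization(16 * GIB, 4 * GIB))
```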

Disk utilization

These metrics represent host disk measurements. In the HCP Vault Dedicated sample dashboards, Disk Utilization is the ratio of used disk (disk capacity - unused disk) over the total disk capacity, as calculated from the following metrics:

host_filesystem_total_bytes

Metric source	Description
host	This metric represents the disk storage capacity.

host_filesystem_free_bytes

Metric source	Description
host	This metric represents unused disk storage.

Why it is important:

Disk utilization is critical to monitor to ensure your cluster has sufficient capacity for writing Vault secrets. It is useful to compare disk storage with Vault usage metrics to guide capacity planning.

What to look for:

Disk utilization that exceeds 80%.
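One way to apply the 80% guideline is to compute disk utilization per node and flag any node above the threshold. The sketch below assumes you have one (total, free) pair per node from host_filesystem_total_bytes and host_filesystem_free_bytes. The function names, node names, and byte values are hypothetical.

```python
DISK_ALERT_THRESHOLD = 0.80  # the 80% guideline from this section


def disk_utilization(total_bytes, free_bytes):
    """Ratio of used disk (capacity - unused) over total disk capacity."""
    return (total_bytes - free_bytes) / total_bytes


def nodes_over_threshold(samples, threshold=DISK_ALERT_THRESHOLD):
    """Return {node: utilization} for nodes whose utilization exceeds the threshold."""
    flagged = {}
    for node, (total_bytes, free_bytes) in samples.items():
        utilization = disk_utilization(total_bytes, free_bytes)
        if utilization > threshold:
            flagged[node] = utilization
    return flagged


GB = 10 ** 9
# Hypothetical per-node samples: (total bytes, free bytes).
samples = {"node-0": (100 * GB, 15 * GB), "node-1": (100 * GB, 40 * GB)}
print(nodes_over_threshold(samples))
```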

Auth requests (vault.core.handle_login_request.count)

Metric source	Description	Unit	Type
Vault	This metric represents the number of authentication requests handled by Vault core.	request	gauge

Why it is important:

This is a key measure of how busy Vault is with respect to client authentication requests. It is useful to follow this metric over time and compare the trend with host metrics to understand whether the cluster is adequately provisioned to handle anticipated traffic. An unexpected spike may also indicate a potential security threat.

What to look for:

Changes to the count or mean fields that exceed 50% of baseline values, or more than 3 standard deviations above baseline.
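The two conditions above can be expressed as a simple baseline check. The sketch below is one possible reading, not a prescribed method: it treats "exceeds 50% of baseline" as a deviation of more than 50% from the baseline mean in either direction, and compares against the sample standard deviation of a window of recent values. The function name, parameters, and sample values are hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(baseline, current, max_relative_change=0.50, max_sigma=3.0):
    """Return True when `current` departs from the baseline samples.

    `baseline` needs at least two samples so a standard deviation exists.
    """
    baseline_mean = mean(baseline)
    baseline_stdev = stdev(baseline)

    # Rule 1: the value moved by more than 50% of the baseline mean, in either direction.
    exceeds_change = abs(current - baseline_mean) > max_relative_change * abs(baseline_mean)
    # Rule 2: the value sits more than 3 standard deviations above the baseline mean.
    exceeds_sigma = current > baseline_mean + max_sigma * baseline_stdev
    return exceeds_change or exceeds_sigma


# Hypothetical login request counts per interval.
baseline = [100, 110, 90, 105, 95]
for value in (120, 130, 40):
    print(value, is_anomalous(baseline, value))
```

With a very stable baseline the standard deviation is small, so the second rule becomes sensitive to minor increases. A longer baseline window reduces that effect.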

Expiration metrics

These metrics represent lease measurements that are provided by Vault.

Active leases (vault.expire.num_leases)

Metric source	Description	Unit	Type
Vault	This metric represents the number of all leases which are eligible for eventual expiry.	lease	gauge

Why it is important:

This metric represents an approximate total lease count for Vault across all lease generating auth methods and secrets engines.

What to look for:

A large and unexpected delta in count can indicate that a bulk operation, load test, or runaway client application is generating excessive leases, and should be investigated immediately. Persistently high counts can indicate that the cluster is underprovisioned.

Token revoke latency (vault.expire.revoke)

Metric source	Description	Unit	Type
Vault	This metric represents the duration of time to revoke a token.	ms	sampled

Why it is important:

This value measures the sampled latency for revoking a token after a token's TTL expires or it is explicitly revoked, such as during a security incident. To reduce security risk, latency should be minimized.

What to look for:

High token revoke latencies can indicate a performance problem, potentially stemming from an underprovisioned or otherwise unhealthy cluster.

Token renew latency (vault.expire.renew)

Metric source	Description	Unit	Type
Vault	This metric represents the duration of time to renew a token.	ms	sampled

Why it is important:

This value measures the sampled latency for renewing a token lease after a valid lease renewal request has been made.

What to look for:

High token lease renewal latencies can indicate a performance problem, potentially due to an underprovisioned or otherwise unhealthy cluster.

Vault usage metrics

The following are usage metrics related to common types of usage including identity, lease, secret, and token usage. These metrics are useful for understanding Vault usage patterns from a security and business operations perspective.

Vault token usage metrics

Why it is important:

The following metrics capture token-based usage. They are useful for understanding client usage patterns and identifying abnormalities that may indicate a security threat.

Batch and service tokens by methods & TTL (vault.token.creation)

Metric source	Description	Unit	Type
Vault	This metric represents the number of new batch or service tokens created.	token	counter

In the HCP Vault Dedicated sample dashboards this metric is broken down by auth method and TTL.

Available tokens by namespace (vault.token.count)

Metric source	Description	Unit	Type
Vault	This metric represents the number of service tokens available for use.	token	gauge

Available tokens by namespace by auth method (vault.token.count.by_auth)

Metric source	Description	Unit	Type
Vault	This metric represents the number of available tokens grouped by the auth method used to create them.	token	gauge

Available tokens by namespace by policy (vault.token.count.by_policy)

Metric source	Description	Unit	Type
Vault	This metric represents the number of available tokens, counted in each policy assigned.	token	gauge

Available tokens by TTL (vault.token.count.by_ttl)

Metric source	Description	Unit	Type
Vault	This metric represents the number of existing tokens, aggregated by their time-to-live (TTL) setting at creation.	token	gauge

Why it is important:

Since longer time-to-live (TTL) settings can introduce security risk, this metric is useful to identify suboptimal administrative settings. A spike in unexpectedly long-lived tokens may also signal a security breach.

Token lookups (vault.token.lookup.count)

Metric source	Description	Unit	Type
Vault	This metric represents the number of token lookups.	lookup	summary

Why it is important:

This metric is useful for comparing with other performance metrics to ensure there is sufficient headroom to service anticipated token read requests.

KV secrets by mount (vault.secret.kv.count)

Metric source	Description	Unit	Type
Vault	This metric represents the count of secrets in key-value stores.	secret	gauge

In the HCP Vault Dedicated sample dashboards this metric is displayed by mount.

Identity entities by namespace (vault.identity.entity.count)

Metric source	Description	Unit	Type
Vault	This metric represents the number of identity entities.	entity	gauge

In the HCP Vault Dedicated sample dashboards this metric is grouped by namespace.

Metrics streaming configuration

For detailed instructions on how to configure HCP Vault Dedicated audit log or metrics streaming to your preferred provider, refer to the following documentation:

Audit logs

Metrics
