PKI secrets engine - considerations
To successfully deploy this secrets engine, there are a number of important considerations to be aware of, as well as some preparatory steps that should be undertaken. You should read all of these before using this secrets engine or generating the CA to use with this secrets engine.
Table of contents
- Be Careful with Root CAs
- One CA Certificate, One Secrets Engine
- Key Types Matter
- Use a CA Hierarchy
- Cluster URLs are Important
- Automate Rotation with ACME
- Keep Certificate Lifetimes Short, For CRL's Sake
- You must configure issuing/CRL/OCSP information in advance
- Distribution of CRLs and OCSP
- Automate CRL Building and Tidying
- Spectrum of Revocation Support
- Issuer Subjects and CRLs
- Automate Leaf Certificate Renewal
- Safe Minimums
- Token Lifetimes and Revocation
- Safe Usage of Roles
- Telemetry
- Auditing
- Role-Based Access
- Replicated DataSets
- Cluster Scalability
- PSS Support
- Issuer Storage Migration Issues
Be careful with root CAs
Vault storage is secure, but not as secure as a piece of paper in a bank vault. It is, after all, networked software. If your root CA is hosted outside of Vault, don't put it in Vault as well; instead, issue a shorter-lived intermediate CA certificate and put this into Vault. This aligns with industry best practices.
Since 0.4, the secrets engine supports generating self-signed root CAs and creating and signing CSRs for intermediate CAs. In each instance, for security reasons, the private key can only be exported at generation time, and the ability to do so is part of the command path (so it can be put into ACL policies).
If you plan on using intermediate CAs with Vault, it is suggested that you let Vault create CSRs and do not export the private key, then sign those with your root CA (which may be a second mount of the `pki` secrets engine).
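A minimal sketch of this flow, assuming the root CA lives in a mount at `pki_root` and the intermediate in a mount at `pki_int` (mount names, common name, and TTL are illustrative):

```shell-session
# Generate the intermediate key and CSR inside Vault; the private key never leaves the mount.
$ vault write -field=csr pki_int/intermediate/generate/internal \
    common_name="Example Intermediate CA" > intermediate.csr

# Sign the CSR with the root (which could also live outside Vault entirely).
$ vault write -field=certificate pki_root/root/sign-intermediate \
    csr=@intermediate.csr ttl=43800h > intermediate.pem

# Install the signed certificate back into the intermediate mount.
$ vault write pki_int/intermediate/set-signed certificate=@intermediate.pem
```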
Managed keys
Since 1.10, Vault Enterprise can access private key material in a managed key. In this case, Vault never sees the private key, and the external KMS or HSM performs certificate signing operations. Managed keys are configured by selecting the `kms` type when generating a root or intermediate.
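As a sketch, generating a root backed by a previously configured managed key might look like the following (the managed key name, mount path, common name, and TTL are assumptions):

```shell-session
$ vault write pki/root/generate/kms \
    managed_key_name="hsm-root-key" \
    common_name="Example Root CA" \
    ttl=87600h
```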
One CA certificate, one secrets engine
Since Vault 1.11.0, the PKI Secrets Engine supports multiple issuers in a single mount. However, in order to simplify the configuration, it is strongly recommended that operators limit a mount to a single issuer. If you want to issue certificates from multiple disparate CAs, mount the PKI secrets engine at multiple mount points with separate CA certificates in each.
The rationale for separating mounts is to simplify permissions management: very few individuals need access to perform operations with the root, but many need access to create leaves. The operations on a root should generally be limited to issuing and revoking intermediate CAs, which is a highly privileged operation; it becomes much easier to audit these operations when they're in a separate mount than if they're mixed in with day-to-day leaf issuance.
A common pattern is to have one mount act as your root CA and to use this CA only to sign intermediate CA CSRs from other PKI secrets engines.
To keep old CAs active, there are two approaches to achieving rotation:
- Use multiple secrets engines. This allows a fresh start, preserving the old issuer and CRL. Vault ACL policy can be updated to deny new issuance under the old mount point and roles can be re-evaluated before being imported into the new mount point.
- Use multiple issuers in the same mount point. The usage of the old issuer can be restricted to CRL signing, and existing roles and ACL policy can be kept as-is. This allows cross-signing within the same mount, and consumers of the mount won't have to update their configuration. Once the transitional period for this rotation has completed and all past issued certificates have expired, it is encouraged to fully remove the old issuer and any unnecessary cross-signed issuers from the mount point.
Another suggested use case for multiple issuers in the same mount is splitting issuance by TTL lifetime. For short-lived certificates, an intermediate stored in Vault will often out-perform an HSM-backed intermediate. For longer-lived certificates, however, it is often important to have the intermediate key material secured throughout the lifetime of the end-entity certificate. This means that two intermediates in the same mount (one backed by the HSM and one backed by Vault) can satisfy both use cases. Operators can create roles setting maximum TTLs for each issuer, and consumers of the mount can decide which to use.
Always configure a default issuer
For backwards compatibility, the default issuer is used to service PKI endpoints without an explicit issuer (either via path selection or role-based selection). When certificates are revoked and their issuer is no longer part of this PKI mount, Vault places them on the default issuer's CRL. This means maintaining a default issuer is important for both backwards compatibility for issuing certificates and for ensuring revoked certificates land on a CRL.
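The default issuer can be inspected and changed via the `/config/issuers` endpoint. A sketch, assuming a mount at `pki` containing an issuer named `root-2024` (both names are illustrative):

```shell-session
# Show the current default issuer for the mount.
$ vault read pki/config/issuers

# Point the mount's default issuer at a specific issuer by name or ID.
$ vault write pki/config/issuers default="root-2024"
```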
Key types matter
Certain key types have impacts on performance. Signing certificates from an RSA key will be slower than issuing from an ECDSA or Ed25519 key. Key generation (using the `/issue/:role` endpoints) with RSA keys will also be slow: RSA key generation involves finding suitable random primes, whereas Ed25519 keys can be random data. As the number of bits goes up (RSA 2048 -> 4096 or ECDSA P-256 -> P-521), signature times also increase.
This matters in both directions: not only is issuance more expensive, but validation of the corresponding signature (in say, TLS handshakes) will also be more expensive. Careful consideration of both issuer and issued key types can have meaningful impacts on performance of not only Vault, but systems using these certificates.
Cluster performance and key types
The benchmark-vault project can be used to measure the performance of a Vault PKI instance. In general, some considerations to be aware of:
- RSA key generation is much slower and highly variable compared to EC key generation. If performance and throughput are a necessity, consider using EC keys (including NIST P-curves and Ed25519) instead of RSA.
- Key signing requests (via `/pki/sign`) will be faster than key generation requests (via `/pki/issue`), especially for RSA keys: this removes the need for Vault to generate key material, signing only the key material provided by the client. The signing step is common to both endpoints, so key generation is pure overhead if the client has a sufficiently secure source of entropy. The CA's key type matters as well: using an RSA CA will result in an RSA signature, which takes longer than an ECDSA or Ed25519 CA.
- Storage is an important factor: with BYOC Revocation, using `no_store=true` still gives you the ability to revoke certificates, and audit logs can be used to track issuance. Clusters using remote storage (like Consul) over a slow network and using `no_store=false` or `no_store_cert_metadata=false` along with specifying metadata on issuance will incur additional latency on issuance. Adding leases for every issued certificate compounds the problem.
- Storing too many certificates results in longer `LIST /pki/certs` times, including the time to tidy the instance. As such, for large scale deployments (>= 250k active certificates) it is recommended to use audit logs to track certificates outside of Vault.
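To make the `/pki/sign` versus `/pki/issue` distinction above concrete, a minimal sketch, assuming a mount at `pki` and a role named `my-role` (both illustrative):

```shell-session
# Client-side key generation: Vault only signs the CSR (no key generation overhead in Vault).
$ openssl ecparam -name prime256v1 -genkey -noout -out service.key
$ openssl req -new -key service.key -out service.csr -subj "/CN=service.example.com"
$ vault write pki/sign/my-role csr=@service.csr common_name="service.example.com" ttl=72h

# Vault-side key generation: Vault creates both the key pair and the certificate (slower, especially for RSA).
$ vault write pki/issue/my-role common_name="service.example.com" ttl=72h
```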
As a general comparison on unspecified hardware, using `benchmark-vault` for 30s on a local, single node, raft-backed Vault instance:

- Vault can issue 300k certificates using EC P-256 for CA & leaf keys and without storage.
  - Switching to storing these leaves drops that number to 65k, and only 20k with leases.
- Using large, expensive RSA-4096 bit keys, Vault can only issue 160 leaves, regardless of whether or not storage or leases were used. The 95th percentile key generation time is above 10s.
  - In comparison, using P-521 keys, Vault can issue closer to 30k leaves without leases and 18k with leases.

These numbers are for example only, to represent the impact different key types can have on PKI cluster performance.
The use of ACME adds additional latency into these numbers, both because certificates need to be stored and because challenge validation needs to be performed.
Use a CA hierarchy
It is generally recommended to use a hierarchical CA setup, with a root certificate which issues one or more intermediates (based on usage), which in turn issue the leaf certificates.
This allows stronger storage or policy guarantees around protection of the root CA, while letting Vault manage the intermediate CAs and issuance of leaves. Different intermediates might be issued for different usage, such as VPN signing, Email signing, or testing versus production TLS services. This helps to keep CRLs limited to specific purposes: for example, VPN services don't care about the revoked set of email signing certificates if they're using separate certificates and different intermediates, and thus don't need both CRL contents. Additionally, this allows higher risk intermediates (such as those issuing longer-lived email signing certificates) to have HSM-backing without impacting the performance of easier-to-rotate intermediates and certificates (such as TLS intermediates).
Vault supports the use of both the `allowed_domains` parameter on Roles and the `permitted_dns_domains` parameter to set the Name Constraints extension on root and intermediate generation. This allows for several layers of separation of concerns between TLS-based services.
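A sketch of both mechanisms, with illustrative mount, role, and domain names:

```shell-session
# Constrain a root (or intermediate) at generation time via Name Constraints.
$ vault write pki_root/root/generate/internal \
    common_name="Example Root CA" \
    permitted_dns_domains="example.com" \
    ttl=87600h

# Constrain issuance per-role on the intermediate mount.
$ vault write pki_int/roles/tls-servers \
    allowed_domains="example.com" allow_subdomains=true max_ttl=72h
```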
Cross-Signed intermediates
When cross-signing intermediates from two separate roots, two separate intermediate issuers will exist within the Vault PKI mount. In order to correctly serve the cross-signed chain on issuance requests, the `manual_chain` override is required on either or both intermediates. This can be constructed in the following order:

- this issuer (`self`)
- this root
- the other copy of this intermediate
- the other root

All requests to this issuer for signing will now present the full cross-signed chain.
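As a sketch, on recent Vault versions this override can be applied by patching the issuer with issuer references in the order above (all issuer names here are hypothetical; a full write of the issuer's fields can be used where PATCH is unavailable):

```shell-session
$ vault patch pki/issuer/int-a \
    manual_chain="self,root-a,int-a-cross-signed,root-b"
```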
Cluster URLs are important
In Vault 1.13, support for templated AIA URLs was added. With the per-cluster URL configuration pointing to this Performance Replication cluster, AIA information will point to the cluster that issued this certificate automatically.
In Vault 1.14, with ACME support, the same configuration is used for allowing ACME clients to discover the URL of this cluster.
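A sketch of this per-cluster configuration, run against each Performance Replication cluster with that cluster's own externally reachable address (hostnames are illustrative):

```shell-session
# Tell this cluster its externally reachable API path for this mount.
$ vault write pki/config/cluster \
    path="https://vault-pr1.example.com:8200/v1/pki" \
    aia_path="https://vault-pr1.example.com:8200/v1/pki"

# Use templated AIA URLs so issued certificates point at the issuing cluster.
$ vault write pki/config/urls \
    enable_templating=true \
    issuing_certificates="{{cluster_aia_path}}/issuer/{{issuer_id}}/der" \
    crl_distribution_points="{{cluster_aia_path}}/issuer/{{issuer_id}}/crl/der" \
    ocsp_servers="{{cluster_path}}/ocsp"
```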
Warning: It is important to ensure that this configuration is kept up to date and maintained correctly, always pointing to the node's PR cluster address (which may be a load balanced or a DNS round-robin address). If this configuration is not set on every Performance Replication cluster, certificate issuance (via REST and/or via ACME) will fail.
Automate rotation with ACME
In Vault 1.14, support for the Automatic Certificate Management Environment (ACME) protocol has been added to the PKI Engine. This is a standardized way to handle validation, issuance, rotation, and revocation of server certificates.
Many ecosystems, from web servers like Caddy, Nginx, and Apache, to orchestration environments like Kubernetes (via cert-manager) natively support issuance via the ACME protocol. For deployments without native support, stand-alone tools like certbot support fetching and renewing certificates on behalf of consumers. Vault's PKI Engine only includes server support for ACME; no client functionality has been included.
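As a sketch, once ACME has been enabled on the mount (see the configuration notes below), a stand-alone client such as certbot can be pointed at the mount's ACME directory (hostname and domain are illustrative):

```shell-session
$ certbot certonly --standalone \
    --server https://vault.example.com:8200/v1/pki/acme/directory \
    -d www.example.com
```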
Note: Vault's PKI ACME server caps certificate validity at 90 days maximum by default, overridable using the ACME config `max_ttl` parameter. Shorter validity durations can be set by limiting the role's TTL to be under the globally configured ACME limit. Aligning with Let's Encrypt, we do not support the optional `NotBefore` and `NotAfter` order request parameters.
ACME stores certificates
Because ACME requires stored certificates in order to function, the notes below about automating tidy are especially important for the long-term health of the PKI cluster. ACME also introduces additional resource types (accounts, orders, authorizations, and challenges) that must be tidied via the `tidy_acme=true` option. Orders, authorizations, and challenges are cleaned up based on the `safety_buffer` parameter, but accounts can live longer past their last issued certificate, as controlled by the `acme_account_safety_buffer` parameter.
As a consequence of the above, and like the discussions in the Cluster Scalability section, because these roles have `no_store=false` set, ACME can only issue certificates on the active nodes of PR clusters; standby nodes, if contacted, will transparently forward all requests to the active node.
ACME role restrictions require EAB
Because ACME by default has no external authorization engine and is unauthenticated from a Vault perspective, the use of roles with ACME in the default configuration is of limited value, as any ACME client can request certificates under any role by proving possession of the requested certificate identifiers.
To solve this issue, there are two possible approaches:
- Use a restrictive `allowed_roles`, `allowed_issuers`, and `default_directory_policy` ACME configuration to let only a single role and issuer be used. This prevents user choice, allowing some global restrictions to be placed on issuance, and avoids requiring ACME clients to have (at initial setup) access to a Vault token or other mechanism for acquiring a Vault EAB ACME token.
- Use a more permissive configuration with `eab_policy=always-required` to allow more roles and let users select the roles, but bind ACME clients to a Vault token which can be suitably ACL'd to particular sets of approved ACME directories.
The choice of approach depends on the policies of the organization wishing to use ACME.
Another consequence of the unauthenticated nature of ACME requests is that role templating based on entity information cannot be used, as there is no token and thus no entity associated with the request, even when EAB binding is used.
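A sketch combining both approaches: restricting the default directory to a single role while requiring EAB, with EAB credentials generated by a Vault-authenticated client (mount and role names are illustrative):

```shell-session
# Restrict ACME issuance to one role and require external account bindings.
$ vault write pki/config/acme \
    enabled=true \
    allowed_roles="acme-servers" \
    default_directory_policy="role:acme-servers" \
    eab_policy=always-required

# A client with a suitable Vault token fetches EAB credentials to hand to its ACME client.
$ vault write -f pki/acme/new-eab
```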
ACME and the public internet
Using ACME is possible over the public internet; public CAs like Let's Encrypt offer this as a service. Similarly, organizations running internal PKI infrastructure might wish to issue server certificates to pieces of infrastructure outside of their internal network boundaries, from a publicly accessible Vault instance. By default, without enforcing a restrictive `eab_policy`, this results in a complicated threat model: any external client which can prove possession of a domain can issue a certificate under this CA, which might be considered more trusted by this organization.

As such, we strongly recommend that publicly facing Vault instances (such as HCP Vault) enforce that PKI mount operators use a restrictive `eab_policy=always-required` configuration.

System administrators of Vault instances can enforce this by setting the `VAULT_DISABLE_PUBLIC_ACME=true` environment variable.
ACME errors are in server logs
Because the ACME client is not necessarily trusted (as account registration may not be tied to a valid Vault token when EAB is not used), many error messages end up in the Vault server logs out of security necessity. When troubleshooting issues with clients requesting certificates, first check the client's logs, if any (e.g., certbot will state the log location on errors), and then correlate with Vault server logs to identify the failure reason.
ACME security considerations
ACME allows any client to use Vault to make some sort of external call; while the design of ACME attempts to minimize this scope and will prohibit issuance if incorrect servers are contacted, it cannot account for all possible remote server implementations. Vault's ACME server makes three types of requests:
- DNS requests for `_acme-challenge.<domain>`, which should be least invasive and most safe.
- TLS ALPN requests for the `acme-tls/1` protocol, which should be safely handled by the TLS stack before any application code is invoked.
- HTTP requests to `http://<domain>/.well-known/acme-challenge/<token>`, which could be problematic based on server design; if all requests, regardless of path, are treated the same and assumed to be trusted, this could result in Vault being used to make (invalid) requests. Ideally, any such server implementations should be updated to ignore such ACME validation requests or to block access originating from Vault to this service.
In all cases, no information about the response presented by the remote server is returned to the ACME client.
When running Vault on multiple networks, note that Vault's ACME server places no restrictions on requesting client/destination identifier validations paths; a client could use a HTTP challenge to force Vault to reach out to a server on a network it could otherwise not access.
ACME and client counting
In Vault 1.14, ACME contributes differently to usage metrics than other interactions with the PKI Secrets Engine. Due to its use of unauthenticated requests (which do not generate Vault tokens), it would not be counted in the traditional activity log APIs. Instead, certificates issued via ACME will be counted via their unique certificate identifiers (the combination of CN, DNS SANs, and IP SANs). These will create a stable identifier that will be consistent across renewals, other ACME clients, mounts, and namespaces, contributing to the activity log presently as a non-entity token attributed to the first mount which created that request.
Keep certificate lifetimes short, for CRL's sake
This secrets engine aligns with Vault's philosophy of short-lived secrets. As such it is not expected that CRLs will grow large; the only place a private key is ever returned is to the requesting client (this secrets engine does not store generated private keys, except for CA certificates). In most cases, if the key is lost, the certificate can simply be ignored, as it will expire shortly.
If a certificate must truly be revoked, the normal Vault revocation function can be used, and any revocation action will cause the CRL to be regenerated. When the CRL is regenerated, any expired certificates are removed from the CRL (and any revoked, expired certificates are removed from secrets engine storage). This is an expensive operation! Due to the structure of the CRL standard, Vault must read all revoked certificates into memory in order to rebuild the CRL, and clients must fetch the regenerated CRL.
This secrets engine does not support multiple CRL endpoints with sliding date windows; often such mechanisms will have the transition point a few days apart, but this gets into the expected realm of the actual certificate validity periods issued from this secrets engine. A good rule of thumb for this secrets engine would be to simply not issue certificates with a validity period greater than your maximum comfortable CRL lifetime. Alternately, you can control CRL caching behavior on the client to ensure that checks happen more often.
Often multiple endpoints are used in case a single CRL endpoint is down so that clients don't have to figure out what to do with a lack of response. Run Vault in HA mode, and the CRL endpoint should be available even if a particular node is down.
Note: Since Vault 1.11.0, with multiple issuers in the same mount point, different issuers may have different CRLs (depending on subject and key material). This means that Vault may need to regenerate multiple CRLs. This is again a rationale for keeping TTLs short and avoiding revocation if possible.
Note: Since Vault 1.12.0, we support two complementary revocation mechanisms: Delta CRLs, which allow for rebuilds of smaller, incremental additions to the last complete CRL, and OCSP, which allows responding to revocation status requests for individual certificates. When coupled with the new CRL auto-rebuild functionality, this means that the revoking step isn't as costly (as the CRL isn't always rebuilt on each revocation), outside of storage considerations. However, while the rebuild operation still can be expensive with lots of certificates, it will be done on a schedule rather than on demand.
NotAfter behavior on leaf certificates
In Vault 1.11.0, the PKI Secrets Engine introduced a new `leaf_not_after_behavior` parameter on issuers. This allows modification of the issuance behavior: should Vault `err`, preventing issuance of a leaf certificate longer-lived than its issuer, silently `truncate` to the issuer's `NotAfter` value, or `permit` longer expirations.

It is strongly suggested to use `err` or `truncate` for intermediates; `permit` is only useful for root certificates, as an intermediate's `NotAfter` expiration is checked when validating presented chains.
In combination with a cascading expiration with longer lived roots (perhaps on the range of 2-10 years), shorter lived intermediates (perhaps on the range of 6 months to 2 years), and short-lived leaf certificates (on the range of 30 to 90 days), and the rotation strategies discussed in other sections, this should keep the CRLs adequately small.
Cluster performance and quantity of leaf certificates
As mentioned above, keeping TTLs short (or using `no_store=true` and `no_store_cert_metadata=true`) and avoiding leases is important for a healthy cluster. However, it is important to note this is a scale problem: 10-1000 long-lived, stored certificates are probably fine, but 50k-100k become a problem and 500k+ stored, unexpired certificates can negatively impact even large Vault clusters, even with short TTLs! However, once these certificates are expired, a tidy operation will clean up CRLs and Vault cluster storage.
Note that organizational risk assessments for certificate compromise might mean certain certificate types should always be issued with `no_store=false`; even short-lived broad wildcard certificates (say, `*.example.com`) might be important enough to have precise control over revocation. However, an internal service with a well-scoped certificate (say, `service.example.com`) might be of low enough risk to issue with a 90-day TTL and `no_store=true`, preventing the need for revocation in the unlikely case of compromise.

Having a shorter TTL decreases the likelihood of needing to revoke a cert (but cannot prevent it entirely) and decreases the impact of any such compromise.
Note: As of Vault 1.12, the PKI Secrets Engine's Bring-Your-Own-Cert (BYOC) functionality allows revocation of certificates not previously stored (e.g., issued via a role with `no_store=true`). This means that setting `no_store=true` is now safe to be used globally, regardless of the importance of issued certificates (and their likelihood for revocation).
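For example, a certificate issued under a `no_store=true` role can be revoked by presenting the certificate itself (the mount path and file name are illustrative):

```shell-session
$ vault write pki/revoke certificate=@leaf.pem
```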
You must configure issuing/CRL/OCSP information in advance
This secrets engine serves CRLs from a predictable location, but it is not possible for the secrets engine to know where it is running. Therefore, you must configure desired URLs for the issuing certificate, CRL distribution points, and OCSP servers manually using the `config/urls` endpoint. It is supported to have more than one of each of these by passing the multiple URLs as a comma-separated string parameter.
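A sketch of this configuration, showing the comma-separated form with two illustrative hostnames:

```shell-session
$ vault write pki/config/urls \
    issuing_certificates="https://vault1.example.com:8200/v1/pki/ca,https://vault2.example.com:8200/v1/pki/ca" \
    crl_distribution_points="https://vault1.example.com:8200/v1/pki/crl,https://vault2.example.com:8200/v1/pki/crl" \
    ocsp_servers="https://vault1.example.com:8200/v1/pki/ocsp"
```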
Note: when using Vault Enterprise's Performance Replication features with a PKI Secrets Engine mount, each cluster will have its own CRL; this means each cluster's unique CRL address should be included in the AIA information field separately, or the CRLs should be consolidated and served outside of Vault.
Note: When using multiple issuers in the same mount, it is suggested to use the per-issuer AIA fields rather than the global (`/config/urls`) variant. This is for correctness: these fields are used for chain building and automatic CRL detection in certain applications. If they point to the wrong issuer's information, these applications may break.
Distribution of CRLs and OCSP
Both CRLs and OCSP allow interrogating revocation status of certificates. Both of these methods include internal security and authenticity (both CRLs and OCSP responses are signed by the issuing CA within Vault). This means both are fine to distribute over non-secure and non-authenticated channels, such as HTTP.
Note: The OCSP implementation for GET requests can lead to intermittent 400 errors when an encoded OCSP request contains consecutive '/' characters. Until this is resolved, it is recommended to use POST-based OCSP requests.
Automate CRL building and tidying
Since Vault 1.12, the PKI Secrets Engine supports automated CRL rebuilding (including optional Delta CRLs which can be built more frequently than complete CRLs) via the `/config/crl` endpoint. Additionally, tidying of revoked and expired certificates can be configured automatically via the `/config/auto-tidy` endpoint. Both of these should be enabled to ensure compatibility with the wider PKIX ecosystem and performance of the cluster.
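A sketch of enabling both, with illustrative intervals and buffers:

```shell-session
# Rebuild the CRL on a schedule, with more frequent delta CRLs.
$ vault write pki/config/crl \
    auto_rebuild=true auto_rebuild_grace_period=12h \
    enable_delta=true delta_rebuild_interval=15m

# Periodically tidy expired certificates, revocation entries, and ACME state.
$ vault write pki/config/auto-tidy \
    enabled=true interval_duration=12h safety_buffer=72h \
    tidy_cert_store=true tidy_revoked_certs=true tidy_acme=true
```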
Spectrum of revocation support
Starting with Vault 1.13, the PKI secrets engine has the ability to support a spectrum of cluster sizes and certificate revocation quantities.
For users with few revocations or who want a unified view and have the inter-cluster bandwidth to support it, we recommend turning on auto rebuilding of CRLs, cross-cluster revocation queues, and cross-cluster CRLs. This allows all consumers of the CRLs to have the most accurate picture of revocations, regardless of which cluster they talk to.
If the unified CRL becomes too big for the underlying storage mechanism or for a single host to build, we recommend relying on OCSP instead of CRLs. These have much smaller storage entries, and the CRL `disable` flag is independent of `unified_crl`, allowing unified OCSP to remain.
However, when cross-cluster traffic becomes too high (or if CRLs are still necessary in addition to OCSP), we recommend sharding the CRL between different clusters. This has been the default behavior of Vault, but with the introduction of per-cluster, templated AIA information, the leaf certificate's Authority Information Access (AIA) info will point directly to the cluster which issued it, allowing the correct CRL for this cert to be identified by the application. This more correctly mimics the behavior of Let's Encrypt's CRL sharding.
This sharding behavior can also be used for OCSP, if the cross-cluster traffic for revocation entries becomes too high.
For users who wish to manage revocation manually, using the audit logs to track certificate issuance would allow an external system to identify which certificates were issued. These can be manually tracked for revocation, and a custom CRL can be built using externally tracked revocations. This would allow usage of roles set to `no_store=true`, so Vault is strictly used as an issuing authority and isn't storing any certificates, issued or revoked. For the highest of revocation volumes, this could be the best option.

Notably, this last approach can be used for the creation of either externally stored unified or sharded CRLs. If a single external unified CRL becomes unreasonably large, each cluster's certificates could have AIA info point to an externally stored and maintained, sharded CRL. However, Vault has no mechanism to sign OCSP responses for such externally tracked revocations at this time.
What are Cross-Cluster CRLs?
Vault Enterprise supports a clustering mode called Performance Replication. In a replicated PKI Secrets Engine mount, issuer and role information is synced between the Performance Primary and all Performance Secondary clusters. However, each Performance Secondary cluster has its own local storage of issued certificates and revocations which is not synced. In Vault versions before 1.13, this meant that each of these clusters had its own CRL and OCSP data, and any revocation requests needed to be processed on the cluster that issued it (or BYOC used).
Starting with Vault 1.13, we've added two features to Vault Enterprise to help manage this setup more correctly and easily: revocation request queues (`cross_cluster_revocation=true` in `config/crl`) and unified revocation entries (`unified_crl=true` in `config/crl`).
The former allows operators (revoking by serial number) to request a certificate be revoked regardless of which cluster it was issued on. For example, if a request goes into the Performance Primary, but it didn't issue the certificate, it'll write a cross-cluster revocation request and mark the results as pending. If another cluster already has this certificate in storage, it will revoke it and confirm the revocation back to the main cluster. An operator can list pending revocations to see the status of these requests. To clean up invalid requests (e.g., if the cluster which had that certificate disappeared, if that certificate was issued with `no_store=true` on the role, or if it was an invalid serial number), an operator can use tidy with `tidy_revocation_queue=true`, optionally shortening `revocation_queue_safety_buffer` to remove them quicker.
The latter allows all clusters to have a unified view of revocations, that is, to have access to a list of revocations performed by other clusters. While the configuration parameter includes `crl` in its name, this applies to both CRLs and the OCSP responder. When this revocation replication occurs, if any cluster considers a cert revoked when another doesn't (e.g., via BYOC revocation of a `no_store=false` certificate), all clusters will now consider it revoked, assuming it hasn't expired. Notably, the active node of the primary cluster will be used to rebuild the CRL; as this can grow large if many clusters have lots of revoked certs, an operator might need to disable CRL building (`disable=true` in `config/crl`) or increase the storage size.
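A sketch of enabling both features on a Vault Enterprise PKI mount (the mount path is illustrative):

```shell-session
$ vault write pki/config/crl \
    cross_cluster_revocation=true \
    unified_crl=true \
    unified_crl_on_existing_paths=true
```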
As an aside, all new cross-cluster writes (from Performance Secondary up to the Performance Primary) are performed synchronously. This gives the caller confidence that the request actually went through, at the expense of incurring a bit higher overhead for revoking certificates. When a node loses its GRPC connection (e.g., during leadership election or being otherwise unable to contact the active primary), errors will occur though the local portion of the write (if any) will still succeed. For cross-cluster revocation requests, due to there being no local write, this means that the operation will need to be retried, but in the event of an issue writing a cross-cluster revocation entry when the cert existed locally, the revocation will eventually be synced across clusters when the connection comes back.
Issuer subjects and CRLs
As noted on several GitHub issues, Go's x509 library has an opinionated parsing and structuring mechanism for certificates' Subjects. Issuers created within Vault are fine, but when using externally created CA certificates, these may not be parsed correctly throughout all parts of the PKI. In particular, CRLs embed a (modified) copy of the issuer name. This can be avoided by using OCSP to track revocation, but note that performance characteristics are different between OCSP and CRLs.
Note: As of Go 1.20 and Vault 1.13, Go correctly formats the CRL's issuer name and this notice does not apply.
Automate leaf certificate renewal
To manage certificates for services at scale, it is best to automate certificate renewal as much as possible. Vault Agent has support for automatically renewing requested certificates based on the `validTo` field. Other solutions might involve using cert-manager in Kubernetes or OpenShift, backed by the Vault CA.
Safe minimums
Since its inception, this secrets engine has enforced SHA256 for signature hashes rather than SHA1. As of 0.5.1, a minimum of 2048 bits for RSA keys is also enforced. Software that can handle SHA256 signatures should also be able to handle 2048-bit keys, and 1024-bit keys are considered unsafe and are disallowed in the Internet PKI.
Token lifetimes and revocation
When a token expires, it revokes all leases associated with it. This means that
long-lived CA certs need correspondingly long-lived tokens, something that is
easy to forget. Starting with 0.6, root and intermediate CA certs no longer have
associated leases, to prevent unintended revocation when not using a token with
a long enough lifetime. To revoke these certificates, use the pki/revoke
endpoint.
Safe usage of roles
The Vault PKI Secrets Engine supports many options to limit issuance via Roles. Careful consideration of construction is necessary to ensure that more permissions are not given than necessary. Additionally, roles should generally do one thing; multiple roles are preferable to an overly permissive role that allows arbitrary issuance (e.g., `allow_any_name` should generally be used sparingly, if at all).

- `allow_any_name` should generally be set to `false`; this is the default.
- `allow_localhost` should generally be set to `false` for production services, unless listening on `localhost` is expected.
- Unless necessary, `allow_wildcard_certificates` should generally be set to `false`. This is not the default due to backwards compatibility concerns.
  - This is especially necessary when `allow_subdomains` or `allow_glob_domains` are enabled.
- `enforce_hostnames` should generally be enabled for TLS services; this is the default.
- `allow_ip_sans` should generally be set to `false` (but defaults to `true`), unless IP address certificates are explicitly required.
- When using short TTLs (< 30 days) or with high issuance volume, it is generally recommended to set `no_store` to `true` (defaults to `false`). This prevents serial number based revocation, but allows higher throughput as Vault no longer needs to store every issued certificate. This is discussed more in the Replicated Datasets section below.
- Do not use roles with root certificates (`issuer_ref`). Root certificates should generally only issue intermediates (see the section on CA hierarchy above), which doesn't rely on roles.
- Limit `key_usage` and `ext_key_usage`; don't attempt to allow all usages for all purposes. Generally the default values are useful for client and server TLS authentication.
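A sketch of a role following these recommendations (mount path, role name, domain, and TTL are illustrative):

```shell-session
$ vault write pki/roles/web-servers \
    allowed_domains="example.com" allow_subdomains=true \
    allow_any_name=false allow_localhost=false allow_ip_sans=false \
    allow_wildcard_certificates=false enforce_hostnames=true \
    server_flag=true client_flag=false \
    key_type=ec key_bits=256 \
    max_ttl=720h no_store=true
```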
Telemetry
Beyond Vault's default telemetry around request processing, PKI exposes count and duration metrics for the issue, sign, sign-verbatim, and revoke calls. The metrics keys take the form `mount-path,operation,[failure]` with labels for namespace and role name.
Note that these metrics are per-node and thus would need to be aggregated across nodes and clusters.
Auditing
Because Vault HMACs audit string keys by default, it is necessary to tune PKI secrets mounts to get an accurate view of issuance that is occurring under this mount.
Note: Depending on usage of Vault, CRLs (and rarely, CA chains) can grow to be rather large. We don't recommend un-HMACing the `crl` field for this reason, but note that the recommendations below suggest un-HMACing the `certificate` response parameter, which the CRL can also be served in via the `/pki/cert/crl` API endpoint. Additionally, the `http_raw_body` can be used to return the CRL in both PEM and raw binary DER form, so it is suggested not to un-HMAC that field, to avoid corrupting the log format. If this is done with only a syslog audit device, Vault can deny requests (with an opaque `500 Internal Error` message) after the action has been performed on the server, because it was unable to log the message. The suggested workaround is to either leave the `certificate` and `crl` response fields HMACed and/or to also enable the `file` audit log type.
Some suggested keys to un-HMAC for requests are as follows:
- `csr` - the requested CSR to sign,
- `certificate` - the requested self-signed certificate to re-sign or when importing issuers,
- Various issuance-related overriding parameters, such as:
  - `issuer_ref` - the issuer requested to sign this certificate,
  - `common_name` - the requested common name,
  - `alt_names` - alternative requested DNS-type SANs for this certificate,
  - `other_sans` - other (non-DNS, non-Email, non-IP, non-URI) requested SANs for this certificate,
  - `ip_sans` - requested IP-type SANs for this certificate,
  - `uri_sans` - requested URI-type SANs for this certificate,
  - `ttl` - requested expiration date of this certificate,
  - `not_after` - requested expiration date of this certificate,
  - `serial_number` - the subject's requested serial number,
  - `key_type` - the requested key type,
  - `private_key_format` - the requested key format, which is also used for the public certificate format as well,
- Various role- or issuer-related generation parameters, such as:
  - `managed_key_name` - when creating an issuer, the requested managed key name,
  - `managed_key_id` - when creating an issuer, the requested managed key identifier,
  - `ou` - the subject's organizational unit,
  - `organization` - the subject's organization,
  - `country` - the subject's country code,
  - `locality` - the subject's locality,
  - `province` - the subject's province,
  - `street_address` - the subject's street address,
  - `postal_code` - the subject's postal code,
  - `permitted_dns_domains` - permitted DNS domains,
  - `policy_identifiers` - the requested policy identifiers when creating a role, and
  - `ext_key_usage_oids` - the extended key usage OIDs for the requested certificate.
Some suggested keys to un-HMAC for responses are as follows:
- `certificate` - the certificate that was issued,
- `issuing_ca` - the certificate of the CA which issued the requested certificate,
- `serial_number` - the serial number of the certificate that was issued,
- `error` - to show errors associated with the request, and
- `ca_chain` - optional due to noise; the full CA chain of the issuer of the requested certificate.
Note: This list of parameters to un-HMAC is provided as a suggestion and may not be exhaustive.
The following keys are suggested NOT to un-HMAC, due to their sensitive nature:
- `private_key` - this response parameter contains the private keys generated by Vault during issuance, and
- `pem_bundle` - this request parameter is only used on the issuer-import paths and may contain sensitive private key material.
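A sketch of tuning a mount at `pki` along these lines; the exact set of keys should be adapted to your issuance patterns:

```shell-session
$ vault secrets tune \
    -audit-non-hmac-request-keys=csr \
    -audit-non-hmac-request-keys=common_name \
    -audit-non-hmac-request-keys=alt_names \
    -audit-non-hmac-response-keys=certificate \
    -audit-non-hmac-response-keys=issuing_ca \
    -audit-non-hmac-response-keys=serial_number \
    -audit-non-hmac-response-keys=error \
    pki
```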
Role-Based access
Vault supports path-based ACL Policies for limiting access to various paths within Vault.
The following is a condensed example reference of ACLing the PKI Secrets Engine. These are just a suggestion; other personas and policy approaches may also be valid.
We suggest the following personas:
- Operator; a privileged user who manages the health of the PKI subsystem; manages issuers and key material.
- Agent; a semi-privileged user that manages roles and handles revocation on behalf of an operator; may also handle delegated issuance. This may also be called an administrator or role manager.
- Advanced; potentially a power-user or service that has access to additional issuance APIs.
- Requester; a low-level user or service that simply requests certificates.
- Unauthed; any arbitrary user or service that lacks a Vault token.
For these personas, we suggest the following ACLs, in condensed, tabular form:
Path | Operations | Operator | Agent | Advanced | Requester | Unauthed |
---|---|---|---|---|---|---|
/ca(/pem)? | Read | Yes | Yes | Yes | Yes | Yes |
/ca_chain | Read | Yes | Yes | Yes | Yes | Yes |
/crl(/pem)? | Read | Yes | Yes | Yes | Yes | Yes |
/crl/delta(/pem)? | Read | Yes | Yes | Yes | Yes | Yes |
/cert/:serial(/raw(/pem)?)? | Read | Yes | Yes | Yes | Yes | Yes |
/issuers | List | Yes | Yes | Yes | Yes | Yes |
/issuer/:issuer_ref/(json¦der¦pem) | Read | Yes | Yes | Yes | Yes | Yes |
/issuer/:issuer_ref/crl(/der¦/pem)? | Read | Yes | Yes | Yes | Yes | Yes |
/issuer/:issuer_ref/crl/delta(/der¦/pem)? | Read | Yes | Yes | Yes | Yes | Yes |
/ocsp/<request> | Read | Yes | Yes | Yes | Yes | Yes |
/ocsp | Write | Yes | Yes | Yes | Yes | Yes |
/certs | List | Yes | Yes | Yes | Yes | |
/revoke-with-key | Write | Yes | Yes | Yes | Yes | |
/roles | List | Yes | Yes | Yes | Yes | |
/roles/:role | Read | Yes | Yes | Yes | Yes | |
/(issue¦sign)/:role | Write | Yes | Yes | Yes | Yes | |
/issuer/:issuer_ref/(issue¦sign)/:role | Write | Yes | Yes | Yes | ||
/config/auto-tidy | Read | Yes | Yes | |||
/config/ca | Read | Yes | Yes | |||
/config/crl | Read | Yes | Yes | |||
/config/issuers | Read | Yes | Yes | |||
/crl/rotate | Read | Yes | Yes | |||
/crl/rotate-delta | Read | Yes | Yes | |||
/roles/:role | Write | Yes | Yes | |||
/issuer/:issuer_ref | Read | Yes | Yes | |||
/sign-verbatim(/:role)? | Write | Yes | Yes | |||
/issuer/:issuer_ref/sign-verbatim(/:role)? | Write | Yes | Yes | |||
/revoke | Write | Yes | Yes | |||
/tidy | Write | Yes | Yes | |||
/tidy-cancel | Write | Yes | Yes | |||
/tidy-status | Read | Yes | Yes | |||
/config/auto-tidy | Write | Yes | ||||
/config/ca | Write | Yes | ||||
/config/crl | Write | Yes | ||||
/config/issuers | Write | Yes | ||||
/config/keys | Read, Write | Yes | ||||
/config/urls | Read, Write | Yes | ||||
/issuer/:issuer_ref | Write | Yes | ||||
/issuer/:issuer_ref/revoke | Write | Yes | ||||
/issuer/:issuer_ref/sign-intermediate | Write | Yes | ||||
/issuer/:issuer_ref/sign-self-issued | Write | Yes |
/issuers/generate/+/+ | Write | Yes | ||||
/issuers/import/+ | Write | Yes | ||||
/intermediate/generate/+ | Write | Yes | ||||
/intermediate/cross-sign | Write | Yes | ||||
/intermediate/set-signed | Write | Yes | ||||
/keys | List | Yes | ||||
/key/:key_ref | Read, Write | Yes | ||||
/keys/generate/+ | Write | Yes | ||||
/keys/import | Write | Yes | ||||
/root/generate/+ | Write | Yes | ||||
/root/sign-intermediate | Write | Yes | ||||
/root/sign-self-issued | Write | Yes | ||||
/root/rotate/+ | Write | Yes | ||||
/root/replace | Write | Yes |
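As a sketch, the Requester persona from the table above might map to a policy like the following, for a mount at `pki/` and a role named `web-servers` (both illustrative); the unauthenticated read paths (CA, CRL, OCSP) need no policy:

```shell-session
$ vault policy write pki-requester - <<EOF
path "pki/issue/web-servers"  { capabilities = ["update"] }
path "pki/sign/web-servers"   { capabilities = ["update"] }
path "pki/certs"              { capabilities = ["list"] }
path "pki/revoke-with-key"    { capabilities = ["update"] }
path "pki/roles"              { capabilities = ["list"] }
path "pki/roles/web-servers"  { capabilities = ["read"] }
EOF
```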
Note: With managed keys, operators might need access to read the mount point's tunable data (Read on `/sys/mounts`) and may need access to use or manage managed keys.
Replicated DataSets
When operating with Performance Secondary clusters, certain data-sets are maintained across all clusters, while others for performance and scalability reasons are kept within a given cluster.
The following table breaks down by data type what data sets will cross the cluster boundaries. For data-types that do not cross a cluster boundary, read requests for that data will need to be sent to the appropriate cluster that the data was generated on.
Data Set | Replicated Across Clusters |
---|---|
Issuers & Keys | Yes |
Roles | Yes |
CRL Config | Yes |
URL Config | Yes |
Issuer Config | Yes |
Key Config | Yes |
CRL | No |
Revoked Certificates | No |
Leaf/Issued Certificates | No |
Certificate Metadata | No |
The main effect is that, within the PKI secrets engine, leaf certificates issued with `no_store` set to `false` are stored locally on the cluster that issued them. This allows the active nodes of both primary and Performance Secondary clusters to issue certificates for greater scalability. As a result, these certificates, their metadata, and any revocations are visible only on the issuing cluster. This additionally means each cluster has its own set of CRLs, distinct from other clusters. These CRLs should either be unified into a single CRL for distribution from a single URI, or server operators should know to fetch all CRLs from all clusters.
Cluster scalability
Most non-introspection operations in the PKI secrets engine require a write to storage, and so are forwarded to the cluster's active node for execution. This table outlines which operations can be executed on performance standby nodes and thus scale horizontally across all nodes within a cluster.
Path | Operations |
---|---|
ca[/pem] | Read |
cert/serial-number | Read |
cert/ca_chain | Read |
config/crl | Read |
certs | List |
ca_chain | Read |
crl[/pem] | Read |
issue | Update * |
revoke/serial-number | Read |
sign | Update * |
sign-verbatim | Update * |
* Only if the corresponding role has `no_store` set to `true`, `generate_lease` set to `false`, and no metadata is being written. If `generate_lease` is `true`, the lease creation will be forwarded to the active node; if `no_store` is `false`, the entire request will be forwarded to the active node. If `no_store_cert_metadata=false` and the `metadata` argument is provided, the entire request will be forwarded to the active node.
PSS support
Go lacks support for PSS certificates, keys, and CSRs using the rsaPSS OID (`1.2.840.113549.1.1.10`). It requires all RSA certificates, keys, and CSRs to use the alternative rsaEncryption OID (`1.2.840.113549.1.1.1`).

When using OpenSSL to generate CAs or CSRs from PKCS8-encoded PSS keys, the resulting CAs and CSRs will have the rsaPSS OID. Go and Vault will reject them. Instead, use OpenSSL to generate or convert to a PKCS#1v1.5 private key file and use this to generate the CSR. Vault will, depending on the role and the signing mechanism, still use a PSS signature despite the rsaEncryption OID on the request, as the SubjectPublicKeyInfo and SignatureAlgorithm fields are orthogonal. When creating an external CA and importing it into Vault, ensure that the rsaEncryption OID is present in the SubjectPublicKeyInfo field even if the SignatureAlgorithm is PSS-based.

These certificates generated by Go (with the rsaEncryption OID but PSS-based signatures) are otherwise compatible with fully PSS-based certificates. OpenSSL and NSS support parsing and verifying chains using this type of certificate. Note that some TLS implementations may not support these types of certificates if they do not support `rsa_pss_rsae_*` signature schemes.
Additionally, some implementations allow rsaPSS OID certificates to contain
restrictions on signature parameters allowed by this certificate, but Go and
Vault do not support adding such restrictions.
At this time Go lacks support for signing CSRs with the PSS signature algorithm. If using a managed key that requires an RSA PSS algorithm (such as GCP or a PKCS#11 HSM) as the backing for an intermediate CA key, attempting to generate a CSR (via `pki/intermediate/generate/kms`) will fail signature verification. In this case, the CSR will need to be generated outside of Vault, and the signed final certificate can be imported into the mount.
Go additionally lacks support for creating OCSP responses with the PSS signature algorithm. Vault will automatically downgrade issuers with PSS-based revocation signature algorithms to PKCS#1v1.5, but note that certain KMS devices (like HSMs and GCP) may not support this with the same key. As a result, the OCSP responder may fail to sign responses, returning an internal error.
Issuer storage migration issues
When Vault migrates to the new multi-issuer storage layout on releases prior to 1.11.6, 1.12.2, and 1.13, and storage write errors occur during the mount initialization and storage migration process, the default issuer may not have the correct `ca_chain` value and may only have the self-reference. These write errors most commonly manifest in logs as a message like `failed to persist issuer ... chain to disk: <cause>` and indicate that Vault was not stable at the time of migration. Note that this only occurs when more than one issuer exists within the mount (such as an intermediate with root).
To fix this manually (until a new version of Vault automatically rebuilds the issuer chain), a rebuild of the chains can be performed as sketched below.
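The original code block for this step is not reproduced here; a sketch of the equivalent commands, assuming the mount is at `pki` and the affected issuer is the default (a plain write of the issuer's `manual_chain` field can achieve the same on versions without PATCH support):

```shell-session
# Temporarily pin the chain to just this issuer, then revert to automatic chain building.
$ vault patch pki/issuer/default manual_chain=self
$ vault patch pki/issuer/default manual_chain=""
```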
This temporarily sets the manual chain on the default issuer to a self-chain only, before reverting it back to automatic chain building. This triggers a refresh of the `ca_chain` field on the issuer, which can then be verified by reading the issuer back.
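A sketch of the verification, again assuming the default issuer on a mount at `pki`; the `ca_chain` field in the output should now contain the full chain rather than only the issuer's own certificate:

```shell-session
$ vault read pki/issuer/default
```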
Tutorial
Refer to the Build Your Own Certificate Authority (CA) guide for a step-by-step tutorial.
Have a look at the PKI Secrets Engine with Managed Keys for more about how to use externally managed keys with PKI.
API
The PKI secrets engine has a full HTTP API. Please see the PKI secrets engine API for more details.