Transform sensitive data with Vault
Note
The Transform secrets engine requires a Vault Enterprise Advanced Data Protection (ADP) license or HCP Vault Dedicated Plus tier cluster.
Challenge
Vault's Transit secrets engine provides encryption as a service; however, the resulting ciphertext does not preserve the original data format or length.
Think of a scenario where an organization must cryptographically protect personally identifiable information (PII) while preserving the data format and length. For example, the database schema expects a certain character length and/or only allows alphanumeric characters.
The preservation of the original data format or length may be driven by compliance with certain industry standards such as HIPAA or PCI.
Solution
Vault Enterprise 1.4 with the Advanced Data Protection module introduced the Transform secrets engine to handle secure data transformation and tokenization against provided secrets. Transformation methods encompass NIST-vetted cryptographic standards such as format-preserving encryption (FPE) via FF3-1, which encodes your secrets while maintaining the data format and length. The engine can also perform pseudonymous transformations of the data through other means, such as masking.
This removes the need to change your existing database schemas.
Prerequisites
To perform the tasks described in this tutorial, you need to have Vault Enterprise with the Advanced Data Protection module or an HCP Vault Dedicated Plus tier cluster.
- Access to a Vault Enterprise license with the ADP module to run Vault in dev mode. If you do not have a license, you can request one from your customer success team or use an HCP Vault Dedicated Plus tier cluster.
- jq installed
Policy requirements
For the purpose of this tutorial, you can use the root token to work with Vault. However, it is recommended that root tokens are only used for just enough initial setup or in emergencies. As a best practice, use tokens with an appropriate set of policies based on your role in the organization.
To perform all tasks demonstrated in this tutorial, your policy must include the following permissions:
If you are not familiar with policies, complete the policies tutorial.
Lab setup
Export an environment variable with a valid Vault Enterprise license.
Open a terminal and start a Vault dev server using the Vault Enterprise binary with root as the root token.
The Vault dev server defaults to running at 127.0.0.1:8200. The server is initialized and unsealed.
Insecure operation
Do not run a Vault dev server in production. This approach starts a Vault server with an in-memory database and runs in an insecure way.
Export an environment variable for the vault CLI to address the Vault server.
Export an environment variable for the vault CLI to authenticate with the Vault server.
Note
For these tasks, you can use Vault's root token. However, it is recommended that root tokens are only used for just enough initial setup or in emergencies. As a best practice, use an authentication method or token that meets the policy requirements.
The Vault server is ready, and you can proceed with the lab.
Transform secrets engine workflow
Transform secrets engine configuration workflow:
- Enable the transform secrets engine
- Create a role containing the transformations that it can perform
- Create an alphabet defining a set of characters to use for format-preserving encryption (FPE) if not using the built-in alphabets
- Create a template defining the rules for value matching if not using the built-in template
- Create a transformation to specify the nature of the data manipulation
Alphabets define a set of valid input/output UTF-8 characters to be used when you perform FPE. In this step, you are going to leverage one of the built-in alphabets. Read the create custom alphabets section to learn how to define your own alphabets.
Data transformation templates consist of a type (regex), a pattern (regular expression), and the allowed alphabet used in the input value. Currently, regex is the only supported type. The pattern defines the data format. For example, most credit card numbers have a pattern that can be expressed as (\d{4})-(\d{4})-(\d{4})-(\d{4}) in regex.
This step demonstrates the use of the builtin/creditcardnumber template. Read the create custom templates section to learn how to define your own templates.
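To see how the capture groups in such a pattern behave, the following minimal sketch applies the credit card pattern quoted above with Python's re module. It only illustrates regex matching locally; it does not call Vault, and the built-in builtin/creditcardnumber template may use a different pattern. The function name card_groups is hypothetical.

```python
import re

# Pattern quoted in the text; each parenthesized part is a capture group.
CARD_PATTERN = re.compile(r"(\d{4})-(\d{4})-(\d{4})-(\d{4})")


def card_groups(value):
    """Return the captured digit groups, or None if the value does not match."""
    match = CARD_PATTERN.fullmatch(value)
    return match.groups() if match else None


print(card_groups("1111-2222-3333-4444"))  # four groups of four digits
print(card_groups("1111 2222 3333 4444"))  # None: spaces are not in this pattern
```

Only the text inside the capture groups is subject to transformation; the dashes between the groups stay in place.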
Transformations define the transformation template, tweak source or the masking character to be used to transform the secrets.
Tweak source types:
Source | Description |
---|---|
supplied (default) | The user provides the tweak, which must be a base64-encoded 7-byte value |
generated | Vault generates and returns the tweak along with the encoded data. The user must securely store the tweak, which is needed to decode the data |
internal | Vault generates a tweak for the transformation and uses the same tweak for every request |
Note
Tweak source is only applicable to the FPE transformation.
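If you use the supplied tweak source, a tweak could be produced as in the sketch below. It assumes the tweak is a 7-byte (56-bit) value that is then base64 encoded, which is the tweak size FF3-1 defines; the helper name new_tweak is hypothetical, and how you store the tweak alongside the encoded value is up to your application.

```python
import base64
import os


def new_tweak():
    """Generate a random 7-byte tweak and return it base64-encoded."""
    return base64.b64encode(os.urandom(7)).decode("ascii")


tweak = new_tweak()
print(tweak)  # a 12-character base64 string
```

The same tweak must be presented again when decoding, so it has to be kept with the encoded data.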
Create a new transformation
In this section, you are going to:
- Enable the transform secrets engine
- Create a payments role which includes card-number transformation
- Create a card-number transformation which performs format preserving encryption
Enable the transform secrets engine at transform/.
Create a role named "payments" with the "card-number" transformation attached, which you will create next.
List the existing roles.
Create a transformation named "card-number" which will be used to transform credit card numbers. This uses the built-in template builtin/creditcardnumber to perform format-preserving encryption (FPE). The allowed role to use this transformation is payments, which you just created.
Note
The allowed_roles parameter can be set to a wildcard (*) instead of listing role names. Also, the role name can be expressed using globs at the end for pattern matching (e.g. pay*).
You will learn how to define your own template in the Create custom templates section.
List the existing transformations.
View the details of the newly created card-number transformation.
Transform secrets
The Vault client applications must have the following in their policy to perform data encoding and decoding using the Transform secrets engine enabled at transform/.
Encode a value with the payments role.
Decode the value encoded with the payments role, where the value is set to the returned encoded_value.
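From an application, the encode and decode operations are POST requests to transform/encode/<role> and transform/decode/<role>. The sketch below builds such a request with Python's standard library without sending it. The default address, the root token fallback, and the sample card number are assumptions for a local dev server, and the helper name transform_request is hypothetical.

```python
import json
import os
import urllib.request

# Assumed defaults for a local dev server; override through the environment.
VAULT_ADDR = os.environ.get("VAULT_ADDR", "http://127.0.0.1:8200")
VAULT_TOKEN = os.environ.get("VAULT_TOKEN", "root")


def transform_request(operation, role, payload):
    """Build (but do not send) a POST to transform/<operation>/<role>."""
    return urllib.request.Request(
        url=f"{VAULT_ADDR}/v1/transform/{operation}/{role}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Vault-Token": VAULT_TOKEN, "Content-Type": "application/json"},
        method="POST",
    )


encode_req = transform_request("encode", "payments", {"value": "1111-2222-3333-4444"})
print(encode_req.get_method(), encode_req.full_url)

# Sending requires a running Vault server with the transform engine configured:
# with urllib.request.urlopen(encode_req) as resp:
#     encoded = json.load(resp)["data"]["encoded_value"]
```

A decode request has the same shape, with "decode" as the operation and the encoded value as the value field.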
Create custom templates
Templates define the data format patterns that you wish to keep while transforming the secrets. In this section, you are going to create a transformation template which encodes British passport numbers.
The passport number on a British passport is a 9-digit numeric value, which can be expressed using a regular expression as (\d{9}). The parentheses tell Vault to encode all values grouped within; therefore, (\d{9}) will encode the entire passport number. If you want to encode the last 7 digits, leaving the first two digits unchanged, the expression should be \d{2}(\d{7}).
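The difference between the two expressions can be checked locally. This sketch uses Python's re module to show which part of a sample passport number falls inside the capture group and would therefore be encoded; the sample number and the helper name encoded_part are made up for illustration, and nothing here calls Vault.

```python
import re


def encoded_part(pattern, value):
    """Return the text inside the pattern's capture groups, or None on no match."""
    match = re.fullmatch(pattern, value)
    return "".join(match.groups()) if match else None


print(encoded_part(r"(\d{9})", "123456789"))       # 123456789 (whole number)
print(encoded_part(r"\d{2}(\d{7})", "123456789"))  # 3456789 (last seven digits)
```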
Display all the existing templates.
Create a template named uk-passport-tmpl.
This template uses the built-in alphabet, builtin/numeric.
Create a transformation named uk-passport with the uk-passport-tmpl template.
Update the payments role to include the uk-passport transformation.
The payments role now has two transformations. Future requests to encode or decode must specify which transformation to use.
Encode a value with the payments role and the uk-passport transformation.
Note
Remember that you must specify which transformation to use when you send an encode request since the payments role has two transformations associated with it.
Advanced handling
This section walks you through the new template features introduced in Vault Enterprise v1.9:
Encoding customization
In this section, you are going to create a transformation template which encodes Social Security numbers that may have an optional SSN: or ssn: prefix, and which are optionally separated by dashes or spaces.
A United States Social Security number is a 9-digit number, commonly written using a 3-2-4 digit pattern which can be expressed using the regular expression (\d{3})[- ]?(\d{2})[- ]?(\d{4}). The optional prefix can be expressed using the regular expression (?:SSN[: ]?|ssn[: ]?)?. The use of non-capturing groups tells Vault not to encode the prefix if it is present.
You will use the new encode_format field to specify what the encoded output should look like. In the value for encode_format, variables representing the capture groups of pattern are used to lay out the result. The variables are in the form of $1, $2, and so on, one for each of the capture groups in pattern, and in the form of ${name} or $name for the named capture groups. For more detailed information, see the Go regexp.Expand documentation.
When a template has a value for encode_format, it will always be used. Make sure that the resulting encoded output can be matched by pattern; otherwise, decoding will not be possible.
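The layout step can be previewed locally with the sketch below. It combines the prefix and number expressions from the text into one pattern and assumes an encode_format of $1-$2-$3, which is an illustrative choice rather than a value given in this tutorial. Python writes the same group references as \1, \2, \3. The sketch lays out the input digits as-is; Vault would place the encoded digits in those positions.

```python
import re

# Prefix and number expressions from the text, joined into one pattern.
SSN_PATTERN = re.compile(r"(?:SSN[: ]?|ssn[: ]?)?(\d{3})[- ]?(\d{2})[- ]?(\d{4})")

# Assumed encode_format "$1-$2-$3", written in Python's template syntax.
ENCODE_FORMAT = r"\1-\2-\3"


def lay_out(value):
    """Arrange the captured groups according to ENCODE_FORMAT."""
    match = SSN_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError("value does not match the template pattern")
    return match.expand(ENCODE_FORMAT)


for sample in ("123-45-6789", "SSN:123 45 6789", "ssn 123456789"):
    result = lay_out(sample)
    # The laid-out result must still match the pattern so it can be decoded.
    assert SSN_PATTERN.fullmatch(result)
    print(sample, "->", result)
```

All three inputs produce the same 3-2-4 dashed layout, and the prefix is dropped because it sits in a non-capturing group.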
Create a template named us-ssn-tmpl.
Create a transformation named us-ssn with the us-ssn-tmpl template.
Update the payments role to include the us-ssn transformation.
Validation
Encode values with the payments role and the us-ssn transformation.
Try encoding a value that starts with SSN.
Try encoding a value that starts with ssn.
Decode the value encoded with the payments role and the us-ssn transformation, where the value is set to the encoded_value.
Decoding customization
When running Vault Enterprise v1.9 or later, you can specify one or more optional formats to use during decoding. In this section, you are going to modify the template us-ssn-tmpl, created in the previous section, to add two decoding formats: one to decode values separated by spaces, and the second to decode only the last four digits.
Like encode_format, decode_formats have variables representing the capture groups of pattern to lay out the decoded output. Specifying a decode format is optional, and it is specified as part of the path when performing the decode operation.
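The effect of the two decode formats can be sketched the same way. The format strings below ($1 $2 $3 for space-separated and *** ** $3 for last-four) are assumptions chosen to match the descriptions above, translated into Python's \1 syntax; the function name apply_decode_format is hypothetical and no Vault call is made.

```python
import re

SSN_PATTERN = re.compile(r"(?:SSN[: ]?|ssn[: ]?)?(\d{3})[- ]?(\d{2})[- ]?(\d{4})")

# Assumed decode formats, keyed by the name used in the decode path.
DECODE_FORMATS = {
    "space-separated": r"\1 \2 \3",
    "last-four": r"*** ** \3",
}


def apply_decode_format(decoded_value, format_name):
    """Lay out an already decoded value using the named decode format."""
    match = SSN_PATTERN.fullmatch(decoded_value)
    if match is None:
        raise ValueError("value does not match the template pattern")
    return match.expand(DECODE_FORMATS[format_name])


print(apply_decode_format("123-45-6789", "space-separated"))  # 123 45 6789
print(apply_decode_format("123-45-6789", "last-four"))        # *** ** 6789
```

Because the format name is part of the request path, a caller limited to last-four never receives the first five digits.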
Create a template named us-ssn-tmpl.
Create a transformation named us-ssn with the us-ssn-tmpl template.
Update the payments role to include the us-ssn transformation.
Encode values with the payments role and the us-ssn transformation.
Decode values with the payments role, the us-ssn transformation, and the space-separated decoding format.
Decode values with the payments role, the us-ssn transformation, and the last-four decoding format.
Access control
As the decode format is part of the path of the write operation during decoding, Vault policies can be used to control access to each format.
To demonstrate the functionality, create a token with a policy that will only permit decoding using the last-four decode format.
Define the policy in the file named last-four.hcl.
Create the last-four policy with the rules defined in last-four.hcl.
Create a token with the last-four policy attached and store the token in the variable $LAST_FOUR_TOKEN.
Using the token, decode values with the payments role, the us-ssn transformation, and the last-four decoding format.
Trying to use the token to decode with the payments role and the us-ssn transformation using the space-separated decode format, or without specifying a decode format, will fail.
Create custom alphabets
An alphabet defines a set of UTF-8 characters that FPE uses to determine the validity of plaintext and ciphertext values.
There are a number of built-in alphabets available to use.
Alphabets | Description |
---|---|
builtin/numeric | Numbers |
builtin/alphalower | Lower-case letters |
builtin/alphaupper | Upper-case letters |
builtin/alphanumericlower | Numbers and lower-case letters |
builtin/alphanumericupper | Numbers and upper-case letters |
builtin/alphanumeric | Numbers and letters |
New alphabets can be created to satisfy the template requirements.
To practice, create an alphabet named non-zero-numeric which contains only the non-zero digits.
Display existing alphabets.
Create an alphabet named non-zero-numeric.
This new alphabet consists only of characters from the provided set 123456789.
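An alphabet acts as a membership check on each character of the value being transformed. The sketch below expresses that check for the non-zero-numeric set; it is a local illustration, not Vault's validation code, and the function name in_alphabet is hypothetical.

```python
NON_ZERO_NUMERIC = "123456789"


def in_alphabet(value, alphabet=NON_ZERO_NUMERIC):
    """Return True if every character of value belongs to the alphabet."""
    allowed = set(alphabet)
    return all(ch in allowed for ch in value)


print(in_alphabet("918273645"))  # True
print(in_alphabet("1203"))       # False: 0 is outside the alphabet
```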
Data masking
Data masking is used to hide sensitive data from those who do not have clearance to view it. For example, this allows a contractor to test the database environment without having access to the actual sensitive customer information. Data masking has become increasingly important since enforcement of the General Data Protection Regulation (GDPR) began in 2018.
The following steps demonstrate the use of masking to obscure your customer's phone number since it is personally identifiable information (PII).
Note
Masking is a unidirectional operation; therefore, encode is the only supported operation.
You will create a phone-number-tmpl template which masks phone numbers while leaving the country code visible.
Create a template named "phone-number-tmpl" with country code.
Create a transformation named "phone-number" with the phone-number-tmpl template and allow all roles to use it.
The type is set to masking, and the transformation specifies the masking_character value instead of tweak_source. The default masking character is * if you don't specify one.
Test and verify the newly created phone-number mask transformation by adding the phone-number transformation to the payments role.
Encode a value with the payments role and the phone-number transformation.
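Masking replaces every character inside the template's capture groups with the masking character and leaves the rest untouched. The sketch below imitates that behavior locally. The phone pattern and the sample number are assumptions for illustration, since the template pattern is not reproduced in this text, and the function name mask is hypothetical.

```python
import re

# Assumed pattern: the country code is outside the capture groups, so it stays visible.
PHONE_PATTERN = re.compile(r"\+\d{1,2} (\d{3})-(\d{3})-(\d{4})")


def mask(value, masking_character="*"):
    """Replace each captured character with the masking character."""
    match = PHONE_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError("value does not match the template pattern")
    chars = list(value)
    for group in range(1, PHONE_PATTERN.groups + 1):
        start, end = match.span(group)
        chars[start:end] = masking_character * (end - start)
    return "".join(chars)


print(mask("+1 123-345-5678"))       # +1 ***-***-****
print(mask("+1 123-345-5678", "#"))  # +1 ###-###-####
```

The output has the same length and punctuation as the input, but the original digits cannot be recovered from it.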
Batch input processing
When you need to encode more than one secret value, you can send multiple secrets in a request payload as batch_input instead of invoking the API endpoint multiple times to encode secrets individually.
Example Scenario 1:
You received a credit card number, a British passport number, and a phone number of a customer and wish to transform all these secrets using the payments role.
Create an API request payload with multiple values, each with the desired transformation.
Encode all the values with the payments role.
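A payload for Scenario 1 could be assembled as in this sketch. Each batch_input item carries a value and the name of the transformation to apply; the sample values and the helper name batch_payload are illustrative assumptions.

```python
import json


def batch_payload(items):
    """Build a batch_input request body from (value, transformation) pairs."""
    return {
        "batch_input": [
            {"value": value, "transformation": transformation}
            for value, transformation in items
        ]
    }


payload = batch_payload([
    ("1111-2222-3333-4444", "card-number"),
    ("123456789", "uk-passport"),
    ("+1 123-345-5678", "phone-number"),
])
print(json.dumps(payload, indent=2))
```

The resulting JSON is sent as the body of a single encode request for the payments role.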
Example Scenario 2:
An on-premises database stores corporate card numbers and your organization decided to migrate the data to another database. You wish to encode those card numbers before storing them in the new database.
Create a request payload with multiple card numbers.
Encode all the values with the payments role.
Decode the values
Create a request payload with the encoded card numbers.
Decode all the values with the payments role.
Additional discussion
Bring your own key (BYOK)
Vault Enterprise users running version 1.12.0 or greater can use the BYOK functionality to import an existing encryption key generated outside of Vault, and use it with Transform secrets engine.
The target key for import can originate from an HSM or other external source, and must be prepared according to its origin before you can import it.
The example shown here uses a 256-bit AES key, referred to as the target key. To successfully import the target key, you must perform the following operations to prepare it.
1. Generate an ephemeral 256-bit AES key.
2. Wrap the target key using the ephemeral AES key with AES-KWP.
3. Wrap the AES key under the Vault wrapping key using RSAES-OAEP with MGF1 and either SHA-1, SHA-224, SHA-256, SHA-384, or SHA-512.
4. Delete the ephemeral AES key.
5. Append the wrapped target key to the wrapped AES key.
6. Base64 encode the result.
A specific code example for preparing and wrapping the key for import is beyond the scope of this tutorial. For more details about wrapping the key for import, including instructions for wrapping a key from an HSM, refer to the key wrapping guide.
Before you can wrap the key for import, you must read the wrapping key from Vault so that it can be used to prepare your key.
The output is the (4096-bit RSA) wrapping key.
Use the wrapping key value at step 3 in the previously detailed preparation steps. Once you have prepared and base64 encoded the ciphertext, export the value to the environment variable IMPORT_CIPHERTEXT.
Create a new transformation role named physical-access to use for the proximity-card transformation that you will import the key into.
Create a new template named identifier to match a proximity card identifier string having the format shown in this example: 8A642EC3-3C8A-40C2-8AC0-A039ECC0FFEE.
Import the key into the proximity-card transformation; add the allowed_roles parameter and specify the physical-access role.
Try using the newly imported key to encode a value.
The imported key is working, and the encoded value returned by the proximity-card transformation is using the imported key.
Note
The FPE transformation does not currently support versioning or rotating of its encryption keys.
Next steps
To actually integrate your application with Vault and leverage the transform secrets engine, there are a number of resources which must be configured.
Before the application can even request data transformation, it first needs to authenticate with Vault. Therefore, an auth method (e.g. AWS, Kubernetes, AppRole) must be enabled and configured for the application to use. In addition, an appropriate policy must be created and attached to the client token.
You can codify the Vault configuration using Terraform, and make the configuration repeatable. The Terraform Vault Provider supports the transform secrets engine. It can create policies, enable and configure auth methods and more.
Refer to the Codify Management of Vault Enterprise Using Terraform tutorial to learn how to leverage Terraform.
On the application side, you can run Vault Agent to authenticate with Vault and manage the lifecycle of the client token. Refer to the following tutorials to learn more about Vault Agent:
The Encrypting Data while Preserving Formatting with the Vault Enterprise Transform Secrets Engine blog post introduces some code examples to invoke the transform secrets engine using Vault API.
Clean up
Unset the VAULT_TOKEN environment variable.
Unset the VAULT_ADDR environment variable.
Unset the IMPORT_CIPHERTEXT environment variable.
Remove the payload files.
You can stop the Vault dev server by pressing Ctrl+C where the server is running. Or, execute the following command.
If you are using Vault Dedicated, you can delete the cluster from the HCP Portal.
Summary
The Transform secrets engine performs secure data transformation and tokenization against the input data. Transformation methods may encompass NIST vetted cryptographic standards such as format-preserving encryption (FPE) via FF3-1, but can also be pseudonymous transformations of the data through other means, such as masking. This tutorial walked through the use of the Transform secrets engine step-by-step.
Limits:
The Transform secrets engine obeys the FF3-1 minimum and maximum sizes on the length of an input, which are a function of the alphabet size.
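To get a feel for how the limits depend on the alphabet size, the sketch below computes the bounds as stated in NIST SP 800-38G Revision 1 (draft) for FF3-1: the minimum length is the smallest n with radix^n >= 1,000,000, and the maximum length is 2 * floor(log_radix(2^96)). This reflects the NIST text as understood here; it is an assumption that Vault enforces exactly these values, and the function name is hypothetical.

```python
def ff3_1_length_bounds(radix):
    """Return (minlen, maxlen) for an alphabet of the given size."""
    if not 2 <= radix <= 2 ** 16:
        raise ValueError("radix must be between 2 and 65536")
    # Smallest length whose domain holds at least one million values.
    minlen = 1
    while radix ** minlen < 1_000_000:
        minlen += 1
    # Largest k with radix**k <= 2**96, computed with exact integers.
    k = 0
    while radix ** (k + 1) <= 2 ** 96:
        k += 1
    return minlen, 2 * k


print(ff3_1_length_bounds(10))  # (6, 56) for builtin/numeric
print(ff3_1_length_bounds(9))   # (7, 60) for an alphabet such as 123456789
print(ff3_1_length_bounds(62))  # (4, 32) for digits plus upper and lower case letters
```

Under these bounds, a 16-digit card number and a 9-digit passport number both fall within the allowed range for a numeric alphabet.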
Help and reference
- Transform Secrets Engine (API)
- Transform Secrets Engine
- Encrypting Data while Preserving Formatting with the Vault Enterprise Transform Secrets Engine
Next steps
The Encrypting Data with Transform Secrets Engine tutorial introduces demo applications written in Go and Java to learn how to implement the Transform secrets engine API in your application.
If you are interested in data tokenization, refer to the Tokenize Data with Transform Secrets Engine tutorial.