Detect infrastructure drift and enforce policies
As your organization grows and your infrastructure provisioning workflows mature, it gets harder to enforce consistency and best practices with training and hand-built tooling alone. Terraform can automatically check that your infrastructure satisfies industry best practices and organization-specific standards, with resource and module-specific conditions. Pre- and post-conditions help you define resource requirements in Terraform configurations. By including custom conditions in module definitions, you can ensure that downstream consumers comply with configuration standards, and use modules properly. In addition, you can use HCP Terraform to verify your infrastructure with workspace-specific run tasks and enforce workspace or organization-wide policies with HashiCorp Sentinel or Open Policy Agent (OPA).
Note
HCP Terraform Free Edition includes one policy set of up to five policies. In HCP Terraform Plus Edition, you can connect a policy set to a version control repository or create policy set versions via the API. Refer to HCP Terraform pricing for details.
In this tutorial, you will use both Terraform preconditions and policies to validate configuration and enforce compliance with organizational practices. First, you will use Terraform preconditions to enforce network security conventions. Then, you will learn how to configure and enforce policies in HCP Terraform, preventing infrastructure deployments on certain days of the week. Finally, you will use HCP Terraform's drift detection to detect when infrastructure settings have diverged from your written Terraform configuration.
Prerequisites
This tutorial assumes that you are familiar with the Terraform and HCP Terraform workflows. If you are new to Terraform, complete the Get Started tutorials first. If you are new to HCP Terraform, complete the HCP Terraform Get Started tutorials first.
In order to complete this tutorial, you will need the following:
- Terraform v1.4+ installed locally.
- An AWS account.
- An HCP Terraform account with HCP Terraform locally authenticated.
- An HCP Terraform variable set configured with your AWS credentials.
Create example repository
Visit the template repository for this tutorial. Click the Use this template button and select Create a New Repository. Choose the GitHub owner that you use with HCP Terraform, and name the new repository learn-terraform-drift-and-policy. Leave the rest of the settings at their default values.
Clone example configuration
Clone your example repository, replacing USER with your own GitHub username. You will push to this repository later in the tutorial.
Change to the repository directory.
Review infrastructure configuration
This repository contains a local Terraform module that defines a network and bastion host, and a root configuration that uses the module. It also contains Sentinel policies and OPA policy definitions, which you will review later in this tutorial.
Open the modules/network/main.tf file in your code editor. This configuration uses the public vpc module to provision networking resources, including public and private subnets and a NAT gateway. It then launches a bastion host in one of the public subnets.
The bastion host is intended to be the single point of entry for any SSH traffic to instances within the VPC’s private subnets. The configuration also includes a security group that scopes any ingress SSH traffic to the bastion to just the 192.80.0.0/16 CIDR block, an example CIDR representing your organization’s network.
Though this configuration references this module locally, in a larger organization, you would likely publish it in your Terraform registry. By including a bastion in the boilerplate of your networking configuration, you can establish a standard for SSH access to instances in your networks.
Define a precondition
The network module defines a bastion_instance_type input variable to allow users to account for anticipated usage and workloads. Over-provisioning the bastion would incur unnecessary cost for your organization. As a result, while you want to allow users to specify an instance type, you do not want to allow them to provision an instance that is too big. You will add a precondition to verify that the instance type does not have more than 2 cores, to keep your operating costs low.
First, add the data source below to the module configuration. It accesses the instance type details, including the number of cores, from the AWS provider.
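The original code listing did not survive in this copy of the tutorial. As a minimal sketch, assuming the data source is named bastion (the name the later text uses) and looks up the type supplied through the module's input variable, it could look like this:

```hcl
# Sketch only: looks up details for the instance type chosen by the module user.
# The AWS provider's aws_ec2_instance_type data source exposes attributes such
# as default_cores for the requested type.
data "aws_ec2_instance_type" "bastion" {
  instance_type = var.bastion_instance_type
}
```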
Now, add the precondition to the aws_instance.bastion resource definition.
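The following fragment is a sketch of what such a precondition might look like, not the tutorial's exact listing. It assumes the aws_ec2_instance_type.bastion data source already exists in the module, and the error message wording is illustrative.

```hcl
resource "aws_instance" "bastion" {
  # Fragment: the resource's existing arguments (AMI, subnet, security
  # groups, and so on) are omitted here and stay unchanged.
  instance_type = var.bastion_instance_type

  lifecycle {
    # Fail the run when the chosen instance type has more than 2 cores.
    precondition {
      condition     = data.aws_ec2_instance_type.bastion.default_cores <= 2
      error_message = "Change the value of bastion_instance_type to a type that has 2 or fewer cores to avoid over-provisioning."
    }
  }
}
```

Preconditions live inside the resource's lifecycle block, so the check travels with the module and applies to every consumer.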
Terraform evaluates preconditions and checks whether the configuration satisfies the condition before it will create the plan. In this case, Terraform checks whether the default_cores value from the aws_ec2_instance_type.bastion data source is 2 or fewer before it creates the plan to provision the bastion instance and your other resources.
Deploy infrastructure
The root Terraform configuration uses the network module to create a bastion host and networking components including a VPC, subnets, a NAT gateway, and route tables. It sets the values for input variables in the terraform.auto.tfvars file. The initial value for the bastion instance type is t2.2xlarge, which has 8 cores and will fail the precondition as expected.
Set your HCP Terraform organization name as an environment variable to configure your HCP Terraform integration.
Tip
If multiple users in your HCP Terraform organization will run this tutorial, add a unique suffix to the workspace name in terraform.tf.
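As an illustration of that tip, a cloud block with a suffixed workspace name might look like the sketch below. The -alice suffix is hypothetical, and the block assumes the organization name comes from an environment variable rather than from the configuration, so the organization argument is left out.

```hcl
terraform {
  cloud {
    # The organization argument is omitted on the assumption that it is
    # supplied through the environment.
    workspaces {
      # Hypothetical suffix; pick any value unique within your organization.
      name = "learn-terraform-drift-and-policy-alice"
    }
  }
}
```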
Initialize your configuration. As part of initialization, Terraform creates your learn-terraform-drift-and-policy HCP Terraform workspace.
Now, attempt to plan your configuration. The plan will fail because the instance size you specified is too big, and the precondition will return an error.
The t2.2xlarge instance type has 8 cores, so this Terraform run failed the precondition defined in the networking module.
Change the bastion_instance_type variable in terraform.auto.tfvars to t2.small.
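After the change, the relevant line of terraform.auto.tfvars would read as follows. Any other variables the file sets are not shown and stay as they are.

```hcl
# terraform.auto.tfvars (fragment): a smaller type that satisfies the precondition.
bastion_instance_type = "t2.small"
```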
Now apply your configuration. This time the precondition will pass, and Terraform will plan your changes. Respond yes to the prompt to confirm the operation.
Using a precondition to verify resource allocation lets you use the most up-to-date information from AWS to determine whether or not your configuration satisfies the requirement. While you could have also used variable validation to catch the violation, that would require researching all of the instance types and their capacities and listing every acceptable instance type in your configuration, which makes the check less flexible and harder to maintain.
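For comparison, a variable validation version of the same rule might look like the sketch below. The allow-list is hypothetical and deliberately incomplete: it is exactly the kind of hand-maintained list that the precondition approach avoids.

```hcl
variable "bastion_instance_type" {
  description = "Instance type for the bastion host."
  type        = string

  validation {
    # Hypothetical allow-list. Each entry must be researched and kept
    # current by hand as AWS adds or changes instance types.
    condition     = contains(["t2.micro", "t2.small"], var.bastion_instance_type)
    error_message = "The bastion_instance_type value must be one of the approved small instance types."
  }
}
```

The validation block only inspects the variable's value, so it cannot query AWS for the actual core count the way the data source can.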
Review policy
Configuration-level validation such as variable constraints and preconditions lets you socialize standards from within your written configuration. However, module authors and users must voluntarily comply with the standards. Module authors must include conditions in module definitions, and users must consume those modules to provision infrastructure. To enforce infrastructure standards across entire workspaces or organizations, you can use HCP Terraform policies, which work without requiring your users to write their infrastructure configuration in a specific way.
HCP Terraform allows you to choose either Sentinel or the Open Policy Agent (OPA) as your policy engine. This tutorial includes policies for both policy engines. Select the tab below to follow this tutorial with your preferred policy engine.
Navigate to the sentinel directory in the example repository.
Open the sentinel.hcl file to review the policy set configuration.
This policy set defines two policies, friday_deploys and public_ingress. It sets the enforcement level to advisory for the friday_deploys policy, and to mandatory for the public_ingress policy. When HCP Terraform detects a failure in an advisory policy, it notifies you of the failure but allows you to provision your resources anyway. When a mandatory policy fails, HCP Terraform refuses to apply the plan until the policy passes. The query format references the package name declared in the policy file, and the name of the rule defined for the policy.
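For the OPA variant, where each policy entry carries a query, a policy set configuration might look like the following sketch. The package path data.terraform.policies.* and the deny rule name are assumptions for illustration; use the package and rule names declared in your own policy files.

```hcl
# Sketch of an OPA policy set configuration. Package and rule names are assumed.
policy "friday_deploys" {
  # <package declared in the policy file>.<rule name>
  query             = "data.terraform.policies.friday_deploys.deny"
  enforcement_level = "advisory"
}

policy "public_ingress" {
  query             = "data.terraform.policies.public_ingress.deny"
  enforcement_level = "mandatory"
}
```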
In addition to placing guardrails on infrastructure configuration, you may wish to enforce standards around your organization’s workflows themselves. One common practice is to prevent infrastructure deployments on Fridays in order to lower the risk of production incidents before the weekend. The friday_deploys policy prevents infrastructure deployments on a certain day of the week.
The example policies include tests, so that you can verify that they work as expected before using them with HCP Terraform. Run the tests for your policies now.
Sentinel loads tests from directories that match the name of each of your policies.
The public_ingress policy parses the planned changes for a Terraform run and checks whether they include security group updates that allow public ingress traffic from all CIDRs (0.0.0.0/0). This policy helps enforce your security posture by preventing the creation of any overly permissive security groups.
Create a policy set
HCP Terraform organizes policies in policy sets. Policy sets can contain either Sentinel or OPA policies. You can apply a policy set across an organization, or only to specific workspaces.
There are three ways to manage policy sets and their policies: VCS repositories, the HCP Terraform API, or directly through the HCP Terraform UI. In this tutorial, you will configure policy sets through VCS. The VCS workflow lets you collaborate on, safely develop, and version your policies, establishing the repository as the source of truth.
You will now create your policy set.
First, log in to HCP Terraform, and select the organization you will use to complete this tutorial.
Navigate to your organization's Settings, then to Policy Sets. Click Connect a new policy set.
Select the Version control provider (VCS) option.
Tip
Review the HCP Terraform VCS tutorial for detailed guidance on how to configure your VCS integration.
On the Configure settings page:
- Select either Sentinel or Open Policy Agent as the policy integration, depending on which you are using for this tutorial.
- Name your policy set learn-terraform-drift-and-policy.
- Set the Scope of Policies to Policies enforced on selected workspaces.
- Under Workspaces, select your learn-terraform-drift-and-policy workspace.
- Under Overrides, uncheck the box next to "This policy set can be overridden in the event of mandatory failures."
- Click Next.
On the Connect to VCS page:
- Select your GitHub.com integration.
- Select the learn-terraform-drift-and-policy repository you created for this tutorial.
- Set the Policies path to either /sentinel or /opa, depending on which policy engine you are using for this tutorial.
- Click Next.
On the Parameters page:
- Click the + Add parameter button.
- Set the Key to forbidden_days and the value to a list containing today, for example: ["Monday"].
- Click the Save parameter button.
- Click the Connect policy set button to connect your policy set to HCP Terraform.
HCP Terraform will print out a summary of your new policy set.
Trigger policy violation
The networking resources you provisioned earlier include a bastion host configured with a security group that restricts ingress traffic to your organization’s internal network. Imagine that an engineer is troubleshooting a production incident and tries to get around this restriction by making the security group more permissive.
To simulate this, update the ingress rule for the aws_security_group.bastion resource in modules/network/main.tf.
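A sketch of the more permissive rule follows. Only the cidr_blocks change comes from this tutorial's scenario. The remaining arguments, including the module.vpc.vpc_id reference and the SSH port settings, are assumptions about how the module defines the security group.

```hcl
resource "aws_security_group" "bastion" {
  name   = "bastion_ssh"
  vpc_id = module.vpc.vpc_id # assumed reference to the vpc module's output

  ingress {
    from_port = 22
    to_port   = 22
    protocol  = "tcp"
    # Previously ["192.80.0.0/16"]. Opening SSH to every address is the
    # change that the public_ingress policy is meant to catch.
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```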
Run terraform apply in the repository's root directory to attempt to update the security group. The apply will fail as expected since the ingress rules are too permissive.
HCP Terraform detected the policy failures: the security group allows public ingress, and deploys are blocked today. The CLI output and run details in HCP Terraform list which policies failed.
Using policies in HCP Terraform, you prevented Terraform from creating resources that violate your infrastructure and organization standards.
Before moving on, fix your policy and configuration to allow a successful apply.
Navigate to your organization's Policy sets page, and select your learn-terraform-drift-and-policy policy set.
Scroll to the bottom of the page, select the ... button next to your forbidden_days parameter, and click Edit. Set the value to ["Friday"] (or another day if today is Friday), and click Save parameter.
Revert the change to the aws_security_group.bastion resource in modules/network/main.tf so that it reflects your actual infrastructure configuration.
Reapply your configuration to bring your workspace back into a healthy state.
Introduce infrastructure drift
Note
Drift detection is available in HCP Terraform Plus Edition. Skip to the clean up step if you do not have access, or refer to HCP Terraform pricing for details.
Custom conditions, input validation, and policy enforcement help organizations maintain their standards at the time of resource provisioning. HCP Terraform can also check whether existing resources in Terraform state still match the intended configuration.
Detect drift
HCP Terraform’s automatic health assessments help make sure that existing resources match their Terraform configuration. To do so, HCP Terraform runs non-actionable, refresh-only plans in configured workspaces to compare the actual settings of your infrastructure against the resources tracked in your workspace’s state file. The assessments do not update your state or infrastructure configuration.
Assessments include two types of checks, which you enable together. Drift detection determines whether resources have changed outside of the Terraform workflow. Health checks verify that any custom conditions you define in your configuration are still valid, for example checking if a certificate is still valid. You can enable assessments on specific workspaces, or across all workspaces in an organization. Assessments only run on workspaces where the last apply was successful. If the last apply failed, the workspace already needs operator attention. Make sure your last apply succeeded before moving on.
Navigate to your learn-terraform-drift-and-policy workspace in the HCP Terraform UI. Under the workspace's Settings, select Health.
Select Enable, then click Save settings.
Before the first health assessment will run, you must have a successful apply. Apply your configuration again. There will be no changes. Respond to the confirmation prompt with a yes.
Create infrastructure drift
Returning to the hypothetical production incident, imagine that an engineer tries to work around the policy by making manual resource changes while troubleshooting.
To simulate this, navigate to your security groups in the AWS console.
Find the bastion_ssh security group. Select the Inbound rules tab in the security group details, then click Edit inbound rules. Delete the 192.80.0.0/16 source CIDR and replace it with 0.0.0.0/0. Then, click Save rules.
You have now introduced infrastructure drift into your configuration by managing the security group resource outside of the Terraform workflow.
After a few minutes, HCP Terraform will report failed assessments on the workspace overview page. You can also trigger an assessment manually by navigating to your workspace's Health page and clicking the Start health assessment button.
Click View Details to get more information. HCP Terraform detected the change to your ingress rule and reported what will happen on your next run if you do not update your configuration.
Tip
Drift detection only reports on changes to the resource attributes defined in your configuration. To avoid accidental drift, explicitly set any attributes critical to your operations in your configuration, even if you rely on a provider's default value for that attribute.
The health assessments detected infrastructure drift. These checks ensure that your infrastructure configuration still matches the written configuration and satisfies any defined custom conditions, extending your validation coverage beyond just the time of provisioning. Fixing drift is a manual process, because you need to understand whether you want to keep the infrastructure changes made outside of Terraform, or overwrite them. In this case, you could run another Terraform apply to overwrite the security group update.
Clean up infrastructure
Destroy the resources you created as part of this tutorial to avoid incurring unnecessary costs. Respond yes to the prompt to confirm the operation.
Optionally, delete your learn-terraform-drift-and-policy workspace and policy set from your HCP Terraform organization.
Next steps
In this tutorial, you used Terraform language features and HCP Terraform policies to make sure that your infrastructure matches your configuration, and complies with your organization’s needs and standards. Configuration-level validation such as preconditions lets you specify standards within Terraform configurations. HCP Terraform policies let you enforce standards for an entire workspace or organization. You also used HCP Terraform health assessments to make sure that existing infrastructure still matched Terraform configuration, and had not changed outside of the Terraform workflow.
To learn more about how Terraform features can help you validate your infrastructure configuration, check out the following resources:
- Review the policy documentation.
- Learn how to configure and use health assessments to detect infrastructure drift.
- Learn how to manage your infrastructure costs in HCP Terraform.
- Learn how to use HCP Terraform run tasks and HCP Packer to ensure machine image compliance.
- Review the health assessment documentation.