Terraform Enterprise backup - recommended pattern
Many business verticals require business continuity management (BCM) for production services. A reliable backup of your Terraform Enterprise deployment is crucial to ensuring business continuity. The backup should include data held and processed by Terraform Enterprise's components so that operators can restore it within the organization's Recovery Time Objective (RTO) and to its Recovery Point Objective (RPO).
This guide extends the Backup & Restore documentation, which contains more technical detail about the backup and restore process. This guide discusses the best practices, options, and considerations to back up Terraform Enterprise and increase its resiliency. It also recommends redundant, self-healing configurations using public and private cloud infrastructure, which add resilience to your deployment and reduce the chances of requiring backups.
Most of this guide is only relevant to single-region, multi-availability zone External Services mode deployments except where otherwise stated. Refer to the Backup a Mounted Disk Deployment section below for specific details if you are running a Mounted Disk deployment. This guide does not cover Demo mode backups.
For region redundancy, repeat the recommendations in this guide for each region and consider the recommendations in the Multi-Region Considerations section at the end of this page.
For recommended patterns for recovery and restoration of TFE, refer to the Terraform Enterprise Recovery & Restoration Recommended Pattern.
Definitions
Business continuity (BC) is a corporate capability. This capability exists when an organization can continue to deliver its products and services at acceptable, predefined levels during disruptive incidents.
Note
The ISO 22301 standard uses business continuity rather than disaster recovery (DR). As a result, this guide refers to business continuity instead of disaster recovery.
Two factors heavily determine your organization's ability to achieve BC:
Recovery Time Objective (RTO) is the target time set for the resumption of product, service, or activity delivery after an incident. For example, if an organization has an RTO of one hour, they aim to have their services running within one hour of the service disruption.
Recovery Point Objective (RPO) is the maximum tolerable period that data can be lost after an incident. For example, if an organization has an RPO of one hour, they can tolerate the loss of a maximum of one hour's data before the service disruption.
Based on these definitions, you should assess the valid RTO/RPO for your business and approach BC accordingly. These factors will determine your backup frequency and other considerations discussed later in this guide.
In this guide:
- A public cloud availability zone (AZ) is equivalent to a single VMware-based datacenter.
- A public cloud multi-availability zone is equivalent to a multi-datacenter VMware deployment.
- The main AZ is the Primary. Any other AZs in the same region are the Secondary. The Secondary region is a business continuity/failover region only and is not an active secondary location. You should consider all availability zones equal for the purposes of illustration.
Best Practices
Maintain the backup and restore process
When you deploy Terraform Enterprise:
- Test the backup and restoration process and measure the recovery time to ensure it satisfies your organization's RTO/RPO.
- Document the backup and restoration process.
- Arrange for staff who did not write the documentation to run a test restore using it. This measure will increase confidence in the backup and restore process.
- Regularly test the backup and restoration process to ensure the documentation is reliable, especially if staff leave.
Manage sensitive values
For fully automated deployments, you must manage several common sensitive values. The backup methods below do not capture these values, so you must secure them separately. Do not store any of these sensitive values in version control or allow them to leak into shell histories.
Active/Active deployments must be automated, and have additional sensitive values you must manage.
Process audit logs
Audit log processing helps you identify the root cause during a data recovery incident.
Follow the guidance on Terraform Enterprise logs to aggregate and index logs from the Terraform Enterprise node(s) using a central logging platform such as Splunk, ELK, or a cloud-native solution. Use these logs as a diagnostic tool in the event of an outage, scanning them for `ERROR` and `FATAL` messages as part of root cause analysis.
Terraform Enterprise backup API
The backup API facilitates backups and migrations from one operational mode or deployment method (Standalone or Active/Active) to another.
Only use the backup API to migrate between low-volume implementations, especially in non-production environments. Use cloud-native tooling instead for day-to-day backup and recovery on public cloud, and standard approaches for on-premise deployments as detailed below.
Prepare to back up
The following recommendations will improve your security posture, reduce the effort required to maintain an optimal Terraform Enterprise instance, and speed up deployment time during a restoration.
- Harden the server image using CIS benchmarking.
- Run Terraform Enterprise on single-use machines — do not run other services on the same VMs.
- Remove all unnecessary packages from the operating system.
- Deploy immutable instances using automation, repaving instances with patched images rather than patching them in place. This process requires you to maintain the setup configuration in the code used to deploy the system; as a result, you can avoid taking application-layer snapshots with Terraform Enterprise Automated Recovery, which captures this information in its snapshots.
- Pin the version of Terraform Enterprise that the Replicated `install.sh` script deploys to avoid accidental version upgrades. Use the flag `release-sequence=${tfe_release_sequence}`, where `${tfe_release_sequence}` is the Replicated release sequence. Look up the release sequence on this page. For example, for release `v202103-3`, use `523` as the `${tfe_release_sequence}`.
Note
The Automated Recovery function only backs up installation data and not application data. If you have an automated deployment, you don't need to use the Automated Recovery function.
Reference the tab(s) below for specific recommendations relevant to your installation method.
If you are using the online installation method, configure the boot script to run the Replicated `install.sh` script explicitly without the airgap argument when the new VM starts up. The VM will download the installation media from the Internet and install the service.
Based on the Replicated configuration, the application will connect to the configured object store and database resources automatically.
Application Server
We recommend you automatically replace application server nodes when a node or availability zone fails. Replacing the node provides redundancy at the server and availability zone level. Public clouds and VMware have specific services for this.
Click on the tab(s) below relevant to your cloud deployment for additional cloud-specific recommendations.
Use an Auto Scaling group (ASG) to automatically replace nodes on AWS. Select your deployment for more details.
- For Standalone deployments, set the ASG's `min_size` and `max_size` to 1. When a node or availability zone fails, the ASG will automatically replace the node. The time it takes for the ASG to replace the node depends on the time it takes the node to become ready. For example, if the node needs to download the installation media from a network, the node will not be ready until it downloads and installs the installation media.
- For both Standalone and Active/Active deployments, populate the ASG `vpc_zone_identifier` list with at least two subnets. If the region supports additional subnets, we recommend a minimum of three subnets since this provides `n-2` AZ redundancy.
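These ASG settings can be sketched in Terraform. This is a minimal illustration, not the reference architecture: the resource names, the launch template (assumed to run the pinned `install.sh` boot script described earlier), the subnet resources, and the health check values are all assumptions you should adapt to your environment.

```hcl
resource "aws_autoscaling_group" "tfe" {
  name             = "tfe-standalone"
  min_size         = 1   # Standalone: exactly one node at all times
  max_size         = 1
  desired_capacity = 1

  # At least two subnets; three gives n-2 AZ redundancy where the
  # region supports it. Subnet resources are assumed to exist.
  vpc_zone_identifier = [
    aws_subnet.tfe_az_a.id,
    aws_subnet.tfe_az_b.id,
    aws_subnet.tfe_az_c.id,
  ]

  # Replace nodes that fail load balancer health checks, allowing
  # enough grace time for install.sh to download and install TFE.
  health_check_type         = "ELB"
  health_check_grace_period = 900

  launch_template {
    id      = aws_launch_template.tfe.id   # assumed to exist elsewhere
    version = "$Latest"
  }
}
```

With `min_size` and `max_size` both set to 1, the ASG's only job is self-healing: it repaves the single node into a healthy subnet whenever the node or its availability zone fails.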
Object Store
We recommend the following to support the object store's business continuity:
- Choose fast storage that is optimized for your use case, scales well, and automatically replicates to another zone in the same region. Each public cloud has a well-known option in this space. For private cloud External Services mode deployments, you must use S3-compatible storage.
- Configure MFA delete protection to prevent accidental deletion.
Click on the tab(s) below relevant to your cloud deployment for additional cloud-specific recommendations.
Because AWS claims eleven 9s of durability for S3, the most likely problem with the object store is service inaccessibility or corruption through human error rather than loss of durability. As a result, S3 Same-Region Replication is not explicitly required for the Terraform Enterprise object store: it does not add sufficient value, because corruption in the primary S3 bucket is replicated to the secondary automatically.
We recommend the following to ensure you back up your application data appropriately.
- Refer to AWS's S3 FAQs for information about S3's durability.
- Implement the security best practices for Amazon S3.
- Enable versioning on the bucket used as the object store. If you are using a bucket as a bootstrap store to contain installation media, enable versioning on the bucket.
- You should use S3 Standard class buckets.
- The buckets should be in the same region as the EC2 worker node(s).
- Use the VPC endpoint for S3.
- AWS Backup is not an option for object stores because S3 is not supported as a source.
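The bucket recommendations above can be sketched in Terraform. This is an illustrative fragment under assumed bucket and resource names; it is not a complete or prescriptive configuration.

```hcl
# Object store bucket; S3 Standard storage class is the default.
resource "aws_s3_bucket" "tfe_object_store" {
  bucket = "example-tfe-object-store"   # illustrative name
}

# Versioning lets you recover objects deleted or overwritten in error.
resource "aws_s3_bucket_versioning" "tfe_object_store" {
  bucket = aws_s3_bucket.tfe_object_store.id
  versioning_configuration {
    status = "Enabled"
    # MFA delete additionally requires the root account's MFA device to
    # delete object versions, but it must be enabled with root
    # credentials rather than a normal Terraform apply.
  }
}

# Block all public access to the object store.
resource "aws_s3_bucket_public_access_block" "tfe_object_store" {
  bucket                  = aws_s3_bucket.tfe_object_store.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

Apply the same versioning configuration to any bootstrap bucket that holds installation media.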
Database
You should configure the database to be in line with Terraform Enterprise's PostgreSQL requirements.
For high availability in a single public cloud region, we recommend deploying the database in a multi-availability zone configuration to add resilience against recoverable outages. For coverage against non-recoverable issues (such as data corruption), take regular snapshots of the database.
Click on the tab(s) below relevant to your cloud deployment for additional cloud-specific recommendations.
In addition to the general recommendations above, consider the following AWS-specific recommendations:
- Implement the security best practices for Amazon RDS.
- Use a multi-AZ deployment. AWS will create the primary DB instance in the Primary AZ and synchronously replicate the contents to the standby instance in the Secondary AZ. Refer to AWS's high availability for RDS documentation for more information.
- Configure the PostgreSQL database in line with the HashiCorp Terraform Enterprise AWS Reference Architecture.
- Configure AWS Backup for RDS, using continuous backup and point-in-time-recovery (PITR).
- If using Aurora as the Terraform Enterprise RDS database, you automatically benefit from point-in-time recovery, continuous backup to Amazon S3, and replication across three availability zones. How many days of retention you require is a business decision, but HashiCorp recommends the maximum 35-day retention for maximum flexibility, particularly in regulated environments.
- Additionally configure database snapshots.
- The default DB backup is taken from the standby instance once a day. Snapshots can be used to achieve an RPO of less than a day, and also to facilitate region redundancy if they are stored in region-replicated buckets. They also persist beyond the 35-day PITR window above.
- Trigger DB snapshots automatically at required points during the day. The organization's needs and RPO will determine when you should trigger DB snapshots.
- Continuously monitor the length of time it takes to do the backup and compare this to the RPO to avoid overlapping backups.
- In both cases, ensure that backups and snapshots are secure and restrict access to only required staff.
- Keep up-to-date with the AWS RDS documentation.
- Actively and continuously monitor operational health, and configure automatic event notifications.
- Store your snapshots in Amazon S3 in the same region as the platform to reduce recovery time.
- If Terraform is being used to deploy standard RDS, in the `aws_db_instance` resource:
  - Set `backup_window` to a suitable period in line with company policy and regulations. The default backup window should be 30 minutes.
  - Set `multi_az` to `true`.
- If Terraform is being used to deploy Aurora RDS, in the `aws_rds_cluster` resource:
  - Set `availability_zones` to a list of at least three EC2 availability zones. AWS will increase availability zones to at least three if you specify fewer; however, we recommend specifying at least three to maximize the database layer's recoverability.
  - Set `preferred_backup_window` and `preferred_maintenance_window` to times convenient to your business model.
- For both RDS and Aurora RDS, if using Terraform as above, set `backup_retention_period` to a suitable period according to company policy and regulations. The recommended retention is the current maximum of 35 days since this maximizes the recoverability of the data; however, you should be aware of the costs associated with that level of data retention. Use snapshots to retain DB copies for longer than this.
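The standard RDS settings above can be sketched as follows. This is an illustrative fragment only: the identifier, engine version, sizing, and credential variables are assumptions, and a real deployment needs networking, parameter groups, and encryption settings as well.

```hcl
resource "aws_db_instance" "tfe" {
  identifier        = "tfe-postgres"   # illustrative name
  engine            = "postgres"
  engine_version    = "14"
  instance_class    = "db.m5.xlarge"
  allocated_storage = 100

  multi_az                = true            # standby replica in a second AZ
  backup_window           = "01:00-01:30"   # align with policy; avoid peak hours
  backup_retention_period = 35              # current maximum; enables long PITR

  username            = var.tfe_db_username  # assumed input variables
  password            = var.tfe_db_password
  skip_final_snapshot = false                # take a snapshot on destroy
}
```

For Aurora, the equivalent arguments on `aws_rds_cluster` are `availability_zones`, `preferred_backup_window`, `preferred_maintenance_window`, and `backup_retention_period`.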
Redis Cache
This section is only relevant if you are running an Active/Active deployment.
Because the Redis instance serves as an active memory cache for Terraform Enterprise, you don't need to maintain backups. However, we recommend you ensure regional availability to protect against zone failure.
Note
Enabling Redis RDB backups may be unnecessary due to the ephemeral nature of the data in the cache at any given time.
Click on the tab(s) below relevant to your cloud deployment for additional cloud-specific recommendations.
AWS has a significant number of business continuity configuration options for Redis.
If you use Terraform to deploy Terraform Enterprise, refer to AWS ElastiCache section of the Active/Active deployment guide for an example Redis configuration.
Your `aws_elasticache_replication_group.tfe` resource should configure a Redis (cluster mode disabled) cluster of three nodes, one in each availability zone, to confer `n-2` zone redundancy.
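A minimal sketch of such a replication group follows. The node type, availability zone names, and sizing are illustrative assumptions, not prescriptive values.

```hcl
resource "aws_elasticache_replication_group" "tfe" {
  replication_group_id = "tfe"
  description          = "Terraform Enterprise Active/Active Redis cache"
  engine               = "redis"
  node_type            = "cache.m5.large"   # size for your workload
  port                 = 6379

  # Three nodes, one per AZ, for n-2 zone redundancy; the first AZ
  # listed hosts the primary.
  num_cache_clusters          = 3
  preferred_cache_cluster_azs = ["eu-west-2a", "eu-west-2b", "eu-west-2c"]

  automatic_failover_enabled = true
  multi_az_enabled           = true

  # Backups are generally unnecessary for this ephemeral cache, so no
  # snapshot_retention_limit is set here.
}
```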
Note
You should set the `preferred_cache_cluster_azs` argument to a list of availability zones equal to the number of cluster nodes. The first availability zone in the list will be the primary zone for the cluster. Duplicates are allowed.
Note
This setup increases cost, so be mindful when sizing your Redis clusters. Setting a minimum of two cache clusters with the above configuration ensures failover capability.
Multi-Region Considerations
Terraform Enterprise's application architecture is currently single-region. Any additional cross-region configuration should be for business continuity purposes only, not for cross-region Active/Active capability, and support for it is on a best-endeavors basis only. In addition, cross-region functionality is not supported for every application tier in every region, so check support as part of architectural planning.
Generally, we recommend you repeat the recommendations in this guide for each region to achieve region redundancy in a Terraform Enterprise deployment.
Note
Cross-region deployments incur additional hosting costs.
Recommendations common to the most-used cloud vendors include:
- Use automated deployments to easily and quickly deploy the Application Server layer in the Secondary region.
- In a cross-region-redundant failover situation, the object store and database would already be present in the Secondary region. You would need to flip the DNS so the service address points to the new region. After you have tested flipping the DNS, we recommend you script the DNS manipulation to automate the process during an actual outage.
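On AWS, for example, the DNS flip can be pre-built rather than scripted by using Route 53 failover routing. This sketch assumes illustrative zone and endpoint names; the `/_health_check` path is Terraform Enterprise's health check endpoint.

```hcl
# Primary record answers while its health check passes; Route 53 serves
# the Secondary-region record automatically when it fails.
resource "aws_route53_health_check" "tfe_primary" {
  fqdn              = "tfe-primary.example.com"
  type              = "HTTPS"
  resource_path     = "/_health_check"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "tfe_primary" {
  zone_id = var.zone_id                # assumed hosted zone
  name    = "tfe.example.com"
  type    = "CNAME"
  ttl     = 60
  records = ["tfe-primary.example.com"]

  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.tfe_primary.id
  failover_routing_policy {
    type = "PRIMARY"
  }
}

resource "aws_route53_record" "tfe_secondary" {
  zone_id = var.zone_id
  name    = "tfe.example.com"
  type    = "CNAME"
  ttl     = 60
  records = ["tfe-secondary.example.com"]

  set_identifier = "secondary"
  failover_routing_policy {
    type = "SECONDARY"
  }
}
```

Even with automatic failover records, test the flip regularly to confirm the Secondary region actually serves traffic.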
Click on the tab(s) below relevant to your cloud deployment for additional cloud-specific recommendations.
The following additional considerations provide an n-1 region redundancy on AWS. Since both cross-region S3 replication and Aurora read replicas can provide replicas in multiple Secondary regions, it is possible to offer greater than n-1 region redundancy if required.
- Use AWS S3 cross-region replication (CRR) on the object store; this means creating a pair of buckets, one in each region, configured to replicate from the Primary to the Secondary.
- Use S3 CRR on the buckets that store applicable database snapshots and on the `bootstrap` buckets that store the air-gapped installation media. Doing this locates critical data local to the ASG in the respective region.
- Use Aurora as the RDS DBaaS solution and enable cross-region read replicas.
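A cross-region replication rule can be sketched as follows. The bucket resources, the IAM role with replication permissions, and the Secondary-region provider are assumed to exist elsewhere in the configuration; CRR also requires versioning enabled on both buckets.

```hcl
resource "aws_s3_bucket_replication_configuration" "tfe_object_store" {
  bucket = aws_s3_bucket.tfe_object_store.id   # Primary-region bucket
  role   = aws_iam_role.s3_replication.arn     # role assumed to exist

  rule {
    id     = "tfe-crr"
    status = "Enabled"
    destination {
      # Secondary-region bucket, assumed to be defined with a
      # provider alias for that region.
      bucket = aws_s3_bucket.tfe_object_store_secondary.arn
    }
  }
}
```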
Backup a Mounted Disk Deployment
The backup approach for a Mounted Disk operational mode is simpler than for External Services mode because it involves a single machine and possibly its business continuity instance. Also, a Mounted Disk deployment backup ensures the integrity of the machine and its attached data disk.
Tip
Read the Definitions and Best Practices, General Information and Preparation sections before continuing this section.
We recommend using Mounted Disk mode when provisioning on private cloud if the added complexity of managing an on-premise database and S3-compatible storage is not readily supported in your environment. If you eventually move to the Active/Active deployment mode, you will need to support these external services plus a Redis service.
We do not recommend using Mounted Disk deployments on public cloud since External Services mode provides better scalability and Mounted Disk mode does not support Active/Active deployments. For Twelve Factor compliance, use the same operational mode for both production and non-production.
Ensure you quiesce the database on Mounted Disk instances; your backup software may or may not do this automatically.
Mounted Disk mode uses a separate mountable volume (data disk) that can take many forms. To ensure data integrity, ensure the mountable volume supports the following capabilities (listed in order of preference):
- Continuous volume replication
- Use of the same volume mounted on the original instance
- A backup restored to another volume
Click on the tab(s) below relevant to your cloud deployment for additional cloud-specific recommendations.
AWS has recommended backup/snapshot options to back up a Mounted Disk deployment.
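For example, Amazon Data Lifecycle Manager can snapshot the data disk's EBS volume on a schedule. This sketch assumes the volume is tagged and that an appropriate DLM IAM role exists; remember that the database must be quiesced for the snapshot to be consistent, as noted above.

```hcl
resource "aws_dlm_lifecycle_policy" "tfe_data_disk" {
  description        = "Scheduled snapshots of the TFE mounted data disk"
  execution_role_arn = aws_iam_role.dlm.arn   # role assumed to exist
  state              = "ENABLED"

  policy_details {
    resource_types = ["VOLUME"]

    # Only volumes carrying this tag are snapshotted; the tag is an
    # illustrative assumption.
    target_tags = {
      Snapshot = "tfe-data-disk"
    }

    schedule {
      name = "tfe-data-disk-schedule"
      create_rule {
        interval      = 12        # align the interval with your RPO
        interval_unit = "HOURS"
      }
      retain_rule {
        count = 14                # one week of 12-hourly snapshots
      }
      copy_tags = true
    }
  }
}
```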
Next Steps
In this guide, you learned best practices for preparing and backing up Terraform Enterprise's main components.