Connect to a PostgreSQL cluster deployed to Aurora
This topic describes how to connect Terraform Enterprise to a highly-available PostgreSQL cluster deployed to AWS Aurora.
Warning
Connecting to a database cluster is in beta. These instructions describe an example scenario that we tested and verified for non-production use cases. You should evaluate your requirements and business needs to determine the optimal architecture and configurations for your specific environment.
Overview
To connect Terraform Enterprise to a highly-available PostgreSQL cluster deployed to AWS Aurora, deploy the Aurora cluster and specify the cluster endpoint in the Terraform Enterprise configuration.
Optionally, you can create and run a test workload against Terraform Enterprise to measure the resilience of your high-availability PostgreSQL cluster.
AWS Aurora
AWS Aurora is a managed database service that natively supports high availability and provides a writer endpoint, also called the cluster endpoint, that does not require load balancing. Aurora also supports read-only endpoints, but Terraform Enterprise does not support them.
Refer to the following topics in the AWS documentation for additional information about Aurora:
Requirements
During testing, the following deployment configuration recovered successfully from seven of 10 failovers (refer to Measure failover resilience for additional information):
- Release v202409-1
- Operational mode set to either active-active or external
- TFE_DATABASE_HOST variable set to an HAProxy load balancer
- TFE_DATABASE_RECONNECT_ENABLED set to true
- Terraform Enterprise nodes hosted on Google Kubernetes Engine (GKE)
- Terraform Enterprise deployed to three nodes
Terraform Enterprise does not support RDS Proxy.
Deploy an Aurora cluster
Deploy an RDS cluster with Terraform. Refer to the rds_cluster documentation in the Terraform registry for configuration instructions.
The following example configuration provisions a cluster called experiment and two cluster instances:
Measure failover resilience
You can collect recovery time objective (RTO) data to assess the resilience of your HA system. Refer to the following topics for additional information:
In the example scenario, we executed test workloads against the instance every 15 seconds for 10 iterations. If a workload did not report success within 10 seconds, we considered the instance unhealthy. We also considered the instance non-operational if any run failed. We considered Terraform Enterprise to be fully operational when five consecutive runs finished successfully.
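The health criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the harness used in the tests: the Probe record, the function names, and the choice to report recovery time as the start of the first run in the five-run success streak are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

PROBE_TIMEOUT_S = 10.0   # a run must report success within this window
REQUIRED_STREAK = 5      # consecutive healthy runs that mark full operation


@dataclass(frozen=True)
class Probe:
    """One test-workload run (hypothetical record shape)."""
    started_at: float    # seconds since the failover was triggered
    succeeded: bool      # whether the run reported success
    duration_s: float    # how long the run took to report a result


def is_healthy(probe: Probe) -> bool:
    """A run is healthy only if it succeeded within the timeout."""
    return probe.succeeded and probe.duration_s <= PROBE_TIMEOUT_S


def recovery_time(probes: Sequence[Probe]) -> Optional[float]:
    """Return the start time of the first run in the first streak of
    REQUIRED_STREAK healthy runs, or None if no such streak exists.
    Treating that start time as the recovery time is an assumption."""
    streak = 0
    streak_start: Optional[float] = None
    for probe in probes:
        if is_healthy(probe):
            if streak == 0:
                streak_start = probe.started_at
            streak += 1
            if streak == REQUIRED_STREAK:
                return streak_start
        else:
            # Any failed or slow run resets the streak.
            streak = 0
    return None
```

A result of None corresponds to a failover that never returned to full operation within the observed runs.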
We observed the following outcomes after triggering 10 failovers:
- Seven failed over successfully within approximately one minute.
- Two failed over and returned to partial operation. Between 30 and 50 percent of the runs executed after failover continued to fail, but Terraform Enterprise completed the others successfully. Manually restarting the Terraform Enterprise nodes resolved the issues.
- One failed over and never returned to operation. Manually restarting the Vault process inside the Terraform Enterprise node or fully restarting all nodes resolved the issue.
- Recovery times ranged from a minimum RTO of less than 25 seconds to a maximum of one minute.
- Average RTO was 51 seconds across successful failovers.
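If you record one recovery time per failover, you can produce the same kind of summary for your own environment. The following sketch assumes recovery times in seconds, with None marking a failover that did not fully recover. The function name and the shape of the result are hypothetical, and the values in the usage note are placeholders, not the measurements reported above.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_rto(rtos: Sequence[Optional[float]]) -> dict:
    """Summarize recovery times in seconds.

    None entries mark failovers that did not fully recover; they are
    counted but excluded from the minimum, maximum, and mean."""
    recovered = [r for r in rtos if r is not None]
    summary = {
        "failovers": len(rtos),
        "recovered": len(recovered),
        "min_s": None,
        "max_s": None,
        "mean_s": None,
    }
    if recovered:
        summary.update(
            min_s=min(recovered),
            max_s=max(recovered),
            mean_s=mean(recovered),
        )
    return summary
```

For example, summarize_rto([20.0, None, 40.0]) counts three failovers, two of which recovered, and averages only the two recovered values.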
Troubleshooting
You may need to manually address issues after a failover to restore functionality. For example, if the affected instance cannot process runs, the Vault process may still be connected to a read-only instance.
Refer to Unable to write to database after a failover in the Terraform troubleshooting documentation for symptoms and solutions.