Scale your Nomad cluster with horizontal cluster autoscaling

40min | Nomad, Terraform, Packer

As enterprises have accelerated their cloud adoption over the past two to three years, horizontal autoscaling has become a critical capability for orchestrators. The dynamic nature of cloud environments allows compute resources to be commissioned easily and billed on a usage basis.

Horizontal autoscaling is a feature that enables enterprises to:

Scale up infrastructure on an on-demand basis that is aligned with business SLAs.
Scale down infrastructure to minimize costs based on real demand.
Handle application load spikes or dips in real-time.
Handle cluster-wide excess capacity or shortages in real-time.
Reduce operator overhead and remove hard dependencies on manual intervention.

This tutorial provides a basic demo for running full horizontal application and cluster autoscaling using the Nomad Autoscaler. During this tutorial you will:

Deploy sample infrastructure running a demonstration web application.
Review autoscaler policies to see their behaviors and thresholds.
While monitoring the included dashboard:
1. Generate traffic and observe the application scale up.
2. Generate additional traffic and observe the cluster scale out.
3. Finally, stop the traffic and observe the application scale down and the cluster scale in.

Note

The infrastructure built as part of the demo has billable costs and is not suitable for production use. Please consult the reference architecture for production configuration.

Requirements

To build and run the demo, you need the following applications installed locally, at the listed version or later.

HashiCorp Nomad 1.0.0+
HashiCorp Packer 1.7.0+
HashiCorp Terraform 0.14.0+
rakyll/hey latest

If you are running this demo in a Windows environment it is recommended to use the Windows Subsystem for Linux.

Cloud specific dependencies

There are no specific dependencies for Amazon Web Services.

Fetch the Nomad Autoscaler demos

Download the latest code for the Autoscaler demos from the GitHub repository. You can use git to clone the repository or download the ZIP archive.

Clone the hashicorp/nomad-autoscaler-demos repository.

$ git clone https://github.com/hashicorp/nomad-autoscaler-demos

$ cd nomad-autoscaler-demos/cloud

Check out the learn tag. Using this tag ensures that the instructions in this guide match your local copy of the code.

$ git checkout learn

Alternatively, if you prefer not to use git, download the ZIP archive of the learn tag.

$ wget https://github.com/hashicorp/nomad-autoscaler-demos/archive/learn.zip

Unarchive the downloaded release.

$ unzip learn.zip

The unzipping process creates a directory named nomad-autoscaler-demos-learn. Change into it and into the cloud folder.

$ cd nomad-autoscaler-demos-learn/cloud

Change into the cloud specific demonstration directory

The AWS-specific demonstration code is located in the aws directory. Change into it now.

$ cd aws

Create the demo infrastructure

There are specific steps to build the infrastructure depending on which provider you wish to use. Please navigate to the appropriate section below.

Configure AWS credentials

Configure AWS credentials for your environment so that Terraform can authenticate with AWS and create resources.

To do this with IAM user authentication, set your AWS access key ID as an environment variable.

$ export AWS_ACCESS_KEY_ID="<YOUR_AWS_ACCESS_KEY_ID>"

Now set your secret key.

$ export AWS_SECRET_ACCESS_KEY="<YOUR_AWS_SECRET_ACCESS_KEY>"

Tip

If you don't have access to IAM user credentials, use another authentication method described in the AWS provider documentation.

Build demo environment AMI

First, use Packer to build an AMI that is used for launching the Nomad server and client instances. Replace the placeholders where necessary. Packer will tag the AMI with the created values, so you can identify your new resources in the targeted environment. The region flag can be omitted if you are using the us-east-1 region.

$ cd packer

$ packer build \
    -var 'created_email=<your_email_address>' \
    -var 'created_name=<your_name>' \
    -var 'region=<your_desired_region>' \
    aws-packer.pkr.hcl

Now, navigate to the Terraform AWS environment you will be using to build the infrastructure components.

$ cd ../terraform/control

Build Terraform variables file

In order for Terraform to run correctly you'll need to provide the appropriate variables within a file named terraform.tfvars. Create your own variables file by copying the provided terraform.tfvars.sample file.

$ cp terraform.tfvars.sample terraform.tfvars

Update the variables for your environment

region - The region to deploy your infrastructure. This must match the region you deployed your AMI into.
availability_zones - A list of specific availability zones eligible to deploy your infrastructure into.
ami - The AMI ID created by your Packer run. It appears in the output of the packer build command you ran earlier.
key_name - The name of the AWS EC2 Key Pair that you want to associate with the instances.
owner_name - Added to the created infrastructure as a tag.
owner_email - Added to the created infrastructure as a tag.

The most important variables are ami, region, and key_name.

For example, if your Packer run created an AMI ami-0bd21458ecf89f85e in region us-east-1, and your AWS EC2 Key Pair is named user-us-east-1, your variables file would look similar to the following:

region             = "us-east-1"
availability_zones = ["us-east-1a"]
ami                = "ami-0bd21458ecf89f85e"
key_name           = "user-us-east-1"
owner_name         = "alovelace"
owner_email        = "alovelace@example.com"
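
Some of the constraints described above can be checked before you run Terraform. The following Python sketch is illustrative only: the function name check_tfvars and the particular checks are assumptions made for this example and are not part of the demo repository. It relies on two facts: AWS zone names extend the region name, and AMI IDs start with "ami-".

```python
REQUIRED_KEYS = ("region", "availability_zones", "ami", "key_name")


def check_tfvars(values):
    """Return a list of human-readable problems found in the variable values."""
    problems = []
    for key in REQUIRED_KEYS:
        if not values.get(key):
            problems.append(f"{key} is missing or empty")
    region = values.get("region", "")
    for zone in values.get("availability_zones", []):
        # AWS zone names extend the region name, e.g. us-east-1 -> us-east-1a.
        if region and not zone.startswith(region):
            problems.append(f"zone {zone} is not in region {region}")
    ami = values.get("ami", "")
    if ami and not ami.startswith("ami-"):
        problems.append(f"{ami} does not look like an AMI ID")
    return problems


# Values copied from the example variables file above.
example = {
    "region": "us-east-1",
    "availability_zones": ["us-east-1a"],
    "ami": "ami-0bd21458ecf89f85e",
    "key_name": "user-us-east-1",
}
print(check_tfvars(example))
```

An empty list means none of these checks failed. It does not confirm that the AMI actually exists in the chosen region.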

Run Terraform

While in the terraform/control folder, provision your demo infrastructure by using the Terraform "init, plan, apply" cycle:

$ terraform init

$ terraform plan

$ terraform apply --auto-approve

Once the terraform apply finishes, a number of useful pieces of information should be output to your console. These include URLs to deployed resources as well as a Nomad Autoscaler job.

...
Outputs:

ip_addresses = <<EOT

Server IPs:
 * instance hashistack-server-1 - Public: 3.239.96.90, Private: 172.31.74.142


To connect, add your private key and SSH into any client or server with
`ssh ubuntu@PUBLIC_IP`. You can test the integrity of the cluster by running:

  $ consul members
  $ nomad server members
  $ nomad node status

The Nomad UI can be accessed at http://hashistack-nomad-server-1582576471.us-east-1.elb.amazonaws.com:4646/ui
The Consul UI can be accessed at http://hashistack-nomad-server-1582576471.us-east-1.elb.amazonaws.com:8500/ui
Grafana dashboard can be accessed at http://hashistack-nomad-client-1880216998.us-east-1.elb.amazonaws.com:3000/d/AQphTqmMk/demo?orgId=1&refresh=5s
Traefik can be accessed at http://hashistack-nomad-client-1880216998.us-east-1.elb.amazonaws.com:8081
Prometheus can be accessed at http://hashistack-nomad-client-1880216998.us-east-1.elb.amazonaws.com:9090
Webapp can be accessed at http://hashistack-nomad-client-1880216998.us-east-1.elb.amazonaws.com:80

CLI environment variables:
export NOMAD_CLIENT_DNS=http://hashistack-nomad-client-1880216998.us-east-1.elb.amazonaws.com
export NOMAD_ADDR=http://hashistack-nomad-server-1582576471.us-east-1.elb.amazonaws.com:4646


EOT

Retrieve GCP account information

Log in to your GCP account with the gcloud CLI.

$ gcloud auth login

Create an application credential file.

$ gcloud auth application-default login

Next, choose the organization and billing account to use for your demo infrastructure. You can find them using the gcloud organizations list and gcloud beta billing accounts list commands. Make a note of the organization's ID value and the billing account's ACCOUNT_ID value.

$ gcloud organizations list
DISPLAY_NAME                  ID  DIRECTORY_CUSTOMER_ID
org                     <ORG_ID>              ZZZZZZZZZ

$ gcloud beta billing accounts list
ACCOUNT_ID            NAME         OPEN  MASTER_ACCOUNT_ID
<ACCOUNT_ID>     Account 1         True  AAAAAA-BBBBBB-CCCCCC

Build Terraform variables file

For Terraform to run correctly, you'll need to supply the values retrieved in the previous step as variables in a file named terraform.tfvars.

Navigate to the Terraform control folder and create your own variables file by copying the provided terraform.tfvars.sample file.

$ cd ./terraform/control

$ cp terraform.tfvars.sample terraform.tfvars

Update the variables for your environment

org_id - The organization ID retrieved from the gcloud organizations list command output.
billing_account - The billing account ID retrieved from the gcloud beta billing accounts list command output.

For example, if your organization ID is 123456789 and billing account ID is 824A25-7184CD-9217A0, your variables file would look similar to the following.

org_id          = "123456789"
billing_account = "824A25-7184CD-9217A0"

Run Terraform

While in the terraform/control folder, provision your demo infrastructure by using the Terraform "init, plan, apply" cycle.

$ terraform init

$ terraform plan

$ terraform apply --auto-approve

Note

Terraform may fail with an error message that says Error 403: Compute Engine API has not been used in project .... If this happens, run terraform apply --auto-approve again.

Once the terraform apply finishes, a number of useful pieces of information should be output to your console. These include URLs to deployed resources as well as a Nomad Autoscaler job.

...
Outputs:

stack_detail = <<EOT

You can set the gcloud project setting for CLI use with `gcloud config set project
hashistack-driven-bengal`, otherwise you will need to set the `--project`
flag on each command.

To connect to any instance running within the environment you can use the
`gcloud compute ssh ubuntu@<instance_name>` command within your terminal or use the UI.

You can test the integrity of the cluster by running:

  $ consul members
  $ nomad server members
  $ nomad node status

The Nomad UI can be accessed at http://104.155.144.228:4646/ui
The Consul UI can be accessed at http://104.155.144.228:8500/ui
Grafana dashbaord can be accessed at http://34.72.51.47:3000/d/AQphTqmMk/demo?orgId=1&refresh=5s
Traefik can be accessed at http://34.72.51.47:8081
Prometheus can be accessed at http://34.72.51.47:9090
Webapp can be accessed at http://34.72.51.47:80

CLI environment variables:
export NOMAD_CLIENT_DNS=http://34.72.51.47
export NOMAD_ADDR=http://104.155.144.228:4646


EOT

Configure deployment environment variables

Log in to Azure with the az CLI. The command prints the subscriptions available to your account.

$ az login

[
  {
    "cloudName": "AzureCloud",
    "id": "<SUBSCRIPTION_ID>",
    "isDefault": true,
    "name": "Free Trial",
    "state": "Enabled",
    "tenantId": "<TENANT_ID>",
    "user": {
      "name": "user@example.com",
      "type": "user"
    }
  }
]

Take note of the values for <SUBSCRIPTION_ID> and <TENANT_ID> and export them as environment variables:

$ export ARM_SUBSCRIPTION_ID=<SUBSCRIPTION_ID>
$ export ARM_TENANT_ID=<TENANT_ID>

Next, create an application ID and password that will be used to run Terraform.

$ az ad sp create-for-rbac --role="Owner" --scopes="/subscriptions/$ARM_SUBSCRIPTION_ID"

{
  "appId": "<CLIENT_ID>",
  "displayName": "azure-cli-...",
  "name": "http://azure-cli-...",
  "password": "<CLIENT_SECRET>",
  "tenant": "<TENANT_ID>"
}

Export the values for <CLIENT_ID> and <CLIENT_SECRET> as environment variables as well:

$ export ARM_CLIENT_ID=<CLIENT_ID>
$ export ARM_CLIENT_SECRET=<CLIENT_SECRET>

Run Terraform

Navigate to the Terraform control folder and execute the Terraform configuration to deploy the demo infrastructure:

$ cd ./terraform/control

$ terraform init

$ terraform plan

$ terraform apply --auto-approve

Once the terraform apply finishes, a number of useful pieces of information should be output to your console. These include URLs to deployed resources as well as a Nomad Autoscaler job.

...
Outputs:

ip_addresses = <<EOT

Server IPs:
 * instance server-1 - Public: 52.188.111.20, Private: 10.0.2.4


To connect, add your private key and SSH into any client or server with
`ssh -i azure-hashistack.pem -o IdentitiesOnly=yes ubuntu@PUBLIC_IP`.
You can test the integrity of the cluster by running:

  $ consul members
  $ nomad server members
  $ nomad node status

The Nomad UI can be accessed at http://52.249.185.10:4646/ui
The Consul UI can be accessed at http://52.249.185.10:8500/ui
Grafana dashbaord can be accessed at http://52.249.187.190:3000/d/AQphTqmMk/demo?orgId=1&refresh=5s
Traefik can be accessed at http://52.249.187.190:8081
Prometheus can be accessed at http://52.249.187.190:9090
Webapp can be accessed at http://52.249.187.190:80

CLI environment variables:
export NOMAD_CLIENT_DNS=http://52.249.187.190
export NOMAD_ADDR=http://52.249.185.10:4646


EOT

Copy the export commands under the CLI environment variables heading and run them in the shell session that you will use for the rest of the demo.
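
If you saved the Terraform output to a file, the export lines can also be pulled out programmatically. The following is a minimal Python sketch; the function name parse_exports is hypothetical, and the sample text is taken from the output shown earlier.

```python
import re

# Matches lines such as: export NOMAD_ADDR=http://52.249.185.10:4646
EXPORT_LINE = re.compile(r"^export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_exports(terraform_output):
    """Collect NAME=value pairs from lines of the form 'export NAME=value'."""
    exports = {}
    for line in terraform_output.splitlines():
        match = EXPORT_LINE.match(line.strip())
        if match:
            exports[match.group(1)] = match.group(2).strip().strip('"')
    return exports


sample = """
CLI environment variables:
export NOMAD_CLIENT_DNS=http://52.249.187.190
export NOMAD_ADDR=http://52.249.185.10:4646
"""
for name, value in parse_exports(sample).items():
    print(f"{name}={value}")
```

The sketch only extracts the values. You still need to export them in the shell session where you run the nomad and hey commands.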

Explore the demo environment

This demo includes several applications that have their own web interface. The output of the terraform apply command lists their URLs. Visit some of them and explore. The Terraform process also runs a number of Nomad jobs that provide metrics, dashboards, a demo application, and routing provided by Traefik.

It may take a few seconds for all the applications to start. If any of the URLs doesn't load the first time, wait a little and retry it.

The application contains a pre-configured scaling policy. You can view it by opening the job file or calling the Nomad API. The application scales based on the average number of active connections, and it targets an average of 10 connections per instance of the web application.

$ curl "${NOMAD_ADDR}/v1/scaling/policies?pretty"

Nomad returns the list of scaling policies currently installed in the cluster. In this case, you get just the one policy for the webapp job.

[
  {
    "ID": "6b2d3602-70ae-d1fa-bc6a-81f7757a2863",
    "Enabled": true,
    "Type": "horizontal",
    "Target": {
      "Group": "demo",
      "Namespace": "default",
      "Job": "webapp"
    },
    "CreateIndex": 11,
    "ModifyIndex": 11
  }
]
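
The same endpoint can be queried from a script. The sketch below uses only the Python standard library. The function names fetch_policies and summarize are hypothetical, and the sample data is the response shown above. The network call is defined but not executed, so the example runs without a cluster.

```python
import json
import urllib.request


def fetch_policies(nomad_addr):
    """GET /v1/scaling/policies from the Nomad HTTP API and decode the JSON list."""
    with urllib.request.urlopen(f"{nomad_addr}/v1/scaling/policies") as response:
        return json.load(response)


def summarize(policies):
    """Return one 'type job/group state' line per policy."""
    lines = []
    for policy in policies:
        target = policy["Target"]
        state = "enabled" if policy["Enabled"] else "disabled"
        lines.append(f'{policy["Type"]} {target["Job"]}/{target["Group"]} {state}')
    return lines


# The response shown above, used here instead of a live API call.
sample = json.loads("""
[
  {
    "ID": "6b2d3602-70ae-d1fa-bc6a-81f7757a2863",
    "Enabled": true,
    "Type": "horizontal",
    "Target": {"Group": "demo", "Namespace": "default", "Job": "webapp"},
    "CreateIndex": 11,
    "ModifyIndex": 11
  }
]
""")
print("\n".join(summarize(sample)))
```

Against a running cluster, you could call summarize(fetch_policies(os.environ["NOMAD_ADDR"])) after importing os.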

Run the Nomad Autoscaler job

The Nomad Autoscaler job does not run automatically. This gives you the opportunity to look through the job file and understand it better before deploying.

Open up the aws_autoscaler.nomad file in a text editor. The most interesting parts of the aws_autoscaler.nomad file are the template sections. The first defines the agent config where it configures the prometheus, aws-asg and target-value plugins.

      template {
        data = <<EOF
nomad {
  address = "http://{{env "attr.unique.network.ip-address" }}:4646"
}

apm "prometheus" {
  driver = "prometheus"
  config = {
    address = "http://{{ range service "prometheus" }}{{ .Address }}:{{ .Port }}{{ end }}"
  }
}

...

target "aws-asg" {
  driver = "aws-asg"
  config = {
    aws_region = "{{ $x := env "attr.platform.aws.placement.availability-zone" }}{{ $length := len $x |subtract 1 }}{{ slice $x 0 $length}}"
  }
}

strategy "target-value" {
  driver = "target-value"
}
EOF

        destination = "${NOMAD_TASK_DIR}/config.hcl"
      }
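
The aws_region expression in the template above derives the region from the node's availability zone by dropping the final character. The equivalent operation in Python is sketched below; the function name region_from_zone is chosen for illustration only.

```python
def region_from_zone(availability_zone):
    """Drop the trailing zone letter, as the template's slice call does."""
    return availability_zone[: len(availability_zone) - 1]


print(region_from_zone("us-east-1a"))  # prints us-east-1
```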

The second is where it defines the cluster scaling policy and writes this to a local directory for reading.

      template {
        data = <<EOF
scaling "cluster_policy" {
  enabled = true
  min     = 1
  max     = 2

  policy {
    cooldown            = "2m"
    evaluation_interval = "1m"

    check "cpu_allocated_percentage" {
      source = "prometheus"
      query  = "sum(nomad_client_allocated_cpu{node_class=\"hashistack\"}*100/(nomad_client_unallocated_cpu{node_class=\"hashistack\"}+nomad_client_allocated_cpu{node_class=\"hashistack\"}))/count(nomad_client_allocated_cpu{node_class=\"hashistack\"})"

      strategy "target-value" {
        target = 70
      }
    }

...

    check "mem_allocated_percentage" {
      source = "prometheus"
      query  = "sum(nomad_client_allocated_memory{node_class=\"hashistack\"}*100/(nomad_client_unallocated_memory{node_class=\"hashistack\"}+nomad_client_allocated_memory{node_class=\"hashistack\"}))/count(nomad_client_allocated_memory{node_class=\"hashistack\"})"

      strategy "target-value" {
        target = 70
      }
    }

...

    target "aws-asg" {
      dry-run             = "false"
      aws_asg_name        = "hashistack-nomad_client"
      node_class          = "hashistack"
      node_drain_deadline = "5m"
    }
  }
}
EOF

        destination = "${NOMAD_TASK_DIR}/policies/hashistack.hcl"
      }
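
Each check query above averages, across all clients in the hashistack node class, the percentage of a resource that is allocated. The Python sketch below mirrors that arithmetic. The node figures are made up for illustration. The factor function assumes that the target-value strategy divides the observed value by the target, which is consistent with the "factor" values the autoscaler logs but is not spelled out in this tutorial.

```python
def allocated_percentage(nodes):
    """Mean of allocated * 100 / (allocated + unallocated) across nodes."""
    shares = [100 * allocated / (allocated + unallocated)
              for allocated, unallocated in nodes]
    return sum(shares) / len(shares)


def factor(metric, target=70):
    """Assumed target-value ratio: above 1 suggests scaling out, below 1 scaling in."""
    return metric / target


# Hypothetical (allocated, unallocated) memory figures in MB for two clients.
nodes = [(3000, 1000), (1000, 3000)]
percentage = allocated_percentage(nodes)
print(percentage, round(factor(percentage), 3))
```

With these made-up figures the clients are 75% and 25% allocated, so the averaged metric is 50 and the factor is roughly 0.714, which is below the target of 70.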

Once you have an understanding of the autoscaler job and the policies it contains, deploy it to the cluster using the nomad job run command. If you get an error, verify that the NOMAD_ADDR environment variable has been properly set according to the preceding Terraform output.

$ nomad job run aws_autoscaler.nomad

Open up the gcp_autoscaler.nomad file in a text editor. The most interesting parts of the autoscaler job file are the template sections. The first defines the agent config where it configures the prometheus, gce-mig and target-value plugins.

      template {
        data = <<EOF
nomad {
  address = "http://{{env "attr.unique.network.ip-address" }}:4646"
}

apm "prometheus" {
  driver = "prometheus"
  config = {
    address = "http://{{ range service "prometheus" }}{{ .Address }}:{{ .Port }}{{ end }}"
  }
}

target "gce-mig" {
  driver = "gce-mig"
}

strategy "target-value" {
  driver = "target-value"
}
EOF

        destination = "${NOMAD_TASK_DIR}/config.hcl"
      }

The second is where it defines the cluster scaling policy and writes this to a local directory for reading.

      template {
        data = <<EOF
scaling "cluster_policy" {
  enabled = true
  min     = 1
  max     = 2

  policy {

    cooldown            = "2m"
    evaluation_interval = "1m"

    check "cpu_allocated_percentage" {
      source = "prometheus"
      query  = "sum(nomad_client_allocated_cpu{node_class=\"hashistack\"}*100/(nomad_client_unallocated_cpu{node_class=\"hashistack\"}+nomad_client_allocated_cpu{node_class=\"hashistack\"}))/count(nomad_client_allocated_cpu{node_class=\"hashistack\"})"

      strategy "target-value" {
        target = 70
      }
    }

    check "mem_allocated_percentage" {
      source = "prometheus"
      query  = "sum(nomad_client_allocated_memory{node_class=\"hashistack\"}*100/(nomad_client_unallocated_memory{node_class=\"hashistack\"}+nomad_client_allocated_memory{node_class=\"hashistack\"}))/count(nomad_client_allocated_memory{node_class=\"hashistack\"})"

      strategy "target-value" {
        target = 70
      }
    }

    target "gce-mig" {
      project             = "hashistack-driven-bengal"
      region              = "us-central1"
      mig_name            = "hashistack-nomad-client"
      node_class          = "hashistack"
      node_drain_deadline = "5m"
    }
  }
}
EOF

        destination = "${NOMAD_TASK_DIR}/policies/hashistack.hcl"
      }

$ nomad job run gcp_autoscaler.nomad

Open up the azure_autoscaler.nomad file in a text editor. The most interesting parts of the autoscaler job file are the template sections. The first defines the agent config where it configures the prometheus, azure-vmss and target-value plugins.

template {
  data = <<EOF
nomad {
  address = "http://{{env "attr.unique.network.ip-address" }}:4646"
}

apm "prometheus" {
  driver = "prometheus"
  config = {
    address = "http://{{ range service "prometheus" }}{{ .Address }}:{{ .Port }}{{ end }}"
  }
}

target "azure-vmss" {
  driver = "azure-vmss"
  config = {
    subscription_id = "${subscription_id}"
  }
}

strategy "target-value" {
  driver = "target-value"
}
EOF

  destination = "${NOMAD_TASK_DIR}/config.hcl"
}

The second is where it defines the cluster scaling policy and writes this to a local directory for reading.

template {
  data = <<EOF
scaling "cluster_policy" {
  enabled = true
  min     = 1
  max     = 2

  policy {

    cooldown            = "2m"
    evaluation_interval = "1m"

    check "cpu_allocated_percentage" {
      source = "prometheus"
      query  = "sum(nomad_client_allocated_cpu{node_class=\"hashistack\"}*100/(nomad_client_unallocated_cpu{node_class=\"hashistack\"}+nomad_client_allocated_cpu{node_class=\"hashistack\"}))/count(nomad_client_allocated_cpu{node_class=\"hashistack\"})"

      strategy "target-value" {
        target = 70
      }
    }

    check "mem_allocated_percentage" {
      source = "prometheus"
      query  = "sum(nomad_client_allocated_memory{node_class=\"hashistack\"}*100/(nomad_client_unallocated_memory{node_class=\"hashistack\"}+nomad_client_allocated_memory{node_class=\"hashistack\"}))/count(nomad_client_allocated_memory{node_class=\"hashistack\"})"

      strategy "target-value" {
        target = 70
      }
    }

    target "azure-vmss" {
      resource_group      = "${resource_group}"
      vm_scale_set        = "clients"
      node_class          = "hashistack"
      node_drain_deadline = "5m"
    }
  }
}
EOF

  destination = "${NOMAD_TASK_DIR}/policies/hashistack.hcl"
}

Once you have an understanding of the job file, submit it to the Nomad cluster ensuring the NOMAD_ADDR environment variable has been exported.

$ nomad job run azure_autoscaler.nomad

If you wish, in another terminal window you can export the NOMAD_ADDR environment variable and then follow the Nomad Autoscaler logs. Use the allocation ID that was printed when you ran nomad job run on the autoscaler job.

$ nomad alloc logs -stderr -f <alloc-id>

Open the scenario's Grafana dashboard

Retrieve the Grafana link from your Terraform output. Open it in a browser. It might take a minute to fully load if you didn't take some time earlier to look around.

Once loaded, you will see a dashboard similar to this.

Screenshot of Grafana dashboard

Generate application load

In order to generate some initial load, you will use the hey application. This will cause the application to scale up slightly.

Run a load generator

$ hey -z 10m -c 20 -q 40 $NOMAD_CLIENT_DNS:80 &
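
In the hey command above, -z sets the run duration, -c sets the number of concurrent workers, and -q sets a rate limit in queries per second for each worker. The Python sketch below works out the upper bound on the load this implies. The function names are hypothetical, and the real rate can be lower if the application responds slowly.

```python
def max_request_rate(workers, qps_per_worker):
    """Upper bound on requests per second: each worker is rate limited separately."""
    return workers * qps_per_worker


def max_total_requests(workers, qps_per_worker, duration_seconds):
    """Upper bound on requests sent over the whole run."""
    return max_request_rate(workers, qps_per_worker) * duration_seconds


print(max_request_rate(20, 40))             # 800
print(max_total_requests(20, 40, 10 * 60))  # 480000
```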

Viewing the autoscaler logs or the Grafana dashboard should show the application count increase from 1 to 2. Once this scaling has taken place, you can trigger additional load on the app that causes further scaling.

The application count is the graph in the top left.

2021-03-03T21:51:29.288Z [INFO]  policy_eval.worker: scaling target: id=8e8be187-0ead-a0ef-5e41-a0db5f29f434 policy_id=56258c47-1a28-837c-568d-80dd9e9b3054 queue=horizontal target=nomad-target from=1 to=2 reason="scaling up because factor is 1.700000" meta=map[nomad_policy_id:56258c47-1a28-837c-568d-80dd9e9b3054]
2021-03-03T21:51:29.301Z [INFO]  policy_eval.worker: successfully submitted scaling action to target: id=8e8be187-0ead-a0ef-5e41-a0db5f29f434 policy_id=56258c47-1a28-837c-568d-80dd9e9b3054 queue=horizontal target=nomad-target desired_count=2
2021-03-03T21:51:29.301Z [INFO]  policy_eval.worker: policy evaluation complete: id=8e8be187-0ead-a0ef-5e41-a0db5f29f434 policy_id=56258c47-1a28-837c-568d-80dd9e9b3054 queue=horizontal target=nomad-target
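
The "factor" in the log reason can be related to the new count with a little arithmetic. The Python sketch below assumes that the target-value strategy divides the observed metric by the target, multiplies the current count by that factor, and rounds up. This agrees with the log lines in this tutorial, but treat it as an approximation of the plugin's behavior rather than its implementation. The metric value of 17 is inferred from the logged factor and the policy target of 10.

```python
import math


def scaling_factor(metric, target):
    """Ratio of the observed metric to the policy target."""
    return metric / target


def next_count(current_count, factor):
    """Scale the current count by the factor and round up to a whole instance."""
    return math.ceil(current_count * factor)


factor = scaling_factor(17, 10)  # 17 average connections against a target of 10
print(factor, next_count(1, factor))  # 1.7 2
```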

Run a second load generator

$ hey -z 10m -c 20 -q 40 $NOMAD_CLIENT_DNS:80 &

This again causes the application to scale, this time from 2 to 4, which in turn reduces the available resources on your cluster. The reduction is large enough that the Autoscaler decides a cluster scaling action is required and triggers it.

When watching the dashboard, you may see the cluster scale to three clients. This is because many cloud providers launch one extra instance for a scale-up request and then terminate whichever instance is slowest to start.

Once this instance is reaped, you will see the expected two clients.

2021-03-03T21:52:59.287Z [INFO]  policy_eval.worker: scaling target: id=203a21b7-302f-ba72-bc1a-3d9ac6748a14 policy_id=56258c47-1a28-837c-568d-80dd9e9b3054 queue=horizontal target=nomad-target from=2 to=4 reason="scaling up because factor is 2.000000" meta=map[nomad_policy_id:56258c47-1a28-837c-568d-80dd9e9b3054]
2021-03-03T21:52:59.323Z [INFO]  policy_eval.worker: successfully submitted scaling action to target: id=203a21b7-302f-ba72-bc1a-3d9ac6748a14 policy_id=56258c47-1a28-837c-568d-80dd9e9b3054 queue=horizontal target=nomad-target desired_count=4
2021-03-03T21:52:59.323Z [INFO]  policy_eval.worker: policy evaluation complete: id=203a21b7-302f-ba72-bc1a-3d9ac6748a14 policy_id=56258c47-1a28-837c-568d-80dd9e9b3054 queue=horizontal target=nomad-target

The additional allocations require more memory than is available on the current client. In response, the autoscaler starts another client to run some of the allocations.

2021-03-03T21:53:29.659Z [INFO]  policy_eval.worker: scaling target: id=6cd5c72c-664d-0111-0e5b-8446638059ee policy_id=484d733c-b28c-4a6e-364c-b36f10c6bb0b queue=cluster target=aws-asg from=1 to=2 reason="scaling up because factor is 1.360405" meta=map[nomad_policy_id:484d733c-b28c-4a6e-364c-b36f10c6bb0b]
2021-03-03T21:53:50.319Z [INFO]  internal_plugin.aws-asg: successfully performed and verified scaling out: action=scale_out asg_name=hashistack-nomad_client desired_count=2
2021-03-03T21:53:50.319Z [INFO]  policy_eval.worker: successfully submitted scaling action to target: id=6cd5c72c-664d-0111-0e5b-8446638059ee policy_id=484d733c-b28c-4a6e-364c-b36f10c6bb0b queue=cluster target=aws-asg desired_count=2
2021-03-03T21:53:50.320Z [INFO]  policy_eval.worker: policy evaluation complete: id=6cd5c72c-664d-0111-0e5b-8446638059ee policy_id=484d733c-b28c-4a6e-364c-b36f10c6bb0b queue=cluster target=aws-asg

Remove load on the application

Now, simulate a reduction in load on the application by stopping the running hey processes using the pkill command.

$ pkill hey
[2]  + 36851 terminated  hey -z 10m -c 20 -q 40 $NOMAD_CLIENT_DNS:80
[1]  + 36827 terminated  hey -z 10m -c 20 -q 40 $NOMAD_CLIENT_DNS:80

The reduction in load causes the Autoscaler to first scale in the task group from 4 to 1. Once the task group has scaled in sufficiently, the Autoscaler scales in the cluster from 2 to 1. It performs this work by selecting a node to remove, draining the node of all work, and then terminating it within the provider.

Once you stop the load generating processes, the autoscaler reduces the application's allocation count from 4 to 1.

2021-03-03T21:57:29.287Z [INFO]  policy_eval.worker: scaling target: id=5f7f0f2b-fe20-9fb6-5132-7554cd40c875 policy_id=56258c47-1a28-837c-568d-80dd9e9b3054 queue=horizontal target=nomad-target from=4 to=1 reason="capped count from 0 to 1 to stay within limits" meta="map[nomad_autoscaler.count.capped:true nomad_autoscaler.count.original:0 nomad_autoscaler.reason_history:[scaling down because factor is 0.000000] nomad_policy_id:56258c47-1a28-837c-568d-80dd9e9b3054]"
2021-03-03T21:57:29.308Z [INFO]  policy_eval.worker: successfully submitted scaling action to target: id=5f7f0f2b-fe20-9fb6-5132-7554cd40c875 policy_id=56258c47-1a28-837c-568d-80dd9e9b3054 queue=horizontal target=nomad-target desired_count=1
2021-03-03T21:57:29.308Z [INFO]  policy_eval.worker: policy evaluation complete: id=5f7f0f2b-fe20-9fb6-5132-7554cd40c875 policy_id=56258c47-1a28-837c-568d-80dd9e9b3054 queue=horizontal target=nomad-target
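
The reason "capped count from 0 to 1 to stay within limits" in the log above indicates that the computed count is clamped to the policy's bounds. A minimal Python sketch of that clamping follows. The function name cap_count is hypothetical. The lower bound of 1 is implied by the log line, while the upper bound of 10 is a placeholder, because the application policy's maximum is not shown in this tutorial.

```python
def cap_count(desired, minimum, maximum):
    """Keep the desired count inside the policy's [minimum, maximum] range."""
    return max(minimum, min(desired, maximum))


# minimum=1 is implied by the log; maximum=10 is a placeholder for illustration.
print(cap_count(0, 1, 10))  # 1
```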

Once the surplus allocations are stopped, the autoscaler then scales the cluster back to 1 client in response to excess available memory.

2021-03-03T21:58:29.585Z [INFO]  policy_eval.worker: scaling target: id=9fb3b478-1a93-827b-4d5e-b9fa4924668b policy_id=484d733c-b28c-4a6e-364c-b36f10c6bb0b queue=cluster target=aws-asg from=2 to=1 reason="scaling down because factor is 0.178571" meta=map[nomad_policy_id:484d733c-b28c-4a6e-364c-b36f10c6bb0b]
2021-03-03T21:58:29.653Z [INFO]  internal_plugin.aws-asg: triggering drain on node: node_id=616d429e-74f7-c3a7-2052-445980e28987 deadline=5m0s
2021-03-03T21:58:29.668Z [INFO]  internal_plugin.aws-asg: received node drain message: node_id=616d429e-74f7-c3a7-2052-445980e28987 msg="Drain complete for node 616d429e-74f7-c3a7-2052-445980e28987"
2021-03-03T21:58:34.987Z [INFO]  internal_plugin.aws-asg: received node drain message: node_id=616d429e-74f7-c3a7-2052-445980e28987 msg="All allocations on node "616d429e-74f7-c3a7-2052-445980e28987" have stopped"
2021-03-03T21:58:34.987Z [INFO]  internal_plugin.aws-asg: node drain complete: node_id=616d429e-74f7-c3a7-2052-445980e28987
2021-03-03T21:58:34.987Z [INFO]  internal_plugin.aws-asg: pre scale-in tasks now complete
2021-03-03T21:58:45.522Z [INFO]  internal_plugin.aws-asg: successfully detached instances from AutoScaling Group: action=scale_in asg_name=hashistack-nomad_client instances=[i-016ca3ede8b32bd49]
2021-03-03T21:59:56.219Z [INFO]  internal_plugin.aws-asg: successfully terminated EC2 instances: action=scale_in asg_name=hashistack-nomad_client instances=[i-016ca3ede8b32bd49]
2021-03-03T21:59:56.431Z [INFO]  policy_eval.worker: successfully submitted scaling action to target: id=9fb3b478-1a93-827b-4d5e-b9fa4924668b policy_id=484d733c-b28c-4a6e-364c-b36f10c6bb0b queue=cluster target=aws-asg desired_count=1
2021-03-03T22:00:43.194Z [INFO]  policy_eval.worker: policy evaluation complete: id=9fb3b478-1a93-827b-4d5e-b9fa4924668b policy_id=484d733c-b28c-4a6e-364c-b36f10c6bb0b queue=cluster target=aws-asg
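
If you saved the autoscaler logs to a file, you can pull out just the scaling decisions to review the sequence of events. The Python sketch below is one way to do that. The regular expression is written against the log lines shown in this tutorial and may need adjusting for other autoscaler versions. The function name scaling_decisions is hypothetical.

```python
import re

# Matches "scaling target" lines emitted by policy_eval.worker.
SCALING_LINE = re.compile(
    r"^(?P<time>\S+) \[INFO\]\s+policy_eval\.worker: scaling target: "
    r".*?queue=(?P<queue>\S+) target=(?P<target>\S+) "
    r"from=(?P<old>\d+) to=(?P<new>\d+)"
)


def scaling_decisions(log_text):
    """Return (time, queue, target, from, to) for each 'scaling target' line."""
    decisions = []
    for line in log_text.splitlines():
        match = SCALING_LINE.match(line)
        if match:
            decisions.append(
                (
                    match["time"],
                    match["queue"],
                    match["target"],
                    int(match["old"]),
                    int(match["new"]),
                )
            )
    return decisions


# One line copied from the cluster scale-in log above.
sample = (
    '2021-03-03T21:58:29.585Z [INFO]  policy_eval.worker: scaling target: '
    'id=9fb3b478-1a93-827b-4d5e-b9fa4924668b '
    'policy_id=484d733c-b28c-4a6e-364c-b36f10c6bb0b queue=cluster '
    'target=aws-asg from=2 to=1 reason="scaling down because factor is 0.178571"'
)
print(scaling_decisions(sample))
```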

Destroy the demo infrastructure

Once you are done experimenting with the autoscaler, use the terraform destroy command to deprovision the demo infrastructure. Do this as soon as you are finished with the demo to avoid unnecessary charges in your cloud provider account.

$ terraform destroy --auto-approve

Perform cloud-specific cleanup activities

Deregister the AMI that you created with Packer at the beginning of this demo. You can use the AWS console or the AWS CLI if you have it installed.

This set of commands will extract the AMI ID from your variables file, deregister the image, and delete the backing EBS snapshot.

$ export IMAGE=$(awk '/ami/ {print $3}' terraform.tfvars | tr -d "\"")
$ export REGION=$(awk '/region/ {print $3}' terraform.tfvars | tr -d "\"")
$ export SNAP=$(aws ec2 describe-images --image-ids $IMAGE --region $REGION --output json --query 'Images[0].BlockDeviceMappings[0].Ebs.SnapshotId' --no-paginate | tr -d "\"")
$ aws ec2 deregister-image --image-id $IMAGE --region $REGION
$ aws ec2 delete-snapshot --snapshot-id $SNAP --region $REGION
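
The awk patterns above match any line that contains "ami" or "region", which includes comment lines if your variables file has them. If you want a stricter extraction, the following Python sketch reads only simple key = "value" assignments. The function name read_string_vars is hypothetical, and the sample mirrors the example variables file from earlier in this tutorial.

```python
import re

# Matches lines such as: ami = "ami-0bd21458ecf89f85e"
ASSIGNMENT = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"\s*$')


def read_string_vars(tfvars_text):
    """Collect simple key = "value" assignments, ignoring lists and comments."""
    values = {}
    for line in tfvars_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            values[match.group(1)] = match.group(2)
    return values


sample = '''
# the ami and region used for the demo
region             = "us-east-1"
availability_zones = ["us-east-1a"]
ami                = "ami-0bd21458ecf89f85e"
key_name           = "user-us-east-1"
'''
settings = read_string_vars(sample)
print(settings["ami"], settings["region"])
```

The sketch deliberately skips list values such as availability_zones, since the cleanup commands only need the AMI ID and the region.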

Next steps

Now that you have explored horizontal cluster autoscaling with this demonstration, continue learning about the Nomad Autoscaler.

Scale an application horizontally

Scale cluster nodes with the Nomad Autoscaler
