Use hcdiag with Nomad

HashiCorp Diagnostics — hcdiag — is a troubleshooting data-gathering tool that you can use to collect and archive important data from Consul, Nomad, Vault, and TFE server environments. The information gathered by hcdiag is well-suited for sharing with teams during incident response and troubleshooting.

In this tutorial, you will:

Run a Nomad server in "dev" mode, inside an Ubuntu Docker container
Install hcdiag from the official HashiCorp Ubuntu package repository
Execute basic hcdiag commands against this Nomad service
Explore the contents of files created by the hcdiag tool
Learn about additional hcdiag features and how to use a custom configuration file with hcdiag

The hcdiag information in this tutorial can be used to troubleshoot and report on any Nomad cluster.

Prerequisites

You will need Docker installed and running on your local machine for this tutorial. Refer to the official Docker documentation for installation instructions.

Set up the environment

Run an Ubuntu Docker container in detached mode with the -d flag. The --rm flag instructs Docker to delete the container once it has stopped, and the -t flag allocates a pseudo-TTY, which keeps the container running until you stop it manually.

$ docker run -d --rm -t --name nomad ubuntu:22.04

Open an interactive shell session in the container with the -it flags.

$ docker exec -it nomad /bin/bash

Tip

Your terminal prompt now looks different, showing that you are in a shell inside the Ubuntu container; for example, it may look something like root@a931b3c8ca00:/#. Run the rest of the commands in this tutorial in this Ubuntu container shell.

Update the apt package index and install the necessary dependencies.

$ apt-get update && apt-get install -y wget gpg

Create a working directory and change into it.

$ mkdir /tmp/nomad-hcdiag && cd /tmp/nomad-hcdiag

Install and start Nomad

Add the HashiCorp repository:

$ wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor > /usr/share/keyrings/hashicorp-archive-keyring.gpg && echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com jammy main" | tee /etc/apt/sources.list.d/hashicorp.list

Install the nomad package.

$ apt-get update && apt-get install -y nomad

Create an agent configuration file named nomad.hcl and enable ACLs.

$ echo "acl { enabled = true }" > nomad.hcl

Run the nomad agent in dev mode as a background process. This may take a few seconds.

$ nomad agent -dev -config=nomad.hcl > /dev/null 2>&1 &
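
Because the agent starts in the background, a command issued immediately afterwards can fail if the HTTP API is not listening yet. The following is a minimal sketch of a retry helper that waits for a command to succeed. The function name `retry` is hypothetical and not part of Nomad or hcdiag, and the commented usage assumes the default dev-mode address 127.0.0.1:4646 and that the status endpoint answers without a token.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0, or gives up after ATTEMPTS tries.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Discard the command's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (assumption: default dev-mode address, wget installed earlier):
# retry 30 1 wget -qO- http://127.0.0.1:4646/v1/status/leader
```

If the helper returns a non-zero status, the agent did not come up within the allotted attempts; rerunning the agent in the foreground without the output redirection is one way to see why.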

Bootstrap the ACLs and save the management token SecretID to a file.

$ nomad acl bootstrap | grep -i secret | awk -F '=' '{print $2}' | xargs > management.token
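
To see what the pipeline above does, here is a small sketch that wraps the same grep, awk, and xargs steps in a function and runs it on sample text. The function name `extract_secret_id` is hypothetical, and the sample input only imitates the general `Key = Value` layout of the bootstrap output with placeholder IDs.

```shell
# Reads bootstrap-style "Key = Value" text on stdin and prints the Secret ID.
extract_secret_id() {
  # grep keeps the "Secret ID" line, awk takes the text after "=",
  # and xargs trims the surrounding whitespace.
  grep -i secret | awk -F '=' '{print $2}' | xargs
}

# Illustrative input with placeholder IDs (not real tokens).
sample='Accessor ID  = 00000000-0000-0000-0000-000000000001
Secret ID    = 00000000-0000-0000-0000-000000000002
Name         = Bootstrap Token'

printf '%s\n' "$sample" | extract_secret_id
```

The match is case-insensitive and relies on only one line containing the word "secret", which is worth rechecking if the output format of your Nomad version differs.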

Set the NOMAD_TOKEN variable to use the management token.

$ export NOMAD_TOKEN=$(cat management.token)

Test connectivity to the cluster by running the nomad node status command.

$ nomad node status

Install and run the hcdiag tool

Install the latest hcdiag release from the HashiCorp repository.

$ apt-get install -y hcdiag

This is a minimal environment, so make sure the SHELL environment variable is set:

$ export SHELL=/bin/sh

Run hcdiag against the Nomad cluster. This may take a few minutes.

$ hcdiag -nomad
2022-08-26T18:56:20.940Z [INFO]  hcdiag: Ensuring destination directory exists: directory=.
2022-08-26T18:56:20.940Z [INFO]  hcdiag: Checking product availability
2022-08-26T18:56:21.051Z [INFO]  hcdiag: Gathering diagnostics
2022-08-26T18:56:21.051Z [INFO]  hcdiag.product: Running operations for: product=host
2022-08-26T18:56:21.052Z [INFO]  hcdiag.product: running operation: product=host runner="uname -v"
2022-08-26T18:56:21.051Z [INFO]  hcdiag.product: Running operations for: product=nomad
2022-08-26T18:56:21.052Z [INFO]  hcdiag.product: running operation: product=nomad runner="nomad version"
2022-08-26T18:56:21.054Z [INFO]  hcdiag.product: running operation: product=host runner=disks
2022-08-26T18:56:21.055Z [INFO]  hcdiag.product: running operation: product=host runner=info
2022-08-26T18:56:21.057Z [INFO]  hcdiag.product: running operation: product=host runner=memory
2022-08-26T18:56:21.057Z [INFO]  hcdiag.product: running operation: product=host runner=process
2022-08-26T18:56:21.058Z [INFO]  hcdiag.product: running operation: product=host runner=network
2022-08-26T18:56:21.059Z [INFO]  hcdiag.product: running operation: product=host runner=/etc/hosts
2022-08-26T18:56:21.062Z [INFO]  hcdiag.product: running operation: product=host runner=iptables
2022-08-26T18:56:21.063Z [WARN]  hcdiag.product: result: runner=iptables status=fail result="map[iptables -L -n -v:]" error="exec error, command=iptables -L -n -v, format=string, error=exec: "iptables": executable file not found in $PATH"
2022-08-26T18:56:21.063Z [INFO]  hcdiag.product: running operation: product=host runner="/proc/ files"
2022-08-26T18:56:21.077Z [INFO]  hcdiag.product: running operation: product=host runner=/etc/fstab
2022-08-26T18:56:21.080Z [INFO]  hcdiag: Product done: product=host statuses="map[fail:1 success:9]"
2022-08-26T18:56:21.111Z [INFO]  hcdiag.product: running operation: product=nomad runner="nomad node status -self -json"
2022-08-26T18:56:21.169Z [INFO]  hcdiag.product: running operation: product=nomad runner="nomad agent-info -json"
2022-08-26T18:56:21.221Z [INFO]  hcdiag.product: running operation: product=nomad runner="nomad operator debug -log-level=TRACE -node-id=all -max-nodes=10 -output=/tmp/nomad-hcdiag/hcdiag-2022-08-26T185620Z3020652466 -duration=2m0s -interval=30s"
2022-08-26T18:58:25.321Z [INFO]  hcdiag.product: running operation: product=nomad runner="GET /v1/agent/members?stale=true"
2022-08-26T18:58:25.323Z [INFO]  hcdiag.product: running operation: product=nomad runner="GET /v1/operator/autopilot/configuration?stale=true"
2022-08-26T18:58:25.325Z [INFO]  hcdiag.product: running operation: product=nomad runner="GET /v1/operator/raft/configuration?stale=true"
2022-08-26T18:58:25.326Z [INFO]  hcdiag.product: running operation: product=nomad runner="log/docker nomad"
2022-08-26T18:58:25.328Z [INFO]  hcdiag.product: result: runner="log/docker nomad" status=skip
  result=
  | /bin/sh: 1: docker: not found
   error="docker not found, container=nomad, error=exec error, command=docker version, error=exit status 127"
2022-08-26T18:58:25.328Z [INFO]  hcdiag.product: running operation: product=nomad runner=journald
2022-08-26T18:58:25.329Z [INFO]  hcdiag.product: result: runner=journald status=skip
  result=
  | /bin/sh: 1: journalctl: not found
   error="journald not found on this system, service=nomad, error=exec error, command=journalctl --version, error=exit status 127"
2022-08-26T18:58:25.330Z [INFO]  hcdiag: Product done: product=nomad statuses="map[skip:2 success:7]"
2022-08-26T18:58:25.330Z [INFO]  hcdiag: Recording manifest
2022-08-26T18:58:25.331Z [INFO]  hcdiag: Created Results.json file: dest=/tmp/nomad-hcdiag/hcdiag-2022-08-26T185620Z3020652466/Results.json
2022-08-26T18:58:25.332Z [INFO]  hcdiag: Created Manifest.json file: dest=/tmp/nomad-hcdiag/hcdiag-2022-08-26T185620Z3020652466/Manifest.json
2022-08-26T18:58:25.366Z [INFO]  hcdiag: Compressed and archived output file: dest=hcdiag-2022-08-26T185620Z.tar.gz
2022-08-26T18:58:25.370Z [INFO]  hcdiag: Writing summary of products and ops to standard output
product  success  fail  unknown  total
host     9        1     0        10
nomad    7        0     2        9

Tip

This is an extremely minimal environment which doesn't provide some of the system services that hcdiag uses to gather information -- seeing a few errors, like in the output above, is normal.

Tip

You can also invoke hcdiag without options to gather all available environment and product information. To learn about all executable options, run hcdiag -h.

Examine the results

hcdiag generates an archive file with the troubleshooting data about the cluster in the current working directory.

Extract the archive.

$ tar xzf hcdiag-*.tar.gz
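
If you have run hcdiag more than once, the wildcard above matches several archives. Since the archive names embed a UTC timestamp, sorting them by name also sorts them by time. The sketch below picks the newest one; the function name `latest_hcdiag_archive` is hypothetical.

```shell
# Prints the path of the most recent hcdiag archive in a directory
# (defaults to the current directory). Prints nothing if none exist.
latest_hcdiag_archive() {
  local dir="${1:-.}"
  # Names look like hcdiag-2022-08-26T185620Z.tar.gz, so a lexical
  # sort puts the newest archive last.
  ls "$dir"/hcdiag-*.tar.gz 2>/dev/null | sort | tail -n 1
}

# Example:
# tar xzf "$(latest_hcdiag_archive .)"
```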

Tip

The extracted directory uses a timestamp as part of its name, so the directory names referenced in this tutorial will differ from what you see on your local machine.

Navigate to the directory of the same name -- in this case it's hcdiag-2022-08-26T185620Z, but yours will be different.

$ cd hcdiag-2022-08-26T185620Z

The directory contains the Manifest.json file, which includes information about the hcdiag run, including start and end time, duration, number of errors encountered, and the configuration options used.

hcdiag-2022-08-26T185620Z/Manifest.json

{
    "started_at": "2022-08-26T18:56:20.940217005Z",
    "ended_at": "2022-08-26T18:58:25.330171011Z",
    "duration": "124.494620806 seconds",
    "num_ops": 19,
    "configuration": {
        "hcl": {},
        "operating_system": "auto",
        "serial": false,
        "dry_run": false,
        "consul_enabled": false,
        "nomad_enabled": true,
        "terraform_ent_enabled": false,
        "vault_enabled": false,
        "since": "2022-08-23T18:56:20.940103372Z",
        "until": "0001-01-01T00:00:00Z",
        "includes": null,
        "destination": ".",
        "debug_duration": 10000000000,
        "debug_interval": 5000000000
    },
    "version": {
        "version": "0.4.0",
        "revision": "54c8d7c",
        "build_date": "Fri Aug 26 16:18:45 UTC 2022"
    },
    "ops": {
        "host": [
            {
                "op": "info",
                "error": "",
                "status": "success"
            },
            {
                "op": "memory",
                "error": "",
                "status": "success"
            },
            {
                "op": "process",
                "error": "",
                "status": "success"
            },
            {
                "op": "network",
                "error": "",
                "status": "success"
            },
            {
                "op": "/proc/ files",
                "error": "",
                "status": "success"
            },
            {
                "op": "/etc/fstab",
                "error": "",
                "status": "success"
            },
            {
                "op": "uname -v",
                "error": "",
                "status": "success"
            },
            {
                "op": "disks",
                "error": "",
                "status": "success"
            },
            {
                "op": "/etc/hosts",
                "error": "",
                "status": "success"
            },
            {
                "op": "iptables",
                "error": "exec error, command=iptables -L -n -v, format=string, error=exec: \"iptables\": executable file not found in $PATH",
                "status": "fail"
            }
        ],
        "nomad": [
            {
                "op": "nomad version",
                "error": "",
                "status": "success"
            },
            {
                "op": "nomad agent-info -json",
                "error": "",
                "status": "success"
            },
            {
                "op": "log/docker nomad",
                "error": "docker not found, container=nomad, error=exec error, command=docker version, error=exit status 127",
                "status": "skip"
            },
            {
                "op": "GET /v1/operator/raft/configuration?stale=true",
                "error": "",
                "status": "success"
            },
            {
                "op": "journald",
                "error": "journald not found on this system, service=nomad, error=exec error, command=journalctl --version, error=exit status 127",
                "status": "skip"
            },
            {
                "op": "nomad node status -self -json",
                "error": "",
                "status": "success"
            },
            {
                "op": "nomad operator debug -log-level=TRACE -node-id=all -max-nodes=10 -output=/tmp/nomad-hcdiag/hcdiag-2022-08-26T185620Z3020652466 -duration=2m0s -interval=30s",
                "error": "",
                "status": "success"
            },
            {
                "op": "GET /v1/agent/members?stale=true",
                "error": "",
                "status": "success"
            },
            {
                "op": "GET /v1/operator/autopilot/configuration?stale=true",
                "error": "",
                "status": "success"
            }
        ]
    },
    "redactions": [
        {
            "ID": "[143 248 116 208 123 195 112 34 115 100 70 180 244 97 215 253]",
            "replace": "REDACTED@REDACTED"
        }
    ]
}
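
For a quick tally of how many operations ended in a given status, a plain grep over the pretty-printed manifest is often enough. This is a rough sketch that assumes each `"status"` field sits on its own line, as in the example above. The function name `count_ops_with_status` is hypothetical, and a JSON-aware tool would be more robust if one is installed.

```shell
# Prints the number of lines in FILE that record the given op status.
count_ops_with_status() {
  local file="$1" status="$2"
  # grep -c exits non-zero when the count is 0, so mask that with "|| true".
  grep -c "\"status\": \"$status\"" "$file" || true
}

# Example:
# count_ops_with_status Manifest.json fail
# count_ops_with_status Manifest.json skip
```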

The directory also contains the Results.json file, which includes detailed information about the cluster, the nodes and their configurations, and other details about the environment. The example below has been snipped from the original output.

hcdiag-2022-08-26T185620Z/Results.json

{
    "host": {
        "/etc/fstab": {
            "result": "# UNCONFIGURED FSTAB FOR BASE SYSTEM\n",
            "error": "",
            "status": "success",
            "params": {
                "os": "linux",
                "sheller": {
                    "command": "cat /etc/fstab",
                    "redactions": [
                        {
                            "ID": "[143 248 116 208 123 195 112 34 115 100 70 180 244 97 215 253]",
                            "replace": "REDACTED@REDACTED"
                        }
                    ],
                    "shell": ""
                }
            }
        },
        "info": {
            "result": {
                "hostname": "355bc526a501",
                "os": "linux",
                "platform": "ubuntu",
                "platformFamily": "debian",
                "platformVersion": "22.04",
                "kernelVersion": "5.10.104-linuxkit",
                "kernelArch": "x86_64",
                "virtualizationSystem": "",
                "virtualizationRole": "guest",
                "hostId": "94ac468a-0000-0000-857f-aaff2ccc4d8f",
                "uptime": 111986,
                "bootTime": 1661428195,
                "procs": 5
            },
            "error": "",
            "status": "success"
        },
        {"note":"content omitted"},
        "host": {
            "hostname": "57ad86696a15",
            "uptime": 688427,
            "bootTime": 1645639813,
            "procs": 5,
            "os": "linux",
            "platform": "ubuntu",
            "platformFamily": "debian",
            "platformVersion": "20.04",
            "kernelVersion": "5.10.47-linuxkit",
            "kernelArch": "x86_64",
            "virtualizationSystem": "docker",
            "virtualizationRole": "guest",
            "hostId": "b02242cb-0000-0000-ada9-5adcb8324189"
        },
        "memory": {
        "result": {
            "total": 8346509312,
            "available": 7297323008,
            "used": 371412992,
            "usedPercent": 4.449920057790022,
            "free": 5547741184,
            "active": 800210944,
            "inactive": 1364885504,
            "wired": 0,
            "laundry": 0,
            "buffers": 302014464,
            "cached": 2125340672,
            "writeBack": 0,
            "dirty": 7589888,
            "writeBackTmp": 0,
            "shared": 364507136,
            "slab": 569602048,
            "sreclaimable": 525963264,
            "sunreclaim": 43638784,
            "pageTables": 5935104,
            "swapCached": 0,
            "commitLimit": 5246992384,
            "committedAS": 4825337856,
            "highTotal": 0,
            "highFree": 0,
            "lowTotal": 0,
            "lowFree": 0,
            "swapTotal": 1073737728,
            "swapFree": 1073737728,
            "mapped": 354729984,
            "vmallocTotal": 35184372087808,
            "vmallocUsed": 12947456,
            "vmallocChunk": 0,
            "hugePagesTotal": 0,
            "hugePagesFree": 0,
            "hugePageSize": 2097152
        },
        "error": "",
        "status": "success"
        },
    {"note":"content omitted"},
    },
    "nomad": {
        "GET /v1/agent/members": {
            "runner": {
                "path": "/v1/agent/members",
                "client": {
                    "product": "nomad",
                    "baseurl": "http://127.0.0.1:4646"
                }
            },
            "result": {
                "Members": [
                    {
                        "Addr": "127.0.0.1",
                        "DelegateCur": 4,
                        "DelegateMax": 5,
                        "DelegateMin": 2,
                        "Name": "57ad86696a15.global",
                        "Port": 4648,
                        "ProtocolCur": 2,
                        "ProtocolMax": 5,
                        "ProtocolMin": 1,
                        "Status": "alive",
                        "Tags": {
                            "bootstrap": "1",
                            "build": "1.2.6",
                            "dc": "dc1",
                            "expect": "1",
                            "id": "bbb092e0-1265-df13-d6c3-8a34ee8a9ca8",
                            "mvn": "1",
                            "port": "4647",
                            "raft_vsn": "2",
                            "region": "global",
                            "role": "nomad",
                            "rpc_addr": "127.0.0.1",
                            "vsn": "1"
                        }
                    }
                ],
                "ServerDC": "dc1",
                "ServerName": "57ad86696a15",
                "ServerRegion": "global"
            },
            "error": ""
        },
        {"note":"content omitted"},
        "nomad node status -json": {
            "runner": {
                "command": "nomad node status -json"
            },
            "result": [
                {
                    "Address": "127.0.0.1",
                    "CreateIndex": 7,
                    "Datacenter": "dc1",
                    "Drain": false,
                    "Drivers": {
                        "docker": {
                            "Attributes": null,
                            "Detected": false,
                            "HealthDescription": "Failed to connect to docker daemon",
                            "Healthy": false,
                            "UpdateTime": "2022-03-03T17:21:33.954742403Z"
                        },
                        "exec": {
                            "Attributes": {
                                "driver.exec": "true"
                            },
                            "Detected": true,
                            "HealthDescription": "Healthy",
                            "Healthy": true,
                            "UpdateTime": "2022-03-03T17:21:33.954886082Z"
                        },
                        "java": {
                            "Attributes": null,
                            "Detected": false,
                            "HealthDescription": "",
                            "Healthy": false,
                            "UpdateTime": "2022-03-03T17:21:33.954630793Z"
                        },
                        "qemu": {
                            "Attributes": null,
                            "Detected": false,
                            "HealthDescription": "",
                            "Healthy": false,
                            "UpdateTime": "2022-03-03T17:21:33.954490615Z"
                        },
                        "raw_exec": {
                            "Attributes": {
                                "driver.raw_exec": "true"
                            },
                            "Detected": true,
                            "HealthDescription": "Healthy",
                            "Healthy": true,
                            "UpdateTime": "2022-03-03T17:21:33.954814412Z"
                        }
                    },
                    "ID": "45889c69-3fcc-a3c1-c6d2-403b09bf436e",
                    "LastDrain": null,
                    "ModifyIndex": 9,
                    "Name": "57ad86696a15",
                    "NodeClass": "",
                    "SchedulingEligibility": "eligible",
                    "Status": "ready",
                    "StatusDescription": "",
                    "Version": "1.2.6"
                }
            ],
            "error": ""
        },
        "nomad operator debug -output=hcdiag-2022-08-26T185620Z -duration=30s": {
            "runner": {
                "command": "nomad operator debug -output=hcdiag-2022-08-26T185620Z -duration=30s"
            },
            "result": "Starting debugger...\n\nNomad CLI Version: Nomad v1.2.6 (a6c6b475db5073e33885377b4a5c733e1161020c)\n           Region: \n        Namespace: \n          Servers: (1/1) [57ad86696a15.global]\n          Clients: (1/1) [45889c69-3fcc-a3c1-c6d2-403b09bf436e]\n         Interval: 30s\n         Duration: 30s\n\nCapturing cluster data...\nConsul - Collecting Consul API data from: http://127.0.0.1:8500\nUnable to contact Consul leader, skipping: Get \"http://127.0.0.1:8500/v1/status/leader\": dial tcp 127.0.0.1:8500: connect: connection refused\nVault - Collecting Vault API data from: https://vault.service.consul:8200\n    Capture interval 0000\nCreated debug directory: hcdiag-2022-08-26T185620Z/nomad-debug-2022-03-03-172359Z",
            "error": ""
        },
        "nomad version": {
            "runner": {
                "command": "nomad version"
            },
            "result": "Nomad v1.2.6 (a6c6b475db5073e33885377b4a5c733e1161020c)",
            "error": ""
        }
    }
}

Finally, the directory contains a sub-directory named nomad-debug-{TIMESTAMP}, which includes additional information about the cluster, clients, servers, and job-related components.

$ ls -l nomad-debug-2022-03-03-172359Z
total 24
drwxr-xr-x 3 root root 4096 Mar  3 17:41 client
drwxr-xr-x 2 root root 4096 Mar  3 17:41 cluster
-rw-r--r-- 1 root root 3799 Mar  3 17:24 index.html
-rw-r--r-- 1 root root 1504 Mar  3 17:24 index.json
drwxr-xr-x 3 root root 4096 Mar  3 17:41 interval
drwxr-xr-x 3 root root 4096 Mar  3 17:41 server

$ ls -l nomad-debug-2022-03-03-172359Z/cluster
total 44
-rw-r--r-- 1 root root  7052 Mar  3 17:24 agent-self.json
-rw-r--r-- 1 root root  4584 Mar  3 17:24 cli-flags.json
-rw-r--r-- 1 root root 11151 Mar  3 17:24 eventstream.json
-rw-r--r-- 1 root root   471 Mar  3 17:24 members.json
-rw-r--r-- 1 root root   104 Mar  3 17:24 namespaces.json
-rw-r--r-- 1 root root    10 Mar  3 17:24 regions.json
-rw-r--r-- 1 root root   139 Mar  3 17:24 vault-sys-health.json

$ ls -l nomad-debug-2022-03-03-172359Z/interval/0000
total 60
-rw-r--r-- 1 root root     2 Mar  3 17:24 allocations.json
-rw-r--r-- 1 root root     2 Mar  3 17:24 csi-plugins.json
-rw-r--r-- 1 root root     2 Mar  3 17:24 csi-volumes.json
-rw-r--r-- 1 root root     2 Mar  3 17:24 deployments.json
-rw-r--r-- 1 root root     2 Mar  3 17:24 evaluations.json
-rw-r--r-- 1 root root     2 Mar  3 17:24 jobs.json
-rw-r--r-- 1 root root    42 Mar  3 17:24 license.json
-rw-r--r-- 1 root root 16085 Mar  3 17:24 metrics.json
-rw-r--r-- 1 root root  1028 Mar  3 17:24 nodes.json
-rw-r--r-- 1 root root   119 Mar  3 17:24 operator-autopilot-health.json
-rw-r--r-- 1 root root   149 Mar  3 17:24 operator-raft.json
-rw-r--r-- 1 root root   378 Mar  3 17:24 operator-scheduler.json

Configuration file

You can configure hcdiag's behavior with a HashiCorp Configuration Language (HCL) formatted file. Using this file, you can configure behavior by adding your own custom runners, redacting sensitive content using regular expressions, excluding commands, and more.

To run hcdiag with a custom configuration file, just create the file and point hcdiag at it with the -config flag:

$ hcdiag -config /path/to/your/configfile

Tip

This minimal environment doesn't ship with most common command-line text editors, so you'll want to install one with apt-get install nano or apt-get install vim, depending on which one you prefer.

Here is a minimal configuration file. It adds a simple agent-level (global) redaction which instructs hcdiag to replace all sensitive content in the format PASSWORD=sensitive. This is a contrived example; please refer to the official hcdiag Documentation for more detailed information about how redactions work and how to use them.

diag.hcl

agent {
    redact "regex" {
        match = "PASSWORD=\\S*"
        replace = "<PASSWORD REDACTED>"
    }
}

If you create this file as diag.hcl and execute hcdiag with hcdiag -config diag.hcl, any runner output that might capture passwords in this format would show <PASSWORD REDACTED> in place of this sensitive content.
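
To get a feel for what this pattern matches before running hcdiag, you can approximate it with sed. This is only a preview sketch: hcdiag evaluates the expression itself, and the sed version below swaps `\S` for the POSIX class `[^[:space:]]` so that it works across sed implementations. The function name `preview_redaction` is hypothetical.

```shell
# Applies an approximation of the PASSWORD=\S* redaction to stdin.
preview_redaction() {
  sed 's/PASSWORD=[^[:space:]]*/<PASSWORD REDACTED>/g'
}

# The value is replaced up to the next whitespace character.
echo 'connecting with PASSWORD=hunter2 to db01' | preview_redaction
# prints: connecting with <PASSWORD REDACTED> to db01
```

Because the pattern stops at whitespace, a password that itself contains spaces would only be partially redacted, which is one reason to test redactions against realistic sample output.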

Additional notes

hcdiag can also be run against an existing cluster by setting the appropriate environment variables on the machine running the tool. To do so, set the NOMAD_ADDR environment variable to the address of a server in the cluster and NOMAD_TOKEN to a token's SecretID with proper access if ACLs are enabled. The machine also needs to have the nomad binary available in the environment path.
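
The requirements above can be checked before starting a run. The following sketch verifies that NOMAD_ADDR is set and that the binary is on the PATH; the function name `preflight_check` is hypothetical, and the address in the commented usage is a placeholder.

```shell
# Returns 0 if NOMAD_ADDR is set and the given binary (default: nomad)
# can be found in PATH; otherwise prints what is missing and returns 1.
preflight_check() {
  local bin="${1:-nomad}"
  local status=0
  if [ -z "${NOMAD_ADDR:-}" ]; then
    echo "NOMAD_ADDR is not set" >&2
    status=1
  fi
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "$bin not found in PATH" >&2
    status=1
  fi
  # NOMAD_TOKEN is only required when ACLs are enabled, so just warn.
  if [ -z "${NOMAD_TOKEN:-}" ]; then
    echo "note: NOMAD_TOKEN is not set (needed if ACLs are enabled)" >&2
  fi
  return "$status"
}

# Example (placeholder address):
# export NOMAD_ADDR=http://nomad.example.internal:4646
# export NOMAD_TOKEN=$(cat management.token)
# preflight_check && hcdiag -nomad
```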

About ACLs

To complete a full diagnostic successfully with ACLs enabled, hcdiag should be run with the management token. This is because one of the endpoints it queries is /v1/operator/raft/configuration, which explicitly requires the management token. Without that token, hcdiag will print a warning message in the output that references a 403 Forbidden error and skip the raft configuration endpoint.

[INFO]  hcdiag.product: running operation: product=nomad runner="GET /v1/operator/raft/configuration" result=%!s(<nil>) error="403 Forbidden"

Despite this warning, hcdiag can still be used as long as the token set in NOMAD_TOKEN has read permissions on the /agent, /nodes, /operator, and /plugins endpoints. The results will just be missing diagnostic information from the raft configuration endpoint.

The following policy can be used to grant the necessary permissions to the token.

agent {
    policy = "read"
}

node {
    policy = "read"
}

operator {
    policy = "read"
}

plugin {
    policy = "read"
}
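
One way to put this policy to use is to save it to a file, register it, and create a token bound to it. The sketch below writes the file; the function name `write_hcdiag_policy`, the file name `hcdiag-policy.hcl`, and the policy name `hcdiag-read` are all placeholders chosen for illustration. The commented commands assume a management token is already set in NOMAD_TOKEN.

```shell
# Writes the read-only policy shown above to the given path
# (defaults to hcdiag-policy.hcl in the current directory).
write_hcdiag_policy() {
  local path="${1:-hcdiag-policy.hcl}"
  cat > "$path" <<'EOF'
agent {
    policy = "read"
}

node {
    policy = "read"
}

operator {
    policy = "read"
}

plugin {
    policy = "read"
}
EOF
}

# Example (requires a management token in NOMAD_TOKEN):
# write_hcdiag_policy hcdiag-policy.hcl
# nomad acl policy apply -description "Read-only access for hcdiag" hcdiag-read hcdiag-policy.hcl
# nomad acl token create -name="hcdiag" -policy=hcdiag-read
```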

Cleanup

Exit the Ubuntu container to return to your terminal prompt.

$ exit

Stop the Docker container. It will automatically be deleted because of the --rm flag passed to the docker run command at the beginning of the tutorial.

$ docker stop nomad

Production usage tips

By default, the hcdiag tool includes files for up to 72 hours back from the current time. You can specify the desired time range using the -include-since flag.

If you are concerned about impacting performance of your Nomad servers, you can ensure that runners run serially, instead of concurrently, by invoking hcdiag with the -serial flag.

Deploying hcdiag in production involves a workflow similar to the following:

Place the hcdiag binary on the Nomad system in scope - this could be a Nomad server or a Nomad client.
When running with a configuration file and the -config flag, ensure that the specified configuration file is readable by the user that executes hcdiag.
Ensure that the current directory (or the destination directory you've chosen with the -dest flag) is writable by the user that executes hcdiag.
Ensure connectivity to the HashiCorp products that hcdiag needs to connect to during the run. Export any required environment variables for establishing connection or passing authentication tokens as necessary.
Decide on a duration for information gathering, noting that the default is to gather up to 72 hours of server log output. Adjust this as necessary with the -include-since flag. For example, to include only 24 hours of log output, invoke as:
```
$ hcdiag -nomad -include-since 24h
```
Limit what is gathered with the -includes flag. For example, -includes /var/log/consul-*,/var/log/nomad-* instructs hcdiag to only gather logs matching the specified Consul and Nomad filename patterns.
Use redactions to prevent sensitive information like keys or passwords from reaching hcdiag's output or the generated bundle files.
Use the -dryrun flag to preview what hcdiag will do without executing anything, which is useful for testing your configuration and options.
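
The file-permission steps in this workflow can be checked with a short script before hcdiag runs. This is a sketch under stated assumptions: the function name `check_hcdiag_paths` is hypothetical, and the paths in the commented usage are the ones used earlier in this tutorial.

```shell
# check_hcdiag_paths CONFIG_FILE [DEST_DIR]
# Returns 0 if CONFIG_FILE (when given) is readable and DEST_DIR
# (default: current directory) is an existing, writable directory.
check_hcdiag_paths() {
  local config="$1" dest="${2:-.}"
  if [ -n "$config" ] && [ ! -r "$config" ]; then
    echo "config file not readable: $config" >&2
    return 1
  fi
  if [ ! -d "$dest" ] || [ ! -w "$dest" ]; then
    echo "destination not writable: $dest" >&2
    return 1
  fi
  return 0
}

# Example: verify the paths, then do a dry run with the same options.
# check_hcdiag_paths diag.hcl /tmp/nomad-hcdiag && \
#   hcdiag -nomad -config diag.hcl -dest /tmp/nomad-hcdiag -dryrun
```

Run the check as the same user that will execute hcdiag, since readability and writability depend on that user's permissions.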

Summary

In this tutorial, you learned about the hcdiag tool and used it to gather information from a running Nomad server environment. You also learned about some of hcdiag's configuration flags, the configuration file, and production-specific tips for using hcdiag.

Next steps

For additional information about the tool, check out the hcdiag GitHub repository.

There are also hcdiag guides for other HashiCorp tools including Vault, Terraform, and Consul.
