Collect resource utilization metrics
Understanding the resource utilization of an application is important, and Nomad supports reporting detailed statistics through many of its task drivers. The main interface for outputting resource utilization is the alloc status command with the -stats flag.
This section uses the job named "docs" from the previous sections, but these operations and commands largely apply to all jobs in Nomad.
As a reminder, here is the output of the run command from the previous example:
To fetch the detailed usage statistics, run nomad alloc status -stats 04d9627d. Your allocation ID will be different; replace 04d9627d with the allocation ID from your running "docs" job:
The output indicates that the job is running near its configured CPU limit but has plenty of memory headroom. You can use this information to adjust the job's resources to better reflect its actual needs:
Adjusting resources is important for several reasons:
- Ensuring your application does not get OOM killed if it hits its memory limit.
- Ensuring the application performs well by giving it enough CPU allowance.
- Optimizing cluster density by reserving what you need and not over-allocating.
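The reasoning above, comparing each measured figure against what the job reserves, can be sketched in a few lines of Python. This is a minimal illustration, not part of Nomad: the Usage class, the classify and review helpers, and the 0.9 and 0.5 thresholds are all hypothetical, and the input numbers are made up rather than taken from the "docs" job.

```python
from dataclasses import dataclass

# Hypothetical thresholds; tune them for your own workloads.
NEAR_LIMIT = 0.9  # at or above this share of the reservation
UNDERUSED = 0.5   # below this share of the reservation


@dataclass
class Usage:
    """One point-in-time reading, such as the figures alloc status -stats prints."""
    cpu_used_mhz: float
    cpu_reserved_mhz: float
    memory_used_mb: float
    memory_reserved_mb: float


def classify(used: float, reserved: float) -> str:
    """Label how much of a reservation is actually in use."""
    if reserved <= 0:
        raise ValueError("reserved must be positive")
    ratio = used / reserved
    if ratio >= NEAR_LIMIT:
        return "near limit"
    if ratio < UNDERUSED:
        return "over-allocated"
    return "ok"


def review(usage: Usage) -> dict:
    """Classify CPU and memory independently, since each is reserved independently."""
    return {
        "cpu": classify(usage.cpu_used_mhz, usage.cpu_reserved_mhz),
        "memory": classify(usage.memory_used_mb, usage.memory_reserved_mb),
    }


# Made-up figures: CPU close to its reservation, memory mostly unused.
print(review(Usage(95, 100, 20, 300)))
# prints {'cpu': 'near limit', 'memory': 'over-allocated'}
```

A memory figure flagged "near limit" corresponds to the OOM risk in the first bullet, while an "over-allocated" result on either resource is the kind of reservation the third bullet suggests trimming.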
While point-in-time resource usage measurements are useful, it is often more useful to graph usage over time to better understand and estimate an application's needs. Nomad supports outputting resource data to statsite and statsd, which is the recommended way to monitor resources. For more information about outputting telemetry, consult the Telemetry Guide.
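To turn a history of usage samples (for example, values collected through statsd) into a reservation, one common heuristic is to take a high percentile of the observed usage and add a safety margin. The sketch below assumes that approach; the percentile and suggest_reservation helpers, the nearest-rank method, and the default 95th percentile and 1.25 headroom factor are illustrative choices, not Nomad behavior.

```python
import math


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


def suggest_reservation(samples: list, pct: float = 95, headroom: float = 1.25) -> int:
    """Hypothetical sizing rule: a high percentile of observed usage plus a safety margin."""
    return math.ceil(percentile(samples, pct) * headroom)
```

For memory, passing pct=100 (the observed peak) is the safer choice, since a task that exceeds its memory limit can be OOM killed, whereas CPU contention tends to slow a task rather than kill it. For instance, memory samples of 180, 190, 200, 210, and 250 MB with pct=100 give 250 × 1.25 = 312.5, rounded up to 313 MB.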
For more advanced use cases, the resource usage data is also accessible via the client's HTTP API. Learn more about it in the allocation API documentation.
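As a sketch of that API, the Python below requests an allocation's usage from the agent at NOMAD_ADDR (defaulting to http://127.0.0.1:4646) using only the standard library. It assumes the /v1/client/allocation/<alloc_id>/stats endpoint and the response fields Tasks, ResourceUsage, CpuStats.TotalTicks, and MemoryStats.RSS; verify these against the allocation API documentation for your Nomad version, since fields a driver does not measure may be reported as zero. The fetch_alloc_stats and task_usage helpers are hypothetical names.

```python
import json
import os
import urllib.request
from typing import Optional


def fetch_alloc_stats(alloc_id: str, addr: Optional[str] = None) -> dict:
    """Fetch an allocation's resource usage from the Nomad HTTP API (assumed endpoint)."""
    # NOMAD_ADDR is the same variable the nomad CLI reads; fall back to the local default.
    base = addr or os.environ.get("NOMAD_ADDR", "http://127.0.0.1:4646")
    req = urllib.request.Request(f"{base}/v1/client/allocation/{alloc_id}/stats")
    token = os.environ.get("NOMAD_TOKEN")
    if token:
        # Only needed when ACLs are enabled on the cluster.
        req.add_header("X-Nomad-Token", token)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def task_usage(stats: dict) -> dict:
    """Reduce the response to per-task CPU and memory figures."""
    summary = {}
    for task, data in stats.get("Tasks", {}).items():
        usage = data["ResourceUsage"]
        summary[task] = {
            # TotalTicks is assumed to be the MHz figure the CLI shows as CPU.
            "cpu_mhz": usage["CpuStats"]["TotalTicks"],
            # Resident set size in bytes; may be 0 if the driver does not measure it.
            "memory_rss_bytes": usage["MemoryStats"]["RSS"],
        }
    return summary
```

Calling task_usage(fetch_alloc_stats("04d9627d")), with your own allocation ID substituted, returns a mapping from each task name to its CPU and resident memory figures. Keeping the HTTP call and the parsing separate makes the parsing easy to check against a saved response.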