Nomad Autoscaler Telemetry
The Nomad Autoscaler agent collects various runtime metrics about the performance of different libraries and subsystems. These metrics are aggregated on a ten-second interval and are retained for one minute. To configure the telemetry output, see the agent configuration.
This data can be accessed via the /v1/metrics HTTP endpoint, by sending a signal to the Nomad Autoscaler process, or via a number of integrations.
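As a rough illustration of the HTTP route, the sketch below fetches /v1/metrics and pulls out the gauge values. It rests on several assumptions that are not stated on this page: the agent address `http://127.0.0.1:8080`, a JSON response body, and a top-level `Gauges` list whose entries carry `Name` and `Value` keys. The helper names `fetch_metrics` and `gauge_values` are hypothetical. Check all of these against your own agent before relying on them.

```python
import json
from urllib.request import urlopen

# Assumption: the agent's HTTP API listens here; adjust to match your
# agent configuration.
AGENT_ADDR = "http://127.0.0.1:8080"


def fetch_metrics(addr: str = AGENT_ADDR, timeout: float = 5.0) -> dict:
    """GET /v1/metrics and decode the body as JSON (assumed format)."""
    with urlopen(f"{addr}/v1/metrics", timeout=timeout) as resp:
        return json.load(resp)


def gauge_values(summary: dict) -> dict:
    """Map gauge name -> value.

    Assumes a top-level "Gauges" list whose entries have "Name" and
    "Value" keys; a missing or null list yields an empty dict.
    """
    return {g["Name"]: g["Value"] for g in summary.get("Gauges") or []}
```

Calling `gauge_values(fetch_metrics())` against a running agent would then give a dictionary keyed by metric names such as those listed in the tables on this page.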
To view this data by sending a signal to the Nomad Autoscaler process, use USR1 on Unix or BREAK on Windows. Once the Nomad Autoscaler receives the signal, it dumps the current telemetry information to the agent's stderr.
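The signal route can be scripted as well. The following is a minimal sketch with hypothetical helper names (`telemetry_signal_name`, `request_telemetry_dump`). It covers only the Unix case for actually sending the signal, because delivering BREAK to another process on Windows depends on how that process was started. The PID is something you must look up yourself.

```python
import os
import signal


def telemetry_signal_name(platform: str) -> str:
    """Return the signal named in the text: BREAK on Windows, USR1 otherwise.

    `platform` is expected to look like `sys.platform`, e.g. "linux",
    "darwin", or "win32".
    """
    return "BREAK" if platform.startswith("win") else "USR1"


def request_telemetry_dump(pid: int) -> None:
    """Unix only: send SIGUSR1 to the agent process with the given PID.

    The dump itself appears on the agent's stderr, not in this process.
    """
    os.kill(pid, signal.SIGUSR1)
```

For example, `request_telemetry_dump(12345)` would signal a process with PID 12345; the number here is purely illustrative.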
This telemetry information can be used for debugging or otherwise getting a better view of what the Nomad Autoscaler is doing.
Below is sample output of a telemetry dump:
Runtime Metrics
The runtime metrics help you understand the Nomad Autoscaler agent's memory usage and load.
Metric | Description | Type |
---|---|---|
nomad-autoscaler.runtime.num_goroutines | Number of running goroutines | Gauge |
nomad-autoscaler.runtime.alloc_bytes | The number of allocated heap bytes | Gauge |
nomad-autoscaler.runtime.sys_bytes | The total bytes of memory obtained from the OS | Gauge |
nomad-autoscaler.runtime.malloc_count | Cumulative count of heap objects allocated | Gauge |
nomad-autoscaler.runtime.free_count | Cumulative count of heap objects freed | Gauge |
nomad-autoscaler.runtime.heap_objects | Number of allocated heap objects | Gauge |
nomad-autoscaler.runtime.total_gc_pause_ns | Cumulative nanoseconds in GC stop-the-world pauses | Gauge |
nomad-autoscaler.runtime.total_gc_runs | Number of completed GC cycles | Gauge |
nomad-autoscaler.runtime.gc_pause_ns | Number of nanoseconds to complete the last GC cycle | Timer |
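Some of these gauges are easier to read in combination. The sketch below derives two figures from a dictionary of gauge name to value: the number of live heap objects (allocated minus freed, which should roughly match heap_objects if these gauges mirror Go's runtime memory statistics, an assumption on our part) and the mean stop-the-world pause per completed GC cycle. The function name `runtime_summary` and the input shape are hypothetical.

```python
PREFIX = "nomad-autoscaler.runtime."


def runtime_summary(gauges: dict) -> dict:
    """Derive live object count and mean GC pause from runtime gauges.

    `gauges` maps full metric names to numeric values.
    """
    # Objects allocated so far minus objects freed so far = objects still live.
    live_objects = gauges[PREFIX + "malloc_count"] - gauges[PREFIX + "free_count"]

    # Cumulative pause time divided by completed cycles; guard against
    # a freshly started agent that has not run a GC cycle yet.
    runs = gauges[PREFIX + "total_gc_runs"]
    avg_pause_ns = gauges[PREFIX + "total_gc_pause_ns"] / runs if runs else 0.0

    return {"live_objects": live_objects, "avg_gc_pause_ns": avg_pause_ns}
```

A steadily growing live object count alongside a growing alloc_bytes is one hint of memory pressure worth investigating further.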
Policy Metrics
Policy metrics provide insights into the performance of the Nomad Autoscaler's policy handling.
Metric | Description | Type | Labels |
---|---|---|---|
nomad-autoscaler.policy.total_num | The number of policies currently held within the autoscaler | Gauge | |
nomad-autoscaler.policy.source.error_count | Tracks the number of errors generated by the policy sources | Counter | policy_source |
Scaling Metrics
Scaling metrics provide insight into the performance of scaling actions as well as overall success and failure counters.
Metric | Description | Type | Labels |
---|---|---|---|
nomad-autoscaler.scale.evaluate_ms | The time taken to evaluate the checks within a single policy | Timer | policy_id, target_name |
nomad-autoscaler.scale.invoke_ms | The time taken to invoke scaling based on the scaling evaluations | Timer | policy_id, target_name |
nomad-autoscaler.scale.invoke.success_count | Tracks the number of successful scaling actions triggered | Counter | policy_id, target_name |
nomad-autoscaler.scale.invoke.error_count | Tracks the number of unsuccessful scaling actions triggered | Counter | policy_id, target_name |
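Because the success and error counters share the policy_id label, they can be combined into a per-policy error ratio. The sketch below is one possible way to do that. The input shape, an iterable of `(metric_name, labels, count)` tuples, and the function name `error_ratio_by_policy` are hypothetical; how you obtain those tuples depends on your telemetry sink.

```python
from collections import defaultdict

SUCCESS = "nomad-autoscaler.scale.invoke.success_count"
ERROR = "nomad-autoscaler.scale.invoke.error_count"


def error_ratio_by_policy(samples) -> dict:
    """Return policy_id -> errors / (successes + errors).

    `samples` is an iterable of (metric_name, labels_dict, count) tuples.
    Policies with no recorded scaling actions are omitted.
    """
    totals = defaultdict(lambda: [0, 0])  # policy_id -> [successes, errors]
    for name, labels, count in samples:
        if name == SUCCESS:
            totals[labels["policy_id"]][0] += count
        elif name == ERROR:
            totals[labels["policy_id"]][1] += count
    return {
        policy: err / (ok + err)
        for policy, (ok, err) in totals.items()
        if ok + err
    }
```

Grouping by target_name instead would follow the same pattern with a different label key.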
Plugin Metrics
Plugin metrics provide insight into the performance of Nomad Autoscaler plugins and help identify potential bottlenecks or latency issues.
Metric | Description | Type | Labels |
---|---|---|---|
nomad-autoscaler.plugin.manager.access_ms | The time taken to dispense a plugin | Timer | |
nomad-autoscaler.target.status.invoke_ms | The time taken to perform the target plugin status call | Timer | policy_id, plugin_name |
nomad-autoscaler.target.scale.invoke_ms | The time taken to perform the target plugin scale call | Timer | policy_id, plugin_name |
nomad-autoscaler.apm.query.invoke_ms | The time taken to perform the APM plugin query call | Timer | policy_id, plugin_name |
nomad-autoscaler.strategy.run.invoke_ms | The time taken to perform the strategy plugin run call | Timer | policy_id, plugin_name |