Nomad Autoscaler Telemetry

The Nomad Autoscaler agent collects various runtime metrics about the performance of different libraries and subsystems. These metrics are aggregated on a 10-second interval and retained for one minute. To configure the telemetry output, see the agent configuration.

This data can be accessed via the /v1/metrics HTTP endpoint, by sending a signal to the Nomad Autoscaler process, or via a number of integrations.
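
As a minimal sketch of the HTTP route, the following Python snippet builds the endpoint URL and decodes the response. The base address `http://127.0.0.1:8080`, the helper names, and the assumption that the response body is JSON are illustrative rather than taken from this page; adjust them to match your agent's HTTP configuration.

```python
import json
import urllib.request

# Assumed agent address; replace with your agent's HTTP bind address and port.
DEFAULT_AGENT_ADDR = "http://127.0.0.1:8080"


def metrics_url(agent_addr: str = DEFAULT_AGENT_ADDR) -> str:
    """Build the telemetry endpoint URL from the agent's base address."""
    return agent_addr.rstrip("/") + "/v1/metrics"


def fetch_metrics(agent_addr: str = DEFAULT_AGENT_ADDR, timeout: float = 5.0):
    """Fetch the current telemetry and decode it (assumes a JSON body)."""
    with urllib.request.urlopen(metrics_url(agent_addr), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running agent returns the decoded payload, which can then be printed or forwarded to another tool.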

To view this data by signal, send the Nomad Autoscaler process USR1 on Unix or BREAK on Windows. Once the Nomad Autoscaler receives the signal, it dumps the current telemetry information to the agent's stderr.
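
The signal can also be sent programmatically. The sketch below is one way to do it from Python, assuming you already know the agent's process ID; the function names are hypothetical. The Windows branch is a best-effort assumption, since `os.kill` with `CTRL_BREAK_EVENT` only reaches processes that share the caller's console and expects a process group ID.

```python
import os
import signal
import sys


def telemetry_signal_name(platform: str = sys.platform) -> str:
    """Return the name of the signal that triggers a telemetry dump."""
    return "SIGBREAK" if platform.startswith("win") else "SIGUSR1"


def request_telemetry_dump(pid: int) -> None:
    """Ask the agent process identified by pid to dump telemetry to its stderr."""
    if sys.platform.startswith("win"):
        # Assumption: the agent runs in the same console as this script and
        # pid identifies its process group.
        os.kill(pid, signal.CTRL_BREAK_EVENT)
    else:
        os.kill(pid, signal.SIGUSR1)
```

The dump appears on the agent's stderr, not in the output of the script that sends the signal.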

This telemetry information can be used for debugging or otherwise getting a better view of what the Nomad Autoscaler is doing.

Below is sample output of a telemetry dump:

[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.sys_bytes': 74793216.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.malloc_count': 219856.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.free_count': 183613.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_pause_ns': 348822.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_runs': 5.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.num_goroutines': 12.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.policy.total_num': 0.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.alloc_bytes': 4316568.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.heap_objects': 36243.000
[2020-08-25 10:01:20 +0100 BST][S] 'nomad-autoscaler.runtime.gc_pause_ns': Count: 5 Min: 38083.000 Mean: 69764.400 Max: 122291.000 Stddev: 31487.808 Sum: 348822.000 LastUpdated: 2020-08-25 10:01:26.574809 +0100 BST m=+1.241576679
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.alloc_bytes': 4370504.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.malloc_count': 220853.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.free_count': 183613.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.policy.total_num': 0.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.num_goroutines': 12.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_pause_ns': 348822.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_runs': 5.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.sys_bytes': 74793216.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.heap_objects': 37240.000
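
If you capture a dump like the one above from stderr, a small parser makes it easier to inspect. The following sketch is based only on the line shapes visible in the sample: `[G]` lines carry a single number and `[S]` lines carry `Key: value` pairs followed by `LastUpdated`. Treating every non-`[G]` line like an `[S]` line is an assumption, and the function name is hypothetical.

```python
import re

# [timestamp][KIND] 'metric.name': body
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\[(?P<kind>[A-Z])\] '(?P<name>[^']+)': (?P<body>.*)$"
)
# Numeric "Key: value" pairs such as "Count: 5" or "Mean: 69764.400".
STAT_RE = re.compile(r"(\w+): (-?\d+(?:\.\d+)?)")


def parse_dump_line(line: str):
    """Parse one telemetry dump line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = {
        "timestamp": match["timestamp"],
        "kind": match["kind"],
        "name": match["name"],
    }
    body = match["body"]
    if record["kind"] == "G":
        # Gauge lines hold a single numeric value.
        record["value"] = float(body)
    else:
        # Drop the trailing LastUpdated timestamp, then collect numeric fields.
        stats_text = body.split(" LastUpdated:", 1)[0]
        record["stats"] = {key: float(val) for key, val in STAT_RE.findall(stats_text)}
    return record
```

For example, parsing the `sys_bytes` line from the sample yields kind `G` with value `74793216.0`, and parsing the `gc_pause_ns` line yields a `stats` dict with keys such as `Count`, `Min`, `Mean`, and `Max`.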

Runtime Metrics

The runtime metrics help you understand the Nomad Autoscaler agent's memory usage and load.

Metric	Description	Type
`nomad-autoscaler.runtime.num_goroutines`	Number of running goroutines	Gauge
`nomad-autoscaler.runtime.alloc_bytes`	The number of allocated heap bytes	Gauge
`nomad-autoscaler.runtime.sys_bytes`	The total bytes of memory obtained from the OS	Gauge
`nomad-autoscaler.runtime.malloc_count`	Cumulative count of heap objects allocated	Gauge
`nomad-autoscaler.runtime.free_count`	Cumulative count of heap objects freed	Gauge
`nomad-autoscaler.runtime.heap_objects`	Number of allocated heap objects	Gauge
`nomad-autoscaler.runtime.total_gc_pause_ns`	Cumulative nanoseconds in GC stop-the-world pauses	Gauge
`nomad-autoscaler.runtime.total_gc_runs`	Number of completed GC cycles	Gauge
`nomad-autoscaler.runtime.gc_pause_ns`	Number of nanoseconds to complete the last GC cycle	Timer
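
Several of these gauges are related, which is useful when sanity-checking a dump. The sketch below uses the values from the 10:01:20 interval of the sample output: the difference between `malloc_count` and `free_count` matches `heap_objects`, and `total_gc_pause_ns` divided by `total_gc_runs` matches the `Mean` reported for `gc_pause_ns`. The helper names are hypothetical.

```python
# Gauge values copied from the 10:01:20 interval of the sample dump.
malloc_count = 219856
free_count = 183613
heap_objects = 36243
total_gc_pause_ns = 348822
total_gc_runs = 5


def live_objects(mallocs: int, frees: int) -> int:
    """Objects allocated and not yet freed."""
    return mallocs - frees


def mean_gc_pause_ns(total_pause_ns: int, gc_runs: int) -> float:
    """Average stop-the-world pause per completed GC cycle."""
    return total_pause_ns / gc_runs if gc_runs else 0.0


# Both derived values agree with the figures reported in the sample dump.
assert live_objects(malloc_count, free_count) == heap_objects
assert abs(mean_gc_pause_ns(total_gc_pause_ns, total_gc_runs) - 69764.4) < 1e-6
```

A steadily growing gap between `malloc_count` and `free_count` across intervals means the number of live heap objects is rising.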

Policy Metrics

Policy metrics provide insights into the performance of the Nomad Autoscaler's policy handling.

Metric	Description	Type	Labels
`nomad-autoscaler.policy.total_num`	The number of policies currently held within the autoscaler	Gauge
`nomad-autoscaler.policy.source.error_count`	Tracks the number of errors generated by the policy sources	Counter	policy_source

Scaling Metrics

Scaling metrics provide insight into the performance of scaling actions as well as overall success and failure counters.

Metric	Description	Type	Labels
`nomad-autoscaler.scale.evaluate_ms`	The time taken to evaluate the checks within a single policy	Timer	policy_id, target_name
`nomad-autoscaler.scale.invoke_ms`	The time taken to invoke scaling based on the scaling evaluations	Timer	policy_id, target_name
`nomad-autoscaler.scale.invoke.success_count`	Tracks the number of successful scaling actions triggered	Counter	policy_id, target_name
`nomad-autoscaler.scale.invoke.error_count`	Tracks the number of unsuccessful scaling actions triggered	Counter	policy_id, target_name
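
One way to use the two counters together is to derive an error rate for scaling actions. The sketch below is illustrative only: the function name is hypothetical, and the inputs are whatever counter values you have read for a given `policy_id` and `target_name`.

```python
def invoke_error_rate(success_count: int, error_count: int) -> float:
    """Fraction of triggered scaling actions that failed (0.0 when none ran)."""
    total = success_count + error_count
    return error_count / total if total else 0.0
```

With hypothetical readings of 18 successes and 2 errors, `invoke_error_rate(18, 2)` gives a failure fraction of one in ten.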

Plugin Metrics

Plugin metrics provide insight into the performance of Nomad Autoscaler plugins and help identify potential bottlenecks or latency issues.

Metric	Description	Type	Labels
`nomad-autoscaler.plugin.manager.access_ms`	The time taken to dispense a plugin	Timer
`nomad-autoscaler.target.status.invoke_ms`	The time taken to perform the target plugin status call	Timer	policy_id, plugin_name
`nomad-autoscaler.target.scale.invoke_ms`	The time taken to perform the target plugin scale call	Timer	policy_id, plugin_name
`nomad-autoscaler.apm.query.invoke_ms`	The time taken to perform the APM plugin query call	Timer	policy_id, plugin_name
`nomad-autoscaler.strategy.run.invoke_ms`	The time taken to perform the strategy plugin run call	Timer	policy_id, plugin_name