Nvidia GPU Device Plugin
Name: nomad-device-nvidia
The Nvidia device plugin is used to expose Nvidia GPUs to Nomad.
Note: The Nvidia device plugin setup changed in Nomad 1.2. Before upgrading, you must add a plugin block to your client configuration and install the external Nvidia device plugin into each client's plugin_dir. See Plugin Configuration below for the supported options. The job specification remains the same.
Fingerprinted Attributes
Attribute | Unit |
---|---|
memory | MiB |
power | W (Watt) |
bar1 | MiB |
driver_version | string |
cores_clock | MHz |
memory_clock | MHz |
pci_bandwidth | MB/s |
display_state | string |
persistence_mode | string |
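These attributes can be referenced from a job's device block through the ${device.attr.<name>} interpolation. The fragment below is a minimal sketch, assuming a task that requires one GPU with at least 4 GiB of memory and prefers one with 8 GiB or more. The thresholds and the weight are illustrative values, not recommendations.

```hcl
resources {
  device "nvidia/gpu" {
    count = 1

    # Hard requirement: only place on GPUs with at least 4 GiB of memory
    # (illustrative threshold).
    constraint {
      attribute = "${device.attr.memory}"
      operator  = ">="
      value     = "4 GiB"
    }

    # Soft preference: favor GPUs with 8 GiB or more (illustrative threshold).
    affinity {
      attribute = "${device.attr.memory}"
      operator  = ">="
      value     = "8 GiB"
      weight    = 75
    }
  }
}
```

A constraint filters out devices that do not match, while an affinity only influences scoring, so the task can still be placed on a smaller GPU when no larger one is free.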
Runtime Environment
The nvidia-gpu device plugin exposes the following environment variables:
- NVIDIA_VISIBLE_DEVICES - List of Nvidia GPU IDs available to the task.
Additional Task Configurations
Additional environment variables can be set by the task to influence the runtime environment. See Nvidia's documentation.
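As a sketch of how a task might set such a variable, the fragment below uses the task's env block. It assumes NVIDIA_DRIVER_CAPABILITIES, a variable described in Nvidia's container runtime documentation. The task name and the chosen value are hypothetical, so check Nvidia's documentation for the values your workload needs.

```hcl
task "gpu-task" {
  driver = "docker"

  # Assumption: NVIDIA_DRIVER_CAPABILITIES is read by Nvidia's container
  # runtime; "compute,utility" is an illustrative value.
  env {
    NVIDIA_DRIVER_CAPABILITIES = "compute,utility"
  }

  # The config and resources blocks are omitted from this fragment.
}
```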
Installation Requirements
To use the nomad-device-nvidia device plugin, the following prerequisites must be met:
- GNU/Linux x86_64 with kernel version > 3.10
- NVIDIA GPU with Architecture > Fermi (2.1)
- NVIDIA drivers >= 340.29 with the nvidia-smi binary
- Docker v19.03+
Container Toolkit Installation
Follow Nvidia's NVIDIA Container Toolkit installation instructions to prepare a machine to use Docker containers with Nvidia GPUs. To test your environment, run nvidia-smi inside a container started with GPU access (for example, docker run --gpus all with a CUDA base image); it should list the host's GPUs.
Plugin Configuration
The nomad-device-nvidia device plugin supports the following configuration in the agent config:
- enabled (bool: true) - Controls whether the plugin should be enabled and running.
- ignored_gpu_ids (array<string>: []) - Specifies the set of GPU UUIDs that should be ignored when fingerprinting.
- fingerprint_period (string: "1m") - The interval at which to fingerprint for device changes.
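The options above go inside a plugin block in the client agent configuration. A minimal sketch follows, assuming the plugin binary is already installed in the client's plugin_dir. The GPU UUIDs are hypothetical placeholders.

```hcl
plugin "nomad-device-nvidia" {
  config {
    # Keep the plugin enabled (the default).
    enabled = true

    # Hypothetical GPU UUIDs to exclude from fingerprinting.
    ignored_gpu_ids = ["GPU-fef8089b", "GPU-ac81e44d"]

    # Re-fingerprint devices once per minute (the default).
    fingerprint_period = "1m"
  }
}
```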
Limitations
The Nvidia integration only works with task drivers that natively integrate with Nvidia's container runtime library.
Nomad has tested support with the docker driver. Support for lxc should be possible by installing the Nvidia hook, but is not tested or documented by Nomad.
Source Code & Compiled Binaries
The source code for this plugin can be found at hashicorp/nomad-device-nvidia. You can also find pre-built binaries on the releases page.
Examples
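Inspect a node with a GPU by running nomad node status <node-id>.
Display detailed statistics on a node with a GPU by adding the -stats flag: nomad node status -stats <node-id>.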
Run the following example job to see that the GPU was mounted in the container:
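A minimal sketch of such a job, assuming a batch job named gpu-test, a datacenter called dc1, and a CUDA base image from Docker Hub. All three are placeholders to adapt to your environment, and the image tag in particular may need updating.

```hcl
job "gpu-test" {
  datacenters = ["dc1"] # placeholder datacenter name
  type        = "batch"

  group "smi" {
    task "smi" {
      driver = "docker"

      config {
        # Assumed image; any CUDA base image that ships nvidia-smi should work.
        image   = "nvidia/cuda:11.0.3-base-ubuntu20.04"
        command = "nvidia-smi"
      }

      resources {
        # Request one Nvidia GPU from the device plugin.
        device "nvidia/gpu" {
          count = 1
        }
      }
    }
  }
}
```

If the GPU was mounted correctly, the task's logs should contain the usual nvidia-smi device table, which you can view with nomad alloc logs and the allocation ID.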