Considerations for Stateful Workloads
By default, Nomad's allocation storage is ephemeral. Nomad can discard it during new deployments, when rescheduling jobs, or if it loses a client. This is undesirable when running persistent workloads such as databases.
This document explores the options for persistent storage of workloads running in Nomad. The information is intended for practitioners who are familiar with Nomad and have a foundational understanding of storage.
Considerations
Consider access patterns, performance, reliability and availability needs, and maintenance to choose the most appropriate storage strategy.
Local storage is performant and readily available, and it needs little maintenance as long as it has enough capacity. However, it is not redundant: if a single node, disk, or group of disks fails, data loss and service interruption will occur.
Geographically distributed networked storage with redundant disks, controllers, and network paths provides higher availability and resilience, and it can tolerate multiple hardware failures before risking data loss. However, the performance and reliability of networked storage depend on the network: it can have higher latency and lower throughput than local storage, and it may require more maintenance.
Consider whether Nomad is running in the public cloud or on-premises, and what storage options are available in that environment. From there, the best choice depends on your organizational and application needs.
Public cloud
Public cloud providers offer different storage services with various tradeoffs. These typically include local disks, network-attached block devices, and networked shared storage.
AWS
AWS service | Availability | Persistence | Performance | Suitability |
---|---|---|---|---|
Instance Store | Locally on some instance types | Limited, not persistent across instance stops/terminations or hardware failures | High throughput and low latency | Temporary storage of information that changes frequently, such as buffers, caches, scratch data, and other temporary content |
Elastic Block Store | Zonal block devices attached to one or more instances | Persistent, with an independent lifecycle | Configurable, but higher latency than Instance Store | General purpose persistent storage |
Elastic File System | Regional/Multi-regional file storage that can be available to multiple instances | Persistent, with an independent lifecycle | Configurable, but with less throughput and higher latency than Instance Store or EBS | File storage that needs to be available to multiple instances in multiple zones (even only as a failover) |
Azure
Azure service | Availability | Persistence | Performance | Suitability |
---|---|---|---|---|
Ephemeral OS disks | Locally on some instance types | Limited, not persistent across instance stops/terminations or hardware failures | High throughput and low latency | Temporary storage of information that changes frequently, such as buffers, caches, scratch data, and other temporary content |
Managed Disks | Zonal or regional block devices attached to one or more VMs | Persistent, with an independent lifecycle | Configurable | General purpose persistent storage |
Azure Files | Zonal/Regional/Multi-regional file storage that can be available to multiple VMs | Persistent, with an independent lifecycle | Configurable | File storage that needs to be available to multiple VMs in multiple zones (even only as a failover) |
GCP
GCP service | Availability | Persistence | Performance | Suitability |
---|---|---|---|---|
Local SSD | Locally on some instance types | Limited, not persistent across instance stops/terminations or hardware failures | High throughput and low latency | Temporary storage of information that changes frequently, such as buffers, caches, scratch data, and other temporary content |
Persistent Disk | Zonal or regional block devices attached to one or more instances | Persistent, with an independent lifecycle | Configurable | General purpose persistent storage |
Filestore | Zonal/Regional file storage that can be available to multiple instances | Persistent, with an independent lifecycle | Configurable | File storage that needs to be available to multiple VMs in multiple zones (even only as a failover) |
Private cloud or on-premises
When you run workloads on-premises or in a self-managed private cloud, SAN/NAS systems or software-defined storage such as Portworx or Ceph usually provide non-local storage. Compute instances access the storage over a block protocol such as iSCSI, FC, or NVMe-oF, over a file protocol such as NFS or CIFS, or over both. Dedicated storage teams manage these systems in most organizations.
Consuming persistent storage from Nomad
Because environments and application requirements differ, consider performance, reliability, availability, and maintenance when choosing the most appropriate storage option.
CSI
Container Storage Interface is a vendor-neutral specification that allows storage providers to develop plugins that orchestrators such as Nomad can use. Some CSI plugins can dynamically provision and manage volume lifecycles, including snapshots, deletion, and dynamic resizing. The exact feature set each plugin supports will depend on the plugin and the underlying storage platform.
Find a list of plugins and their feature set in the Kubernetes CSI Developer Documentation.
While Nomad follows the CSI specification, some plugins may implement orchestrator-specific logic that makes them incompatible with Nomad. You should validate that your chosen plugin works with Nomad before using it. Refer to the plugin documentation from the storage provider for more information.
There are three CSI plugin subtypes:
- Controller: Communicates with the storage provider to manage the volume lifecycle.
- Node: Runs on all Nomad clients and handles local operations, such as mounting and unmounting volumes in allocations. The node plugin must be privileged to perform those operations.
- Monolithic: Combines both of the above roles.
You can and should run all plugin types as Nomad jobs: `system` jobs for Node and Monolithic plugins, and `service` jobs for Controllers. Refer to the CSI concepts documentation page for more information.
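As an illustration, the following is a minimal sketch of a Node plugin deployed as a `system` job. The plugin image, plugin ID, and arguments are hypothetical placeholders; use the values documented by your storage provider's CSI plugin.

```hcl
job "csi-node-plugin" {
  # Node plugins must run on every client, so a system job is appropriate.
  type = "system"

  group "node" {
    task "plugin" {
      driver = "docker"

      config {
        # Hypothetical plugin image and arguments.
        image = "example.org/csi-driver:1.0.0"
        args = [
          "--endpoint=unix://csi/csi.sock",
          "--node-id=${node.unique.name}",
        ]

        # Node plugins need privileged access to mount volumes on the host.
        privileged = true
      }

      csi_plugin {
        id        = "example-storage"
        type      = "node"
        mount_dir = "/csi"
      }
    }
  }
}
```

A Controller plugin follows the same structure, but runs as a `service` job with `type = "controller"` in its `csi_plugin` block.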
CSI plugins are useful when storage requirements evolve quickly and constantly. For example, an environment where workloads with persistent storage are frequently added or removed is well suited for CSI. However, CSI plugins present some maintenance challenges: they must run continuously, be configured (including authentication and connectivity to the storage platform), and be updated to keep up with new features and bug fixes and to stay compatible with the underlying storage platform. They also introduce additional moving parts, can be difficult to troubleshoot, and have a complex security profile because they need to run as `privileged` containers in order to mount volumes.
The Stateful Workloads with CSI tutorial and the Nomad CSI demo repository offer guidance and examples on how to use CSI plugins with Nomad and include job files for running the plugins and configuration files for creating and consuming volumes.
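As a quick illustration, the following is a minimal sketch of a CSI volume specification that you could pass to `nomad volume create` or `nomad volume register`. The IDs, capacities, and capability values are hypothetical and depend on your plugin and storage platform.

```hcl
# volume.hcl - illustrative CSI volume specification
id        = "database"
name      = "database"
type      = "csi"
plugin_id = "example-storage"

capacity_min = "10GiB"
capacity_max = "20GiB"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}
```

Jobs then consume the volume through a group-level `volume` block with `type = "csi"` and a task-level `volume_mount` block.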
Host volumes
Host volumes mount paths from the host (the Nomad client) into allocations. Nomad is aware of host volume availability and uses it for job scheduling. However, Nomad does not know about the volume's underlying characteristics, such as whether it is a standard directory on a local ext4 filesystem, a path backed by distributed networked storage such as GlusterFS, or a mounted NFS/CIFS volume from a NAS or a public cloud service such as AWS EFS. Therefore, you can use host volumes both for local, semi-persistent storage and for highly persistent networked storage.
Because you need to declare host volumes in the Nomad agent's configuration file, you must restart the Nomad client to reconfigure them. This makes host volumes impractical if you frequently change your storage configuration. Furthermore, it might require coordination between different personas to configure and consume host volumes. For example, a Nomad Administrator must modify Nomad's configuration file to add/update/remove host volumes to make them available for consumption by Nomad Operators. Or, with networked host volumes, a Storage Administrator will need to provision the volumes and make them available to the Nomad clients. A System Administrator will then mount them on the Nomad clients.
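For reference, a host volume declaration lives in the client block of the agent configuration. The following is a minimal sketch; the volume name and path are hypothetical, and the path could point at a local directory or at a networked mount.

```hcl
client {
  enabled = true

  host_volume "postgres-data" {
    # Hypothetical path; it could be a local directory or a mount point
    # for networked storage such as NFS.
    path      = "/opt/nomad/volumes/postgres"
    read_only = false
  }
}
```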
Host volumes backed by local storage help persist data that is not critical, for example an on-disk cache that can be rebuilt if needed. When backed by networked storage such as NFS/CIFS-mounted volumes or distributed storage via GlusterFS/Ceph, host volumes provide a quick option to consume highly available and reliable storage.
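A job then requests the host volume at the group level and mounts it into a task. This sketch consumes the hypothetical `postgres-data` volume from the previous example.

```hcl
group "db" {
  # Request the host volume; Nomad only places this group on clients
  # that expose a host volume with this source name.
  volume "data" {
    type      = "host"
    source    = "postgres-data"
    read_only = false
  }

  task "postgres" {
    driver = "docker"

    config {
      image = "postgres:16"
    }

    # Mount the requested volume into the task's filesystem.
    volume_mount {
      volume      = "data"
      destination = "/var/lib/postgresql/data"
    }
  }
}
```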
Refer to the Stateful workloads with Nomad host volumes tutorial to learn more about using host volumes with Nomad.
NFS caveats
A few caveats with NFS-backed host volumes include ACLs, reliability, and performance. NFS mount options should be the same on all mounting Nomad clients.
Depending on your NFS version, the UID/GID (user/group IDs) can differ between Nomad clients, leading to issues when an allocation on another host tries to access the volume. The only way to ensure this is not an issue is to use NFS v4 with ID mapping based on Kerberos, or to have a reliable configuration management or image-building process that keeps UIDs/GIDs synchronized between hosts. Use hard mounts to prevent data loss, optionally with `intr` to allow interrupting NFS requests, which prevents the whole system from locking up if the NFS server becomes unavailable.
A significant factor in the performance of NFS-backed storage is the `wsize` and `rsize` mount options, which determine the maximum size of a read or write block. Smaller sizes mean that larger operations are split into smaller chunks, which significantly impacts performance. The underlying storage system's vendor provides the optimal sizes. For example, AWS EFS recommends a value of 1048576 bytes for both `wsize` and `rsize`.
To learn more about NFS mount options, visit Red Hat's NFS documentation.
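As a purely illustrative example (the export, mount point, and values are placeholders), an /etc/fstab entry that follows the advice above might look like this:

```
nfs.example.internal:/exports/nomad  /opt/nomad/volumes/shared  nfs4  hard,intr,rsize=1048576,wsize=1048576,_netdev  0 0
```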
Ephemeral disks
Nomad ephemeral disks describe the best-effort persistence of a Nomad allocation's data directory. They support data migration between hosts (which requires network connectivity between the Nomad client nodes) and are size-aware for scheduling purposes. Because persistence is only best effort, however, you will lose data if the client or its underlying storage fails. Ephemeral disks are well suited for data that you can rebuild if needed, such as an in-progress cache or a local copy of data.
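For example, the following is a minimal sketch of a group that opts into ephemeral disk migration; the group name, size, and image are illustrative.

```hcl
group "cache" {
  ephemeral_disk {
    # Size in MB, used by the scheduler when placing the allocation.
    size = 500

    # Best effort: try to migrate the allocation's data to the new client
    # when the allocation is rescheduled.
    migrate = true

    # Prefer placing updated allocations on the same client.
    sticky = true
  }

  task "app" {
    driver = "docker"

    config {
      # Hypothetical image that writes rebuildable data to its allocation directory.
      image = "example.org/cache-app:1.0.0"
    }
  }
}
```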
Storage comparison
With the information laid out in this document, use the following table to choose the storage option that best addresses your Nomad storage requirements.
Storage option | Advantages | Disadvantages | Ideal for |
---|---|---|---|
CSI volumes | Dynamic provisioning and volume lifecycle management, including snapshots, deletion, and resizing, across many storage platforms | Plugins must run continuously, be configured, and be kept up to date; more moving parts; harder to troubleshoot; require privileged containers | Environments where workloads with persistent storage are frequently added or removed |
Host volumes backed by local storage | Simple to configure; high throughput and low latency; Nomad schedules jobs onto clients where the volume is available | Not redundant, so data is lost if the node or its disks fail; changes require updating the client configuration and restarting the Nomad client | Non-critical data that can be rebuilt, such as an on-disk cache |
Host volumes backed by networked or clustered storage | Quick way to consume highly available and reliable storage | Performance and reliability depend on the network; changes require updating the client configuration and restarting the Nomad client; may require coordination between storage, system, and Nomad administrators | Persistent data that must survive the loss of a Nomad client |
Ephemeral disks | Best-effort persistence and data migration between clients; size-aware scheduling; no additional infrastructure | Data is lost if the client or its underlying storage fails | Data that can be rebuilt if needed, such as an in-progress cache or a local copy of data |
Additional resources
To learn more about Nomad and the topics covered in this document, visit the following resources:
- Allocations
- Monitoring your allocations and their storage with Nomad's event stream
- Best practices for cluster setup