Storage

Stateful applications require storage to be attached to their pods.

Volumes

Volume types:

emptyDir - a simple directory that starts out empty and exists for the lifetime of the pod. Multiple containers in a pod can share data this way. It can be backed by a directory on the node's disk or by tmpfs (memory) on the node.
hostPath - a path on the worker node’s filesystem. It’s dangerous, because a pod may gain access to any file on the node.
nfs
azureFile
azureDisk
persistentVolumeClaim
…

Different types have different configuration options. E.g., emptyDir can have “medium” and “sizeLimit” configured.

Volumes can be mounted read/write or read-only. When a volume is mounted by several pods, it depends on the provider whether only one pod or many pods can write to it.
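As a rough illustration of the options above, the pod sketch below shares a memory-backed emptyDir between two containers, with the second container mounting it read-only. The pod name, container names, image and paths are made up for the example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-scratch              # hypothetical name
spec:
  containers:
    - name: writer
      image: busybox:1.36
      command: ["sh", "-c", "while true; do date >> /scratch/log.txt; sleep 5; done"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch       # mounted read/write by default
    - name: reader
      image: busybox:1.36
      command: ["sh", "-c", "sleep 10; tail -f /data/log.txt"]
      volumeMounts:
        - name: scratch
          mountPath: /data
          readOnly: true            # this container can only read the shared data
  volumes:
    - name: scratch
      emptyDir:
        medium: Memory              # tmpfs; omit to use the node's disk
        sizeLimit: 64Mi             # upper bound on the volume's size
```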

Persistent Volumes

The direct volume mounting described above is problematic, because it exposes the details of the volume in the pod/deployment definition (e.g., I’d have to specify the address of the NFS server there, or state that Azure infrastructure is used for storage). This ties the manifests to a specific storage backend or cloud provider.
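To make the problem concrete, here is a sketch of a pod that mounts NFS directly. The server address and export path are hypothetical; the point is only that they end up hard-coded in the pod definition.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-direct                      # hypothetical name
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      nfs:                              # storage backend details leak into the pod spec
        server: nfs.example.internal    # hypothetical NFS server address
        path: /exports/app              # hypothetical export path
```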

Additional objects were introduced to abstract the storage info away.

Volume provisioning is divided between two personas:

cluster admins - define the available PersistentVolumes
cluster users/devs - request storage for their apps using a PersistentVolumeClaim.

Storage objects:

Persistent Volume (PV) - represents a storage volume and holds the information about it.
Persistent Volume Claim (PVC) - represents a user’s claim on a persistent volume. Its lifecycle is not tied to that of a pod, so ownership of the persistent volume is decoupled from the pod. To use a persistent volume, the user first needs to claim it. When the volume is no longer needed, the user releases it by deleting the claim object.

To use the volume, a pod needs to refer to a PersistentVolumeClaim that is bound to a PersistentVolume. The claim might either reference the exact name of the PersistentVolume, or just list requirements and let K8s bind any fitting PersistentVolume to it.
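The chain pod -> claim -> volume can be sketched with three manifests. This is a minimal example assuming a statically created NFS-backed PV; all names, sizes and the NFS address are hypothetical.

```yaml
# Created by the cluster admin: describes where the storage lives.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: app-data-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs.example.internal    # hypothetical NFS server address
    path: /exports/app
---
# Created by the user: lists requirements only, no backend details.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi                  # any fitting PV with at least this capacity may be bound
  storageClassName: ""              # bind to a pre-existing PV, no dynamic provisioning
  # volumeName: app-data-pv         # optional: reference one exact PV by name instead
---
# The pod refers only to the claim.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data
```

The pod manifest no longer mentions NFS at all, so the same pod definition could be used in a cluster where the claim is satisfied by a different kind of storage.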

Multiple pods can use the same volume by referencing the same claim. Whether they can all access it depends on the storage type. E.g., some storage providers allow writing from one node only, so multiple pods on that node can write, but pods on other nodes cannot.

A PV that is ready to be claimed has the “Available” status.

Reclaim Policies

A PV specifies its reclaim policy:

Retain - when the PVC is deleted, the PV’s status becomes “Released” and the underlying volume is retained. A PV in this state cannot be claimed again until the object is edited and its .spec.claimRef is removed (or the whole PV is deleted and recreated); otherwise an admin must clean it up manually. If the PV object gets deleted, the data stays intact.
Delete - when the PVC is deleted, the PV and the underlying data are deleted as well. It is used with automatically provisioned PVs.

A PV is just a pointer to the data.
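In a manifest, the reclaim policy is the persistentVolumeReclaimPolicy field of the PV. The sketch below shows roughly how a retained PV might look when read back from the cluster after it has been bound; the names and the NFS backend are hypothetical, and fields such as the claim's uid are left out.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: app-data-pv                       # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the PV and its data when the PVC is deleted
  nfs:
    server: nfs.example.internal          # hypothetical NFS server address
    path: /exports/app
  # Set when a claim binds to this PV. It still points at the old claim
  # while the PV is "Released"; removing this block makes the PV claimable again.
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    namespace: default
    name: app-data
```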

Access Modes

Access Modes of a PersistentVolume refer to how many nodes can mount it (not pods). Types:

ReadWriteOnce - only one node can mount it in read/write mode. Others can’t mount it at all.
ReadOnlyMany - many nodes can mount it in read-only mode.
ReadWriteMany - many nodes can mount it in read/write mode.

A PersistentVolume can support multiple modes.
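For instance, a PV might advertise several modes in its accessModes list. This is a sketch assuming an NFS backend, which can serve all three modes; the names and address are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-pv                     # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:                        # every mode this volume is able to support
    - ReadWriteOnce
    - ReadOnlyMany
    - ReadWriteMany
  nfs:
    server: nfs.example.internal      # hypothetical NFS server address
    path: /exports/shared
```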

Deletion

Deletion of a PV is postponed until the PVC bound to it is deleted. Deletion of a PVC is postponed until the pods using it are deleted.
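Kubernetes implements this waiting with finalizers on the storage objects. As an illustrative fragment (not a manifest to apply), the metadata of a PVC read back from the cluster would typically include the entry below; PVs carry the analogous kubernetes.io/pv-protection finalizer. The claim name is hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                      # hypothetical name
  finalizers:
    - kubernetes.io/pvc-protection    # removed only once no pod uses the claim
```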

Auto Provisioning

PersistentVolumes can also be created automatically on demand if an automated provisioner is installed (e.g., AKS creates an Azure Disk automatically when it’s needed).

Here, the order gets reversed. Instead of creating a PVC for a pre-existing PV, the PVC gets created first, and then a matching PV gets created by the provisioner.

Storage Class

Cloud providers offer some StorageClasses out of the box (e.g., Azure offers classes backed by Azure Disk and Azure Files). A PVC should state which StorageClass it expects.

The PVC uses storageClassName to choose a StorageClass. If it’s omitted, the cluster’s default StorageClass is used (cloud providers usually define one). Setting storageClassName to "" disables dynamic provisioning, so the claim can only be bound to an existing PV.
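The two cases can be sketched as follows. The class name fast-ssd is hypothetical and stands for whatever StorageClass exists in the cluster.

```yaml
# Dynamic provisioning: a PV is created for this claim by the class's provisioner.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd    # hypothetical class name; omit to use the default class
  resources:
    requests:
      storage: 5Gi
---
# No dynamic provisioning: only an already existing PV can satisfy this claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""          # empty string, not the same as leaving the field out
  resources:
    requests:
      storage: 5Gi
```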

Depending on the K8s implementation and its StorageClass, creating a new PVC either provisions the PV immediately, or only once a pod that uses the PVC actually needs the storage. E.g., GKE will create the PV immediately, while kind will wait for a pod. This is because kind creates local storage, so it needs to know on which node the pod will be scheduled in order to create the storage there. The behaviour is controlled by the “volume binding mode” setting of a StorageClass (volumeBindingMode: Immediate or WaitForFirstConsumer).

We can define our own StorageClasses.
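A custom class might look roughly like this. The sketch assumes an AKS cluster with the Azure Disk CSI driver; the class name, the provisioner and the skuName parameter are assumptions for illustration and need to be replaced with whatever the cluster's provisioner actually supports.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                            # hypothetical class name
provisioner: disk.csi.azure.com             # assumption: Azure Disk CSI driver is installed
parameters:
  skuName: Premium_LRS                      # assumption: provisioner-specific parameter
reclaimPolicy: Delete                       # reclaim policy given to the PVs this class creates
volumeBindingMode: WaitForFirstConsumer     # provision only once a pod using the PVC is scheduled
```

A PVC then selects this class by setting storageClassName to fast-ssd.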
