Wednesday, November 17, 2021

Kubernetes for Developers #25: PersistentVolume and PersistentVolumeClaim in-detail

In the previous article (Kubernetes for Developers #24: Kubernetes Volume hostPath in-detail), we discussed the hostPath volume for persisting container data on the worker node's filesystem. However, that data is available only to pods scheduled on the same worker node, which is not a feasible solution for a multi-node cluster.

This problem can be solved by using external storage volumes such as awsElasticBlockStore, azureDisk, gcePersistentDisk, nfs, etc. However, to use them in a pod definition, the developer must know the details of the underlying network storage infrastructure.

For example, to use an awsElasticBlockStore volume in a pod, the developer must know the EBS volume ID and filesystem type. If the storage details change, the developer must update every pod definition that references them.
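For illustration, a pod that embeds the storage details directly might look like this (a sketch only; the pod name and volume ID are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: pod-direct-ebs
spec:
  containers:
    - name: app
      image: nginx:alpine
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumes:
    - name: data
      awsElasticBlockStore:
        volumeID: vol-0123456789abcdef0   # placeholder: EBS ID baked into the pod spec
        fsType: ext4                      # developer must also know the filesystem type

Every pod that needs this disk repeats these details, which is exactly the coupling that PV/PVC removes.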

Kubernetes solves this problem with PersistentVolume and PersistentVolumeClaim, which decouple the underlying storage details from the application pod definitions. Developers don't have to know which storage infrastructure is being used; that is the cluster administrator's responsibility.

The PV/PVC workflow:
  • PersistentVolumes (PV) are cluster-level resources, like worker nodes; they do not belong to any namespace.
  • PersistentVolumeClaims (PVC) are created in a specific namespace and can be used only by pods within that same namespace.
  • The cluster administrator sets up the cloud storage infrastructure (e.g., AWS Elastic Block Store or GCE Persistent Disk) as needed.
  • The cluster administrator creates PersistentVolumes (PV) of different sizes and access modes, referencing the AWS EBS volumes / GCE persistent disks, per application requirements.
  • Whenever a pod requires persistent storage, the developer creates a PersistentVolumeClaim (PVC) specifying the minimum size and access mode; Kubernetes finds an adequate PersistentVolume and binds the volume (PV) to the claim (PVC).
  • The pod references the PersistentVolumeClaim (PVC) as a volume wherever it is required.
  • Once a PersistentVolume is bound to a PVC, it cannot be used by other claims until it is released (i.e., the PVC must be deleted before the PV can be reused).
  • Developers don't have to know the underlying storage details; they just create a PersistentVolumeClaim (PVC) whenever a pod requires persistent storage.
Access Modes

The following access modes are supported by a PersistentVolume (PV):
  • ReadWriteOnce (RWO): Only a single worker node can mount the volume for reading and writing at a time (multiple pods on that node can still share it).
  • ReadOnlyMany (ROX): Multiple worker nodes can mount the volume for reading at the same time.
  • ReadWriteMany (RWX): Multiple worker nodes can mount the volume for reading and writing at the same time.

Reclaim Policy

Reclaim policy tells us what happens to a PersistentVolume (PV) when its PersistentVolumeClaim (PVC) is deleted.
  • Delete: Both the PersistentVolume and its underlying storage asset are deleted as soon as the PVC is deleted.
  • Retain: The PersistentVolume contents are preserved after the PVC is deleted, and the PV cannot be reused until the cluster administrator reclaims it manually.
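The reclaim policy of an existing PV can also be changed later; for example (assuming a PV named pv-vol1, like the one created below):

// change the reclaim policy of an existing PV
$ kubectl patch pv pv-vol1 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
persistentvolume/pv-vol1 patched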
In general, the cluster administrator creates multiple PersistentVolumes (PV) backed by one of the cloud storage offerings, e.g., AWS EBS or GCE PD.
// Ex: creating an AWS EBS volume from the CLI
// (the command returns a VolumeId, referred to below as ebs-data-id)
$ aws ec2 create-volume \
    --availability-zone us-east-1a \
    --size 10 \
    --volume-type gp2

The cluster administrator then creates the following PV, referencing the returned EBS volume ID:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-vol1
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""   # empty string: this PV belongs to no StorageClass
  awsElasticBlockStore:
    volumeID: ebs-data-id
    fsType: ext4


For local testing, let's use a hostPath PersistentVolume instead. On the worker node, create a directory called "/mydata" and an "index.html" file under it.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-vol1
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  hostPath:
    path: "/mydata"

As per the above YAML, the volume is backed by the "/mydata" host directory, with a size of 1Gi and an access mode of ReadWriteOnce (RWO).

Save the above YAML content as "pv-vol1.yaml" and run the following kubectl command:
// create persistentvolume(pv)
$ kubectl apply -f pv-vol1.yaml
persistentvolume/pv-vol1 created

// display pv
$ kubectl get pv
NAME     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      
pv-vol1   1Gi        RWO            Retain          Available
Here, the status shows "Available", meaning the PV is not yet bound to any PersistentVolumeClaim (PVC).

The next step is to create a PersistentVolumeClaim (PVC) to request storage for the pod:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-vol-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
  storageClassName: ""   # empty string: bind only to PVs that have no StorageClass

Save the above YAML content as "pvc-vol-1.yaml" and run the following kubectl command:
// create pvc
$ kubectl apply -f pvc-vol-1.yaml
persistentvolumeclaim/pvc-vol-1 created

// display pvc
$ kubectl get pvc
NAME        STATUS   VOLUME   CAPACITY   ACCESS MODES  
pvc-vol-1   Bound    pv-vol1   1Gi        RWO        
Here, the PersistentVolumeClaim is bound to the PersistentVolume pv-vol1. Note that the claim requested only 500Mi but received 1Gi, because Kubernetes binds whole PVs that satisfy the request.
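The binding is also visible from the PV side; the output should look similar to this (columns truncated, namespace assumed to be default):

// display pv after binding
$ kubectl get pv
NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM
pv-vol1   1Gi        RWO            Retain           Bound    default/pvc-vol-1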

The next step is to create a pod that uses the PersistentVolumeClaim as a volume:
apiVersion: v1
kind: Pod
metadata:
  name: pod-pv-pvc
spec:
  containers:
    - name: nginx
      image: nginx:alpine
      ports:
        - containerPort: 80
          protocol: TCP
      volumeMounts:
        - name: pod-pv-vol
          mountPath: /usr/share/nginx/html
  volumes:
    - name: pod-pv-vol
      persistentVolumeClaim:
        claimName: pvc-vol-1

Save the above YAML content as "pod-pv-pvc.yaml" and run the following kubectl command:
// create pod
$ kubectl apply -f pod-pv-pvc.yaml
pod/pod-pv-pvc created

// display pods
$ kubectl get po
NAME             READY   STATUS      RESTARTS   AGE
pod-pv-pvc        1/1     Running     0          1m

Run the following kubectl command to forward a port from the local machine to the pod:
// syntax
// kubectl port-forward <pod-name> <local-port>:<container-port>
$ kubectl port-forward pod-pv-pvc 8081:80
Forwarding from 127.0.0.1:8081 -> 80
Forwarding from [::1]:8081 -> 80

// nginx serves the contents of the node's /mydata/index.html file
$ curl http://localhost:8081
text message text Tue Nov  16 12:01:10 UTC 2021
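As an optional check, we can also verify the mount from inside the pod:

// list the mounted directory inside the nginx container
$ kubectl exec pod-pv-pvc -- ls /usr/share/nginx/html
index.html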

We have successfully configured a Pod to use a PersistentVolumeClaim as its persistent storage.

Run the following kubectl commands to delete the resources:
$ kubectl delete pod pod-pv-pvc
$ kubectl delete pvc pvc-vol-1
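Because the reclaim policy is Retain, deleting the PVC does not make the PV available for new claims; it moves to the Released state until the cluster administrator reclaims it manually. You can confirm this before deleting the PV (representative output):

// display pv after the pvc is deleted
$ kubectl get pv
NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS
pv-vol1   1Gi        RWO            Retain           Released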
$ kubectl delete pv pv-vol1


Kubernetes for Developers Journey.
Happy Coding :)

Saturday, November 6, 2021

Kubernetes for Developers #24: Kubernetes Volume hostPath in-detail

In the previous article (Kubernetes for Developers #23: Kubernetes Volume emptyDir in-detail), we discussed the emptyDir volume for storing and sharing data among the container(s) in a pod. However, an emptyDir volume and its contents are deleted automatically when the pod is removed from the worker node.

The Kubernetes hostPath volume helps us persist volume contents even after the pod is deleted from the worker node.

A hostPath volume mounts a file or directory from the worker node's filesystem into the pod.

A pod can only mount files or directories of the worker node it is running on. Typical use cases (see the read-only example after this list):
  • It is useful when the container wants to access Docker system files from the host (i.e., /var/lib/docker)
  • It is useful when the container needs to access the kubeconfig file, CA certificates, or /var/log from the host
  • It is useful when the container needs to access the host's /sys files, e.g., for cAdvisor
  • It is useful when the container wants to check that a given path exists on the host before running
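For example, a container that only needs to read host logs can mount them read-only; a minimal sketch (the pod, container, and volume names are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: pod-hostpath-readonly
spec:
  containers:
    - name: log-reader
      image: alpine
      command: ["sh", "-c", "ls -l /host/logs && sleep 3600"]
      volumeMounts:
        - name: host-logs
          mountPath: /host/logs
          readOnly: true            # container can read but not modify host files
  volumes:
    - name: host-logs
      hostPath:
        path: /var/log
        type: Directory             # /var/log must already exist on the host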
Kubernetes hostPath volumes support the following types:

  • Directory: A directory must exist at the specified path on the host
  • DirectoryOrCreate: An empty directory will be created if the specified path does not exist on the host
  • File: A file must exist at the specified path on the host
  • FileOrCreate: An empty file will be created if the specified path does not exist on the host
  • Socket: A UNIX socket must exist at the specified path


apiVersion: v1
kind: Pod
metadata:
  name: pod-vol-hostpath
spec:
  containers:
    - name: alpine
      image: alpine
      command:
        [
          "sh",
          "-c",
          'while true; do echo "random message text `date`" >> /html/index.html; sleep 10; done',
        ]
      volumeMounts:
        - name: vol-hostpath
          mountPath: /html
    - name: nginx
      image: nginx:alpine
      ports:
        - containerPort: 80
          protocol: TCP
      volumeMounts:
        - name: vol-hostpath
          mountPath: /usr/share/nginx/html
  volumes:
    - name: vol-hostpath
      hostPath:
        path: /mydoc
        type: DirectoryOrCreate


As per the above YAML,

  1. A multi-container pod is created with a hostPath volume named "vol-hostpath", backed by the "/mydoc" directory on the host filesystem.
  2. The "/mydoc" directory is created automatically when the pod is assigned to a worker node (if it does not already exist), because we specified the volume type "DirectoryOrCreate".
  3. The first ("alpine") container generates a random text message every 10 seconds and appends it to the /html/index.html file.
  4. The first container mounts the volume at "/html", so every file created or modified under that directory lands in the "/mydoc" host directory.
  5. The second ("nginx") container mounts the same volume at "/usr/share/nginx/html" (the default directory from which nginx serves index.html). Because the shared volume contains "index.html", the nginx web server serves the file created by the first container.
  6. As the first container appends a new random message to index.html every 10 seconds, we see a longer response each time we request index.html from the nginx web server.
  7. The volume contents are not deleted on pod termination, so any new pod scheduled on the same node with the same hostPath sees all the previous contents (verified below).
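On a single-node local cluster such as minikube (assumed here for illustration), you can verify point 7 by inspecting the host directory directly:

// ssh into the minikube node and inspect the hostPath directory
$ minikube ssh
$ ls /mydoc
index.html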

Save the above YAML content as "pod-vol-hostpath.yaml" and run the following kubectl command:
// create pod
$ kubectl apply -f pod-vol-hostpath.yaml
pod/pod-vol-hostpath created

// display pods
$ kubectl get po
NAME                    READY   STATUS      RESTARTS   AGE
pod-vol-hostpath        2/2     Running     0          1m10s

Run the following kubectl command to forward a port from the local machine to the pod:
// syntax
// kubectl port-forward <pod-name> <local-port>:<container-port>
$ kubectl port-forward pod-vol-hostpath 8081:80
Forwarding from 127.0.0.1:8081 -> 80
Forwarding from [::1]:8081 -> 80

Run the following curl command a few times to see the messages that are appended every 10 seconds:
$ curl http://localhost:8081
random message text Tue Nov  7 12:01:10 UTC 2021

$ curl http://localhost:8081
random message text Tue Nov  7 12:01:10 UTC 2021
random message text Tue Nov  7 12:01:20 UTC 2021
random message text Tue Nov  7 12:01:30 UTC 2021

The volume contents are not deleted on pod termination, so a new pod scheduled on the same node with the same hostPath will see all the previous contents.

Delete the pod and repeat the steps above to confirm that the existing data still shows up in the curl output:
// delete pod
$ kubectl delete pod/pod-vol-hostpath
pod/pod-vol-hostpath deleted

// create pod
$ kubectl apply -f pod-vol-hostpath.yaml
pod/pod-vol-hostpath created

// display pods
$ kubectl get po
NAME                    READY   STATUS      RESTARTS   AGE
pod-vol-hostpath        2/2     Running     0          1m10s

// syntax
// kubectl port-forward <pod-name> <local-port>:<container-port>
$ kubectl port-forward pod-vol-hostpath 8081:80
Forwarding from 127.0.0.1:8081 -> 80
Forwarding from [::1]:8081 -> 80


$ curl http://localhost:8081
random message text Tue Nov  7 12:01:10 UTC 2021
random message text Tue Nov  7 12:01:20 UTC 2021
random message text Tue Nov  7 12:01:30 UTC 2021
random message text Tue Nov  7 14:12:40 UTC 2021

// the first 3 lines were generated by the previous pod

This confirms that the curl output contains both the previous pod's contents and the new pod's contents.
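Finally, clean up the pod; note that the contents of /mydoc remain on the node until removed manually:

// delete pod
$ kubectl delete pod/pod-vol-hostpath
pod/pod-vol-hostpath deleted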

Kubernetes for Developers Journey.
Happy Coding :)

Tuesday, November 2, 2021

Kubernetes for Developers #23: Kubernetes Volume emptyDir in-detail

Containers are ephemeral: any data a container generates is stored in its own filesystem and is deleted automatically when the container is deleted or restarted.

In the Docker world, docker volumes provide a way to store container data on the host machine as permanent storage. However, they are loosely managed and of limited use in a multi-node cluster.

Kubernetes volumes provide a way for containers to access external disk storage or share storage among containers.

Kubernetes volumes are not top-level objects like Pods or Deployments; they are components of a pod, defined as part of the pod's YAML specification. Volumes are available to all containers in the pod, but each container must mount them at a specific file location.

Kubernetes supports many types of volumes, for example:

  • emptyDir: Used for mounting a temporary empty directory from the worker node's disk/RAM
  • awsElasticBlockStore: Used for mounting an AWS EBS volume into the pod
  • azureDisk: Used for mounting a Microsoft Azure data disk into the pod
  • azureFile: Used for mounting a Microsoft Azure File volume into the pod
  • gcePersistentDisk: Used for mounting a Google Persistent Disk into the pod
  • hostPath: Used for mounting the worker node's filesystem into the pod
  • nfs: Used for mounting an existing NFS (network file system) share into the pod
  • configMap/secret: Used for mounting configuration values and secrets into the pod
  • persistentVolumeClaim: Used for mounting pre-provisioned or dynamically provisioned storage into the pod

A pod can use any number of volume types simultaneously to persist container data, as the sketch below shows.
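For instance, a single pod can combine an emptyDir volume with a hostPath volume (a sketch; the pod and volume names are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: pod-multi-volumes
spec:
  containers:
    - name: app
      image: alpine
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch        # temporary per-pod workspace (emptyDir)
        - name: host-time
          mountPath: /etc/localtime  # single file mounted from the worker node
  volumes:
    - name: scratch
      emptyDir: {}
    - name: host-time
      hostPath:
        path: /etc/localtime
        type: File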

emptyDir volume


An emptyDir volume is an empty directory that is created when a pod is assigned to a node and exists for as long as the pod runs on that node. All containers in the pod can read and write its contents. The volume is erased automatically once the pod is removed from the node.

A container crash does not remove a pod from its node, so the data in an emptyDir volume is safe across container crashes. It is only erased when the pod is deleted from the node.

  • It is useful for sharing files between containers running in the same pod
  • It is useful for a disk-based merge sort on a large dataset when memory is low
  • It is useful when the container's filesystem is read-only and it needs to write data temporarily

apiVersion: v1
kind: Pod
metadata:
  name: pod-vol-emptydir
spec:
  containers:
    - name: alpine
      image: alpine
      command:
        [
          "sh",
          "-c",
          'while true; do echo "random message text `date`" >> /var/mydoc/index.html; sleep 10; done',
        ]
      volumeMounts:
        - name: vol-emptydir
          mountPath: /var/mydoc
    - name: nginx
      image: nginx:alpine
      ports:
        - containerPort: 80
          protocol: TCP
      volumeMounts:
        - name: vol-emptydir
          mountPath: /usr/share/nginx/html
  volumes:
    - name: vol-emptydir
      emptyDir: {}

As per the above YAML,
  1. A multi-container pod is created with an emptyDir volume named "vol-emptydir".
  2. The "vol-emptydir" volume is created automatically when the pod is assigned to a worker node.
  3. As the name says, the volume is an empty directory at the initial stage.
  4. The first ("alpine") container generates a random text message every 10 seconds and appends it to the /var/mydoc/index.html file.
  5. The first container mounts the volume at "/var/mydoc", so every file written under that directory (i.e., index.html) lands in the volume.
  6. The second ("nginx") container mounts the same volume at "/usr/share/nginx/html" (the default directory from which nginx serves index.html). Because the shared volume contains "index.html", the nginx web server serves the file created by the first container.
  7. As the first container appends a new random message to index.html every 10 seconds, we see a longer response each time we request index.html from the nginx web server.
  8. The volume and its contents are deleted automatically when the pod is deleted.
  9. By default, the volume contents are stored on the worker node's disk; however, emptyDir contents can be kept in memory (RAM) by setting the "medium" attribute, as shown at the end of this article.

Save the above YAML content as "pod-vol-emptydir.yaml" and run the following kubectl command:

// create pod
$ kubectl apply -f pod-vol-emptydir.yaml
pod/pod-vol-emptydir created

// display pods
$ kubectl get po
NAME                    READY   STATUS      RESTARTS   AGE
pod-vol-emptydir        2/2     Running     0          2m39s
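
Both containers see the same volume, so as an optional check you can read the shared file from the nginx container directly:

// read the shared file from the nginx container
$ kubectl exec pod-vol-emptydir -c nginx -- cat /usr/share/nginx/html/index.html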

Run the following kubectl command to forward a port from the local machine to the pod:
// syntax
// kubectl port-forward <pod-name> <local-port>:<container-port>
$ kubectl port-forward pod-vol-emptydir 8081:80
Forwarding from 127.0.0.1:8081 -> 80
Forwarding from [::1]:8081 -> 80

Run the following curl command a few times to see the messages that are appended every 10 seconds:
$ curl http://localhost:8081
random message text Tue Nov  2 15:48:50 UTC 2021

$ curl http://localhost:8081
random message text Tue Nov  2 15:48:50 UTC 2021
random message text Tue Nov  2 15:49:00 UTC 2021
random message text Tue Nov  2 15:49:10 UTC 2021

An emptyDir volume does not persist data after pod termination. So, delete the pod and repeat the steps above to check whether any previously written data shows up in the curl output:
// delete pod
$ kubectl delete pod/pod-vol-emptydir
pod/pod-vol-emptydir deleted

// create pod
$ kubectl apply -f pod-vol-emptydir.yaml
pod/pod-vol-emptydir created

// display pods
$ kubectl get po
NAME                    READY   STATUS      RESTARTS   AGE
pod-vol-emptydir        2/2     Running     0          1m10s

// syntax
// kubectl port-forward <pod-name> <local-port>:<container-port>
$ kubectl port-forward pod-vol-emptydir 8081:80
Forwarding from 127.0.0.1:8081 -> 80
Forwarding from [::1]:8081 -> 80

$ curl http://localhost:8081
random message text Tue Nov  2 16:01:50 UTC 2021
This confirms that only newly generated data is served after the pod is recreated when using an emptyDir volume.

As noted earlier, volume contents are stored on the worker node's disk by default; however, emptyDir contents can be kept in memory (RAM) by setting the "medium" attribute:

volumes:
  - name: vol-emptydir
    emptyDir:
      medium: Memory
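
emptyDir also accepts an optional sizeLimit field (for both disk- and memory-backed volumes); if the volume grows beyond the limit, the pod is evicted. A small sketch:

volumes:
  - name: vol-emptydir
    emptyDir:
      medium: Memory
      sizeLimit: 64Mi   # pod is evicted if volume usage exceeds this limit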

Kubernetes for Developers Journey.
Happy Coding :)