Kubernetes
Introduction

Kubernetes is a container orchestration tool that provides high availability and scalability, and it also supports disaster recovery (backup and restore).
Architecture
A cluster is made up of a master node (control plane) and a number of worker nodes.
Namespace
Namespaces are intended for use in environments with many users spread across multiple teams, or projects. For clusters with a few to tens of users, you should not need to create or think about namespaces at all. Start using namespaces when you need the features they provide.
Namespaces are a way to divide cluster resources between multiple users
Prevent resource starvation: By setting resource quotas, you can prevent individual pods or containers from consuming too many resources and causing resource starvation for other pods or containers running on the same node.
Ensure fair resource allocation: Resource quotas can ensure that each namespace or user on the cluster receives a fair share of the available resources, preventing any one user or application from monopolizing resources.
Enforce compliance and governance: Resource quotas can be used to enforce compliance and governance policies, such as limiting the amount of data that can be stored in a particular namespace or restricting the use of certain types of resources.
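For example, a ResourceQuota object caps what a single namespace may consume (a minimal sketch; the namespace name team-a and the numbers are only illustrative):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"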
Worker Node
Each node runs a kubelet, which facilitates communication with the master node. For instance, the kubelet receives the signal to start a container, uses the container runtime to start the pod, monitors its life cycle (including readiness and liveness probes), and reports back to the kube-apiserver.
The worker node is where the actual work happens; the containers of the different applications are deployed on it.
Master Node
kube-apiserver accepts commands that view or change the state of the cluster, including launching pods; it is the component the kubectl command talks to.
etcd is the cluster's database. It holds all of the cluster configuration data as well as more dynamic information, such as which nodes are part of the cluster, which pods should be running, and where they should be running.
kube-scheduler is responsible for scheduling pods onto nodes. When it discovers a pod object that doesn't yet have a node assignment, it chooses a node and simply writes the name of that node into the pod object.
kube-controller-manager continuously monitors the state of the cluster through the kube-apiserver. Whenever the current state of the cluster doesn't match the desired state, it attempts to make changes to achieve the desired state.
Components
Node
A node corresponds to a virtual (or physical) machine.
Pod
It is the smallest deployable unit in Kubernetes.
A node can contain multiple pods.
It is an abstraction layer over one or more containers, which are managed by a container runtime.
There is usually one container per pod.
Each pod has its own internal IP address, so pods can communicate with each other inside the same virtual network by IP address.
If a pod dies, it is replaced by a new pod with a new internal IP address.
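A minimal Pod manifest looks like this (a sketch; the nginx image and names are only for illustration):
apiVersion: v1
kind: Pod
metadata:
  name: nginx-demo
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80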
Ingress
An Ingress may be configured to give Services externally-reachable URLs, load balance traffic, terminate SSL / TLS, and offer name-based virtual hosting. An Ingress controller is responsible for fulfilling the Ingress, usually with a load balancer, though it may also configure your edge router or additional frontends to help handle the traffic.
apiVersion: networking.k8s.io/v1 # replaces extensions/v1beta1, which was removed in Kubernetes 1.22
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/rewrite-target: /
  name: nginx-demo
  namespace: default
spec:
  rules:
  - http:
      paths:
      - path: /nginx
        pathType: Prefix
        backend:
          service:
            name: nginx
            port:
              number: 80
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        volumeMounts:
        - name: html-file
          mountPath: /usr/share/nginx/html
      volumes:
      - name: html-file
        configMap:
          name: nginx-index-v1
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: default # must be in the same namespace as the Deployment and the Ingress backend
  labels:
    app: nginx
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-index-v1
data:
  index.html: Nginx V1
Service
An abstract way to expose an application running on a set of Pods as a network service
The set of Pods targeted by a Service is usually determined by a selector that you define
There are mainly 4 types of Services:
Cluster IP (default): Exposes the Service on a cluster-internal IP. Choosing this value makes the Service reachable only from within the cluster.
Node Port: Exposes the Service on each node's IP at a static port; clients send requests to a node's IP address at the defined nodePort.
Load Balancer: Clients send requests to the IP address of a network load balancer.
External Name: Maps the Service to the contents of the externalName field (e.g. foo.bar.example.com) by returning a CNAME record with its value. No proxying of any kind is set up.
apiVersion: v1
kind: Service
metadata:
  name: webapp-service
spec:
  type: NodePort
  selector:
    app: webapp
  ports:
  - protocol: TCP
    port: 3000
    targetPort: 3000
    nodePort: 30100
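An ExternalName Service has no selector; it only returns a CNAME record (a sketch reusing the hostname from the list above; the Service name is hypothetical):
apiVersion: v1
kind: Service
metadata:
  name: external-backend
spec:
  type: ExternalName
  externalName: foo.bar.example.com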
Deployment
It is declared based on a blueprint (template) of the pods.
The desired state is defined in a Deployment, and the controller manager changes the actual state to the desired state at a controlled rate.
It creates a ReplicaSet, which in turn creates the pods. If you create a Deployment named counter, it creates a ReplicaSet named counter-<replica-set-id>, which in turn creates Pods named counter-<replica-set-id>-<pod-id>.
With the default RollingUpdate strategy, a new ReplicaSet is created and the Deployment moves the Pods from the old ReplicaSet to the new one at a controlled rate.
The number of replicas can be scaled automatically with a HorizontalPodAutoscaler based on rules you define (see the sketch after the example below).
In the example below, only one PVC is created, which all the replicas share.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: counter
spec:
  replicas: 3
  selector:
    matchLabels:
      app: counter
  template:
    metadata:
      labels:
        app: counter
    spec:
      containers:
      - name: counter
        image: "kahootali/counter:1.1"
        volumeMounts:
        - name: counter
          mountPath: /app/
      volumes:
      - name: counter
        persistentVolumeClaim:
          claimName: counter
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: counter
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 50Mi
  storageClassName: efs
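A HorizontalPodAutoscaler for the Deployment above could look like the sketch below (the CPU target and replica bounds are only illustrative, and the cluster needs a metrics source such as metrics-server):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: counter
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: counter
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70 # scale out when average CPU usage exceeds 70% of the request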

Stateful Set
Similar to a Deployment, but intended for stateful applications (ones with persistent disk storage).
It creates the pods itself and does not rely on a ReplicaSet.
Every replica of a StatefulSet has its own state, and each pod creates its own PVC (PersistentVolumeClaim). So a StatefulSet with 3 replicas creates 3 pods, each having its own volume, i.e. 3 PVCs in total.
StatefulSets don't create a ReplicaSet or anything of that sort, so a rollback does not go through a ReplicaSet the way it does for a Deployment; you can delete or scale the StatefulSet up/down. If you update a StatefulSet, it also performs a RollingUpdate, i.e. one replica pod goes down, the updated pod comes up, then the next replica pod goes down in the same manner.
The serviceName field must reference a governing headless Service (see the sketch after the example below).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: counter
spec:
  serviceName: "counter-app"
  selector:
    matchLabels:
      app: counter
  replicas: 1
  template:
    metadata:
      labels:
        app: counter
    spec:
      containers:
      - name: counter
        image: "kahootali/counter:1.1"
        volumeMounts:
        - name: counter
          mountPath: /app/
  volumeClaimTemplates:
  - metadata:
      name: counter
    spec:
      accessModes: [ "ReadWriteMany" ]
      storageClassName: efs
      resources:
        requests:
          storage: 50Mi
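The serviceName above must refer to a governing headless Service; a minimal sketch matching the labels in the example (the port is only illustrative):
apiVersion: v1
kind: Service
metadata:
  name: counter-app
spec:
  clusterIP: None # headless: gives each pod a stable DNS entry instead of load balancing
  selector:
    app: counter
  ports:
  - port: 80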

DaemonSet
A DaemonSet is a controller that ensures that the pod runs on all the nodes of the cluster. If a node is added/removed from a cluster, DaemonSet automatically adds/deletes the pod.
The created number of pods is equal to the number of nodes
Some typical use cases of a DaemonSet are to run cluster-level applications like:
Monitoring Exporters: You would want to monitor all the nodes of your cluster so you will need to run a monitor on all the nodes of the cluster like NodeExporter.
Logs Collection Daemon: You would want to export logs from all nodes so you would need a DaemonSet of log collector like Fluentd to export logs from all your nodes.
DaemonSets don't create a ReplicaSet or anything of that sort either, so a rollback does not go through a ReplicaSet the way a Deployment rollback does.
In the example below, all pods share the same volume (a single PVC).
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: counter-app
spec:
  selector:
    matchLabels:
      app: counter
  template:
    metadata:
      name: counter-app
      labels:
        app: counter
    spec:
      tolerations:
      - effect: NoSchedule
        operator: Exists
      containers:
      - name: counter
        image: "kahootali/counter:1.1"
        volumeMounts:
        - name: counter
          mountPath: /app/
      volumes:
      - name: counter
        persistentVolumeClaim:
          claimName: counter
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: counter
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 50Mi
  storageClassName: efs

Config Map & Secret
apiVersion: v1
kind: ConfigMap
metadata:
  name: game-demo
data:
  # property-like keys; each key maps to a simple value
  player_initial_lives: "3"
  ui_properties_file_name: "user-interface.properties"
  # file-like keys
  game.properties: |
    enemy.types=aliens,monsters
    player.maximum-lives=5
  user-interface.properties: |
    color.good=purple
    color.bad=yellow
    allow.textmode=true
apiVersion: v1
kind: Secret
metadata:
  name: demo-secret
type: Opaque
data:
  username: YWRtaW4= # base64 of "admin"
  password: cGFzc3dvcmQ= # base64 of "password"
Both can be used to store configuration as key-value pairs, but a Secret is used for sensitive data (e.g. passwords). Secret values are stored base64-encoded in etcd, and etcd can additionally be configured to encrypt them at rest.
Kubernetes can expose a Secret to the container where the application is running either as a mounted volume or as environment variables, so the application reads the values as files or env vars.
Here is an example of attaching the ConfigMap and Secret above to a Deployment's environment via envFrom:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki
spec:
  selector:
    matchLabels:
      app: loki # selector must match the template labels
  template:
    metadata:
      labels:
        app: loki
    spec:
      containers:
      - name: loki
        image: grafana/loki
        envFrom:
        - configMapRef:
            name: game-demo
        - secretRef:
            name: demo-secret # must match the Secret name defined above
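The same Secret can also be mounted as files instead of environment variables (a pod-spec fragment sketch; the mount path is only illustrative):
spec:
  containers:
  - name: loki
    image: grafana/loki
    volumeMounts:
    - name: secret-volume
      mountPath: /etc/secrets # each key of the Secret becomes a file here
      readOnly: true
  volumes:
  - name: secret-volume
    secret:
      secretName: demo-secret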
Cron Job

Kubernetes will need at least three different objects (CronJob, Job, and Pod) to fulfill the cron task.
When the controller finds a CronJob to execute (meaning the current time matches the time specified via cron syntax), it will create another object called Job
A Job creates one or more Pods based on the configuration passed down from the CronJob via jobTemplate, and it will continue to retry execution of the Pods until a specified number of them successfully terminate.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: helloworld
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: task
            image: busybox
            # acts as the entry point of the docker image
            command:
            - /bin/sh
            - -c
            - date; echo 'Hello World from Cronitor.io';
Vs Application Cron job
Advantage
Does not rely on any framework or language
Can be managed simply via infrastructure as code with a YAML file
Disadvantage
Not suitable for complex logic, e.g. dependencies between cron jobs, or retries based on logical (business-level) failures
Difficult to trigger the cron job manually
Volume
There are several types of volumes:
emptyDir
Temporary storage that exists only during pod lifetime
Data is lost when pod is deleted
volumes:
- name: cache-volume
  emptyDir: {}
hostPath
Mounts a directory from the host node's filesystem
Data persists across pod restarts but is tied to a specific node
Good for accessing node logs or docker socket
volumes:
- name: docker-socket
  hostPath:
    path: /var/run/docker.sock
ConfigMap
For mounting configuration data as files
Read-only by default
volumes:
- name: config-volume
  configMap:
    name: my-config
PersistentVolume (PV) and PersistentVolumeClaim (PVC)
For persistent storage that survives pod restarts
volumes:
- name: data-volume
  persistentVolumeClaim:
    claimName: my-pvc
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
  - ReadWriteOnce # Can be mounted as read-write by a single node
  resources:
    requests:
      storage: 10Gi # Requesting 10GB of storage
  storageClassName: standard # What kind of storage to use
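A PVC binds to a PersistentVolume (PV). With a StorageClass the PV is usually provisioned dynamically, but it can also be created by hand; a minimal sketch of a hypothetical hostPath-backed PV that could satisfy the claim above:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: standard
  hostPath:
    path: /data/my-pv # node-local path, suitable for demos only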
Storage Class
Defines what type of storage you want
Determines storage characteristics like:
Performance (IOPS, throughput)
Reliability
Backup policies
Cost tier
Here is an example
# Storage Class Definition
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage
provisioner: diskplugin.csi.alibabacloud.com
parameters:
  type: cloud_essd # The type of storage
reclaimPolicy: Delete # what happens to the provisioned volume when the claim is deleted
---
# PVC using the Storage Class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-storage
spec:
  storageClassName: fast-storage
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
---
# Pod using the PVC
apiVersion: v1
kind: Pod
metadata:
  name: database-pod
spec:
  containers:
  - name: database
    image: mysql:5.7
    volumeMounts:
    - name: storage
      mountPath: /var/lib/mysql
  volumes:
  - name: storage
    persistentVolumeClaim:
      claimName: database-storage
Volume Mount
To attach the volume to the container path
apiVersion: apps/v1
kind: Deployment
metadata:
  name: counter
spec:
  replicas: 3
  selector:
    matchLabels:
      app: counter
  template:
    metadata:
      labels:
        app: counter
    spec:
      containers:
      - name: counter
        image: "kahootali/counter:1.1"
        # Mount volume to container path
        volumeMounts:
        - name: counter
          mountPath: /app/
      # Declare the volume
      volumes:
      - name: counter
        persistentVolumeClaim:
          claimName: counter
Resource Management
resources:
  limits:
    cpu: 500m
    memory: 1024Mi
  requests:
    cpu: 500m
    memory: 1024Mi
Resource management can be declared as part of the deployment file.
A resource request is the amount of CPU and memory that a container requires to run; Kubernetes uses it to allocate resources when the container is scheduled onto a node. When a container with a resource request is scheduled, Kubernetes finds a node that has enough available resources to meet the request and reserves those resources for the container.
A resource limit, on the other hand, is the maximum amount of CPU and memory that a container is allowed to consume. If a container exceeds its CPU limit it is throttled; if a process in the container tries to consume more than the allowed amount of memory, the system kernel terminates the process that attempted the allocation with an out-of-memory (OOM) error.
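In context, the resources block sits under each container in the Pod template; a sketch reusing the counter Deployment from earlier:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: counter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: counter
  template:
    metadata:
      labels:
        app: counter
    spec:
      containers:
      - name: counter
        image: "kahootali/counter:1.1"
        resources:
          requests:
            cpu: 500m
            memory: 1024Mi
          limits:
            cpu: 500m
            memory: 1024Mi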
Health Detection

livenessProbe:
  httpGet:
    path: /v1/health
    port: 8080 # httpGet probes require a port; 8080 here is an assumed application port
  periodSeconds: 300
  successThreshold: 1
  failureThreshold: 5
readinessProbe:
  httpGet:
    path: /v1/health
    port: 8080
Liveness probes are crucial for ensuring your application stays up and running. If a liveness probe fails, Kubernetes restarts the container (per the pod's restartPolicy) to restore service.
Readiness probes check if your application is ready to receive requests. If a readiness probe fails, Kubernetes will remove the pod’s IP address from the service load balancer. This ensures no requests are forwarded to the pod until it becomes ready again.
Service Discovery
A cluster-aware DNS server, such as CoreDNS, watches the Kubernetes API for new Services and creates a set of DNS records for each one. If DNS has been enabled throughout your cluster then all Pods should automatically be able to resolve Services by their DNS name.
For example, if you have a Service called my-service in a Kubernetes namespace my-ns, the control plane and the DNS Service acting together create a DNS record for my-service.my-ns. Pods in the my-ns namespace should be able to find the service by doing a name lookup for my-service (my-service.my-ns would also work).
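As a sketch (assuming a Service named my-service exists in namespace my-ns), another Pod can reach it through its cluster DNS name:
apiVersion: v1
kind: Pod
metadata:
  name: dns-demo # hypothetical client pod
spec:
  restartPolicy: Never
  containers:
  - name: curl
    image: curlimages/curl
    # the Service is reachable by its cluster DNS name
    command: ["curl", "http://my-service.my-ns:80"]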
Commands
# get pod list
kubectl get pods --namespace <namespace>
# get the pod details
kubectl describe pod <podname> -n <namespace>
# get the log of pod
kubectl logs <podname> -n <namespace>
# open an interactive shell in the pod
kubectl exec --stdin --tty <podname> -n <namespace> -- /bin/bash
# port forward
kubectl port-forward <podname> <local port>:<container port>