Scaling
It’s not very convenient to manage pods individually. We need a way to deploy pods in multiple replicas, which is the basis for high availability of the service. Additionally, we need a way to keep pods running in case some node fails.
ReplicaSet
ReplicaSet allows us to create a group of pod replicas, instead of just one pod.
The pods managed by the ReplicaSet are selected using an immutable label selector (similarly to Services). There is also a template, which defines the pod(s) that will be created under the ReplicaSet. Such a pod has to conform to the selector specified by the ReplicaSet.
Pod names are generated based on the ReplicaSet’s name, but this can be changed with the generateName setting.
ReplicaSets are rarely used directly because of their limited pod-update capabilities.
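A minimal sketch of a ReplicaSet manifest, using the luksa/kiada:0.1 image referenced later in these notes; the app=kiada label is just an illustrative choice:

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: kiada
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kiada          # immutable once the ReplicaSet is created
  template:
    metadata:
      labels:
        app: kiada        # must conform to the selector above
    spec:
      containers:
      - name: kiada
        image: luksa/kiada:0.1
```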
Updates
We’re free to change the replicas count and the number of pods will reflect the setting.
If we modify the template of some existing ReplicaSet, the existing pods will not be updated. Instead, just the pods created by the ReplicaSet in the future will have the new settings applied.
Replacing a Pod
Sometimes we might want to investigate some issue in one of the pods, while keeping the ReplicaSet running with proper scaling. We could temporarily increment the replicas config, but we’d have to remember to decrement it back later on. Instead, we can just change the labels of the faulty pod so that it no longer conforms to the ReplicaSet’s selector. The ReplicaSet’s controller will create a new pod for its needs while we can start investigating the faulty pod.
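A sketch of such a relabeling; the pod name is hypothetical, and the new label value just needs to stop matching the selector:

```bash
# The ReplicaSet controller no longer counts this pod and spins up a replacement.
kubectl label pod kiada-x7k2p app=kiada-debug --overwrite
```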
Ownership
Pods managed by a ReplicaSet have a special ownerReferences section in their metadata. A pod can have multiple owners.
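Roughly what that looks like in the pod’s metadata (names are hypothetical, the uid is elided):

```yaml
metadata:
  name: kiada-x7k2p
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet
    name: kiada              # the owning ReplicaSet
    uid: ...                 # UID of the ReplicaSet object
    controller: true
    blockOwnerDeletion: true
```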
Pods are auto-deleted when the owners are deleted (unless the --cascade=orphan parameter is applied while removing the owner).
If a pod is taken out of the ReplicaSet (e.g., by changing its labels, as described above), the ownerReferences metadata is removed from it automatically.
Deployment
Deployments manage Pods via a ReplicaSet. They are mostly used for stateless workloads. Labels applied in the template for Pods are also applied to the ReplicaSet that manages these pods.
In addition to the settings available to ReplicaSets, Deployments also contain the strategy configuration. It dictates how Pods are replaced during updates.
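A minimal sketch of a Deployment manifest; it mirrors the ReplicaSet spec above, plus the strategy field (the names and values are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kiada
spec:
  replicas: 3
  strategy:
    type: RollingUpdate     # or Recreate
  selector:
    matchLabels:
      app: kiada
  template:                 # same pod template structure as in a ReplicaSet
    metadata:
      labels:
        app: kiada
    spec:
      containers:
      - name: kiada
        image: luksa/kiada:0.1
```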
Updates
Compared to ReplicaSets, updating the Pod template causes all the existing Pods to be redeployed to meet the new requirements. Any time we update the Pod template, a hash of it is calculated, and a new ReplicaSet is created with that hash used as one of the selector labels (pod-template-hash).
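For example (the 0.2 tag is an assumed newer image version, and the container name is assumed to be kiada), changing the image and then listing the ReplicaSets shows the new one with a pod-template-hash label:

```bash
kubectl set image deployment kiada kiada=luksa/kiada:0.2   # triggers a new rollout
kubectl get replicasets -l app=kiada --show-labels         # old and new ReplicaSets, each with a pod-template-hash label
```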
Strategies
There are two update strategies supported by the Deployment:
- Recreate - all pods are deleted at the same time. There is some downtime until the new pods get created. It should be used when downtime is acceptable or when the app must not run in mixed versions
- RollingUpdate - old pods are gradually replaced with new ones. It’s the default. The number of pods to be replaced at a time is configurable:
  - maxSurge - the maximum number of pods above the configured replicas that may run during the update. The deployment may run more pods than the desired replicas to keep the app available during the update. It’s an absolute number or a percentage. The default is 25%.
  - maxUnavailable - the maximum number of Pods (relative to replicas) that may be unavailable during the update. It’s an absolute number or a percentage. The default is 25%.
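A sketch of the corresponding part of a Deployment spec, tuned so that no pods become unavailable during the update (the values are illustrative):

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # absolute number or percentage, e.g. "25%"
      maxUnavailable: 0     # keep full capacity while rolling
```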
Pausing
Pausing might be useful for:
- checking the state of the app partway through the update to see how mixed versions work.
- applying multiple update operations without the Deployment controller acting on each one immediately.
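Pausing and resuming are done with kubectl rollout, e.g. for the kiada deployment used elsewhere in these notes:

```bash
kubectl rollout pause deployment kiada    # template edits accumulate without triggering a rollout
kubectl rollout resume deployment kiada   # apply the accumulated changes in one rollout
```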
Rollback
If a deployment’s update rollout is failing, we can roll it back with kubectl rollout undo.
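For example, using the kiada deployment:

```bash
kubectl rollout undo deployment kiada   # reverts to the previous revision
```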
If the deployment is paused, rollback will also be paused.
History
We can roll back to older revisions as well.
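A sketch using the kiada deployment and a hypothetical revision number:

```bash
kubectl rollout history deployment kiada                 # list recorded revisions
kubectl rollout undo deployment kiada --to-revision=2    # roll back to a specific revision
```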
Restart
We can restart all pods of a deployment with kubectl rollout restart.
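For example, for the kiada deployment:

```bash
kubectl rollout restart deployment kiada
```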
All the pods are deleted and replaced with new ones. The strategy setting is respected.
Example
Creating a simple deployment: kubectl create deployment kiada --image=luksa/kiada:0.1
Kubectl sends a POST request to the deployments endpoint of the K8s API to create a Deployment object. Kubernetes creates a Pod object based on the Deployment (via a ReplicaSet). The pod is assigned to a worker node, and the kubelet on that node pulls the image and runs the container.
We can track a deployment rollout with kubectl rollout status deployment my-deployment.
Scaling
kubectl scale deployment kiada --replicas=3
It makes sense to omit the replicas setting from the manifest file of the deployment. This way, when we reapply the manifest in the future, the replicas setting will not be overwritten back to the default.
Ownership
Similarly to ReplicaSets, deletion of the Deployment auto-removes the ReplicaSet and the Pods. To circumvent that, we can use the --cascade=orphan parameter; it will preserve both the Pods and the ReplicaSet. In such a case, when we recreate the deployment, the existing ReplicaSet/Pods are reused.
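For example, for the kiada deployment:

```bash
kubectl delete deployment kiada --cascade=orphan   # the ReplicaSet and the Pods stay behind
```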
Pods Deletion
When scaling down, K8s selects pods to delete based on the following priorities:
- pods that have not yet started
- pods co-located on the same node as a greater number of replicas
- pods that have been running for a shorter time
- pods with a greater number of restarts
We can also influence the priority by applying the pod-deletion-cost annotation to specific pods.
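The full annotation key is controller.kubernetes.io/pod-deletion-cost; pods with a lower cost are deleted first. A sketch with a hypothetical pod name:

```bash
kubectl annotate pod kiada-x7k2p controller.kubernetes.io/pod-deletion-cost=-10
```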
Logs
There is no easy way to display logs from all the pods in a ReplicaSet/Deployment. Instead, we have to use a label selector:
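A sketch, assuming the pods carry the app=kiada label that kubectl create deployment sets by default:

```bash
kubectl logs -l app=kiada --prefix --all-containers
```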
- --prefix - prefixes each log line with the pod/container it came from
- --all-containers - displays logs from all containers of the pods
Deployment strategies
Here are the popular deployment strategies. Some of them are supported by K8s out of the box, some are not.
- Recreate - remove all pods, create all new pods
- Rolling update - gradually replace pods
- Canary - replace a small number of pods; if the new ones work well, replace the rest
- A/B testing - create a small number of new pods, redirect some users (based on some condition) to the new pods.
- Blue/Green - deploy new pods in parallel with the old ones. When the new ones are ready, switch all the traffic to the new ones. Then delete the old Pods.
- Shadowing - deploy new pods in parallel with the old ones. Route all user traffic to both versions, but return to the users just the responses from the old version. In the meantime, observe the responses from the new ones to make sure that they work as expected.
Only the first two strategies are supported by K8s. The other strategies require some manual work.
Canary
We can create a separate deployment with the new version and scale it to a low number of replicas. The labels of the new pods should match the selector of the Service that was used for the old version. This way, the new pods are added to the pool of pods behind the Service. If we see that the new version works, we can go ahead and update the old deployment and delete the canary one.
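A sketch of such a canary Deployment; the track label and the 0.2 image tag are illustrative, and app=kiada is assumed to be what the Service selects on:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kiada-canary
spec:
  replicas: 1                  # keep the canary small
  selector:
    matchLabels:
      app: kiada
      track: canary
  template:
    metadata:
      labels:
        app: kiada             # matches the existing Service selector
        track: canary          # distinguishes canary pods from the stable ones
    spec:
      containers:
      - name: kiada
        image: luksa/kiada:0.2 # assumed new version
```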
A/B Testing
Some Ingress controllers have this capability.
Blue/Green
We create a separate deployment with the new version (Green). We do not route any traffic to it; all the traffic still hits the old deployment (Blue). When we’re ready, we change the selector of the Service to match the label configured in the Green deployment. Then, we delete the Blue deployment.
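A sketch of the switch, assuming the Service is named kiada and the Blue/Green deployments label their pods with version: blue / version: green:

```bash
# Point the Service at the Green pods; the Blue pods stop receiving traffic.
kubectl patch service kiada -p '{"spec":{"selector":{"version":"green"}}}'
```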