16.1 Introducing DaemonSets

A DaemonSet is an API object that ensures that exactly one replica of a Pod is running on each cluster node. By default, daemon Pods are deployed on every node, but you can use a node selector to restrict deployment to some of the nodes.

16.1.1 Understanding the DaemonSet object

A DaemonSet contains a Pod template and uses it to create multiple Pod replicas, just like Deployments, ReplicaSets, and StatefulSets. However, with a DaemonSet, you don’t specify the desired number of replicas as you do with the other objects. Instead, the DaemonSet controller creates as many Pods as there are nodes in the cluster. It ensures that each Pod is scheduled to a different Node, unlike Pods deployed by a ReplicaSet, where multiple Pods can be scheduled to the same Node, as shown in the following figure.

Figure 16.1 DaemonSets run a Pod replica on each node, whereas ReplicaSets scatter them around the cluster.

What type of workloads are deployed via DaemonSets and why

A DaemonSet is typically used to deploy infrastructure Pods that provide some sort of system-level service to each cluster node. Thes includes the log collection for the node’s system processes, as well as its Pods, daemons to monitor these processes, tools that provide the cluster’s network and storage, manage the installation and update of software packages, and services that provide interfaces to the various devices attached to the node.

The Kube Proxy component, which is responsible for routing traffic for the Service objects you create in your cluster, is usually deployed via a DaemonSet in the kube-system Namespace. The Container Network Interface (CNI) plugin that provides the network over which the Pods communicate is also typically deployed via a DaemonSet.

Although you could run system software on your cluster nodes using standard methods such as init scripts or systemd, using a DaemonSet ensures that you manage all workloads in your cluster in the same way.

Understanding the operation of the DaemonSet controller

Just like ReplicaSets and StatefulSets, a DaemonSet contains a Pod template and a label selector that determines which Pods belong to the DaemonSet. In each pass of its reconciliation loop, the DaemonSet controller finds the Pods that match the label selector, checks that each node has exactly one matching Pod, and creates or removes Pods to ensure that this is the case. This is illustrated in the next figure.

Figure 16.2 The DaemonSet controller’s reconciliation loop

When you add a Node to the cluster, the DaemonSet controller creates a new Pod and associates it with that Node. When you remove a Node, the DaemonSet deletes the Pod object associated with it. If one of these daemon Pods disappears, for example, because it was deleted manually, the controller immediately recreates it. If an additional Pod appears, for example, if you create a Pod that matches the label selector in the DaemonSet, the controller immediately deletes it.

16.1.2 Deploying Pods with a DaemonSet

A DaemonSet object manifest looks very similar to that of a ReplicaSet, Deployment, or StatefulSet. Let’s look at a DaemonSet example called demo, which you can find in the book's code repository in the file ds.demo.yaml. The following listing shows the full manifest.

Listing 16.1 A DaemonSet manifest example

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: demo
spec:
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo
        image: busybox
        command:
        - sleep
        - infinity

The DaemonSet object kind is part of the apps/v1 API group/version. In the object's spec, you specify the label selector and a Pod template, just like a ReplicaSet for example. The metadata section within the template must contain labels that match the selector.

NOTE

The selector is immutable, but you can change the labels as long as they still match the selector. If you need to change the selector, you must delete the DaemonSet and recreate it. You can use the --cascade=orphan option to preserve the Pods while replacing the DaemonSet.

As you can see in the listing, the demo DaemonSet deploys Pods that do nothing but execute the sleep command. That’s because the goal of this exercise is to observe the behavior of the DaemonSet itself, not its Pods. Later in this chapter, you’ll create a DaemonSet whose Pods actually do something.

Quickly inspecting a DaemonSet

Create the DaemonSet by applying the ds.demo.yaml manifest file with kubectl apply and then list all DaemonSets in the current Namespace as follows:

$ kubectl get ds
NAME   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
demo   2         2         2       2            2           <none>          7s

NOTE

The shorthand for DaemonSet is ds.

The command’s output shows that two Pods were created by this DaemonSet. In your case, the number may be different because it depends on the number and type of Nodes in your cluster, as I’ll explain later in this section.

Just as with ReplicaSets, Deployments, and StatefulSets, you can run kubectl get with the -o wide option to also display the names and images of the containers and the label selector.

$ kubectl get ds -o wide
NAME   DESIRED   CURRENT   ...   CONTAINERS   IMAGES    SELECTOR
Demo   2         2         ...   demo         busybox   app=demo

Inspecting a DaemonSet in detail

The -o wide option is the fastest way to see what’s running in the Pods created by each DaemonSet. But if you want to see even more details about the DaemonSet, you can use the kubectl describe command, which gives the following output:

$ kubectl describe ds demo
Name:           demo
Selector:       app=demo
Node-Selector:  <none>
Labels:         <none>
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 2
Current Number of Nodes Scheduled: 2
Number of Nodes Scheduled with Up-to-date Pods: 2
Number of Nodes Scheduled with Available Pods: 2
Number of Nodes Misscheduled: 0
Pods Status:  2 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=demo
  Containers:
   demo:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      sleep
      infinity
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From                  Message
  ----    ------            ----  ----                  -------
  Normal  SuccessfulCreate  40m   daemonset-controller  Created pod: demo-wqd22
  Normal  SuccessfulCreate  40m   daemonset-controller  Created pod: demo-w8tgm

The output of the kubectl describe commands includes information about the object’s labels and annotations, the label selector used to find the Pods of this DaemonSet, the number and state of these Pods, the template used to create them, and the Events associated with this DaemonSet.

Understanding a DaemonSet’s status

During each reconciliation, the DaemonSet controller reports the state of the DaemonSet in the object’s status section. Let’s look at the demo DaemonSet’s status. Run the following command to print the object’s YAML manifest:

$ kubectl get ds demo -o yaml
...
status:
  currentNumberScheduled: 2
  desiredNumberScheduled: 2
  numberAvailable: 2
  numberMisscheduled: 0
  numberReady: 2
  observedGeneration: 1
  updatedNumberScheduled: 2

As you can see, the status of a DaemonSet consists of several integer fields. The following table explains what the numbers in those fields mean.

Table 16.1 DaemonSet status fields

Value Description
currentNumberScheduled The number of Nodes that run at least one Pod associated with this DaemonSet.
desiredNumberScheduled The number of Nodes that should run the daemon Pod, regardless of whether they actually run it.
numberAvailable The number of Nodes that run at least one daemon Pod that’s available.
numberMisscheduled The number of Nodes that are running a daemon Pod but shouldn’t be running it.
numberReady The number of Nodes that have at least one daemon Pod running and ready
updatedNumberScheduled The number of Nodes whose daemon Pod is current with respect to the Pod template in the DaemonSet.

The status also contains the observedGeneration field, which has nothing to do with DaemonSet Pods. You can find this field in virtually all other objects that have a spec and a status. You’ll learn about this field in chapter 20, so ignore it for now.

You’ll notice that all the status fields explained in the previous table indicate the number of Nodes, not Pods. Some field descriptions also imply that more than one daemon Pod could be running on a Node, even though a DaemonSet is supposed to run exactly one Pod on each Node. The reason for this is that when you update the DaemonSet’s Pod template, the controller runs a new Pod alongside the old Pod until the new Pod is available. When you observe the status of a DaemonSet, you aren’t interested in the total number of Pods in the cluster, but in the number of Nodes that the DaemonSet serves.

Understanding why there are fewer daemon Pods than Nodes

In the previous section, you saw that the DaemonSet status indicates that two Pods are associated with the demo DaemonSet. This is unexpected because my cluster has three Nodes, not just two.

I mentioned that you can use a node selector to restrict the Pods of a DaemonSet to some of the Nodes. However, the demo DaemonSet doesn’t specify a node selector, so you’d expect three Pods to be created in a cluster with three Nodes. What’s going on here? Let’s get to the bottom of this mystery by listing the daemon Pods with the same label selector defined in the DaemonSet.

NOTE

Don’t confuse the label selector with the node selector; the former is used to associate Pods with the DaemonSet, while the latter is used to associate Pods with Nodes.

The label selector in the DaemonSet is app=demo. Pass it to the kubectl get command with the -l (or --selector) option. Additionally, use the -o wide option to display the Node for each Pod. The full command and its output are as follows:

$ kubectl get pods -l app=demo -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE           ...
demo-w8tgm   1/1     Running   0          80s   10.244.2.42   kind-worker    ...
demo-wqd22   1/1     Running   0          80s   10.244.1.64   kind-worker2   ...

Now list the Nodes in the cluster and compare the two lists:

$ kubectl get nodes
NAME                 STATUS   ROLES                  AGE   VERSION
kind-control-plane   Ready    control-plane,master   22h   v1.23.4
kind-worker          Ready    <none>                 22h   v1.23.4
kind-worker2         Ready    <none>                 22h   v1.23.4

It looks like the DaemonSet controller has only deployed Pods on the worker Nodes, but not on the master Node running the cluster’s control plane components. Why is that?

In fact, if you’re using a multi-node cluster, it’s very likely that none of the Pods you deployed in the previous chapters were scheduled to the Node hosting the control plane, such as the kind-control-plane Node in a cluster created with the kind tool. As the name implies, this Node is meant to only run the Kubernetes components that control the cluster. In chapter 2, you learned that containers help isolate workloads, but this isolation isn’t as good as when you use multiple separate virtual or physical machines. A misbehaving workload running on the control plane Node can negatively affect the operation of the entire cluster. For this reason, Kubernetes only schedules workloads to control plane Nodes if you explicitly allow it. This rule also applies to workloads deployed through a DaemonSet.

Deploying daemon Pods on control plane Nodes

The mechanism that prevents regular Pods from being scheduled to control plane Nodes is called Taints and Tolerations. You’ll learn more about it in chapter 23. Here, you’ll only learn how to get a DaemonSet to deploy Pods to all Nodes. This may be necessary if the daemon Pods provide a critical service that needs to run on all nodes in the cluster. Kubernetes itself has at least one such service—the Kube Proxy. In most clusters today, the Kube Proxy is deployed via a DaemonSet. You can check if this is the case in your cluster by listing DaemonSets in the kube-system namespace as follows:

$ kubectl get ds -n kube-system
NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR      AGE
kindnet      3         3         3       3            3           <none>             23h
kube-proxy   3         3         3       3            3           kubernetes.io...   23h

If, like me, you use the kind tool to run your cluster, you’ll see two DaemonSets. Besides the kube-proxy DaemonSet, you’ll also find a DaemonSet called kindnet. This DaemonSet deploys the Pods that provide the network between all the Pods in the cluster via CNI, the Container Network Interface, which you’ll learn more about in chapter 19.

The numbers in the output of the previous command indicate that the Pods of these DaemonSets are deployed on all cluster nodes. Their manifests reveal how they do this. Display the manifest of the kube-proxy DaemonSet as follows and look for the lines I’ve highlighted:

$ kubectl get ds kube-proxy -n kube-system -o yaml
apiVersion: apps/v1
kind: DaemonSet
...
spec:
  template:
    spec:
      ...
      tolerations:
      - operator: Exists
      volumes:
      ...

The highlighted lines aren’t self-explanatory and it’s hard to explain them without going into the details of taints and tolerations. In short, some Nodes may specify taints, and a Pod must tolerate a Node’s taints to be scheduled to that Node. The two lines in the previous example allow the Pod to tolerate all possible taints, so consider them a way to deploy daemon Pods on absolutely all Nodes.

As you can see, these lines are part of the Pod template and not direct properties of the DaemonSet. Nevertheless, they’re considered by the DaemonSet controller, because it wouldn’t make sense to create a Pod that the Node rejects.

Inspecting a daemon Pod

Now let’s turn back to the demo DaemonSet to learn more about the Pods that it creates. Take one of these Pods and display its manifest as follows:

$ kubectl get po demo-w8tgm -o yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-03-23T19:50:35Z"
  generateName: demo-
  labels:
    app: demo
    controller-revision-hash: 8669474b5b
    pod-template-generation: "1"
  name: demo-w8tgm
  namespace: bookinfo
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: demo
    uid: 7e1da779-248b-4ff1-9bdb-5637dc6b5b86
  resourceVersion: "67969"
  uid: 2d044e7f-a237-44ee-aa4d-1fe42c39da4e
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - kind-worker
  containers:
  ...

Each Pod in a DaemonSet gets the labels you define in the Pod template, plus some additional labels that the DaemonSet controller itself adds. You can ignore the pod-template-generation label because it’s obsolete. It’s been replaced by the label controller-revision-hash. You may remember seeing this label in StatefulSet Pods in the previous chapter. It serves the same purpose—it allows the controller to distinguish between Pods created with the old and the new Pod template during updates.

The ownerReferences field indicates that daemon Pods belong directly to the DaemonSet object, just as stateful Pods belong to the StatefulSet object. There's no object between the DaemonSet and the Pods, as is the case with Deployments and their Pods.

The last item in the manifest of a daemon Pod I want you to draw your attention to is the spec.affinity section. You'll learn more about Pod affinity in chapter 23, where I explain Pod scheduling in detail, but you should be able to tell that the nodeAffinity field indicates that this particular Pod needs to be scheduled to the Node kind-worker. This part of the manifest isn’t included in the DaemonSet’s Pod template, but is added by the DaemonSet controller to each Pod it creates. The node affinity of each Pod is configured differently to ensure that the Pod is scheduled to a specific Node.

In older versions of Kubernetes, the DaemonSet controller specified the target node in the Pod’s spec.nodeName field, which meant that the DaemonSet controller scheduled the Pod directly without involving the Kubernetes Scheduler. Now, the DaemonSet controller sets the nodeAffinity field and leaves the nodeName field empty. This leaves scheduling to the Scheduler, which also takes into account the Pod’s resource requirements and other properties.

16.1.3 Deploying to a subset of Nodes with a node selector

A DaemonSet deploys Pods to all cluster nodes that don’t have taints that the Pod doesn’t tolerate, but you may want a particular workload to run only on a subset of those nodes. For example, if only some of the nodes have special hardware, you might want to run the associated software only on those nodes and not on all of them. With a DaemonSet, you can do this by specifying a node selector in the Pod template. Note the difference between a node selector and a pod selector. The DaemonSet controller uses the former to filter eligible Nodes, whereas it uses the latter to know which Pods belong to the DaemonSet. As shown in the following figure, the DaemonSet creates a Pod for a particular Node only if the Node's labels match the node selector.

Figure 16.3 A node selector is used to deploy DaemonSet Pods on a subset of cluster nodes.

The figure shows a DaemonSet that deploys Pods only on Nodes that contain a CUDA-enabled GPU and are labelled with the label gpu: cuda. The DaemonSet controller deploys the Pods only on Nodes B and C, but ignores node A, because its label doesn’t match the node selector specified in the DaemonSet.

NOTE

CUDA or Compute Unified Device Architecture is a parallel computing platform and API that allows software to use compatible Graphics Processing Units (GPUs) for general purpose processing.

Specifying a node selector in the DaemonSet

You specify the node selector in the spec.nodeSelector field in the Pod template. The following listing shows the same demo DaemonSet you created earlier, but with a nodeSelector configured so that the DaemonSet only deploys Pods to Nodes with the label gpu: cuda. You can find this manifest in the file ds.demo.nodeSelector.yaml.

Listing 16.2 A DaemonSet with a node selector

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: demo
  labels:
    app: demo
spec:
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      nodeSelector:
        gpu: cuda
      containers:
      - name: demo
        image: busybox
        command:
        - sleep
        - infinity

Use the kubectl apply command to update the demo DaemonSet with this manifest file. Use the kubectl get command to see the status of the DaemonSet:

$ kubectl get ds
NAME   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
demo   0         0         0       0            0           gpu=cuda        46m

As you can see, there are now no Pods deployed by the demo DaemonSet because no nodes match the node selector specified in the DaemonSet. You can confirm this by listing the Nodes with the node selector as follows:

$ kubectl get nodes -l gpu=cuda
No resources found

Moving Nodes in and out of scope of a DaemonSet by changing their labels

Now imagine you just installed a CUDA-enabled GPU to the Node kind-worker2. You add the label to the Node as follows:

$ kubectl label node kind-worker2 gpu=cuda
node/kind-worker2 labeled

The DaemonSet controller watches not just DaemonSet and Pod, but also Node objects. When it detects a change in the labels of the kind-worker2 Node, it runs its reconciliation loop and creates a Pod for this Node, since it now matches the node selector. List the Pods to confirm:

$ kubectl get pods -l app=demo -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP            NODE           ...
demo-jbhqg   1/1     Running   0          16s   10.244.1.65   kind-worker2   ...

When you remove the label from the Node, the controller deletes the Pod:

$ kubectl label node kind-worker2 gpu-
node/kind-worker2 unlabeled

$ kubectl get pods -l app=demo
NAME         READY   STATUS        RESTARTS   AGE
demo-jbhqg   1/1     Terminating   0          71s

Using standard Node labels in DaemonSets

Kubernetes automatically adds some standard labels to each Node. Use the kubectl describe command to see them. For example, the labels of my kind-worker2 node are as follows:

$ kubectl describe node kind-worker2
Name:               kind-worker2
Roles:              <none>
Labels:             gpu=cuda
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=kind-worker2
                    kubernetes.io/os=linux

You can use these labels in your DaemonSets to deploy Pods based on the properties of each Node. For example, if your cluster consists of heterogeneous Nodes that use different operating systems or architectures, you configure a DaemonSet to target a specific OS and/or architecture by using the kubernetes.io/arch and kubernetes.io/os labels in its node selector.

Suppose your cluster consists of AMD- and ARM-based Nodes. You have two versions of your node agent container image. One is compiled for AMD CPUs and the other is compiled for ARM CPUs. You can create a DaemonSet to deploy the AMD-based image to the AMD nodes, and a separate DaemonSet to deploy the ARM-based image to the other nodes. The first DaemonSet would use the following node selector:

nodeSelector:
        kubernetes.io/arch: amd64

The other DaemonSet would use the following node selector:

nodeSelector:
        kubernetes.io/arch: arm

This multiple DaemonSets approach is ideal if the configuration of the two Pod types differs not only in the container image, but also in the amount of compute resources you want to provide to each container. You can read more about this in chapter 22.

NOTE

You don’t need multiple DaemonSets if you just want each node to run the correct variant of your container image for the node’s architecture and there are no other differences between the Pods. In this case, using a single DaemonSet with multi-arch container images is the better option.

Updating the node selector

Unlike the Pod label selector, the node selector is mutable. You can change it whenever you want to change the set of Nodes that the DaemonSet should target. One way to change the selector is to use the kubectl patch command. In chapter 14, you learned how to patch an object by specifying the part of the manifest that you want to update. However, you can also update an object by specifying a list of patch operations using the JSON patch format. You can learn more about this format at jsonpatch.com. Here I show you an example of how to use JSON patch to remove the nodeSelector field from the object manifest of the demo DaemonSet:

$ kubectl patch ds demo --type='json' -p='[{ "op": "remove", "path": "/spec/template/spec/nodeSelector"}]'daemonset.apps/demo patched

Instead of providing an updated portion of the object manifest, the JSON patch in this command specifies that the spec.template.spec.nodeSelector field should be removed.

16.1.4 Updating a DaemonSet

As with Deployments and StatefulSets, when you update the Pod template in a DaemonSet, the controller automatically deletes the Pods that belong to the DaemonSet and replaces them with Pods created with the new template.

You can configure the update strategy to use in the spec.updateStrategy field in the DaemonSet object’s manifest, but the spec.minReadySeconds field also plays a role, just as it does for Deployments and StatefulSets. At the time of writing, DaemonSets support the strategies listed in the following table.

Table 16.2 The supported DaemonSet update strategies

Value Description
RollingUpdate In this update strategy, Pods are replaced one by one. When a Pod is deleted and recreated, the controller waits until the new Pod is ready. Then it waits an additional amount of time, specified in the spec.minReadySeconds field of the DaemonSet, before updating the Pods on the other Nodes. This is the default strategy.
OnDelete The DaemonSet controller performs the update in a semi-automatic way. It waits for you to manually delete each Pod, and then replaces it with a new Pod from the updated template. With this strategy, you can replace Pods at your own pace.

The RollingUpdate strategy is similar to that in Deployments, and the OnDelete strategy is just like that in StatefulSets. As in Deployments, you can configure the RollingUpdate strategy with the maxSurge and maxUnavailable parameters, but the default values for these parameters in DaemonSets are different. The next section explains why.

The RollingUpdate strategy

To update the Pods of the demo DaemonSet, use the kubectl apply command to apply the manifest file ds.demo.v2.rollingUpdate.yaml. Its contents are shown in the following listing.

Listing 16.3 Specifying the RollingUpdate strategy in a DaemonSet

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: demo
spec:
  minReadySeconds: 30
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
        ver: v2
    spec:
      ...

In the listing, the type of updateStrategy is RollingUpdate, with maxSurge set to 0 and maxUnavailable set to 1.

NOTE

These are the default values, so you can also remove the updateStrategy field completely and the update is performed the same way.

When you apply this manifest, the Pods are replaced as follows:

$ kubectl get pods -l app=demo -L ver
NAME         READY   STATUS        RESTARTS   AGE   VER
demo-5nrz4   1/1     Terminating   0          10m
demo-vx27t   1/1     Running       0          11m

$ kubectl get pods -l app=demo -L ver
NAME         READY   STATUS        RESTARTS   AGE   VER
demo-k2d6k   1/1     Running       0          36s   v2
demo-vx27t   1/1     Terminating   0          11m

$ kubectl get pods -l app=demo -L ver
NAME         READY   STATUS    RESTARTS   AGE   VER
demo-k2d6k   1/1     Running   0          126s  v2
demo-s7hsc   1/1     Running   0          62s   v2

Since maxSurge is set to zero, the DaemonSet controller first stops the existing daemon Pod before creating a new one. Coincidentally, zero is also the default value for maxSurge, since this is the most reasonable behavior for daemon Pods, considering that the workloads in these Pods are usually node agents and daemons, of which only a single instance should run at a time.

If you set maxSurge above zero, two instances of the Pod run on the Node during an update for the time specified in the minReadySeconds field. Most daemons don't support this mode because they use locks to prevent multiple instances from running simultaneously. If you tried to update such a daemon in this way, the new Pod would never be ready because it couldn’t obtain the lock, and the update would fail.

The maxUnavailable parameter is set to one, which means that the DaemonSet controller updates only one Node at a time. It doesn’t start updating the Pod on the next Node until the Pod on the previous node is ready and available. This way, only one Node is affected if the new version of the workload running in the new Pod can’t be started.

If you want the Pods to update at a higher rate, increase the maxUnavailable parameter. If you set it to a value higher than the number of Nodes in your cluster, the daemon Pods will be updated on all Nodes simultaneously, like the Recreate strategy in Deployments.

TIP

To implement the Recreate update strategy in a DaemonSet, set the maxSurge parameter to 0 and maxUnavailable to 10000 or more, so that this value is always higher than the number of Nodes in your cluster.

An important caveat to rolling DaemonSet updates is that if the readiness probe of an existing daemon Pod fails, the DaemonSet controller immediately deletes the Pod and replaces it with a Pod with the updated template. In this case, the maxSurge and maxUnavailable parameters are ignored.

Likewise, if you delete an existing Pod during a rolling update, it's replaced with a new Pod. The same thing happens if you configure the DaemonSet with the OnDelete update strategy. Let's take a quick look at this strategy as well.

The OnDelete update strategy

An alternative to the RollingUpdate strategy is OnDelete. As you know from the previous chapter on StatefulSets, this is a semi-automatic strategy that allows you to work with the DaemonSet controller to replace the Pods at your discretion, as shown in the next exercise. The following listing shows the contents of the manifest file ds.demo.v3.onDelete.yaml.

Listing 16.4 Setting the DaemonSet update strategy

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: demo
spec:
  updateStrategy:
    type: OnDelete
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
        ver: v3
    spec:
      ...

The OnDelete strategy has no parameters you can set to affect how it works, since the controller only updates the Pods you manually delete. Apply this manifest file with kubectl apply and then check the DaemonSet as follows to see that no action is taken by the DaemonSet controller:

$ kubectl get ds
NAME   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
demo   2         2         2       0            2           <none>          80m

The output of the kubectl get ds command shows that neither Pod in this DaemonSet is up to date. This is to be expected since you updated the Pod template in the DaemonSet, but the Pods haven't yet been updated, as you can see when you list them:

$ kubectl get pods -l app=demo -L ver
NAME         READY   STATUS    RESTARTS   AGE   VER
demo-k2d6k   1/1     Running   0          10m   v2
demo-s7hsc   1/1     Running   0          10m   v2

To update the Pods, you must delete them manually. You can delete as many Pod as you want and in any order, but let's delete only one for now. Select a Pod and delete it as follows:

$ kubectl delete po demo-k2d6k --wait=false
pod "demo-k2d6k" deleted

You may recall that, by default, the kubectl delete command doesn't exit until the deletion of the object is complete. If you use the --wait=false option, the command marks the object for deletion and exits without waiting for the Pod to actually be deleted. This way, you can keep track of what happens behind the scenes by listing Pods several times as follows:

$ kubectl get pods -l app=demo -L ver
NAME         READY   STATUS        RESTARTS   AGE   VER
demo-k2d6k   1/1     Terminating   0          10m   v2
demo-s7hsc   1/1     Running       0          10m   v2

$ kubectl get pods -l app=demo -L ver
NAME         READY   STATUS    RESTARTS   AGE   VER
demo-4gf5h   1/1     Running   0          15s   v3
demo-s7hsc   1/1     Running   0          11m   v2

If you list the DaemonSets with the kubectl get command as follows, you’ll see that only one Pod has been updated:

$ kubectl get ds
NAME   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
demo   2         2         2       1            2           <none>          91m

Delete the remaining Pod(s) to complete the update.

Considering the use of the OnDelete strategy for critical daemon Pods

With this strategy, you can update cluster-critical Pods with much more control, albeit with more effort. This way, you can be sure that the update won’t break your entire cluster, as might happen with a fully automated update if the readiness probe in the daemon Pod can’t detect all possible problems.

For example, the readiness probe defined in the DaemonSet probably doesn’t check if the other Pods on the same Node are still working properly. If the updated daemon Pod is ready for minReadySeconds, the controller will proceed with the update on the next Node, even if the update on the first Node caused all other Pods on the Node to fail. The cascade of failures could bring down your entire cluster. However, if you perform the update using the OnDelete strategy, you can verify the operation of the other Pods after updating each daemon Pod and before deleting the next one.

16.1.5 Deleting the DaemonSet

To finish this introduction to DaemonSets, delete the demo DaemonSet as follows:

$ kubectl delete ds demo
daemonset.apps "demo" deleted

As you’d expect, doing so will also delete all demo Pods. To confirm, list the Pods as follows:

$ kubectl get pods -l app=demo
NAME         READY   STATUS        RESTARTS   AGE
demo-4gf5h   1/1     Terminating   0          2m22s
demo-s7hsc   1/1     Terminating   0          6m53s

This concludes the explanation of DaemonSets themselves, but Pods deployed via DaemonSets differ from Pods deployed via Deployments and StatefulSets in that they often access the host node’s file system, its network interface(s), or other hardware. You’ll learn about this in the next section.

results matching ""

    No results matching ""