Node Failure in Kubernetes

  1. Check all the nodes are healthy.
  2. Check the failed node.
  3. Check CPU, Memory, Disk space on the node.
  4. Check Kubelet Status.
  5. Check Certificates.

Check all the nodes are healthy.

$ kubectl get nodes
NAME         STATUS     ROLES                  AGE   VERSION
kubemaster   Ready      control-plane,master   38h   v1.20.2
kubenode01   NotReady   <none>                 37h   v1.20.2
kubenode02   Ready      <none>                 37h   v1.20.2

If a node is reported as NotReady, check its details using kubectl describe node.
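On a larger cluster it can help to list only the nodes that need attention. A minimal sketch, assuming the default kubectl get nodes column layout (NAME first, STATUS second); the function name not_ready_nodes is hypothetical:

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Reads `kubectl get nodes` output on stdin and skips the header row.
# Note: a cordoned node shows "Ready,SchedulingDisabled" and is listed too.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get nodes | not_ready_nodes
```

Run against the listing above, this would print only kubenode01.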

Check the failed node

$ kubectl describe node kubenode01
Name:               kubenode01
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=kubenode01
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 29 Jan 2021 01:22:58 +0000
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  kubenode01
  AcquireTime:     <unset>
  RenewTime:       Sat, 30 Jan 2021 15:19:06 +0000
Conditions:
  Type                 Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----                 ------    -----------------                 ------------------                ------              -------
  NetworkUnavailable   False     Fri, 29 Jan 2021 08:08:14 +0000   Fri, 29 Jan 2021 08:08:14 +0000   WeaveIsUp           Weave pod has set this
  MemoryPressure       Unknown   Sat, 30 Jan 2021 15:17:52 +0000   Sat, 30 Jan 2021 15:19:49 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure         Unknown   Sat, 30 Jan 2021 15:17:52 +0000   Sat, 30 Jan 2021 15:19:49 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure          Unknown   Sat, 30 Jan 2021 15:17:52 +0000   Sat, 30 Jan 2021 15:19:49 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready                Unknown   Sat, 30 Jan 2021 15:17:52 +0000   Sat, 30 Jan 2021 15:19:49 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
Addresses:
  InternalIP:  192.168.56.3
  Hostname:    kubenode01
Capacity:
  cpu:                2
  ephemeral-storage:  40593612Ki
  hugepages-2Mi:      0
  memory:             2040788Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  37411072758
  hugepages-2Mi:      0
  memory:             1938388Ki
  pods:               110
System Info:
  Machine ID:                 63b75d07d8cc40709d065a83e1965f1a
  System UUID:                E97DD35B-7625-6944-B998-56973029AD53
  Boot ID:                    b72ebe6d-9393-47ba-a7e9-8a22397e346d
  Kernel Version:             4.15.0-135-generic
  OS Image:                   Ubuntu 18.04.5 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.11
  Kubelet Version:            v1.20.2
  Kube-Proxy Version:         v1.20.2
PodCIDR:                      10.244.1.0/24
PodCIDRs:                     10.244.1.0/24
Non-terminated Pods:          (3 in total)
  Namespace                   Name                CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                ------------  ----------  ---------------  -------------  ---
  default                     nginx-pod           0 (0%)        0 (0%)      0 (0%)           0 (0%)         31h
  kube-system                 kube-proxy-xvclq    0 (0%)        0 (0%)      0 (0%)           0 (0%)         37h
  kube-system                 weave-net-7cgwj     100m (5%)     0 (0%)      200Mi (10%)      0 (0%)         37h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                100m (5%)    0 (0%)
  memory             200Mi (10%)  0 (0%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
Events:              <none>

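Each node has a set of conditions that can point to why it failed. Each condition's status is True, False, or Unknown. In the output above, every condition except NetworkUnavailable is Unknown with the message "Kubelet stopped posting node status.", which means the control plane has lost contact with the kubelet on kubenode01.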
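The conditions can also be pulled out without reading the full describe output. A rough sketch, assuming the conditions are first printed as tab-separated type, status and reason; the function name unhealthy_conditions is hypothetical:

```shell
# Print conditions that look unhealthy from "type<TAB>status<TAB>reason" lines.
# Ready is healthy only when True; the other standard conditions
# (MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable)
# are healthy only when False.
unhealthy_conditions() {
  awk -F '\t' '
    NF < 2 { next }
    $1 == "Ready" && $2 != "True"  { print $1 "=" $2 " (" $3 ")"; next }
    $1 != "Ready" && $2 != "False" { print $1 "=" $2 " (" $3 ")" }
  '
}

# Usage (requires cluster access):
#   kubectl get node kubenode01 \
#     -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}' \
#     | unhealthy_conditions
```

An empty result suggests the node's reported conditions are all healthy.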

Check CPU, Memory, Disk space on the node.

$ top

top - 15:25:41 up  9:49,  1 user,  load average: 0.68, 0.53, 0.44
Tasks: 134 total,   3 running,  88 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.5 us,  1.0 sy,  0.0 ni, 97.0 id,  0.2 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem :  2040788 total,   108180 free,   780784 used,  1151824 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  1215864 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                   
 2994 root      20   0 1098956 361876  69096 S   2.7 17.7  23:05.74 kube-apiserver                                                                            
  843 root      20   0 1960740 104676  63732 S   1.7  5.1  13:06.56 kubelet                                                                                   
  980 root      20   0 1566360  93160  45852 S   0.7  4.6   4:26.80 dockerd                                                                                   
 2913 root      20   0 10.121g  67016  22440 S   0.7  3.3   5:22.83 etcd                                                                                      
 3128 root      20   0  816780 105336  60328 R   0.7  5.2   6:03.55 kube-controller                                                                           
 3068 root      20   0  747620  48912  32364 S   0.3  2.4   1:24.29 kube-scheduler                                                                            
 4073 root      20   0  107700   5156   4324 S   0.3  0.3   0:00.74 containerd-shim                                                                           
 4129 root      20   0  743820  36220  26272 S   0.3  1.8   0:04.31 kube-proxy                                                                                
 5943 root      20   0  747404  37640  29316 S   0.3  1.8   1:07.45 coredns     
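top is interactive, so for a quick scripted check the available memory can be read from /proc/meminfo instead. A small sketch, assuming a Linux kernel that reports MemAvailable; the function name mem_available_pct is hypothetical:

```shell
# Print available memory as a whole-number percentage of total memory.
# Reads /proc/meminfo-style lines on stdin; the percentage is truncated.
mem_available_pct() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END { if (total > 0) printf "%d\n", avail * 100 / total }
  '
}

# Usage (on the node):
#   mem_available_pct < /proc/meminfo
```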

$ df -h

Filesystem      Size  Used Avail Use% Mounted on
udev            984M     0  984M   0% /dev
tmpfs           200M  1.6M  198M   1% /run
/dev/sda1        39G  3.2G   36G   9% /
tmpfs           997M     0  997M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           997M     0  997M   0% /sys/fs/cgroup
vagrant         234G  102G  132G  44% /vagrant
tmpfs           200M     0  200M   0% /run/user/1000
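To flag nearly full filesystems without reading the table by eye, the use percentage can be compared with a threshold. A minimal sketch, assuming POSIX df -P output and mount points without spaces; the function name full_filesystems and the default of 85 percent are illustrative choices, not Kubernetes values:

```shell
# Print mount points whose usage is at or above a threshold percentage.
# Reads `df -P` output on stdin: field 5 is the use percentage,
# field 6 the mount point. The first argument is the threshold (default 85).
full_filesystems() {
  awk -v limit="${1:-85}" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)
      if (use + 0 >= limit + 0) print $6 " " $5
    }
  '
}

# Usage (on the node):
#   df -P | full_filesystems 85
```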

Check Kubelet Status

$ service kubelet status
sh: 0: getcwd() failed: No such file or directory
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Fri 2021-01-29 08:07:10 UTC; 1 day 7h ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 843 (kubelet)
    Tasks: 19 (limit: 2360)
   CGroup: /system.slice/kubelet.service
           └─843 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kub

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
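When the service is reported as running but the node is still NotReady, the kubelet logs are the next place to look. A rough sketch, assuming the kubelet logs to the systemd journal in the klog text format, where error lines start with E followed by the month and day; the function name kubelet_errors is hypothetical:

```shell
# Keep only error-level kubelet lines from journalctl output on stdin.
# Matches the journal prefix "kubelet[PID]: " followed by a klog
# error header such as "E0130 ".
kubelet_errors() {
  grep -E 'kubelet\[[0-9]+\]: E[0-9]{4} '
}

# Usage (on the node):
#   journalctl -u kubelet --no-pager --since "1 hour ago" | kubelet_errors
```

If the service itself is stopped, systemctl restart kubelet is the usual first step before digging into the logs.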

Check Certificates

$ openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -text

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 2 (0x2)
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN = kubenode02-ca@1611883386
        Validity
            Not Before: Jan 29 00:23:06 2021 GMT
            Not After : Jan 29 00:23:06 2022 GMT
        Subject: CN = kubenode02@1611883386
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                RSA Public-Key: (2048 bit)
                Modulus:
                    00:dc:0a:4c:41:e3:2f:f6:09:7e:61:77:ff:60:43:
                    34:6f:58:c1:7d:3e:51:d3:f5:d2:6e:f0:6f:42:ec:
                    2b:7e:70:07:18:3e:4c:2f:eb:ec:07:24:5f:27:f8:
                    3e:d9:12:5b:ca:05:ba:0d:6a:34:91:58:4a:05:e9:
                    bf:44:a7:7e:56:e0:d9:89:6e:ac:0f:ef:cc:5a:5e:
                    20:98:07:95:d2:87:82:03:7f:33:8f:df:7a:43:e6:
                    14:06:b2:25:d0:74:d8:f4:99:ab:26:0a:d3:1c:66:
                    f7:7a:40:61:17:5f:68:77:9f:ae:98:51:a1:cc:c9:
                    58:7c:0a:d9:1e:5b:2d:7a:eb:04:ac:ee:49:a8:ab:
                    03:e6:d1:f0:ea:92:01:7c:55:2e:a9:7f:bd:fa:59:
                    5c:17:65:4a:e7:fc:44:d8:35:cf:9d:a6:cd:cd:17:
                    f0:76:97:86:f4:dc:8b:68:c0:c8:d6:da:68:03:b0:
                    56:db:70:93:dd:97:60:82:29:be:2c:83:1f:55:2e:
                    a9:78:cc:94:64:32:bb:8e:f5:73:79:0b:99:96:d9:
                    e6:c6:61:ba:ed:87:80:14:57:51:db:f2:48:fb:1c:
                    97:0a:5e:67:44:22:24:92:f4:26:5b:f9:00:2b:ce:
                    08:3c:31:9e:cd:b0:95:d3:14:42:cb:6e:e4:69:b0:
                    6e:01
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Authority Key Identifier: 
                keyid:E0:C1:BC:49:11:7D:9E:4D:90:2E:89:E9:79:B7:9E:D9:1A:C3:7E:86

            X509v3 Subject Alternative Name: 
                DNS:kubenode02
    Signature Algorithm: sha256WithRSAEncryption
         0c:7d:22:f5:0d:5b:25:02:ed:8a:34:44:0d:11:80:d9:7e:47:
         1c:e7:d1:60:e2:bb:38:53:bd:23:75:ab:c1:72:72:9e:38:09:
         45:f8:ce:d5:52:31:d6:51:44:96:5d:56:09:89:5b:8a:e8:ee:
         30:4f:30:ba:6e:fe:06:0d:8c:2e:85:fa:c3:97:42:a0:6d:1d:
         98:a4:9d:d6:6d:b8:e1:a5:56:b2:13:19:5d:85:0a:81:49:dd:
         bf:ca:3d:fd:34:56:8e:00:0c:7f:30:31:d9:1d:46:76:af:6f:
         2d:94:a3:6f:04:bb:3a:aa:5f:d3:7e:b4:b6:86:5a:0a:ea:d8:
         9c:4d:e8:7e:97:10:e9:8b:9e:4d:fb:5b:32:26:fa:f0:05:ae:
         a8:d7:34:e2:3e:f8:83:7e:df:e8:dc:c5:f7:f9:81:26:4a:ed:
         3e:41:80:20:68:ce:76:16:6f:89:82:e2:42:44:c3:0e:43:dd:
         02:8d:e5:11:94:3b:71:63:5b:72:a3:63:3f:b6:1f:d5:f0:d6:
         b8:81:1d:32:cf:92:91:71:20:44:d3:70:1e:d3:c9:a7:60:72:
         4a:9f:2d:be:64:77:f2:47:1c:d3:0e:ed:04:07:f6:37:b1:69:
         d7:70:8f:2f:f2:ff:c7:92:11:9c:41:79:4d:fd:ec:43:17:3d:
         00:e8:27:b7
-----BEGIN CERTIFICATE-----
MIIDKzCCAhOgAwIBAgIBAjANBgkqhkiG9w0BAQsFADAjMSEwHwYDVQQDDBhrdWJl
bm9kZTAyLWNhQDE2MTE4ODMzODYwHhcNMjEwMTI5MDAyMzA2WhcNMjIwMTI5MDAy
MzA2WjAgMR4wHAYDVQQDDBVrdWJlbm9kZTAyQDE2MTE4ODMzODYwggEiMA0GCSqG
SIb3DQEBAQUAA4IBDwAwggEKAoIBAQDcCkxB4y/2CX5hd/9gQzRvWMF9PlHT9dJu
8G9C7Ct+cAcYPkwv6+wHJF8n+D7ZElvKBboNajSRWEoF6b9Ep35W4NmJbqwP78xa
XiCYB5XSh4IDfzOP33pD5hQGsiXQdNj0masmCtMcZvd6QGEXX2h3n66YUaHMyVh8
CtkeWy166wSs7kmoqwPm0fDqkgF8VS6pf736WVwXZUrn/ETYNc+dps3NF/B2l4b0
3ItowMjW2mgDsFbbcJPdl2CCKb4sgx9VLql4zJRkMruO9XN5C5mW2ebGYbrth4AU
V1Hb8kj7HJcKXmdEIiSS9CZb+QArzgg8MZ7NsJXTFELLbuRpsG4BAgMBAAGjbTBr
MA4GA1UdDwEB/wQEAwIFoDATBgNVHSUEDDAKBggrBgEFBQcDATAMBgNVHRMBAf8E
AjAAMB8GA1UdIwQYMBaAFODBvEkRfZ5NkC6J6Xm3ntkaw36GMBUGA1UdEQQOMAyC
Cmt1YmVub2RlMDIwDQYJKoZIhvcNAQELBQADggEBAAx9IvUNWyUC7Yo0RA0RgNl+
Rxzn0WDiuzhTvSN1q8Fycp44CUX4ztVSMdZRRJZdVgmJW4ro7jBPMLpu/gYNjC6F
+sOXQqBtHZikndZtuOGlVrITGV2FCoFJ3b/KPf00Vo4ADH8wMdkdRnavby2Uo28E
uzqqX9N+tLaGWgrq2JxN6H6XEOmLnk37WzIm+vAFrqjXNOI++IN+3+jcxff5gSZK
7T5BgCBoznYWb4mC4kJEww5D3QKN5RGUO3FjW3KjYz+2H9Xw1riBHTLPkpFxIETT
cB7TyadgckqfLb5kd/JHHNMO7QQH9jexaddwjy/y/8eSEZxBeU397EMXPQDoJ7c=
-----END CERTIFICATE-----
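The part of this output that matters most is the Validity block. Rather than reading the dates by eye, openssl can check the expiry directly. A minimal sketch using the -checkend option of openssl x509; the function name cert_valid_for_days and the 30-day default are illustrative assumptions:

```shell
# Return 0 if the certificate is still valid N days from now, non-zero otherwise.
# $1 is the certificate path, $2 the number of days (default 30).
cert_valid_for_days() {
  openssl x509 -in "$1" -noout -checkend "$(( ${2:-30} * 86400 ))" >/dev/null
}

# Usage (on the node):
#   cert_valid_for_days /var/lib/kubelet/pki/kubelet.crt 30 || echo "expires within 30 days"
#   openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -noout -enddate
```

On kubeadm clusters the kubelet's client certificate is typically /var/lib/kubelet/pki/kubelet-client-current.pem, which may be worth checking the same way.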

ANOTE.DEV