Configure High Availability in Kubernetes

What happens when you lose the master node in your cluster?

As long as the workers are up and the containers are alive, your applications are still running.

Users can access the application until things start to fail.

If a container or pod on a worker node crashes and the pod was part of a ReplicaSet, the master needs to instruct the worker to load a new pod. But the master is not available, and neither are the controllers and schedulers running on it. There is no one to recreate the pod and no one to schedule it on a node. Similarly, since the Kube-API Server is not available, you cannot access the cluster externally through the Kubectl tool or through the API for management purposes. This is why you should consider multiple master nodes in a High Availability configuration for production environments. A high availability configuration is one where you have redundancy across every component in the cluster so as to avoid a single point of failure.

The master node hosts the control plane components, including the Kube-API Server, Kube Controller Manager, Kube Scheduler and ETCD. In a High Availability setup with an additional master node, the same components run on the new master as well.

How does that work?

  • Running multiple instances of the same components?
  • Are they going to do the same thing twice?
  • How do they share the work among themselves?

The Kube-API Server is responsible for receiving requests and processing them or providing information about the cluster. Each request is handled on its own by whichever server receives it. So, the API Servers on all master nodes can be alive and running at the same time in an active-active mode.

The Kubectl utility talks to the Kube-API Server to get things done, and we point Kubectl at the master node on port 6443. That is where the Kube-API Server listens, and this is configured in the Kube-config file. So, if there are two masters, where do we point Kubectl? We can send a request to either one of them, but we should not send the same request to both. So it is better to have a load balancer of some kind configured in front of the master nodes that splits traffic between the API Servers. We then point the Kubectl utility to that load balancer. You may use Nginx, HAProxy or any other load balancer for this purpose.
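To make the load balancer's role concrete, here is a minimal sketch of round-robin distribution in Python. It is an illustration only, not how Nginx or HAProxy are implemented; the host names `master1` and `master2` and the helper `make_balancer` are hypothetical, and only port 6443 comes from the text above.

```python
from itertools import cycle

# Hypothetical API server addresses; 6443 is the port named in the text.
API_SERVERS = ["https://master1:6443", "https://master2:6443"]


def make_balancer(backends):
    """Return a function that hands out backends in round-robin order."""
    pool = cycle(backends)
    return lambda: next(pool)


pick_backend = make_balancer(API_SERVERS)

# Each incoming request is forwarded to exactly one API server,
# never to both of them.
routed = [pick_backend() for _ in range(4)]
print(routed)
```

Kubectl only needs to know the load balancer's address; which master actually serves a given request is decided behind it.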

What about the scheduler and the controller manager?

These are controllers that watch the state of the cluster and take actions.

The controller manager consists of controllers like the replication controller, which constantly watches the state of pods and takes the necessary actions, like creating a new pod when one fails. If multiple instances of those run in parallel, they might duplicate actions, resulting in more pods than actually needed. The same is true of the scheduler. As such, they must not run in parallel. They run in an active and standby mode.

So, who decides which of the two is active and which is passive? This is achieved through a leader election process. How does that work? When a controller manager process is configured, you may specify the leader-elect option, which is set to true by default. With this option, when the controller manager process starts it tries to gain a lease, or lock, on an endpoint object in Kubernetes named kube-controller-manager (newer Kubernetes releases use a Lease object for the same purpose). Whichever process first updates the object with its information gains the lease and becomes the active one of the two. The other becomes passive.

The leader holds the lock for the lease duration specified with the leader-elect-lease-duration option, which is set to 15 seconds by default. The active process must renew the lease within 10 seconds, which is the default value of the leader-elect-renew-deadline option. Both processes try to become the leader every two seconds, as set by the leader-elect-retry-period option. That way, if one process fails, maybe because the first master crashes, the second process can acquire the lock and become the leader. The Kube scheduler follows a similar approach and has the same command-line options.
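The lease mechanics can be sketched as a small simulation. This is a simplified model, not the actual Kubernetes leader election code: the `Lock` class, the process names `cm-1` and `cm-2`, and the `simulate` helper are all hypothetical, time is counted in whole seconds, and the renew deadline is left out. Only the 15-second lease duration and the 2-second retry period are taken from the defaults described above.

```python
from dataclasses import dataclass
from typing import Optional

LEASE_DURATION = 15  # seconds, default of leader-elect-lease-duration
RETRY_PERIOD = 2     # seconds, default of leader-elect-retry-period


@dataclass
class Lock:
    """Stand-in for the shared lock object; not a real Kubernetes type."""
    holder: Optional[str] = None
    renewed_at: int = 0


def try_acquire_or_renew(lock: Lock, name: str, now: int) -> bool:
    """Take the lock if it is free or expired; renew it if we hold it."""
    expired = now - lock.renewed_at >= LEASE_DURATION
    if lock.holder in (None, name) or expired:
        lock.holder = name
        lock.renewed_at = now
        return True
    return False


def simulate(crash_at: int, end: int) -> dict:
    """Run two processes; 'cm-1' stops at crash_at.

    Returns the name recorded on the lock at each retry tick. After a
    crash the lock still names the old holder until the lease expires.
    """
    lock = Lock()
    holders = {}
    for now in range(0, end + 1, RETRY_PERIOD):
        for name in ("cm-1", "cm-2"):
            if name == "cm-1" and now >= crash_at:
                continue  # the crashed process no longer renews
            try_acquire_or_renew(lock, name, now)
        holders[now] = lock.holder
    return holders


history = simulate(crash_at=10, end=30)
print(history)
```

In this run `cm-1` last renews at second 8 and then crashes. The standby keeps retrying every two seconds but only succeeds once the 15-second lease has run out, so the lock changes hands at second 24.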


With ETCD there are two topologies that you can configure in Kubernetes.

  1. ETCD is part of the master nodes. This is called the stacked control plane nodes topology. But if one node goes down, both an ETCD member and a control plane instance are lost, and redundancy is compromised.
    • Easier to set up
    • Easier to manage
    • Fewer servers
    • Risk during failure
  2. ETCD is separated from the control plane nodes and runs on its own set of servers. This is called the external ETCD topology. It is less risky, as a failed control plane node does not impact the ETCD cluster and the data it stores. However, it is harder to set up and requires twice the number of servers because of the external ETCD nodes.
    • Less Risk
    • Harder to setup
    • More Servers

The Kube-API Server is the only component that talks to the ETCD server. If you look into the Kube-API service configuration options, we have a set of options specifying where the ETCD server is.

cat /etc/systemd/system/kube-apiserver.service

So regardless of the topology we use, and wherever we configure the ETCD servers, whether on the same servers or on separate servers, we need to make sure that the Kube-API Server points to the right addresses of the ETCD servers. ETCD is a distributed system, so the Kube-API Server, or any other component that wishes to talk to it, can reach ETCD at any of its instances. You can read and write data through any of the available ETCD server instances. This is why we specify a list of ETCD servers in the Kube-API Server configuration.
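As a rough illustration of that list, the sketch below builds the comma-separated value for the Kube-API Server's `--etcd-servers` option and shows the idea of using whichever instance is available. The IP addresses and both helper functions are hypothetical, and the reachability check is a stand-in rather than a real ETCD client call; 2379 is the usual ETCD client port.

```python
# Hypothetical ETCD member addresses on the usual client port 2379.
ETCD_ENDPOINTS = [
    "https://10.0.0.11:2379",
    "https://10.0.0.12:2379",
    "https://10.0.0.13:2379",
]


def etcd_servers_flag(endpoints):
    """Build the comma-separated --etcd-servers option value."""
    return "--etcd-servers=" + ",".join(endpoints)


def first_reachable(endpoints, is_up):
    """Return the first endpoint that is_up() reports as reachable."""
    for endpoint in endpoints:
        if is_up(endpoint):
            return endpoint
    return None


flag = etcd_servers_flag(ETCD_ENDPOINTS)
print(flag)

# Pretend the first member is down: the next one in the list is used.
down = {"https://10.0.0.11:2379"}
chosen = first_reachable(ETCD_ENDPOINTS, lambda ep: ep not in down)
print(chosen)
```

The same list works for both topologies: with stacked ETCD the addresses are the master nodes themselves, and with external ETCD they are the separate ETCD servers.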
