Failover in Kubernetes: It Does Exist!
On the one hand, monitoring and failover management are the pillars of any project's availability. On the other hand, you might question whether failover management in Kubernetes is needed at all. Indeed, everything seems to be self-balanced, self-scaled, and self-healed in Kubernetes. The system looks like a magical fairy that saves you from all infrastructure problems and guarantees your project never fails. So, instead of asking "when" and "how," most of you would ask "why bother with a failover cluster in K8s at all?" Unfortunately, our fairytale, like any other, turns into stark reality with the chime of bells.
At DevOpsProdigy, I spend a great deal of time consulting different teams about the pros and cons of various DevOps solutions. Most of these conversations center around Kubernetes, and I want to share a few thoughts about how DevOpsProdigy ensures high availability in Kubernetes. Don't take these as strict guidelines, just as a few insights drawn from past mistakes.
In the good old days of dedicated servers and bare-metal solutions, with identical virtual or hardware servers, we used to apply three basic approaches:
- synchronize the code and static files
- synchronize configs
- replicate data stores
And there we go. We can switch to a failover replica whenever we need to! Everyone is happy, Cinderella can go to the ball!
What traditional options are there to secure high availability for our K8s application? First of all, the documentation tells us to set up multiple machines and run several master replicas. Every master should run etcd, the API server, the controller manager, and the scheduler, and there should be enough of them to reach quorum within the cluster. In that case, our cluster will rebalance and keep working perfectly even if several replicas or masters fail. Magic is in the air again!
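For instance, with kubeadm the multi-master part mostly comes down to pointing every control-plane node at one shared endpoint. A minimal sketch, assuming a kubeadm bootstrap; the load-balancer address and version are placeholders, and the additional masters would join with `kubeadm join --control-plane`:

```yaml
# kubeadm ClusterConfiguration for an HA control plane (sketch, values are placeholders)
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.0
# a load balancer or DNS name sitting in front of all API servers
controlPlaneEndpoint: "k8s-api.example.com:6443"
etcd:
  local:
    dataDir: /var/lib/etcd   # stacked etcd on every master; keep an odd number of them
```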
But what if our cluster is located in a single data center? Imagine a mechanical shovel cuts our cable, a lightning bolt strikes the data center, or climate change brings on a second Genesis flood? We are doomed and our cluster vanishes. What magic tricks will our fairy have for such scenarios?
First and foremost, keep one more cluster as a failover so that you can switch to it at any moment. The infrastructures of both clusters must be identical: all non-standard plugins for file system support, custom ingress solutions, and so on will have to be carbon-copied to two or more clusters, depending on your budget and DevOps capacity. It is important, however, to clearly define two sets of all applications (deployments, statefulsets, daemonsets, cronjobs, etc.) and specify which of them will run continuously in standby mode, and which will be deployed only after you switch to the failover cluster.
So, here is the question: will our failover cluster be identical to our production cluster? No, it will not. I think our previous routine of copying everything for monolithic projects and hardware infrastructure does not work in K8s. A one-size-fits-all approach won't work, and here's why.
Let's start with the basic K8s entities.
- Deployments will be identical. Applications that simply accept incoming traffic will be running all the time.
- As for the config files, we will decide on a case-by-case basis. It's better not to keep a database in K8s; store the access details for the working DB in ConfigMaps (the failover process for the working DB, by the way, will be developed separately). Accordingly, we need a separate ConfigMap that provides access to the failover DB.
- The same applies to secrets, i.e., passwords to access the DB and API keys. Either a production or a failover secret must be in effect at any one time. So here we have two K8s entities whose failover copies are not identical to their production copies.
- Cronjobs are the third. Never will the cronjobs on the failover cluster mirror the set of production cronjobs! Let's look at an example: if we deploy a failover cluster with all cronjobs enabled, our clients will receive, for instance, two email notifications instead of one. In short, any synchronization with external sources would be performed twice. No one wants that, right? (See the sketch after this list.)
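To make that split concrete, here is a rough sketch of how it might look in plain manifests. The ConfigMap keeps the same name in both clusters but points at different database hosts, and the failover cluster's cronjob is created suspended; every name and host below is invented for illustration:

```yaml
# production cluster: app-config points at the production DB
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DB_HOST: db-prod.example.com
---
# failover cluster: same ConfigMap name, different contents
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DB_HOST: db-failover.example.com
---
# failover cluster: the cronjob exists but stays dormant until we actually switch
apiVersion: batch/v1
kind: CronJob
metadata:
  name: send-notifications
spec:
  schedule: "*/10 * * * *"
  suspend: true              # flip to false only after failing over
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: notifier
              image: registry.example.com/notifier:latest
```

After switching, you set `suspend: false` on the failover side, so every notification still goes out exactly once.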
How does the Internet recommend organizing a failover cluster? The second most popular answer to the "failover cluster in K8s" question is to use Kubernetes Federation.
What is Kubernetes Federation? Let's call it a big meta-cluster. Remember the K8s architecture with its master and several nodes? Well, every node in Kubernetes Federation is a separate cluster. As in K8s, in K8s Federation we work with the same entities and primitives, but we juggle separate clusters, not machines. K8s Federation allows you to sync resources across multiple clusters. You can be sure that every deployment in the K8s Federation will exist in every cluster. Plus, the Federation allows you to customize resources wherever necessary: if we change a deployed configmap or secret in one cluster, the other clusters will stay unaffected.
K8s Federation is a pretty young tool that doesn't support the entire set of K8s resources. When the first version of the documentation saw the light, it claimed to support only configmaps, ReplicaSets, deployments, and ingresses, excluding secrets and volumes. It is indeed a very limited set of resources, especially if you like to have fun and pass your own resources to K8s via custom resource definitions, for example. On the bright side, K8s Federation provides flexibility in managing replicas. If, for example, we want to run ten replicas of our application, K8s Federation will by default divide this number proportionally among the clusters. And, the good news is that we can still configure it: you can specify that your production cluster should keep six replicas of the application, with the remaining four on the failover cluster, to save resources or to experiment. Although that is quite convenient, we still have to search for new solutions, adjust deployments, etc.
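Federation has since been reworked into KubeFed (Federation v2), where that six-versus-four split is expressed with a ReplicaSchedulingPreference. A rough sketch under that assumption; `production` and `failover` stand for whatever member clusters you registered, and a FederatedDeployment with the same name is assumed to exist:

```yaml
# KubeFed sketch: spread 10 replicas 6/4 between two member clusters
apiVersion: scheduling.kubefed.io/v1alpha1
kind: ReplicaSchedulingPreference
metadata:
  name: app            # must match the FederatedDeployment it schedules
  namespace: default
spec:
  targetKind: FederatedDeployment
  totalReplicas: 10
  clusters:
    production:
      weight: 6        # roughly six replicas land here
    failover:
      weight: 4        # and four here
```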
Is there a simpler way to approach the failover process in K8s? What helpful tools do we have?
First, we always have some kind of CI/CD system that generates the yamls for our containers, so we don't have to create and apply them manually on our servers.
Second, there are several clusters, as well as (if we're smart enough) a few registries that we've backed up too. Not to mention the wonderful kubectl, which can work with multiple clusters simultaneously.
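All it takes is one kubeconfig with a context per cluster. A trimmed-down sketch; server addresses and credentials are placeholders:

```yaml
# ~/.kube/config holding both clusters (sketch)
apiVersion: v1
kind: Config
clusters:
  - name: production
    cluster:
      server: https://k8s-prod.example.com:6443
  - name: failover
    cluster:
      server: https://k8s-failover.example.com:6443
users:
  - name: deployer
    user:
      token: REPLACE_ME
contexts:
  - name: production
    context:
      cluster: production
      user: deployer
  - name: failover
    context:
      cluster: failover
      user: deployer
current-context: production
```

From there, `kubectl --context failover apply -f ...` targets the second cluster without touching the first.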
So, in my opinion, the simplest and smartest way to create a failover cluster is a primitive parallel deployment. In the CI/CD pipeline we first build containers, then test, and then roll out the applications via kubectl to several independent clusters in parallel. At this stage, we can also set per-cluster deployment configurations: define one set of configurations for the production cluster and another for the failover cluster, and let the CI/CD system roll out the production environment to the production cluster and the failover environment to the failover cluster. Unlike with K8s Federation, we don't need to check and re-define resources in separate clusters after each deployment; it has already been done. We can be proud of ourselves.
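As an illustration, such a pipeline could look roughly like this in GitLab CI syntax (job names, registry, and manifest paths are invented; any CI system that can run two deploy jobs in parallel works the same way):

```yaml
# .gitlab-ci.yml (sketch)
stages:
  - build
  - test
  - deploy

build:
  stage: build
  script:
    - docker build -t registry.example.com/app:$CI_COMMIT_SHA .
    - docker push registry.example.com/app:$CI_COMMIT_SHA

test:
  stage: test
  script:
    - ./run-tests.sh

deploy-production:
  stage: deploy
  script:
    - kubectl --context production apply -f manifests/production/

deploy-failover:
  stage: deploy
  script:
    - kubectl --context failover apply -f manifests/failover/
```

The two deploy jobs share a stage, so both clusters receive their own set of manifests on every run.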
There are, however, two serious concerns. The file system is the first one. Usually, we have either a persistent volume or an external storage service. If we store our files in a PV inside the cluster, we had better use good old lsyncd (or plain rsync), or whatever tool we prefer, to sync the files. Roll it out to all machines and prosper!
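If you would rather keep that synchronization inside the cluster, one possible variation is a CronJob in the production cluster that pushes the PV contents to the failover side with rsync over SSH. This is only a sketch: the image, host, and paths are placeholders, and the SSH keys are assumed to be mounted separately:

```yaml
# periodic one-way sync of a PVC to the failover cluster (sketch)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pv-sync
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: rsync
              image: example/rsync-ssh:latest   # any image that ships rsync and an ssh client
              command: ["rsync", "-az", "--delete", "/data/", "sync@failover.example.com:/data/"]
              volumeMounts:
                - name: shared-files
                  mountPath: /data
          volumes:
            - name: shared-files
              persistentVolumeClaim:
                claimName: shared-files
```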
The second obstacle is our database. Again, good guys do not keep their database in K8s. Data failover, in that case, is organized the old-fashioned way: master-slave replication followed by promoting the replica to master. Finally, verify that the copy is up and running, and go dancing! If, however, we keep our DB within the cluster, there are plenty of ready-made solutions for deploying a DB inside K8s that set up the same master-slave replication.
There are gazillions of presentations, posts, and books about DB failover, and I can add nothing new here. Just one piece of advice: follow your dreams, develop your own complicated hacks, but please, please think through your failover scenarios.
Now, let's dig into the very process of switching to the failover site in case of the Apocalypse.
First of all, we deploy our stateless applications to both clusters simultaneously. They do not affect the business logic of our applications or our project, so we can keep two sets of them running at all times, and they can start balancing the traffic right away.
Second of all, we will decide whether we need to reconfigure our slave replicas. Let's imagine we have a production and a failover cluster in K8s, plus an external master database and a failover master database. There are three potential scenarios for how these pieces may have to interact in production:
- The DB may switch over and we will have to switch traffic from production to failover DB.
- Our cluster may fail and we will have to switch to the failover cluster while continuing to work with the production DB.
- Finally, both the production cluster and the production DB may shut down, in which case we switch to the failover cluster and the failover DB, and then redefine our configs so that our applications work with the new DB.
What conclusions can we draw from all this?
First and foremost, to live with a failover setup is to live happily. It's expensive, though. Ideally, there should be more than one failover site: one located in a different data center, and another with a different hosting provider. Believe me, I found this out the hard way. Once there was a fire in a data center and I suggested switching to the failover. Unfortunately, the failover servers were located in the very same spot.
The second and last conclusion: if your application in K8s connects to external sources (a database or some external API), define them as a Service with an external Endpoints object. That way, when you switch databases, you won't have to redeploy the dozens of applications that reference that same database. Define your database as a separate service and use it as if it were inside the cluster. If the DB fails, you will only need to change the IP in one place, and continue to live long and prosper.
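This is the standard "Service without a selector" pattern: the Service takes its backends from a manually managed Endpoints object of the same name. A minimal sketch with a placeholder address and port:

```yaml
# in-cluster name for an external database
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  ports:
    - port: 5432
      targetPort: 5432
---
apiVersion: v1
kind: Endpoints
metadata:
  name: db              # must match the Service name
subsets:
  - addresses:
      - ip: 10.0.0.10   # the external production DB; swap this on failover
    ports:
      - port: 5432
```

To fail over, you edit that single IP in the Endpoints object; every application keeps connecting to `db` as before.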