In 2015, only 10% of us were using a container orchestration solution (Kubernetes or one of its competitors). By 2017, 71% of us were using Kubernetes, according to a survey conducted by 451 Research. More and more teams opted for Kubernetes because of its efficiency and cross-cloud integration.
As its popularity grew, more and more teams started to look for lessons learned and best practices for deploying and running applications on Kubernetes with confidence. This article summarizes my experiences, in the hope that you can pick up a few new tricks from it.
I am going to present a very similar topic at NodeSummit in a few weeks - if you’d like to discuss these points in person, find me at the conference!
Before putting Kubernetes into production, it is essential to understand its concepts. If you are new to Kubernetes, stop reading this article for a few minutes and check out these resources:
Once you have tried running Kubernetes locally, it is time to better understand its internals and set up a cluster in the cloud. To do so, I’d recommend doing Kelsey Hightower’s Kubernetes The Hard Way. It walks you through all the prerequisites, then shows how to bootstrap the etcd cluster, the Kubernetes control plane, and the worker nodes, how to set up DNS, and finally how to run smoke tests.
While you might not want to do that for your production cluster, I found it useful for better understanding how the components of Kubernetes work together. Whichever cloud vendor you picked, it most probably offers a managed Kubernetes service. For a comparison, check out this or this article.
Once you have your production Kubernetes cluster up and running, it is time to create the production version of your images. By production images, I mean images that follow Docker best practices, including:
Pods - the smallest computing units in Kubernetes - can have the following states:
These states are determined using probes. Kubernetes defines both liveness and readiness probes. Liveness probes tell Kubernetes when a container has to be restarted, while readiness probes determine whether a container can serve traffic.
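To make the difference between the two probes concrete, here is a minimal sketch of a request handler that answers both. The `/healthz` and `/readyz` paths and the `state` object are assumptions I made for this example, not something Kubernetes mandates; you would point the `httpGet` path of your `livenessProbe` and `readinessProbe` at whatever paths you choose.

```javascript
// Builds a request handler that answers the two probe endpoints.
// `state.ready` is a hypothetical flag your app sets once it can serve traffic.
function createProbeHandler(state) {
  return function probeHandler(req, res) {
    if (req.url === '/healthz') {
      // Liveness: the process is up and the event loop is responsive.
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('ok');
    } else if (req.url === '/readyz') {
      // Readiness: report 503 until dependencies (database, caches) are available.
      res.writeHead(state.ready ? 200 : 503, { 'Content-Type': 'text/plain' });
      res.end(state.ready ? 'ready' : 'not ready');
    } else {
      res.writeHead(404);
      res.end();
    }
  };
}

// Typical wiring (commented out so loading this file has no side effects):
// const http = require('node:http');
// const state = { ready: false };
// http.createServer(createProbeHandler(state)).listen(3000, () => { state.ready = true; });
```

A failing readiness check only takes the pod out of the load balancing rotation, while a failing liveness check gets the container restarted, so it pays to keep the liveness endpoint as cheap and dependency-free as possible.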
To make sure you don’t fail any requests, we’ve open-sourced a tiny library that helps you implement these checks and makes sure that your application shuts down gracefully. The library is called terminus, and it extends your Node.js applications with health checks and graceful shutdown procedures.
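To illustrate what a graceful shutdown involves, here is a sketch that uses only Node.js built-ins. It is not the terminus API, just the kind of sequence such a library automates: when Kubernetes sends `SIGTERM`, start failing the readiness check, wait a little so traffic stops arriving, then close the server. The names `gracefulShutdown`, `state`, `drainDelayMs`, and `onClosed` are hypothetical.

```javascript
// Begins a graceful shutdown: fail readiness first, then stop the server.
// `server` is anything with an http.Server-style close(callback) method.
function gracefulShutdown(server, state, { drainDelayMs = 5000, onClosed = () => {} } = {}) {
  if (state.shuttingDown) return; // ignore repeated signals
  state.shuttingDown = true;
  state.ready = false; // the readiness probe starts failing from now on

  // Give Kubernetes time to notice and stop routing new requests to this pod.
  setTimeout(() => {
    // close() stops accepting new connections; the callback fires
    // once the existing connections have ended.
    server.close(onClosed);
  }, drainDelayMs);
}

// Typical wiring (commented out so loading this file has no side effects):
// process.on('SIGTERM', () =>
//   gracefulShutdown(server, state, { onClosed: () => process.exit(0) }));
```

The delay is an assumption you should tune: it needs to be longer than the interval at which your readiness probe is checked, and shorter than the pod’s termination grace period.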
Helm is the package manager for Kubernetes. It helps you manage Kubernetes applications: Helm Charts help you define, install, and upgrade them. In the Helm universe, you find three big concepts:
- Chart is a Helm package - it contains all the resource definitions an application needs,
- Repository is the place where Helm Charts live - you can think of it as the npm or Maven registry,
- Release is a running instance of a Helm Chart.
With Helm, you can add MySQL to your Kubernetes cluster as simply as:
To learn more about Helm, I’d recommend checking out the following resources:
As Kubernetes is managed through a REST API, securing that interface should be your top priority.
To secure it, you can:
For a more comprehensive list, check out the official security guidelines.
When creating Pods or Deployments in Kubernetes, you can optionally define how much CPU or memory each container can use. One of the simplest scenarios for limiting resources a container can use is the following:
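As a rough sketch of such a definition, here is a Pod manifest written as a JavaScript object and printed as JSON (kubectl accepts JSON manifests as well as YAML). The pod name, the image, and the concrete values are placeholders I picked for illustration.

```javascript
// A Pod manifest with resource requests and limits for a single container.
// 'example-app' and 'example/app:1.0.0' are placeholder names.
const podManifest = {
  apiVersion: 'v1',
  kind: 'Pod',
  metadata: { name: 'example-app' },
  spec: {
    containers: [
      {
        name: 'app',
        image: 'example/app:1.0.0',
        resources: {
          // What the scheduler reserves for the container on a node.
          requests: { cpu: '250m', memory: '64Mi' },
          // Upper bound: CPU above the limit is throttled,
          // memory above the limit gets the container killed.
          limits: { cpu: '500m', memory: '128Mi' },
        },
      },
    ],
  },
};

// Print the manifest so it can be piped to kubectl.
console.log(JSON.stringify(podManifest, null, 2));
```

Assuming you saved this as `pod.js`, you could apply it with `node pod.js | kubectl apply -f -`.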
Limits and requests for CPU resources are measured in CPU units. Fractional requests are allowed. One CPU, in Kubernetes, is equivalent to:
Limits and requests for memory are measured in bytes.
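In manifests these quantities are usually written with suffixes, such as `500m` (500 millicpu, or half a CPU) or `128Mi` (128 mebibytes). The helper below is a small sketch of how those strings map to plain numbers. It handles only a subset of the suffixes Kubernetes accepts, and the function names are my own.

```javascript
// Multipliers for a subset of the memory suffixes Kubernetes accepts.
const MEMORY_SUFFIXES = {
  Ki: 2 ** 10, Mi: 2 ** 20, Gi: 2 ** 30, Ti: 2 ** 40, // power-of-two units
  k: 1e3, M: 1e6, G: 1e9, T: 1e12,                    // power-of-ten units
};

// Converts a CPU quantity such as "500m" or "1.5" to a number of CPUs.
function parseCpu(quantity) {
  const q = String(quantity);
  return q.endsWith('m') ? Number(q.slice(0, -1)) / 1000 : Number(q);
}

// Converts a memory quantity such as "128Mi" or "1G" to bytes.
function parseMemory(quantity) {
  const match = /^(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|k|M|G|T)?$/.exec(String(quantity));
  if (!match) throw new Error(`Unsupported memory quantity: ${quantity}`);
  const [, value, suffix] = match;
  return Number(value) * (suffix ? MEMORY_SUFFIXES[suffix] : 1);
}

console.log(parseCpu('500m'));     // 0.5
console.log(parseMemory('128Mi')); // 134217728
console.log(parseMemory('1G'));    // 1000000000
```

Note the difference between `M` and `Mi`: `128M` is 128,000,000 bytes, while `128Mi` is 134,217,728 bytes.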
In my experience, the sooner you start using resource requests and limits, the better off you will be, because:
- You can move CPU/memory intensive applications to dedicated node pools.
- With node pools, you eliminate noisy neighbors (if you don’t have CPU limits, a container may use up all the CPU resources a node has).
- It enables you to scale your cluster—you will know how much traffic a pod can serve with the given resources.
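The last point comes down to simple arithmetic once requests are in place. Here is a back-of-the-envelope sketch with made-up numbers: it assumes you have measured how many requests per second one pod can serve with its requested resources, and it ignores the capacity a node reserves for system components.

```javascript
// Number of replicas needed to serve a target load, rounded up.
function replicasNeeded(targetRps, rpsPerPod) {
  return Math.ceil(targetRps / rpsPerPod);
}

// How many pods fit on one node, limited by whichever resource runs out first.
function podsPerNode(node, request) {
  return Math.floor(Math.min(node.cpu / request.cpu, node.memoryMi / request.memoryMi));
}

// Hypothetical numbers: each pod requests 0.25 CPU and 256Mi and serves ~200 req/s;
// each node has 4 CPUs and 16384Mi of memory.
const replicas = replicasNeeded(5000, 200);
const perNode = podsPerNode({ cpu: 4, memoryMi: 16384 }, { cpu: 0.25, memoryMi: 256 });
const nodes = Math.ceil(replicas / perNode);

console.log({ replicas, perNode, nodes }); // { replicas: 25, perNode: 16, nodes: 2 }
```

In this example CPU is the bottleneck: memory alone would allow 64 pods per node, but the CPU requests cap it at 16.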
Disaster recovery is usually a set of policies and procedures for what to do once a disaster hits mission-critical system components. It can (but doesn’t necessarily) mean data loss too.
When it comes to Kubernetes, it’s good practice to back up your cluster regularly. However, frequent backups alone are not enough: in my experience, these backups are only valuable if you know how to use them. I’d recommend scheduling practice runs in which you restore your whole service from the ground up.
For Kubernetes, I’d recommend a tool called Ark (since renamed Velero) to create and restore these backups.
Ark gives you tools to back up and restore your Kubernetes cluster resources and persistent volumes.
Ark lets you:
- Take backups of your cluster and restore in case of loss.
- Copy cluster resources across cloud providers.
- Replicate your production environment for development and testing environments.
Do you see anything missing from the list? What’s your process to make sure your cluster and your applications are ready to serve production traffic? Please let me know in the comments below!