Should the Kubernetes cluster have a backup?

Sebastian Kiljan

Aug 5, 2024

As a rule of thumb, any data that has value and must be preserved for business, technical, or other reasons should have a backup. Consistent backups are critical for recovering from disasters such as human error or software and hardware failures.

Kubernetes as a platform adds considerable complexity to the traditional way of doing backups. An important missing piece is a reliable, native way to take consistent backups at the container level. One reason is that Kubernetes workloads, by default, have no stopped state that could be used for a consistent backup. A workload runs either with one or more pod replicas or with zero replicas, which means the workload does not exist at all. Without an external, custom coordination process, traditional backup methods are therefore not usable. Over the years there have been several attempts to create native container backups at the operating system level, but they are either not mature enough for production or still experimental.
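To illustrate what such an external coordination process could look like, here is a minimal Python sketch. It assumes a Deployment that can tolerate downtime and a `kubectl` binary configured for the cluster. The function name `backup_while_stopped` and the `backup` callback are hypothetical and only serve the example. It records the replica count, scales the workload to zero, runs the backup, and restores the replica count even if the backup fails.

```python
import json
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run(argv: List[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def backup_while_stopped(namespace: str, deployment: str,
                         backup: Callable[[], None],
                         runner: Runner = run) -> int:
    """Scale a Deployment to zero, run `backup`, then restore the replicas.

    Returns the replica count that was restored.
    """
    target = f"deployment/{deployment}"
    # Remember the current replica count so it can be restored afterwards.
    out = runner(["kubectl", "get", target, "-n", namespace, "-o", "json"])
    replicas = int(json.loads(out)["spec"].get("replicas", 1))

    # Emulate the missing "stopped" state by scaling to zero replicas.
    # A real script should also wait until all pods have terminated.
    runner(["kubectl", "scale", target, "-n", namespace, "--replicas=0"])
    try:
        backup()  # hypothetical callback: copy or snapshot the volume data
    finally:
        # Always bring the workload back, even when the backup raised.
        runner(["kubectl", "scale", target, "-n", namespace,
                f"--replicas={replicas}"])
    return replicas
```

The `runner` parameter exists only so the sequence can be exercised without a cluster. The obvious cost of this approach is downtime, which is one reason the options described below are usually preferred.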

Even without native container backups, Kubernetes offers several other options that provide efficient, reliable, and consistent backups of any data that needs to be preserved.

The most important aspect of backup on Kubernetes is understanding what kind of data needs to be backed up and what must be done to achieve consistency.

Methodologies like GitOps and Infrastructure as Code, which are widely used in Kubernetes deployments, require all configuration to be stored in a Git repository. In these deployments the Git repository is treated as the single source of truth, and all changes are applied to the Kubernetes cluster through it. This approach simplifies the question of what exactly needs to be preserved: the backup can be limited to the Git repository, which contains everything needed to recreate the Kubernetes workloads from scratch. Tools like ArgoCD or Flux do the heavy lifting and lower the barrier to entry for operators and users. In this scenario the most important task is to back up the repositories that contain all the data. Recovery is fairly straightforward and depends on the scale of the disaster, but a basic recovery may require bootstrapping ArgoCD manually and then letting ArgoCD recreate the applications on the Kubernetes cluster. This option supports only stateless workloads: it backs up Kubernetes objects, not the data on volumes.
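Backing up the Git repository itself can be as simple as keeping a mirror of it somewhere else. The following Python sketch, offered as one possible approach rather than a prescribed one, builds the `git` commands for a mirror clone plus a single-file bundle that is easy to ship to object storage. The function names, repository URL, and paths are illustrative assumptions.

```python
import subprocess
from pathlib import Path
from typing import List


def mirror_backup_commands(repo_url: str, backup_dir: str,
                           name: str) -> List[List[str]]:
    """Return the git commands for a full backup of one repository."""
    mirror = str(Path(backup_dir) / f"{name}.git")
    bundle = str(Path(backup_dir) / f"{name}.bundle")
    return [
        # A mirror clone copies every ref (branches, tags, notes).
        ["git", "clone", "--mirror", repo_url, mirror],
        # Pack all refs into one file that can be archived off-site.
        ["git", "--git-dir", mirror, "bundle", "create", bundle, "--all"],
        # Check that the bundle is valid before relying on it.
        ["git", "--git-dir", mirror, "bundle", "verify", bundle],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Execute the commands in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_all(mirror_backup_commands("https://git.example.com/platform/gitops.git", "/backups", "gitops"))` would perform the backup, with the URL and target directory being placeholders. For an existing mirror, a scheduled `git remote update` inside it would refresh the copy instead of cloning again.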

Over the years, Kubernetes deployments have evolved from purely stateless ones to mixed ones that include stateful workloads such as databases or distributed storage.

Flexibility is one of the main strengths of Kubernetes. It introduces user-programmable controllers, known as operators, and custom resource definitions, which together make reliable stateful workloads possible.

Operators and custom resource definitions make it possible to create an easy-to-use interface even for complex software like databases or distributed storage, and to reliably automate common operational tasks. Operators usually gain new functionality over time, such as replication or backup, but this depends entirely on the software they manage and on how much effort has gone into developing them. Operators for databases usually provide an efficient way to take consistent backups and store them in object storage. They also offer easy ways to recover databases or to provision new ones from existing backups. Functionality and maturity can differ between operator implementations, so they always need to be checked and tested before being relied on for production backups.

When the previous scenarios do not cover the backup needs, the only remaining option is a custom backup for Kubernetes.

In the past, the most common approach was to build a custom in-house backup solution for a specific kind of application. This approach has several limitations, including a lack of portability and scalability or, even worse, backups that are overlooked and missing.

Velero by VMware is a leading open source solution for Kubernetes backups. It is easy for operators and users to work with, and over the years it has matured to production grade. Besides basic operations like backup and restore of Kubernetes objects, Velero supports backing up volumes using either file system backup or snapshots, depending on the hosting environment. Its flexible configuration includes hooks, which can be used to run application-specific commands so that the backup is consistent.
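As a rough illustration of the hook mechanism, the Python sketch below builds the pod annotations that Velero reads for pre- and post-backup hooks, together with a basic `velero backup create` command. The helper names, the container name `app`, the `/data` mount path, and the use of `fsfreeze` are assumptions chosen for the example. A database would more likely run its own flush or dump command here.

```python
import json
from typing import Dict, List


def velero_hook_annotations(container: str, pre: List[str],
                            post: List[str]) -> Dict[str, str]:
    """Build pod annotations for Velero pre- and post-backup hooks.

    Velero expects each command as a JSON array encoded in a string.
    """
    return {
        "pre.hook.backup.velero.io/container": container,
        "pre.hook.backup.velero.io/command": json.dumps(pre),
        "post.hook.backup.velero.io/container": container,
        "post.hook.backup.velero.io/command": json.dumps(post),
    }


def velero_backup_command(name: str, namespace: str) -> List[str]:
    """Build the CLI call that backs up a single namespace."""
    return ["velero", "backup", "create", name,
            "--include-namespaces", namespace]


# Example: freeze the file system before the backup, unfreeze it afterwards.
annotations = velero_hook_annotations(
    container="app",  # hypothetical container name
    pre=["/sbin/fsfreeze", "--freeze", "/data"],
    post=["/sbin/fsfreeze", "--unfreeze", "/data"],
)
print(json.dumps({"metadata": {"annotations": annotations}}, indent=2))
print(" ".join(velero_backup_command("nightly", "shop")))
```

The printed annotations would go into the pod template of the workload, so that every pod Velero backs up carries the hooks. Whether freezing the file system is enough for consistency depends on the application and should be tested with a restore.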