Data Migration in Kubernetes: A DevOps Guide

Migrating data in Kubernetes can be challenging, but with the right approach, it doesn’t have to be. Whether you’re moving data within a namespace, between clusters, or even restructuring your setup, various tools like CSI Clones, SQL dumps, Restic, Kopia, and Velero can help. This guide breaks down different migration strategies, highlighting their pros, cons, and best practices to ensure a smooth and efficient transition.

Jakub Piasek

Feb 3, 2025

10 min.


Kubernetes (k8s) has become a very popular orchestrator in many companies over the last few years. Because it grew so fast, setups were rarely standardized early on, and many companies now face issues when they need to reorganize their clusters to meet global policies and bring consistent standards across projects. During that process, it’s very likely that some applications need to be migrated, along with the data they use. Let’s be real - most of our manifests are defined in code and synced with a GitOps tool or some other pipeline, but data? That’s always an issue… Whether you're shifting data within a cluster or between clusters, there are several ways (of course, many more than described in this article) to move it, depending on the size, limitations, and type of data you’re moving. Let’s jump into the topic and see which tools and methods may be helpful depending on your scenario.

0. Moving Data Within a Kubernetes Namespace

  1. CSI Clone

This is a rather specific scenario: you need to create a new PVC in the same namespace as an existing one. In my experience, the most popular use case was changing the storage class of a PVC (and during the migration you can also resize the volume if you need to add some additional GBs). Of course, shrinking a volume is not possible that way. It can also be useful when you want to deploy a second instance of an application with the same dataset to compare settings/behavior, e.g. under different load. The whole process is quick and simple - all you need to do is deploy a single Kubernetes object - more can be found in the official Kubernetes documentation (link here: CSI Volume Cloning).
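
For illustration, here’s a minimal sketch of such a clone object - the PVC names, storage class, and size below are placeholders, and whether a different storage class than the source is accepted depends on your CSI driver:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc                # hypothetical name of the new PVC
spec:
  storageClassName: fast-ssd      # assumed target storage class; cross-class cloning support varies per driver
  dataSource:
    kind: PersistentVolumeClaim
    name: existing-pvc            # the source PVC, must be in the same namespace
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi               # must be equal to or larger than the source volume
EOF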

Pluses: quick (even for large volumes), simple, allows you to change the storage class and enlarge the PVC.
Minuses: works only within a single namespace.

1. Moving SQL-type data

From here on, we will split the topic in two, as the approach differs between migrating ‘SQL’-type data and ‘non-SQL’-type data.

1. SQL Type Migration – The Traditional Way

For databases inside Kubernetes, you can migrate data using tried-and-true SQL methods:

  1. Dump the database using mysqldump (MySQL) or pg_dump (PostgreSQL).

  2. Upload the dump to an artifact repository like Artifactory or cloud storage (S3, GCS) - usually you can do this with curl/wget. If curl/wget is not present, save the dump inside the database volume and deploy an additional pod that mounts the same PVC. This is possible even with a ‘ReadWriteOnce’ volume - just make sure the new pod runs on the same worker node as the database pod and the PVC will mount without issues. Then execute curl/wget from that pod to upload the dump somewhere you can easily download it later (a sample helper pod is sketched after this list).

  3. Hint: this is also a good way to back up your data - a CronJob can run periodically, create a database dump, and upload it with a timestamp (a minimal CronJob sketch follows the example commands below).

  4. Download and restore the dump in the target environment - if curl/wget is missing there, do the same as during backup and deploy an additional pod that can be removed after a successful migration.
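
Here is a rough sketch of the helper pod mentioned in steps 2 and 4 - the pod, node, and PVC names are placeholders, and the image choice is just one option (curl is installed on the fly, since the base image doesn’t ship it):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: pvc-helper                # hypothetical, temporary pod
spec:
  nodeName: worker-1              # the node where the database pod runs (needed for ReadWriteOnce volumes)
  restartPolicy: Never
  containers:
    - name: uploader
      image: alpine:3.19          # any small image will do
      command: ["sleep", "86400"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-db-pvc      # the PVC already used by the database
EOF
# upload the dump stored on the shared volume, then remove the pod
kubectl exec pvc-helper -- sh -c "apk add --no-cache curl && curl --upload-file /data/backup.sql https://your.registry.com/data-example/migration/backup.sql"
kubectl delete pod pvc-helper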

Example Commands for MySQL Migration

Dump database my_database:
mysqldump -u root -p my_database > backup.sql
Upload the backup file to a registry (e.g. Nexus or Artifactory):
curl --upload-file backup.sql https://your.registry.com/data-example/migration/backup.sql
Download the backup file in the new database location:
curl -O https://your.registry.com/data-example/migration/backup.sql
Restore the database from the backup into new_database:
mysql -u root -p new_database < backup.sql
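
And for the backup hint above, a minimal CronJob sketch - the schedule, image, secret, and registry path are all assumptions to adapt to your setup (the image must contain both mysqldump and curl):

kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-dump                       # hypothetical name
spec:
  schedule: "0 2 * * *"               # every night at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: dump
              image: your.registry.com/tools/mysql-client-curl:latest   # assumed image with mysqldump and curl
              command: ["/bin/sh", "-c"]
              args:
                - |
                  mysqldump -h my-db -u root -p"$MYSQL_ROOT_PASSWORD" my_database > /tmp/backup.sql
                  curl --upload-file /tmp/backup.sql https://your.registry.com/data-example/backups/backup-$(date +%F).sql
              env:
                - name: MYSQL_ROOT_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: my-db-secret    # hypothetical secret holding the DB password
                      key: password
EOF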

Pluses: SQL dumps provide consistency and good traceability (if you upload them to the registry with proper naming), can double as a backup method, are quick (SQL dumps are usually small), and work between clusters - the same dump can even be restored to a ‘non-k8s’ SQL instance.
Minuses: sometimes requires deploying additional resources.

2. Moving non-SQL-type data

Here I think it’s important to prepare really well for the migration. Firstly, I’d strongly advise checking what kind of data you actually store in the PVC. It sometimes happens that the PVC also holds additional data backups that are no longer needed - nobody ever disabled them - so your volume is 20% real user data and 80% backups that are unnecessary because you already have external backups in place. In that case every GB counts, and it’s important to make sure you don’t move useless data.

a. Let’s start with something simple

There are applications - dashboards, uptime monitors, portfolio/demo websites - that generally contain static, rarely changing content. They provide a GUI to upload data and do configuration, and the data then lands on a volume, but is this really the best approach? In my opinion, no. If you have that kind of application, I’d strongly consider switching to a ‘stateless’ version. If you don’t need to store many heavy images or videos, just commit them directly to your repository and copy them into the Docker container during the pipeline build. If you need to store heavier (but still constant) data, create a dedicated place in the registry for it, then download the files within the pipeline and copy them into the Docker image. With that, you get a ready-to-use container that isn’t attached to any volume. Then just keep your application manifests in code and deploy them with a pipeline or e.g. Argo CD - with that setup, migrating these applications becomes simple, fast, and smooth. But of course, it’s not always possible.
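
A rough sketch of that idea (paths, registry URL, and image are assumptions): fetch the static content during the pipeline build and bake it into the image, e.g. for a static site served by nginx:

# fetched earlier in the pipeline, e.g.:
#   curl -o site.tar.gz https://your.registry.com/static-assets/site.tar.gz && mkdir site && tar -xzf site.tar.gz -C site
FROM nginx:alpine
COPY ./site /usr/share/nginx/html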

b. Data Migration Using Restic/Kopia

To move actual (large) data, Restic and Kopia are great choices, but any other backup/restore tool will do the job:

Example: Backing Up Data with Restic

restic -r s3:s3.amazonaws.com/bucket-name backup /data

Example: Restoring Data with Restic

restic -r s3:s3.amazonaws.com/bucket-name restore latest --target /restore-location
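
Note that Restic also needs a repository password and, for S3, credentials in the environment, plus a one-time repository initialization - something along these lines (values are placeholders):

export AWS_ACCESS_KEY_ID=<access-key>
export AWS_SECRET_ACCESS_KEY=<secret-key>
export RESTIC_PASSWORD=<repository-password>
restic -r s3:s3.amazonaws.com/bucket-name init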

Of course, there are many other backup/restore tools you can use - this is just an example - but the idea is the same: create a backup and restore it in the new location.

Pluses: simple, easy to automate with a script, works well for large volumes.
Minuses: usually slow (the more data you have, the longer it takes), and there can be issues with file/directory permissions, as some of them may get overwritten during the restore.

c. Full Kubernetes Object Migration with Velero

Finally, there is a tool that can cover all the points mentioned earlier and simply move a whole Kubernetes namespace from one place to another. Velero is a powerful backup and restore tool that helps migrate entire Kubernetes environments. It supports:

  • Backing up Persistent Volumes, ConfigMaps, Secrets, and Deployments.

  • Restoring data in a new cluster or namespace.

  • Cloud integration with AWS S3, GCS, and other storage backends.
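
Setting Velero up is mostly a one-time install pointing it at an object-storage bucket. A rough sketch for AWS S3 (bucket, region, plugin version, and credentials file are assumptions; the node-agent flags enable file-system backups of volume data on newer Velero versions):

velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0 \
  --bucket my-velero-bucket \
  --secret-file ./credentials-velero \
  --backup-location-config region=eu-central-1 \
  --use-node-agent \
  --default-volumes-to-fs-backup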

Example: Backing up a Cluster with Velero

velero backup create my-cluster-backup --include-namespaces=my-namespace --storage-location=my-storage

Example: Restoring a Backup with Velero

velero restore create --from-backup my-cluster-backup
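
If the target namespace name differs from the source, Velero can also remap it during the restore (namespace names here are placeholders):

velero restore create --from-backup my-cluster-backup --namespace-mappings my-namespace:my-new-namespace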

Pluses: migrates everything, effective and reliable.
Minuses: not trivial to configure, and if you use a GitOps tool you’ll still need it to sync the applications in the new location.

3. Key Considerations to Avoid Migration Nightmares

When planning a migration, keep these potential risks in mind:

a. User Roles and Privileges

  • Ensure that appropriate RBAC permissions are in place (different clusters or namespaces may require additional RBAC setups or raising requests in your company’s IT systems).

  • Verify that service accounts have the necessary access in the target cluster (e.g. whether deployments can pull the required images or need a Docker config secret).
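
For the image-pull case, a quick sketch (registry, credentials, and namespace are placeholders): create a Docker registry secret and attach it to the service account the workloads use:

kubectl create secret docker-registry regcred \
  --docker-server=your.registry.com \
  --docker-username=<user> \
  --docker-password=<password> \
  -n my-namespace
kubectl patch serviceaccount default -n my-namespace \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'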

b. Firewall and Network Policies

  • Open the required ports to allow smooth data transfers (and make sure the app in its new location can reach all necessary services - e.g. SMTP, LDAP/AAD/Entra ID).

  • Update Kubernetes network policies to permit necessary traffic.
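
For example, a minimal egress rule allowing the migrated app to reach an external mail server (labels, CIDR, and port are placeholders for your environment):

kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-smtp-egress           # hypothetical policy name
  namespace: my-namespace
spec:
  podSelector:
    matchLabels:
      app: my-app                   # hypothetical app label
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.0.0/24       # example subnet of the SMTP server
      ports:
        - protocol: TCP
          port: 587
EOF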

c. Downtime – Minimize It!

  • Use rolling updates or blue-green deployments to avoid service disruptions (if possible; with large-scale data it’s sometimes better to scale the application down to avoid data inconsistency, or to switch it to ‘read-only’ mode - see the commands after this list).

  • Schedule migrations during off-peak hours for minimal impact.
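
The scale-down approach mentioned above boils down to two commands around the migration window (deployment name and namespace are placeholders):

kubectl scale deployment my-app --replicas=0 -n my-namespace
# ...migrate the data...
kubectl scale deployment my-app --replicas=1 -n my-namespace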

d. DNS and URL Changes

  • Update DNS records to reflect the new cluster.

  • Set up proper redirections to prevent broken links and downtime.

Conclusion

Kubernetes data migration can be smooth and efficient with the right tools and planning. Whether you’re cloning a PVC, using SQL dumps, leveraging Velero, or utilizing Restic/Kopia for seamless backups, understanding potential risks and best practices is key. With careful preparation, you can execute data migrations confidently and keep your Kubernetes workloads running without a hitch.
