Kubernetes migrations without downtime: a project manager's runbook

The riskiest sentence in any migration plan is "and then we cut over." Big-bang cutovers concentrate all the risk into one irreversible night, which is exactly when you have the least information and the most pressure. I've never seen one go fully to plan.

The alternative isn't more heroics — it's sequencing. Move one slice at a time, run old and new in parallel, and keep a rollback within reach until the very end.

The shape of a safe migration

  1. Inventory honestly. List every service, its dependencies, its data, and its owner. The migration order falls out of the dependency graph — leaves first, shared infrastructure last.
  2. Strangle, don't rewrite. Put a routing layer in front and move traffic per-slice, so the old system keeps serving everything you haven't migrated yet.
  3. Run in parallel. Keep the new service live in the cluster and the old one still warm. Mirror traffic to the new one and compare its responses with the old one's before it serves a single real user.
  4. Shift traffic gradually. 1% → 10% → 50% → 100%, with automated rollback if error rate or latency crosses a threshold.
# Canary: 10% to the new cluster (Istio-style weighted route).
# The rollback gate on error rate is enforced by the rollout tooling, not by this route.
http:
  - route:
      - destination: { host: orders, subset: legacy }
        weight: 90
      - destination: { host: orders, subset: k8s }
        weight: 10
  5. Make rollback boring. Every step has a tested way back. A rollback you've rehearsed is a non-event; one you're improvising at 2am is an incident.
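
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a rollout tool: every name in it (`migration_order`, `mismatch_rate`, `shift_traffic`, `Health`) is hypothetical, the thresholds are placeholder values, and `set_weight` / `read_health` stand in for whatever your routing layer and metrics system actually expose. It also assumes "leaves" means services that nothing else depends on, so shared infrastructure naturally lands last.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable, Iterable


def migration_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Step 1: order slices so a service moves before the things it depends on.

    Assumption: leaves are services with no dependents; shared infrastructure
    (depended on by many) comes out last.
    """
    sorter = TopologicalSorter()
    for service, deps in depends_on.items():
        sorter.add(service)
        for dep in deps:
            # graphlib yields predecessors first, so register the dependent
            # service as a predecessor of the thing it depends on.
            sorter.add(dep, service)
    return list(sorter.static_order())  # raises CycleError on circular deps


def mismatch_rate(pairs: Iterable[tuple[object, object]]) -> float:
    """Step 3: share of mirrored requests where old and new responses differ."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(old != new for old, new in pairs) / len(pairs)


@dataclass(frozen=True)
class Health:
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float


def shift_traffic(
    set_weight: Callable[[int], None],
    read_health: Callable[[], Health],
    steps: tuple[int, ...] = (1, 10, 50, 100),
    max_error_rate: float = 0.01,        # placeholder threshold
    max_p99_latency_ms: float = 500.0,   # placeholder threshold
) -> bool:
    """Step 4: raise the new path's weight step by step; roll back on a breach."""
    for weight in steps:
        set_weight(weight)
        # A real rollout would let traffic soak for a while before reading.
        health = read_health()
        if (health.error_rate > max_error_rate
                or health.p99_latency_ms > max_p99_latency_ms):
            set_weight(0)  # rollback: all traffic back to the legacy path
            return False
    return True
```

Passing the weight setter and the health reader in as plain callables keeps the gate logic testable without a cluster, which is also what makes the rollback path cheap to rehearse.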

The part that's actually project management

The technology above is well-trodden. What sinks migrations is everything around it:

  • Sequencing so no team is blocked waiting on another's slice.
  • A definition of done per slice — migrated and observable and the old path retired, not just "traffic moved."
  • Communication so stakeholders see steady, visible progress instead of a quiet six months ending in a scary weekend.
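
One way to keep the definition of done honest is to track it as data rather than as a status-meeting opinion. The sketch below is an assumption about how that might look; `SliceStatus` and `progress_report` are hypothetical names, and the three flags simply mirror the three conditions in the list above.

```python
from dataclasses import dataclass


@dataclass
class SliceStatus:
    name: str
    owner: str
    migrated: bool = False          # traffic is served by the new path
    observable: bool = False        # dashboards and alerts exist for it
    old_path_retired: bool = False  # the legacy route is switched off

    def is_done(self) -> bool:
        # "Traffic moved" alone is not done: all three conditions must hold.
        return self.migrated and self.observable and self.old_path_retired


def progress_report(slices: list[SliceStatus]) -> str:
    """One line a stakeholder can read: how many slices are fully done."""
    done = sum(s.is_done() for s in slices)
    return f"{done}/{len(slices)} slices done"
```

A count that only moves when a slice is fully finished gives stakeholders the steady, visible progress the list asks for, and makes half-migrated slices hard to hide.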

Treat the migration as a series of small, reversible deliveries with clear owners, and the "big cutover" simply never has to happen. The last slice moving to 100% should be the least dramatic moment of the whole project.