How we migrated 40+ workloads from a colocation data center to AWS over 14 weeks — with continuous CDC replication, parallel-run validation, and a fully-automated cutover playbook that delivered zero customer-visible downtime.
The client operated a real-time payments platform processing ~2.4M transactions per day from a colocation data center in the US East region. Their lease was expiring in 18 weeks, with a hardware refresh quote of $4.1M. Cloud migration was on the table — but every previous attempt had stalled at the data layer.
Three constraints made this hard:
The previous vendor's plan called for a 4-hour maintenance window. The client rejected it. They needed a true zero-downtime cutover — and a partner who would own delivery risk end-to-end.
We structured the engagement around a 6R-style discovery, followed by parallel-run replication and a wave-based cutover that let the client validate every workload before committing.
Dependency mapping across 40+ workloads, 6R decisioning (rehost / replatform / refactor), and target VPC + landing zone design.
Weeks 1–3AWS Control Tower landing zone, Terraform module catalogue, VPN/Direct Connect to on-prem, and CI/CD bootstrap.
Weeks 3–6Debezium-based CDC streaming on-prem PostgreSQL → AWS RDS. Workloads rehosted to EC2/EKS, served read-only traffic via mirror.
Weeks 6–114 cutover waves over 2 weekends, weighted DNS shift, on-prem decommission. Post-cutover cost optimization and right-sizing.
Weeks 11–14During Phase 3, the on-prem and AWS environments ran side-by-side for 5 weeks. Production write traffic stayed on-prem; AWS served shadow reads through CDC-replicated tables. This let us validate every workload under real load before any user traffic shifted.
┌─────────────── ON-PREM (LEGACY) ────────────────┐ │ │ Merchants ─┼─► F5 LB ─► App Tier (VMs) ─► PostgreSQL │ │ │ │ └───────────────────────────────────────┼──────────┘ │ CDC (Debezium) ▼ ┌──────────────── AWS (TARGET) ────────────────────┐ │ │ │ MSK (Kafka) ──► RDS PostgreSQL (replica) │ │ │ │ (shadow) ─┼──► ALB ──► EKS ───┘ │ │ │ │ │ Prometheus + Grafana + Loki (O11y) │ └──────────────────────────────────────────────────┘
| Metric | Before | After | Δ |
|---|---|---|---|
| Cutover downtime | 4 hr planned | 0 minutes | −100% |
| Monthly infra spend | $340K | $231K | −32% |
| Deploy frequency | 1× / week | 12× / day | +60× |
| p99 transaction latency | 340 ms | 180 ms | −47% |
| Time to provision new env | 3 weeks | 18 minutes | −99% |
| PCI-DSS audit findings | 11 open | 0 | Pass |
Zero-downtime migrations are our default mode. Tell us about your workload and we'll come back within 24 hours with a phased plan.
Start a conversation