NodeOps360 — Zero-downtime AWS Migration

Zero-downtime on-prem → AWS migration for a fintech platform

How we migrated 40+ workloads from a colocation data center to AWS over 14 weeks, with continuous CDC replication, parallel-run validation, and a fully automated cutover playbook that delivered zero customer-visible downtime.

Industry: FinTech · Payments
Engagement: Fixed-Bid
Duration: 14 weeks
Team Size: 6 engineers
Practices: Migration · IaC · O11y

40+ Workloads Migrated
0 Hours Downtime
32% Infra Cost Reduction
14w End-to-End Delivery

A colocation lease running out — and zero room for downtime

The client operated a real-time payments platform processing ~2.4M transactions per day from a colocation data center in the US East region. Their lease was expiring in 18 weeks, with a hardware refresh quote of $4.1M. Cloud migration was on the table — but every previous attempt had stalled at the data layer.

Three constraints made this hard:

Regulatory: A PCI-DSS audit was scheduled mid-migration; we couldn't break compliance evidence chains.
Stateful core: 11 TB of OLTP data across PostgreSQL and SQL Server, with sub-second replication SLAs.
Cutover blast radius: Any downtime longer than 90 seconds would breach merchant SLAs and trigger financial penalties.
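To make a sub-second replication SLA measurable, lag can be computed per change event as the time between the source commit and the apply on the target. The sketch below is an illustration, not the team's tooling. It assumes Debezium change events serialized as JSON with schemas disabled (so `source.ts_ms`, the time the change was made in the source database, sits in the top-level envelope), treats "sub-second" as a 1,000 ms ceiling, and assumes the source and target hosts have synchronized clocks. The function names and the sample event are hypothetical.

```python
import json

SLA_MS = 1_000  # assumption: "sub-second" treated as a hard 1,000 ms ceiling


def replication_lag_ms(event_json: str, applied_at_ms: int) -> int:
    """Milliseconds between the source commit and the apply on the target.

    Assumes a Debezium change event serialized as JSON with schemas disabled,
    so the envelope fields (before, after, source, op, ts_ms) are top-level.
    """
    event = json.loads(event_json)
    # source.ts_ms is the time the change was made in the source database.
    committed_at_ms = event["source"]["ts_ms"]
    return applied_at_ms - committed_at_ms


def sla_breaches(lags_ms: list[int], sla_ms: int = SLA_MS) -> list[int]:
    """Indices of changes whose lag is not sub-second (lag >= sla_ms)."""
    return [i for i, lag in enumerate(lags_ms) if lag >= sla_ms]


# Hypothetical event for an insert ("op": "c") into a payments table.
sample = json.dumps({
    "before": None,
    "after": {"id": 42, "amount_cents": 1999},
    "source": {"ts_ms": 1_700_000_000_000, "db": "payments", "table": "txn"},
    "op": "c",
    "ts_ms": 1_700_000_000_120,
})

lag = replication_lag_ms(sample, applied_at_ms=1_700_000_000_450)
print(lag)                              # 450
print(sla_breaches([lag, 1_250, 300]))  # [1]
```

Comparing timestamps taken on different hosts is only as accurate as their clock synchronization, so a real monitor would typically pair this with NTP/clock-skew checks.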

The previous vendor's plan called for a 4-hour maintenance window. The client rejected it. They needed a true zero-downtime cutover — and a partner who would own delivery risk end-to-end.

Four phases. One cutover.

We structured the engagement around a 6R-style discovery, followed by parallel-run replication and a wave-based cutover that let the client validate every workload before committing.
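Turning a dependency map into cutover waves is essentially a layered topological sort. A minimal sketch, assuming one possible ordering policy (a workload is scheduled in a later wave than everything it depends on) and hypothetical workload names; it does not model the engagement's actual four-wave plan.

```python
def plan_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group workloads into cutover waves (a layered topological sort).

    depends_on maps each workload to the workloads it depends on. A workload is
    placed in the first wave after all of its dependencies have been placed.
    Raises ValueError if the dependency graph contains a cycle.
    """
    # Collect every workload, including ones that only appear as dependencies.
    nodes = set(depends_on)
    for deps in depends_on.values():
        nodes |= deps
    remaining = {n: set(depends_on.get(n, ())) for n in nodes}

    waves: list[list[str]] = []
    while remaining:
        # Everything whose dependencies are already scheduled can move now.
        ready = sorted(n for n, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        for n in ready:
            del remaining[n]
        for deps in remaining.values():
            deps.difference_update(ready)
    return waves


# Hypothetical slice of a dependency map.
deps = {
    "merchant-api": {"ledger-svc", "auth-svc"},
    "ledger-svc": {"postgres-ledger"},
    "auth-svc": {"postgres-auth"},
    "reporting": {"ledger-svc"},
}
for i, wave in enumerate(plan_waves(deps), start=1):
    print(f"wave {i}: {wave}")
# wave 1: ['postgres-auth', 'postgres-ledger']
# wave 2: ['auth-svc', 'ledger-svc']
# wave 3: ['merchant-api', 'reporting']
```

Detecting cycles up front matters here: a cycle means two workloads must move together (or be decoupled first), which is exactly the kind of finding a discovery phase is meant to surface.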

Phase 01 · Discover & Design · Weeks 1–3

Dependency mapping across 40+ workloads, 6R decisioning (rehost / replatform / refactor), and target VPC + landing zone design.

Phase 02 · Foundation · Weeks 3–6

AWS Control Tower landing zone, Terraform module catalogue, VPN/Direct Connect to on-prem, and CI/CD bootstrap.

Phase 03 · Replicate & Parallel-Run · Weeks 6–11

Debezium-based CDC streaming from on-prem PostgreSQL to AWS RDS. Workloads were rehosted to EC2/EKS and served mirrored, read-only shadow traffic.

Phase 04 · Cutover & Optimize · Weeks 11–14

Four cutover waves over two weekends, a weighted DNS shift, and on-prem decommissioning, followed by post-cutover cost optimization and right-sizing.
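A weighted DNS shift can be expressed as Route 53 weighted record sets whose weights move step by step toward AWS. A sketch, assuming two weighted CNAME records with the set identifiers `onprem` and `aws`, a 60-second TTL, a 10/50/100 percent schedule, and hypothetical hostnames. It only builds the `ChangeBatch` dictionary that boto3's `change_resource_record_sets` accepts together with a `HostedZoneId`; it does not call AWS.

```python
def weighted_shift_batch(name: str, onprem_target: str, aws_target: str,
                         aws_percent: int, ttl: int = 60) -> dict:
    """Build a Route 53 ChangeBatch that sends aws_percent of lookups to AWS.

    Assumes two weighted CNAME record sets for the same name, distinguished
    by SetIdentifier. Route 53 routes to each record in proportion to its
    weight relative to the total weight of the group.
    """
    if not 0 <= aws_percent <= 100:
        raise ValueError("aws_percent must be between 0 and 100")

    def upsert(set_id: str, target: str, weight: int) -> dict:
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "CNAME",
                "SetIdentifier": set_id,
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {
        "Comment": f"cutover: {aws_percent}% of {name} to AWS",
        "Changes": [
            upsert("onprem", onprem_target, 100 - aws_percent),
            upsert("aws", aws_target, aws_percent),
        ],
    }


# Hypothetical hostnames; each step would be applied with boto3 as
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId=ZONE_ID, ChangeBatch=batch)
for pct in (10, 50, 100):
    batch = weighted_shift_batch("pay.example.com.", "lb.dc1.example.com",
                                 "alb.example.com", pct)
    print([c["ResourceRecordSet"]["Weight"] for c in batch["Changes"]])
# [90, 10]
# [50, 50]
# [0, 100]
```

A short TTL matters because resolvers keep answering with cached records until they expire, which bounds how quickly each step (or a rollback) takes effect. The TTL actually used in the engagement is not stated.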

Parallel-run topology

During Phase 3, the on-prem and AWS environments ran side-by-side for 5 weeks. Production write traffic stayed on-prem; AWS served shadow reads through CDC-replicated tables. This let us validate every workload under real load before any user traffic shifted.

             ┌──────────────── ON-PREM (LEGACY) ────────────────┐
             │                                                  │
  Merchants ─┼─►  F5 LB  ─►  App Tier (VMs)  ─►  PostgreSQL     │
             │                                       │          │
             └───────────────────────────────────────┼──────────┘
                                                     │ CDC (Debezium)
                                                     ▼
             ┌────────────────── AWS (TARGET) ──────────────────┐
             │                                  MSK (Kafka)     │
             │                                       │          │
             │                                       ▼          │
   (shadow) ─┼──► ALB ──► EKS ──► RDS PostgreSQL (replica)      │
             │             ▼                                    │
             │    Prometheus + Grafana + Loki (O11y)            │
             └──────────────────────────────────────────────────┘
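The CDC leg of the diagram starts with a Debezium PostgreSQL source connector registered on a Kafka Connect cluster that writes to MSK. A sketch of the registration payload, assuming Debezium 2.x property names, the `pgoutput` logical decoding plugin, and hypothetical host, table, slot, user, and environment-variable names. The sink side that applies changes to RDS is not shown.

```python
import json
import os
import urllib.request


def debezium_postgres_source(name: str, host: str, dbname: str,
                             tables: list[str], topic_prefix: str) -> dict:
    """Kafka Connect payload for a Debezium 2.x PostgreSQL source connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",      # built-in logical decoding plugin (PG 10+)
            "database.hostname": host,
            "database.port": "5432",
            "database.user": "debezium",    # hypothetical replication user
            # Hypothetical env var; keeps the secret out of source code.
            "database.password": os.environ.get("PG_CDC_PASSWORD", ""),
            "database.dbname": dbname,
            "topic.prefix": topic_prefix,   # Debezium 2.x (1.x used database.server.name)
            "table.include.list": ",".join(tables),
            "slot.name": "aws_migration",   # hypothetical replication slot name
            "snapshot.mode": "initial",     # snapshot existing rows, then stream changes
        },
    }


def register(connect_url: str, payload: dict) -> dict:
    """POST the payload to the Kafka Connect REST API (/connectors)."""
    req = urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = debezium_postgres_source(
    "payments-cdc", "pg01.dc.internal", "payments",
    ["public.transactions", "public.ledger_entries"], "onprem")
print(json.dumps(payload, indent=2))
# register("http://connect.internal:8083", payload)  # needs a reachable Connect cluster
```

With `topic.prefix` set to `onprem`, Debezium writes each table's changes to a topic named `<prefix>.<schema>.<table>`, for example `onprem.public.transactions`, which a sink connector would then consume.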

What changed for the business

Metric	Before	After	Δ
Cutover downtime	4 hr planned	0 minutes	−100%
Monthly infra spend	$340K	$231K	−32%
Deploy frequency	1× / week	12× / day	+60×
p99 transaction latency	340 ms	180 ms	−47%
Time to provision new env	3 weeks	18 minutes	−99%
PCI-DSS audit findings	11 open	0	Pass
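Most Δ values follow directly from the Before and After columns. The one that needs an assumption is deploy frequency: 12 deploys a day against 1 a week gives 60× only if deploys are counted over a five-day working week (12 × 5 = 60); counted over seven days it would be 84×. A quick check, with the five-day week as a labeled assumption:

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (negative = reduction)."""
    return (after - before) / before * 100


def deploy_multiplier(per_day_after: float, per_week_before: float,
                      deploy_days_per_week: int = 5) -> float:
    """Weekly deploy throughput after vs before.

    deploy_days_per_week is an assumption: 5 counts working days only.
    """
    return per_day_after * deploy_days_per_week / per_week_before


print(f"{pct_change(340, 231):+.1f}%")        # -32.1%  (monthly infra spend, $K)
print(f"{pct_change(340, 180):+.1f}%")        # -47.1%  (p99 latency, ms)
print(f"{deploy_multiplier(12, 1):.0f}x")     # 60x
print(f"{deploy_multiplier(12, 1, 7):.0f}x")  # 84x
```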


"NodeOps360 didn't sell us a migration plan and walk away. They built it, ran it side-by-side with our prod for five weeks, and stayed in the war room through every cutover wave. The fact that our customers never noticed is the highest compliment I can give."

VP Engineering · Fintech Payments Platform
