We Migrated a 2.4M-Row Legacy MySQL CRM to PostgreSQL With Zero Downtime — Here's the Playbook | Softechinfra Blog

An Ahmedabad textile exporter ran a 9-year-old PHP + MySQL 5.7 CRM. 50 daily users, 2.4 million order rows, 18 GB on disk, deployed on a single Hetzner box. The CTO had two reasons to move: MySQL 5.7 went end-of-life in October 2023, and the finance team wanted JSONB columns for nested order metadata. We moved it to PostgreSQL 16 in 28 days with an 11-second cutover at 2:14 am on a Sunday. This is the runbook.

- Order rows migrated: 2.4M
- On-disk MySQL size: 18 GB
- Plan to cutover: 28 days
- Cutover window: 11 sec

## The Answer in 60 Words

We dual-wrote new rows to both MySQL and PostgreSQL through the application layer for 9 days. We backfilled the historic 2.4M rows in a 14-hour offline-friendly job. Debezium streamed any drift. We cut over by flipping a feature flag at 2:14 am on a Sunday: 11 seconds total. Cost: ₹2.8 lakh in engineering + ₹14,400 in extra infra.

## Why This Matters Now

MySQL 5.7 end-of-life means [no security patches](https://www.mysql.com/support/eol-notice.html). Cyber-insurance reviewers in 2026 are specifically flagging deprecated database versions. We have seen three Indian SMBs in the last quarter get insurance loaded with a 28% premium hike for unsupported MySQL.

PostgreSQL also gives you JSONB, generated columns, partial indexes, parallel queries, and the partitioning improvements [introduced in v15 and v16](https://www.postgresql.org/docs/release/). The case for moving is no longer "Postgres has better features"; it is "the cost of staying is now visible on your insurance bill."

## The Client (Specific Details)

- Sector: Textile exporter (cotton fabric to US + EU buyers)
- Location: Ahmedabad, Gujarat
- Size: 50 daily users, ₹140 cr revenue, 200 staff
- Stack on day 0: Laravel 6 + MySQL 5.7 on a Hetzner CCX23 (4 vCPU, 16 GB RAM)
- Tables that matter: orders (2.4M), order_items (8.1M), customers (38k), shipments (1.2M)
- The trigger: Insurance renewal flagged MySQL 5.7. CFO gave the CTO ₹3 lakh and 30 days.

## The Architecture During Migration

Dual-Write Layer

App writes go to MySQL first, then to PostgreSQL, inside one transaction-like wrapper. A failed PostgreSQL write is logged to a dead-letter queue (DLQ) but does NOT fail the request.

Debezium CDC

Runs alongside dual-write. Catches any write that bypassed the app (cron jobs, manual SQL, replication lag). Streams to PostgreSQL via Kafka Connect sink.

Backfill Worker

Node.js job copying 2.4M rows in batches of 10,000. Uses MySQL's binlog position as a watermark so backfilled rows are not re-applied by Debezium.

Read Shadow Mode

Every production read for 7 days runs on MySQL AND PostgreSQL. Diffs logged to a "drift" table. Zero diffs for 3 consecutive days = green to cut over.
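To make read shadow mode concrete, here is a minimal Python sketch of the idea, not the code we shipped. The `ShadowReader` class, its field names, and the in-memory dictionaries standing in for MySQL, PostgreSQL, and the drift table are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Hashable


@dataclass
class ShadowReader:
    """Serve reads from the primary store and compare each one with a shadow store."""

    read_primary: Callable[[Hashable], Any]  # stand-in for the MySQL lookup
    read_shadow: Callable[[Hashable], Any]   # stand-in for the PostgreSQL lookup
    drift: list = field(default_factory=list)  # stand-in for the "drift" table

    def find(self, key):
        expected = self.read_primary(key)
        try:
            actual = self.read_shadow(key)
        except Exception as exc:
            # A shadow failure is recorded but never breaks the request.
            self.drift.append({"key": key, "error": repr(exc)})
            return expected
        if actual != expected:
            self.drift.append({"key": key, "primary": expected, "shadow": actual})
        # The caller always gets the primary (MySQL) answer during shadow mode.
        return expected


# Hypothetical data: row 2 has drifted between the two stores.
mysql_rows = {1: {"status": "shipped"}, 2: {"status": "archived"}}
postgres_rows = {1: {"status": "shipped"}, 2: {"status": "open"}}

reader = ShadowReader(mysql_rows.get, postgres_rows.get)
first = reader.find(1)
second = reader.find(2)
```

The point of the design is that the shadow path can only add rows to the drift log; it can never change what production users see.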

## The Schema Translation (Where Time Got Spent)

| MySQL Type | PostgreSQL Type | Gotcha |
|---|---|---|
| AUTO_INCREMENT | GENERATED ALWAYS AS IDENTITY | Reset the sequence to MAX(id)+1 after backfill. We forgot once. Two duplicate-key errors at 03:17 am. |
| DATETIME | TIMESTAMPTZ | MySQL stores naive datetimes. We assumed "Asia/Kolkata" and converted explicitly. Audit the assumption: some clients had UTC data leaking through historical imports. |
| TINYINT(1) | BOOLEAN | Trivial, but 0 vs FALSE matters in WHERE clauses. We grepped the codebase for `= 0` and `= 1` on these columns. |
| utf8mb3 charset | UTF-8 | A few rows with 4-byte emoji in customer notes corrupted on import. pgloader handled this for new utf8mb4 tables. Old utf8mb3 tables needed an explicit `CONVERT(col USING utf8mb4)` pre-export. |
| ENUM | CHECK constraint or domain | Postgres has ENUM, but altering one is annoying. We used CHECK constraints. |
| JSON | JSONB | JSONB is binary, indexable, and faster for filtering. Worth the conversion, even though it cost us a 40-minute index rebuild. |

## The 4 Gotchas Nobody Warns You About

1. Auto-increment after backfill. When you copy rows into Postgres with their existing ids, the sequence does not know about them. The next INSERT from the app will try to use id = 1 and fail. Fix: `SELECT setval('orders_id_seq', (SELECT MAX(id) FROM orders))` after every backfill batch. Put it in the runbook in red.
2. Time zones. MySQL's DATETIME has no zone. Your historical rows were inserted in IST, your migrated server's clock is UTC, and your application reads back in IST again. Everything looks fine until one report is wrong by 5.5 hours and the CFO asks why. Fix: explicit `AT TIME ZONE 'Asia/Kolkata'` on every migration insert. Plus a regression test that compares 100 randomly sampled rows by primary key for date-field equality.
3. Charset. utf8mb3 (which MySQL calls utf8) stores at most 3 bytes per character. Anything outside the Basic Multilingual Plane fails: emoji and the rarer CJK extension characters. utf8mb4 is the 4-byte one. pgloader will warn you about this if you let it; if you wrote your own script, you will discover it on a Wednesday afternoon when a customer name with a smiley face fails. Fix: dump as utf8mb4, import as UTF-8 (Postgres only has the 4-byte version).
4. Case sensitivity. In MySQL, table-name case sensitivity depends on `lower_case_table_names`: the Linux default is case-sensitive, and a setting of 1 makes names case-insensitive. PostgreSQL folds unquoted identifiers to lowercase and treats quoted identifiers as case-sensitive. We had Orders and orders queries in the codebase. Grep, fix, regression test.
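For the first gotcha, the sequence reset is easy to generate for every migrated table instead of typing it by hand. The sketch below is illustrative, not our backfill script: the `setval_statement` helper is hypothetical, and it assumes each table's key column is called `id`. It uses PostgreSQL's `pg_get_serial_sequence()` to look up the sequence name rather than hard-coding `orders_id_seq`.

```python
def setval_statement(table: str, id_column: str = "id") -> str:
    """Build SQL that moves a table's identity sequence up to MAX(id).

    pg_get_serial_sequence() returns the sequence behind a serial or
    identity column, so the sequence name does not need to be guessed.
    """
    for name in (table, id_column):
        # Names are interpolated into SQL, so reject anything unusual.
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"SELECT setval(pg_get_serial_sequence('{table}', '{id_column}'), "
        f"(SELECT MAX({id_column}) FROM {table}));"
    )


# The four tables named in this post; the `id` column name is an assumption.
MIGRATED_TABLES = ["orders", "order_items", "customers", "shipments"]
statements = [setval_statement(table) for table in MIGRATED_TABLES]
```

Running the generated statements at the end of each backfill batch keeps the step inside the script, where it cannot be forgotten.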

The wall of fame in our office has a sticker: "It is always the time zones." Yes. Always.
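The time zone regression test boils down to one conversion and one comparison. This is a minimal sketch under two assumptions: every historical DATETIME really was written in IST, and a fixed +05:30 offset is good enough because IST has no daylight saving. The function names are ours for illustration.

```python
from datetime import datetime, timedelta, timezone

# IST has no daylight saving, so a fixed +05:30 offset is used here.
IST = timezone(timedelta(hours=5, minutes=30), name="IST")


def ist_naive_to_utc(naive: datetime) -> datetime:
    """Interpret a zone-less MySQL DATETIME as IST and return the UTC instant."""
    if naive.tzinfo is not None:
        raise ValueError("expected a naive datetime, as read from MySQL DATETIME")
    return naive.replace(tzinfo=IST).astimezone(timezone.utc)


def same_instant(mysql_value: datetime, postgres_value: datetime) -> bool:
    """Regression check: the migrated TIMESTAMPTZ must be the same instant."""
    return ist_naive_to_utc(mysql_value) == postgres_value


# Hypothetical sampled row: 09:00 IST is 03:30 UTC.
mysql_created_at = datetime(2024, 3, 10, 9, 0, 0)
migrated = ist_naive_to_utc(mysql_created_at)

# The classic bug: treating the naive value as if it were already UTC.
wrongly_migrated = mysql_created_at.replace(tzinfo=timezone.utc)
error = wrongly_migrated - migrated
```

Here `error` is exactly the 5.5-hour gap that shows up in reports when the conversion is skipped.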

## The 28-Day Plan (Copy This)

Days 1–3: Discovery + schema audit

Map every table, every index, every foreign key. List custom MySQL features (TINYINT, ENUM, geo). Enumerate every place the app touches the DB — including cron, ETL, manual SQL the CTO runs at month-end.
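The foreign-key map from discovery is useful later too: it gives a safe import order for the backfill, with parent tables before their children. A small sketch using Python's standard `graphlib`; the foreign keys listed here are assumed for illustration and are not the client's actual schema.

```python
from graphlib import TopologicalSorter

# child table -> set of parent tables it references (assumed foreign keys)
FOREIGN_KEYS = {
    "customers": set(),
    "orders": {"customers"},
    "order_items": {"orders"},
    "shipments": {"orders"},
}


def import_order(foreign_keys):
    """Return tables ordered so that every parent comes before its children.

    TopologicalSorter raises graphlib.CycleError if the keys form a cycle,
    which is a signal that constraints must be deferred instead.
    """
    return list(TopologicalSorter(foreign_keys).static_order())


order = import_order(FOREIGN_KEYS)
```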

Days 4–6: Provision Postgres + run pgloader on staging

Spin up PostgreSQL 16 on the same Hetzner box (different port). Run [pgloader](https://pgloader.io/) on the staging snapshot. Fix all type errors and charset warnings before touching production.
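One way to find charset problems before the import does is to scan exported text for characters that need 4 bytes in UTF-8, which utf8mb3 cannot store. A minimal sketch; the helper names and the sample rows are made up for illustration.

```python
def non_bmp_chars(text: str) -> list[str]:
    """Return the characters that need 4 bytes in UTF-8 (code points above U+FFFF)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def rows_needing_utf8mb4(rows, column):
    """Return the ids of rows whose text column contains a 4-byte character."""
    return [row["id"] for row in rows if non_bmp_chars(row[column] or "")]


# Hypothetical sample of customer notes.
sample = [
    {"id": 1, "notes": "Repeat buyer, pays on time"},
    {"id": 2, "notes": "Loved the fabric 😀"},
    {"id": 3, "notes": "Ships via Zürich"},
    {"id": 4, "notes": None},
]
flagged = rows_needing_utf8mb4(sample, "notes")
```

Only the emoji row is flagged; accented Latin text fits in 2 bytes and is safe in either charset.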

Days 7–10: Build the dual-write wrapper

A repository pattern adapter in Laravel that fans out writes to both databases. Reads stay on MySQL. Failures to Postgres write to a DLQ for replay. Behind a feature flag — flip per table.
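The DLQ replay side can be sketched in a few lines. This is an illustration in Python rather than our Laravel worker: a plain list stands in for the queue, the `replay` function and its parameters are hypothetical, and the write being replayed is assumed to be an idempotent upsert keyed by primary key.

```python
import time


def replay(queue, apply, sleep=time.sleep, retries=5, base=0.5):
    """Drain the queue; return entries that still fail after all retries.

    `apply` must be idempotent, because an entry may be applied more than
    once if an earlier attempt partially succeeded.
    """
    parked = []
    while queue:
        entry = queue.pop(0)
        for attempt in range(retries + 1):
            if attempt > 0:
                # Exponential backoff: 0.5s, 1s, 2s, ...
                sleep(base * 2 ** (attempt - 1))
            try:
                apply(entry)
                break
            except Exception:
                continue
        else:
            # Every attempt failed; park the entry for a human to inspect.
            parked.append(entry)
    return parked


# Demo: the target is down for the first two attempts, then recovers.
target = {}
failures = {"left": 2}


def flaky_upsert(entry):
    if failures["left"] > 0:
        failures["left"] -= 1
        raise ConnectionError("postgres unavailable")
    target[entry["id"]] = entry  # keyed by primary key, so replays overwrite


slept = []
queue = [{"id": 7, "status": "open"}, {"id": 7, "status": "open"}]
parked = replay(queue, flaky_upsert, sleep=slept.append)
```

The duplicate entry for id 7 is harmless because the write is keyed by primary key, which is the same property `INSERT ... ON CONFLICT DO UPDATE` gives you in PostgreSQL.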

Days 11–14: Backfill historic data

14-hour batch run on a Saturday night. Capture the binlog position before starting. Copy in chunks of 10k by primary key. Verify row counts after each chunk.
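The chunk-by-primary-key loop with a count check after each chunk can be sketched as follows. Our worker was a Node.js job against MySQL and PostgreSQL; this Python version uses two in-memory SQLite databases as stand-ins so it runs anywhere, and the `backfill` function is illustrative only.

```python
import sqlite3


def backfill(source, target, table, columns, chunk_size=10_000):
    """Copy rows in primary-key order, one chunk at a time, verifying counts.

    Assumes the first entry in `columns` is the integer primary key `id`.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    last_id = 0
    copied = 0
    while True:
        rows = source.execute(
            f"SELECT {col_list} FROM {table} WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return copied
        with target:  # one transaction per chunk
            target.executemany(
                f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
            )
        last_id = rows[-1][0]
        copied += len(rows)
        # Verify: everything up to the last copied id exists on both sides.
        count_sql = f"SELECT COUNT(*) FROM {table} WHERE id <= ?"
        src_n = source.execute(count_sql, (last_id,)).fetchone()[0]
        dst_n = target.execute(count_sql, (last_id,)).fetchone()[0]
        if src_n != dst_n:
            raise RuntimeError(f"count mismatch at id {last_id}: {src_n} != {dst_n}")


# Demo with 25 rows and a chunk size of 10 (three chunks: 10, 10, 5).
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
source.executemany(
    "INSERT INTO orders (id, status) VALUES (?, ?)",
    [(i, "open") for i in range(1, 26)],
)
source.commit()
copied = backfill(source, target, "orders", ["id", "status"], chunk_size=10)
```

Paging with `WHERE id > ?` instead of `OFFSET` keeps every chunk equally cheap, which matters over a 14-hour run.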

Days 15–17: Wire up Debezium

Start Debezium from the captured binlog position. It catches any drift between the dual-write and the actual MySQL state. Streams to a small Kafka cluster, then to a Postgres sink connector.

Days 18–24: Read shadow mode + drift checks

Every read query also fires against Postgres. Diffs written to a 'drift' table. Day 18: 4,000 diffs. Day 19: 1,200. Day 22: 18. Day 24: 0. We needed 3 zero-diff days before approving cutover.
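The "3 zero-diff days" rule is simple enough to automate, so nobody approves a cutover from memory. A minimal sketch; the function name is hypothetical and the daily counts in the demo are illustrative, not our full drift log.

```python
def ready_to_cut_over(daily_diffs, required_clean_days=3):
    """True when the most recent `required_clean_days` days all logged zero diffs."""
    if len(daily_diffs) < required_clean_days:
        return False
    return all(count == 0 for count in daily_diffs[-required_clean_days:])


# Illustrative daily diff counts, oldest first.
still_drifting = ready_to_cut_over([4000, 1200, 18, 0])
clean_streak = ready_to_cut_over([18, 0, 0, 0])
```

A single non-zero day resets the streak, which is the behaviour we wanted: drift has to stay gone, not just dip to zero once.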

Day 25: Cutover rehearsal on staging

Full timed rehearsal. Record every command and its expected output. We rehearsed three times — first run took 28 seconds, third run took 9. Production was 11.

Days 26–27: Final shadow, comms, backups

CFO and ops manager briefed. SMS scheduled for staff at 8 am Sunday explaining the change. Full backup of MySQL + Postgres taken Saturday 11 pm.

Day 28: Cutover at 02:14 am Sunday

11-second flag flip. Reads moved to Postgres, writes stayed dual-write for another 48 hours as a rollback hedge. No incidents. Decommissioned MySQL on day 31.

## The Runbook (Copy-Paste)

This is the cutover script. We ran it pasted into a single tmux pane.

```bash
# T-15m: announce in #ops Slack
# T-10m: take final mysql + postgres dumps to S3
./scripts/dump-both.sh > /var/log/migration/dumps-pre-cutover.log

# T-5m: confirm Debezium lag is < 1 second
kubectl exec -it debezium-0 -- /scripts/check-lag.sh

# T-0: pause cron jobs that write to DB
ansible-playbook ./ops/pause-crons.yml

# T+2s: drain in-flight requests (wait at most 10s; curl's flag is --max-time)
curl -X POST internal-api/drain --max-time 10

# T+5s: flip feature flag DB_READ_TARGET to 'postgres'
./scripts/flag.sh set DB_READ_TARGET postgres
./scripts/flag.sh verify DB_READ_TARGET postgres

# T+8s: smoke-test 4 critical paths
./scripts/smoke.sh login,orders-list,invoice,shipment

# T+10s: resume crons
ansible-playbook ./ops/resume-crons.yml

# T+11s: announce DONE in #ops, start the 60-min watch
```
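One caveat with pasting commands into a shell: a failed step does not stop the ones after it. If you would rather have the sequence halt at the first failure, a small runner is enough. The sketch below is an alternative we are describing, not what we ran; the step names are placeholders and the demo uses the Python interpreter itself as a stand-in for the real scripts.

```python
import subprocess
import sys
import time


def run_steps(steps):
    """Run (label, argv) steps in order and stop at the first non-zero exit code.

    Returns a list of (label, return_code, seconds) for the steps that ran.
    """
    results = []
    for label, argv in steps:
        started = time.monotonic()
        proc = subprocess.run(argv, capture_output=True, text=True)
        elapsed = time.monotonic() - started
        results.append((label, proc.returncode, round(elapsed, 2)))
        if proc.returncode != 0:
            break  # do not run later steps after a failure
    return results


def py(code):
    """Stand-in command: run a one-line Python program instead of a real script."""
    return [sys.executable, "-c", code]


demo = run_steps([
    ("pause-crons", py("print('paused')")),
    ("flip-flag", py("import sys; sys.exit(1)")),
    ("smoke-test", py("print('never reached')")),
])
```

The per-step timings double as the rehearsal record. A production version would also need a cleanup path, for example resuming crons after an aborted run.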

We left dual-write on for 48 hours after cutover. The DLQ stayed empty. We flipped writes to Postgres-only on Tuesday night, paused MySQL on Friday, and decommissioned it the following Monday.

## The Cost (Real Numbers)

Compare with the alternative: insurance hit + emergency consultancy when the next CVE drops on MySQL 5.7. The CFO did the math in 7 minutes.

## The Pre-Cutover Checklist (We Refuse to Run Without This)

- Final pgloader staging run with zero schema warnings
- Dual-write enabled for ≥ 7 days with DLQ empty for the last 72 hours
- Debezium lag < 1 second for the last 4 hours
- Read shadow mode showing 0 diffs for ≥ 3 consecutive days
- All sequences updated via setval after every backfill batch
- Time zone regression test passes (100 randomly sampled rows)
- Charset audit: no utf8mb3 columns left un-converted
- Cutover script rehearsed 3+ times on staging, time variance < 30%
- Full backups of source AND target taken in the last 4 hours
- Rollback procedure documented and tested in a staging dry-run

## Common Mistakes (Each One Hurts)

- Symptom: "Cutover dragged on for hours." Cause: no rehearsal on staging. Fix: time the rehearsal three times. Anything that varies by more than 30% between runs is a script bug.
- Symptom: "We forgot to update sequence after backfill." Cause: missing step. Fix: put `setval` in the backfill script itself, not the runbook. It is idempotent and safe.
- Symptom: "Postgres slower than MySQL after cutover." Cause: missing indexes. InnoDB creates an index on foreign-key columns automatically; PostgreSQL does not. Fix: `EXPLAIN ANALYZE` your top 20 production queries before cutover. Add indexes proactively.
- Symptom: "App throws PG-specific errors that MySQL silently accepted." Cause: PG is stricter on NULL, on types, on transaction boundaries. Fix: run the test suite against PG before cutover. Fix every flaky test. Do not skip "looks like a flake."
- Symptom: "Reports off by 5.5 hours." See gotcha #2. Always.
- Symptom: "Some queries return different sort orders on PG than MySQL." Cause: InnoDB usually returns rows in primary-key order when a query has no ORDER BY, so code quietly comes to depend on it. Neither database guarantees an order without an explicit clause, and PostgreSQL exposes the assumption quickly. Fix: add explicit ORDER BY to every query the application relies on for ordering. We caught 18 of these in the read-shadow phase.
- Symptom: "Triggers behave differently." Cause: MySQL triggers are always row-level (FOR EACH ROW), while a PostgreSQL trigger created without FOR EACH ROW defaults to statement-level. We had a trigger that fired N times in MySQL when N rows were updated; PostgreSQL fired it once with a row-set context. Fix: rewrite triggers as PG functions and declare FOR EACH ROW where per-row behaviour is needed. Better still, move the logic to the application layer where it is testable.
- Symptom: "The DLQ keeps filling with duplicates." Cause: dual-write retries lacking idempotency. Fix: use the application's primary key as the idempotency key in the secondary write. `INSERT ... ON CONFLICT DO UPDATE` is your friend in PostgreSQL.
- Symptom: "Foreign key constraints fail during backfill." Cause: child rows imported before parents. Fix: defer constraint checks during the backfill with `SET CONSTRAINTS ALL DEFERRED` inside the transaction (this only affects constraints declared DEFERRABLE), or import in dependency order with a topological sort of your schema.

## A Common Question We Get About Cost

The build cost ₹2.94 lakh. That is roughly 14 person-days of senior engineering. Compared to the ₹0 cost of "do nothing," it sounds like a lot. Compared to (a) the insurance penalty, (b) the eventual emergency migration cost when the next CVE drops on MySQL 5.7, and (c) the productivity gain from the team's confidence in the database, it pays back in roughly 8 months. Most of our SMB clients have approved similar migrations on a 12-month payback frame, and the 8-month figure for this client was driven mostly by the insurance number.

For Indian SMBs running MySQL 5.7 or 5.6, the question is rarely "should we move"; it is "by when, and at what risk." Our standard recommendation: have a plan in place before March 2026. The market has now had 28 months of warning since EOL.

## When Not to Do This Migration

Skip the move if (a) you are already on MySQL 8.0 with a paid support contract and your codebase has no need of Postgres-specific features, (b) your DB is under 500 MB and you can take a 4-hour maintenance window, in which case dump and restore is simpler than CDC, or (c) you have a single developer and no rehearsal time. CDC migration looks fancy. Dump-and-restore in a 2-hour window is dramatically simpler if you can afford the downtime.

## The Dual-Write Adapter (Code Sketch)

The dual-write adapter is the most-asked-for code in this post. Here is the Laravel repository pattern we shipped:

```php
<?php

use Illuminate\Support\Facades\Log;

class OrderRepository
{
    public function __construct(
        private MySQLOrderStore $primary,
        private PostgresOrderStore $secondary,
        private DLQ $dlq,
        private FeatureFlags $flags,
    ) {}

    public function save(Order $order): void
    {
        // Always write to primary (MySQL); a failure here fails the request
        $this->primary->save($order);

        // Conditionally write to secondary (PostgreSQL), controlled per table
        if ($this->flags->isOn('dual_write_orders')) {
            try {
                $this->secondary->save($order);
            } catch (\Throwable $e) {
                // Secondary failure must NOT fail the request
                $this->dlq->push('orders.save', $order, $e);
                Log::warning('dual-write secondary failed', [
                    'order_id' => $order->id,
                    'error' => $e->getMessage(),
                ]);
            }
        }
    }

    public function find(int $id): ?Order
    {
        // Reads follow the per-table flag and default to MySQL
        $target = $this->flags->get('orders_read_target', 'mysql');
        return $target === 'postgres'
            ? $this->secondary->find($id)
            : $this->primary->find($id);
    }
}
```

Three things to notice. The secondary write is wrapped in try/catch, so a Postgres outage during dual-write does not block production. The DLQ is a simple Redis list with a worker that retries with exponential backoff. The read target is a per-environment, per-table feature flag: we cut over read traffic table-by-table, not all-at-once.

## Outcome

Zero production incidents from the migration. No data loss. The CFO renewed cyber-insurance at the original premium. A 6-month follow-up: the application gained one Postgres-specific feature, JSONB columns for order metadata, which removed 8 weeks of "we'd need to add a new table" backlog work.

We crosschecked our playbook against [r/PostgreSQL discussions](https://www.reddit.com/r/PostgreSQL/) where Indian engineers shared similar migrations. The dual-write + CDC pattern is now the dominant approach, with [HN debating the merits](https://news.ycombinator.com/item?id=15026887) in nearly every migration thread.

## A Detail That Saved Us On Day 12

On day 12 of the dual-write phase, the DLQ filled up with 4,200 entries in a single morning. Our first instinct was a Postgres bug. The actual cause: a developer had run a manual `UPDATE orders SET status = 'archived' WHERE created_at < '2018-01-01'` directly against MySQL, bypassing the application layer entirely. Debezium caught it, which is exactly why we ran Debezium alongside the app-layer dual-write. Without Debezium, those 4,200 rows would have been silently out of sync until cutover. The lesson: belt and braces is the only acceptable level of paranoia for production database migrations.

## FAQ

### Is zero downtime really possible for a 2M-row DB?

Yes, with the dual-write + CDC pattern. The "zero" is more like "11 seconds of in-flight request drain." For most CRMs at midnight, that is functionally zero. Below 100k rows, you can dump-and-restore in a 2-hour window for less complexity.

### Can pgloader do this on its own?

[pgloader](https://pgloader.io/) is excellent for the schema + bulk-data copy. It does NOT solve the "keep both DBs in sync while we cut over" problem. You need either dual-write at the app layer or CDC (Debezium / AWS DMS) on top of pgloader.

### What if my MySQL is on RDS?

Easier. AWS DMS handles the CDC. You still need the dual-write layer or DMS continuous replication. The cutover script is the same.

### How much did Debezium cost to run?

We ran it on Kafka + Kafka Connect on a single t4g.medium EC2 (₹3,200/month for the whole migration window). Decommissioned after cutover.

### Why not just rewrite the app for Postgres?

We tried that on a different client. It cost 4x and shipped 6 months late. Using adapters that preserve the app's MySQL contract during migration is the right level of conservatism.

### What if the cutover fails?

The feature flag flips back in 11 seconds. We tested this in rehearsal. Dual-write stayed on for 48 hours specifically so that a Tuesday-night rollback was still possible without data loss.

### Are there Indian SaaS tools that do this end-to-end?

[Hevo Data](https://hevodata.com/) and [Datalligence](https://www.datalligence.ai/) handle similar pipelines. We did not use them because the client wanted us to own the engineering; the team will run more migrations in 2027.

### What was the team composition for this migration?

Two backend engineers (one senior owning the dual-write adapter and CDC, one mid-level owning the schema translation and backfill), one DevOps engineer at 0.4 FTE for the Kafka/Debezium setup, and the client's CTO at 0.2 FTE for sign-offs and decisions. Total person-weeks: 7. We do not staff a dedicated DBA; PostgreSQL after v14 is genuinely operationally simple at this scale.

### How did you handle the schema differences for the application code?

We did not touch the ORM (Eloquent in this case). The application's queries kept their MySQL idioms during dual-write. After cutover, we did a 2-week cleanup sprint to migrate any `UNIX_TIMESTAMP()` calls to `EXTRACT(EPOCH FROM ...)`, replace `GROUP_CONCAT` with `STRING_AGG`, and convert `IFNULL` to `COALESCE`. None of these were urgent, since Eloquent abstracted most of them, but we tracked them in a "PostgreSQL idiom adoption" backlog.

### Did you use AWS DMS instead of Debezium?

We have used both. AWS DMS is genuinely simpler to operate if your source MySQL is on RDS already, because the configuration is point-and-click. Debezium gives you more control for self-hosted MySQL and for custom transformations. For a Hetzner-hosted source database, Debezium was the right answer. For an RDS source, we would default to DMS.

## A Numbers Comparison That Mattered to the CFO

The MySQL 5.7 → MySQL 8.0 alternative path (which we evaluated and rejected) would have cost ₹1.4 lakh in licence + ₹1.8 lakh in engineering, for a total of ₹3.2 lakh. The PostgreSQL 16 path cost ₹2.94 lakh. Despite Postgres being open-source, the migration was the same complexity. The decision tipped on the future capability gain (JSONB, generated columns, partitioning) plus the avoided Oracle-licensing risk on MySQL Enterprise. The CFO's exact framing: "for ₹2 lakh more in years 2-5, I get a database I never have to pay a licence on again." We did not argue.

Have a Legacy DB You're Afraid to Touch?

We migrate MySQL → PostgreSQL, MongoDB → PostgreSQL, and SQL Server → PostgreSQL for Indian SMBs with under 50 GB of data. Fixed-price, 3–6 week engagements, zero-downtime on the cutover. First call is with the engineer who would lead your migration.


Hrishikesh Baidya

CTO at Softechinfra specializing in Python, system architecture, and building secure, scalable software solutions.
