The Database Migration That Ate Our Sprint
A consulting war story about a PostgreSQL-to-MongoDB migration that went sideways, what we missed in planning, and the uncomfortable lessons about knowing when not to migrate.
The pitch
The client wanted to move from PostgreSQL to MongoDB. Their reasoning sounded solid on paper: the product had evolved into something with deeply nested, polymorphic data structures — think configurable forms where every customer had different fields, validation rules, and relationships between entities. They were storing JSON blobs in jsonb columns and felt like they were fighting the relational model at every turn.
I joined the project two weeks into the migration planning phase. The team had already picked MongoDB, set up a staging cluster, and started writing migration scripts. The energy was high. Everyone was tired of writing awkward queries against jsonb columns with GIN indexes that sometimes helped and sometimes didn't.
What nobody had done was measure anything.
The numbers we should have gathered first
The application served about 12,000 requests per minute during peak hours. The PostgreSQL instance was an RDS db.r6g.xlarge, running at around 40% CPU utilization on a normal day. Average query time across the board was 23ms. The slowest queries — the ones driving the migration conversation — topped out at about 800ms, and there were maybe fifteen of them.
Fifteen queries. Out of roughly two hundred distinct query patterns in the application.
I asked if anyone had run EXPLAIN ANALYZE on those fifteen. Two people had looked at some of them. Nobody had looked at all of them systematically.
Week one: the honeymoon
The initial data migration script was surprisingly clean. The team had written a Node.js pipeline that read from PostgreSQL, transformed rows into documents, and bulk-inserted into MongoDB. For the core entities — users, organizations, basic config — it worked well. Documents looked natural. Queries against them were fast.
The polymorphic form data was where things got interesting. In PostgreSQL, we had a form_submissions table with a jsonb column for the actual field values, plus foreign keys to form_definitions, customers, and projects. The team decided to denormalize everything into a single MongoDB document per submission:
```json
{
  "submissionId": "sub_8f3a2c",
  "customer": {
    "id": "cust_991",
    "name": "Acme Corp",
    "tier": "enterprise"
  },
  "formDefinition": {
    "version": 3,
    "fields": [...]
  },
  "values": { ... },
  "project": {
    "id": "proj_42",
    "name": "Q1 Audit"
  }
}
```

This felt right. Everything you need in one place. No joins.
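The denormalization step can be pictured as a pure transform from relational rows to one embedded document. This is a sketch under my own naming (buildSubmissionDoc and its parameters are illustrative, not the team's actual pipeline code):

```javascript
// Hypothetical sketch of the row -> document transform in the
// migration pipeline. Field names mirror the example document above.
function buildSubmissionDoc(submissionRow, customerRow, formDefRow, projectRow) {
  return {
    submissionId: submissionRow.id,
    customer: {
      id: customerRow.id,
      name: customerRow.name,
      tier: customerRow.tier,
    },
    formDefinition: {
      version: formDefRow.version,
      fields: formDefRow.fields,
    },
    // the old jsonb blob, carried over as-is
    values: submissionRow.values,
    project: {
      id: projectRow.id,
      name: projectRow.name,
    },
  };
}
```

The transform itself is trivial; the cost, as we were about to learn, is that every embedded copy now has its own update lifecycle.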
Week two: the cracks
The first problem was updates. When a customer changed their name or tier, we now had to update every document that embedded that customer. In PostgreSQL, you update one row. In our new MongoDB schema, a single customer name change could touch 40,000 documents.
We wrote a background job to handle fan-out updates. It worked, but it introduced eventual consistency into a system that had been immediately consistent. The frontend team started filing bugs: "Customer name shows old value on the submissions page but new value on the customer page." We added cache invalidation. Then we added retry logic for the fan-out job. Then we added a dead-letter queue for failed updates.
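The shape of the fan-out problem can be shown with a purely in-memory model (the real job ran bulk writes against MongoDB; this sketch and its names are mine, for illustration):

```javascript
// Illustrative model of the fan-out: a customer rename must be
// copied into every submission document that embeds that customer.
// In PostgreSQL this was one UPDATE on one row; here it is N writes.
function fanOutCustomerUpdate(submissions, customerId, changes) {
  let touched = 0;
  for (const doc of submissions) {
    if (doc.customer.id === customerId) {
      Object.assign(doc.customer, changes);
      touched++;
    }
  }
  return touched; // in production: tens of thousands of documents per rename
}
```

Every document counted by `touched` is a write that can fail, retry, or land late, which is exactly where the eventual-consistency bugs came from.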
The second problem was reporting. The existing reporting module ran SQL queries with GROUP BY, HAVING, window functions — the full toolkit. Translating those into MongoDB aggregation pipelines was possible but painful. Some queries that were five lines of SQL became forty-line aggregation pipelines that nobody on the team could read without squinting.
```sql
-- PostgreSQL: straightforward
SELECT p.name, COUNT(*) AS submissions,
       AVG(EXTRACT(EPOCH FROM (fs.completed_at - fs.created_at))) AS avg_completion_secs
FROM form_submissions fs
JOIN projects p ON fs.project_id = p.id
WHERE fs.status = 'completed'
  AND fs.created_at > NOW() - INTERVAL '30 days'
GROUP BY p.name
HAVING COUNT(*) > 10
ORDER BY avg_completion_secs DESC;
```

The MongoDB equivalent worked, but it was the kind of code where if someone asked "what does this do?" in a code review, you'd need five minutes to explain it.
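For flavor, a pipeline along these lines covers the same report. This is a sketch, not the team's exact code, and it assumes each submission embeds project.name plus createdAt/completedAt timestamps:

```javascript
// Sketch of a MongoDB aggregation roughly equivalent to the SQL above.
// Stage order mirrors WHERE -> GROUP BY -> HAVING -> ORDER BY.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

const reportPipeline = [
  { $match: {
      status: "completed",
      createdAt: { $gt: new Date(Date.now() - THIRTY_DAYS_MS) },
  } },
  { $group: {
      _id: "$project.name",
      submissions: { $sum: 1 },
      avgCompletionSecs: {
        // $subtract on two dates yields milliseconds
        $avg: { $divide: [{ $subtract: ["$completedAt", "$createdAt"] }, 1000] },
      },
  } },
  { $match: { submissions: { $gt: 10 } } }, // HAVING COUNT(*) > 10
  { $sort: { avgCompletionSecs: -1 } },
];
```

And this is one of the readable ones; the reporting queries with window functions fared far worse.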
The turning point
Three weeks in, we ran benchmarks on the migrated staging environment. The fifteen slow queries? Seven of them were faster in MongoDB. Five were about the same. Three were slower — significantly slower, because they involved cross-document lookups that had been simple joins before.
Overall average query time went from 23ms to 31ms. We had made the system slower on average while fixing a handful of edge cases.
I sat down with the tech lead and we went back to those original fifteen PostgreSQL queries. We spent a day with EXPLAIN ANALYZE and the PostgreSQL docs, and found that:
- Four queries were missing indexes entirely. Adding composite indexes dropped them from 600-800ms to under 50ms.
- Three queries were doing sequential scans on the jsonb column because the GIN index wasn't covering the specific access pattern. A targeted jsonb_path_ops index fixed those.
- Five queries were fine once we added WHERE clause conditions that the application should have been sending all along (the ORM was generating overly broad queries).
- Three queries were genuinely awkward in a relational model but were only used by an internal admin tool with maybe twenty users.
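The first two categories of fixes amount to a few lines of DDL. Index and column names here are illustrative, not the client's actual schema:

```sql
-- Hypothetical composite index for a slow filter-and-sort pattern.
CREATE INDEX CONCURRENTLY idx_submissions_customer_status_created
    ON form_submissions (customer_id, status, created_at DESC);

-- jsonb_path_ops GIN index: smaller and faster than the default GIN
-- operator class, at the cost of supporting only containment-style
-- (@>) lookups rather than key-existence operators.
CREATE INDEX CONCURRENTLY idx_submissions_values_path_ops
    ON form_submissions USING GIN (values jsonb_path_ops);
```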
Total time to fix: about four days. Compared to the three weeks we'd already spent on the migration, with at least three more weeks of work ahead.
The conversation nobody wanted to have
We recommended stopping the migration. Keeping PostgreSQL. Applying the index and query fixes. For the admin tool's three awkward queries, we suggested a small read-replica with materialized views refreshed every few minutes — good enough for internal use.
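The materialized-view idea is a few lines of SQL. As a sketch (the view, its columns, and the refresh cadence are made up for illustration):

```sql
-- Hypothetical materialized view backing one of the admin tool's
-- awkward queries, served from the read replica.
CREATE MATERIALIZED VIEW admin_submission_rollup AS
SELECT customer_id, status, COUNT(*) AS submission_count
FROM form_submissions
GROUP BY customer_id, status;

-- CONCURRENTLY requires a unique index and avoids blocking readers;
-- refresh every few minutes via cron or pg_cron.
CREATE UNIQUE INDEX ON admin_submission_rollup (customer_id, status);
REFRESH MATERIALIZED VIEW CONCURRENTLY admin_submission_rollup;
```

A few minutes of staleness was an easy trade for an internal tool with twenty users.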
The client's CTO was not thrilled. They'd already told their board about the MongoDB migration. There were slides. There was a timeline. "Modernizing the data layer" was a bullet point in the quarterly objectives.
But the numbers were clear, and to their credit, they listened. We rolled back the migration work, applied the PostgreSQL optimizations, and the team got their three weeks back.
What I took away from this
The urge to migrate databases usually comes from pain with specific queries, not a fundamental mismatch between your data and your database. Those are different problems with different solutions.
I'm not anti-MongoDB. For the right workload — high write throughput on loosely structured data, horizontal scaling requirements, document-oriented access patterns — it's a strong choice. But "we have some jsonb columns and some slow queries" is not that workload.
The uncomfortable truth about most database migrations is that they're driven by frustration, not measurement. You hit a wall with your current database, and instead of investigating the wall, you assume the whole building is wrong.
If I could go back and give the team one piece of advice on day one, it would be this: before you write a single migration script, spend one day — just one day — trying to fix the problems you have with the database you already run. If that doesn't work, migrate with confidence. If it does work, you just saved yourself months.
Has anyone else been through a migration like this — one that turned out to be unnecessary? I'd be curious how far you got before pulling the plug.