Every Exception Was Caught. No Error Was Handled.

code-quality consulting debugging error-handling

A client's codebase had try-catch blocks wrapped around everything. Nothing ever crashed. Nothing ever worked correctly either. The error-handling strategy was actually an error-hiding strategy.

A client brought me in to help with what they described as "intermittent data inconsistencies." Orders occasionally had wrong totals. User preferences sometimes reset after being saved. A background sync with their inventory system was drifting by a few percent each week, and nobody could figure out why.

The codebase was a mid-size TypeScript backend — maybe 80,000 lines across a dozen services. My first impression was positive. It was well-structured, had reasonable test coverage, and the error rate in their monitoring dashboard was basically zero.

That last part should have been the red flag.

The pattern

Nearly every function that called an external service, touched the database, or did anything remotely risky looked like this:

async function updateUserPreferences(userId: string, prefs: UserPreferences) {
  try {
    await db.users.update({ where: { id: userId }, data: { preferences: prefs } });
  } catch (error) {
    logger.error('Failed to update preferences', { error, userId });
  }
}

At first glance, this looks fine. You catch the error, you log it, and the application keeps running. But look at what's missing: there's no re-throw and no return value indicating failure. The function resolves normally whether the write succeeded or not, so the caller has no idea that the operation failed. It just continues, assuming everything worked.

This pattern was everywhere. Over 200 try-catch blocks across the codebase, and about 140 of them swallowed the exception completely. The application couldn't crash because errors had nowhere to go. They were logged, absorbed, and forgotten.

What the callers assumed

The real damage wasn't in the catch blocks — it was in the code that came after. Functions downstream of these silent failures operated on assumptions that were no longer true.

async function processOrder(order: Order) {
  await reserveInventory(order.items);      // might silently fail
  await calculateTax(order);                // uses stale data if the above failed
  await chargePayment(order.total);         // charges the wrong amount
  await sendConfirmation(order);            // confirms an order that isn't right
}

Each step assumed the previous one succeeded. When reserveInventory swallowed a database timeout, the order continued processing with unreserved items. The customer got charged, the confirmation went out, and the inventory count was wrong by a few units. Multiply that by hundreds of orders a day, and you get that slow inventory drift they'd been chasing for months.
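To make the failure mode concrete, here is a toy reproduction. It is not the client's code: the in-memory stock table, the events array, and writeReservation (a stand-in for a database write that times out) are all assumptions for illustration, and the tax step is left out to keep it short.

```typescript
// Toy reproduction of the failure mode; every helper here is hypothetical.
type Item = { sku: string; qty: number };
type Order = { id: string; items: Item[]; total: number };

const stock: Record<string, number> = { widget: 10 };
const events: string[] = [];

// Stand-in for a database write that times out.
async function writeReservation(_items: Item[]): Promise<void> {
  throw new Error("database timeout");
}

// The swallowing pattern: record the error, then return as if nothing happened.
async function reserveInventory(items: Item[]): Promise<void> {
  try {
    await writeReservation(items);
    for (const { sku, qty } of items) {
      stock[sku] = (stock[sku] ?? 0) - qty;
    }
    events.push("reserved");
  } catch (error) {
    events.push(`reserve failed: ${String(error)}`);
  }
}

async function chargePayment(amount: number): Promise<void> {
  events.push(`charged ${amount}`);
}

async function sendConfirmation(order: Order): Promise<void> {
  events.push(`confirmed ${order.id}`);
}

async function processOrder(order: Order): Promise<void> {
  await reserveInventory(order.items); // resolves even though the write failed
  await chargePayment(order.total);    // so the charge still goes through
  await sendConfirmation(order);       // and so does the confirmation
}
```

Running processOrder against this setup leaves the stock count untouched while the charge and confirmation are both recorded, which is the same shape as the drift described above.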

The fix was painful

The right fix was obvious: let errors propagate. Doing that across 140 call sites in a running production system isn't something you ship in one commit, though. Every catch block that swallowed an error was, in its own broken way, keeping the application upright. Remove the catch, and you'd better make sure the caller handles the failure — or you're trading silent data corruption for loud service outages.

We tagged every swallowing catch block with a // FIXME: error swallowed comment and triaged them into three buckets. About 30 were in critical paths — payments, inventory, user data mutations — and needed immediate attention. Another 50 were in background jobs where the right behavior was retry-with-backoff, not ignore-and-continue. The remaining 60 were in less critical paths where we could take our time.
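For the background-job bucket, retry-with-backoff can be sketched roughly as follows. This is a minimal illustration rather than the helper the team shipped: retryWithBackoff, its option names, and the doubling schedule are assumptions, and a production version would likely add jitter and only retry errors known to be transient.

```typescript
// Hypothetical helper: retry an async job with exponential backoff.
type RetryOptions = {
  maxAttempts: number; // total tries, including the first; assumed to be >= 1
  baseDelayMs: number; // wait before the second attempt; doubles after that
};

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  job: () => Promise<T>,
  { maxAttempts, baseDelayMs }: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await job();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Waits 1x, 2x, 4x, ... the base delay between attempts.
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  // Out of attempts: surface the failure instead of swallowing it.
  throw lastError;
}
```

The important part is the last line. When the retries run out, the error still reaches whoever scheduled the job, so a job that keeps failing shows up in monitoring instead of disappearing.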

For the critical ones, the pattern we settled on was straightforward:

async function updateUserPreferences(userId: string, prefs: UserPreferences) {
  try {
    await db.users.update({ where: { id: userId }, data: { preferences: prefs } });
  } catch (error) {
    logger.error('Failed to update preferences', { error, userId });
    throw new PreferenceUpdateError(userId, { cause: error });
  }
}

Re-throw a typed error. Let the caller decide what to do. If the caller is an API endpoint, return a 500. If it's a background job, retry. If it's part of a multi-step workflow, roll back. The point is that the decision happens at the level where there's enough context to make it, not three layers deep inside a utility function.
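The snippet above doesn't show PreferenceUpdateError itself or a caller, so here is one way they could look. Both are assumptions: the class stores the underlying error on a field named originalError (so the sketch doesn't depend on the ES2022 Error cause option), and handleSavePreferences with its HttpResponse type is a hypothetical endpoint, not tied to any framework.

```typescript
// Sketch only: the error's shape and the handler are assumptions.
class PreferenceUpdateError extends Error {
  readonly userId: string;
  readonly originalError: unknown;

  constructor(userId: string, options: { cause: unknown }) {
    super(`Failed to update preferences for user ${userId}`);
    // Keeps instanceof working if the code is compiled down to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "PreferenceUpdateError";
    this.userId = userId;
    this.originalError = options.cause;
  }
}

type HttpResponse = { status: number; body: { message: string } };

// Hypothetical endpoint: the layer with enough context to pick a status code.
async function handleSavePreferences(
  save: () => Promise<void>,
): Promise<HttpResponse> {
  try {
    await save();
    return { status: 200, body: { message: "Preferences saved" } };
  } catch (error) {
    if (error instanceof PreferenceUpdateError) {
      return { status: 500, body: { message: "Could not save preferences" } };
    }
    throw error; // not this handler's error to interpret; keep propagating
  }
}
```

The handler only translates the error it recognizes. Anything else keeps travelling upward, so an unexpected failure is never quietly converted into a success response.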

Note

A catch block that only logs is not error handling. It's error suppression with a paper trail.

What surfaced

In the first week after we started letting errors propagate through the critical paths, the team's error monitoring dashboard went from "basically zero" to about 400 errors per day. The number alarmed the engineering manager, but none of these were new errors. They'd been happening all along — the codebase had just been hiding them.

About 60% were transient database connection timeouts that resolved on retry. Another 25% were downstream API failures that needed circuit-breaking instead of silent swallowing. The remaining 15% were genuine bugs — null references, invalid state transitions, malformed data — that had been silently corrupting data for months.
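For the downstream API failures, circuit-breaking means refusing to call a dependency for a while once it has failed several times in a row. The sketch below is a deliberately small version under assumed names (CircuitBreaker, CircuitOpenError) and an assumed policy of counting consecutive failures; it is not the implementation used on the project, and mature libraries handle half-open state and metrics more carefully.

```typescript
// Minimal circuit breaker sketch; names and policy are illustrative.
class CircuitOpenError extends Error {
  constructor() {
    super("Circuit is open; call skipped");
    // Keeps instanceof working if the code is compiled down to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "CircuitOpenError";
  }
}

class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number,
    cooldownMs: number,
    now: () => number = () => Date.now(), // injectable clock, handy in tests
  ) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  async call<T>(operation: () => Promise<T>): Promise<T> {
    if (this.openedAt !== null && this.now() - this.openedAt < this.cooldownMs) {
      // Fail fast and visibly instead of hammering a struggling dependency.
      throw new CircuitOpenError();
    }
    try {
      const result = await operation();
      this.consecutiveFailures = 0;
      this.openedAt = null; // a success closes the circuit again
      return result;
    } catch (error) {
      this.consecutiveFailures += 1;
      if (this.consecutiveFailures >= this.failureThreshold) {
        this.openedAt = this.now(); // open (or re-open) the circuit
      }
      throw error; // the caller still sees the real failure
    }
  }
}
```

Both failure paths throw. The breaker changes how often the dependency is called, not whether the caller finds out that something went wrong.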

The inventory drift stopped within a week. The preference resets stopped too. The order total issues took longer because we had to unwind some bad data, but the root cause was gone.

How it happens

Nobody sat down and decided to build an error-suppression system. It happened gradually. An early developer wrote one try-catch that swallowed an error because the feature needed to ship and proper error handling wasn't scoped into the sprint. Someone else saw that pattern, assumed it was intentional, and copied it. Code review didn't catch it because each individual catch block looked reasonable in isolation. The linter didn't flag it because a catch block that logs and carries on is perfectly valid code.

Over two years, the pattern spread through the codebase like a quiet infection. The monitoring showed a healthy system. The logs contained all the evidence, but nobody was correlating ERROR-level log lines with actual outcomes. Why would you? The dashboards were green.

I keep thinking about what kind of automated check would have caught this earlier. A lint rule for catch blocks that don't re-throw? That would flag legitimate cases too — sometimes you genuinely want to swallow an error for a non-critical side effect like analytics tracking. A heuristic that alerts when your error rate is suspiciously low? That feels backwards, but honestly, a system processing thousands of requests a day with a flat-zero error rate probably isn't handling errors. It's hiding them.
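As a thought experiment, the "suspiciously quiet" heuristic could be as simple as comparing what the logs say with what monitoring counted. Everything in this sketch is hypothetical: the WindowStats shape, the idea that both counts are available per time window, and the minRequests cutoff are assumptions, not a description of any real monitoring product.

```typescript
// Hypothetical input: one row per time window, joined from logs and monitoring.
type WindowStats = {
  label: string;
  requests: number;
  reportedErrors: number; // what the monitoring dashboard counted
  errorLogLines: number;  // ERROR-level log lines in the same window
};

type Finding = {
  label: string;
  reason: "logged-but-unreported" | "flat-zero";
};

function findSuspiciousWindows(
  windows: WindowStats[],
  minRequests: number,
): Finding[] {
  const findings: Finding[] = [];
  for (const w of windows) {
    if (w.requests < minRequests) continue; // too little traffic to judge
    if (w.errorLogLines > w.reportedErrors) {
      // The logs saw failures that monitoring never heard about.
      findings.push({ label: w.label, reason: "logged-but-unreported" });
    } else if (w.reportedErrors === 0) {
      // A busy window with no errors anywhere deserves a second look.
      findings.push({ label: w.label, reason: "flat-zero" });
    }
  }
  return findings;
}
```

A check like this would produce false positives on a system that really is healthy, so it fits better as a prompt for a human review than as a paging alert.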