The Deploy That Dropped Requests in Silence

Every deploy was losing a handful of HTTP requests, but nobody noticed until a payment callback disappeared. The fix wasn't in the deployment pipeline — it was in the application code that never learned how to shut down.


The client had a Kubernetes setup that looked solid on paper. Rolling deployments, health checks, readiness probes, the works. Deploys happened multiple times a day without drama. The monitoring dashboards stayed green. Everyone was happy.

Then a Stripe webhook went missing.

A customer had paid for a subscription upgrade, Stripe confirmed the charge, but the system never processed the webhook. No error in the logs. No failed request in Stripe's dashboard — it showed a 200 response. The money moved, the entitlement didn't.

It took us two days to connect this to deployments. And another day to realize it had been happening on every single deploy for months.

The gap nobody sees

Here's what was happening. Kubernetes sends a SIGTERM to your pod when it wants to shut it down. The application has a grace period — 30 seconds by default — to finish what it's doing and exit cleanly. After that, Kubernetes sends SIGKILL and the process is gone.

The Node.js application running in this pod did not handle SIGTERM at all. When the signal arrived, the process just... stopped. Mid-request. Whatever HTTP connection was being processed got severed. The client (in this case, Stripe's webhook delivery) received a connection reset, but Stripe had already gotten bytes back from the TCP handshake, so it logged the attempt as delivered.

Meanwhile, Kubernetes removes the pod from the Service endpoints at roughly the same moment it sends SIGTERM. But "roughly the same moment" is doing a lot of heavy lifting there. The two steps run in parallel with no ordering guarantee, and the endpoint removal still has to propagate: the kube-proxy rules take time to update on every node, and the ingress controller takes time to update its upstream list. During that window, new requests still route to the dying pod.

Warning

The gap between endpoint removal and SIGTERM delivery is not deterministic. In a busy cluster, it can be hundreds of milliseconds — enough for dozens of requests to land on a pod that's about to die.

What graceful shutdown actually requires

Most tutorials show you something like this and call it done:

process.on('SIGTERM', () => {
  console.log('Shutting down...');
  process.exit(0);
});

This is arguably worse than handling nothing. The process still dies while requests might be in flight, but now it exits with code 0, so the shutdown looks clean to anything watching exit codes. What you actually need is a sequence:

import * as http from 'node:http';

function gracefulShutdown(server: http.Server) {
  let isShuttingDown = false;
 
  process.on('SIGTERM', () => {
    if (isShuttingDown) return;
    isShuttingDown = true;
 
    // Stop accepting new connections
    server.close(() => {
      // All existing connections have finished
      process.exit(0);
    });
 
    // Force exit after timeout if connections hang
    setTimeout(() => {
      console.error('Forcing exit — connections did not drain in time');
      process.exit(1);
    }, 25_000);
  });
}

server.close() stops the server from accepting new connections and waits for existing ones to finish. The 25-second timeout is a safety net — you want it shorter than Kubernetes' terminationGracePeriodSeconds (default 30) so the application exits on its own terms rather than getting killed.

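But there's a subtlety that trips up almost every team I've worked with. On Node versions before 19, HTTP keep-alive connections don't close when you call server.close(). The server stops accepting new connections, but existing keep-alive connections stay open, waiting for potential reuse. If your load balancer holds keep-alive connections (and most do), server.close() might never call its callback. Node 19 and later close idle connections as part of server.close(), and Node 18.2 added server.closeIdleConnections() for doing it explicitly, but plenty of services still run on older runtimes.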

The fix is to track connections and destroy idle ones during shutdown:

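import * as net from 'node:net';

// `server` is the http.Server from the previous snippet.
// Each open socket maps to the number of requests in flight on it.
const inFlight = new Map<net.Socket, number>();
let isShuttingDown = false;

// Adjust a socket's count; once shutdown has begun, destroy it when idle.
function adjust(conn: net.Socket, delta: number) {
  const count = (inFlight.get(conn) ?? 0) + delta;
  if (inFlight.has(conn)) inFlight.set(conn, count);
  if (isShuttingDown && count <= 0) conn.destroy();
}

server.on('connection', (conn) => {
  inFlight.set(conn, 0);
  conn.on('close', () => inFlight.delete(conn));
});

server.on('request', (req, res) => {
  adjust(req.socket, 1);
  res.on('close', () => adjust(req.socket, -1));
});

process.on('SIGTERM', () => {
  isShuttingDown = true;
  server.close(() => process.exit(0));
  // Destroy idle keep-alive connections; busy ones finish their response first
  for (const conn of inFlight.keys()) adjust(conn, 0);
});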
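
On recent runtimes Node can do this bookkeeping itself. What follows is a minimal sketch, assuming Node 18.2 or later, where http.Server has a closeIdleConnections() method. The helper name closeIdleIfSupported and the IdleClosable type are made up for illustration and are not part of any library.

```typescript
// Structural type so the helper can be exercised without a real server.
// http.Server satisfies it on Node 18.2+, where closeIdleConnections exists.
interface IdleClosable {
  closeIdleConnections?: () => void;
}

// Hypothetical helper: closes idle keep-alive sockets if the runtime
// supports it, and reports whether it did.
function closeIdleIfSupported(server: IdleClosable): boolean {
  if (typeof server.closeIdleConnections === 'function') {
    server.closeIdleConnections();
    return true;
  }
  return false; // older runtime: fall back to manual socket tracking
}
```

In a SIGTERM handler this would be called right after server.close(...). A false return value means the manual tracking approach is still needed on that runtime.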

The timing trick that actually fixed our webhook problem

Even with proper connection draining, there's still the propagation delay. Kubernetes removes the pod from endpoints and sends SIGTERM roughly at the same time, but the actual routing update across all nodes takes a beat. Requests can still arrive after your pod starts shutting down.

The pragmatic solution is deliberately ugly: sleep before you start the shutdown sequence.

lifecycle:
  preStop:
    exec:
      command: ["sleep", "5"]

Kubernetes runs the preStop hook to completion before it sends SIGTERM, so for those five seconds the application keeps serving normally while kube-proxy and the ingress controller propagate the endpoint removal. By the time your application starts its graceful shutdown, no new traffic should be arriving. I've seen teams try to solve this with elaborate readiness probe toggling, but the preStop sleep is simpler and more reliable.
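
One caveat: the exec form of preStop needs a sleep binary inside the container image, and minimal images may not ship one. In that case the same delay can live in the application's SIGTERM handler instead. The sketch below is one way to do that, not what we deployed. The name delayedShutdown and the ShutdownDeps shape are hypothetical, and the dependencies are injected only so the sequence can be tested without real signals or timers.

```typescript
// Hypothetical dependencies, injected so the sequence is testable.
interface ShutdownDeps {
  stopAccepting: (onDrained: () => void) => void; // e.g. (cb) => server.close(cb)
  exit: (code: number) => void;                   // e.g. process.exit
  schedule: (fn: () => void, ms: number) => void; // e.g. setTimeout
}

// Returns a signal handler that waits `delayMs` before draining, then
// forces an exit if draining takes longer than `drainMs`.
function delayedShutdown(
  deps: ShutdownDeps,
  delayMs: number,
  drainMs: number,
): () => void {
  let started = false;
  return () => {
    if (started) return; // ignore repeated signals
    started = true;
    // Phase 1: keep serving while the routing layer catches up.
    deps.schedule(() => {
      // Phase 2: stop accepting; exit 0 once in-flight requests finish.
      deps.stopAccepting(() => deps.exit(0));
      // Phase 3: give up if connections do not drain in time.
      deps.schedule(() => deps.exit(1), drainMs);
    }, delayMs);
  };
}

// Possible wiring, assuming `server` is an http.Server:
// process.on('SIGTERM', delayedShutdown(
//   { stopAccepting: (cb) => server.close(cb), exit: process.exit, schedule: setTimeout },
//   5_000,
//   24_000,
// ));
```

The trade-off is that the delay now depends on every service getting its handler right, which is exactly the kind of thing the preStop hook lets the platform enforce from outside.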

For the client's setup, we combined the preStop hook with the connection-draining code. The full sequence became:

  1. Kubernetes removes pod from endpoints
  2. preStop sleeps for 5 seconds (traffic drains from routing layer)
  3. SIGTERM arrives, application stops accepting connections
  4. In-flight requests complete (up to 24 seconds, because the preStop sleep counts against the 30-second grace period)
  5. Application exits cleanly
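
To keep the application's force-exit timer in step with the pod spec, the drain budget can be derived rather than hard-coded. This is a minimal sketch of the arithmetic. The function name drainBudgetMs and the one-second safety margin are assumptions for illustration. It relies on the preStop hook's duration being spent out of terminationGracePeriodSeconds.

```typescript
// Hypothetical helper: how long the app may wait for in-flight requests
// before forcing its own exit, given the pod's termination settings.
function drainBudgetMs(
  gracePeriodSeconds: number, // terminationGracePeriodSeconds on the pod
  preStopSeconds: number,     // duration of the preStop sleep
  marginSeconds = 1,          // assumed safety margin before SIGKILL
): number {
  const seconds = gracePeriodSeconds - preStopSeconds - marginSeconds;
  if (seconds <= 0) {
    throw new RangeError('preStop sleep and margin leave no time to drain');
  }
  return seconds * 1000;
}

// 30 s grace period, 5 s preStop sleep, default 1 s margin.
const forceExitAfterMs = drainBudgetMs(30, 5); // 24000
```

With the default grace period and a five-second sleep this gives 24 seconds, consistent with step 4. If the sleep is later raised to 10 seconds, the same call returns 19 seconds, so the timer follows the pod spec.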

How many requests were we losing?

We added a counter. During the shutdown window, we logged every request that was in flight when SIGTERM arrived. On average, each deploy killed 3-7 requests. With 8-12 deploys per day across the fleet, that was 30-80 dropped requests daily.
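
The counter itself was nothing special. A minimal sketch of the idea, assuming a plain http.Server, follows. The class name InFlightCounter and the helper logOnSigterm are hypothetical and are not the code from the client's service.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical helper: counts requests that have started but whose
// response has not yet closed.
class InFlightCounter {
  private active = 0;

  // Attach to anything that emits 'request' the way http.Server does.
  attach(server: EventEmitter): void {
    server.on('request', (_req: unknown, res: EventEmitter) => {
      this.active += 1;
      // 'close' fires once per response, completed or aborted.
      res.once('close', () => {
        this.active -= 1;
      });
    });
  }

  get count(): number {
    return this.active;
  }
}

// Log how many requests are still running when the signal arrives.
function logOnSigterm(counter: InFlightCounter): void {
  process.on('SIGTERM', () => {
    console.warn(`SIGTERM received with ${counter.count} request(s) in flight`);
  });
}
```

Wiring it up takes two calls at startup: counter.attach(server), then logOnSigterm(counter). Aggregating the logged numbers per deploy is what produces a figure like the 3-7 above.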

Most were idempotent API calls that clients simply retried. Nobody noticed. But webhooks don't retry the same way — Stripe retries on failure, sure, but if the response looks like a success (partial TCP response), it moves on. We estimated about 15 webhook deliveries per week were getting silently eaten. The subscription upgrade was just the first one someone actually investigated.

The uncomfortable part

This is a solved problem. Every framework has docs on graceful shutdown. Kubernetes has had preStop hooks since 1.0. The pattern of "sleep, then drain" is written up in a hundred blog posts.

But I keep finding it missing. I've seen it in Java services, Python workers, Go binaries — not just Node. The application starts as a prototype, gets containerized, gets deployed to Kubernetes, and nobody thinks about shutdown behavior because SIGTERM handling isn't a feature anyone puts in a sprint. It's infrastructure plumbing, and it's invisible until something like a payment webhook disappears.

The honest question I don't have a great answer for: how do you make sure these operational concerns get addressed in a codebase that's moving fast? Code review catches logic bugs, tests catch regressions, but "does this service shut down properly" lives in a gap between application development and platform engineering that nobody fully owns.