A 20-person team was running 14 microservices, and three engineers worked full time just to keep the infrastructure alive. We consolidated six of the services into a modular monolith and cut deploy time by 70%.
A single slow database query triggered aggressive retries across four microservices. Within minutes, the entire order pipeline was down. Here's how we traced it and what we changed.
A consulting story about how a minor field rename in an internal API cascaded into a production incident, and what we put in place to keep it from happening again.