The Team That Had More AI Agents Than Engineers


A client's platform had 23 AI agents built by a team of 8. Nobody could tell me what half of them did. Agent sprawl is the new microservices sprawl, and the cleanup looks depressingly similar.

I got called into a Series B fintech company in May to help with what the VP of Engineering described as "AI infrastructure growing pains." That's a polite way to put it. What I found was 23 distinct AI agents — document parsers, customer support bots, internal search tools, code generators, compliance checkers, data enrichment pipelines — built and maintained by a team of eight engineers.

Nobody could give me a complete list. I had to find them by grepping for API keys.
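A minimal sketch of that kind of search, assuming the keys appear in source or config files either as conventional environment variable names or as key-like literals. The patterns and the function name `find_key_references` are illustrative assumptions, not an exhaustive list of provider key formats.

```python
import re
from pathlib import Path

# Hypothetical patterns: env var names commonly used for provider keys,
# plus a generic "sk-..." style literal. Adjust these for your own providers.
KEY_PATTERNS = [
    re.compile(r"\b(OPENAI|ANTHROPIC|MISTRAL)_API_KEY\b"),
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
]

def find_key_references(root: str) -> dict[str, list[int]]:
    """Map each file under root to the line numbers that mention an API key."""
    hits: dict[str, list[int]] = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(p.search(line) for p in KEY_PATTERNS):
                hits.setdefault(str(path), []).append(lineno)
    return hits
```

Each file that turns up is a lead on an agent, not a confirmed one. Somebody still has to read the code and work out what it does.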

How they got here

The story was familiar. Eighteen months earlier, the company had zero AI features. Then the CEO came back from a conference and declared they needed to be "AI-first." Two senior engineers built a document summarizer that saved the operations team about four hours a day. It worked. Management got excited. More agents followed.

Each one started as a quick proof of concept. A weekend project that got deployed on Monday. A Slack bot that someone's intern built. A RAG pipeline over the knowledge base that one engineer threw together because he was tired of answering the same questions from sales.

The problem wasn't that any individual agent was bad. Some were genuinely useful. The problem was that nobody ever stopped to ask: do we need all of these? And who owns them after the person who built them moves on to the next thing?

By the time I arrived, three of the original eight engineers had left. Their agents kept running.

The audit

I spent the first week just cataloging what existed. Here's what I found:

23 agents running across 4 different LLM providers (OpenAI, Anthropic, Mistral, and a self-hosted Llama instance someone had spun up on a GPU box under their desk)
7 agents that called at least one other agent as part of their workflow
3 agent-to-agent chains where no human was in the loop at all
Monthly LLM API spend of $34,000, up from $6,000 six months prior
Zero centralized logging of prompts, responses, or token usage
Two agents that did approximately the same thing — summarize customer support tickets — built by different engineers who didn't know the other existed

The GPU box under the desk was my favorite. It had been running for four months. The engineer who set it up had quit in March. Nobody knew the machine was there until I asked why a workstation with a 3090 in it was sitting under a desk drawing 350 watts.

Warning

If you can't produce a list of every AI agent running in your organization — what it does, who owns it, what it costs, and what data it accesses — you have an agent sprawl problem. Most teams I talk to can't.
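Those four questions map naturally onto a small inventory record. The sketch below is one way to hold them and to report which questions are still unanswered for each agent. The class name, field names, and sample agents are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class AgentRecord:
    name: str
    purpose: str = ""                          # what it does
    owner: str = ""                            # who owns it
    monthly_cost_usd: Optional[float] = None   # what it costs
    data_accessed: Optional[list] = None       # None means nobody has checked

def missing_fields(record: AgentRecord) -> list:
    """Return the inventory questions nobody has answered for this agent."""
    gaps = []
    for f in fields(record):
        value = getattr(record, f.name)
        if value is None or value == "":
            gaps.append(f.name)
    return gaps

# Hypothetical inventory entries, for illustration only.
inventory = [
    AgentRecord("ticket-summarizer", "Summarize support tickets",
                "support-eng", 1200.0, ["support tickets"]),
    AgentRecord("slack-helper"),  # found by grep, nothing else known yet
]

for agent in inventory:
    gaps = missing_fields(agent)
    if gaps:
        print(f"{agent.name}: missing {', '.join(gaps)}")
```

An empty `data_accessed` list is treated as a real answer ("touches no data"), while `None` means the question was never asked.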

The microservices déjà vu

I've been doing this long enough to remember when the same thing happened with microservices. Around 2016 and 2017, teams went from monoliths to 40 microservices overnight because someone read a blog post about how Netflix does it. The pattern was the same: each service made sense in isolation, nobody tracked total system complexity, and within a year you had services calling services calling services, with no one able to draw the dependency graph on a whiteboard.

Agent sprawl follows the same trajectory, but it's worse in two ways.

First, microservices at least had well-defined interfaces. An HTTP endpoint either returns 200 or it doesn't. An AI agent's behavior is probabilistic. The same input can produce different outputs. Chain three of those together and you've got a system that's functionally non-deterministic. Good luck writing tests for that.

Second, agents have ongoing costs that scale with usage in ways that are hard to predict. A microservice sitting idle costs you a container. An agent that someone wired into a Slack channel and forgot about costs you tokens every time someone posts a message — and you won't notice until the invoice arrives.
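A back-of-the-envelope estimate makes the Slack example concrete. The function below is a sketch. Every number in the usage example (message volume, token counts, per-token prices) is a made-up assumption, not real provider pricing or a figure from this client.

```python
def monthly_token_cost(
    messages_per_day: float,
    input_tokens_per_message: int,
    output_tokens_per_message: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
    days: int = 30,
) -> float:
    """Estimate monthly spend for an agent that runs once per incoming message."""
    per_message_usd = (
        input_tokens_per_message * usd_per_million_input
        + output_tokens_per_message * usd_per_million_output
    ) / 1_000_000
    return messages_per_day * days * per_message_usd

# Hypothetical: a busy channel with 500 messages a day, 2,000 prompt tokens
# of context per message, 300 tokens of reply, at $3 / $15 per million tokens.
estimate = monthly_token_cost(500, 2000, 300, 3.0, 15.0)
print(f"${estimate:,.2f} per month")
```

Under those assumptions the forgotten bot costs roughly $157 a month. That is small enough to go unnoticed on its own, which is exactly how a couple of dozen of them add up to an invoice nobody can explain.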

What we cut

After the audit, I sat down with the engineering lead and we made three lists: keep, merge, and kill.

We killed 9 agents outright. Five of them hadn't been invoked in over 30 days. Two were POCs that never should have been deployed. The other two were the duplicated ticket summarizers — we kept the better one.
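The 30-day idle check is the easiest part of that triage to automate. Here is a sketch, assuming you can get a last-invocation timestamp per agent from logs or billing data. The function name, agent names, and dates are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

def idle_agents(
    last_invoked: dict[str, Optional[datetime]],
    now: datetime,
    idle_days: int = 30,
) -> list[str]:
    """Return agents with no recorded invocation in the last idle_days days."""
    cutoff = now - timedelta(days=idle_days)
    return sorted(
        name
        for name, ts in last_invoked.items()
        if ts is None or ts < cutoff  # None: no invocation on record at all
    )

# Hypothetical usage data, for illustration only.
usage = {
    "ticket-summarizer": datetime(2024, 5, 28),
    "contract-parser": datetime(2024, 3, 2),
    "intern-slack-bot": None,
}
print(idle_agents(usage, now=datetime(2024, 6, 1)))
```

This only produces kill candidates. Deciding what to merge, and which of two duplicates is the better one, still takes a person who understands what the agents are for.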

We merged 4 pairs of agents that had overlapping functionality, turning 8 agents into 4 consolidated services with proper API contracts.

That left 10 agents. Still a lot for eight engineers, but each one had a clear owner, a documented purpose, and instrumented token usage. We moved all of them onto a shared gateway that handled authentication, logging, rate limiting, and cost allocation by team.
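To show the shape of such a gateway, here is a deliberately small in-process sketch. It is not the client's implementation. The class name, the `llm_call` signature (returning response text and a token count), and the use of registration as a stand-in for real authentication are all assumptions made for illustration.

```python
import logging
import time
from collections import defaultdict, deque
from typing import Callable

logger = logging.getLogger("agent_gateway")

class AgentGateway:
    """Single choke point for LLM calls: logs, rate-limits, attributes tokens."""

    def __init__(self, llm_call: Callable[[str], tuple[str, int]],
                 max_calls_per_minute: int = 60):
        self._llm_call = llm_call            # returns (response_text, tokens_used)
        self._max_calls = max_calls_per_minute
        self._owners = {}                    # agent name -> owning team
        self._recent = defaultdict(deque)    # agent name -> recent call times
        self.tokens_by_team = defaultdict(int)

    def register(self, agent: str, team: str) -> None:
        """Record which team owns an agent. Unregistered agents are refused."""
        self._owners[agent] = team

    def call(self, agent: str, prompt: str) -> str:
        if agent not in self._owners:
            raise PermissionError(f"unregistered agent: {agent}")
        now = time.monotonic()
        window = self._recent[agent]
        while window and now - window[0] > 60:   # drop calls older than a minute
            window.popleft()
        if len(window) >= self._max_calls:
            raise RuntimeError(f"rate limit exceeded for {agent}")
        window.append(now)
        response, tokens = self._llm_call(prompt)
        team = self._owners[agent]
        self.tokens_by_team[team] += tokens      # cost allocation by team
        logger.info("agent=%s team=%s tokens=%d", agent, team, tokens)
        return response
```

The design point is that agents never hold provider credentials themselves. If every call has to pass through one place, an agent that nobody registered cannot quietly run up a bill.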

The monthly API bill dropped from $34,000 to $19,000, a cut of roughly 44%. Not because we optimized prompts or switched models, but because we stopped running agents nobody used.

The real lesson

The interesting thing is that nobody at the company thought they had a problem. They were proud of how quickly they'd adopted AI. The CEO loved saying they had "AI agents across every function." The engineering team felt productive — they were building things and shipping them fast.

But shipping fast without tracking what you've shipped is just accumulating hidden liabilities. We've known this about code for decades. It applies to agents too.

If I could give one piece of advice to teams building with AI right now, it would be this: treat every agent like a service. Give it an owner, a README, a cost budget, and a retirement date. Review the list quarterly. Kill the ones that aren't earning their keep.
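The quarterly review can be partly mechanical. The sketch below checks one agent's manifest against last month's spend and today's date. The manifest fields mirror the four items above, but the names, the wording of the findings, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class AgentManifest:
    name: str
    owner: str
    readme_path: str
    monthly_budget_usd: float
    retire_on: date

def review(manifest: AgentManifest, spend_last_month_usd: float,
           today: date) -> list[str]:
    """Return the reasons this agent needs attention at review time."""
    findings = []
    if not manifest.owner:
        findings.append("no owner")
    if not manifest.readme_path:
        findings.append("no README")
    if spend_last_month_usd > manifest.monthly_budget_usd:
        findings.append("over budget")
    if today >= manifest.retire_on:
        findings.append("past retirement date")
    return findings

# Hypothetical agent, for illustration only.
kb_search = AgentManifest("kb-search", "platform-team", "docs/kb-search.md",
                          500.0, date(2025, 1, 1))
print(review(kb_search, spend_last_month_usd=650.0, today=date(2025, 2, 1)))
```

A retirement date is not a death sentence. It forces someone to decide explicitly to renew the agent, which is the decision nobody was making when agents kept running after their authors left.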

The teams that will do well with AI agents aren't the ones that build the most. They're the ones that resist the urge to build one for every problem and instead ask whether the last three they built are actually working. That's a harder question than it sounds, partly because "working" for an AI agent is fuzzier than for a traditional service. But it's worth asking before you end up with a GPU under someone's desk and an API bill that nobody can explain.