The ORM Was Running 847 Queries Per Page Load
A client's dashboard took 11 seconds to render. Everyone blamed the database. The real problem was an ORM doing exactly what we told it to — we just never looked at what that meant.
The ticket said "dashboard is slow." That's it. No reproduction steps, no timing data, no theory about root cause. Just a Jira ticket with a priority bump from a product manager who was tired of waiting.
I was three weeks into a consulting engagement with a mid-size fintech company. Their internal operations dashboard — the one their support team lived in eight hours a day — had gotten progressively slower over the past six months. Nobody could point to a single change that caused it. It just... decayed.
The Usual Suspects
First thing I did was open the Network tab. The main API call for the dashboard overview page took 11.2 seconds. Not the rendering, not the JavaScript bundle, not some third-party script. A single REST endpoint returning JSON.
The backend was a Python Django application with PostgreSQL. The team had already tried the obvious: they'd added database indexes on the columns they thought were slow, bumped the connection pool size, and upgraded the RDS instance from db.r5.large to db.r5.2xlarge. Cost went up. Performance didn't budge.
When I asked if anyone had looked at the actual queries hitting the database, I got blank stares. They were using Django's ORM exclusively. Nobody had opened a SQL shell in months.
Counting the Damage
I enabled Django's query logging in the staging environment and hit the dashboard endpoint once.
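If you haven't done this before, it's a settings change, not a tooling project. A minimal sketch of the stock approach: Django's django.db.backends logger emits every SQL statement with its timing, but only while DEBUG is True:

# settings.py for the staging environment
DEBUG = True  # the query logging below only fires when DEBUG is on

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {'class': 'logging.StreamHandler'},
    },
    'loggers': {
        # Logs every SQL statement Django executes, with its duration.
        'django.db.backends': {
            'handlers': ['console'],
            'level': 'DEBUG',
        },
    },
}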
847 queries. One HTTP request produced eight hundred and forty-seven database round trips.
The response time breakdown was almost comical: no single query took more than 12ms. The database was fast. The problem was volume — 847 sequential network round trips to a database server with 1.2ms of latency overhead each, plus the ORM's object hydration cost multiplied across thousands of rows.
Here's what the code looked like, simplified:
class DashboardView(APIView):
    def get(self, request):
        accounts = Account.objects.filter(org=request.user.org)
        result = []
        for account in accounts:
            recent_transactions = account.transactions.order_by('-created')[:5]
            result.append({
                'account': AccountSerializer(account).data,
                'transactions': TransactionSerializer(recent_transactions, many=True).data,
                'owner': UserSerializer(account.owner).data,
            })
        return Response(result)

If you've worked with Django (or any ORM with lazy loading), you can already see it. The accounts queryset fetches maybe 60 accounts in a single query. Then for each account, we hit the database for its transactions. Then we access account.owner, which triggers another query because it's a foreign key that wasn't eagerly loaded. That's 60 + 60 = 120 queries from the loop alone, on top of the one that fetched the accounts. But the serializers themselves referenced nested relations — transaction.category, transaction.merchant, owner.department — each one another lazy load.
Sixty accounts times fourteen lazy-loaded relations per account is 840 queries; the remaining handful were the top-level queries. Math checks out.
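You don't have to take that arithmetic on faith. Django ships a context manager that captures every query a block of code runs; here's a minimal sketch, with render_dashboard standing in for the view logic above (a hypothetical name, not their code):

from django.db import connection
from django.test.utils import CaptureQueriesContext

# Works outside the test runner too: the context manager forces
# query capture even when DEBUG is False.
with CaptureQueriesContext(connection) as ctx:
    render_dashboard(org)  # hypothetical stand-in for the view body

print(len(ctx.captured_queries))    # 847 against the real data set
for q in ctx.captured_queries[:3]:
    print(q['sql'])                 # the repeating SELECTs jump out immediately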
The Fix Nobody Wanted to Hear
The solution wasn't clever. It was select_related and prefetch_related — Django's built-in tools for eager loading:
from django.db.models import Prefetch

accounts = (
    Account.objects
    .filter(org=request.user.org)
    .select_related('owner', 'owner__department')
    .prefetch_related(
        Prefetch(
            'transactions',
            # Slicing inside a Prefetch requires Django 4.2+; on older
            # versions, prefetch unsliced and take the top 5 in Python.
            queryset=Transaction.objects.order_by('-created')
            .select_related('category', 'merchant')[:5],
        )
    )
)
# The loop body then reads account.transactions.all() instead of
# re-ordering and re-slicing, so the prefetched rows actually get used.

847 queries became 4. Response time went from 11.2 seconds to 340ms.
Note
select_related uses a SQL JOIN to pull foreign-key relations into the main query. prefetch_related runs a separate query and does the joining in Python — better for reverse relations and many-to-many fields, where a JOIN would multiply rows.

The team's reaction was a mix of relief and frustration. They'd spent two months and a few thousand dollars in AWS costs trying to solve a problem that took one afternoon once someone actually looked at the query log.
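If the distinction in that note feels abstract, here's the rough shape of the SQL each call produces (table and column names illustrative):

# select_related: one query; the related rows come back joined in.
Account.objects.select_related('owner')
# SELECT account.*, user.*
# FROM account
# INNER JOIN user ON user.id = account.owner_id
# (LEFT OUTER JOIN instead if the foreign key is nullable)

# prefetch_related: two queries; Django matches rows up in Python.
Account.objects.prefetch_related('transactions')
# Query 1: SELECT account.* FROM account
# Query 2: SELECT transaction.* FROM transaction
#          WHERE transaction.account_id IN (<ids from query 1>)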
Why This Keeps Happening
I've seen this pattern at four different clients in the past year alone. The conditions are always the same:
The ORM works perfectly at first. When there are 10 accounts in dev and 50 in staging, nobody notices the N+1 pattern. The response time is 200ms and everyone moves on. Then production grows. The dataset increases gradually — not fast enough to trigger an alert, but fast enough that six months later the support team is complaining.
The other factor is that ORMs hide the cost model. In raw SQL, you'd never write 847 separate SELECT statements to populate one page. The absurdity is visible. But when the code reads like account.owner.department.name, it looks like accessing a property on an object. The network call is invisible.
I don't think ORMs are the problem. I think the problem is using an ORM without ever looking at what it produces. It's like writing code without ever running it.
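And looking is cheap. Any Django queryset will print the SQL it's about to run; parameters get inlined naively, so treat it as a readout rather than executable SQL:

qs = (
    Account.objects
    .filter(org=some_org)        # some_org: placeholder for the filter value
    .select_related('owner')
)
print(qs.query)  # the full SELECT, joins included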
What We Put in Place
We added a middleware that logged query count per request in staging:
import logging

from django.db import connection, reset_queries

logger = logging.getLogger(__name__)

class QueryCountMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # connection.queries is only populated when DEBUG is True
        # (or force_debug_cursor is set), which is why this ran in staging.
        reset_queries()
        response = self.get_response(request)
        count = len(connection.queries)
        if count > 50:
            logger.warning(f"{request.path} executed {count} queries")
        response['X-Query-Count'] = str(count)
        return response

Fifty queries per request became the threshold for investigation. Not a hard failure — just a log line that someone would notice. We also enabled Django Debug Toolbar's SQL panel for local development, which makes query counts impossible to miss.
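Wiring it in is a one-line change in the staging settings; the module path here is hypothetical:

MIDDLEWARE = [
    # ... the existing middleware stack ...
    'ops.middleware.QueryCountMiddleware',  # hypothetical module path
]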
The real cultural shift was smaller: the team started running django-silk during code review for any endpoint that touched the database. Before this, "it works" was the bar for merging. Now "it works in how many queries" became part of the conversation.
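For teams that want the budget enforced rather than just discussed, Django's test framework can pin it. This wasn't part of this client's setup, just a pattern that pairs well with it; the route is hypothetical and the budget of 4 comes from the fixed endpoint above:

from django.test import TestCase

class DashboardQueryBudgetTest(TestCase):
    def test_dashboard_stays_within_query_budget(self):
        # Fails loudly in CI if the endpoint's query count drifts upward.
        # '/api/dashboard/' is a hypothetical route for the view above.
        with self.assertNumQueries(4):
            self.client.get('/api/dashboard/')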
The Uncomfortable Part
That dashboard endpoint had been in production for fourteen months. It was written by a competent developer who simply didn't know to think about query count as a performance vector. The code was clean, well-tested, and followed every pattern in their style guide. The ORM did exactly what it was asked to do.
I keep coming back to the idea that performance problems in ORM-heavy codebases aren't bugs — they're latent. They're correct code that happens to scale badly, and nothing in the typical development loop (write, test, review, deploy) surfaces the issue until real data at real volume makes it undeniable.
Makes me wonder how many production systems right now are running hundreds of queries per request and nobody's counted yet.