stevencodes.swe - June 29, 2025

Video deep dive, funny comments, dev tool rec

👋 Hey friends!

This week’s been a fun one. I rolled out a new video format: Backend Triage Quizzes, and it seems like people like it. One of them is about to hit 100K views, and the comment section has been hilarious. I’m loving how many of you are jumping in to diagnose the problems, ask questions, share war stories, or just say something funny.

In this week’s newsletter:

  • Sneak peek of what’s inside Chapter 3 (Query Pathologies)

  • Triage quiz deep dive

  • Funny comments from the quiz thread

  • A cool dev tool I discovered this week

Let’s get into it 👇

The Backend Lowdown: Chapter 3 Preview

Every newsletter will include a snippet from my book in progress, The Backend Lowdown, available for $1 right now on Gumroad!

N+1 Queries

One of the best parts about using an ORM is the ease of use. Unfortunately, wielding a tool this sharp can make it all too easy to cut yourself. Enter the N+1 query: harmless at first glance, ruthless at scale.

What is an N+1 Query?

An N+1 query occurs when your application makes one query to fetch a list of records (the "1"), and then issues an additional query for each item in that list (the "N"). Put another way, the term "N+1" comes from running 1 query for the parent set, then N additional queries, one per child record.

At a small scale, you might never notice. But as the number of records grows, so does the number of queries - linearly. And because each query involves a round trip to the database, this pattern can crush performance in production while remaining invisible in development or test environments.

An Example

Let's say you have an Author model and a Book model, and you're rendering a list of authors with their books:

# Rails / ActiveRecord
Author.all.each do |author|
  puts "#{author.name} wrote #{author.books.count} books"
end

This will issue:

  1. One query to load all authors.

  2. Then one query per author to count their books.

Behind the scenes:

-- 1 query for authors
SELECT * FROM authors;

-- 1 query per author to count their books
SELECT COUNT(*) FROM books WHERE author_id = 1;
SELECT COUNT(*) FROM books WHERE author_id = 2;
-- ...

So if you have 100 authors, your application just made 101 database calls. That's an N+1.

Other environments are no better:

# Phoenix / Ecto
Repo.all(Author)
|> Enum.map(fn author ->
  "#{author.name} wrote #{length(Repo.preload(author, :books).books)} books"
end)

Same idea, same problem. The pattern may be dressed differently, but the damage is the same: chatty, repetitive queries that scale badly.
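
For a quick taste of the cure (the sketch below isn't from the book, just the standard fix), here's the Rails example rewritten with eager loading so everything is fetched up front:

# Rails / ActiveRecord (eager loading)
Author.includes(:books).each do |author|
  # .size uses the already-loaded association; .count would still issue a query per author
  puts "#{author.name} wrote #{author.books.size} books"
end

Behind the scenes that's two queries no matter how many authors you have: one for the authors and one for all of their books (WHERE author_id IN (...)). The same idea applies in Ecto by calling Repo.preload on the whole result set instead of per author.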

Video Deep Dive: Backend Triage Quiz

I want to say first that I really love making these videos. Some of them are based on things I've actually been bitten by in production or have seen others get bitten by; others I conjure up to illustrate how things can go wrong.

Two of the most difficult parts of being on call and triaging an issue are knowing where to look for signals / symptoms and simultaneously putting those clues together to understand the root cause. Of course, triaging issues is more complicated than just that, but these videos try to cram a scenario into roughly 60 seconds, so a lot of the realism must fall by the wayside.

Unfortunately, it's not easy to make a video about knowing where to look, so these videos focus on the other part: putting the clues together to understand the root cause.

Let’s dive into one of the triage quizzes from this week! The scenario: checkout requests are blowing up and there are HTTP 503s everywhere.

If you haven’t seen the video and would like to try and solve the quiz yourself, scroll down!

This situation is actually pretty realistic. It's a prime example of how something non-obvious, like a long-running report, can silently wreck your production app. While the query in question was egregiously bad (summing data from all orders up until an hour ago??), there were some other bad ideas. Let's walk through a few of them (non-exhaustively):

Running Analytical Queries on an OLTP DB

Running analytical queries on an OLTP database (i.e., your app's database) is usually not a good idea. Analytical queries typically cast wide nets (meaning they look at a lot of records / data in one fell swoop) and are generally pretty beefy, which can hinder the performance of your app's database. At the very least, this report query should have been run against a read replica instead, to take the pressure off the primary.
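
As a rough illustration (not the code from the video; the job, models, and column names here are hypothetical), routing the heavy read to a replica in a Rails 6+ app with a reading role configured could look something like this:

# Hypothetical report job; assumes a "reading" (replica) role exists in database.yml
class HourlyOrdersReportJob < ApplicationJob
  def perform
    # Run the heavy aggregation on the replica so the primary keeps serving checkouts
    totals = ActiveRecord::Base.connected_to(role: :reading) do
      Order.where("created_at <= ?", 1.hour.ago).group(:status).sum(:total_cents)
    end

    # Write only the small, summarized result back on the primary
    ReportSnapshot.create!(data: totals, generated_at: Time.current)
  end
end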

Not Following Query / Insert Best Practices

There’s a whole slew of best practices when it comes to inserting records in a database. This query made some very questionable choices such as:

  • Not batching insertions (e.g., inserting 1,000 records at a time), which would have improved the query's performance, though in this scenario it may not have made that much of a difference (see the sketch after this list)

  • Grabbing an ACCESS EXCLUSIVE lock (a heavyweight lock that blocks all reads / writes on the table) instead of something more lightweight like an advisory lock

  • Running a ginormous SELECT query (all orders up until 1 hour ago… yikes)
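
To make the batching and locking points concrete, here's a minimal sketch (hypothetical model and lock key, not the code from the video) of writing report rows in batches while holding a Postgres advisory lock instead of locking the whole table:

# Hypothetical: report_rows is an array of attribute hashes computed earlier
REPORT_LOCK_KEY = 424_242 # arbitrary app-chosen bigint key for this job

ActiveRecord::Base.transaction do
  # Transaction-scoped advisory lock: serializes concurrent report runs
  # without blocking reads / writes on the orders table
  ActiveRecord::Base.connection.execute("SELECT pg_advisory_xact_lock(#{REPORT_LOCK_KEY})")

  report_rows.each_slice(1_000) do |batch|
    # insert_all (Rails 6+) writes each batch in a single INSERT statement
    ReportRow.insert_all(batch)
  end
end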

Not Using Safeguards

For it to have gone this far, the company / engineers in question clearly did not have any safeguards in place such as:

  • statement_timeout or lock_timeout, which would have kept the expensive report query from running (or waiting on locks) for so long (see the sketch after this list)

  • Some sort of performance testing with a staging environment and production-like dataset that would have potentially caught the bad locking behavior
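
For illustration (these are real Postgres settings; the method name is a hypothetical stand-in), one way to scope those timeouts to just the report's transaction so Postgres cancels a runaway query or lock wait:

ActiveRecord::Base.transaction do
  # SET LOCAL only applies inside this transaction
  ActiveRecord::Base.connection.execute("SET LOCAL statement_timeout = '30s'")
  ActiveRecord::Base.connection.execute("SET LOCAL lock_timeout = '5s'")

  run_hourly_orders_report # hypothetical method that runs the heavy query
end

You can also set defaults per role or database (e.g., ALTER ROLE ... SET statement_timeout) so individual jobs don't have to remember to opt in.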

Not Having Better Monitoring

A few monitoring gaps also made an impact here:

  • No long-running query or lock alerts were set up in the APM, so the team only got paged once there were downstream effects. Setting those alerts up can make a huge difference

  • No connection pool usage alerting was set up, so no one was notified when usage crossed a threshold, e.g. 80% (a rough sketch of such a check follows below)
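
As a toy sketch (not the APM setup from the video; Alerting.notify is a hypothetical stand-in for whatever integration you use), a periodic pool-usage check in Rails could look roughly like this:

# Hypothetical: run from a scheduled job every minute or so
stats = ActiveRecord::Base.connection_pool.stat
# stat returns a hash with keys like :size, :busy, :idle, :waiting

usage = stats[:busy].to_f / stats[:size]
if usage >= 0.8
  Alerting.notify("DB connection pool at #{(usage * 100).round}% (#{stats[:busy]}/#{stats[:size]} busy)")
end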

With all of the above in place, this scenario very likely would not have happened, and even if it had, it would have been caught much earlier.


Community Corner

Just for laughs, here are some of my favorite comments from the triage quiz video above:

Classic “drop table” joke 😆

Dev Tool Highlight

Not a backend-specific tool, but I just heard about Relative CI from a coworker. It's a cloud service for in-depth bundle analysis and monitoring that plugs straight into your CI/CD pipeline. On every build, Relative CI breaks down your app's bundle size by asset, module, and package so you can see exactly what changed and why. I haven't used it myself yet, but if catching bundle-size regressions early and keeping a lean, performant front end is important to you, Relative CI looks like a great "set-and-forget" addition to your pipeline. If you try it out, I'd love to hear what you think!

That’s a wrap for this week. If something here made your day smoother, feel free to reply and tell me about it. And if you think a friend or teammate would enjoy this too, I’d be grateful if you shared it with them.

Until next time,
— Steven