stevencodes.swe - July 20

War story, back end algorithm, dev tool rec

👋 Hey friends!

Here’s what I’ve got for you this week:

  • Another snippet from Chapter 3 of The Backend Lowdown (getting close to finishing this chapter!)

  • New video format: back end algorithms

  • War story: the autovacuum mystery

  • Dev tool recommendation: Claude Code

Let’s dive into it! 👇

The Backend Lowdown: Chapter 3 Preview

Every newsletter will include a snippet from my book in progress, The Backend Lowdown, available for $1 right now on Gumroad!

SELECT * & Over-Selecting Columns

By defaulting to SELECT *, or otherwise fetching more columns than your application actually uses, you're paying a hidden tax on I/O, CPU, network, and memory. Over-selecting bloats result sets, slows down query pipelines, and can turn an otherwise efficient read into a costly operation, especially on wide tables or high‑throughput workloads.

Why grabbing every field kills throughput

When you ask the database to return every column, it has to:

  1. Read more bytes from disk: Wider rows mean fewer rows per page. Every table page fetch retrieves fewer usable results, resulting in more page reads overall.

  2. Consume more network bandwidth: Transferring unused columns over the wire increases latency for both the database and your application.

  3. Increase CPU and memory pressure: Deserializing large row buffers and materializing unnecessary fields in your runtime takes CPU cycles and bloats your process heap.

Even if you only need one or two fields, a SELECT * forces the engine to load every attribute into memory and hand it off to the client. In high‑concurrency environments, those extra bytes add up, slowing down caches, compounding I/O costs, and exacerbating GC pauses.
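
Here's a quick sketch of the difference (the users table and its columns are made-up examples, not from a real schema):

-- Over-selecting: every column, including wide ones the app never reads,
-- gets loaded, serialized, and shipped to the client.
SELECT *
FROM users
WHERE last_login_at > now() - interval '7 days';

-- Selecting only what the app uses keeps result sets lean, and with a
-- covering index on (last_login_at) INCLUDE (id, email) the planner may
-- even satisfy it with an index-only scan.
SELECT id, email
FROM users
WHERE last_login_at > now() - interval '7 days';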

New Video Format: Back End Algorithms

This week I started covering a new video topic: back end algorithms. There are a surprising number of algorithms powering real-world back end systems, and I find them fascinating, so I thought you might too! I've already covered a few, such as power of 2 choices (see below), rendezvous hashing, and more. Let me know what you think!

@stevencodes.swe

Two coin flips vs a $100k load balancer? Discover how the power of two choices slashes queue overload and crushes tail latency, no fancy g...

War Story: The Autovacuum Mystery

A few days ago at work, Sentry decided to throw a party in our Slack channel. Alert after alert about GenServer.call/3 timeouts started flooding in. You know that feeling when your error monitoring goes from 0 to 100 real quick? It was a lot like that.

The hunt was on. First stop: our primary Aurora Postgres database. DataDog showed a massive spike in write IOPS that definitely wasn't normal. But write IOPS alone don't usually cause Elixir processes to timeout...

After digging deeper, we found our smoking gun: autovacuum was running on one of our largest tables. And it turns out it had been running too infrequently. This table had accumulated so many dead tuples that when autovacuum finally kicked in, it had a mountain of cleanup work to do. The vacuum process ended up holding locks long enough to block queries, causing our Elixir GenServers to timeout waiting for database responses.

The Lesson: Neglected Tables Bite Back

This incident perfectly illustrates why PostgreSQL's autovacuum settings need to be tuned per table. We'd been running with defaults on this high-activity table, which meant:

  • Dead tuples accumulated for way too long

  • When vacuum finally triggered, it had massive work to do

  • The cleanup operation held locks longer than our application timeouts

  • Cascade of failures as GenServers gave up waiting
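
If you want to catch this kind of buildup before it bites, Postgres tracks dead tuple counts and autovacuum timestamps in pg_stat_user_tables. A quick check might look something like this:

-- Tables with the most dead tuples, plus when (auto)vacuum last ran on them
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       last_vacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;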

Different Tables, Different Needs

Not all tables are created equal:

  • High-write tables: Need frequent small vacuums to avoid big cleanup operations

  • Read-heavy tables: Can tolerate less frequent vacuuming

  • Growing tables: May need threshold adjustments as they scale

  • Critical path tables: Should never accumulate enough dead tuples to cause blocking

The Fix

We adjusted the autovacuum settings to run more frequently with smaller batches:

ALTER TABLE problem_table SET (
  autovacuum_vacuum_scale_factor = 0.02,    -- vacuum at 2% dead tuples vs default 20%
  autovacuum_vacuum_cost_limit = 1000       -- raise the vacuum I/O budget (Postgres default is 200) so each pass finishes faster
);
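
Per-table storage options like these are easy to lose track of, so it's worth verifying they actually stuck (problem_table here is a stand-in name, of course):

-- Show any per-table autovacuum overrides
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'problem_table';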

The moral? Don't wait for autovacuum to become an auto-problem. Regular small cleanups beat occasional massive ones, especially when your application timeouts are measured in seconds, not minutes.

Dev Tool Recommendation: Claude Code

If you've been paying even a modicum of attention to the AI space, this recommendation won't come as a surprise, but maybe you're like me and tend to get very comfortable with the tools you already like. Until not too long ago, I was pretty happy using ChatGPT and occasionally some of Anthropic's models through Cursor. Then I finally gave Claude Code a whirl, and I'm honestly shocked at how good it is. It's decent with just Sonnet hooked up, but putting Opus in the driver's seat is scary good. It's not perfect, of course (no model is right now), but it is good.

If you're not used to working with a tool from the command line, relax and just try it out; it may surprise you. My favorite part is the little to-do list it comes up with for the task you give it, and watching it knock out each item one by one. The only major pain point I've hit, besides rate limits, is when it gets stuck trying to run shell commands. If you've been comfortable sticking with GPT's models like I was, give Claude Code a try!

That’s a wrap for this week. If something here made your day smoother, feel free to reply and tell me about it. And if you think a friend or teammate would enjoy this too, I’d be grateful if you shared it with them.

Until next time,
Steven