👋 Hey friends,

Here’s what I’ve got in store for you this week:

  • Another snippet from The Backend Lowdown

  • Recent post recap: DB Load / AAS charts

  • Next post preview: Write contention

Let’s get into it 👇

Every newsletter will include a snippet from my book in progress, The Backend Lowdown, available for $5 right now on Gumroad!

Get The Backend Lowdown →

Observing and Deploying Migrations Safely (cont.)

Watch for Lock Contention

Before running a migration - especially on large or high-traffic tables - it's important to check whether your database is already under lock pressure, also known as lock contention.

Lock contention happens when multiple database operations are trying to acquire conflicting locks on the same resource (like a table or row), and some of them have to wait. This can slow down queries, cause timeouts, or block your migration entirely.

If your migration needs to lock a table - even briefly - and another long-running query is already holding a conflicting lock, your migration will just sit there waiting. Worse, if other queries pile up behind it, you can get a cascading slowdown or even an outage.

That's why it's good practice to check for:

  • Long-running queries - may be holding locks that your migration will conflict with

  • Autovacuum activity - holds locks that conflict with schema changes, especially if it's vacuuming a large or bloated table

  • Active transactions - especially those that have been open for a long time

You can check for ungranted (i.e. waiting) locks with:

SELECT * FROM pg_locks WHERE NOT granted;
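
That tells you a lock is waiting, but not who is holding it up. On PostgreSQL 9.6+, you can connect the two with pg_blocking_pids. A sketch (you may need a superuser or a role with sufficient privileges to see other sessions' queries):

```sql
-- For each waiting session, show which PIDs are blocking it and what it's running
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```

An empty result here is what you want to see before kicking off a migration.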

Or tools like:

  • pg_stat_activity - to see current queries and lock wait status

  • Your APM (New Relic, Datadog, etc.) - to catch rising DB latency or blocked transactions

  • App-level logs (e.g. Rails + Lograge) - to see if requests are slowing down or queueing
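
For the first and third checks above (long-running queries and old transactions), pg_stat_activity alone usually suffices. A sketch:

```sql
-- Non-idle sessions ordered by transaction age; long-open transactions float to the top
SELECT pid,
       state,
       now() - xact_start AS xact_age,
       now() - query_start AS query_age,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND pid <> pg_backend_pid()
ORDER BY xact_start ASC NULLS LAST;
```

Watch especially for sessions stuck in "idle in transaction" - they hold their locks without doing any work.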

If you see signs of contention, it's best to wait, monitor, or reschedule your migration for a lower-traffic time - like off-peak hours or a dedicated maintenance window.
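
One defensive pattern worth adding on top of those checks: set a lock_timeout before the schema change, so a migration that can't get its lock fails fast instead of queueing everyone up behind it. A sketch - the table and column names here are hypothetical:

```sql
-- Fail fast (with an error you can retry) rather than wait indefinitely for the lock
SET lock_timeout = '5s';
ALTER TABLE orders ADD COLUMN notes text;  -- hypothetical example DDL
```

A failed-fast migration is a retry; a migration stuck behind a long transaction can be an outage.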

Monitor for Deployment Pipeline Failures

If your migration runs inside a CI/CD pipeline step, and that migration blocks for too long (e.g. on a lock), your entire deploy can get stuck - potentially leading to:

  • Autoscaling delays

  • Worker exhaustion

  • Connection pool saturation

Always have a way to:

  • Cancel a stuck deploy

  • Apply schema changes out-of-band (e.g. via a direct console, heroku run, etc.)

Recent Post Recap

In case you missed it, this week I posted about one of the most useful charts I’ve found for diagnosing database performance issues:

DB Wait / Average Active Sessions (AAS)

This chart shows how many sessions are active at any moment, and whether they're doing work (CPU) or waiting on something. Most APMs break it down into categories like:

  • CPU - the session is doing real work

  • I/O - the session is waiting on storage/network I/O

  • Locks - the session is blocked waiting on a lock

[Image: the primary visual from the video; it looks a bit different in real APMs]

Even though the chart will look a little different depending on your tool (CloudWatch, Datadog DBM, etc.), the idea is the same and it’s been instrumental for me in pinpointing real DB problems fast.

Here's the mental model I use:

  • AAS ≈ how many sessions are “active” at once (running on CPU or stuck waiting on something)

  • If AAS is consistently above your available CPU, you're building a line (queueing).

  • The real value is breaking it down by wait category (locks, I/O, CPU, etc.). That breakdown is your hypothesis generator.
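
In Postgres you can approximate that breakdown yourself by sampling pg_stat_activity: each query below is one instantaneous snapshot of active sessions per wait category, and averaging snapshots over a window gives you AAS. A sketch:

```sql
-- One sample: active sessions grouped by wait category
-- (a NULL wait_event_type means the session is running on CPU)
SELECT coalesce(wait_event_type, 'CPU') AS category,
       count(*) AS active_sessions
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
GROUP BY 1
ORDER BY 2 DESC;
```

This is essentially what the APM charts are doing for you, just sampled continuously and graphed over time.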

Here are 5 ways to interpret the graph:

  1. AAS spike + mostly CPU → compute-bound (check top SQL + query plans)

  2. AAS spike + mostly locks → contention-bound (hot rows/indexes, write patterns)

  3. AAS spike + mostly I/O → storage-bound (scans, cache misses, missing indexes)

  4. AAS flat, latency up → often app-side (connection pool, retries, downstream dependencies)

  5. AAS “sawtooth” after deploys → correlation isn’t proof, but it narrows the search

Next I’m going to zoom in on write contention: how to spot it using the same AAS chart, and why it’s so sneaky.

Write contention is when multiple writes compete for the same row(s) (or a small hotspot set), so connections spend more time waiting on each other than doing useful work. In most systems, this shows up as rising latency, even if you didn’t “change much.”

Stay tuned! I'll show the exact signal to look for and a couple of concrete patterns that cause it.

That’s a wrap for this week. If something here made your day smoother, feel free to reply and tell me about it. And if you think a friend or teammate would enjoy this too, I’d be grateful if you shared it with them.

Until next time,
Steven
