Don't Hold a Lock Across Await

#resources #resources/programming #resources/programming/rust #resources/programming/concurrency

Rule zero of Tokio code: never hold a mutex guard across .await unless holding it there is the deliberate, reviewed design. The reasons stack: compile-time, runtime, and correctness.

The bad shape

let mut state = shared.lock().unwrap();
state.bump();
do_async_work().await;     // guard still held here

What goes wrong:

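tokio::spawn requires the spawned future to be Send. The guards from std::sync::Mutex and (by default) parking_lot::Mutex are not Send; tokio::sync::MutexGuard is. A future that holds a non-Send guard across .await is itself not Send, so if your function has to be spawn-able, that shape won't compile.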
Even if the guard type is Send, holding it across .await blocks every other task that wants the same lock for the duration of the await. With long awaits, this is throughput death.
If the awaited work also tries to acquire the same lock somewhere down the call tree, you've built a deadlock.
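A minimal std-only sketch of the last two points, assuming a plain std::sync::Mutex; probe_while_held is a hypothetical name, not something from a real codebase. It uses try_lock as a probe: while the guard is alive nobody else can take the lock, so a task calling lock() at that moment would wait, and if that task is the very work being awaited, nothing ever makes progress.

```rust
use std::sync::Mutex;

/// Hypothetical probe: returns (lock free while guard held, lock free after drop).
fn probe_while_held(shared: &Mutex<u32>) -> (bool, bool) {
    let free_during = {
        let mut state = shared.lock().unwrap();
        *state += 1;
        // Stand-in for "other code wants this lock while we are parked at .await".
        // try_lock reports failure instead of blocking, so the probe is safe.
        let free = shared.try_lock().is_ok();
        free
    }; // guard drops here

    // With the guard gone, the same probe succeeds.
    let free_after = shared.try_lock().is_ok();
    (free_during, free_after)
}
```

The first flag is the state every other task sees for the whole duration of an await that happens under the guard.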

The good shape

{
    let mut state = shared.lock().unwrap();
    state.bump();
} // guard drops here

do_async_work().await;

The guard exists inside an inner scope and drops before the await. Now the critical section is short, sync-only, and doesn't entangle with the runtime.
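One way to check the compile-time half of this without pulling in Tokio is sketched below. It assumes a hypothetical require_send helper that imposes the same Send bound tokio::spawn does, a do-nothing do_async_work, and a tiny busy-polling block_on that is only good enough for futures that finish immediately. None of these names come from a real library.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Stand-in for real async work (hypothetical; completes immediately).
async fn do_async_work() {}

// Hypothetical helper: accepts only Send futures, like tokio::spawn does.
fn require_send<F: Future + Send>(fut: F) -> F {
    fut
}

async fn bump_then_work(shared: Arc<Mutex<u64>>) -> u64 {
    let seen = {
        let mut state = shared.lock().unwrap();
        *state += 1;
        *state
    }; // guard drops here, before any .await

    // Moving this .await inside the block above would keep the non-Send
    // std guard alive across it, and require_send would reject the future.
    do_async_work().await;
    seen
}

// Minimal executor for the sketch: polls in a loop with a no-op waker.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

Calling block_on(require_send(bump_then_work(shared))) compiles only because the guard's scope ends before the await; in a real project the runtime's spawn call plays the role of require_send.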

Picking the mutex

The default order for which mutex to reach for:

Message passing or owner task — no shared mutability, no mutex needed
std::sync::Mutex or parking_lot::Mutex — for tiny, sync-only critical sections
tokio::sync::Mutex — only when the guarded operation truly must cross .await

The async mutex isn't a strict upgrade. It's slower than a sync mutex, and using it everywhere is not automatically better. Reach for it only when the critical section legitimately spans an await — for example, holding a connection while running a query through it. If the section is short and doesn't await, a normal mutex is the right tool.

When you legitimately need to span an await

The pattern that actually requires tokio::sync::Mutex is rare. It usually means you're guarding a stateful resource whose operations are themselves async — a connection pool that needs to serialize queries, a config that's being asynchronously refreshed. For most "shared mutable state" cases, the answer is not "use the async mutex"; it's "restructure so the lock doesn't have to span an await."

Two restructurings that almost always work:

Read, then act. Lock, copy the data out, drop the guard, then do the async work.
Owner task + channels. One task owns the resource. Others send commands over an mpsc and get results back over oneshot. No shared mutex at all.

The second is preferable for genuinely-async stateful resources — provider sessions, long-lived clients, caches with async refresh.
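A rough sketch of the owner shape follows. To stay runnable with only the standard library it swaps in an OS thread and std::sync::mpsc for what the note describes (a Tokio task, tokio::sync::mpsc, and oneshot for replies), so treat it as the same structure in a different runtime rather than Tokio code. Command, spawn_owner, and read_count are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical command set for a counter that exactly one thread owns.
enum Command {
    Bump,
    // The reply sender plays the role a oneshot channel would in Tokio.
    Get { reply: mpsc::Sender<u64> },
}

fn spawn_owner() -> (mpsc::Sender<Command>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut count = 0u64; // owned state: no mutex, no guard to mis-scope
        // The loop ends when every sender has been dropped.
        for cmd in rx {
            match cmd {
                Command::Bump => count += 1,
                Command::Get { reply } => {
                    // The requester may have gone away; ignore a failed reply.
                    let _ = reply.send(count);
                }
            }
        }
    });
    (tx, handle)
}

fn read_count(tx: &mpsc::Sender<Command>) -> u64 {
    let (reply, response) = mpsc::channel();
    tx.send(Command::Get { reply }).expect("owner is still running");
    response.recv().expect("owner sent a reply")
}
```

Callers only ever hold a sender, so there is no guard whose lifetime could stretch across an await in the first place.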

The sync cousin: a guard in a scrutinee

The "scope the guard tightly" instinct applies even when no .await is involved. A lock guard taken inside a match or if let scrutinee is a temporary that stays alive through the arm or body that follows, not just the lookup; clippy's significant_drop_in_scrutinee lint flags it. It only deadlocks if an arm tries to take the same lock again, but even when it doesn't, it holds the lock across the arm bodies for no reason and reads as if the lock matters longer than it does.

// guard held across the entire if-let body
if let Some(handle) = self.worker.lock().take() {
    registry.push(handle);   // lock still held here
}

Mxr had twelve of these in one take_all shutdown function. The fix was a helper that scopes the lock to the take() and hands back owned data:

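// Assumes a parking_lot-style Mutex, whose lock() returns the guard directly (no unwrap).
fn take_named(slot: &Mutex<Option<JoinHandle<()>>>, name: &str) -> Option<Named> {
    // The guard is a temporary in this expression; it is gone by the time we return.
    slot.lock().take().map(|handle| Named { name: name.into(), handle })
}
// caller: registry.extend(take_named(&self.worker, "worker"));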

The guard is a temporary inside the helper: it is released by the time take_named returns, so the caller never holds it. Twelve near-identical blocks collapsed to twelve one-liners, and the lock lives exactly as long as it should. Same move as the await rule: make the guard's lifetime so visibly short that holding it too long is obviously wrong.
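The difference in guard lifetime can be observed directly. The sketch below uses std::sync::Mutex (hence the unwrap, unlike the snippets above) and try_lock as a probe; held_in_body and released_before_body are hypothetical names chosen for the illustration.

```rust
use std::sync::Mutex;

/// Scrutinee form: returns true if the lock is still held inside the body.
fn held_in_body(slot: &Mutex<Option<u32>>) -> bool {
    if let Some(_value) = slot.lock().unwrap().take() {
        // The guard temporary from the scrutinee is still alive here,
        // so a second attempt to lock fails.
        slot.try_lock().is_err()
    } else {
        false
    }
}

/// Scoped form: returns true if the lock is already free inside the body.
fn released_before_body(slot: &Mutex<Option<u32>>) -> bool {
    // The guard is a temporary of this statement and drops at its end.
    let taken = slot.lock().unwrap().take();
    if let Some(_value) = taken {
        slot.try_lock().is_ok()
    } else {
        false
    }
}
```

Both functions return true for a slot that starts as Some: the first because the lock is still held in the body, the second because it has already been released.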

Why this is more than a rule

The compiler catches some cases (non-Send guards across awaits in spawned tasks). The runtime catches none — long holds compile fine and run fine, just slowly. Deadlocks show up under load. The cost of getting this wrong is paid in production incidents, not in compiler errors.

The rule is brittle to enforce socially. Every async PR needs the same checklist. Better to make the rule structural: prefer designs where the lock can't span an await because there's no lock, or where the guard's lifetime is so visibly short that holding it across an await is obviously wrong.