Lock-free data structures avoid mutexes by using atomic compare-and-swap (CAS) operations. They give better throughput under high contention and avoid deadlock entirely. The catch: they're harder to reason about, easier to write incorrectly, and don't always outperform locks at low contention.
The CAS primitive
// Atomically: if (*ptr == expected) { *ptr = new_val; return true; }
bool compare_exchange(int* ptr, int expected, int new_val);
// Spinlock built on CAS:
while (!compare_exchange(&lock, 0, 1)) {
/* spin */
}
// critical section
lock = 0;Lock-free stack (Treiber)
Push: read top, CAS to swap top with new node. Pop: read top, CAS top with top->next. Both retry on CAS failure. No mutex, no deadlock, scales linearly with cores under low contention.
The ABA problem
Thread 1 reads top = A. Thread 1 paused. Thread 2 pops A, pushes B, pushes A back. Thread 1 resumes, CAS sees top == A, succeeds — but the structure changed underneath. Fix: tag pointer with version counter (use 128-bit CAS or double-word CAS).
When NOT to go lock-free
Low contention: a mutex is faster (no retry loops). Code complexity: lock-free is hard to debug and hard for others to maintain. If your critical section is short and threads rarely collide, a regular lock is the right answer.
When TO go lock-free
High contention queues (producer/consumer at millions of ops/sec). Real-time systems where you can't tolerate priority inversion. Hot paths where mutex acquisition is a measured bottleneck. Otherwise, prefer locks.