jieqi's archive

Cache Coherence and the MESI Protocol: Part 3

This is a continuation from part 2, which covered measurements of cache effects under sequential and random access patterns. This part explores cache coherence: keeping the copies of shared data held in multiple local caches consistent with one another. It matters whenever multiple execution contexts (threads or processes) use the same region of memory.

Write-Through vs Write-Back

These are two strategies for keeping cache and main memory in sync when you write data.

Write-through is simple: every write to cache immediately also writes to main memory. Cache and RAM are always identical.
The problem is performance — if you’re incrementing a counter in a loop, every single increment fires a write to main memory over the front-side bus (FSB), even though nobody else needs that value yet.

Write-back is smarter: writes only update the cache line and mark it dirty. The write to main memory is deferred until the cache line gets evicted (LRU etc). A tight loop modifying the same variable fires one eventual write to RAM instead of thousands — much better for FSB bandwidth.

The Problem: Multiple Caches, One Memory

Write-back creates a problem on multi-processor systems. Each CPU has its own cache, so if CPU 1 has a dirty cache line (modified but not yet written back to RAM) and CPU 2 tries to read that same address, CPU 2 fetches from main memory and gets a stale value. The correct value sits in CPU 1’s cache, unreachable.

This is the cache coherence problem. The hardware must solve it: when CPU 2 requests an address, the system needs to know that CPU 1 holds a dirty version and intercept the read to supply the correct data. This is what cache coherence protocols like MESI handle.

The MESI Protocol

Each cache line in every processor’s cache is tagged with one of four states (2 bits are used):

Modified (M): this cache holds the only copy, and it is dirty (changed relative to main memory). It must be written back before another cache or memory can use it.
Exclusive (E): this cache holds the only copy, and it is clean (matches main memory).
Shared (S): the line is clean and may also be present in other caches.
Invalid (I): the line holds no usable data and must be fetched on the next access.

For any given pair of caches, the permitted states of a given cache line are:

          M     E     S     I
    M     ✗     ✗     ✗     ✓
    E     ✗     ✗     ✗     ✓
    S     ✗     ✗     ✓     ✓
    I     ✓     ✓     ✓     ✓

(✓ means the two caches may simultaneously hold the line in those states; ✗ means the combination is forbidden.)

When a line is M or E in one cache, all other caches must hold it as I.

State Transitions

The MESI protocol is a finite-state machine driven by two types of stimuli.

Processor requests — issued by the local CPU: PrRd (the processor reads a cache block) and PrWr (the processor writes to a cache block).

Bus-side requests — snooped from other processors via the shared bus: BusRd (another processor missed on a read and requests the block), BusRdX (read-for-ownership: the requester intends to write, so all other copies must be invalidated), BusUpgr (the requester already holds the block Shared and wants to write; others invalidate, no data transfer needed), Flush (a cache writes an entire block back to main memory), and FlushOpt (a cache supplies a block directly to the requester, cache-to-cache).

Snooping: every cache monitors all bus transactions. When a bus request appears, each cache controller checks whether it holds that line and reacts — updating state, supplying data, or invalidating its copy. The memory controller also snoops, stepping in when no cache can supply the line.

All cache lines start Invalid.

Processor-initiated transitions

Invalid + PrRd:
    Issue BusRd on the bus. Other caches check whether they hold a valid copy and inform the sender.
    → S if another cache has a copy; → E if none (must wait for all caches to report).
    Data is supplied by another cache or fetched from main memory.

Invalid + PrWr:
    Issue BusRdX on the bus; → M.
    Others send the value if they have it; otherwise it is fetched from main memory.
    Others see the BusRdX and invalidate their copies.
    Write into the cache block.

Exclusive + PrRd:
    Cache hit — no bus traffic. State remains E.

Exclusive + PrWr:
    Cache hit — no bus traffic. Silent E→M upgrade.

Shared + PrRd:
    Cache hit — no bus traffic. State remains S.

Shared + PrWr:
    Issue BusUpgr on the bus; → M.
    Other caches see the BusUpgr and mark their copies Invalid.

Modified + PrRd:
    Cache hit — no bus traffic. State remains M.

Modified + PrWr:
    Cache hit — no bus traffic. State remains M.

Bus-initiated transitions

Invalid + BusRd:
    No state change; signal ignored.

Invalid + BusRdX / BusUpgr:
    No state change; signal ignored.

Exclusive + BusRd:
    → S (the read implies another cache now holds a copy).
    Put FlushOpt on the bus with the block contents.

Exclusive + BusRdX:
    → I.
    Put FlushOpt on the bus with the block contents.

Shared + BusRd:
    No state change.
    May put FlushOpt on the bus with the block contents (design choice — one Shared cache supplies the data).

Shared + BusRdX / BusUpgr:
    → I (the cache that sent the BusRdX/BusUpgr becomes Modified).
    May put FlushOpt on the bus with the block contents (design choice).

Modified + BusRd:
    → S.
    Put Flush on the bus with the data (the line is dirty, so it must be written out, not just forwarded).
    Received by the sender of the BusRd and by the memory controller, which writes it to main memory.

Modified + BusRdX:
    → I.
    Put Flush on the bus with the data.
    Received by the sender of the BusRdX and by the memory controller, which writes it to main memory.

Key insight: E→M is silent — a PrWr on an Exclusive line needs no bus transaction. S→M requires a BusUpgr broadcast to every CPU holding the line. This is why the CPU tries to keep lines Exclusive — it avoids the broadcast on the next write.

Not all transitions generate bus traffic. PrRd and PrWr that hit a cached line (M, E, or S) are resolved entirely within the local cache — no bus activity. Bus transactions only occur when:

    a read or write misses, i.e. the line is Invalid (BusRd or BusRdX);
    a write hits a line in the Shared state (BusUpgr);
    a snooped request obliges this cache to supply data (Flush/FlushOpt) or invalidate its copy.

Implications for Performance

False Sharing

When two threads write to different variables that happen to sit in the same cache line, the coherence protocol still treats the line as contended: each write invalidates the other core's copy, so the line ping-pongs between caches even though no data is logically shared. Padding or aligning hot per-thread data to cache-line boundaries eliminates the contention.

Write Amplification

A write miss cannot store just the bytes you asked for. The BusRdX (read-for-ownership) fetches the entire cache line before the store completes, and the eventual write-back flushes the whole line as well, so storing a single byte can move a full 64-byte line across the bus in each direction.

Variants: MESIF and MOESI

MESIF (used by Intel) adds a Forward state: exactly one of the caches sharing a line holds it as F and answers BusRd requests, so a single cache responds instead of all of them. MOESI (used by AMD) adds an Owned state: a dirty line may be shared without first being written back to memory; the owner supplies the data to readers and remains responsible for the eventual write-back.

