L2 vs. L3 cache: What’s the Difference?


Yesterday we discussed how caches work, what the difference is between L1 and L2, and the various design elements that determine how fast (and how effective) a CPU’s cache is. Today, we’re going to go one step further and explore the difference between L2 and L3 caches.

At its simplest, an L3 cache is just a larger, slower version of the L2 cache. Back when most chips were single-core processors, this was generally true. The first L3 caches were actually built on the motherboard itself and connected to the CPU over the front-side bus. When AMD launched its K6-III processor family, many existing K6/K6-2 motherboards could accept a K6-III as well. Typically these boards had 512KB to 2MB of L2 cache. When a K6-III, with its integrated L2 cache, was inserted, these slower, motherboard-based caches became L3 instead.

By the early 2000s, adding an L3 cache to a chip had become an easy way to improve performance. Intel’s first consumer-oriented Pentium 4 “Extreme Edition” was a repurposed Gallatin Xeon with 2MB of on-die L3. Adding that cache was sufficient to buy the Pentium 4 EE a 10-20 percent performance boost over the standard Northwood line.

Cache and the multi-core curveball

As multicore processors became more common, L3 cache started appearing more frequently on consumer hardware. These chips, like Intel’s Nehalem and AMD’s K10 (Barcelona), used L3 as more than just a larger, slower backstop for L2. In addition to that function, the L3 cache is often shared between all of the cores on a single piece of silicon. That’s in contrast to the L1 and L2 caches, both of which tend to be private and dedicated to the needs of each particular core. (AMD’s Bulldozer design is an exception: Bulldozer, Piledriver, and Steamroller all share a common L1 instruction cache between the two cores in each module.)

Intel’s Haswell-E, for example, has eight separate cores that all back up to a common L3 cache.


Pairing private L1/L2 caches with a shared L3 is hardly the only way to design a cache hierarchy, but it’s a common approach that multiple vendors have adopted. Giving each individual core a dedicated L1 and L2 cuts access latencies and reduces the chance of cache contention, in which one core evicts vital data that another core placed in the cache to make room for its own workload. The common L3 cache is slower but much larger, which means it can store data for all the cores at once. Sophisticated algorithms are used to ensure that Core 0 tends to store information in the part of the L3 closest to itself, while Core 7 across the die likewise keeps the data it needs nearby.
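
To make the private-versus-shared split concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not a description of any real CPU: the class names, the line counts, the LRU replacement policy, and the fill-every-level-on-a-miss behavior are all illustrative choices, and it ignores coherence, latency, and the L3 placement logic mentioned above.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache holding line addresses; capacity is counted in lines (assumed LRU)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def lookup(self, addr):
        # A hit refreshes the line's position so it is evicted last.
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        return False

    def fill(self, addr):
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line


class Core:
    """Hypothetical core with private L1/L2 and a reference to a shared L3."""

    def __init__(self, shared_l3, l1_lines=4, l2_lines=16):
        self.l1 = LRUCache(l1_lines)
        self.l2 = LRUCache(l2_lines)
        self.l3 = shared_l3

    def load(self, addr):
        """Return the name of the level that served the load."""
        if self.l1.lookup(addr):
            return "L1"
        if self.l2.lookup(addr):
            self.l1.fill(addr)
            return "L2"
        if self.l3.lookup(addr):
            self.l2.fill(addr)
            self.l1.fill(addr)
            return "L3"
        # Miss everywhere: fetch from memory and fill every level (an assumption).
        self.l3.fill(addr)
        self.l2.fill(addr)
        self.l1.fill(addr)
        return "DRAM"


shared_l3 = LRUCache(64)
core0 = Core(shared_l3)
core1 = Core(shared_l3)

first = core0.load(0x1000)  # nothing is cached yet, so this goes to memory
again = core0.load(0x1000)  # now served from core0's private L1
other = core1.load(0x1000)  # core1's private caches miss, the shared L3 hits

# core1 streams through ten other lines; this churns core1's own L1
# but cannot touch core0's private L1.
for addr in range(0, 10 * 64, 64):
    core1.load(addr)
still = core0.load(0x1000)

print(first, again, other, still)
```

In this model, core1 finds core0's data in the shared L3 without going to memory, while core1's own traffic never evicts anything from core0's L1. That is the contention benefit of private caches in miniature.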

Unlike the L1 and L2, which are nearly always CPU-focused and private, the L3 can also be shared with other devices or capabilities. Intel’s Sandy Bridge CPUs shared an 8MB L3 cache with the on-die graphics core (Ivy Bridge gave the GPU its own dedicated slice of L3 cache in lieu of sharing the entire 8MB).

In contrast to the L1 and L2 caches, both of which are typically fixed in size and vary only slightly (mostly on budget parts), both AMD and Intel offer different chips with significantly different amounts of L3. Intel typically sells at least a few Xeons with lower core counts, higher frequencies, and a higher ratio of L3 cache per core. Intel’s Core i7 processors have maintained an 8MB L3 since the debut of Nehalem in 2008 (roughly 2MB of L3 for every CPU core), but the highest-end parts are typically pegged at 2.5MB of cache per CPU core.
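
The per-core figures above are simple division and multiplication. The short sketch below works them through; the function names are made up for illustration, the four-core count for the 8MB Core i7 is inferred from the "roughly 2MB per core" figure, and the eight-core count is borrowed from the Haswell-E example earlier.

```python
def l3_per_core_mb(total_l3_mb, cores):
    """L3 capacity available per core, in MB (hypothetical helper)."""
    return total_l3_mb / cores


def total_l3_mb(per_core_mb, cores):
    """Total L3 implied by a per-core allotment, in MB (hypothetical helper)."""
    return per_core_mb * cores


# Mainstream Core i7: 8MB of L3, assumed to be spread across four cores.
mainstream = l3_per_core_mb(8, 4)

# High-end part: 2.5MB per core, assuming eight cores as on Haswell-E.
high_end = total_l3_mb(2.5, 8)

print(mainstream, high_end)
```

Under those assumptions, the mainstream part works out to 2MB per core and the eight-core part to 20MB of L3 in total.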

Today, the L3 is characterized as a pool of fast memory common to all the CPU cores on an SoC. It’s often gated independently from the rest of the CPU core and can be dynamically partitioned to balance access speed, power consumption, and storage capacity. While not nearly as fast as L1 or L2, it’s often more flexible and plays a vital role in managing inter-core communication. With Intel having already added L4 to some of its Skylake chips, it’s possible we’ll see the L3 take a more simplified role, with some of its functions and capabilities shifting over to the newer, larger pool of cache.