L2 Cache: What it does
We often take for granted that having an L2 cache makes a system faster than it would be without one, but what does that L2 cache actually do?
L2 cache, like any other cache, acts as a middle man between two mediums – in this case, your CPU's L1 cache and your system memory (as well as other storage mediums). When the CPU requests a piece of data, it first searches its L1 cache; if the data is found there, the result is what is known as a cache hit, and the CPU retrieves it from the extremely fast, low latency L1 cache.
If it can't retrieve the data from the L1 cache, it then goes to the L2 cache, where it attempts the same thing – to obtain a cache "hit." In the event of a miss, the CPU must go all the way to system memory to retrieve the data it needs. With the L2 cache of today's CPUs operating at a much higher frequency and much lower latency than system memory, if the L2 cache weren't there, or if the cache mapping technique weren't as effective, we would see considerably lower performance from our systems.
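The lookup order described above can be sketched in a few lines of Python. This is purely an illustration, not real hardware behavior: the dictionaries stand in for the cache levels, and the cycle counts are hypothetical numbers chosen only to show the relative cost of each level.

```python
# Hypothetical latencies (in cycles) for illustration only
L1_LATENCY, L2_LATENCY, RAM_LATENCY = 3, 10, 100

def read(address, l1, l2, ram):
    """Return (value, cycles) for a read, checking each level in order."""
    if address in l1:                 # L1 hit: the fastest case
        return l1[address], L1_LATENCY
    if address in l2:                 # L1 miss, L2 hit
        l1[address] = l2[address]     # promote the data into L1
        return l2[address], L1_LATENCY + L2_LATENCY
    value = ram[address]              # miss in both caches: go to RAM
    l2[address] = value               # fill both cache levels on the way back
    l1[address] = value
    return value, L1_LATENCY + L2_LATENCY + RAM_LATENCY

ram = {0x1000: 42}
l1, l2 = {}, {}
print(read(0x1000, l1, l2, ram))  # cold miss: all the way to RAM -> (42, 113)
print(read(0x1000, l1, l2, ram))  # now an L1 hit -> (42, 3)
```

Note how the first access pays the full penalty while the repeat access is nearly free – that gap is exactly what a good hit rate buys you.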
4-way versus 8-way Set Associative L2 Cache
We just established that the function of the L2 cache is to provide faster access to commonly used data from system RAM. It does so by mapping the cache lines of the L2 cache to multiple addresses in system memory (the number of which is defined by the cacheable memory area of the L2 cache).
There are a number of methods that can be used to dictate how this mapping should occur. On one end of the spectrum we have a direct mapped cache, which divides the system memory into a number of equal sections, each one being mapped to a single cache line in the L2 cache.
The beauty of a direct mapped cache is that it can be searched quickly and efficiently, since everything is organized into sections of equal size. With this, however, comes a sacrifice in hit rate, because the technique does not allow for any bias toward more frequently used sections of data.
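The mapping itself is just arithmetic, which is why the search is so fast. Here is a sketch with a hypothetical geometry (512 lines of 32 bytes – numbers chosen for illustration, not those of any particular CPU):

```python
LINE_SIZE = 32   # hypothetical line size in bytes
NUM_LINES = 512  # hypothetical number of cache lines

def direct_mapped_slot(address):
    """Return (line_index, tag): the one line an address can occupy,
    plus the tag identifying which memory block currently sits there."""
    block = address // LINE_SIZE   # which memory block the address falls in
    index = block % NUM_LINES      # the single line it is allowed to use
    tag = block // NUM_LINES       # distinguishes blocks that share a line
    return index, tag

# Two addresses exactly NUM_LINES * LINE_SIZE apart collide on one line:
a, b = 0x0000, NUM_LINES * LINE_SIZE
print(direct_mapped_slot(a)[0] == direct_mapped_slot(b)[0])  # True
```

That collision is the hit-rate sacrifice in action: two frequently used blocks that alias to the same line will keep evicting each other, no matter how much of the rest of the cache sits idle.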
On the other end of the spectrum, we have a fully associative cache, which is the exact opposite of a direct mapped cache. Instead of equally dividing up the memory into sections mapped to individual cache lines, a fully associative cache acts as more of a dynamic entity, allowing any cache line to be mapped to any section of system memory.
This flexibility allows for a much greater hit rate since allowances can be made for the most frequently used data, but at the same time since there is no organized structure to the mapping technique, searching through a fully associative cache is much slower than through a direct mapped cache.
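A sketch makes the cost of that flexibility concrete: since a block can live in any line, a hit requires comparing the tag against every line. It is shown here as a linear scan; real hardware compares all the tags in parallel, at a cost in silicon. The geometry is again hypothetical.

```python
LINE_SIZE = 32  # hypothetical line size in bytes

def fully_associative_lookup(address, lines):
    """lines is a list of (tag, data) pairs with no positional meaning.
    Return the data on a hit, or None on a miss."""
    tag = address // LINE_SIZE        # the whole block number acts as the tag
    for stored_tag, data in lines:    # every line must be checked
        if stored_tag == tag:
            return data
    return None
```

Because any line can hold any block, a replacement policy can keep the most frequently used data resident – hence the better hit rate – but the search touches the entire cache instead of a single computed slot.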
Establishing a midpoint between these two cache mapping techniques, we have the set associative cache, which is what we find in the current crop of processors.
A set associative cache divides the cache into various sections, referred to as sets, with each set containing a number of cache lines. With a 4-way set associative L2 cache, each set contains 4 cache lines, and in an 8-way set associative L2 cache, each set contains 8 cache lines.
The beauty of this is that the cache acts as if it were a direct mapped cache, except that instead of one cache line per memory section, we get x cache lines per section of memory addresses, where x is the cache's associativity (4 or 8 in the cases above).
This helps to sustain a balance between the pros and the cons of a direct mapped and a fully associative cache.
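Combining the two previous sketches gives the set associative lookup: the address picks one set directly, as in a direct mapped cache, but within that set any of the 4 (or 8) ways may hold the block, as in a fully associative cache. The geometry below is hypothetical, chosen only to illustrate a 4-way organization:

```python
LINE_SIZE = 32  # hypothetical geometry: 256 sets x 4 ways x 32-byte lines
NUM_SETS = 256
WAYS = 4

def set_associative_slot(address):
    """Return (set_index, tag): the set is computed like a direct mapped
    index, but the block may sit in any of that set's WAYS lines."""
    block = address // LINE_SIZE
    return block % NUM_SETS, block // NUM_SETS

def lookup(address, cache):
    """cache is a list of NUM_SETS sets, each a list of (tag, data) pairs."""
    set_index, tag = set_associative_slot(address)
    for stored_tag, data in cache[set_index]:  # search only WAYS entries
        if stored_tag == tag:
            return data
    return None
```

The search cost is bounded by the associativity (4 or 8 tag compares rather than one per line in the whole cache), while two blocks that would collide in a direct mapped cache can now coexist in the same set – the balance the paragraph above describes.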
In the case of the Coppermine Pentium III and the Coppermine128 Celeron, the 8-way set associative L2 cache of the Pentium III allows for a higher hit rate than the 4-way set associative L2 cache of the Celeron.
Combine this with the fact that the Pentium III's larger 256KB L2 cache should also theoretically result in a higher hit rate (especially with larger system memory sizes), and we have the potential for quite a performance difference between the Pentium III and the Celeron at the same clock speed.