Decoupled L3 Cache

With Nehalem, Intel introduced an on-die L3 cache behind a smaller, low-latency private L2 cache. Intel maintained two separate clock domains for the CPU (core + uncore) and a third for what was then an off-die integrated graphics core. The core clock drove the CPU cores, while the uncore clock controlled the speed of the L3 cache. Intel believed that its L3 cache wasn't incredibly latency sensitive and could run at a lower frequency to burn less power. Since core CPU performance typically mattered more to most workloads than L3 cache performance, Intel was OK with the tradeoff.

In Sandy Bridge, Intel revised its beliefs and moved to a single clock domain for the core and uncore, while keeping a separate clock for the now on-die processor graphics core. Intel now felt that race to sleep was a better philosophy for dealing with the L3 cache, and it would rather keep things simple by running everything at the same frequency. There were obvious performance benefits, but also one major downside: with the CPU cores and L3 cache running in lockstep, what would happen if the GPU needed to access the L3 cache while the CPU (and thus the L3 cache) was in a low frequency state? The options were either to force the CPU and L3 cache into a higher frequency state together, or to keep the L3 cache at a low frequency even when it was in demand, to avoid waking up the CPU cores. Ivy Bridge added a small graphics L3 cache to mitigate the situation, but ultimately the design team wanted to give the on-die GPU independent access to the big, primary L3 cache without worrying about power.

When it came time to define Haswell, the engineers once again went back to Nehalem's three clock domains. Ronak (Nehalem & Haswell architect, insanely smart guy) tells me that the switching between designs is simply a product of the team learning more about the architecture and understanding the best balance. To me it's a reminder that these guys are still human and don't always have the right answer for the long term without some trial and error.

The three clock domains in Haswell are roughly the same as they were in Nehalem; they just all happen to be on the same die. The CPU cores all run at the same frequency, the on-die GPU runs at a separate frequency, and the L3 + ring bus now sit in their own independent frequency domain.

Now that CPU requests to the L3 cache have to cross a frequency boundary, there will be a latency impact on L3 cache accesses. Sandy Bridge had an amazingly fast L3 cache; Haswell's L3 accesses will be slower.
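To make the latency tradeoff concrete, here's a minimal pointer-chasing sketch of the kind of test used to measure load-to-use latency. This is my own illustrative code, not Intel's methodology or any published benchmark; the 4MB buffer size is an assumption chosen to fall past a typical 256KB L2 but within a 6MB to 8MB L3:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUF_BYTES (4 * 1024 * 1024)   /* assumed: past L2, within L3 */
#define STRIDE    64                  /* one cache line */
#define ITERS     (64L * 1024 * 1024)

int main(void)
{
    size_t n = BUF_BYTES / sizeof(void *);
    size_t step = STRIDE / sizeof(void *);
    void **buf = malloc(n * sizeof(void *));

    /* Build a circular pointer chain, one hop per cache line. Real
     * latency tools randomize the hop order to defeat the hardware
     * prefetchers; this sequential version is kept simple and will
     * therefore read optimistically. */
    for (size_t i = 0; i < n; i += step)
        buf[i] = &buf[(i + step) % n];

    void **p = (void **)buf[0];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < ITERS; i++)
        p = (void **)*p;              /* each load depends on the last */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.2f ns per dependent load (p=%p)\n", ns / ITERS, (void *)p);
    free(buf);
    return 0;
}
```

Because every load depends on the previous one, the loop can't hide latency behind overlapping requests, so any extra cycles spent crossing the core/uncore frequency boundary show up directly in the per-load time.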

The benefit is obviously power. If the GPU needs to fire up the ring bus to send or receive data, it no longer has to drive up the CPU core frequency as well. Furthermore, Haswell's power control unit can dynamically allocate budget between all areas of the chip when the part is power limited.
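Intel doesn't publish the PCU's actual algorithm, but the idea of sharing a fixed package budget across domains can be sketched with a toy proportional-allocation model. All names, floors, and wattages below are hypothetical:

```c
/* Toy model of package-level power budgeting, NOT Intel's actual
 * (undisclosed) PCU algorithm: each domain gets a guaranteed floor,
 * and the remaining headroom under the package TDP is shared in
 * proportion to how much extra each domain is asking for. */
#include <stdio.h>

struct domain { const char *name; double floor_w, demand_w; };

int main(void)
{
    double tdp_w = 47.0;                  /* hypothetical package TDP */
    struct domain d[] = {
        { "cores",  4.0, 30.0 },
        { "gpu",    2.0, 25.0 },
        { "uncore", 1.0,  5.0 },          /* L3 + ring domain */
    };
    int n = sizeof d / sizeof d[0];

    double floors = 0.0, want = 0.0;
    for (int i = 0; i < n; i++) {
        floors += d[i].floor_w;
        want   += d[i].demand_w - d[i].floor_w;
    }
    double headroom = tdp_w - floors;     /* budget left to distribute */

    for (int i = 0; i < n; i++) {
        double extra = d[i].demand_w - d[i].floor_w;
        double grant = d[i].floor_w;
        if (want > 0.0 && headroom > 0.0)
            grant += headroom * (extra / want);
        if (grant > d[i].demand_w)        /* never grant more than asked */
            grant = d[i].demand_w;
        printf("%-6s granted %4.1f W of %4.1f W requested\n",
               d[i].name, grant, d[i].demand_w);
    }
    return 0;
}
```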

Although L3 latency is up in Haswell, there's more access bandwidth offered to each slice of the L3 cache. There are now dedicated pipes for data and non-data accesses to the last level cache.
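Measuring bandwidth is the mirror image of measuring latency: instead of a dependent pointer chain, you issue lots of independent loads so the core can overlap them and the cache's throughput becomes the limiter. A rough sketch along those lines, again my own illustrative code with an assumed 4MB working set sized to sit in L3:

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

#define BUF_BYTES (4 * 1024 * 1024)   /* assumed to fit in L3 */
#define PASSES    4096

int main(void)
{
    size_t n = BUF_BYTES / sizeof(uint64_t);
    uint64_t *buf = malloc(BUF_BYTES);
    for (size_t i = 0; i < n; i++)
        buf[i] = i;                   /* touch once to warm the cache */

    uint64_t sum = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int p = 0; p < PASSES; p++)
        for (size_t i = 0; i < n; i++)
            sum += buf[i];            /* independent, overlapping loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.1f GB/s (sum=%llu)\n",
           (double)BUF_BYTES * PASSES / 1e9 / sec, (unsigned long long)sum);
    free(buf);
    return 0;
}
```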

Haswell's memory controller is also improved, with better write throughput to DRAM. Intel has been quietly telling the memory makers to push for even higher DDR3 frequencies in anticipation of Haswell.

Comments

  • rundll - Friday, October 5, 2012 - link

    Four cores and 95 W TDP.
    What is this?
  • meloz - Friday, October 5, 2012 - link

    Yes this caught my eye and I would like an answer, too.

    Maybe it is one SKU with GT3 for desktop? Or maybe it is a 6 core part?

    Or maybe.....it is the mother of all overclocking processors. Muhahahahah!
  • Kevin G - Friday, October 5, 2012 - link

    I suspect that 95 W is the rated socket limit. This is similar to how Intel advertises Ivy Bridge at 77 W on the desktop but tells motherboard manufacturers to build around the higher 95 W figure.

    What is odd is that Haswell will move some of the VRM circuitry onto the package, which should restrict just how far from that 95 W figure motherboards can deviate.
  • meloz - Friday, October 5, 2012 - link

    What a great article, Anand!

    Felt so good to read a 'proper' Anandtech article after so long, instead of the usual Apple worship and cheap fillers.

    Haswell is looking very good. Would make an ideal upgrade for Sandy Bridge users. AMD is done, but thankfully Intel sees some threat from ARM so that will keep them innovating.

    I hope Intel makes sensible choices with Haswell SKUs and gets away from its artificial crippling and segmentation tendencies. That's about the only thing that can ruin Haswell.
  • Wolfpup - Friday, October 5, 2012 - link

    Once again they bump up the number of transistors devoted to their worthless video, and this time they even lower CPU performance (L3 cache) to appease it.

    Interesting article, but I guess I misunderstood previous articles...I thought Conroe through Ivy Bridge had 4 integer execution units per core? (As does Piledriver?)
  • haukionkannel - Friday, October 5, 2012 - link

    Good article, and the information that you need Windows 8 to fully utilize Haswell was new to me. It will be interesting to see how much better Haswell is with Win 8 compared to Win 7. It seems to be the same kind of dilemma as with AMD Bulldozer/Piledriver, where there is somewhat better performance with the new OS, but how much remains to be seen.
  • Belard - Friday, October 5, 2012 - link

    Apple owns various CPU tech and design companies such as P.A. Semi. They can build their own CPUs (not x86 of course)...

    Apple will do what they can to take out the middleman.
  • jwcalla - Friday, October 5, 2012 - link

    Apple doesn't have any fabs though and if Samsung isn't willing to re-sign another contract, they're going to be in a bit of a bind. In other words, it won't be cheap. And even if Samsung does re-up, you can be sure that it'll come with an additional $1.05b price tag to offset any "losses" in their mobile division.

    I felt the first page overestimated Apple's influence quite a bit. They have ~5% desktop market share and 0% in the server space. Not to trivialize any loss in CPU sales, but Intel's primary headwinds don't involve a possible Apple switch to ARM.
  • Kevin G - Friday, October 5, 2012 - link

    Apple's influence comes from the mobile market, which is beginning to dwarf the PC market (and is larger than the server market in terms of volume). Apple is the largest tablet maker and a major smartphone manufacturer. Their hardware is backed by one of the largest digital media markets. On top of this, Apple is the world's largest consumer of flash memory, whose orders are large enough to directly affect NAND pricing.

    With the rest of the industry going ultra mobile, they'll have to compete with Apple, who is already entrenched. Sure, the PC will survive, but mainly for legacy work and applications. There isn't enough of a PC market in the future to remain viable long term with so many players.
  • jwcalla - Friday, October 5, 2012 - link

    While all this is true, the first page seems to indicate that Intel is really pushing the low power envelope partly because of rumors that Apple will move away from Intel chips in their laptop / ultrabook products.

    While I'm sure Intel is happy to be in MBAs, etc., losing that business isn't going to be as big a deal as the other pressures facing the PC market (as you mention).

    Now if WinRT on ultrabooks / laptops began to take off... that would be a huge problem for Intel.
