Broadwell CPU Architecture

We’ll kick off our look at Broadwell-Y with Broadwell’s CPU architecture. As this is a preview, Intel isn’t telling us a great deal about the CPU at this time, but they have given us limited information about Broadwell’s architectural changes and what to expect for performance as a result.

With Broadwell, Intel is at the beginning of the next cycle of their tick-tock cadence. Whereas tock products such as Haswell and Sandy Bridge are designed to be the second generation of products on a process node and as a result focus on architectural changes, tick products such as Ivy Bridge and now Broadwell are the first generation on a new process node and derive much (but not all) of their advantage from manufacturing process improvements. Over the years Intel has wavered on just what a tick should contain – it’s always more than simply porting an architecture to a new process node – but at the end of the day Broadwell is clearly derived from Haswell, and will be taking only limited liberties in improving CPU performance as a result.

Intel's Tick-Tock Cadence

Microarchitecture   Process Node   Tick or Tock   Release Year
Conroe/Merom        65nm           Tock           2006
Penryn              45nm           Tick           2007
Nehalem             45nm           Tock           2008
Westmere            32nm           Tick           2010
Sandy Bridge        32nm           Tock           2011
Ivy Bridge          22nm           Tick           2012
Haswell             22nm           Tock           2013
Broadwell           14nm           Tick           2014
Skylake             14nm           Tock           2015

All told, Intel is shooting for a better than 5% IPC improvement over Haswell. This is similar to Ivy Bridge (4%-6%), though at this stage in the game Intel is not talking about expected clockspeeds or the resulting overall performance improvement. Intel has made it clear that they don’t regress on clockspeeds, but beyond that we’ll have to wait for further product details later this year to see how clockspeeds will compare.

To accomplish this IPC increase Intel will be relying on a number of architectural tweaks in Broadwell. Chief among these are bigger schedulers and buffers in order to better feed the CPU cores themselves. Broadwell’s out-of-order scheduling window for example is being increased to allow for more instructions to be reordered, thereby improving IPC. Meanwhile the L2 translation lookaside buffer (TLB) is being increased from 1K to 1.5K entries to reduce address translation misses.

The TLBs are also receiving some broader feature enhancements that should further improve performance. A second page miss handler is being added, allowing Broadwell to use both handlers at once and walk two sets of page tables in parallel. Meanwhile the inclusion of a 1GB page mode should pay off particularly well for servers, granting Broadwell the ability to handle these very large pages on top of its existing 2MB and 4K page sizes.

Meanwhile, as is often the case, Intel is once again iterating on their branch predictor to cut down on mispredicted branches and the unnecessary memory operations they cause. Broadwell’s branch predictor will see its address prediction improved for both branches and returns, allowing for more accurate speculation of impending branching operations.

Of course efficiency increases can only take you so far, so along with the above changes Intel is also making some more fundamental improvements to Broadwell’s math performance. Both multiplication and division are receiving a boost thanks to improvements in their respective hardware. Floating point multiplication is seeing a sizable reduction in instruction latency, from 5 cycles to 3 cycles, while division performance is being improved by the use of an even larger Radix-1024 (10-bit) divider. Even vector operations will see some improvements here, with Broadwell implementing a faster version of the vector gather instruction.

Finally, while it’s not clear whether these will be part of AES-NI or another instruction subset entirely, Intel is once again targeting cryptography for further improvements. To that end Broadwell will bring with it improvements to multiple cryptography instructions.

Meanwhile it’s interesting to note that in keeping with Intel’s power goals for Broadwell, Intel put strict power efficiency requirements in place for any architecture changes. Whereas Haswell was held to roughly a 1:1 ratio of performance to power – a 1% increase in performance could cost no more than a 1% increase in power consumption – Broadwell’s architecture improvements were required to be at 2:1. While a 2:1 mandate is not new – Intel had one in place for Nehalem too – at this point even on the best of days meaningful IPC improvements are hard to come by at 1:1, never mind 2:1. The end result no doubt limited what performance optimizations Intel could integrate into Broadwell’s design, but it also functionally reduces power requirements for any given performance level, furthering Intel’s goal of getting Core performance into a mobile device. In Broadwell’s case this means its roughly 5% performance improvement comes at a cost of just a 2.5% increase in power consumption.

With that said, Intel has also continued to make further power optimizations to the entire Broadwell architecture, many of which will be applicable not just to Core M but to all future Broadwell products. Broadwell will see further power gating improvements to better shut off parts of the CPU that are not in use, and more generalized design optimizations have been made to reduce power consumption of various blocks as is appropriate. These optimizations coupled with power efficiency gains from the 14nm process are a big part of the driving force in improving Intel’s power efficiency for Core M.

Comments (158)

  • psyq321 - Tuesday, August 12, 2014 - link

    Actually, apart from power-users I fail to see any tangible improvements in performance of modern CPUs that matter to desktop/notebook usage, Intel or otherwise.

    In the mobile space, it is improvements in the GPU that mattered, but even that will eventually flatten once some peak is reached, since graphics improvements on a 4" / 5" screen can only matter to wide audiences up to some point.

    However, there are surely enough customers that do look forward to more power - this is the workstation and server market. Skylake and its AVX-512 will matter to scientists, and its enormous core count in the EP (Xeon) version will matter to companies (virtualization, etc.).

    Standard desktop, not so much. But, then again, ever since the Core 2 Quad Q6600 this has been the case. If anything, large-scale adoption of SSDs is probably the single most important jump in desktop performance since the days of Conroe.
  • Khenglish - Monday, August 11, 2014 - link

    I find the reduction in die thickness to be a big deal. Maybe this will prevent temperatures from getting out of control when the CPU core area gets cut in half for 14nm. High power 22nm CPUs already easily hit a 30°C temperature difference between the CPU and heatsink.
  • AnnonymousCoward - Tuesday, August 12, 2014 - link

    Probably not. I'd guess thermal dissipation is the same.
  • dgingeri - Monday, August 11, 2014 - link

    PC sales are down mostly because people can keep their systems longer due to the lack of innovation coming from Intel on desktop chips and the lack of utilizing the current CPU technology by software developers. They could be so much more, if only developers would actually make use of the desktop CPU capabilities for things such as a voice command OS that doesn't need to be trained. Intel would then have a reason to produce more powerful chips that would trigger more PC sales.

    As it is, the current processor generation is less than 10% faster clock for clock compared to three generations ago. A great many things aren't any faster at all. Know what? It doesn't even matter, because nothing uses that much power these days.

    Tablets and smartphones can't take the place of full PCs for most people. Their screens are just too small. Perhaps the younger generations prefer the small form factors right now, but give them a little time, and their eyes won't let them use such things. I can see the move to laptops, especially with 14-15" screens, but trying to show the same content on a 10" screen is just near unusable, and a 5" smartphone screen is just downright impossible. However, desktop PCs still have their place, and that's never going to change.

    This push by "investors" for the tablet and smartphone market is just asinine. Broadwell isn't going to help sales all that much. Perhaps, they might sell some more Intel based tablets, but it won't be all that much of an improvement. Tablets have a niche, but it really isn't that much of one.
  • HanzNFranzen - Monday, August 11, 2014 - link

    Tablets are a niche and not much of one? lol yea ok... well while you were asleep in a cave, over 195 million tablets were sold in 2013 between Android/Apple/Microsoft, which is just shy of 80 million more than the previous year. Worldwide PC sales totaled 316M units, so we are talking nearly 2 tablets for every 3 PCs sold. Eh...small niche...
  • dgingeri - Monday, August 11, 2014 - link

    yeah, lots of people have them, but how much do they really use them? I have two, one Android and one Windows RT, and I only use them for reading books or for reading the web news while away from home. The Windows unit showed promise, since I could use it to run Office and terminal programs, but I ended up not using it at work anymore because it couldn't use a USB to serial adapter for talking to switches and raid arrays. It ended up being only half useful. They're nice to have for certain things, but they aren't as versatile as a PC. My parents own two, and two PCs, and they use the PCs far more. My older sister has one, and she barely uses it. Her 7 year old uses it to play games most of the time. My nephew has one, and he's only ever used it to read Facebook. It's a telling tale that everyone I've known who has one only has limited use for it.
  • mapesdhs - Monday, August 11, 2014 - link

    Point taken, but if people are *buying* them, irrespective of whether they use them, then it doesn't really matter.

    Besides, this whole field of mobile computing – smart phones, tablets, now phablets, etc. – it's too soon to be sure where we're heading long-term.

    Many people say the copout porting of console games to PCs with little enhancement is one thing that's harmed PC gaming sales. This may well be true. Now that the newer consoles use PC tech more directly, perhaps this will be less of an issue, but it's always down to the developer whether they choose to make a PC release capable of exploiting what a PC can do re high res, better detail, etc. Wouldn't surprise me if this issue causes internal pressures, eg. make the PC version too much better and it might harm console version sales – with devs no doubt eager to maximise returns, that's something they'd likely want to avoid.

    Ian.
  • az_ - Monday, August 11, 2014 - link

    Ryan, could you add a size comparison to an ARM SoC that would be used in a tablet? I wonder how close Intel is in size. Thanks.
  • name99 - Tuesday, August 12, 2014 - link

    BDW-Y is 82 mm^2. The PCH looks like it's about a third of that, so total is maybe 115 mm^2 or so.
    In comparison, Apple A7 is about 100 mm^2.
    A7 includes some stuff BDW-Y doesn't, and vice versa, so let's call it a wash in terms of non-CPU functionality.
    BDW-Y obviously can perform a LOT better (if it's given enough power, probably performs about the same at the same power budget). On the other hand it probably costs about 10x what an A7 costs.
  • Krysto - Tuesday, August 12, 2014 - link

    Sure, also let's conveniently forget that Broadwell-Y benefits not only from 3D transistors but also from a two-generation node shrink compared to the A7. Now put the A7 on 14nm with 3D transistors...and let's see which does better.

    This is the issue nobody seems to understand, not even Anand, or just conveniently ignored it when he declared that the "x86 myth is busted". At the time we were talking about a 22nm Trigate Atom vs 28nm planar ARM chip, with Atom barely competing on performance (while costing 2x more, and having half the GPU performance). Yet Anand said the x86 bloat myth is busted...How exactly?! Put them on the same process technology...and then we'll see if x86 myth is indeed busted, or it's still bloated as a pig.
