Broadwell CPU Architecture

We’ll kick off our look at Broadwell-Y with Broadwell’s CPU architecture. As this is a preview Intel isn’t telling us a great deal about the CPU at this time, but they have given us limited information about Broadwell’s architectural changes and what to expect for performance as a result.

With Broadwell Intel is at the beginning of the next cycle of their tick-tock cadence. Whereas tock products such as Haswell and Sandy Bridge are the second generation of products on a given process node and as a result are focused on architectural changes, tick products such as Ivy Bridge and now Broadwell are the first generation of products on a new process node and derive much (but not all) of their advantage from manufacturing process improvements. Over the years Intel has wavered on just what a tick should contain – it’s always more than simply porting an architecture to a new process node – but at the end of the day Broadwell is clearly derived from Haswell and will take only limited liberties in improving CPU performance as a result.

Intel's Tick-Tock Cadence
Microarchitecture   Process Node   Tick or Tock   Release Year
Conroe/Merom        65nm           Tock           2006
Penryn              45nm           Tick           2007
Nehalem             45nm           Tock           2008
Westmere            32nm           Tick           2010
Sandy Bridge        32nm           Tock           2011
Ivy Bridge          22nm           Tick           2012
Haswell             22nm           Tock           2013
Broadwell           14nm           Tick           2014
Skylake             14nm           Tock           2015

All told, Intel is shooting for a better than 5% IPC improvement over Haswell. This is similar to Ivy Bridge (4%-6%), though at this stage in the game Intel is not talking about expected clockspeeds or the resulting overall performance improvement. Intel has made it clear that they don’t regress on clockspeeds, but beyond that we’ll have to wait for further product details later this year to see how clockspeeds will compare.

To accomplish this IPC increase Intel will be relying on a number of architectural tweaks in Broadwell. Chief among these are bigger schedulers and buffers in order to better feed the CPU cores themselves. Broadwell’s out-of-order scheduling window for example is being increased to allow for more instructions to be reordered, thereby improving IPC. Meanwhile the L2 translation lookaside buffer (TLB) is being increased from 1K to 1.5K entries to reduce address translation misses.

The TLBs are also receiving some broader feature enhancements that should further improve performance. A second miss handler is being added for TLB pages, allowing Broadwell to use both handlers at once to walk memory pages in parallel. Meanwhile the inclusion of a 1GB page mode should pay off particularly well for servers, granting Broadwell the ability to handle these very large pages alongside its existing 2MB and 4KB pages.

Meanwhile, as is often the case Intel is once again iterating on their branch predictor to cut down on missed branches and unnecessary memory operations. Broadwell’s branch predictor will see its address prediction improved for both branches and returns, allowing for more accurate speculation of impending branching operations.

Of course efficiency increases can only take you so far, so along with the above changes Intel is also making some more fundamental improvements to Broadwell’s math performance. Both multiplication and division are receiving a performance boost thanks to improvements in their respective hardware. Floating point multiplication is seeing a sizable reduction in instruction latency from 5 cycles to 3 cycles, while division performance is being improved by the use of an even larger Radix-1024 (10-bit) divider. Even vector operations will see some improvements here, with Broadwell implementing a faster version of the vector gather instruction.

Finally, while it’s not clear whether these will be part of AES-NI or another instruction subset entirely, Intel is once again targeting cryptography for further improvements. To that end Broadwell will bring with it improvements to multiple cryptography instructions.

Meanwhile it’s interesting to note that, in keeping with Intel’s power goals for Broadwell, Intel put strict power efficiency requirements in place for any architecture changes. Whereas Haswell’s bar was roughly a 1:1 ratio of performance to power – a 1% increase in performance could cost no more than a 1% increase in power consumption – Broadwell’s architecture improvements were required to be at 2:1. While a 2:1 mandate is not new – Intel had one in place for Nehalem too – at this point even on the best of days meaningful IPC improvements are hard to come by at 1:1, never mind 2:1. The end result no doubt limited what performance optimizations Intel could integrate into Broadwell’s design, but it also functionally reduces power requirements for any given performance level, furthering Intel’s goal of getting Core performance into a mobile device. In Broadwell’s case this means the roughly 5% performance improvement comes at a cost of just a 2.5% increase in power consumption.

With that said, Intel has also continued to make further power optimizations to the entire Broadwell architecture, many of which will be applicable not just to Core M but to all future Broadwell products. Broadwell will see further power gating improvements to better shut off parts of the CPU that are not in use, and more generalized design optimizations have been made to reduce power consumption of various blocks as is appropriate. These optimizations coupled with power efficiency gains from the 14nm process are a big part of the driving force in improving Intel’s power efficiency for Core M.

Comments

  • IntelUser2000 - Wednesday, August 13, 2014 - link

    Broadwell Y's CPU portion is on 14nm but the PCH is on 32nm.

    Not to mention that Intel's 22nm process isn't that much smaller than the foundries' 28nm. Foundry 28nm is ~30% larger than Intel 22nm, not 100%.

    20nm foundry is really a half-node jump from 28nm, considering density improves but performance much less. 16nm is another half node, since density barely changes but perf improves.

    Really, it can only be judged as a product.
  • mkozakewich - Thursday, August 14, 2014 - link

    The 'myth' was that x86 was a horrible power-sucking pig. It was shown that it was possible to at least get close to ARM processors.

    Meanwhile, Intel's chips are NOT as low-performance as Apple's. The chip in Surface Pro 3 is about 2x-3x faster, and these should be about the same. With this year's Apple chip, imagine it to be 2x faster or so. Meanwhile, they'll probably both take the same amount of power. SoCs these days are running multiples of watts. Even Intel Atoms used to draw over 10W, but now a full Intel Core computer can run under 6 watts. It would be really interesting to run all these numbers and give a complete report on the state of mobile vs. notebook processing. They seem matched on power usage, but ARM chips are far cheaper and run at far lower performance.
  • peevee - Monday, August 11, 2014 - link

    Finally, a long awaited fanless design. I don't care about thickness so much as about energy wasted on heat and the fan to dissipate that heat.
    But... I have a 2008 MacBook Pro with a 2.4GHz Pentium M. If they achieved the fanless design by bringing frequency down to, say, 1.8GHz, I am not interested (given that IPCs are not that different for real world applications). For me to upgrade, I want it to reach 3GHz, even if only for a second and in a single thread, when starting applications for example. Anything below that is not a noticeable upgrade, and below 2.2GHz or so would be a downgrade in practice.
    And the biggest problem for Intel is not how thick their processors are, it is Microsoft - with Windows 8 (and 8.1) being so unbelievably awful (yes, I have owned it for a while). Krzanich should call Nadella immediately and tell him to fire Larson-Green, or they are both going down.
  • MikhailT - Monday, August 11, 2014 - link

    You do realize this is for tablets and low power laptops (Chromebook/netbook style) only, right?

    It's not coming to the MBP any time soon, unless you're talking about something else entirely, like the MBA, which also isn't going to get Core M because it would be too big a regression in performance.

    I don't think we'll see any Macs with Core M.
  • ZeDestructor - Monday, August 11, 2014 - link

    You forget about IPC. Last I checked, compared to a Core 2 CPU at equal clock speeds, a Sandy Bridge CPU is 50+% faster on average, and Haswell is a further 15+% faster on top of that, all the while using less power.
  • Krysto - Tuesday, August 12, 2014 - link

    Since this is the FIRST core to be "fanless", they're probably squeezing a lot of stuff to make that work, and it probably still overheats. I wouldn't be too excited about it until we see how it does in actual devices, both from a power consumption point of view, but also a performance one (because if performance didn't matter, then we'd all just use ARM chips, no?).

    It would be laughable if Denver, which already beats mainstream Haswell Celeron, would be in range of Broadwell Y in performance, but still more energy efficient and with much better GPU performance.
  • UNCjigga - Monday, August 11, 2014 - link

    Time to spin up the 12" iPad Pro rumor mill again...but would Apple really need to build a device that runs 64bit iOS *and* OS X?
  • ilt24 - Monday, August 11, 2014 - link

    Apple doesn't seem all that interested in adding touch screen capabilities to OS X.
  • isa - Monday, August 11, 2014 - link

    OK, I'm confused, so any help appreciated. I want to replace my old desktop replacement laptop with a Broadwell equivalent. For example, the Broadwell version of an HP Envy 17t. What flavor of Broadwell am I waiting for? The Y flavor? U? H? Something else? thanks.
  • mapesdhs - Monday, August 11, 2014 - link


    I would ask the opposite: why do you need a replacement at all? What is it your current
    device cannot do, or does poorly? Once the newer products are out, find out which one
    solves those issues at the lowest cost. This focus on the jargon side of computing tech
    is the absolute worst aspect of how consumer computing has evolved since the 1980s.

    Ian.
