Intel Broadwell Architecture Preview: A Glimpse into Core M

Author: Ryan Smith

by Ryan Smith on August 11, 2014 12:01 PM EST

Broadwell CPU Architecture

We’ll kick off our look at Broadwell-Y with Broadwell’s CPU architecture. As this is a preview Intel isn’t telling us a great deal about the CPU at this time, but they have given us limited information about Broadwell’s architectural changes and what to expect for performance as a result.

With Broadwell, Intel is at the beginning of the next cycle of their tick-tock cadence. Whereas tock products such as Haswell and Sandy Bridge are designed to be the second generation of products to use a process node, and as a result are focused on architectural changes, tick products such as Ivy Bridge and now Broadwell are the first generation of products on a new process node and derive much (but not all) of their advantage from manufacturing process improvements. Over the years Intel has wavered on just what a tick should contain (it’s always more than simply porting an architecture to a new process node), but at the end of the day Broadwell is clearly derived from Haswell and will be taking limited liberties in improving CPU performance as a result.

Intel's Tick-Tock Cadence
Microarchitecture	Process Node	Tick or Tock	Release Year
Conroe/Merom	65nm	Tock	2006
Penryn	45nm	Tick	2007
Nehalem	45nm	Tock	2008
Westmere	32nm	Tick	2010
Sandy Bridge	32nm	Tock	2011
Ivy Bridge	22nm	Tick	2012
Haswell	22nm	Tock	2013
Broadwell	14nm	Tick	2014
Skylake	14nm	Tock	2015

All told, Intel is shooting for a better than 5% IPC improvement over Haswell. This is similar to Ivy Bridge (4%-6%), though at this stage in the game Intel is not talking about expected clockspeeds or the resulting overall performance improvement. Intel has made it clear that they don’t regress on clockspeeds, but beyond that we’ll have to wait for further product details later this year to see how clockspeeds will compare.
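Since Intel has not disclosed clockspeeds, the overall gain can only be bracketed. As a rough sketch, single-threaded performance can be modeled as IPC multiplied by clock frequency, so the two gains multiply. The 5% IPC figure comes from Intel; the clock scenarios and the function name below are hypothetical and only illustrate the arithmetic.

```python
def relative_performance(ipc_gain: float, clock_gain: float) -> float:
    """Overall speedup when performance is modeled as IPC x clock frequency.

    Both arguments are fractional gains (0.05 means +5%).
    """
    return (1.0 + ipc_gain) * (1.0 + clock_gain)


# The 5% IPC gain is Intel's stated target; the clock gains are made-up scenarios.
for clock_gain in (0.0, 0.05, 0.10):
    speedup = relative_performance(0.05, clock_gain)
    print(f"clock +{clock_gain:.0%}: overall {speedup:.3f}x")
```

With no clock increase at all, the whole improvement is the IPC gain itself, which is the floor implied by Intel's statement that it does not regress on clockspeeds.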

To accomplish this IPC increase Intel will be relying on a number of architectural tweaks in Broadwell. Chief among these are bigger schedulers and buffers in order to better feed the CPU cores themselves. Broadwell’s out-of-order scheduling window for example is being increased to allow for more instructions to be reordered, thereby improving IPC. Meanwhile the L2 translation lookaside buffer (TLB) is being increased from 1K to 1.5K entries to reduce address translation misses.

The TLBs are also receiving some broader feature enhancements that should again improve performance. A second TLB miss handler is being added, allowing Broadwell to use both handlers at once to walk memory pages in parallel. In addition, the inclusion of a 1GB page mode should pay off particularly well for servers, granting Broadwell the ability to handle these very large pages on top of its existing 2MB and 4K pages.
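To put the larger L2 TLB in perspective, a common back-of-the-envelope figure is TLB reach: the amount of memory that can be mapped without a miss, taken as entries multiplied by page size. The sketch below assumes, as a simplification, that every entry maps a page of the same size. Intel has not said how many entries are available for 1GB pages, so that case is left out. The helper names are our own.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes covered if every TLB entry maps one page of page_size bytes."""
    return entries * page_size


def human(n_bytes: int) -> str:
    """Format a byte count using the largest binary unit that fits."""
    for unit, size in (("GiB", GIB), ("MiB", MIB), ("KiB", KIB)):
        if n_bytes >= size:
            return f"{n_bytes / size:g} {unit}"
    return f"{n_bytes} B"


# Entry counts are from the article (1K before, 1.5K in Broadwell).
for label, entries in (("1K entries", 1024), ("1.5K entries", 1536)):
    for page_name, page_size in (("4K", 4 * KIB), ("2MB", 2 * MIB)):
        reach = human(tlb_reach(entries, page_size))
        print(f"{label}, {page_name} pages: {reach}")
```

Under this simplified model the reach grows by the same 50% as the entry count, regardless of page size.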

Meanwhile, as is often the case Intel is once again iterating on their branch predictor to cut down on missed branches and unnecessary memory operations. Broadwell’s branch predictor will see its address prediction improved for both branches and returns, allowing for more accurate speculation of impending branching operations.

Of course efficiency increases can only take you so far, so along with the above changes Intel is also making some more fundamental improvements to Broadwell’s math performance. Both multiplication and division are getting faster thanks to improvements in their respective hardware. Floating point multiplication is seeing a sizable reduction in instruction latency from 5 cycles to 3 cycles, while division performance is being improved by the use of an even larger Radix-1024 (10-bit) divider. Even vector operations will see some improvements here, with Broadwell implementing a faster version of the vector Gather instruction.
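A rough way to see what these two changes buy is sketched below. It assumes a chain of multiplies where each one needs the previous result, so latency rather than throughput sets the pace, and it treats the divider as a simple digit-recurrence design that retires log2(radix) quotient bits per iteration. Setup and rounding cycles are ignored, and the radix-16 comparison point is hypothetical rather than a statement about any particular earlier core.

```python
def dependent_chain_cycles(n_ops: int, latency: int) -> int:
    """Cycles for n_ops operations where each depends on the previous result."""
    return n_ops * latency


def divider_iterations(quotient_bits: int, radix: int) -> int:
    """Iterations needed when each step retires log2(radix) quotient bits.

    radix must be a power of two; setup and rounding steps are ignored.
    """
    bits_per_step = radix.bit_length() - 1  # log2 of a power of two
    return -(-quotient_bits // bits_per_step)  # ceiling division


# Latencies are from the article: 5 cycles before, 3 cycles in Broadwell.
for latency in (5, 3):
    cycles = dependent_chain_cycles(100, latency)
    print(f"100 dependent FP multiplies at {latency} cycles each: {cycles} cycles")

# Radix-1024 is from the article; radix-16 is a hypothetical comparison point.
for radix in (16, 1024):
    steps = divider_iterations(64, radix)
    print(f"radix-{radix}: {steps} iterations for a 64-bit quotient")
```

Independent multiplies can overlap in the pipeline, so the latency reduction matters most for code with long dependency chains.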

Finally, while it’s not clear whether these will be part of AES-NI or another instruction subset entirely, Intel is once again targeting cryptography for further improvements. To that end Broadwell will bring with it improvements to multiple cryptography instructions.

Meanwhile it’s interesting to note that in keeping with Intel’s power goals for Broadwell, Intel put strict power efficiency requirements in place for any architecture changes throughout all of this. Whereas Haswell was held to roughly a 1:1 ratio of performance to power (a 1% increase in performance could cost no more than a 1% increase in power consumption), Broadwell’s architecture improvements were required to be at 2:1. A 2:1 mandate is not new, as Intel had one in place for Nehalem too, but at this point meaningful IPC improvements are hard to come by at 1:1 even on the best of days, never mind 2:1. The end result no doubt limited what performance optimizations Intel could integrate into Broadwell’s design, but it also functionally reduces power requirements for any given performance level, furthering Intel’s goal of getting Core performance into a mobile device. In the case of Broadwell this means the roughly 5% performance improvement comes at a cost of just a 2.5% increase in immediate power consumption.
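The arithmetic behind the 2:1 rule can be sketched as follows. The function names are our own, and the acceptance check is only an illustration of the rule as described here, not Intel's actual design process.

```python
def max_power_increase(perf_gain: float, ratio: float) -> float:
    """Largest fractional power increase allowed for a given performance gain."""
    return perf_gain / ratio


def perf_per_watt_change(perf_gain: float, power_gain: float) -> float:
    """Fractional change in performance per watt."""
    return (1.0 + perf_gain) / (1.0 + power_gain) - 1.0


def passes_rule(perf_gain: float, power_gain: float, ratio: float) -> bool:
    """True if a change delivers at least `ratio` units of performance per unit of power."""
    return perf_gain >= ratio * power_gain


# The 5% gain and the 1:1 and 2:1 ratios are the figures quoted in the article.
for ratio in (1.0, 2.0):
    power = max_power_increase(0.05, ratio)
    efficiency = perf_per_watt_change(0.05, power)
    print(f"{ratio:g}:1 rule: power +{power:.1%}, perf/W {efficiency:+.2%}")
```

At 1:1 a change can leave performance per watt flat, whereas at 2:1 every accepted change has to improve it, which is why the stricter rule matters for a power-constrained part like Core M.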

With that said, Intel has also continued to make further power optimizations to the entire Broadwell architecture, many of which will be applicable not just to Core M but to all future Broadwell products. Broadwell will see further power gating improvements to better shut off parts of the CPU that are not in use, and more generalized design optimizations have been made to reduce power consumption of various blocks as is appropriate. These optimizations coupled with power efficiency gains from the 14nm process are a big part of the driving force in improving Intel’s power efficiency for Core M.

158 Comments

Morawka - Monday, August 11, 2014 - link
wasn't gartner the one who said apple mac sales were down 40% and it turned out to be up 60%
mkozakewich - Thursday, August 14, 2014 - link
Those weren't actual numbers, they were industry guesses. The lesson is not to trust pundits and "Industry analysts". If you can wait for proper news to be released, it's better than rumours.
lilmoe - Monday, August 11, 2014 - link
Actually, the tablet market as a cause for dropping PC sales is only half or less than half of the story. Logically speaking, most consumers find their 2-4 year old machines sufficient for their "productive" needs. Unlike previous years, PCs are living far beyond their intended years of services, and consumers are in no dire need for newer, faster components. Mind you, hardware isn't getting that much faster with each iteration (relatively speaking), and a simple SSD upgrade and/or more RAM would improve performance significantly that a whole hardware upgrade becomes less appealing (same for other components). It's mostly about convenience, connectivity and mobility for consumers these days (cheap mobile media consumption) that's why the tablet market "appear" to have affected PC sales. Those who find a tablet replacing their fully fledged PC didn't need a PC in the first place.
mayankleoboy1 - Monday, August 11, 2014 - link
i remember being very excited to read the Haswell architecture preview.
Felt completely disappointed when the actual CPU were reviewed.

Hope BDW-Y is not the same
edlee - Monday, August 11, 2014 - link
they should make a chromebook with this processor.
Krysto - Monday, August 11, 2014 - link
My guess is this chip won't be that much faster than Nvidia's Denver - unless Intel cheats in benchmarks somehow (like with TurboBoost).
68k - Monday, August 11, 2014 - link
How on earth is TurboBoost cheating? Is it cheating to include a feature that actually does result in a CPU that allow itself to do short sprints in moments where it is needed the most in order to make a device "feel" fast (i.e. respond quickly to events generated by the user)?

Cannot talk for BDW but both HSW and SMT usually stay at or very close to max turbo for _very_ long, Z3770 sitting inside a 10" tablet can run at 2.3-2.4GHz on all 4 cores for several minutes.
nonoverclock - Monday, August 11, 2014 - link
AMD has a turbo mode. Do they cheat?
psyq321 - Tuesday, August 12, 2014 - link
TurboBoost is not cheating but optimal demand-based power management.

All modern ARM CPUs are also doing the same thing (scaling down frequency when idle, up when busy). I am quite sure it is the case with the AMD CPUs, too - it is just that I do not have any to check.

The only three things that matter are a) price b) performance c) power draw (real that is measured, not marketing terms such as TDP, SDP, etc.).
mkozakewich - Thursday, August 14, 2014 - link
TDP isn't a marketing term, it's the wattage of heat displayed by the processor. Wattage is a bit of a weird word for it, but Watts are joules-per-second, and a joule is the amount of energy needed to heat water a certain amount.