Broadwell CPU Architecture

We’ll kick off our look at Broadwell-Y with Broadwell’s CPU architecture. As this is a preview, Intel isn’t telling us a great deal about the CPU at this time, but they have given us limited information about Broadwell’s architectural changes and what to expect for performance as a result.

With Broadwell, Intel is at the beginning of the next cycle of their tick-tock cadence. Whereas tock products such as Haswell and Sandy Bridge are designed to be the second generation of products to use a process node and as a result are focused on architectural changes, tick products such as Ivy Bridge and now Broadwell are the first generation of products on a new process node and derive much (but not all) of their advantage from manufacturing process improvements. Over the years Intel has wavered on just what a tick should contain (it’s always more than simply porting an architecture to a new process node), but at the end of the day Broadwell is clearly derived from Haswell and will be taking limited liberties in improving CPU performance as a result.

Intel's Tick-Tock Cadence
Microarchitecture   Process Node   Tick or Tock   Release Year
Conroe/Merom        65nm           Tock           2006
Penryn              45nm           Tick           2007
Nehalem             45nm           Tock           2008
Westmere            32nm           Tick           2010
Sandy Bridge        32nm           Tock           2011
Ivy Bridge          22nm           Tick           2012
Haswell             22nm           Tock           2013
Broadwell           14nm           Tick           2014
Skylake             14nm           Tock           2015

All told, Intel is shooting for a better than 5% IPC improvement over Haswell. This is similar to the gain Ivy Bridge delivered (4%-6%), though at this stage in the game Intel is not talking about expected clockspeeds or the resulting overall performance improvement. Intel has made it clear that they don’t regress on clockspeeds, but beyond that we’ll have to wait for further product details later this year to see how clockspeeds will compare.
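
As a rough illustration of why the clockspeed question matters, single-threaded performance can be approximated as IPC multiplied by clock frequency. The sketch below uses that simplified model; the function name and the clock ratios are hypothetical and are not Intel figures.

```python
def relative_performance(ipc_gain, clock_ratio):
    """Approximate performance relative to the previous generation.

    ipc_gain:    fractional IPC improvement (0.05 means 5%).
    clock_ratio: new clock / old clock (1.0 means unchanged clocks).
    Simplified model: performance ~ IPC x clock frequency.
    """
    return (1.0 + ipc_gain) * clock_ratio


# Hypothetical clock ratios, for illustration only.
for clock_ratio in (1.0, 1.05, 1.10):
    speedup = relative_performance(0.05, clock_ratio)
    print(f"clock x{clock_ratio:.2f} -> performance x{speedup:.3f}")
```

With clocks held flat, the whole gain comes from IPC; any clockspeed increase would multiply on top of it, which is why the final performance picture depends on product details Intel has not yet shared.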

To accomplish this IPC increase Intel will be relying on a number of architectural tweaks in Broadwell. Chief among these are bigger schedulers and buffers in order to better feed the CPU cores themselves. Broadwell’s out-of-order scheduling window for example is being increased to allow for more instructions to be reordered, thereby improving IPC. Meanwhile the L2 translation lookaside buffer (TLB) is being increased from 1K to 1.5K entries to reduce address translation misses.

The TLBs are also receiving some broader feature enhancements that should again improve performance. A second miss handler is being added for TLB pages, allowing Broadwell to utilize both handlers at once to walk memory pages in parallel. Separately, the inclusion of a 1GB page mode should pay off particularly well for servers, granting Broadwell the ability to handle these very large pages on top of its existing 2MB and 4K pages.
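
To put the L2 TLB increase in perspective, a TLB's "reach" (the amount of memory it can map without a miss) is simply the number of entries multiplied by the page size. The sketch below is an upper-bound estimate that assumes every entry can hold a page of the given size; Intel has not said how the entries are divided among page sizes, so 1GB pages are left out, and the function name is our own.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach_bytes(entries, page_size_bytes):
    """Memory covered when every TLB entry maps one page of the given size."""
    return entries * page_size_bytes


OLD_ENTRIES = 1024  # "1K" entries before the increase
NEW_ENTRIES = 1536  # "1.5K" entries in Broadwell

# Assumption: all entries are usable for the page size in question.
for page_name, page_size in (("4K", 4 * KIB), ("2MB", 2 * MIB)):
    old_reach = tlb_reach_bytes(OLD_ENTRIES, page_size)
    new_reach = tlb_reach_bytes(NEW_ENTRIES, page_size)
    print(f"{page_name} pages: {old_reach // MIB} MiB -> {new_reach // MIB} MiB")
```

Under this assumption the reach grows by 50% at every page size, which is the same ratio as the entry count.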

Meanwhile, as is often the case, Intel is once again iterating on their branch predictor to cut down on missed branches and unnecessary memory operations. Broadwell’s branch predictor will see its address prediction improved for both branches and returns, allowing for more accurate speculation of impending branching operations.

Of course efficiency increases can only take you so far, so along with the above changes Intel is also making some more fundamental improvements to Broadwell’s math performance. Both multiplication and division are getting faster thanks to improvements in their respective hardware. Floating point multiplication is seeing a sizable reduction in instruction latency from 5 cycles to 3 cycles, while division performance is being improved by the use of an even larger Radix-1024 (10-bit) divider. Even vector operations will see some improvements here, with Broadwell implementing a faster version of the vector Gather instruction.
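
The appeal of a higher-radix divider is that each iteration retires more quotient bits: a radix-1024 divider produces 10 bits per step, so fewer steps are needed for the same result. The sketch below is plain long division done in radix-sized digits, not a model of Intel's hardware; the function names are hypothetical, and real dividers add setup and rounding overhead that is not counted here.

```python
def bits_per_iteration(radix):
    """Quotient bits retired per step for a power-of-two radix."""
    if radix < 2 or radix & (radix - 1):
        raise ValueError("radix must be a power of two")
    return radix.bit_length() - 1


def iterations_needed(quotient_bits, radix):
    """Steps required to produce quotient_bits bits (ceiling division)."""
    k = bits_per_iteration(radix)
    return -(-quotient_bits // k)


def digit_recurrence_divide(dividend, divisor, width=64, radix=1024):
    """Long division that consumes the dividend one radix-sized digit per step.

    Returns (quotient, remainder, steps). Hardware picks each quotient digit
    with comparisons or lookup tables; here we simply use integer division.
    """
    if divisor <= 0 or not 0 <= dividend < (1 << width):
        raise ValueError("expected 0 <= dividend < 2**width and divisor > 0")
    k = bits_per_iteration(radix)
    steps = iterations_needed(width, radix)
    quotient = 0
    remainder = 0
    for step in range(steps):
        shift = (steps - 1 - step) * k           # most significant digit first
        digit_in = (dividend >> shift) & (radix - 1)
        remainder = (remainder << k) | digit_in  # bring down the next digit
        digit_out = remainder // divisor         # always less than radix
        remainder -= digit_out * divisor
        quotient = (quotient << k) | digit_out
    return quotient, remainder, steps


for radix in (4, 16, 1024):
    print(f"radix-{radix}: {iterations_needed(64, radix)} steps for a 64-bit quotient")
```

The step count falls in proportion to the bits retired per iteration, which is where the latency savings of a larger radix come from.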

Finally, while it’s not clear whether these will be part of AES-NI or another instruction subset entirely, Intel is once again targeting cryptography for further improvements. To that end Broadwell will bring with it improvements to multiple cryptography instructions.

Meanwhile it’s interesting to note that, in keeping with Intel’s power goals for Broadwell, Intel put strict power efficiency requirements in place for any architecture changes. Whereas Haswell was held to roughly a 1:1 ratio of performance to power (a 1% increase in performance could cost no more than a 1% increase in power consumption), Broadwell’s architecture improvements were required to be at 2:1. A 2:1 mandate is not new, as Intel had one in place for Nehalem too, but at this point meaningful IPC improvements are hard to come by at 1:1 even on the best of days, never mind 2:1. The end result no doubt limited what performance optimizations Intel could integrate into Broadwell’s design, but it also functionally reduces power requirements for any given performance level, furthering Intel’s goals in getting Core performance in a mobile device. In the case of Broadwell this means the roughly 5% performance improvement comes at a cost of just a 2.5% increase in immediate power consumption.
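
The arithmetic behind the 2:1 rule can be sketched as follows. The function names are our own, and the model treats the rule as a simple cap on the power increase allowed for a given performance gain.

```python
def max_power_increase(perf_gain, perf_to_power_ratio):
    """Largest fractional power increase allowed for a given performance gain."""
    return perf_gain / perf_to_power_ratio


def perf_per_watt_gain(perf_gain, power_gain):
    """Fractional change in performance per watt."""
    return (1.0 + perf_gain) / (1.0 + power_gain) - 1.0


PERF_GAIN = 0.05  # the roughly 5% improvement cited for Broadwell

for label, ratio in (("1:1 (Haswell)", 1.0), ("2:1 (Broadwell)", 2.0)):
    power = max_power_increase(PERF_GAIN, ratio)
    efficiency = perf_per_watt_gain(PERF_GAIN, power)
    print(f"{label}: power +{power:.1%}, perf/watt {efficiency:+.2%}")
```

At 1:1 a change that uses its full power allowance leaves performance per watt flat, whereas at 2:1 every accepted change must improve efficiency.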

With that said, Intel has also continued to make further power optimizations to the entire Broadwell architecture, many of which will be applicable not just to Core M but to all future Broadwell products. Broadwell will see further power gating improvements to better shut off parts of the CPU that are not in use, and more generalized design optimizations have been made to reduce power consumption of various blocks as is appropriate. These optimizations coupled with power efficiency gains from the 14nm process are a big part of the driving force in improving Intel’s power efficiency for Core M.


158 Comments


  • wurizen - Monday, August 11, 2014 - link

    well, an fx-8350 is toe-to-toe with an i7-2600k, which is no slouch until today. and comparing fx-8350 with today's i7-4770k would be a little unfair since the 4770k is 22nm while the 8350 is at 32nm. and we're not even considering software optimizations from OS and/or programs that are probably bent towards intel chips due to its ubiquity.

    so, i think, you're wrong that the fx-8350 doesn't provide good enough. i have both i7-3770k oc'd to 4.1 ghz and an fx-8320 at stock and the amd is fine. it's more than good enough. i've ripped movies using handbrake on both systems and to me, both systems are fast. am i counting milliseconds? no. does it matter to me if the fx-8320 w/ lets say amd r9-290 has 85 fps for so and so game and an i7-4770k w/ the same gpu has a higher fps of 95, let's just say? i don't think so. that extra 10 fps cost that intel dude $100 more. and 10 extra frames with avg frames of 85-95 is undecipherable. it's only when the frames drop down below 60 does one notice it since most monitors are at 60 hz.

    so what makes the fx not good enough for you again? are you like a brag queen? a rich man?
  • frostyfiredude - Monday, August 11, 2014 - link

    Not fair to compare against a 22nm from Intel? Bogus, I can go to the store and buy a 22nm Intel so it should be compared against AMDs greatest. An i5-4670K matches or exceeds the performance of even the FX-9590 in all but the most embarrassingly threaded tasks while costing 50$ more. Cost to operate the machine through the power bill makes up for that price difference at a fairly standard 12c per KWh when used heavily 2 hours per day for 4 years or idling 8 hours per day for the same 4 years.

    Your argument for gaming with the 8350 being good enough is weak too when the 10$ cheaper i3-4430 keeps up. Or spent 125$ less to get a Pentium G3258 AE, then mildly overclock it to again have the same good enough gaming performance if >60FPS is all that matters. The i3 and pentiums are ~70$ cheaper yet when power use is counted again.
  • wurizen - Tuesday, August 12, 2014 - link

    well, if a pentium g3258 is good enuff for gaming, then so is an fx-8350. whaaaaaat? omg we know intel is king. i acknowledge and understand that. intel rules. but, amd is not bad. not bad at all is all im trying to make.

    /omg
  • wetwareinterface - Monday, August 11, 2014 - link

    wow...

    first off you are assuming a lot and not bothering to check any published benchmarks out there so,

    1. 8350 isn't even equal to 2500 i5 let alone 2600 i7.
    2. 32nm vs. 22nm means nothing at all when comparing raw performance in a desktop. it will limit the thermal ceiling so in a laptop the higher nm chip will run hotter therefore be unable to hit higher clocks but in a desktop it means nil.
    3. handbrake ripping relies on speed of dvd/blu-ray drive, handbrake transcoding relies on cpu performance and the 8350 gets spanked there by a dual core i3 not by miliseconds but tens of seconds. i5 it gets to the level of minutes i7 more so.
    4. let's say you're pulling framerates for an r9-290 out of somewhere other than the ether... reality is an i5 is faster than the 8350 in almost any benchmark i've ever seen by roughly 15% overall. in certan games with lots of ai you get crazy framerate advantages with i5 over 8350, things like rome total war and starcraft 2 and diablo 3 etc...

    i'll just say fx8350 isn't good enough for me and i'm certainly not a rich man. system build cost for what i have vs. what the 8350 system would have run was a whopping $65 difference
  • wurizen - Tuesday, August 12, 2014 - link

    #3 is B.S. a dual-core i3 can't rip faster than an fx-8350 in handbrake.

    #4 the r-290 was an example to pair a fairly high end gpu with an fx-8350. a fairly high end gpu helps in games. thus, pairing it with an fx-8350 will give you a good combo that is more than good enough for gaming.

    #2 22nm vs. 32nm does matter in desktops. the fx-8350 is 32nm. if it goes to 22nm, the die shrink would enable the chip to either go higher in clockspeed or lower it's tdp.

    u sound like a benchmark queen or a publicity fatso.
  • wurizen - Tuesday, August 12, 2014 - link

    oh and #1--i am not saying the fx 8350 is better than the i7-2600k. i said "toe-to-toe." the i5-2500k can also beat the fx-835o b/c of intel's IPC speed advantage. but, i think the reasons for that are programs not made to be multithreaded and make use of fx-8350 8-cores to it's potential. since amd trails intel in IPC performance by a lot--this means that a 4-core i5-2500k can match it or sometimes even beat it in games. in a multithreaded environment, the 8-core fx-8350 will always beat the i5-2500k. although it might still trailer the 4-core + 4 fake cores i7-2600k. just kidding. lol.

    i said toe to toe with 2600k which means its "competitive" to an i7-2600k even though the AMD is handicapped with slower IPC speed and most programs/OS not optimize for multithreading. so, to be 10-20% behind in most benchmarks against an i7-2600k is not bad considering how programs take advantage of intel's higher IPC performance.

    do u understand what im trying to say?
  • Andrew Lin - Tuesday, August 26, 2014 - link

    i'm sorry, is your argument here that the FX-8350 is better because it's inferior? because that's all i'm getting out of this. Of course a benchmark is going to take advantage of higher IPC performance. That's the point of a benchmark: to distinguish higher performance. The way you talk about benchmarks it's as if you think benchmarks only give higher numbers because they're biased. That's not how it works. The benchmarks give the i7-2600k higher scores because it is a higher performance part in real life, which is what anyone buying a CPU actually care about. Not to mention the significantly higher efficiency, which is just an added benefit.
    Also, it's really hard to take you seriously when your posts make me think they're written by a teenage girl.
  • wurizen - Tuesday, August 12, 2014 - link

    also, if the fps disparity is so huge btwn fx-8350 and say i5-2500k in games u mention like starcraft 2, then something is wrong with that game. and not the fx-8350. i actually have sc2 and i have access to a pc w/ an fx-8320. so i am going to do a test later tonight. my own pc is an i7-3770k. so i could directly compare 2 different systems. the only thing is that the amd pc has an hd5850 gpu, which should be good enuff for sc2 and my pc has a gtx680 so it's not going to be a direct comparison. but, it should still give a good idea, right?
  • wurizen - Tuesday, August 12, 2014 - link

    i just played starcraft 2 on a pc with fx-8320 (stock clockspeed), 8GB 1600Mhz RAM, 7200rpm HDD and an old AMD HD5850 w/ 1GB VRAM. the experience was smooth. the settings were 1080P, all things at ultra or high and antialiasing set to ON. i wasn't looking at FPS since i don't know how to do it with starcraft 2, but, the gameplay was smooth. it didn't deter my experience.

    i also play this game on my own pc which is an i7-3770k OC'd to 4.1, 16GB 1600 Mhz RAM, 7200rpmHDD and an Nvidia GTX680 FTW w/ 2GB VRAM and i couldn't tell the difference as far as the smoothness of the gameplay is concerned. there is some graphical differences between the AMD GPU and the Nvidia GPU but that is another story. my point is that my experience were seamless playing on an FX chip pc to my own pc with 3700k.

    to make another point, i also have this game on my macbook pro and that is where the experience of playing this game goes down. even in low settings. the MBP just can't handle it. at least the one i have with the older gt330m dGpu and dual-core w/ hyperthreading i7 mobile cpu.

    so.... there.... no numbers or stats. just the experience, to me, which is what counts did not change with the pc that had the amd fx cpu.
  • wurizen - Tuesday, August 12, 2014 - link

    well, i should point out that my macbook pro (mid-2010 model) can handle starcraft 2. but, it's not a "fun" experience. or as smooth.
