Broadwell-E Conclusion

Intel’s latest Broadwell-E platform is the next iteration of their high-end desktop strategy, which involves bringing the low-to-mid range professional processors into the consumer market and adding a few features (such as overclocking), but removing others (ECC). For this launch, Intel introduced four processors, ranging from six cores to ten cores and varying in price from $434 to $1723.

At AnandTech we have tested Intel’s Broadwell cores before, both in our Broadwell desktop processor review of the Core i7-5775C and the professional level Broadwell-EP Xeon E5-2600 v4 processor review. We noted a 3-5% increase in clock-per-clock performance compared to the previous generation ‘Haswell’ parts at the time. This review tests all the new Broadwell-E parts for direct comparison to the Haswell parts.

Performance

The move from Haswell-E to Broadwell-E is a change from 22nm to 14nm process technology but the microarchitecture is mostly the same, barring minor adjustments. These adjustments include an improved memory controller (now qualified on DDR4-2400), a faster divider, slightly improved branch prediction, a slightly larger scheduler, and a reduction in AVX multiply latency from 5 cycles to 3 cycles.

Due to this, the performance of the new Broadwell-E parts is somewhat predictable. Adding more cores and adjusting for frequency is a good marker, as is adjusting for the new memory speed. That means a move from the i7-5960X to the i7-6950X gives two more cores at the same frequency, or about 25% more performance. The downside of this upgrade is the price: the i7-5960X was launched at $999/$1049, whereas the new i7-6950X is $1723. That’s a big price increase by any standard.
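As a rough illustration of that estimate, the sketch below models multithreaded throughput as cores multiplied by frequency and sets it against the change in launch price. The function names are hypothetical, the frequencies are normalized to 1.0 because the text only says they are the same, and perfect scaling across cores is an assumption that real workloads will not always meet.

```python
def relative_throughput(cores_new, cores_old, freq_new=1.0, freq_old=1.0):
    """Idealized multithreaded scaling: throughput ~ cores * frequency."""
    return (cores_new * freq_new) / (cores_old * freq_old)


def price_increase(price_new, price_old):
    """Fractional price increase of the new part over the old one."""
    return price_new / price_old - 1.0


# i7-5960X (8 cores) -> i7-6950X (10 cores), same frequency per the text.
perf_gain = relative_throughput(10, 8) - 1.0

# The i7-5960X launched at $999/$1049, so compute both ends of the range.
cost_gain_low = price_increase(1723, 1049)
cost_gain_high = price_increase(1723, 999)

print(f"Estimated throughput gain: {perf_gain:.0%}")
print(f"Price increase: {cost_gain_low:.0%} to {cost_gain_high:.0%}")
```

Under these assumptions the price rises far faster than the estimated throughput, which is the core of the value concern.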

Turbo Boost Max 3.0: A Troubled Implementation

For Broadwell-E, Intel introduced a new technology called Turbo Boost Max 3.0. With an appropriate driver, BIOS, BIOS settings, and software, this allows the system to pin a single threaded program to the best performing single core at a higher-than-listed frequency. It sounds as if it has potential, but the implementation means that very few users will ever see it.
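To make the idea concrete, here is a minimal sketch of "find the best core, then pin the work to it". It is not Intel's driver logic: the per-core frequency table is hypothetical, the function names are invented for illustration, and the pinning call uses Python's `os.sched_setaffinity`, which is only available on Linux.

```python
import os


def pick_best_core(max_freq_mhz_by_core):
    """Return the index of the core with the highest achievable frequency."""
    return max(max_freq_mhz_by_core, key=max_freq_mhz_by_core.get)


def pin_current_process(core):
    """Pin the calling process to one core (Linux only); True on success."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # affinity API not available on this platform
    try:
        os.sched_setaffinity(0, {core})  # pid 0 means the calling process
    except OSError:
        return False  # e.g. the core is not in the allowed CPU set
    return True


# Hypothetical per-core limits in MHz; real values would come from the driver.
example_limits = {0: 3500, 1: 3500, 2: 4000, 3: 3500}
best = pick_best_core(example_limits)
print(f"Best core in this made-up table: {best}")
```

Calling `pin_current_process(best)` would then restrict a single-threaded program to that core. Raising the core above its listed frequency is still up to the firmware and driver, which is where the implementation runs into trouble.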

Firstly, the driver/software implementation is perhaps easily overcome when the driver gets pushed through Windows 10 updates, similar to Speed Shift on Skylake processors, which is now fully active. The part where it breaks down is in the BIOS and BIOS settings requirements. Ultimately the BIOS controls which P-states are in play (when the OS selects them), but the BIOS settings can override anything the processor might want by default. Because TBM3 involves an increase in frequency, this requires a number of settings in the BIOS to be enabled. But, because each processor is different, motherboard manufacturers are most likely going to run these options at very conservative values so none of their users have a bad experience. In the end, whether it's used is going to depend on whether the motherboard manufacturers enable it in the first place. On the motherboard we tested, we were told that it was a management decision to have it disabled by default. Because most users never touch the BIOS, especially in the prosumer/professional markets, it will most likely never be used in this case.

We didn’t get time to run a full benchmark suite with TBM 3.0 enabled, and will most likely follow up to see where in our tests it can make the most difference.

Market

The pricing will be prohibitive to most. Many enthusiasts who have played in the HEDT space for a number of years are used to the $999/$1049 price point for the most expensive processor, even when the number of cores has increased. However, this time Intel has decided to increase the top chip's cost by almost 70%. This complicates the question of which product is best for prosumers looking to upgrade.

If a user is prepared to spend the $1723 that the i7-6950X costs but does not want the overclocking, they can instead buy the 14-core E5-2680 v4 for $1745, giving 40% more cores at lower power with a slight decrease in frequency, or get double the cores in a 2P system using the E5-2640 v4: a 10-core 2.4 GHz/3.4 GHz part, running at 90W, for $939. Two of these run to $1878, which is slightly more, but having double the cores available might be the more important thing here. However, because these CPUs are not often found at retail, users may have to approach a system builder/integrator in order to source them.
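The comparison can be reduced to cost per core, as in the short sketch below. It uses the $1723 launch price quoted earlier for the i7-6950X and the Xeon prices from this section. The dictionary layout and function name are illustrative only, and the figures cover the CPUs alone, not the motherboard or memory a 2P system would also need.

```python
options = {
    "i7-6950X (1P)": {"price_usd": 1723, "cores": 10},
    "E5-2680 v4 (1P)": {"price_usd": 1745, "cores": 14},
    "2x E5-2640 v4 (2P)": {"price_usd": 2 * 939, "cores": 2 * 10},
}


def cost_per_core(option):
    """CPU price divided by the number of physical cores."""
    return option["price_usd"] / option["cores"]


for name, option in options.items():
    print(f"{name}: ${cost_per_core(option):.2f} per core")
```

Cost per core ignores frequency, so it favors the Xeons only for workloads that scale well across threads.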

One would assume that Intel is interested in retaining the long term HEDT hold-outs still on Nehalem, Westmere and Sandy Bridge-E processors. These prices (and the overclocking performance) might make these users feel that they should hold on another generation, or invest in Haswell-E. That being said, the low-end Broadwell-E pricing is higher than that of the low-end Haswell-E, which will extend the pricing gap between the mainstream and the high-end desktop platform.


205 Comments

  • Impulses - Tuesday, May 31, 2016 - link

    Or just get a 5820K if you'd benefit from the extra cores... Or a 6600/6700K if you want the IPC bump for non gaming tasks and platform upgrades (USB 3.1, more lanes for M.2, etc).

    After throwing tick tock out the window it's unlikely the next refresh will be any more tempting. If you don't need any of the aforementioned things (more cores or platform upgrades) then you might as well sit tight tho.
  • rhysiam - Tuesday, May 31, 2016 - link

    I've asked this question before and never got a good answer, so I'm trying again. Can someone explain to me why the boost clocks on the SKUs with more cores are always so much lower than those with fewer cores enabled?

    Base clocks have to be lower of course. I've got no issues with 10 active cores requiring lower clocks than 6 active cores, that makes sense. But what I don't get is why a 10 core SKU with 1 active core and 9 idle is somehow unable to turbo anywhere near as high as a 6 core implementation, which is effectively 1 active core, 5 idle cores and 4 disabled cores on the same silicon. Is the power difference between those 4 idle & disabled cores really so significant on a 140W CPU that it necessitates an almost 10% lower clock speed?

    It makes it even harder to justify spending $1.7K on a CPU when it loses so many benchmarks to CPUs costing a fraction of the price (including often the almost 3 year old 4820K).
  • RealLaugh - Tuesday, May 31, 2016 - link

    I would like to know this too, I looked at the frequency and was surprised but I don't understand why it's like that.

    Also in the past Intel's top-line Xeons have had huge cache + core count but between 2-3GHz clock speeds only...
  • Ph0b0s - Tuesday, May 31, 2016 - link

    Brief go at an explanation. Others can chime in to add on to my very simplistic explanation.

    The higher cored CPUs having lower clocks is down to the thermal envelope (referred to as TDP, in watts) they are trying to hit on that CPU. Each core when working is effectively a heater on the CPU package. On CPUs with more cores, the heaters are more dense, i.e. more heaters per area, as they try to hit the same physical CPU package size whether 6 or 10 cores.

    When all the cores are going a 10 core CPU will generate more heat than a 6 or 4 core CPU when the clocks are the same. To keep the 10 core CPU from hitting the thermal envelope limits that Intel put on them they decrease clock speed to offset the extra heat they are getting from the extra cores.
  • rhysiam - Tuesday, May 31, 2016 - link

    Yes, but the "boost" clocks refer to single (or lightly) threaded workloads. Only one core is working. Your last paragraph refers to "when all the cores are going" - that's a base clock situation. As I said in my post I have no issue at all with 10 active cores requiring lower frequencies than 6 in the same package. It's the single core taxed scenario I struggle to understand.
  • adamod - Wednesday, June 1, 2016 - link

    just a side note....my x5660's are rated at 2.8 base and 3.2 boost and with all 12 cores (dual socket setup) and 24 threads going 100 percent load and pulling 102W ea (while rated at 95W ea) mine NEVER get below 3.1ghz...i somehow got lucky as shit because i have two that can do it and only get to about 80C with the stock cooling on my hp workstation...sometimes you get lucky.....unfortunately i cant overclock though :(
  • ThortonBe - Tuesday, May 31, 2016 - link

    Perhaps it is like this. Not every transistor has the same performance. When you increase the number of transistors (more cores) you increase the chance that you will get some slower transistors. For the turbo spec every core needs to be able to hit it.

    By making the turbo spec lower for higher core parts, Intel will have more parts that can be sold as the more expensive part (e.g. with ten cores they have a higher chance of fabricating a slow core than with six cores so they lower the max turbo on the ten cores to compensate to keep yields high).

    Also, the floorplans (how the cores are wired up) might differ between the ten and eight core parts. In the GHz range even the wiring can severely limit how high a frequency can be achieved. The more complex ten core designs are probably harder to wire up properly.
  • rhysiam - Wednesday, June 1, 2016 - link

    The idea of tolerances with individual cores is an interesting suggestion I hadn't thought of.

    RE your last paragraph though, my understanding is that all 4 of these SKUs are identical chips, with the lower parts simply having cores (and PCIe lanes) disabled. That was certainly the case with Haswell-E CPUs and I'm assuming the same here. So the 10 core designs are exactly the same as the 6 core.

    I suppose it's possible that the 6 core chips undergo testing and have their worst cores disabled, allowing higher turbo frequencies. It just seems, particularly with this generation, that the $1.7K flagship CPU is going to be such a low volume part anyway that they should be able to cherry-pick CPUs which can hit higher boost clocks.

    Your suggestion would certainly explain why they're pursuing and promoting "turbo boost max 3.0." It seems like it's a bit of a mess at the moment, but if they can allocate single threaded workloads to the "best" core, surely they could start to hit much better boost clocks?

    With Haswell the situation was even worse. You can buy a 4790K which can boost a single core to 4.4GHz, but the best single threaded Haswell-E option (5930K), despite more than 50W additional TDP to play with and no iGPU for competition, has to settle for a full 700MHz (16%) lower on the boost clock. I realise there's additional complexity with the "E" parts with larger cache and a wider memory bus, but that's a massive sacrifice to make that in many cases makes the cheaper 4790K the faster CPU, often by a wide margin.

    I'd welcome other thoughts/comments/ideas here.
  • SAAB340 - Wednesday, June 1, 2016 - link

    The 6950X chips will all have to be well binned to start with. They will all have to have the whole chip working and be able to do so at low enough voltage to meet the 140W TDP. If you have a leakier fully working chip it might still be sold as an 8 or 6 core version, given that you just disable 2 or 4 (fully working but leaky) cores to meet the 140W TDP.

    The Turbo speed bins being lower in general the more cores you get on a CPU is certainly a function of the fact that every individual core will have to be able to hit the highest turbo bin, even though it won't be TDP limited at that time. So you're pretty much guaranteed to be able to overclock to max single core turbo speed but you will most likely exceed the TDP.

    It's just the same as how it's way harder to find a hexa-, octa- or deca-core chip able to reach xGHz overclock on all cores compared to finding a quad-core chip able to reach the same xGHz, as 'only' 4 cores have to be good enough overclockers to reach it. The more cores, the less likely when we start to push the limits.

    Turbo Boost Max 3.0 is certainly sounding like an interesting function where by the sound of it they instead try to identify the core that is able to run at the highest frequency. Here the opposite would be true, the more cores to choose from the higher likelihood to find one able to reach xGHz.
  • extide - Monday, June 6, 2016 - link

    The floorplan does not differ. All 4 of these sku's use the exact same 10-core die. The lower end ones just have cores disabled, but otherwise they are the same exact silicon.
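The binning argument running through the thread above can be put into numbers with a small sketch. It assumes each core independently reaches a given clock with the same probability, which is a simplification, and the 0.9 figure is a made-up value chosen only for illustration. The function names are hypothetical.

```python
def prob_all_cores_hit(p_core, n_cores):
    """Chance that every one of n independent cores reaches the target clock."""
    return p_core ** n_cores


def prob_any_core_hits(p_core, n_cores):
    """Chance that at least one of n independent cores reaches the target clock."""
    return 1.0 - (1.0 - p_core) ** n_cores


P_CORE = 0.9  # hypothetical per-core probability, not measured data

for n in (4, 6, 8, 10):
    all_hit = prob_all_cores_hit(P_CORE, n)
    any_hit = prob_any_core_hits(P_CORE, n)
    print(f"{n} cores: all reach target {all_hit:.3f}, at least one {any_hit:.3f}")
```

The first column falls as cores are added, matching the point that a turbo spec every core must meet gets harder to hit with more cores. The second rises, matching the point that a "best core" scheme such as Turbo Boost Max 3.0 benefits from having more cores to choose from.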
