Broadwell-U: On Performance

The Broadwell-U launch would not be complete without a list of performance metrics direct from Intel showing how Broadwell-U improves over Haswell-U. Without hardware on hand to test for ourselves it is hard to verify the numbers, but they provide a number of interesting talking points, along with a chance to compare them against Intel's earlier presentations leading up to this launch.

Core Improvements

We covered the transistor numbers on the previous page, but Intel’s direct performance metrics matter most when we consider graphics and battery life. For productivity, moving from Haswell-U to Broadwell-U will not be much of a jump, as it is a similar architecture on a different process node. The shrink lets Intel pick the low-hanging fruit and raise IPC by around 5%, achieved through the following:

Larger OoO scheduler
Faster store-to-load forwarding
Larger (+50%) L2 translation lookaside buffer (TLB)
New dedicated 1GB page mode for L2 TLB
2nd TLB page miss handler
Faster FP multiplier
Faster Radix-1024 divider
Improved address prediction for branches and returns
Targeted cryptography instruction acceleration

The node shrink carries more weight for power saving, allowing a lower voltage for similar performance. Combined with Intel’s 2:1 policy for Broadwell (each +2% of performance may cost at most +1% of power), the result is good all around.
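Intel's 2:1 rule can be expressed as a simple acceptance test for a proposed feature. A minimal sketch, with a hypothetical helper name and made-up feature figures (Intel has not published per-feature numbers here):

```python
def meets_two_to_one(perf_gain_pct: float, power_cost_pct: float) -> bool:
    """Hypothetical check: a feature passes if its performance gain is at
    least twice the power it costs (the 2:1 rule described above)."""
    if power_cost_pct <= 0:
        # A feature that adds no power (or saves power) trivially passes.
        return True
    return perf_gain_pct >= 2.0 * power_cost_pct


# Illustrative, made-up figures: (performance gain %, power cost %).
candidates = {
    "feature_a": (2.0, 1.0),  # exactly 2:1, passes
    "feature_b": (3.0, 2.0),  # only 1.5:1, fails
    "feature_c": (1.0, 0.0),  # free performance, passes
}

for name, (perf, power) in candidates.items():
    print(name, meets_two_to_one(perf, power))
```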

However, the bigger change is on the GPU side. Intel is quoting a +22% synthetic graphics improvement from HD 4400 to HD 5500 in 3DMark, and +50% in CyberLink MediaEspresso for video conversion.

One might argue that Intel should alternate between CPU and GPU gains in each U-series cycle, giving each platform a serious talking point. After all, Haswell only brought a half-generation increment in the graphics naming scheme (Gen7 to Gen7.5), even though its CPU architecture was new compared to Ivy Bridge.

Intel is also a fan of looking back at historical improvements. Given that many users upgrade from a 2-4 year old system, it makes sense to see how the multi-generation improvements add up. On the other hand, when a person does upgrade, you would hope that every area has improved over the 2-3 intervening generations.

Naturally, to give the best comparison data, Intel looked back at the oldest reasonable product, in this case pitting an i5-5300U (HD 5500, GT2 with 24 EUs) against an i5-520UM. In the time between these two platforms, the approach to mobile devices has changed significantly because of the rise in baseline performance. If we put a 4.5W equivalent of the i5-520UM into a fanless tablet, for example, it would (I assume) feel slow to the point of being excruciating with the quality and features we expect today. One argument is that back then, in 2010-ish and before, our expectations of software features and gaming were not at today's level of detail (which is true), and the same comparison will most likely be made four years from now looking back at this era. Not only does the hardware improve, but so does the understanding of the market and of user experience.

Nevertheless, we now have devices that wake from sleep in fractions of a second rather than seconds, and turn on in seconds rather than minutes. Battery life has improved because integrated graphics now carry a bigger share of the load, and we have thrown the discrete graphics card away in most devices that need a sense of mobility. My old 8lb brick of a 15-inch 1200p mobile workstation used a 45W GPU with a 35W CPU, which was a nightmare for working on the go. The 11-inch netbook wasn’t much better, with its low 1366x768 resolution and underwhelming performance. As I write this review, my sub-3lb UX301 laptop is in a low power mode, and on this flight I have managed three hours of active writing time, looking at text on white backgrounds, with half of the battery still remaining. Four years ago at this point, I would be getting out the charger for my 8lb brick with its extended battery and wondering if I had exceeded the power limit of the in-flight socket. It is popular to look back fondly on the past, but when it comes to combining laptop battery life with performance, the only way is forward.

Battery Life and the Audio DSP

Almost all of Intel's suggested use scenarios, outside static All-In-Ones and mini-desktops, rely on some form of battery, so it makes sense that power efficiency is one card in play for Broadwell-U. Historically this has meant performance per watt, but also time-to-sleep, especially when parts of the system can be put into a lower power state or shut off completely when not in use. This complicates designs, with disconnected clock domains as introduced in previous generations and so forth.
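The time-to-sleep argument is easiest to see with a toy energy model over a fixed window: finish the work, then drop into a low-power idle state for the rest of the window. All power and time figures below are hypothetical, chosen only to illustrate the idea that running faster at higher power can use less total energy when the idle state is cheap enough:

```python
from dataclasses import dataclass


@dataclass
class Profile:
    active_power_w: float  # power while doing the work
    active_time_s: float   # time needed to finish the work at that power


def window_energy_j(profile: Profile, idle_power_w: float, window_s: float) -> float:
    """Energy over a fixed window: run the task, then sit in the idle state."""
    if profile.active_time_s > window_s:
        raise ValueError("task does not fit in the window")
    idle_time_s = window_s - profile.active_time_s
    return profile.active_power_w * profile.active_time_s + idle_power_w * idle_time_s


# Hypothetical numbers for illustration only.
fast = Profile(active_power_w=10.0, active_time_s=1.0)  # race to sleep
slow = Profile(active_power_w=6.0, active_time_s=2.0)   # run slower for longer
IDLE_W = 0.5
WINDOW_S = 10.0

print(window_energy_j(fast, IDLE_W, WINDOW_S))  # 10*1 + 0.5*9 = 14.5
print(window_energy_j(slow, IDLE_W, WINDOW_S))  # 6*2 + 0.5*8 = 16.0
```

With these numbers the faster profile wins; raise IDLE_W to 2W or more and the slower profile draws the same or less energy, which is why deep, low-power idle states matter as much as active efficiency.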

The choice of battery life test also matters, because users do not typically sit at idle with a blank screen when performing daily tasks. Intel has provided two metrics: idle with a 100 nit display under Windows 8.1, and local HD video playback.

For the former, Intel is quoting +60 minutes of battery life on its test platform at idle, equivalent to +11.0%. Most of this saving comes from the SoC's improved power saving techniques, but the rest of the platform contributes too: the PCH, for example, reduces its power use to around half.

During local video playback, Intel quotes a 90 minute difference, a substantial +20.8% battery life gain. A small amount of this comes from the SoC and platform, but by far the biggest saving is from the audio. Broadwell-Y and Broadwell-U both integrate Intel’s audio DSP (Digital Signal Processor) into the PCH. This removes a couple of Realtek components from the motherboard, brings the audio under Intel's own manufacturing process, and lets Intel configure the power gating it needs.
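Intel's absolute and relative figures together imply the baseline runtimes. A quick back-of-the-envelope sketch, assuming each percentage is relative to the Haswell-U result on the same test platform (the percentages are rounded, so the results are approximate):

```python
def implied_runtimes(gain_minutes: float, gain_pct: float) -> tuple[float, float]:
    """Back out old and new runtimes (in minutes) from an absolute and a relative gain."""
    baseline = gain_minutes / (gain_pct / 100.0)
    return baseline, baseline + gain_minutes


for label, minutes, pct in [("100 nit idle", 60, 11.0), ("local HD video", 90, 20.8)]:
    old, new = implied_runtimes(minutes, pct)
    print(f"{label}: ~{old / 60:.1f} h -> ~{new / 60:.1f} h")
```

This works out to roughly 9.1 to 10.1 hours at idle, and 7.2 to 8.7 hours for video playback.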

The DSP is more powerful, which presumably means good race-to-sleep behavior as well as handling HD audio within a lower power budget. Interestingly, I would point out that the DSP's power usage will be directly related to how much data is flowing through it. If an HD video has little to no audio, then power usage will be quite low anyway. I would like to put a SYL metal live-show DVD through its paces to see how this affects power consumption.
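To give a sense of how widely the amount of audio data can vary, here is the raw data rate of uncompressed PCM audio. This is only a proxy: it assumes the DSP is handling decoded PCM, and Intel has given no figures for how DSP power scales with data rate. The two formats are illustrative choices:

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Uncompressed PCM data rate in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


print(pcm_bitrate_kbps(48_000, 16, 2))  # stereo, 16-bit: 1536.0
print(pcm_bitrate_kbps(48_000, 24, 8))  # 7.1 channels, 24-bit: 9216.0
```

A multichannel 24-bit stream carries six times the data of 16-bit stereo at the same sample rate, so a live concert soundtrack should exercise the DSP far more than a quiet video.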

As we mentioned back during the Core M discussions, the audio DSP lends itself to being a configurable and programmable entity, much in the same way that AMD’s solution is actively promoted. Similar to the response we had back then, Intel is considering opening it up with a public SDK, although that side of the equation is not on the roadmap as of yet.

Broadwell-U Platform Controller Hub (PCH)

As a writer, my bread and butter at AnandTech these past four years has revolved around motherboards, and thus examining the connectivity provided by a chipset is always interesting. Because Intel bundles both the processor and the PCH on the same package, manufacturers can save space in their designs, and Intel can control power consumption more tightly for better performance or longer battery life overall. There is still room for manufacturers to differentiate in their IO offerings, which is a good thing for consumers.

The new PCH for Broadwell-U focuses on power consumption, especially when it comes to throttling sections and data pathways when not in use. The ‘Dynamic Power and Thermal Framework’ entry for the 5th Gen PCH should allow performance to respond as a function of either battery life or skin temperature, throttling where necessary to reduce temperature or extend battery life. Wake on Voice is also a target for Intel, allowing devices to sit in a super-low power state but still respond without direct touch.
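Intel has not detailed the policy logic, so the following is a purely hypothetical sketch of the kind of decision such a framework makes: pick a package power budget from skin temperature and battery level, and take whichever is more restrictive. The function name, the 15W and 7.5W limits, the 45 °C skin limit and the 20% battery threshold are all illustrative assumptions, not actual framework parameters:

```python
def pick_power_limit_w(skin_temp_c: float, battery_pct: float,
                       max_w: float = 15.0, min_w: float = 7.5,
                       skin_limit_c: float = 45.0, low_battery_pct: float = 20.0) -> float:
    """Hypothetical policy: scale the package power budget down as skin
    temperature approaches its limit or the battery runs low."""
    # Thermal factor: full budget until 5 C below the limit, then a linear
    # ramp down to the minimum budget at the limit itself.
    ramp_start_c = skin_limit_c - 5.0
    if skin_temp_c <= ramp_start_c:
        thermal = 1.0
    elif skin_temp_c >= skin_limit_c:
        thermal = 0.0
    else:
        thermal = (skin_limit_c - skin_temp_c) / 5.0
    # Battery factor: full budget above the threshold, minimum budget below it.
    battery = 1.0 if battery_pct > low_battery_pct else 0.0
    # The more restrictive constraint wins.
    factor = min(thermal, battery)
    return min_w + (max_w - min_w) * factor


print(pick_power_limit_w(35.0, 80.0))  # cool, charged: 15.0
print(pick_power_limit_w(42.5, 80.0))  # warm skin: 11.25
print(pick_power_limit_w(35.0, 10.0))  # low battery: 7.5
```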

When it comes to direct connectivity, the PCH offers four SATA 6 Gbps ports, four USB 3.0 ports (two of which are muxed, similar to a hub), eight USB 2.0 ports, TPM, a PCIe 2.0 x4 link and another 12 PCIe 2.0 lanes split into 6 ports, allowing six devices maximum. We asked Intel about PCIe storage support for RST and were told that, with additional hardware support (remapping logic), Broadwell-U can support one PCIe 2.0 x2 storage device. This means that any Broadwell-U device shipping with RST-capable PCIe storage would cost a bit more than the base model. Also worth noting is that Broadwell-U is still using PCIe 2.0. On the PCH side this is perhaps not a big deal, and when asked about PCIe 3.0, Intel reiterated its stance of not commenting on possible future plans, but said it is monitoring demand and industry trends.
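To put the x2 limit in perspective, here is a back-of-the-envelope sketch of usable link bandwidth per direction after line encoding. It assumes standard PCIe 2.0 (5 GT/s, 8b/10b), PCIe 3.0 (8 GT/s, 128b/130b) and SATA 6 Gbps (8b/10b) signalling, and ignores packet and protocol overhead, so real-world throughput will be lower:

```python
# name: (transfer rate in GT/s, payload bits, total bits per encoded block)
LINKS = {
    "PCIe 2.0": (5.0, 8, 10),     # 8b/10b encoding
    "PCIe 3.0": (8.0, 128, 130),  # 128b/130b encoding
    "SATA 6Gb/s": (6.0, 8, 10),   # 8b/10b encoding
}


def usable_mb_per_s(name: str, lanes: int = 1) -> float:
    """Usable bandwidth per direction in MB/s, after encoding overhead only."""
    rate_gt, payload_bits, total_bits = LINKS[name]
    bits_per_s = rate_gt * 1e9 * payload_bits / total_bits
    return bits_per_s / 8 / 1e6 * lanes


print(usable_mb_per_s("PCIe 2.0", lanes=2))  # 1000.0
print(usable_mb_per_s("PCIe 3.0", lanes=2))  # about 1969
print(usable_mb_per_s("SATA 6Gb/s"))         # 600.0
```

A PCIe 2.0 x2 device therefore tops out at around 1 GB/s, comfortably ahead of SATA 6 Gbps but roughly half of what the same two lanes would offer at PCIe 3.0 rates.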

On the DRAM front, we got confirmation that Broadwell-U will support a maximum of 16GB of DDR3L/DDR3L-RS or LPDDR3 memory. No comment was made on a move towards two modules per channel or DDR4. Regarding video connectivity, Broadwell-U arrived too early for HDMI 2.0 and thus has HDMI 1.4b.

WiDi 5.1

Also new on the table is WiDi 5.1, which brings support for 4K to the ecosystem.

One area where WiDi has been lacking is business features, so Intel is focusing on the security, privacy and controls needed in a professional environment. These will need a driver update for ultra-early adopters of Broadwell. Intel is also driving down the cost of WiDi adapters to a more palatable price point: my Belkin WiDi receiver, for example, retailed at around 120 GBP back in 2013 and requires an external power supply. Compare that to the product Intel promoted during its conference call, the Actiontec Mini2, which uses HDMI and costs only $40.

Intel Wireless AC-7265

While not strictly new to the market, Intel is promoting its low power WiFi solution to manufacturers for use alongside Broadwell. The AC 7265 is an upgrade over the AC 7260, which was used extensively with Haswell in everything from mobile devices all the way up to big desktop partners' systems, and it brings both performance and power benefits.

The form factor specifically for Broadwell-U is provided as a BGA M.2 part, with the package being 12mm x 16mm (given by the 1216 form factor designation). Low power wireless is an important part of lower performance systems, as without the right configuration a sustained network load can eat up a portion of the processor's performance. Intel’s partners with Broadwell-U are presumably not bound to use the AC 7265 and can choose other products based on other metrics, but Intel is targeting networking as a source of power drain and working to correct that issue.
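The M.2 size code simply concatenates width and length in millimetres, so decoding it is trivial. A small illustrative helper (the function name is hypothetical):

```python
def parse_m2_size(code: str) -> tuple[int, int]:
    """Split an M.2 size code like '1216' or '2280' into (width_mm, length_mm)."""
    if len(code) not in (4, 5) or not code.isdigit():
        raise ValueError(f"unexpected M.2 size code: {code!r}")
    width_mm = int(code[:2])   # first two digits: width
    length_mm = int(code[2:])  # remaining digits: length (three digits in '22110')
    return width_mm, length_mm


print(parse_m2_size("1216"))   # (12, 16)
print(parse_m2_size("22110"))  # (22, 110)
```

The same rule covers common SSD sizes such as 2280 (22 x 80 mm).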

Devices! Where and When?

Most of the AnandTech team is here in Vegas for CES 2015, (almost literally) running between meetings, press events and product showcases. Broadwell-U is high on our priority list, and we know several devices are due to be announced this week. Watch this space.


  • KaarlisK - Monday, January 05, 2015

    "It might come across as somewhat surprising that a 15W CPU like the i7-5650U has a 2.2 GHz base frequency but then a 3.2 GHz to 3.1 GHz operating window, and yet the i7-5557U has a 3.1 GHz base with 3.4 GHz operating for almost double the TDP. Apart from the slight increase in CPU and GPU frequency, it is hard to account for such a jump without point at the i7-5650U and saying that ultimately it is the more efficient bin of the CPUs."
    This is not surprising. This is used to increase the GPU performance. 28W CPUs have Iris 6100, 15W CPUs have HD 6000.
    In no way does TDP tell us anything about efficiency.
  • aratosm - Monday, January 05, 2015

    Iris 6100 and HD 6000 are almost identical. The only difference is a slightly faster clock speed. I think the problem is that HD 6000 will throttle more to stay in that power envelope.
  • Topinio - Monday, January 05, 2015

    Looks to me like the 28W ones (i.e. those with the 6100 graphics) will be the only ones capable of staying near the max turbo clocks for long.

    Would also be interesting to know the AVX base and turbo clocks for these chips, to compare the possible 64b DP GFLOPS from the CPU cores to those listed on page 2 from the GPUs. Top bin is likely somewhere < 102 (vs 211 from GPU), but how much lower?
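The "< 102" figure above is consistent with a simple peak-throughput estimate. A minimal sketch, assuming two cores, the 3.2 GHz top turbo quoted above for the i7-5650U, and 16 double-precision FLOPs per cycle per core from two 256-bit FMA units (as on Haswell); the roughly 200 MHz AVX offset in the second call is borrowed from the Xeon observation in the reply below and is an assumption, not a published Broadwell-U figure:

```python
# 2 FMA units x 4 doubles per 256-bit vector x 2 FLOPs per FMA (assumed, Haswell-style)
DP_FLOPS_PER_CYCLE = 2 * 4 * 2


def peak_dp_gflops(cores: int, clock_ghz: float) -> float:
    """Theoretical peak double-precision GFLOPS across all cores."""
    return cores * clock_ghz * DP_FLOPS_PER_CYCLE


print(peak_dp_gflops(2, 3.2))  # 102.4 at the quoted 3.2 GHz turbo
print(peak_dp_gflops(2, 3.0))  # 96.0 if AVX runs ~200 MHz lower (assumption)
```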
  • MrSpadge - Monday, January 05, 2015

    For the big Xeons the AVX base clock is typically 200 MHz below the regular base clock. They operate in a similar frequency & voltage range as the mobile chips (and are as power-limited as they are), so expect the same to apply here.
  • hansmuff - Thursday, January 08, 2015

    First time I read about AVX clocks, then found another mention in a previous Xeon CPU article. Is this a thing for Xeon only, or do the Haswell desktop chips throttle the clock with heavy AVX as well?
  • naloj - Monday, January 05, 2015

    A good example of this is in the throttling of the HD5000 in the 15W NUC i5-4250. You can get 40% better performance by changing the TDP settings from 25W short burst / 15W steady to 35W short burst / 31W steady.
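The short burst / steady settings above map onto the two package power limits Intel exposes, a short-term limit (PL2) and a sustained limit (PL1). The sketch below is a toy model, not Intel's actual firmware algorithm: it assumes PL1 is enforced as an exponentially weighted moving average of power over a time window tau, uses one-second steps, and picks a 28 s window and a constant 35W demand purely for illustration. It shows why a 15W chip can briefly run well above its TDP before settling back, which is also the behavior described further down the thread:

```python
def simulate_turbo(demand_w: float, pl1_w: float, pl2_w: float,
                   tau_s: float, seconds: int) -> list[float]:
    """Toy one-second-step model of PL1/PL2 turbo limits.

    The chip may draw up to PL2 while a moving average of its recent power
    stays below PL1; once the average reaches PL1, draw is capped at PL1.
    """
    avg_w = 0.0            # assume the chip starts from idle
    alpha = 1.0 / tau_s    # smoothing factor for 1 s steps
    draws = []
    for _ in range(seconds):
        limit_w = pl2_w if avg_w < pl1_w else pl1_w
        draw_w = min(demand_w, limit_w)
        avg_w += alpha * (draw_w - avg_w)  # exponentially weighted moving average
        draws.append(draw_w)
    return draws


# The NUC settings from the comment, with an assumed 28 s window and 35 W demand.
stock = simulate_turbo(demand_w=35.0, pl1_w=15.0, pl2_w=25.0, tau_s=28.0, seconds=60)
tuned = simulate_turbo(demand_w=35.0, pl1_w=31.0, pl2_w=35.0, tau_s=28.0, seconds=60)
print(stock.count(25.0), "s at PL2 before settling at", stock[-1], "W")
print("energy drawn over 60 s:", sum(stock), "J vs", sum(tuned), "J")
```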
  • MrSpadge - Monday, January 05, 2015

    "In no way does TDP tell us anything about efficiency."

    Agreed - TDP is far too crude for this. Intel desktop CPUs often operate far below TDP, whereas mobile chips are throttled by it. How much? Depends on the laptop, environment temperature etc.

    So even though the 15 W CPUs quoted above are allowed to top out at 3+ GHz, they won't run at anywhere close to this frequency under sustained heavy load. The 28 W chips should have no trouble sustaining the speed, given adequate cooling.
  • zepi - Monday, January 05, 2015

    Iris 6100 + eDRAM, or at least the bandwidth increase from DDR4, would have made a terrific difference to "retina" and high-dpi ultrabooks / laptops, but as it is this upgrade is watered down to irrelevance.

    Nothing to see here...
  • HungryTurkey - Wednesday, January 14, 2015

    In a retina/hdpi environment, few applications would come close to saturating the bus. The EUs (even with the 6100) would bottleneck long before LPDDR3/DDR3 would.
  • fokka - Thursday, January 08, 2015

    as i see it the given tdp only ensures operation at base clocks and without a substantial graphics load. operation at turbo clocks requires overstepping the tdp until power draw and/or temps are too high and the clock returns to the base frequency.

    if you look at it like this it's not surprising a 15w sku has a base clock of 2.2ghz and a 28w sku 3.1ghz. that said the 28w tdp still looks "too high" for the frequency you get out of it, but i guess that this extra power/heat budget is there for the sole reason that the 28w sku can operate at turbo clocks for longer without throttling down again, plus there is more headroom for a graphics load at the same time. this ensures that, even with similar hardware and turbo clocks, the 28w sku is allowed to produce more heat and in turn get more work done in the same time.

    that's the same reason core-m has very high turbo speeds, but can only turbo for a couple seconds until it's too hot and it "throttles" down to base clocks.
