Performance Claims:

+18% IPC vs. Skylake,
+47% Performance vs. Broadwell

With every new product generation, the company releasing the product has to put some level of expectations on performance. Depending on the company, you’ll either get a high level number summarizing performance, or you’ll get reams and reams of benchmark data. Intel did both, especially with a headline ‘+18%’ value, but in recent months the company has also been on a charge about what sort of benchmarking is worth doing. I want to take a quick diversion down that road, and give my thoughts on the matter.

First, I want to define some terms, just so we’re all on the same page.

  • A synthetic test is a benchmark engineered to probe a feature of the processor, often to find its peak capability in one or several specific task. A synthetic test does not often reflect a real-world scenario, and likely doesn’t use real world software. Synthetic benchmarks are designed to be stable and repeatable, and the analysis often describing how a processor performs in an ideal scenario.
  • A real-world test uses software that the user ends up using, along with a representative workload for that software. These tests are usually most applicable to end-users looking to purchase a product, as they can see actual use-case results. Real-world tests can have obvious pitfalls: it can be hard to test across multiple machines with only a single license, and testing one piece of software has no guarantee on performance on another.

A typical analysis of a processor does two things: what can it do (synthetic) and how does it perform (real-world). Users interested in the development of a platform, how it will expand and grow, or engineers peering over the fence, or even investors looking at the direction the company is going, will look at what products can do. People looking at what to use, what to work with, are more interested in the performance. Reviewers should get this concept, and companies like Intel should get this too – with Intel hiring a number of ex-reviewers of late, this is coming through.

A couple of months ago, Intel approached subsets of reviewers to discuss best benchmarking practices. On the table were real-world benchmarks, and which benchmarks represent the widest array of the market. Under fire was Cinebench, a semi-synthetic test (it uses a real-world engine on example data) that Intel believed didn’t represent the performance of a processor.

Intel provided data from one of its commissioned surveys on software that people use. Their data was based on a list of all consumers, from entry-level users up to prosumers, casual gamers, and enthusiasts, but also covering commercial use cases. At the top of the list were the obvious examples, such as OS and browsers: Explorer.exe, Edge, Chrome. In the top set were important widely distributed software packages, such as Photoshop (all versions), Steam, WinRAR, Office programs, and popular games like Overwatch. The point Intel was trying to make with this list is that a lot of reviewers run software that isn’t popular, and should aim to cover the widest market as possible.

The key point they were trying to make was that Cinebench, while based on Cinema4D and a rendering tool used by a number of the community, wasn’t the be-all and end-all of performance. Now this is where Intel’s explanation became bifurcated: despite this being a discussion on what benchmarks reviewers should consider using, Intel’s perspective was that citing a single number, as Intel’s competitors have done, doesn’t represent true performance in all use cases. There was a general feeling that users were taking single numbers like this and jumping to conclusions. So despite the fact that the media in the room all test multiple software angles, Intel was clear in that they didn’t want a single number to dominate the headlines, especially when it’s from software that is ranked (according to Intel’s survey) somewhere in the 1400s.

Needless to say, Intel got a bit of backlash from the press in the room at the time. Key criticisms were that those present, when they get hardware, test a variety of software, not just Cinebench, to try and give a more overall view. Other key elements included that the survey covered all users, from consumer, commercial, and workstation: a number of the press in the room have audiences that are enthusiasts, so they will cater their benchmark accordingly. There was also a discussion that a number of software packages listed in the top 100 are actually difficult to benchmark, due to licensing arrangements designed to stop repeated installs across multiple systems. Typically most software vendors aren’t interested in working with the benchmark community to help evaluate performance, in the event that it exposes deficiencies in their code base. There was also the way in that readers were adapting over time: most focused readers want their specific software tested, and it is impossible to test 50 different software packages, so a few that can be streamlined in a benchmark suite are used as a representative sample, and typically Cinebench is one of those in the rendering arena, alongside POV-Ray, Corona, etc.

Intel, at this stage in the discussion, still went on to show how the new hardware performs on a variety of tests. We’ve covered these images before on previous pages, but Intel stated a significant uplift in graphics compared to the current 14nm offerings, from 40% up to 108%:

As well as comparisons to the competition:

Aside from 3DMark, these are all ‘real-world’ tests.

Move forward a few weeks, and Intel’s Tech Day where Ice Lake is discussed, and Intel brings up IPC.

Intel’s big statement is that Sunny Cove, a 2019 product, offers 18% more instructions per clock against Skylake, a 2015 product. In order to come to that conclusion, as expected, Intel has to turn to synthetic testing: SPEC2006, SPEC2017, SYSMark 2014 SE, WebXPRT, and Cinebench R15. Wait, what was that last one? Cinebench?

So there are two topics to discuss here.

First is the 18% increase over four years – that’s the equivalent to a 4.2% compound annual growth rate. Some users will state that we should have had more, and that Intel’s issues with its 10nm manufacturing process means that this should have been a 2017 product (which would have been an 8.6% CAGR). Ultimately Intel built enough of an IPC increase lead over the last decade to afford something like this, and it shows that there isn’t an IPC wall just yet.

Second is the use of Cinebench, and the previous version at that. Given what was discussed above, various conclusions could be drawn. I’ll leave those up to you. Personally, I wouldn’t have included it.

Aside from IPC, Intel also spoke about actual single-threaded performance about Sunny Cove in its 15W mode.

At a brief glance, I would have expected this graph to be from real-world analysis. But given the blurb at the bottom it shows that these results are derived from SPEC2006, specifically 1-thread int_rate_base, which means that these are synthetic results, so we’ll analyze them with that in mind. This test also gets lots of benefit from turbo, with each test likely to fit inside the turbo window of an adequately cooled system.

The base line here is Broadwell, Intel’s 5th Generation processor, which if you remember was the first Intel processor to have an integrated FIVR on the mobile parts for power efficiency. In this case we see that Intel puts Skylake as +9% above Broadwell, then moving through Kaby Lake and Whiskey Lake we see the effect of increasing that peak turbo frequency and power budget: when we moved from dual core to quad core 15W mobile processors, that peak turbo power budget increased from 19W to 44W, allowing longer turbo. Overall we hit +42% for 8th Gen Whiskey Lake over Broadwell.

Ice Lake, by comparison, is +47% over Broadwell. When moving from Broadwell to Ice Lake, which Intel expects most of its users to do, that’s a sizable single threaded performance jump, I won’t dispute that, although I will wait until we see real world data to come to a better conclusion.

However, if we compare Ice Lake to Whiskey Lake, we see only a +3.5% increase in single threaded performance. For a generation-on-generation increase, that’s even lower than the four-year CAGR from Skylake. Some of you might be questioning why this is happening, and it all comes down to frequency.

Intel’s current 8th Gen Whiskey Lake, the i7-8565U, has a peak turbo frequency of 4.8 GHz. In 15W mode, we understand that the peak frequency of Ice Lake is under 4.0 GHz, essentially handing Whiskey Lake a ~20% frequency advantage.

If this sounds odd, turn over to the next page. Intel is going to start tripping over itself with its new product lines, and we’ll do the math.

Wi-Fi 6: Implementing AX over AC* Competing Against Itself: 3.9 GHz Ice Lake-U on 10nm vs 4.9 GHz Comet Lake-U on 14nm
POST A COMMENT

107 Comments

View All Comments

  • repoman27 - Tuesday, July 30, 2019 - link

    “Each CPU has 16 PCIe 3.0 lanes for external use, although there are actually 32 in the design but 16 of these are tied up with Thunderbolt support.”

    This isn’t quite right. The ICL-U/Y CPU dies do not expose any PCIe lanes externally. They connect to the ICL PCH-LP via OPI and the PCH-LP exposes up to 16 PCIe 3.0 lanes in up to 6 ports via HSIO lanes (which are shared with USB 3.1, SATA 6Gbps, and GbE functions). So basically no change over the 300 Series PCH.

    The integrated Thunderbolt 3 host controller may well have a 16-lane PCIe back end on-die, and I’m sure the CPU floorplan can accommodate 16 more lanes for PEG on the H and S dies, but that’s not what’s going on here.
    Reply
  • voicequal - Friday, August 2, 2019 - link

    The SoC architecture shows a direct path for the Thunderbolt3 PCIe lanes to the CPU, with only USB2 going across OPI.. Whatever PCIe lanes are available on the PCH are in addition those available via TB3.

    https://images.anandtech.com/doci/14514/Blueprint%...
    Reply
  • repoman27 - Tuesday, August 6, 2019 - link

    The Thunderbolt 3 controller is part of the CPU die. There are four PCIe 3.0 x4 root ports connected to the CPU fabric that feed the Thunderbolt protocol converters connected to the Thunderbolt crossbar switch (the Converged I/O Router block in that diagram). The CPU exposes up to three (for Y-Series) or four (for U-Series) Thunderbolt 3 ports. The only way you can leverage the PCIe lanes on the back-end of the integrated Thunderbolt 3 controller is via Thunderbolt.

    The PCH is a separate die on the same package as the CPU die. The two are connected via an OPI x8 link operating at 4 GT/s which is essentially the equivalent of a PCIe 3.0 x4 link. The PCH contains a sizable PCIe switch internally which connects to the back-ends of all of the included controllers and also provides up to 16 PCIe 3.0 lanes in up to 6 ports for connecting external devices. These 16 lanes are fed into a big mux which Intel refers to as a Flexible I/O Adapter (FIA) along with all the other high-speed signals supported by the PCH including USB 3.1, SATA 6Gbps, and GbE to create 16 HSIO lanes which are what is exposed by the SoC. So there are up to 16 PCIe lanes available from the Ice Lake SoC package, all of which are provided by the PCH die, but they come with the huge asterisk that they are exposed as HSIO lanes shared with all of the other high-speed signaling capabilities of the PCH and provisioned by a PCIe switch that effectively only has a PCIe 3.0 x4 connection to the CPU.

    This is not at all what Ian seemed to be describing, but it is the reality.

    And the USB 2.0 signals for the Thunderbolt 3 ports do indeed come from the PCH, but they do not cross the OPI, they're simply routed from the SoC package directly to the Thunderbolt port. The Thunderbolt 3 host controller integrated into the CPU includes a USB 3.1 xHCI/xDCI but does not include a USB 2.0 EHCI.
    Reply
  • poohbear - Tuesday, July 30, 2019 - link

    I was looking at buying Dell's XPS 15.6" (7590 model), but with Project Athena laptops a few months away, i think i'll wait. Intel parts for solid reliability and unified drivers, and "4 hours of battery life with <30min of charging", those 2 on their own make the wait worth it for me! Reply
  • repoman27 - Tuesday, July 30, 2019 - link

    “The connection to the chipset is through a DMI 3.0 x4 link...”

    Should be OPI x8 for U/Y Series.

    “...Ice Lake will support up to six ports of USB 3.1 (which is now USB 3.2 Gen 1 at 5 Gbps)...”

    They’re USB 3.1 Gen 2 ports, so it’s six USB 3.2 Gen 2 x 1 (10 Gbit/s) ports.
    Reply
  • Roel9876 - Tuesday, July 30, 2019 - link

    Well, for one, it is certainly not realistic to run single thread benchmarks on application that support multi threading. Realistically, most (all?) people will run the application multi threaded? Reply
  • HStewart - Tuesday, July 30, 2019 - link

    As developer for many years, multiple threads are useful for handling utility threads and such - but IO is typically area which still has to single thread. Unless it has significantly change in API, it is very difficult to multi-thread the actual screen. And similar for disk io as resource. Reply
  • Arnulf - Tuesday, July 30, 2019 - link

    "Our best guess is that these units assist Microsoft Cortana for low-powered wake-on voice inference algorithms ..."

    Our best guess is that these are designed for use by assorted three-letter agencies.
    Reply
  • PeachNCream - Tuesday, July 30, 2019 - link

    Open mics are totally okay. There is absolutely no privacy risk to you at all and you should never give it a second thought. Reply
  • ToTTenTranz - Tuesday, July 30, 2019 - link

    With 4x TB3 connections available, I wonder if the maker of an external GPU box could develop a multiplexer that combined two TB3 connections into a PCIe 3.0 8x.

    This would significantly decrease some problems that eGPU owners are having due to relatively low CPU-GPU bandwidth.
    Reply

Log in

Don't have an account? Sign up now