Conclusions

First of all, I must say that Intel offering us the chance to test a reference system in advance of a launch is a very good thing indeed. It is not something that Intel has done often in the past – in fact the last time I remember it happening was with Broadwell, when Intel sampled us one of their mobile CRB (customer reference board) systems for the 45W chip. Before that, Intel made a small attempt at allowing the press to benchmark Conroe in 2006 with canned pre-provided benchmarks, which did not go down too well. So moving into this pre-testing regime earns some immediate kudos for those who approved the testing.


Intel’s Broadwell / Crystalwell Mobile CRB

Given that the Ice Lake platform is geared more towards ultra-premium designs, the software development system we ended up testing was a reasonable indication of the direction these parts will go in. Of course, we only had the best part of nine hours to test, and being given the option to test both 15W and 25W modes meant we had to pick and choose which tests we thought were relevant. My most prominent feedback to Intel would be to give us two days to test next time, as it allows us to sit on our data after day one and decide what to do next. It was clear that some of the press in attendance only needed a day (or half a day), but for what we do at AT, two days would be better.


The Intel Ice Lake SDS

As for Ice Lake itself, our results lean towards Ice Lake outperforming Whiskey Lake, if only by a small margin.

To preface this, I want to recall a graph that Intel showed off at Computex:

This graph shows the single thread performance of Skylake and beyond, compared to 5th Gen Broadwell hardware. Right at the very end, we see Whiskey Lake performing +42% above Broadwell, and Ice Lake performing +47% above Broadwell. A quick calculation of 1.47/1.42 means that even Intel is only predicting a relative gain of ~3.5% for Ice Lake over current generation systems.
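As a minimal sketch of that calculation, using only the two figures quoted from Intel's graph (the function and variable names are illustrative):

```python
# Relative uplift implied by Intel's Computex graph. Both figures are
# measured against the same Broadwell baseline, so dividing one by the
# other cancels the baseline out.
whiskey_lake_vs_broadwell = 1.42  # +42% single-thread over Broadwell
ice_lake_vs_broadwell = 1.47      # +47% single-thread over Broadwell


def relative_gain(new: float, old: float) -> float:
    """Fractional gain of `new` over `old`, both relative to a shared baseline."""
    return new / old - 1.0


gain = relative_gain(ice_lake_vs_broadwell, whiskey_lake_vs_broadwell)
print(f"Ice Lake over Whiskey Lake: {gain:+.1%}")
```

Note that simply subtracting the two percentages (47 - 42 = 5 points) would overstate the uplift, because those points are expressed in units of Broadwell performance rather than Whiskey Lake performance.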

The reason the difference is so small comes down to the balance between IPC and frequency. Intel is touting a median IPC advantage on the new Sunny Cove cores of +18% against Skylake. That isn’t something we were able to test in the short time we had with the system, but +18% should provide a healthy bump – we actually see a number of key microarchitectural improvements bubble up in our SPEC testing.

But at the same time, the frequency has decreased – our Whiskey Lake Huawei Matebook system was +500 MHz on the base frequency (+38%), and +700 MHz on the turbo frequency (+18%). If it were not for the vast increase in memory speed, moving from LPDDR3-2133 to LPDDR4X-3733, one might have predicted that the Core i7-1065G7 Ice Lake processor and the Core i7-8565U Whiskey Lake processor would have performed equally.
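To make that trade-off concrete, here is a rough first-order sketch that treats single-thread throughput as IPC multiplied by clock speed. The model itself is a simplifying assumption (it ignores memory, thermals and turbo residency), the +18% IPC figure is Intel's claimed median rather than something we measured, and the clocks are the listed base/turbo specifications for the two chips, which match the deltas quoted above.

```python
# First-order model (an assumption, not a measurement): throughput ~ IPC x GHz.
# Whiskey Lake is a Skylake-derived core, so it is used as the IPC baseline.
chips = {
    "ice_lake_1065g7": {"ipc": 1.18, "base_ghz": 1.3, "turbo_ghz": 3.9},
    "whiskey_lake_8565u": {"ipc": 1.00, "base_ghz": 1.8, "turbo_ghz": 4.6},
}


def throughput(chip: dict, clock: str) -> float:
    """Relative throughput at the given clock ('base_ghz' or 'turbo_ghz')."""
    return chip["ipc"] * chip[clock]


icl = chips["ice_lake_1065g7"]
whl = chips["whiskey_lake_8565u"]

# Frequency advantage of Whiskey Lake, as quoted in the text.
base_freq_adv = whl["base_ghz"] / icl["base_ghz"] - 1.0
turbo_freq_adv = whl["turbo_ghz"] / icl["turbo_ghz"] - 1.0

# Ice Lake throughput relative to Whiskey Lake under the model.
turbo_ratio = throughput(icl, "turbo_ghz") / throughput(whl, "turbo_ghz")
base_ratio = throughput(icl, "base_ghz") / throughput(whl, "base_ghz")

print(f"Whiskey Lake clock advantage: base {base_freq_adv:+.0%}, turbo {turbo_freq_adv:+.0%}")
print(f"Ice Lake / Whiskey Lake at turbo: {turbo_ratio:.3f}")
print(f"Ice Lake / Whiskey Lake at base:  {base_ratio:.3f}")
```

Under this model the +18% IPC almost exactly cancels the +18% turbo deficit, which is where the "performed equally" prediction comes from; at base clocks the frequency gap is larger than the IPC gain.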

The question here then becomes whether you prefer IPC or frequency. For instruction limited tasks, that answer should be IPC. For critical path limited tasks, you nominally require frequency. All this gets muddled a bit with the increased memory frequency, but with higher IPC at lower frequency, you should arguably be more power efficient as well, leading to longer battery life. At iso-performance between Ice and Whiskey, considering no other factors like price, I would choose Ice.

Intel has made a number of improvements to a chunk of the instruction set that should work well for users; however, the new bigger cache design has added a bit of latency, which ends up being a bit of give and take between cache hits and misses.

Of course, the one area where Ice Lake excels is graphics. Moving from 24 EUs to 64 EUs, plus an increase in memory bandwidth to >50 GB/s, makes for some easy reading. It gets even better in 25W mode, for games that are CPU limited, but still don’t expect to be tackling AAA games at high resolutions. Despite Ice Lake being focused on the ultra-premium >1080p resolution market, you will still be gaming at 720p or 1080p at best here.
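As a rough sanity check on that bandwidth figure, theoretical peak bandwidth is the transfer rate multiplied by the bus width. The sketch below assumes a 128-bit total memory bus on both platforms, which is not stated in the text above; the transfer rates are the LPDDR3-2133 and LPDDR4X-3733 speeds mentioned earlier. These are theoretical peaks, not measured throughput.

```python
# Theoretical peak bandwidth = transfers per second x bytes per transfer.
BUS_WIDTH_BITS = 128  # assumption: total bus width on both platforms


def peak_bandwidth_gbs(mega_transfers_per_s: int, bus_bits: int = BUS_WIDTH_BITS) -> float:
    """Peak bandwidth in GB/s (decimal) for a given MT/s rate and bus width."""
    bytes_per_transfer = bus_bits // 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


lpddr3 = peak_bandwidth_gbs(2133)   # Whiskey Lake system
lpddr4x = peak_bandwidth_gbs(3733)  # Ice Lake system

print(f"LPDDR3-2133:  {lpddr3:.1f} GB/s")
print(f"LPDDR4X-3733: {lpddr4x:.1f} GB/s")
print(f"Uplift: {lpddr4x / lpddr3 - 1.0:+.0%}")
```

With those assumptions the Ice Lake configuration lands above the 50 GB/s mark, which matters for integrated graphics because the GPU shares this bandwidth with the CPU cores.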

The other alternative is to attach a Thunderbolt 3 external graphics card. If there’s one really good add-in to Ice Lake, aside from the graphics uplift, it’s the inclusion of up to four TB3 ports as part of the CPU silicon. If and when the TB3 controllers get a lot cheaper on the device side, this should really help accelerate a high-performance standard here.

We should also talk about AVX-512 – Intel is in a position right now where including it in the chip uses a good amount of die area, and the software ecosystem hasn’t yet adopted it. By advertising speed-ups like DL Boost, the company is hoping to entice developers to work with AVX-512 in mind, and improve a number of machine learning applications for consumer processors. The other side of that is the question of what sort of consumer applications need machine learning that isn’t already done in the cloud. It’s a bit of a catch-22, but in our own testing, AVX-512 does provide a significant speed-up. However, given Intel’s recent mantra of testing for user experience, it will be interesting to see how hammering the AVX-512 unit meshes with that mantra.

Exactly when these Ice Lake processors are coming to market, and how much, is still a question mark. Intel states that we’ll see Ice Lake in the market for the holiday season (i.e. Christmas); however, we have a number of trade shows around the corner, such as IFA in September, where we might start seeing some companies show off their designs. We also know that Intel plans to release Comet Lake mobile processors sometime this year, on the old 14nm process and old Skylake-based microarchitecture, but at higher frequencies, so it will be interesting to see how they compete.

Final Thoughts

I’m glad to have tested Ice Lake. It’s a shame that we only had a day to test, because I could have spent a week testing that system. Increasing IPC is the best problem to solve, even if it gives similar performance due to a lower frequency, but hopefully the knock-on effect here will be better battery life for users at the same performance. Once we get some systems in to test that battery life, and Project Athena’s requirement of 16+ hours comes to the front, I think we’ll see the best examples of Ice Lake shine through.

 


261 Comments


  • jospoortvliet - Friday, August 2, 2019 - link

    Sometimes people have insightful additions or questions. That is never you so I wouldn’t miss your ‘input’.
  • Phynaz - Friday, August 2, 2019 - link

    But yet you replied. Doh!
  • Korguz - Friday, August 2, 2019 - link

    and so did you !!! :-)
  • Phynaz - Saturday, August 3, 2019 - link

    Your comprehension skills aren’t that great, are they. Maybe that’s why you can’t afford a good cpu. Did you finish school?
  • Korguz - Saturday, August 3, 2019 - link

    yep.. but you obliviously havent as only children resort to insults, like you do. and again.. grow up
  • POlaris1983 - Thursday, August 1, 2019 - link

    Thermals and TDP are a test for UNdervolting and OCing on THICC laptops using ai windows OS GUI interface apps for easy one button flipping on and off for these CPUs and GPUs and RAM Timings customizations. Even for desktop towers soon using keyboard functions in special keys like on a laptop once they solve the luqid cooling issues on the THICC laptops.
  • thetrashcanisfull - Thursday, August 1, 2019 - link

    Ian,
    In this and the Ryzen 3000 review, I noticed that the 3DPM benchmarks with AVX enabled seem to benefit from AVX-512 much more than I would anticipate.

    If I'm understanding things correctly, the AVX-512 parts are capable of 2x512b FMAC / cycle in the case of Skylake-server or 1x512b FMAC + 1x512b ALU / cycle in the case of Sunny Cove, with both handling 2x512b load + 1x512b store / cycle. This would suggest to me that their vector FP performance/cycle ought to be around double that of Skylake-client or Zen 2, both of which do 2x256b FMAC / cycle and 2x256b loads + 1x256b store / cycle. However, in the 3DPM benchmark we see AVX-512 CPUs outpace the performance/cycle of AVX2 CPUs by a factor of 4 - possibly even more than 4, once we account for the frequency penalties associated with AVX-512!

    Am I misunderstanding some critical piece of the AVX-512 extension that explains this boost, or is there something wrong with the AVX2 codepath for this benchmark? Only using xmm instructions? Not using FMA instructions?
  • Mysticial - Friday, August 2, 2019 - link

    A while back, Ian sent me the non-vectorized and AVX512-vectorized binaries for 3DPM for me to analyze. (I never looked at the AVX2 version since this was before it was made.)

    Based on what I saw, I'm not at all surprised by the result. While I can't say that it fully explains such a large difference between AVX2 and AVX512, there are at least two things I noticed in the AVX512 binary that would contribute towards it.

    1. There are 64-bit integer multiplies. AVX512 has the vpmullq instruction. AVX2 does not. Emulating this instruction in AVX2 is *extremely* costly.
    2. The ratio of "heavy" to "light" AVX512 instructions is very low. Therefore, the 2nd FMA isn't needed to gain on AVX2.

    I've never analyzed the AVX2 binary itself to see how that 64-bit multiply is being handled. It could be vectorized with extreme overhead, not vectorized at all, or worked-around at an algorithmic level.
  • thetrashcanisfull - Friday, August 2, 2019 - link

    ohhhh... That makes more sense. I assumed that the 3DPM benchmark was doing primarily floating point math. I also didn't realize that AVX2 didn't support packed 64b muls... Thanks for the info!
  • Alexvrb - Friday, August 2, 2019 - link

    "The suggested PL2 for Kaby Lake-R was 44W, so this might indicate a small jump in strategy."

    Yeah, whereby TDP is virtually meaningless and every machine is a complete mystery box until you buy it and discover what actual thermals/power/performance are like - again regardless of the TDP. This is all without overclocking, mind you.
