The Ice Lake Benchmark Preview: Inside Intel's 10nm

Author: Dr. Ian Cutress

by Dr. Ian Cutress on August 1, 2019 9:00 AM EST


Section by Andrei Frumusanu

SPEC2017 and SPEC2006 Results (15W)

SPEC2017 and SPEC2006 are suites of standardized tests used to compare overall performance across different systems, architectures, microarchitectures, and setups. The code has to be compiled, and the results can then be submitted to an online database for comparison. The suites cover a range of integer and floating point workloads and can be heavily optimized for each CPU, so it is important to check how the benchmarks are being compiled and run.

We run the tests in a harness built through Windows Subsystem for Linux, developed by our own Andrei Frumusanu. WSL has some odd quirks, with one test not running due to WSL's fixed stack size, but it is good enough for like-for-like testing. SPEC2006 is deprecated in favor of 2017, but remains an interesting comparison point in our data. Because our scores aren’t official submissions, as per SPEC guidelines we have to declare them as internal estimates on our part.

For compilers, we use LLVM for both the C/C++ and Fortran tests, and for Fortran we’re using the Flang compiler. The rationale for using LLVM over GCC is better cross-platform comparisons with platforms that only have LLVM support, as well as future articles where we’ll investigate this aspect more. We’re not considering closed-source compilers such as MSVC or ICC.

clang version 8.0.0-svn350067-1~exp1+0~20181226174230.701~1.gbp6019f2 (trunk)
clang version 7.0.1 (ssh://git@github.com/flang-compiler/flang-driver.git 24bd54da5c41af04838bbe7b68f830840d47fc03)

-Ofast -fomit-frame-pointer
-march=x86-64
-mtune=core-avx2
-mfma -mavx -mavx2

Our compiler flags are straightforward, with basic -Ofast and relevant ISA switches to allow for AVX2 instructions. Despite ICL supporting AVX-512, we have not currently implemented it, as it requires a much greater level of finesse with instruction packing. The best AVX-512 software uses hand-crafted intrinsics to provide the instructions, as per our 3DPM AVX-512 test later in the review.
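As a rough illustration of how the flags above fit together on a single compile line, the sketch below assembles them into a clang invocation. This is not our actual harness: the `build_command` helper and the file names are hypothetical, and only the flag list itself comes from the settings listed above.

```python
import shlex

# Flags exactly as listed above.
CFLAGS = [
    "-Ofast", "-fomit-frame-pointer",
    "-march=x86-64",
    "-mtune=core-avx2",
    "-mfma", "-mavx", "-mavx2",
]

def build_command(source, output, compiler="clang"):
    """Return the argv list for compiling one source file with the flags above.

    `source` and `output` are placeholder names, not files from the SPEC suites.
    """
    return [compiler, *CFLAGS, "-o", output, source]

# Hypothetical file names, for illustration only.
cmd = build_command("benchmark.c", "benchmark")
print(shlex.join(cmd))
```

Note that the baseline `-march=x86-64` is widened by the explicit `-mfma -mavx -mavx2` switches, while `-mtune=core-avx2` only affects scheduling decisions, not which instructions may be emitted.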

For these comparisons, we will be picking out CPUs from across our dataset to provide context. Note that some of these may be higher power processors.

SPECint2006

SPECint2006 Speed Estimated Scores

Amongst SPECint2006, the one benchmark that really stands out beyond all the rest is 473.astar. Here the new Sunny Cove core is showcasing some exceptional IPC gains, nearly doubling the performance over the 8550U even though it’s clocked 100MHz lower. The benchmark is extremely sensitive to branch mispredictions, and the only conclusion we can reach to rationalise this increase is that the new branch predictors on Sunny Cove are doing an outstanding job and represent a massive improvement over Skylake.

456.hmmer and 464.h264ref are very execution bound and have the highest actual instructions per clock metrics in this suite. Here it’s very possible that Sunny Cove’s vastly increased out-of-order window is able to extract a lot more ILP out of the program and thus gain significant increases in IPC. It’s impressive that the 3.9GHz core here manages to match and outpace the 9900K’s 5GHz Skylake core.

Other benchmarks here, which are limited by other µarch characteristics, see varying increases depending on the workload. Sunny Cove’s doubled L2 cache should certainly help with workloads like 403.gcc and others. However, because we’re also memory latency limited on this platform, the increases aren’t quite as large as we’d expect from a desktop variant of ICL.

SPECfp2006(C/C++) Speed Estimated Scores

In SPECfp2006, Sunny Cove’s wider out-of-order window can again be seen in tests such as 453.povray, where the core is posting some impressive gains over the 8550U at similar clocks. 470.lbm is heavy on both the instruction window and data stores, and the core’s doubled store bandwidth certainly helps it here.

SPEC2006 Speed Estimated Total

Overall in SPEC2006, the new i7-1065G7 beats a similarly clocked i7-8550U by a hefty 29% in the int suite and 34% in the fp suite. Of course this performance gap will be a lot smaller against 9th gen mobile H-parts at higher clocks, but these are also higher TDP products.

The 1065G7 comes quite close to the fastest desktop parts; however, it will likely need a desktop memory subsystem in order to catch up in peak absolute performance.
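To put a single standout result such as 473.astar in context, it helps to recall that SPEC composite scores are geometric means of the per-benchmark ratios. The sketch below assumes our estimated totals follow that convention, and uses 12 sub-tests (the size of the full SPECint2006 suite) purely as an illustrative count; the flat 1.0 ratios are placeholders, not measured data.

```python
import math

def geomean(ratios):
    """Geometric mean of positive per-benchmark ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

def composite_gain(n_tests, outlier_speedup):
    """Composite uplift when one of n_tests speeds up and the rest stay flat."""
    ratios = [outlier_speedup] + [1.0] * (n_tests - 1)
    return geomean(ratios) - 1.0

# Assumption: 12 sub-tests, one of which doubles while the others are unchanged.
gain = composite_gain(12, 2.0)
print(f"Composite uplift from one 2x result: {gain:.1%}")
```

Under these assumptions a lone 2x result lifts the composite by only about 6%, which suggests that a 29% total cannot come from one outlier and has to reflect gains spread across most of the suite.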

SPEC2006 Speed Estimated Performance Per GHz

Performance per clock increases on the new Sunny Cove architecture are outstandingly good. IPC increases against the mobile Skylake are 33% and 38% in the integer and fp suites, though we also have to keep in mind that these figures go beyond just the Sunny Cove architecture and also include improvements through the new LPDDR4X memory controllers.

Against a 9900K, although apples and oranges, we’re seeing 13% and 14% IPC increases. These figures likely would be higher on an eventual desktop Sunny Cove part.
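The per-GHz figures can be approximated from the rounded numbers quoted in the text. The sketch below is a back-of-the-envelope check, not the calculation behind the charts: it assumes peak clocks of 3.9GHz for the 1065G7, 4.0GHz for the 8550U (100MHz higher, as noted earlier) and 5.0GHz for the 9900K, and the function names are ours for illustration.

```python
def per_ghz_uplift(score_ratio, new_clock_ghz, old_clock_ghz):
    """Per-clock gain implied by a score ratio and the two peak clocks."""
    return score_ratio * old_clock_ghz / new_clock_ghz - 1.0

def implied_score_ratio(per_ghz_gain, new_clock_ghz, old_clock_ghz):
    """Absolute score ratio implied by a per-clock gain and the two peak clocks."""
    return (1.0 + per_ghz_gain) * new_clock_ghz / old_clock_ghz

# 1065G7 (3.9GHz) vs 8550U (4.0GHz): 29% int and 34% fp score uplifts.
int_ipc = per_ghz_uplift(1.29, 3.9, 4.0)
fp_ipc = per_ghz_uplift(1.34, 3.9, 4.0)
print(f"int per-GHz uplift: {int_ipc:.1%}, fp per-GHz uplift: {fp_ipc:.1%}")

# 1065G7 (3.9GHz) vs 9900K (5.0GHz): 13% int per-GHz gain.
int_vs_9900k = implied_score_ratio(0.13, 3.9, 5.0)
print(f"implied int score relative to 9900K: {int_vs_9900k:.2f}x")
```

Because the inputs are rounded percentages, the results land around 32% and 37%, a point shy of the 33% and 38% in the chart. The same arithmetic implies the 1065G7 reaches roughly 88% of the 9900K's integer score despite a 1.1GHz clock deficit.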

SPEC2017

SPECint2017 Rate-1 Estimated Scores

SPECfp2017 Rate-1 Estimated Scores

SPEC2017 Rate-1 Estimated Total

The SPEC2017 results look similar to the 2006 ones. Against the 8550U, we’re seeing grand performance uplifts, just shy of the best desktop processors.

SPEC2017 Speed Estimated Performance Per GHz

Here the IPC increases also look extremely solid. In the SPECint2017 suite the Ice Lake part achieves a 14% increase over the 9900K; however, we also see a very impressive 21% increase in the fp suite.

Overall in the 2017 suite, we’re seeing a 19% increase in IPC over the 9900K, which roughly matches Intel’s advertised metric of 18% IPC increase.


261 Comments


zodiacfml - Friday, August 2, 2019 - link
Yes and No. Intel at 10nm should have made AMD nervous but products only at 4 cores, there is nothing or little benefit with 10nm. I reckon, AMD's 7nm mobile parts will mostly start at 6 cores.
Kevin G - Thursday, August 1, 2019 - link
Those 3D particle movement tests seem to be too good to be true. There should be a gigantic jump due to an optimized AVX-512 code path and ICL's enhanced caching structure but it is beyond that in the comparison. I'm not actually suspecting the ICL system given the disclosures in the article (odd that the note about AVX-512 intrinistics for the 3DPM test is mentioned around SPEC compiler settings) but rather the other test systems. Where the Whisky Lake or Kaby Lake systems power or thermal constrained at all? On those Hauwei laptops, were you able to set their fan to a fixed 100% to match that of the ICL system?
Ian Cutress - Thursday, August 1, 2019 - link
The AVX-512 tests were similar when we compared Cannon Lake to Kaby Lake at the same frequency. Against unoptimized SSE code, AVX-512 is killer.
Kevin G - Friday, August 2, 2019 - link
Getting a bit more than double the performance from AVX2 vs. AVX-512 should be possible using some of the new Ice Lake extensions and the obvious doubling of SIMD width. But going from a score of 1802 in Whiskey Lake 25W to 9242 for Ice Lake 25W, over a factor of 5! Ice Lake would have to remove some other bottleneck that the 3DPM test hits really hard (division?).

Looking back at your previous reviews ( https://www.anandtech.com/show/13400/intel-9th-gen... ), you can see a similar speed up from AVX-512 between the i9 9900K and the i9 7820X but that is explained from Skylake-X having both double the SIMD width and double the number of SIMD execution units. The client version of Ice Lake shouldn't have the same AVX-512 throughput as Sky Lake server.
CSMR - Thursday, August 1, 2019 - link
> the one area where Ice Lake excels in is graphics. Moving from 24 EUs to 64 EUs, plus an increase in memory bandwidth to >50 GB/s, makes for some easy reading.

I don't understand the comparison here and in this article. If you say a high-end intel processor update excels in graphics, you should compare to previous high-end processors (e.g. i7-8559U with Iris Plus 655). These have 48EUs not 24 and have 128MB EDRAM at 100 GB/s unlike the Ice Lake.

I am very interested in how the best Ice Lake processors compare to the best previous-gen processors, not how they compare to mediocre previous-gen processors.

Could the article be updated with some appropriate comparisons?
eastcoast_pete - Thursday, August 1, 2019 - link
Agree on adding the best previous generation graphics to the comparison. Also, while the over 1 TFlops for the 64EU Gen 11 sounds (and is) impressive (within the Intel iGPU world) , didn't the 48EU with Crystal Well get close to that already?
Rudde - Thursday, August 1, 2019 - link
The first apu with 1TFlops performance statement is full of asteriks. First, you have to exclude AMD; second, you have to exclude Intel Iris gpus with eDRAM.
Phynaz - Thursday, August 1, 2019 - link
AMD mobile chips are hot garbage
eva02langley - Friday, August 2, 2019 - link
Your opinion is not a fact... and it is garbage for real.
Phynaz - Friday, August 2, 2019 - link
Hahaha. It’s a fact. It’s why they have 0% market share.
