Comparing 15 W TGL to 15 W ICL to 15 W Renoir

Despite the hullabaloo with the 28 W numbers on Tiger Lake, we suspect that most OEMs will still be positioning the hardware inside chassis built for the 15 W ultraportable market. This is where most of Intel’s OEMs have had success over the last decade, as the lower cooling requirements allow for a more user-friendly design. At 28 W, there is more of a cross-over into laptops that have discrete graphics options, and the main company that has succeeded in offering 28 W laptops without discrete graphics has been Apple - most Intel partners, if they want discrete graphics, end up looking at the 45 W processors with more cores.

So in that respect, our main battle should occur between the products built for 15 W. To that end, we have been able to put together the three platforms that will command this holiday season’s offerings: Ice Lake, Tiger Lake, and AMD’s Renoir.

  • For our Ice Lake system, we have the Microsoft Surface Laptop 3. This has the top-of-the-line quad-core Core i7-1065G7, along with 16 GB of LPDDR4X-3733. Base 1.3 GHz, Turbo 3.9 GHz. Because this is an OEM design, Microsoft have determined the PL1 and PL2 values, and so they might be different from a ‘base’ design. However, this is data from a real system.
  • The Tiger Lake system is our Reference Design from Intel, running the quad-core Core i7-1185G7 at 15 W TDP mode. It has 16 GB of LPDDR4X-4266. Base 1.8 GHz, Turbo 4.8 GHz.
  • Our AMD Renoir system is one of the most premium examples of AMD’s Ryzen Mobile in a 15 W form factor, the Lenovo Yoga Slim 7 with the eight-core Ryzen 7 4800U processor. Even when set to the highest performance mode, the system still operates with a 15 W sustained power draw. It comes equipped with 16 GB of LPDDR4X-4266. Base 1.8 GHz, Turbo 4.2 GHz.

Compute Workload

For our 15 W comparisons, we can look again at the same benchmarks as the previous page. First up is y-Cruncher, an AVX2/AVX-512 compute workload that tasks the CPU and the memory by calculating 2.5 billion digits of Pi, and requires ~11 GB of DRAM.

As we saw on the previous page, our Tiger Lake system in green at 15 W turbos up to ~53 watts before very quickly coming down to 15 W for the rest of the test.

The Microsoft Surface Laptop 3, by virtue of being an OEM system, has different behavior - it turbos for longer, settles into a short turbo limit of 25 W, and then after about two minutes comes down to 20 W. The system then appears to opportunistically up the power draw until the end of the test, likely due to detecting extra thermal headroom.

The AMD Renoir processor does not turbo as high, peaking at only 38.9 W. Over the course of the next 100 seconds or so, we see a small ramp down to just under 30 W, followed by a more consistent decline over 30 seconds to 15 W, where it stays for the rest of the test. The Renoir here has eight cores rather than four, but is running AVX2 rather than AVX-512 code.

The results are as follows:

  • Ice Lake: 233 seconds for 6072 joules, averaging 26.1 W
  • Tiger Lake: 241 seconds for 4082 joules, averaging 17.0 W
  • Renoir: 234 seconds for 5386 joules, averaging 23.0 W

All three systems perform the test in roughly the same amount of time, but the Tiger Lake system is very much ahead for efficiency. Tiger Lake effectively shaves off a third of the energy used by the previous generation Ice Lake system. We weren’t expecting this much of a jump from Ice Lake to Tiger Lake, but it would appear that Intel has done some work on the AVX-512 unit, and is putting that new high-performance transistor to use.
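
As a quick sanity check on the figures above, average power is simply total energy divided by elapsed time. The sketch below is our own illustration rather than part of the test tooling: the names `results` and `average_power` are assumptions, and the inputs are the rounded times and joules from the list, so the averages can differ from the listed ones in the last digit.

```python
# Time (seconds) and energy (joules) for the y-Cruncher run, as listed above.
results = {
    "Ice Lake": (233, 6072),
    "Tiger Lake": (241, 4082),
    "Renoir": (234, 5386),
}


def average_power(seconds, joules):
    """Average power in watts: total energy divided by elapsed time."""
    return joules / seconds


for name, (seconds, joules) in results.items():
    print(f"{name}: {average_power(seconds, joules):.1f} W average")

# Energy used by Tiger Lake as a fraction of Ice Lake's for the same task.
tgl_vs_icl = results["Tiger Lake"][1] / results["Ice Lake"][1]
print(f"Tiger Lake uses {tgl_vs_icl:.0%} of Ice Lake's energy")
```

Because all three runs take nearly the same time, the energy ratio and the average power ratio tell almost the same story in this test.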

Professional ISV Workload

Moving on to the Agisoft test - as mentioned on the previous page, this is a 2D image to 3D modeling workflow where the algorithm comes in four stages, some of which prefer full multi-thread throughput, while others are more frequency and memory sensitive.

First, the Renoir finishes in a little under two-thirds of the time of either Intel system, mostly due to the fact that it has double the number of cores - there is no AVX-512 codepath in this test, and so all the processors rely on a mix of SSE, AVX, and perhaps some AVX2. That aside, the turbo behavior of Renoir is very interesting - we get almost 10 minutes of higher-than-base performance before the algorithm settles into a routine, hovering around 22 W. Because this test doesn’t attack the vector units as hard as the previous test, it may be the case that the Renoir system can manage the power distribution a bit better between the eight cores, allowing for the higher turbo.

Between the Ice Lake and the Tiger Lake, from the graph it would appear to be a double win for Tiger Lake, finishing in a shorter time but also consuming less power. The results are:

  • 15 W Renoir: 2842 seconds for 62660 joules
  • 15 W Ice Lake: 4733 seconds for 82344 joules
  • 15 W Tiger Lake: 4311 seconds for 64854 joules

In this case, it’s a win for Renoir - a much shorter time, and lower total energy to boot, derived from the eight cores built on TSMC 7nm. Tiger Lake still represents a good jump over Ice Lake, finishing about 10% faster while using only 79% of the energy, which works out to an average power draw about 13.5% lower.
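
The comparison above can be reproduced from the listed times and joules alone. The sketch below is a rough check under our own assumptions: the `runs` dictionary and the `compare` helper are illustrative names, not part of the benchmark. It keeps energy per task and average power as separate quantities.

```python
# Time (seconds) and energy (joules) for the Agisoft run, as listed above.
runs = {
    "Renoir": (2842, 62660),
    "Ice Lake": (4733, 82344),
    "Tiger Lake": (4311, 64854),
}


def compare(new, old):
    """Return (speedup, energy ratio, average power ratio) of `new` vs `old`."""
    new_s, new_j = runs[new]
    old_s, old_j = runs[old]
    speedup = old_s / new_s                           # > 1 means `new` finishes sooner
    energy_ratio = new_j / old_j                      # < 1 means less energy for the task
    power_ratio = (new_j / new_s) / (old_j / old_s)   # ratio of average watts
    return speedup, energy_ratio, power_ratio


for new in ("Tiger Lake", "Renoir"):
    speedup, energy_ratio, power_ratio = compare(new, "Ice Lake")
    print(
        f"{new} vs Ice Lake: {speedup:.2f}x speed, "
        f"{energy_ratio:.0%} of the energy, {power_ratio:.0%} of the average power"
    )
```

Keeping the two ratios apart matters here: a chip can draw more power on average and still use less total energy if it finishes the job sooner, which is what the Renoir numbers show.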


253 Comments


  • JfromImaginstuff - Friday, September 18, 2020 - link

    Intel is planning to release a 8 core 16 thread SKU, confirmed by one of their management can't remember his name but when that'll reach the market is a question mark
  • RedOnlyFan - Friday, September 18, 2020 - link

    With the space and power constraints you can choose to pack more cores or other features that are also very important.
    So Intel chose to add 4c + the best igpu + AI + neural engine + thunderbolt + Wi-Fi 6 + pcie4.
    Amd chose 8cores and a decent igpu.
    So we have to choose between raw power and more useful package.

    For a normal everyday use an all round performance is more important. There are millions who don't even know what cinebench is for.
  • Spunjji - Friday, September 18, 2020 - link

    Weird that you're calling it "the best iGPU" when the benchmarks show that it's pretty much equivalent to Vega 8 in most tests at 15W with LPDDR4X, which is how it's going to be in most notebooks.

    Funny also that you're proclaiming PCIe 4 to be a "useful feature" when the only thing out there that will use it in current notebooks is the MX450, which obviates that iGPU.

    I could go on but really, Thunderbolt is the only one I'd say is a reasonable argument. A bunch of AMD laptops already have Wi-Fi 6
  • JayNor - Saturday, September 19, 2020 - link

    but Intel has lpddr5 support built in. Raising memory data rate by around 25% is something that should show up broadly as more performance in the benchmarks.

    Intel's Tiger Lake Blueprint Session benchmarks were run with lpddr4x, btw, so expect better performance when lpddr5 laptops become available.

    https://edc.intel.com/content/www/us/en/products/p...
  • Spunjji - Saturday, September 19, 2020 - link

    I understand and agree. My point was, what does "support" matter if it's not actually useable in the product? This will be an advantage when devices with it release. Right now, it's irrelevant.
  • abufrejoval - Friday, September 18, 2020 - link

    I'd say going for the biggest volume market (first).

    Adding cores costs silicon real-estate and profit per wafer and the bulk of the laptop market evidently doesn't want to pay double for eight cores at 15 Watts.

    Being a fab, Intel doesn't seem to mind doing lots of chip variants, for AMD it seems to make more sense to go for volume and fewer variants. The AMD 8 core APU covers a lot of desktop area, but also laptops, where Intel just does distinct 8 core chip.

    Intel might even do distinct iGPU variants at higher CPU cores (not just via binning), because the cost per SoC layout is calculated differently.... at least as long as they can keep up the volumes.

    I'm pretty sure they had a lot of smart guys run the numbers, doesn't mean things might not turn out differently.
  • Drumsticks - Thursday, September 17, 2020 - link

    Regarding:

    Compromises that had been made when increasing the cache by this great of an amount is in the associativity, which now increases from 8-way to a 20-way, which likely increases conflict misses for the structure.

    On the L3 side, there’s also been a change in the microarchitecture as the cache slice size per core now increases from 2MB to 3MB, totalling to 12MB for a 4-core Tiger Lake design. Here Intel was actually able to reduce the associativity from 16-way to 12-way, likely improving cache line conflict misses and improving access parallelism.

    ---

    Doesn't increasing cache associativity *decrease* conflict misses? Your maximum number of conflict misses would be a direct mapped cache, where everything can go into only one place, and your minimum number of conflict misses would be a fully associative cache, where everything can go everywhere.

    Also, isn't it weird that latency increases with the reduced associativity of the new L3? I guess the fact that it's 50% larger could have a larger impact, but I'd have thought reducing associativity should improve latency and vice versa, even if only slightly.
  • Drumsticks - Thursday, September 17, 2020 - link

    Later on, there is:

    The L2 seemingly has gone up from 13 cycles to 14 cycles in Willow Cove, which isn’t all that bad considering it is now 2.5x larger, even though its associativity has gone down.

    ---

    But in the table, associativity is listed as going from 8 way to 20 way. Is something mixed up in the table?
  • AMDSuperFan - Thursday, September 17, 2020 - link

    How does this compare with Big Navi? It seems that Big Navi will be much faster than this right?
  • Spunjji - Friday, September 18, 2020 - link

    🤡
