Comparing IPC on Skylake: Memory Latency and CPU Benchmarks

The following explanation of IPC has been previously used in our Broadwell review.

Being able to do more with less, in the processor space, allows a task to be completed both quicker and often for less power. While multi-core devices have allowed many programs to run at once, and purely parallel compute such as graphics to run faster, we are all still limited by the fact that a lot of software still relies on executing one line of code after another. This is referred to as the serial part of the software, and is the basis for many early programming classes – getting the software to compile and complete is more important than speed. But the truth is that having a few fast cores helps more than several thousand super slow cores. This is where IPC comes into play.
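The claim that a few fast cores beat thousands of slow ones is usually quantified with Amdahl's law, which the article does not name but which models exactly this serial bottleneck. The sketch below assumes a hypothetical workload that is 50% serial and compares 4 cores at full speed against 1000 cores each running at a tenth of that speed; the core counts and speeds are illustrative, not measurements.

```python
def amdahl_runtime(serial_fraction: float, cores: int, per_core_speed: float) -> float:
    """Runtime relative to one core at speed 1.0 (Amdahl's law, scaled by core speed)."""
    parallel_fraction = 1.0 - serial_fraction
    # The serial part runs on a single core; the parallel part splits across all cores.
    return (serial_fraction + parallel_fraction / cores) / per_core_speed


# Hypothetical workload: half of the work cannot be parallelized.
few_fast = amdahl_runtime(0.5, cores=4, per_core_speed=1.0)
many_slow = amdahl_runtime(0.5, cores=1000, per_core_speed=0.1)

print(f"4 fast cores:    {few_fast:.3f}")   # 0.625
print(f"1000 slow cores: {many_slow:.3f}")  # 5.005
```

However many slow cores are added, the serial half alone costs 0.5 / 0.1 = 5 time units, which is why per-core speed (and therefore IPC) still matters.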

The principles behind extracting IPC are, as one might imagine, quite complex. Ideally every instruction a CPU receives would be read, executed and finished in one cycle, but that is never the case. The processor has to fetch the instruction, decode it, gather the data (how long this takes depends on where the data is), perform work on the data, then decide what to do with the result. Moving data around has never been more complicated, and the ability of a processor to hide latency, pre-prepare data by predicting future events, or hold on to previous results for potential future use is all part of the plan. All the while, there is an external focus on keeping power consumption low and letting the processor's frequency scale to suit the target device.
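IPC is typically measured with hardware performance counters as instructions retired divided by core clock cycles, and execution time then follows from instructions / (IPC × frequency). A minimal sketch, with hypothetical counter values chosen only to make the arithmetic easy to follow:

```python
def ipc(instructions_retired: int, core_cycles: int) -> float:
    """Instructions per cycle: average number of instructions retired per core clock."""
    return instructions_retired / core_cycles


def runtime_seconds(instructions: int, ipc_value: float, freq_hz: float) -> float:
    """Execution time = instructions / (IPC * frequency)."""
    return instructions / (ipc_value * freq_hz)


# Hypothetical counter readings for one single-threaded run.
measured_ipc = ipc(instructions_retired=6_000_000_000, core_cycles=3_000_000_000)
print(measured_ipc)  # 2.0

# At a fixed 3.0 GHz, a 10% IPC gain cuts runtime by about 9% (1 - 1/1.1).
base = runtime_seconds(6_000_000_000, measured_ipc, 3.0e9)
faster = runtime_seconds(6_000_000_000, measured_ipc * 1.10, 3.0e9)
print(f"{base:.3f} s -> {faster:.3f} s")
```

Holding frequency constant, as this review does at 3.0 GHz, means any change in runtime comes from the IPC term alone.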

For the most part, Intel has successfully increased IPC with every generation of processor: in most cases by 5-10% with a node change and 5-25% with an architecture change, with the most recent large jumps coming from the Core and Sandy Bridge architectures, which ushered in new waves of super-fast computational power. As Broadwell to Skylake is an architecture change that should bring large updates, we should expect some good gains.

Intel Desktop Processor Cache Comparison
| Processor | L1-D | L1-I | L2 | L3 | L4 |
|---|---|---|---|---|---|
| Sandy Bridge i7 | 4 x 32 KB | 4 x 32 KB | 4 x 256 KB | 8 MB | None |
| Ivy Bridge i7 | 4 x 32 KB | 4 x 32 KB | 4 x 256 KB | 8 MB | None |
| Haswell i7 | 4 x 32 KB | 4 x 32 KB | 4 x 256 KB | 8 MB | None |
| Broadwell i7 (Desktop / Iris Pro 6200) | 4 x 32 KB | 4 x 32 KB | 4 x 256 KB | 6 MB | 128 MB eDRAM |
| Skylake i7 | 4 x 32 KB | 4 x 32 KB | 4 x 256 KB | 8 MB | None |

For this test we took Intel’s most recent high-end i7 processors from the last five generations, set them to 3.0 GHz and disabled HyperThreading. As each platform supports DDR3, we set the memory on each to DDR3-1866 with a CAS latency of 9. For Skylake we also ran at DDR4-2133 C15, its default speed. From a pure cache standpoint, here is how each of the processors performed:

If we ignore Broadwell and its eDRAM (the purple line, which stands out especially from 16MB to 128MB), both Skylake lines stay at low latencies up to 4MB. Between 4MB and 8MB, Skylake's latency still seems to be substantially lower than that of the previous generations.

Normally in this test, despite the CPUs having 8MB of L3 cache (6MB on Broadwell), the 8MB test has to spill out to main memory because some of the cache is already occupied. A more efficient caching and prefetch algorithm will therefore give a lower latency ‘at 8MB’. Both the DDR4 and DDR3 results show this for Skylake, which suggests that its L3 caching algorithms or hardware resources have been upgraded.
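Cache latency sweeps of this kind are commonly built on pointer chasing: a buffer of a given size is linked into one random cycle, and each load depends on the previous one, so the prefetcher and out-of-order engine cannot overlap the accesses. The sketch below illustrates only the access pattern; it is not the tool used for these results, the 8-byte slot size is an assumption, and in Python the interpreter overhead per step swamps the cache effects, so the printed numbers do not reflect hardware latency.

```python
import random
import time


def build_chain(num_slots: int, seed: int = 0) -> list[int]:
    """Sattolo's algorithm: a random permutation that forms a single cycle over all slots."""
    rng = random.Random(seed)
    chain = list(range(num_slots))
    for i in range(num_slots - 1, 0, -1):
        j = rng.randrange(i)  # choosing j < i guarantees one cycle, no short loops
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain: list[int], steps: int) -> tuple[int, float]:
    """Follow the chain; every index depends on the previous load."""
    idx = 0
    start = time.perf_counter()
    for _ in range(steps):
        idx = chain[idx]
    elapsed = time.perf_counter() - start
    return idx, elapsed / steps


# Sweep working-set sizes around the L2/L3 boundaries (hypothetical 8-byte slots).
for size_kib in (32, 256, 4096, 16384):
    slots = size_kib * 1024 // 8
    _, per_step = chase(build_chain(slots), 200_000)
    print(f"{size_kib:>6} KiB: {per_step * 1e9:.1f} ns/step")
```

A native implementation (for example in C over a raw array) is needed to see the step-ups at each cache level that the chart shows.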

It is also worth comparing the DDR3 and DDR4 results on Skylake above 16MB. The DDR4 latency in this region seems a lot higher than the others, approaching 100 clocks as we move up to 1GB. But it is worth remembering that the DDR4 test runs against a memory clock of 2133 MHz, whereas the others run at 1866 MHz. As a result, the two lines are more or less equal in terms of absolute time, as we would expect.

Here are the generational CPU results at 3.0 GHz:

Dolphin Benchmark

Many emulators are bound by single-threaded CPU performance, and general reports suggested that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that raytraces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy for the speed of Dolphin CPU emulation, which is an intensive single-core task using most aspects of a CPU. Results are given in minutes (lower is better), where the Wii itself scores 17.53 minutes.

Dolphin Emulation Benchmark

Cinebench R15

Cinebench is a benchmark based around Cinema 4D, and is fairly well known among enthusiasts for stressing the CPU for a provided workload. Results are given as a score, where higher is better.

Cinebench R15 - Single Threaded

Cinebench R15 - Multi-Threaded

Point Calculations – 3D Movement Algorithm Test

3DPM is a self-penned benchmark that takes basic 3D movement algorithms used in Brownian motion simulations and tests them for speed. High floating point performance, MHz and IPC win in the single-threaded version, whereas the multithreaded version has to handle the threads and loves more cores. For a brief explanation of the platform-agnostic coding behind this benchmark, see my forum post on the subject.
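The article does not list the exact algorithms inside 3DPM, so the following is a hypothetical stand-in showing the kind of floating point work involved: a particle repeatedly moves a fixed distance in a uniformly random 3D direction, with the direction generated by normalizing a Gaussian vector (one standard method among several).

```python
import math
import random


def random_unit_vector(rng: random.Random) -> tuple[float, float, float]:
    """Uniform random direction on the unit sphere via a normalized Gaussian vector."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0:  # guard against the (vanishingly rare) zero vector
            return x / norm, y / norm, z / norm


def brownian_walk(steps: int, step_length: float = 1.0, seed: int = 0) -> tuple[float, float, float]:
    """Move one particle `steps` times by a fixed-length step in a random direction."""
    rng = random.Random(seed)
    px = py = pz = 0.0
    for _ in range(steps):
        dx, dy, dz = random_unit_vector(rng)
        px += step_length * dx
        py += step_length * dy
        pz += step_length * dz
    return px, py, pz


final = brownian_walk(10_000)
# For a random walk the expected distance grows roughly with sqrt(steps).
print(math.dist((0.0, 0.0, 0.0), final))
```

A multithreaded version would give each worker its own particles and its own RNG; in CPython that would need processes rather than threads to scale across cores.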

3D Particle Movement: Single Threaded

3D Particle Movement: MultiThreaded

Compression – WinRAR 5.0.1

Our WinRAR test from 2013 was updated to the latest version of WinRAR at the start of 2014. We compress a set of 2867 files across 320 folders totaling 1.52 GB in size; 95% of these files are small, typical website files, and the rest (90% of the size) are short 30-second 720p videos.

WinRAR 5.01, 2867 files, 1.52 GB

Image Manipulation – FastStone Image Viewer 4.9

As with WinRAR, the FastStone test was updated to the latest version for 2014. FastStone is the program I use to perform quick or bulk actions on images, such as resizing, color adjustment and cropping. In our test we take a series of 170 images in various sizes and formats and convert them all into 640x480 .gif files, maintaining the aspect ratio. FastStone does not use multithreading for this test, so single-threaded performance is usually the deciding factor.
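"Convert to 640x480 while maintaining the aspect ratio" means scaling each image by the largest factor that still fits both limits. A minimal sketch of that arithmetic; FastStone's own rounding and its handling of images smaller than 640x480 are not documented here, so this function (which will also upscale small images) is an assumption.

```python
def fit_within(width: int, height: int, max_w: int = 640, max_h: int = 480) -> tuple[int, int]:
    """Largest size with the same aspect ratio that fits inside max_w x max_h."""
    # The tighter of the two constraints decides the scale factor.
    scale = min(max_w / width, max_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000))  # (640, 480): 4:3 fills the box exactly
print(fit_within(1920, 1080))  # (640, 360): 16:9 is width-limited
print(fit_within(1080, 1920))  # (270, 480): portrait is height-limited
```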

FastStone Image Viewer 4.9

Video Conversion – Handbrake v0.9.9

Handbrake is a media conversion tool that was initially designed to convert DVD ISOs and Video CDs into more common video formats. The principle today is still the same, primarily outputting H.264 video with AAC/MP3 audio in an MKV container. In our test we use the same videos as in the Xilisoft test, and results are given in frames per second.

HandBrake v0.9.9 LQ Film

HandBrake v0.9.9 2x4K

Rendering – PovRay 3.7

The Persistence of Vision RayTracer, or PovRay, is a freeware package for, as the name suggests, ray tracing. It is a pure renderer rather than modeling software, but the latest beta version contains a handy benchmark for stressing all processing threads on a platform. We have been using this test in motherboard reviews to test memory stability at various CPU speeds to good effect: if it passes the test, the IMC in the CPU is stable for a given CPU speed. As a CPU test, it runs for approximately 2-3 minutes on high-end platforms.

POV-Ray 3.7 Beta RC4

Synthetic – 7-Zip 9.2

As an open source compression tool, 7-Zip is popular for making sets of files easier to handle and transfer. The software includes its own benchmark, and we report its result.

7-zip Benchmark

Overall: CPU IPC

Removing WinRAR as a benchmark, because it gets a boost from Broadwell's eDRAM, we get an interesting look at how each generation has evolved over time. Taking Sandy Bridge (i7-2600K) as the base, we have the following:

From a pure upgrade perspective, the IPC gain for Skylake does not look great. In fact, in two benchmarks the IPC seems to have decreased: 3DPM in single-threaded mode and 7-Zip. What makes 3DPM interesting is that the multithreaded version still shows some improvement, if only a minor one. This difference between MT and ST is more nuanced than it first appears. Throughout testing, it was noticeable that multithreaded results seemed, on average, to benefit more from the IPC gain than single-threaded ones. If this is true, it would suggest that Intel has somehow improved its thread scheduler or added new internal hardware to deal with thread management. We’ll probably find out more at IDF later in the year.
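Normalizing against a base CPU means dividing each result by the baseline's result, with time-based scores such as Dolphin's minutes inverted so that a value above 1.0 always means faster. A minimal sketch; the benchmark names and numbers are hypothetical placeholders, not the review's data, and since the article does not say whether its averages are arithmetic or geometric, both are shown.

```python
from statistics import fmean, geometric_mean


def relative_performance(scores: dict[str, float], baseline: dict[str, float],
                         lower_is_better: set[str]) -> dict[str, float]:
    """Per-benchmark speedup versus the baseline CPU (1.0 = same as baseline)."""
    rel = {}
    for name, value in scores.items():
        base = baseline[name]
        # Time-based results improve when they go down, so invert the ratio for them.
        rel[name] = base / value if name in lower_is_better else value / base
    return rel


# Hypothetical numbers for illustration only (not the review's results).
sandy = {"dolphin_min": 20.0, "cinebench_st": 100.0}
newer = {"dolphin_min": 16.0, "cinebench_st": 120.0}

rel = relative_performance(newer, sandy, lower_is_better={"dolphin_min"})
print(rel)  # {'dolphin_min': 1.25, 'cinebench_st': 1.2}
print(f"arithmetic mean: {fmean(rel.values()):.4f}")
print(f"geometric mean:  {geometric_mean(rel.values()):.4f}")
```

Excluding a benchmark, as was done with WinRAR here, simply means leaving it out of the `scores` dictionary before averaging.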

If we adjust this graph to show generation-to-generation improvement and include the DDR4 results:

This graph shows that:

Sandy Bridge to Ivy Bridge: Average ~5.8% Up
Ivy Bridge to Haswell: Average ~11.2% Up
Haswell to Broadwell: Average ~3.3% Up
Broadwell to Skylake (DDR3): Average ~2.4% Up
Broadwell to Skylake (DDR4): Average ~2.7% Up

Oh dear. Typically with an architecture update we see a bigger IPC increase than 2.7%. Looking at matters purely from this perspective, Skylake does not come out well. These results suggest that Skylake is merely another minor upgrade in performance terms, and that a clock-for-clock comparison with Broadwell is not favorable. However, consider that very few people actually invested in Broadwell. If anything, Haswell was the last major mainstream processor generation that people actually purchased, which means that:

Haswell to Skylake (DDR3): Average ~5.7% Up.

This is a more bearable increase, and it takes advantage of the fact that Broadwell on the desktop was a niche-focused launch. The other results in the review will be interesting to see.
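The Haswell to Skylake figure can be cross-checked by chaining the per-generation averages listed above: gains compound multiplicatively, not additively. Chaining the rounded averages gives about 5.8%, close to the ~5.7% quoted; a small gap is expected because the inputs are rounded and an average of per-benchmark ratios does not chain exactly.

```python
from math import prod


def compound(step_gains_pct: list[float]) -> float:
    """Chain per-generation gains (in %) into one overall gain (in %)."""
    return (prod(1 + g / 100 for g in step_gains_pct) - 1) * 100


# Per-generation averages from the list above (DDR3 results for Skylake).
print(f"Haswell -> Skylake:      {compound([3.3, 2.4]):.1f}%")             # 5.8%
print(f"Sandy Bridge -> Skylake: {compound([5.8, 11.2, 3.3, 2.4]):.1f}%")  # 24.4%
```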


477 Comments


  • asmian - Sunday, August 9, 2015

    >Somehow I doubt it...

    Sorry, no edit - I meant of course the reverse, that 2 extra cores is DEFINITELY better than marginal extra IPC at a slightly higher overclock, despite the slightly higher TDP. Quad-core Skylake at this price AND requiring DDR4 makes Haswell-E look very good indeed.
  • Ethos Evoss - Sunday, August 9, 2015

    Why are they STILL calling it i7, i5, i3...? They were supposed to change it this time,
    like i4, i6, i8?? Or rather drop that Apple-style "i"?
  • orion23 - Sunday, August 9, 2015

    Yay for my 2600K @ 4.8GHz from day 1
    Never had as much fun overclocking and building a system
    By now, I've changed cases (3x), PSUs (2x) and VGAs (2x). But not my loyal 2600K :)
    What a workhorse it is
  • Kutark - Sunday, August 9, 2015

    I think a lot of people in the comments aren't really understanding the article. They state that the best reason to upgrade isn't really the processor speed, it's all the other things the new platform affords you.

    In particular I'm very happy that I will FINALLY be able to get an SSD with speeds faster than what SATA3 allows, as many Z170 motherboards have M.2 slots that run not on SATA but on PCIe lanes. It also allows for some real bandwidth in SLI situations. I have a single 980 Ti, and this platform would allow me to SLI another down the road without impeding things.

    Granted, it's not a good value proposition when you look at the end result, but it's a very nice future-proofing platform in my opinion.

    It's kind of like saying that if you have a modded older Mustang that's as quick as a new Mustang, you shouldn't upgrade because the new one is just as fast or maybe slightly faster. There are more factors to the equation. Things that add to the quality of life, etc.

    In Skylake's case, it's mostly stuff related to the chipset. IMO that's fine by me.
  • sonny73n - Wednesday, August 12, 2015

    I think you're an idiot. Understanding the article is one thing, realizing how close it is to the truth is another. Sure, it's a nice upgrade for anything prior to Sandy Bridge, but the author has summed up this article with a bold statement, "Sandy Bridge, Your Time Is Up", which I believe is a false statement. Should I have a 5th grader break down the calculation of upgrade options so you can understand? First, keep in mind that there's no such thing as future-proofing in PC hardware like you said, and K series chips are made for overclocking.
    Let's break down the upgrade options for my rig - Z68 MB $190, 2500K $230, HSF $60, 8GB RAM $60, PSU $180, GTX 780 $480, SSD $180, Case $80. Total $1460.
    Option 1: Upgrade MB, CPU, HSF and RAM. Old components ($540 new) can eBay for ~$200. New components $560 - $200 = $360 (out of pocket). Performance gain: System Overall 30%, Gaming 3 to 5%.
    Option 2: Upgrade the whole system. Total $1480. Performance gain same as option 1. Now having 2 systems (wonder what I'm gonna do with both).
    Option 3: Upgrade for gaming. Another GTX 780. Performance gain: BF3 1920x1200 4xAA about 95%. Total $480.

    Sure, Skylake has some new features. Do I need them? NO. Does my SSD saturate the SATA3 bus (throughput around 550MB/s)? NO. Is there any program (besides Handbrake, which I use rarely) that can utilize the full power of my 2500K mildly OCed at 4.2GHz? NO. Can 980 Ti SLI saturate PCIe 2.0? NO. Am I such an idiot that I have a good running Mustang but still want to buy another just because it's a bit better? NO. Does being financially irresponsible add to the quality of life? NO.

    Anyone with a brain that has a SB system or newer would never pick the first 2 options.
  • mapesdhs - Wednesday, August 12, 2015

    If there was a thumbs-up button for your post, I'd be clicking it. :D
  • sonny73n - Thursday, August 13, 2015

    Thanks :-) I wish I could explain it better. He's probably wondering why there's a $20 difference lol. Hint: CPU
  • Kutark - Thursday, August 20, 2015

    This is pretty hilarious and just further proves my point. You had a fundamental misunderstanding of what the article is stating. You also have a fundamental misunderstanding of the concept of an opinion. This article is not an Encyclopaedia Britannica article trying to create statements of fact. It is the OPINION of this website that Sandy Bridge's time is up. I tend to agree with them. And I'm on Sandy Bridge.

    Like most internet heroes, you're focusing on one aspect, price/performance. People buy products for a multitude of other reasons. Just simply getting a pure speed upgrade isn't always the primary factor behind the decision.

    For example, I bought a VW GTI a few years back instead of a Mazdaspeed 3, even though the Mazdaspeed 3 was a better performing car and was cheaper. I bought the VW because of the intangibles. I liked the way it drove, I liked the interior design better, the exterior design better, etc etc etc.

    I will be buying a Skylake platform because I like the options the chipset affords me moving forward, in particular the increased number of PCI Express lanes, which will come in useful when M.2 PCIe SSDs come down in price.

    And please don't talk to me about financial responsibility. We're not talking about buying a $500k house when you can really only afford a $300k house. Most of us make enough money that while $1k isn't insignificant, it's not going to break the bank either. Get your head out of your ass.

    But, please, continue on making an ass of yourself, if nothing else it is entertaining...
  • FullCircle - Monday, August 10, 2015

    I'm still happy with my SandyBridge i7-2600k.

    I see no reason to upgrade for 25% performance boost...

    I just upgraded my graphics card from GTX 580 to GTX 970, giving me a performance boost of 250%... now that's a worthwhile upgrade...

    25% on the other hand? That's not worth it. CPU advancement has slowed so much there's not much reason to upgrade at the moment unless you have an incredibly old processor. Even the Core i7 processor I have in my old PC is still pretty good.
  • mapesdhs - Wednesday, August 12, 2015

    I upgraded from 3GB 580 SLI to one 980 and even that was a good speed increase. Rocking along with a 5GHz 2700K. For a 2nd system to drive a 48" TV, I considered HW, but in the end for the games I'll be playing (which can use more than 4 cores) a used SB-E build made a lot more sense. ASUS R4E only 113 UKP, 3930K only 185 UKP, etc. Only key item I bought new was another 980.

    It's pretty obvious with hindsight that Intel jumped ahead much more than they needed to with SB/SB-E, so we won't see another leap of that kind again unless AMD or some other corp can seriously compete once more, just as AMD managed to do with Athlon64 back in the day. All this stuff about bad paste under the heat spreaders of IB, HW and still with SL proves Intel is dragging its feet, ditto how lame the 5960X compares to XEONs wrt its low clock, TDP, etc. They could make better, but they don't need to. Likewise the meddling with the PCIe lanes for HW-E; it's crazy that a 4820K could actually be better than a 5820K in some cases. Should have been the other way round: 5820K should have been the 6-core low end with 40 lanes, next chip up at current 5930K pricing should have been an 8-core with 40 lanes, 5960X should have been an 8 or 10 core with 64 or 80 lanes (whatever), with a good 3.5 base clock, priced *above* the current 5960X a tad - that would have been a chip the real enthusiasts with money to burn would have bought, not the clock-crippled 5960X we have atm.
