Original Link: http://www.anandtech.com/show/4083/the-sandy-bridge-review-intel-core-i7-2600k-i5-2500k-core-i3-2100-tested
The Sandy Bridge Review: Intel Core i7-2600K, i5-2500K and Core i3-2100 Tested
by Anand Lal Shimpi on January 3, 2011 12:01 AM EST
Intel never quite reached 4GHz with the Pentium 4. Despite being on a dedicated quest for gigahertz the company stopped short and the best we ever got was 3.8GHz. Within a year the clock (no pun intended) was reset and we were all running Core 2 Duos at under 3GHz. With each subsequent generation Intel inched those clock speeds higher, but preferred to gain performance through efficiency rather than frequency.
Today, Intel quietly finishes what it started nearly a decade ago. When running a single threaded application, the Core i7-2600K will power gate three of its four cores and turbo the fourth core as high as 3.8GHz. Even with two cores active, the 32nm chip can run them both up to 3.7GHz. The only thing keeping us from 4GHz is a lack of competition to be honest. Relying on single-click motherboard auto-overclocking alone, the 2600K is easily at 4.4GHz. For those of you who want more, 4.6-4.8GHz is within reason. All on air, without any exotic cooling.
Unlike Lynnfield, Sandy Bridge isn’t just about turbo (although Sandy Bridge’s turbo modes are quite awesome). Architecturally it’s the biggest change we’ve seen since Conroe, although looking at a high-level block diagram you wouldn’t be able to tell. Architecture width hasn’t changed, but internally SNB features a complete redesign of the Out of Order execution engine, a more efficient front end (courtesy of the decoded µop cache) and a very high bandwidth ring bus. The L3 cache is also lower latency and the memory controller is much faster. I’ve gone through the architectural improvements in detail here. The end result is better performance all around. For the same money as you would’ve spent last year, you can expect anywhere from 10-50% more performance in existing applications and games from Sandy Bridge.
I mentioned Lynnfield because the performance mainstream quad-core segment hasn’t seen an update from Intel since its introduction in 2009. Sandy Bridge is here to fix that. The architecture will be available, at least initially, in both dual and quad-core flavors for mobile and desktop (our full look at mobile Sandy Bridge is here). By the end of the year we’ll have a six core version as well for the high-end desktop market, not to mention countless Xeon branded SKUs for servers.
The quad-core desktop Sandy Bridge die clocks in at 995 million transistors. We’ll have to wait for Ivy Bridge to break a billion in the mainstream. Encompassed within that transistor count are 114 million transistors dedicated to what Intel now calls Processor Graphics. Internally it’s referred to as the Gen 6.0 Processor Graphics Controller or GT for short. This is a DX10 graphics core that shares little in common with its predecessor. Like the SNB CPU architecture, the GT core architecture has been revamped and optimized to increase IPC. As we mentioned in our Sandy Bridge Preview article, Intel’s new integrated graphics is enough to make $40-$50 discrete GPUs redundant. For the first time since the i740, Intel is taking 3D graphics performance seriously.
|CPU Specification Comparison|
|CPU||Manufacturing Process||Cores||Transistor Count||Die Size|
|AMD Thuban 6C||45nm||6||904M||346mm²|
|AMD Deneb 4C||45nm||4||758M||258mm²|
|Intel Gulftown 6C||32nm||6||1.17B||240mm²|
|Intel Nehalem/Bloomfield 4C||45nm||4||731M||263mm²|
|Intel Sandy Bridge 4C||32nm||4||995M||216mm²|
|Intel Lynnfield 4C||45nm||4||774M||296mm²|
|Intel Clarkdale 2C||32nm||2||384M||81mm²|
|Intel Sandy Bridge 2C (GT1)||32nm||2||504M||131mm²|
|Intel Sandy Bridge 2C (GT2)||32nm||2||624M||149mm²|
It’s not all about hardware either. Game testing and driver validation actually have real money behind them at Intel. We’ll see how this progresses over time, but graphics at Intel today is very different than it has ever been.
Despite the heavy spending on an on-die GPU, the focus of Sandy Bridge is still improving CPU performance: each core requires 55 million transistors. A complete quad-core Sandy Bridge die measures 216mm², only 2mm² larger than the old Core 2 Quad 9000 series (but much, much faster).
As a concession to advancements in GPU computing, rather than build SNB’s GPU into a general purpose compute monster, Intel outfitted the chip with a small amount of fixed function hardware to enable hardware video transcoding. The marketing folks at Intel call this Quick Sync technology. And for the first time I’ll say that the marketing name doesn’t do the technology justice: Quick Sync puts all previous attempts at GPU accelerated video transcoding to shame. It’s that fast.
There’s also the overclocking controversy. Sandy Bridge is all about integration and thus the clock generator has been moved off of the motherboard and on to the chipset, where its frequency is almost completely locked. BCLK overclocking is dead. Thankfully for some of the chips we care about, Intel will offer fully unlocked versions for the enthusiast community. And these are likely the ones you’ll want to buy. Here’s a preview of what’s to come:
The lower end chips are fully locked. We had difficulty recommending most of the Clarkdale lineup and I wouldn’t be surprised if we have that same problem going forward at the very low-end of the SNB family. AMD will be free to compete for marketshare down there just as it is today.
With the CPU comes a new platform as well. In order to maintain its healthy profit margins Intel breaks backwards compatibility (and thus avoids validation) with existing LGA-1156 motherboards: Sandy Bridge requires a new LGA-1155 motherboard equipped with a 6-series chipset. You can re-use your old heatsinks, however.
The new chipset brings 6Gbps SATA support (2 ports) but still no native USB 3.0. That’ll be a 2012 thing it seems.
I don’t include a lot of super markety slides in these launch reviews, but this one is worthy of a mention:
Sandy Bridge is launching with no less than 29 different SKUs today. That’s 15 for mobile and 14 for desktop. Jarred posted his full review of the mobile Core i7-2820QM, so check that out if you want the mobile perspective on all of this.
By comparison, this time last year Intel announced 11 mobile Arrandale CPUs and 7 desktop parts. A year prior we got Lynnfield with 3 SKUs and Clarksfield with 3 as well. That Sandy Bridge is Intel’s biggest launch ever goes without saying. It’s also the most confusing. While Core i7 exclusively refers to processors with 4 or more cores (on the desktop at least), Core i5 can mean either 2 or 4 cores. Core i3 is reserved exclusively for dual-core parts.
Intel promised that the marketing would all make sense one day. Here we are, two and a half years later, and the Core i-branding is no clearer. At the risk of upsetting all of Intel Global Marketing, perhaps we should return to just labeling these things with their clock speeds and core counts? After all, it’s what Apple does—and that’s a company that still refuses to put more than one button on its mice. Maybe it’s worth a try.
Check Jarred’s article out for the mobile lineup, but on desktop here’s how it breaks down:
|Processor||Core Clock||Cores / Threads||L3 Cache||Max Turbo||Max Overclock Multiplier||TDP||Price|
|Intel Core i7-2600K||3.4GHz||4 / 8||8MB||3.8GHz||57x||95W||$317|
|Intel Core i7-2600||3.4GHz||4 / 8||8MB||3.8GHz||42x||95W||$294|
|Intel Core i5-2500K||3.3GHz||4 / 4||6MB||3.7GHz||57x||95W||$216|
|Intel Core i5-2500||3.3GHz||4 / 4||6MB||3.7GHz||41x||95W||$205|
|Intel Core i5-2400||3.1GHz||4 / 4||6MB||3.4GHz||38x||95W||$184|
|Intel Core i5-2300||2.8GHz||4 / 4||6MB||3.1GHz||34x||95W||$177|
|Intel Core i3-2120||3.3GHz||2 / 4||3MB||N/A||N/A||65W||$138|
|Intel Core i3-2100||3.1GHz||2 / 4||3MB||N/A||N/A||65W||$117|
Intel is referring to these chips as the 2nd generation Core processor family, despite three generations of processors carrying the Core architecture name before it (Conroe, Nehalem, and Westmere). The second generation is encapsulated in the model numbers for these chips. While all previous generation Core processors have three digit model numbers, Sandy Bridge CPUs have four digit models. The first digit in all cases is a 2, indicating that these are “2nd generation” chips and the remaining three are business as usual. I’d expect that Ivy Bridge will swap out the 2 for a 3 next year.
What you will see more of this time around are letter suffixes following the four digit model number. K means what it did last time: a fully multiplier unlocked part (similar to AMD’s Black Edition). The K-series SKUs are even more important this time around as some Sandy Bridge CPUs will ship fully locked, as in they cannot be overclocked at all (more on this later).
|Processor||Core Clock||Cores / Threads||L3 Cache||Max Turbo||TDP|
|Intel Core i7-2600S||2.8GHz||4 / 8||8MB||3.8GHz||65W|
|Intel Core i5-2500S||2.7GHz||4 / 4||6MB||3.7GHz||65W|
|Intel Core i5-2500T||2.3GHz||4 / 4||6MB||3.3GHz||45W|
|Intel Core i5-2400S||2.5GHz||4 / 4||6MB||3.3GHz||65W|
|Intel Core i5-2390T||2.7GHz||2 / 4||3MB||3.5GHz||35W|
|Intel Core i3-2100T||2.5GHz||2 / 4||3MB||N/A||35W|
There are also T and S series parts for desktop. These are mostly aimed at OEMs building small form factor or power optimized boxes. The S stands for “performance optimized lifestyle” and the T for “power optimized lifestyle”. In actual terms the Ses are lower clocked 65W parts while the Ts are lower clocked 35W or 45W parts. Intel hasn’t disclosed pricing on either of these lines but expect them to carry noticeable premiums over the standard chips. There’s nothing new about this approach; both AMD and Intel have done it for a little while now, it’s just more prevalent in Sandy Bridge than before.
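The naming scheme described above (a leading 2 for the generation, then an optional K, S or T suffix) is regular enough to decode mechanically. Here is a minimal sketch, assuming only the desktop model names that appear in this article; the function name `decode_model` and the regular expression are my own illustration, not anything Intel publishes.

```python
import re

# Suffix meanings as described in the article; an empty suffix is a standard part.
SUFFIXES = {
    "K": "fully multiplier unlocked",
    "S": "performance optimized lifestyle (lower clocked, 65W)",
    "T": "power optimized lifestyle (lower clocked, 35W or 45W)",
    "": "standard part",
}

# Hypothetical pattern covering names such as "i7-2600K" or "i3-2100".
MODEL_RE = re.compile(r"^i(?P<tier>[357])-(?P<gen>\d)(?P<sku>\d{3})(?P<suffix>[KST]?)$")


def decode_model(name: str) -> dict:
    """Split a desktop Sandy Bridge model name into its parts."""
    match = MODEL_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    return {
        "tier": "Core i" + match["tier"],
        "generation": int(match["gen"]),  # first digit: 2 = "2nd generation"
        "sku": match["sku"],              # remaining three digits
        "suffix": SUFFIXES[match["suffix"]],
    }


for model in ("i7-2600K", "i5-2500T", "i3-2100"):
    print(model, decode_model(model))
```

If Ivy Bridge does swap the 2 for a 3 as I expect, the same pattern would keep working since the generation digit is not hard-coded.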
In the old days Intel would segment chips based on clock speed and cache size. Then Intel added core count and Hyper Threading to the list. Then hardware accelerated virtualization. With Sandy Bridge the matrix grows even bigger thanks to the on-die GPU.
|Processor||Intel HD Graphics||Graphics Max Turbo||Quick Sync||VT-x||VT-d||TXT||AES-NI|
|Intel Core i7-2600K||3000||1350MHz||Y||Y||N||N||Y|
|Intel Core i7-2600||2000||1350MHz||Y||Y||Y||Y||Y|
|Intel Core i5-2500K||3000||1100MHz||Y||Y||N||N||Y|
|Intel Core i5-2500||2000||1100MHz||Y||Y||Y||Y||Y|
|Intel Core i5-2400||2000||1100MHz||Y||Y||Y||Y||Y|
|Intel Core i5-2300||2000||1100MHz||Y||Y||N||N||Y|
|Intel Core i3-2120||2000||1100MHz||Y||N||N||N||N|
|Intel Core i3-2100||2000||1100MHz||Y||N||N||N||N|
While almost all SNB parts support VT-x (the poor i3s are left out), only three support VT-d. Intel also uses AES-NI as a reason to force users away from the i3 and towards the i5. I’ll get into the difference in GPUs in a moment.
Overclocking, the K-Series and What You’ll Want to Buy
If you haven’t noticed, the computing world is becoming more integrated. We review highly integrated SoCs in our smartphone coverage, and even on the desktop we’re seeing movement towards beefy SoCs. AMD pioneered the integrated memory controller on desktop PCs, Intel followed suit and with Lynnfield brought a PCIe controller on-die as well. Sandy Bridge takes the next logical step and brings a GPU on-die, a move matched by AMD with Brazos and Llano this year.
In the spirit of integration, Intel made one more change this round: the 6-series chipsets integrate the clock generator. What once was a component on the motherboard, the PLL is now on the 6-series chipset die. The integrated PLL feeds a source clock to everything from the SATA and PCIe controllers to the SNB CPU itself. With many components driven off of this one clock, Intel has locked it down pretty tight.
With Nehalem and Westmere, to overclock you simply adjusted the BCLK from 133MHz to whatever speed you wanted and sometimes toyed with multipliers to arrive at a happy end result. With Sandy Bridge, the BCLK generated on the 6-series PCH is at 100MHz by default and honestly won’t go much higher than that.
While I’ve heard reports of getting as high as 115MHz, I’d view 103-105MHz as the upper limit for what you’re going to get out of BCLK overclocking. In other words: next to nothing. A 105MHz BCLK overclock on a Core i7-2600 will take you from a stock speed of 3.4GHz to a whopping 3.57GHz. The form of overclocking we’ve been using for the past decade is effectively dead on Sandy Bridge.
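The arithmetic behind that 3.57GHz figure is just multiplier times BCLK. A quick sketch, assuming the Core i7-2600's stock 34x multiplier (3.4GHz at the default 100MHz BCLK); the function name is hypothetical.

```python
def core_clock_mhz(multiplier: int, bclk_mhz: float) -> float:
    """Core clock is the product of the CPU multiplier and the base clock."""
    return multiplier * bclk_mhz


STOCK_MULTIPLIER = 34  # 3.4GHz Core i7-2600 at a 100MHz BCLK

for bclk in (100, 103, 105):
    ghz = core_clock_mhz(STOCK_MULTIPLIER, bclk) / 1000
    print(f"BCLK {bclk}MHz -> {ghz:.2f}GHz")
```

A 5% bump in BCLK only ever buys a 5% bump in core clock, which is why the multiplier is where the action is on this platform.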
Years ago, before the Pentium II, we didn’t rely on BCLK (or back then it was just FSB or bus overclocking) to overclock. Back then, if we wanted a faster CPU we’d just increase the clock multiplier. Intel has dabbled in offering multiplier unlocked parts for overclockers; we saw this last year with the Core i7 875K, for example. With Sandy Bridge, those unlocked parts are going to be a lot more important to overclockers.
It works like this. If you have a part that does not support Turbo (e.g. Core i3-2100 series), then your CPU is completely clock locked. You can’t overclock it at all, have fun at your stock frequency. This is good news for AMD as it makes AMD even more attractive at those price points.
If you have a part that does support turbo (e.g. Core i5-2400), then you have what’s called a “limited unlocked” core—in other words you can overclock a little bit. These parts are limited to an overclock of 4 processor bins above and beyond the highest turbo frequency. Confused yet? This chart may help:
In this case we’re looking at a Core i5-2500, which runs at 3.3GHz by default. When a single core is active, the chip can turbo up to 3.7GHz. If you want, you can change that turbo state to go as high as 4.1GHz (if your CPU and cooling can keep up).
Overclocking these limited unlocked chips relies entirely on turbo however. In the case above, the fastest your chip will run is 4.1GHz but with only one core active. If you have four cores active the fastest your chip can run is 3.8GHz. While Intel didn’t sample any limited unlocked parts, from what I’ve heard you shouldn’t have any problems hitting these multiplier limits.
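The limited unlock rule boils down to adding four bins to each turbo state. Here is a small sketch, assuming one bin equals one multiplier step at the default 100MHz BCLK. The 3.4GHz four-core turbo value is inferred from the 3.8GHz ceiling quoted above rather than stated directly, and the function name is made up for illustration.

```python
BIN_MHZ = 100            # one multiplier step at the default 100MHz BCLK
LIMITED_UNLOCK_BINS = 4  # "limited unlocked" parts get 4 bins above turbo


def limited_max_mhz(turbo_mhz: int) -> int:
    """Highest clock a limited unlocked part can be set to for a turbo state."""
    return turbo_mhz + LIMITED_UNLOCK_BINS * BIN_MHZ


# Core i5-2500 turbo states; the four-core value is inferred (3.8GHz - 4 bins).
i5_2500_turbo_mhz = {"1 core active": 3700, "4 cores active": 3400}

for state, turbo in i5_2500_turbo_mhz.items():
    print(f"{state}: {turbo}MHz turbo -> up to {limited_max_mhz(turbo)}MHz")
```

The same rule lines up with the Max Overclock Multiplier column for the i7-2600 (38x turbo plus 4 gives 42x) and the i5-2400 (34x plus 4 gives 38x).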
There’s a third class of part: a fully unlocked K-series chip. At launch there are only two of these processors: the Core i5-2500K and the Core i7-2600K. Anything with a K at the end of it means you get all multipliers from 16x all the way up to 57x at your disposal. It’s effectively fully unlocked.
These chips overclock very well. Both my Core i5-2500K and Core i7-2600K hit ~4.4GHz, fully stable, using the stock low-profile cooler.
With a bit more effort and a better cooler, you can get anywhere in the 4.6-5.0GHz range:
It's a bit too early to tell how solid these near-5GHz overclocks will be, but I'm confident in the sub-4.5GHz overclocks we were able to sustain.
You do pay a price premium for these K-series SKUs. The 2500K will cost you another $11 over a stock 2500 and the 2600K costs an extra $23. In the case of the 2500K, that’s a small enough premium that it’s honestly worth it. You pay $11 extra for a chip that is very conservatively clocked and just begging for you to overclock it. Even the 2600K’s premium isn’t bad at all.
|Model Number||Standard SKU||K-Series SKU||Price Premium|
|Intel Core i7-2600||$294||$317||+$23|
|Intel Core i5-2500||$205||$216||+$11|
As an added bonus, both K-series SKUs get Intel’s HD Graphics 3000, while the non-K series SKUs are left with the lower HD Graphics 2000 GPU.
Compared to Lynnfield, you’re paying $11 more than a Core i5-760 and you’re getting around 10-45% more performance, even before you overclock. In a perfect world I’d want all chips to ship unlocked; in a less perfect world I’d want there to be no price premium for the K-series SKUs, but at the end of the day what Intel is asking for here isn’t absurd. On the bright side, it does vastly simplify Intel’s product stack when recommending to enthusiasts: just buy anything with a K at the end of it.
Since we’re relying on multiplier adjustment alone for overclocking, your motherboard and memory actually matter less for overclocking with Sandy Bridge than they did with P55. On both P67 and H67, memory ratios are fully unlocked so you can independently set memory speed and CPU speed. Even the GPU ratios are fully unlocked on all platforms and fully independent from everything else.
The 6-series Platform
At launch Intel is offering two chipset families for Sandy Bridge: P-series and H-series, just like with Lynnfield. The high level differentiation is easy to understand: P-series doesn’t support processor graphics, H-series does.
There are other differences as well. The P67 chipset supports 2x8 CrossFire and SLI while H67 only supports a single x16 slot off of the SNB CPU (the chip has 16 PCIe 2.0 lanes that stem from it).
While H67 allows for memory and graphics overclocking, it doesn’t support any amount of processor overclocking. If you want to overclock your Sandy Bridge, you need a P67 motherboard.
Had SSDs not arrived when they did, I wouldn’t have cared about faster SATA speeds. That’s how it worked after all in the evolution of the hard drive. We’d get a faster ATA or SATA protocol, and nothing would really change. Sure we’d eventually get a drive that could take advantage of more bandwidth, but it was a sluggish evolution that just wasn’t exciting.
SSDs definitely changed all of that. Today there’s only a single 6Gbps consumer SSD on the market—Crucial’s RealSSD C300. By the middle of the year we’ll have at least two more high-end offerings, including SandForce’s SF-2000. All of these SSDs will be able to fully saturate a 3Gbps SATA interface in real world scenarios.
To meet the soon to be growing need for 6Gbps SATA ports Intel outfits the 6-series PCH with two 6Gbps SATA ports in addition to its four 3Gbps SATA ports.
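For a sense of what those port speeds mean in bytes, here is a rough conversion. It assumes SATA's 8b/10b line encoding (10 bits on the wire for every 8 bits of data), which is not discussed in this article, and it ignores protocol overhead, so treat the results as ceilings rather than numbers you'd see in a benchmark.

```python
def sata_payload_mb_per_s(line_rate_gbps: float) -> float:
    """Upper bound on payload bandwidth for a SATA link, in MB/s."""
    line_bits_per_s = line_rate_gbps * 1e9
    data_bits_per_s = line_bits_per_s * 8 / 10  # assumed 8b/10b encoding
    return data_bits_per_s / 8 / 1e6            # bits -> bytes -> MB


for rate in (3, 6):
    print(f"{rate}Gbps SATA -> about {sata_payload_mb_per_s(rate):.0f}MB/s")
```

Under those assumptions a 3Gbps port tops out around 300MB/s, which is the ceiling the next wave of SSDs is expected to run into.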
I dusted off my 128GB RealSSD C300 and ran it through a bunch of tests on five different platforms: Intel’s X58 (3Gbps), Intel’s P67 (3Gbps and 6Gbps), AMD’s 890GX (6Gbps) and Intel’s X58 with a Marvell 9128 6Gbps SATA controller. The Marvell 91xx controller is what you’ll find on most 5-series motherboards with 6Gbps SATA support.
I ran sequential read/write and random read/write tests, at a queue depth of 32 to really stress the limits of each chipset’s SATA protocol implementation. I ran the sequential tests for a minute straight and the random tests for three minutes. I tested a multitude of block sizes ranging from 512-bytes all the way up to 32KB. All transfers were 4KB aligned to simulate access in a modern OS. Each benchmark started at LBA 0 and was allowed to use the entire LBA space for accesses. The SSD was TRIMed between runs involving writes.
Among Intel chipsets I found that the X58 has stellar 3Gbps SATA performance, which is why I standardize on it for my SSD testbed. Even compared to the new 6-series platform there are slight advantages at high queue depths to the X58 vs. Intel’s latest chipsets.
Looking at 6Gbps performance though, there’s no comparison: the X58 is dated in this respect. Thankfully all of the contenders do well in our 6Gbps tests. AMD’s 8-series platform is a bit faster at certain block sizes but for the most part it, Intel’s 6-series and Marvell’s 91xx controllers perform identically.
I hate to be a bore but when it comes to SATA controllers an uneventful experience is probably the best you can hope for.
UEFI Support: 3TB Drives & Mouse Support Pre-Boot
Remember the mountain of issues I had trying to get Seagate’s 3TB HDD to work as a boot drive in my X58 system? A couple of weeks ago Intel released version 10.1 of its storage drivers, which added software support for drives larger than 2.2TB. That’s one piece of the puzzle. With Sandy Bridge, many motherboard manufacturers are moving to UEFI instead of traditional 32-bit PC BIOSes. Combine that with a GPT partition and your new Sandy Bridge system should have no problems booting to and accessing 3TB drives made of a single partition.
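The 2.2TB figure isn't arbitrary. The usual explanation, which I'm adding here as background rather than something covered above, is that MBR partition tables store sector addresses in 32 bits, and with 512-byte sectors that caps the addressable space. A quick sketch with hypothetical names:

```python
SECTOR_BYTES = 512  # assumed logical sector size
MBR_LBA_BITS = 32   # MBR partition entries use 32-bit sector addresses

mbr_limit_bytes = (2 ** MBR_LBA_BITS) * SECTOR_BYTES


def needs_gpt(drive_bytes: int) -> bool:
    """True if a single partition spanning the drive exceeds the MBR limit."""
    return drive_bytes > mbr_limit_bytes


print(f"MBR limit: {mbr_limit_bytes / 1e12:.2f}TB")
print("3TB drive needs GPT:", needs_gpt(3 * 10**12))
```

That is why a 3TB drive made of a single partition needs GPT, and why booting from it needs firmware that understands GPT.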
ASUS sent over a couple of its 6-series motherboards which boast a custom skinned UEFI implementation. You get all of the functionality of a traditional BIOS but with a GUI, and yes, there’s full mouse support.
You’re either going to love or hate the new UEFI GUIs. They do take a little time to get used to but pretty much everything is where you’d expect it to be. Navigating with the mouse can be quicker than the keyboard in some situations and slower in others. Thankfully the interface, at least ASUS’, is pretty quick. There’s scroll wheel support although no draggable scroll bars, which makes quickly scrolling a little frustrating.
Unlike P55, you can set your SATA controller to compatible/legacy IDE mode. This is something you could do on X58 but not on P55. It’s useful for running HDDERASE to secure erase your SSD for example. If you do want to use HDDERASE on a 6-series motherboard you’ll need to first run HDDERASE4 to disable the UEFI initiated security on your drive and then run HDDERASE3 to secure erase it.
The biggest improvement to me honestly is POST time. Below is a quick comparison of time from power on to the Starting Windows screen. I’m using the exact same hardware in all three cases, just varying motherboard/CPU:
| ||Intel P67||Intel P55||Intel X58|
|Time from Power on to Boot Loader||22.4 seconds||29.4 seconds||29.3 seconds|
In developing its 6-series chipsets Intel wanted to minimize as much risk as possible, so much of the underlying chipset architecture is borrowed from Lynnfield’s 5-series platform. The conservative chipset development for Sandy Bridge left a hole in the lineup. The P67 chipset lets you overclock CPU and memory but it lacks the flexible display interface necessary to support SNB’s HD Graphics. The H67 chipset has an FDI so you can use the on-die GPU, however it doesn’t support CPU or memory overclocking. What about those users who don’t need a discrete GPU but still want to overclock their CPUs? With the chipsets that Intel is launching today, you’re effectively forced to buy a discrete GPU if you want to overclock your CPU. This is great for AMD/NVIDIA, but not so great for consumers who don’t need a discrete GPU and not the most sensible decision on Intel’s part.
There is a third member of the 6-series family that will begin shipping in Q2: Z68. Take P67, add processor graphics support and you’ve got Z68. It’s as simple as that. Z68 is also slated to support something called SSD Caching, which Intel hasn’t said anything to us about yet. With version 10.5 of Intel’s Rapid Storage Technology drivers, Z68 will support SSD caching. This sounds like the holy grail of SSD/HDD setups, where you have a single drive letter and the driver manages what goes on your SSD vs. HDD. Whether SSD Caching is indeed a DIY hybrid hard drive technology remains to be seen. It’s also unclear whether or not P67/H67 will get SSD Caching once 10.5 ships.
LGA-2011 Coming in Q4
One side effect of Intel’s tick-tock cadence is a staggered release update schedule for various market segments. For example, Nehalem’s release in Q4 2008 took care of the high-end desktop market, however it didn’t see an update until the beginning of 2010 with Gulftown. Similarly, while Lynnfield debuted in Q3 2009 it was left out of the 32nm refresh in early 2010. Sandy Bridge is essentially that 32nm update to Lynnfield.
So where does that leave Nehalem and Gulftown owners? For the most part, the X58 platform is a dead end. While there are some niche benefits (more PCIe lanes, more memory bandwidth, 6-core support), the majority of users would be better served by Sandy Bridge on LGA-1155.
For the users who need those benefits however, there is a version of Sandy Bridge for you. It’s codenamed Sandy Bridge-E and it’ll debut in Q4 2011. The chips will be available in both 4 and 6 core versions with a large L3 cache (Intel isn’t being specific at this point).
SNB-E will get the ring bus, on-die PCIe and all of the other features of the LGA-1155 Sandy Bridge processors, but it won’t have an integrated GPU. While current SNB parts top out at 95W TDP, SNB-E will run all the way up to 130W—similar to existing LGA-1366 parts.
The new high-end platform will require a new socket and motherboard (LGA-2011). Expect CPU prices to start off at around the $294 level of the new i7-2600 and run all the way up to $999.
A Near-Perfect HTPC
Since 2006 Intel’s graphics cores have supported sending 8-channel LPCM audio over HDMI. In 2010 Intel enabled bitstreaming of up to eight channels of lossless audio typically found on Blu-ray discs via Dolby TrueHD and DTS-HD MA codecs. Intel’s HD Graphics 3000/2000 don’t add anything new in the way of audio or video codec support.
Dolby Digital, TrueHD (up to 7.1), DTS, DTS-HD MA (up to 7.1) can all be bitstreamed over HDMI. Decoded audio can also be sent over HDMI. From a video standpoint, H.264, VC-1 and MPEG-2 are all hardware accelerated. The new GPU enables HDMI 1.4 and Blu-ray 3D support. Let’s run down the list:
Dolby TrueHD Bitstreaming? Works:
DTS HD-MA bitstreaming? Yep:
Blu-ray 3D? Make that three:
How about 23.976 fps playback? Sorry guys, even raking in $11 billion a quarter doesn’t make you perfect.
Here’s the sitch: most movie content is stored at 23.976 fps but incorrectly referred to as 24p or 24 fps. That sub-30 fps frame rate is what makes movies look like, well, movies and not soap operas (this is also why interpolated 120Hz modes on TVs make movies look cheesy since they smooth out the 24 fps film effect). A smaller portion of content is actually mastered at 24.000 fps and is also referred to as 24p.
In order to smoothly play back either of these formats you need a player and a display device capable of supporting the frame rate. Many high-end TVs and projectors support this just fine, however on the playback side Intel only supports the less popular of the two: 24.000Hz.
This isn’t intentional, but rather a propagation of an oversight that started back with Clarkdale. Despite having great power consumption and feature characteristics, Clarkdale had one glaring issue that home theater enthusiasts discovered: despite having a 23Hz setting in the driver, Intel’s GPU would never output anything other than 24Hz to a display.
The limitation is entirely in hardware, particularly in what’s supported by the 5-series PCH (remember that display output is routed from the processor’s GPU to the video outputs via the PCH). One side effect of trying to maintain Intel’s aggressive tick-tock release cadence is there’s a lot of design reuse. While Sandy Bridge was a significant architectural redesign, the risk was mitigated by reusing much of the 5-series PCH design. As a result, the hardware limitation that prevented a 23.976Hz refresh rate made its way into the 6-series PCH before Intel discovered the root cause.
Intel had enough time to go in and fix the problem in the 6-series chipsets, however doing so would have put the chipset schedule at risk given that the fix requires a non-trivial amount of work. Not wanting to introduce more risk into an already risky project (brand new out of order architecture, first on-die GPU, new GPU architecture, first integrated PLL), Intel chose to not address it this round, which is why we still have the problem today.
Note the frame rate
What happens when you try to play 23.976 fps content on a display that refreshes itself 24.000 times per second? You get a repeated frame approximately every 40 seconds to synchronize the source frame rate with the display frame rate. That repeated frame appears to your eyes as judder in motion, particularly evident in scenes involving a panning camera.
How big of an issue this is depends on the user. Some can just ignore the judder, others will attempt to smooth it out by setting their display to 60Hz, while others will be driven absolutely insane by it.
If you fall into the latter category, your only option for resolution is to buy a discrete graphics card. Currently AMD’s Radeon HD 5000 and 6000 series GPUs correctly output a 23.976Hz refresh rate if requested. These GPUs also support bitstreaming Dolby TrueHD and DTS-HD MA, while the 6000 series supports HDMI 1.4a and stereoscopic 3D. The same is true for NVIDIA’s GeForce GT 430, which happens to be a pretty decent discrete HTPC card.
Intel has committed to addressing the problem in the next major platform revision, which unfortunately seems to be Ivy Bridge in 2012. There is a short-term solution for HTPC users absolutely set on Sandy Bridge. Intel has a software workaround that enables 23.97Hz output. There’s still a frame rate mismatch at 23.97Hz, but it would be significantly reduced compared to the current 24.000Hz-only situation.
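To put numbers on the mismatch, here is a back-of-the-envelope sketch. It treats the content as exactly 23.976 fps (the figure used throughout this section) and assumes one frame has to be repeated or dropped each time the display and content drift apart by a full frame; the function name is my own.

```python
def seconds_between_corrections(content_fps: float, display_hz: float) -> float:
    """Time for display and content to drift apart by one whole frame."""
    drift_per_second = abs(display_hz - content_fps)
    if drift_per_second == 0:
        return float("inf")  # perfectly matched: no repeated or dropped frames
    return 1.0 / drift_per_second


CONTENT_FPS = 23.976

for label, hz in (("24.000Hz only", 24.000), ("23.97Hz workaround", 23.97)):
    interval = seconds_between_corrections(CONTENT_FPS, hz)
    print(f"{label}: one corrected frame roughly every {interval:.0f} seconds")
```

By this estimate the 24.000Hz case works out to a correction about every 42 seconds, in line with the roughly 40 second figure above, while the 23.97Hz workaround stretches that to somewhere near 167 seconds. Since 23.97Hz is slightly slower than the content, the correction there would be a dropped frame rather than a repeated one.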
MPC-HC Compatibility Problems
Just a heads up. Media Player Classic Home Cinema doesn't currently play well with Sandy Bridge. Enabling DXVA acceleration in MPC-HC will cause stuttering and image quality issues during playback. As far as I know, the issue is with MPC-HC not properly detecting SNB. Intel has reached out to the developer for a fix.
Intel’s Quick Sync Technology
In recent years video transcoding has become one of the most widespread consumers of CPU power. The popularity of YouTube alone has turned nearly everyone with a webcam into a producer, and every PC into a video editing station. The mobile revolution hasn’t slowed things down either. No smartphone can play full bitrate/resolution 1080p content from a Blu-ray disc, so if you want to carry your best quality movies and TV shows with you, you’ll have to transcode to a more compressed format. The same goes for the new wave of tablets.
At a high level, video transcoding involves taking a compressed video stream and further compressing it to better match the storage and decoding abilities of a target device. The reason this is transcoding and not encoding is because the source format is almost always already encoded in some sort of a compressed format. The most common, these days, being H.264/AVC.
Transcoding is a particularly CPU intensive task because of the three dimensional nature of the compression. Each individual frame within a video can be compressed; however, since sequential frames of video typically have many of the same elements, video compression algorithms look at data that’s repeated temporally as well as spatially.
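A toy example makes the temporal part concrete. This is not how H.264 works internally; it's a deliberately simplified sketch using a made-up one-dimensional "frame" of pixel values, showing why storing the difference between consecutive frames can take far less data than storing each frame on its own.

```python
def residual(prev_frame: list, curr_frame: list) -> list:
    """Per-pixel difference between two consecutive frames."""
    return [curr - prev for prev, curr in zip(prev_frame, curr_frame)]


def reconstruct(prev_frame: list, delta: list) -> list:
    """Rebuild the current frame from the previous frame plus the residual."""
    return [prev + d for prev, d in zip(prev_frame, delta)]


# Hypothetical 8-pixel frames: a bright object moves one pixel to the right.
frame0 = [10, 10, 10, 200, 200, 10, 10, 10]
frame1 = [10, 10, 10, 10, 200, 200, 10, 10]

delta = residual(frame0, frame1)
changed = sum(1 for d in delta if d != 0)

print("residual:", delta)
print(f"{changed} of {len(delta)} pixels changed")
assert reconstruct(frame0, delta) == frame1
```

Most of the residual is zeros, which compresses extremely well. Finding those similarities across frames is a big part of why encoding is so computationally expensive.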
I remember sitting in a hotel room in Times Square while Godfrey Cheng and Matthew Witheiler of ATI explained to me the challenges of decoding HD-DVD and Blu-ray content. ATI was about to unveil hardware acceleration for some of the stages of the H.264 decoding pipeline. Full hardware decode acceleration wouldn’t come for another year at that point.
The advent of fixed function video decode in modern GPUs is important because it helped enable GPU accelerated transcoding. The first step of the video transcode process is to first decode the source video. Since transcoding involves taking a video already in a compressed format and encoding it in a new format, hardware accelerated video decode is key. How fast a decode engine is has a tremendous impact on how fast a hardware accelerated video encode can run. This is true for two reasons.
First, unlike in a playback scenario where you only need to decode faster than the frame rate of the video, when transcoding the video decode engine can run as fast as possible. The faster frames can be decoded, the faster they can be fed to the transcode engine. The second and less obvious point is that some of the hardware you need to accelerate video encoding is already present in a video decode engine (e.g. iDCT/DCT hardware).
With video transcoding as a feature of Sandy Bridge’s GPU, Intel beefed up the video decode engine from what it had in Clarkdale. In the first generation Core series processors, video decode acceleration was split between fixed function decode hardware and the GPU’s EU array. With Sandy Bridge and the second generation Core CPUs, video decoding is done entirely in fixed function hardware. This is not ideal from a flexibility standpoint (e.g. newer video codecs can’t be fully hardware accelerated on existing hardware), but it is the most efficient method to build a video decoder from a power and performance standpoint. Both AMD and NVIDIA have fixed function video decode hardware in their GPUs now; neither relies on the shader cores to accelerate video decode.
The resulting hardware is both performance and power efficient. To test the performance of the decode engine I launched multiple instances of a 15Mbps 1080p high profile H.264 video running at 23.976 fps. I kept launching instances of the video until the system could no longer maintain full frame rate in all of the simultaneous streams. The table below shows the maximum number of streams I could run in parallel:
| ||Intel Core i5-2500K||NVIDIA GeForce GTX 460||AMD Radeon HD 6870|
|Number of Parallel 1080p HP Streams||5 streams||3 streams||1 stream|
AMD’s Radeon HD 6000 series GPUs can only manage a single high profile, 1080p H.264 stream, which is perfectly sufficient for video playback. NVIDIA’s GeForce GTX 460 does much better; it could handle three simultaneous streams. Sandy Bridge however takes the cake as a single Core i5-2500K can decode five streams in tandem.
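To put those stream counts in perspective, here is a rough sketch of the aggregate decode rate each result implies. It assumes every stream is the same 23.976 fps, 15Mbps clip described above; the function names are my own, and the outputs are lower bounds implied by the stream counts, not measured decoder throughput.

```python
# Lower bound on aggregate decode throughput implied by the stream test.
# Assumption: every stream is the same 23.976 fps, 15 Mbps 1080p clip.
FRAME_RATE_FPS = 23.976
BITRATE_MBPS = 15

# Maximum simultaneous full-rate streams reported in the test.
max_streams = {
    "Intel Core i5-2500K": 5,
    "NVIDIA GeForce GTX 460": 3,
    "AMD Radeon HD 6870": 1,
}


def implied_decode_fps(streams, frame_rate=FRAME_RATE_FPS):
    """Frames per second the decoder must sustain to keep all streams at full rate."""
    return streams * frame_rate


def implied_bitrate_mbps(streams, bitrate=BITRATE_MBPS):
    """Combined compressed bitrate the decoder has to consume, in Mbps."""
    return streams * bitrate


for name, count in max_streams.items():
    fps = implied_decode_fps(count)
    mbps = implied_bitrate_mbps(count)
    print(f"{name}: at least {fps:.1f} fps decoded, {mbps} Mbps consumed")
```

Since the test stops at the first stream count that drops frames, the real ceiling for each part sits somewhere between the value printed here and one more stream's worth of frames.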
The Sandy Bridge decoder is likely helped by the very large (and high bandwidth) L3 cache connected to it. This is the first advantage Intel has in what it calls its Quick Sync technology: a very fast decode engine.
The decode engine is also reused during the actual encode phase. Once frames of the source video are decoded, they are actually fed to the programmable EU array to be split apart and prepared for transcoding. The data in each frame is transformed from the spatial domain (location of each pixel) to the frequency domain (how quickly pixel values change across a block); this is done by the use of a discrete cosine transform. You may remember that inverse discrete cosine transform hardware is necessary to decode video; well, that same hardware is useful in the domain transform needed when transcoding.
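For readers who haven't seen a discrete cosine transform in action, here is a minimal pure-Python sketch of the textbook orthonormal DCT-II on an 8x8 block, along with its inverse. This is not Intel's implementation (that lives in fixed function hardware); the 8x8 block size, the function names and the sample blocks are assumptions chosen for illustration.

```python
import math


def _scale(k, n):
    # Orthonormal scaling: the DC term (k = 0) uses a different factor.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(values):
    """Orthonormal DCT-II of a 1D sequence."""
    n = len(values)
    return [
        _scale(k, n)
        * sum(values[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct_1d(coeffs):
    """Inverse of dct_1d (an orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n)
        )
        for i in range(n)
    ]


def _transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in _transpose(rows)]
    return _transpose(cols)


def idct_2d(coeffs):
    """Separable 2D inverse DCT."""
    rows = [idct_1d(row) for row in coeffs]
    cols = [idct_1d(col) for col in _transpose(rows)]
    return _transpose(cols)


# A perfectly flat block: every pixel has the same value.
flat = [[128] * 8 for _ in range(8)]
flat_coeffs = dct_2d(flat)

# A horizontal ramp: brightness rises left to right, identical in every row.
ramp = [[16 * x for x in range(8)] for _ in range(8)]
ramp_roundtrip = idct_2d(dct_2d(ramp))

print("DC coefficient of flat block:", round(flat_coeffs[0][0], 3))
print("Largest other coefficient:", max(
    abs(flat_coeffs[v][u]) for v in range(8) for u in range(8) if (v, u) != (0, 0)
))
```

The flat block collapses into a single non-zero coefficient, which is the whole point: smooth image regions need very few frequency terms, and the encoder can spend its bits where the picture actually changes.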
Motion search, the most compute intensive part of the transcode process, is done in the EU array. It's the combination of the fast decoder, the EU array, and fixed function hardware that makes up Intel's Quick Sync engine.
Quick Sync: The Best Way to Transcode
Currently Intel’s Quick Sync transcode is only supported by two applications: Cyberlink’s Media Espresso 6 and Arcsoft’s Media Converter 7. Both of these applications are video to go applications targeted at users who want to take high resolution/high bitrate content and transcode it to more compact formats for use on smartphones, tablets, media streamers and gaming consoles. The intended market is not users who are attempting to make high quality archives of Blu-ray content. As a result, there’s no support for multi-channel audio; both applications are limited to 2-channel MP3 or AAC output. There’s also no support for transcoding to anything higher than the main profile of H.264.
Intel indicates that these are not hardware limitations of Quick Sync, but rather limitations of the transcoding software. To that end, Intel is working with developers of video editing applications to bring Quick Sync support to applications that have a more quality-oriented usage model. These applications are using Intel’s Media SDK 2.0 which is publicly available. Intel says that any developer can get access to and use it.
For the purposes of this comparison I’ve used Media Converter 7, but that’s purely a personal preference thing. The performance and image quality should be roughly identical between the two applications as they both use the same APIs. Jarred's look at Mobile Sandy Bridge will focus on MediaEspresso.
Where image quality isn’t consistent however is between transcoding methods in either application. Both applications support four codepaths: ATI Stream, Intel Quick Sync, NVIDIA CUDA, and x86. While you can set any of these codepaths to the same transcoding settings, the method by which they arrive at the transcoded image will differ. This makes sense given how different all four target architectures are (e.g. a Radeon HD 6870 doesn’t look anything like a NVIDIA GeForce GTX 460). Each codepath makes a different set of performance vs. quality tradeoffs which we’ll explore in this section.
The first, less obvious difference is that transcoding on the Sandy Bridge CPU cores and transcoding with Quick Sync will actually give you different images. The image quality is slightly better on the x86 path, but the two are similar.
The reason for the image quality difference is easy to understand. CPUs are inherently not very parallel beasts. We get tremendous speedup on highly parallel tasks on multi-core CPUs, but compared to a GPU’s ability to juggle hundreds or thousands of threads, even a 6-core CPU doesn’t look too wide. As a result of this serial vs. parallel difference, transcoding algorithms optimized for CPUs are very computationally efficient. They have to be, because you can’t rely on hundreds of cores running in parallel when you’re running on a CPU.
Take the same code and run it on a GPU and you’ll find that the majority of your execution resources are wasted. A new codepath is needed that can take advantage of the greater amount of compute at your disposal. For example, a GPU can evaluate many different compression modes in parallel whereas on a CPU you generally have to pick a balance between performance and quality up front regardless of the content you’re dealing with.
There’s also one more basic difference between code running on the CPU vs. integrated GPU. At least in Intel’s case, certain math operations can be performed with higher precision on Sandy Bridge’s SSE units vs. the GPU’s EUs.
Intel tuned the PSNR of the Quick Sync codepath to be as similar to the x86 codepath as possible. The result is, as I mentioned above, quite similar:
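If PSNR is unfamiliar, it is simply a log-scaled measure of the mean squared error between a reference frame and the transcoded frame. Below is a minimal sketch of the standard formula for 8-bit samples; the function name and the tiny sample "frames" are made up for illustration and are not data from my testing.

```python
import math


def psnr(reference, transcoded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(transcoded) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, transcoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 8-bit luma samples: one output is off by 1 everywhere, one by 4.
reference = [50, 100, 150, 200]
close_match = [51, 101, 151, 201]
rough_match = [54, 104, 154, 204]

print(f"close match: {psnr(reference, close_match):.2f} dB")
print(f"rough match: {psnr(reference, rough_match):.2f} dB")
```

Higher is better, and because the scale is logarithmic, two codepaths with nearly the same PSNR can still look different to the eye, which is why the subjective comparisons matter as well.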
Now let’s tackle the other GPUs. When I first started my Quick Sync investigations I did a little experiment. Without forming any judgments of my own, I quickly transcoded a ~15Mbps 1080p movie into an iPhone 4 compatible 720p H.264 at 4Mbps. I then trimmed it down to a single continuous 4 minute scene and passed the movie along to six AnandTech editors. I sent the editors three copies of the 4 minute scene. One transcoded on a GeForce GTX 460, one using Intel’s Quick Sync, and one using the standard x86 codepath. I named the three movies numerically and told no one which platform was responsible for each output. All I asked for was feedback on which ones they thought were best.
Here are some of the comments I received:
“Wow... there are some serious differences in quality. I'm concerned that the 1.mp4 is the accelerated transcode, in which case it looks like poop..”
“Video 1: Lots of distracting small compression blocks, as if the grid was determined pre-encoding (I know that generally there are blocks, but here the edges seem to persist constantly). Persistent artifacts after black. Quality not too amazing, I wouldn't be happy with this.”
Video one, which many assumed was Quick Sync, actually came from the GeForce GTX 460. The CUDA codepath, although extremely fast, actually produces a much worse image. Videos 2 and 3 were outputs from Sandy Bridge, and the editors generally didn’t agree on which one of those two looked better, just that they were definitely better than the first video.
To confirm whether or not this was a fluke I set up three different transcodes. Lossy video compression is hard to get right when you’re transcoding scenes that are changing quickly, so I focused on scenes with significant movement.
The first transcode involves taking the original Casino Royale Blu-ray, stripping it of its DRM using AnyDVD HD, and feeding that into MC7 as a source. The output in this case was a custom profile: 15Mbps 1080p main profile H.264. This is an unrealistic usage model simply because the output file only had 2-channel audio, making it suitable only for PC use and likely a waste of bitrate. I simply wanted to see how the various codepaths looked and performed with an original BD source.
Let’s look at performance first. The entire movie has around 200,000 frames; the transcoding frame rate is below:
As we’ve been noting in our GPU reviews for quite some time now, there’s no advantage to transcoding on a GPU faster than the $200 mainstream parts. Remember that the transcode process isn’t infinitely parallel; we are ultimately bound by the performance of the sequential components of the algorithm. As a result, the Radeon HD 6970 offers no advantage over the 6870 here. Both of these AMD GPUs end up being just as fast as a Core i5-2500K.
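The textbook way to reason about that sequential limit is Amdahl's law. The sketch below is only an illustration: the 90% parallel fraction is a number I picked for the example, not a measurement of any of these transcoders, and the function name is my own.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Overall speedup when only `parallel_fraction` of the work scales with `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical workload: 90% of the transcode parallelizes, 10% is sequential.
PARALLEL_FRACTION = 0.9
ceiling = 1.0 / (1.0 - PARALLEL_FRACTION)

for workers in (1, 4, 64, 1024):
    speedup = amdahl_speedup(PARALLEL_FRACTION, workers)
    print(f"{workers:>5} workers: {speedup:.2f}x (ceiling {ceiling:.1f}x)")
```

Past a few dozen workers the curve is nearly flat, so piling on more shader hardware buys very little once the sequential portion dominates.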
NVIDIA’s GPUs offer a 15.7% performance advantage, but as I mentioned earlier, the advantage comes at the price of decreased quality (which we’ll get to in a moment).
Intel’s Quick Sync is untouchable though. It’s 48% faster than NVIDIA’s GeForce GTX 460 and 71% faster than the Radeon HD 6970. I don’t want to proclaim that discrete GPU based transcoding is dead, but based on these results it sure looks like it. What about image quality?
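As a quick sanity check, relative speedups multiply rather than add. The sketch below assumes NVIDIA's 15.7% advantage is measured against the AMD cards and that both AMD cards perform identically, which is how I read the numbers above; the variable names are mine.

```python
# Relative performance expressed as ratios (1.0 = same speed).
nvidia_over_amd = 1.157        # GTX 460 vs. Radeon HD 6970 (assumed baseline)
quicksync_over_nvidia = 1.48   # Quick Sync vs. GTX 460

# Ratios compose by multiplication, not by adding percentages.
quicksync_over_amd = nvidia_over_amd * quicksync_over_nvidia

advantage_pct = (quicksync_over_amd - 1.0) * 100
print(f"Quick Sync vs. Radeon HD 6970: {advantage_pct:.0f}% faster")
```

Adding 15.7% and 48% would give roughly 64%, which understates the gap; multiplying the ratios lands on the 71% figure.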
My image quality test scene isn’t anything absurd. Bond and Vesper are about to meet Mathis for the first time. Mathis walks towards the two and the camera pans to follow him. With only one character and the camera both moving at a predictable rate, using some simple motion estimation most high quality transcoders should be able to handle this scene without getting tripped up too much.
|Intel Core i5-2500K (x86)||Intel Quick Sync||NVIDIA GeForce GTX 460||AMD Radeon HD 6870|
|Download: PNG||Download: PNG||Download: PNG||Download: PNG|
The GeForce GTX 460 looks horrible here. The output looks like an old film; it’s simply inexcusable.
The Radeon HD 6870 produces a frame that has similar sharpness to the x86 codepath, but with muted colors. Quick Sync maintains color fidelity but loses the sharpness of the x86 path, similar to what we saw in the previous test. In this case the loss of sharpness does help smooth out some aliasing in the paint on the police car but otherwise is undesirable.
Overall, based on what I’ve seen in my testing of Quick Sync, it isn’t perfect but it does deliver a good balance of image quality and performance. With Quick Sync enabled you can transcode a ~2.5 hour Blu-ray disc in around 35 minutes. If you’ve got a lower quality source (e.g. a 15GB Blu-ray re-encode), you can plan on doing a full movie in around 13 minutes. Quick Sync will chew through TV shows in a couple of minutes, without a tremendous loss in quality.
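Those times translate into transcode frame rates with some simple arithmetic. The sketch below assumes a film-rate 23.976 fps source, the same rate as the decode test clip, so treat the frame count as an estimate rather than a measured value; the function name is my own.

```python
SOURCE_FPS = 23.976  # assumption: film-rate Blu-ray source


def transcode_stats(duration_hours, transcode_minutes, source_fps=SOURCE_FPS):
    """Return (total frames, average transcode fps, multiple of real time)."""
    total_frames = duration_hours * 3600 * source_fps
    transcode_fps = total_frames / (transcode_minutes * 60)
    return total_frames, transcode_fps, transcode_fps / source_fps


frames, fps, realtime_multiple = transcode_stats(duration_hours=2.5, transcode_minutes=35)
print(f"{frames:,.0f} frames at {fps:.1f} fps ({realtime_multiple:.1f}x real time)")
```

A full 2.5 hour movie works out to a little over 200,000 frames, and finishing it in 35 minutes means averaging better than four times real time.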
With CUDA on NVIDIA GPUs we had to choose between high quality or high performance. (Perhaps other applications will do the transcode better as well, but at least Arcsoft's Media Converter 7 has serious image quality problems with CUDA.) With Quick Sync you can have both, and better performance than we’ve ever seen from any transcoding solution in desktops or notebooks.
Quick Sync with a Discrete GPU
There’s just one hangup to all of this Quick Sync greatness: it only works if the processor’s GPU is enabled. In other words, on a desktop with a single monitor connected to a discrete GPU, you can’t use Quick Sync.
This isn’t a problem for mobile since Sandy Bridge notebooks should support switchable graphics, meaning you can use Quick Sync without waking up the discrete GPU. However there’s no standardized switchable graphics for desktops yet. Intel indicated that we may see some switchable solutions in the coming months on the desktop, but until then you either have to use the integrated GPU alone or run a multimonitor setup with one monitor connected to Intel’s GPU in order to use Quick Sync.
Intel’s Gen 6 Graphics
All 2nd generation Core series processors that fit into an LGA-1155 motherboard will have one of two GPUs integrated on-die: Intel’s HD Graphics 3000 or HD Graphics 2000. Intel’s upcoming Sandy Bridge E for LGA-2011 will not have an on-die GPU. All mobile 2nd generation Core series processors feature HD Graphics 3000.
The 3000 vs. 2000 comparison is pretty simple. The former has 12 cores or EUs as Intel likes to call them, while the latter only has 6. Clock speeds are the same although the higher end parts can turbo up to higher frequencies. Each EU is 128-bits wide, which makes a single EU sound a lot like a single Cayman SP.
Unlike Clarkdale, all versions of HD Graphics on Sandy Bridge support Turbo. Any TDP that is freed up by the CPU running at a lower frequency or having some of its cores shut off can be used by the GPU to turbo up. The default clock speed for both HD 2000 and 3000 on the desktop is 850MHz; however, the GPU can turbo up to 1100MHz in everything but the Core i7-2600/2600K. The top-end Sandy Bridge can run its GPU at up to 1350MHz.
|Processor||Intel HD Graphics||EUs||Quick Sync||Graphics Clock||Graphics Max Turbo|
|Intel Core i7-2600K||3000||12||Y||850MHz||1350MHz|
|Intel Core i7-2600||2000||6||Y||850MHz||1350MHz|
|Intel Core i5-2500K||3000||12||Y||850MHz||1100MHz|
|Intel Core i5-2500||2000||6||Y||850MHz||1100MHz|
|Intel Core i5-2400||2000||6||Y||850MHz||1100MHz|
|Intel Core i5-2300||2000||6||Y||850MHz||1100MHz|
|Intel Core i3-2120||2000||6||Y||850MHz||1100MHz|
|Intel Core i3-2100||2000||6||Y||850MHz||1100MHz|
|Intel Pentium G850||Intel HD Graphics||6||N||850MHz||1100MHz|
|Intel Pentium G840||Intel HD Graphics||6||N||850MHz||1100MHz|
|Intel Pentium G620||Intel HD Graphics||6||N||850MHz||1100MHz|
Mobile is a bit different. The base GPU clock in all mobile SNB chips is 650MHz but the max turbo is higher at 1300MHz. The LV/ULV parts also have different max clocks, which we cover in the mobile article.
As I mentioned before, all mobile 2nd gen Core processors get the 12 EU version—Intel HD Graphics 3000. The desktop side is a bit more confusing. In desktop, the unlocked K-series SKUs get the 3000 GPU while everything else gets the 2000 GPU. That’s right: the SKUs most likely to be paired with discrete graphics are given the most powerful integrated graphics. Of course those users don’t pay any penalty for the beefier on-die GPU; when not in use the GPU is fully power gated.
Despite the odd perk for the K-series SKUs, Intel’s reasoning behind the GPU split does make sense. The HD Graphics 2000 GPU is faster than any desktop integrated GPU on the market today, and it’s easy to add discrete graphics to a desktop system if the integrated GPU is insufficient. The 3000 is simply another feature to justify the small price adder for K-series buyers.
On the mobile side, going entirely with the 3000 comes down to the quality of integrated and low-end graphics in mobile. You can’t easily add in a discrete card so Intel has to put its best foot forward to appease OEMs like Apple. I suspect the top-to-bottom use of HD Graphics 3000 in mobile is directly responsible for Apple using Sandy Bridge without a discrete GPU in its entry level notebooks in early 2011.
I’ve been careful to mention the use of HD Graphics 2000/3000 in 2nd generation Core series CPUs, as Intel will eventually bring Sandy Bridge down to the Pentium brand with the G800 and G600 series processors. These chips will feature a version of HD Graphics 2000 that Intel will simply call HD Graphics. Performance will be similar to the HD Graphics 2000 GPU, however it won’t feature Quick Sync.
Image Quality and Experience
Perhaps the best way to start this section is with a list. Between Jarred and me, these are the games we’ve tested with Intel’s on-die HD 3000 GPU:
Batman: Arkham Asylum
Battlefield: Bad Company 2
Call of Duty: Black Ops
Call of Duty: Modern Warfare 2
Chronicles of Riddick: Dark Athena
Dawn of War II
Dragon Age Origins
Elder Scrolls IV: Oblivion
Empire: Total War
Far Cry 2
Fallout: New Vegas
FEAR 2: Project Origin
Left 4 Dead 2
Mass Effect 2
STALKER: Call of Pripyat
World of Warcraft
This is over two dozen titles, both old and new, that for the most part worked on Intel’s integrated graphics. Now for a GPU maker, this is nothing to be proud of, but given Intel’s track record with game compatibility this is a huge step forward.
We did of course run into some issues. Fallout 3 (but not New Vegas) requires a DLL hack to even run on Intel integrated graphics, and we saw some shadow rendering issues in Mafia II, but for the most part the titles—both old and new—worked.
Modern Warfare 2 in High Quality
Now the bad news. Despite huge performance gains and much improved compatibility, even the Intel HD Graphics 3000 requires that you run at fairly low detail settings to get playable frame rates in most of these games. There are a couple of exceptions but for the most part the rule of integrated graphics hasn’t changed: turn everything down before you start playing.
Modern Warfare 2 the way you have to run it on Intel HD Graphics 3000
This reality has been true for more than just Intel integrated graphics however. Even IGPs from AMD and NVIDIA had the same limitations, as well as the lowest end discrete cards on the market. The only advantage those solutions had over Intel in the past was performance.
Realistically we need at least another doubling of graphics performance before we can even begin to talk about playing games smoothly at higher quality settings. Interestingly enough, I’ve heard the performance of Intel’s HD Graphics 3000 is roughly equal to the GPU in the Xbox 360 at this point. It only took six years for Intel to get there. If Intel wants to contribute positively to PC gaming, we need to see continued doubling of processor graphics performance for at least the next couple generations. Unfortunately I’m worried that Ivy Bridge won’t bring another doubling as it only adds 4 EUs to the array.
Intel HD Graphics 2000/3000 Performance
I dusted off two low-end graphics cards for this comparison: a Radeon HD 5450 and a Radeon HD 5570. The 5450 is a DX11 part with 80 SPs and a 64-bit memory bus. The SPs run at 650MHz and the DDR3 memory interface has a 1600MHz data rate. That’s more compute power than the Intel HD Graphics 3000 but less memory bandwidth than a Sandy Bridge if you assume the CPU cores aren’t consuming more than half of the available memory bandwidth. The 5450 will set you back $45 at Newegg and is passively cooled.
The Radeon HD 5570 is a more formidable opponent. Priced at a whopping $70, this GPU comes with 400 SPs and a 128-bit memory bus. The core clock remains at 650MHz and the DDR3 memory interface has an 1800MHz data rate. This is more memory bandwidth and much more compute than the HD Graphics 3000 can offer.
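The memory bandwidth comparison falls straight out of bus width and data rate. Here is a small sketch of the peak theoretical numbers for the two cards, treating the quoted "MHz data rate" as millions of transfers per second; the function name is mine and real-world bandwidth will be lower than these peaks.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_mt_s):
    """Peak theoretical memory bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mt_s * 1e6 / 1e9


cards = {
    "Radeon HD 5450": (64, 1600),   # 64-bit bus, 1600MHz data rate
    "Radeon HD 5570": (128, 1800),  # 128-bit bus, 1800MHz data rate
}

for name, (bus_bits, rate) in cards.items():
    print(f"{name}: {peak_bandwidth_gb_s(bus_bits, rate):.1f} GB/s")
```

The 5570 ends up with more than twice the bandwidth of the 5450, on top of its five-fold advantage in SP count.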
Based on what we saw in our preview I’d expect performance similar to the Radeon HD 5450 and significantly lower than the Radeon HD 5570. Both of these cards were paired with a Core i5-2500K to remove any potential CPU bottlenecks.
On the integrated side we have a few representatives. AMD’s 890GX is still the cream of the crop for AMD integrated for at least a few more months. I paired it with a 6-core 1100T to keep the CPU from impacting things.
Representing Clarkdale I have a Core i5-661 and 660. Both chips run at 3.33GHz but the 661 has a 900MHz GPU while the 660 runs at 733MHz. These are the fastest representatives with last year’s Intel HD Graphics, but given the margin of improvement I didn’t feel the need to show anything slower.
And finally from Sandy Bridge we have three chips: the Core i7-2600K and Core i5-2500K both with Intel HD Graphics 3000 (but different turbo modes) and the Core i3-2100 with HD Graphics 2000.
Nearly all of our test titles were run at the lowest quality settings available in game at 1024x768. We ran with the latest drivers available as of 12/30/2010. Note that all of the screenshots used below were taken on Intel's HD Graphics 3000. For a comparison of IQ between it and the Radeon HD 5450 I've zipped up originals of all of the images here.
Dragon Age: Origins
DAO has been a staple of our integrated graphics benchmark for some time now. The third/first person RPG is well threaded and is influenced both by CPU and GPU performance.
We ran at 1024x768 with graphics and texture quality both set to low. Our benchmark is a FRAPS runthrough of our character through a castle.
The new Intel HD Graphics 2000 is roughly the same performance level as the highest clock speed HD Graphics offered with Clarkdale. The Core i3-2100 and Core i5-661 deliver about the same level of performance here. Both are faster than AMD’s 890GX and all three of them are definitely playable in this test.
HD Graphics 3000 is a huge step forward. At 71.5 fps it’s 70% faster than Clarkdale’s integrated graphics, and fast enough that you can actually crank up some quality settings if you’d like. The higher end HD Graphics 3000 is also 26% faster than a Radeon HD 5450.
What Sandy Bridge integrated graphics can’t touch however is the Radeon HD 5570. At 112.5 fps, the 5570’s compute power gives it a 57% advantage over Intel’s HD Graphics 3000.
Dawn of War II
Dawn of War II is an RTS title that ships with a built in performance test. I ran at the lowest quality settings at 1024x768.
Here the Core i7-2600K and 2500K fall behind the Radeon HD 5450. The 5450 manages a 25% lead over the HD Graphics 3000 on the 2600K. It's interesting to note the tangible performance difference enabled by the higher max graphics turbo frequency of the 2600K (1350MHz vs. 1100MHz). It would appear that Dawn of War II is largely compute bound on these low-end GPUs.
Compared to last year's Intel HD Graphics, the performance improvement is huge. Even the HD Graphics 2000 is almost 30% faster than the fastest Intel offered with Clarkdale. While I wouldn't view Clarkdale as being useful graphics, at the performance levels we're talking about now game developers should at least be paying attention to Intel's integrated graphics.
Call of Duty: Modern Warfare 2
Our Modern Warfare 2 benchmark is a quick FRAPS run through a multiplayer map. All settings were turned down/off and we ran at 1024x768.
The Intel HD Graphics 3000 enabled chips are able to outpace the Radeon HD 5450 by at least 5%. The 2000 model isn't able to do as well, losing out to even the 890GX. On the notebook side this won't be an issue but for desktops with integrated graphics, it is a problem as most will have the lower end GPU.
The performance improvement over last year's Clarkdale IGP is at least 30%, and more if you compare to the more mainstream Clarkdale SKUs.
BioShock 2
Our test is a quick FRAPS runthrough in the first level of BioShock 2. All image quality settings are set to low, resolution is at 1024x768.
Once again the HD Graphics 3000 GPUs are faster than the Radeon HD 5450; it's the 2000 model that's slower. In this case the Core i3-2100 is actually slightly slower than last year's Core i5-661.
World of Warcraft
Our WoW test is run at fair quality settings (with weather turned down all the way) on a lightly populated server in an area where no other players are present to produce repeatable results. We ran at 1024x768.
The high-end HD Graphics 3000 SKUs do very well vs. the Radeon HD 5450 once again. We're at more than playable frame rates in WoW with all of the Sandy Bridge parts, although the two K-series SKUs are obviously a bit smoother.
HAWX
Our HAWX performance tests were run with the game's built in benchmark in DX10 mode. All detail settings were turned down/off and we ran at 1024x768.
The Radeon HD 5570 continues to be completely untouchable. While Sandy Bridge can compete in the ~$40-$50 GPU space, anything above that is completely out of its reach. That isn't too bad considering Intel spent all of 114M transistors on the SNB GPU, but I do wonder if Intel will be able to push up any higher in the product stack in future GPUs.
Once again the HD Graphics 2000 GPU is a bit too slow for my tastes, just barely edging out the fastest Clarkdale GPU.
Starcraft II
We have two Starcraft II benchmarks: a GPU and a CPU test. The GPU test is mostly a navigate-around-the-map test, as scrolling and panning around tends to be the most GPU bound in the game. Our CPU test involves a massive battle of six armies in the center of the map, stressing the CPU more than the GPU. At these low quality settings, however, both benchmarks are influenced by CPU and GPU.
Starcraft II is really a strong point of Sandy Bridge's graphics. It's more than fast enough to run one of the most popular PC games out today. You can easily crank up quality settings or resolution without turning the game into a slideshow. Of course, low quality SC2 looks pretty weak compared to medium quality, but it's better than nothing.
Our CPU test actually ends up being GPU bound with Intel's integrated graphics; AMD's 890GX is actually faster here:
Call of Duty: Black Ops
Call of Duty: Black Ops is basically unplayable on Sandy Bridge integrated graphics. I'm guessing this is not a compute bound scenario but rather an optimization problem for Intel. You'll notice there's hardly any difference between the performance of the 2000 and 3000 GPUs, indicating a bottleneck elsewhere. It could be memory bandwidth. Despite the game's near-30fps frame rate, there's way too much stuttering and jerkiness during the game to make it enjoyable.
Mafia II
Mafia II ships with a built in benchmark which we used for our comparison.
Frame rates are pretty low here, definitely not what I'd consider playable. This is a fact across the board though; you need to spend at least $70 on a GPU to get a playable experience here.
Civilization V
For our Civilization V test we're using the game's built in lateGameView benchmark. The test was run in DX9 mode with everything turned down at 1024x768:
Performance here is pretty low. Even a Radeon HD 5450 isn't enough to get you smooth frame rates; a discrete GPU is just necessary for some games. Civ V does have the advantage of not depending on high frame rates, though; the mouse input is decoupled from rendering, so you can generally interact with the game even at low frame rates.
Metro 2033
We're using the Metro 2033 benchmark that comes with the patched game. Occasionally I noticed rendering issues at the Metro 2033 menu screen but I couldn't reproduce the problem regularly on Intel's HD Graphics.
Metro 2033 and many newer titles are just not playable at smooth frame rates on anything this low-end. Intel integrated graphics as well as low-end discrete GPUs are best paired with older games.
DiRT 2
Our DiRT 2 performance numbers come from the demo's built-in benchmark:
DiRT 2 is another game that needs compute power, and the faster 2600K gets a decent boost from the higher clock speed. Frame rates are relatively consistent as well, though you'll get dips into the low 20s and teens at times, so at these settings the game is borderline playable. (Drop to Ultra Low if you need higher performance.)
Resolution Scaling with Intel HD Graphics 3000
All of our tests on the previous page were done at 1024x768, but how much of a hit do you really get when you push higher resolutions? Does the gap widen between a discrete GPU and Intel's HD Graphics as you increase resolution?
On the contrary: low-end GPUs run into memory bandwidth limitations just as quickly as (if not more quickly than) Intel's integrated graphics. Spend about $70 and you'll see a wider gap, but if you pit Intel's HD Graphics 3000 against a Radeon HD 5450 the two actually get closer in performance the higher the resolution is—at least in memory bandwidth bound scenarios:
Call of Duty: Modern Warfare 2 stresses compute a bit more at higher resolutions and thus the performance gap widens rather than closes:
For the most part, at low quality settings, Intel's HD Graphics 3000 scales with resolution similarly to a low-end discrete GPU.
Graphics Quality Scaling
The biggest issue with integrated and any sort of low-end graphics is that you have to run games at absurdly low quality settings to avoid dropping below smooth frame rates. The impact of going to higher quality settings is much greater on Intel's HD Graphics 3000 than on a discrete card as you can see by the chart below.
The performance gap between the two is actually its widest at WoW's "Good" quality settings. Moving beyond that however shrinks the gap a bit as the Radeon HD 5450 runs into memory bandwidth/compute bottlenecks of its own.
Overclocking Intel's HD Graphics
The base clock of both Intel's HD Graphics 2000 and 3000 on desktop SKUs is 850MHz. Thankfully, Intel's 32nm process allows for much headroom in both the CPU and GPU for overclocking. There are no clock locks or K-series parts to worry about when it comes to GPU overclocking; everything is unlocked. I started by trying to see how far I could push the Core i3-2100's HD Graphics 2000.
While I could get into Windows and run games at up to 1.6GHz, I needed to back down to 1.4GHz to maintain stability across all of our tests. That's a 64.7% overclock:
In some cases (Civilization V, WoW, Dawn of War II), the overclocked HD Graphics 2000 was enough to bring the 6 EU part close to the performance of the 3000 model. For the most part however the overclock just helped the Core i3-2100 land roughly halfway between its stock performance and that of the Core i5-2500K.
I tried the same experiment with the Core i5-2500K. While there's no chance it could catch up to a Radeon HD 5570, I managed to overclock my 2500K to 1.55GHz (the GPU clock can be adjusted in 50MHz increments):
The 82.4% increase in clock speed resulted in anywhere from a 0.6% to 33.7% increase in performance. While that's not terrible, it's also not that great. It looks like we're fairly memory bandwidth constrained here.
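To show where those percentages come from, and how little of the extra clock speed turns into frame rate, here is a short sketch. The "scaling efficiency" helper is my own shorthand for performance gain divided by clock gain; it is not a standard metric.

```python
BASE_CLOCK_MHZ = 850  # stock GPU clock on desktop HD Graphics 2000/3000


def percent_increase(old, new):
    """Percentage increase going from `old` to `new`."""
    return (new - old) / old * 100


def scaling_efficiency(perf_gain_pct, clock_gain_pct):
    """Fraction of the clock increase that shows up as performance."""
    return perf_gain_pct / clock_gain_pct


i3_2100_overclock = percent_increase(BASE_CLOCK_MHZ, 1400)   # HD Graphics 2000 at 1.4GHz
i5_2500k_overclock = percent_increase(BASE_CLOCK_MHZ, 1550)  # HD Graphics 3000 at 1.55GHz

# Best case observed on the 2500K: a 33.7% performance gain.
best_case = scaling_efficiency(33.7, i5_2500k_overclock)

print(f"Core i3-2100 GPU overclock: {i3_2100_overclock:.1f}%")
print(f"Core i5-2500K GPU overclock: {i5_2500k_overclock:.1f}%")
print(f"Best-case scaling efficiency on the 2500K: {best_case:.0%}")
```

Even in the best case only around 40% of the added clock speed becomes extra performance, which fits with something other than raw GPU compute holding things back.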
To keep the review length manageable we're presenting a subset of our results here. For all benchmark results and even more comparisons be sure to use our performance comparison tool: Bench.
ASUS P7H57D-V EVO (Intel H57)
Intel X25-M SSD (80GB)
Crucial RealSSD C300
Corsair DDR3-1600 2x4GB (9-9-9-24)
Corsair DDR3-1333 4x1GB (7-7-7-20)
Corsair DDR3-1333 2x2GB (7-7-7-20)
Patriot DDR3-1600 2x4GB (9-9-9-24)
eVGA GeForce GTX 280 (Vista 64)
ATI Radeon HD 5870 (Windows 7)
MSI GeForce GTX 580 (Windows 7)
AMD Catalyst 10.12 (Windows 7)
NVIDIA ForceWare 293.09 (Windows 7)
ATI Catalyst 9.12 (Windows 7)
NVIDIA ForceWare 180.43 (Vista64)
NVIDIA ForceWare 178.24 (Vista32)
Windows Vista Ultimate 32-bit (for SYSMark)
Windows Vista Ultimate 64-bit
Windows 7 x64
Special thanks to Corsair for sending an 8GB Vengeance kit for this review:
As well as Patriot for sending an 8GB Viper Xtreme kit:
All of our brand new tests (Civilization V, Visual Studio) use 8GB memory configurations enabled by both Corsair and Patriot.
General Performance: SYSMark 2007
Our journey starts with SYSMark 2007, the only all-encompassing performance suite in our review today. The idea here is simple: one benchmark to indicate the overall performance of your machine. SYSMark 2007 ends up being more of a dual-core benchmark as the applications/workload show minimal use of more than two threads.
The 2600K is our new champion; the $317 chip is faster than Intel's Core i7 980X here as SYSMark 2007 doesn't really do much with the latter's extra 2 cores. Even the 2500K is a hair faster than the 980X. Compared to the Core i5 750, the upgrade is a no-brainer - Sandy Bridge is around 20% faster at the same price point as Lynnfield.
Compared to Clarkdale, however, the Core i3 2100 only manages a 5% advantage.
Adobe Photoshop CS4 Performance
To measure performance under Photoshop CS4 we turn to the Retouch Artists’ Speed Test. The test does basic photo editing; there are a couple of color space conversions, many layer creations, color curve adjustment, image and canvas size adjustment, unsharp mask, and finally a gaussian blur performed on the entire image.
The whole process is timed and thanks to the use of Intel's X25-M SSD as our test bed hard drive, performance is far more predictable than back when we used to test on mechanical disks.
Time is reported in seconds and the lower numbers mean better performance. The test is multithreaded and can hit all four cores in a quad-core machine.
Once again, we have a new king - the 2600K is 9.7% faster than the 980X in our Photoshop CS4 test and the 2500K is just about equal to it. The Core i3 2100 does much better compared to the i3 540, outpacing it by around 30% and nearly equaling the performance of AMD's Phenom II X6 1100T.
Video Encoding Performance
Our DivX test is the same DivX / XMpeg 5.03 test we've run for the past few years now. The 1080p source file is encoded using the unconstrained DivX profile, quality/performance is set balanced at 5, and enhanced multithreading is enabled.
Despite the greatness that is Quick Sync, there are no editing/high quality transcode tools that support Intel's hardware transcode engine. Luckily, Sandy Bridge is still very fast when it comes to software encoding. Our WME test, however, shows only minimal gains from the architectural improvements.
Graysky's x264 HD test uses x264 to encode a 4Mbps 720p MPEG-2 source. The focus here is on quality rather than speed, thus the benchmark uses a 2-pass encode and reports the average frame rate in each pass.
Other than the Core i7 980X, there's nothing quicker than Sandy Bridge. The Core i7 2600K is 10% faster than the Core i7 975, and the 2500K easily outpaces its Lynnfield rivals. The i3 2100 is quicker than its predecessor, but not by much. In these heavily threaded situations, AMD's Athlon II X4 645 is a better option than the 2100.
3D Rendering Performance
Today's desktop processors are more than fast enough to do professional level 3D rendering at home. To look at performance under 3dsmax we ran the SPECapc 3dsmax 8 benchmark (only the CPU rendering tests) under 3dsmax 9 SP1. The results reported are the rendering composite scores.
At the risk of sounding like a broken record, we have a new champ once more. The 2600K is slightly ahead of the 980X here, while the 2500K matches the performance of the i7 975 without Hyper Threading enabled. You really can't beat the performance Intel is offering here.
The i3 2100 is 11% faster than last year's i3 540, and delivers the same performance as the Athlon II X4 645.
Created by the Cinema 4D folks we have Cinebench, a popular 3D rendering benchmark that gives us both single and multi-threaded 3D rendering results.
Single threaded performance sees a huge improvement with Sandy Bridge. Even the Core i3 2100 is faster than the 980X in this test. Regardless of workload, light or heavy, Sandy Bridge is the chip to get.
POV-Ray is a popular, open-source raytracing application that also doubles as a great tool to measure CPU floating point performance.
I ran the SMP benchmark in beta 23 of POV-Ray 3.73. The numbers reported are the final score in pixels per second.
File Compression/Decompression Performance
Par2 is an application used for reconstructing downloaded archives. It can generate parity data from a given archive and later use it to recover the archive.
Chuchusoft took the source code of par2cmdline 0.4 and parallelized it using Intel’s Threading Building Blocks 2.1. The result is a version of par2cmdline that can spawn multiple threads to repair par2 archives. For this test we took a 708MB archive, corrupted nearly 60MB of it, and used the multithreaded par2cmdline to recover it. The scores reported are the repair and recover time in seconds.
Here both the K-series SKUs are faster than the 980X. The Core i3 2100 manages a 13% lead over the Core i3 540.
In all of our compression tests, Sandy Bridge does very well. The 2600K is faster than the 980X in the real world compression tests, while the 7-Zip algorithm benchmark is fully threaded and shows you what is possible with six cores.
Visual Studio 2008: Compiler Performance
You guys asked for it and finally I have something I feel is a good software build test. Using Visual Studio 2008 I'm compiling Chromium. It's a pretty huge project that takes over forty minutes to compile from the command line on the Core i3 2100. But the results are repeatable and the compile process will stress all 12 threads at 100% for almost the entire time on a 980X so it works for me.
I don't have a full set of results here but I'm building up the database. The 2600K manages a 12% lead over the previous generation high end chips, but it can't touch the 980X. The 2500K does well but it is limited by its lack of Hyper Threading. The Phenom II X6 1100T beats it.
Flash Video Creation
Excel Math Performance
There's simply no better gaming CPU on the market today than Sandy Bridge. The Core i5 2500K and 2600K top the charts regardless of game. If you're building a new gaming box, you'll want a SNB in it.
Our Fallout 3 test is a quick FRAPS runthrough near the beginning of the game. We're running with a GeForce GTX 280 at 1680 x 1050 and medium quality defaults. There's no AA/AF enabled.
In testing Left 4 Dead we use a custom recorded timedemo. We run on a GeForce GTX 280 at 1680 x 1050 with all quality options set to high. No AA/AF enabled.
Far Cry 2 ships with several built in benchmarks. For this test we use the Playback (Action) demo at 1680 x 1050 in DX9 mode on a GTX 280. The game is set to medium defaults with performance options set to high.
Crysis Warhead also ships with a number of built in benchmarks. Running on a GTX 280 at 1680 x 1050 we run the ambush timedemo with mainstream quality settings. Physics is set to enthusiast however to further stress the CPU.
Our Dragon Age: Origins benchmark begins with a shift to the Radeon HD 5870. From this point on these games are run under our Bench refresh testbed under Windows 7 x64. Our benchmark here is the same thing we ran in our integrated graphics tests - a quick FRAPS walkthrough inside a castle. The game is run at 1680 x 1050 at high quality and texture options.
We're running Dawn of War II's internal benchmark at high quality defaults. Our GPU of choice is a Radeon HD 5870 running at 1680 x 1050.
Our World of Warcraft benchmark is a manual FRAPS runthrough of a lightly populated server with no other player controlled characters around. The frame rates here are higher than you'd see in a real world scenario, but the relative comparison between CPUs is accurate.
We run on a Radeon HD 5870 at 1680 x 1050. We're using WoW's high quality defaults but with weather intensity turned down all the way.
For Starcraft II we're using our heavy CPU test. This is a playback of a 3v3 match where all players gather in the middle of the map for one large, unit-heavy battle. While GPU plays a role here, we're mostly CPU bound. The Radeon HD 5870 is running at 1024 x 768 at medium quality settings to make this an even more pure CPU benchmark.
This is Civ V's built in Late GameView benchmark, the newest addition to our gaming test suite. The benchmark outputs three scores: a full render score, a no-shadow render score and a no-render score. We present the first and the last, acting as a GPU and CPU benchmark respectively.
We're running at 1680 x 1050 with all quality settings set to high. For this test we're using a brand new testbed with 8GB of memory and a GeForce GTX 580.
Power consumption is very low thanks to core power gating and Intel's 32nm process. When the integrated GPU is not in use, it is also completely power gated so as not to waste any power. The end result is lower power consumption than virtually any other platform out there under load.
I also measured power at the ATX12V connector to give you an idea of what actual CPU power consumption is like (excluding the motherboard, PSU loss, etc...):
| Processor | Idle | Load (Cinebench R11.5) |
|---|---|---|
| Intel Core i7 2600K @ 4.4GHz | 5W | 111W |
| Intel Core i7 2600K (3.4GHz) | 5W | 86W |
| AMD Phenom II X4 975 BE (3.6GHz) | 14W | 96W |
| AMD Phenom II X6 1100T (3.3GHz) | 20W | 109W |
| Intel Core i5 661 (3.33GHz) | 4W | 33W |
| Intel Core i7 880 (3.06GHz) | 3W | 106W |
Idle power is a strength of Intel's: the cores are fully power gated when idle, resulting in these great single-digit power levels. Under load, there's actually not too much difference between an i7 2600K and a 3.6GHz Phenom II (only 10W). There's obviously a big difference in performance, however (7.45 vs. 4.23 for the Phenom II in Cinebench R11.5), which gives Intel better performance per watt. The fact that AMD is able to add two more cores at only a 13W load and 300MHz frequency penalty is pretty impressive as well.
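To put a rough number on the performance per watt point, here's a minimal sketch using only the figures quoted above. It assumes the 7.45 Cinebench R11.5 score belongs to the stock 3.4GHz 2600K (the 86W row) and the 4.23 score to the 3.6GHz Phenom II X4 975 BE (the 96W row). The dictionary layout and function name are just for illustration.

```python
# Load power (ATX12V connector) and Cinebench R11.5 scores quoted in the text.
# Assumption: the 7.45 score is paired with the stock 2600K's 86W load figure.
chips = {
    "Core i7 2600K (3.4GHz)": {"load_w": 86, "cinebench": 7.45},
    "Phenom II X4 975 BE (3.6GHz)": {"load_w": 96, "cinebench": 4.23},
}


def points_per_watt(chip):
    """Cinebench points delivered per watt of CPU load power."""
    return chip["cinebench"] / chip["load_w"]


efficiency = {name: points_per_watt(chip) for name, chip in chips.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.3f} points per watt")

# How many times more work per watt the 2600K does in this one test.
advantage = (
    efficiency["Core i7 2600K (3.4GHz)"]
    / efficiency["Phenom II X4 975 BE (3.6GHz)"]
)
print(f"2600K advantage: {advantage:.2f}x")
```

Keep in mind this is CPU-only power in a single benchmark, so it says nothing about total platform efficiency.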
In terms of absolute CPU performance, Sandy Bridge doesn't actually move things forward. This isn't another ultra-high-end CPU launch, but rather a refresh for the performance mainstream and below. As one AnandTech editor put it, you get yesterday's performance at a much lower price point. Lynnfield took away a lot of the reason to buy an X58 system as it delivered most of the performance with much more affordable motherboards; Sandy Bridge all but puts the final nail in X58's coffin. Unless you're running a lot of heavily threaded applications, I would recommend a Core i7-2600K over even a Core i7-980X. While six cores are nice, you're better off pocketing the difference in cost and enjoying nearly the same performance across the board (if not better in many cases).
In all but the heaviest threaded applications, Sandy Bridge is the fastest chip on the block—and you get the performance at a fairly reasonable price. The Core i7-2600K is tempting at $317 but the Core i5-2500K is absolutely a steal at $216. You're getting nearly $999 worth of performance at roughly a quarter of the cost. Compared to a Core i5-750/760, you'll get an additional 10-50% performance across the board in existing applications, and all that from a ~25% increase in clock speed. A big portion of what Sandy Bridge delivers is due to architectural enhancements, the type of thing we've come to expect from an Intel tock. Starting with Conroe, repeating with Nehalem, and going strong once more with Sandy Bridge, Intel makes this all seem so very easy.
Despite all of the nastiness Intel introduced by locking/limiting most of the Sandy Bridge CPUs, if you typically spend around $200 on a new CPU then Sandy Bridge is likely a better overclocker than anything you've ever owned before it. The biggest loser in the overclock locks is the Core i3 which now ships completely locked. Thankfully AMD has taken care of the low-end segments very well over the past couple of years. All Intel is doing by enforcing clock locks for these lower end chips is sending potential customers AMD's way.
The Core i3-2100 is still a step forward, but not nearly as much of one as the 2500K. For the most part you're getting a 5-20% increase in performance (although we did notice some 30-40% gains), but you're giving up overclocking as an option. For multithreaded workloads you're better off with an Athlon II X4 645; however, for lightly threaded work or a general purpose PC the Core i3-2100 is likely faster.
If this were a normal CPU, I'd probably end here, but Sandy Bridge is no normal chip. The on-die GPU and Quick Sync are both noteworthy additions. Back in 2006 I wondered if Intel would be able to stick to its aggressive tick-tock cadence. Today there's no question of whether or not Intel can do it. The question now is whether Intel will be able to sustain a similarly aggressive ramp in GPU performance and feature set. Clarkdale/Arrandale were both nice, but they didn't do much to compete with low-end discrete GPUs. Intel's HD Graphics 3000 makes today's $40-$50 discrete GPUs redundant. The problem there is we've never been happy with $40-$50 discrete GPUs for anything but HTPC use. What I really want to see from Ivy Bridge and beyond is the ability to compete with $70 GPUs. Give us that level of performance and then I'll be happy.
The HD Graphics 2000 is not as impressive. It's generally faster than what we had with Clarkdale, but it's not exactly moving the industry forward. Intel should just do away with the 6 EU version, or at least give more desktop SKUs the 3000 GPU. The lack of DX11 is acceptable for SNB consumers but it's—again—not moving the industry forward. I believe Intel does want to take graphics seriously, but I need to see more going forward.
Game developers need to put forth some effort as well. Intel has clearly tried to fix some of its bad reputation this go around, so simply banning SNB graphics from games isn't helping anyone. Hopefully both sides will put in the requisite testing time to actually improve the situation.
Quick Sync is just awesome. It's simply the best way to get videos onto your smartphone or tablet. Not only do you get most if not all of the quality of a software based transcode, you get performance that's better than what high-end discrete GPUs are able to offer. If you do a lot of video transcoding onto portable devices, Sandy Bridge will be worth the upgrade for Quick Sync alone.
For everyone else, Sandy Bridge is easily a no-brainer. Unless you already have a high-end Core i7, this is what you'll want to upgrade to.