While compute functionality could technically be shoehorned into DirectX 10 GPUs such as Sandy Bridge through DirectCompute 4.x, neither Intel nor AMD's DX10 GPUs were really meant for the task, and even NVIDIA's DX10 GPUs paled in comparison to what they've achieved with their DX11 generation GPUs. As a result, Ivy Bridge is the first truly compute-capable GPU from Intel. This marks an interesting step in the evolution of Intel's GPUs, as originally projects such as Larrabee Prime were supposed to help Intel bring together CPU and GPU computing by creating an x86-based GPU. With Larrabee Prime canceled, however, that task falls to the latest rendition of Intel's GPU architecture.

With Ivy Bridge Intel will be supporting both DirectCompute 5—which is dictated by DX11—and the more general, compute-focused OpenCL 1.1. Intel has backed OpenCL development for some time and currently offers an OpenCL 1.1 runtime for their CPUs; however, an OpenCL runtime for Ivy Bridge will not be available at launch. As a result Ivy Bridge is limited to DirectCompute for the time being, which limits just what kind of compute performance testing we can do with it.
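
To put the runtime situation in concrete terms, the short sketch below (our own illustrative OpenCL 1.1 host code, nothing Intel ships) simply enumerates the installed OpenCL platforms and asks each of them for a GPU device. On a launch-day Ivy Bridge system we'd expect the Intel platform to report only its CPU runtime until the GPU driver shows up, which is exactly why our OpenCL testing has to wait.

    /* Minimal sketch (illustrative, error handling trimmed): list each OpenCL
       platform and whether it exposes a GPU device. */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platforms[8];
        cl_uint num_platforms = 0;
        clGetPlatformIDs(8, platforms, &num_platforms);

        for (cl_uint i = 0; i < num_platforms; ++i) {
            char name[256] = "unknown";
            clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(name), name, NULL);

            cl_device_id gpu;
            cl_uint num_gpus = 0;
            /* Returns CL_DEVICE_NOT_FOUND when the platform has no GPU runtime installed. */
            cl_int err = clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_GPU, 1, &gpu, &num_gpus);

            printf("%s: %s\n", name,
                   (err == CL_SUCCESS && num_gpus > 0) ? "GPU device present" : "no GPU device");
        }
        return 0;
    }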

Our first compute benchmark comes from Civilization V, which uses DirectCompute 5 to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. And while games that use GPU compute functionality for texture decompression are still rare, the technique is becoming increasingly common, as it's a practical way to pack textures in whatever format is best suited for shipping rather than being limited to DX texture compression.

As we alluded to in our look at Civilization V's performance in game mode, Ivy Bridge ends up being compute limited here. It's well ahead of the even more DirectCompute-anemic Radeon HD 5450—in spite of the fact that it can't take a lead in game mode—but it slightly trails the GT 520, which has a similar amount of compute performance on paper. This largely confirms what we know from the HD 4000's specs: it can pack a punch in pushing pixels, but in a shader-heavy scenario it's going to have a great deal of trouble keeping up with Llano and its much greater shader performance.

But with that said, Ivy Bridge is still reaching 55% of Llano's performance here, thanks to AMD's overall lackluster DirectCompute performance on their pre-7000 series GPUs. As a result Ivy Bridge versus Llano isn't nearly as lop-sided as the paper specs tell us; Ivy Bridge won't be able to keep up in most situations, but in DirectCompute it isn't necessarily a goner.

And to prove that point, we have our second compute test: the Fluid Simulation Sample in the DirectX 11 SDK. This program simulates the motion and interactions of a 16k particle fluid using a compute shader, with a choice of several different algorithms. In this case we’re using an O(n^2) nearest neighbor method that is optimized by using shared memory to cache data.
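
To make the "shared memory to cache data" point a bit more concrete, here is a rough sketch of that tiling pattern. The actual SDK sample is an HLSL compute shader; what follows is an analogous OpenCL C kernel of our own, with illustrative names (density_pass, tile, h2) and a placeholder smoothing weight, showing how each work-group stages a block of particle positions in fast on-chip local memory before running its O(n^2) inner loop against that block.

    /* Illustrative only: each work-group cooperatively loads a tile of particle
       positions into local (on-chip shared) memory, then every work-item runs
       its O(n^2) neighbor loop against the cached tile instead of re-reading
       global memory for every pair. */
    __kernel void density_pass(__global const float4 *pos,
                               __global float *density,
                               const uint numParticles,
                               const float h2,              /* smoothing radius squared */
                               __local float4 *tile)        /* one tile per work-group */
    {
        const uint gid = (uint)get_global_id(0);
        const uint lid = (uint)get_local_id(0);
        const uint lsz = (uint)get_local_size(0);
        const float4 myPos = pos[min(gid, numParticles - 1u)];
        float rho = 0.0f;

        for (uint base = 0; base < numParticles; base += lsz) {
            /* Stage the next block of positions in local memory. */
            tile[lid] = pos[min(base + lid, numParticles - 1u)];
            barrier(CLK_LOCAL_MEM_FENCE);

            for (uint j = 0; j < lsz; ++j) {
                float3 d = myPos.xyz - tile[j].xyz;
                /* Placeholder weight; a real SPH smoothing kernel would go here. */
                rho += max(h2 - dot(d, d), 0.0f);
            }
            barrier(CLK_LOCAL_MEM_FENCE);
        }

        if (gid < numParticles)
            density[gid] = rho;
    }

The results below suggest Ivy Bridge services exactly this kind of local-memory traffic very quickly, which is a big part of why this sample plays so strongly to its favor.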

Thanks in large part to its new dedicated graphics L3 cache, Ivy Bridge does exceptionally well here. The framerate of this test is entirely arbitrary, but the performance relative to other GPUs isn't; Ivy Bridge is well within the territory of budget-level dGPUs such as the GT 430 and Radeon HD 5570, and for the first time it's ahead of Llano, taking a lead just shy of 10%. The fluid simulation sample is a very special case—most compute shaders won't be nearly this heavily reliant on shared memory performance—but it's the perfect showcase for Ivy Bridge's ideal performance scenario. Ultimately this is just as much a story of AMD losing due to poor DirectCompute performance as it is Intel winning due to a speedy L3 cache, but it shows what is possible. The big question now is what OpenCL performance is going to be like, since AMD's OpenCL performance doesn't have the same kind of handicaps as their DirectCompute performance.

Synthetic Performance

Moving on, we'll take a few moments to look at synthetic performance. Synthetic performance is a poor tool to rank GPUs—what really matters is the games—but by breaking down workloads into discrete tasks it can sometimes tell us things that we don't see in games.

Our first synthetic test is 3DMark Vantage’s pixel fill test. Typically this test is memory bandwidth bound: it has the ROPs pushing as many pixels as possible with as little overhead as possible, which shifts the bottleneck to memory bandwidth so long as there's enough ROP throughput in the first place.

It's interesting to note here that as DDR3 clockspeeds have crept up over time, IVB now has as much memory bandwidth as most entry-to-mainstream level video cards, where 128-bit DDR3 is equally common. Or on a historical basis, at this point it has half as much bandwidth as powerhouse video cards of yesteryear such as the 256-bit GDDR3-based GeForce 8800 GT.
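
As a quick back-of-the-envelope check on those comparisons (our own arithmetic, not a vendor figure): a dual-channel DDR3-1866 interface is 128 bits wide, so peak bandwidth works out to roughly 1866 MT/s × 16 bytes per transfer, or just under 30GB/sec, which is the same math that applies to a 128-bit DDR3-1866 video card. The 8800 GT's 256-bit GDDR3 at 1800MT/s comes out to about 1800 MT/s × 32 bytes, or roughly 57.6GB/sec, hence "half as much."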

Altogether, with 29.6GB/sec of memory bandwidth available from our DDR3-1866 memory, Ivy Bridge ends up being able to push more pixels than Llano, more pixels than the entry-level dGPUs, and even more pixels than budget-level dGPUs such as the GT 440 and Radeon HD 5570, which have just as much dedicated memory bandwidth. Or put in numbers, Ivy Bridge is pushing 42% more pixels than Sandy Bridge and 25% more pixels than the otherwise more powerful Llano. And since pixel fillrates are so memory bandwidth bound, Intel's L3 cache is almost certainly once again playing a role here, though it's not clear to what extent that's the case.

Moving on, our second synthetic test is 3DMark Vantage’s texture fill test, which provides a simple FP16 texture throughput test. FP16 textures are still fairly rare, but this is a good look at worst-case texturing performance.

After Ivy Bridge's strong pixel fillrate performance, its texture fillrate brings us back down to earth. At this point performance is once again much closer to the entry-level GPUs, and also well behind Llano. Intel's texture performance scales almost exactly linearly with the increase in EUs from Sandy Bridge to Ivy Bridge, indicating that those texture units are being put to good use, but at the same time it means Ivy Bridge has a long way to go to catch Llano's texture performance, achieving only 47% of Llano's performance here. The good news for Intel is that texture size (and thereby texel density) hasn't increased much over the past couple of years in most games; the bad news is that we're finally starting to see that change as dGPUs get more VRAM.

Our final synthetic test is the set of tessellation tests we run with Microsoft’s Detail Tessellation sample program out of the DX11 SDK. Since IVB is the first Intel iGPU with tessellation capabilities, it will be interesting to see how well IVB does here, as IVB is going to be the de facto baseline for DX11+ games in the future. Ideally we want to see enough tessellation performance here that tessellation can be used on a global level, allowing developers to efficiently simulate their worlds with fewer polygons while still using many polygons in the final render.

The results here are actually pretty decent. Compared to what we've seen with shader and texture performance, where Ivy Bridge is largely tied at the hip with the GT 520, at lower tessellation factors Ivy Bridge manages to clearly overcome both the GT 520 and the Radeon HD 5450. Per unit of compute performance, Intel looks to have more tessellation throughput than AMD or NVIDIA, which means Intel is setting a pretty good baseline for tessellation performance. Performance at high tessellation factors does dip, however, with Ivy Bridge giving up much of its lead over the entry-level dGPUs while still managing to stay ahead of both of its competitors.

Comments

  • JarredWalton - Tuesday, April 24, 2012

    I don't think it's a mystery. It's straight fact: "One problem Intel does currently struggle with is game developers specifically targeting Intel graphics and treating the GPU as a lower class citizen."

    It IS a problem, and it's one INTEL has to deal with. They need more advocates with game developers, they need to make better drivers, and they need to make faster hardware. We know exactly why this has happened: Intel IGPs failed to run games properly for so long that a lot of developers gave up and just blacklisted Intel. Now Intel is actually capable of running most games, and so long as they aren't explicitly blacklisted things should be okay.

    In truth, the only title I can think of from recent history where Intel could theoretically work but was blacklisted by the game developer is Fallout 3. Even today, if you want to run FO3 on Intel IGP (HD 2000/3000/4000), you need to download a hacked DLL that will identify your Intel GPU as an NVIDIA GT 9800 or something.

    And really, there's no need to blacklist by game developers, because you can't predict the future. FO3 is the perfect example: it runs okay on HD 3000 and plenty fast on HD 4000, but the shortsighted developers locked out Intel for all time. It's better to pop up a warning like some games do: "Warning: we don't recognize your driver and the game may not run properly." Blacklisting is almost more of a political statement IMO.
  • craziplaya21 - Monday, April 23, 2012

    I might be blind or something but did you guys not do a comparison between an original Blu-ray's IQ vs the IQ of a 1080p encode from Quick Sync??
  • toyotabedzrock - Monday, April 23, 2012

    Why is Intel disabling this on the K parts? And why disable vPro?
  • jwcalla - Monday, April 23, 2012

    First, a diversion: "I was able to transcode a complete 130 minute 1080p video to an iPad friendly format..." Just kill me. Somebody please. Why do consumers put up with this crap? Even my ancient Galaxy S has better media playback support.

    It's the same story with my HP TouchPad: MP4 container or GTFO. Who can stand to re-encode their media libraries or has the patience to deal with DLNA slingers when the hardware is perfectly capable of curb-stomping any container / codec you could even conceive? Just get an Android tablet if this is the crap they force on you. Or, in the TouchPad case, wipe it and install ICS.

    As for the article... did I totally misunderstand the page about power consumption? I got the impression that idle power is relatively unchanged. I must be misreading that. Or maybe the lower-end chips will show a stark improvement. Otherwise I totally miss the point of IVB.

    I'm beginning to lose confidence in Intel, at least in terms of innovation. These tick-tock improvements are basically minor pushes in the same boring direction. From an enthusiast's perspective, the stuff going into ARM SoCs is so much more interesting. Intel makes great high-end CPUs, but it seems these are becoming less important when looking at the consumer market as a whole.
  • Anand Lal Shimpi - Monday, April 23, 2012

    Idle power didn't really go down because at idle nearly everything is power gated to begin with. Any improvements in leakage current don't help if the transistors aren't leaking to begin with :)

    Your ARM sentiments are spot on for a huge portion of the market however. Let's see what Haswell brings...

    Take care,
    Anand
  • thomas-hrb - Monday, April 23, 2012

    I disagree with the testing methodology for the World of Warcraft test. Firstly, no gamer buys hardware so they can go to the most isolated areas in a game. Also, the percentage of people who can pay for one of these CPUs but would be playing at 1680x1050 would be pretty small.

    I've been playing WoW for a number of years and I don't care about 60fps+ because my monitor won't display it anyway. I care about minimum fps and average fps. NVIDIA's new adaptive vsync is a great innovation, but I am sure there are other tests that, while not as controlled and repeatable, are much more indicative of real world performance (the actual reason behind purchasing decisions).

    One possible testing methodology you could look into is to take a character into one of the top-end 25-man raids. There are 10 classes in WoW, and my experience is that a 25-man raid will show off every single possible spell/ability and effect that the game has to offer in fairly repeatable patterns.

    I agree that it is not the most scientific approach but I put more stock in a friend saying "go buy this cpu/gpu you can do all the raids and video capture and you get no lag" than you telling me that this cpu will give me 100+ fps in the middle of nowhere. There is a fine line between efficient and effective. I am just hoping that you can dial down the efficiency and come up with a testing methodology that actually produces a metric I can use in my purchasing decisions. After all that is one of the core reasons most people read reviews at all.
  • redisnidma - Monday, April 23, 2012

    Expect Anand's Trinity review to be heavily biased with lots of AMD bashing.
    This site is so predictable...
  • Nfarce - Monday, April 23, 2012

    Oh boy. Another delusional red label fangirl. Maybe when AMD gets their s**t together Anandtech will have something positive to review in comparison to the Intel offerings at the moment. Bulldozer bulldozed right off a cliff. And don't get me wrong: I WANT AMD to whip out some butt-kicking CPUs to keep the competition strong. But right now, Intel is not getting complacent and keeps stepping up their game when the competition isn't even on the same playing court. But that's just for now. If AMD continues to falter, Intel may not be as motivated to stay ahead and spend so much on R&D in the future. After all, why put the latest F1 car on the track when the competition can only bring a NASCAR car to every track?
  • Reikon - Monday, April 23, 2012

    Temperature is in the overclocking article.

    http://www.anandtech.com/show/5763/undervolting-an...
  • rickthestik - Monday, April 23, 2012

    An upgrade for me makes sense as my current CPU is an Intel Core 2 Quad and the new i7-3770K will be a pretty significant upgrade...2.34GHz to 3.5GHz and heaps of additional tech to go with it.
    I could see a fair number of Sandy Bridge owners holding off for Haswell, though for me this jump is pretty big and I'm looking forward to seeing what the i7-3770K can do with the Z77 motherboards and a shiny new PCIe 3.0 GPU.
