While compute functionality could technically be shoehorned into DirectX 10 GPUs such as Sandy Bridge through DirectCompute 4.x, neither Intel's nor AMD's DX10 GPUs were really meant for the task, and even NVIDIA's DX10 GPUs paled in comparison to what NVIDIA has achieved with its DX11 generation. As a result, Ivy Bridge is the first truly compute-capable GPU from Intel. This marks an interesting step in the evolution of Intel's GPUs: originally, projects such as Larrabee Prime were supposed to help Intel bring together CPU and GPU computing by creating an x86-based GPU. With Larrabee Prime canceled, however, that task falls to the latest iteration of Intel's GPU architecture.

With Ivy Bridge Intel will be supporting both DirectCompute 5—which is dictated by DX11—and the more general, compute-focused OpenCL 1.1. Intel has backed OpenCL development for some time and currently offers an OpenCL 1.1 runtime for its CPUs; however, an OpenCL runtime for Ivy Bridge will not be available at launch. As a result Ivy Bridge is limited to DirectCompute for the time being, which in turn limits what kind of compute performance testing we can do with it.

Our first compute benchmark comes from Civilization V, which uses DirectCompute 5 to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game's leader scenes. While games that use GPU compute functionality for texture decompression are still rare, the technique is becoming increasingly common, as it's a practical way to pack textures in the most suitable manner for shipping rather than being limited to DX texture compression.
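Civ V's decompression shader itself isn't public—and its compressed format is Firaxis's own—but the general shape of such a kernel is easy to illustrate. The sketch below, written in CUDA rather than DirectCompute, decodes standard BC1/DXT1 blocks into RGBA8 texels, one thread per 4x4 block; treat it as a minimal stand-in for the technique, not Firaxis's implementation, and all names here are ours.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// One 64-bit BC1/DXT1 block: two RGB565 endpoints plus 16 2-bit indices.
struct BC1Block { uint16_t c0, c1; uint32_t idx; };

__device__ void unpack565(uint16_t c, int& r, int& g, int& b)
{
    r = (c >> 11) & 0x1F;  r = (r << 3) | (r >> 2);  // 5 -> 8 bits
    g = (c >> 5)  & 0x3F;  g = (g << 2) | (g >> 4);  // 6 -> 8 bits
    b =  c        & 0x1F;  b = (b << 3) | (b >> 2);  // 5 -> 8 bits
}

__device__ uint32_t packRGBA(int r, int g, int b)
{
    return 0xFF000000u | ((uint32_t)b << 16) | ((uint32_t)g << 8) | (uint32_t)r;
}

// One thread decodes one 4x4 block straight into the output image.
__global__ void decodeBC1(const BC1Block* blocks, uint32_t* out, int width, int height)
{
    int bx = blockIdx.x * blockDim.x + threadIdx.x;
    int by = blockIdx.y * blockDim.y + threadIdx.y;
    if (bx >= width / 4 || by >= height / 4) return;

    BC1Block blk = blocks[by * (width / 4) + bx];
    int r0, g0, b0, r1, g1, b1;
    unpack565(blk.c0, r0, g0, b0);
    unpack565(blk.c1, r1, g1, b1);

    uint32_t pal[4];
    pal[0] = packRGBA(r0, g0, b0);
    pal[1] = packRGBA(r1, g1, b1);
    if (blk.c0 > blk.c1) {  // 4-color mode: two interpolated colors
        pal[2] = packRGBA((2*r0 + r1) / 3, (2*g0 + g1) / 3, (2*b0 + b1) / 3);
        pal[3] = packRGBA((r0 + 2*r1) / 3, (g0 + 2*g1) / 3, (b0 + 2*b1) / 3);
    } else {                // 3-color mode: midpoint + transparent black
        pal[2] = packRGBA((r0 + r1) / 2, (g0 + g1) / 2, (b0 + b1) / 2);
        pal[3] = 0;
    }

    // Indices are stored LSB-first, texel (0,0) in bits 0-1.
    for (int t = 0; t < 16; ++t) {
        int x = bx * 4 + (t & 3), y = by * 4 + (t >> 2);
        out[y * width + x] = pal[(blk.idx >> (2 * t)) & 3];
    }
}
```

Launched over the block grid (e.g. `decodeBC1<<<dim3((width/4+15)/16, (height/4+15)/16), dim3(16,16)>>>(...)`), every block decodes independently, which is what makes this workload a clean shader-throughput test.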

As we alluded to in our look at Civilization V's performance in game mode, Ivy Bridge ends up being compute-limited here. It's well ahead of the even more DirectCompute-anemic Radeon HD 5450—in spite of the fact that it can't take a lead in game mode—but it slightly trails the GT 520, which has a similar amount of compute performance on paper. This largely confirms what we know from HD 4000's specs: it can pack a punch pushing pixels, but in a shader-heavy scenario it's going to have a great deal of trouble keeping up with Llano and its much greater shader performance.

But with that said, Ivy Bridge still reaches 55% of Llano's performance here, thanks to AMD's overall lackluster DirectCompute performance on its pre-7000 series GPUs. As a result Ivy Bridge versus Llano isn't nearly as lopsided as the paper specs suggest; Ivy Bridge won't be able to keep up in most situations, but in DirectCompute it isn't necessarily a goner.

And to prove that point, we have our second compute test: the Fluid Simulation Sample in the DirectX 11 SDK. This program simulates the motion and interactions of a 16K-particle fluid using a compute shader, with a choice of several different algorithms. In this case we're using an O(n²) nearest-neighbor method that is optimized by using shared memory to cache data.
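The SDK sample itself is written in HLSL against DirectCompute, but the shared memory optimization it relies on maps directly onto other GPU compute APIs. Below is a minimal CUDA sketch of the same access pattern—a brute-force O(n²) pass where each thread block stages a tile of particle positions in on-chip shared memory so every thread can read them without touching DRAM. The interaction math here is a simple softened inverse-square force rather than the sample's actual fluid dynamics, and the kernel and parameter names are ours.

```cuda
#include <cuda_runtime.h>

#define TILE 256  // threads per block = particles staged per tile

// O(n^2) interaction pass: every thread accumulates the influence of all
// n particles on its own particle, staging particle data through shared
// memory one tile at a time (the same trick the SDK sample's compute
// shader plays with groupshared memory).
__global__ void interact(const float4* __restrict__ pos, float4* acc, int n)
{
    __shared__ float4 tile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Clamp so threads past the end still help load tiles without faulting.
    float4 pi = pos[i < n ? i : n - 1];
    float3 a = make_float3(0.0f, 0.0f, 0.0f);

    for (int base = 0; base < n; base += TILE) {
        int j = base + threadIdx.x;
        tile[threadIdx.x] = pos[j < n ? j : n - 1];  // cooperative load
        __syncthreads();

        // Every thread now walks the whole tile out of fast on-chip memory.
        for (int k = 0; k < TILE; ++k) {
            float3 d = make_float3(tile[k].x - pi.x,
                                   tile[k].y - pi.y,
                                   tile[k].z - pi.z);
            float r2  = d.x*d.x + d.y*d.y + d.z*d.z + 1e-4f;  // softening term
            float inv = rsqrtf(r2);
            float w   = tile[k].w * inv * inv * inv;  // .w holds particle mass
            a.x += d.x * w;  a.y += d.y * w;  a.z += d.z * w;
        }
        __syncthreads();
    }
    if (i < n) acc[i] = make_float4(a.x, a.y, a.z, 0.0f);
}
```

Launched as `interact<<<(n + TILE - 1) / TILE, TILE>>>(d_pos, d_acc, 16 * 1024)` for a 16K-particle system, each position is fetched from DRAM once per tile rather than once per thread, which is exactly why a fast on-chip cache or shared memory pays off so handsomely in this test.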

Thanks in large part to its new dedicated L3 graphics cache, Ivy Bridge does exceptionally well here. The framerate of this test is entirely arbitrary, but the performance relative to other GPUs isn't: Ivy Bridge is well within the territory of budget-level dGPUs such as the GT 430 and Radeon HD 5570, and for the first time it's ahead of Llano, taking a lead just shy of 10%. The fluid simulation sample is a very special case—most compute shaders won't be nearly this reliant on shared memory performance—but it's the perfect showcase for Ivy Bridge's ideal performance scenario. Ultimately this is just as much a story of AMD losing due to poor DirectCompute performance as it is Intel winning due to a speedy L3 cache, but it shows what is possible. The big question now is what OpenCL performance will look like, since AMD's OpenCL performance doesn't carry the same handicaps as its DirectCompute performance.

Synthetic Performance

Moving on, we'll take a few moments to look at synthetic performance. Synthetic performance is a poor tool to rank GPUs—what really matters is the games—but by breaking down workloads into discrete tasks it can sometimes tell us things that we don't see in games.

Our first synthetic test is 3DMark Vantage’s pixel fill test. Typically this test is memory bandwidth bound as the nature of the test has the ROPs pushing as many pixels as possible with as little overhead as possible, which in turn shifts the bottleneck to memory bandwidth so long as there's enough ROP throughput in the first place.

It's interesting to note here that as DDR3 clockspeeds have crept up over time, IVB now has as much memory bandwidth as most entry-to-mainstream-level video cards, where 128-bit DDR3 is equally common. Or, on a historical basis, it now has half as much bandwidth as powerhouse video cards of yesteryear such as the 256-bit GDDR3-based GeForce 8800 GT.

Altogether, with 29.6GB/sec of memory bandwidth available to Ivy Bridge with our DDR3-1866 memory, Ivy Bridge ends up being able to push more pixels than Llano, more pixels than the entry-level dGPUs, and even more pixels than budget-level dGPUs such as the GT 440 and Radeon HD 5570, which have just as much dedicated memory bandwidth. Put in numbers, Ivy Bridge is pushing 42% more pixels than Sandy Bridge and 25% more pixels than the otherwise more powerful Llano. And since pixel fillrates are so memory bandwidth bound, Intel's L3 cache is almost certainly once again playing a role here; however, it's not clear to what extent.
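As a quick sanity check on that bandwidth figure—assuming the dual-channel, 64-bit-per-channel configuration our testbed uses—the arithmetic works out as:

$$\text{BW} = 2~\text{channels} \times 8~\text{B/transfer} \times 1.866~\text{GT/s} \approx 29.9~\text{GB/s}$$

which is in line with the 29.6GB/sec quoted above once rounding of the effective transfer rate is accounted for.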

Moving on, our second synthetic test is 3DMark Vantage’s texture fill test, which provides a simple FP16 texture throughput test. FP16 textures are still fairly rare, but this is a good look at worst-case texturing performance.
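The arithmetic behind the worst-case label is simple: a four-channel FP16 texel is twice the size of an RGBA8 texel, so every fetch costs twice the memory traffic (and on much of this era's hardware, FP16 also filters at a reduced rate):

$$4~\text{channels} \times 16~\text{bit} = 64~\text{bit} = 8~\text{B/texel}, \qquad \text{versus}~4~\text{B/texel for RGBA8}$$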

After Ivy Bridge's strong pixel fillrate performance, its texture fillrate brings us back down to earth. Here performance is once again much closer to the entry-level GPUs, and well behind Llano. We see that Intel's texture performance also scales almost exactly linearly with the increase in EUs from Sandy Bridge to Ivy Bridge, indicating that those texture units are being put to good use; at the same time it means Ivy Bridge has a long way to go to catch Llano's texture performance, achieving only 47% of it here. The good news for Intel is that texture size (and thereby texel density) hasn't increased much over the past couple of years in most games; the bad news is that we're finally starting to see that change as dGPUs get more VRAM.

Our final synthetic test uses Microsoft’s Detail Tessellation sample program from the DX11 SDK, run at our usual set of settings. Since IVB is the first Intel iGPU with tessellation capabilities, it will be interesting to see how well it does here, as IVB is going to be the de facto baseline for DX11+ games in the future. Ideally we want enough tessellation performance that tessellation can be used on a global level, allowing developers to efficiently simulate their worlds with fewer polygons while still using many polygons on the final render.
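As a rough rule of thumb—ignoring the details of the partitioning mode—the triangle count of a uniformly tessellated triangle patch grows with the square of the tessellation factor $F$:

$$N_{\text{tris}} \approx F^{2} \quad\Rightarrow\quad N(2F) \approx 4\,N(F)$$

so each doubling of the factor roughly quadruples the geometry workload, which is why performance at high factors is the interesting stress case.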

The results here are actually pretty decent. Compared to what we've seen with shader and texture performance, where Ivy Bridge is largely joined at the hip with the GT 520, at lower tessellation factors Ivy Bridge manages to clearly overcome both the GT 520 and the Radeon HD 5450. Per unit of compute performance, Intel looks to have more tessellation performance than AMD or NVIDIA, which means Intel is setting a pretty good baseline here. Tessellation performance does dip at high tessellation factors, however, with Ivy Bridge giving up much of its lead over the entry-level dGPUs while still managing to stay ahead of both of its competitors.

Comments

  • Alexo - Wednesday, April 25, 2012 - link

    It will be in Canada once Bill C-11 passes in a couple of months.
  • p05esto - Monday, April 23, 2012 - link

    It would be neat to see older CPUs in these benchmarks. It's always a pet peeve of mine that these reviews only compare new CPUs against the previous generation and not 2-3 generations back.

    Most people do NOT upgrade with every single CPU release, most people upgrade their rigs every 2-3 years I'm guessing. For example, I'm running a Core i7 930 and it's very fast already. I want to upgrade to Ivy and will either way, but I'd love to see how much faster I can expect Ivy to be compared to the ol' 930/920 which tons of people have.

    In my opinion, going back 2-3 generations is the ideal comparison people want. No one will upgrade from Sandy Bridge (unless rich and a little stupid), but a lot of people will upgrade from the original 920 era, which is a few years old now.

    Just food for thought.
  • Tchamber - Monday, April 23, 2012 - link

    I agree, I have an X58 CPU too, and there was no SB CPU worth upgrading to.
  • Anand Lal Shimpi - Monday, April 23, 2012 - link

    I agree with you and typically try to do just that, time was an issue this round - I was on the road for much of the past month and had to cut out a number of things I wanted to do for this launch.

    Thankfully, we have Bench - with the 3770K included: www.anandtech.com/bench. Feel free to compare away :)

    Take care,
    Anand
  • AmdInside - Monday, April 23, 2012 - link

    Wish you guys would have included BF3 numbers for discrete GPU benchmarks. It is a game that is CPU heavy in multiplayer maps with large numbers of people.
  • fic2 - Monday, April 23, 2012 - link

    "One problem Intel does currently struggle with is game developers specifically targeting Intel graphics and treating the GPU as a lower class citizen."

    Well, as long as Intel treats their igp as the bastard red-headed step child then I am sure that developers will too.

    If they would actually put the HD3000/4000 into the mainstream parts, developers might pay attention to it. If I was a game developer, why would I pay attention to the HD2000/2500, which isn't really capable of playing crap and is the mainstream Intel IGP? If I was a game developer I would know that anyone buying a 'K' series part is also going to be buying a discrete video card.
  • JarredWalton - Monday, April 23, 2012 - link

    Intel's IGP performance has improved by about 500% since the days of GMA 4500. Is that not enough of an improvement for you? By comparison, Llano is only about 300% faster than the HD 4200 IGP. What's more, Haswell is set to go from 16 EUs in IVB GT2 to 40 EUs in GT3. Along with other architectural tweaks, I expect Haswell's GT3 IGP to be about three times as fast as Ivy Bridge. You'll notice that in the gaming tests, 3X HD 4000 is going to put discrete GPUs in a tough situation.
  • fic2 - Monday, April 23, 2012 - link

    Yes, but the majority of users will not have an HD3000/4000 since they will have an OEM built computer. Conversely, gamers will more than likely have an HD3000/4000 included with the 'K' series. BUT, these same gamers will more than likely also have a discrete video card and never use the HD3000/4000.

    Again, if I was a game developer why would I put resources into optimizing for an igp that gamers aren't going to use?

    I give props to Intel for the huge jump in improvement in the 'K' series igp - it went from really crappy to just sort of crappy.
    If Intel would stop doing the stupid igp segmentation and include the HD3000/HD4000 in ALL of their *Bridge CPUs, then game developers might see there is a big market there to optimize for. Until Intel stops shooting themselves in the marketing foot, game developers won't pay any attention to their igp. But, based on IB, it looks like Haswell will probably do the same brain damaged thing and include the "good" graphics in CPUs that less than 10% of the people buy, where less than 10% of that 10% go without a discrete graphics card.

    Oh, and your 500%/300% improvement is pretty crappy since HD 4200 was way faster than GMA 4500 to begin with so in absolute terms the 4200->Llano made a bigger jump than 4500->3000:
    i.e.
    4500 starts out at 2. 500% improvement would put it to 10 for an absolute improvement of 8.
    4200 starts out at 6. 300% improvement would put it at 18 for an absolute improvement of 12.
    So, AMD is still pulling away from Intel on the igp front. And AMD doesn't play the igp segmentation game, so their whole market has pretty good igp.
  • JarredWalton - Monday, April 23, 2012 - link

    It's an estimate, and it's pretty clear that AMD did not make the bigger jump. They were much faster than GMA 4500, but not the 3x improvement you suggest. In fact, I tested this several years back: http://www.anandtech.com/show/2818/8

    Even if we count the "failed to run" games as a 0 on Intel, AMD's HD 4200 was only 2.4x faster, and if we only look at games where the drivers didn't fail to work, they were more like 2X faster. So here's the detailed history that you're forgetting:

    1) HD 4200 was much faster than GMA 4500 -- call it twice as fast. Intel = 1, AMD = 2.

    2) Arrandale's HD Graphics really closed the gap with HD 4200 (which AMD continued to ship for far too long). Arrandale's "pathetic" HD Graphics were actually just 10% behind HD 4200, give or take. Intel = 1.9, AMD = 2 (http://www.anandtech.com/show/3773)

    3) Sandy Bridge more than doubled IGP performance on average compared to Arrandale, 130% faster by my tests (http://www.anandtech.com/show/4084/5). Meanwhile, AMD finally came out with a new IGP to replace the horribly outdated HD 4200 with Llano (http://www.anandtech.com/show/4444/11). The A8 GPU ended up being on average 50% faster than HD 3000. Intel = 2.5, AMD = 3.8.

    4) Ivy Bridge comes out and improves by 50% on average over HD 3000 (http://www.anandtech.com/show/5772/6). Intel = 3.8, AMD = 3.8

    So by those figures, what we've actually seen is that since GMA 4500MHD and HD 4200, Intel has improved their integrated graphics performance 280% and AMD has improved their performance by around 90%. So my initial estimates were off (badly, apparently). If we bring Trinity into the equation and it gets 50% more performance, then yes AMD is still ahead: Intel 3.8, AMD 5.7. That will give Intel a 280% improvement over three years and AMD a similar 280% improvement.

    Of course, if we look at the CPU side, Intel CPU multithreaded performance (just looking at Cinebench 10 SMP score) has gone up 340% from the Core 2 P8600 to the i7-3720QM. AMD's performance in the same test has gone up 80%. For single-threaded performance, Intel has gone up 115% and AMD has improved about 5-10%. So for all the talk of Intel IGP being bad, at least in terms of relative performance Intel has kept pace or even surpassed AMD. For CPU performance on the other hand, AMD has only improved marginally since the days of Athlon X2.

    Your discussion of Intel's market segmentation is apparently missing the whole point of running a business. You do it to make a profit. Core i3 exists because not everyone is willing to pay Core i5 prices, and Core i5 exists because even fewer people are willing to pay Core i7 prices. The people that buy Core i3 and are willing to compromise on performance are happy, the people that buy i5 are happy, and the people that buy i7 are happy...and they all give money to Intel.

    If you look at the mobile side of the equation, your arguments become even less meaningful. Intel put HD 3000 into all of the Core i3/i5/i7 mobile parts because that's where IGP performance is the most important, and they're doing the exact same thing on the mobile side this generation. People who care about graphics performance on desktops are already going to buy a dGPU, but you can't just add a dGPU to a notebook if you want more performance.

    And finally, "AMD doesn't play IGP segmentation" is just completely false. Take off your blinders. A8 APUs have 400 cores clocked at 444MHz. A6 APUs have 320 cores clocked at 400MHz, and A4 APUs have 240 cores clocked at 444MHz. AMD is every bit as bad as Intel when it comes to market segmentation by IGP performance!
  • fic2 - Monday, April 23, 2012 - link

    I guess you are correct about AMD - I haven't really paid much attention to them since, as you said, they can't keep up on the cpu side.

    But, TH lists the 6410 (A4 igp) as being 3 levels above the HD3000 in their Graphics Hierarchy Chart. They also have the HD2000 2 levels below the HD3000. So, Intel's mainstream igp is 5 levels below AMDs lowest igp.

    That is why game developers treat Intel's igp as a lower class citizen.

    The quote that I was addressing (as stated in my first post) is:
    "One problem Intel does currently struggle with is game developers specifically targeting Intel graphics and treating the GPU as a lower class citizen."

    The article acts like it is a total mystery why game developers don't give the Intel igp any respect. As I have repeatedly said in my comments - until Intel starts putting the HD3000/HD4000 into their mainstream parts and not just the 'K' series game developers know that Intel igp is a lower class citizen. And, yes, I know that you can get a xxx5 variant w/HD3000 if you look around enough, but I doubt any OEM is using them and they didn't appear until 6+ months after the launch. It is just easier to slap a 5-6 year old discrete video card into a computer.
    Game developers can't target the HD3000/HD4000 since those are the minority of SB/IB chips; they would have to target the HD2000/HD2500. Since they don't, the conclusion is that it isn't worth putting resources into such a low-end graphics solution.
