While compute functionality could technically be shoehorned into DirectX 10 GPUs through DirectCompute 4.x, neither Intel's DX10 GPUs (such as Sandy Bridge) nor AMD's were really meant for the task, and even NVIDIA's DX10 GPUs paled in comparison to what NVIDIA achieved with its DX11 generation. As a result, Ivy Bridge is the first truly compute-capable GPU from Intel. This marks an interesting step in the evolution of Intel's GPUs: originally, projects such as Larrabee Prime were supposed to help Intel bring together CPU and GPU computing by creating an x86-based GPU. With Larrabee Prime canceled, however, that task falls to the latest iteration of Intel's GPU architecture.

With Ivy Bridge Intel will be supporting not only DirectCompute 5 (which is dictated by DX11) but also the more general, compute-focused OpenCL 1.1. Intel has backed OpenCL development for some time and currently offers an OpenCL 1.1 runtime for its CPUs; however, an OpenCL runtime for Ivy Bridge will not be available at launch. As a result Ivy Bridge is limited to DirectCompute for the time being, which limits just what kind of compute performance testing we can do with Ivy Bridge.

Our first compute benchmark comes from Civilization V, which uses DirectCompute 5 to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. And while games that use GPU compute functionality for texture decompression are still rare, it's becoming increasingly common as it's a practical way to pack textures in the most suitable manner for shipping rather than being limited to DX texture compression.
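For context, this is what the fixed-function DX texture compression being replaced looks like: a minimal Python sketch of decoding one standard BC1/DXT1 block. A GPU-compute decompressor like the one described above instead decodes an arbitrary, developer-chosen format in a compute shader. This is purely illustrative and is not Civilization V's actual algorithm.

```python
# Decode one 8-byte BC1/DXT1 block: two 16-bit RGB565 endpoint colors
# followed by sixteen 2-bit palette indices packed into a 32-bit word.
import struct

def rgb565_to_rgb888(c):
    """Expand a packed 5:6:5 color to an (r, g, b) tuple of 8-bit values."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    # Replicate high bits into low bits to scale each channel to 0-255.
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def decode_bc1_block(block):
    """Decode one 8-byte BC1 block into a flat list of 16 (r, g, b) pixels."""
    c0, c1, indices = struct.unpack("<HHI", block)
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    if c0 > c1:
        # 4-color mode: endpoints plus two interpolated colors.
        palette = [p0, p1,
                   tuple((2 * a + b) // 3 for a, b in zip(p0, p1)),
                   tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))]
    else:
        # 3-color mode: endpoints, their midpoint, and black.
        palette = [p0, p1,
                   tuple((a + b) // 2 for a, b in zip(p0, p1)),
                   (0, 0, 0)]
    # Each pixel is a 2-bit palette index, packed LSB-first.
    return [palette[(indices >> (2 * i)) & 0x3] for i in range(16)]
```

The rigidity on display here (fixed endpoints, fixed palette size) is exactly why developers would rather ship textures in their own format and decompress on the GPU.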

As we alluded to in our look at Civilization V's performance in game mode, Ivy Bridge ends up being compute limited here. It's well ahead of the even more DirectCompute-anemic Radeon HD 5450 (despite the fact that it can't take a lead in game mode), but it slightly trails the GT 520, which has a similar amount of compute performance on paper. This largely confirms what we know from the HD 4000's specs: it can pack a punch pushing pixels, but in a shader-heavy scenario it's going to have a great deal of trouble keeping up with Llano and its much greater shader performance.

But with that said, Ivy Bridge still reaches 55% of Llano's performance here, thanks to AMD's overall lackluster DirectCompute performance on its pre-7000 series GPUs. As a result Ivy Bridge versus Llano isn't nearly as lopsided as the paper specs would suggest; Ivy Bridge won't be able to keep up in most situations, but in DirectCompute it isn't necessarily a goner.

And to prove that point, we have our second compute test: the Fluid Simulation Sample in the DirectX 11 SDK. This program simulates the motion and interactions of a 16k-particle fluid using a compute shader, with a choice of several different algorithms. In this case we're using an O(n²) nearest-neighbor method that is optimized by using shared memory to cache data.
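As a rough sketch of the pattern involved, assuming the usual formulation of such a sample (this is illustrative Python, not the SDK's actual HLSL compute shader): each thread group stages a tile of particle positions in shared memory, and every thread then tests its particle against the cached tile, so the O(n²) pairwise pass reads fast local storage instead of device memory.

```python
# O(n^2) nearest-neighbor search with the shared-memory optimization
# modeled as explicit tiling: one "groupshared" load per tile, reused
# by every particle when scanning that tile's candidates.

TILE = 4  # stands in for the thread-group / shared-memory size

def nearest_neighbors(positions):
    """For each 1D particle position, return the index of its nearest neighbor."""
    n = len(positions)
    nearest = [None] * n
    best = [float("inf")] * n
    for tile_start in range(0, n, TILE):
        # Load one tile into the "shared memory" cache.
        cache = positions[tile_start:tile_start + TILE]
        for i, pi in enumerate(positions):
            # Every particle scans the cached tile, not global memory.
            for j, pj in enumerate(cache, start=tile_start):
                if i == j:
                    continue
                d = abs(pi - pj)
                if d < best[i]:
                    best[i] = d
                    nearest[i] = j
    return nearest
```

The total work is still n² distance tests; what the caching changes is where the neighbor data is read from, which is why this sample rewards a GPU with a fast local cache so heavily.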

Thanks in large part to its new dedicated L3 graphics cache, Ivy Bridge does exceptionally well here. The framerate of this test is entirely arbitrary, but what isn't is the performance relative to other GPUs: Ivy Bridge is well within the territory of budget-level dGPUs such as the GT 430 and Radeon HD 5570, and for the first time it's ahead of Llano, taking a lead just shy of 10%. The fluid simulation sample is a very special case (most compute shaders won't be nearly this reliant on shared memory performance), but it's the perfect showcase for Ivy Bridge's ideal performance scenario. Ultimately this is just as much a story of AMD losing due to poor DirectCompute performance as it is Intel winning due to a speedy L3 cache, but it shows what is possible. The big question now is what OpenCL performance is going to be like, since AMD's OpenCL performance doesn't suffer the same kind of handicaps as its DirectCompute performance.

Synthetic Performance

Moving on, we'll take a few moments to look at synthetic performance. Synthetic performance is a poor tool to rank GPUs—what really matters is the games—but by breaking down workloads into discrete tasks it can sometimes tell us things that we don't see in games.

Our first synthetic test is 3DMark Vantage’s pixel fill test. Typically this test is memory bandwidth bound as the nature of the test has the ROPs pushing as many pixels as possible with as little overhead as possible, which in turn shifts the bottleneck to memory bandwidth so long as there's enough ROP throughput in the first place.

It's interesting to note here that as DDR3 clockspeeds have crept up over time, IVB now has as much memory bandwidth as most entry-to-mainstream level video cards, where 128bit DDR3 is equally common. Or on a historical basis, at this point it's half as much bandwidth as powerhouse video cards of yesteryear such as the 256bit GDDR3 based GeForce 8800GT.

Altogether, with 29.6GB/sec of memory bandwidth available to Ivy Bridge with our DDR3-1866 memory, Ivy Bridge ends up being able to push more pixels than Llano, more pixels than the entry-level dGPUs, and even more pixels than budget-level dGPUs such as the GT 440 and Radeon HD 5570, which have just as much dedicated memory bandwidth. Or put in numbers, Ivy Bridge is pushing 42% more pixels than Sandy Bridge and 25% more pixels than the otherwise more powerful Llano. And since pixel fillrates are so memory bandwidth bound, Intel's L3 cache is almost certainly once again playing a role here, though it's not clear to what extent.
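The bandwidth figures above are easy to sanity-check. As a back-of-the-envelope sketch (peak theoretical numbers; the 8800 GT's 1800 MT/s effective GDDR3 rate is a commonly cited spec rather than something from this article):

```python
# Peak theoretical DRAM bandwidth: transfers/sec x bus width in bytes.

def ddr_bandwidth_gbps(mt_per_sec, bus_bits):
    """Peak bandwidth in GB/s for a given transfer rate (MT/s) and bus width (bits)."""
    return mt_per_sec * 1e6 * (bus_bits / 8) / 1e9

# Ivy Bridge: dual-channel (2 x 64-bit) DDR3-1866.
ivb = ddr_bandwidth_gbps(1866, 128)
print(f"Ivy Bridge, DDR3-1866: {ivb:.1f} GB/s")

# GeForce 8800 GT: 256-bit GDDR3 at ~1800 MT/s effective.
g8800 = ddr_bandwidth_gbps(1800, 256)
print(f"GeForce 8800 GT: {g8800:.1f} GB/s")
```

This lands at roughly 29.9 GB/s for the DDR3-1866 configuration, matching the figure above to within rounding, and about 57.6 GB/s for the 8800 GT, consistent with the "half as much bandwidth" comparison.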

Moving on, our second synthetic test is 3DMark Vantage’s texture fill test, which provides a simple FP16 texture throughput test. FP16 textures are still fairly rare, but it's a good look at worst case scenario texturing performance.

After Ivy Bridge's strong pixel fillrate performance, its texture fillrate brings us back down to earth. At this point performance is once again much closer to the entry-level GPUs, and also well behind Llano. Here we see that Intel's texture performance also increases almost exactly linearly with the increase in EUs from Sandy Bridge to Ivy Bridge, indicating that those texture units are being put to good use, but at the same time it means Ivy Bridge has a long way to go to catch Llano's texture performance, achieving only 47% of Llano's performance here. The good news for Intel is that texture size (and thereby texel density) hasn't increased much in most games over the past couple of years; the bad news is that we're finally starting to see that change as dGPUs get more VRAM.

Our final synthetic test is the set of settings we use with Microsoft’s Detail Tessellation sample program out of the DX11 SDK. Since IVB is the first Intel iGPU with tessellation capabilities, it will be interesting to see how well IVB does here, as IVB is going to be the de facto baseline for DX11+ games in the future. Ideally we want to have enough tessellation performance here so that tessellation can be used on a global level, allowing developers to efficiently simulate their worlds with fewer polygons while still using many polygons on the final render.

The results here are actually pretty decent. Compared to what we've seen with shader and texture performance, where Ivy Bridge is largely joined at the hip with the GT 520, at lower tessellation factors Ivy Bridge manages to clearly overcome both the GT 520 and the Radeon HD 5450. Per unit of compute performance, Intel looks to have more tessellation performance than AMD or NVIDIA, which means Intel is setting a pretty good baseline for tessellation performance. Tessellation performance does dip at high tessellation factors, however, with Ivy Bridge giving up much of its performance lead over the entry-level dGPUs while still managing to stay ahead of both of its competitors.

173 Comments

  • frozentundra123456 - Monday, April 23, 2012 - link

    According to the Asus review just out by Anand, the Intel HD4000 and AMD HD6620 are essentially even in the mobile space, where it really matters. I don't know where you are getting the "soundly trounces" description, unless you are talking about the desktop. I don't really care about integrated graphics on the desktop; it is just too easy to add a discrete card that soundly trounces either Intel or AMD integrated graphics. I have no doubt that AMD will regain the lead in the mobile space when Trinity comes out. I just question that they will make the kind of improvements that are being speculated about.

    I also find it ironic that so many people are criticizing IVB for lack of CPU improvement while in the same breath saying Bulldozer is OK because it is "good enough" already.
  • DanNeely - Monday, April 23, 2012 - link

    Primarily Einstein@Home.
  • fastman696 - Monday, April 23, 2012 - link

    Thanks for the review, but this is new Tech, why use old Tech chipset?
  • JarredWalton - Monday, April 23, 2012 - link

    You're being deliberately obtuse in order to set up a straw man.

    Me: "As I note in the mobile IVB article, mobile Llano GPU performance isn't nearly as impressive relative to IVB as on the desktop."

    You: "The mobile variant of the part that launched last year isn't as dominant over the part that just launched today as the desktop variant is?"

    In other words, you want us to compare to a product that's not out because the current product doesn't look good. I mention Trinity already, but you act as though I miss it. Then you throw out stuff like, "Thanks for resorting to namecalling" when you've already been insulting with your comments since the get go. "Sad to see this kind of crap coming from Anandtech." "I guess Anandtech's standards have drastically lowered." Put another way, you're already calling me an idiot but doing it indirectly. But let's continue....

    How much faster can you do Flash video when it's already accelerated and working properly in Sandy Bridge? Web browsers are basically in the same boat, unless you can name major web sites that a lot of users visit where HD 3000/4000 is significantly worse than the competition.

    Does Photoshop benefit from GPUs? Sure, and lots of people use that, including me, but the same people who use Photoshop are also the people who need more than Llano CPU performance, and more than HD 4000 or Llano or Trinity GPU performance. I'm running Bloomfield with a GTX 580, which is more than what 95% of users out there have. Most serious Photoshop users that I know use quad-core Intel with some form of NVIDIA graphics for a reason. But even running on straight Sandy Bridge with HD 3000, Photoshop runs faster than on Llano with HD 6620G.

    Vegas, naturally, is in the same category as video transcoding. I suppose I could have said "video editing/transcoding" just to be broader. There are tons of people that don't do video editing/transcoding. Even for those that do, NVIDIA GPUs are doing far better than AMD GPUs, and NVIDIA + Intel CPU is still the platform to beat. If you want quality, though, encoding is still done in software running on the CPU; Premiere for instance really just leverages the GPU to help with the "quick preview" videos, not for final rendering (unless something has changed since the last time I played with it).

    So let's try again: what exactly are the areas where Intel's Ivy Bridge and HD 4000 fall short, where AMD's Llano (or the upcoming Trinity) are going to be substantially better? All without adding a discrete GPU. Llano is equal to HD 4000 for gaming, and seriously behind on the CPU department. There are still areas where AMD's drivers are much better than Intel's drivers, and there are certain tasks (shader and geometry) where AMD is better. Really, though, the only area where Intel doesn't compete is in strictly budget laptops.
  • chizow - Monday, April 23, 2012 - link

    Yes I have heard of a "tick", and IVB has manifested itself as a tick+ as indicated in the article which means we are basically on the 3rd generation of the same architecture introduced with Nehalem in late 2008 with some minor bumps in clockspeed/Turbo modes and overclocking headroom.

    Both Conroe and Nehalem were pretty huge jumps in performance only 2.5 years apart on one of Intel's Tick Tock cadence cycles and since then, nothing remotely as interesting.

    Maybe you should be asking yourself why you aren't expecting bigger performance gains? Or maybe you're still reveling and ogling over Tahiti's terrible price:performance gains in the GPU space? :D
  • JarredWalton - Monday, April 23, 2012 - link

    Yes, because that extra 10W TDP makes all the difference, doesn't it? 45W Llano parts aren't shipped in very many laptops because the OEMs aren't interested. Just look at Newegg as an example:
    http://www.newegg.com/Product/ProductList.aspx?Sub...

    There is one current laptop for sale at Newegg with an A8 APU faster than the A8-3520M, and it has an A8-3510MX. AMD's own list isn't much better (http://shop.amd.com/us/All/Search?NamedQuery=visio...); there's one more notebook there with an A8-3530MX. So that's why we looked at the A8-3520M, but if I had an MX chip I would certainly run the same tests -- no one has been willing to send us such a laptop, unfortunately.

    But even if we got an MX chip, their GPUs are still clocked the same as the A8-3500M/A8-3520M. We might be CPU limited in a couple games, but while there are Llano parts with 20% higher CPU clocks, that just means Intel is "only" ahead by 60-70% instead of 100% faster on CPU performance.
  • Joepublic2 - Monday, April 23, 2012 - link

    Because stock temperatures are irrelevant (much like your posting) to the end user as long as the chip isn't throttling.
  • samal90 - Monday, April 23, 2012 - link

    you people over-analyzed my comment. All I wanted to say is that they are bragging about HD 4000 when it doesn't come close to the current competition.
    Couple of years down the road, people won't want dedicated graphics cards in their laptops anymore..its too bulky and consumes too much power. We will all have integrated GPUs. the AMD APU is the way to go. To be honest, CPU power is already way more than enough for a lot of things most people use their laptops for (browsing the web, writing documents, play web-based games a.k.a. angry birds on chrome). The extra GPU is for people that either want to do some graphics processing or play some more graphics intensive games. So yes, it is important for the future to have a good and strong integrated GPU and a good CPU. Therefore, I think AMD will win this round. I hope they continue to compete at each other's throats so we see better and cheaper products from both sides.
    So as I understand it right now: Go for AMD if you want better GPU, go for Intel if CPU is more important for you. Trinity might narrow the CPU gap however and greatly increase the GPU one. Only time will tell.
  • chaos215bar2 - Tuesday, April 24, 2012 - link

    "Ivy Bridge is hotter, so if you're paying for the AC, it should be a negative impact."

    Where do you think the dissipated power is going? TDP and overall thermal output are roughly equivalent.

    IVB may get hotter, but without measuring TDP overclocked and under load, that could easily be because the die is smaller and doesn't dissipate heat quite as well.
  • DanNeely - Tuesday, April 24, 2012 - link

    "I don't understand this. We're talking about power consumption, not TDP. Heat-wise, Ivy Bridge is hotter, so if you're paying for the AC, it should be a negative impact."

    Power consumption is TDP. 100W of power is 100 joules/second of heat to be dissipated; it doesn't matter if the heat's coming off a large warm die or a small hot one. 100W is 100W.

    My current i7-9xx boxes are 130W chips, so just looking at TDP that's somewhere between 60 and 90W less power at stock (~50W just from the CPU TDP; for the higher number, the chipset is a theoretical 18W more, though probably a lot less in practice, plus whatever cut of IVB's TDP is for the GPU). Probably a wider gap when OCed, but I don't have any stock vs. OC power numbers to look at. With AC costs added, cost savings would probably be between $100 and $200/year per box.

    Up front costs would be ~$400-550 for CPU + mobo pairs depending on how high up the feature chain I went; probably fairly high for my main box and more bang for the buck on the 2nd.

    Looking on eBay for successful auctions, it looks like I could get ~$250 for my existing CPU/mobo pairs, less whatever eBay's fee is. The very rough guess would be a 2-year-ish payback time, which is somewhat better than I thought (closer to 3 years).

    Not sure I'll do it since I have a few other PC related purchases on the wishlist too: replacing my creaky Core One Duo laptop with a light/medium gaming model or swapping out my netbook for a new ultra portable after Win8 launches might give better returns for my dollar. The latter's battery isn't really lasting as long as I'd like any more. Also, my WHSv1 box is scheduled for retirement this winter.

    I am going to have to give it some serious thought though. Part of me still wants to wait for Haswell even though preliminary indications are that it won't be a huge step up; the much bigger GPU and remaining at dual channel memory makes a mainstream hex core part unlikely.
