HD 2500: Compute & Synthetics

While compute functionality could technically be shoehorned into DirectX 10-class GPUs such as Sandy Bridge's through DirectCompute 4.x, neither Intel's nor AMD's DX10 GPUs were really meant for the task, and even NVIDIA's DX10 GPUs paled in comparison to what NVIDIA achieved with its DX11 generation. As a result, Ivy Bridge is the first truly compute-capable GPU from Intel. This marks an interesting step in the evolution of Intel's GPUs, as projects such as Larrabee Prime were originally supposed to help Intel bring CPU and GPU computing together by creating an x86-based GPU. With Larrabee Prime canceled, however, that task falls to the latest iteration of Intel's GPU architecture.

With Ivy Bridge Intel supports not only DirectCompute 5, which DX11 mandates, but also the more general, compute-focused OpenCL 1.1. Intel has backed OpenCL development for some time and currently offers an OpenCL 1.1 runtime that runs across multiple generations of its CPUs, and now on Ivy Bridge GPUs as well.

Our first compute benchmark comes from Civilization V, which uses DirectCompute 5 to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game's leader scenes. And while games that use GPU compute functionality for texture decompression are still rare, the technique is becoming increasingly common, as it's a practical way to pack textures in whatever format best suits shipping rather than being limited to the standard DX texture compression formats.
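For reference, the "DX texture compression" mentioned above is the block-compressed (BC/DXT) family. Civ V's actual shaders and on-disk formats aren't public, so purely as an illustration of what such a decompressor deals with, here is a minimal sketch of decoding one BC1/DXT1 block in Python:

```python
import struct

def rgb565_to_rgb888(c):
    """Expand a packed 5:6:5 color to 8 bits per channel by bit replication."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    return (r << 3 | r >> 2, g << 2 | g >> 4, b << 3 | b >> 2)

def decode_bc1_block(block):
    """Decode one 8-byte BC1 block into 16 (r, g, b) texels, row-major.

    Layout: two little-endian RGB565 endpoints, then 4 bytes of 2-bit
    palette indices (LSB-first within each byte, one byte per row).
    """
    c0, c1 = struct.unpack_from('<HH', block, 0)
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    if c0 > c1:  # four-color mode: two interpolated colors
        palette = [p0, p1,
                   tuple((2 * a + b) // 3 for a, b in zip(p0, p1)),
                   tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))]
    else:        # three-color mode: midpoint plus black/transparent
        palette = [p0, p1,
                   tuple((a + b) // 2 for a, b in zip(p0, p1)),
                   (0, 0, 0)]
    texels = []
    for row in block[4:8]:
        for i in range(4):
            texels.append(palette[(row >> (2 * i)) & 0x3])
    return texels
```

A compute shader doing this work simply runs the same per-block logic across thousands of blocks in parallel.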

Compute: Civilization V

These compute results are mostly academic, as I don't expect anyone to rely on the HD 2500 for serious GPU compute work. With under 40% of the EUs of the HD 4000, the HD 2500 delivers under 30% of its performance.

Our second compute test is the Fluid Simulation Sample from the DirectX 11 SDK. This program simulates the motion and interactions of a 16k-particle fluid using a compute shader, with a choice of several different algorithms. In this case we're using an O(n²) nearest-neighbor method that is optimized by using shared memory to cache particle data.
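The shared-memory trick the SDK sample relies on can be sketched on the CPU: instead of every particle re-reading every other particle from slow memory, each thread group stages a tile of positions in fast group-shared storage and all threads sweep that tile before moving on. A hypothetical Python analogue (the SDK sample itself is HLSL; the force law and tile size here are illustrative only):

```python
def all_pairs_forces(pos, tile=4, eps=1e-3):
    """O(n^2) all-pairs interaction, walked one tile at a time -- the CPU
    analogue of a compute shader staging a tile of particle positions in
    group-shared memory before every thread in the group reads it."""
    forces = [[0.0, 0.0, 0.0] for _ in pos]
    for start in range(0, len(pos), tile):
        cached = pos[start:start + tile]          # the "shared memory" tile
        for f, (px, py, pz) in zip(forces, pos):  # every "thread" sweeps it
            for qx, qy, qz in cached:
                dx, dy, dz = qx - px, qy - py, qz - pz
                inv = (dx * dx + dy * dy + dz * dz + eps) ** -1.5  # softened
                f[0] += dx * inv
                f[1] += dy * inv
                f[2] += dz * inv
    return forces
```

The result is independent of the tile size; on a GPU the tiling is what converts n² scattered memory reads into n²/tile fast shared-memory reads.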

DirectX11 Compute Shader Fluid Simulation - Nearest Neighbor

Thanks to its large shared L3 cache, Intel's HD 4000 did exceptionally well here. With significantly fewer EUs, the HD 2500 fares much worse by comparison.

Our last compute test and first OpenCL benchmark, SmallLuxGPU, is the GPU ray tracing branch of the open source LuxRender renderer. We’re now using a development build from the version 2.0 branch, and we’ve moved on to a more complex scene that hopefully will provide a greater challenge to our GPUs.

 

SmallLuxGPU 2.0d4

Intel's HD 4000 does well here for processor graphics, delivering over 70% of the performance of NVIDIA's GeForce GTX 285. The HD 2500 takes a big step backwards though, with less than half the performance of the HD 4000.

Synthetic Performance

Moving on, we'll take a few moments to look at synthetic performance. Synthetic performance is a poor tool to rank GPUs—what really matters is the games—but by breaking down workloads into discrete tasks it can sometimes tell us things that we don't see in games.

Our first synthetic test is 3DMark Vantage’s pixel fill test. Typically this test is memory bandwidth bound as the nature of the test has the ROPs pushing as many pixels as possible with as little overhead as possible, which in turn shifts the bottleneck to memory bandwidth so long as there's enough ROP throughput in the first place.
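The bottleneck this test exposes can be captured with back-of-the-envelope math: sustained fill rate is the lesser of raw ROP throughput and the pixel rate memory bandwidth can absorb. A simplified sketch (the ROP figure below is hypothetical, and real hardware complicates this with caching and compression):

```python
def effective_fill_gpix(rop_gpix, bandwidth_gbs, bytes_per_pixel=4):
    """Sustained pixel fill is the lesser of raw ROP throughput and what
    memory bandwidth can absorb (simplified: 4 bytes per written pixel,
    no framebuffer compression, no cache effects)."""
    return min(rop_gpix, bandwidth_gbs / bytes_per_pixel)

# With dual-channel DDR3-1600 (25.6 GB/s) and a hypothetical 10 Gpix/s of
# ROP throughput, bandwidth caps the test at 25.6 / 4 = 6.4 Gpix/s.
print(effective_fill_gpix(10.0, 25.6))
```

So long as ROP throughput exceeds that bandwidth-derived ceiling, the test measures memory bandwidth, which is exactly the situation described above.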

3DMark Vantage Pixel Fill

It's interesting to note here that as DDR3 clockspeeds have crept up over time, IVB now has as much memory bandwidth as most entry-to-mainstream video cards, where 128-bit DDR3 is equally common. Or, on a historical basis, it now has half as much bandwidth as powerhouse video cards of yesteryear such as the 256-bit GDDR3-based GeForce 8800 GT.
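Those bandwidth comparisons are simple arithmetic: bus width in bytes times transfer rate. A quick check of the figures above:

```python
def bandwidth_gbs(bus_width_bits, mt_per_s):
    # GB/s = (bus width in bytes) * (million transfers per second) / 1000
    return bus_width_bits / 8 * mt_per_s / 1000

ivb    = bandwidth_gbs(128, 1600)  # dual-channel DDR3-1600 = 2 x 64-bit
gf8800 = bandwidth_gbs(256, 1800)  # 8800 GT: 900 MHz GDDR3, double data rate
# ivb -> 25.6 GB/s, gf8800 -> 57.6 GB/s: IVB has a bit under half the
# 8800 GT's bandwidth, matching the ballpark above.
```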

Moving on, our second synthetic test is 3DMark Vantage's texture fill test, which provides a simple FP16 texture throughput test. FP16 textures are still fairly rare, but they offer a good look at worst-case texturing performance.

3DMark Vantage Texture Fill

Our final synthetic test is Microsoft's Detail Tessellation sample program from the DX11 SDK, run at our standard settings. Since IVB is the first Intel iGPU with tessellation capabilities, it will be interesting to see how well it does here, as IVB is going to be the de facto baseline for DX11+ games in the future. Ideally we want enough tessellation performance that tessellation can be used globally, allowing developers to efficiently store their worlds with fewer polygons while still rendering many polygons in the final image.
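To put "fewer polygons" in perspective: with uniform integer tessellation, splitting every patch edge into n segments turns each triangular patch into n² smaller triangles (real hardware also offers fractional partitioning modes; this is the idealized integer case):

```python
def amplified_triangles(patch_triangles, tess_level):
    """Idealized uniform integer tessellation: each triangular patch
    becomes tess_level^2 triangles on its way through the tessellator."""
    return patch_triangles * tess_level ** 2

# A coarse 1,000-triangle mesh at tessellation level 8 renders as
# 64,000 triangles, while only 1,000 triangles' worth of vertex data
# has to be stored and fetched.
```

That geometry amplification is what makes a fast tessellator worth having, and why weak tessellation performance forces developers back to shipping dense meshes.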

DirectX11 Detail Tessellation Sample - Normal

DirectX11 Detail Tessellation Sample - Max

The results here are as expected. With far fewer EUs, the HD 2500 falls behind even some of the cheapest discrete GPUs.

GPU Power Consumption

As you'd expect, power consumption with the HD 2500 is tangibly lower than HD 4000 equipped parts:

GPU Power Consumption Comparison under Load (Metro 2033)
                   Intel HD 2500 (i5-3470)   Intel HD 4000 (i7-3770K)
Intel DZ77GA-70K   76.2W                     98.9W

Running our Metro 2033 test, the HD 4000 based Core i7 drew nearly 30% more power at the wall compared to the HD 2500.
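The percentage comes straight from the table:

```python
hd2500_w, hd4000_w = 76.2, 98.9  # wall power under Metro 2033 load
increase = (hd4000_w - hd2500_w) / hd2500_w
# (98.9 - 76.2) / 76.2 is roughly 0.298, i.e. just under 30% more
# power at the wall for the HD 4000 system
```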

Comments

  • BSMonitor - Thursday, May 31, 2012 - link

    If they truly were interested in building the best APU, and by that I mean a knockout iGPU experience.

    Where are the dual-core Core i7's with 30-40 EU's??
    Or the AMD <whatevers> (not sure anymore what they call their APU) Phenom X2 CPU with 1200 Shaders??

    When we are talking about a truly GPU-intense application, a lot of the time a single/dual-core CPU is enough. Heck, if you were to take a dual-core Core 2 and stick it with a GeForce 670 or Radeon 7950, you would see very similar numbers in terms of gaming performance to what's in the Bench charts, especially at 1920x1080 and below.

    Surely Intel can afford another die that aims a ton of transistors at just the GPU side of things. AMD, maybe. Why do we get from BOTH, their top end iGPU stuck with the most transistors dedicated to the CPU??

    I find it hard to believe anyone shopping for an APU is hoping for amazing CPU performance to go with their average iGPU performance. That market would be the opposite. Sacrifice a few threads on the CPU side for amazing iGPU.

    Am I missing something technically limiting?? Is that many GPU units overkill in terms of power/heat dissipation of the CPU socket??
  • tipoo - Thursday, May 31, 2012 - link

    Well, their chips have to work within a certain set of thermal limits. Maybe at this point 1200 shader cores would not be possible on the same die as a quad-core CPU for power consumption and heat reasons. I think Haswell will have 64 EUs though, if the rumours are true.
  • Roland00Address - Thursday, May 31, 2012 - link

    There is no point to a 1200-shader APU due to memory bandwidth. You couldn't feed a beast of an APU with only dual-channel 1600 MHz memory; that same memory already limits the performance of Llano and Trinity compared to their GPU cousins, which have the same compute units and core clocks but perform significantly better.
  • silverblue - Thursday, May 31, 2012 - link

    Possibly, but at the moment, bandwidth is a surefire performance killer.
  • BSMonitor - Thursday, May 31, 2012 - link

    Good points. But currently, Intel has quad-channel DDR3-1600 on Socket 2011. I am sure AMD could get more bandwidth there too, if they stepped up the memory controller.

    My overall point is that neither is even trying for a low-to-medium transistor CPU paired with a high transistor GPU.

    It's either low-medium CPU with low-medium GPU (disabled cores and what have you), or high-end CPU with "high-end" GPU.

    There is no attempt at giving up CPU die space for more GPU transistors from either. None. If someone spends $$ on the high end of the CPU (quad-core i7), the iGPU implementation is not even close to worth using with that much CPU.
  • Roland00Address - Thursday, May 31, 2012 - link

    Quad channel is not a "free upgrade"; it requires many more traces on the motherboard as well as more pins on the CPU socket. This dramatically increases costs for both the motherboard and the CPU. Both of those go against what AMD is trying to do with their APUs, which will be laptop as well as desktop chips. They are trying to increase their margins on their chips, not decrease them.

    You have a large number of OEMs only putting a single 4GB DDR3 stick in laptops and desktops (thus not achieving dual channel) with the current APUs. Do you really think those same vendors are suddenly going to put 16GB of memory on an APU? (And it is going to be 16GB, since 2GB DDR3 sticks are being phased out by the memory manufacturers.)
  • tipoo - Thursday, May 31, 2012 - link

    I'm curious why the HD4000 outperforms something like the 5450 by nearly double in Skyrim, yet falls behind in something like Portal or Civ, or even Minecraft? Is it immature drivers or something in the architecture itself?
  • ShieTar - Thursday, May 31, 2012 - link

    For Minecraft, read the article and what it has to say about OpenGL.

    For Portal or Civ, it might very well be related to Memory Bandwidth. The HD2500 can have 25.6 GB/s (with DDR3-1600), or even more. The 5450 generally comes with half as much (12.8 GB/s), or even a quarter of it since there are also 5450s with DDR2.

    As a matter of fact, I remember reading several reports on how much Llano's graphics would improve with faster memory, even beyond DDR3-1600. I haven't seen any tests on the impact of memory speed on Ivy Bridge or Trinity yet, but that would be interesting given their increased compute power.
  • silverblue - Thursday, May 31, 2012 - link

    I'm sure it'll matter for both, more so for Trinity. I'm not sure we'll see much in the way of a comparison until the desktop Trinity appears, but for IB, I'm certainly waiting.
  • tipoo - Thursday, May 31, 2012 - link

    Having half the memory bandwidth would lead to the opposite expectation: the 5450 comes close to or even surpasses the HD 4000 in those games despite the HD 4000 having twice the bandwidth, yet the HD 4000 beats it by almost double in games like Skyrim, and even the HD 2500 beats it there.
