GPU Limited Gaming Oddities

Scott Wasson first picked up on this anomaly in his GPU-limited FarCry 2 results at the bottom of this page. Jon Stokes pointed it out and our own Gary Key duplicated and expanded upon the results.

The situation is this: in some cases, Nehalem can go from being much faster than Phenom II to being measurably slower within the same benchmark, depending on resolution. Gary was the first to tie the issue to the GPU used. He found that NVIDIA GPUs appeared to behave this way on Nehalem/Phenom II while AMD GPUs didn't. In other words, NVIDIA GPUs were running faster on AMD hardware while AMD GPUs were running faster on Intel hardware. It's all very strange.

It's no surprise that Ryan and I are working on the reviews for AMD's next-generation DX11 GPUs due out before the end of September. I cloned my GPU testbed SSD and moved it over to my CPU testbeds. I then proceeded to run a subset of our GPU tests on the Core i7 920, Core i7 870, Core i5 750, Phenom II X4 965 BE and Core 2 Quad Q9450 on two different GPUs, a GeForce GTX 275 and a Radeon HD 4890.

Let's go through the results game by game, shall we?

I'll start with Gary's FarCry 2 benchmark. We're running in DX10 mode with the optimal quality defaults (latest patch) and 2X AA. That's much more GPU-bound than our normal CPU gaming tests, but that's exactly what we're looking for here. The benchmark of choice is "Ranch small", which comes with the game:

So I've duplicated Gary's results. The Nehalem cores all perform about the same; the i7 920 is a bit slower thanks to its less aggressive turbo mode, it seems. But look at the Phenom II X4: it is significantly faster regardless of resolution. Now look at the same test with a Radeon HD 4890:

The Phenom II X4 965 BE advantage disappears completely. That's odd.

Next, I ran the FarCry 2 benchmark we're using for our upcoming GPU reviews. It's the Playback action demo with Ultra Quality defaults and 4X AA enabled. First on NVIDIA hardware:

The Core i7 920 falls a bit behind the other Nehalems and while the Phenom II X4 965 BE pulls ahead slightly at 2560 x 1600, the performance is generally GPU bound across the board. An unexpected result is that the Core 2 Quad Q9450 at 1680 x 1050 is actually CPU bound. There may just be a gaming reason to upgrade your CPU after all. Now let's switch to AMD hardware:

Now this is strange. The Core 2 Quad doesn't fall behind in performance; in fact, it ties the Core i7 870 at 1680 x 1050. In other words, it doesn't appear to be CPU bound anymore at 1680 x 1050. Confused?

Let's keep going.

The next game I tested was Crysis Warhead. Again I ran all of the numbers in DX10 mode, this time with "Gamer" quality presets but with "Enthusiast" quality shaders. I ran the "frost" benchmark included with the initial version of the game.

All of the lines overlap, as they should; we're in a GPU-limited situation after all. The 870 pulls ahead slightly at the end but it's nothing to get terribly excited about.

Switch to the Radeon HD 4890 and we now have an outlier. The Core i7 920 is measurably slower than everything else at 1680 x 1050. The only change we made was the graphics card/drivers. Next.

Dawn of War II is an RTS/RPG that includes a wonderful built-in benchmark. I ran with all settings maxed out in the game (including turning AA "on"):

At 1680 x 1050 we actually see some performance breakdown here. The Lynnfields are fastest, most likely due to faster turbo modes. The Core i7 920 is next on the charts, followed by the Phenom II X4 965 BE. At the bottom we have the Core 2 Quad Q9450. But at 2560 x 1600 they all converge at roughly the same point. Since many users have monitors with native resolutions below 1920 x 1200, it's quite possible that the differences between these CPUs would be noticeable.

Things don't change too much as we switch graphics cards. The Phenom II X4 does a bit better with the Radeon HD 4890, but that's about the only change.

Left 4 Dead is next. All settings are maxed including Anisotropic Filtering at 16X. V-Sync is disabled and AA is set to 4X MSAA.

These numbers mostly make sense. The i7 870 is the fastest, followed by the i5 750 and the i7 920 - you have turbo to thank for that. The Phenom II is a bit slower and the Core 2 Quad is a lot slower. But by the time you hit 2560 x 1600, all roads lead to around 76 fps.

Similar behavior with ATI hardware, whew.

HAWX is a combat flight simulator that also doubles as a great DX10 benchmark. I ran the DX10 version of the game with all settings at their highest values with the exception of Ambient Occlusion, which was set to "low".

This is another one of those games where the Phenom II pulls ahead of the Nehalem processors even at a supposedly GPU-bound 2560 x 1600 resolution. The advantage isn't huge, about 7%, but the Core 2 Quad gives us some indication as to what's going on. The Q9450 actually beats everything here. Perhaps it's a large L2 thing? Now look at what happens with a Radeon HD 4890:

The Core 2 Quad still does better than everything else, but pretty much everything converges at the same point. The Phenom II advantage seems to disappear. So far we have HAWX and FarCry 2 exhibiting this behavior. Mental note, next benchmark please.
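For anyone following along with their own numbers, the "about 7%" figure above is just a relative-advantage calculation, and the interesting question is whether that advantage survives a GPU swap. Here's a minimal sketch in Python. The frame rates are made-up placeholders, not the measurements behind these charts, and the function names and the 2-point noise margin are assumptions chosen for illustration.

```python
def percent_advantage(fps, baseline_fps):
    """How much faster `fps` is than `baseline_fps`, in percent."""
    return (fps / baseline_fps - 1.0) * 100.0


def gpu_dependent(advantage_gpu_a, advantage_gpu_b, noise=2.0):
    """True if a CPU's lead changes by more than `noise` points between two GPUs.

    The 2-point default is an assumed run-to-run noise margin, not a measured one.
    """
    return abs(advantage_gpu_a - advantage_gpu_b) > noise


# Hypothetical placeholder frame rates, NOT the article's measurements.
adv_on_gpu_a = percent_advantage(53.5, 50.0)  # roughly a 7% lead
adv_on_gpu_b = percent_advantage(50.2, 50.0)  # lead has all but vanished

print(round(adv_on_gpu_a, 1), round(adv_on_gpu_b, 1))
print(gpu_dependent(adv_on_gpu_a, adv_on_gpu_b))
```

If the lead shrinks by more than the noise margin when only the graphics card and drivers change, that points at something GPU-side rather than the CPU alone.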

Our final test is Battleforge, a free-to-play online card-based RTS. I ran with all settings maxed out:

Here we see the opposite happening - the Phenom II X4 965 BE is far slower than anything else at 1680 x 1050. As expected, all CPUs tend to converge at the same point if you crank the resolution up high enough.

Switch graphics cards and the AMD disadvantage actually disappears. It's the opposite of what we've been seeing in games like FarCry 2 and HAWX where switching to an AMD GPU causes the AMD advantage to disappear.

What can we conclude from all of this data? Not much unfortunately. There are a couple of certainties:

1) Even at relatively stressful GPU settings, 1680 x 1050 with 4X AA enabled, some games are still CPU bound. The next generation of DX11 GPUs will make this even more true.

2) Gaming performance isn't totally clear-cut between all of these CPUs. There are situations where Nehalem is faster, Penryn is faster or Phenom II is faster. The trend appears to be that Nehalem is generally the fastest, followed by Phenom II, and only rarely does the Core 2 Quad end up on top.
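A rough way to apply the first point to your own results: if swapping CPUs moves the frame rate noticeably at a given resolution, the test is at least partly CPU bound there; if every CPU lands on the same number, the GPU is the limit. The Python sketch below is one possible rule of thumb, not the method used for these tests. The 5% threshold, the function names and the sample frame rates are all assumptions for illustration.

```python
def cpu_spread(fps_by_cpu):
    """Relative gap between the fastest and slowest CPU at one resolution."""
    fastest = max(fps_by_cpu.values())
    slowest = min(fps_by_cpu.values())
    return (fastest - slowest) / fastest


def likely_bottleneck(fps_by_cpu, threshold=0.05):
    """Label a result 'CPU' if the CPUs differ by more than `threshold`, else 'GPU'.

    The 5% default is an assumed cutoff, not a value taken from the article.
    """
    return "CPU" if cpu_spread(fps_by_cpu) > threshold else "GPU"


# Hypothetical placeholder numbers, NOT measured results from this article.
low_res = {"cpu_a": 100.0, "cpu_b": 90.0, "cpu_c": 70.0}
high_res = {"cpu_a": 76.0, "cpu_b": 76.0, "cpu_c": 75.0}

print(likely_bottleneck(low_res))
print(likely_bottleneck(high_res))
```

A check like this only flags where CPUs diverge. It won't explain the odd cases above, where the ordering of the CPUs changes with the graphics card.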

How do I explain the odd behavior that we've seen in some of these games? Honestly, I'm not sure if there's any one explanation. What appears to happen is a perfect storm of CPU power, GPU power, GPU drivers, cache sizes, clock speeds and instruction mix. In some cases it looks to be cache related as the Core 2 and Phenom II both do very well and have a noticeably larger L2 than Nehalem, but in other cases it's much more difficult to explain by any one variable. The fact that the situation changes almost entirely when switching to ATI hardware is what makes me believe the GPU driver is playing some role in all of this.

Ultimately it's not a big (or consistent) enough issue to get too worked up about, but it's definitely something real and not just a figment of testbed imagination. I've shared all of my data in hopes of figuring out exactly what's going on, but as I mentioned in my Lynnfield review, not all applications/games are going to play out the same way. I'll update you if I do find anything out.

46 Comments

  • coconutboy - Saturday, September 19, 2009

    Good article. There are a number of great hardware sites out there, but I do appreciate that you fellas at anandtech not only get out the info fast, but also in depth. I was especially interested in the stock voltage (or near to it) overclocking comparisons with i7 920 vs i7 860 both with and without turbo enabled. I was pretty sure I wanted an i7 920 versus the 860, but this article along w/ some early forum board results pretty much seals it.

    I understand that anandtech has to go by the prices of reliable online retailers or else chains like Best Buy etc, but for overclockers with a microcenter nearby I think i7 920 is a better value than 860. Lynnfield's turbo modes are of dubious value for OCers, even conservative ones like me. I plan on running a low OC of ~3.4GHz which likely makes 860's turbo modes much more attractive to me versus more aggressive OCers, but still find the total system cost/performance of even a moderately overclocked 860 to be negligible vs 920 because-

    1) I can buy a 920 for $200 vs $230 for the 860.

    2) There are a number of excellent 1366 mobos in the $170-200 range. Most 1156 mobos which compare featurewise to those sub-$200 x58 mobos are at least $150 with many costing the same price as X58. Combined w/ a cheaper CPU from microcenter, Lynnfield offers me nothing pricewise.

    3) x58 is a safer and easier OC since it doesn't fiddle w/ the PCIe. This is of particular interest for those of us who might be diving in early to the upcoming (and probably $$) ATI/Nvidia GPUs. x58 = Less challenge for the tweakers but a safer bet for the set-it-and-forget-it crowd who don't want issues later on.

    4) The cost per GB of low-latency DDR3 is almost identical for 3x2GB and 2x2GB kits. I easily chew up 4GB of RAM on my current system, so 6 or even 12GB is much more attractive.

    If you have the ability to buy your cpu from a nearby microcenter or someplace with similar prices the main attraction for buying an i7 860 seems to be-

    1) running stock speeds/low OCs
    2) buying a low-cost $100-120 mobo that skimps on a few features.
    3) you want the coolest running i7 CPU possible
    4) new and shiny ooooh.

    I'll buy 920 for me, and probably pick up 860 with one of those $100-120 mobos for the woman. Now please hurry up and pass NDA, I'm curious about the new ATI GPUs.
  • ginbong - Monday, September 21, 2009

    You forgot about the idle and load power consumption.

    Lynnfield has really low power consumption. I'm one of the slight OC with stock voltage persons but I think if you will only be running a single high end GPU or a dual GPU on one PCB then the Lynnfield is the way to go for stock clocks relying on the aggressive turbo to keep the power consumption down. (I normally take off my overclocks when I don't play for a few months.)

    Lynnfield would have been great if it wasn't for the linked PCI-e on die controller.

    I'm undecided yet because I'm not sure about how to handle that while overclocking so I'll be waiting for more articles related to that issue. *wink* *wink* AnandTech staff
  • JamesA - Saturday, September 19, 2009

    From looking at the benchmarks, it seems that in Gaming the Core2Duo E8600 and the Core2Quad Q9650 perform very well. It seems to be mostly in the Photoshop / 3D rendering tests that they move way down the charts.
    So if you are mostly doing Gaming and already have a good system that could handle the E8600/Q9650 it would seem there is not much value yet in spending all the money to upgrade to an i5/i7 system.
  • Zoomer - Saturday, September 19, 2009

    From the benchmark results, would I be right to extrapolate that there is no real reason to get a Lynnfield if one solely focused on gaming performance?

    The 2.66Ghz C2Q is clock matched with the i5, but remains on par most of the time, despite the i5 dynamically overclocking. I surmise that a C2Q at a frequency = max i5 turbo freq would beat the i5. Furthermore, since the i5 is not that great of an overclocker, max C2Q freq > i5 freq, but $C2Q << $i5 due to the newer platform & need for DDR3. P55 mobos, cooling solutions, ram all cost more.

    Hopefully someone can run some benches and do a comparison to squash such speculation. ;)
  • nevbie - Saturday, September 19, 2009

    Penryn and the Radeons like each other. Though these results are with 4 cores only..
  • Patrick Wolf - Saturday, September 19, 2009

    So do those gaming results mean you're going to post results of future GPU benchmarks on both Intel and AMD hardware?
  • TA152H - Saturday, September 19, 2009

    I read this, and I'm really confused.

    The on-die PCIe should make the Lynnfield slower, right? The reason the benchmarks close up a little on games at higher clock speeds is the bottleneck probably moves more towards the GPU.

    If you notice on your next page, you see that at higher resolutions the i7 920 starts creeping up. You could say this is the reverse, but in some situations it actually passes the Lynnfield. This is because of the inferior PCIe implementation on the Lynnfield, probably. In this event, you probably have more collisions because, at the higher resolutions, you're using main memory for video. Consequently, the weaker on-die implementation starts to falter, while the x58 doesn't have the memory contention issue.

    That's my guess anyway. It's going to be as hard to get people to understand this as it was for them to understand the additional stages for the K8 were for IPC, not clock speed, but ...

    On-die PCIe isn't going to boost performance, it should hurt it. Unless Intel did something weird and gave the Lynnfield a separate memory bus for PCIe, all the memory requests from the video cards now have to go through the processor. If the processor and video card want to access memory at the same time, you lose performance. The x58 doesn't have to use the wider memory bus of the Bloomfield, so this problem doesn't exist.

    This would explain some of your benchmark results. You'd see it more if you actually used proper memory. Not that you'd want to.

    I'm not sure of this, but so far benchmarks seem to imply it, and I've seen nothing to disprove it. Have you heard something different from Intel? I really don't think they would have a separate memory bus for PCIe, when you think of how infrequently, relatively speaking, it would be used. So, it seems very likely there is a potential contention issue, and on-die PCIe would lower performance, not increase it.
  • lopri - Saturday, September 19, 2009

    Your explanation seems plausible at first but it fails to account for:

    1) That the symptom more or less disappears with AMD GPU.
    2) That the Bloomfield suffers the same thing as the Lynnfield.
    3) That it is not Intel CPU under-performing but rather AMD CPU (or platform) better performing when coupled with NV GPU. I deduce this partially from C2Q-P45's showing under GPU-limited scenarios.

    These are subtle yet important distinctions, IMO.
  • goinginstyle - Saturday, September 19, 2009

    TA152H...
    Where is your article about P55 being "brain damaged" at your review site? Where are all of your benchmarks proving that the 920 "wipes the floor" with the 860? Where are all your benchmarks proving that the 920 is faster with higher speed memory? So far none of the benchmarks here or elsewhere even show what you claim. How is that next copy and paste article coming along for you by the way?
  • TA152H - Saturday, September 19, 2009

    You saw them here, you twit.

    Although, I'm not crazy about him running the uncore on the Bloomfield faster. The results are skewed. They would be lower were it not for that.

    Still, let's say 3%. When you see 3% just from changing the CPU, considering the other parts, that's a big difference. With caches being so effective, getting 3% difference from the same architecture, on average, is pretty big.

    On some, it's much bigger.

    Did you learn something, moron?
