Multi-core support in Games?

Both Quake 4 and Call of Duty 2 now have SMP support, supposedly offering performance improvements on dual core and/or Hyper Threading enabled processors. 

For Call of Duty 2, you simply install the new patch and off you go; SMP support is enabled.  To verify, we ran our CoD 2 benchmark and kept a log of the total processor utilization over time.  Below is a shot of perfmon with a fresh install of CoD2 (sans SMP patch):

Note how the total CPU utilization for our dual-core testbed hovers right around 50%, with the maximum being just under 52% (the remaining 2% can be attributed to driver and other overhead that can eat up extra CPU cycles). 
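As a rough illustration of why a single busy game thread shows up as about 50% on a dual-core system, the sketch below averages per-core busy percentages into one total figure. The sample values and the `total_utilization` helper are hypothetical, chosen only to mirror the numbers described above; they are not measurements from our perfmon log.

```python
# Hypothetical helper: average per-core busy percentages into one "total" figure.
def total_utilization(per_core):
    """Return the mean busy percentage across all cores."""
    return sum(per_core) / len(per_core)

# Assumed sample: one game thread saturating core 0, while core 1
# only handles light driver and OS overhead.
single_threaded = [100.0, 4.0]

print(total_utilization(single_threaded))  # 52.0
```

With two cores, anything meaningfully above 50% total means the second core is doing real work.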

Now, let's look at CoD2 CPU utilization with the SMP patch installed:

While the average CPU utilization only goes up by around 9%, the maximum CPU utilization increases tremendously, now up to 83%, showing us that the second core is being used. 

We tested at 1024x768. The higher the resolution, the smaller the impact of a faster CPU; conversely, the lower the resolution, the greater the impact, because the game becomes less GPU limited. 

To ensure a fair comparison, we tested using the SMP patch and simply disabled SMP manually by setting the r_smp_backend variable to "0".  We confirmed that SMP support was actually disabled by running perfmon and measuring CPU utilization. 

Call of Duty 2                                 SMP Disabled   SMP Enabled
AMD Athlon 64 FX-57 (2.8GHz)                   80.6           N/A
AMD Athlon 64 X2 4800+ (2.4GHz)                79.8           70.3
AMD Athlon 64 X2 3800+ (2.0GHz)                78.7           68.1
Intel Pentium Extreme Edition 955 (3.46GHz)    79.8           68.4
Intel Pentium Extreme Edition 840 (3.2GHz)     78.1           68.0
Intel Pentium D 820 (2.8GHz)                   75.6           67.1

Surprisingly enough, we actually saw pretty large performance drops in CoD2 with SMP enabled across both AMD and Intel platforms.  This is unfortunate, but the withdrawn SMP support of Quake 3 makes it less than shocking. We do expect that things will get better as time goes on. 
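To put a number on those drops, the short sketch below computes the relative change from SMP disabled to SMP enabled for each dual-core processor in the Call of Duty 2 table. The scores are copied from the table; the `percent_change` helper and the dictionary layout are our own illustrative choices.

```python
# Call of Duty 2 scores from the table above: (SMP disabled, SMP enabled).
# The FX-57 is single core, so it has no SMP-enabled result and is omitted.
cod2 = {
    "AMD Athlon 64 X2 4800+": (79.8, 70.3),
    "AMD Athlon 64 X2 3800+": (78.7, 68.1),
    "Intel Pentium Extreme Edition 955": (79.8, 68.4),
    "Intel Pentium Extreme Edition 840": (78.1, 68.0),
    "Intel Pentium D 820": (75.6, 67.1),
}

def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

for cpu, (smp_off, smp_on) in cod2.items():
    print(f"{cpu}: {percent_change(smp_off, smp_on):+.1f}%")
```

On these figures, every dual-core chip ends up roughly 11 to 14 percent slower with SMP enabled.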

Quake 4 was a different story; with r_useSMP enabled, we saw some extremely large performance gains with the move to dual core:

Quake 4                                        SMP Disabled   SMP Enabled
AMD Athlon 64 FX-57 (2.8GHz)                   115.4          N/A
AMD Athlon 64 X2 4800+ (2.4GHz)                114.9          147.4
AMD Athlon 64 X2 3800+ (2.0GHz)                100.9          143.2
Intel Pentium Extreme Edition 955 (3.46GHz)    98.9           142.3
Intel Pentium Extreme Edition 840 (3.2GHz)     89.0           133.6
Intel Pentium D 820 (2.8GHz)                   80.6           125.5

Either the SMP patch spawns only two threads, or the instruction mix of Quake 4 with the patch does not suit Intel's Pentium EE 955: moving from dual core to dual core with Hyper Threading enabled did nothing at all for performance. 
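For a sense of scale, the sketch below turns the Quake 4 scores into speedup ratios and then inverts Amdahl's law to estimate what fraction of the per-frame work would have to run in parallel to explain each gain. This is a back-of-the-envelope reading, not something we measured: it assumes exactly two cores doing the work (even on the Hyper Threading parts) and a purely CPU-bound test, and the helper names are our own.

```python
# Quake 4 scores from the table above: (SMP disabled, SMP enabled).
quake4 = {
    "AMD Athlon 64 X2 4800+": (114.9, 147.4),
    "AMD Athlon 64 X2 3800+": (100.9, 143.2),
    "Intel Pentium Extreme Edition 955": (98.9, 142.3),
    "Intel Pentium Extreme Edition 840": (89.0, 133.6),
    "Intel Pentium D 820": (80.6, 125.5),
}

def speedup(smp_off, smp_on):
    """Ratio of the SMP-enabled score to the SMP-disabled score."""
    return smp_on / smp_off

def amdahl_parallel_fraction(s, cores=2):
    """Solve Amdahl's law, s = 1 / ((1 - p) + p / cores), for p.

    Assumption: the benchmark is CPU bound and uses exactly `cores` cores.
    """
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / cores)

for cpu, (smp_off, smp_on) in quake4.items():
    s = speedup(smp_off, smp_on)
    p = amdahl_parallel_fraction(s)
    print(f"{cpu}: {s:.2f}x speedup, implied parallel fraction {p:.2f}")
```

The gains run from roughly 28 percent on the X2 4800+ to roughly 56 percent on the Pentium D 820. If the fastest chips are partly GPU limited at this resolution, their implied parallel fraction understates how much of the engine is actually threaded.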

While we're only looking at two games, this is a start for multithreaded game development.  You can expect to see a lot of examples where dual-core does absolutely nothing for gaming, but as time goes on, the situation will change. 


84 Comments


  • yacoub - Tuesday, January 3, 2006 - link

    quote:

    The Athlon 64 X2 4800+ actually is faster in the Splinter Cell: CT benchmark without anything else running, but here we see a very different story. Although its 66 fps average frame rate is reasonably competitive with the Presler HT system, its minimum frame rate is barely over 10 fps - approximately 1/3 that of the Presler HT.


    Yet no mention of the Max, where the 4800+ utterly trounces the two Intel chips. Does Max not matter (in which case why bother listing it), or does it matter but you just neglected to mention that (whether on purpose or by accident)?
  • jjunk - Tuesday, January 3, 2006 - link

    quote:

    Yet no mention of the Max, where the 4800+ utterly trounces the two Intel chips. Does Max not matter (in which case why bother listing it), or does it matter but you just neglected to mention that (whether on purpose or by accident)?


    It's right there in the chart. As for further discussion, it's not really necessary. Screaming frame rates might look good on the chart, but they don't help gameplay. A 10 fps min will definitely be noticeable.
  • IntelUser2000 - Sunday, January 1, 2006 - link

    quote:

    When we do receive the new motherboard, we will take a look at power consumption once more to get an idea of the final state of Intel's 65nm power consumption, but until then, we don't want to draw any conclusions based on what we've seen.

    I don't like that paragraph. It makes it sound like 65nm will be all that makes Presler in power consumption. It will also make people judge 65nm based on Presler, since that's the first CPU on the 65nm.

    In fact it's not that simple. Taking a CPU that's on a certain process, like the Smithfield, and putting it on a smaller process won't mean an instant 40-50% decrease in power consumption. That's called the dumb shrink. The reason Northwood had significantly lower power than Willamette was because Northwood was optimized to lower power consumption.

    A CPU that runs well at 130nm may do badly at 90nm and even worse at 65nm, for example. Presler was said to be not Intel's main focus and Intel moved their design teams to Conroe, so the people who were supposed to be optimizing Presler for 65nm all went away and Presler just got a dumb shrink.

    Sleep transistor was an optional feature on 65nm, not required. So Presler may not have it.
  • IntelUser2000 - Monday, January 2, 2006 - link

    Why use DDR2-667 with 5-5-5-15 timings?? Most DDR2-667 can do 4-4-4-8(around there). This is gonna skew the results in AMD's favor as DDR400 used is the lowest latency possible.

    In reality nobody is gonna use DDR400 at 2-2-2-7 latency or DDR2-667 at 4-4-4-8 latency. Nobody I have ever heard of outside the internet uses RAM at those timings.

    Anandtech should either benchmark them all at JEDEC timings or use them all with low latency. I understand they want to be sure the new test system to work properly, but using low latency RAM for the comparison system is just not fair.

    JEDEC timings for DDR400 are 3-3-3-8. Where is your DDR400 advantage over DDR2 now??
  • hans007 - Sunday, January 1, 2006 - link

    i think that the 9xx series is a big improvement over the 8xx.

    i have an 8xx myself, the 820, which is the lowest power. the leakage is exponential so the 955 is going to draw a much higher amount than say a 920 will.

    i bet the 920 will be a half decent cpu drawing maybe only 70 watts. which isnt TOO terrible in the grand scheme of power. the 920 would only run at 2.8 ghz and have not as high leakage percentage so i think it will be the one to get.

    true intel is not better yet, but they are getting there. and their dual cores still cost less.

    i also think that intel should be commended for writing the smp code for q4. that is the doom3 engine which will go into a LOT of games. and since it speeds up the amd chips as well, it is a free upgrade for everyone. sure it makes up for a large deficiency in the intel chips, but it is FREE.

    and it makes the really cheap 920/820 chips very price competitive. as the 820 chips are very very cheap, about $150 on ebay (which is probably near what oems get them for in bulk, thus the rampant dell 820 deals going on)
  • jjmcwill - Saturday, December 31, 2005 - link

    I do professional software development for a living, using Visual Studio 2003 to build the code for a product I work on. We have over 1000 .cpp files and over 1500 header files.

    On my work box: An HP xw6200 workstation with a single 3.0GHz Xeon CPU, 2MB L2 cache, 1G RAM, compilation takes 10:45 for a single project in our solution. On my home system: Socket 754 Athlon 64 3000+, 1.5G RAM, compilation takes 7:30. Both systems build the code off of the exact same, external ide hard drive in a Firewire enclosure. I use it to carry all my work back and forth between work and home.

    At some point we'll be investigating Make to launch parallel compiles, and I would be VERY interested in seeing dual-core CPU comparisons which include compilation benchmarks, using Visual Studio 2003 under Windows, using Make -j2 or Make -j3 under windows, and using gcc/make under Linux.

    Based on what I've seen with the Xeon, I'm leaning toward an AMD X2 or dual core Opteron for my next upgrade.


    Thanks.

  • Calin - Tuesday, January 3, 2006 - link

    I think that an Extreme Edition CPU (while much more expensive) would give better results with hyperthreading enabled than a simple Pentium D and maybe even than an Athlon64 X2 while doing several threads of compile.
  • Brian23 - Saturday, December 31, 2005 - link

    The second valuable post in this thread.

    I own a X2 3800 and I'm pleased with the results anand posted. I won't need to upgrade for a while.

    I'm looking forward to AMD implementing something similar to Sun's design: multiple threads running simultaneously. It shouldn't be that hard to do. It's just adding GPRs and a little logic that controls the thread contexts.
  • Missing Ghost - Saturday, December 31, 2005 - link

    Some other web sites report that the cpu becomes too hot with the stock heatsink.
  • Gary Key - Saturday, December 31, 2005 - link

    quote:

    Some other web sites report that the cpu becomes too hot with the stock heatsink.


    The initial press release kits that contained the Intel D975XBX motherboard had an issue that created higher than normal idle/load temperatures. We have new boards on the way from Intel. I can promise you that the first results shown in other 955EE reviews do not occur on the 975x boards from Gigabyte and Asus, nor will it occur on the production release Intel D975XBX. I highly recommend a different air cooling system than the stock heatsink but most of the reported results at this time are incorrect.
