Multitasking Performance

When we tried to come up with new multitasking benchmarks that would truly stress the Kentsfield and Quad FX platforms, we kept running into interesting but fairly out-there scenarios. They did a great job of stressing our test beds, but a terrible job of making a case for how you could use quad-core today.

Without a doubt, in the next two years the number of applications that see a benefit when running on four cores will increase dramatically. Even multitasking under Windows Vista will make the argument for more cores easier (simply opening a new Explorer window in Vista will eat up 10% of the CPU time of a Quad FX system), but our Vista benchmarks are not yet complete and we wanted to have something to showcase for this review.

While working on our Quad FX article we also happened to be working on a follow-up to our HDCP Graphics Card Roundup, focusing on H.264 decoding performance in Blu-ray titles. A light bulb went off and we had our benchmark: how many cores do you need to watch a high bit-rate Blu-ray movie and do something else at the same time on your PC?

The movie we used was X-Men III, encoded in H.264, with bitrates exceeding 40Mbps at times. Our benchmark starts at the beginning of Chapter 18 and continues until our background tasks are complete. This particular segment ranges in bitrate from 13Mbps to above 40Mbps, with the average falling in the 18 to 24Mbps range.

We played the movie in the foreground, while in the background we ran one of four tasks: our Cinebench test, a DivX encode, a WME9 encode or our 3dsmax test.
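The structure of this test (one workload held in the foreground while a second one is timed in the background) can be sketched in a few lines of Python. This is only an illustration of the idea, not the harness used for this review: the function names `time_background_task` and `toy_render` are hypothetical, and a busy-loop process stands in for the Blu-ray player.

```python
import subprocess
import sys
import time


def time_background_task(foreground_cmd, background_task):
    """Run background_task while foreground_cmd is alive.

    Returns (task result, elapsed seconds). The foreground process is a
    stand-in for the movie player; it is always terminated afterwards.
    """
    foreground = subprocess.Popen(foreground_cmd)
    try:
        start = time.perf_counter()
        result = background_task()
        elapsed = time.perf_counter() - start
    finally:
        foreground.terminate()
        foreground.wait()
    return result, elapsed


def toy_render():
    """Hypothetical CPU-bound placeholder for a render or encode job."""
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    # Assumption: a busy loop in a second Python process plays the role
    # of the foreground decoder and competes for CPU time.
    busy_loop = [sys.executable, "-c", "while True: pass"]
    _, seconds = time_background_task(busy_loop, toy_render)
    print(f"background task finished in {seconds:.3f} s")
```

A sketch like this only captures the background score. Whether the foreground playback stayed smooth still has to be observed separately, which is how the dropped frames in the tests below were judged.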

The two rendering tests are important because rendering can take a while, and it might be nice to entertain yourself with a movie while your render completes; after all, what's the point of having $1000 worth of CPUs if you can't use them for entertainment?

The two encoding tests are also important because being able to encode and decode at the same time is a fundamental requirement for a DVR, and at some point the next generation of media center PCs will need to decode high bitrate HD movies while encoding others. We included both DivX and WME to give you a better overall impression of how the two platforms handle these heavy multitasking scenarios: DivX runs much better on Intel CPUs, while the standings are a bit closer under WME.

In our first test we played back the BD title while running our multi-threaded Cinebench test; we report the Cinebench score upon its completion:

General Performance - Multitasking

The dual-core processors all fall to the bottom of the list and basically perform like single-core CPUs while decoding the Blu-ray movie. The quad-core setups do much better. That said, all of the CPUs in this test were able to play the BD movie without dropping any frames.

Making things a bit more difficult, our next test had the same movie playing back but this time we ran our DivX encoding test in the background. We reported the DivX encoding frame rate upon completion:

General Performance - Multitasking

Performance is pretty much what you'd expect, although Intel's superior DivX encoding performance results in the Core 2 Extreme X6800 doing almost as well as the FX-74. What you don't see, however, is how well these systems played back the Blu-ray movie. None of the dual-core setups were able to play the BD movie smoothly, not even the Core 2 Extreme X6800. The movie was basically unwatchable due to all of the pausing and stuttering.

All of the quad-core systems played the BD movie fairly well; although they all dropped some frames, it wasn't enough to ruin the experience.

Next up we tried playing our BD title while running our WME9 test, and found similar results:

General Performance - Multitasking

Once again, none of the dual-core platforms were able to play the BD title even remotely smoothly. The quad-core setups were able to play the movie while encoding, but they still dropped some frames (not enough to ruin the experience though).

Our final multitasking test has us playing the same BD title while running our 3dsmax 8 render test:

General Performance - Multitasking

Much to our disappointment, none of the systems could handle this workload without ruining the movie playback; even the quad-core setups had trouble. We're not talking about a few dropped frames: playback stopped completely at times. It looks like we may have a scenario for either more GPU-assisted H.264 decode or an 8-core Quad FX platform in the future.


88 Comments


  • Nighteye2 - Thursday, November 30, 2006 - link

    I'm interested in that as well. NUMA will be an important part of 4x4 performance - so why isn't NUMA used in the benchmark, or at least mentioned. NUMA is the advantage of having 2 sockets - having NUMA disabled in this benchmark by using an OS that does not support it unfairly cripples the 4x4 performance.
  • Viditor - Thursday, November 30, 2006 - link

    quote:

    NUMA will be an important part of 4x4 performance - so why isn't NUMA used in the benchmark, or at least mentioned

    Agreed...I think that one of the reasons that AMD delayed release of this so long is that they wanted to show it on Vista instead of WinXP. It seems to me that there would be a substantial difference between the 2...
  • Viditor - Thursday, November 30, 2006 - link

    As a follow up on just how important NUMA is for 4x4, check out this review (http://babelfish.altavista.com/babelfish/trurl_pag...) which actually compares the 2...
    There is a DRASTIC difference between performance on XP and Vista!
  • Accord99 - Friday, December 1, 2006 - link

    Most of the difference is running in 64-bit mode. The extra bandwidth didn't help the FX-74 in the megatasking bench. They didn't do any game benchmarks but based on past reviews of NUMA, the FX-74 will probably keep on losing to the FX-62 in games.
  • Viditor - Friday, December 1, 2006 - link

    quote:

    Most of the difference is running in 64-bit mode

    I'm not sure I agree...there's a 22.5% increase in performance there, and I haven't seen anything like that on the 64 bit version of 3DS Max before...
    Not to mention that Vista isn't known as a real speed demon (quite the opposite) for these apps...
    What the 64bit version does is allow for larger scene use and stability, not so much faster rendering.
  • photoguy99 - Friday, December 1, 2006 - link

    quote:

    I'm not sure I agree...there's a 22.5% increase in performance there, and I haven't seen anything like that on the 64 bit version of 3DS Max before...


    Sorry totally wrong -

    64-bit can make a big difference in performance depending on the app. Remember you can process 64 bits of data in a typical instruction instead of 32, so theoretically twice as much pixel data at a time for rendering.

    Some apps may not show the full benefit it depends on how they are coded and compiled, but it's definitely a real potential for speedup.

    Bottom line is 64-bit could easily account for a bigger performance increase than NUMA.
  • Kiijibari - Friday, December 1, 2006 - link

    quote:

    64-bit can make a big difference in performance depending on the app. Remember you can process 64 bits of data in a typical instruction instead of 32, so theoretically twice as much pixel data at a time for rendering.


    quote:

    I'm not sure I agree...there's a 22.5% increase in performance there, and I haven't seen anything like that on the 64 bit version of 3DS Max before...


    You see that he refers already to 3DS MAX .. I have not investigated this, but if he refers to it, then I trust him on that one ...

    Furthermore I miss synthetic Sandra Mem bandwidth benches .. these should easily show what is going on there ...

    Anyways a 4x4 review without mentioning the XP - NUMA problem is just not worth reading it ... Sorry Anand ...

    cheers

    Kiijibari
  • Anand Lal Shimpi - Friday, December 1, 2006 - link

    The performance deficit seen when running latency sensitive single and dual threaded applications exists even in a NUMA-aware OS (I've confirmed this under Vista). I'm still running tests under Vista but as far as I see, running in a NUMA-aware OS doesn't seem to change the performance picture at all.

    Take care,
    Anand
  • Kiijibari - Saturday, December 2, 2006 - link

    Hi Anand,

    first of all, thanks for your reply.

    Then, if there is really no performance difference, then I would double check the BIOS, if you have really disabled node interleave.

    Furthermore there seems to be a BIOS bug, with the SRAT ACPI tables, which are necessary for NUMA. It would be nice, if you can dig up some more information about that topic.

    Clearly, that would be not your fault, but AMD's.

    cheers

    Kiijibari
  • Anand Lal Shimpi - Saturday, December 2, 2006 - link

    From what I can tell the Node Interleave option in the BIOS is doing something. Disabling it (enabling NUMA) results in lower latencies than leaving it enabled, but still not as low as running with a single socket.

    CPU-Z offers the following latencies for the three configurations:

    2S, NUMA On: 168 cycles
    2S, NUMA Off: 205 cycles
    1S: 131 cycles

    From my discussions with AMD last week, this behavior is expected. I will do some more digging to see if there's anything else I'm missing though.

    Take care,
    Anand
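To put the CPU-Z figures quoted in the comment above into relative terms, a short calculation helps. The sketch below uses only the three cycle counts given there; the helper name `percent_change` is introduced here purely for illustration.

```python
# Memory latencies quoted in the comment above (CPU-Z, in cycles).
latencies = {"2S, NUMA On": 168, "2S, NUMA Off": 205, "1S": 131}


def percent_change(new, baseline):
    """Relative change of `new` versus `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


numa_on_vs_off = percent_change(latencies["2S, NUMA On"], latencies["2S, NUMA Off"])
numa_on_vs_1s = percent_change(latencies["2S, NUMA On"], latencies["1S"])
numa_off_vs_1s = percent_change(latencies["2S, NUMA Off"], latencies["1S"])

print(f"2S NUMA on vs. 2S NUMA off: {numa_on_vs_off:+.1f}%")   # -18.0%
print(f"2S NUMA on vs. single socket: {numa_on_vs_1s:+.1f}%")  # +28.2%
print(f"2S NUMA off vs. single socket: {numa_off_vs_1s:+.1f}%")  # +56.5%
```

In other words, enabling NUMA cuts the reported latency by roughly 18% relative to node interleaving, but the two-socket system still sits about 28% above the single-socket figure.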
