More Sockets, but Lower Performance?

When AMD briefed us on Quad FX, the performance focus was on heavy multitasking (AMD calls this "Megatasking") or heavily multi-threaded tests. We figured it was an innocent attempt to make sure we didn't run a bunch of single-threaded benchmarks on Quad FX and proclaim it a failure. Given that the vast majority of our CPU test suite is multi-threaded to begin with, we didn't think there would be any problems showcasing where four cores are better than two, much like we did in our Kentsfield review.

However, when running our SYSMark 2004SE tests we encountered a situation that didn't make total sense to us at first, and that somewhat explained AMD's desire for us to focus strongly on megatasking/multithreaded tests. If we pulled one of the CPUs out of the Quad FX system, we actually got higher performance in SYSMark than with both CPUs in place. In other words, four cores were slower than two.

CPU                 | SYSMark 2004SE | Internet Content Creation | Office Productivity
2 Sockets (4 cores) | 261            | 373                       | 182
1 Socket (2 cores)  | 288            | 393                       | 211

You'll see that in some of the individual tests there is an advantage to having both CPUs installed, but in the vast majority of them performance goes down with four cores. It turns out that there are two explanations for the anomaly.

CPU                 | Internet Content Creation | 3D Creation | 2D Creation | Web Publication
2 Sockets (4 cores) | 373                       | 245         | 514         | 411
1 Socket (2 cores)  | 393                       | 364         | 453         | 369

First, in the Internet Content Creation suite of SYSMark 2004SE, there appears to be an issue with having two physical CPUs in the system that results in the 3dsmax rendering test only spawning a single thread, lowering performance below that of a normal dual-core processor. This problem may be caused by a licensing violation within the benchmark, where it expects to see one physical CPU with multiple cores and isn't prepared to deal with multiple CPUs. Regardless of the exact cause, it doesn't appear to be anything more than a benchmark issue. The performance in the Office Productivity suite is far more worrisome, because there the drop is not caused by any issue with the benchmark.

CPU                 | Office Productivity | Communication | Document Creation | Data Analysis
2 Sockets (4 cores) | 182                 | 171           | 259               | 137
1 Socket (2 cores)  | 211                 | 187           | 285               | 176

The Office Productivity suite of SYSMark 2004SE wasn't the only situation where we saw lower performance on Quad FX than with a single dual-core setup. 3D games seemed to suffer the most; take a look at what happens in our Oblivion and Half Life 2: Episode One tests:

CPU                 | Oblivion - Bruma | Oblivion - Dungeon | Half Life 2: Episode One
2 Sockets (4 cores) | 67.3             | 78.3               | 155.8
1 Socket (2 cores)  | 75.2             | 90.9               | 165.7

Once again, populate both sockets in the Quad FX system and performance goes down. The explanation for these anomalies lies in the result of one more benchmark, CPU-Z's memory latency test:

CPU                 | CPU-Z Latency (8192KB, 128-byte)
2 Sockets (4 cores) | 55.3 ns
1 Socket (2 cores)  | 43.3 ns

With both sockets populated, memory latency goes up by around 28%. Applications that are latency sensitive and don't necessarily need all four cores therefore perform worse than they do on a single dual-core CPU. The added latency comes from the additional coherency probing over the HT bus: whenever a memory request is made, a probe goes out to see where the latest copy of the data resides.
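As a rough check on these numbers, the short Python sketch below recomputes the relative change between the one-socket and two-socket results. The values are copied from the tables in this article; the function name `percent_change` and the dictionary layout are ours, chosen purely for illustration.

```python
def percent_change(baseline, value):
    """Return the change of `value` relative to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# (1 socket, 2 sockets) pairs copied from the tables in this article.
# For latency, higher is worse; for every other entry, higher is better.
results = {
    "CPU-Z latency (ns)": (43.3, 55.3),
    "SYSMark 2004SE overall": (288, 261),
    "Office Productivity": (211, 182),
    "Oblivion - Bruma": (75.2, 67.3),
    "Oblivion - Dungeon": (90.9, 78.3),
    "Half Life 2: Episode One": (165.7, 155.8),
}

for name, (one_socket, two_sockets) in results.items():
    change = percent_change(one_socket, two_sockets)
    print(f"{name}: {change:+.1f}%")
```

Run against the figures above, latency comes out roughly 28% higher with both sockets populated, while the listed scores fall by somewhere between about 6% and 14%.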

It's a problem that will go away if you have a single quad-core CPU with one memory controller, but one that makes Quad FX a tougher pill to swallow compared to Intel's quad-core offerings.

88 Comments

  • Viditor - Thursday, November 30, 2006

    quote:

    if they decide to do anything with quad G80 chips you can pretty much guarantee that it will be for both platforms

    If they can...
    The 680a chipset has a direct HT link to each MCP, the 680i obviously can't do that and must bridge through the SPP.

    quote:

    Anyway, this Quad FX is just the same thing as Quad SLI: potentially good marketing, but lackluster final performance and terrible heat and power requirements


    Now if only we could find a review that actually showed that...;)
    Seriously, the one major benefit of Quad FX is that it can run 4 GPUs. While I appreciate all of the conjecture and speculation, it isn't really a test of the facts, is it?
  • defter - Friday, December 1, 2006

    quote:

    Seriously, the one major benefit of Quad FX is that it can run 4 GPUs.

    How is that a benefit? You can have 8 GPUs in the same system (AMD or Intel based, it doesn't matter) with a couple of NVIDIA Quadro Plex 1000 Model II's if money isn't an issue:
    http://www.nvidia.com/page/quadroplex_comparison_c...

  • JarredWalton - Friday, December 1, 2006

    Fact: Quad SLI (7950 GX2) works on 590 SLI and 680i.
    Fact: Quad SLI (8800 GTX) does not exist.

    Until the second item changes, we only have the first to go on, which is that current quad SLI works - at least as much as it works anywhere - on both platforms. And the QSLI drivers are still largely broken - you can run benchmarks, but as soon as you start playing lots of games rather than just benching, problems crop up. Neverwinter Nights 2 for example doesn't even run properly with CrossFire or SLI, so let's not even worry about getting QSLI support for now.
  • JackPack - Thursday, November 30, 2006

    8800 GTX requires two slots, which means it won't fit in the 4x4 motherboard. Quad-SLI performance has already shown to be poor using two 7950 GX2 cards. Finally, how do you bridge four 8800 cards together?
  • Viditor - Thursday, November 30, 2006

    quote:

    8800 GTX requires two slots, which means it won't fit in the 4x4 motherboard


    Huh?
    Single slot 8800 GTX: http://www.bit-tech.net/hardware/2006/11/08/nvidia...

    quote:

    Quad-SLI performance has already shown to be poor using two 7950 GX2 cards


    This is only when using a single MCP, the 680a uses dual MCPs.
    The 680i uses one MCP and one SPP.

    quote:

    Finally, how do you bridge four 8800 cards together?

    By having 2 sets of bridges (one bridge per MCP).
  • JarredWalton - Friday, December 1, 2006

    Quad SLI has problems whether or not you have dual MCPs. It's driver and software related - basically the drivers don't do AFR on a lot of titles and so you end up with lower than 7900 GTX SLI performance.

    As for two slots, they're talking about the width of the cards. They only plug into one slot, but they fill the adjacent slot. Quad 8800 GTX would require eight expansion slots right now. Given that Vista 8800 drivers aren't even out yet, I think NVIDIA has other things to do before they worry about moving beyond SLI'ed 8800 cards.
  • PrinceGaz - Thursday, November 30, 2006

    I suppose you could replace the HSF with something smaller which would fit in a single-slot, which would have to mean water-cooling.

    Quad-SLI performance (or lack of) is probably a driver-issue.

    Don't 8800 cards have two SLI sockets therefore allowing you to chain together as many as you like (in theory)?
  • casket - Thursday, November 30, 2006

    It appears with win-xp sp2... this quad fx stinks. How about Win 2003 or Vista Ultimate? It might change things drastically.
  • Neosis - Thursday, November 30, 2006

    I don't think the problems in the benchmarks are an operating system issue. Two processors having totally four cores are not the same as a processor having the same number of cores. Additional latencies will slow down the performance.
  • Viditor - Thursday, November 30, 2006

    quote:

    I don't think the problems in the benchmarks are an operating system issue


    Actually, they probably are...Windows XP is not NUMA aware, while Vista is.

    quote:

    Two processors having totally four cores are not the same as a processor having the same number of cores. Additional latencies will slow down the performance


    In this case there is no difference...the Kentsfield has exactly the same latency as a 2 socket dual core because the 2 dual cores on-board don't talk directly with each other.