Brace Yourself, High Latency Roads Ahead

We tested Skulltrail with only two FB-DIMMs installed, but even in this configuration memory latency was hardly optimal:

CPU | CPU-Z Latency (8192KB, 256-byte stride)
Intel Core 2 Extreme QX9775 (FBD-DDR2/800) | 79.1 ns
Intel Core 2 Extreme QX9770 (DDR2/800) | 55.9 ns

Memory accesses on Skulltrail take almost 42% longer to complete than on our quad-core X38 system. In applications that can't take advantage of 8 cores, this is going to hurt performance. While you shouldn't expect a huge real-world deficit, there are definitely going to be situations where this 8-core behemoth is slower than its quad-core desktop counterpart.
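
The 42% figure follows directly from the two CPU-Z readings in the table above. As a quick sanity check, here is a minimal sketch of the arithmetic (the function name is ours for illustration, not part of CPU-Z or any other tool):

```python
def percent_slower(slow_ns: float, fast_ns: float) -> float:
    """Return how much longer slow_ns is than fast_ns, as a percentage."""
    return (slow_ns / fast_ns - 1.0) * 100.0


skulltrail_ns = 79.1  # QX9775 with FBD-DDR2/800, from the table above
desktop_ns = 55.9     # QX9770 with DDR2/800 (X38), from the table above

penalty = percent_slower(skulltrail_ns, desktop_ns)
print(f"Skulltrail memory latency penalty: {penalty:.1f}%")
```

The result comes out at about 41.5%, which is where "almost 42%" comes from.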

Scaling to 8 Cores: Most Benchmarks are Unaffected

Trying to benchmark an 8-core machine, even today, is much like testing some of the first dual-core CPUs: most applications and benchmarks are simply unaffected. We've called Skulltrail a niche platform, but what truly makes it one is the fact that most applications, even those that are multithreaded, can't take advantage of 8 cores.

While games today benefit from two cores and, to a much lesser degree, from four, you can count the number that can even begin to use 8 cores on one hand... if you lived in Springfield and had yellow skin.

The Lost Planet demo is the only game benchmark we found that actually showed a consistent increase in performance when going from 4 to 8 cores. The cave benchmark results speak for themselves:

CPU | Lost Planet Cave Benchmark (FPS)
Dual Intel Core 2 Extreme QX9775 | 113
Intel Core 2 Extreme QX9775 | 82
Dual Intel Core 2 Extreme QX9775 @ 4.0GHz | 124

At 1600 x 1200 we're looking at a 30% increase in performance when going from 4 to 8 cores. Unfortunately, Lost Planet isn't representative of most other games available today. Other titles like Flight Simulator X can actually take advantage of 8 cores, but not all the time and not consistently enough to offer a real-world performance advantage over a quad-core system.

The problem is that, because most games can't use the extra cores, the added latency of Skulltrail's FB-DIMMs actually makes the platform slower than a regular quad-core desktop. To show just how bad it can get, take a look at our Supreme Commander benchmark.

At the suggestion of Gas Powered Games, we don't rely on Supreme Commander's built-in performance test. Instead, we play back a recording of our own gameplay with game speed set to maximum and record the total simulation time, which makes for a great CPU benchmark. We ran the game at maximum image quality settings but left resolution at 1024 x 768 to focus on CPU performance. The results were a bit startling:

Supreme Commander Performance

Thanks to the high latency FBD memory subsystem, it takes a 4.0GHz Skulltrail system to offer performance better than a single QX9770 on a standard desktop motherboard. We can't stress enough how much more attractive Skulltrail would have been were it able to use standard DDR2 or DDR3 memory.

Gamers shouldn't be too worried, however: Skulltrail's memory latency issues luckily don't impact GPU-limited scenarios. Take a look at our Oblivion results from earlier for confirmation:

Oblivion: Shivering Isles Performance

In more CPU-bound scenarios like Supreme Commander, you will see a performance penalty, but in GPU-bound scenarios like Oblivion (or Crysis, for example), Skulltrail will perform like a regular quad-core system.
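
A rough way to picture this is to treat each frame as costing whichever is larger, the CPU time or the GPU time. The sketch below is a simplified model under that assumption, with made-up per-frame timings (they are not measurements from our testing). Only the latency ratio comes from the CPU-Z table earlier on this page, and applying it to the entire CPU time is a deliberate worst case.

```python
def fps(cpu_ms: float, gpu_ms: float) -> float:
    """Frames per second when the slower of CPU and GPU sets the frame time."""
    return 1000.0 / max(cpu_ms, gpu_ms)


LATENCY_RATIO = 79.1 / 55.9  # Skulltrail vs. X38, from the CPU-Z table

# Hypothetical per-frame costs on the desktop system, in milliseconds.
scenarios = {
    "CPU-bound": {"cpu_ms": 20.0, "gpu_ms": 8.0},
    "GPU-bound": {"cpu_ms": 8.0, "gpu_ms": 25.0},
}

for name, t in scenarios.items():
    desktop = fps(t["cpu_ms"], t["gpu_ms"])
    # Worst case: assume all CPU time scales with memory latency.
    skulltrail = fps(t["cpu_ms"] * LATENCY_RATIO, t["gpu_ms"])
    print(f"{name}: desktop {desktop:.1f} fps, Skulltrail {skulltrail:.1f} fps")
```

In the GPU-bound case the longer CPU time stays hidden behind the GPU time, so both systems land on the same frame rate. In the CPU-bound case the penalty shows up directly.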

The bottom line? Skulltrail is a system made for game developers, not gamers.

Other benchmarks, even system-level suites like SYSMark 2007, hardly show any performance improvement when going from 4 to 8 cores. We're talking about an improvement of less than 5%, most of which is erased when you compare to a quad-core desktop platform with standard DDR2 or DDR3 memory.
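
One way to read a sub-5% gain is through Amdahl's law, which is our framing here rather than anything these benchmarks report. Assuming run time splits into a serial part and a perfectly parallel fraction p, the sketch below solves for the p that would produce a given speedup when going from 4 to 8 cores. It ignores memory latency and every other platform difference, so treat the result as illustrative only.

```python
def amdahl_time(p: float, cores: int) -> float:
    """Relative run time under Amdahl's law for parallel fraction p."""
    return (1.0 - p) + p / cores


def parallel_fraction(speedup: float, n_from: int = 4, n_to: int = 8) -> float:
    """Solve amdahl_time(p, n_from) / amdahl_time(p, n_to) == speedup for p."""
    # amdahl_time(p, n) = 1 - p * (1 - 1/n), so the equation is
    # (1 - a*p) = speedup * (1 - b*p) with a and b as defined below.
    a = 1.0 - 1.0 / n_from
    b = 1.0 - 1.0 / n_to
    return (speedup - 1.0) / (speedup * b - a)


p = parallel_fraction(1.05)  # a 5% gain going from 4 to 8 cores
print(f"Implied parallel fraction: {p:.2f}")
```

Under these assumptions, a 5% gain corresponds to only about 30% of the work scaling across cores.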

That being said, there are definitely situations where Skulltrail performance simply can't be matched.


30 Comments

  • SiliconDoc - Thursday, February 7, 2008

    You're onto something there, just make it ten times worse, and you'll have the real picture. I've seen hardware years ago that puts current hard drives to shame. So for whatever reasons, things are limited, like the 56k modem was.
    A friend just bought an HD2900 Pro (got it yesterday), 256bit 512mb. There were rumors about that there was a 512 bit version, he swore he saw it advertised. Well, to make a long story short, the $163 512bit version was getting bios flashed and overclocked up to the $400 XT 2900 whatever...(it was the same core apparently ) and it got sold real quickly and then pulled.... it's a ghost now...
    I looked for one since I just found out, and saw one at some place online for nearly $400, at one at a music store online posted but not in stock - special order only, likely a mere pic-e-presence.
    In other words they can pump those things out like mad, and depending on how much turkey they want in their bank...they start doing calculations, and when the consumer "catches" them, it's like anything else.
    Let's face it, prices have gone a bit wild lately, and the big boys must have ringing cash registers in their eyes.
    If they can pump 300 or 500 or 2 grand out of people instead of Disney World or Vegas, they'll do it, and they see the drooling...,
    slobberer out
  • Anonymous Freak - Tuesday, February 5, 2008

    Have you checked the prices of Xeons vs. equivalent Core 2 Extreme recently?

    According to Pricewatch, the Xeon 5472 (3.0 GHz, 1600 MHz bus,) is about $1029/1050. The Core 2 Extreme QX9650 (3.0 GHz, 1600 MHz bus,) is $1038/$1166. I can't find the QX9770 on Pricewatch, but other searches find it is about $1600, while the equivalent Xeon is $1400.

    The Core 2 "Extreme" line has, since its inception, been more expensive than equivalent Xeons. Heck, it might be cheaper to pick up the Xeon equivalent of the 9775 than to pick up the 9775 itself.
  • Anonymous Freak - Tuesday, February 5, 2008

    You state: "We tested Skulltrail with only two FB-DIMMs installed, but even in this configuration memory latency was hardly optimal:"

    This is a major flaw in your benchmarking. As your own Mac Pro review (http://www.anandtech.com/mac/showdoc.aspx?i=2816&a...) shows, quad-channel FB-DIMMs have lower latency, and higher bandwidth, than dual-channel. You should have filled all four FB-DIMM sockets. The latency penalty on multiple AMBs only applies to multiple AMBs on the same channel. For example, in a 5400-based server with four sockets per channel, having four total FB-DIMMs (one per channel, say 4 GB each,) produces better results than eight total FB-DIMMs (two per channel, 2 GB each.) And a sixteen FB-DIMM total (four per channel, 1 GB each,) system fares worst of all. Of course, that is assuming the TOTAL amount of RAM remains the same for each configuration. If you have an application that can benefit from massive amounts of RAM, having the extra RAM will far outweigh the performance penalty of the extra AMBs per channel. (In my example, moving from 16 GB of RAM using four 4 GB FB-DIMMs to 64 GB by having sixteen 4 GB FB-DIMMs, would produce performance benefits to certain applications just from the amount of RAM.)

    In addition, the new chipset, and newer FB-DIMM modules with newer AMBs, produces better results than the first-generation counterparts. For example, your Mac Pro benchmark showed CPU-Z latencies of 87 ns (quad-channel) and 92 ns (dual-channel, worse,) for the Mac Pro, vs. 52 ns for a Core 2 Duo with DDR-2 800; the new benchmark shows 79 ns for the 5400 chipset in dual-channel (assuming the same %, quad-channel should show 74 ns,) vs. 55 ns for a Core 2 Quad with DDR-2 800. Yeah, 74 is still slower than 55, but it's better than the 87 ns the (original) Mac Pro scored. (The new Mac Pro should see an improvement on par with this Skulltrail board over the old Mac Pro.)
  • Anand Lal Shimpi - Tuesday, February 5, 2008

    You are correct on the FBD/latency issue. We didn't have small enough FB-DIMMs on hand to run a 4x1GB configuration, but the difference in latency is still not enough to change the situations where Skulltrail is outperformed by its desktop counterparts. The situation will be improved a bit but the point that I was trying to make is that in applications that can't take advantage of all 8 cores, Skulltrail will be slower thanks to its higher latency memory subsystem.

    Take care,
    Anand
  • Googer - Monday, February 4, 2008

    For being a premium enthusiast product with a $500 price tag and server DNA, this thing better come with an integrated SAS controller too. There are plenty of other server/workstation motherboards in this price range that offer SAS. If performance is the purpose for Skulltrail's existence, there's no reason for it to be left out. 15,000 RPM drives for the win.
  • dansus - Monday, February 4, 2008

    I would imagine you would see more difference if you used the multi-threaded .dll (mt.dll) with x264 when encoding.

    Especially if you're doing a 2+ pass encode where the first pass typically uses 50% CPU.

    I can see myself buying one later in the year as prices come down. At the very least, I can do two quad-core encodes at once.
  • JKing76 - Monday, February 4, 2008

    It's no secret that at a certain point, the computer "enthusiast" market is more about bragging than performance. But this is the most absurd and pointless release of pure e-penis waggling I've ever seen, and as a computer engineer I am literally embarrassed that a legitimate company like Intel is responsible. The EPA should fine Intel for this debacle, penalize them for each machine sold, and confiscate the computers of anyone selfish and stupid enough to buy one.
  • SiliconDoc - Thursday, February 7, 2008

    Wow. I get a kick out of the bloggers that so often find so many problems with the really high-end machines. Strangely enough they never seem to post their "rig" stats when they are having a big fit of complaints.
    I suspect the real problem is massive e-penis envy, and expecting the government to shut down a private firm's product, confiscate purchasers' products unless they pass your "needs" test, and maybe give them a Greenpeace fine and carbon tax (I know it crossed someone's mind) seems to me to be the biggest green streak of jealousy I've yet to witness.
    The bottom line is, more than 99% of the freaks reading this review would wet their pants and float off into heavenly bliss if they found the "Skulltrail" (that's what I find offensive - the sick name) on their desk in the morning.
    I find the whole thing much like, let's say, a bunch of guys at an auto show putting down the swing-up door 10/80 stainless brand new XXX sports car, when deep down inside not a one of them would turn down the set of keys, no matter how often they'd claim otherwise.
    Suddenly all that extra CPU horsepower here would be the prudent reserve for the upcoming releases that no doubt very soon will make use of it all, since duos and quads are now getting to be commonplace.
    It's just all so amusing, when the Joneses hate the new McMansion, basically because they aren't living it.
  • Nihility - Monday, February 4, 2008

    Why didn't AMD make this available with Phenom? It would have won them the performance crown (sorta, since this apparently doesn't scale very well).
  • legoman666 - Monday, February 4, 2008

    The scalability has nothing to do with the platform; it has to do with the apps themselves. 2x Phenoms will scale no better than 2x Intel quads. There are simply few programs out there designed to take advantage of >1 core, much less 8.
