Brace Yourself, High Latency Roads Ahead

We tested Skulltrail with only two FB-DIMMs installed, but even in this configuration memory latency was hardly optimal:

CPU CPU-Z Latency in ns (8192KB, 256-byte stride)
Intel Core 2 Extreme QX9775 (FBD-DDR2/800) 79.1 ns
Intel Core 2 Extreme QX9770 (DDR2/800) 55.9 ns

 

Memory accesses on Skulltrail take almost 42% longer to complete than on our quad-core X38 system. In applications that can't take advantage of eight cores, this added latency will negatively impact performance. While you shouldn't expect a huge real-world deficit, there are definitely going to be situations where this 8-core behemoth is slower than its quad-core desktop counterpart.
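The ~42% figure follows directly from the CPU-Z numbers in the table above; a quick back-of-envelope check (values copied from the table):

```python
# Relative latency penalty, computed from the CPU-Z table above.
skulltrail_ns = 79.1  # QX9775, FBD-DDR2/800
desktop_ns = 55.9     # QX9770, DDR2/800 on X38

penalty = (skulltrail_ns - desktop_ns) / desktop_ns
print(f"FB-DIMM latency penalty: {penalty:.1%}")  # ~41.5% longer per access
```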

Scaling to 8 Cores: Most Benchmarks are Unaffected

Trying to benchmark an 8-core machine, even today, is much like testing some of the first dual-core CPUs: most applications and benchmarks are simply unaffected. We've called Skulltrail a niche platform, but what truly makes it one is the fact that most applications, even those that are multithreaded, can't take advantage of 8 cores.

While games today benefit from two cores and to a much lesser degree benefit from four, you can count the number that can even begin to use 8 cores on one hand...if you lived in Springfield and had yellow skin.

The Lost Planet demo is the only game benchmark we found that actually showed a consistent increase in performance when going from 4 to 8 cores. The cave benchmark results speak for themselves:

CPU Lost Planet Cave Benchmark (FPS)
Dual Intel Core 2 Extreme QX9775 113
Intel Core 2 Extreme QX9775 82
Dual Intel Core 2 Extreme QX9775 @ 4.0GHz 124

 

At 1600 x 1200 we're looking at a 30% increase in performance when going from 4 to 8 cores. Unfortunately, Lost Planet isn't representative of most other games available today. Other titles like Flight Simulator X can take advantage of 8 cores, but not all the time and not consistently enough to offer a real-world performance advantage over a quad-core system.
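As a rough illustration (not a measurement from this article), Amdahl's law lets you back out how much of the frame time must be parallel work for doubling the core count to yield a 30% gain:

```python
# Amdahl's law for a doubled core count: speedup = 1 / ((1 - p) + p / 2),
# where p is the fraction of 4-core run time that scales with cores.
# Solving for p given the ~30% gain seen at 1600 x 1200:
observed_speedup = 1.30
p = 2 * (1 - 1 / observed_speedup)
print(f"implied parallel fraction: {p:.0%}")  # roughly 46%
```

Even this best-case game leaves more than half its frame time serial, which is why most titles see nothing at all from the extra four cores.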

The problem is that because most games can't use the extra cores, the added latency of Skulltrail's FB-DIMMs actually makes the platform slower than a regular quad-core desktop. To show just how bad it can get, take a look at our Supreme Commander benchmark.

At the suggestion of Gas Powered Games, we don't rely on Supreme Commander's built-in performance test. Instead we play back a recording of our own gameplay with game speed set to maximum and record the total simulation time, which makes for a great CPU benchmark. We ran the game at maximum image quality settings but left the resolution at 1024 x 768 to focus on CPU performance; the results were a bit startling:

Supreme Commander Performance

Thanks to the high latency FBD memory subsystem, it takes a 4.0GHz Skulltrail system to offer performance better than a single QX9770 on a standard desktop motherboard. We can't stress enough how much more attractive Skulltrail would have been were it able to use standard DDR2 or DDR3 memory.

Gamers shouldn't be too worried, however: Skulltrail's memory latency issues luckily don't impact GPU-limited scenarios. Take a look at our Oblivion results from earlier for confirmation:

Oblivion: Shivering Isles Performance

In more CPU-bound scenarios like Supreme Commander you will see a performance penalty, but in GPU-bound scenarios like Oblivion (or Crysis, for example), Skulltrail will perform like a regular quad-core system.

The bottom line? Skulltrail is a system made for game developers, not gamers.

Other benchmarks, even system-level suites like SYSMark 2007, hardly show any performance improvement when going from 4 to 8 cores. We're talking about less than a 5% improvement, most of which is erased when you compare to a quad-core desktop platform with standard DDR2 or DDR3 memory.

That being said, there are definitely situations where Skulltrail performance simply can't be matched.

30 Comments

  • chizow - Monday, February 4, 2008 - link

    quote:

    we don't have a problem recommending it, assuming you are running applications that can take advantage of it. Even heavy multitasking won't stress all 8 cores, you really need the right applications to tame this beast.


Not sure how you could come to that conclusion unless you posted some caveats like 1) you're getting it for free from Intel or 2) you're not paying for it yourself or have no concern about costs.

Besides the staggering price tag associated with it ($500 + 2 x Xeon 9770 @ $1300-1500 + FB-DIMM premium), there are some real concerns with how much benefit this setup would yield over the best performing single-socket solutions. In games, there's no support for Tri-SLI and beyond for NV parts, although 3-4 cards may be an option with ATI. 3 seems more realistic, as that last slot will be unusable with dual-cards.

Then there's the actual benefit gained on a practical basis. In games, it looks like it's not even worth bothering with, as you'd most likely see a bigger boost from buying another card for SLI or CrossFire. For everything else, these are highly input-intensive apps, so you spend most of your work day preparing data to shave a few seconds off compute time so you can go to lunch 5 minutes sooner or catch an earlier train home.

    I guess in the end there's a place for products like this, to show off what's possible but recommending it without a few hundred caveats makes little sense to me.
  • chinaman1472 - Monday, February 4, 2008 - link

    The systems are made for an entirely different market, not the average consumer or the hardcore gamer.

    Shaving off a few minutes really adds up. You think people only compile or render one time per project? Big projects take time to finish, and if you can shave off 5 minutes every single time and have it happen across several computers, the thousands of dollars invested comes back. Time is money.
  • chizow - Monday, February 4, 2008 - link

    I didn't focus on real-world applications because the benefits are even less apparent. Save 4s on calculating time in Excel? Spend an hour formatting records/spreadsheets to save 4s...ya that's money well spent. The same is true for many real world applications. Sad reality is that for the same money you could buy 2-3x as many single-CPU rigs and in that case, gain more performance and productivity as a result.
  • Cygni - Monday, February 4, 2008 - link

As we both noted, 'real world' isn't just Excel. It's also AutoCAD and 3dsmax. These are arenas where we aren't talking about shaving 4 seconds; we're talking about shaving whole minutes and in extreme cases even hours off renders.

This isn't an office computer, and it isn't a casual gamer's machine. This is a serious workstation or extreme enthusiast rig, and you are going to pay the price premium to get it. Like I said, this is a CAD and 3D artist's dream machine... not for your secretary to make phone trees on. ;)

In this arena? I can't think of any machines that are even close to it in performance.
  • chizow - Monday, February 4, 2008 - link

Again, in both AutoCAD and 3DSMax, you'd be better served putting that extra money into another GPU or even a workstation for a fraction of the cost. 2-3x the cost for uncertain increases over a single-CPU solution, or a second/third workstation for the same price. But for a real-world example, ILM said it took around 24 hours or something ridiculous to render each Transformers frame. Say it took 24 hours with a single quad core with 2 x Quadro FX. Say Skulltrail cut that down to 18 or even 20 hours. Sure, nice improvement, but you'd still be better off with 2 or even 3 single-CPU workstations for the same price. If it offered more GPU support and unbuffered DIMM support along with dual CPU support it might be worth it, but it doesn't, and it actually offers less scalability than cheaper enthusiast chipsets for NV parts.
  • martin4wn - Tuesday, February 5, 2008 - link

    You're missing the point. Some people need all the performance they can get on one machine. Sure batch rendering a movie you just do each frame on a separate core and buy roomfulls of blade servers to run them on. But think of an individual artist on their own workstation. They are trying to get a perfect rendering of a scene. They are constantly tweaking attributes and re-rendering. They want all the power they can get in their own box - it's more efficient than trying to distribute it across a network. Other examples include stuff like particles or fluid simulations. They are done best on a single shared memory system where you can load the particles or fluid elements into a block of memory and let all the cores in your system loose on evaluating separate chunks of it.
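The chunked shared-memory pattern described above looks roughly like this. A minimal Python sketch with a made-up per-particle update, just to show the partitioning; real simulation kernels would be native code, since pure-Python threads don't run CPU-bound work in parallel under the GIL:

```python
# Sketch: split one big in-memory array into disjoint chunks and let a
# pool of workers update the chunks in place. The per-particle "update"
# here is a placeholder, not real physics.
from concurrent.futures import ThreadPoolExecutor

particles = list(range(100_000))  # stand-in for shared particle state
CORES = 8

def step(start, end):
    # Each worker owns a disjoint slice, so no locking is needed.
    for i in range(start, end):
        particles[i] += 1  # placeholder per-particle update

bounds = [(i * len(particles) // CORES, (i + 1) * len(particles) // CORES)
          for i in range(CORES)]
with ThreadPoolExecutor(max_workers=CORES) as pool:
    for start, end in bounds:
        pool.submit(step, start, end)
# The with-block waits for all chunks; every element has advanced one step.
print(particles[0], particles[-1])
```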

    I write this sort of code for a living, and we have many customers buying up 8 core machines for individual artists doing exactly this kind of thing.
  • Chaotic42 - Tuesday, February 5, 2008 - link

    Anyone can come up with arbitrary workflows that don't use all of the power of this system. There are, however, some workflows which would use this system.

I'm a cartographer, and I deal with huge amounts of data being processed at the same time. I have a mapping program cutting imagery on one monitor, Photoshop performing image manipulation on a second, Illustrator doing TIFF separates on a third, and in the background I have four Excel tabs and enough IE tabs to choke a horse.

Multiple systems make no sense because you need so much extra hardware to run them (in the case of this system: two motherboards, two cases, etc.) and you'll also need space to put the workstations (assuming you aren't using a KVM). You would also need to clog the network with your multi-gigabyte files to transfer them from one system to another for different processing.

    That seems a bit more of a hassle than a system like the one featured in the article.

  • Cygni - Monday, February 4, 2008 - link

I don't see any problem with what he said there.

All you talked about was gaming, but let's be honest here: this is not a system that's going to appeal to gamers, and this isn't a system set up for anyone with price concerns.

In reality, this is a CAD/CAM dream machine, which is a market where $4-5,000 rigs are the low end. In the long run, even for small design or production firms, 5 grand is absolute peanuts and WELL worth spending twice a year to have happy engineers banging away. The inclusion of SLI/Crossfire is going to move these things like hotcakes in this sector. There is nothing that will be able to touch it. And that's not even mentioning its uses for rendering...

I guess what I'm saying is try to realize the world is a little bit bigger than gaming.
  • Knowname - Sunday, February 10, 2008 - link

On that note, are there any studies on the gains you get in CAD applications by upgrading your video card? How much of a role does the GPU really play in the process? The only significant gain I can think of for CAD is quad desktop monitors per card with Matrox video cards. I don't see how the GPU (beyond the RAMDAC or whatever it's called) really makes a difference. Please tell me this; I keep wasting my money on ATI cards (not to mention my G550, which I like, but it wasn't worth the money I spent on it when I could have gotten a 6600GT...) just on the hunch they'd be better than Nvidia due to the 2D filtering and such (not really a big deal now, but...)
  • HilbertSpace - Monday, February 4, 2008 - link

A lot of the Intel 5000-series chipsets let you use riser cards for more memory slots. Is that possible with skully?
