Original Link: http://www.anandtech.com/show/2435
Intel's 8-core Skulltrail Platform: Close to Perfecting the Niche
by Anand Lal Shimpi on February 4, 2008 5:00 AM EST
Dual socket motherboards have been around for ages, but dual socket enthusiast motherboards have a far shorter history. Back in the days when instruction level parallelism seemed to have no end in sight, having more than one CPU just didn't make sense for the masses. Most Windows applications weren't multithreaded and CPU prices just weren't what they are today.
Many of the same types of applications that benefit from multiple cores today were still around back then; 3D rendering, animation and image processing were all multithreaded CPU hogs. The problem is that if you wanted more than one CPU you generally had to make a choice between a tweakable, high performance enthusiast motherboard or a workstation board. Workstation motherboards were much more expensive, not nearly as flexible from a component standpoint and hardly ever performed as well as their desktop counterparts - the only real benefits were a more robust design and of course, the ability to support multiple CPUs.
Over the years we saw a few important dual-socket enthusiast motherboards arrive on the scene, the most popular of which was arguably ABIT's BP6. For all intents and purposes the BP6 was a desktop motherboard; it just had two CPU sockets. Intel's Celeron processors were cheap enough that you could pop in a couple, overclock them and have a pretty decent workstation based on an enthusiast desktop motherboard. Tradeoffs? There were none. It was a very popular board.
Times do change and eventually AMD/Intel stopped getting amazing returns from simply increasing instruction level parallelism and clock speed with their CPUs. The two turned to thread level parallelism to carry them through the next decade of microprocessor evolution; seemingly overnight, everyone had multiple cores in their systems.
The advent of the multi-core x86 CPU all but eliminated the need for a dual socket enthusiast platform. If you needed more cores, you simply tossed a multi-core CPU in your desktop board and you were good to go. When Intel introduced the first quad-core desktop x86 processors things got even worse for dual socket motherboards. Since most applications have a tough time using more than two cores, a single quad-core CPU covered virtually all bases - and they were affordable too.
AMD didn't have a quad-core CPU until the recent launch of Phenom. In order to fill the gap between the dual core Athlon 64 X2 and the delayed arrival of Phenom, AMD dusted off plans to introduce a dual socket enthusiast platform and called it Quad FX.
The idea was simple: build an enthusiast platform that used normal desktop components but had two sockets. With dual-core CPUs this meant that you'd have four cores in a system, and when quad-core arrived you'd have a healthy 8, all on an enthusiast class motherboard.
Quad FX was abandoned by AMD (although it does promise an upgrade path to quad-core CPUs), largely because while you had to buy an expensive motherboard and two dual cores to put the Quad in Quad FX, Intel was shipping faster, single socket, quad-core CPUs.
Intel did see some merit in AMD's Quad FX platform and actually released an ill-prepared competitor, something it called V8. Intel basically took a workstation Xeon motherboard and recommended enthusiasts purchase a pair of quad-core Xeon processors, giving you an 8-core alternative to Quad FX. The problem with the V8 platform was that it was expensive, there was no multi-GPU support and it required expensive FB-DIMMs thanks to its Xeon heritage.
The original V8 board was straight from the server world
Last April, Intel announced that it would be releasing a successor to V8, codenamed: Skulltrail. Designed to fix many of the problems with V8, Intel kept its promise to release the platform despite AMD's abandonment of the Quad FX project.
Today we have a preview of Skulltrail, which Intel expects to make available this quarter. Unlike Intel's Centrino or vPro, Skulltrail isn't officially a "platform"; it's just a name for a motherboard and CPU combination, nothing more. The motherboard is the Intel D5400XS, based on Intel's 5000 series server/workstation chipset (yes, FB-DIMMs are still a requirement). The board supports any LGA-771 CPU, but Skulltrail is designed to be used with a new processor: the Core 2 Extreme QX9775.
Late last year Intel announced the Core 2 Extreme QX9770, a 3.2GHz, 1.6GHz FSB quad-core desktop processor. We'll finally see availability of that beast this quarter, but you'll notice that the Skulltrail platform uses a slightly different CPU: the Core 2 Extreme QX9775. The 5 indicates that unlike the QX9770, this chip works in an LGA-771 socket just like Intel's Xeon processors. The different pinout is necessary because only Xeon chipsets support multiple CPU sockets and, for a handful of reasons, that means we're limited to LGA-771.
LGA-771 QX9775 (left), LGA-775 QX9770 (right) - Can you spot the four missing pins?
The specs on the QX9775 are otherwise identical to its LGA-775 counterpart. The 45nm Yorkfield/Penryn core runs all four of its cores at 3.2GHz and is fed by a 1.6GHz FSB. Each pair of cores on the CPU package has a 6MB L2 cache, for 12MB total on each individual CPU.
Despite using a Xeon socket, the Core 2 Extreme QX9775 isn't a Xeon. It turns out there are some very subtle differences between Core 2 and Xeon processors, even if they're based on the same core. Intel tunes the prefetchers on Xeon and Core 2 CPUs differently so unlike the Xeon 5365 used in its V8 platform, the QX9775 is identical in every way to desktop Core 2 processors - the only difference being pinout.
Intel wouldn't give us any more information on how the prefetchers are different, but we suspect the algorithms are tuned according to the typical applications Xeons find themselves running vs. where most Core 2s end up.
Like all other Extreme edition processors, the QX9775 ships completely unlocked, making overclocking unbelievably easy as you'll soon see.
Cooling the beast, if you think those fins will cut you...they will.
The price point is going to be the worst part of the QX9775. Intel isn't publicly announcing the price per processor, but over $1000 is a starting point. It looks like QX9770 preorders are going for over $1500 so we'd expect the QX9775s to be priced similarly, possibly eventually settling at $1200.
Fully Buffered DIMM: An Unnecessary Requirement
The Intel D5400XS is quite possibly the most impressive part of the entire Skulltrail platform. Naturally it features two LGA-771 sockets, connected to Intel's 5400 chipset via two 64-bit FSB interfaces. The chipset supports the 1600MHz FSB required by the QX9775 but it will work with all other LGA-771 Xeon processors, in case you happen to have some lying around your desk too.
Thanks to the Intel 5400 chipset, the D5400XS can only use Fully Buffered DIMMs. If you're not familiar with FBD, here's a quick refresher taken from our Mac Pro review:
Years ago, Intel saw two problems happening with most mainstream memory technologies: 1) As we pushed for higher speed memory, the number of memory slots per channel went down, and 2) the rest of the world was going serial (USB, SATA and more recently, Hyper Transport, PCI Express, etc...) yet we were still using fairly antiquated parallel memory buses.
The number of memory slots per channel isn't really an issue on the desktop; currently, with unbuffered DDR2-800 we're limited to two slots per 64-bit channel, giving us a total of four slots on a motherboard with a dual channel memory controller. With four slots, just about any desktop user's needs can be met with the right DRAM density. It's in the high end workstation and server space that this limitation becomes an issue, as memory capacity can be far more important, often requiring 8, 16, 32 or more memory sockets on a single motherboard. At the same time, memory bandwidth is also important as these workstations and servers will most likely be built around multi-socket multi-core architectures with high memory bandwidth demands, so simply limiting memory frequency in order to support more memory isn't an ideal solution. You could always add more channels, however parallel interfaces by nature require more signaling pins than faster serial buses, and thus adding four or eight channels of DDR2 to get around the DIMMs per channel limitation isn't exactly easy.
Intel's first solution was to totally revamp PC memory technology, instead of going down the path of DDR and eventually DDR2, Intel wanted to move the market to a serial memory technology: RDRAM. RDRAM offered significantly narrower buses (16-bits per channel vs. 64-bits per channel for DDR), much higher bandwidth per pin (at the time a 64-bit wide RDRAM memory controller would offer 6.4GB/s of memory bandwidth, compared to a 64-bit DDR266 interface which at the time could only offer 2.1GB/s of bandwidth) and of course the ease of layout benefits that come with a narrow serial bus.
Unfortunately, RDRAM offered no tangible performance increase, as the demands of processors at the time were nowhere near what the high bandwidth RDRAM solutions could deliver. To make matters worse, RDRAM implementations were plagued by higher latency than their SDRAM and DDR SDRAM counterparts; with no use for the added bandwidth and higher latency, RDRAM systems were no faster, if not slower than their SDR/DDR counterparts. The final nail in the RDRAM coffin on the PC was the issue of pricing; your choices at the time were this: either spend $1000 on a 128MB stick of RDRAM, or spend $100 on a stick of equally performing PC133 SDRAM. The market spoke and RDRAM went the way of the dodo.
Intel quietly shied away from attempting to change the natural evolution of memory technologies on the desktop for a while. Intel eventually transitioned away from RDRAM, even after its price dropped significantly, embracing DDR and more recently DDR2 as the memory standards supported by its chipsets. Over the past couple of years however, Intel got back into the game of shaping the memory market of the future with this idea of Fully Buffered DIMMs.
The approach is quite simple in theory: what caused RDRAM to fail was the high cost of using a non-mass produced memory device, so why not develop a serial memory interface that uses mass produced commodity DRAMs such as DDR and DDR2? In a nutshell that's what FB-DIMMs are, regular DDR2 chips on a module with a special chip that communicates over a serial bus with the memory controller.
The memory controller in the system stops having a wide parallel interface to the memory modules, instead it has a narrow 69 pin interface to a device known as an Advanced Memory Buffer (AMB) on the first FB-DIMM in each channel. The memory controller sends all memory requests to the AMB on the first FB-DIMM on each channel and the AMBs take care of the rest. By fully buffering all requests (data, command and address), the memory controller no longer has a load that significantly increases with each additional DIMM, so the number of memory modules supported per channel goes up significantly. The FB-DIMM spec says that each channel can support up to 8 FB-DIMMs, although current Intel chipsets can only address 4 FB-DIMMs per channel. With a significantly lower pin-count, you can cram more channels onto your chipset, which is why the Intel 5000 series of chipsets feature four FBD channels.
The AMB has two major roles, to communicate with the chipset's memory controller (or other AMBs) and to communicate with the memory devices on the same module.
When a memory request is made the first AMB in the chain then figures out if the request is to read/write to its module, or to another module, if it's the former then the AMB parallelizes the request and sends it off to the DDR2 chips on the module, if the request isn't for this specific module, then it passes the request on to the next AMB and the process repeats.
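The forwarding behaviour described above can be pictured with a toy model. The names here (`Amb`, `route_request`) are hypothetical and do not correspond to any Intel or JEDEC interface; the sketch only mirrors the decision each AMB makes and counts how many buffers a request passes through.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Amb:
    """Toy stand-in for the Advanced Memory Buffer on one FB-DIMM (hypothetical model)."""
    module_id: int


def route_request(chain: List[Amb], target_module: int) -> int:
    """Walk the daisy chain and return how many AMBs the request touched.

    Each AMB either services the request (it targets this module) or
    forwards it to the next AMB on the channel.
    """
    for hops, amb in enumerate(chain, start=1):
        if amb.module_id == target_module:
            return hops  # this AMB hands the request to its own DDR2 chips
    raise ValueError(f"no module {target_module} on this channel")


# Four FB-DIMMs on one channel, the per-channel limit the text gives for current Intel chipsets.
channel = [Amb(module_id=i) for i in range(4)]
for target in range(4):
    hops = route_request(channel, target)
    print(f"module {target}: request passes through {hops} AMB(s)")
```

In this simplified model a request for the last module crosses every AMB on the channel, which is one way to picture why a deeper chain costs more latency.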
As we've seen, the AMB translation process introduces a great deal of latency to all memory accesses (it also adds about 3-6W of power per module), negatively impacting performance. The tradeoff is generally worth it in workstation and server platforms because the ability to use even more memory modules outweighs the latency penalty. The problem with the D5400XS motherboard is that it only features one memory slot per FBD channel, all but ruining the point of even having FBD support in the first place.
Four slots, great. We could've done that with DDR3 guys.
You do get the benefit of added bandwidth since Intel is able to cram four FBD channels into the 5400 chipset; the problem is that the two CPUs on the motherboard can't use all of the bandwidth. Serial buses inherently have more overhead than their parallel counterparts, but the 38.4GB/s of memory bandwidth offered by the chipset is impressive sounding for a desktop motherboard. You only get that full bandwidth if all four memory slots are populated, and populating them all increases latency as well.
Some quick math will show you that peak bandwidth between the CPUs and the chipset is far less than the 38.4GB/s offered between the chipset and memory. Even with a 1600MHz FSB we're only talking about 25.6GB/s of bandwidth. We've already seen that the 1333MHz FSB doesn't really do much for a single processor, so a good chunk of that bandwidth will go unused by the four cores connected to each branch.
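The quick math can be written out as a short sketch. It assumes each FSB is 64 bits wide and moves 1600 million transfers per second, as the article states; the function and constant names are illustrative only.

```python
def fsb_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int = 64) -> float:
    """Peak bandwidth of one front-side bus in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


FSB_COUNT = 2                # one 64-bit FSB per socket, per the article
MEMORY_BANDWIDTH_GBS = 38.4  # chipset-to-memory figure quoted in the article

fsb_total = FSB_COUNT * fsb_bandwidth_gbs(1600)
print(f"CPU <-> chipset peak: {fsb_total:.1f} GB/s")
print(f"fraction of memory bandwidth reachable: {fsb_total / MEMORY_BANDWIDTH_GBS:.0%}")
```

Each FSB works out to 12.8GB/s, so the two together reach the 25.6GB/s quoted above, about two thirds of what the memory side can supply.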
The X38/X48 dual channel DDR3-1333 memory controller would've offered more than enough bandwidth for the two CPUs, without all of the performance and power penalties associated with FBD. Unfortunately a side effect of choosing to stick with a Xeon chipset is that FBD isn't optional - you're stuck with it. As you'll soon see, this is a side effect that does really hurt Skulltrail.
Intel's D5400XS: The Best Multi-GPU Motherboard?
Despite our rant about Fully Buffered DIMM, the D5400XS is actually a very impressive motherboard. It all starts with the fact that the D5400XS supports both SLI and Crossfire.
Don't be fooled: there's no technical reason why SLI can't work on all current Intel chipsets. NVIDIA's public argument for why it isn't enabled is that NVIDIA goes through a lot of internal testing to make sure that SLI works as well as possible with its own platforms, and putting Intel platforms through the same tests isn't very high on NVIDIA's list of priorities.
The reality is that SLI is a very important brand to NVIDIA and simply giving away support just isn't going to happen. Intel and NVIDIA have never been able to come to terms on a licensing agreement to gain SLI support on Intel chipsets. It's not hard to understand why; if NVIDIA enabled SLI support on Intel chipsets, there would be absolutely no reason to buy NVIDIA based motherboards. With AMD already working against NVIDIA to make sure its own chipsets are the most desirable for its CPUs, it's not too hard to see why NVIDIA wants to hold onto the only reason to buy an nForce Intel chipset.
Because of the small production numbers however, Skulltrail makes the perfect platform for SLI support. There's still no licensing agreement in place, but the D5400XS motherboard uses two NVIDIA PCIe 1.1 bridges that each take 16 PCIe lanes coming off of the MCH and make two x16 slots out of them.
The two NVIDIA MCPs
There are four usable PCIe x16 slots on the motherboard, which should be able to support 2, 3 or 4 way CrossFire X down the road.
With NVIDIA silicon on board, the NVIDIA graphics drivers don't have to do anything funny to enable SLI support - it just works. It is worth noting that only 2-way SLI will work on the D5400XS, 3-way and 4-way configurations are not and never will be supported according to NVIDIA. After all, higher end SLI customers would be the target market for a Skulltrail system and you definitely don't want to make them too happy with an Intel chipset.
We ran a couple of quick tests to make sure that SLI scaling was on par with NVIDIA's own chipsets. The results were as expected, the D5400XS scales from one to two NVIDIA GPUs just as well as the nForce 780i:
|GPU Configuration||NVIDIA nForce 780i||Intel Skulltrail D5400XS|
|1 x 8800 GT||34.3||35.4|
|2 x 8800 GT||65.0||67.0|
Now you can see why NVIDIA doesn't want to enable SLI on more Intel chipsets. It's odd that it has taken a high end, two socket enthusiast motherboard to become the ideal multi-GPU desktop platform but we'll take what we can get. We finally have a motherboard that doesn't tie your hands when it comes to graphics upgrade path. Kudos to Intel for making it happen, but it's a shame that we'll probably never see it on another motherboard.
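To put a number on "just as well", the scaling factor can be computed straight from the table above. This is a minimal sketch; the dictionary layout and function name are illustrative, and the values are simply the four results reported in the table.

```python
# (one 8800 GT, two 8800 GTs) for each platform, copied from the table above
results = {
    "NVIDIA nForce 780i": (34.3, 65.0),
    "Intel Skulltrail D5400XS": (35.4, 67.0),
}


def sli_scaling(single_gpu: float, dual_gpu: float) -> float:
    """Return how many times faster the two-GPU result is than the one-GPU result."""
    return dual_gpu / single_gpu


for platform, (single, dual) in results.items():
    print(f"{platform}: {sli_scaling(single, dual):.2f}x from one to two GPUs")
```

Both platforms land at roughly 1.89x, so the difference in scaling between them is well under one percent.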
Skulltrail: Build your own Mac Pro?
There are some issues with the NVIDIA logic on board that are worth mentioning. The most annoying is that the chips run very hot and are cooled by an incredibly loud fan. You shouldn't expect Skulltrail to be the ideal silent HTPC platform, but the loudest fan in your system will be the damn thing that cools the two NVIDIA MCPs.
You can always try to retrofit the board with a quieter fan, but for something we're expecting to cost at least $500 we wanted better from Intel. The other problem caused by the NVIDIA MCPs is a little more unique.
If you look at the specs of Skulltrail they look very similar to those of Apple's second generation Mac Pro. While the old V8 platform wasn't exactly well received by the enthusiast community, many working on custom PCs running OS X used V8 to build their own Hackintosh Mac Pro. We were hoping that Skulltrail would make for an equally good starting point for a Hackintosh.
While the latest OSx86 releases will install on the D5400XS with some effort, we couldn't get beyond a kernel panic upon booting into Leopard. We suspect that the NVIDIA MCPs are at fault as they are in-line with the PCIe x16 slots and can't be disabled.
As the OSx86 community already has Leopard working on AMD platforms, we tend to believe that some clever work may be able to make Skulltrail the ideal Hackintosh Pro platform but out of the box it doesn't work.
More on the Motherboard
Our last bits of praise for the D5400XS are mostly minor things that all add up. The board features no legacy ports; you've got a few USB and eSATA ports on the back panel but no PS/2 in sight.
Just like some high end enthusiast boards Intel outfitted the D5400XS with power and reset switches on board, which make our lives much, much easier.
The layout itself is cramped but well done, largely thanks to the fact that the motherboard is an extended ATX form factor design. Yes, you'll most likely need a new case for it.
Power requirements will vary depending on your configuration. Below we have Intel's recommendations:
Our OCZ 1kW unit worked just fine and it's worth noting that peak power consumption with a single GPU never exceeded 343W. Adding a second GPU still kept things under 500W, and while we'd stress that sticking with Intel's recommendations is a good thing, the power requirements of Skulltrail aren't nearly as ridiculous as you'd expect. Our 3-way SLI testbed needed much more. When 4-way CrossFire X support is enabled however, you may need to look at upgrading to a 20A circuit in your office. Keep in mind that despite the two 8-pin 12V power connectors on board, only one is necessary for basic operation. Intel recommends a power supply with two 8-pin 12V power connectors if you plan on doing any serious overclocking.
We were quite happy with the D5400XS' BIOS, it gave us access to all of the basic overclocking features we needed but still falls short of being the sort of amazingly tweakable motherboard that the high end enthusiast market demands. Given the target market of the D5400XS, its BIOS should be more than enough though. Other than a horrendously laggy interface typical of most Intel BIOSes, it's simple to overclock which is very helpful given how well the QX9775s do overclock.
Intel isn't releasing pricing information on the D5400XS, but we're told to expect pricing similar to other workstation motherboards. We'd hazard a guess of around $500 but we should know for sure in the next month or so.
No Upgrade Path to Nehalem
Skulltrail isn't Intel's first attempt at a two socket enthusiast class system; the V8 platform was created in response to AMD's Quad FX about a year ago. Unfortunately, V8 owners are out of luck as their motherboards aren't upgradable to these new Penryn based processors. Apparently you can mod the V8 boards to enable Penryn support but the process isn't easy, requiring 25 resistors, 2 transistors and 1 IC - not to mention some BIOS modifications as well.
Skulltrail owners will find themselves similarly disappointed when Nehalem launches at the end of this year. Nehalem will require a brand new socket (LGA-1366) and thus won't work in the Intel D5400XS motherboard.
At launch Nehalem is only going to be available in a quad-core version, so there will still be a reason to have Skulltrail by the end of 2008 but sometime in 2009 we expect to see an 8-core Nehalem at which point Skulltrail will be fully obsolete.
We are lucky to have a roadmap this far in advance to prepare you; so if you're not put off by the lack of an upgrade path and need 8 cores today, then let's continue with our look at Skulltrail.
Timeline: Cores, When Can you get Single Socket Skully?
Just as quad-core was launched on a 65nm manufacturing process but transitioned to 45nm, the first eight-core designs will make their debut on 45nm but we won't really see a major transition to that many cores until 32nm. Skulltrail as a platform gives you 8 cores today, something that we don't expect to be mainstream until 2010 to be honest.
The point is that for software developers looking to work on a system that represents what will be available (from a TLP standpoint) in 3 years, Skulltrail is the only enthusiast-class option.
We've been alluding to incredibly easy overclocking on Skulltrail and thanks to a simple BIOS interface and unlocked QX9775s, all you need to do is adjust the clock multiplier.
At CES Intel showed us Skulltrail running at 4.0GHz with no changes other than bumping the clock multiplier from 8.0x to 10.0x. While we could get our system to POST at 4.0GHz without increasing the voltage, we found that we needed to increase the core voltage by around 12% to get a fully stable system at 4.0GHz.
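The multiplier arithmetic is simple enough to sketch. The 400MHz base clock below is not stated in the article; it is inferred from the 3.2GHz stock speed at the stock 8.0x multiplier, which is consistent with a quad-pumped 1600MHz FSB.

```python
# Inferred assumption: stock 3.2GHz at the stock 8.0x multiplier implies a 400MHz base clock.
BASE_CLOCK_MHZ = 3200 / 8.0


def core_clock_ghz(multiplier: float, base_clock_mhz: float = BASE_CLOCK_MHZ) -> float:
    """Core frequency in GHz for a given clock multiplier."""
    return multiplier * base_clock_mhz / 1000


stock = core_clock_ghz(8.0)
overclocked = core_clock_ghz(10.0)
print(f"8.0x  -> {stock:.1f} GHz")
print(f"10.0x -> {overclocked:.1f} GHz ({(overclocked / stock - 1):.0%} over stock)")
```

Because only the multiplier changes, the FSB and memory keep running at their stock speeds, which is what makes this kind of overclock so straightforward on an unlocked part.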
Despite the ease at which we reached 4.0GHz, we couldn't get the system stable on air at higher frequencies. Posting and running some tests at 4.2GHz wasn't a problem, but it wasn't completely stable. With more exotic cooling we suspect that 4.2 - 4.6GHz should be possible; we're also unsure how big of a factor the power supply plays in overclocking Skulltrail.
An easy bump to 4.0GHz makes us really pleased with Intel's 45nm process. Four cores at 4.0GHz? It's a check written by NetBurst but cashed by Core 2.
Comparing to the New Mainstream & The Test
Much like Intel's quad-core CPUs upon their release, Skulltrail doesn't really have many competitors in the market. The biggest question we need to answer is how it compares to other Intel CPUs, and we have a handful of those in today's review.
The Core 2 Extreme QX9770 is the closest thing regular desktop users can get to Skulltrail as it is the same processor as the QX9775; naturally we had to include it. We also included the QX9650 since it's a more "reasonably" priced alternative. Keep in mind that a Skulltrail CPU/motherboard setup will cost you around 3x what a similar QX9650/motherboard will, so hopefully including the QX9650 will help keep things in perspective.
The final chip we're comparing to is the upcoming Core 2 Quad Q9450. Around the time that Skulltrail is released, we should see the Q9450 crop up as well as a new mainstream quad-core CPU from Intel. Running at 2.66GHz and with a total of 12MB of L2 cache on chip the Q9450 gives us a good reference point for what a mainstream quad-core system can do compared to Skully.
|CPU:||Intel Core 2 Extreme QX9775 (3.20GHz/1600MHz); Intel Core 2 Extreme QX9770 (3.20GHz/1600MHz); Intel Core 2 Extreme QX9650 (3.00GHz/1333MHz); Intel Core 2 Quad Q9450 (2.66GHz/1333MHz)|
|Motherboard:||ASUS P5K Deluxe (Intel P35); Intel D5400XS (Intel 5400)|
|Chipset Drivers:||Intel 188.8.131.520 (Intel)|
|Hard Disk:||Seagate 7200.9 300GB SATA|
|Memory:||Corsair XMS2 DDR2-800 4-4-4-12 (1GB x 4); Micron FB-DIMM DDR2-800|
|Video Card:||NVIDIA GeForce 8800 GTS 512 x 2|
|Video Drivers:||NVIDIA ForceWare 169.25|
|Desktop Resolution:||1920 x 1200|
|OS:||Windows Vista Ultimate 64-bit|
Brace Yourself, High Latency Roads Ahead
We tested Skulltrail with only two FB-DIMMs installed, but even in this configuration memory latency was hardly optimal:
|CPU||CPU-Z Latency in ns (8192KB, 256-byte stride)|
|Intel Core 2 Extreme QX9775 (FBD-DDR2/800)||79.1 ns|
|Intel Core 2 Extreme QX9770 (DDR2/800)||55.9 ns|
Memory accesses on Skulltrail take almost 42% longer to complete than on our quad-core X38 system. In applications that can't take advantage of 8-cores, this is going to negatively impact performance. While you shouldn't expect a huge real world deficit there are definitely going to be situations where this 8-core behemoth is slower than its quad-core desktop counterpart.
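The percentage follows directly from the two CPU-Z readings. The sketch below also converts each latency into core clock cycles at 3.2GHz, a conversion the article does not make itself but which shows how long a core waits on each miss.

```python
FBD_LATENCY_NS = 79.1   # QX9775 with FB-DIMM DDR2-800, from the table above
DDR2_LATENCY_NS = 55.9  # QX9770 with unbuffered DDR2-800, from the table above
CORE_CLOCK_GHZ = 3.2    # both CPUs run at 3.2GHz


def percent_longer(slow_ns: float, fast_ns: float) -> float:
    """How much longer the slow access takes, as a percentage of the fast one."""
    return (slow_ns - fast_ns) / fast_ns * 100


def latency_in_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Core clock cycles that elapse during one memory access."""
    return latency_ns * clock_ghz


penalty = percent_longer(FBD_LATENCY_NS, DDR2_LATENCY_NS)
print(f"FB-DIMM accesses take {penalty:.1f}% longer")
print(f"FB-DIMM: ~{latency_in_cycles(FBD_LATENCY_NS, CORE_CLOCK_GHZ):.0f} cycles")
print(f"DDR2:    ~{latency_in_cycles(DDR2_LATENCY_NS, CORE_CLOCK_GHZ):.0f} cycles")
```

That is roughly 253 cycles per access on Skulltrail against about 179 on the desktop board.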
Scaling to 8 Cores: Most Benchmarks are Unaffected
Trying to benchmark an 8 core machine, even today, is much like testing some of the first dual-core CPUs: most applications and benchmarks are simply unaffected. We've called Skulltrail a niche platform but what truly makes it one is the fact that most applications, even those that are multithreaded, can't take advantage of 8 cores.
While games today benefit from two cores and to a much lesser degree benefit from four, you can count the number that can even begin to use 8 cores on one hand...if you lived in Springfield and had yellow skin.
The Lost Planet demo is the only game benchmark we found that actually showed a consistent increase in performance when going from 4 to 8 cores. The cave benchmark results speak for themselves:
|CPU||Lost Planet Cave Benchmark (FPS)|
|Dual Intel Core 2 Extreme QX9775||113|
|Intel Core 2 Extreme QX9775||82|
|Dual Intel Core 2 Extreme QX9775 @ 4.0GHz||124|
At 1600 x 1200 we're looking at a nearly 38% increase in performance when going from 4 to 8 cores (113 vs. 82 fps). Unfortunately, Lost Planet isn't representative of most other games available today. Other titles like Flight Simulator X can actually take advantage of 8 cores, but not all the time and not consistently enough to offer a real world performance advantage over a quad-core system.
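Taking the Lost Planet table at face value, a short sketch can separate what the extra cores buy from what the extra clock speed buys. The variable and function names are illustrative; the frame rates are the three values in the table.

```python
FPS_ONE_CPU = 82        # single QX9775 (4 cores) at 3.2GHz
FPS_TWO_CPUS = 113      # dual QX9775 (8 cores) at 3.2GHz
FPS_TWO_CPUS_OC = 124   # dual QX9775 (8 cores) at 4.0GHz


def percent_gain(before: float, after: float) -> float:
    """Percentage improvement of `after` over `before`."""
    return (after - before) / before * 100


core_gain = percent_gain(FPS_ONE_CPU, FPS_TWO_CPUS)
clock_gain = percent_gain(FPS_TWO_CPUS, FPS_TWO_CPUS_OC)
print(f"4 -> 8 cores at 3.2GHz:     +{core_gain:.1f}%")
print(f"3.2 -> 4.0GHz with 8 cores: +{clock_gain:.1f}%")
```

By these figures, doubling the core count is worth far more in this benchmark than a 25% clock increase, which returns a bit under 10%.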
The problem is that because most games can't use the extra cores the added latency of Skulltrail's FB-DIMMs actually makes the platform slower than a regular quad-core desktop. To show just how bad it can get, take a look at our Supreme Commander benchmark.
At the suggestion of Gas Powered Games, we don't rely on Supreme Commander's built in performance test. Instead we play back a recording of our own gameplay with game speed set to maximum and record the total simulation time, making a great CPU benchmark. We ran the game at maximum image quality settings but left resolution at 1024 x 768 to focus on CPU performance, the results were a bit startling:
Thanks to the high latency FBD memory subsystem, it takes a 4.0GHz Skulltrail system to offer performance better than a single QX9770 on a standard desktop motherboard. We can't stress enough how much more attractive Skulltrail would have been were it able to use standard DDR2 or DDR3 memory.
Gamers shouldn't be too worried however, Skulltrail's memory latency issues luckily don't impact GPU-limited scenarios. Take a look at our Oblivion results from earlier for affirmation:
In more CPU bound scenarios like Supreme Commander, you will see a performance penalty, but in GPU bound scenarios like Oblivion (or Crysis, for example), Skulltrail will perform like a regular quad-core system.
The Bottom line? Skulltrail is a system made for game developers, not gamers.
Other benchmarks, even our system level suite tests like SYSMark 2007, hardly show any performance improvement when going from 4 to 8 cores. We're talking about a less than 5% performance improvement, most of which is erased when you compare to a quad-core desktop platform with standard DDR2 or DDR3 memory.
That being said, there are definitely situations where Skulltrail performance simply can't be matched.
A Hammer for 3D Rendering Applications
If anything could showcase Skulltrail's performance potential it is our 3D rendering benchmarks.
3dsmax has always been very well threaded and thus scaling from 4 to 8 cores is guaranteed. Our benchmark, as always, is the SPECapc 3dsmax 8 test but for the purpose of this article we only run the CPU rendering tests and not the GPU tests.
The results are reported as render times in seconds and the final CPU composite score is a weighted geometric mean of all of the test scores.
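For readers unfamiliar with the term, a weighted geometric mean can be computed as below. This is only a sketch of the general formula: the render times and weights are made up for illustration, and the actual weights SPECapc uses are not given in the article.

```python
import math
from typing import Sequence


def weighted_geometric_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """exp( sum(w_i * ln(x_i)) / sum(w_i) ) for positive values x_i."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and the same length")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    total_weight = sum(weights)
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / total_weight)


# Hypothetical render times (seconds) and weights, for illustration only.
render_times_s = [12.0, 48.0, 30.0]
weights = [1.0, 2.0, 1.0]
print(f"composite: {weighted_geometric_mean(render_times_s, weights):.2f}")
```

Unlike an arithmetic mean, a geometric mean keeps one very long render from dominating the composite, since each test contributes through its ratio rather than its absolute time.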
Scaling from 4 to 8 cores is pretty impressive at just under 40%. You'll also see that the desktop QX9770 is faster than a single QX9775 thanks to the lower latency unbuffered DDR2 memory.
Lightwave 3D 09
Lightwave is another 3D rendering application that we've used from time to time as a benchmark. We used two scenes that come with the application to measure performance: Dirty Building and Old Record Player. We simply rendered the scenes using all available cores, Image Viewer was disabled during the render process.
We continue to see good scaling in Lightwave, approaching 40% from 4 to 8 cores. A single QX9775 continues to be slightly slower than a QX9770 due to the use of FB-DIMMs.
A benchmarking favorite, Cinebench R10 is designed to give us an indication of performance in the Cinema 4D rendering application.
Cinebench is the poster child for 8 core performance: we're looking at a greater than 60% increase in performance when going from 4 to 8 cores. This sort of performance can't be achieved with raw clock speed, you need more cores.
POV-Ray 3.7 Beta 24
POV-Ray is a popular raytracer, also available with a built in benchmark. We used the 3.7 beta which has SMP support and ran the built in multithreaded benchmark.
POV-Ray shows almost perfect scaling with the move to 8 cores; there's over a 90% increase in performance from a single to dual QX9775 setup.
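One way to compare these scaling figures is to ask what parallel fraction each would imply under Amdahl's law. The article does not use this model, so treat the sketch below as an outside assumption: it ignores memory latency, FSB contention and every other platform effect, and the speedups are rounded from the prose (just under 40%, approaching 40%, greater than 60%, over 90%).

```python
def parallel_fraction(speedup: float, n_from: int = 4, n_to: int = 8) -> float:
    """Solve Amdahl's law for the parallel fraction p.

    Model: T(n) = (1 - p) + p / n, with speedup = T(n_from) / T(n_to).
    Rearranging gives p = (S - 1) / (S * (1 - 1/n_to) - (1 - 1/n_from)).
    """
    a, b = 1 / n_from, 1 / n_to
    return (speedup - 1) / (speedup * (1 - b) - (1 - a))


# Approximate 4 -> 8 core speedups, rounded from the text of this section.
observed = {
    "3dsmax": 1.40,
    "Lightwave": 1.40,
    "Cinebench R10": 1.60,
    "POV-Ray": 1.90,
}

for app, speedup in observed.items():
    print(f"{app}: implied parallel fraction ~{parallel_fraction(speedup):.0%}")
```

Under this model even a 40% gain from doubling the cores implies that well over 80% of the work is parallel, which hints at how demanding near-linear scaling to 8 cores really is.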
Media Encoding: Not as Happy to See Skulltrail as You'd Think
When we first got the Skulltrail machine we had visions of ripping HD-DVD and Blu-ray discs in record times, on the fly transcoding and just chewing through our DivX tests. While one of our media encoding benchmarks showed reasonable gains, for the most part we didn't see scaling beyond four cores with our encoders.
DivX 6.8 with Xmpeg
Our DivX test is the same one we've run in our regular CPU reviews, we're simply encoding a 1080p MPEG-2 file in DivX. We are using an unconstrained profile and enhanced multithreading is enabled.
While the dual QX9775 setup is technically faster than a single QX9770, it's not faster by all that much. Clock speed matters far more here.
Windows Media Encoder 9 x64
Using Windows Media Encoder's advanced video profile we encode a 500MB AVI file, this is the same test we've run in other CPU reviews.
Once again we're not seeing great performance gains from 8 cores here, there's basically no performance advantage to Skulltrail.
x264 Encoding with AutoMKV
Using AutoMKV we compress the same source file we used in our WME test down to 100MB, but with the x264 codec. If we use anything less than the 2 Pass Insane Quality profile we won't see any scaling on 8 cores, but if we enable the highest quality settings we end up with around 80 - 90% CPU utilization across all 8 cores.
Higher quality x264 encodes will benefit from 8 cores, but anything less intensive will show gains similar to what we saw with DivX and WME.
Photoshop and Valve Multithreaded Game Dev Benchmarks
Adobe Photoshop CS3
To measure performance in Adobe's Photoshop CS3 we turned to the Retouch Artists CS3 benchmark. The test cycles through a handful of commonly used filters and is timed manually. We ran the benchmark at its default settings.
While we see a performance gain with 8 cores, it's mostly erased by the high latency of the platform's FBD memory subsystem. Two QX9775s on Skulltrail are faster than one, but slower than a single QX9770 on a desktop motherboard.
Valve Map Compilation
Valve supplied us with their VRAD map compilation tool to measure the performance of compiling Source engine maps. We ran two builds simultaneously to fully stress all 8 cores:
As you'd expect, highly parallelizable tasks work very well on Skulltrail. The performance we're seeing here is only attainable by adding more cores; clock speed alone can't do it.
Valve Particle Benchmark
Particle systems are an important aspect of CPU performance in 3D games, although this benchmark does overstate their importance a bit. Still, it gives one example of how more cores can be used in future games.
The performance gains are far more marginal here: we're looking at about 14% from 4 to 8 cores.
Highly Scalable Tests: 8 Cores FTW
Intel passed along a few of its own real world benchmarks that show off the pinnacle of 8-core scalability. The tests are real world, but they were constructed by Intel and are obviously very well threaded. They mostly show what is possible and aren't representative of the vast majority of usage models.
The latest version of Excel is well threaded and can easily take advantage of all 8 cores if you're performing complex calculations in it.
Intel describes the first Excel test as follows:
"A finance worker wants to make some calculations in Excel on a large data set. Calculations including arithmetic operations like addition, subtraction, division, rounding and square root. Also includes common statistical calculations."
The performance scaling is similar to what we saw in some of our 3D rendering tests; if the application is well threaded, there are performance gains to be had.
A simulation commonly used by financial analysts, the Monte Carlo simulation test is run on a 70MB Excel file and on a mainstream quad-core processor takes almost 30 seconds to complete. Intel describes the benchmark:
"A financial analyst wants to compute the expected prices of stock options using the Black Scholes formula. He needs to do a Monte Carlo simulation to help estimate the correct value."
A Monte Carlo simulation runs a large number of randomized trials over many unknown variables and aggregates the results to estimate an outcome. It is a very compute-heavy simulation that is commonly used by traders.
Just over 11 seconds is all Skulltrail needs.
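To illustrate why this kind of workload threads so well, here's a minimal Python sketch of a Monte Carlo option pricer. This is not Intel's Excel benchmark. It's our own illustration, assuming a European call option and the price model that underlies Black-Scholes, and the function names and sample parameters are hypothetical. Every trial is independent of the others, so the work can be split across as many cores as you have.

```python
import math
import random


def black_scholes_call(spot, strike, rate, vol, years):
    """Closed-form Black-Scholes price of a European call option."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (
        vol * math.sqrt(years)
    )
    d2 = d1 - vol * math.sqrt(years)

    def cdf(x):
        # Standard normal cumulative distribution function.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return spot * cdf(d1) - strike * math.exp(-rate * years) * cdf(d2)


def monte_carlo_call(spot, strike, rate, vol, years, trials, seed=0):
    """Estimate the same price by averaging simulated payoffs."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol ** 2) * years
    diffusion = vol * math.sqrt(years)
    total_payoff = 0.0
    for _ in range(trials):
        # One independent trial: draw a random terminal stock price.
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total_payoff += max(terminal - strike, 0.0)
    # Discount the average payoff back to today.
    return math.exp(-rate * years) * total_payoff / trials


if __name__ == "__main__":
    # Hypothetical inputs: $100 stock, $100 strike, 5% rate, 20% volatility, 1 year.
    args = (100.0, 100.0, 0.05, 0.20, 1.0)
    print("closed form:", round(black_scholes_call(*args), 4))
    print("monte carlo:", round(monte_carlo_call(*args, trials=100_000), 4))
```

With enough trials the simulated estimate converges on the closed-form value, and since no trial depends on another, the loop is exactly the sort of work that scales with core count.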
ProShow Gold lets you create slideshow video from digital stills. The benchmark uses high resolution images and creates 16:9 standard resolution, high quality DVD video from them:
Power consumption of Skulltrail isn't as bad as you'd expect, but once you start adding more graphics cards you can expect it to increase tremendously.
Intel's V8 platform was a niche product, but it wasn't very well executed. Skulltrail fixes many of the problems posed by V8 and does so well enough that we don't have a problem recommending it, assuming you are running applications that can take advantage of it. Even heavy multitasking won't stress all 8 cores; you really need the right applications to tame this beast.
If there's one thing we noticed in our tests, it's that Skulltrail is made for 3D rendering. We saw consistent gains and good scaling in all of our 3D rendering tests. These corner cases are the perfect use for Skulltrail, but much like Apple's MacBook Air you have to keep in mind that the usefulness of this platform is very limited. In a year's time, Skulltrail will be well on the way to obsolescence. If you need the power, have the budget and can accept that you won't be able to upgrade to Nehalem, which will offer 8 cores in a single socket, then Skulltrail is worth a look. If not, just settle on a QX9770 and be done with it.
The problem with Skulltrail is something that also plagued Intel's original V8 platform: the use of Fully Buffered DIMMs. It's even more insulting on Skulltrail because there are only four DIMM slots on the motherboard; the entire point of FBD is to allow for higher memory clock speeds without reducing the number of memory slots on a motherboard. Yet Skulltrail doesn't actually offer any more memory slots than a desktop motherboard, nor does it run its memory any faster. Intel is using a technology that has absolutely no business being on this motherboard. You just end up with slower, more expensive and more power hungry memory.
There are two sides to every story and we can't just hate on Intel for this. From Intel's perspective the market for a Skulltrail system isn't that big, so it makes sense to reuse existing chipsets/technology rather than create an entirely new platform. The majority of the market will switch to 8 cores when you can get that many with a single CPU; for those buyers, Skulltrail just isn't an option. Intel's dual-CPU chipsets are all made for the Xeon, and all of them use Fully Buffered DIMMs.
Using a desktop chipset isn't an option either. If Intel were to allow desktop chipsets to work in multi-socket configurations it would hurt Xeon sales: a couple of Q6600s in a desktop motherboard would make for a killer server and ruin the point of the profit-heavy Xeon platform. So because Skulltrail is such a niche product, it has to use components that aren't exactly well suited for it. We understand why Intel made the choices it did, but that doesn't mean we can't complain about it.
Skulltrail is closer to perfection than Intel's V8 platform ever was, but it's still not quite there. Let us use desktop memory and Intel will have a winner on its hands.