Original Link: http://www.anandtech.com/show/2125
AMD's Quad FX: Technically Quad Core
by Anand Lal Shimpi on November 30, 2006 1:16 PM EST
Imagine for a moment you're at the decision making table at AMD; you are at least a year away from introducing an updated micro-architecture to refresh your now aging K8 design and your chief competitor just introduced faster and cooler CPUs than anything in your lineup. To make matters worse, this very same competitor enjoys a manufacturing advantage and has also announced that it will begin the transition to quad-core even earlier than originally expected, starting at the end of 2006. The earliest you can even hope to release a quad-core CPU is the middle of 2007. What do you do?
AMD's first move made sense, and that was to dramatically reduce the pricing of its entire lineup to remain competitive. Most computer components are not things you can buy and sell off of emotions alone, and thus something that performs worse must cost less. Through the price drops AMD actually ended up with a fairly attractive dual core lineup, although similarly aggressive pricing from Intel meant that the most attractive AMD CPUs were the cheapest ones.
But what was AMD to do about the quad-core race? Even though Intel would release its first quad-core CPUs this year, less than 1% of all shipments would feature four cores. It won't be until the end of 2007 before more than 5% of Intel's shipments are quad-core CPUs. But would the loss in mindshare be great enough if Intel already jumped ahead in the race to more cores?
Manufacturing a quad-core Athlon 64 or Opteron on AMD's current 90nm process simply isn't feasible; AMD would end up with a chip that is too big and too hot to sell, not to mention that it would put an even greater strain on AMD's manufacturing which is already running at capacity.
With the 90nm solution being not a very good one, there's always the "wait until 2007" option, which honestly seemed like a very good one to us. We just mentioned that Intel wasn't going to be shipping many of these quad-core CPUs and the majority of users, even enthusiasts who are traditionally early adopters, will stay away from quad-core until 2007 at the earliest to begin with.
Then there's the third option, the one AMD ended up taking; instead of building quad-core on 90nm or waiting until next year, around April/May of 2006 AMD decided that it had a better solution. AMD would compete in the quad-core race by the end of 2006 but with two dual core CPUs running in a desktop motherboard.
Of course dual-core, dual-socket is nothing new, as AMD has been offering that on Opteron platforms for quite a while now. But the difference is that this new platform would be designed for the enthusiast, meaning it would come equipped with a performance tuned (and tweakable) BIOS, tons of USB ports, support for SLI, etc... Most importantly, unlike an Opteron system, this dual socket desktop platform would run using regular unbuffered DDR2 memory.
Back then the platform was called 4x4, and honestly it was about as appealing as a pickup truck. The platform has since matured and thanks to a very impressive chipset from NVIDIA and aggressive pricing from AMD, what's now known as Quad FX may actually have some potential. Today we're here to find out if AMD's first four-core desktop platform is a viable competitor to Intel's Kentsfield, or simply an embarrassing knee-jerk reaction.
With the recent ATI acquisition under its belt, AMD has started down the path of becoming a platform company. As such it's almost fitting that the most interesting part of today's story isn't the CPUs, but rather the chipset and motherboard that complete Quad FX.
When AMD first talked about Quad FX as 4x4, everyone assumed that we would be looking at a pair of Socket-AM2 CPUs on a desktop motherboard. However, as we got closer to launch it quickly became evident that Quad FX would be using Opteron's new Socket-1207 instead. The reason for using Socket-1207 instead of AM2 is simple; in a single socket AM2 system you only need a single HyperTransport (HT) link between the CPU and the chipset, which is provided for in AM2 CPUs. However, with two sockets you need a minimum of two links: one connecting the two sockets and one for the chipset. It's the number of HT links required that forced the 1207-pin socket upon Quad FX. We will get to how this impacts your upgrade path and CPU costs later, but for now just know that Quad FX only works with 1207-pin Athlon 64 FX CPUs (Opterons are prevented from working in the BIOS).
Despite the ATI acquisition, the only Quad FX chipset available at launch is from NVIDIA. Dubbed the nForce 680a, the chipset once again gives us a reason to respect NVIDIA's platform group. Although singular in name, the nForce 680a is composed of two 680a SLI chips, each driving a x16 and a x8 PCIe slot. One of the two chips has an additional 8 PCIe lanes, bringing the total up to 56 lanes, more than any other NVIDIA chipset, including the recently released 680i.
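The 56-lane total follows directly from the layout described above. As a quick sanity check, here is a minimal sketch of the arithmetic; the variable names are ours and purely illustrative:

```python
# Lane counts as described for the nForce 680a in this article.
lanes_per_chip = 16 + 8   # each 680a SLI chip drives one x16 and one x8 slot
chips = 2                 # the 680a is built from two such chips
extra_lanes = 8           # one of the two chips carries 8 additional lanes

total_lanes = chips * lanes_per_chip + extra_lanes
print(f"Total PCIe lanes: {total_lanes}")
```

Two chips at 24 lanes each give 48, and the extra 8 lanes on one chip account for the rest.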
The madness doesn't end with the sheer number of PCIe lanes; the 680a also supports a total of four GigE ports, twelve SATA ports and twenty USB 2.0 ports. All of the usual nForce features are present on the 680a, including SLI support, RAID and NVIDIA's networking (FirstPacket and Teaming support). The chipset is a pure beast and we were eager to see an implementation of all of the PCIe, GigE, SATA and USB ports that the 680a supports on the first Quad FX motherboards; unfortunately we were met with disappointment.
AMD's Quad FX platform is launching with a single motherboard partner, ASUS, and a single Quad FX motherboard: the L1N64-SLI WS. The ASUS board features four physical x16 slots (two x16 and two x8), a single x1 PCIe slot and one regular PCI slot.
You also get all twelve SATA ports, but there's only support for ten USB ports and two GigE ports. Obviously the number of people who will complain about not having all twenty USB ports and four GigE ports is limited, but with AMD expecting the L1N64-SLI WS to retail for around $370, we wanted all of the bells and whistles.
We asked AMD when we could expect other Quad FX motherboard designs but at this point it looks like the ASUS solution is it. AMD is working with more motherboard partners but there's no indication if or when additional Quad FX designs will surface.
AMD is guaranteeing a bit of an upgrade path to early adopters of Quad FX by promising that these motherboards will work with AMD's native quad-core CPUs when they are available next year, meaning you'll get support for eight cores in the same platform in less than a year.
To complete the brand new Quad FX platform AMD is introducing three new processors today: the Athlon 64 FX-74, FX-72 and FX-70, running at 3.0GHz, 2.8GHz and 2.6GHz respectively. Each physical processor features two cores and a 1MB L2 cache per core, much like previous dual core FX processors, but what sets these CPUs apart from previous FX chips is that they are sold in bundles of two. So when you buy an Athlon 64 FX-74, you are actually buying two dual-core CPUs in a single box. It's not the most elegant way of getting four cores, but it gets the job done and AMD manages to do so at a competitive price. Note that these CPUs are effectively Opterons but with the memory controller configured to support un-buffered DDR2.
Core 2 Duo (left) vs. Athlon 64 FX-74 (right), AMD's first LGA desktop CPU
AMD's pricing structure, including the new Quad FX processors, is as follows, with Intel's upper echelon CPUs thrown in for comparison:
|CPU||Clock Speed||L2 Cache||Price|
|AMD Athlon 64 FX-74*||3.0GHz||1MB per core||$999|
|AMD Athlon 64 FX-72*||2.8GHz||1MB per core||$799|
|AMD Athlon 64 FX-70*||2.6GHz||1MB per core||$599|
|AMD Athlon 64 FX-62||2.8GHz||1MB per core||$713|
|AMD Athlon 64 X2 5200+||2.6GHz||1MB per core||$403|
|Intel Core 2 Extreme QX6700||2.66GHz||4MB per 2 cores||$999|
|Intel Core 2 Quad Q6600**||2.40GHz||4MB per 2 cores||$851|
|Intel Core 2 Extreme X6800||2.93GHz||4MB||$999|
|Intel Core 2 Duo E6700||2.66GHz||4MB||$530|
* Note: These processors are sold in pairs; pricing is for both CPUs.
** Note: The Core 2 Quad Q6600 is an unreleased CPU and will be introduced in January 2007.
So for $999 you can either get two dual core 3.0GHz AMD processors, or a single quad core 2.66GHz Core 2 Extreme QX6700. Later we'll figure out which is indeed faster but it seems that AMD's pricing is at least competitive.
When we first heard that Quad FX wasn't going to be Socket-AM2, we couldn't help but feel that AMD was introducing yet another Socket-940 into the mix. Is there really a future for Quad FX or is it nothing more than a stop-gap solution until native quad-core CPUs arrive?
AMD has already committed to supporting two quad-core CPUs in current Quad FX platforms, so there's at least an upgrade path well into 2007, but what happens afterwards?
AMD's most recent roadmaps show continued support for Quad FX throughout 2007; in fact, the highest clock speed AMD CPUs will always be Socket-1207 parts (3.0GHz today and then 3.2GHz by Q2 '07). It looks like AMD is transitioning the Athlon 64 FX line to be exclusively for the Quad FX platform, leaving all other chips for AM2.
|CPU:||Intel Core 2 Extreme X6800 (2.93GHz/4MB), Intel Core 2 Extreme QX6700 (2.66GHz/4MBx2), AMD Athlon 64 FX-74 (3.0GHz)|
|Motherboard:||eVGA nForce 680i SLI, ASUS L1N64-SLI WS|
|Chipset:||NVIDIA nForce 680i SLI, NVIDIA nForce 680a SLI|
|Chipset Drivers:||NVIDIA 9.35|
|Hard Disk:||Seagate 7200.9 300GB SATA|
|Memory:||Corsair XMS2 DDR2-800 4-4-4-12 (1GB x 2/512MB x 4)|
|Video Card:||NVIDIA GeForce 8800 GTX|
|Video Drivers:||NVIDIA ForceWare 97.01|
|Desktop Resolution:||1600 x 1200|
|OS:||Windows XP Professional SP2|
How does a 3GHz Athlon 64 X2 Perform?
Although today's story is mostly about AMD's Quad FX platform, there is a little gem worth mentioning. AMD's top of the line Athlon 64 FX-74 processors run at 3.0GHz, the highest shipping frequency of any AMD desktop CPU. While it won't be until next year before we see 3.0GHz in an Athlon 64 X2, we were curious to get a little preview of what the dual core race would look like early next year. Note that many of these tests are using updated benchmarks using newer versions of our applications and thus can't be compared to previous results.
The showdown is between the Athlon 64 X2 6000+ (3.0GHz, 1MB L2 per core) and the Intel Core 2 Extreme X6800 (2.93GHz, 4MB L2):
Intel still has the performance advantage, even once AMD reaches 3.0GHz. We didn't expect another 200MHz to do much but it's further confirmation that AMD will need a new architecture to compete; the second half of 2007 can't come quickly enough for AMD.
More Sockets, but Lower Performance?
When AMD briefed us on Quad FX, the performance focus was on heavy multitasking (AMD calls this "Megatasking") or very multi-threaded tests. We figured it was an innocent attempt to make sure we didn't run a bunch of single threaded benchmarks on Quad FX and proclaim it a failure. Given that the vast majority of our CPU test suite is multi-threaded to begin with, we didn't think there would be any problems showcasing where four cores is better than two, much like we did in our Kentsfield review.
However when running our SYSMark 2004SE tests we encountered a situation that didn't make total sense to us at first, and somewhat explained AMD's desire for us to strongly focus on megatasking/multithreaded tests. If we pulled one of the CPUs out of the Quad FX system, we actually got higher performance in SYSMark than with both CPUs in place. In other words, four cores was slower than two.
|CPU||SYSMark 2004SE||Internet Content Creation||Office Productivity|
|2 Sockets (4 cores)||261||373||182|
|1 Socket (2 cores)||288||393||211|
You'll see that in some of the individual tests there is an advantage to having both CPUs installed, but in the vast majority of them performance goes down with four cores. It turns out that there are two explanations for the anomaly.
|CPU||Internet Content Creation||3D Creation||2D Creation||Web Publication|
|2 Sockets (4 cores)||373||245||514||411|
|1 Socket (2 cores)||393||364||453||369|
First, in the Internet Content Creation suite of SYSMark 2004SE, there appears to be an issue with having two physical CPUs in the system that results in the 3dsmax rendering test only spawning a single thread, lowering performance below that of a normal dual-core processor. This problem may be caused by a licensing violation within the benchmark where it is expecting to see one physical CPU with multiple cores and isn't prepared to deal with multiple CPUs. Regardless of the exact cause of the problem, it doesn't appear to be anything more than a benchmark issue. It's the performance in the Office Productivity suite that is far more worrisome because there is no issue with the benchmark that's causing the problem.
|CPU||Office Productivity||Communication||Document Creation||Data Analysis|
|2 Sockets (4 cores)||182||171||259||137|
|1 Socket (2 cores)||211||187||285||176|
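To put the SYSMark 2004SE tables above in perspective, the short sketch below computes the percentage change when going from one socket to two. The scores are copied from the tables; the helper name `percent_change` is our own.

```python
# Scores from the SYSMark 2004SE tables: (2 sockets / 4 cores, 1 socket / 2 cores).
scores = {
    "SYSMark 2004SE overall": (261, 288),
    "Internet Content Creation": (373, 393),
    "3D Creation": (245, 364),
    "2D Creation": (514, 453),
    "Web Publication": (411, 369),
    "Office Productivity": (182, 211),
    "Communication": (171, 187),
    "Document Creation": (259, 285),
    "Data Analysis": (137, 176),
}

def percent_change(two_socket, one_socket):
    """Change from the one-socket score to the two-socket score, in percent."""
    return (two_socket - one_socket) / one_socket * 100.0

changes = {name: percent_change(*pair) for name, pair in scores.items()}
for name, delta in changes.items():
    print(f"{name:28s} {delta:+6.1f}%")

# Tests where adding the second CPU lowered the score.
regressions = [name for name, delta in changes.items() if delta < 0]
print(f"{len(regressions)} of {len(scores)} scores drop with both sockets populated")
```

Only 2D Creation and Web Publication improve with the second CPU installed; every other score, including all three Office Productivity components, goes down.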
The Office Productivity suite of SYSMark 2004SE wasn't the only situation where we saw lower performance on Quad FX than with a single dual core setup. 3D games seemed to suffer the most; take a look at what happens in our Oblivion and Half Life 2: Episode One tests:
|CPU||Oblivion - Bruma||Oblivion - Dungeon||Half Life 2: Episode One|
|2 Sockets (4 cores)||67.3||78.3||155.8|
|1 Socket (2 cores)||75.2||90.9||165.7|
Once again, populate both sockets in the Quad FX system and performance goes down. The explanation for these anomalies lies in the result of one more benchmark, CPU-Z's memory latency test:
|CPU||CPU-Z Latency (8192KB, 128-byte)|
|2 Sockets (4 cores)||55.3 ns|
|1 Socket (2 cores)||43.3 ns|
With both sockets populated, memory latency goes up by around 28% (from 43.3 ns to 55.3 ns), so applications that are latency sensitive and don't need all four cores perform worse than they do on a single dual-core CPU. The added latency comes from the additional probing over the HT bus that's done for coherency: whenever a memory request is made, the system must check where the latest copy of the data resides.
It's a problem that will go away if you have a single quad-core CPU with one memory controller, but one that makes Quad FX a tougher pill to swallow compared to Intel's quad-core offerings.
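For a sense of scale, the sketch below turns the two CPU-Z latency figures into a relative increase and into core clock cycles at the FX-74's 3.0GHz. The conversion to cycles is our own back-of-the-envelope illustration, not something CPU-Z reports.

```python
# CPU-Z latency results from the table above (8192KB, 128-byte stride).
latency_two_sockets_ns = 55.3
latency_one_socket_ns = 43.3
core_clock_ghz = 3.0  # FX-74 clock speed; 1 GHz means 1 cycle per nanosecond

added_ns = latency_two_sockets_ns - latency_one_socket_ns
increase_pct = added_ns / latency_one_socket_ns * 100.0
added_cycles = added_ns * core_clock_ghz

print(f"Added latency: {added_ns:.1f} ns ({increase_pct:.1f}% increase)")
print(f"Roughly {added_cycles:.0f} extra core cycles per memory access at 3.0GHz")
```

An extra 12 ns sounds small, but at 3.0GHz it is on the order of three dozen core cycles added to every access that goes out to main memory.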
Four cores, 1 Socket or Four cores, 2 Sockets?
One of the major arguments in favor of AMD's Quad FX architecture is the fact that you should get better performance scaling when going from 2 to 4 cores since there's no FSB limiting the data coming in to the CPUs. We looked at the performance scaling from a single FX-74 to two FX-74 processors in our Quad FX platform and compared it to Intel's Core 2 running at 2.66GHz with two and four cores enabled.
|Benchmark||AMD Scaling (2 to 4 cores)||Intel Scaling (2 to 4 cores)|
|Blu-ray + Cinebench||147%||135%|
|Blu-ray + DivX||43.9%||48.3%|
|Blu-ray + WME||65.4%||73.4%|
|Blu-ray + 3dsmax 8||63.1%||77.0%|
|Valve Particle Systems||48.8%||93.1%|
|Valve Map Compilation||42.0%||44.3%|
Even when we take into account our heavy multitasking Blu-ray playback scenarios (which we will describe later), AMD's Quad FX scales worse than Intel's quad-core solution in five of the six tests. All things being equal, AMD should have better scaling; however, AMD's cores are inherently slower in most of these benchmarks, and simply adding more of them is not going to make up for the deficit seen by one.
AMD will have better scaling on paper, but Intel has the superior micro-architecture today, which results in better performance and in most cases, better scaling than AMD. The same might not be true in the enterprise market, but we'll have to save that for a look at Opteron vs. Xeon.
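The scaling table can be tallied in a few lines. This sketch assumes the percentages are performance gains when going from two to four cores, so 100% would mean a doubling; the dictionary layout and names are ours.

```python
# Scaling from 2 to 4 cores, copied from the table above: (AMD %, Intel %).
scaling = {
    "Blu-ray + Cinebench": (147.0, 135.0),
    "Blu-ray + DivX": (43.9, 48.3),
    "Blu-ray + WME": (65.4, 73.4),
    "Blu-ray + 3dsmax 8": (63.1, 77.0),
    "Valve Particle Systems": (48.8, 93.1),
    "Valve Map Compilation": (42.0, 44.3),
}

amd_wins = [name for name, (amd, intel) in scaling.items() if amd > intel]
intel_wins = [name for name, (amd, intel) in scaling.items() if intel > amd]

for name, (amd, intel) in scaling.items():
    # Assumption: a gain of X% corresponds to a speedup factor of 1 + X/100.
    print(f"{name:24s} AMD {1 + amd / 100:.2f}x   Intel {1 + intel / 100:.2f}x")

print(f"AMD scales better in {len(amd_wins)} test(s), Intel in {len(intel_wins)}")
```

Under that reading, the Blu-ray plus Cinebench result is the only case where Quad FX scales better. Both platforms more than double there, which is consistent with a dual core CPU having very little left over for rendering while it decodes the movie.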
3D Rendering Performance with 3dsmax 8 & Cinebench
We've updated our benchmark suite a bit and we're now using 3dsmax 8 with the SPECapc 3dsmax 8 rendering test. The composite rendering score is graphed below, which is the geometric mean of the rendering times in the table below normalized to a reference system.
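For readers unfamiliar with how such a composite is built, here is a minimal sketch of a geometric mean of normalized render times. The render times are made up for illustration, and we assume each scene is normalized as reference time divided by measured time so that a higher score is better; the exact SPECapc formula may differ.

```python
import math

def composite_score(measured_s, reference_s):
    """Geometric mean of per-scene speed ratios relative to a reference system.

    Assumption: each ratio is reference_time / measured_time, so a system
    that renders every scene twice as fast as the reference scores about 2.0.
    """
    ratios = [ref / t for t, ref in zip(measured_s, reference_s)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical render times in seconds for three scenes (not measured data).
reference_times = [100.0, 200.0, 50.0]
measured_times = [50.0, 100.0, 25.0]

score = composite_score(measured_times, reference_times)
print(f"Composite score: {score:.2f}")
```

The geometric mean is used rather than the arithmetic mean so that one very long or very short scene cannot dominate the composite.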
The FX-74 system comes within striking distance of Intel's Core 2 Extreme QX6700 and the unreleased Core 2 Quad Q6600, which is good for performance but not so great when you take into account power consumption and cost.
Looking at the lower end FX-70, performance isn't that much better than Intel's top of the line dual-core setup, making the purchase even more difficult.
Performance under Cinebench is similar; however, the FX-74 actually manages to take the lead away from the QX6700.
Media Encoding Performance with DivX 6.4, WME9, Quicktime and iTunes
With its Core architecture Intel made significant improvements to SSE performance, which we see in the real world here with our DivX encoding test. AMD will be significantly improving its SSE performance with its Barcelona core due out next year, but until then not even four cores will paint a decent picture.
Windows Media Encoder with Advanced Profile encoding performance is far closer, with Intel taking the lead by 7% at the top. Performance is quite competitive, but you have to deal with a more expensive platform and as you will soon see, much greater power consumption.
Both Quicktime and iTunes get no real improvement from four cores over two so these final encoding tests mainly boil down to a performance comparison between dual core CPUs:
Gaming Performance with Quake 4 and Oblivion
Quake 4 doesn't show a huge deficit when looking at Quad FX performance vs. a single socket dual core AMD setup; there's a slight drop but nothing huge. Even though the latest version of the engine is multi-threaded, there's no real benefit above two cores. We will have to wait for the next generation of titles before we'll start seeing real boosts in performance for quad core CPUs.
Oblivion performance is notably worse on Quad FX than with only a single socket AMD system due to the added memory latency. Now if you were running a 3dsmax render in the background, then the Quad FX system would be faster than its dual core counterpart.
Gaming Performance with Half Life 2: Episode One and Valve SMP Benchmarks
Intel continues to be at the top of the charts in gaming performance with Half Life 2: Episode One:
And once again, Quad FX doesn't do so well if all you're doing is running a single game; an Athlon 64 X2 setup is faster.
Our final two benchmarks are synthetic tests that Valve left us with to give us a preview of the impact of multi-core CPUs in future games. We've talked about both of these tests in our Valve Hardware Day 2006 article if you're interested in learning more about them and what they do.
Both tests favor Intel's Core 2 processors, but both show incredible scaling from two to four cores.
When we were trying to think up new multitasking benchmarks to truly stress Kentsfield and Quad FX platforms we kept running into these interesting but fairly out-there scenarios that did a great job of stressing our test beds, but a terrible job of making a case for how you could use quad-core today.
Without a doubt, in the next two years the number of applications that see a benefit when running on four cores will increase dramatically. Even multitasking under Windows Vista will make the argument for more cores easier (simply opening a new Explorer window in Vista will eat up 10% of the CPU time of a Quad FX system), but our Vista benchmarks are not yet complete and we wanted to have something to showcase for this review.
While working on our Quad FX article we also happened to be working on a follow-up to our HDCP Graphics Card Roundup, focusing on H.264 decoding performance in Blu-ray titles. A light bulb went off and we had our benchmark: how many cores do you need to watch a high bit-rate Blu-ray movie and do something else at the same time on your PC?
The movie we used was Xmen III, encoded in H.264, and featuring bitrates in excess of 40Mbps at times. Our benchmark starts at the beginning of Chapter 18 and continues until our background tasks are complete. This particular segment ranges in bitrate from 13Mbps up to above 40Mbps, with the average falling in the 18 - 24Mbps range.
We played the movie in the foreground, while in the background we either ran our Cinebench test, encoded a DivX movie, encoded a WME9 movie or performed our 3dsmax test.
The two rendering tests are important because rendering can take a bit of time and it might be nice to entertain yourself with a movie while your render completes; after all, what's the point of having $1000 worth of CPUs if you can't use them for entertainment?
The two encoding tests are also important because being able to encode and decode at the same time is a fundamental requirement for a DVR, and at some point the next generation of media center PCs will need to be able to decode high bitrate HD movies while encoding others. We chose to include both DivX and WME because DivX runs much better on Intel CPUs, while the standings are a bit closer under WME, to give you a better overall impression of how the two platforms handle these heavy multitasking scenarios.
Our first test involved us playing back the BD title while running our multi-threaded Cinebench test; we reported the Cinebench score upon its completion:
The dual core processors all fall to the bottom of the list and basically perform like single-core CPUs while decoding the Blu-ray movie. The quad-core setups do much better and perform very well, but all of the CPUs in this test were able to run without dropping any frames in the BD movie.
Making things a bit more difficult, our next test had the same movie playing back but this time we ran our DivX encoding test in the background. We reported the DivX encoding frame rate upon completion:
Performance is pretty much what you'd expect, although Intel's superior DivX encoding performance results in the Core 2 Extreme X6800 doing almost as well as the FX-74. What you don't see however is how well these systems played back the Blu-ray movie; none of the dual core setups were able to play the BD movie smoothly, not even the Core 2 Extreme X6800. The movie was basically unwatchable due to all of the pausing and stuttering.
All of the four core systems played the BD movie fairly well; although they all dropped some frames, it wasn't enough to totally ruin the experience.
Next up we tried playing our BD title while running our WME9 test, and found similar results:
Once again, none of the dual core platforms were able to play the BD title even remotely smoothly. The quad-core setups were able to play the movie while encoding, but still managed to drop some frames (not enough to ruin the experience though).
Our final multitasking test has us playing the same BD title while running our 3dsmax 8 render test:
Much to our disappointment, none of the systems could handle this workload without ruining the movie playback; even the quad-core setups had troubles. We're not talking a few dropped frames, but rather the movie playback would be completely stopped at times. It looks like we may have a scenario for either more GPU assisted H.264 decode or an 8-core Quad FX platform in the future.
Power consumption of a Quad FX system is simply unreal for a desktop, as it should be because this is effectively a workstation platform with un-buffered memory. At idle our Quad FX test bed consumed nearly 400W, partially because we couldn't get Cool 'n Quiet running on the system, but also because the CPUs and motherboard simply draw an incredible amount of power. Update: We got Cool 'n Quiet working on the motherboard which reduced idle power significantly, down to within a few watts of the Kentsfield system. Load power was unchanged.
|CPU||Idle Power||Load Power||Performance per Watt (fps/watt)|
|AMD Athlon 64 FX-74 (3.0GHz x 4)||217W||456W||17.7|
|Intel Core 2 Extreme QX6700 (2.66GHz x 4)||213W||263W||32.9|
Looking at power consumption under full load, Cool 'n Quiet would have no chance to even make an impact as all cores are being utilized at full speed. Under load the Quad FX system pulled 456W on average, a full 73% more than our Kentsfield testbed.
If we look at performance per watt, the Quad FX loses big time. We specifically chose to look at our WME encoding test because the performance of the FX-74 and QX6700 is pretty close. What you're looking at here is the best case scenario for the Quad FX's performance per watt; in applications where it's significantly slower than Kentsfield the performance per watt will be even worse.
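The power comparison can be reproduced from the table above. The sketch below recomputes the load power gap and the ratio of the two published performance per watt figures; we use the table's values as given and make no assumption about how the fps/watt column was derived.

```python
# Values from the power consumption table above.
quad_fx = {"idle_w": 217, "load_w": 456, "perf_per_watt": 17.7}
kentsfield = {"idle_w": 213, "load_w": 263, "perf_per_watt": 32.9}

# How much more power Quad FX draws under load, in percent.
load_increase_pct = (quad_fx["load_w"] / kentsfield["load_w"] - 1) * 100.0

# How far ahead Kentsfield is in performance per watt.
ppw_ratio = kentsfield["perf_per_watt"] / quad_fx["perf_per_watt"]

# Idle gap with Cool 'n Quiet working.
idle_gap_w = quad_fx["idle_w"] - kentsfield["idle_w"]

print(f"Quad FX draws {load_increase_pct:.0f}% more power under load")
print(f"Kentsfield delivers {ppw_ratio:.2f}x the performance per watt")
print(f"Idle power differs by {idle_gap_w} W")
```

Even in this best case for Quad FX, Kentsfield delivers close to twice the performance per watt.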
AMD is going to have a very tough sell with Quad FX; although the CPUs are priced competitively, if the ASUS L1N64-SLI WS ends up just shy of the $400 mark it's a platform that is simply too expensive at no benefit to the end user. When only running one or two CPU intensive threads, Quad FX ends up being slower than an identically clocked dual core system, and when running more threads it's no faster than Intel's Core 2 Extreme QX6700. But it's more expensive than the alternatives and consumes as much power as both, combined.
There is the upgrade path argument, that eventually you will be able to put a total of eight cores in this Quad FX platform, but we suspect the market for a non-workstation 8-core desktop setup is a very small one. To AMD's credit, we were able to create a scenario where even four cores won't cut it, making a case for the need for 8-core setups in the future. But the promise of eight cores in the future doesn't do a great job of justifying the Quad FX purchase today.
For those users who won't migrate to eight cores, once AMD's new micro-architecture debuts next year with native quad-core support, this expensive Quad FX platform will be notably slower than cheaper single socket systems. Quad FX is simply a very niche product, and in the era of power efficiency and performance per watt, AMD has released the proverbial SUV of high end desktops.
AMD hopes to sell more Quad FX processors than any FX processor in the past, which to us means that either AMD sees much more opportunity in this platform than we do, or that the previous FX processors simply didn't sell very well. Either way you slice it, there's only one AMD CPU we're really interested in and we won't get it until the middle of next year. Luckily for AMD, Intel doesn't appear to be doing anything huge between now and then either, so it looks like the CPU wars will cool down for a while after a heated few months.
Prepare to revisit this discussion in less than a year's time, and next time AMD will hopefully be much better prepared, armed with a new architecture and a cooler, smaller 65nm process. Until then, there's always Quad FX but you're better off with Kentsfield.