Original Link: http://www.anandtech.com/show/862
AMD's 760MPX Chipset - Multiprocessor for the Masses
by Anand Lal Shimpi on December 18, 2001 5:00 PM EST
Back in June we had some very strong words to say in favor of the first multiprocessor chipset from AMD. Indeed the 760MP had a lot of potential and performed quite well, but would it survive?
The platform did not gain any major OEM design wins although the HPs of the world were definitely paying attention to its progress. Instead, the 760MP was a hit among enthusiasts, early adopting risk-takers, and those that were well aware of the platform's power and potential. Thus it came as no surprise that just two months after we wrote about it, we were running it to serve our backend databases as well.
It will take time for the large OEMs to trust the AMD name to power their mission critical server solutions, but it starts with successes at the early adopter level in order to gain a reputation for the product. While there were no performance or stability issues with the 760MP when we first reviewed it, there were some limitations of the chipset design that would most definitely hold back its success as a true high-end product:
Support for only a single PCI bus - As trivial as it may seem to most desktop users, having only a single PCI bus is laughable when it comes to the truly high end server solutions. With a single powerful RAID array it is very easy to saturate a 32-bit/33MHz PCI bus leaving no room for network traffic among other things. Remember that the ServerWorks HEsl chipset for the Pentium III platform already supports two 64-bit/66MHz PCI buses and one 32-bit/33MHz bus.
Support for only a 64-bit/33MHz PCI bus - The original 760MP only supported a 64-bit/33MHz PCI bus, which is OK for most users, but the lack of 66MHz PCI support is definitely a mark against the 760MP. Going back to the previous example of the HEsl from ServerWorks, it had two 64-bit/66MHz PCI buses whereas this chipset didn't even have one.
Low-bandwidth North/South Bridge connection - Because it was only outfitted with a single 64-bit/33MHz PCI Bus, that same bus was used to connect the North and South bridges of the chipset. A 266MB/s link between the North and South bridges is fine for a desktop computer, but not for a server.
These were the issues that most higher end server users would have with the chipset, and they paved the way for the introduction of the true mass-production version of the chipset - the 760MPX.
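The bandwidth figures quoted for these buses follow from simple arithmetic: bus width in bytes multiplied by clock rate. The short sketch below reproduces them. The function name and the use of the nominal 33.33MHz and 66.66MHz PCI clocks are our own illustrative choices; the results are peak theoretical values, not measured throughput.

```python
# Peak theoretical PCI bandwidth: one transfer of `width_bits` per clock cycle.
# Names are illustrative; nominal PCI clocks are 33.33 MHz and 66.66 MHz.

PCI_33_MHZ = 100 / 3   # nominal "33MHz" PCI clock
PCI_66_MHZ = 200 / 3   # nominal "66MHz" PCI clock


def pci_peak_bandwidth_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Return peak bandwidth in MB/s for a bus of the given width and clock."""
    return width_bits / 8 * clock_mhz


if __name__ == "__main__":
    buses = [
        ("32-bit/33MHz", 32, PCI_33_MHZ),
        ("64-bit/33MHz", 64, PCI_33_MHZ),
        ("64-bit/66MHz", 64, PCI_66_MHZ),
    ]
    for label, width, clock in buses:
        # Truncate to whole MB/s, matching how the figures are usually quoted.
        print(f"{label}: {int(pci_peak_bandwidth_mb_s(width, clock))} MB/s")
```

Truncated to whole numbers, the 64-bit/33MHz case gives the 266MB/s quoted above for the 760MP's North/South Bridge link, and the 64-bit/66MHz case gives the 533MB/s of the ServerWorks-class buses.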
Better, but not perfect
While it would be great if all of the issues we brought up on the first page were solved, that's unfortunately not the case. The 760MPX does improve on a couple of them but it clearly paves the way for an even superior solution, most likely one for AMD's upcoming Hammer architecture, to truly take off. And we'll hypothesize about that later, but first let's look at how the 760MPX differs from what AMD launched six months ago.
The AMD 762 North Bridge remains unchanged from the original 760MP chipset. Remember that the original chipset had a 64-bit interface to an external PCI bus so the North Bridge wouldn't have to change if support for a 64-bit/66MHz PCI bus were to be included.
Because it's the same North Bridge it features the same benefits that resulted in such stellar performance for the 760MP platform. We've already presented you with a thorough explanation of why the point-to-point bus protocol and MOESI cache coherency provided by the 762 North Bridge and the Athlon architecture result in superior performance. For more information on that as well as the other performance improving factors of the 762 North Bridge take a look at our original 760MP review.
What sets the 760MPX apart from the 760MP is everything outside of the 762 North Bridge. The most obvious enhancement is that there is now a 64-bit PCI bus that runs at 66MHz extending from the 762 North Bridge. This provides 533MB/s of bandwidth across that bus.
The original 760MP chipset only featured a 33MHz 32/64-bit PCI bus (above), while the 760MPX resolves that issue (below)
This 64-bit/66MHz PCI bus supports up to two 32/64-bit PCI devices that operate at either 33 or 66MHz. Now if you stick a 32-bit/33MHz device in one of those slots the bus will default to a 32-bit/33MHz operating mode which obviously ruins the benefits of the bus. Similarly, a 64-bit/33MHz PCI device will reduce the operating frequency of that bus to 33MHz. Why is this important to know?
The 64-bit/66MHz PCI bus is also what connects the AMD 762 North Bridge to the 760MPX South Bridge and thus reducing its operating frequency to 33MHz also reduces the bandwidth between the North and South bridges.
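To make the consequence concrete, here is a minimal sketch of the fallback behavior as described above: the shared bus drops to the lowest width and clock among the installed cards, and the North/South Bridge link inherits whatever is left. This is a simplified model of the description in this article, not of PCI's actual signaling, and every name in it (`PciDevice`, `shared_bus_mode`, and so on) is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Nominal PCI clocks in MHz, keyed by their usual "33"/"66" labels.
NOMINAL_CLOCK_MHZ = {33: 100 / 3, 66: 200 / 3}


@dataclass(frozen=True)
class PciDevice:
    """Hypothetical description of an add-in card."""
    name: str
    width_bits: int   # 32 or 64
    clock_label: int  # 33 or 66


def shared_bus_mode(devices: Iterable[PciDevice],
                    max_width: int = 64,
                    max_clock: int = 66) -> Tuple[int, int]:
    """Return (width_bits, clock_label) the bus falls back to.

    Assumption taken from the article's description: the slowest and
    narrowest installed device dictates the mode for the whole bus.
    """
    devices = list(devices)
    width = min([max_width] + [d.width_bits for d in devices])
    clock = min([max_clock] + [d.clock_label for d in devices])
    return width, clock


def link_bandwidth_mb_s(width_bits: int, clock_label: int) -> int:
    """Peak bandwidth of the bus (and thus the North/South link), in whole MB/s."""
    return int(width_bits / 8 * NOMINAL_CLOCK_MHZ[clock_label])


if __name__ == "__main__":
    scenarios = {
        "empty slots": [],
        "64-bit/66MHz RAID card": [PciDevice("raid", 64, 66)],
        "64-bit/33MHz SCSI card": [PciDevice("scsi", 64, 33)],
        "32-bit/33MHz NIC": [PciDevice("nic", 32, 33)],
    }
    for label, cards in scenarios.items():
        mode = shared_bus_mode(cards)
        print(f"{label}: {mode[0]}-bit/{mode[1]}MHz, "
              f"{link_bandwidth_mb_s(*mode)} MB/s North/South link")
```

Under this model a single 64-bit/33MHz card halves the inter-bridge bandwidth, which is why card placement matters on 760MPX boards.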
As we just alluded to, the 760MPX does have a new South Bridge and it is the AMD 768. The 768 replaces the old AMD 766 South Bridge which was used ever since the introduction of the AMD 760 chipset back in October 2000. The 768 features a second PCI bus for the platform, this time a 32-bit/33MHz bus for all other peripherals. The only other difference between the 768 and 766 South Bridges is the inclusion of integrated AC'97 audio support.
This is definitely a step in the right direction for AMD. It is a very bold step indeed, but there is still a bit of distance to travel before they're officially there. Looking at what they're up against, AMD has already been able to take the performance crown but there's much more to the server market than performance.
Looking at competing solutions from Intel and ServerWorks we can see the need for more PCI buses but the current Athlon platform isn't ready for such an endeavor. Instead, we must look towards the Hammer architecture. With multiple HyperTransport links connecting PCI and PCI-X bridge chips the potential for an equally or more robust platform to be produced by AMD is definitely there. The 760MPX chipset will open many doors for AMD, but it will be the future Hammer platforms that will walk through them.
Obviously a chipset is nothing without support from motherboard manufacturers and luckily the 760MPX will enjoy much broader support than the original 760MP chipset did.
Originally AMD worked very closely with one and only one motherboard manufacturer in developing the 760MP platform - Tyan. Thus it was no surprise that Tyan had a pretty hefty production lead on the competition when they released the Thunder K7. For the 760MPX launch, AMD prepared their own reference design under the code name Paulaner.
The Paulaner board can be seen above and is clearly just an engineering test board and not a basis for other products. The external/removable voltage regulator modules bring back memories of the Pentium Pro days and also ruin the otherwise perfect ability to fit into a 1U chassis. Like we mentioned before, this reference design is for testing and validation purposes only and won't be used as a layout template for third party motherboards.
In our Motherboards in 2002 preview we showed off 760MPX solutions from ABIT, ASUS, EPoX, MSI, and Tyan among other manufacturers. While a few of those boards have started shipping already, you won't see them in great supply until next year.
The goal of many motherboard manufacturers for the 760MPX chipset is to bring it down to the workstation level in order to gain acceptance for the platform; with motherboards supposedly being priced in the $180 - $300 range that should be a feasible reality. These motherboards will also work off of regular ATX power supplies and not require the custom 460W units that the original 760MP and 760MPX reference boards needed.
Although we've seen reports contrary to this, we noticed no USB 1.1 compatibility problems with the reference Paulaner board. Using the only two ports available on the board we captured video from an Intel USB web camera while constantly using a Microsoft USB Intellimouse without any compatibility issues. While this is clearly not exhaustive testing, we thought it was worth mentioning.
CPUs and testing them
Although AMD did manufacture a batch of Athlons with a 1MB L2 cache, the product based on what was then known as the Mustang core never made it to market. The current Athlon MPs that are officially validated for dual processor operation are actually no different than their Athlon XP desktop counterparts.
There is one small exception to that statement and that is in regards to the L1 bridges on the most recent CPUs that implement the new organic packaging. As we pointed out in our Athlon XP 1900+ article, the L1 bridges on desktop CPUs are actually cut at the factory, while on MP CPUs they aren't cut; the bridges simply aren't connected.
The only answer we could get from AMD was that the feature the L1 bridges control (multiplier) is not supported by MP motherboards. This could mean that no 760MPX motherboards will be allowed to have multiplier adjustment, or it could mean no more than AMD won't approve of multiplier support on 760MPX motherboards. Only time will tell.
Currently, Athlon MPs are being released on a staggered schedule in comparison to their desktop counterparts. The XP 1900+ was released on November 5, 2001 and the MP 1900+ followed on December 12, just about a month later.
The performance of the 760MPX chipset is no different than the 760MP chipset in the vast majority of the benchmarks we run but it was worth benchmarking to show the improvement dual Athlon MP 1900+ CPUs can have over the previous 1.2GHz kings. We have already shown the dominance of the Athlon MP 1.2GHz processors in our database server tests as well as 3D rendering and other applications so we'll try and limit the performance analysis here to areas which we haven't covered in such great detail.
Unfortunately we're still limited to testing with 1.7GHz Xeons, as the only other processors released by Intel are the 2GHz CPUs, which we weren't in possession of at the time of testing. We will however provide guidance as to how a pair of 2GHz Xeons would stack up where appropriate.
Since we've already examined the performance difference between single and dual processors you can consult our original article for a look at those performance numbers. This review will just briefly focus on the improvement the MP 1900+ offers over the MP 1.2GHz and compare both of those to the 1.7GHz Xeons.
Windows 2000 SP2 Test Configuration
Intel Xeon 1.7GHz x 2
AMD Athlon MP 1.2GHz x 2
AMD Athlon MP 1900+ x 2
4 x 128MB Samsung PC800 RDRAM
2 x 256MB Crucial DDR266 CAS 2 Registered SDRAM
IBM 60GXP 40GB 7200 RPM
NVIDIA Detonator 21.81
Windows 2000 Professional SP2
Over the past year the Athlon has become a hit within the CAD market because of its low price and high performance. To find out exactly why the Athlon has been so successful in that market we turned to AutoCad 2002 in order to measure the influence the CPU had on performance.
AutoCad 2002 isn't the best application to take advantage of multiple processors however. The vast majority of operations and calculations performed in AutoCad aren't multithreaded, meaning that one processor will remain idle while the other one handles the task. Even on a single Athlon MP 1900+ however, these tasks will peg the CPU at 100% utilization meaning that anything else going on in the background will suffer. Some of the 2D operations are multithreaded but overall, AutoCad users won't see an improvement by making the transition to dual CPUs.
Where adding a second CPU will help however is in multitasking. As we just mentioned, most processes in AutoCad will eat up 100% of the CPU time of a single Athlon MP 1900+ meaning that anything from playing an MP3 to working on another project in AutoCad at the same time will result in reduced overall performance. A second CPU would allow you to more effectively multitask or even work on multiple AutoCad projects simultaneously with much greater performance.
In order to measure performance under AutoCad 2002 we used Cadalyst Labs' C2001 benchmark.
The 3D wireframe portion of the C2001 test is limited by two things - the platform it's running on as well as the video card used. Since we stuck with an end-user class GeForce3, the wireframe performance is not as great as it would be with a Quadro 2 or higher end 3DLabs graphics card, making that a clear limitation. This would explain why there's virtually no difference between the dual Athlon MP 1900+ and the same CPUs just clocked 400MHz lower.
The gap between the dual Xeon platform and the two Athlon MP solutions however indicates that an inherent limitation of the Xeon's 860 chipset or the Xeon's architecture in general is what is holding this CPU back. There are a number of possibilities to explore here, although we don't believe the cause of its performance to be related to bandwidth. It could very well be that an overall cache size advantage gives the Athlon MPs the advantage, but it's clear that the Xeons are not able to perform in the same class as the Athlon MPs in this test.
The 3D Gouraud shading index is clearly limited by our test bed's GeForce3. The Xeons exert a negligible lead over the Athlon MP 1.2s, and the same is true for the faster Athlon MP 1900+ setup.
The non-graphic portion of the test includes all I/O operations as well as all calculations that are made before the display of any graphical changes meaning that it's easily the most CPU intensive portion of this benchmark.
With that said, the benchmark results speak volumes about the relative performance of these dual CPU setups. The dual Athlon MP 1.2GHz setup we ran was already 9.7% faster than the dual Xeon 1.7GHz system, and the new dual Athlon MP 1900+ simply extended that lead. Even a pair of 2GHz Xeon processors won't be able to offer similar performance as they would only just begin to outperform the dual Athlon MP 1.2GHz solution.
These results alone are a good indication of why the Athlon has caught on so well in this market and why it will continue to do so.
The 2D graphics index illustrates some of the efficiencies of the various platforms since the 2D tests were the only ones that took advantage of both processors. Here the Xeon is able to outperform the dual Athlon MP 1.2 by an 8% margin and only falls about 5% short of the Athlon MP 1900+. An overclocked dual Xeon 1.9GHz system would even outperform our dual Athlon MP 1900+ test bed in this benchmark.
One explanation for this could be the increased memory bandwidth of the Xeon's dual channel RDRAM memory bus, which is able to feed the processors much better than the memory bus of the Athlon MPs, which have to fight over 1/3 less memory bandwidth than the Xeons.
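The "1/3 less" figure can be checked from peak theoretical numbers. The sketch below assumes two 16-bit PC800 RDRAM channels (800 million transfers per second each) on the Xeon platform and a single 64-bit DDR266 channel on the Athlon MP platform; those widths and rates come from the memory specifications, not from measurements on our test beds, and the function name is our own.

```python
# Peak theoretical memory bandwidth from bus width, transfer rate and channel count.
# Assumptions (not measured here): PC800 RDRAM is 16 bits wide at 800 MT/s per
# channel, used in dual-channel mode; DDR266 is 64 bits wide at ~266 MT/s, single channel.

def peak_bandwidth_gb_s(width_bits: int, mega_transfers_per_s: float,
                        channels: int = 1) -> float:
    """Return peak bandwidth in GB/s (1 GB = 1000 MB)."""
    return width_bits / 8 * mega_transfers_per_s * channels / 1000


xeon_rdram = peak_bandwidth_gb_s(16, 800, channels=2)   # dual channel PC800
athlon_ddr = peak_bandwidth_gb_s(64, 800 / 3)           # DDR266 (2 x 133.33 MHz)

# Fraction of the Xeon platform's bandwidth that the Athlon MP platform lacks.
deficit = 1 - athlon_ddr / xeon_rdram

if __name__ == "__main__":
    print(f"Dual channel PC800 RDRAM: {xeon_rdram:.2f} GB/s")
    print(f"DDR266:                   {athlon_ddr:.2f} GB/s")
    print(f"Athlon MP deficit:        {deficit:.1%}")
```

Keep in mind that both Athlon MPs share that single DDR channel, so the per-CPU figure under load is lower still.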
Totaling up all of those indices we see that the dual Athlon MP 1900+ completed the tests in 5.3 minutes less than the dual Athlon MP 1.2GHz; a decent boost in performance. The difference between the dual Athlon MP 1900+ and the dual Xeon 1.7GHz is much larger at 11 minutes.
The overall performance index shows the dual Athlon MP 1.2GHz and dual Xeon 1.7GHz platforms performing within 4% of each other, while the dual Athlon MP 1900+ is a good 11% and 15% higher than the two respective platforms.
Since our Xeons were unlocked we could overclock them to 1.9GHz to see if that would reduce the performance gap a bit, however they came in at 44.63 which is almost identical to the performance of the dual Athlon MP 1.2.
If this is an indication of overall performance in AutoCad then it's plain and clear why the Athlon and Athlon MP platforms have become such favorites in the CAD community.
3D Rendering & Animation Performance
Time is money and for most professionals using 3D rendering applications like Maya and 3D Studio MAX, we're talking about a lot of money. 3D rendering is inherently very cache and FPU intensive and depending on the size of the scene being rendered, it can be very memory bandwidth intensive as well. Luckily the more advanced 3D rendering and animation programs do a great job of splitting up the difficult work among multiple CPUs making the upgrade to dual CPUs more than worth it.
We used two benchmarks to categorize the 3D rendering & animation performance of these platforms - Maya 4.0.1 using the Maya-Testcenter's rendertest and 3D Studio MAX 4.26 using our own benchmark from the original 760MP article. Note that you can't compare these numbers to the previous 3D Studio MAX numbers because of the fact that we're using a newer version of the software (4.26 vs 4.02) which boasts improved SSE and SSE2 support.
The upgrade to dual Athlon MP 1900+ CPUs results in an increase of about 22% in terms of rendered images per hour. The difference between the dual Athlon MP 1.2 and the dual Xeon 1.7 systems is about 5%.
We'd expect a pair of Xeons running at 2GHz to approach but not exceed the performance of the dual Athlon MP 1900+.
We see almost identical performance gaps here under 3D Studio MAX. Even the latest patch for the software, with its improved Pentium 4 optimizations, cannot give the Xeon the performance edge it needs.
Content Creation Performance
Content Creation users also demand the high performance that is almost always targeted at these higher price workstations. There is a benefit to having multiple CPUs and again it is the ability to perform many tasks at once with each of them running as fast as possible. The Content Creation user is one who spends a lot of time doing just that, creating content. Whether that content is in the form of videos that must be encoded, music that must be sampled, or images that must be edited, the creation process takes time and it takes clock cycles.
We've proved in the past that going to dual CPUs can increase overall content creation performance by anywhere from 20 - 50% which is much more than even a couple of CPU speed grades will give you.
To measure Content Creation performance we used the Internet Content Creation tests of SYSMark 2001 as well as the new Content Creation Winstone 2002 from Ziff Davis Media.
Here there is only a 3.2% difference between the dual Athlon MP 1900+ and the dual Xeon 1.7GHz solution. If SSE were properly enabled in Windows Media Encoder on the Athlon platforms then the performance gap would be even greater but unfortunately AMD's SYSMark patch would not work under our test Windows 2000 OS.
Content Creation Winstone 2002 provides a picture we're much more used to seeing, with the Athlon MP 1900+ CPUs coming out 24% faster than the dual Xeons. The Xeons are much more competitive with the dual Athlon MPs running at 1.2GHz which are only able to outpace Intel's offering by 6%.
Linux kernel compilation provides an entirely different standpoint from which to judge performance. Most obviously, it demonstrates performance relative to the Linux kernel instead of a Microsoft operating system, thus providing a more well-rounded test suite. However, it also benchmarks using compilation instead of trying to reproduce the tasks a Microsoft Office user would perform. Yes, some people do more with their CPUs than sum tables in Excel.
In our earlier review of AMD's 760MP chipset, we were quite impressed with the results of moving from 1-CPU builds to 2-CPU. For this review, we decided to focus on clock rate importance and the age-old Intel vs. AMD battle. Please read our original 760MP review for a single vs. dual CPU comparison and general discussion of relevance in a Linux environment.
The Athlon MP 1900+, despite being 30% faster than its 1.2GHz counterpart, only reduced compilation times by 15%. Thus, CPU speed is only half the limiting factor in these tests. Memory bandwidth and cache latency are likely the other major factors, since nearly all disk access should have been served from cache.
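One rough way to quantify that split is an Amdahl-style model, sketched below. It assumes the compile time divides into a part that scales perfectly with clock speed and a part that doesn't scale at all (memory, cache latency and so on). That two-part model and the function names are our own simplification for illustration; only the 30% and 15% figures come from the results above.

```python
# Amdahl-style estimate: what fraction f of the baseline run time must scale
# with clock speed for a clock speedup s to cut total time by r?
#
#   new_time / old_time = (1 - f) + f / s = 1 - r
#   =>  f = r / (1 - 1 / s)
#
# The two-part split is an assumption made purely for illustration.

def time_ratio(clock_bound_fraction: float, clock_speedup: float) -> float:
    """Predicted new_time / old_time under the two-part model."""
    return (1 - clock_bound_fraction) + clock_bound_fraction / clock_speedup


def clock_bound_fraction(clock_speedup: float, time_reduction: float) -> float:
    """Solve the model for f given the observed speedup and time reduction."""
    return time_reduction / (1 - 1 / clock_speedup)


if __name__ == "__main__":
    f = clock_bound_fraction(clock_speedup=1.30, time_reduction=0.15)
    print(f"Clock-bound share of baseline compile time: {f:.0%}")
    print(f"Check, predicted time ratio: {time_ratio(f, 1.30):.2f}")
```

Under these assumptions roughly two thirds of the baseline compile time scales with clock speed, which is in the same ballpark as the "half" estimate above and leaves a substantial share that faster CPUs alone won't shrink.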
This article served two purposes; one was to show the performance improvement of the Athlon MP 1900+ over the Athlon MP running at 1.2GHz. The second purpose was to introduce the 760MPX chipset as a very capable workstation solution. We've already shown you how well the platform would perform as an entry-level or midrange server solution, this is just in addition to that.
But the 760MPX chipset isn't the best thing since sliced bread. It has the opportunity to fill a fairly large niche and we feel that it will succeed very well in doing this; however, it isn't robust enough for some of the largest applications and environments that have heavy I/O requirements. While it's highly unlikely that AMD will update the 760MPX chipset anytime soon to provide for these features, you can bet that their Hammer chipsets will enable many of the features we have described the 760MPX as lacking.
AMD has the performance and they're quickly building a name for themselves in the server and workstation market, but there's a long road ahead before they can even begin to claim the same victories in those markets that they have in the desktop world. The stakes are much higher in these markets and it also means that the competition is even tougher; it's clear that AMD has a lot resting on a successful launch of their Hammer line of processors.