Original Link: http://www.anandtech.com/show/1665
AMD's dual core Opteron & Athlon 64 X2 - Server/Desktop Performance Preview
by Anand Lal Shimpi, Jason Clark & Ross Whitehead on April 21, 2005 9:25 AM EST
Earlier this month, Intel introduced their first dual core desktop CPUs - the Pentium D and the Pentium Extreme Edition. Coupled with extremely aggressive pricing designed to move the majority of the desktop market to dual core within the next two years, Intel's launch did not fail to impress. The ability to bring workstation class performance in multithreaded and multitasking environments to the desktop, at an affordable price, is something that we've been hoping to see for years. Intel's launch was pretty interesting, but today, AMD has more bang.
From the beginning, AMD has talked about how they were going to bring dual core to the K8 architecture. The on-die north bridge, a part of every Athlon 64 and Opteron CPU, was designed from the ground up to be able to support multiple cores. AMD had designed their first dual core K8 CPUs years ago. They were simply waiting for manufacturing processes to mature in order to actually make producing such a chip a feasible endeavor.
With their 90nm process finally maturing, it made business and financial sense to introduce their first dual core products. AMD wasn't alone in their decision, as it's obvious that Intel waited for 90nm before making their move to dual core as well. The problem for both AMD and Intel is that at 90nm, a dual core chip is getting a bit on the large side. Intel's first dual core CPUs weigh in at 230 million transistors on a 206 mm² die, and AMD's new CPUs are a bit obese themselves at 233.2 million transistors on a 199 mm² die. While the dual core CPUs that we have tested from both AMD and Intel are no slouches, it's clear that both companies are working to transition to 65nm as quickly as possible to make manufacturing these chips much more reasonable.
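To put the die sizes in perspective, here is a rough sketch of how die area translates into candidate dies per wafer. It uses a common first-order approximation that ignores yield and scribe lines. The 200 mm wafer diameter and the 115 mm² single core die area are assumptions for illustration and do not come from this article; only the 199 mm² dual core figure does.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate of gross dies per wafer (no yield, no scribe lines)."""
    radius = wafer_diameter_mm / 2
    # Wafer area divided by die area, minus a correction for partial dies at the edge.
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(gross - edge_loss)


WAFER_DIAMETER_MM = 200.0     # assumption: wafer size is not stated in the article
DUAL_CORE_AREA_MM2 = 199.0    # from the article
SINGLE_CORE_AREA_MM2 = 115.0  # assumption: illustrative single core die area

dual_core_dies = dies_per_wafer(WAFER_DIAMETER_MM, DUAL_CORE_AREA_MM2)
single_core_dies = dies_per_wafer(WAFER_DIAMETER_MM, SINGLE_CORE_AREA_MM2)
ratio = dual_core_dies / single_core_dies

print(f"dual core dies per wafer:   {dual_core_dies}")
print(f"single core dies per wafer: {single_core_dies}")
print(f"dual/single ratio:          {ratio:.2f}")
```

Under these assumptions, the larger die yields a little over half as many candidates per wafer, before any defects are taken into account.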
Although this may seem like a tangent to the topic at hand, manufacturing has a lot to do with today's announcements from AMD. What exactly is being announced? Well, for starters, AMD is announcing their first dual core Opteron parts. The word "announcing" in this sense means that they are declaring availability of their 800 series dual core Opteron CPUs, and promising that 200 and 100 series dual core Opteron CPUs will be made available starting next month. Before we move on to the rest of the announcement, pay very close attention to the parts for which AMD is announcing availability - the 800 series parts. The Opteron 800 series CPUs are for use in 4 or more socket servers and are AMD's most expensive CPUs, and thus, their lowest volume CPUs. Remember that at 90nm, AMD can produce around half as many dual core CPUs as they can single core CPUs per wafer - so they need to be very careful about demand. You will notice later on in this article that AMD's strategy involves keeping prices higher and introducing lower quantity CPUs first, in order to ensure that their single core CPUs still have a market and that they aren't committing to more than what they can deliver. At the end of the day, AMD is still a much smaller manufacturer than Intel and thus, they have to play their cards very differently, which leads us to the second part of AMD's announcement today: the new dual core desktop Athlon 64 X2 line.
The Athlon 64 X2 will be AMD's new brand for dual core desktop CPUs. This line is being talked about today, but an official announcement with full benchmarks won't come until June. That being said, we have made it a point to bring you a preview of Athlon 64 X2 performance in this article, despite the fact that AMD isn't introducing the chips for another two months. So, for all of you who are interested to see how AMD's dual core desktop CPUs stack up against the recently introduced Pentium D, never fear, we have what you're looking for.
The introduction of the Athlon 64 X2 brand also comes with a few other tidbits of information:
- The Athlon 64 4000+ was the last single core member of the Athlon 64 line.
- The Athlon 64 FX will continue as a single core CPU line, with the FX-57 (2.8GHz) due out later this year.
This article will serve two purposes. First and foremost, we're interested in the new dual core Opterons as a server solution - and we run them through our usual web and database serving tests. Next, we're going to take a look at the Athlon 64 X2 and how it compares in performance to Intel's recently announced Pentium D. We've developed even more desktop multitasking tests for this article, so we'll be able to provide you with an idea of how well AMD will be able to compete in the multi-core world.
We are missing a look at workstation performance between the Opteron and Xeon, but rest assured that such a comparison is in the works. The usual mix of very limited time and hardware problems (which we will also discuss in this article) forced us to exclude one of the comparisons, and thus, the workstation comparison will have to wait for another day.
A Look at AMD's Dual Core Architecture
Even Intel will admit that the architecture of the Pentium D is not the most desirable, as it is two Pentium 4 cores literally glued together. The two cores can barely be managed independently from a power consumption standpoint (they still share the same voltage and must run in the same power state) and all communication between cores must go over the external FSB. The diagram below should illustrate the latter point pretty well:
Any communication between the two cores has to be done over the external FSB, and obviously, core-to-core communication over an external bus is slow. It particularly doesn't make sense, since the two cores are on the same die. Even the 65nm successor to the Pentium D (Presler) will have this same limitation.
Intel's Pentium D dual core architecture
AMD's architecture is much more sophisticated, thanks to the K8 architecture's on-die North Bridge. While we normally only discuss the benefits of the K8's on-die memory controller, the on-die North Bridge is extremely important for dual core. Instead of having all communication between the cores go over an external FSB, each core will put its request on the System Request Queue (SRQ) and when resources are available, the request will be sent to the appropriate execution core - all without leaving the confines of the CPU's die. There are numerous benefits to AMD's implementation, and in heavily multithreaded/multitasking scenarios, it is possible for AMD to have a performance advantage over Intel just because of this implementation detail alone.
The one limitation that both AMD and Intel have is bandwidth. In order to maintain compatibility with present day Socket-940 and Socket-939 motherboards, AMD could not increase the pincount of their dual core processors. The benefit is that AMD's dual core CPUs will work in almost all Socket-940 and Socket-939 motherboards (more on this later), but the downside is that the memory bus remains unchanged at 128-bits wide and supports a maximum memory speed of DDR400. So, while single core Athlon 64 and Opteron CPUs get a full 6.4GB/s of memory bandwidth, today's dual core CPUs are given the same memory bandwidth to share among two cores instead of one.
AMD's solution to the problem will come in the form of DDR2 and a new socket down the road, but for now there's no getting around the memory bandwidth limitations. Intel is actually in a better position from a memory bandwidth standpoint. At this point, their chipsets provide more memory bandwidth than what a single core needs with their dual channel DDR2-667 controller. The problem is that the Intel dual core CPUs still run on a 64-bit wide 800MHz FSB, which makes Intel's problem more of a FSB bandwidth limitation than a memory bandwidth limitation.
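The bandwidth figures in this section follow from bus width times transfer rate. Below is a minimal sketch of that arithmetic; the function name is ours, and the 128-bit width used for Intel's dual channel DDR2-667 controller assumes two standard 64-bit channels.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (decimal): bytes per transfer x MT/s / 1000."""
    return bus_width_bits / 8 * mega_transfers_per_s / 1000


# AMD K8: 128-bit memory bus at DDR400, shared by both cores on a dual core part.
k8_memory = peak_bandwidth_gb_s(128, 400)
k8_per_core = k8_memory / 2

# Intel: 64-bit FSB at 800MHz, in front of a dual channel DDR2-667 controller
# (assumed here to be 2 x 64 bits wide).
intel_fsb = peak_bandwidth_gb_s(64, 800)
intel_memory = peak_bandwidth_gb_s(128, 667)

print(f"K8 memory bus:        {k8_memory:.1f} GB/s ({k8_per_core:.1f} GB/s per core when shared)")
print(f"Intel FSB:            {intel_fsb:.1f} GB/s")
print(f"Intel DDR2-667 (x2):  {intel_memory:.1f} GB/s")
```

These are theoretical peaks. The point is that Intel's memory controller can supply more data than the FSB can carry to the cores, while AMD's two cores split a single 6.4GB/s memory bus.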
Backwards Compatibility
Intel's dual core Pentium D and Extreme Edition won't work in any previous motherboards, but as we mentioned at the start of this article, AMD has more bang. Here, the additional bang comes from the almost 100% backwards compatibility with single-core motherboards. We say "almost" because it's not totally perfect; here's the breakdown:
- On the desktop, the Athlon 64 X2 series is fully compatible with all Socket-939 motherboards. All you need is a BIOS update and you're good to go.
- For workstations/servers, if you have a motherboard that supports the 90nm Opterons, then all you need is a BIOS update for dual core Opteron support. If the motherboard does not support 90nm Opterons then you are, unfortunately, out of luck.
For desktop users, the ability to upgrade your current Socket-939 motherboards to support dual core in the future is a huge offer from AMD. While it may not please motherboard manufacturers to lengthen upgrade cycles like this, we have never seen a CPU manufacturer take care of their users like this before. Even during the Socket-A days when you didn't have to upgrade your motherboard, most users still did because of better chipsets. AMD's architectural decisions have made those days obsolete. The next generation of dual core processors will most likely need a new motherboard, but rest assured that you have a solid upgrade path if you have recently invested in a new Socket-939 desktop system or a Socket-940 system.
The Lineup - Opteron x75
Prior to the dual core frenzy, multiprocessor servers and workstations were referred to by the number of processors that they had. A two-processor workstation would be called a 2-way workstation, and a four-processor server would be called a 4-way server.
Both AMD and Intel sell their server/workstation CPUs not only according to performance characteristics (clock speed, cache size, FSB frequency), but also according to the types of systems for which they were designed. For example, the Opteron 252 and Opteron 852 both run at 2.6GHz, but the 252 is for use in up to 2-way configurations, while the 852 is certified for use in 4- and 8-way configurations. The two chips are identical; it's just that one has been run through additional validation and costs a lot more. As you may remember, the first digit in the Opteron's model number denotes the sorts of configurations for which the CPU is validated. So, the 100 series is uniprocessor only, the 200 series works in up to 2-way configurations and the 800 series is certified for 4+ way configurations.
AMD's dual core server/workstation CPUs will still carry the Opteron brand, but they will feature higher model numbers; and while single core Opterons increased in model numbers by 2 points for each increase in clock speed, dual core Opterons will increase by 5s. With each "processor" being dual core, AMD will start referring to their Opterons by the number of sockets for which they are designed. For example, the Opteron 100 series will be designed for use in 1-socket systems, the Opteron 200 series will be designed for use in up to 2-socket systems and the Opteron 800 series will be designed for use in 4 or more socket systems.
There are three new members of the Opteron family - all dual core CPUs: the Opteron x65, Opteron x70 and Opteron x75.
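The numbering rules described above can be sketched as a small decoder. The function and table names are hypothetical. The clock speeds are extrapolated from the anchors given in this article (x52 is 2.6GHz single core, x75 is 2.2GHz dual core), assuming that every speed grade is a 200MHz step, so treat anything other than those anchors as an estimate.

```python
# First digit of the model number -> system class, as described in the article.
SOCKET_CLASS = {1: "1 socket", 2: "up to 2 sockets", 8: "4 or more sockets"}

# The three dual core suffixes named in the article.
DUAL_CORE_SUFFIXES = {65, 70, 75}

SPEED_GRADE_GHZ = 0.2  # assumption: one speed grade equals 200MHz


def decode_opteron(model: int) -> dict:
    """Decode an Opteron model number into series, core count and estimated clock."""
    series, suffix = divmod(model, 100)
    if series not in SOCKET_CLASS:
        raise ValueError(f"unknown Opteron series in model {model}")

    if suffix in DUAL_CORE_SUFFIXES:
        cores = 2
        # Dual core models step by 5; x75 is the 2.2GHz anchor.
        clock = 2.2 - SPEED_GRADE_GHZ * ((75 - suffix) // 5)
    else:
        if suffix % 2 != 0:
            raise ValueError(f"unexpected single core model number {model}")
        cores = 1
        # Single core models step by 2; x52 is the 2.6GHz anchor.
        clock = 2.6 - SPEED_GRADE_GHZ * ((52 - suffix) // 2)

    return {
        "series": series * 100,
        "sockets": SOCKET_CLASS[series],
        "cores": cores,
        "clock_ghz": round(clock, 1),
    }


for model in (148, 252, 852, 875):
    print(model, decode_opteron(model))
```

The decoder reflects why the 148 and the 175 are a natural price comparison: both decode to 2.2GHz, differing only in core count.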
There are a few things to take away from this table:
- The fastest dual core runs at 2.2GHz, two speed grades lower than the fastest single core CPU - not too shabby at all.
- The slowest dual core CPU is priced at the same level as the fastest single core CPU; in this case, $637.
- Unlike Intel, AMD's second core comes at a much higher price. Take a look at the 148 vs. 175. Both run at 2.2GHz, but the dual core chip is over 3.5x the price of the single core CPU.
The pricing structure at the 200 and 800 levels doesn't change much either - the stakes are simply higher.
While AMD will undoubtedly hate the comparison below, it's an interesting one nonetheless. How much are you paying for that second core on these new dual core Opterons? To find out, let's compare prices on a clock for clock basis:
AMD's margins on their dual core Opteron parts are huge. On average, the second core costs customers over 3x as much as the first core for any of these CPUs. As you will soon see, the performance benefits are definitely worth it, but know that AMD's pricing is not exactly designed to drive dual core into widespread adoption.
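The "cost of the second core" comparison is simple arithmetic: subtract the price of the single core part from the dual core part at the same clock speed, then express the difference as a multiple of the single core price. The sketch below uses a normalized price rather than real dollar figures; the 3.5x ratio is the 148 vs. 175 comparison quoted earlier, and the function name is ours.

```python
def second_core_cost_multiple(single_core_price: float, dual_core_price: float) -> float:
    """Price of the second core as a multiple of the price of the first core."""
    if single_core_price <= 0:
        raise ValueError("single core price must be positive")
    return (dual_core_price - single_core_price) / single_core_price


# Normalized example: the single core part costs 1.0 unit and the dual core
# part at the same clock speed costs 3.5 units (illustrative, not list prices).
single_price = 1.0
dual_price = 3.5

multiple = second_core_cost_multiple(single_price, dual_price)
print(f"second core costs {multiple:.1f}x the first core")
```

A multiple of 1.0 would mean that the second core costs exactly as much as the first; anything above that is a premium for having both cores in one socket.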
The Lineup - Athlon 64 X2
As we mentioned earlier, the Athlon 64 X2 isn't going to be officially launched until June. While AMD is purposefully vague in their discussion of availability, it looks like their plans are for system builders and OEMs to offer Athlon 64 X2 systems in Q3 of this year and for retail availability to be in Q4 of this year.
For AMD, the Athlon 64 4000+ was the last single core Athlon 64 that they will make; all model numbers after 4000+ will be dual core Athlon 64 X2s. Starting at 4200+ and going up to 4800+, the Athlon 64 X2 continues AMD's trend of basing model numbers on clock speeds and cache sizes. You can see the breakdown below:
For starters, the Athlon 64 X2's clock speeds aren't that low compared to the current single-core Athlon 64s. The top of the line Athlon 64 FX-55 runs at 2.6GHz, only 200MHz faster than the Athlon 64 X2 4800+. This is in stark contrast to Intel's desktop dual core offerings, which run between 2.8 and 3.2GHz, a full 600MHz drop from their fastest single core CPU.
The other major difference between AMD and Intel's dual core desktop approach is in pricing. Let's take a look at the cost per core of the Athlon 64 X2:
We see that AMD's desktop pricing is much more reasonable than their dual core Opteron pricing, but then again, also remember that their desktop CPUs won't be in volume until later this year. The second core never costs more than the first one, which is honestly the only way you can ensure good desktop adoption rates.
That being said, let's compare it to Intel's pricing:
Because Intel is only shipping lower clocked dual core CPUs, Intel's chip prices are much lower - not to mention that Intel's manufacturing abilities far exceed those of AMD. Percentage-wise, the Pentium D 3.2 commands a high premium for that second core, but the prices are overall quite reasonable. The fastest Pentium D is still cheaper than the slowest Athlon 64 X2 4200+, and the slowest Pentium D is ridiculously cheap compared to AMD's dual core offerings.
AMD's answer to Intel's aggressive pricing is two-fold. Eventually, all of AMD's CPUs will be dual core, and thus, prices will be driven back down to single core levels. But for now, AMD feels confident enough that their single core CPUs are fast enough to compete with Intel's low clocked Pentium Ds. We put that exact thinking to the test in Part II of our Intel dual core preview and concluded that it really depends on what type of a user you are. If you tend to multitask a lot or run a lot of multithreaded applications, then a slower Intel dual core is what you need; otherwise, a faster single core AMD is your best bet.
Dual Core Server Performance: AMD's Opteron x75 Series
Our first comparison of AMD's new dual core parts is in the server world - where AMD's new CPUs will be shipping to first. Of course, no review is complete without a handful of interesting experiences from the lab, and this dual core launch was no exception.
Server Test Platforms
AMD
Our Dual Core samples arrived a few weeks ago from AMD, well in advance of the launch date of April 21st. At the time of the samples' arrival, we didn't have a stable server board to use for our tests. The Tyan S2891 board that we had on hand was still going through BIOS changes and was not recommended for use with the Dual Core parts. As per AMD's recommendation, we secured a Tyan S2895 Workstation board, which AMD had verified was stable. We were uneasy running server based benchmarks on a workstation board and felt that a server based board recommended by AMD would have been more appropriate. That being said, both the S2891 and S2895 are very similar and are both nForce 4 based chipsets, so performance is virtually identical.
Intel
Intel is expected to release their Dual Core Xeon parts in the first quarter of 2006. So, we requested from Intel their latest Xeon MP system, since we were essentially putting a "4P" system against a Dual Xeon with the current hardware that we have in the lab. Intel, as always, came through with their SR4850HW4 4P system along with 4 Cranford 3.6 GHz 1MB L2 cache processors and 4 Potomac 3.3 GHz 8MB L3 Cache processors.
The SR4850HW4 system uses Intel's new E8500 server chipset "Twin Castle", which most importantly includes a new dual bus architecture that runs at 667MHz, up from 400MHz on older Xeon platforms. As you may have read in our last Quad Xeon article, the Xeon was in dire need of some front side bus bandwidth. Aside from the new bus architecture, the E8500 uses DDR2 based memory, in line with the current DP based Xeon systems.
When we began our testing on the new Intel platform, we quickly learned another "feature" of the SR4850HW4. After unpacking the system and setting it up, we proceeded to power it up with the default configuration with which the system had been shipped. The system wouldn't power up. With barely 2-3 days until the launch of this article, we were (needless to say) "on edge" about getting the benchmarks running. We sent an e-mail to our Intel contact, and within about 5 minutes, an engineer gave us a call. After a few minutes on the phone, the engineer asked, "What do you have the system plugged in to?" We responded, "Well, a wall plug in our lab." He then broke the news: "That system requires 208V to run." Now what? Off to Home Depot we went and grabbed some 12 gauge wire and a breaker, and within an hour, we were installing Windows. Another lab adventure for the books.
Server Test Hardware Configuration
AMD
Motherboard: Tyan S2895
Memory: 4GB Kingston PC3200 ECC (2GB for Web benchmarks)
OS: Windows 2003 Enterprise/Windows 2003 Web edition (Web benchmarks)
RAID: LSI Logic 320-2 with 8 Seagate 15K Cheetahs in Raid 0
Intel
Memory: 4GB Infineon DDR2
OS: Windows 2003 Enterprise/Windows 2003 Web edition (Web benchmarks)
RAID: LSI Logic 320-2 with 8 Seagate 15K Cheetahs in Raid 0
Web Tests - FuseTalk .NET
Our .NET test was run on the recently released FuseTalk .NET collaboration application. For this test, we ran with configurations that would typically be used in a web server, which is why you won't see results for the Quad Xeons here.
The .NET platform is the new framework for building Windows-based and web-based applications from Microsoft. It not only replaces the older ASP platform, but introduces some up-to-date languages that run on the Common Language Runtime, which is the backbone of .NET. The three main languages used with .NET are: C#, VB.NET, and J#. Whichever language you write your code in, it is compiled into an intermediate language - CIL (Common Intermediate Language). It is then managed and executed by the CLR (Common Language Runtime).
As you can see from the results below, the Dual Core Opteron 875 part posted a 30% gain over the fastest Dual Opteron 252. Take note that the Dual Core Opteron 875 is a lower clocked part than the 252; the 875 is clocked at 2.2 GHz while the 252 is clocked at 2.6 GHz.
SQL Stress Tool Benchmark
Our first benchmark was custom-written in .NET, using ADO.NET to connect to the database. The AnandTech Forums database, which was over 14GB in size at the time of the benchmark, was used as the source database. We'll dub this benchmark tool "SQL Stress Tool" for the purposes of discussing what it does. We have done some updates to the tool since we first used it; it now supports Oracle and MySQL. We also adjusted the test time for this test and future tests to 20 minutes. The reason for this was to ensure that we used as much memory as possible for future planned 64 bit tests.
Example Random ID Generator:
The workload used for the test was based on everyday use of the Forums, which are running FuseTalk. We took the most popular queries and put them in the workload. Functions such as reading threads and messages, getting user information, inserting threads and messages, and reading private messages were in the spotlight. Each iteration of the test was run for 20 minutes, with the first being from a cold boot. SQL was restarted in-between each test that was run consecutively.
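As a rough illustration of a workload built from a mix of popular queries and random IDs, here is a minimal sketch. It is not the SQL Stress Tool itself, which was written in .NET. The query names, their weights and the ID range are all hypothetical placeholders.

```python
import random

# Hypothetical query mix: the names and relative weights are illustrative
# and are not the real Forums workload.
QUERY_MIX = [
    ("read_thread", 40),
    ("read_messages", 30),
    ("get_user_info", 15),
    ("read_private_messages", 5),
    ("insert_thread", 3),
    ("insert_message", 7),
]


def random_id(rng: random.Random, max_id: int) -> int:
    """Pick a random record ID in [1, max_id] so that requests hit varied rows."""
    return rng.randint(1, max_id)


def next_request(rng: random.Random, max_id: int) -> tuple:
    """Choose a query according to the weighted mix and attach a random ID."""
    names = [name for name, _ in QUERY_MIX]
    weights = [weight for _, weight in QUERY_MIX]
    name = rng.choices(names, weights=weights, k=1)[0]
    return name, random_id(rng, max_id)


def generate_workload(count: int, max_id: int, seed: int = 0) -> list:
    """Build a reproducible list of (query_name, record_id) requests."""
    rng = random.Random(seed)
    return [next_request(rng, max_id) for _ in range(count)]


for request in generate_workload(count=5, max_id=100_000, seed=42):
    print(request)
```

Seeding the generator keeps the request sequence identical across platforms, so that differences in results come from the hardware rather than from the workload.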
The importance of this test is that it is as real world as you can get; for us, the performance in this test directly influences what upgrade decisions we make for our own IT infrastructure.
SQL Stress Results
The SQL Stress results have changed somewhat from some of our earlier articles using this tool. We did a revamp of the tool itself, which is more performant on high volume queries. Also, we lengthened the test time to 20 minutes and changed the queries around some to reflect our current FuseTalk version.
The Quad Xeon prevailed in this test, due to the small size of the data being manipulated by the CPU. The Xeon doesn't have to go off the processor for data nearly as often as it does in our Heavy workload tests. The Quad Xeon 3.6 with 1MB of L2 squeaked by the Dual Core 875 with a 5% margin, while the 3.3GHz with 8MB of L3 managed nearly a 10% lead. Obviously, with a smaller data set, we're not fully utilizing the 8MB of L3 that the Potomac Xeons offer.
"Order Entry" Stress Test: Measuring Enterprise Class Performance
One complaint that we've historically received regarding our Forums database test was that it isn't strenuous enough for some of the Enterprise customers to make a good decision based on the results.
In our infinite desire to please everyone, we worked very closely with a company that could provide us with a truly Enterprise Class SQL stress application. We cannot reveal the identity of the Corporation that provided us with the application because of non-disclosure agreements in place. As a result, we will not go into specifics of the application, but rather provide an overview of its database interaction so that you can grasp the profile of this application, and understand the results of the tests better (and how they relate to your database environment).
We will use an Order Entry system as an analogy for how this test interacts with the database. All interaction with the database is via stored procedures. The main stored procedures used during the test are:
sp_AddOrder - inserts an Order
sp_AddLineItem - inserts a Line Item for an Order
sp_UpdateOrderShippingStatus - updates a status to "Shipped"
sp_AssignOrderToLoadingDock - inserts a record to indicate from which Loading Dock the Order should be shipped
sp_AddLoadingDock - inserts a new record to define an available Loading Dock
sp_GetOrderAndLineItems - selects all information related to an Order and its Line Items
The above is only intended as an overview of the stored procedure functionality; obviously, the stored procedures perform other validation and audit operations.
Each Order had a random number of Line Items, ranging from one to three. Also randomized were the Line Items chosen for an order, from a pool of approximately 1500 line items.
Each test was run for 10 minutes and was repeated three times. The average between the three tests was used. The number of Reads to Writes was maintained at 10 reads for every write. We debated for a long while about which ratio of reads to writes would best serve the benchmark, and we decided that there was no correct answer. So, we went with 10.
The application was developed using C#, and all database connectivity was accomplished using ADO.NET and 20 threads - 10 for reading and 10 for inserting.
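The shape of this workload (10 reader threads, 10 writer threads, 10 reads per write, one to three line items drawn from a pool of about 1500) can be sketched as follows. This is not the vendor's application: the in-memory `FakeOrderDb` class is a hypothetical stand-in for the database, and counting one inserted order as one "write" is our assumption.

```python
import random
import threading

LINE_ITEM_POOL = 1500   # approximate pool size from the article
READS_PER_WRITE = 10    # read-to-write ratio from the article
THREADS_PER_ROLE = 10   # 10 reader threads and 10 writer threads


class FakeOrderDb:
    """In-memory stand-in for the database; the real test called stored procedures."""

    def __init__(self):
        self._lock = threading.Lock()
        self.orders = {}
        self.reads = 0
        self.writes = 0

    def add_order(self, line_items):
        # Stands in for sp_AddOrder followed by sp_AddLineItem calls.
        with self._lock:
            order_id = len(self.orders) + 1
            self.orders[order_id] = list(line_items)
            self.writes += 1
            return order_id

    def get_order(self, order_id):
        # Stands in for sp_GetOrderAndLineItems.
        with self._lock:
            self.reads += 1
            return self.orders.get(order_id)


def writer(db, rng, order_count):
    for _ in range(order_count):
        item_count = rng.randint(1, 3)  # one to three line items per order
        db.add_order(rng.sample(range(1, LINE_ITEM_POOL + 1), item_count))


def reader(db, rng, read_count):
    for _ in range(read_count):
        # Read a random existing order (or ID 1 if nothing has been written yet).
        db.get_order(rng.randint(1, max(1, len(db.orders))))


def run_test(orders_per_writer=5, seed=0):
    db = FakeOrderDb()
    threads = []
    for i in range(THREADS_PER_ROLE):
        threads.append(threading.Thread(
            target=writer, args=(db, random.Random(seed + i), orders_per_writer)))
        threads.append(threading.Thread(
            target=reader,
            args=(db, random.Random(seed + 100 + i), orders_per_writer * READS_PER_WRITE)))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return db


db = run_test()
print(f"writes: {db.writes}, reads: {db.reads}")
```

Giving each thread a fixed quota is only one way to hold the ratio at 10:1; the article does not say how the real application enforced it.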
So, to ensure that IO was not the bottleneck, each test was started with an empty database and expanded to ensure that auto-grow activity did not occur during the test. Additionally, a gigabit switch was used between the client and the server. During the execution of the tests, there were no applications running on the server or monitoring software. Task Manager, Profiler, and Performance Monitor were used when establishing the baseline for the test, but never during execution of the tests.
At the beginning of each platform, both the server and client workstation were rebooted to ensure a clean and consistent environment. The database was always copied to the 8-disk RAID 0 array with no other files present to ensure that file placement and fragmentation were consistent between runs. In between each of the three tests, the database was deleted, and the empty one was copied again to the clean array. SQL Server was not restarted.
Order Entry Results
Our Vendor test has received quite a bit of interest from certain processor vendors; rightfully so, as the workload is quite difficult to recreate.
As you can see from the results below, there are some interesting conclusions that you can draw:
- The Dual Opteron 875 took the lead by 18% over the fastest Quad Intel. This should come as no surprise as we have seen in the past that the memory bandwidth limitation of the Intel FSB architecture does not allow the quads to really stretch their legs. On the other hand, the Integrated Memory Controller of the Opterons allows them to pull ahead.
- The additional L3 cache of the Quad Xeon 3.3GHz allows it to outperform the Quad Xeon 3.6GHz by 16%.
- The Quad Xeon 3.6GHz with the 667MHz FSB is only able to outperform the Dual Xeon 3.6GHz 800MHz FSB by 5%.
- The dual Xeons are able to outpace the dual 252's by 2%, and the single 875 by 5%. The Xeons' success here can be attributed to the additional L2 cache.
- The Dual Opteron 875 demonstrated nice scalability by servicing 52% more requests in the same period as the single Opteron 875.
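One way to read the scaling figures above is as scaling efficiency: the measured speedup divided by the factor by which the hardware was increased. The sketch below applies that to two of the percentages quoted in the list. The function name is ours, and the Xeon comparison is only indicative because the two Xeon platforms also differ in FSB speed.

```python
def scaling_efficiency(throughput_gain_pct: float, resource_multiplier: float) -> float:
    """Measured speedup divided by the ideal speedup for the added hardware."""
    speedup = 1 + throughput_gain_pct / 100
    return speedup / resource_multiplier


# Single Opteron 875 -> Dual Opteron 875: 52% more requests with 2x the CPUs.
opteron_efficiency = scaling_efficiency(52, 2)

# Dual Xeon 3.6GHz -> Quad Xeon 3.6GHz: 5% more with 2x the CPUs
# (the quad platform also runs a slower 667MHz FSB vs. 800MHz).
xeon_efficiency = scaling_efficiency(5, 2)

print(f"Opteron 875, 1 -> 2 sockets: {opteron_efficiency:.1%} of ideal")
print(f"Xeon 3.6GHz, 2 -> 4 sockets: {xeon_efficiency:.1%} of ideal")
```

An efficiency of 100% would mean that throughput doubled when the CPU count doubled.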
Data Warehouse Test
We are always looking to improve the quality of our reviews and as a result, we have added a new Stress Test to our suite.
This "Data Warehouse" test is focused on large record sets with plenty of aggregation. This test is based on a system that we developed to track and manage Request statistics for www.AnandTech.com and Forums.AnandTech.com. It tracks statistics like Requests/Hour, Requests/Hour/IP Address, Unique IP Addresses/Hour, Unique Users/Hour, Daily Browser stats, etc. These stats are further summarized by site, i.e. www or Forums.
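To give a feel for the kind of aggregation involved, here is a minimal sketch that computes requests per hour, unique IP addresses per hour and unique users per hour, summarized by site. The real system ran as stored procedures inside SQL Server; the tuple layout of the request log and the function name are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def summarize_requests(requests):
    """Aggregate (timestamp, site, ip, user) tuples into per-site, per-hour stats."""
    request_counts = defaultdict(int)
    unique_ips = defaultdict(set)
    unique_users = defaultdict(set)

    for timestamp, site, ip, user in requests:
        # Truncate the timestamp to the hour to form the grouping key.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        key = (site, hour)
        request_counts[key] += 1
        unique_ips[key].add(ip)
        if user is not None:  # anonymous requests carry no user
            unique_users[key].add(user)

    return {
        key: {
            "requests": request_counts[key],
            "unique_ips": len(unique_ips[key]),
            "unique_users": len(unique_users[key]),
        }
        for key in request_counts
    }


# Tiny illustrative log; every value here is made up.
sample_log = [
    (datetime(2005, 4, 21, 9, 5), "www", "10.0.0.1", "alice"),
    (datetime(2005, 4, 21, 9, 40), "www", "10.0.0.1", "alice"),
    (datetime(2005, 4, 21, 9, 59), "www", "10.0.0.2", None),
    (datetime(2005, 4, 21, 10, 1), "Forums", "10.0.0.3", "bob"),
]

for key, stats in summarize_requests(sample_log).items():
    print(key, stats)
```

At the scale described in the article, the grouping sets no longer fit in cache, which is why this kind of work stresses memory bandwidth rather than cache size.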
As with the other Stress Tests, each test was repeated three times and the average between the three tests was used. For this Data Warehouse Stress Test, we defined a quantity of work to complete and measured how long each platform required to process the workload.
So, to ensure that IO was not the bottleneck, each test was started with a database, including tempdb, which had already been expanded so that autogrow activity did not occur during the test. During the execution of the tests, there were no applications running on the server or monitoring software. Task Manager, Profiler, and Performance Monitor were used when establishing the baseline for the test, but never during execution of the tests.
At the beginning of each platform, the server was rebooted to ensure a clean and consistent environment. The database was always copied to the 8 disk RAID 0 array with no other files present to ensure that file placement and fragmentation were consistent between runs. In between each of the three tests, the database was deleted, the original database was copied again to the array, and SQL Server was restarted.
There is no "client" required for this test. The workload is initiated by a stored procedure call from Query Analyzer.
Data Warehouse Results
This test is different than our other tests as there are no parallel queries being executed. On the other hand, the query optimizer is able to use parallel execution and utilize multiple processors for the same query. This does not lead to the same CPU load as parallel queries, but does demonstrate a measurable difference in execution time.
In these tests, the datasets are mostly 100,000+ records and are many times larger than the L2 and L3 cache; thus, the additional cache is of no benefit. It is all about the number of instructions that can be completed in a given period. The ability to move data quickly in and out of the CPU is the characteristic of a winner in this test.
- There is a 30% spread in results between the fastest and slowest platform. Yet again, it is proof that you do not have to be CPU bound to recognize a performance gain from a hardware upgrade.
- The Dual Opteron 252s led by 19% over the closest Xeon, which was the Quad Xeon 3.6 GHz 667MHz FSB.
- The Quad Xeon 3.6 GHz 667MHz outperformed the Dual 3.6GHz 800MHz FSB by 8%.
Dual Core Desktop Performance: AMD's Athlon 64 X2 4400+
AMD didn't send out any Athlon 64 X2 processors for this review. They promised us chips for the real launch in June, but we don't like waiting and neither do most of you, so we improvised.
The Opteron x75 CPUs that AMD sent us run at 2.2GHz and have a 1MB L2 cache per core, which makes the specs basically identical to the Athlon 64 X2 4400+. Although the use of ECC memory and a workstation motherboard inevitably means that performance will be lower than what the real Athlon 64 X2s deliver at launch, it's close enough to give a good idea of the competitiveness of the Athlon 64 X2.
For these tests, we used the same workstation board that we used in the server performance tests, but in doing so, we encountered a lot of other random problems.
With only a single CPU installed in the Tyan S2895, the system would always hang upon restarting Windows. We could shut down Windows fine and we could manually restart the machine, but if we hit Start > Shut Down > Restart, our test bed would always hang at the "Windows is Shutting Down" screen. Populating the second CPU socket fixed that problem, but obviously for our desktop comparison, we only used a single CPU to simulate a single Athlon 64 X2 4400+. The problem is undoubtedly due to the dual core BIOS, but it was frustrating to say the least (note that our normal desktop benchmark suite requires over 200 reboots - and we did every last one by hitting the reset switch on that motherboard).
The next issue we had with the motherboard is that none of the four on-board SATA ports would detect a hard drive. Apparently, this is a common problem with this board and since we were using the absolute latest BIOS revision from Tyan (we had to in order to support dual core), there was no fix for the problem at the time of our testing. Because of this problem, we were forced to use a PATA hard drive, which unfortunately meant that we couldn't test with an NCQ enabled drive.
The final problem we had was that there were significant issues with regards to memory compatibility and performance on this Tyan board with the dual core BIOS. We were forced to run at much slower memory settings than we would normally run on a desktop Athlon 64 motherboard - we had to run with the bus turnaround option set to 2T in order to even get Windows to install. A side effect of some of these issues was that not all of our tests would run properly; most did, but a few didn't make it. Obviously, we'll fill in the blanks when we perform our actual tests for the Athlon 64 X2 review, but this will serve as a preview.
All in all, we were extremely disappointed with the only board that AMD would recommend that we use with their first dual core processors. The BIOS is far from ready and the board seems to have issues that extend beyond what can be attributed to the dual core BIOS. When Intel sent us a dual core setup earlier this month, we were surprised at how stable the system was. Our experience with AMD's platform was the exact opposite. While we're very confident that dual core Opteron systems from tier one OEMs won't have these sorts of issues, the fact that we were having these problems just weeks before the launch of a major CPU is worth mentioning. We've also held off on doing any sort of power consumption analysis between the Athlon 64 X2 and the Pentium 4 until we get desktop platforms in hand. That being said, AMD rates the Athlon 64 X2 as having the same thermal envelope as the current Socket-939 Athlon 64 processors. Thanks to a cool running 90nm process and slightly lower clock speeds, AMD is able to achieve just that.
With the problems out of the way, we were ready to get down to benchmarking. So, we put together a list of CPUs that made sense to compare for the desktop portion of this preview.
AMD's own marketing suggests that based on the price differences between their dual core CPUs and Intel's, the Athlon 64 X2 is in a class above the Pentium D. Instead, AMD suggests that the real competitors to the Pentium D 820, 830 and 840 are the Athlon 64 3400+, 3500+ and 3800+, respectively. To test that theory, we included an Athlon 64 3800+ as well as the fastest single core AMD processor, the Athlon 64 FX-55, in our comparisons.
The comparison that AMD makes is depicted below. Note that this is AMD's marketing comparison, not our own.
For the Athlon 64s, we used MSI's nForce4 SLI board; and for the Intel CPUs, we used Intel's own 955X board. All systems were configured with 1GB of memory and used the same Seagate 120GB PATA HDD and ATI Radeon X850 XT video card. We used the latest Catalyst 5.4 drivers.
Business/General Use Performance
Business Winstone 2004
Business Winstone 2004 tests the following applications in various usage scenarios:
- Microsoft Access 2002
- Microsoft Excel 2002
- Microsoft FrontPage 2002
- Microsoft Outlook 2002
- Microsoft PowerPoint 2002
- Microsoft Project 2002
- Microsoft Word 2002
- Norton AntiVirus Professional Edition 2003
- WinZip 8.1
Business Winstone is a good example of a collection of single threaded applications used in a relatively light multitasking manner; the Athlon 64 X2 4400+ does better than Intel's fastest dual core CPUs, but it is still slower than the fastest single core AMD chips.
Office Productivity SYSMark 2004
SYSMark's Office Productivity suite consists of three tests, the first of which is the Communication test. The Communication test consists of the following:
"The user receives an email in Outlook 2002 that contains a collection of documents in a zip file. The user reviews his email and updates his calendar while VirusScan 7.0 scans the system. The corporate web site is viewed in Internet Explorer 6.0. Finally, Internet Explorer is used to look at samples of the web pages and documents created during the scenario."
Right off the bat, we see that the Athlon 64 X2 4400+ is reasonably competitive. Here, it is within striking distance of the FX-55, but all of the contenders are fairly close in performance.
The next test is Document Creation performance:
"The user edits the document using Word 2002. He transcribes an audio file into a document using Dragon NaturallySpeaking 6. Once the document has all the necessary pieces in place, the user changes it into a portable format for easy and secure distribution using Acrobat 5.0.5. The user creates a marketing presentation in PowerPoint 2002 and adds elements to a slide show template."
With a score of 224, we have a new record for performance. Remember that the Athlon 64 has never been able to execute more than one thread at a time. So, the performance benefit that AMD will see from dual core can be larger than what Intel has seen simply because Intel has had Hyper Threading on all of their desktop Pentium 4 CPUs for quite some time now. This is one such example where AMD gets a pretty big benefit from dual core, with the 4400+ outpacing the FX-55.
The final test in our Office Productivity suite is Data Analysis, which BAPCo describes as:
"The user opens a database using Access 2002 and runs some queries. A collection of documents are archived using WinZip 8.1. The queries' results are imported into a spreadsheet using Excel 2002 and are used to generate graphical charts."
The 4400+ offers the best performance that AMD can, but this test clearly favors Intel's Pentium 4/D architectures more.
Microsoft Office XP SP-2
Here, we see that in the purest of office application tests, performance doesn't vary all that much.
Mozilla 1.4
Quite possibly the most frequently used application on any desktop is the one that we pay the least amount of attention to when it comes to performance. While a bit older than the core that is now used in Firefox, performance in Mozilla is worth looking at as many users are switching from IE to a much more capable browser on the PC - Firefox.
ACD Systems ACDSee PowerPack 5.0
ACDSee is a popular image editing tool that is great for basic image editing options such as batch resizing, rotating, cropping and other such features that are too elementary to justify purchasing something as powerful as Photoshop for. There are no extremely complex filters here, just pure batch image processing.
Once again, we find the X2 4400+ in between the two high end Athlon 64s and the two dual core Intel chips.
Multitasking Content Creation
MCC Winstone 2004
Multimedia Content Creation Winstone 2004 tests the following applications in various usage scenarios:
- Adobe® Photoshop® 7.0.1
- Adobe® Premiere® 6.50
- Macromedia® Director MX 9.0
- Macromedia® Dreamweaver MX 6.1
- Microsoft® Windows Media™ Encoder 9 Version 9.00.00.2980
- NewTek's LightWave® 3D 7.5b
- Steinberg™ WaveLab™ 4.0f
All chips were tested with Lightwave set to spawn 4 threads.
Here, we have another situation where the Athlon 64 X2 takes the lead. Note that this isn't the fastest Athlon 64 X2, but one of the more "affordable" CPUs, and cheaper than the FX-55, we might add.
ICC SYSMark 2004
The first category that we will deal with is 3D Content Creation. The tests that make up this benchmark are described below:
"The user renders a 3D model to a bitmap using 3ds max 5.1, while preparing web pages in Dreamweaver MX. Then the user renders a 3D animation in a vector graphics format."
Immediately, the Athlon 64 X2 4400+ has become the most competitive AMD CPU that we have ever seen when it comes to SYSMark scores. We were curious as to why AMD said that SYSMark 2004 was the best contained benchmark that showcased dual core performance today; now we understand.
Next, we have 2D Content Creation performance:
"The user uses Premiere 6.5 to create a movie from several raw input movie cuts and sound cuts and starts exporting it. While waiting on this operation, the user imports the rendered image into Photoshop 7.01, modifies it and saves the results. Once the movie is assembled, the user edits it and creates special effects using After Effects 5.5."
The Internet Content Creation suite is rounded out with a Web Publishing performance test:
"The user extracts content from an archive using WinZip 8.1. Meanwhile, he uses Flash MX to open the exported 3D vector graphics file. He modifies it by including other pictures and optimizes it for faster animation. The final movie with the special effects is then compressed using Windows Media Encoder 9 series in a format that can be broadcast over broadband Internet. The web site is given the final touches in Dreamweaver MX and the system is scanned by VirusScan 7.0."
In all of the Internet Content Creation tests, the Athlon 64 X2 4400+ yielded the highest performance results that we've ever seen from any CPU, AMD or Intel.
Mozilla + Media Encoder
Obviously, the dual core Athlon 64 X2 will do very well if there's any sort of multitasking involved:
Video Creation/Photo Editing
Adobe Photoshop 7.0.1
Roxio VideoWave Movie Creator 1.5
MusicMatch Jukebox 7.10
Not all audio encoding applications are multithreaded, and thus, we don't see overly impressive performance here from the X2 4400+. That being said, the X2 is at least competitive with its faster single core relatives.
DivX 5.2.1 with AutoGK
Armed with DivX 5.2.1 and AutoGK 1.60, we took all of the processors to task at encoding a chapter out of "Pirates of the Caribbean". We set AutoGK to give us 75% quality of the original DVD rip and did not encode audio.
The days of AMD losing the encoding comparisons are over - the Athlon 64 X2 4400+ offers encoding performance that rivals the Pentium D 840. Unfortunately at the sub-$500 level, AMD remains fairly non-competitive in encoding performance.
XviD with AutoGK
Another very popular codec is the XviD codec, and thus, we measured encoding performance using it instead of DivX for this next test. The rest of the variables remained the same as the DivX test.
Windows Media Encoder 9
The Athlon 64 X2 continues to dominate in encoding performance in our own Windows Media Encoder 9 test.
Gaming Performance
For the next year or two, games will continue to be mostly single threaded, meaning that alone, they get no performance benefit from being run on a dual core CPU.
It is important to note that the Athlon 64 X2 4400+ is already faster in games than Intel's fastest dual core solutions. With the X2 series, you don't necessarily have to give up much gaming performance in order to reap the benefits of dual core. On average, the Athlon 64 X2 4400+ provides around 90% of the gaming performance of the single core FX-55, while being cheaper and offering all of the benefits of a dual core CPU. The choice at this price point, even for gamers, is obvious.
In Doom 3, the Athlon 64 X2 4400+ gives you 91% of the performance of a FX-55, but with all of the benefits of dual core - not too shabby.
Splinter Cell: Chaos Theory
In Splinter Cell, the X2 continues to offer around 90% of the performance of the single core Athlon 64 FX-55.
Half Life 2
The X2 gives us about 91% of the performance of the FX-55 in Half Life 2. The trend continues to look good for AMD.
Unreal Tournament 2004
Wolfenstein: Enemy Territory
3ds max 6
For our 3ds max test, we ran the SPECapc benchmark for 3ds max and reported only the rendering composite score, as well as its components.
Despite the strong showing by the Athlon 64 X2 4400+, it isn't able to outperform the Pentium Extreme Edition 840, but it comes very close. The price difference alone is enough to give the X2 4400+ our nod, but AMD's performance here is nothing short of impressive.
Development Performance - Compiling Firefox
Our Quake 3 compile test was getting a bit long in the tooth, so we're introducing a brand new test: compiling Firefox. We followed the instructions diagramed here.
This particular test is only single threaded, and so we see that the fast single core CPUs take the lead. Intel's performance in this compiling test, as is always the case with our compiling benchmarks, is not up to par with AMD's.
The Real Test - AnandTech's Multitasking Scenarios
Before our first dual core articles, we asked for feedback from the readers with regards to their multitasking usage patterns. Based on this information, we formulated some of our own benchmarks that would stress multitasking performance. We've already gone over the impacts of dual core CPUs on subjective interactions, so we'll just point you back to previous articles for our take on that if you haven't read them already. In the end, we know that dual core CPUs make our systems much more responsive and provide the same sort of smooth operation that SMP systems have done for years. But, the question now is: who has better multitasking performance? AMD or Intel? And that's exactly what we're here to find out.
We started with a test bed configured with a number of fairly popular applications:
Daemon Tools
Norton AntiVirus 2004 (with latest updates)
DVD Shrink 3.2
Microsoft AntiSpyware Beta 1.0
Visual Studio .NET 2003
Macromedia Flash Player 7
Adobe Photoshop CS
Microsoft Office 2003
3ds max 7
Norton Ghost 2003
Adobe Reader 7
Splinter Cell: Chaos Theory
What's important about this list is that a handful of those programs were running in the background at all times, primarily Microsoft's AntiSpyware Beta and Norton AntiVirus 2004. Both the AntiSpyware Beta and NAV 2004 were running with their real time protection modes enabled, to make things even more real world.
Multitasking Scenario 1: DVD Shrink
If you've ever tried to back up a DVD, you know the process can take a long time. Just ripping the disc to your hard drive will eat up a good 20 minutes, and then there's the encoding. The encoding can easily take between 20 and 45 minutes depending on the speed of your CPU, and once you start doing other tasks in the background, you can expect those times to grow even longer.
For this test, we used DVD Shrink, one of the simplest applications available to compress and re-encode a DVD to fit on a single 4.5GB disc. We ran DVD Decrypt on the "Star Wars Episode VI" DVD so that we had a local copy of the DVD on our test bed hard drive (in a future version of the test, we may try to include DVD Decrypt performance in our benchmark as well). All of the DVD Shrink settings were left at default including telling the program to assume a low priority, a setting many users check in order to be able to do other things while DVD Shrink is working.
We did the following:
1) Open Firefox and, using the ScrapBook plugin, load locally archived copies of 13 web pages; we kept the browser on the AT front page.
2) Open iTunes and start playing a playlist on repeat all.
3) Open Newsleecher.
4) Open DVD Shrink.
5) Login to our news server and start downloading headers for our subscribed news groups.
6) Start backup of "Star Wars Episode VI - Return of the Jedi". All default settings, including low priority.
This test is a bit different than the test we ran in the Intel dual core articles, mainly in that we used more web pages, but with more varied content. In the first review, our stored web pages were very heavy on Flash. This time around, we have a much wider variety of web content open in Firefox while we conducted our test. There is still quite a bit of Flash, but the load is much more realistic now.
DVD Shrink was the application in focus. This matters because by default, Windows gives special scheduling priority to the application currently in the foreground. We waited until the DVD Shrink operation was complete and recorded its completion time. Below are the results:
As we showed in the first set of dual core articles, tests like these are perfect examples of why dual core matters. The performance of the single core Athlon 64 FX-55 is dismal compared to any of the dual core offerings. You'll also note that the Athlon 64 X2 4400+ completes the DVD Shrink task in less than half the time of the higher clocked single core FX-55. The reason has more to do with the Windows scheduler than with raw CPU speed. The problem in situations like these is that the Windows scheduler won't always preempt one task in order to give another its portion of the CPU's time. For a single threaded CPU, that means that certain tasks will take much longer to complete simply because the OS scheduler isn't giving them a chance to run on the CPU. With a dual core or otherwise multi-threaded CPU, the OS scheduler can dispatch more threads to the CPU, and thus, is less likely to be in a situation where it has to preempt a CPU intensive task.
In this test, the Athlon 64 X2 4400+ does better than the Pentium D 840, but the Extreme Edition manages to offer slightly better performance. A faster X2 shouldn't have much of a problem remaining competitive, however.
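The scheduling argument above can be illustrated with a deliberately simple toy model. This is only a sketch under stated assumptions, not a description of how the Windows scheduler actually works: it assumes strict priority scheduling, a single higher priority task that wants a core for `busy` out of every `period` ticks, and a single threaded low priority job (standing in for DVD Shrink at low priority). The function name and all of the numbers are hypothetical.

```python
def ticks_to_finish(cores: int, job_ticks: int, busy: int, period: int) -> int:
    """Ticks until a low-priority job completes under a strict-priority toy scheduler.

    One higher-priority task wants a core for `busy` out of every `period` ticks.
    The low-priority job needs `job_ticks` ticks of CPU time in total.
    """
    if cores < 1 or not 0 <= busy < period:
        raise ValueError("need cores >= 1 and 0 <= busy < period")
    done = 0
    tick = 0
    while done < job_ticks:
        # The higher-priority task always gets a core first when it wants one.
        high_priority_wants_cpu = (tick % period) < busy
        free_cores = cores - (1 if high_priority_wants_cpu else 0)
        if free_cores > 0:
            # The job is single threaded, so it gains at most one tick per tick.
            done += 1
        tick += 1
    return tick


# Hypothetical workload: the foreground task is busy 3 out of every 4 ticks.
single = ticks_to_finish(cores=1, job_ticks=100, busy=3, period=4)
dual = ticks_to_finish(cores=2, job_ticks=100, busy=3, period=4)
print(single, dual)
```

In this model, the second core lets the background job run on every tick instead of only when the foreground task is idle, which is why the gap between single and dual core can be far larger than a clock speed comparison would suggest.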
Multitasking Scenario 2: File Compression
For our next test, we simulated what would happen if we performed two disk intensive tasks at the same time: zipping the source code to Firefox while importing a 260MB PST file into Outlook 2003. You'll note that this is a slightly modified version of the test that we originally created. We modified the test by archiving the Firefox source instead of a single smaller file; the reason being that we wanted a more realistic test (from a file size/count perspective) as well as the ability to discern better between contenders.
We ran the same Firefox and iTunes tasks from the last test again, and then did the following:
1) Open Outlook.
2) Start importing 260MB PST.
3) Start WinRAR.
4) Archive Firefox source.
WinRAR remained the application in focus during this test.
Here, we looked at two metrics: how long it took WinRAR to compress our test file, and how many emails were imported into Outlook during the time that WinRAR was archiving. Let's have a look at the results:
The Pentium D 840 was the fastest CPU here, even faster than the HT enabled Extreme Edition 840, which actually came in last. What's even more interesting is that the FX-55, a single core CPU, did better than two of the dual core chips. Remember that Windows' scheduler will give, by default, priority to the foreground task, which is why we see such a strong showing from the FX-55 here. But let's take a look at the other main task that ran in the background, the Outlook PST import:
Update: AnandTech reader manno pointed out that the metric we should be looking at here is emails imported per second while the archive task was running. Looking at these numbers, Intel actually comes out ahead with the Pentium D 840, with AMD in second place. All of the dual core chips outperform the single core Athlon 64 FX-55 by a huge margin, and once again, we see that Hyper Threading isn't always beneficial as the Extreme Edition actually runs slower than the regular Pentium D here.
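To make the emails-per-second metric concrete, here is a minimal sketch of the normalization. The function name and the sample figures are hypothetical and are not measurements from this article; the point is only that a raw email count has to be divided by the time the archive task ran before two chips can be compared.

```python
def emails_per_second(emails_imported: int, archive_seconds: float) -> float:
    """Background throughput: emails imported while the archive job was running."""
    if archive_seconds <= 0:
        raise ValueError("archive_seconds must be positive")
    return emails_imported / archive_seconds


# Hypothetical runs as (emails imported, archive time in seconds).
runs = {
    "chip_a": (1200, 300.0),  # finishes the archive sooner, imports fewer emails
    "chip_b": (1500, 500.0),  # imports more emails, but had more time to do it
}
rates = {name: emails_per_second(*run) for name, run in runs.items()}
fastest = max(rates, key=rates.get)
print(rates, fastest)
```

With these made-up inputs, the chip that imported fewer emails in total still has the higher import rate, which is exactly the kind of reversal that the raw count hides.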
Multitasking Scenario 3: Web Browsing
For this benchmark, we decided to switch things up a bit and keep Firefox as our foreground application while background tasks ran.
The Firefox, iTunes and Newsleecher tasks from the first test scenario were also present in this one, plus we did the following:
Open Outlook, immediately import 130MB PST file and immediately switch app focus to Firefox.
We then recorded the total time required to import the new PST while Firefox was our foreground application. The results were very interesting:
The original version of this test wasn't even competitive on single-core AMD CPUs, but by toning down the Flash usage in Firefox (the major modification was that we removed the IGN page with a huge, and very animated Flash ad), we can now do a good single-to-dual core AMD comparison.
Once again, we see some very good results from the dual core platforms; the Athlon 64 X2 4400+ manages to offer significantly better performance than the higher clocked FX-55, as do the two Intel platforms.
Multitasking Scenario 4: 3D Rendering
We received several requests for a 3D rendering multitasking test, so we put one together. For this test, we ran our SPECapc 3ds max 6 benchmark while we had iTunes, Firefox and Newsleecher all running like we have in previous tests. The application focus remained on Firefox to give it the highest scheduler priority, and the results are below:
Once again, we have one of those situations where the Athlon 64 X2 4400+ is more than twice as fast as the Athlon 64 FX-55. 3ds max is actually one of the best ways to guarantee that you exploit problems with Windows' scheduler in a repeatable fashion. In fact, part of the reason for such huge performance gains for AMD in SYSMark 2004 is this exact type of scenario caused by 3ds max not allowing the Windows scheduler to preempt other running tasks properly. The result here is that single core systems are basically horrendous in performance and system response, while all of the dual core systems actually let you get work done.
What's also interesting is that the performance of the Athlon 64 X2 4400+ here is virtually identical to that of the Athlon 64 FX-55 in our standalone 3ds max test (1.65 vs 1.66). In this benchmark, the Pentium Extreme Edition 840 takes a pretty significant lead, thanks to HT. We see that even with a dual core CPU, there are still some issues to overcome with the OS scheduler. So, we get an unusually large increase in performance from HT, because the scheduler is tricked into sending more threads to the CPU rather than attempting to have them preempt one another for CPU time.
Multitasking Scenario 5: Compiling
Our final non-gaming multitasking scenario is quite possibly our most strenuous. It involves the following background tasks: iTunes playing a playlist, Firefox with the same 13 tabs open as in our other tests, and Newsleecher updating newsgroup headers. On top of those tasks, we compiled Firefox as well as ran our DVD Shrink operation on the "Star Wars Episode VI" DVD. Firefox remained the application in focus during the test.
The results were fairly interesting. First, let's look at how long it took us to compile Firefox:
The Athlon 64 X2 4400+ was stronger than either of the Intel CPUs in compiler performance, so it is no surprise that it is faster here. You'll notice that the single core Athlon 64 FX-55 isn't present in this chart - you'll find out why in a moment, but first, let's look at the performance of our DVD Shrink task that also ran in the background:
Once again, AMD is ahead of the competition, thanks to better general performance as well as all of the benefits of their low latency architecture. As for why the single core Athlon 64 FX-55 wasn't included here, well in this particular test, the DVD Shrink operation would have taken over 13 hours - which doesn't exactly fit with our graph's scale. The compiler operation also took significantly longer to complete. Whichever task completed first would eventually have let the other finish sooner, but we didn't care to find out as it was already ridiculously longer than any of the dual core solutions.
Gaming Multitasking Scenario
Our gaming multitasking test basically performs all of the tasks from our first Multitasking Scenario, with the exception of DVD Shrink. We have Firefox loaded with all 13 tabs from our new suite test, iTunes is running and playing a playlist, and Newsleecher is downloading headers. We kept Newsleecher in this test simply because it's the best way for us to be able to have a fairly CPU/disk intensive downloading task running in the background while still maintaining some semblance of repeatability. So, replace Newsleecher with BitTorrent or any other resource-consuming downloading that you may be doing and you're good to go. Note that although we refer to Newsleecher as disk intensive, it, like most downloading operations, isn't that disk intensive at all in the grand scheme of things; it just acts as a good real world background task to have running.
Of course, Norton AntiVirus 2004 and Microsoft's AntiSpyware Beta were also running in the background.
First, we ran our Doom 3 benchmark:
It's not surprising to see AMD at the top of the charts in a gaming comparison, but what's truly interesting is that the Athlon 64 X2 4400+ barely loses any performance due to the multitasking going on in the background. The non-loaded X2 4400+ platform runs at 99.6 fps and here, it drops down to about 93% of that speed at 92.2 fps. Even the dual core Intel CPUs don't scale that well, with the Pentium Extreme Edition delivering 81% of its single task performance here. The only explanation for the excellent showing by AMD here is the benefits of their dual core architecture over Intel's, and it is a very impressive showing at that.
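The retention figure quoted above is simply the loaded frame rate divided by the unloaded one. A minimal sketch, using only the two Doom 3 numbers given in the text (the function name is our own, chosen for illustration):

```python
def retained_performance(loaded_fps: float, unloaded_fps: float) -> float:
    """Percentage of standalone performance kept while background tasks run."""
    if unloaded_fps <= 0:
        raise ValueError("unloaded_fps must be positive")
    return 100.0 * loaded_fps / unloaded_fps


# Athlon 64 X2 4400+ in Doom 3: 99.6 fps standalone, 92.2 fps under load.
x2_doom3 = retained_performance(loaded_fps=92.2, unloaded_fps=99.6)
print(f"{x2_doom3:.1f}%")  # about 92.6
```

The same calculation applies to any of the loaded versus standalone comparisons in this section, provided that both frame rates come from the same benchmark and settings.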
Next up is Splinter Cell:
We continue to see an impressive showing by AMD in their dual core performance - there is virtually no performance drop for AMD in this test.
The dual vs. single core comparison is pretty cut and dried. The Athlon 64 X2 4400+ offers nearly twice the performance of the fastest single core Athlon 64 FX.
Final Words
At this point, having seen dual core CPUs from both AMD and Intel, there's no question that dual core is desirable on all fronts; whether we're talking about the server world or on your desktop, dual core improves performance by a noticeable amount and the performance benefits will only get better down the road.
As a server solution, the dual core Opterons enable a whole new class of performance to be realized on platforms. Two socket servers will now be capable of having the performance of a 4-way system, something that has never been possible in the past. AMD's push with dual core into the server markets half a year before Intel's dual core Xeon arrives is going to tempt a lot of IT departments out there; the ability to get 4-way server performance at much lower prices is an advantage that can't be beat.
Despite AMD's lead in getting dual core server/workstation CPUs out to market, Intel has very little reason to worry from a market penetration standpoint. We've seen that even with a multi-year performance advantage, it is very tough for AMD to steal any significant business away from Intel, and we expect that the same will continue to be the case with the dual core Opteron. It's unfortunate for AMD that all of their hard work will amount to very little compared to what Intel is able to ship, but that has always been reality when it comes to the AMD/Intel competition.
On the desktop side, we are extremely excited about the Athlon 64 X2. The 4400+ that we compared here today had no problem competing with and outperforming Intel's fastest dual core CPUs in most cases, and at a price of $581, the 4400+ is the more reasonably priced of the X2 CPUs. That being said, we are concerned that availability of the lower cost X2 CPUs will be significantly more limited than the higher priced models. At the ~$550 mark, your best bet is clear - the Athlon 64 X2 will be faster than anything that Intel has for the desktop.
What's quite impressive is how competitive the Athlon 64 X2 is across the board. With the Pentium D, we had to give up a noticeable amount of single threaded performance (compared to Intel's top of the line Pentium 4 CPUs) in order to get better multithreaded/multitasking performance, but with AMD, you don't have to make that sacrifice. Everything from gaming to compiling performance on the Athlon 64 X2 4400+ was extremely solid. In multithreaded/multitasking environments, the Athlon 64 X2 is even more impressive; video encoding is no longer an issue on AMD platforms. You no longer have to make a performance decision between great overall performance or great media encoding performance - AMD delivers both with the Athlon 64 X2. Also keep in mind that the performance preview that we gave of the Athlon 64 X2 today is actually a very conservative estimate. The shipping Athlon 64 X2 CPUs will run with regular DDR memory and with much faster motherboards - meaning that you should be prepared to be impressed even further down the road.
The real problem is that AMD has nothing cheaper than $530 that is available in dual core, and this is where Intel wins out. With dual core Pentium D CPUs starting at $241, Intel will be able to bring extremely solid multitasking performance to much lower price points than AMD will. And from what we've seen, it looks like that price advantage will continue for quite some time. It all boils down to economics, and in the sense of manufacturing capacity, Intel has AMD beat - thus allowing for much more aggressively priced volume dual core solutions. Then there's the issue of availability; as impressive as AMD's dual core desktop offerings are, we're honestly worried that we won't see any real volume until late this year at best. Intel does have a golden opportunity now to really step forward and regain some enthusiast marketshare, but we seriously doubt that we'll see anything faster than the Pentium D 3.2 anytime soon. It's strange how the tables have turned, making Intel look like the value CPU manufacturer in the dual core race.
Now that we've seen both AMD and Intel dual core solutions, it's time to play the waiting game. Dual core Opteron 8xx series CPUs should be available now, with the 2xx and 1xx series following in about a month. The Pentium D and Pentium Extreme Edition should be shipping before the end of this month, with expected retail availability next month. And the big wait, of course, will be for the Athlon 64 X2, which will be available towards the end of this year.
Our dual core coverage does not stop here. We have more in the works including the promised Workstation comparison, a look at how multitasking in Linux is impacted by dual core, and even more multitasking scenarios modeled based on your feedback (so, keep it coming).