Original Link: http://www.anandtech.com/show/1209
AMD Opteron 248 vs. Intel Xeon 2.8: 2-way Web Servers go Head to Head
by Anand Lal Shimpi & Jason Clark on December 17, 2003 9:15 AM EST
The launch of the Athlon MP was a bittersweet victory for AMD just under two years ago. AMD was able to deliver performance that was significantly faster than Intel’s brand new Xeon, but despite performance leadership, the CPU never really took off.
AMD had limited success in the server market with the Athlon MP, with most of their sales going to HPC customers for large clusters, but very few sales in the web and database server arenas. With no Tier 1 OEMs supporting the platform, most of the larger IT firms wouldn’t touch the Athlon MP with a 10-foot pole, so Intel enjoyed uninterrupted dominance in the web/database server markets.
By the end of the Athlon MP’s life, Intel’s performance had improved significantly to the point where AMD no longer held a performance advantage (although their usual low cost was a factor), further reducing any reason to pursue Athlon MP based servers.
The launch of the Opteron processor gave AMD a much needed breath of new life and energy, especially with the announcement that IBM would be producing servers based on the new Opteron platform. Unfortunately, IBM’s designs are, once again, targeted at the HPC market and leave the web and database servers for Intel and IBM processors to handle.
More recently, Sun announced support for the Opteron in their 2004 product line, but again, it is on the shoulders of the 2nd and 3rd tier manufacturers to provide Opteron solutions for web and database serving applications. But before there can be a demand, there must be some information on the performance of the Opteron in these sorts of applications.
We’ve already seen how the Opteron can perform in most computation-intensive applications as well as workstation applications, but what about as a web server? Or a database server? In our original coverage of AMD’s Opteron, we offered some performance analysis of both web and database server applications with the Opteron, but AMD has made a couple of steps recently to warrant a second look at the performance picture.
First and foremost, the launch of 4-way Opteron platforms has made many of our IT readers (ourselves included) wonder how a 4-way Opteron would stack up against a 4-way Xeon MP box. With AMD’s more scalable Opteron architecture, any performance advantage a 2-way Opteron has over a 2-way Xeon should, in theory, be greater in a 4-way comparison.
AMD has also recently launched higher clock speed versions of the Opteron at 2.2GHz, equal in speed to the fastest Athlon 64 FX currently available.
But quite possibly one of the biggest reasons for this comparison is that we’ve been looking internally to upgrade our server platforms from the aging Athlon MPs and needed to evaluate the Opteron as a potential upgrade path.
Since we last wrote about our server upgrades at AnandTech, we added a 2-way Xeon DP 2.8GHz server with Hyper-Threading and were pleasantly surprised with the performance offered by the platform. We have also spent a great deal of time looking at 4-way solutions for a potential upgrade to our database servers, also requiring a more in-depth look at the latest in Opteron offerings.
We have more than just this one article to bring you the full spectrum of Opteron performance; but to kick it all off, we’re going to look at web serving performance in a head-to-head match between the Opteron and Xeon.
We’re not going to rehash any of the Opteron’s architecture in this article, so make sure that you’ve read our Intro to Opteron/K8 Architecture before proceeding.
AMD Updates their 2xx Series
When the Opteron was launched back in April, it was launched at 1.4, 1.6 and 1.8GHz speeds with support for up to two processors and DDR333 memory. Since then, as we’ve already mentioned, 4-way setups have been announced (Opteron 8xx series) and higher clock speeds have debuted as well. More recently, support for DDR400 has been announced so that the Opterons are no longer running any slower or with fewer features than the Athlon 64 FX.
The Opteron’s naming system can be a little confusing at first, so let’s revisit that given the current batch of CPUs out there.
With the Opteron, AMD introduced their first-ever 3-digit naming system for server/workstation CPUs. The first digit indicates whether the CPU was designed for 1-way, 2-way or 4- to 8-way operation. For example, the Opteron 100 series is only validated for use in uniprocessor configurations while the Opteron 200 series is validated for use in uni- and dual processor configurations. Finally, the Opteron 800 series is validated for use in up to 8 processor configurations.
Note that we used the phrase “validated for use” because there is very little stopping an Opteron 100 CPU from being used in a dual processor environment (an Opteron 100 is identical to an Opteron 200 and an Opteron 800). As far as we know at this point, AMD has not prevented CPUs from being used in configurations in which they weren’t intended to be used, although it wouldn’t be too hard for them to prevent it in the future if it becomes a problem.
The last two digits in the Opteron name indicate the “performance” of the part, and currently, parts in the Opteron 1xx, 2xx and 8xx series that share the same last two digits are comparable to one another.
Currently, the Opteron is available in clock speeds ranging from 1.40GHz to 2.20GHz in 200MHz increments, which correspond to the following parts:
Opteron 140, 240, 840 = 1.4GHz
Opteron 142, 242, 842 = 1.6GHz
Opteron 144, 244, 844 = 1.8GHz
Opteron 146, 246, 846 = 2.0GHz
Opteron 148, 248, 848 = 2.2GHz
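The naming scheme above is regular enough to express in a few lines. What follows is only an illustrative sketch: the function and table names are made up for this example, and the mapping covers just the parts listed in this article.

```python
from typing import Tuple

# First digit -> maximum number of CPUs the part is validated for (per the article).
MAX_WAYS = {1: 1, 2: 2, 8: 8}


def decode_opteron(model: int) -> Tuple[int, float]:
    """Return (max validated CPUs, clock speed in GHz) for a model such as 248.

    Hypothetical helper; it only knows the x40 to x48 parts listed above.
    """
    series, rating = divmod(model, 100)
    if series not in MAX_WAYS:
        raise ValueError(f"unknown Opteron series: {series}xx")
    if rating < 40 or rating > 48 or rating % 2:
        raise ValueError(f"unknown Opteron rating: x{rating:02d}")
    # x40 runs at 1.4GHz; each step of 2 in the rating adds 200MHz.
    clock_mhz = 1400 + (rating - 40) // 2 * 200
    return MAX_WAYS[series], clock_mhz / 1000
```

For example, `decode_opteron(248)` yields a part validated for up to two processors running at 2.2GHz, which matches the table above.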
Just recently, the x48 parts were launched, and with them, the Opteron gained support for DDR400 memory. Support for DDR400 has trickled down to all members of the Opteron family, but only certain revisions of the CPUs support DDR400. To tell whether or not a CPU has DDR400 support, you will have to look at the last two characters of its part number. If they are “AL”, “AK” or “AM”, then the CPU is a Rev C0 CPU and it supports DDR400. If the part number ends in “AH”, “AG” or “AI”, then the CPU is an older revision and does not support DDR400 memory.
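The suffix rule can be sketched as a small lookup. The helper name is hypothetical, and the part numbers in the usage note are placeholders rather than real AMD part numbers; only the six suffixes named above are recognized, and anything else is reported as unknown rather than guessed.

```python
# Suffixes taken from the article; everything else here is illustrative.
DDR400_SUFFIXES = {"AL", "AK", "AM"}  # Rev C0, supports DDR400
OLDER_SUFFIXES = {"AH", "AG", "AI"}   # older revisions, no DDR400


def supports_ddr400(part_number: str):
    """Return True or False per the suffix rule, or None if the suffix is not listed."""
    suffix = part_number.strip().upper()[-2:]
    if suffix in DDR400_SUFFIXES:
        return True
    if suffix in OLDER_SUFFIXES:
        return False
    return None  # suffix not covered by the article
```

With a placeholder string such as `"XXXXXXXXAL"`, the function returns `True`; with `"XXXXXXXXAG"`, it returns `False`.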
For this review, we are only comparing Opteron 248 and 244 parts. We will take a look at the performance of the new 8xx parts in a 4-way configuration in our upcoming database server performance shootout.
Let’s Not Forget Intel
Intel also has made some new CPUs available since the last time we dusted off our Xeon test beds. The Xeon MP is now available in 1MB and 2MB L3 cache sizes at speeds of up to 2.8GHz.
The 2MB L3 version of the Xeon MP was actually the basis for the Pentium 4 Extreme Edition that was launched just a week before AMD released the Athlon 64 FX.
AnandTech Web Tests
One of the biggest issues in doing enterprise class web and database server benchmarking is that the test suites for conducting such benchmarks are simply too few and far between.
There are some benchmarks out there that are great for simulating extremely large workloads. However, there is very little designed for the vast majority of scenarios into which these 2-way and 4-way servers are deployed.
In order to compare the Opteron and Xeon in a real-world fashion, we took the time to create some of our own benchmarks. Necessity is the mother of invention, after all...
The Evolution of the Web Server
Web servers have evolved over the years from the traditional static delivery model to the dynamic model common to present web architecture. The static model was basically static text files with all of the web content pre-authored and ready for client delivery. Today, most web architectures are much more complex.
Present architecture usually consists of a web server, application server and a back-end database server. In most cases, the application server resides on the same physical machine as the web server (two-tier architecture), but in some cases, the application server resides on its own physical machine (three-tier architecture). The web server’s job is to deliver the content that is generated by the application server to the client. The application server either interprets code or executes compiled code written by the developer and hands it off to the web server for client delivery. As you can guess, the web architecture of today is much more strenuous on hardware as it has to actually work to deliver web content to the client instead of just spitting out HTML files.
There are numerous application servers used in today’s web applications. AnandTech uses Macromedia ColdFusion MX, which is simply a language that runs on top of a J2EE server (Macromedia JRUN in our case). J2EE is a Java based runtime platform that Sun Microsystems created, which allows developers to create enterprise level applications and deploy those applications on a standards based platform. Because of our familiarity and the fact that J2EE is a popular industry standard application server, we chose it for our web tests.
The Web Application Server Test Environment
With a web application server in place, we needed an application to put under load. We used FuseTalk Enterprise (http://www.fusetalk.com) as our web application as it’s written for ColdFusion MX and is based on the component architecture that most enterprise level applications consist of.
To put FuseTalk under load, we used WAS (Web Application Stress Tool), a free tool released by Microsoft that can record a user’s interaction with a web application and play it back as fast or as slow as required. We used a 200-user load and didn’t record any delay between requests. ColdFusion was set up to process 20 simultaneous threads, a number derived from CPU load. WAS was also set to deliver a maximum of 20 threads to ColdFusion at a time. The main goal was to simulate heavy load on the server in order to bring out differences between the platforms compared.
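To make the load shape concrete, here is a minimal sketch of 200 simulated users whose requests are capped at 20 in flight at any moment. It is not how WAS itself is implemented; the request function is a stand-in that only sleeps, and every name in the block is hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

USERS = 200         # simulated users, from the article
MAX_IN_FLIGHT = 20  # thread cap, from the article


class InFlightCounter:
    """Tracks how many requests are being processed at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.current -= 1
        return False


def fake_request(user_id, counter):
    """Stand-in for one recorded page request (no real network I/O)."""
    with counter:
        time.sleep(0.001)
    return user_id


def run_load(users=USERS, max_in_flight=MAX_IN_FLIGHT):
    """Issue one request per user with no think time; return (completed, peak in flight)."""
    counter = InFlightCounter()
    # The pool size is what enforces the cap on simultaneous requests.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        results = list(pool.map(lambda u: fake_request(u, counter), range(users)))
    return len(results), counter.peak
```

Calling `run_load()` completes all 200 requests while the peak number in flight never exceeds 20, which is the kind of saturating load described above.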
Once the application was under load, we recorded how long the hardware took to process a page from when the application server starts processing the template to when it hands it off to the client. Some web tests record from when the client requests the page to when the client receives the data. We think that method is flawed when testing how hardware impacts the performance of an application server. When testing web application server performance as it relates to hardware, you want to remove as many bottlenecks and variables as possible. If you measure how long it took to deliver the data to the client, you are relying on consistent network performance. Network performance can be a variable based on data collision and many other factors.
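The idea of timing on the server side, so that network transit never enters the measurement, can be sketched as a wrapper around whatever renders a page. This is an illustration only, not the instrumentation used for these tests; `TemplateTimer` and the render callable are hypothetical names.

```python
import time


class TemplateTimer:
    """Collects server-side processing times, in milliseconds."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so the timer can be tested deterministically
        self.durations_ms = []

    def timed(self, render):
        """Wrap a render callable so the duration of each call is recorded."""

        def wrapper(*args, **kwargs):
            start = self._clock()
            try:
                return render(*args, **kwargs)
            finally:
                # Stop the clock at hand-off, before anything is sent over the network.
                self.durations_ms.append((self._clock() - start) * 1000.0)

        return wrapper
```

Because the clock starts and stops inside the server process, collisions or congestion between server and client cannot skew the recorded times.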
The Database Server Bottleneck
There is almost always a bottleneck in hardware tests; ours was the database. The majority of web applications require a database server for data retrieval and storage. We used Microsoft SQL Server 2000 as our back-end database server for these tests. Prior web server tests that we’ve performed have been on single or dual CPU servers and have reflected accurately the performance of the hardware on which the tests were run. When we started running the tests on the servers, specifically the Opteron, we found that the tests had hit a wall and were not reflecting accurately the performance of the hardware. The issue was I/O, the largest bottleneck in any database server. We were putting so much load on the servers to get the CPU working hard that the database server I/O was maxed out. This, of course, caused the web tests to hit a performance plateau as each request run against the web application server has to wait for the database to complete its data transaction.
So how do you get rid of an I/O bottleneck? Well, if you have the resources, you throw more disks at the problem. The problem with that solution is it gets expensive to have massive RAID arrays that are just used for testing purposes and have to be continually upgraded as hardware improves. To get around this expensive issue, we used the fastest and least expensive storage medium available: memory. We created a RAMDISK and put the database on that new drive. RAMDISK software has been around for years and has matured into solutions that can be formatted as NTFS drives, which are easier to manage. Effectively, a RAMDISK partitions a portion of physical memory and allows it to be formatted as a drive letter that is available to the operating system.
Once we solved our bottleneck, the testing began and as you read on, you’ll see how the different platforms performed while under real-world web application load.
The Test
For the test, we had three systems: the Opteron and Xeon test beds, as well as a database server to feed the web servers being tested.
The database server common to all tests had the following components:
2 x Opteron 246 processors (2.0GHz)
4 x 512MB DDR333 DDR SDRAM modules
MSI K8T Master2-FAR
Microsoft Windows 2000 Server SP4
Microsoft SQL Server 2000 SP3
As we mentioned in the sections on setup of the tests, we used a RAMDISK for the database, so I/O performance was not an issue.
The Opteron test bed was configured as follows:
2 x Opteron 244 or 248 processors (1.8GHz/2.2GHz respectively)
8 x 512MB DDR333 DDR SDRAM modules
AMD reference 4-way Opteron motherboard
Microsoft Windows 2003 Enterprise Server with IIS6
Macromedia ColdFusion 6.1
The Xeon test bed was configured as follows:
2 x Xeon MP 2.0/2.8GHz processors
8 x 512MB DDR333 DDR SDRAM modules
Intel 4-way Xeon motherboard based on the ServerWorks GrandChampion HE chipset
Microsoft Windows 2003 Enterprise Server with IIS6
Macromedia ColdFusion 6.1
First Round K.O.
We measured performance using two metrics: the average time it took to fulfill a request to the web server, and the total number of templates (pages) served by the web server during the 30-minute test period. The two numbers are related, but both are useful to look at in order to get an idea of the real world difference in performance between the platforms.
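One rough way to see how the two metrics relate is the back-of-the-envelope estimate below. It assumes all 20 processing threads stay busy for the entire 30-minute window, which is a simplification and not something we measured; the function names are hypothetical, and the figures in the usage note are made up for illustration, not results from these tests.

```python
TEST_WINDOW_S = 30 * 60  # 30-minute test period, from the article


def average_ms(durations_ms):
    """Average request time in milliseconds."""
    return sum(durations_ms) / len(durations_ms)


def estimated_templates(avg_ms, threads=20, window_s=TEST_WINDOW_S):
    """Estimate pages served, assuming every thread is busy for the whole window."""
    # Each thread finishes (window / average time) requests; multiply by the thread count.
    return int(threads * window_s * 1000 / avg_ms)
```

Under that assumption, a made-up average of 100ms per request gives `estimated_templates(100.0)`, or 360,000 templates, and halving the average time doubles the estimate. That inverse relationship is why the two numbers tend to tell the same story from different angles.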
All of our tests were done on dual processor configurations. So, to make the charts easier to read, we omitted any 2-way labeling on the CPU names themselves.
Here, you can see the real-world performance advantages from another angle. Instead of looking at it as how much more responsive the Opteron server was, look at it from a standpoint of how many more people were able to access the site being hosted.
The performance, once again, speaks for itself. Just as the Athlon MP was a leader in web and database serving performance, the Opteron carries the torch for AMD this time around.
Keep in mind that web and database server applications are very sensitive to memory performance. So, although the Xeon attempts to hide larger memory access latencies with its 2MB L3 cache, the Opteron’s on-die memory controller helps improve performance significantly. The Opteron’s TLB optimizations work alongside the on-die memory controller to ensure that accesses to main memory (which will happen more frequently on the Opteron than on the Xeon because of the absence of any L3 cache) occur as quickly as possible.
Final Words
What gave the Athlon MP its performance advantage was AMD’s short-pipeline, high IPC (Instructions Per Clock) architecture. What we saw at the beginning of the Athlon MP vs. Xeon matches back in 2001 was that AMD was trouncing Intel without even breaking a sweat. However, as the Xeon ramped up in clock speed, the performance gap narrowed and the advantage later shifted towards Intel.
With the Opteron, we are seeing an even more devastating advantage for AMD because, this time around, AMD isn’t only relying on a higher IPC core to gain the upper hand. The Opteron’s on-die memory controller is one of the biggest assets that the CPU has in the server environment, and as you can see by the performance results we’ve shown here today, it is an asset that is more valuable than the Xeon’s Hyper-Threading.
The choice today is clear. In 2-way configurations, the Opteron is a much more powerful and capable web server than Intel’s Xeon. But the performance tests are nowhere near over. We’ve been playing around with AMD’s 4-way Opteron 848 machines for months now and are not far away from bringing you the first head-to-head comparison between the Opteron 848 and a 4-way Intel Xeon MP system. AMD has been praising their Opteron architecture for MP scalability, and soon, we’ll be putting their claims to the test.
The true test that remains, however, is a test comparing AMD’s Opteron to Intel’s Itanium 2. Intel was not very receptive to the idea of doing a head-to-head; not out of a fear of losing, but out of a desire not to lend AMD any credibility by showing that the Opteron is indeed a competitor to the Itanium 2. While we do believe that the Itanium 2 in its 128-way configurations is definitely out of the Opteron’s league, in the 2-way and 4-way configurations that we are interested in comparing, the two are absolutely competitors.
Whether Intel is looking to supply us with an Itanium 2 system or not, we will make that comparison. It seems that if these web server results are any early indication, AMD has more than enough credibility with the Opteron to at least step up to bat with the Itanium 2 pitching.