Original Link: http://www.anandtech.com/show/1615
Intel Xeon 3.6 2MB vs AMD Opteron 252 Database Test by Jason Clark & Ross Whitehead on February 14, 2005 8:00 AM EST
- Posted in IT Computing
It's been five months since either of the processor giants released a new server processor. Today, both Intel and AMD have new offerings. Intel has updated its 3.6GHz Xeon with an additional 1MB of L2 cache, and AMD has bumped its fastest Opteron up 200MHz to 2.6GHz with the Opteron 252. Neither upgrade is groundbreaking, but both offer some performance increases, especially the 2MB Xeon. We'll see more significant releases later this year from both manufacturers with their dual-core offerings.
Instead of a clock increase, Intel decided to throw some cache at the existing 3.6GHz Xeon. In one of our previous articles, we took a look at a 4MB Gallatin Xeon and compared it to an Opteron. The results showed that the Gallatin's 4MB cache didn't deliver any large gains over the Opteron with its 1MB of L2 cache. The main reason was the 400MHz bus, which starved the Gallatin of precious bandwidth. Times have changed; Intel recognized the bandwidth issue, and today, an extra 1MB of L2 cache on the 800MHz bus of the Nocona and Irwindale Xeons does make a difference. Of course, the difference depends entirely on the workload, as we'll explain when we reveal our results.
The Opteron 252 is mostly a clock speed increase from 2.4GHz to 2.6GHz, but there are a few other differences worth mentioning. The packaging has changed on the new 252 from ceramic to organic - you can see the difference between a 250 and a 252 below. Aside from the packaging, AMD has also added SSE3 instructions, increased the HyperTransport frequency to 1GHz, and moved the 252 to a 90nm process. As for AMD's dual-core roadmap, it remains on schedule for mid-2005. Dual-core Opterons will be socket compatible with existing 940-pin sockets that support 90nm parts (95W/80A).
64-bit SQL Server Tests?
In our recent SQL articles, we've been asked two questions: "Where are the 64-bit tests?" and "Who cares about 32-bit tests?" First, we're right on top of 64-bit testing for SQL Server - remember that this application is still in beta. As for the second question, the large majority of SQL Server database servers run on 32-bit platforms, so a lot of people do care. That said, 64-bit SQL Server is definitely sought after, and we will provide coverage as soon as we can.
Tyan K8SRE S2891
The Tyan K8SRE is the latest Opteron server board from a well-known motherboard manufacturer, Tyan. The K8SRE features NVIDIA's nForce Professional 2200 core logic. For more information on the nForce Professional chipset, check out Derek Wilson's excellent coverage. Overall, the Tyan board performed well in our tests. We did, however, have some compatibility issues with our Crucial memory on this board. Some minor BIOS tweaks got us up and running, and stable. We'd recommend sticking to memory that is officially supported by Tyan to avoid compatibility issues - ours was not on the recommended list.
1GHz HyperTransport Support
According to AMD, the 252 supports a 1GHz HyperTransport bus frequency. The Tyan board automatically sets the HyperTransport frequency to 800MHz, which is what we used for our tests. We did, however, manually force the HyperTransport frequency to 1GHz using NVIDIA's nTune, and there was no difference in performance in any of our tests.
Test Software Configuration
Windows 2003 was configured with the /3GB and /PAE switches in boot.ini to support the 8GB of memory used for our tests. SQL Server Enterprise was set to use AWE extensions, and its maximum memory limit was set to 6144MB.
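As a rough sketch of that SQL Server side of the configuration (the /3GB and /PAE switches themselves live in boot.ini), the AWE and memory-cap settings correspond to something like:

```sql
-- Enable advanced options so the AWE and memory settings are visible
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Allow 32-bit SQL Server to address memory above 4GB via AWE
-- (requires the /PAE switch in boot.ini, as noted above)
EXEC sp_configure 'awe enabled', 1;
RECONFIGURE;

-- Cap SQL Server at 6144MB, matching the limit used for these tests
EXEC sp_configure 'max server memory', 6144;
RECONFIGURE;
```

Note that the AWE setting takes effect only after the SQL Server instance is restarted.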
Test Hardware Configuration
Intel Xeon System
3.6 GHz Nocona 1MB L2
3.6 GHz Nocona 2MB L2
Intel SE7620AF2 Motherboard
8GB Crucial PC2-3200 DDR2 Memory
Windows 2003 Enterprise Server (32 Bit)
8 x 36GB 15,000RPM Ultra320 SCSI drives in RAID-0
LSI Logic 320-2 SCSI Raid Controller
AMD Opteron System
Tyan K8SRE S2891 Motherboard
8GB Crucial PC3200 DDR Memory
Windows 2003 Enterprise Server (32 Bit)
8 x 36GB 15,000RPM Ultra320 SCSI drives in RAID-0
LSI Logic 320-2 SCSI Raid Controller
SQL Stress Tool Benchmark
Our first benchmark was custom-written in .NET, using ADO.NET to connect to the database. The AnandTech Forums database, which was over 14GB in size at the time of the benchmark, served as the source database. We'll dub this benchmark tool "SQL Stress Tool" for the purposes of discussing what it does. We've updated the tool since we first used it; it now supports Oracle and MySQL. We also lengthened the test time, for this test and future ones, to 20 minutes. The reason was to ensure that we use as much memory as possible for future planned 64-bit tests.
SQL Stress allows us to specify the following: an XML-based workload file for the test, how long the test should run, and how many threads to use to load the database. The XML workload file contains the queries that we want executed against the database, plus some random ID generator queries that populate a memory-resident array with IDs to be used in conjunction with our workload queries. The purpose of using random IDs is to keep the test as real-world as possible by selecting random data. This test should give us a lot of room for growth, as the workload can be whatever we want in future tests.
Example Random ID Generator:
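The generator query itself appeared as an image in the original article. As an illustration only - the table, column, and function names below are hypothetical, not taken from our actual workload - the mechanism of preloading a pool of random IDs and binding them into workload queries can be sketched in Python:

```python
import random

# Stand-in for the array populated by a "random ID generator" query,
# e.g. something in the spirit of:
#   SELECT TOP 1000 ThreadID FROM Thread ORDER BY NEWID()
# (table and column names are illustrative only)
id_pool = [random.randint(1, 500_000) for _ in range(1000)]

# A workload query template, as it might appear in the XML workload file
TEMPLATE = "SELECT * FROM Thread WHERE ThreadID = {id}"

def next_query():
    """Bind a randomly chosen, preloaded ID into the workload query."""
    return TEMPLATE.format(id=random.choice(id_pool))
```

Preloading the IDs keeps the random selection out of the measured query path, so each workload query still hits random rows without the generator itself skewing the timings.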
The workload used for the test was based on everyday use of the Forums, which run FuseTalk. We took the most popular queries and put them in the workload. Functions such as reading threads and messages, getting user information, inserting threads and messages, and reading private messages were in the spotlight. Each iteration of the test ran for 20 minutes, with the first run from a cold boot. SQL Server was restarted between consecutive test runs.
The importance of this test is that it is as real-world as it gets; for us, performance in this test directly influences the upgrade decisions that we make for our own IT infrastructure.
SQL Stress Results
The SQL Stress results have changed somewhat from our earlier articles using this tool. We revamped the tool itself, making it more efficient on high-volume queries. We also lengthened the test time to 20 minutes and adjusted the queries to reflect our current FuseTalk version. The new 2MB L2 Xeon part did quite well here, churning out a 7% gain over its 1MB counterpart. The Opteron 252 gained its usual 7% from its 200MHz clock increase. There was no gain from the 1GHz HT link, as discussed in our test hardware configuration on page 2. Overall, the new 2MB Xeon was the hands-down winner of this test with a 13% lead over the Opteron 252, thanks to its 1MB cache boost.
Total queries executed
The number of queries that were executed throughout the duration of the test.
Queries per second
An average of how many queries per second were executed throughout the duration of the test.
"Order Entry" Stress Test: Measuring Enterprise Class Performance
One complaint that we've historically received regarding our Forums database test is that it isn't strenuous enough for Enterprise customers to base sound decisions on.
In our infinite desire to please everyone, we worked very closely with a company that could provide us with a truly Enterprise Class SQL stress application. We cannot reveal the identity of the Corporation that provided us with the application because of non-disclosure agreements in place. As a result, we will not go into specifics of the application, but rather provide an overview of its database interaction so that you can grasp the profile of this application, and understand the results of the tests better (and how they relate to your database environment).
We will use an Order Entry system as an analogy for how this test interacts with the database. All interaction with the database is via stored procedures. The main stored procedures used during the test are:
sp_AddOrder - inserts an Order
sp_AddLineItem - inserts a Line Item for an Order
sp_UpdateOrderShippingStatus - updates a status to "Shipped"
sp_AssignOrderToLoadingDock - inserts a record to indicate from which Loading Dock the Order should be shipped
sp_AddLoadingDock - inserts a new record to define an available Loading Dock
sp_GetOrderAndLineItems - selects all information related to an Order and its Line Items
The above is only intended as an overview of the stored procedure functionality; obviously, the stored procedures perform other validation and audit operations as well.
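The real procedures are covered by the non-disclosure agreement, but a minimal sketch of what an sp_AddOrder-style procedure might look like (the schema and column names here are entirely hypothetical) is:

```sql
-- Hypothetical sketch only; the actual procedure also performs
-- validation and audit operations, as noted above.
CREATE PROCEDURE sp_AddOrder
    @CustomerID int,
    @OrderID    int OUTPUT
AS
BEGIN
    INSERT INTO Orders (CustomerID, OrderStatus, CreatedAt)
    VALUES (@CustomerID, 'New', GETDATE());

    -- Return the new Order's identity so line items can reference it
    SET @OrderID = SCOPE_IDENTITY();
END
```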
Each Order had a random number of Line Items, ranging from one to three. Also randomized were the Line Items chosen for an Order, drawn from a pool of approximately 1500 line items.
Each test ran for 10 minutes and was repeated three times; the average of the three runs was used. The ratio of reads to writes was maintained at 10 reads for every write. We debated for a long while about which read-to-write ratio would best serve the benchmark and decided that there was no single correct answer. So, we went with 10.
The application was developed using C#, and all database connectivity was accomplished using ADO.NET and 20 threads - 10 for reading and 10 for inserting.
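As a minimal sketch of that threading model - shown here in Python rather than the C# the tool was actually written in, with counters standing in for the stored-procedure calls and arbitrary operation counts - the 10-reader/10-writer layout with its 10:1 read-to-write ratio might look like:

```python
import random
import threading

LINE_ITEM_POOL = list(range(1, 1501))  # pool of ~1500 line items

def make_order():
    """Build an order with one to three randomly chosen line items."""
    return {"line_items": random.sample(LINE_ITEM_POOL, random.randint(1, 3))}

counts = {"reads": 0, "writes": 0}  # stand-ins for the sp_* call counts
lock = threading.Lock()

def writer(n_orders):
    # Stands in for sp_AddOrder / sp_AddLineItem calls
    for _ in range(n_orders):
        make_order()
        with lock:
            counts["writes"] += 1

def reader(n_orders):
    # Stands in for sp_GetOrderAndLineItems; 10 reads for every write
    for _ in range(n_orders * 10):
        with lock:
            counts["reads"] += 1

# 10 reading threads and 10 inserting threads, as in the test harness
threads  = [threading.Thread(target=writer, args=(100,)) for _ in range(10)]
threads += [threading.Thread(target=reader, args=(100,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```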
To ensure that IO was not the bottleneck, each test started with an empty database that had been pre-expanded so that auto-grow activity would not occur during the test. Additionally, a gigabit switch was used between the client and the server. During the execution of the tests, no applications or monitoring software were running on the server. Task Manager, Profiler, and Performance Monitor were used when establishing the baseline for the test, but never during test execution.
At the start of testing on each platform, both the server and the client workstation were rebooted to ensure a clean and consistent environment. The database was always copied to the 8-disk RAID 0 array with no other files present, to ensure that file placement and fragmentation were consistent between runs. Between each of the three tests, the database was deleted and the empty one was copied again to the clean array. SQL Server was not restarted.
"Order Entry" Stress Test results
Our vendor test has received quite a bit of interest from certain processor vendors - rightfully so, as the workload is quite difficult to recreate. As you can see from the results below, we have a completely different outcome from the SQL Stress results. The extra 1MB of L2 cache on the Xeon made a significant difference. In a test formerly dominated by the Opteron, the Xeon now takes a 12% lead. This test obviously benefits from the added cache, and the 800MHz front side bus does a much better job of feeding it than the slower buses of earlier Xeon platforms. In a previous article, we tested a 4MB Xeon part, and it barely managed a 3% gain over the Opteron - times have changed.
To give you an idea of the scale of this benchmark, we have graphs of stored procedures calls per second. We decided to focus on Stored Procedures / Second rather than Transactions / Second, as the definition of a Transaction can have a business context or a technical context.
Data Warehouse Test Explained
We are always looking to improve the quality of our reviews and as a result, we have added a new Stress Test to our suite.
This "Data Warehouse" test focuses on large record sets with plenty of aggregation. It is based on a system that we developed to track and manage request statistics for www.AnandTech.com and Forums.AnandTech.com. It tracks statistics such as Requests/Hour, Requests/Hour/IP Address, Unique IP Addresses/Hour, Unique Users/Hour, and daily browser stats. These stats are further summarized by site, i.e., www or Forums.
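To illustrate the aggregation-heavy profile of this workload (the table and column names here are hypothetical, not from our actual system), a Requests/Hour-style query over a large request log might look like:

```sql
-- Illustrative only: hourly request counts per site,
-- scanning a large fact table with heavy aggregation
SELECT Site,
       DATEPART(hour, RequestTime) AS RequestHour,
       COUNT(*)                    AS Requests,
       COUNT(DISTINCT ClientIP)    AS UniqueIPs
FROM RequestLog
GROUP BY Site, DATEPART(hour, RequestTime)
```

Queries of this shape scan and group hundreds of megabytes of rows, which is why they stress memory bandwidth rather than cache reuse.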
As with the other stress tests, each test was repeated three times and the average of the three runs was used. For this Data Warehouse stress test, we defined a quantity of work to complete and measured how long each platform required to process the workload.
To ensure that IO was not the bottleneck, each test started with a database, including tempdb, that had already been expanded so that auto-grow activity would not occur during the test. During the execution of the tests, no applications or monitoring software were running on the server. Task Manager, Profiler, and Performance Monitor were used when establishing the baseline for the test, but never during test execution.
At the start of testing on each platform, the server was rebooted to ensure a clean and consistent environment. The database was always copied to the 8-disk RAID 0 array with no other files present, to ensure that file placement and fragmentation were consistent between runs. Between each of the three tests, the database was deleted, the original database was copied back to the array, and SQL Server was restarted.
There is no "client" required for this test. The workload is initiated by a stored procedure call from Query Analyzer.
Data Warehouse results
As the results show, doubling the L2 cache did not improve the Xeon's performance in this test. We speculate that this is because the test consists of approximately 10 long-running queries processing datasets that are hundreds of megabytes in size, leaving little opportunity to reuse the L2 cache. The Opteron, on the other hand, excelled thanks to its on-die memory controller. Both platforms exhibited approximately 50% CPU utilization across all CPUs, which demonstrates that you do not have to be CPU bound to benefit from faster CPUs.
Price comparison & Final Words
In previous articles, we've looked at the cost of the processor itself. Since servers aren't just about the processor, we've extended our pricing to the entire platform. We attempted to spec out Intel and AMD servers from two different vendors and keep them as close as possible in terms of features. There are obviously a few differences here and there, but as illustrated below, the price difference between the platforms is negligible once you take into account the features missing from each. Note that we are comparing dual Intel 3.6GHz 1MB L2 based servers against dual Opteron 250 servers, since the newer products discussed in this article are not yet in the retail channel.
|HP ProLiant DL360 SCSI||HP ProLiant DL145 SCSI||IBM xSeries 336||IBM eServer 326|
|CPU||Dual 3.6 GHz 1MB L2||Dual Opteron 250 (2.4 GHz)||Dual 3.6 GHz 1MB L2||Dual Opteron 250 (2.4 GHz)|
|Hard Drive||36.4GB Pluggable Ultra320 (15,000 RPM)||36.4GB Non-Pluggable Ultra320 (15,000 RPM)||IBM 36GB 2.5" 10K SCSI HDD HS||36GB 10K U320 SCSI HS Option|
|SCSI Controller||Smart Array 6i Plus controller (onboard)||Dual Channel Ultra 320 SCSI HBA||Integrated Single-Channel Ultra320 SCSI Controller (Standard)||Integrated Single-Channel Ultra320 SCSI Controller (Standard)|
|Bays||Two Ultra 320 SCSI Hot Plug Drive Bays||Two non-hot plug hard drive bays||4 hot swap bays||2 hot swap bays|
|Network||NC7782 PCI-X Gigabit NICs (embedded)||Broadcom 5704 Gigabit NICs (embedded)||Dual integrated 10/100/1000 Mbps Ethernet (Standard)||Dual integrated 10/100/1000 Mbps Ethernet (Standard)|
|Power||460W hot pluggable power supply||500W non hot plug power supply||585W power supply||411W Power Supply (Standard)|
|Server Management||SmartStart & Insight Manager||None||System Management Processor (Standard)||System Management Processor (Standard)|
We've illustrated how workload has a significant effect on platform choice when it comes to database servers. Obviously, for a small to medium business, where several different workloads run on the same server, it doesn't make sense to choose a platform architecture best suited for data warehousing alone. But for larger organizations, where multiple database servers each serve a specific purpose, the choice of platform could have a significant impact on performance. For dual-processor applications, Intel leads the way in everyday small to heavy transactional workloads, whereas AMD shines on the analytical side of database applications, such as data warehousing.
These results do raise questions about what exactly is going on during each test at an architectural level. Is the processor waiting on data from the L2 cache? Are the branch prediction units not suited to a particular workload? Is memory latency the bottleneck? We want these questions answered, and we are going to investigate ways to provide concrete answers to these tough questions in the future.