Original Link: https://www.anandtech.com/show/1473




Introduction

Over a year and a half has passed since AMD announced their K8 architecture to the world, and what has changed? Well, "a heck of a lot" is the answer. The Opteron has proven itself as a worthy competitor to the infamous Intel Xeon line-up of processors, and Intel has been following AMD for a change, something no one could have predicted a few years ago. Dual-core processors are on the horizon; AMD demonstrated theirs with Hewlett Packard just a few weeks ago, and Intel demonstrated theirs at the Intel Developers Conference in September 2004.

So, we've seen AMD compete in both the desktop and server markets, but does this translate into a victory in corporate America? Well, it has certainly piqued their interest enough for Intel to comment about it in a recent news.com article. Itanium hasn't been quite the success that Intel was hoping for, but that doesn't mean that AMD has the server market by the reins quite yet, not even close. AMD still has an uphill battle to fight, with Intel owning over 80% of the PC processor market, and AMD owning about 15% as of August 2004. To AMD's credit, they have signed a few of the first-tier customers like HP, IBM and Sun, and last November, AMD announced their new manufacturing plant in Dresden to keep up with demand.

One thing is for certain: neither of the processor giants is sitting around taking its success lightly. Well aware that AMD is knocking on the door, Intel has finally released the new Nocona line of Xeons, which follow in the 64bit footsteps of AMD with EM64T. AMD has released their latest Opteron clock increase with the new 250 line of processors, which is a 2.4GHz Opteron for those who prefer the clock speed version.




Nocona - New Life into the Xeon Line-up

When AMD first broke news of their K8 announcement, Intel basically denounced AMD's move, stating that it was premature and the world wasn't ready for it. OK, so Intel was half right on the software side of things. The Windows world is still punting along at 32bits, while the Unix gang have embraced 64bit computing like a new flavor of coffee at Starbucks. Microsoft has promised that we'll have 64bit versions of Windows XP and Windows 2003 Server sometime next year. Microsoft is also readying 64bit versions of SQL Server and the .NET framework.

Although the 64bit landscape is currently bleak for Windows users, that didn't stop Intel from conceding that AMD was stealing some thunder from their server line-up. In February 2004, Intel announced their first processor that runs both 32bit and 64bit applications (Nocona), along with EM64T, their own name for AMD's x86-64. The Nocona processor is essentially a re-badged Pentium F Prescott processor with validated multiprocessor support. If you're interested in the nitty-gritty on the architecture of the Nocona, read our extensive article covering the Prescott architecture written in February 2004. The highlights of the new Nocona processor are a front side bus jump to 800MHz and an increase of the entry-level processors' L2 cache to 1MB.

Along with Nocona come Intel's new chipsets: the E7525 Tumwater, which is targeted at the workstation market with PCI-Express x16 graphics, and the E7520/7320 Lindenhurst, which are targeted at the server market. We have both chipsets in the lab, but obviously used the E7520 Lindenhurst server chipset for this comparison.

Opteron 250

The Opteron 250 is yet another clock speed increase in the Opteron line, taking clock speed from 2.2GHz to 2.4GHz. The 250 is still built on AMD's 130nm fab process, and we should see 90nm Opterons by year's end.




Hyper Threading

Intel's Hyper Threading technology has been widely accepted in the enterprise and desktop markets, to the point where the vast majority of systems ship with Hyper Threading enabled.

Our tests have shown that Hyper Threading improves performance by 3% to 5% on average, so we left it enabled for all of our tests here.

The Tests

We ran two sets of tests for this comparison: an updated version of our own home-grown tests on the AnandTech Forums Database, as well as another more strenuous test representative of enterprise-class transactional database serving applications. We will discuss the two tests in greater detail in the coming pages, but first, the basic hardware configuration for our tests:

Opteron System
Dual 250 Opteron processors
4GB PC3200 DDR (Kingston KRX3200AK2) memory
Tyan K8W motherboard
Windows 2003 Enterprise Server (32 Bit)
8 x 36GB 15,000RPM Ultra320 SCSI drives in RAID-0

Xeon System
Dual 3.6GHz Xeon processors
4GB DDR2 memory
Intel SE7520AF2 motherboard
Windows 2003 Enterprise Server (32 Bit)
8 x 36GB 15,000RPM Ultra320 SCSI drives in RAID-0




Constructing a Database Benchmark (average load)

Our first benchmark was custom-written in .NET, using ADO.NET to connect to the database. The AnandTech Forums database, which was over 14GB in size at the time of the benchmark, was used as the source database. We'll dub this benchmark tool "SQL Loader" for the purposes of discussing what it does.

SQL Loader allows us to specify the following: an XML-based workload file for the test, how long the test should run, and how many threads it should use to load the database. The XML workload file contains the queries that we want executed against the database, and some random ID generator queries that populate a memory-resident array with IDs to be used in conjunction with our workload queries. The purpose of using random IDs is to keep the test as real-world as possible by selecting random data. This test should give us a lot of room for growth, as the workload can be whatever we want in future tests.

Example workload:

<workload>
  <!-- A SAMPLE WORKLOAD QUERY THAT RETURNS ALL THE FIELDS FROM THE PRIVATEMESSAGES TABLE RANDOMLY -->
  <query>
    <code>select * from privatemessages where imessageid = @pmessageid</code>
    <type>read</type>
    <randkey>pmessageid</randkey>
  </query>
  <!-- RANDOM ID GENERATOR FOR SELECTING RANDOM PRIVATE MESSAGES -->
  <randomid>
    <rcode>select imessageid,newid() as pmsgid from privatemessages order by pmsgid</rcode>
    <name>pmessageid</name>
  </randomid>
</workload>
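To make the workload format more concrete, here is a minimal Python sketch of how a tool might consume such a file. This is not the SQL Loader source (which was written in .NET); the function names and the fake_run_sql stand-in for a real database call are hypothetical, and the sketch only assumes the element names shown in the sample above.

```python
import random
import xml.etree.ElementTree as ET

# Sample workload in the same shape as the example in the article.
WORKLOAD_XML = """
<workload>
  <query>
    <code>select * from privatemessages where imessageid = @pmessageid</code>
    <type>read</type>
    <randkey>pmessageid</randkey>
  </query>
  <randomid>
    <rcode>select imessageid,newid() as pmsgid from privatemessages order by pmsgid</rcode>
    <name>pmessageid</name>
  </randomid>
</workload>
"""


def load_workload(xml_text):
    """Split a workload document into its queries and random ID generators."""
    root = ET.fromstring(xml_text)
    queries = [
        {
            "code": q.findtext("code"),
            "type": q.findtext("type"),
            "randkey": q.findtext("randkey"),
        }
        for q in root.findall("query")
    ]
    generators = {g.findtext("name"): g.findtext("rcode") for g in root.findall("randomid")}
    return queries, generators


def fill_id_pools(generators, run_sql):
    """Run each generator query once and keep its first column in memory."""
    return {name: [row[0] for row in run_sql(sql)] for name, sql in generators.items()}


def next_call(queries, id_pools, rng):
    """Pick a workload query and bind a random ID from the pool named by randkey."""
    query = rng.choice(queries)
    key = query["randkey"]
    return query["code"], {"@" + key: rng.choice(id_pools[key])}


def fake_run_sql(sql):
    # Hypothetical stand-in for a database call: returns (imessageid, pmsgid) rows.
    return [(i, None) for i in range(1, 101)]


queries, generators = load_workload(WORKLOAD_XML)
id_pools = fill_id_pools(generators, fake_run_sql)
sql, params = next_call(queries, id_pools, random.Random(0))
```

In this sketch, the randkey of a query is matched against the name of a randomid generator, which is how we read the sample file; each worker thread would then call next_call in a loop for the duration of the test.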


A screenshot of the SQL Loader


Test Information

The workload used for the test was based on everyday use of the Forums, which are running FuseTalk. We took the most popular queries and put them in the workload. Functions such as reading threads and messages, getting user information, inserting threads and messages, and reading private messages were in the spotlight. Each iteration of the test was run for 10 minutes, with the first being from a cold boot. SQL Server was restarted in between each test that was run consecutively.

The importance of this test is that it is as real world as you can get; for us, the performance in this test directly influences what upgrade decisions we make for our own IT infrastructure.




AnandTech Forums Database Test Results

It appears that the front side bus increase for the Nocona has helped performance quite a bit with this workload; an 11% increase in performance (over the 3.2GHz Prestonia Xeon) for a 12.5% increase in clock speed is quite good. The difference in performance between the Opteron 250 and the Nocona 3.6 is approximately 2%, which is also our tolerance for deviation between test runs. For this test, we'd have to call it a draw between the Opteron and Nocona.
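The scaling figures quoted above can be checked with a few lines of arithmetic. The short Python sketch below uses only the numbers from this paragraph (3.2GHz to 3.6GHz, an 11% performance gain); the helper names are our own, and the ratio it prints is a rough indicator only, since the front side bus speed changed along with the clock.

```python
def percent_gain(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0


def scaling_efficiency(perf_gain_pct, clock_gain_pct):
    """Fraction of a clock speed increase that shows up as performance."""
    return perf_gain_pct / clock_gain_pct


clock_gain = percent_gain(3.2, 3.6)  # 3.2GHz Prestonia to 3.6GHz Nocona
efficiency = scaling_efficiency(11.0, clock_gain)
print(f"clock gain: {clock_gain:.1f}%, scaling efficiency: {efficiency:.2f}")
```

By this measure, roughly 88% of the clock speed increase shows up as extra throughput in the Forums workload.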

SQL Stress Tool Benchmark (Reads)

SQL Stress Tool Benchmark (Writes)




"Order Entry" Stress Test: Measuring Enterprise Class Performance

One complaint we've historically received about our Forums database test was that it isn't strenuous enough for some of the Enterprise customers to make a good decision based on the results.

In our infinite desire to please everyone, we worked very closely with a company that could provide us with a truly Enterprise Class SQL stress application. We cannot reveal the identity of the Corporation that provided us with the application because of non-disclosure agreements in place. As a result, we will not go into specifics of the application, but rather provide an overview of its database interaction so that you can grasp the profile of this application, and understand the results of the tests better (and how they relate to your database environment).

We will use an Order Entry system as an analogy for how this test interacts with the database. All interaction with the database is via stored procedures. The main stored procedures used during the test are:
sp_AddOrder - inserts an Order
sp_AddLineItem - inserts a Line Item for an Order
sp_UpdateOrderShippingStatus - updates an Order's status to "Shipped"
sp_AssignOrderToLoadingDock - inserts a record to indicate from which Loading Dock the Order should be shipped
sp_AddLoadingDock - inserts a new record to define an available Loading Dock
sp_GetOrderAndLineItems - selects all information related to an Order and its Line Items
The above is only intended as an overview of the stored procedure functionality; obviously, the stored procedures also perform other validation and audit operations.

Each Order had a random number of Line Items, ranging from one to three. Also randomized were the Line Items chosen for an order, from a pool of approximately 1500 line items.
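As an illustration of that randomization, here is a minimal Python sketch. It is not the vendor's code, which we cannot show; the constant and function names are hypothetical, the pool size is taken as exactly 1500, and we assume the line items within a single order are distinct, which the description above does not specify.

```python
import random

LINE_ITEM_POOL_SIZE = 1500  # "approximately 1500" in the description above
MIN_ITEMS, MAX_ITEMS = 1, 3


def make_order(rng):
    """Return the line item IDs for one randomly generated order."""
    count = rng.randint(MIN_ITEMS, MAX_ITEMS)  # inclusive on both ends
    # Assumption: items within one order are distinct, so sample without replacement.
    return rng.sample(range(1, LINE_ITEM_POOL_SIZE + 1), count)


rng = random.Random(42)  # fixed seed so a run can be repeated
orders = [make_order(rng) for _ in range(1000)]
```

Seeding the generator is a design choice of the sketch: it keeps the sequence of orders identical from run to run, which helps when comparing platforms.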

Each test was run for 10 minutes and was repeated three times. The average of the three runs was used. The ratio of Reads to Writes was maintained at 10 reads for every write. We debated for a long while about which ratio of reads to writes would best serve the benchmark, and we decided that there was no correct answer. So, we went with 10.

The application was developed using C#, and all database connectivity was accomplished using ADO.NET and 20 threads - 10 for reading and 10 for inserting.
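The thread layout described above can be sketched in Python as follows. This is an assumption-laden illustration rather than the actual C# application: call_procedure is a hypothetical stand-in that only counts calls instead of executing a stored procedure, and the 10-to-1 ratio is enforced here with fixed iteration counts, whereas the real test ran for a fixed 10 minutes and we do not describe how it held the ratio.

```python
import threading

READ_THREADS = 10
WRITE_THREADS = 10
READS_PER_WRITE = 10


class CallCounter:
    """Thread-safe tally of stored procedure calls by kind."""

    def __init__(self):
        self._lock = threading.Lock()
        self.counts = {"read": 0, "write": 0}

    def record(self, kind):
        with self._lock:
            self.counts[kind] += 1


def call_procedure(name, counter, kind):
    # Hypothetical stand-in: a real client would execute the stored procedure here.
    counter.record(kind)


def reader(counter, iterations):
    for _ in range(iterations):
        call_procedure("sp_GetOrderAndLineItems", counter, "read")


def writer(counter, iterations):
    for _ in range(iterations):
        call_procedure("sp_AddOrder", counter, "write")


def run_load(writes_per_thread):
    """Start 10 reader and 10 writer threads and return the call counts."""
    counter = CallCounter()
    reads_per_thread = writes_per_thread * READS_PER_WRITE
    threads = [
        threading.Thread(target=reader, args=(counter, reads_per_thread))
        for _ in range(READ_THREADS)
    ]
    threads += [
        threading.Thread(target=writer, args=(counter, writes_per_thread))
        for _ in range(WRITE_THREADS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.counts


counts = run_load(writes_per_thread=50)
```

Dividing the resulting counts by the elapsed time of a run gives a stored-procedures-per-second figure.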

To ensure that IO was not the bottleneck, each test was started with an empty database that had been expanded in advance so that autogrow activity did not occur during the test. Additionally, a gigabit switch was used between the client and the server. During the execution of the tests, no other applications or monitoring software were running on the server. Task Manager, Profiler, and Performance Monitor were used when establishing the baseline for the test, but never during execution of the tests.

At the beginning of testing on each platform, both the server and the client workstation were rebooted to ensure a clean and consistent environment. The database was always copied to the 8-disk RAID 0 array with no other files present to ensure that file placement and fragmentation were consistent between runs. In between each of the three tests, the database was deleted, and the empty one was copied again to the clean array. SQL Server was not restarted.




Order Entry Stress Test Results

Here is where the architectural differences between the Opteron and Nocona are highlighted. The Opteron 250 managed a 7% gain in performance over the 248, impressive scaling for a clock speed increase of roughly 9% (2.2GHz to 2.4GHz). The Nocona story doesn't read as well here as it did in the small-to-medium workload tests; a bus saturated by the Xeon's shared FSB implementation is written all over the enterprise test results. Compared with the 3.2GHz Prestonia, the 3.6GHz Nocona barely managed a 1% gain in performance, which is really within our deviation between runs. The longer pipeline of the Prescott core, combined with a saturated bus, is not helping the Nocona when the going gets tough. Of course, the Nocona is suffering from 1MB less of L2 Cache than the Prestonia Xeon. It's hard to say how much of a difference that makes at this point.

Vendor Heavy Workload Test (Reads)

Vendor Heavy Workload Test (Writes)

To give you an idea of the scale of this benchmark, we have graphs of stored procedures calls per second. We decided to focus on Stored Procedures/Second rather than Transactions/Second, since the definition of a Transaction can have a business context or a technical context.

Vendor Heavy Workload Stored Procedures




Final Words

Intel managed to raise the bar in our small-to-medium tests, and matched the Opteron 250's performance. Thanks to a 400MHz clock increase and a 266MHz FSB increase, Intel is competitive in a small-to-medium load pattern like the one our test simulates. Although the results in the enterprise workload were interesting, they really aren't all that surprising. As we said in the "AMD Opteron vs. Intel Xeon: Database Performance Shootout" article, the Xeon's shared FSB implementation is holding back performance. The longer pipeline of the Prescott core was also a factor, as the Nocona 3.6 barely managed a 1% lead in performance over the Prestonia 3.2GHz part.

AMD shows strength in architecture again; their point-to-point HyperTransport and the on-die memory controller are the pillars of AMD's server architecture. The question is, does it translate into market share? Time will tell on this one. Hopefully, the results that we are illustrating here will encourage IT directors and those responsible for implementation to educate themselves on the processor architectures available to them.
