Name: Dual CPU Database Server Comparison
Author: Johan De Gelas

Original Link: https://www.anandtech.com/show/1559

Dual CPU Database Server Comparison

by Johan De Gelas on December 2, 2004 12:11 AM EST

Posted in
IT Computing

Introduction

Despite its incredible importance, it is difficult to find independent hardware advice on database servers. Only a few major hardware and software vendors publish the majority of the TPC and other benchmark numbers. Although a discussion on TPC benchmarks is beyond the scope of this article, it is clear that there is no substitute for independent benchmarking.

Benchmarks that vendors provide tend to be too rosy or perhaps even flawed. Vendors may use hardware setups or software configurations that are unlikely to exist in the real world, yet attain the highest score on a particular benchmark. The benchmarking done by Jason and Ross is a notable exception on the internet, of course.

Because many of our readers are interested or are engaged in this field, we started a new database server benchmarking project just a few months ago.

The primary objective of this project is to determine the hardware that makes sense for a database server of small and medium-sized organizations. We tested DB2 and MySQL on SUSE SLES 8 on many different systems based on four different Xeon CPUs and two Opteron configurations.

"Servers are all about large caches and fast I/O." This is a generalization that is heard a lot in the IT community and the cliché has been proven, more or less, to be accurate in the high-end server market. But does this common wisdom also apply to the smaller dual processor systems that act as database servers? Should you pay more for a Xeon that has a healthy amount of L3-cache, or will a less expensive Intel without L3-cache do just fine? Does 64-bit really matter? How important is memory latency/bandwidth? Is a hyperthreaded CPU better equipped when the database is accessed by many users simultaneously?

While we still continue to improve the quality of our benchmarks, we decided to report our first impressions.

A gigantic market

For those of you who are relatively new to databases, here is some background on the size of the database market, compared with the more popular, but much smaller, gaming market.

Database software sales continue to be a barometer of the overall IT market health. Database servers are, without any doubt, a critical part - the beating software heart - of many companies.

The market for relational database software was worth $7.1 billion in 2003, according to research firm Gartner. Up to $46 billion is spent in the server (hardware) market, and while a small portion of those servers is used for things other than running relational databases (about 20% for HPC applications), the lion's share of those servers is bought to ensure that a DB2, Oracle, MS SQL server or MySQL database can perform its SQL duties well. So, in essence, IT people are spending over $50 billion on databases and servers.

According to a very recent report, about $15.8 billion of the $46 billion server market is spent on PC servers. AMD's Opteron has conquered about 5.3 percent of this PC server market, but the remaining 94.7 percent belongs to Intel's Xeon.

For comparison, the total PC gaming software and hardware market is about $1.3 billion (according to IDC), while the complete gaming market is worth $3.4 billion.

At about 36 percent of the market, DB2 is the number one database. Since it is available on many operating systems, and both x86 and 64 bit x86 architectures are supported, it received our special attention.

MySQL is, by far, the most popular relational database in the low end of this market and is steadily growing in features and market share, taking away market share from the big players.

The scope of this test

Our primary objective for this project is quite extensive, so we have narrowed our scope a little more for this article.

Firstly, we focused on single and dual processing systems.

Secondly, we focused on "read" performance. This means that our benchmarks do not try to write information in the tables, but rather, always fetch and report information from one or more tables.

There are two reasons for this:

We narrow our focus to the kind of application in which write performance plays a minor part compared to read performance.
We want to focus on operations that involve little hard disk activity, and thus on the platform: the processor(s) and their interaction with the memory.

The benchmarks were inspired by the fact that in many cases, people simply search and browse for information on a website, intranet or another RDBMS application.

So, this article is not about the typical large central database of a bank, which needs to handle a large number of transactions with frequent write operations. It is more about a server that needs to handle a lot of ad-hoc queries. AnandTech readers who are searching for a certain article, who like to read the news of the day, or who are checking out the Realtime pricing guide are perfect examples of users who generate ad-hoc queries.

Other examples of typical "read heavy" applications are data warehousing applications. The write heavy databases in a company send data to a data warehouse during the night to produce statistical data, which in turn is a read-heavy process.

In these cases, a Systems Administrator or database administrator will try to cache as much information as possible in the fast RAM memory. Database applications and SQL Queries tend to be optimised to be "RAM cache friendly".

Considering that even the best harddisk RAID system accesses information in Milliseconds while RAM memory does the same job in Nanoseconds (Nano is one millionth of Milli), making good use of caching and avoiding storage I/O offers superb performance boosts. As a result, storage I/O limited databases are, in many cases, high end database applications that are run on much more expensive machines than the ones which we are discussing in this article.

Lastly, we test on SUSE SLES 8 (SUSE Enterprise Edition) SP3, Linux kernel 2.4.21. Rest assured that a new report with kernel 2.6 will follow. However, we think that the results might still interest a lot of people as the enterprise market does not upgrade as quickly as desktop users. We found SUSE SLES 8 SP3 Kernel 2.4.21 to be a very stable environment for our database tests.

As a final comment before we move to the benchmarks, I have been working with Anand and his great team for a couple of weeks, and together with your feedback, we will make sure that this project improves over time.

The benchmark

Our client application that fires off SQL requests is called DBconn, written in VB.NET. As I am no .NET expert, DBconn was written by one of my best students, Jo Neve.

DBconn is similar to AnandTech's SQL Loader, with a few differences. DBconn increases the workload progressively by augmenting the concurrency level. In other words, the client simulates that more and more users request data from the database system.

You can see the simplified interface of DBconn above. DBconn can also connect to different databases, and at the moment, DB2, MySQL and MS SQL server are supported. The precision of the tests can be increased in two ways. We can specify the number of repetitions of one test, and the total number of requests. A request can be described as a "package" of SQL statements sent to the Database. Every request is done by one thread.
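The ramp-up behaviour described here can be illustrated with a minimal Python sketch. This is not DBconn's VB.NET code: the function names, the default request count and the use of a fixed pool of worker threads (rather than one thread per request) are all assumptions made for illustration, and `send_request` stands in for whatever sends one "package" of SQL statements to the database.

```python
import threading
import time


def run_level(send_request, concurrency, total_requests):
    """Issue total_requests calls using `concurrency` threads; return requests/second."""
    lock = threading.Lock()
    remaining = [total_requests]  # shared countdown, protected by the lock

    def worker():
        while True:
            with lock:
                if remaining[0] <= 0:
                    return
                remaining[0] -= 1
            send_request()  # hypothetical callable: one package of SQL statements

    threads = [threading.Thread(target=worker) for _ in range(concurrency)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return total_requests / elapsed


def ramp(send_request, levels=(1, 2, 5, 10, 20, 35, 50),
         total_requests=200, repetitions=3):
    """Increase the concurrency level step by step and average the repetitions."""
    results = {}
    for level in levels:
        runs = [run_level(send_request, level, total_requests)
                for _ in range(repetitions)]
        results[level] = sum(runs) / len(runs)
    return results
```

Calling `ramp(my_request_function)` would return a dictionary that maps each concurrency level to an averaged throughput figure, which is the shape of the result tables later in this article. Raising `repetitions` or `total_requests` corresponds to the two precision settings mentioned above.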

The databases are not accessed via ODBC, but by the following drivers:

Microsoft SQL server driver (for MS SQL testing, to be discussed later)
IBM beta DB2 driver, .NET data provider (IBM.Data.DB2.dll 15/06/2004)
MYSQL beta4 byteFX driver (ByteFX.MySqlClient.dll 15/06/2004)

When you start so many threads, it is easy to get wrong results. Here is a small excerpt of the original code:

First, we used this code for making threads. The "join" method in this VB.Net code should normally wait until the last thread is finished. However, we found out by experiment that sometimes previous threads are still running. To avoid this, a special method in DBconn checks if all threads are indeed done before firing off a new load of threads.
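As an illustration only, the idea of verifying that every thread has really finished before the next load starts might look like the Python sketch below. It is not the original VB.NET code: the `BatchRunner` class and its completion counter are hypothetical, and in Python `join()` without a timeout already blocks until the thread ends, so the extra check is purely defensive.

```python
import threading


class BatchRunner:
    """Starts a batch of threads and verifies completion before returning."""

    def __init__(self):
        self._lock = threading.Lock()
        self._finished = 0

    def _wrap(self, task):
        def runner():
            try:
                task()
            finally:
                # Count the thread as done even if the task raised an error.
                with self._lock:
                    self._finished += 1
        return runner

    def run_batch(self, task, n_threads):
        with self._lock:
            self._finished = 0
        threads = [threading.Thread(target=self._wrap(task))
                   for _ in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # Second, independent check: every thread reported completion
        # and none is still alive.
        with self._lock:
            finished = self._finished
        if finished != n_threads or any(t.is_alive() for t in threads):
            raise RuntimeError("previous batch has not finished cleanly")
        return finished
```

A caller would invoke `run_batch` once per load and only fire off the next one after it returns, so leftover threads from an earlier batch cannot distort the next measurement.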

The application also randomizes the workload in two ways: variables in SQL statements are randomized to avoid getting the same results over and over again, and the queries are called in a randomized order.
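A rough Python sketch of these two kinds of randomization follows. The template names, the parameter pools and the `build_workload` helper are hypothetical and are not taken from DBconn; the sketch only shows the idea of picking both the query and its variables at random.

```python
import random


def build_workload(templates, value_pools, n_requests, seed=None):
    """Return a list of (sql_template, params) pairs in randomized order.

    templates:   dict mapping a query name to an SQL template string
    value_pools: dict mapping a query name to {parameter: list of candidate values}
    """
    rng = random.Random(seed)  # a seed makes a run repeatable
    names = sorted(templates)
    workload = []
    for _ in range(n_requests):
        name = rng.choice(names)  # randomized query order
        params = {key: rng.choice(pool)  # randomized variables
                  for key, pool in value_pools[name].items()}
        workload.append((templates[name], params))
    return workload
```

Passing a fixed `seed` is a design choice that helps when comparing machines: every server then sees exactly the same randomized sequence of requests.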

DBconn proved to be pretty accurate and results were repeatable. After weeks of tuning the application, we measured an error margin of +/- 2% to 3% between runs in DB2 8.1.3 and MS SQL server 2000.

The database was about 1 GB and was extracted from a database of about 7 GB of data. We cannot go into much detail, but the database is the backend of an e-commerce site, which provides a vast number of products from many suppliers. Extensive descriptions of products exist in the form of articles and newsflashes about the latest developments, so the database is smaller than, but similar to, AnandTech's with its news and articles.

The following is a typical and very common SQL statement in DBconn:

SELECT table.x ..... table.z,  function(table.y)
			FROM table3
				LEFT JOIN othertable ON matching keys
				LEFT JOIN yetanothertable ON matching keys

			WHERE othertable.y IN condition AND condition1 AND condition2
			GROUP BY table.xyz
			ORDER BY othertable.zyx;

There are queries to request the news page, to request an overview of a certain category of products ordered in different ways, and so on. The mix of queries is based on the real world use of the database.
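To make the shape of such a statement concrete, here is a small runnable sketch using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: the tests in this article ran on DB2 and MySQL, not SQLite, and the table names, columns and rows below are invented and do not come from the real e-commerce database.

```python
import sqlite3


def demo_query(category_ids=(1, 2), min_price=5.0):
    """Run a two-LEFT-JOIN, GROUP BY, ORDER BY query on a tiny invented schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE suppliers  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE products   (id INTEGER PRIMARY KEY, name TEXT, price REAL,
                                 category_id INTEGER, supplier_id INTEGER);
        INSERT INTO categories VALUES (1, 'cpu'), (2, 'memory'), (3, 'disk');
        INSERT INTO suppliers  VALUES (1, 'alpha'), (2, 'beta');
        INSERT INTO products   VALUES
            (1, 'p1', 10.0, 1, 1),
            (2, 'p2', 20.0, 1, 2),
            (3, 'p3', 30.0, 2, 1),
            (4, 'p4',  1.0, 2, 2),
            (5, 'p5', 50.0, 3, 1);
    """)
    # One "?" placeholder per value in the IN (...) list.
    placeholders = ", ".join("?" for _ in category_ids)
    sql = f"""
        SELECT categories.name, COUNT(products.id), AVG(products.price)
        FROM products
            LEFT JOIN categories ON products.category_id = categories.id
            LEFT JOIN suppliers  ON products.supplier_id = suppliers.id
        WHERE categories.id IN ({placeholders})
          AND products.price >= ?
          AND suppliers.name IS NOT NULL
        GROUP BY categories.name
        ORDER BY categories.name
    """
    rows = conn.execute(sql, (*category_ids, min_price)).fetchall()
    conn.close()
    return rows
```

With the default arguments, the query returns one row per matching category: the category name, the number of products and their average price. The randomized variables mentioned earlier would correspond to the values bound to the `?` placeholders.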

64 bit Software Experiences...

As we stated above, we didn't use the latest and greatest version of MySQL, but rather, we used the version that came with the SUSE SLES8 to avoid software issues. The wisdom of this decision was quickly illustrated when we tried to install DB2 8.1.2 64 bit on our 64 bit SLES 8.

To make database administration easy, IBM provides a control center, which is a very powerful tool that gives a good overview of all tables and instances. This tool runs on top of Java. We noticed that DB2cc was completely unstable with Sun's Java, so we had to uninstall it and install IBM's version. This might not be news to many DB2 users out there, but for us, it was a disappointment that there are incompatibilities between different JVMs. After all, it was the ease of porting applications that made Java so popular.

And installing IBM java wasn't so easy. But finally after adapting profiles, paths and making several softlinks, we made it work.

The next problem was that Java 64 bit wasn't stable yet, so we had to create a 32 bit instance on top of our 64 bit database. Even then, we received cryptic messages such as "CLI0622E Error accessing JDBC administration service extensions".

At last, we came to the conclusion that 64 bit DB2 8.1.2 would never run on our system. Luckily by that time, DB2 8.1.3 was out, which fixed quite a few of the problems mentioned above.

This experience shows that moving to 64 bit software is not as easy as many reports might indicate. In fact, 32 bit software on a 64 bit operating system will never be 100% free of incompatibilities. System calls from a 32-bit application always need to be converted to 64-bit calls by a type of emulation layer.

As quoted by Andi Kleen:

"A system call from an 32bit application needs to be converted to 64bit. While this isn't that hard for the Unix/Linux system calls (Unix traditionally has a relatively clean and not too big interface between user and kernel space) there are lots of "backdoors" - ioctls - for various driver specific tasks. For these a 64bit kernel will likely never be 100% compatible because there are thousands of these. Of course the majority of applications don't use obscure driver ioctls, but some still do. So in short you have some risk of incompatibility when running 32bit binaries on 64bit kernels".

Don't get us wrong - it is probably 98% or so compatible. But before moving to a fully 64 bit system, you should test your 32 bit applications thoroughly on your favorite 64 bit OS.

On the bright side, we were amazed how easily Yast2 (SUSE install tool) recognized all our hardware on both Xeon and Opteron platforms. Yast2 is simply a wonderful tool that made software and hardware upgrades very easy. I used to have very different experiences with Linux and I remember vividly how I struggled to get some network cards and SCSI cards working.

Words of thanks

A lot of people gave us assistance with this project, and we would like to extend our thanks to them, of course:

Martina Krahmer, Corporate Communications SUSE LINUX
Frank Balzer, SUSE Linux Expert
Dan Behman, Leading developer, DB2 for Linux Platform Development
Erin Roche, IBM US
Markus Weingartner, Intel Germany
Matty Bakkeren, Intel Netherlands
Damon Muzny, AMD US
Patrick Sempels, Sun Belgium Demosystem Engineer
Erwin Vanluchene, product manager HP ProLiant servers
Andi Kleen, Linux Guru

And last, but not least, I would like to thank Benny Boone, Pieter Beel, Michael Aerens and Jo Neve, four of my best students at the Technical University of Kortrijk - Belgium - for helping me and making sure that all those database servers were tested.

These four students of the Bachelors of Multimedia and Communications technology program lived for weeks among the database servers while they had to bear my presence, which is truly remarkable!

And thanks are extended to Lode De Geyter, Manager of the PIH, for letting us use the infrastructure of the TUK to test the database servers.

Benchmark Configuration

To ensure that our databases work stably and reliably, we followed the guidelines of SUSE and IBM. For example, DB2 is only certified to run on the SLES versions of SUSE Linux; you cannot run it on just any Linux distribution. We also used the MySQL version (3.23.52) that came with the SUSE SLES8 CDs, which was certified to work on our OS.

Software

IBM DB2 Enterprise Server Edition 8.1.3 (DB2ESE), 32 bit and 64 bit
MySQL 3.23.52, 32 bit and 64 bit
SUSE SLES 8 (SUSE Enterprise Edition) SP3, Linux kernel 2.4.21, 32 bit and 64 bit

Hardware

We'll discuss the different tested servers in more detail below. Here is the list of the different configurations:

Intel Server 1: Dual Intel Nocona 3.6 GHz 1 MB L2-cache, 800 MHz FSB - Lindenhurst Chipset
Intel® Server Board SE7520AF2
2 GB (4x512 MB) Micron Registered DDR-II PC2-3200R, 400 MHz CAS 3, ECC enabled
NIC: Dual Intel® PRO/1000 Server NIC (Intel® 82546GB controller)

Intel Server 2: Dual Xeon DP 3.06 GHz 1 MB L3-cache, Dual Xeon 3.2 GHz 2MB L3-cache, Dual Xeon 3.2 GHz
Intel SE7505VB2 board - Dual DDR266
2 GB (4x512 MB) Crucial PC2100R - 250033R, 266 MHz CAS 2.5 (2.5-3-3-6)
NIC: 1 Gb Intel RC82540EM - Intel E1000 driver.

Opteron Server 1: SUN V20z Dual AMD Opteron 248 (2.2 GHz)
Newisys Khepri based board
2 GB: 4x512 MB Infineon PC2700R - 250033R (2.5-3-3-6)
OR 2 GB: 4x512 MB Mushkin PC3200R-232 (2-3-2-6)
NIC: Broadcom 5703, bcm5700 driver

Opteron Server 2: HP DL-145
HP motherboard
2 GB: 4x512 MB Infineon PC2700R - 250033R (2.5-3-3-6)
or 2 GB: 4x512 MB Mushkin PC3200R-232 (2-3-2-6)
NIC: Broadcom 5703, bcm5700 driver

Opteron Server 3: Tyan Thunder K8W (S2885) based
Tyan K8W S2885
2 GB: 4x512 MB Infineon PC2700R - 250033R (2.5-3-3-6)
or 2 GB: 4x512 MB Mushkin PC3200R-232 (2-3-2-6)
NIC: Broadcom 5703, bcm5700 driver

Opteron Server 4: HP DL-585
HP custom designed motherboard
4 GB: 8x512 MB Infineon PC2700R - 250033R (2.5-3-3-6)
NIC: Broadcom 5703, bcm5700 driver

Opteron Server 5: AMD Quartet: Dual 844, Dual 848, Quad 844 and Quad 848
Quartet motherboard, Zildjian personality board, Tobias backplane board and Rivera power distribution board.
Quad configurations: 4 GB: 8x512 MB Infineon PC2700 Registered, ECC enabled
Dual configurations: 2 GB: 4x512 MB Infineon PC2700 Registered, ECC enabled
NIC: Broadcom NetExtreme Gigabit

Client Configuration 1: 1x Pentium 4 3.06 GHz HT on - 533 MHz FSB
MSI GNB MAX FISR (E7205)
2x512 MB Crucial PC2100R - 250033R (2-2-2-6)
NIC: 1 Gb Intel RC82546EB - Intel E1000 driver

Shared Components
1 Seagate Cheetah 36 GB - 15000 rpm - 320 MB/s
Maxtor 120 GB DiamondMax Plus 9 (7200 RPM, ATA-100/133, 8 MB cache)

More about the servers in this test

A few months ago, I contacted a few server manufacturers, namely IBM, Dell, HP and SUN. I wanted to compare the reference servers of Intel and AMD to the real machines on the market.

At that time, the responses of the different manufacturers were surprising. While I asked for a database server system and did not specify whether it should be Xeon or Opteron, HP immediately sent their DL-585 and DL-145. This was surprising, since those two Opteron systems were marketed as HPC solutions, and more targeted toward scientific applications. HP's database servers were, in fact, all Xeon-based solutions.

Sun was also pretty enthusiastic about this test, and they sent us their v20z. Dell told us that they were not going to send a machine. This was understandable, since the 3.6 GHz Xeon Nocona was not yet available. The Xeon DP "Gallatin" 3.2 GHz with 2 MB L3-cache, which used to be a Xeon MP, was Intel's fastest offering in the dual Xeon scene. To be honest, I did not see the problem. I expected a 3.2 GHz Xeon with such a large 2 MB L3-cache to do well in our database server tests.

The fact that Dell didn't want to join the test illustrates clearly that, at that point in time, even Intel's best ally recognized the potential of the newly introduced Opteron 250. This is also a reason why you will find only two Intel servers in this test. We are confident that this will change in our future projects.

Sun's V20z

SUN's V20z was almost identical to the Newisys server, the server which was reviewed when the Opteron was launched. The SUN V20z has two internal hot-swap drive bays, two Gigabit Ethernet interfaces, and two PCI-X slots - one at 66MHz, and the other at 133MHz. What sets it apart is the on-board management processor, accessible via your browser (it has a separate 100 Mbit Ethernet interface) and of course, support for Solaris x86.

Unfortunately, at that time, Solaris 10 with 64 bit x86 support was not available.

HP Proliant DL585

While our focus is on Dual processor entry level Database servers, we could not refuse the Quad Opteron DL585. This is one impressive server, which has a daughtercard for each Opteron and its local memory.

When we received this monster, it was outfitted with 8 GB of PC2100, 16 x 512 MB. The DL585 is able to use up to 64 GB of PC2100. It was remarkable that the DL585 with PC2100 (266 MHz) came close to AMD's reference Quartet server, which featured faster PC2700 (333 MHz) DDR SDRAM.

With PC2700 DDR SDRAM, the DL585 can "only" use 6 of its 8 DIMMs per daughtercard; thus, 12 GB per processor installed, using 2GB DIMMs. With all four CPUs installed, the DL585 is capable of supporting a maximum of 48 GB DDR SDRAM. Quite impressive, as AMD's Quartet based systems only support 4 DIMMs per CPU or 32 GB in total.

It is especially impressive if you consider the fact that the load on the address lines of DDR makes it very hard to use more than 4 DIMMs per memory channel. Most Xeon and Opteron systems with DDR-I are limited to 4 DIMMs per memory channel.

The newest addition: Intel's Lindenhurst server

Our Dual Intel Xeon 3.6 GHz server was based on the Lindenhurst chipset, a server chipset similar to the i925 desktop chipsets.

The Intel Server Board SE7520AF2 is one of the first boards to make extensive use of the real world advantages that PCI Express and DDR-II offer in the server world. This is in contrast with the desktop market, where DDR-II is mostly a much more expensive and only marginally faster alternative to DDR SDRAM.

One of the problems of SDRAM is that the complete address and command bus must be connected to each chip. (The databus does not have this problem - the 8 chips are connected to 8 parallel wires.) To make it worse, addresses and commands must be presented to all DRAM chips on a DDR DIMM at the same time.

You can read more about it here. To make a long story short: one of the most important reasons for limiting the number of SDRAM chips per memory channel is the load and thus, signal integrity on the address bus. DDR-II has the same problem, but many of the new features (OCD-calibration, On Die termination, BGA packaging, lower differential voltage swing) of DDR-II improve signal integrity significantly compared to DDR-I.

Secondly, DDR-II consumes almost 30% less power than comparable DDR-I DIMMs, which is a big advantage if you consider that each GB of DDR-I consumes about 10 watts. Especially in a 64 bit server, where you want to use more than 4 GB of RAM, this will be a nice improvement.

The result is that the Lindenhurst board can offer 4 DIMMs per channel while the other Xeon servers with DDR-I were limited to 4 DIMMs in total, or one per memory channel. This might seem trivial, but it neutralizes an advantage that the Opteron previously had over the slightly older Xeon: it is cheaper to use 8x 1 GB than to use 4 x 2 GB. More DIMM slots result in more flexibility; a lower price per GB or a higher maximum RAM capacity.

Thirdly, the E7520 chipset makes a full duplex 2.1 GB/s x8 PCIe slot and an x4 PCIe slot available, in addition to the typical 100 and 133 MHz PCI-X slots that we find in many servers.

The E7520 chipset is also the first server chipset with a FSB of 800 MHz, which connects to two channels of Registered ECC DDR2 400 memory. While the latter does not have the bandwidth of its 533 MHz / 667 MHz brothers, the actual latency (3-3-3, 5 ns) is pretty low, which is, for most server applications, more important than raw bandwidth.

The new serverboard also includes an updated version of Intel's Server Management 8 solution and has a dual-channel Ultra320 SCSI controller on board. Notice the white rectangle at the bottom of the server, which contains the RAID cache and battery backup of this RAM cache.

The reference machines versus HP and SUN

We did not expect to see much performance difference between the different machines as they were all spec'd very similarly. And after quite a bit of benchmarking, it became clear that SUN's v20z and the HP DL-145 perform equally, within the margin of error. The SUN v20z was probably a tiny bit slower as benchmarks reported 1% - 2% lower numbers in DB2, and 2% - 3% lower in MySQL. But again, we cannot be sure because of the margin of error.

The HP DL-585 compared favourably to AMD's Quartet. Despite being equipped with only DDR266, it managed to stay within 2% - 3% of the latter, which had access to DDR333. But it is clear that processing performance is not something that sets these systems apart from each other.

Benchmarks IBM DB2 8.1.3: Intel versus AMD

The first question that most people will ask is, of course, how the best AMD Opteron compares to the newest Intel Xeon "Nocona" CPU. Below is a quick table to refresh your memory and to enable you to compare price/performance:

Intel Xeon CPUs	Core	L2 cache	L3 cache	x86-64 bit	In Test	Price
3.60 GHz w/ 1M cache 800 MHz FSB (90nm)	Nocona = "Prescott server"	1 MB	No	Yes	Yes	$851
3.40 GHz w/ 1M cache 800 MHz FSB (90nm)	Nocona = "Prescott server"	1 MB	No	Yes	No	$690
3.20D GHz w/ 1M cache 800 MHz FSB (90nm)	Nocona = "Prescott server"	1 MB	No	Yes	No	$455
3 GHz w/ 1M cache 800 MHz FSB (90nm)	Nocona = "Prescott server"	1 MB	No	Yes	No	$316

3.20C GHz w/ 2M cache 533 MHz FSB (.13)	Gallatin = "P4 EE Server"	0.5 MB	2 MB	No	Yes	$1,043
3.20 GHz w/ 1M cache 533 MHz FSB (.13)	Gallatin = "P4 EE Server"	0.5 MB	1 MB	No	No	$690
3.06A GHz w/ 1M cache 533 MHz FSB (.13)	Gallatin = "P4 EE Server"	0.5 MB	1 MB	No	Yes	$455
3.06 GHz w/ 512k cache 533 MHz FSB (.13)	Prestonia = "Northwood Server"	0.5 MB	No	No	Yes	$316

AMD Opteron CPUs	Core	L2 cache	L3 cache	x86-64 bit	In Test	Price
Model 250 (2.4 GHz)	Sledgehammer	1 MB	No	Yes	Yes	$851
Model 248 (2.2 GHz)	Sledgehammer	1 MB	No	Yes	Yes	$690
Model 246 (2.0 GHz)	Sledgehammer	1 MB	No	Yes	No	$455
Model 244 (1.8 GHz)	Sledgehammer	1 MB	No	Yes	No	$316

We were also very curious about the Xeon Nocona, as it brings higher clock speeds, a bigger L2-cache, no L3-cache and a pipeline 11 stages longer than the previous Xeon "Prestonia" and Xeon "Gallatin", which maxed out at 3.2 GHz. The first two features mentioned should boost the performance quite well, while the last two are disadvantages.

We should emphasize that, as we tested with SUSE SLES 8 (kernel 2.4.21), the Xeon Nocona was disadvantaged, since we could not test it in 64-bit mode. We assure you that we will update this report with 2.6 kernel. For now, we decided to give you a full report on SLES 8 and kernel 2.4. (All numbers are expressed in queries per second.)

Concurrency	Dual Xeon 3.6 GHz	Dual Xeon 3.2 L3 (2MB)	Dual Xeon 3.2	Dual Xeon 3.06 L3 (1MB)	Dual Xeon 3.06	Opteron 250 DDR400 32 bit	Dual Opteron 250 DDR 400 64 bit	Dual Opteron 248 DDR 400 64 bit
1	55	46	44	43	42	57	61	57
2	87	74	61	72	61	105	118	107
5	128	104	100	98	98	123	137	129
10	136	112	107	105	102	129	145	132
20	136	113	106	106	104	131	147	132
35	138	113	106	104	99	133	150	129
50	138	110	106	102	100	130	145	128

All concurrency tests below 5 are not reliable enough to make any firm conclusion, especially for the Xeon. The margin of error is somewhat higher, but that is not all.

As the Dual Xeon with Hyperthreading spawns 4 logical CPUs, with a concurrency of 2, it is possible that only one physical CPU is doing all the work. Looking at the numbers and the linux tool top, we feel pretty sure that this is exactly what happens most of the time. Compare Row "5" with "2", and "2" with "1" to see what I mean. Note that the results of rows 10 to 50 do not vary a lot; so, we look at these numbers for our conclusions. In the table below, you can see an overview of how the different CPUs compare in percentages.

Concurrency	Xeon 3.6 vs 3.2 (2 MB L3)	2 MB L3-cache vs none	1 MB L3-cache vs none	Xeon 3.2 vs 3.06	Xeon 3.2 vs 3.06 (both with L3)	Xeon 3.6 vs Opteron 250 (32 bit)	Opteron 64 bit vs 32 bit
1	20%	3%	1%	7%	7%	-4%	6%
2	17%	22%	18%	3%	3%	-17%	12%
5	24%	4%	1%	5%	5%	5%	12%
10	21%	5%	3%	6%	6%	6%	13%
20	21%	6%	2%	6%	6%	3%	12%
35	22%	7%	5%	8%	8%	3%	12%
50	26%	4%	2%	8%	8%	7%	12%
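For readers who want to check how such percentages follow from the throughput numbers, the short Python sketch below recomputes three of the comparisons from the queries-per-second table. The dictionary keys and helper names are our own labels. Because the published throughput figures are already rounded, a recomputed percentage can differ from the table by a point or so.

```python
# Queries per second copied from the DB2 results table, keyed by concurrency level.
QPS = {
    "xeon_3_6":        {1: 55, 2: 87, 5: 128, 10: 136, 20: 136, 35: 138, 50: 138},
    "xeon_3_2_l3_2mb": {1: 46, 2: 74, 5: 104, 10: 112, 20: 113, 35: 113, 50: 110},
    "opteron_250_32":  {1: 57, 2: 105, 5: 123, 10: 129, 20: 131, 35: 133, 50: 130},
    "opteron_250_64":  {1: 61, 2: 118, 5: 137, 10: 145, 20: 147, 35: 150, 50: 145},
}


def gain_percent(new, old):
    """Relative gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


def compare(system, baseline):
    """Rounded percentage advantage of `system` over `baseline` per concurrency level."""
    return {level: round(gain_percent(QPS[system][level], QPS[baseline][level]))
            for level in sorted(QPS[system])}
```

For example, `compare("xeon_3_6", "xeon_3_2_l3_2mb")[10]` evaluates 136 / 112, which is a gain of roughly 21%.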

If we had published a similar report back in August, the Opteron would have enjoyed a landslide victory. Luckily for Intel, Nocona is very competitive and is about 5% faster than the Opteron 250 running 32-bit software.

The gigantic - for x86 - L3-cache cannot help the Xeon much. We measured only a 2% to 5% performance boost from the 1 MB L3-cache (at 3.06 GHz), and a 4% to 7% performance boost from the 2 MB L3-cache (at 3.2 GHz). The L3-cache seems to boost performance about as much as a 5% to 6% clock speed increase - nothing to write home about. So a Xeon "Gallatin" 3.2 GHz 2 MB L3-cache performs more or less like a Xeon "Gallatin" 3.4 GHz, if such a beast should exist.

A comparison between the 3.2 GHz and 3.06 GHz shows that CPU clockscaling - given equal cache sizes - is almost perfect, a testimony to how CPU intensive this benchmark is. Clearly, the generalisation, "databases are all about I/O" is not accurate for a number of database applications. Read-heavy databases seem to be "all about the CPU".

Using a 64 bit database (DB2 8.1.3) on a 64 bit operating system delivers about 12% to 13% better performance. Since we didn't use more than 2 GB, the most likely explanation is the fact that the software can make use of 16 registers instead of 8. We also tested with a twice as large database and 4 GB of RAM, and the results were very similar.

The performance of the Nocona Xeon compared to the older Xeons is also remarkable. The database doesn't mind the longer pipeline and absence of the L3-cache. On the contrary, it performs better than its clock speed indicates, leaving the older 3.2 GHz Xeon (with 2 MB L3 cache!) behind by 21% to 22%, while the Nocona has only a 13% clock speed advantage over the latter. To be honest, we expected Nocona, with its huge branch misprediction penalty, a result of its extremely long pipeline, to scale much worse.

Benchmarks IBM DB2: DDR400 vs DDR333

How important is the speed of the DDR memory for the Opteron? We compared buffered DDR333 ECC to the expensive buffered DDR400 ECC.

Concurrency	Dual Opteron 250 DDR 333 64 bit	Dual Opteron 250 DDR 400 64 bit	DDR400 vs DDR333
1	60	61	1%
2	112	118	5%
5	136	137	1%
10	141	145	3%
20	138	147	6%
35	139	150	7%
50	140	145	4%

DDR400 gives a boost of about 4% - 6%. That might seem very little, but considering that DDR400 is only 20% faster than DDR333, and that the Opteron has 128 KB of L1 and 1 MB of L2-cache on board, we can say that database performance on the Opteron still benefits from faster memory.

Benchmarks IBM DB2: Single versus Dual versus Quad

How well does our DB2 database scale with more than one CPU? We measured a 92% to 96% increase in performance when we equipped it with a second CPU. The quad performance data is even more interesting.

Concurrency	Quad 848 Opteron DDR 333 32bit	Quad 848 Opteron DDR 333 64bit	Dual Opteron 248 DDR 333 64 bit	64 bit vs 32 bit	Quad vs Dual
1	50	54	55	8%	-1%
2	94	107	103	14%	4%
5	156	182	125	17%	45%
10	192	222	128	15%	73%
20	210	239	127	14%	88%
35	220	242	128	10%	89%
50	214	247	128	16%	93%

With a concurrency of 10, it seems that the Quad machine still hasn't reached its full potential. Based on the rest of the data, we see about 88% - 90% extra performance when going from two to four CPUs. Of course, it must be said that it is possible to equip a dual Opteron with DDR400, yet as far as we know, it is not an option for quad Opterons.

Also note the slightly higher boost that the Quad Opteron gets from 64 bit: 14% to 16%. This is probably the result of a better optimized OS in 64 bit. The measurement at concurrency 35 is a bit of an exception (2% too high on 32 bit and 2% too low on 64 bit, for example, but within our margin of error), so we ignore it.

Let us check out the Xeon, of which we did not (yet) have a Quad configuration. We'll do everything to make sure that a follow-up article includes one.

Concurrency	Single Xeon 3.06 1 MB L3	Dual Xeon 3.06 1 MB L3	Dual vs Single
1	43	43	0%
2	50	72	45%
5	55	98	78%
10	55	105	90%
20	54	106	96%
35	53	104	96%
50	54	102	91%

This is quite impressive scaling, and underlines how much databases like more MP power. We note the 90% to 96% performance improvement from the second CPU.
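The scaling figures can be restated as a speedup and a scaling efficiency, as the small Python sketch below shows. The function names are ours, and the inputs are simply queries-per-second values taken from the tables in this section.

```python
def speedup(multi_qps, base_qps):
    """How many times faster the larger configuration is than the smaller one."""
    return multi_qps / base_qps


def scaling_efficiency(multi_qps, base_qps, cpu_ratio):
    """Fraction of the ideal speedup achieved; cpu_ratio is 2 when the CPU count doubles."""
    return speedup(multi_qps, base_qps) / cpu_ratio


def scaling_table(base, multi, cpu_ratio=2):
    """Per-concurrency (speedup, efficiency) pairs for two result columns."""
    return {level: (speedup(multi[level], base[level]),
                    scaling_efficiency(multi[level], base[level], cpu_ratio))
            for level in sorted(base)}


# Single versus dual Xeon 3.06 GHz (1 MB L3), concurrency 10 to 50, from the table above.
SINGLE_XEON = {10: 55, 20: 54, 35: 53, 50: 54}
DUAL_XEON = {10: 105, 20: 106, 35: 104, 50: 102}
```

At a concurrency of 20, for instance, `scaling_table(SINGLE_XEON, DUAL_XEON)[20]` gives a speedup of about 1.96, or roughly 98% of the ideal factor of two.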

Benchmarks IBM DB2: Hyperthreading?

Heavily threaded benchmarks, such as DBconn, should run quite a bit faster thanks to Hyperthreading, at least in theory.

Concurrency	Dual Xeon 3.06 1 MB L3 without HT	Dual Xeon 3.06 1 MB L3 HT	HT vs no HT
1	43	43	0%
2	83	72	-13%
5	97	98	2%
10	99	106	6%
20	98	106	7%
35	98	104	7%
50	95	102	8%

Although it is not really spectacular, 6% to 8% increase in performance is still a nice extra. Let us see if the new Nocona Xeon benefits just as much.

Concurrency	Dual Xeon 3.6 without HT	Dual Xeon 3.6 HT	HT vs no HT
1	52	55	5%
2	113	87	-23%
5	128	128	0%
10	133	136	2%
20	133	136	2%
35	136	138	1%
50	133	138	4%

No, it does not. Only 2% to 4% of additional performance for two extra logical CPUs is pretty weak. We have to admit that we don't know why this is the case.

Analysis: IBM DB2 8.1.3 32 bit and 64 bit

Let us summarize what we have learned so far: L3-caches and Hyperthreading offer very limited performance increases. Hyperthreading is essentially free of charge, so we are happy with whatever little bonus we get. However, one can seriously question the wisdom of paying $800 more for a dual Xeon 3.2 GHz with 2 MB L3 instead of one with 1 MB L3-cache. Of course, it is a small amount when compared to license and labour costs, but still...

However, we must not be blind to the possible limitations of our benchmarks. The limited effect of the large caches made us think. Did we randomize a little too well? Often, some rows will be requested far more often than others. For example, if you have a forum, the most recent messages/threads will be accessed much more often than the oldest. If you have an e-commerce site, the items appearing on the front page or on major pages will be loaded much more often. While our real world queries do not access all rows equally, more research will be needed to see how closely our randomize function can mimic real world behavior.
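One possible way to model such "hot" rows, sketched here in Python, is to draw row ids from a skewed distribution instead of a uniform one. This is not what DBconn does: the Zipf-like weighting, the `skew` parameter and the function names are assumptions chosen only to illustrate the difference between the two access patterns.

```python
import random


def zipf_weights(n_rows, skew=1.0):
    """Weight for each row rank; skew=0 gives a uniform distribution."""
    return [1.0 / (rank ** skew) for rank in range(1, n_rows + 1)]


def sample_row_ids(n_rows, n_requests, skew=1.0, seed=None):
    """Draw row ids so that low ids (the 'recent' or 'front page' rows) are hotter."""
    rng = random.Random(seed)
    return rng.choices(range(n_rows), weights=zipf_weights(n_rows, skew),
                       k=n_requests)


def hot_fraction(samples, n_rows, hot_share=0.1):
    """Share of requests that hit the hottest `hot_share` of the rows."""
    cutoff = max(1, int(n_rows * hot_share))
    return sum(1 for row in samples if row < cutoff) / len(samples)
```

With `skew=0` about 10% of the requests land on the hottest 10% of the rows, while with `skew=1` that share is much larger. A workload of the second kind would be expected to reward large caches more than a uniformly randomized one.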

Still, we don't expect spectacular improvements, as the benchmark scales well with CPU clock speed.

64 bit offers a very decent improvement (12% - 16%), although it might be less than what some reports speculated.

We certainly didn't expect Nocona to do so well. SQL databases sometimes access data in memory randomly, courtesy of, for example, linked lists. Processing a single SQL query also requires parsing and checking the query, optimising it, and ordering the results for output, which is very branch and integer intensive, similar to interpreting a programming language. The Nocona Xeon, with its extremely long pipeline (and hence, high branch misprediction penalties), slightly higher latency caches (compared to Opteron and previous Xeons) and no L3-cache, seemed like the worst architecture that one could think of for database serving.

The new Nocona Xeon surprised us in a positive way, and it outperformed the previous Xeons, even on a clock-for-clock basis. More in-depth research with CPU counters should give us the exact reasons why, but right now, we can only conclude that the faster memory bus (800 MHz versus 533 MHz) and the twice as large L1- and L2-caches outweigh the fact that it has no L3-cache and a longer pipeline than the previous Xeons. Maybe the improved branch predictor performs miracles, but we are not sure right now.

Benchmarks MySQL 3.23.52: Intel versus AMD

A Linux database server report would not be complete without the open source database MySQL. It must be said that MySQL results had a large margin of error, especially at high levels of concurrency. At 100 concurrent threads, MySQL started to lose performance pretty quickly, and the margin of error grew fast. When we limit ourselves to a concurrency of 50, we get a decent margin of error (2% - 4%).

Concurrency	Dual Xeon 3.6 GHz	Dual Xeon 3.2 2 MB L3	Dual Xeon 3.2	Dual Xeon 3.06 1 MB L3	Dual Xeon 3.06	Opteron 250 DDR400 32 bit	Dual Opteron 250 DDR400 64 bit	Dual Opteron 848 DDR400 64 bit
1	127	105	107	110	104	156	201	190
2	191	166	161	175	158	249	317	287
5	239	206	194	206	189	303	368	337
10	249	221	203	218	194	289	381	353
20	251	226	195	218	190	309	403	377
35	259	226	190	223	189	315	405	375
50	251	225	186	222	185	300	388	363

Those were the raw numbers. Let us now analyze this:

Concurrency	Xeon 3.6 vs 3.2	2 MB L3-cache vs none	1 MB L3-cache vs none	Xeon 3.2 vs 3.06	Xeon 3.2 vs 3.06, both with L3	Xeon 3.6 vs Opteron 250	Opteron 64 bit vs 32 bit
1	20%	-2%	7%	-5%	-5%	-19%	29%
2	15%	3%	10%	-5%	-5%	-23%	27%
5	16%	6%	9%	0%	0%	-21%	22%
10	13%	9%	12%	1%	1%	-14%	32%
20	11%	15%	15%	3%	3%	-19%	30%
35	15%	19%	18%	1%	1%	-18%	29%
50	11%	21%	20%	1%	1%	-16%	29%

MySQL paints a totally different picture. Again, don't pay too much attention to the results of the lower concurrency levels.

This time, the Nocona Xeon is about 11% to 15% faster, more or less equal to the 13% clock speed advantage that it has over the 3.2 GHz Xeons. Investing in an L3-cache pays off a lot more than it did in DB2. It seems that a 1 MB L3-cache is exactly what our application needs, because doubling it to 2 MB delivers almost no additional boost (the margin of error may blur the results a bit).

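And last, but not least, the Opteron makes a clean sweep of the Xeons. Going by the table above, the 3.6 GHz Nocona is up to 23% slower than the Opteron 250 in 32-bit mode; put the other way around, the Opteron is up to 30% faster. When the Opteron moves to 64 bit, it can crunch through roughly 30% more SQL statements.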

Benchmarks MySQL: DDR400 vs DDR333

Does faster memory make a difference for the Opteron?

Concurrency	Dual Opteron 848 DDR333	Dual Opteron 848 DDR400	Speedup DDR400 vs DDR333
1	174	190	9%
2	279	287	3%
5	353	337	-5%
10	351	353	1%
20	371	377	2%
35	370	375	1%
50	372	373	0%

No, that doesn't seem to be the case. At concurrency levels of 10 and higher, all results point to a meagre 0% to 2% performance gain.

Benchmarks MySQL: Single versus Dual

Let us check how MySQL scales from one Xeon to a second one. As always, Hyperthreading is enabled on both CPUs.

Concurrency	Single Xeon 3.06 1 MB L3	Dual Xeon 3.06 1 MB L3	Dual Speedup
1	102	110	8%
2	137	175	28%
5	153	206	34%
10	158	218	38%
20	162	218	35%
35	160	233	45%
50	157	222	42%

MySQL doesn't seem to scale as well as DB2. We measured a 40% performance increase when the concurrency level is high enough. Might this be an important difference between the big established players and the Open Source database?
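The difference in scaling is easier to compare when it is expressed as a speedup factor and as an efficiency (speedup divided by the number of CPUs). The short Python sketch below does this for the high-concurrency rows of the table above. The helper name `scaling` and the choice of rows are ours, and the DB2 figures are simply the 90% to 96% improvement quoted earlier, converted to the same terms.

```python
# (concurrency, single Xeon 3.06 1 MB L3, dual Xeon 3.06 1 MB L3),
# copied from the MySQL table above
MYSQL_SCALING = [(10, 158, 218), (20, 162, 218), (35, 160, 233), (50, 157, 222)]


def scaling(single, dual, cpus=2):
    """Return (speedup, efficiency) of the multi-CPU result over one CPU."""
    speedup = dual / single
    return speedup, speedup / cpus


for conc, single, dual in MYSQL_SCALING:
    speedup, efficiency = scaling(single, dual)
    print(f"concurrency {conc:>2}: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")

# A 90% to 96% gain from the second CPU (DB2) corresponds to 1.90x to 1.96x.
db2_low, db2_high = 1.90 / 2, 1.96 / 2
print(f"DB2 efficiency: {db2_low:.0%} to {db2_high:.0%}")
```

An efficiency of 100% would mean that the second CPU doubles throughput; anything well below that suggests that something other than raw CPU power is limiting the database.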

Benchmarks MySQL: Hyperthreading?

What can hyperthreading do for MySQL performance?

Concurrency	Dual Xeon 3.06 1 MB L3 no HT	Dual Xeon 3.06 1 MB L3 HT	HT Speedup
1	106	110	4%
2	158	175	11%
5	193	206	7%
10	213	217	2%
20	205	218	6%
35	206	232	13%
50	203	222	10%

Amazingly, Hyperthreading pushes performance quite a bit higher, although, once again, the higher margin of error blurs the picture a bit. At concurrency levels of 20 and up, we measure a 6% to 13% speed increase. Let us see if the Xeon Nocona is any different.

Concurrency	Dual Xeon 3.6 no HT	Dual Xeon 3.6 HT	HT Speedup
1	127	127	0%
2	189	191	1%
5	215	239	11%
10	235	249	6%
20	238	251	5%
35	237	259	9%
50	236	251	6%

Again, when we concentrate on concurrency levels 20 to 50, where the margin of error is a bit lower, we see that Nocona benefits a little less from Hyperthreading than the older Xeons, despite being equipped with a slightly more advanced version of it.

Final Conclusion

Database benchmarking is full of pitfalls. Databases are pretty complicated to set up and tune, and depending on the amount of data, and the way that the database is accessed (a few rows a lot of the time, or a lot of rows sometimes), results can be quite different. We are well aware that it will take a lot of time before our benchmarks will be really "mature".

However, we see a few trends emerging out of this report. First of all, while file serving and firewalls tend to be always "all about I/O", this generalization is simply not true for database servers that run "read heavy" database applications. Our DB2 results depended only - well, 95% or so - on CPU processing power. This was not completely the case in MySQL tests, but the CPU was still by far the most important component.

The Opteron deals a decisive blow to the Xeons in MySQL. We know from past experience that the Opterons are a lot faster when you run extremely complicated SQL statements. After this project, we also know that the Opteron is still the winner when you mix a lot of simple queries with a few heavier queries.

Nevertheless, AMD cannot rest on its laurels. Intel made a very good comeback with Nocona, as this 3.6 GHz CPU is just a tiny bit faster in DB2. This concurs with some of Jason's MS SQL server results. Of course, it remains a very big question whether Intel can push this Xeon much higher. Meanwhile, it is clear that AMD has quite a bit of headroom with its new 90 nm process technology.

In a nutshell, we can conclude that the 3.2 GHz Xeon with 2 MB L3-cache is too expensive compared to its 3.6 GHz Nocona brother.

The Opteron systems still have a price advantage over similar Xeons, mostly thanks to the cheaper-to-produce motherboards and DDR-I memory. A ProLiant DL145 2.4GHz Opteron rack model (ATA) with 2 CPUs and 2 GB of memory costs about $4300, while a comparable ProLiant DL360 G4 Xeon 3.60GHz rack model (SATA) arrives at about $4900. Prices will differ a bit from one comparison to the next, but generally, it is safe to say that the Opteron systems are a bit cheaper. So for now, the Opteron still has an advantage, but it can no longer knock out the Xeon as it could a few months ago, before the Xeon Nocona arrived.
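For readers who want to put a number on this, the Python sketch below combines the two prices quoted above with the MySQL results at a concurrency of 50. Keep in mind that pairing these list prices with our test systems is an assumption made purely for illustration: the benchmarked machines were not these exact ProLiant models, and the names `SYSTEMS` and `throughput_per_1000_usd` are our own.

```python
# Prices from the paragraph above; throughput from the MySQL table at
# concurrency 50. Matching these prices to our test systems is an assumption.
SYSTEMS = {
    "Dual Opteron 250 (64 bit)": {"price_usd": 4300, "throughput": 388},
    "Dual Xeon 3.6": {"price_usd": 4900, "throughput": 251},
}


def throughput_per_1000_usd(system):
    """Benchmark throughput delivered per 1000 dollars of system price."""
    return system["throughput"] / (system["price_usd"] / 1000.0)


for name, system in SYSTEMS.items():
    print(f"{name}: {throughput_per_1000_usd(system):.1f} per $1000")

opteron_price = SYSTEMS["Dual Opteron 250 (64 bit)"]["price_usd"]
xeon_price = SYSTEMS["Dual Xeon 3.6"]["price_usd"]
price_premium = (xeon_price - opteron_price) / opteron_price * 100.0
print(f"Xeon system price premium: {price_premium:.0f}%")
```

Swapping in the DB2 results instead would narrow the gap, since Nocona holds a small lead there, so the outcome depends heavily on which database you plan to run.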