Original Link: http://www.anandtech.com/show/1713



Introduction

Enterprise versions of Linux based on kernel 2.6, and 64 bit database servers are now very mature. Dual core 64 bit Opteron and 64 bit Xeons with 2 MB L2-caches are available. It was definitely time to update our previous Linux Database Server CPU comparison.

In this article, you will find a comparison of the latest Xeon (Irwindale), the previous Xeon (Nocona), the old Xeon (Gallatin), the Dual core Opteron, and, of course, the "normal" Opteron. We also included the Pentium-D to get an idea of what a Dual core Xeon could do, although the comparison is not completely fair: the memory subsystem of a Dual core Xeon will have higher latency and slightly lower bandwidth, as it will use ECC buffered DIMMs instead of non-buffered DIMMs.

In our previous article, we used SUSE SLES 8 (kernel 2.4.21) and the Xeon 3.6 GHz "Nocona" matched the performance of the Opteron 250 in 32 bit DB2, but failed to impress in MySQL. However, SLES 8 with kernel 2.4 did not recognize Intel's Xeon as a 64 bit capable CPU, while the Opteron gained 12% (DB2) and 30% (MySQL) when running in 64 bit.

On SLES 9, we can unleash the full 64 bit potential of both the Intel Xeon and Opteron. Kernel 2.6 includes improved support for NUMA, 64 bit, large memory pages and threading, and it fully recognizes EM64T CPUs as 64 bit capable. How do the Xeon and Opteron compare when they both run 64 bit applications on a 64 bit enterprise version of Linux? Should you invest in Dual core CPUs, or are these expensive CPUs beaten by two single CPUs? Should you wait for Dempsey, the dual core Xeon?

These are a few of the questions that we will answer. While we still continue to improve the quality of our benchmarks, we decided to report our first impressions.

The scope and focus of this test

Our last Database server comparison generated quite a bit of very useful and interesting feedback. Living up to the excellent AnandTech tradition, we have read it carefully and taken many suggestions to heart.

In a nutshell, the foci of this article are as follows:
  • Only CPU and CPU-chipset-memory database performance tests
  • Mostly Database reads
  • DB2 and MySQL on SUSE SLES 9 - Kernel 2.6.5
  • Database use of small and medium-sized enterprises
  • single and dual processing systems.
Our benchmark Quality assurance methods include:
  • Checking the disk activity with iostat and vmstat
  • Constant monitoring of the Client's CPU load, network load and memory usage
  • Tests were repeated at least 3 times
  • All tests were performed with two different clients: a Dual Opteron 850 2.4 GHz and a Quad Opteron 848 2.2 GHz
  • Improved and optimised Client program
Real world databases are in many cases disk limited. Jason and Ross have been running 8 x 36GB 15,000RPM Ultra320 SCSI drives in RAID-0 to avoid the Enterprise Class Performance tests being limited by disk I/O performance.

However, the Lab of the Technical University of Kortrijk where we performed our tests did not have such an impressive disk array at its disposal, and we were determined to focus on the database performance of the different CPUs and CPU-chipset-memory combinations. All tests were done (99% of the time) with in-memory queries. Investigating the performance of different disk storage systems is a time-consuming and completely different project.

We still tested with our 1 GB database, imported into MySQL (MyISAM and InnoDB) and IBM's DB2 8.2.

Some of you might still be convinced that in-memory tests are not really relevant. Consider that the availability of cheap 64 bit systems makes it possible to use much more RAM than before. Flat 64 bit addressing of more than 4 GB of RAM used to be a privilege of very expensive servers (Power4, etc.), but this is no longer the case with the introduction of Intel's EM64T Xeons and AMD's AMD64 Opteron.

With the current prices of 1 GB DDR(-II) sticks, it is very easy and inexpensive to build a database server with 8 GB of RAM. Even 16 GB (16x1 GB) is not that expensive, considering the price of a quad Opteron server. As a seasoned sys-admin told me, "the performance of database servers can be brought back to life with some extra RAM." In many cases, a large amount of RAM can do more than very expensive 15,000RPM SCSI disks.

Again, this article is not about the typical huge central databases of banks that need to handle a large number of transactions, with write operations being very frequent.

We test on SUSE SLES 9 (SUSE Enterprise Edition) SP1, Linux kernel 2.6.5-151smp. Yes, this is not the latest kernel version, which is 2.6.12 at the time of this article. We used 2.6.5 because it is the last kernel available for our enterprise version of SUSE. The very nature of this project also forces us to check our numbers with at least 5 consecutive tests, and a lot of time is spent in checking parameters and so on, so we need to "freeze" the kernel version for a few weeks. We did perform a few tests on Gentoo, however, with kernel version 2.6.12.



The current market situation

Depending on the source and the definition of "server", x86 servers account for about 33% - 50% of the revenue ($49 billion) of the server market. Depending on the report, the AMD Opteron has captured a bit more than 5% of the total x86 server market.

It is interesting to note that Linux is the server operating system of a little more than 9% of the servers, but the number of Linux servers is growing by about 40%. More than 60% of the Opteron servers are running Linux (according to IDC), while the lion's share of the Xeons are running 32 bit Windows. It is clear that the Opteron's rise in market share is slowed down not only by the rapid ramp of EM64T Xeons, but also by the lack of a 64 bit Windows 2003.

In the second half of 2004, already one million EM64T Xeons were shipped, about three times as many as the total number of Opterons shipped until then. The percentage of 64 bit systems deployed is thus increasing rapidly, making the switch to 64 bit software more interesting for developers too.

Xeon and Opteron

Since our previous test, four interesting new CPUs have entered the scene. First of all, there is the Pentium-D. Although the Pentium D is a desktop CPU, it is a very interesting low cost solution for low end servers, so we decided to include it in this review. Of course, a Pentium-D server does not have the same RAS features as an Opteron or Xeon based machine. The Pentium-D requires a heavy power supply: cheap 400 Watt power supplies in our lab were not able to power up the Pentium-D, even with a relatively slow Geforce FX 5600 PCIe video card.

Secondly, there is the Intel Xeon Irwindale, which is essentially the Xeon version of the desktop Pentium 6xx series ("Prescott core") that includes a massive 2 MB L2-cache. Also interesting is the "Demand Based Switching" feature of the new Xeons: this allows them to throttle back to 2.8 GHz when the load on the server is low. This reduces the CPU's power dissipation by about 15 to 20%. The Xeon Irwindale is a demanding CPU: it requires 110 Watt under full load.

Cool'n'Quiet is functional on the new 2.6 GHz Opteron 252, and offers much more impressive power savings. Power dissipation is reduced from 92.6 W (only attainable under extreme conditions) to less than 20 Watt.

The new Dual core Opteron makes our test complete. While Windows (XP and 2003) recognized and utilized the cores easily, SUSE SLES 9 Linux was a little more stubborn. With the original SLES 9 kernel 2.6.5-97, the dual Opteron would just crash. We applied Service Pack 1 and the new Opteron would boot and recognize the two cores, but the second CPU was disabled because of APIC IRQ problems.

Therefore, we were only able to run the Dual core Opteron on Gentoo with a 2.6.12 kernel.

A quick table to refresh your memory and to enable you to compare price/performance:

Intel Xeon CPUs | Core | L2 cache | L3 cache | x86-64 bit? | Power saving? | In test? | Price
3.60 GHz w/ 2M cache, 800 MHz FSB (90nm) | Irwindale = "Nocona, twice as big L2" | 2 MB | No | Yes | DBS | Yes | $851
3.2 GHz w/ 2M cache, 800 MHz FSB (90nm) | Irwindale = "Nocona, twice as big L2" | 2 MB | No | Yes | DBS | Yes | $455
3.60 GHz w/ 1M cache, 800 MHz FSB (90nm) | Nocona = "Prescott server" | 1 MB | No | Yes | DBS | Yes | $690
3.40 GHz w/ 1M cache, 800 MHz FSB (90nm) | Nocona = "Prescott server" | 1 MB | No | Yes | DBS | No | $455
3.20D GHz w/ 1M cache, 800 MHz FSB (90nm) | Nocona = "Prescott server" | 1 MB | No | Yes | DBS | No | $316
3 GHz w/ 1M cache, 800 MHz FSB (90nm) | Nocona = "Prescott server" | 1 MB | No | Yes | DBS | No | $256
3.20C GHz w/ 2M cache, 533 MHz FSB (0.13 micron) | Gallatin = "P4 EE Server" | 0.5 MB | 2 MB | No | No | Yes | $1,043
3.20 GHz w/ 1M cache, 533 MHz FSB (0.13 micron) | Gallatin = "P4 EE Server" | 0.5 MB | 1 MB | No | No | No | $690
3.06A GHz w/ 1M cache, 533 MHz FSB (0.13 micron) | Gallatin = "P4 EE Server" | 0.5 MB | 1 MB | No | No | Yes | $455
3.06 GHz w/ 512k cache, 533 MHz FSB (0.13 micron) | Prestonia = "Northwood Server" | 0.5 MB | No | No | No | Yes | $316
Pentium 4-D | "Dual Prescott - Smithfield" | 2 x 1 MB | No | No | No | Yes | $312

AMD Opteron CPUs | Core | L2 cache | L3 cache | x86-64 bit? | Power saving? | In test? | Price
Model 275 (2x 2.2 GHz) | Dual core | 2x 1 MB | No | Yes | Cool'n Quiet | Yes* | $1299
Model 265 (2x 1.8 GHz) | Dual core | 2x 1 MB | No | Yes | Cool'n Quiet | No | $851
Model 252 (2.6 GHz) | Troy | 1 MB | No | Yes | Cool'n Quiet | Yes | $851
Model 250 (2.4 GHz) | Sledgehammer | 1 MB | No | Yes | No | Yes | $690
Model 248 (2.2 GHz) | Sledgehammer | 1 MB | No | Yes | No | Yes | $455
Model 246 (2.0 GHz) | Sledgehammer | 1 MB | No | Yes | No | No | $316
Model 244 (1.8 GHz) | Sledgehammer | 1 MB | No | Yes | No | Yes | $209

The introduction of Irwindale resulted in Intel reducing the prices of the Xeon "Nocona", making this CPU more attractive. The Dual core Opteron is still a bit pricey, but definitely an alternative to two Opterons or two Xeons.



Words of thanks

A lot of people gave us assistance with this project, and we would like to thank them, of course:

David Van Dromme, Iwill Benelux Helpdesk (http://www.iwill-benelux.com)
Ilona van Poppel, MSI Netherlands (http://www.msi-computer.nl)

Frank Balzer, IBM DB2/SUSE Linux Expert
Jasmin Ul-Haque, Novell Corporate Communications - SUSE LINUX

Matty Bakkeren, Intel Netherlands
Trevor E. Lawless, Intel US
Larry.D. Gray, Intel US
Markus Weingartner, Intel Germany

Nick Leman, MySQL expert
Bert Van Petegem, DB2 Expert
Ruben Demuynck, Vtune and OS X expert
Yves Van Steen, developer Dbconn

Damon Muzny, AMD US

I would also like to thank Lode De Geyter, Manager of the PIH, for letting us use the infrastructure of the Technical University of Kortrijk in which to test the database servers.

Benchmark configuration

To ensure that our databases were stable and reliable, we followed the guidelines of SUSE and IBM. For example, DB2 is only certified to run on the SLES versions of SUSE Linux; in theory, you cannot run it on just any Linux distribution. We also used the MySQL version (4.0.18) that came with the SUSE SLES 9 CDs, which was certified to work on our OS.

Network performance wasn't an issue. We used a direct Gigabit Ethernet link between client and server. On average, the server received 4 Mbit/s and sent 19 Mbit/s of data, with a peak of 140 Mbit/s, way below the limits of Gigabit. The disk system wasn't overly challenged either: up to 600 KB/s of reads and at most 23 KB/s of writes. You can read more about our MySQL and DB2 test methods.

Software:

IBM DB2 Enterprise Server Edition 8.2 (DB2ESE), 32 bit and 64 bit
MySQL 4.0.18, 32 and 64 bit, MyISAM and InnoDB engine
SUSE SLES 9 (SUSE Enterprise Edition), Linux kernel 2.6.5, 64 bit.

Hardware

We'll discuss the different servers that we tested in more detail below. Here is the list of the different configurations:

Intel Server 1:
Dual Intel Nocona 3.6 GHz 1 MB L2-cache, 800 MHz FSB - Lindenhurst Chipset
Dual Intel Irwindale 3.6 GHz 2 MB L2-cache, 800 MHz FSB - Lindenhurst Chipset
Intel® Server Board SE7520AF2
8 GB (8x1024 MB) Micron Registered DDR-II PC2-3200R, 400 MHz CAS 3, ECC enabled
NIC: Dual Intel® PRO/1000 Server NIC (Intel® 82546GB controller)

Intel Server 2:
Dual Xeon DP 3.06 GHz 1 MB L3-cache, Dual Xeon 3.2 GHz 2MB L3-cache
Dual Xeon 3.2 GHz
Intel SE7505VB2 board - Dual DDR266
2 GB (4x512 MB) Crucial PC2100R - 250033R, 266 MHz CAS 2.5  (2.5-3-3-6)
NIC: 1 Gb Intel RC82540EM - Intel E1000 driver.

Intel Server 3:
Pentium-D EE 840
Intel SE7505VB2 board - Dual DDR266
2 GB (4x512 MB) Crucial PC2100R - 250033R, 266 MHz CAS 2.5  (2.5-3-3-6)
NIC: 1 Gb Intel RC82540EM - Intel E1000 driver.

Opteron Server 1: Dual Core Opteron 875 (2.2 GHz), Dual/ Single Opteron 850, Dual/Single Opteron 848
Iwill DK8ES Bios version 1.20
4 GB: 4x1 GB Transcend (Hynix 503A) DDR400 - (3-3-3-6)
NIC: Broadcom BCM5721 (PCI-E)

Quad Opteron Server 2: Iwill H4103: Quad Opteron 844, 848
Iwill H4103
4-8 GB: 4-8x1 GB Transcend (Hynix 503A) DDR400 - (3-3-3-6)
NIC: Intel 82546EB (PCI-X)

Opteron Server 3: Dual Core Opteron 875 (2.2 GHz), Dual/ Single Opteron 848
MSI K8N Master2-FAR
4 GB: 4x1 GB Transcend (Hynix 503A) DDR400 - (3-3-3-6)
NIC: Broadcom BCM5721 (PCI-E)

Opteron Server 4: AMD Quartet: Dual Opteron 848, Quad 848
Quartet motherboard, Zildjian personality board, Tobias backplane board and Rivera power distribution board.
Quad configurations: 4 GB: 8x512 MB infineon PC2700 Registered, ECC enabled
Dual configurations: 2 GB: 4x512 MB infineon PC2700 Registered, ECC enabled
NIC: Broadcom NetExtreme Gigabit

Client Configuration: Dual Opteron 850
MSI K8T Master1-FAR
4x512 MB infineon PC2700 Registered, ECC enabled
NIC: Broadcom 5705

Shared Components
1 Seagate Cheetah 36 GB - 15000 RPM - 320 MB/s
Maxtor 120 GB DiamondMax Plus 9 (7200 RPM, ATA-100/133, 8 MB cache)

Software

Vtune for Windows version 7.2, Vtune for Linux remote agent 3.0
Code Analyst for Linux 3.4.8
Code Analyst for Windows 2.3.4

More about the servers in this test

Although our main focus is the database server performance of the different AMD and Intel platforms, allow me to introduce some of the motherboards and server barebones that we used in this test.

Iwill H4103

The amount of power that the Iwill H4103 can pack in a 1U rack mounted case is nothing short of amazing. There are no fewer than 4 Opterons in this pizza box, which also allows you to put up to 32 GB of DDR RAM in there. Even more impressive is the inclusion of two redundant 700 Watt power supplies.

The preferred habitat of such a beast is, of course, an HPC (High Performance Computing) environment, but it can also be used as a database server. The one problem is that the only disk interface available is the old and (especially for server applications) slow P-ATA interface. It is possible to cram two disks and a slim CD-ROM drive in there, but P-ATA disks are not a decent solution for a server. So, why should this quad CPU monster be in a database server review?

First of all, the H4103 is equipped with 4 Gigabit Ethernet ports, courtesy of an Intel 82546EB dual-channel GbE LAN controller, connected to the PCI-X bus. This allows you to connect to a NAS device at Gigabit speeds. The integration of the Intel Gigabit Ethernet chip is a very good move: Intel's Gigabit chips are capable of reaching up to 900 Mbit/s at CPU loads of less than 20% (measured with an Opteron 248).

The second option is to use the PCI-X 64 bit 133/100/66 MHz expansion slot with a riser card (which is what we did) and connect Direct Attached Storage (DAS), such as an array of SCSI disks, externally. It is pretty clear that when it comes to saving rack space, the Iwill H4103 is a very interesting option.

What about cooling? Well, this is a server, of course, and a whole battery of 10,000 RPM fans keeps the copper heat sinks cool. In our poorly cooled lab, temperatures rose easily to 30°C and higher, but the Iwill H4103 heat sinks hardly became warm under full load. According to Iwill, you should be able to use dual core Opterons in this pizza box, but we haven't been able to verify this, as the necessary BIOS version had yet to arrive.

The H4103 left us with the impression of a very stable, high-performance machine. The only thing left that would make this ultra compact quad Opteron machine complete is the integration of a SCSI controller or at least a very good SATA controller.

Iwill DK8ES

The Iwill DK8ES is a server board based on NVIDIA's nForce 4 2200 Professional chipset. The board integrates an ATI RageXL VGA controller, 2 PCI-Express x16 expansion slots (one in PCI-Express x2 mode), 3 PCI-X 64 bit 133/100/66 MHz expansion slots and 4 SATA ports. Two Broadcom BCM5721 PCI-E Gigabit Ethernet controllers are connected to a PCI-E port.

The interesting thing about the Iwill board is the high quality components that have been used on the board, such as tantalum capacitors and high end Digital VRMs.

The digital VRMs allow a very precise voltage regulation, increasing the stability of the server.

The board proved to be fully stable during 6 weeks of heavy database server testing. The only problem was that Linux didn't like running Dual core Opterons on this board. A BIOS update (to 1.20) made the Opteron 275 and 875 run stably on this board, but Linux (kernel 2.6.12) still didn't use both cores. Both cores were reported, but only one was used. We suspect that Iwill may have to work out a BIOS issue, or that some of the NVIDIA Linux drivers still need some tuning.

MSI's K8N Master2-FAR

MSI sent us a completely different, relatively cheap workstation board that should enable very compact quad core (two dual cores) Opteron machines. The MSI K8N Master2-FAR is based on the NVIDIA nForce 4 Pro chipset.

In order to get two Opteron sockets, one 32-bit/33 MHz PCI slot, one PCI Express x4 slot and two PCI Express x16 slots (SLI mode supported) on a standard ATX board, MSI didn't give the second CPU local memory. Despite this limitation, the MSI K8N Master2-FAR proved to be an excellent performer in our database server tests.

Six memory slots allow up to 12 GB of RAM, not bad for such a compact board.

We weren't very happy with the Gigabit Marvell 88E1111 PHY interface, which easily consumes more than 60% of the CPU (Opteron 248) and delivers only 500 Mbit/s. Luckily for our tests, we could use the second Gigabit Ethernet chip, the Broadcom BCM5788 (800 Mbit/s at 30% CPU load). We would also like to see the fan on the NVIDIA chipset replaced by a decent heat sink. Four SATA-II (300 MB/s) connections are available.

At the end of the tests, the MSI K8N Master2-FAR (BIOS version 1.0) had proven to be a very capable board. Supported by our Vantec 470W power supply, it had no trouble at all with two 875 Opterons running heavy database server tests for more than 3 weeks.



Benchmarks MySQL 4.0.18: Intel versus AMD

A Linux database server report would not be complete without the open source database MySQL. Many of our readers requested that we test with both MyISAM (default storage engine in MySQL 3.x) and InnoDB (default storage engine in MySQL 4.x), so we performed many more tests than last time.

It must be said that the MySQL results had a larger margin of error (3% - 4%) than the DB2 results, especially at high levels of concurrency.

Here is our MySQL configuration:

           Read_buffer=2GB
           Port=3306
           socket = /var/lib/mysql/mysql.sock
           skip-locking
           set-variable = max_user_connections= 2000
           set-variable = max_connections= 2000
           key_buffer=2G
           Read_buffer=2G
           table_cache=1024
           tmp_table=128M
           max_heap_table=256M
           read_rnd_buffer = 64M
           thread_cache=16
           net_buffer_length=16k

The "query cache" was off, as we wanted to test worst case performance. In some cases, the query cache was able to push a single Xeon to 1000 queries per second, and the CPU was still capable of doing more, as the CPU load was at 50% - 70%. At 1000 queries/s and more, other bottlenecks started to kick in, such as the latency of the network driver, the operating system and so on.

All numbers are expressed in queries per second. The results at concurrency levels below 5 are not reliable enough to draw any firm conclusions, as the margin of error is much higher there.
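To illustrate what "queries per second at a given concurrency level" means, here is a minimal sketch of a measuring client. It is an illustration only, not the client program used for these tests: `run_query` is a hypothetical stand-in for sending one query to the server, and the thread-per-connection design is an assumption.

```python
import threading
import time


def run_query():
    """Hypothetical stand-in for one database query (not the article's client)."""
    time.sleep(0.001)  # pretend the server needs about 1 ms per query


def measure_qps(concurrency, queries_per_worker, query=run_query):
    """Run `concurrency` workers in parallel and return queries per second."""
    def worker():
        for _ in range(queries_per_worker):
            query()

    threads = [threading.Thread(target=worker) for _ in range(concurrency)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return concurrency * queries_per_worker / elapsed


if __name__ == "__main__":
    # The concurrency levels used in the result tables of this article.
    for level in (1, 2, 5, 10, 20, 35, 50):
        print(level, round(measure_qps(level, 20)))
```

A real client would replace `run_query` with an actual database call and repeat each run several times, since single runs at low concurrency are noisy.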

Concurrency | Dual Xeon (Gallatin) with L3 cache | Single Xeon (Gallatin) with L3 cache | Dual Xeon (Nocona) with HT | Single Xeon (Nocona) with HT | Dual Xeon (Irwindale) 3.6GHz with HT | Dual Core Intel 3.2GHz | Dual Opteron 250 2.4GHz | Single Opteron 250 2.4GHz | Single Opteron 252 2.6GHz
1 | 243 | 248 | 280 | 277 | 286 | 233 | 290 | 298 | 319
2 | 357 | 317 | 423 | 338 | 450 | 344 | 438 | 370 | 399
5 | 466 | 356 | 473 | 358 | 497 | 442 | 543 | 435 | 470
10 | 505 | 361 | 521 | 375 | 517 | 487 | 629 | 465 | 502
20 | 496 | 350 | 531 | 371 | 545 | 507 | 670 | 455 | 498
35 | 508 | 355 | 555 | 371 | 506 | 490 | 665 | 470 | 507
50 | 497 | 348 | 526 | 368 | 495 | 502 | 669 | 472 | 508
AVG | 494 | 354 | 521 | 368 | 512 | 486 | 635 | 460 | 497
MAX | 508 | 361 | 555 | 375 | 545 | 507 | 670 | 472 | 508

Those were the raw numbers. Let us now analyse this...

Concurrency | Dual versus Single Xeon Gallatin | Dual versus Single Xeon Nocona/Irwindale | Dual Opteron 250 vs Single
1 | -2% | 1% | -3%
2 | 12% | 25% | 18%
5 | 31% | 32% | 25%
10 | 40% | 39% | 35%
20 | 42% | 43% | 47%
35 | 43% | 50% | 41%
50 | 43% | 43% | 42%
AVG | 40% | 41% | 38%
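The percentages in this table can be derived from the raw queries-per-second numbers. The following minimal sketch does so for the Xeon Gallatin columns. It assumes that the AVG row is the mean over concurrency levels 5 to 50 (the rule stated for the DB2 averages further on, which also reproduces the AVG values here); the function names are ours. The published percentages appear to be rounded or truncated, so differences of about one percentage point are to be expected.

```python
# Queries per second from the MyISAM results (Dual/Single Xeon Gallatin).
CONCURRENCY = [1, 2, 5, 10, 20, 35, 50]
DUAL_GALLATIN = [243, 357, 466, 505, 496, 508, 497]
SINGLE_GALLATIN = [248, 317, 356, 361, 350, 355, 348]


def gain_percent(new, base):
    """Relative gain of `new` over `base`, in percent."""
    return (new / base - 1.0) * 100.0


def average_from(values, levels, min_level=5):
    """Mean over the concurrency levels >= min_level (assumed AVG rule)."""
    picked = [v for v, lvl in zip(values, levels) if lvl >= min_level]
    return sum(picked) / len(picked)


for lvl, dual, single in zip(CONCURRENCY, DUAL_GALLATIN, SINGLE_GALLATIN):
    print(f"{lvl:>3}: {gain_percent(dual, single):+.1f}%")

avg_dual = average_from(DUAL_GALLATIN, CONCURRENCY)
avg_single = average_from(SINGLE_GALLATIN, CONCURRENCY)
print(f"AVG: {avg_dual:.1f} vs {avg_single:.1f} "
      f"-> {gain_percent(avg_dual, avg_single):+.1f}%")
```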

MySQL MyISAM is an incredibly fast database engine in our benchmark situation: it handles the same workload about twice as fast as DB2. I have to emphasize "our benchmark situation" because we cannot forget that our workload is mainly about reading the database and not writing. And of course, it must be said that the MyISAM engine does less work on each query than DB2; it does not support transaction-safe (ACID compliant) commit, rollback, and crash recovery capabilities.

MySQL, as we also noticed 6 months ago, doesn't seem to scale as well as DB2. At best, you get a 40% - 45% performance increase from a second CPU when the concurrency level is high enough. When we move to quad CPUs, we only get a 20% - 30% increase, while DB2 still offers a 70% increase. The better scaling of DB2 means that with enough CPUs, it runs almost as fast as the MyISAM engine, and offers all the transaction-safe capabilities as a bonus.
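One way to reason about these scaling figures is Amdahl's law, which is not something used in the tests themselves; it is brought in here purely as an illustrative model. The sketch below infers the parallel fraction from the measured dual versus single gain and extrapolates to four CPUs. It assumes that the quad percentages are measured relative to a dual CPU system, and the DB2 input (about +95% for two CPUs) is taken from the DB2 results later in this article.

```python
def parallel_fraction(speedup, cpus):
    """Amdahl's law solved for the parallel fraction p, given a measured speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cpus)


def amdahl_speedup(p, cpus):
    """Speedup over one CPU predicted by Amdahl's law."""
    return 1.0 / ((1.0 - p) + p / cpus)


# Dual versus single speedups: about +40% (MyISAM), about +95% (DB2).
for name, dual_speedup in (("MySQL MyISAM", 1.40), ("DB2", 1.95)):
    p = parallel_fraction(dual_speedup, cpus=2)
    quad_vs_dual = amdahl_speedup(p, cpus=4) / dual_speedup
    print(f"{name}: p = {p:.2f}, quad vs dual = {(quad_vs_dual - 1) * 100:+.0f}%")
```

For MyISAM, this simple model lands inside the 20% - 30% range reported above; for DB2, it is more optimistic than the measured 70%, which is a reminder that real servers have bottlenecks that such a model ignores.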

Let us check if the architectural differences between the CPUs make a difference. Again, don't pay too much attention to the results of the lower concurrency levels.

Concurrency | Dual Xeon Irwindale versus Nocona (3.6 GHz) | Xeon Nocona (3.6 GHz) vs Gallatin (3.06 GHz) | Opteron 2.6 GHz vs Nocona 3.6 GHz | Opteron 2.6 GHz vs Pentium-D | Xeon Nocona 3.6 GHz vs Pentium-D
1 | 2% | 12% | 15% | 37% | 19%
2 | 6% | 7% | 18% | 16% | -2%
5 | 5% | 1% | 31% | 6% | -19%
10 | -1% | 4% | 34% | 3% | -23%
20 | 3% | 6% | 34% | -2% | -27%
35 | -9% | 5% | 37% | 4% | -24%
50 | -6% | 6% | 38% | 1% | -27%
AVG | -2% | 4% | 35% | 2% | -24%
MAX | -2% | 4% | 36% | 0% | -26%

The bigger L2-cache of the Xeon Irwindale did nothing more than compensate for the slightly higher latency of the L2-cache. The Xeon Irwindale and Nocona perform alike.

MySQL, unless you get the special Intel Compiler optimized version, remains the stronghold of the Opteron. The fastest (single core) Opteron outperforms the best Intel CPU by a 35% margin. We didn't use the Intel compiler version as we have reason to believe that this version is not used a lot in the real world. We might try it out in a future article.

The relatively limited scaling also means that high clocked single CPUs can be an interesting option. This is illustrated by the Opteron 252 2.6 GHz, which outperforms the dual core Pentium-D 3.2 GHz by a small margin.
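Since the price table is meant to help compare price/performance, here is a minimal sketch that combines the average single CPU MyISAM results with the listed CPU prices. It only looks at the CPU price (boards, memory and power supplies are ignored), and it assumes that the $312 Pentium-D entry in the price table is the 3.2 GHz part that was tested.

```python
# (label, average MyISAM queries/s for one CPU, list price in USD)
# Throughput: AVG row of the MyISAM table; prices: the CPU price table.
CPUS = [
    ("Opteron 252 2.6 GHz", 497, 851),
    ("Opteron 250 2.4 GHz", 460, 690),
    ("Xeon Nocona 3.6 GHz", 368, 690),
    ("Pentium-D 3.2 GHz", 486, 312),  # assumption: the $312 entry is this part
]


def queries_per_100_dollars(qps, price_usd):
    """Throughput bought per 100 USD of CPU list price."""
    return qps / price_usd * 100.0


ranked = sorted(CPUS, key=lambda c: queries_per_100_dollars(c[1], c[2]),
                reverse=True)
for label, qps, price in ranked:
    print(f"{label:<22} {queries_per_100_dollars(qps, price):6.1f} queries/s per $100")
```

Keep in mind that the Pentium-D is a dual core chip, so this compares one socket against one socket rather than one core against one core.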



Benchmarks MySQL: 64 bit versus 32 bit

With sixteen registers and more than 4 GB of physical and virtual memory, using 64 bit software should have nothing but advantages. To see how much of an advantage the new 64 bit binaries offer, we tested both the Xeon Irwindale and the Opteron on 32 bit MySQL and 64 bit MySQL, both version 4.0.18.

The Opteron was tested on the MSI board for these tests, contrary to previous tests where the Iwill board was used. The Intel CPU was running on the Intel board as in all previous tests.

Concurrency | Dual Xeon (Irwindale) 3.6GHz with HT, 64 bit | Dual Xeon (Irwindale) 3.6GHz with HT, 32 bit | Dual Opteron 248, 64 bit | Dual Opteron 248, 32 bit | Dual Xeon (Irwindale) 3.6GHz: 64 bit versus 32 bit | Dual Opteron 248: 64 bit vs 32 bit
1 | 286 | 245 | 324 | 261 | 16% | 24%
2 | 450 | 379 | 532 | 421 | 19% | 26%
5 | 497 | 534 | 642 | 485 | -7% | 32%
10 | 517 | 563 | 691 | 509 | -8% | 36%
20 | 545 | 631 | 692 | 527 | -14% | 31%
35 | 506 | 616 | 670 | 514 | -18% | 30%
50 | 495 | 559 | 666 | 516 | -11% | 29%
AVG | 512 | 580 | 672 | 510 | -12% | 32%
MAX | 545 | 631 | 692 | 527 | -14% | 31%

This is really remarkable, as the Xeon does not benefit from 64 bit at all. Worse, a 10% performance penalty is paid for moving over to 64 bit. The Opteron, however, thrives on 64 bit and gets a 30% boost from 64 bit.

Now, it is possible that the 64 bit binary is simply very well optimized for the Opteron. The 64 bit compiler used by the MySQL engineers (gcc, obviously not the Intel compiler) might not have the necessary optimisations to get the best out of the Xeon architecture. That is probably the most important reason why the difference (+30% versus -12%) is so big.

However, when we take a look at the numbers in DB2, you will notice that the Xeon runs about 2 to 3% slower, while the Opteron gains a 12% boost from 64 bit. IBM's 32 bit binaries make the Xeon run as fast as the best Opterons. Once we turn to 64 bit binaries, the Opteron gets the upper hand again. So, there is more: for some reason, the Xeon is not too happy with 64 bit binaries. We can only speculate, but maybe (some) 64 bit calculations have to cycle twice through the ALUs of the Prescott/Nocona/Irwindale architecture.

The consequence is that a Xeon running a 32 bit application is quite a bit faster than the competition, but once you switch to 64 bit, the Xeon does not stand a chance against the Opteron.

Benchmarks MySQL: Single core versus Dual core

Some of you might already be getting nervous: where is the dual core Opteron? As mentioned earlier, SUSE SLES 9 was a little stubborn. With the original SLES 9 kernel 2.6.5-97, the dual core Opteron would just crash. We applied Service Pack 1 (2.6.5-157smp) and the new Opteron would boot and recognize the two cores, but the second CPU was disabled because of APIC IRQ problems.

Therefore, we were only able to run the Dual core Opteron on Gentoo with a 2.6.12 kernel. The Iwill board still had trouble running two cores, so we ran the tests on the MSI board. To give you an idea of how Gentoo and the new kernel compare to SUSE SLES 9 SP1, and how the Iwill DK8ES compares to MSI's K8N Master2-FAR, we ran a few tests with SUSE on the MSI board too.

Concurrency | Dual Core Opteron 875 - MSI - Gentoo | Dual Opteron 248 - MSI - Gentoo | Dual Opteron 248 - MSI - SUSE | Dual Opteron 248 - Iwill - SUSE | Dual core vs Dual CPU | Gentoo vs SUSE | Iwill vs MSI
1 | 288 | 270 | 324 | 264 | 7% | -17% | -19%
2 | 463 | 443 | 532 | 461 | 4% | -17% | -13%
5 | 583 | 558 | 642 | 591 | 5% | -13% | -8%
10 | 616 | 601 | 691 | 670 | 2% | -13% | -3%
20 | 648 | 610 | 692 | 683 | 6% | -12% | -1%
35 | 664 | 611 | 670 | 659 | 9% | -9% | -2%
50 | 628 | 579 | 666 | 662 | 8% | -13% | -1%
AVG | 628 | 592 | 672 | 653 | 6% | -12% | -3%
MAX | 664 | 611 | 692 | 683 | 9% | -12% | -1%

SUSE SLES 9 SP1 is quite a bit faster than a Gentoo installation with standard tuning. Some of the changes in kernel 2.6.12 might have traded performance for more stability.

The second CPU on the MSI board does not have its own local memory, and has to access the RAM via the HyperTransport connection to the crossbar switch of the first CPU. Just like one dual core Opteron, the two CPUs have to share the bandwidth of one dual channel memory bus. Therefore, the comparison of one dual core Opteron and two single Opterons at the same clock speed is very interesting: it gives us some insight into how much performance is gained by letting the two cores talk over the System Request Queue instead of over the HyperTransport connection. How much does this design boost performance? Quite a bit, according to our benchmarks. This relatively simple design decision offers a 6% performance increase.

The Iwill board is a tiny bit slower than the MSI board, and that might raise some eyebrows. However, Vtune tells us that the Xeon Nocona (1 MB L2) needs to access the RAM only 2% of the time. Assuming that the Opteron with its 1 MB cache needs about the same, it is clear that memory bandwidth is not going to determine the results by much. Slightly more aggressive timings (and thus lower latency) or clock speeds might give MSI the edge. These tiny performance differences are not important, however.

Benchmarks MySQL: Hyperthreading?

What can hyperthreading do for MySQL performance?

Concurrency | Dual Xeon (Irwindale) 3.6GHz with HT | Dual Xeon (Irwindale) 3.6GHz no HT | HT on vs HT off
1 | 286 | 287 | 0%
2 | 450 | 457 | -2%
5 | 497 | 559 | -11%
10 | 517 | 583 | -11%
20 | 545 | 561 | -3%
35 | 506 | 573 | -12%
50 | 495 | 570 | -13%
AVG | 512 | 569 | -10%
MAX | 545 | 583 | -7%

Amazingly, Hyperthreading decreases performance by quite a bit. This leads to a rather weird conclusion. If you want maximum MySQL (Read) performance from your Xeon server, you have to disable Hyperthreading and run in 32 bit mode. The former is of course not dramatic. The latter might, in some cases, be a serious limitation.

Benchmarks MySQL InnoDB: Intel versus AMD

What if we swap the MyISAM engine for the ACID compliant, row level locking InnoDB engine under the hood of MySQL? Surely that should make scaling better, as the MyISAM table locking mechanism is simple, but could be one of the reasons why it scales less well in multi-CPU configurations. Let us take a look.

Concurrency | Dual Xeon (Irwindale) 3.6GHz with HT, with InnoDB | Single Xeon (Irwindale) 3.6GHz with HT, with InnoDB | Dual Xeon (Irwindale) 3.6GHz without HT, with InnoDB | Dual Opteron 248 Dual Channel, with InnoDB | Single Opteron 248 Dual Channel, with InnoDB
1 | 207 | 191 | 210 | 216 | 192
2 | 283 | 201 | 303 | 312 | 223
5 | 324 | 219 | 334 | 396 | 259
10 | 319 | 204 | 360 | 397 | 242
20 | 301 | 199 | 330 | 357 | 236
35 | 281 | 193 | 308 | 353 | 221
50 | 274 | 181 | 298 | 333 | 209
AVG | 300 | 199 | 326 | 366 | 233
MAX | 324 | 219 | 360 | 397 | 259

The InnoDB engine is at about 60% of the speed of the MyISAM engine. Let us analyze these numbers in detail.

Concurrency | Dual versus Single Xeon | Dual versus Single Opteron | Dual Opteron vs Dual Xeon | HT on vs off
1 | 8% | 13% | 3% | -2%
2 | 41% | 40% | 3% | -6%
5 | 48% | 53% | 19% | -3%
10 | 57% | 64% | 10% | -11%
20 | 51% | 51% | 8% | -9%
35 | 45% | 60% | 15% | -9%
50 | 51% | 59% | 12% | -8%
AVG | 51% | 57% | 13% | -8%

Yes, we only used the 2.2 GHz Opteron 248, due to time constraints. We tested with this CPU because we also tried to get some numbers on the Dual core Opteron 275 (also 2.2 GHz), but as you know, we could not get that CPU running at dual core in SUSE SLES 9 SP1. It is pretty clear that a 2.6 GHz Opteron 252 would bring in another 16% - 18%. So, even with a different engine, the Opteron keeps outperforming the Xeon by a significant margin. This margin can again be lowered by disabling Hyperthreading.

The Opteron scales a little better than the Xeon in this test. All in all, InnoDB scales better than the MyISAM engine, but not spectacularly: a second CPU offers a 50% - 57% boost instead of a 40% - 41% one.

What happens if we use the Dual core Opteron 275? To make this work, we had to resort to the Gentoo distribution again, with the 2.6.12 kernel. All CPUs are running at 2.2 GHz.

Concurrency | Dual Dual Core 875 | Single Dual Core 875 | Dual Opteron 248 | Dual Dual core vs One Dual core | Dual core vs Dual single
1 | 199 | 206 | 200 | -3% | 3%
2 | 308 | 305 | 293 | 1% | 4%
5 | 397 | 368 | 338 | 8% | 9%
10 | 401 | 379 | 345 | 6% | 10%
20 | 400 | 359 | 308 | 11% | 17%
35 | 388 | 342 | 305 | 14% | 12%
50 | 361 | 322 | 290 | 12% | 11%
AVG | 389 | 354 | 317 | 10% | 12%
MAX | 401 | 379 | 345 | |

InnoDB does not scale better with 4 cores than MyISAM. On the contrary, both engines show very small performance benefits from more than 2 cores. Once again, it is interesting that the dual core CPU is quite a bit faster than our Dual CPU (single core) machine. A 10% bonus is nothing to sneeze at, especially when you consider that server boards with only one socket are quite a bit cheaper. It seems that one dual core Opteron is an ideal solution for a rather powerful MySQL database server.

Next, we test with an enterprise database solution: DB2 8.2.



Benchmarks IBM DB2 8.2: Intel versus AMD

Below, you will find our results for the different platforms of AMD and Intel. At the last moment, the Pentium 4 670 3.8 GHz arrived in the labs, so we decided to give this CPU a quick test run. In these tests, we enabled the new Asynchronous I/O feature, which gave the Intel Xeon a small performance boost (4 to 7%), while it made the Opteron perform only a tiny bit faster (1%).

| Concurrency | Dual Xeon Irwindale 3.6 GHz | Single Xeon Irwindale 3.6 GHz | Dual Xeon Nocona 3.6 GHz | Single Xeon Nocona 3.6 GHz | Dual Opteron 2.2 GHz | Dual Opteron 2.4 GHz | Single Opteron 2.4 GHz | Dual Opteron 2.6 GHz | Intel Pentium D Dual Core 3.2 GHz | Intel Pentium 4 3.8 GHz |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 94 | 90 | 101 | 95 | 97 | 116 | 119 | 124 | 89 | 99 |
| 2 | 172 | 109 | 164 | 107 | 202 | 219 | 151 | 233 | 141 | 118 |
| 5 | 207 | 114 | 215 | 110 | 262 | 287 | 156 | 308 | 199 | 123 |
| 10 | 228 | 115 | 223 | 117 | 268 | 294 | 156 | 320 | 201 | 126 |
| 20 | 225 | 118 | 207 | 112 | 264 | 306 | 153 | 328 | 202 | 124 |
| 35 | 232 | 116 | 215 | 116 | 275 | 284 | 153 | 308 | 174 | 120 |
| 50 | 230 | 114 | 214 | 113 | 275 | 281 | 150 | 307 | 203 | 127 |
| AVG | 225 | 115 | 215 | 114 | 269 | 291 | 153 | 314 | 196 | 124 |

All averages are calculated on the concurrency levels from 5 to 50. There is no doubt about it: it pays off big time to invest in a multi-CPU machine in DB2. It is of no use to invest in the fastest single CPU system. A mid-range dual CPU system will easily outperform it.
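As a rough illustration of how such an average is formed, the following Python sketch recomputes it for two columns of the table above. The function name `average_from` and the dictionary layout are our own choices for this example, not part of the original test harness. Because the published per-level figures are already rounded, the recomputed averages can differ from the published AVG row by about one point.

```python
# Throughput figures copied from the DB2 table above (as published, already rounded).
CONCURRENCY = [1, 2, 5, 10, 20, 35, 50]
RESULTS = {
    "Dual Xeon Irwindale 3.6 GHz": [94, 172, 207, 228, 225, 232, 230],
    "Dual Opteron 2.6 GHz": [124, 233, 308, 320, 328, 308, 307],
}


def average_from(levels, values, min_level=5):
    """Mean of the values whose concurrency level is at least min_level."""
    picked = [v for c, v in zip(levels, values) if c >= min_level]
    return sum(picked) / len(picked)


for name, values in RESULTS.items():
    print(f"{name}: {average_from(CONCURRENCY, values):.1f}")
```

Leaving out concurrency levels 1 and 2 keeps the average from being dragged down by runs in which a second CPU has little or nothing to do.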

The table below is an overview of the differences in the CPUs.

| Concurrency | Dual versus Single Xeon Irwindale | Dual versus Single Xeon Nocona | Dual Opteron 250 vs Single | Dual Opteron 2.6 GHz versus Irwindale 3.6 GHz | Xeon Irwindale versus Nocona |
|---|---|---|---|---|---|
| 1 | 5% | 6% | -3% | 32% | -7% |
| 2 | 57% | 53% | 45% | 36% | 4% |
| 5 | 82% | 96% | 84% | 49% | -4% |
| 10 | 99% | 91% | 89% | 40% | 2% |
| 20 | 92% | 84% | 100% | 46% | 9% |
| 35 | 99% | 86% | 86% | 33% | 8% |
| 50 | 102% | 89% | 88% | 33% | 7% |
| AVG | 95% | 89% | 89% | 40% | 5% |

The performance of DB2 scales almost perfectly on the different platforms. Irwindale scales a little better than the two other CPUs, probably thanks to the larger L2-cache. However, this does not save Intel from defeat: the Opteron 2.6 GHz is the champion in these tests. What happened? In our previous test, the fastest Xeon (Nocona 3.6 GHz) was a bit faster than the best Opteron (250, 2.4 GHz). First of all, the Opteron 252 scales very well, and is 8% faster than its older 2.4 GHz brother, as the 252 is clocked 8.3% higher. But the Xeon Irwindale gets a 5% - 7% performance boost from its larger L2-cache, so that is not the real issue.

However, when we compared a 64 bit with a 32 bit DB2 instance, the Opteron gained 13% performance from moving to 64 bit, while the Xeon lost 3 to 4%! Secondly, with the 2.4 kernel, the Xeon gained an additional boost from Hyperthreading, while we could no longer measure this performance increase with the 2.6 kernel. Thirdly, it seems that the Opteron gains more from the move from the 2.4 kernel to the 2.6 kernel than the Xeon does.
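The clock scaling remark about the Opteron 252 can be checked with a few lines of Python. This is only a sketch: the helper name `percent_gain` is ours, and the inputs are the rounded AVG figures of the dual Opteron 2.4 GHz and 2.6 GHz columns in the DB2 table.

```python
def percent_gain(new, old):
    """Relative increase of new over old, in percent."""
    return (new / old - 1.0) * 100.0


clock_gain = percent_gain(2.6, 2.4)  # clock speeds in GHz
perf_gain = percent_gain(314, 291)   # dual Opteron AVG at 2.6 GHz vs 2.4 GHz

print(f"clock: +{clock_gain:.1f}%, performance: +{perf_gain:.1f}%")
```

When the performance gain is nearly as large as the clock gain, as it is here, the benchmark is mostly bound by the CPU core rather than by memory or I/O.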

Benchmarks IBM DB2: Single core versus Dual core

What about our Dual core Opteron 875/275? We managed to get DB2 running on Gentoo, kernel 2.6.12rc5. You can find the results below. All tests have been performed on the MSI K8Master-FAR2.

| Concurrency | Dual Dual Core AMD 2.2 GHz | Single Dual Core AMD 2.2 GHz | Dual Opteron 2.2 GHz | Quadcore vs Dual | Dualcore versus Dual Single |
|---|---|---|---|---|---|
| 1 | 107 | 118 | 111 | -9% | 6% |
| 2 | 194 | 213 | 162 | -9% | 32% |
| 5 | 368 | 242 | 222 | 52% | 9% |
| 10 | 423 | 256 | 227 | 66% | 13% |
| 20 | 448 | 253 | 216 | 77% | 17% |
| 35 | 434 | 246 | 213 | 76% | 16% |
| 50 | 429 | 251 | 218 | 71% | 15% |
| AVG | 421 | 250 | 219 | 68% | 14% |

It is simply amazing how much punch the Dual core 275/875 has. It offers a 14% performance increase over an identically configured dual CPU Opteron 248 setup. Add a second dual core CPU, and DB2 8.2 rewards you with another performance increase of close to 70%. And all this is happening on our ATX MSI K8Master-FAR2 board.
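The percentage columns in the dual core table can be recomputed with the short Python sketch below. The list and function names are ours, chosen for illustration. The inputs are the rounded figures as published, so an individual level may come out one point off the published percentage.

```python
# DB2 results at 2.2 GHz, copied from the dual core table above.
CONCURRENCY = [1, 2, 5, 10, 20, 35, 50]
FOUR_CORES = [107, 194, 368, 423, 448, 434, 429]  # two dual core Opterons
TWO_CORES = [118, 213, 242, 256, 253, 246, 251]   # one dual core Opteron


def percent_gain(new, old):
    """Relative increase of new over old, in percent."""
    return (new / old - 1.0) * 100.0


# Gain from adding the second dual core CPU, per concurrency level.
for level, four, two in zip(CONCURRENCY, FOUR_CORES, TWO_CORES):
    print(f"{level:>2} connections: {percent_gain(four, two):+.0f}%")

# The same calculation on the published AVG row.
avg_four_vs_two = percent_gain(421, 250)         # four cores vs one dual core CPU
avg_dual_core_vs_single = percent_gain(250, 219)  # one dual core CPU vs two single core CPUs
```

The per-level view shows that the two extra cores only pay off from about five concurrent connections onwards. With one or two connections, the four core setup is even slightly slower.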

Benchmarks IBM DB2: Single versus Dual versus Quad

What about the “conventional” quad CPU configuration? The Iwill H4103 was our testing platform.

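| Concurrency | Dual Opteron 848 2.2 GHz | Quad Opteron 848 2.2 GHz | Quad versus Dual |
|---|---|---|---|
| 1 | 102 | 104 | 2% |
| 2 | 184 | 186 | 1% |
| 5 | 212 | 318 | 50% |
| 10 | 218 | 358 | 64% |
| 20 | 212 | 375 | 77% |
| 35 | 223 | 393 | 76% |
| 50 | 208 | 377 | 81% |
| AVG | 214 | 364 | 70% |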

DB2 continues to scale very well. Adding two more CPUs results in a 70% performance increase. Notice that the Quad CPU system needs at least 20 concurrent connections running many queries to reach its full potential (up to an 80% performance increase). The Quad Xeon was unfortunately not available to the lab.



Analysis and Conclusion

First of all, we would like to emphasize that we are well aware that our findings are only applicable to your database applications if you run a "read heavy, few writes" database server and the database is not too large, so the most used parts can run mainly from RAM. As 1 GB DIMMs are very cheap now, and 64 bit CPUs and 64 bit Linux were introduced 2 years ago, making sure that your database has enough memory at its disposal should become a lot easier for many database administrators.

There are a few interesting conclusions that we can make about the software side of things. First of all, DB2 8.2 scales fantastically when you add more CPUs. This makes the dual core Opterons very attractive: an Opteron 265 costs as much as an Opteron 252 or Xeon Irwindale 3.6 GHz, but it is clear that it will perform a lot better. It also offers a better upgrade path, since you can use up to four cores on two-socket motherboards, which are relatively cheap compared to the average price of a quad CPU motherboard.

The MySQL MyISAM benches make it clear that pure speed isn't everything. MySQL MyISAM allows you to get away with a single CPU system as it delivered up to 300 queries per second, while DB2 was only capable of delivering a bit more than one third of that performance. The picture quickly changes when we need safe transactions too (even with few writes, this might be critical): the InnoDB engine is about 40% slower in our environment. MySQL remains very fast, but as we add more CPUs, the difference with DB2 gets very small. While this article has no ambition to be a guide to the software part of database servers, it is clear that you should choose your hardware according to the database server software that you select. With DB2, you get enterprise class database serving, and dual core CPUs are a very good solution for it. MySQL is excellent for saving on your hardware costs, but if you expect the number of transactions/data mining queries to rise quickly, adding more than two CPUs will buy you little performance (10 to 20% boost).

The most surprising thing that we noticed while comparing our new findings on the 2.6 kernel with those of our previous report (32 bit, 2.4 kernel) is that the Xeon benefits a lot less from 64 bit and the new 2.6 kernel than the Opteron. While the 64 bit binaries run consistently (much) faster on the Opteron, the Xeon isn't too happy with them and runs them 4 to 10% slower. Hyperthreading isn't - in our case - helping either, with 1 to 10% lower performance.

Branch prediction penalties, due to the longer pipeline of Nocona/Irwindale, are not the problem. We noticed with Vtune and Code Analyst that the Branch Prediction Unit of the Xeon Nocona and Irwindale does a marvellous job and predicts between 96% (MySQL) and 97% (DB2) of the branches correctly, while the Opteron's BPU is correct about 93% and 94% of the time. About 20% of the MySQL instructions are branches, while DB2 has only 16% branches. The L2-caches also do a good job, with only 2% of data demands being covered by the RAM, and a 98% hitrate on the L1 and L2-caches.

According to our research, we can assume that the 64 bit implementation of the new Xeon is simply not as powerful as the Opteron's. Intel has some catching up to do, especially when you look at the dual core Opterons. We already discussed AMD's elegant dual core architecture in detail, but in this review, we have seen very good indications that the design with the two cores connected by the SRQ does improve performance in real world applications and not only in our cache-to-cache tests.

This architecture, together with AMD being six months ahead with their dual core server product, gives AMD significant advantages in the server market today. The lack of mature server versions of Windows (2003) and the fact that only the latest kernels of Linux support the dual core Opteron might slow AMD down a bit, but not for long.
