Analyses and Conclusion

First of all, we would like to emphasize that we are well aware of the limits of our findings: they only apply to your database applications if you run a "read heavy, few writes" database server and the database is not too large, so that the most used parts can run mainly from RAM. As 1 GB DIMMs are now very cheap, and with 64 bit CPUs and 64 bit Linux having been introduced 2 years ago, making sure that your database has enough memory at its disposal should become a lot easier for many database administrators.

There are a few interesting conclusions that we can draw about the software side of things. First of all, DB2 8.2 scales fantastically when you add more CPUs. This makes the dual core Opterons very attractive: an Opteron 265 costs as much as an Opteron 252 or Xeon Irwindale 3.6 GHz, but it is clear that it will perform a lot better. It also offers a better upgrade path, since you can use up to four cores on two-socket motherboards, which are relatively cheap compared to the average price of a quad CPU motherboard.

The MySQL MyISAM benches make it clear that pure speed isn't everything. MySQL MyISAM allows you to get away with a single CPU system, as it delivered up to 300 queries per second, while DB2 was only capable of delivering a bit more than one third of that performance. The picture quickly changes when we need safe transactions too (even with few writes, this might be critical): the InnoDB engine is about 40% slower in our environment. MySQL remains very fast, but as we add more CPUs, the difference with DB2 gets very small. While this article has no ambition to be a guide to the software part of database servers, it is clear that you should choose your hardware to match the database server software that you select. With DB2, you get enterprise class database serving, and dual core CPUs are a very good solution for it. MySQL is an excellent way to save on your hardware costs, but if you expect the number of transactions/data mining queries to rise quickly, adding more than two CPUs will buy you little performance (10 to 20% boost).

The most surprising thing that we noticed while comparing our new findings on the 2.6 kernel with those of our previous report (32 bit, 2.4 kernel) is that the Xeon benefits a lot less from 64 bit and the new 2.6 kernel than the Opteron. While the 64 bit binaries run consistently (much) faster on the Opteron, the Xeon isn't too happy with them and runs them 4 to 10% slower. Hyperthreading isn't - in our case - helping either, with 1 to 10% lower performance.

Branch prediction penalties, due to the longer pipeline of Nocona/Irwindale, are not the problem. We noticed with Vtune and Code Analyst that the Branch Prediction Unit of the Xeon Nocona and Irwindale does a marvellous job and predicts between 96% (MySQL) and 97% (DB2) of the branches correctly, while the Opteron's BPU is correct about 93% and 94% of the time. MySQL consists of 20% branches, and DB2 has only 16% branches. The L2-caches also do a good job, with only 2% of data demands being served from RAM and a 98% hit rate in the L1 and L2-caches.
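To put those percentages in perspective, the short Python sketch below converts them into mispredicted branches per 1000 instructions. It only uses the figures quoted above; pairing the Opteron's 93% with MySQL and its 94% with DB2 is an assumption based on the order in which the numbers are given, and the function and variable names are ours, purely for illustration.

```python
# Branch share and prediction accuracy as quoted in the text above.
# ASSUMPTION: the Opteron's 93% applies to MySQL and its 94% to DB2,
# following the order in which the Xeon figures are listed.
WORKLOADS = {
    "MySQL": {"branch_fraction": 0.20, "accuracy": {"Xeon": 0.96, "Opteron": 0.93}},
    "DB2": {"branch_fraction": 0.16, "accuracy": {"Xeon": 0.97, "Opteron": 0.94}},
}


def mispredicts_per_1000(branch_fraction, accuracy):
    """Mispredicted branches per 1000 executed instructions."""
    return 1000 * branch_fraction * (1 - accuracy)


def misprediction_table():
    """Return {(workload, cpu): mispredicts per 1000 instructions}."""
    table = {}
    for workload, data in WORKLOADS.items():
        for cpu, accuracy in data["accuracy"].items():
            value = mispredicts_per_1000(data["branch_fraction"], accuracy)
            table[(workload, cpu)] = round(value, 1)
    return table


if __name__ == "__main__":
    for (workload, cpu), value in misprediction_table().items():
        print(f"{workload:5s} on {cpu:7s}: {value} mispredicts per 1000 instructions")
```

Under these assumptions, the Xeon mispredicts roughly 8 (MySQL) and 5 (DB2) branches per 1000 instructions, against roughly 14 and 10 for the Opteron. The sketch says nothing about the cost of each misprediction, which depends on pipeline length and is not quantified here.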

According to our research, we can assume that the 64 bit implementation of the new Xeon is simply not as powerful as the Opteron's. Intel has some catching up to do, especially when you look at the dual core Opterons. We already discussed AMD's elegant dual core architecture in detail, but in this review, we have seen very good indications that the design with the two cores connected by the SRQ does improve performance in real world applications and not only in our cache-to-cache tests.

This architecture, together with AMD being six months ahead with their dual core server product, gives AMD significant advantages in the server market today. The lack of mature server versions of Windows (2003) and the fact that only the latest kernels of Linux support the dual core Opteron might slow AMD down a bit, but not for long.


45 Comments


  • JohanAnandtech - Saturday, June 18, 2005 - link

    Mino, thanks for pointing that out. Query cache enabling has nothing to do with "stressful". It has to do with accelerating a few queries that are run over and over again. That is very interesting for reducing the response time of a website serving up the latest article, but it is not limited by CPU power at all.



  • JohanAnandtech - Saturday, June 18, 2005 - link

    To the people who make a fuss about disabling the query cache: this has nothing to do with the Opteron not performing well in that situation. Single Xeon: 980 queries/s. Dual Xeon: 985 queries/s. Opteron 250: 1020 queries/s. Get it now why I say "other bottlenecks started to kick in"?

    It is impossible that a dual Xeon can't outperform a single one in these tests. We tried to find the bottleneck and even used a quad Opteron 850 as client. The client was not the problem. I have no knowledge of tools to profile the complete machine. The disk was not the problem, we tested that. Network bandwidth neither. My bet is on the network latency, or even the OS, as the bottleneck kicked in a lot sooner with kernel 2.4.
  • mino - Friday, June 17, 2005 - link

    #32 try to think for a moment
    "Because the Opteron can't perform that well in stressful situations you won't post the scores?"

    If the CPU is not the bottleneck in the query cache scenario then why test the effect of CPU at all !!!

    You reminded me of a friend of mine who "tested" the effect the "FSB" has on an A64 system NOT having an FSB at all !!! ;-)
    Funny guy indeed.

    And about the Intel compiler not being used.
    Like it or not, it IS a fact that it is not widely adopted, especially among the target audience of this site and article.

    BTW given the past experience intel compiler would produce better code even on AMD systems so don't be so sure! Best code for K7 is made by intelcc set to PIII config. Albeit it does not use 3DNow! functionality at all.
  • ElMoIsEviL - Friday, June 17, 2005 - link

    I think I have to agree with #20, as much as I am un-biased I feel this test was doctored by AMD... it resembles the tests we see released by Apple often...

    "We didn't use the Intel compiler version as we have reason to believe that this version is not used a lot in the real world. We might try it out in a future article."

    Translation, "with the intel compiler AMD lost so being a marketing force for AMD we opted not to post those scores".


    and also as was mentioned before...
    "The "query cache" was off, as we wanted to test worst case performance. In some cases, the query cache was able to push a single Xeon to 1000 queries per second, and the CPU was still capable of doing more, as the CPU load was at 50% - 70%."

    Why not?
    Because the Opteron can't perform that well in stressful situations you won't post the scores?

    Seriously.. this test is the biggest load of BS I have ever read... and I'm a current AMD adopter.
  • JohanAnandtech - Friday, June 17, 2005 - link

    Viditor, it is possible that the IOMMU might have to do something with it.

    The IOMMU is a memory mapping unit sitting between the I/O bus and physical memory.

    Memory mapping is AFAIK only necessary if a certain device (PCI devices come to mind) can not do a 64 bit DMA. Now it seems that almost everything inside the newest Intel southbridges can do 64 bit DMA.

    So the IOMMU can only play a role when the driver is a 32 bit only, and the memory mapping has to happen. Now I would think that Intel would have an advantage here with their ultra modern southbridges. There might be a device that I am overlooking of course. Maybe our SCSI controller... But I don't think so.
  • Viditor - Friday, June 17, 2005 - link

    Johan, if you're still reading (great article BTW)...
    A question I have had for quite awhile now is what effect the IOMMU has on these tests.
    The reasons I'm asking are
    1. I noticed that there was quite a disparity between the AMD and Intel 64bit performance (which you mentioned).
    2. I know that one difference between the 2 platforms is that AMD has a hardware IOMMU (of sorts) and Intel (at present) does not.
    3. I saw a thread last year with Linus T mentioning this quite a bit. He seemed to think that this would impair the EM64T substantially...

    Your thoughts?
  • JohanAnandtech - Friday, June 17, 2005 - link

    If your database is running many "identical databases".... I meant "queries"

  • JohanAnandtech - Friday, June 17, 2005 - link

    Juhl: It was 2.6.12rc5.

    Viditor: thanks for the helpful comment. Indeed, if you turn on the query cache, your CPU is doing very little.
    Everybody else: note the "identical" word in Viditor's quote. If your database is running many identical databases, then you are not going to spend time reading this kind of article: you simply buy the cheapest decent server. Any CPU today can run 1000s of queries if everything comes out of the query cache.

    Running benchmarks with the query cache on is simply not interesting. The query cache is all about accelerating the IDENTICAL queries that are run from time to time. You might reserve a bit of RAM to make sure that the most common queries (getting the latest article of a website for example) are run faster.

    But those numbers don't tell you anything about the load that your server is going to be able to take. You want worst case performance numbers!
  • Viditor - Friday, June 17, 2005 - link

    Questar - the reason the query cache was turned off (guessing here) is to more reasonably simulate a real-world test. Obviously in this test, the same queries are repeated quite often. But that is not usually the case in the real world...
    For those who don't know what the heck a "query cache" is:

    "the query cache stores the text of a SELECT query together with the corresponding result that was sent to the client. If the identical query is received later, the server retrieves the results from the query cache rather than parsing and executing the query again"
  • Questar - Friday, June 17, 2005 - link

    #23,

    We don't know, it specifically says Xeon. We don't have any idea what happens on an Opteron.
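Editor's illustration: the query cache behaviour quoted in the comments above (store the text of a SELECT together with its result, and answer an identical later query from the cache) can be sketched in a few lines of Python. This is a toy model written for this page, not MySQL's implementation; the `QueryCache` class and the `execute` callable are hypothetical names.

```python
class QueryCache:
    """Toy query cache: maps the exact text of a SELECT to its stored result."""

    def __init__(self, execute):
        # 'execute' is a hypothetical callable that parses and runs a query.
        self._execute = execute
        self._results = {}
        self.hits = 0
        self.misses = 0

    def query(self, sql):
        # Only a query with identical text can be answered from the cache.
        if sql in self._results:
            self.hits += 1
            return self._results[sql]
        self.misses += 1
        result = self._execute(sql)
        # Only SELECT statements are stored, as in the quoted description.
        if sql.lstrip().upper().startswith("SELECT"):
            self._results[sql] = result
        return result


if __name__ == "__main__":
    executed = []

    def fake_execute(sql):
        # Stand-in for the expensive parse-and-execute step.
        executed.append(sql)
        return f"rows for: {sql}"

    cache = QueryCache(fake_execute)
    cache.query("SELECT title FROM articles WHERE id = 1")
    cache.query("SELECT title FROM articles WHERE id = 1")  # served from cache
    cache.query("SELECT title FROM articles WHERE id = 2")  # different text
    print(f"hits={cache.hits} misses={cache.misses} executed={len(executed)}")
```

The repeated query never reaches the execute step, which is why a benchmark that replays the same queries with the cache enabled mostly measures the lookup rather than the CPU's ability to parse and execute queries.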
