Benchmarks IBM DB2 8.2: Intel versus AMD

Below, you will find our results for the different platforms of AMD and Intel. At the last moment, the Pentium 4 670 3.8 GHz arrived in the labs, so we decided to give this CPU a quick test run. In these tests, we enabled the new Asynchronous I/O feature, which gave the Intel Xeon a small performance boost (4 to 7%), while it made the Opteron only a tiny bit faster (1%).

| Concurrency | Dual Xeon Irwindale 3.6 GHz | Single Xeon Irwindale 3.6 GHz | Dual Xeon Nocona 3.6 GHz | Single Xeon Nocona 3.6 GHz | Dual Opteron 2.2 GHz | Dual Opteron 2.4 GHz | Single Opteron 2.4 GHz | Dual Opteron 2.6 GHz | Intel Pentium D Dual Core 3.2 GHz | Intel Pentium 4 3.8 GHz |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 94 | 90 | 101 | 95 | 97 | 116 | 119 | 124 | 89 | 99 |
| 2 | 172 | 109 | 164 | 107 | 202 | 219 | 151 | 233 | 141 | 118 |
| 5 | 207 | 114 | 215 | 110 | 262 | 287 | 156 | 308 | 199 | 123 |
| 10 | 228 | 115 | 223 | 117 | 268 | 294 | 156 | 320 | 201 | 126 |
| 20 | 225 | 118 | 207 | 112 | 264 | 306 | 153 | 328 | 202 | 124 |
| 35 | 232 | 116 | 215 | 116 | 275 | 284 | 153 | 308 | 174 | 120 |
| 50 | 230 | 114 | 214 | 113 | 275 | 281 | 150 | 307 | 203 | 127 |
| AVG | 225 | 115 | 215 | 114 | 269 | 291 | 153 | 314 | 196 | 124 |

All averages are calculated over the concurrency levels from 5 to 50. There is no doubt about it: it pays off big time to invest in a multi-CPU machine for DB2. It is of no use to invest in the fastest single-CPU system; a mid-range dual-CPU system will easily outperform it.
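As a rough illustration of how the AVG row is derived, the sketch below recomputes two of the averages from the published per-level results and compares a mid-range dual system with the fastest single CPU. The variable and function names are ours, chosen for illustration only, and the inputs are the rounded table values rather than the raw measurements.

```python
# Results copied from the table above; only levels 5 to 50 enter the average.
AVERAGED_LEVELS = (5, 10, 20, 35, 50)

dual_opteron_2_2 = {1: 97, 2: 202, 5: 262, 10: 268, 20: 264, 35: 275, 50: 275}
pentium4_3_8 = {1: 99, 2: 118, 5: 123, 10: 126, 20: 124, 35: 120, 50: 127}


def average_result(results, levels=AVERAGED_LEVELS):
    """Mean of the results at the given concurrency levels."""
    return sum(results[level] for level in levels) / len(levels)


dual_avg = average_result(dual_opteron_2_2)
single_avg = average_result(pentium4_3_8)

print(round(dual_avg), round(single_avg))   # 269 124
print(f"{dual_avg / single_avg:.2f}x")      # 2.17x
```

With these inputs, the dual Opteron 2.2 GHz ends up at a little over twice the averaged result of the single Pentium 4 3.8 GHz.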

The table below gives an overview of the relative performance differences between the configurations.

| Concurrency | Dual versus Single Xeon Irwindale | Dual versus Single Xeon Nocona | Dual Opteron 250 versus Single | Dual Opteron 2.6 GHz versus Irwindale 3.6 GHz | Xeon Irwindale versus Nocona |
|---|---|---|---|---|---|
| 1 | 5% | 6% | -3% | 32% | -7% |
| 2 | 57% | 53% | 45% | 36% | 4% |
| 5 | 82% | 96% | 84% | 49% | -4% |
| 10 | 99% | 91% | 89% | 40% | 2% |
| 20 | 92% | 84% | 100% | 46% | 9% |
| 35 | 99% | 86% | 86% | 33% | 8% |
| 50 | 102% | 89% | 88% | 33% | 7% |
| AVG | 95% | 89% | 89% | 40% | 5% |
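The percentages in this table are relative differences between two columns of the first table. A minimal sketch of that calculation, assuming the usual definition (new result divided by baseline, minus one), is shown here for the dual versus single Opteron 2.4 GHz. The helper name `percent_gain` is hypothetical. Because the inputs are the rounded table values, a recomputed figure can differ by a point from the published one.

```python
def percent_gain(new, baseline):
    """Relative difference of `new` over `baseline`, in percent."""
    return (new / baseline - 1.0) * 100.0


# Dual and single Opteron 2.4 GHz results at two concurrency levels,
# copied from the first table.
dual_opteron_250 = {5: 287, 20: 306}
single_opteron_250 = {5: 156, 20: 153}

for level in (5, 20):
    gain = percent_gain(dual_opteron_250[level], single_opteron_250[level])
    print(f"concurrency {level}: {gain:.0f}%")
# concurrency 5: 84%
# concurrency 20: 100%
```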

The performance of DB2 scales almost perfectly on the different platforms. Irwindale scales a little better than the two other CPUs, probably thanks to its larger L2 cache. However, this does not save Intel from defeat: the Opteron 2.6 GHz is the champion in these tests. What happened? In our previous test, the fastest Xeon (Nocona 3.6 GHz) was a bit faster than the best Opteron (250, 2.4 GHz). First of all, the Opteron 252 scales very well with clock speed: it is clocked 8.3% higher than its older 2.4 GHz brother and is 8% faster. But the Xeon Irwindale also gains 5% to 7% from its larger L2 cache, so that is not the real issue.
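The clock scaling claim can be checked with simple arithmetic. The sketch below compares the clock speed increase of the Opteron 252 over the 250 with the increase in the averaged dual-CPU results from the first table. "Scaling efficiency" is our own label for the ratio of the two, not a term used in the tests.

```python
def percent_gain(new, baseline):
    """Relative difference of `new` over `baseline`, in percent."""
    return (new / baseline - 1.0) * 100.0


clock_gain = percent_gain(2.6, 2.4)   # Opteron 252 versus 250 clock speed (GHz)
perf_gain = percent_gain(314, 291)    # AVG row: dual 2.6 GHz versus dual 2.4 GHz

print(f"clock: +{clock_gain:.1f}%, performance: +{perf_gain:.1f}%")
# clock: +8.3%, performance: +7.9%
print(f"scaling efficiency: {perf_gain / clock_gain:.0%}")
# scaling efficiency: 95%
```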

However, when we compared a 64-bit with a 32-bit DB2 instance, the Opteron gained 13% performance from moving to 64 bit, while the Xeon lost 3 to 4%! Secondly, with the 2.4 kernel, the Xeon gained an additional boost from Hyperthreading, a boost that we could no longer measure with the 2.6 kernel. Thirdly, it seems that the Opteron gains more from the move from the 2.4 kernel to the 2.6 kernel than the Xeon does.

Benchmarks IBM DB2: Single core versus Dual core

What about our Dual core Opteron 875/275? We managed to get DB2 running on Gentoo, kernel 2.6.12rc5. You can find the results below. All tests have been performed on the MSI K8Master-FAR2.

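| Concurrency | Dual Dual Core AMD 2.2 GHz | Single Dual Core AMD 2.2 GHz | Dual Opteron 2.2 GHz | Quad core versus dual core | Single dual core versus dual single core |
|---|---|---|---|---|---|
| 1 | 107 | 118 | 111 | -9% | 6% |
| 2 | 194 | 213 | 162 | -9% | 32% |
| 5 | 368 | 242 | 222 | 52% | 9% |
| 10 | 423 | 256 | 227 | 66% | 13% |
| 20 | 448 | 253 | 216 | 77% | 17% |
| 35 | 434 | 246 | 213 | 76% | 16% |
| 50 | 429 | 251 | 218 | 71% | 15% |
| AVG | 421 | 250 | 219 | 68% | 14% |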

It is simply amazing how much punch the dual-core Opteron 275/875 has. A single dual-core CPU offers a 14% performance increase over an identically configured dual-CPU Opteron 248 setup. Add a second dual-core CPU, and DB2 8.2 rewards you with another 68% on average (up to 77%). And all this is happening on our ATX MSI K8Master-FAR2 board.

Benchmarks IBM DB2: Single versus Dual versus Quad

What about the “conventional” quad CPU configuration? The Iwill H4103 was our testing platform.

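| Concurrency | Dual Opteron 848 2.2 GHz | Quad Opteron 848 2.2 GHz | Quad versus Dual |
|---|---|---|---|
| 1 | 102 | 104 | 2% |
| 2 | 184 | 186 | 1% |
| 5 | 212 | 318 | 50% |
| 10 | 218 | 358 | 64% |
| 20 | 212 | 375 | 77% |
| 35 | 223 | 393 | 76% |
| 50 | 208 | 377 | 81% |
| AVG | 214 | 364 | 70% |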

DB2 continues to scale very well: adding two more CPUs yields a 70% average performance increase. Notice that the quad-CPU configuration needs at least 20 concurrent connections, each running many queries, to reach its full potential (a gain of up to about 80%). Unfortunately, a quad Xeon was not available to the lab.
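To make the "needs 20 concurrent connections" observation concrete, the sketch below recomputes the quad versus dual gain per concurrency level and reports the first level at which the gain comes within 90% of its peak. The 90% threshold and the function names are assumptions made for this illustration; they are not part of the test methodology.

```python
# Results copied from the table above.
dual_848 = {1: 102, 2: 184, 5: 212, 10: 218, 20: 212, 35: 223, 50: 208}
quad_848 = {1: 104, 2: 186, 5: 318, 10: 358, 20: 375, 35: 393, 50: 377}


def gains(quad, dual):
    """Percentage gain of quad over dual at each concurrency level."""
    return {level: (quad[level] / dual[level] - 1.0) * 100.0
            for level in sorted(dual)}


def first_level_near_peak(gain_by_level, fraction=0.9):
    """Lowest concurrency level whose gain reaches `fraction` of the peak gain."""
    peak = max(gain_by_level.values())
    for level in sorted(gain_by_level):
        if gain_by_level[level] >= fraction * peak:
            return level
    return None  # only reachable if every gain is negative


quad_gain = gains(quad_848, dual_848)
saturation_level = first_level_near_peak(quad_gain)
print(saturation_level)  # 20
```

At 10 connections the gain is still about 64%, which is below the threshold, so the quad configuration only approaches its peak from 20 connections onward.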


45 Comments

  • JohanAnandtech - Saturday, June 18, 2005

    Mino, thanks for pointing that out. Enabling the query cache has nothing to do with "stressful". It has to do with accelerating a few queries that are run over and over again. That is very interesting for reducing the response time of a website serving up the latest article, but it is not limited by CPU power at all.



  • JohanAnandtech - Saturday, June 18, 2005

    To the people who make a fuss about disabling the query cache: this has nothing to do with the Opteron not performing well in that situation. Single Xeon: 980 queries/s. Dual Xeon: 985 queries/s. Opteron 250: 1020 queries/s. Get it now why I say "other bottlenecks started to kick in"?

    It is impossible that a dual Xeon can't outperform a single one in these tests. We tried to find the bottleneck and even used a quad Opteron 850 as client. The client was not the problem. The disk was not the problem, we tested that. Network bandwidth neither. My bet is on the network latency, or even the OS, as the bottleneck kicked in a lot sooner with kernel 2.4. I have no knowledge of tools to profile the complete machine, though.
  • mino - Friday, June 17, 2005

    #32 try to think for a moment
    "Because the Opteron can't perform that well in stressful situations you won't post the scores?"

    If the CPU is not the bottleneck in the query cache scenario, then why test the effect of the CPU at all!!!

    You remind me of a friend of mine who "tested" the effect the "FSB" has on an A64 system, which does NOT have an FSB at all!!! ;-)
    Funny guy indeed.

    And about the Intel compiler not being used:
    like it or not, it IS a fact that it is not widely adopted, especially among the target audience of this site and article.

    BTW, given past experience, the Intel compiler would produce better code even on AMD systems, so don't be so sure! The best code for K7 is made by intelcc set to PIII config, albeit it does not use 3DNow! functionality at all.
  • ElMoIsEviL - Friday, June 17, 2005

    I think I have to agree with #20, as much as I am un-biased I feel this test was doctored by AMD... it resembles the tests we see released by Apple often...

    "We didn't use the Intel compiler version as we have reason to believe that this version is not used a lot in the real world. We might try it out in a future article."

    Translation, "with the Intel compiler AMD lost so being a marketing force for AMD we opted not to post those scores".


    and also as was mentioned before...
    "The "query cache" was off, as we wanted to test worst case performance. In some cases, the query cache was able to push a single Xeon to 1000 queries per second, and the CPU was still capable of doing more, as the CPU load was at 50% - 70%."

    Why not?
    Because the Opteron can't perform that well in stressful situations you won't post the scores?

    Seriously.. this test is the biggest load of BS I have ever read... and I'm a current AMD adopter.
  • JohanAnandtech - Friday, June 17, 2005

    Viditor, it is possible that the IOMMU might have something to do with it.

    The IOMMU is a memory mapping unit sitting between the I/O bus and physical memory.

    Memory mapping is AFAIK only necessary if a certain device (PCI devices come to mind) cannot do 64-bit DMA. Now it seems that almost everything inside the newest Intel southbridges can do 64-bit DMA.

    So the IOMMU can only play a role when the driver is 32-bit only and the memory mapping has to happen. Now I would think that Intel would have an advantage here with their ultra modern southbridges. There might be a device that I am overlooking of course. Maybe our SCSI controller... But I don't think so.
  • Viditor - Friday, June 17, 2005

    Johan, if you're still reading (great article BTW)...
    A question I have had for quite a while now is what effect the IOMMU has on these tests.
    The reasons I'm asking are
    1. I noticed that there was quite a disparity between the AMD and Intel 64bit performance (which you mentioned).
    2. I know that one difference between the 2 platforms is that AMD has a hardware IOMMU (of sorts) and Intel (at present) does not.
    3. I saw a thread last year with Linus T mentioning this quite a bit. He seemed to think that this would impair the EM64T substantially...

    Your thoughts?
  • JohanAnandtech - Friday, June 17, 2005

    If your database is running many "identical databases".... I meant "queries"

  • JohanAnandtech - Friday, June 17, 2005

    Juhl: It was 2.6.12rc5.

    Viditor: thanks for the helpful comment. Indeed, if you turn on the query cache, your CPU is doing very little.
    Everybody else: note the word "identical" in Viditor's quote. If your database is running many identical databases, then you are not going to spend time reading this kind of article: you simply buy the cheapest decent server. Any CPU today can run 1000s of queries if everything comes out of the query cache.

    Running benchmarks with the query cache on is simply not interesting. The query cache is all about accelerating the IDENTICAL queries that are run from time to time. You might reserve a bit of RAM to make sure that the most common queries (getting the latest article of a website for example) are run faster.

    But those numbers don't tell you anything about the load that your server is going to be able to take. You want worst case performance numbers!
  • Viditor - Friday, June 17, 2005

    Questar - the reason the query cache was turned off (guessing here) is to more reasonably simulate a real-world test. Obviously in this test, the same queries are repeated quite often. But that is not usually the case in the real world...
    For those who don't know what the heck a "query cache" is:

    "the query cache stores the text of a SELECT query together with the corresponding result that was sent to the client. If the identical query is received later, the server retrieves the results from the query cache rather than parsing and executing the query again"
  • Questar - Friday, June 17, 2005

    #23,

    We don't know, it specifically says Xeon. We don't have any idea what happens on an Opteron.
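For readers following the query cache discussion above, here is a toy sketch of the behavior described in Viditor's quote: the exact query text is the key, and an identical query is answered from the stored result instead of being executed again. The class, the `fake_execute` stand-in, and the SQL strings are all hypothetical and do not reflect any real server's internals. A real cache would also have to discard entries when the underlying tables change, which is omitted here.

```python
class QueryCache:
    """Toy query cache: maps exact SELECT text to its stored result."""

    def __init__(self, execute):
        self._execute = execute   # callable that really runs the query
        self._results = {}        # query text -> stored result
        self.hits = 0
        self.misses = 0

    def query(self, sql):
        if sql in self._results:
            # Identical text seen before: no parsing, no execution.
            self.hits += 1
            return self._results[sql]
        self.misses += 1
        result = self._execute(sql)
        self._results[sql] = result
        return result


def fake_execute(sql):
    """Hypothetical stand-in for parsing and executing a query."""
    return f"rows for: {sql}"


cache = QueryCache(fake_execute)
for _ in range(1000):
    cache.query("SELECT title FROM articles ORDER BY posted DESC LIMIT 1")
cache.query("SELECT title FROM articles WHERE id = 42")

print(cache.hits, cache.misses)  # 999 2
```

Out of 1001 requests, only two reach `fake_execute`, which illustrates why a benchmark that repeats identical queries with the cache on says little about how much load the CPU can take.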
