The New Benchmark Suite

We've changed our benchmarks to accommodate the multiple load scenarios used in this article. The first benchmark we overhauled was the Dell DVD Store test (http://linux.dell.com/dvdstore/). In the last article, the first in which we used Dell DVD Store, we ran the stock Dell SQL driver against the medium-sized database (approximately 3GB). This time we wanted a larger database to represent a more enterprise-class e-commerce scenario. To get there, we took the medium database and raised the customer count from 2 million to 20 million and the product count from 100,000 to 1 million. The result was a 14GB database.

We modified the driver code as well. We started with the included C# driver source code and changed the way it creates threads (users). In stock form, the driver creates all of the threads and users in one shot and then starts executing orders. Since we wanted to add threads dynamically to reach specific load levels, we added a method to the class for adding users. We also added a few properties so that a Windows Forms application could host the class and report various performance counters. This lets us graph CPU usage and orders per minute over the duration of the test and save the graphs for historical reporting. The Forum benchmark was overhauled as well: it uses the same GUI driver, with a few changes to the way the queries are executed against the database.
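
To illustrate the idea of adding users while a test is already running, here is a minimal sketch. It is written in Python rather than the C# of the actual driver, and every name in it (LoadDriver, add_users, orders_per_minute, the place_order callback) is hypothetical. It is not the Dell DVD Store driver's code, only one way the described behavior could be structured.

```python
import threading
import time


class LoadDriver:
    """Hypothetical stand-in for the modified driver.

    Each simulated user is a thread that places orders in a loop.
    Users can be added at any time instead of all at once.
    """

    def __init__(self, place_order):
        # place_order is an assumed callable that performs one order
        # against the database under test.
        self._place_order = place_order
        self._stop = threading.Event()
        self._lock = threading.Lock()
        self._threads = []
        self._orders = 0
        self._started = time.monotonic()

    def add_users(self, count):
        """Start `count` more user threads; running users are untouched."""
        for _ in range(count):
            thread = threading.Thread(target=self._user_loop, daemon=True)
            with self._lock:
                self._threads.append(thread)
            thread.start()

    def _user_loop(self):
        # One simulated user: keep ordering until the test is stopped.
        while not self._stop.is_set():
            self._place_order()
            with self._lock:
                self._orders += 1

    @property
    def user_count(self):
        with self._lock:
            return len(self._threads)

    @property
    def orders_completed(self):
        with self._lock:
            return self._orders

    @property
    def orders_per_minute(self):
        # Average rate since the driver object was created.
        elapsed = time.monotonic() - self._started
        with self._lock:
            orders = self._orders
        return orders * 60.0 / elapsed if elapsed > 0 else 0.0

    def stop(self):
        """Signal all users to finish their current order and wait for them."""
        self._stop.set()
        with self._lock:
            threads = list(self._threads)
        for thread in threads:
            thread.join()
```

A host application could call add_users at intervals to step through load points while polling orders_per_minute for its graph. Keeping the counters behind properties is what allows a separate GUI to read them without knowing how the threads are managed.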

Both benchmark applications record their results to a database server, where we average the results across runs for our graphs. The GUI also accepts command line parameters, which lets us set up batch files to run an entire platform. On average it takes almost 20 hours to run a platform, because we run 5 iterations of each load point. It is important to look at the deviations between benchmark runs to ensure scores are consistent and representative of typical performance. The deviations are all relatively low, which is very good; the average deviation is 1.6%.
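
As a rough sketch of the averaging and deviation check described above, the snippet below summarizes the runs for one load point. The article does not say how deviation is calculated, so this assumes the sample standard deviation expressed as a percentage of the mean. The function name summarize_runs and the scores in the usage note are made up for illustration and are not measured results.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (average, deviation_pct) for the runs of one load point.

    deviation_pct is the sample standard deviation divided by the mean,
    times 100. This definition is an assumption, not the article's.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to measure deviation")
    average = mean(scores)
    deviation_pct = stdev(scores) / average * 100.0
    return average, deviation_pct
```

For example, five hypothetical orders-per-minute scores such as 9800, 10050, 9900, 10100 and 10150 would be passed in as a list, and a load point whose deviation_pct came back unusually high would be a candidate for a rerun.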

Dell & Forum SQL Trace Analysis

The Dell and Forum benchmarks are quite different workloads, as you will see in the benchmark results. Dell executes approximately 10 times as many queries during the test, and its query durations are approximately one quarter of those in the Forum benchmark. To summarize, Dell is a workload with a high transaction volume in which each query executes in a very short amount of time. The Forum workload has a medium transaction volume; its queries execute in a reasonable amount of time but are much more read intensive (larger datasets are returned).

Test Configuration

Below are the configurations of the test machines. Note that the Opteron system's memory was set to a 1T command rate and NUMA was enabled.

Client
Dual AMD Opteron 256
4GB Memory
Gigabit Ethernet
Windows 2003 x64 Server

Woodcrest/Dempsey System
Intel OEM System (Pre-Production)
8GB 533MHz FB-DIMM
Windows 2003 x64 Enterprise Server SP1
SQL 2005 Enterprise SP1 x64
14 x Ultra 320 SCSI Drives in RAID 0
LSI Logic 320-2 Controller

Opteron 280/285 System
Tyan S2891 Motherboard
8GB PC3200 DDR 400MHz
Windows 2003 x64 Enterprise Server SP1
SQL 2005 Enterprise SP1 x64
14 x Ultra 320 SCSI Drives in RAID 0
LSI Logic 320-2 Controller

59 Comments

  • peternelson - Thursday, July 13, 2006 - link

    Agreed!

    I'm not interested in 32 bit performance.

    If you're gonna be spending this much money on an upmarket system you better be running it in 64 bit mode. I know I will.

    So if Opteron benches better than Woodcrest in 64 bit mode that changes the equation for me.

    Also isn't Opteron 290 out any time now? The would close the % gap a little because of the higher clock speed.

    Also S1207 Opterons will be here 1st August. The new nforce5 based pro/server chipsets might give a little boost over existing ones too, as could the bandwidth boost and lower power of DDR2.
  • defter - Thursday, July 13, 2006 - link

    quote:

    I'm not interested in 32 bit performance.


    Check the review, this (and previous Linux review) uses only 64bit software.
  • Kiijibari - Thursday, July 13, 2006 - link

    Oh indeed, x64 Windows was used.

    Good to see than, that woodcrest doesnt take a 64bit penalty. Maybe the Linux application of my source uses FP calculations and no SSE, or it is due to the compiler which may be in favour for Intel on the windows. Anyways 64bit is a big "playing" field for benchmarks looking forward to read more ;-)

    cheers

    Kiijibari
  • Calin - Thursday, July 13, 2006 - link

    Slightly better performance and slightly lower power consumption. Looks like you have a winner for new servers.
    However, for a Fortune 500 company, there are other things much more important than slightly better performance and slightly lower power consumption.
  • JarredWalton - Thursday, July 13, 2006 - link

    After the poor showing of NetBurst Xeons against Opteron, I'd think any Dell shops would be thrilled to regain the performance crown. Also, frankly, a 5-10% lead is about all most things get you these days, especially when I/O and everything else comes into play. The Woodcrest systems have better overall CPU performance, but it often isn't that important when working on massive databases.

    Incidentally, from what we've seen of Conroe, it seems like Intel could release Core chips at up to 3.4-3.6 GHz without difficulty right now. Rather surprising, given the 14 stage pipeline vs. 39 for Prescott.
  • FesterOZ - Thursday, July 13, 2006 - link

    Actually its not a big thrill at all. One of the major pushes at the firm is to consolidate into VMware based servers or larger raw servers but in all cases stop the traditional 1 server per application that seems to affect most firms. Therefore we are more focused on 4 socket 8 core style servers i.e. HP BL45 blades than 2 socket blades. We had all the top level Dell executives coming in trying to convince us to stay with Dell because at this time, they have no answer for the larger server (the Oct/Nov timeframe for the Dell 4 socket AMD server is too far out). So in the short term we will be a hybrid Dell/HP shop. Maybe we will shift back if Dell's commitment to AMD indeed ramps as expected.
  • Dubb - Thursday, July 13, 2006 - link

    I doubt this has much practical use, but I am nonetheless curious...could you "pinmod" a cloverton to run 1333 FSB?

    might make for some speedy rendering if it was stable.
  • Kiijibari - Thursday, July 13, 2006 - link

    quote:

    I doubt this has much practical use, but I am nonetheless curious...could you "pinmod" a cloverton to run 1333 FSB?
    No that is not possible, dont you think that intel would release it, if it would be possible ? Just think about it, to lower the FSB bandwidth on the 4core part doesnt make sense, does it ? 4 cores are much more bandwith hungry than 2 ..

    Reason is the "Intel bolt-together" architecture. The 4core part is just 2 Dies in one package, thus it will have twice the bus load of a single (dual core) CPU. Intel did the same already with Netburst dual cores, hence you have the same FSB limitations there.

    All in all it is a little bit odd, Cloverton/Kentsfield performance increases will be much less than linear, but Intel has the advantage of time to market vs. the AMD K8L quad core. Though AMD's QC design looks much more sound I expect intel to be 1st with releasing a quad core CPU.

    cheers

    Kiijibari
  • Dubb - Thursday, July 13, 2006 - link

    okaaaayyy...

    cloverton's platform supports 1333, and the kentsfield ESs run 1333 easily. most clovertons probably CAN, it's just a question of if the 1066>1333 pinmod some have suggested for dempsey or 1066 woodcrests actually works, and if so, clovertons might be an interesting application of it.

    I'm just curious, is'all.
  • Kiijibari - Thursday, July 13, 2006 - link

    It may work, however that kind of overclocking is more dangerous than normal overclocking. It is easy to oc a chip that run at, lets say 2 GHz, and there is e.g. a 3 GHz top model. Chances are good that yields are well, thus your 2 GHz model may be able to run faster as most models pass 2.6 GHz tests, thus your model was just down binned to 2 GHz.

    However with the FSB1066 vs. FSB1333 I assume that you are playing around at the absolutly maximum. Intel would do everything to raise FSB speeds, exspecially with Quad cores. It is nonsense from the performance point of view, to decrease the available bandwidth while the number of bandwidth consumers (i.e. cores) increases.

    It might boot & work with a FSB1333 though, but Intel cant and wont gurantee that. It may be good enough for Super Pi or other "fun stuff", but if you run I/O intensive applications, cross your fingers and be prepared for data corruption.

    bb

    Alex
