"Order Entry" Stress Test: Measuring Enterprise Class Performance

One complaint that we've historically received regarding our Forums database test is that it isn't strenuous enough for Enterprise customers to base a sound purchasing decision on the results.

In our infinite desire to please everyone, we worked very closely with a company that could provide us with a truly Enterprise Class SQL stress application. We cannot reveal the identity of the corporation that provided the application because of non-disclosure agreements in place. As a result, we will not go into the specifics of the application; instead, we provide an overview of its database interaction so that you can grasp the profile of the application, better understand the test results, and see how they relate to your own database environment.

We will use an Order Entry system as an analogy for how this test interacts with the database. All interaction with the database is via stored procedures. The main stored procedures used during the test are:

sp_AddOrder - inserts an Order
sp_AddLineItem - inserts a Line Item for an Order
sp_UpdateOrderShippingStatus - updates an Order's status to "Shipped"
sp_AssignOrderToLoadingDock - inserts a record to indicate from which Loading Dock the Order should be shipped
sp_AddLoadingDock - inserts a new record to define an available Loading Dock
sp_GetOrderAndLineItems - selects all information related to an Order and its Line Items

The above is only intended as an overview of the stored procedure functionality; the stored procedures obviously perform other validation and audit operations as well.
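Since the actual procedure signatures are covered by the NDA, here is a purely illustrative sketch of how a client might invoke one of these stored procedures through ADO.NET. The parameter names (@CustomerId, @OrderId) and the output-parameter convention are our own assumptions, not the application's actual interface:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class OrderEntryClient
{
    // Hypothetical sketch: parameter names and types are invented for
    // illustration; the real stored procedure signatures are under NDA.
    static int AddOrder(string connectionString, int customerId)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("sp_AddOrder", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@CustomerId", SqlDbType.Int).Value = customerId;

            // Assume the procedure hands back the new Order ID as an output parameter.
            SqlParameter orderId = cmd.Parameters.Add("@OrderId", SqlDbType.Int);
            orderId.Direction = ParameterDirection.Output;

            conn.Open();
            cmd.ExecuteNonQuery();
            return (int)orderId.Value;
        }
    }
}
```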

Each Order had a random number of Line Items, ranging from one to three. The Line Items chosen for each Order were also randomized, drawn from a pool of approximately 1,500 items.
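A minimal sketch of that randomization logic, assuming sequential integer Line Item IDs (an assumption on our part; the harness's actual ID scheme is not disclosed):

```csharp
using System;
using System.Collections.Generic;

class OrderGenerator
{
    const int LineItemPoolSize = 1500;   // approximate pool size from the article
    static readonly Random Rng = new Random();

    // Each Order gets one to three Line Items, each drawn at random from
    // the ~1,500-item pool. Duplicates are allowed in this simplified sketch;
    // whether the real harness permits them is not specified. A multithreaded
    // harness would also need a Random instance per thread.
    static List<int> PickLineItems()
    {
        int count = Rng.Next(1, 4);      // upper bound is exclusive: 1, 2, or 3
        List<int> items = new List<int>();
        for (int i = 0; i < count; i++)
            items.Add(Rng.Next(1, LineItemPoolSize + 1)); // a Line Item ID
        return items;
    }
}
```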

Each test was run for 10 minutes and repeated three times, and the average of the three runs was used. The read-to-write ratio was maintained at 10 reads for every write. We debated for a long while about which ratio of reads to writes would best serve the benchmark, and we decided that there was no correct answer. So, we went with 10.

The application was developed in C#, and all database connectivity was accomplished using ADO.NET across 20 threads: 10 for reading and 10 for inserting.
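For illustration, the thread layout might be set up as below. ReadLoop and WriteLoop are placeholder names of ours, and the real harness's pacing logic (the 10:1 read-to-write ratio and the 10-minute timer) is omitted:

```csharp
using System.Threading;

class StressHarness
{
    // A minimal sketch of the 20-thread layout described above: 10 reader
    // threads driving sp_GetOrderAndLineItems and 10 writer threads driving
    // sp_AddOrder / sp_AddLineItem. Loop bodies are placeholders.
    static void Main()
    {
        Thread[] threads = new Thread[20];
        for (int i = 0; i < 10; i++)
        {
            threads[i] = new Thread(new ThreadStart(ReadLoop));       // readers
            threads[10 + i] = new Thread(new ThreadStart(WriteLoop)); // writers
        }
        foreach (Thread t in threads) t.Start();
        foreach (Thread t in threads) t.Join();   // run until all threads finish
    }

    static void ReadLoop()  { /* call sp_GetOrderAndLineItems in a loop */ }
    static void WriteLoop() { /* call sp_AddOrder and sp_AddLineItem in a loop */ }
}
```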

To ensure that I/O was not the bottleneck, each test started with an empty database that had been pre-expanded so that no auto-grow activity occurred during the test. Additionally, a gigabit switch was used between the client and the server. No other applications or monitoring software ran on the server during the execution of the tests. Task Manager, Profiler, and Performance Monitor were used when establishing the baseline for the test, but never during test execution.

At the beginning of each platform's test run, both the server and the client workstation were rebooted to ensure a clean and consistent environment. The database was always copied to the 8-disk RAID 0 array with no other files present, so that file placement and fragmentation were consistent between runs. Between each of the three tests, the database was deleted, and the empty one was copied again to the clean array. SQL Server was not restarted.

Comments

  • Opteron - Monday, June 20, 2005 - link

  • mikeshoup - Wednesday, June 15, 2005 - link

    I think a better option for testing compiling speed would be to pass a -j argument to make when compiling Firefox, telling it to run as many parallel jobs as the processor has threads, i.e. -j2 for a dual-core or HT CPU.
  • fritz64 - Thursday, May 5, 2005 - link

    I know what I will be getting after this fall. Those numbers are impressive!
  • jvarszegi - Friday, April 29, 2005 - link

    So they're reproducible, but only in secret. And you knew, as usual, about mistakes you were making, but made them anyway to, um, make a valid comparison to something else that no one can verify. Nicely done. Whatever they're paying you, it's not enough.
  • Ross Whitehead - Thursday, April 28, 2005 - link

    Zebo -

    You are correct that you cannot reproduce them, but we can, and we have dozens of times over the last year with different hardware. The fact that you cannot reproduce them does not discount their validity, but it does require a small amount of trust in us.

    We have detailed the interaction of the application with the database. With this description, you should be able to draw conclusions as to whether it matches the profile of your applications and database servers. Keep in mind that when it comes to performance tuning, the most common phrase is "it depends". This means that there are so many variables in a test that, unless all are carefully controlled, the results will vary greatly. So, even if you could reproduce the benchmark, I would not recommend changing your application hardware until the change was validated with your own application as the benchmark.

    The owner of the benchmark is not AMD, or Intel, or anyone remotely related to PC hardware.

    I think if you can get beyond the trust factor there is a lot to gain from the benchmarks and our tests.
  • Reginhild - Thursday, April 28, 2005 - link

    Wow, the new AMD dual cores blow away the "patched together" Intel dual cores!!

    I can't see why anyone would choose the Intel dually over AMD unless all the AMDs are sold out.

    Intel needs to get off their arse and design a true dual core chip instead of just slapping two "unconnected" processors on one chip. The fact that the processors have to communicate with each other by going outside the chip is what killed Intel in all the benchmarks.
  • Zebo - Thursday, April 28, 2005 - link

    Ross,

    How can I reproduce them when they are not available to me?

    From your article:
    " We cannot reveal the identity of the Corporation that provided us with the application because of non-disclosure agreements in place. As a result, we will not go into specifics of the application, but rather provide an overview of its database interaction so that you can grasp the profile of this application, and understand the results of the tests better (and how they relate to your database environment)."

    Then don't include them. A benchmarking tool to which no one else has access is not scientific, because the results can't be reproduced and verified by anyone with a similar setup.

    I don't even know what they do. How are they important to me? How will this translate to anything real-world I need to do? How can I trust the mysterious company? It could be AMD for all I know.
  • MPE - Wednesday, April 27, 2005 - link

    #134

    How can it be the best bang for the buck? Unless you are seeing benchmarks from Anand that say so, how could you come to that conclusion?
    In some tests the 3800+ was the worst performer while the X2 and PD were the best.

    You are extrapolating logic from thin air.
  • Ross Whitehead - Wednesday, April 27, 2005 - link

    #131

    "no mystery unreproducable benchmarks like Anand's database stuff."

    It is not clear what you mean by this statement. The database benchmarks are 100% reproducible, and they are real-life apps, not synthetic or academic calculations.
  • nserra - Wednesday, April 27, 2005 - link

    You are discussing price, and that isn't a fair comparison, since Intel goes from 2800 to 3200 while AMD goes from 3500+ to 4000+ (I'm ignoring AMD's new model numbers, which are still based on the older ones).

    I completely disagree with the AMD model numbers; they should be equal to the single-core numbers, just with the X2 added.

    The true X2 will outperform the Opteron by 2% to 5%.
