vApus Mark II

vApus Mark II uses the same applications as vApus Mark I, but they have been updated to newer versions. vApus Mark II uses five VMs running three server applications:

  • One VM with the Nieuws.be OLAP database, based on SQL Server 2008 x64 running on Windows 2008 R2 64-bit, stress tested by our in-house developed vApus test.
  • Three MCS eFMS portals running PHP and IIS on Windows 2003 R2, stress tested by our in-house developed vApus test.
  • One OLTP database, based on the Swingbench 2.2 "Calling Circle" benchmark by Dominic Giles. We updated the Oracle database to version 11g R2 running on Windows 2008 R2.

All VMs are tested at several user concurrencies in sequence. Each VM is "warmed up" at the lower user counts; we measure only at the higher concurrencies, later in the test. At that point the results are repeatable, as the databases are using their caches and buffers optimally.
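To make the methodology concrete, the minimal sketch below mimics that flow: ramp through low "warm-up" concurrencies whose results are discarded, then average only the higher measured steps. The run_load stub and the exact user counts are hypothetical placeholders, not the actual vApus client.

```python
import random
import statistics

def run_load(concurrency: int) -> float:
    """Hypothetical stand-in for one vApus test step: drive `concurrency`
    simulated users against a VM and return the measured throughput.
    It returns a dummy number here so the sketch runs end to end."""
    return 100.0 + concurrency * random.uniform(0.9, 1.1)

# Illustrative concurrency ramp: low counts only warm up caches and buffers;
# only the higher counts are treated as measurement points.
WARMUP_STEPS = [25, 50, 100]       # placeholder user counts
MEASURE_STEPS = [200, 300, 400]    # placeholder user counts

def benchmark_vm() -> float:
    for users in WARMUP_STEPS:      # warm-up runs: results discarded
        run_load(users)
    scores = [run_load(users) for users in MEASURE_STEPS]
    return statistics.mean(scores)  # score = average over the measured steps

if __name__ == "__main__":
    print(f"Average measured throughput: {benchmark_vm():.1f}")
```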

The OLAP VM is based on the Microsoft SQL Server database of the Dutch Nieuws.be site, one of the newest web 2.0 websites, launched in 2008. We updated to SQL Server 2008 R2. This VM now gets eight virtual CPUs (vCPUs), a feature that is supported by the newest hypervisors such as VMware ESX 4.0 and Xen 4.0. This kind of high vCPU count is one of the conditions that needs to be met before administrators will virtualize these kinds of "heavy duty" applications. The application hardly touches the disk, as the vast majority of activity happens in memory during the test cycle. About 135GB of disk space is necessary, but the most frequently used data is cached in about 4GB of RAM.
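One way to verify that the hot data set really fits in roughly 4GB of buffer cache is to count the 8KB pages SQL Server currently holds in memory per database. The sketch below is only an illustration, assuming a pyodbc connection; the driver name, server name ("olap-vm"), and authentication are placeholders for the actual VM's settings.

```python
import pyodbc

# Assumed connection details; adjust driver, server, and credentials.
CONN_STR = (
    "DRIVER={SQL Server};"
    "SERVER=olap-vm;"           # hypothetical host name for the OLAP VM
    "Trusted_Connection=yes;"
)

# sys.dm_os_buffer_descriptors lists the 8KB pages in the buffer pool.
QUERY = """
SELECT DB_NAME(database_id) AS db_name,
       COUNT(*) * 8 / 1024   AS cached_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_mb DESC;
"""

with pyodbc.connect(CONN_STR) as conn:
    for db_name, cached_mb in conn.cursor().execute(QUERY):
        print(f"{db_name or 'internal'}: {cached_mb} MB cached")
```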

The MCS eFMS portal, a real-world facility management web application, has been discussed in detail here. It is a complex IIS, PHP, and FastCGI site running on top of Windows 2003 R2 32-bit. Note that these VMs run a 32-bit guest OS, which impacts the VM monitor mode. We left this application running on Windows 2003, as virtualization allows you to minimize costs by avoiding unnecessary upgrades. We use three MCS VMs, as web servers typically outnumber database servers in most setups. Each VM gets two vCPUs and 2GB of RAM.

Since OLTP testing with our own vApus stress testing software is still in beta, our fourth VM uses a freely available test: the "Calling Circle" test of the Oracle Swingbench suite. Swingbench is a free load generator designed by Dominic Giles to stress test an Oracle database. We test the same way as before, with one difference: we use an OLTP database that is only 2.7GB (instead of 9.5GB). The OLTP test runs on Oracle 11g R2 64-bit on top of Windows 2008 R2 Enterprise (64-bit). Data is placed on an Intel X25-E SLC SSD, with the logs on a separate SSD; this is done for each Calling Circle VM to avoid storage bottlenecks. The OLTP VM gets four vCPUs.
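A quick way to double-check that data and logs really live on separate SSDs is to ask Oracle where its files are. The snippet below is an illustrative sketch only: the cx_Oracle credentials and service name ("oltp-vm/orcl") are assumptions, while v$datafile and v$logfile simply list the data files and redo log members.

```python
import cx_Oracle

# Hypothetical credentials/DSN; replace with the actual OLTP VM details.
conn = cx_Oracle.connect("system", "password", "oltp-vm/orcl")
cur = conn.cursor()

print("Data files (expected on the data SSD):")
for (name,) in cur.execute("SELECT name FROM v$datafile"):
    print("  ", name)

print("Redo log members (expected on the separate log SSD):")
for (member,) in cur.execute("SELECT member FROM v$logfile"):
    print("  ", member)

conn.close()
```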

Notice that our total vCPU count per tile is 18 (8 + 3 x 2 + 4). The advantage of using 18 vCPUs per tile is that this number does not map neatly onto the core count of most CPU configurations, so no system gets an easy scheduling win. You might remember from our previous testing that if the number of virtual CPUs is a multiple of the number of physical cores, that server gets a performance advantage over other systems.
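The short sketch below illustrates the point: it checks whether the total vCPU count of one to four tiles is an exact multiple of a few example core counts (the core counts are arbitrary examples, not the test systems).

```python
# Does the total vCPU count of N tiles land on an exact multiple of a given
# physical core count? An exact multiple is the situation that previously
# handed some systems a scheduling advantage.
VCPUS_PER_TILE = 8 + 3 * 2 + 4           # OLAP + 3x web + OLTP = 18

for cores in (12, 16, 24, 32, 48, 64):   # example core counts
    for tiles in range(1, 5):
        total = tiles * VCPUS_PER_TILE
        verdict = "exact multiple" if total % cores == 0 else "not a multiple"
        print(f"{tiles} tile(s) = {total:3d} vCPUs on {cores:2d} cores: {verdict}")
```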

Careful monitoring (ESXtop) showed us that four tiles of vApus Mark II (72 vCPUs) were enough to keep the fastest system at an average of 96.5% CPU utilization during performance measurements.
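For readers who want to reproduce this kind of check, esxtop can log counters in batch mode (for example "esxtop -b -d 5 -n 120 > cpu.csv") and the resulting CSV can be averaged offline. The sketch below is an illustration only; the counter name it looks for ("Physical Cpu(_Total)" combined with "% Util Time") is an assumption about the CSV header and may need adjusting to your esxtop version.

```python
import csv
import statistics
import sys

def average_total_cpu(csv_path: str) -> float:
    """Average a total CPU utilization column from an esxtop batch-mode CSV.

    Assumes the header contains a column whose name includes both
    'Physical Cpu(_Total)' and '% Util Time'; adjust if yours differs."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        col = next(i for i, name in enumerate(header)
                   if "Physical Cpu(_Total)" in name and "% Util Time" in name)
        samples = [float(row[col]) for row in reader if len(row) > col and row[col]]
    return statistics.mean(samples)

if __name__ == "__main__":
    print(f"Average CPU utilization: {average_total_cpu(sys.argv[1]):.1f}%")
```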

Comments

  • haplo602 - Wednesday, August 11, 2010 - link

    This is one of the bottlenecks of your virtualised environment. A storage solution is only the limit if you do not use it as it was designed to be used.

    The more IO-demanding applications you have, the less virtualisation is going to offer any benefits. Usually CPU power is the last issue, after network, disk, and memory.

    I had a good laugh at the opening page. High-end servers are high end not because of the increased performance but because of the better management and disaster tolerance/recovery they offer. After all, they use the same CPUs and memory as the low-end servers; it's just that everything else is different (OLRAD, hot swap/plug of almost anything except memory and CPU).
  • webdev511 - Thursday, August 12, 2010 - link

    Well, if you're willing to spend some more money on Solid State (if you go with two twelve-core CPUs you'll save on licences) you could stuff four of the new Fusion IO 1.28TB Duo Drives into the box and map them as System Drives, and then use attached storage for big files.
  • SomeITguy - Wednesday, August 11, 2010 - link

    No offense intended, and I know this will put you on the defensive, but it sounds to me like the "development environment" was ill conceived in the design phase. You obviously overbought on processor power. The first step in designing an environment is knowing what your apps need. You can't just buy servers, then whine about how poorly the performance matches the overall system capability...

    At my last job I had Citrix Xen on HP blades with 53xx and 54xx CPUs, running about 150 production VMs; on the order of >300 total, with R&D and QA. The company had no money, and because of that we only ran local storage for the OS and most functions. The shared data we did have was on NetApps, and that alone constantly spiked up to +25k IOPS. I can't remember where each blade sat on IOPS, but it was high. I was able to balance resources utilized most of the day to about the ~60% level, with spikes hitting the high 80s. No resources were being overly wasted. To do this effectively takes time and patience. You need to economize. 12 VMs on a blade with 16GB of memory was not unheard of...

    Then there is the whole ESX thing, eh, won't get into that. Again, you need to know what is going to run on the servers before you spend (waste) money.

    In my experience, it's typical that managers just override the lowly sysadmin's advice, take a vendor's word over that of the sysadmin who manages the app, or a business unit buys you the equipment without consulting you, then says "here, make it work".

    Overall, I thought the article good. It is just a guide, not a bible.
  • davegraham - Tuesday, August 10, 2010 - link

    So, i'm sitting here with a spanking new Dell R815 which is a quad socket G34 system and is shipping today w/ AMD Opteron 6176SE parts...so, this article is outdated even before it begins. (oh, did i mention it's only 2RU?)

    I'm also very curious as to what the underlying storage is for all these tests, as it definitely can have an impact on the serviceability of the testing.

    I'm curious as to the details per VM as well... IOMMU choices, HT sharing, NUMA settings, as well as the version of ESX being used?

    dave
  • JohanAnandtech - Wednesday, August 11, 2010 - link

    "So, i'm sitting here with a spanking new Dell R815 which is a quad socket G34 system and is shipping today w/ AMD Opteron 6176SE parts...so, this article is outdated even before it begins. (oh, did i mention it's only 2RU?)"

    Testing servers is not like testing video cards. I cannot plug the R815 into a ready-installed Windows PC and push a "Servermark" button. It does not work that way, as you indicate yourself. A complete storage system must be set up, and in many cases ESX fails to install the first time on a brand new server. We also perform a whole battery of monitoring tests, for example to confirm that the DQL (disk queue length) is low enough.

    The storage system we use for the 4-tile test is an 8-disk SSD system for the OLTP tests (described in this article). The VMs themselves sit on a separate RAID controller connected to a Promise JBOD. The JBOD has eight 15,000 rpm SAS disks. The only really disk-intensive app in this test is Swingbench, and by making sure both data and logs get their own separate SSD, we achieve DQLs under 0.1. There is a lot more to the Oracle config, but if you are interested, we can share the parameter file.

    Anyway, the low DQL and the fact that we scale well from 2 to 4 tiles show that we are not limited by the disks.
  • davegraham - Wednesday, August 11, 2010 - link

    johan,

    I work with VMware for a living, doing platform testing for the product I support. ;) Consequently, I'm very well aware of the requirements for testing VMware and the various and sundry components within the server. Hence my slightly critical view of what you're doing here.

    appreciate the response on the storage....again, all well and good with that explanation.

    I'll put my quad socket 6176SE system against your 7500 system any day, and I'll enjoy lower rack footprint, lower power consumption, and a positively brilliant VMware experience. ;)

    keep up the good work.

    dave
  • blue_falcon - Wednesday, August 11, 2010 - link

    If you want to do a similar 2U config, try the R810; it only has 32 DIMM sockets but is nearly identical to the R910.
  • mapesdhs - Tuesday, August 10, 2010 - link


    Johan, how would this system compare to a low-end quad-socket Altix UV 10? (max RAM = 512GB).

    Ian.
  • JohanAnandtech - Wednesday, August 11, 2010 - link

    I have never tested an SGI server, so I cannot say for sure. But the hardware looks (and probably is) identical to what we have tested here.
  • Casper42 - Wednesday, August 11, 2010 - link

    Due to the way Dell implemented the memory on their latest Quad socket machines, if you run 2 CPUs with the FlexMem bridge, you get full memory bandwidth but half of the memory sockets are further away from the CPU due to the extra trace length of going to the empty CPU socket and through the FlexMem bridge.

    When you put in 4 CPUs you only get half the memory bandwidth of an Intel reference design. This is because the traces that would normally go to the empty CPU socket and through the FlexMem now go essentially nowhere because the CPU in that socket needs the access instead.

    I would say try IBM or HP. Just beware that IBM does some weird stuff when it comes to their Max5 memory expansion module that can also cause additional memory latency for some of the DIMM sockets and not the others.
