vApus Mark II

vApus Mark II uses the same applications as vApus Mark I, but they have been updated to newer versions. vApus Mark II uses five VMs running three different server applications:

  • One VM with the Nieuws.be OLAP database, based on SQL Server 2008 x64 running on Windows 2008 R2 64-bit, stress tested by our in-house developed vApus test.
  • Three VMs with MCS eFMS portals running PHP and IIS on Windows 2003 R2, stress tested by our in-house developed vApus test.
  • One VM with an OLTP database, based on the Swingbench 2.2 "Calling Circle" benchmark of Dominic Giles. We updated the Oracle database to version 11g R2 running on Windows 2008 R2.

All VMs are tested with several user concurrencies in sequence. The VMs are first "warmed up" at the lower user counts; we measure only at the higher concurrencies, later in the test. At that point the results are repeatable, as the databases are using their caches and buffers optimally.
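To make the "warm up, then measure" idea concrete, here is a minimal sketch of the logic. This is hypothetical Python, not our actual vApus code; the concurrency values and the run_load placeholder are illustrative only.

```python
# Minimal sketch of the "warm up, then measure" approach (hypothetical,
# not the actual vApus implementation).
CONCURRENCIES = [25, 50, 100, 200, 400]   # example sequential user counts
WARMUP_STEPS = 2                          # lower counts only warm caches/buffers

def run_load(users):
    """Placeholder: drive `users` concurrent sessions and return throughput."""
    raise NotImplementedError

results = {}
for step, users in enumerate(CONCURRENCIES):
    throughput = run_load(users)
    if step >= WARMUP_STEPS:              # only the higher concurrencies count
        results[users] = throughput
print(results)
```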

The OLAP VM is based on the Microsoft SQL Server database of the Dutch Nieuws.be site, one of the newer web 2.0 websites, launched in 2008. We updated the database to SQL Server 2008 R2. This VM now gets eight virtual CPUs (vCPUs), a feature supported by the newest hypervisors such as VMware ESX 4.0 and Xen 4.0. A high vCPU count like this is one of the conditions that must be met before administrators will virtualize this kind of "heavy duty" application. The application hardly touches the disk, as the vast majority of activity happens in memory during the test cycle. About 135GB of disk space is necessary, but the most frequently used data is cached in about 4GB of RAM.

The MCS eFMS portal, a real-world facility management web application, has been discussed in detail here. It is a complex IIS, PHP, and FastCGI site running on top of Windows 2003 R2 32-bit. Note that these three VMs run a 32-bit guest OS, which impacts the VM monitor mode. We left this application on Windows 2003, as virtualization allows you to minimize costs by avoiding unnecessary upgrades. We use three MCS VMs, as web servers outnumber database servers in most setups. Each VM gets two vCPUs and 2GB of RAM.

Since OLTP testing with our own vApus stress testing software is still in beta, our fifth VM uses a freely available test: the "Calling Circle" test of the Oracle Swingbench suite. Swingbench is a free load generator designed by Dominic Giles to stress test an Oracle database. We test the same way as we have before, with one difference: we use an OLTP database that is only 2.7GB (instead of 9.5GB). The OLTP test runs on Oracle 11g R2 64-bit on top of Windows 2008 R2 Enterprise (64-bit). Data is placed on an Intel X25-E SLC SSD, with the logs on a separate SSD; this is done for each Calling Circle VM to avoid storage bottlenecks. The OLTP VM gets four vCPUs.

Notice that our total vCPU count per tile is 18 (8 + 3 x 2 + 4). The advantage of using 18 vCPUs per tile is that the virtual CPUs will not map neatly onto the physical cores of almost any CPU configuration, so no system gets an unfair scheduling advantage. You might remember from our previous testing that if the number of virtual CPUs is a multiple of the number of physical cores, the server gets a performance advantage over other systems.
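As a quick illustration of why 18 is a deliberate choice, the small sketch below checks whether the per-tile vCPU count divides evenly into a host's physical core count. The core counts are just examples of common dual-socket machines, not necessarily the exact systems in this review.

```python
# Sketch: per-tile vCPU count and whether it divides evenly into the
# physical core count of a host (example core counts only).
VCPUS_PER_TILE = 8 + 3 * 2 + 4            # OLAP + 3x web portal + OLTP = 18

for physical_cores in (8, 12, 16, 24):
    neat_fit = (VCPUS_PER_TILE % physical_cores == 0)
    print(f"{physical_cores} cores: vCPUs an exact multiple? {neat_fit}")
```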

Careful monitoring with esxtop showed us that four tiles of vApus Mark II (72 vCPUs) were enough to keep the fastest system at an average of 96.5% CPU utilization during the performance measurements.
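For readers who want to do a similar check: esxtop can log its counters to a CSV file in batch mode (esxtop -b), and the average physical CPU utilization can then be computed offline. The sketch below assumes such a CSV; the column name is a placeholder, as the real counter name depends on the esxtop version and the fields selected when logging.

```python
# Sketch: average the host CPU utilization from an esxtop batch-mode CSV.
# The column name below is a placeholder, not a guaranteed counter name.
import csv

CPU_COLUMN = r"\\host\Physical Cpu(_Total)\% Util Time"   # assumed name

def average_cpu_util(path):
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    samples = [float(r[CPU_COLUMN]) for r in rows if r.get(CPU_COLUMN)]
    return sum(samples) / len(samples)

# Example: print(average_cpu_util("run1.csv"))
```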

Comments

  • fynamo - Wednesday, August 11, 2010 - link

    WHERE ARE THE POWER CONSUMPTION CHARTS??????

    Awesome article, but complete FAIL because of lack of power consumption charts. This is only half the picture -- and I dare to say it's the less important half.
  • davegraham - Wednesday, August 11, 2010 - link

    +1 on this.
  • JohanAnandtech - Thursday, August 12, 2010 - link

    Agreed. But it wasn't until a few days before I was going to post this article that we got a system that is comparable. So I kept the power consumption numbers for the next article.
  • watersb - Wednesday, August 11, 2010 - link

    Wow, you IT Guys are a cranky bunch! :-)

    I am impressed with the vApus client-simulation testing, and I'm humbled by the complexity of enterprise-server testing.

    A former sysadmin, I've been an ignorant programmer for lo these past 10 years. Reading all these comments makes me feel like I'm hanging out on the bench in front of the general store.

    Yeah, I'm getting off your lawn now...
  • Scy7ale - Wednesday, August 11, 2010 - link

    Does this also apply to consumer HDDs? If so, is it a bad idea to have an intake fan in front of the drives to cool them, as many consumer/gaming cases have now?
  • JohanAnandtech - Thursday, August 12, 2010 - link

    Cold air comes from the bottom of the server aisle, sometimes as low as 20°C (68F), and gets blown at high speed over the disks. Several studies now show that this is not optimal for an HDD. In your desktop, the temperature of the air that is blown over the HDD should be higher, as the fans normally spin slower. But yes, it is not good to keep your hard disk at temperatures lower than 30°C. Use HDD Sentinel or SpeedFan to check on this; 30-45°C is acceptable.
  • Scy7ale - Monday, August 16, 2010 - link

    Good to know, thanks! I don't think this is widely understood.
  • brenozan - Thursday, August 12, 2010 - link

    http://en.wikipedia.org/wiki/UltraSPARC_T2
    2 sockets =~ 153GHz
    4 sockets =~ 306GHz
    Like the T1, the T2 supports the Hyper-Privileged execution mode. The SPARC Hypervisor runs in this mode and can partition a T2 system into 64 Logical Domains, and a two-way SMP T2 Plus system into 128 Logical Domains, each of which can run an independent operating system instance.

    Why did Sun not dominate the world in 2007 when it launched the T2? Besides the two 10G Ethernet interfaces built into the processor, they had the most advanced architecture that I know of; see
    http://www.opensparc.net/opensparc-t2/download.htm...
  • don_k - Thursday, August 12, 2010 - link

    "why SUN did not dominate the world in 2007 when it launched the T2?"

    Because it's not actually that good :) My company bought a few T2s, and after about a week of benchmarking and testing it was obvious that they are very, very slow. Sure, you get lots and lots of threads, but each of those threads is oh so very slow. You would not _want_ to run 128 instances of Solaris, one on each thread, because each of those instances would be virtually unusable.

    We used them as webservers... good for that. Or as file servers where you don't need to do any CPU-intensive work.

    The theory is fine and all, but you obviously have never used a T2 or you would not be wondering why it failed.
  • JohanAnandtech - Thursday, August 12, 2010 - link

    "http://en.wikipedia.org/wiki/UltraSPARC_T2
    2 sockets =~ 153GHz
    4 sockets =~ 306GHz"

    You are multiplying threads by clockspeed. IIRC, the T2 is a fine-grained multithreaded CPU where 8 (!!) threads share the two pipelines of *one* core.

    Compare that with the Nehalem core, where 2 threads share 4 "pipelines" (sustained decode/issue/execute/retire per cycle). So basically, a dual-socket T2 is nothing more than 16 relatively weak cores which can execute 2 instructions per clock cycle at most, or 32 instructions per cycle in total. The only advantage of having 8 threads per core is that (with enough independent software threads) the T2 is able to come relatively close to that kind of throughput.

    A dual six-core Xeon has a maximum throughput of 12 cores x 4 instructions, or 48 instructions per cycle. As the Xeon has only 2 threads per core, it is less likely that the CPU will ever come close to that kind of output (in business apps). On the other hand, it performs excellently when you have a number of dependent threads, or simply not enough threads running in parallel. The T2 will only perform well if you have enough independent threads.
