Original Link: http://www.anandtech.com/show/1724



Introduction

Several months ago, Sun gave us the opportunity to look at a quad Opteron, 3U rackmount server that had everyone reevaluating Intel's dominance in the server arena. Four months later, AMD finally has some significant market share for entry level servers. Our original quad Opteron 850 server was an impressive piece of machinery, but since AMD's launch of dual core processors, a whole new class of high performance entry servers has evolved.

Sun was extremely pleased to announce to us that their dual core V40z had set the 64-bit SPECjbb2000 world record. We couldn't have been more excited to get a similar configuration for our testing to do some real world benchmarks for ourselves! Today we will look at one of these high performance Sun Fire V40z machines and see how it compares against the previous generation of V40z.

Since we are looking at an 8-core Opteron server today, a huge portion of our time will be dedicated to investigating scalability. While it is relatively hard to take an operating system or application and design it to run on two processors instead of one, it is equally hard to take a system designed for two processors and scale it to eight. The cost prohibitive nature of 8-way systems traditionally makes scaling defects difficult to ascertain, and likewise it's not too often that we even pay attention to them. With processors heading toward dual and eventually quad core, scalability of the OS and hardware begins to carry more and more weight.

For anyone in the computer business, there really isn't a better feeling than seeing 8 Opterons POST.



New Changes to the V40z

The heart of the new V40z is, of course, the Opteron 875 processors. Since a single 2.6GHz Opteron draws the same power as a dual core 2.2GHz Opteron, the V40z doesn't seem to have too much difficulty with drop-in compatibility. In theory at least, a dual core Opteron 875 should have the same power envelope as an Opteron 252 but provide an additional 50-80% performance boost. For more details on the Opteron 875, please check out our earlier article here. Although AMD has some background in 8-way server configurations, it is not nearly as extensive as that of Intel, Sun, IBM or HP. Putting eight cores in a 3U is a huge step forward for AMD (and Sun), and processor scaling is a colossal issue.

Additional cooling is not necessary for the dual core Opteron 875s because they have approximately the same TDP as Opteron 852s.

Our previous V40z (Quad Opteron 850) managed to draw 585 watts during peak operation - well within the capability of the redundant 760W power supplies. The new V40z with quad Opteron 875s hit 615W under heavy load. This falls in line with AMD and Sun's claims concerning power consumption.
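To put those power figures in perspective, here is a minimal Python sketch of the arithmetic. The wattages and the 760W supply rating come from the measurements above; the function names and the watts-per-core figure (which divides whole-system draw by core count, so it is only a rough indicator) are our own illustration.

```python
# Figures quoted in the text above.
OLD_PEAK_W = 585      # previous V40z, quad single core Opteron 850
NEW_PEAK_W = 615      # new V40z, quad dual core Opteron 875
PSU_RATING_W = 760    # rating of each redundant power supply

def percent_increase(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

def headroom(peak, rating):
    """Watts left between peak draw and the supply rating."""
    return rating - peak

if __name__ == "__main__":
    print(f"Peak draw increase: {percent_increase(OLD_PEAK_W, NEW_PEAK_W):.1f}%")
    print(f"Headroom on one supply: {headroom(NEW_PEAK_W, PSU_RATING_W)} W")
    # Whole-system watts divided by core count: a rough indicator only.
    print(f"System watts per core: {OLD_PEAK_W / 4:.1f} (old) vs {NEW_PEAK_W / 8:.1f} (new)")
```

Doubling the core count costs only a few percent more power at the wall, which is the point that AMD and Sun were making.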

Aside from the dual core processors, there have been some other changes to the V40z since our last analysis.

  • The two Broadcom BCM5703 gigabit controllers in the previous V40z have been replaced with a single BCM5704.
  • The 800MHz HyperTransport links have been upgraded to full 1GHz HyperTransport links.
  • The Service Processor now supports IPMI 2.0.
  • Support for 300GB hard drives has been added to the BIOS.

The new features of the V40z do come at a premium; the base quad dual core systems start around $38,995 direct from Sun. However, there are incentive programs (including a Xeon trade-in program) that can reduce this cost by up to 15%, and other promos are running on Sun's webpage. Third party retailers are also selling Sun systems at lower prices, but unfortunately the third party retailers do not offer the same support packages.
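As a quick worked example of what that incentive could mean, the sketch below applies the maximum 15% reduction to the quoted base price. Only the $38,995 price and the "up to 15%" figure come from the text; the function name and the per-core breakdown are ours, and actual promotional pricing may well differ.

```python
BASE_PRICE_USD = 38995.00   # base quad dual core V40z, as quoted above
MAX_DISCOUNT = 0.15         # "up to 15%" incentive
CORES = 8                   # four dual core Opteron 875 processors

def discounted_price(price, discount):
    """Price after applying a fractional discount (0.15 means 15% off)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return round(price * (1.0 - discount), 2)

if __name__ == "__main__":
    best_case = discounted_price(BASE_PRICE_USD, MAX_DISCOUNT)
    print(f"Best-case price: ${best_case:,.2f}")
    print(f"List price per core: ${BASE_PRICE_USD / CORES:,.2f}")
    print(f"Best-case price per core: ${best_case / CORES:,.2f}")
```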



Getting a Feel for Solaris 10

Solaris has always had the edge over Linux when it comes to scalability; SunOS had its roots in big iron. As a FreeBSD user first and a Linux user second (and a Windows user a distant third), a lot of things felt very familiar inside Solaris 10. Sun's Java Desktop System, JDS, is immediately recognizable as a SUSE derivative. We won't touch on JDS too much in the next couple of pages, but instead focus on the underlying kernel and some of Solaris's more interesting and original features.

As we mentioned earlier, scalability is a big issue but if anyone can tackle that obstacle, it's Sun. Sun spent a lot of time getting Solaris 10 ready for x86, and as you will see in our benchmarks section, the "Slowlaris" moniker might be dead. Solaris 10 features a new scheduler that allows per-CPU optimizations as well as faster string functions, SSE2 support and new x86_64 specific libc system calls. Granted, we would like to see some SSE3 optimizations as well, but these features sound pretty good for now.

One of the neatest, unexpected features in Solaris 10 was Zones, also known as N1 Grid Containers. Zones behave very similarly to User-mode Linux (UML); each Zone is a virtual instance of Solaris 10 with its own IP address and userland. However, Zones are different because each virtual OS shares the base kernel. This is a huge performance boost for the virtual operating system because each instance is not constantly waiting for resources like in UML; they can just demand them when necessary. Zones can be configured to only utilize a specified amount of resources as well. Of course, the downside to this is that if a single Zone is compromised, the whole system is effectively compromised. Fortunately, compromising a Solaris 10 system is not an easy task either; additional Process Rights Management and User Rights Management are prevalent in Solaris 10.

David Comay from Sun writes:

    One of the design goals of Zones was to ensure that if a single zone *was* compromised that the whole system would *not* also be compromised. We believe that we achieved that to a very large extent in Solaris 10. From an isolation perspective, the primary weakness is that there is a single kernel and if a user program somehow trips over a kernel bug and causes the system to panic, then of course, that affects the whole system. But from a security perspective, someone who is a privileged user (namely, root) in a zone can only cause damage to that zone and *not* the system as a whole. The virtualization that Zones provides ensures that they're not able to see, affect or modify processes and their data running in other zones on the system.

Dynamic Tracing (DTrace) was another heavily hyped feature of Solaris 10, and rightfully so. DTrace, along with its scripting language D, allows an administrator or a developer to observe and debug problems in a production system with very little overhead. The problems with traditional debugging tools like gdb, truss, pstack and ptrace are severalfold:

  • We need to either monitor everything on the system, or only very specific processes - it would be impossible to trace every process on the system
  • We can only view a core dump of a snapshot in time - often times administrators and developers have transient problems
  • Usually, a program like gdb needs to halt or stop the system - in a production server this isn't acceptable
  • A poor debugging routine can actually be more detrimental than helpful - debugging a system incorrectly can actually bring it down, which is totally unacceptable at times

DTrace can effectively do the same as truss, pstack or ptrace, but it can also be used on a production system without completely crippling it, via "Probes" that are inserted all over the OS. Sun's DTrace introduction conference explains some of DTrace's functionality in a 40 minute session for those really interested in all of its features. It's kind of like installing 30,000 debug statements all over the kernel, and allowing the admin to collect data from these probes whenever they would like via DTrace.

Solaris is really a developer's dream. During the V40z's brief stay in our labs, we actually used the machine extensively for development of our RTPE software platform; partially because the V40z is 10 times more powerful than our entire RTPE cluster, but partially because of DTrace, MDB and libumem. Using some of the examples from the DTrace introduction above, we were able to use the analyzer to isolate instances where some of our RTPE bots were getting preempted for supposedly no reason at all. Kudos to that team at Sun!

Just a few weeks ago, Sun opened Solaris 10 to all with OpenSolaris, Sun's partially open sourced version of Solaris under the CDDL license. OpenSolaris now provides much of the OS core for modification, but not the entire OS just yet. According to the OpenSolaris roadmap, crypto and storage drivers should be available soon, which would be a great step forward for the entire FOSS community.



But.... (Solaris 10 Cont.)

Unfortunately, not all was entirely well in Solaris-land. Even though we had a final build of Solaris 10, there were a lot of things broken or "misplaced" from time to time. OpenSSL would segfault out of the box when running the "speed" benchmark, for example. Also, Solaris 10 fragments the userland into several directories based on the nature of the software. For example, GNU software is located in /usr/sfw. This non-traditional pathing seems to play havoc with programs that want to be placed in /usr/ and /usr/local.

There were other small disappointments like the fragmented userland. ZFS, Sun's successor to UFS, is not present in Solaris 10 yet. ZFS will be capable of 128-bit data storage and complete disk virtualization while being completely endian neutral (i.e., you can take ZFS discs from a SPARC or RISC machine and use them in an x86 machine). Unfortunately, ZFS missed the ship date of Solaris 10 and it doesn't look like we will see it for a few months still.

We could write another page just about the driver support, but we are somewhat mixed on this issue. Fortunately all of the devices on our V40z are fully supported with drivers that (at least on the surface) appear to be functioning flawlessly. When we took Solaris 10 for a test spin on some off the shelf hardware, things were really hit or miss. There was a deep lack of support for our RAID controllers, something we would expect a server-oriented operating system to focus more on.

Some of the other rough edges include a security advisory that just came out a few days ago concerning Solaris 10's ld.so. It's not to say that Linux or FreeBSD don't have these problems either, but Solaris 10 definitely has the feeling of "unpolished". Zones and DTrace are excellent features but at times they can be a bit overwhelming. With continually better interfaces and maturity, we feel pretty confident the Solaris 10 operating system as a whole will have a pretty strong future as a competitive server OS.



The Test

We are mostly concerned with how the machine performs under Solaris 10 in comparison to SUSE 9, but we are also concerned with performance from the previous generation V40z to this one. Our very own Johan De Gelas recently wrote a very detailed comparison of various Linux database setups, so we won't spend as much time on database benchmarks for this review. Make no mistake, however, that a machine like the V40z makes the most sense in a database environment. We will use very similar benchmarks to the previous V40z examination, but we will also draw on some references from our Linux workstation articles.


Test Configurations

                      Sun Fire V40z (Dual Core)              Sun Fire V40z
Processor:            (4) AMD Opteron 875                    (4) AMD Opteron 850
RAM:                  8 x 1024MB PC-2700                     8 x 1024MB PC-2700
Hard Drives:          SCSI u320 Seagate Cheetah 10,000RPM    SCSI u320 Seagate Cheetah 10,000RPM
Memory Timings:       Default
Operating System(s):  SuSE 9.1 Professional, RedHat 9, JDS 2.0, SUSE SLES 9, Solaris 10
Kernel:               Linux 2.6.8, Linux 2.4 (JDS 2.0), Linux 2.6.5, SunOS 5.10

Compiler:

linux:~ # gcc -v
Reading specs from /usr/local/lib/gcc/i686-pc-linux-gnu/3.4.2/specs
Configured with: ./configure
Thread model: posix
gcc version 3.4.2

Our tests consist of everything from render benchmarks to database to compilation benchmarks. Each of these is designed to stress a particular portion of the system. As we mentioned earlier, the V40z is a premier platform for databases due to the large amounts of CPU and memory. All tests are done with x86_64 binaries on Solaris and Linux unless otherwise noted. Furthermore, all programs are compiled via GCC with the flags mentioned in the table above.

X was disabled during these benchmarks to reduce overhead.



Database Benchmarks

MySQL 4.0.20d

MySQL has been a staple of our Linux tests since their inception. Below, you can see our results for sysbench on both the 64-bit RedHat 2.4 kernel. We ran the sysbench 0.3.1 oltp tests on tables of 1,000,000 and 10,000,000 records. Our MySQL configuration details are identical to Johan's earlier database benchmarks.



The difference between Solaris and SLES in this instance is almost negligible. The 58% speed boost over the previous quad Opteron server is certainly exciting, although it is a far cry from the 50-80% speed boost we were promised over a higher clocked quad Opteron 252 configuration.



Apache Benchmarks

In a web server configuration Apache immediately becomes the HTTP daemon of choice for anyone using Linux. Apache's ApacheBench is a relatively synthetic benchmark that can give us some baseline performance ideas without straying too far into the realm of artificial. We ran both configurations under 10 and 100 concurrent threads to demonstrate the number of requests per second the server can handle. These requests only reflect static HTML requests, which is useful for servers like AnandTech that run on cached pages.




This benchmark demonstrates more of what we actually anticipated with the database benchmark on the previous page. The V40z with the quad Opteron 875 performs 90% faster than the quad Opteron 850 we looked at before. It's also interesting to note the difference between Solaris 10 and SLES 9 here. As the number of threads increased, the performance gap between the Solaris configuration and the SLES configuration widened in favor of SLES 9.



Rendering Benchmarks

We include Mental Ray and Shake as a point of reference, although both applications are strictly 32-bit at this time. Mental Ray is further hindered by the fact that the version we have is not SMP-aware.

Mental Ray 3.3.3

You may be interested to see how some single CPU setups perform on the same test render here. Once again, we are running the same Maya benchmark file found in our other reviews. While Solaris allows us to run some Linux binaries with emulation, we could not get the license servers to install correctly, which made this a Linux-only portion of the test. We ran Mental Ray via Maya using the command below:

# maya_render_with_mr -file Benchmark_Mental.mb

Since Mental Ray cannot utilize all eight physical cores, this benchmark doesn't do the V40z a lot of justice. Nevertheless, you can see how a single core from an Opteron 875 stacks up against a single Opteron 250 from the previous two benchmarks.

Shake 3.5c

Apple develops a great digital effects package called Shake. We took the opportunity to run a benchmark script by Lindsay Adams, which you can download here. The benchmark script renders 10 frames under various effects using one or multiple CPUs. We sum the render times and display them below. The times recorded are the averages of three runs.

The command run for this benchmark is:

# shake -exec hardware_test_v01.shk -vv

Even though Shake seems to support all the processor cores, there is a scaling issue. This is just one case of many where having lots of slower processors might not be as efficient as having fewer, faster processors instead.



Compiling

We put a particular emphasis on compiling because it stresses the entire system (hard drive, processor and memory), but also because, as any *nix user knows, compiling is no fun on a slow machine.

GNU Make 3.79.1 / GCC 3.4.2

While GCC isn't multithreaded, we can run multiple jobs using the -j option in make. Below, you can see the significant improvement in performance going from 1 to 3 to 5 jobs. We used the commands below to compile the Linux 2.6.4 kernel from kernel.org, where X is the number of jobs. The kernel is set for a cross-compilation of a default x86_64 machine:

# yes "" | make config
# time make -jX

There were minor advantages of the Solaris compilation over the SLES one. Like most of our other tests, as our compilation moved closer to nine jobs, the reduction in processing time was not quite linear. We do see considerable improvements in speed, but we really aren't getting all of the bang for our buck.
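One common way to reason about this kind of less-than-linear scaling is Amdahl's law, which caps the speedup by the share of work that cannot run in parallel. The sketch below is illustrative only: the 90% parallel fraction is an assumed value, not something we measured on the V40z, and the job counts are just examples.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup under Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

# Assumption for illustration: 90% of the build can run in parallel.
ASSUMED_PARALLEL_FRACTION = 0.90

if __name__ == "__main__":
    for jobs in (1, 3, 5, 8):
        speedup = amdahl_speedup(ASSUMED_PARALLEL_FRACTION, jobs)
        print(f"{jobs} jobs: {speedup:.2f}x faster than a single job")
```

Under that assumption, eight workers would give well under an 8x improvement, which matches the general shape of what we saw even if the real serial fraction of a kernel build is different.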

We also threw in some compile tests of the entire GCC code base, which takes significantly longer than the Linux kernel to compile.

You'll notice pretty much the same problems as in the Linux kernel compilation test; scaling becomes an issue as we increase the number of jobs. We see a 43% performance increase over the quad Opteron 850 V40z; certainly impressive, but we would like to see more.
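To compare the gains across our tests, it helps to express each one as a fraction of the ideal 2x that doubling the core count could deliver. The sketch below does that with the percentages reported in this review (58% for MySQL, 90% for ApacheBench, 43% for the GCC compile). It is a rough measure: it ignores the clock speed difference between the single core and dual core parts, and the function and variable names are ours.

```python
# Gains reported in this review for the new V40z over the previous V40z, in percent.
REPORTED_GAINS = {
    "MySQL sysbench": 58,
    "ApacheBench": 90,
    "GCC compile": 43,
}

CORE_RATIO = 8 / 4  # eight cores in the new machine vs four in the old one

def scaling_efficiency(gain_percent, core_ratio=CORE_RATIO):
    """Observed speedup divided by the ideal speedup from the extra cores."""
    speedup = 1.0 + gain_percent / 100.0
    return speedup / core_ratio

if __name__ == "__main__":
    for name, gain in REPORTED_GAINS.items():
        print(f"{name}: {scaling_efficiency(gain):.1%} of ideal scaling")
```

By this measure, the thread-heavy web serving test comes closest to ideal scaling, while the compile test leaves the most performance on the table.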



Final Thoughts

Looking back at our V40z, we have to applaud Sun for another design win. We didn't achieve the theoretical 80% increase in our test benchmarks, but we did achieve significant performance increases that would be unobtainable without a second core. Benchmarks that were not very CPU intensive but were very thread-sensitive - like ApacheBench - fared the best. Whether or not the extreme cost of such a system is easily justifiable depends on several other factors. For instance, where density becomes an issue, buying dual core processors is really the only way to obtain faster speeds, even if it is only 50% faster than the current generation. Furthermore, buying a dual core V40z might not be faster than buying two single core V40z servers for the same price, but the long term costs of administration and power consumption put the dual core machine in a more favorable position.

Sun has done a great job capitalizing on the performance of Opteron, but they aren't the only ones selling dual core high-density servers either. PogoLinux recently started selling their 8-way dual core Opteron 875 servers (16 physical cores!) for a price that isn't too distant from Sun's quad Opteron V40z. However when compared to the Tier 1 clan (HP, IBM and Dell), Sun has an extremely competitive pricing structure. HP and IBM have been late to phase in dual core servers, and Dell... Dell has just missed the boat altogether for now.

Solaris 10 proved a fascinating endeavor for us as well. Our experience with the operating system as a whole was mixed, generally due to the number of sharp edges around such a new OS. On the other hand, tools like DTrace proved invaluable to us and Sun really has a great tool on their hands for developers and administrators alike. Also note that Solaris didn't have a problem keeping up with SLES 9 in most of our benchmarks. SLES 9 is a tad slower than some of the slicker installs out there, but it wouldn't be very insightful to put Gentoo on a $39,000 system either. We were very impressed by the fact that Solaris managed to stay a little bit ahead of SLES during benchmarks with heavy scheduling. We really didn't expect this, so perhaps all the efforts of Sun to incorporate better code into the x86 portion of Solaris 10 really paid off. Coupled with the extensive support community and projects like OpenSolaris, Solaris 10 is a winner.

Without much pressure from Intel, Sun has been pretty free to do what they want with AMD's processors. Sun is even going a bit on the offensive with Intel trade-in programs. Even though both AMD and Sun have been through some hard times recently, Sun is a great ally for AMD for two reasons. First, Sun knows servers - this is a critical market for AMD. Second, Sun isn't afraid of Intel and doesn't have nearly the problems AMD does with their customers.
