Original Link: http://www.anandtech.com/show/1622




Introduction

Several weeks ago, we took a look at the W2100z, Sun's first attempt at an Opteron workstation. Today, we have a follow-up to that piece of hardware with the acclaimed Sun Fire V40z 4-way Opteron 850 entry server. The Sun Fire V40z is the first four-way Opteron lineup in Sun's portfolio; the two-way variation of the V40z is dubbed the V20z.

The V40z is an entry-level server geared for everything from data mining to CAE to database work. Granted, "entry-level" for Sun might be a bit different from what other people consider entry-level. Our forums, currently the 11th largest on the internet, run their database on a similar four-way Opteron machine. Regardless of application, the need for powerful, reliable servers is universal. In January, Sun sent us a V40z demonstration unit complete with four Opteron 850s and 8GB of PC2700. For the last several weeks, we spent some time getting to know the machine while it ran data mining exercises on our own Price Engine database.

With the recent introduction of AMD's Opteron/Athlon64 "E4" stepping, Sun has also introduced a newer version of the Sun Fire V40z based on four Opteron 852 processors and PC3200 memory. Availability for the four-way 2.6GHz Sun Fire V40z is a few weeks away, and in the interim, Sun reduced prices on the entire V40z lineup and also put a few rebates out. The machine that we reviewed has an MSRP of $20,995, but there are several rebates available through Sun.com right now that make the machine a bit more desirable (and affordable).



The Sun Fire V40z is fully supported under Windows Server 2003 and (of course) Solaris, but our primary focus in this initial analysis of the V40z is Linux. In particular, Red Hat 9 came preconfigured on our demo unit. SUSE Professional and Enterprise are also certified for the Sun Fire, but the beauty of Linux is that we can completely roll our own distribution with whichever components of SUSE and Red Hat we need for management or driver support.




Taking a Look Inside

As with most hardware at AnandTech, we needed to take a look inside to fully determine the capabilities of our V40z. Upon opening the system, we are greeted with a V40z quick reference diagram on the back of the removable panel.



With the exception of the removable top panel, the entire Sun Fire V40z can be disassembled completely without any tools. Locking mechanisms in strategic places ensure that power supplies, bezels and hard drives stay in place when they should, but at the same time, removing a critical component – even during operation – seems relatively easy. With the main bank of internal fans removed, we can see the rear processors under their giant 4” copper heat sinks.



We actually stumbled across a block diagram of the motherboard and daughterboard outlining most of the components that make up the Sun Fire V40z. We intend to go into detail about each of the components and its purpose.



Because of the 3U design of the Sun Fire, a motherboard and daughterboard design was needed to allow maintenance of the second set of Opteron processors. The processors on the daughterboard are referred to as the “forward” CPUs because they pull out of the front of our V40z. The “rear” processors are technically the CPU0 and CPU1 to our operating system and are located on the mainboard.




Detailing the Chipsets

The soul of the V40z runs on four Opteron 850 (2.4GHz, 1MB L2, 130nm) processors. The daughterboard obstructs the majority of the airflow to the rear of the system, so plastic rails partition air to each bank of memory and each processor. The cooler air from the hard drive bays is pulled over the daughterboard to cool the rear processors. You'll notice that there is no active cooling on this portion of the chassis; fans directly opposite the daughterboard (slightly above the mainboard) pull cool air from the outside of the case directly over these heat sinks.



The daughterboard connects to the mainboard via a proprietary Sun interface, but with the rails and guides holding the daughterboard, we had no problems determining if we had a clean connection between boards.



As with any Opteron system, each processor has a dedicated bank of memory; in our case, two Samsung 1GB PC2700 modules per processor. The older 130nm "CG" stepping on Opteron 8xx only allows for PC2700 memory in Sun's V40z, but the newer "E4" stepping now supports PC3200 as well. We will get more into Sun's 90nm "E4" stepping solution in just a bit. Each processor bank can utilize 8GB of memory (four DIMMs) in 64-bit operation, giving the V40z a total capacity of 32GB.
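
The capacity figures above come down to simple multiplication, sketched here in shell. The function name is ours, and the 2GB DIMM size is inferred from the stated 8GB across four DIMMs per processor rather than quoted from Sun.

```shell
# Total memory in GB: processors x DIMMs per processor x GB per DIMM.
# (Illustrative helper; the name is hypothetical.)
total_memory_gb() {
    echo $(( $1 * $2 * $3 ))
}

total_memory_gb 4 4 2    # prints 32 (maximum configuration)
total_memory_gb 4 2 1    # prints 8  (our review unit: two 1GB modules per processor)
```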



Behind the bank of DDR DIMMs in the image above, we can also see the individual 12V voltage regulator module (VRM). With larger processor configurations, delivering clean power to each processor becomes essential, and thus, each processor has a dedicated VRM. Below, you can see one of the Opteron 850s exposed from under the copper heat sink on the daughterboard.







Chipsets (cont'd)

We have mentioned this in the past, but Opteron differs significantly from Xeon in that HyperTransport links connect processor to processor and the memory controller is on the processor die. With each processor sharing a 6.4GB/s link with each of two other processors, and indirectly with their memory banks, a four-way Opteron configuration does not get bottlenecked on a shared memory controller or Northbridge.
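
The 6.4GB/s figure can be reproduced with some quick shell arithmetic. This is a rough sketch that assumes the commonly quoted link parameters for these Opterons (an 800MHz link clock, double data rate signaling, and a 16-bit path in each direction); none of those parameters are stated above, and the function name is ours.

```shell
# Aggregate HyperTransport link bandwidth in MB/s.
# Arguments (assumed values, not taken from Sun documentation):
#   $1 = link clock in MHz, $2 = link width in bits per direction
ht_link_mbs() {
    clock_mhz=$1
    width_bits=$2
    transfers=$(( clock_mhz * 2 ))                    # double data rate: two transfers per clock
    per_direction=$(( transfers * width_bits / 8 ))   # MB/s in one direction
    echo $(( per_direction * 2 ))                     # both directions are active at once
}

ht_link_mbs 800 16    # prints 6400, i.e. 6.4GB/s
```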

Most high end dual and quad Opteron solutions use two AMD 8131 PCI-X tunnels to control IO off the processors, but Sun goes a step further by daisy-chaining a third 8131 tunnel to the secondary tunnel (which is why the V40z can utilize four 64-bit PCI-X 133MHz slots and still have enough IO for the integrated controllers). The HyperTransport specification details that five devices can be within an HT chain, so having two 8131 PCI-X tunnels daisy-chained is clearly within spec. A brief block diagram of our V40z can be found below.



There is enough headroom on the primary PCI-X tunnel for two more 100MHz PCI-X channels, and a third and slowest channel runs in tandem with the gigabit Ethernet and SCSI controllers. Browsing through some online documentation revealed that the seventh, 66MHz PCI-X slot shares bus resources with the LSI and Broadcom controllers, which means inserting a 33MHz PCI expansion card in that slot will reduce the operating speed of the bus serving the SCSI and gigabit Ethernet controllers. Considering the 800MB/s headroom on that particular bus, it would seem like a poor choice to install a high bandwidth PCI device in the 66MHz interface anyway. Likewise, using a 66MHz and a 100MHz PCI-X device in tandem on the primary 8131 tunnel will result in both buses slowing to 66MHz.
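
The slowdown rule described above (a shared bus segment runs at the clock of its slowest device) can be sketched in a few lines of shell. The function names are ours, and the peak figures assume full 64-bit transfers at the nominal clock, so treat them as theoretical ceilings rather than anything Sun quotes.

```shell
# Effective clock (MHz) of a shared PCI/PCI-X bus segment:
# every device on the segment runs at the speed of the slowest one.
shared_bus_mhz() {
    slowest=$1
    for mhz in "$@"; do
        if [ "$mhz" -lt "$slowest" ]; then
            slowest=$mhz
        fi
    done
    echo "$slowest"
}

# Theoretical peak of a 64-bit (8 bytes per transfer) bus in MB/s.
# Assumes the nominal clock, so 133MHz gives 1064 rather than the usual quoted 1066.
bus_peak_mbs() {
    echo $(( $1 * 8 ))
}

shared_bus_mhz 100 66                       # prints 66
shared_bus_mhz 66 33                        # prints 33
bus_peak_mbs "$(shared_bus_mhz 133 133)"    # prints 1064
```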

AMD’s 8111 I/O Hub is linked off the primary 8131 PCI-X tunnel, and from there, most of the basic system functions are controlled including the XGI graphics adaptor. Even though XGI hasn’t been particularly popular on the desktop, Trident’s penetration into the server market solidified XGI’s server market share. The majority of the features on the AMD 8111 remain disabled, like AC’97 audio and the integrated 10/100 Ethernet controller.

Two Broadcom BCM703 chips control the external gigabit Ethernet for the server, but there are also two 10/100 out-of-band Ethernet ports that we will cover in more depth later. Winbond provides the rest of the basic functionality of the machine not handled by the AMD 8111 I/O hub. LSI’s 53C1020 Ultra320 SCSI adaptor provides the V40z with the onboard SCSI.



If the soul of our V40z is the Opteron 850, then the temporal lobe would be the Motorola MPC855T Service Processor. The MPC855T is a particularly interesting chip, so we saved an entire page to cover it in more detail.




System Management: Another Linux Success

Sun easily separates itself from whitebox manufacturers with its management capabilities. The fact that Sun chose an embedded Linux platform as the nerve center of its server only sweetens the pie.

The MPC855T PowerPC – or Service Processor (SP) as it’s more commonly called in this analysis – is in fact an entire embedded Linux computer of its own. As soon as one of the managed power supplies is plugged in, the SP kicks on and boots up. All management of the system is handled through this minicomputer: the serial console, front console, BIOS, fan speeds and even power draw. Even when the machine is off or has crashed, the SP allows us to check and manage the status of the system, remotely or locally. In a worst case scenario, the SP itself can be rebooted from a hard switch in the rear of the machine.

Fortunately, Sun provided us with another block diagram to explain the inner workings of the Service Processor.



Two 10/100 out-of-band Ethernet ports are routed via a dedicated three-port Ethernet switch solely to the Service Processor. This way, any management Ethernet can actually be daisy-chained to reduce the total number of cables in a rack. IPMI, SNMP or Sun Control Station can all route over this out-of-band (or in-band) network for server status and maintenance. The out-of-band network address of the Service Processor can actually be set via the console in the front of the server, or via DHCP (default). Of course, the traditional serial console is also available for those who need it.

Given the versatility, and since it’s always running, we can actually connect remotely to the SP via SSH and do things like update the BIOS, or perhaps just change some settings in it. This “Lights Out Management” approach is not a new concept, but Sun clearly has the most thorough implementation that we have touched yet.

The console on the front of the server acts as our basic portal into the Service Processor. From here, we can view the status of individual components like the fan and temperature. All of our commands on the console are routed to the SP, which then decides what to do with them; for example, when we tell the machine to turn on via the forward console, the service processor (which is already on) hands off the instruction to the managed power supply to enable.

Overall, we were incredibly impressed with the thoroughness of Sun’s Service Processor. Short of a forgotten BIOS password or a hardware replacement, there is little that cannot be handled through the SP to keep the system up. Considering that most of the tools used inside the SP environment are free and/or open sourced, clever administrators could very easily expand on the SP’s original functionality, which only adds to its desirability.



Storage and Power

Storage

As we mentioned earlier, the V40z’s six hot swappable Ultra320 storage bays utilize LSI’s 53C1020 SCSI controller to connect up to six LVD SCSI devices. The sixth expansion slot in the front of the server can be used for a floppy drive/DVD/CD drive combo, as illustrated in our configuration below.

During the original release of the V40z, the LSI 53C1020 did not support 292GB hard drives. In the most recent BIOS upgrades, the V40z fully supports these sizes, which gives the machine a storage capability of just over 1.7TB. All six of the SCSI devices have activity and fault LEDs routed to the front of the machine and via the SMBus to the Service Processor. Even in the event of a stale kernel, we can tell if a hard drive has gone faulty via one of the various remote connections to the SP. The hard drive states are also viewable via the front panel LCD console.
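
The 1.7TB figure is simple multiplication, shown here as a quick shell sketch. The function name is ours, and the second call assumes the sixth bay is given over to the floppy/DVD/CD combo drive, as in our configuration.

```shell
# Raw capacity in GB for a number of populated bays and a per-drive size in GB.
# (Illustrative helper; the name is hypothetical.)
raw_capacity_gb() {
    echo $(( $1 * $2 ))
}

raw_capacity_gb 6 292    # prints 1752 (just over 1.7TB with all six bays holding disks)
raw_capacity_gb 5 292    # prints 1460 (sixth bay used for the combo drive)
```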

Power

Power on our Sun V40z comes from two, redundant 760W power supplies – both hot swappable. A metal arm swings out from the back of each power supply, unlocking the unit for removal.



The entire power supply housing comes apart from the main chassis of the case via a small locking device that connects the PSU bay to the hard drive. Another proprietary Sun interface carries power from the housing to the motherboard without any wires. With the enormous concerns for air flow inside the chassis, the removal of as much wiring as possible is an absolute must.



Under the power supply, we have room for a seventh horizontal 66MHz PCI expansion slot that connects via a vertical riser. Given the condition that we mentioned on the chipset page, this PCI slot should probably go unused.

As we also mentioned earlier, both of these power supplies are directly managed by the Service Processor. As a result, when the system is plugged in, the Service Processor automatically boots its own operating system to oversee the functionality of the rest of the computer. This intelligent design allows us to view the exact details of power draw and operating temperature.




Thermals, Acoustics

Cooling four processors crammed into a 3U chassis is not an easy process – particularly considering the fact that the V40z does not utilize any active cooling directly on its CPU heat sinks. As we mentioned earlier, the two forward Opterons under the hard drive bay use two low profile copper heat sinks; the two rear processors use 4” high copper heat sinks with heatpipe risers. All air must be pulled from the front intake of the system (below and through the hard drive bays) all the way to the rear power supplies before it is exhausted.

The majority of cooling is provided by a bank of eight intelligent fans behind all four processors and another bank of four fans that sit between the forward and rear processors. In the image below, this bank is being removed from the system.



All of Sun’s cooling fans are modular. The 60mm brushless fans can be pulled out of the system and replaced without powering down the system; obviously a benefit if a fan dies. All of these fans are also accessible with the top panel removed, which means that we don’t have to pull a hard drive or power supply in order to replace a fan either.




Obviously, with twelve primary 60mm fans just providing the active cooling on the processors and memory, the Sun Fire V40z is not a quiet machine. Each redundant power supply also employs very loud fans, which gives the V40z a baseline operating noise level of 70 dBA even when the machine isn’t on. At a distance of twelve inches, we measured the Sun Fire V40z at a little over 85 dBA. This is loud even by rackmount standards, but in enterprise server configurations in dedicated server environments, this is certainly not a problem.

Even though the Sun Fire V40z is only 3U high, a standard 72-inch rack can only hold twelve V40z units due to their thermal density, according to Sun documentation. Any more than twelve servers in a 72” rack wouldn’t allow for enough airflow.
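
To put that limit in perspective, here is a back-of-the-envelope shell sketch comparing how many 3U chassis would physically fit against Sun's thermal limit of twelve. It assumes the full 72 inches are usable mounting space at the standard 1.75 inches per rack unit, which is our assumption rather than a figure from Sun's documentation; real racks lose some height to the frame. The function names are ours.

```shell
# Number of whole rack units in a given mounting height (inches), at 1.75" per U.
# Heights are scaled by 100 to stay within integer shell arithmetic.
rack_units() {
    echo $(( $1 * 100 / 175 ))
}

# How many chassis of a given height (in U) physically fit in that many units.
chassis_fit() {
    echo $(( $1 / $2 ))
}

units=$(rack_units 72)           # 41
fit=$(chassis_fit "$units" 3)    # 13
echo "physical fit: $fit, thermal limit: 12, rack units left empty: $(( units - 12 * 3 ))"
```

Under those assumptions, the thermal limit costs roughly one server per rack.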




The Test

Testing the Sun Fire V40z is not something for which we have an easy point of reference, since the server configurations that we have in our review portfolio are generally Windows based or in a dual configuration. Our quad processor database analysis from early last year goes into specific detail about database performance analysis, and Jason's Opteron 252 article from a week ago adds more depth to that data. Johan wrote a very thorough article detailing some of the differences between various database benchmarks, and we will be using some of his analysis procedure from the Sun Fire V20z benchmarks as well.

To give a baseline performance in our benchmarks, we took some data from our Sun W2100z analysis.


Test Configurations

                      Sun W2100z                   Sun Fire V40z
Processor(s):         (2) AMD Opteron 250          (4) AMD Opteron 850
RAM:                  4 x 1024MB PC-3200           8 x 1024MB PC-2700
Hard Drives:          SCSI u320 Seagate Cheetah    SCSI u320 Seagate Cheetah
                      10,000RPM                    10,000RPM
Memory Timings:       Default
Operating System(s):  SuSE 9.1 Professional        RedHat 9
                      RedHat 9
                      JDS 2.0
Kernel:               Linux 2.6.8                  Linux 2.4.21
                      Linux 2.4 (JDS 2.0)

Compiler:
linux:~ # gcc -v
Reading specs from /usr/local/lib/gcc/i686-pc-linux-gnu/3.4.2/specs
Configured with: ./configure
Thread model: posix
gcc version 3.4.2

Other than the two additional processors, differences between these two machines are very small. However, take notice that the V40z is running slower memory than our workstation baseline. As stated in the introduction of this analysis, the "E4" stepping on the Opteron 8xx lineup allows for PC-3200 in a four/eight-way configuration.

We will put the majority of our emphasis on database benchmarks for this analysis because a quad Opteron server with 8GB of memory is typically an ideal platform for a database. Our rendering benchmarks are also important, but our compilation benchmarks represent the best real-world analysis in our testing.

The core of our benchmarks today run on Red Hat 9 (kernel 2.4.21), which is NUMA aware. Anand wrote a little introduction to NUMA almost two years ago during the Opteron launch, which should illustrate the importance of NUMA in our particular configuration. Some of our benchmarks won't need more than a few hundred megabytes of data and it becomes much more efficient to copy all of this data into the memory of each processor bank. All of our tests run on x86_64 kernels and environments. With the exception of Mental Ray and Shake, all binaries are 64-bit as well.




Database Benchmarks

MySQL 4.0.20d

MySQL has been a staple of our Linux tests since its inception. Even though it does not carry high relevance for a workstation test, we still regard it as the de facto free, open sourced benchmark for Linux. Below, you can see our sysbench results on the 64-bit RedHat 2.4 kernel. We ran the sysbench 0.3.1 OLTP tests against tables of 1,000,000 and 10,000,000 records.

MySQL 4.0.20d: sysbench 10M

MySQL 4.0.20d: sysbench 1M





Apache Benchmarks

In a web server configuration, Apache immediately becomes the HTTP daemon of choice for anyone using Linux. Apache’s ApacheBench is a relatively synthetic benchmark that can give us some baseline performance ideas without straying too far into the realm of artificial. We ran both configurations under 10 and 100 concurrent threads to demonstrate the number of requests per second that the server can handle.

ApacheBench 2.0.46: 10 Threads

ApacheBench 2.0.46: 100 Threads

Obviously, these requests only reflect static HTML requests, which is useful for servers like AnandTech that run on cached pages.
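
For readers who want to try a comparable run, the following shell sketch loops ApacheBench over the two concurrency levels. The URL and request count are placeholders of ours, not the values behind the charts, and we are assuming that the "threads" in the chart titles map onto ab's -c (concurrent requests) option.

```shell
# Run ApacheBench at 10 and 100 concurrent requests and keep only the
# throughput line from each report.
#   $1 = URL of a static page (placeholder)
#   $2 = total requests per run (placeholder)
run_ab() {
    url=$1
    requests=$2
    for concurrency in 10 100; do
        echo "concurrency $concurrency:"
        ab -n "$requests" -c "$concurrency" "$url" | grep 'Requests per second'
    done
}

# Example (hypothetical target):
# run_ab http://localhost/index.html 100000
```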




Rendering Benchmarks

Although one would probably not purchase a $20,000 server for the explicit task of rendering, we thought to include our standard Mental Ray and Shake benchmarks in this review to demonstrate the scalability of all four processors in the system. Similarly, we have looked at how these benchmarks perform on other systems in the past, so we can get a real clear view of the power of our V40z in relation to those systems.

Mental Ray 3.3.3

Just to give a bit of PC trivia, MentalRay was actually first developed on Sun machines during its early stages, and it's interesting to see that we have come full circle to testing Mental Ray on an x86_64 machine designed by Sun. We are running the 32-bit binaries provided by Alias Wavefront. You may be interested to see how some single CPU setups perform on the same test render here. Once again, we are running the same Maya benchmark file found in our other reviews. We ran Mental Ray via Maya using the command below:

# maya_render_with_mr -file Benchmark_Mental.mb

MentalRay 3.3

Clearly, something does not seem right here as we do not see the benchmark utilizing all four processors. We were able to trace this flaw down to the software itself and not the server, but we thought it was worth mentioning in the benchmarks section.

Shake 3.5c

Apple develops a great digital effects package called Shake. We took the opportunity to run a newer benchmark script by Lindsay Adams, which you can download here. The benchmark script renders 10 frames under various effects using one or multiple CPUs. We sum the render times and display them below. The times recorded are the averages of three runs.

The command run for this benchmark is:

# shake -exec hardware_test_v01.shk -vv

Shake 3.5c





Compiling

We put a particular emphasis on compiling not only because it stresses the entire system (hard drive, processor and memory), but also because any *nix user knows that compiling is no fun on a slow machine.

GNU Make 3.79.1 / GCC 3.4.2

While GCC isn't multithreaded, we can run multiple jobs using the -j option in make. Below, you can see the significant improvement in performance going from 1 to 3 to 5 jobs. We used the commands below to compile the Linux 2.6.4 kernel from kernel.org:

# yes "" | make config
# time make -jX
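
A loop along the following lines automates the job-count sweep. It is only a sketch: the function name is ours, it assumes it is run from the top of an already configured kernel tree, and it times each build in whole seconds with date rather than with time.

```shell
# Time a build at each of the given make job counts.
# Run from the top of an already configured source tree.
time_builds() {
    for jobs in "$@"; do
        make clean > /dev/null            # start every run from a clean tree
        start=$(date +%s)
        make -j"$jobs" > /dev/null
        end=$(date +%s)
        echo "make -j$jobs: $(( end - start )) seconds"
    done
}

# Example: time_builds 1 3 5
```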


Kernel 2.6.4; make -jX

We also threw in some compile tests of the entire GCC code base, which takes significantly longer than the Linux kernel to compile.

GCC 3.4.2: make -jX

Results were on par with what we had expected. Since we are not physically removing the processor in these benchmarks, the V40z scales much better than the dual W2100z. Three of our four processors are not doing anything particularly important, but all 8GB of memory are available to the single processor running the make threads.




Final Thoughts

Sun obviously gave us a lot of data to digest here. We took a look at a piece of hardware that truly has few competitors; HP’s ProLiant DL585 seems to be the only remotely comparable Tier 1 solution – and not surprisingly, pricing out a setup similar to the one that we tested today from Sun came to well over $22,000. Second Tier competitors like ASA and Appro are able to provide solutions based on similar specifications, but even those readily approach $20,000 without half the management or PCI-X options. Furthermore, Sun provides the smallest implementation of any of these quad Opteron servers in a 3U form; the ProLiant DL585 comes in 4U form only. There are many more small differences between each server, but we took the time to illustrate the design wins and flaws of just the Sun Fire V40z in this analysis; HP and Appro will have to wait for another day.

Sun has a speed demon on its hands, and it knows it. Sun was very quick to announce the next generation V40z (4 x Opteron 852, 8GB PC-3200) that set more than half a dozen performance records at LinuxWorld last week. With only a single server running on four of the 130nm Opterons in this review, it’s difficult for us to judge Sun’s performance on the market as a whole. However, the enthusiastic approach to Linux coupled with high quality design and management already assures that Sun has won the battle for most buyers, without even raising a finger for benchmarks. In the world of High Power, High Availability computing, stability and features go much further than a 1% boost in performance.

As far as stability goes, we know that the Sun Fire V40z is certainly best of breed. Between the Motorola Service Processor, dedicated out-of-band management network, redundant 760W power supplies and hot swappable active cooling, it becomes real hard for us to determine a single point of failure that could cripple a server. The seven featured PCI-X expansion slots are also a great addition to the feature portfolio of the V40z, even if Sun (and we) recommend that the seventh PCI adaptor goes unused.

Things are just starting to get really interesting at Sun, and at AMD. Sun’s Galaxy 8-way Opteron servers will soon be upon us, but in the meantime, we are already hearing about V40z configurations with dual core Opterons. Obviously, a dual core V40z – which is already dual core ready – will give Sun the only 3U, 8-way Opteron that we’ve heard of. Between dual core Opterons and continual improvements on the 90nm Opteron steppings, server administrators have a lot to look forward to this year.
