Original Link: http://www.anandtech.com/show/1622
Sun Fire V40z: Four Opterons in a 3U
by Kristopher Kubicki on February 22, 2005 12:05 AM EST
Introduction
Several weeks ago, we took a look at Sun's first attempt at an Opteron workstation, the W2100z. Today, we follow up on that piece of hardware with the acclaimed Sun Fire V40z, a 4-way Opteron 850 entry server. The Sun Fire V40z is the first four-way Opteron lineup in Sun's portfolio; the two-way variation of the V40z is dubbed the V20z.
The V40z is an entry-level server geared for everything from data mining to CAE to database work. Granted, "entry-level" for Sun might be a bit different than what other people consider entry-level. Our forums database, which serves the 11th largest forum on the internet, currently runs on a similar four-way Opteron machine. Regardless of application, the need for powerful, reliable servers is universal. In January, Sun sent us a V40z demonstration unit complete with four Opteron 850s and 8GB of PC2700. For the last several weeks, we spent some time getting to know the machine while it ran data mining exercises on our own Price Engine database.
With the recent introduction of AMD's Opteron/Athlon64 "E4" stepping, Sun has also introduced a newer version of the Sun Fire V40z based on four Opteron 852 processors and PC3200 memory. Availability for the four-way 2.6GHz Sun Fire V40z is a few weeks away, and in the interim, Sun reduced prices on the entire V40z lineup and also put a few rebates out. The machine that we reviewed has an MSRP of $20,995, but there are several rebates available through Sun.com right now that make the machine a bit more desirable (and affordable).
Taking a Look Inside
As with most hardware at AnandTech, we needed to take a look inside to fully determine the capabilities of our V40z. Upon opening the system, we are greeted with a quick reference diagram of the V40z on the back of the removable panel.
Detailing the Chipsets
The soul of the V40z runs on four Opteron 850 (2.4GHz, 1MB L2, 130nm) processors. The daughterboard obstructs the majority of the airflow to the rear of the system, so plastic rails partition air to each bank of memory and each processor. The cooler air from the hard drive bays is pulled over the daughterboard to cool the rear processors. You'll notice that there is no active cooling on this portion of the chassis; fans directly opposite the daughterboard (slightly above the mainboard) pull cool air from the outside of the case directly over these heat sinks.
"E4" stepping now supports PC3200 as well. We will get more into Sun's 90nm "E4" stepping solution in just a bit. Each processor bank can utilize 8GB of memory (four DIMMs) in 64-bit operation, giving the V40z a total capacity of 32GB.
Chipsets (cont'd)
We have mentioned this in the past, but Opteron differs significantly from Xeon because HyperTransport links connect processor to processor and the memory controller sits on the processor die. With each processor sharing a 6.4GB/s link to two other processors, and indirectly their memory banks, a four-way Opteron configuration does not get bottlenecked on a memory controller or Northbridge.
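To make the "link to two other processors" arrangement concrete, here is a minimal sketch that models the four Opterons as a square and counts HyperTransport hops between them. The CPU numbering and the exact wiring are assumptions drawn from the description above, not from Sun's documentation.

```python
from collections import deque

# Assumed topology: four Opterons in a square, each linked to two neighbours.
# The CPU numbering is illustrative only.
HT_LINKS = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}


def hops_from(source, links):
    """Breadth-first search: HyperTransport hops from `source` to every CPU."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cpu = queue.popleft()
        for neighbour in links[cpu]:
            if neighbour not in dist:
                dist[neighbour] = dist[cpu] + 1
                queue.append(neighbour)
    return dist


for cpu in HT_LINKS:
    print(cpu, hops_from(cpu, HT_LINKS))
```

Under this assumption, a processor reaches its own memory bank in zero hops, its two neighbours' banks in one hop and the opposite processor's bank in two hops, which is why memory placement matters on a machine like this.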
Most high end dual and quad Opteron solutions use two AMD 8131 PCI-X tunnels to control IO off the processors, but Sun goes a step further by daisy-chaining a third 8131 tunnel to the secondary tunnel (which is why the V40z can utilize four 64-bit PCI-X 133MHz slots and still have enough IO for the integrated controllers). The HyperTransport specification details that five devices can be within an HT chain, so having two 8131 PCI-X tunnels daisy-chained is clearly within spec. A brief block diagram of our V40z can be found below.
AMD’s 8111 I/O Hub is linked off the primary 8131 PCI-X tunnel, and from there, most of the basic system functions are controlled including the XGI graphics adaptor. Even though XGI hasn’t been particularly popular on the desktop, Trident’s penetration into the server market solidified XGI’s server market share. The majority of the features on the AMD 8111 remain disabled, like AC’97 audio and the integrated 10/100 Ethernet controller.
Two Broadcom BCM703 chips control the external gigabit Ethernet for the server, but there are also two 10/100 out-of-band Ethernet ports that we will cover in more depth later. Winbond provides the rest of the basic functionality of the machine not handled by the AMD 8111 I/O hub. LSI’s 53C1020 Ultra320 SCSI adaptor provides the V40z with the onboard SCSI.
System Management: Another Linux Success
Sun easily separates itself from whitebox manufacturers with its management capabilities. The fact that Sun chose an embedded Linux platform as the nerve center of its server only sweetens the deal.
The MPC855T PowerPC – or Service Processor (SP) as it’s more commonly called in this analysis – is in fact an entire embedded Linux computer of its own. As soon as one of the managed power supplies is plugged in, the SP kicks on and boots up. All management of the system is handled through this minicomputer: the serial console, front console, BIOS, fan speeds and even power draw. Even when the machine is off or has crashed, the SP allows us to manage the system, remotely or locally. In a worst case scenario, the SP can actually be rebooted from a hard switch in the rear of the machine.
Fortunately, Sun provided us with another block diagram to explain the inner workings of the Service Processor.
Given the versatility, and since it’s always running, we can actually connect remotely to the SP via SSH and do things like update the BIOS, or perhaps just change some settings in it. This “Lights Out Management” approach is not a new concept, but Sun clearly has the most thorough implementation that we have touched to date.
The console on the front of the server acts as our basic portal into the Service Processor. From here, we can view the status of individual components like the fans and temperatures. All of our commands on the console are routed to the SP, which then decides what to do with them; for example, when we tell the machine to turn on via the front console, the service processor (which is already on) hands off the instruction to the managed power supply.
Overall, we were incredibly impressed with the thoroughness of Sun’s Service Processor. Short of a forgotten BIOS password or a hardware replacement, nearly everything can be handled through the SP while the system stays up. Considering that most of the tools used inside the SP environment are free and/or open sourced, its desirability only grows, as clever administrators could very easily expand on the SP’s original functionality.
Storage and Power
Storage
As we mentioned earlier, the V40z’s six hot swappable Ultra320 storage bays utilize LSI’s 53C1020 SCSI controller to connect up to six LVD SCSI devices. The sixth expansion slot in the front of the server can be used for a floppy drive/DVD/CD drive combo, as illustrated in our configuration below.
At the original release of the V40z, the LSI 53C1020 did not support 292GB hard drives. With the most recent BIOS upgrades, the V40z fully supports these sizes, which gives the machine a storage capability of just over 1.7TB. All six of the SCSI devices have activity and fault LEDs routed to the front of the machine and via the SMBus to the Service Processor. Even in the event of a stalled kernel, we can tell if a hard drive has gone faulty via one of the various remote connections to the SP. The hard drive states are also viewable via the front panel LCD console.
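The 1.7TB figure is simple multiplication. As a quick check, assuming all six bays hold 292GB drives and that the total is quoted in decimal units (1TB = 1000GB):

```python
# Storage capacity check using the figures quoted in the text.
DRIVE_BAYS = 6        # hot swappable Ultra320 bays
GB_PER_DRIVE = 292    # largest supported drive size per the article

total_gb = DRIVE_BAYS * GB_PER_DRIVE

# Assumption: decimal units, i.e. 1TB = 1000GB.
total_tb = total_gb / 1000
print(f"{total_gb}GB = {total_tb:.2f}TB")
```

That works out to 1752GB, in line with the "just over 1.7TB" quoted above. A configuration that gives the sixth bay to the floppy/DVD/CD combo would top out at five drives instead.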
Power
Power on our Sun V40z comes from two redundant 760W power supplies – both hot swappable. A metal arm swings out from the back of each power supply, unlocking the unit for removal.
As we also mentioned earlier, both of these power supplies are directly managed by the Service Processor. As a result, when the system is plugged in, the Service Processor automatically boots its own operating system to oversee the functionality of the rest of the computer. This intelligent design allows us to view the exact details of power draw and operating temperature.
Thermals, Acoustics
Cooling four processors crammed into a 3U is not an easy process – particularly considering the fact that the V40z does not utilize any active cooling directly on its CPU heat sinks. As we mentioned earlier, the two forward Opterons under the hard drive bay use two low profile copper heat sinks; the two rear processors use 4” high copper heat sinks with heatpipe risers. All air must be pulled from the front intake of the system (below and through the hard drive bays) all the way to the rear power supplies before it is exhausted.
The majority of cooling is provided by a bank of eight intelligent fans behind all four processors and another bank of four fans that sit between the forward and rear processors. In the image below, this bank is being removed from the system.
Obviously, with twelve primary 60mm fans just providing the active cooling on the processors and memory, the Sun Fire V40z is not a quiet machine. Each redundant power supply also employs very loud fans, which gives the V40z a baseline operating noise level of 70 dBA even when the machine isn’t on. At a distance of twelve inches, we measured the Sun Fire V40z at a little over 85 dBA. This is loud even by rackmount standards, but for enterprise servers in dedicated server environments, this is certainly not a problem.
Even though the Sun Fire V40z is only 3U high, a standard 72-inch rack can only hold twelve V40z servers due to their thermal density, according to Sun documentation. Any more than twelve servers in a 72” rack wouldn’t allow for enough airflow.
The Test
The Sun Fire V40z is not something that we can easily compare against earlier results, since the server configurations that we have in our review portfolio are generally Windows based or in a dual configuration. Our quad processor database analysis from early last year goes into specific detail about database performance analysis, and Jason's Opteron 252 article from a week ago adds more depth to that data. Johan wrote a very thorough article detailing some of the differences between various database benchmarks, and we will be using some of his analysis procedure from the Sun Fire V20z benchmarks as well.
To give a baseline performance in our benchmarks, we took some data from our Sun W2100z analysis.
|Machine:||Sun W2100z||Sun Fire V40z|
|Processor(s):||(2) AMD Opteron 250||(4) AMD Opteron 850|
|RAM:||4 x 1024MB PC-3200||8 x 1024MB PC-2700|
|Hard Drives:||SCSI u320 Seagate Cheetah 10,000RPM||SCSI u320 Seagate Cheetah 10,000RPM|
|Operating System(s):||SuSE 9.1 Professional; Linux 2.4 (JDS 2.0)|
|Compiler:||linux:~ # gcc -v; Reading specs from /usr/local/lib/gcc/i686-pc-linux-gnu/3.4.2/specs; Configured with: ./configure; Thread model: posix; gcc version 3.4.2|
We will put the majority of our emphasis on database benchmarks for this analysis because a quad Opteron server with 8GB of memory is typically an ideal platform for a database. Our rendering benchmarks are also important, but our compilation benchmarks represent the best real-world analysis in our testing.
The core of our benchmarks today runs on Red Hat 9 (kernel 2.4.21), which is NUMA aware. Anand wrote a little introduction to NUMA almost two years ago during the Opteron launch, which should illustrate the importance of NUMA in our particular configuration. Some of our benchmarks won't need more than a few hundred megabytes of data, and it becomes much more efficient to copy all of this data into the memory of each processor bank. All of our tests run on x86_64 kernels and environments. With the exception of Mental Ray and Shake, all binaries are 64-bit as well.
MySQL 4.0.20d
MySQL has been a staple of our Linux tests since its inception. Even though it does not carry high relevance for a workstation test, we still regard it as the de facto free, open sourced benchmark for Linux. Below, you can see our sysbench results on the 64-bit Red Hat 2.4 kernel. We ran the sysbench 0.3.1 OLTP tests against tables of 1,000,000 and 10,000,000 records.
Apache Benchmarks
In a web server configuration, Apache immediately becomes the HTTP daemon of choice for anyone using Linux. Apache’s ApacheBench is a relatively synthetic benchmark that can give us some baseline performance ideas without straying too far into the realm of the artificial. We ran both configurations under 10 and 100 concurrent threads to demonstrate the number of requests per second that the server can handle.
Obviously, these requests only reflect static HTML requests, which is useful for servers like AnandTech that run on cached pages.
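For readers who want to try a comparable run, the sketch below builds ApacheBench command lines for the two concurrency levels mentioned above. The `-n` (total requests) and `-c` (concurrency) options are standard `ab` flags; the target URL and the request count are placeholders of our own, since the exact values used in this review are not given.

```python
import shlex

# Hypothetical values: the article states only the concurrency levels.
TARGET_URL = "http://localhost/index.html"
TOTAL_REQUESTS = 10000


def ab_command(concurrency, requests=TOTAL_REQUESTS, url=TARGET_URL):
    """Build an ApacheBench argv: -n is total requests, -c is concurrency."""
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


# Print one command line per concurrency level used in the review.
for level in (10, 100):
    print(shlex.join(ab_command(level)))
```

Each printed line can be pasted into a shell on a machine with ApacheBench installed; the "Requests per second" line of its report is the metric discussed here.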
Rendering Benchmarks
Although one would probably not purchase a $20,000 server for the explicit task of rendering, we thought to include our standard Mental Ray and Shake benchmarks in this review to demonstrate the scalability of all four processors in the system. Similarly, we have looked at how these benchmarks perform on other systems in the past, so we can get a real clear view of the power of our V40z in relation to those systems.
Mental Ray 3.3.3
Just to give a bit of PC trivia, Mental Ray was actually first developed on Sun machines during its early stages, and it's interesting to see that we have come full circle to testing Mental Ray on an x86_64 machine designed by Sun. We are running the 32-bit binaries provided by Alias Wavefront. You may be interested to see how some single CPU setups perform on the same test render here. Once again, we are running the same Maya benchmark file found in our other reviews. We ran Mental Ray via Maya using the command below:
# maya_render_with_mr -file Benchmark_Mental.mb
Clearly, something does not seem right here as we do not see the benchmark utilizing all four processors. We were able to trace this flaw down to the software itself and not the server, but we thought it was worth mentioning in the benchmarks section.
Shake 3.5c
Apple develops a great digital effects package called Shake. We took the opportunity to run a newer benchmark script by Lindsay Adams, which you can download here. The benchmark script renders 10 frames under various effects using one or multiple CPUs. We sum the render times and display them below. The times recorded are the averages of three runs.
The command run for this benchmark is:
# shake -exec hardware_test_v01.shk -vv
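The scoring described above (sum the frame times of a run, then average over three runs) can be sketched as follows. The function name and the timings are placeholders of our own, not measurements from this review.

```python
from statistics import mean


def shake_score(runs):
    """Sum the per-frame render times of each run, then average the sums."""
    return mean(sum(frame_times) for frame_times in runs)


# Placeholder timings in seconds, NOT measurements from the review:
# three runs of ten frames each.
placeholder_runs = [[1.0] * 10, [1.5] * 10, [2.0] * 10]
print(shake_score(placeholder_runs))
```

Averaging the per-run totals, rather than individual frames, keeps one slow frame from being counted more than once.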
Compiling
We put a particular emphasis on compiling because it stresses the entire system (hard drive, processor and memory), but also because any *nix user knows that compiling is no fun on a slow machine.
GNU Make 3.79.1 / GCC 3.4.2
While GCC isn't multithreaded, we can run multiple jobs using the -j option in make. Below, you can see the significant improvement in performance going from 1 to 3 to 5 jobs. We used the commands below to compile the Linux 2.6.4 kernel from kernel.org:
# yes "" | make config
# time make -jX
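A small harness for repeating the timed build at each job count might look like the sketch below. The helper names are hypothetical, and the `make clean` between runs is our assumption about how to keep the runs comparable; the review does not say how the tree was reset.

```python
import subprocess
import time

JOB_COUNTS = (1, 3, 5)  # the job counts named in the text


def make_command(jobs):
    """Return the argv for a parallel build, e.g. ['make', '-j3']."""
    return ["make", f"-j{jobs}"]


def time_build(jobs, source_dir):
    """Clean, then build in source_dir; return wall-clock seconds for the build."""
    # Assumption: start each run from a clean tree so timings are comparable.
    subprocess.run(["make", "clean"], cwd=source_dir, check=True)
    start = time.perf_counter()
    subprocess.run(make_command(jobs), cwd=source_dir, check=True)
    return time.perf_counter() - start


# Show the commands that would be timed. Call time_build(jobs, path) on an
# already configured kernel tree to actually run them.
for jobs in JOB_COUNTS:
    print(" ".join(make_command(jobs)))
```

Only the build itself is inside the timed region, which mirrors wrapping `make -jX` alone in `time`.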
We also threw in some compile tests of the entire GCC code base, which takes significantly longer than the Linux kernel to compile.
Results were on par with what we had expected. Since we are not physically removing the processor in these benchmarks, the V40z scales much better than the dual W2100z. Three of our four processors are not doing anything particularly important, but all 8GB of memory are available to the single processor running the make threads.
Final Thoughts
Sun obviously gave us a lot of data to digest here. We took a look at a piece of hardware that truly has few competitors; HP’s ProLiant DL585 seems to be the only remotely Tier 1 alternative – and not surprisingly, pricing out a setup similar to the one that we tested today from Sun came to well over $22,000. Second Tier competitors like ASA and Appro are able to provide solutions based on similar specifications, but even those readily approach $20,000 without half the management or PCI-X options. Furthermore, Sun provides the smallest implementation of any of these quad Opteron servers in a 3U form; the ProLiant DL585 comes in 4U form only. There are many more small differences between each server, but we took the time to illustrate the design wins and flaws of just the Sun Fire V40z in this analysis; HP and Appro will have to wait for another day.
Sun has a speed demon on their hands, and they know it. Sun was very quick to announce the next generation V40z (4 x Opteron 852, 8GB PC-3200) that set more than half a dozen performance records at LinuxWorld last week. With only a single server running on four of the 130nm Opterons in this review, it’s difficult for us to judge Sun’s performance on the market as a whole. However, the enthusiastic approach to Linux coupled with high quality design and management already assures that Sun has won the battle in the eyes of most, without even raising a finger for benchmarks. In the world of High Power, High Availability computing, stability and features go much further than a 1% boost in performance.
As far as stability goes, we know that the Sun Fire V40z is certainly best of breed. Between the Motorola Service Processor, dedicated out-of-band management network, redundant 760W power supplies and hot swappable active cooling, it becomes real hard for us to find a single point of failure that could cripple the server. The seven featured PCI-X expansion slots are also a great addition to the feature portfolio of the V40z, even if Sun (and we) recommend that the seventh PCI adaptor go unused.
Things are just starting to get really interesting at Sun, and at AMD. Sun’s Galaxy 8-way Opteron servers will soon be upon us, but in the meantime, we are already hearing about V40z configurations with dual core Opterons. Obviously, a dual core V40z – which is already dual core ready – will give Sun the only 3U, 8-way Opteron that we’ve heard of. Between dual core Opterons and continual improvements on the 90nm Opteron steppings, server administrators have a lot to look forward to this year.