Motherboard

A number of vendors compete in the dual processor workstation motherboard market. At the time of the build, LGA 2011 Xeons had already been introduced, and we decided to focus on boards supporting those processors. Since we wanted to devote one physical disk and one network interface to each VM, it was essential that the board have enough PCIe slots for multiple quad-port server NICs as well as enough native SATA ports. For our build, we chose the Asus Z9PE-D8 WS motherboard with an SSI EEB form factor.

Based on the C602 chipset, this dual LGA 2011 motherboard supports 8 DIMMs and has 7 PCIe 3.0 slots. The lanes can be organized as (2 x16 + 1 x16 + 1 x8) or (4 x8 + 1 x16 + 1 x8), and all the slots are physically 16 lanes wide. The Intel C602 chipset provides two SATA 6 Gbps ports and eight SATA 3 Gbps ports, while a Marvell 9230 PCIe controller adds four more 6 Gbps ports, for a total of 14 SATA ports. This allows us to devote two ports to the host OS of the workstation and one port to each of the twelve planned VMs. The Z9PE-D8 WS also has two GbE ports based on the Intel 82574L. Two Gigabit LAN controllers are not going to be sufficient for all our VMs; we will address this issue further down in the build.
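As a quick illustration of the port-counting exercise above, the short Python sketch below (our illustration, not part of the original build scripts) uses the psutil library to check whether a host exposes enough physical disks and active network interfaces for a one-disk, one-NIC-per-VM layout. The twelve-VM target and the two host-reserved SATA ports mirror the numbers in this build; everything else is an assumption for demonstration purposes.

import psutil

PLANNED_VMS = 12   # twelve benchmark VMs, as planned in this build
HOST_DISKS = 2     # two SATA ports reserved for the host OS

# disk_io_counters(perdisk=True) is keyed by physical disk
# (e.g. 'PhysicalDrive0' on Windows, 'sda' on Linux).
physical_disks = list(psutil.disk_io_counters(perdisk=True))

# net_if_stats() lists every interface the OS sees, including the
# ports added by plug-in quad-port server adapters.
nics = [name for name, stats in psutil.net_if_stats().items() if stats.isup]

print(f"Physical disks visible : {len(physical_disks)}")
print(f"Network interfaces up  : {len(nics)}")

if len(physical_disks) >= PLANNED_VMS + HOST_DISKS:
    print("Enough disks for one per VM plus the host OS.")
if len(nics) >= PLANNED_VMS:
    print("Enough NICs to dedicate one port to each VM.")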

The motherboard also has four USB 3.0 ports, courtesy of an ASMedia USB 3.0 controller. The Marvell SATA - PCIe bridge and the ASMedia USB 3.0 controller hang off the eight PCIe 2.0 lanes in the C602, while all the PCIe 3.0 lanes come from the processors. Asus also provides support for SSD caching on the motherboard (where any installed SSD can be used as a cache for frequently accessed data, without any size limitations), and the Z9PE-D8 WS includes a Realtek ALC898 HD audio codec, but neither of these features is relevant to our build.

CPUs

One of the main goals of the build was to ensure low power consumption. At the same time, we wanted to run twelve VMs simultaneously. To ensure smooth operation, each VM needs at least one vCPU allocated exclusively to it. The Xeon E5-2600 family (Sandy Bridge-EP) has CPUs with core counts ranging from 2 to 8 and TDPs from 60 W to 150 W, with each core supporting two threads. Keeping in mind the number of VMs we wanted to run, we specifically looked at the 6 and 8 core variants, as two of those processors would give us 12 or 16 cores respectively. Within these, we restricted ourselves to the low power variants: the hexa-core E5-2630L (60 W TDP) and the octa-core E5-2648L / E5-2650L (70 W TDP).

CPU decisions for machines meant to run VMs usually have to be made after taking the requirements of the workload into consideration. In our case, the workload for each VM involved IOMeter and Intel NASPT (more on these in the software infrastructure section). Both of these tools tend to be I/O-bound rather than CPU-bound, and can run reliably even on Pentium 4 processors. Therefore, the per-core performance of the three processors was not a factor we were worried about.

Out of the three processors, we decided to go ahead with the hexa-core Xeon E5-2630L. The cores run at 2 GHz, but can Turbo up to 2.5 GHz when just one core is active. Each core has a 256 KB L2 cache, and all cores share a 15 MB L3 cache. With a TDP of just 60 W, it enabled us to focus on energy efficiency, and two Xeon E5-2630Ls (120 W TDP in total) allowed us to proceed with our plan to run 12 VMs concurrently.
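For readers who want to sanity-check the "one exclusive vCPU per VM" rule of thumb on their own hardware, here is a minimal, hypothetical Python check using psutil. It simply compares the physical core and hardware thread counts against the planned VM count (12 in our case); the core counts in the comments reflect the two E5-2630Ls used here.

import psutil

PLANNED_VMS = 12

physical_cores = psutil.cpu_count(logical=False)  # 12 across two E5-2630Ls
logical_cpus = psutil.cpu_count(logical=True)     # 24 with Hyper-Threading

print(f"Physical cores: {physical_cores}, logical CPUs: {logical_cpus}")

if physical_cores and physical_cores >= PLANNED_VMS:
    print("Each VM can be pinned to its own physical core.")
elif logical_cpus and logical_cpus >= PLANNED_VMS:
    print("Each VM can at least be pinned to its own hardware thread.")
else:
    print("Not enough cores/threads for exclusive per-VM allocation.")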

Coolers

The choice of coolers for the processors is dictated by the chassis used for the build. At the start of the build, we decided to go with a tower desktop configuration. Asus recommended the Dynatron R17 for use with the Z9PE-D8 WS, and we went ahead with their suggestion.

The R17 coolers are meant for LGA 2011 sockets in 3U and larger rackmount form factors as well as tower desktop and workstation builds. They consist of aluminium fins with four copper heat pipes, and thermal compound comes pre-applied on the base. Installation of the R17s was quite straightforward, but care had to be taken to ensure that the side meant to mount the cooler’s fans didn’t face the DIMM slots on the Z9PE-D8 WS.

The fans on the R17 operate between 1000 and 2500 rpm, and consume between 0.96 W and 3 W across that range. Noise levels are respectable, ranging from 17 dBA to 32 dBA. The R17 is rated to cool CPUs with TDPs of up to 160 W; the 60 W E5-2630Ls were kept between 45°C and 55°C by the Dynatron R17s even under our full workloads.
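As a rough example of how CPU temperatures could be logged during such a load run, the sketch below periodically samples the hottest core reading via psutil. This only works on a Linux host (psutil's sensors_temperatures() is not available on Windows), and the sensor label matching is an assumption that varies by platform, so treat it as a starting point rather than a drop-in monitor.

import time
import psutil

def hottest_core_temp():
    # sensors_temperatures() returns {driver_name: [readings]}; on Linux the
    # CPU readings typically come from the 'coretemp' driver.
    temps = psutil.sensors_temperatures()
    readings = [entry.current
                for name, entries in temps.items()
                for entry in entries
                if "core" in (entry.label or name).lower()]
    return max(readings) if readings else None

# Sample every 30 seconds for roughly 5 minutes while the workload runs.
for _ in range(10):
    reading = hottest_core_temp()
    if reading is not None:
        print(f"Hottest CPU core: {reading:.1f} °C")
    time.sleep(30)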

Comments

  • mfed3 - Wednesday, September 5, 2012 - link

    someone didn't read the title of the article or the article itself. the purpose is to set up a testbed, not build a system with this software target in mind.
  • Zink - Wednesday, September 5, 2012 - link

At the same time, this system seems extremely over the top for the uses mentioned. It seems likely that the same tests could be run with much less hardware. I know the testbed as specced can be used for much more than testing NAS performance, but the only use discussed is simulating the network utilization of an SMB environment.
The SSDs are justified because a single HDD was "not suitable" for 12 VMs, but it seems there are intermediate solutions such as RAIDing two 512GB SSDs that would provide buckets of performance and a cleaner solution than 14 individual disks. I also do not understand how having a physical CPU core per VM is needed to “ensure smooth operation” if network benchmarking software is I/O bound and runs fine on a Pentium 4. Assuming you really do need 64GB of RAM for shared files and Windows VMs, then it seems a 1P 2011 board would be more than up to running these benchmarks. Switch to Linux VMs for Dynamo and you could try running the benches from an even lighter system such as an i7-3770.
On the network side, would it not also be possible to virtualize the physical LAN? The clients could connect together over the internal network and the host OS on the testbed perform the switch’s role and stress the NAS over a single aggregated link? For testing NAS performance specifically, what would the effect be of removing the VMs entirely and just running multiple Iometer sessions over a single aggregated link or letting Iometer use the multiple NICs from the host OS?
    NAS benchmarking would be an interesting application to try to optimize a system for. A simpler system would help you out with reducing power consumption, increasing reliability and reducing cost. You could run some experiments by changing the system configuration and benching again to see if the same NAS performance can be generated. Figuring out what other kinds of systems generate the same results would also make it possible for other editors to bench NAS units without having to purchase 14 SSDs.
    Sorry for complaining about the system configuration, I know you built it to test other hardware and not as a project in itself but I find the testbed more interesting than the NAS performance.
  • ganeshts - Thursday, September 6, 2012 - link

    Zink, Thanks for your comment. Let me try to address your concerns one-by-one, starting with the premise that the current set of tests are not the only ones we propose to run in the testbed. That premise accounts for devoting a single physical core to each VM.

    As for the single disk for each VM vs. RAIDed SSDs, that was one of the ideas we considered. However, we decided to isolate the VMs from each other as much as possible. In fact, if you re-check the build, the DRAM is the only 'hardware component' that is shared.

    We didn't go with 'virtualizing the physical LAN' because that puts an upper limit on the number of clients which can be set up for benchmarking purposes (dependent on the host resources). In the current case, using an external switch and one physical LAN port for each VM more accurately represents real world usage. Also, in case we want to increase the number of clients, it is a simple matter of connecting more physical machines to the switch.

    Multiple IOMeter sessions: As far as we could test out / understand, IOMeter doesn't allow multiple simultaneous sessions on a given machine. One can create multiple workers, but synchronizing across them is a much more difficult job than synchronizing the dynamo processes across multiple machines. I am also not sure if the workers on one machine can operate through different network interfaces.

    As noted by another reader, 12 VMs haven't been able to max out the N4800 from Thecus. The next time around, we will probably go with the RAIDing 512 GB SSD option for storage of the VM disks. Physical NICs are probably going to remain (along with one physical CPU core or, probably, thread, for each VM).
  • bobbozzo - Thursday, September 6, 2012 - link

    Hi Ganesh,

    Could you post power consumption for the server with the CPUs loaded (with Prime95 or whatever)?

    I'm thinking of building something like this for a webserver.

    Thanks!
  • ganeshts - Friday, September 7, 2012 - link

    Power consumption with Prime95 set for maximum power consumption was 202 W with all CPU cores 100% loaded. Note that the BIOS has a TDP limit of 70W before throttling the cores down.

    However, I noticed that RAM usage in that particular scenario was only 4 GB in total out of the 64 GB available. It is possible that higher DRAM activity might result in more power usage.
  • Stahn Aileron - Wednesday, September 5, 2012 - link

    Just out of curiosity, when you run with multiple clients accessing the NAS, are they all running the (exact?) same type of workload? Or is each VM/client set to use a slightly (if not entirely) different workload?

    I'm curious since, from a home network PoV, I can see multiple access coming from say:

    -One (or more) client(s) streaming a movie (or maybe music)
    -Another (or several) doing copy (reads) from the NAS
    -Others doing writes to the NAS
    -Maybe even one client (I can't really imagine more) doing a torrent (I don't like the idea of a client using a mounted shared network device as the primary drive for torrenting, but you never know. Also, some NASes feature built-in torrent functionality as a feature.)

    I'm just wondering how much the workload from each client differs from one another, if at all, when conducting your tests/benchmarks.

    Also, for the NASes that do RAID, will you be testing how array degradation and/or rebuilding impacts client usage benchmarks?
  • ganeshts - Wednesday, September 5, 2012 - link

    Stahn,

    Thanks for your feedback. This is exactly the kind of input I am looking for from our readers.

    As for your primary question, in our benchmark case, all the VMs are running the same type of workload at a given time. The type of workload is given in the title of each graph.

    It should be possible to set up an IOMeter benchmark ICF file with the type of multiple workloads that you are mentioning. I will try to frame one and get it processed for the next NAS review.

    Ref. array degradation / rebuild process : Right now, we present results indicating the time taken to rebuild the array when there is no access to the NAS. I will set up a NASPT run while a rebuild is in progress to get a feel for how the rebuild process affects NAS performance.
  • Stahn Aileron - Thursday, September 6, 2012 - link

    Glad to be of some help. To be honest, benchmarking and running tests (troubleshooting) is something I used to do in the Navy as an Avionics Technician. I actually do kind of miss it (especially being a tech geek.) Reminiscing aside...

    Back on-topic: what I described in my previous post was more of a home user scenario. Is there anything else you would also need/want to consider in a more work-oriented "dissimilar multi-client workload" benchmark/test? If this was a SOHO environment, I would add the following to my previous post:

    -DB access (not sure how you want to distribute the read/write workload, though I suppose leaning heavier to reads).

    I mention this now because my previous post for read/writes was more along the lines of sequential instead of random. I would guess DB access would be more random-ish in nature.

    For other work-oriented scenarios in a "dissimilar multi-client workload" benchmark, I'm not sure what else could be added. I'm mainly just a power-user. I dunno if people would really use a NAS for, say, an Exchange Server's storage or maybe a locally-hosted website. (Some NASes come with Web service functions and features, no?)

    I'm just throwing out ideas for consideration. I don't expect you to implement everything and anything since you don't have the time to do that. Time is your most precious resource during testing and benchmarking, after all.

    Thank you all for running a wonderful website and to Ganesh for a quick reply.

    Oh, one last thing: does disk fragmentation matter in regards to NASes? Would it affect NAS performance? Do any NASes defrag themselves?

    This is more of a long-term issue, so you can't really test it readily, I'm guessing. (Unless you happen to have a fragmented dataset you could clone to the NAS somehow...) I haven't heard much about disk fragmentation since the advent of SSDs in the consumer space. That, and higher performance HDDs. This is mainly just a curiosity for me. (I do have a more personal reason for my interest, but it's a long story...)
  • insz - Wednesday, September 5, 2012 - link

    Interesting article. Would it be possible to add some pics of the final setup? It'd be interesting to see what the testbed would look like assembled and wired up.
  • ganeshts - Friday, September 7, 2012 - link

    I didn't add the pics to the article because the setup wasn't 'photogenic' after final assembly and placement in my work area :) (as the album below shows). Doesn't matter, I will just link it in this comments section

    2012 AnandTech SMB / SOHO NAS Testbed : http://imgur.com/a/h4bQR

    Individual images:

    http://i.imgur.com/hjD9qh.jpg

    http://i.imgur.com/PJ91Vh.jpg

    http://i.imgur.com/2BcEfh.jpg

    http://i.imgur.com/dvmbrh.jpg
