Original Link: http://www.anandtech.com/show/6068/lrdimms-rdimms-supermicros-latest-twin
LRDIMMs, RDIMMs, and Supermicro's Latest Twin
by Johan De Gelas on August 3, 2012 4:45 AM EST
Supermicro's 2U Twin
We already introduced Supermicro's 2U Twin server (6027TR-D71FRF) in our Xeon E5 review. It is basically a two-node 2U server that offers the density of 1U servers without the disadvantages. Instead of four redundant PSUs you only need two, and instead of noisy, power-hungry, and failure-prone 40mm fans you get slower-turning 80mm fans.
The two servers are held in place using screwless clips.
There was one big disadvantage: there were only four DIMM slots per CPU, which limits each node to 128GB of RAM (8 x 16GB). That is a bit on the low side for 16 cores and 32 threads and makes this server less suitable for virtualization loads.
Of course, this server was never meant to be a virtualization server, as it is equipped with 56Gb/s FDR InfiniBand interconnect technology, great for processing-intensive cluster applications such as HPC workloads. Nevertheless, we were intrigued. Supermicro has recently released a new Twin, the 6027TR-D70RF+, which has 16 DIMM slots per node.
Most 2U servers are limited to 24 memory slots and as a result 384GB of RAM. With two nodes in a 2U server and 16 slots per node, you can cram up to 512GB of RDIMMs into one server. The Supermicro Twin node (6027TR-D70RF+) looks like an attractive alternative to the more common 1U and 2U servers:
- Many more PCIe expansion slots (three, two of them full height) than a 1U, almost as many as a traditional 2U
- Lower energy consumption as two (up to 95% efficient) PSUs are powering two nodes
- Much better and more efficient 80mm cooling fans than a 1U
- Density of a 1U
- 33% more DIMM slots than a 2U
That all sounds great for any cluster solution, including a virtualization cluster, but there is more. If you use LRDIMMs, you can double your capacity: LRDIMMs at 1333MHz are available as quad rank 32GB DIMMs. But before we introduce you to these DIMMs, we want to take a step back and look at all the RAM options that a typical server buyer has.
An Overview of Server DIMM types
Typically, desktop and mobile systems use Unbuffered DIMMs (UDIMMs). The memory controller inside your CPU addresses each memory chip of your UDIMM individually and in parallel. However, each memory chip places a certain amount of capacitance on the memory channel and thus weakens the high frequency signals through that channel. As a result, the channel can only take a limited number of memory chips.
This is hardly an issue in the desktop world. Most people will be perfectly happy with 16GB (4x4GB) and run them at 1.6 to 2.133GHz while overvolting the DDR3 to 1.65V. It's only if you want to use 8GB DIMMs (at 1.5V) that you start to see the limitations: most boards will only allow you to install two of them, one per channel. Install four of them in a dual channel board and you will probably be limited to 1333MHz. But currently very few people will see any benefit from using 32GB of slow DDR3 instead of 16GB of fast DDR3 (and you'd need Windows 7 Professional or Ultimate to use more than 16GB).
In the server world, vendors tend to be a lot more conservative. Running DIMMs at an out-of-spec 1.65V will shorten their life and drive the energy consumption a lot higher. Higher power consumption for 2-3% more performance is simply insane in a rack full of power hogging servers.
Memory validation is a very costly process, another good reason why server vendors like to play it safe. You can use UDIMMs (with ECC most of the time, unlike desktop DIMMs) in servers, but they are limited to lower capacities and clockspeeds. For example, Dell's best UDIMM is a 1333MHz 4GB DIMM, and you can only place two of them per channel (2 DPC = 2 DIMMs Per Channel). That means that a single Xeon E5 cannot address more than 32GB of RAM when using UDIMMs. In the current HP servers (Generation 8), you can get 8GB UDIMMs, which doubles the UDIMM capacity to 64GB per CPU.
In short, UDIMMs are the cheapest server DIMMs, but you sacrifice a lot of memory capacity and a bit of performance.
RDIMMs (Registered DIMMs) are a much better option for your server in most cases. The best RDIMMs today are 16GB modules running at 1600MHz (800MHz DDR clock). With RDIMMs, you can get up to three times more capacity: 4 channels x 3 DPC x 16GB = 192GB per CPU. The disadvantage is that at 3 DPC the clockspeed throttles back to 1066MHz.
If you want top speed, you have to limit yourself to 2 DPC (and four ranks per channel). With 2 DPC, the RDIMMs will run at 1600MHz, and each CPU can then address up to 128GB (4 channels x 2 DPC x 16GB). That is still twice as much as with UDIMMs, while running at a 20% higher clockspeed.
RDIMMs add a register, which buffers the address and command signals. The integrated memory controller in the CPU sees the register instead of addressing the memory chips directly. As a result, the number of ranks per channel is typically higher: the current Xeon E5 systems support up to eight ranks of RDIMMs per channel. That is four dual rank DIMMs per channel (but you only have three DIMM slots per channel) or two quad rank DIMMs per channel. If you combine quad ranks with the largest memory chips, you get the largest DIMM capacities. For example, a quad rank DIMM with 4Gb chips is a 32GB DIMM (4 Gbit x 16 chips x 4 ranks). So in that case we can get up to 256GB: 4 channels x 2 DPC x 32GB. Not all servers support quad ranks though.
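The rank arithmetic above is easy to sanity-check. A quick sketch (assuming non-ECC capacity counting and x4 chips, so 16 data chips per 64-bit rank):

```python
def dimm_capacity_gb(chip_gbit, chips_per_rank, ranks):
    """DIMM capacity = chip density x chips per rank x ranks, converted Gbit -> GB."""
    return chip_gbit * chips_per_rank * ranks / 8

# A quad rank DIMM built from 4Gbit x4 chips (64-bit bus / 4 bits per chip = 16 chips per rank):
print(dimm_capacity_gb(4, 16, 4))   # 32.0 (GB)

# Maximum RDIMM capacity per Xeon E5 CPU (4 memory channels):
channels = 4
print(channels * 3 * 16)   # 3 DPC with 16GB dual rank RDIMMs: 192 (GB)
print(channels * 2 * 32)   # 2 DPC with 32GB quad rank RDIMMs: 256 (GB)
```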
LRDIMMs can do even better. Load Reduced DIMMs replace the register with an Isolation Memory Buffer (iMB™ by Inphi). The iMB buffers the command, address, and data signals, isolating all electrical loading (including the data signals) of the memory chips on the (LR)DIMM from the host memory controller. Again, the host controller sees only the iMB and not the individual memory chips. As a result you can fill all DIMM slots with quad rank DIMMs. In practice this means you get 50% to 100% more memory capacity.
There is a fourth option: Netlist's HCDIMM. Netlist, a company that specializes in making VLP (Very Low Profile) memory, offers an alternative for LRDIMMs: HyperCloud DIMMs.
Instead of using a centralized buffer (LRDIMMs), HCDIMMs use a distributed buffer to reduce the electrical load on the memory channels. Combined with an HCDIMM register, four ranks are presented as two.
The advantage of HCDIMMs is that they run one speed bump faster than LRDIMMs. So while LRDIMMs have to throttle back to 1066MHz at 3 DPC, HCDIMMs run at 1333MHz. According to a Netlist sponsored report, HCDIMMs offer about 17% higher bandwidth, which sounds reasonable to us. Secondly, the distributed buffer architecture is the same architecture that DDR4 is converging on. However, DDR4 will run at a much lower 1.2V.
The combination of a register and distributed buffers (and clock redundancy) comes with a serious drawback: power. Although we could not test HCDIMMs ourselves, most industry sources talk about 20% higher power draw than LRDIMMs. LRDIMMs themselves only consume more than RDIMMs at 1 DPC; in 3 DPC configurations, LRDIMMs consume about the same as RDIMMs.
Secondly, HCDIMMs are not a JEDEC standard. As a result the HCDIMM ecosystem—module, server and CPU vendors—is smaller. AMD and Intel do not officially validate HCDIMMs, so server vendors have to do the complete validation effort themselves. IBM, for example, only offers them in one system (IBM System x3650 M4), and 16GB HCDIMMs are quite a bit more expensive than both 16GB LRDIMMs and RDIMMs. LRDIMMs are much more widespread; almost every server vendor supports them in a wide range of server models.
Lastly, HCDIMMs are only available from one module vendor, Netlist.
Nevertheless, HCDIMMs are a viable but somewhat expensive alternative in the high capacity memory market if your server supports them. Netlist has been quite successful in convincing the server vendors: several models from HP, Supermicro, and Gigabyte support HCDIMMs. HP offers HCDIMMs in its most popular servers (DL380/DL360), although they can only be installed by HP.
Once Netlist gets the 32GB parts out in large volumes, the extra competition can probably drive prices down. Until then, HCDIMMs offer only a speed advantage over RDIMMs.
Let us structure all this info in a table.
DIMM Types, Speed, and Capacity Limitations

| DIMM type | UDIMM | RDIMM | RDIMM LV | LRDIMM | HCDIMM |
|---|---|---|---|---|---|
| Maximum speed at 2 DPC | 1333MHz | 1600MHz | 1333MHz | 1333MHz | 1600MHz |
| Maximum speed at 3 DPC | Not possible | 1333MHz | 1333MHz | 1066MHz | 1333MHz |
| Maximum capacity per CPU (quad channel) | 64GB | 192GB (3 DPC), 256GB (2 DPC)* | 192GB (3 DPC), 256GB (2 DPC)* | | |
| Top speed at maximum capacity | 1066MHz | 1066MHz (DR)**, 800MHz (QR)** | 1066MHz (DR)**, 800MHz (QR)** | | |
| Power per DIMM (loaded, 3 DPC, at 1.5V) | 4 W | 4.5 W | <= 4 W | 5-6 W | 8-9 W |
| Intel CPU support | | | | | |
| AMD CPU support | | | | | |
| Server support | all servers | all servers | all servers | Dell, HP, IBM, Supermicro | HP DL360 G8/DL380 G8, IBM x3650 M4 |
* Quad Rank DIMMs require special BIOS support and validation and are not available on all servers.
** DR = Dual Rank, QR = Quad Rank.
1600MHz LRDIMMs are possible but not commercially available as far as we know. 32GB HCDIMMs are available in very small quantities. LRDIMMs are available from almost every server module manufacturer out there, while HCDIMMs are Netlist modules only.
All tests were done on the Supermicro SuperServer 6027TR-D71FRF.
| Supermicro SuperServer 6027TR-D71FRF (2U Chassis) | |
|---|---|
| CPU | 2x Intel Xeon E5-2660 (2.2GHz, 8 cores, 20MB L3, 95W) |
| RAM | 64GB (8x 8GB) 1600MHz RDIMM Samsung M393B1K70DH0-CK0 or 128GB (8x 16GB) 1333MHz LRDIMM Samsung M386B2K70DM0-YH90 |
| BIOS version | R 1.1 |
| PSU | PWS-1K28P-SQ 1280W (80 Plus Platinum) |
We used only one PSU for the energy measurements.
We used VMware ESXi 5.0 (vSphere 5 Enterprise Plus) for the virtualization tests, Windows 2008 R2 SP1 (64 bit) for the Windows AIDA latency test, and Ubuntu 12.04 for the Stream bandwidth test.
Two Intel i350 Gigabit NICs were load balanced with ESXi's NIC Teaming.
The storage consisted of one Adaptec RAID 5805 with two Raidsets. The first RAIDset consisted of three Micron Crucial P300 100GB 6Gbps MTFDDAC100SAL-1N1AA in RAID 5. The LUN on top was made for all Zimbra VMs, which create quite a bit of disk I/O.
The second RAIDset consisted of three Western Digital WD1000FYPS 1TB drives in RAID 5. The LUN on top of these disks was used for the other VMs, which only generate intensive disk I/O at the beginning of the test. During that time, the performance measurements are not used for our final results.
The Purpose of this Test
We will not claim that we had the optimal testing configuration for our objectives. The most interesting LRDIMMs are the 32GB ones, and we had the cheaper (and thus easier to borrow) 16GB parts. The most interesting Supermicro server is the 6027TR-D70RF+; we had the slightly older but very similar 6027TR-D71FRF. As always, we try to make the best of what we have in the lab. We believe that with this testing configuration we can still answer the questions that will pop up when you consider the different server configurations, such as:
- How much bandwidth and latency will you sacrifice when buying LRDIMMs instead of RDIMMs?
- Does investing in expensive high capacity DIMMs pay off?
- Can you really host twice as many VMs in twice as much memory?
- How much performance do you gain by giving your VMs more physical memory?
The last two questions seem silly at first sight, but they are not. In most cases, virtual machines get (much) more memory than they really need. Most administrators prefer to give a VM quite a bit of memory headroom. That is good practice, but it means that it is not necessarily a bad thing if the total amount of virtual memory is quite a bit higher than the total available physical memory. Unless all VMs are working hard, a modern hypervisor can make sure that the "needy" VMs get what they need and the "lazy" ones get only the bare minimum of physical RAM.
Indeed, an advanced hypervisor such as ESXi has a lot of tricks up its sleeve to make sure that even if you don't have enough physical memory, your VMs will still run fine. Physical memory use is optimized by:
- Transparent page sharing (TPS): the Hypervisor will only claim one page for several pages of several different VMs with identical content (e.g. the Windows kernel and HAL in several Windows based VMs)
- Ballooning: the Hypervisor reclaims memory that a VM does not use and gives it to more "needy" VMs
- Memory compression: pages that need to be swapped out (see below) are checked for a high compression ratio. If they compress well, they are compressed and kept in a memory cache, which is much faster to access than a swap file on disk.
- Hypervisor swapping: memory that is not active and not compressible can be swapped to disk, similar to what a "normal" OS does. This does not necessarily result in a large performance hit, as the pages swapped out are rarely used.
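The first technique, transparent page sharing, is easy to model: the hypervisor hashes page contents and backs identical pages with a single physical copy. Below is a toy Python sketch of the idea (our own simplification, not VMware's implementation, which also does a full byte-by-byte compare after a hash match before sharing a page):

```python
import hashlib

class PageSharingPool:
    """Toy model of transparent page sharing: identical pages from
    different VMs end up backed by one physical copy."""
    def __init__(self):
        self.shared = {}          # content hash -> one physical page
        self.physical_pages = 0   # pages actually allocated

    def map_page(self, content: bytes):
        key = hashlib.sha256(content).digest()
        if key not in self.shared:
            self.shared[key] = content   # allocate one physical page
            self.physical_pages += 1
        return key                       # every VM with this content shares it

pool = PageSharingPool()
kernel_page = b"\x90" * 4096             # the same kernel page in three Windows VMs
for vm in range(3):
    pool.map_page(kernel_page)
pool.map_page(b"\x01" * 4096)            # one unique page

print(pool.physical_pages)  # 2: three identical pages collapsed into one
```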
So we thought it would be interesting to design a scenario where we could measure the performance differences between a system with lots of memory and a more budget limited one.
Low Level Measurements
Before we start with the rather complex virtualization benchmarks, it is good to perform a few low level benchmarks. First, we measured the bandwidth in Linux with our Pathscale compiled Stream binary on the latest Ubuntu Linux. For more details about our Stream binary, check here.
To get a better understanding, we tested the 8GB RDIMMs both at their rated speed (1600MHz) and configured (via the BIOS) as 1333MHz DIMMs. Comparing the 1333MHz RDIMM and LRDIMM results allows us to measure the impact of the iMB buffer. That impact is small but measurable: RDIMMs offer about 5% more bandwidth than LRDIMMs at the same speed. 1600MHz RDIMMs offer 14% higher bandwidth than LRDIMMs.
Of course, bandwidth only matters when you run out of it. Latency always matters, although a 15MB (up to 20MB) L3 cache can hide a lot of memory latency. We tested memory latency with AIDA64.
The iMB adds about 11% latency if we disable turbo and compare the LRDIMM and RDIMM at the same clockspeed (1333MHz). Interestingly, the latency of the RDIMM at 1600MHz is higher: the memory chips are accessed with significantly higher CAS and RAS-to-CAS latencies at 1600MHz, which explains this counterintuitive result. Once we enable turbo, the latency differences get very small. The iMB causes only 2% extra latency, which is negligible.
The conclusion so far: the iMB decreases bandwidth by a measurable but small amount, while the latency impact is hardly measurable.
vApus Mark Mixed Performance
You might remember that we have two virtualization benchmarks, vApus Mark II and vApus Mark FOS. The first one is a virtualization benchmark with only Windows (mostly 2008) based VMs and third party proprietary applications; the second one has only Linux based VMs and contains strictly open source workloads so that third parties can verify our results if necessary.
You won't be surprised that, as always, both benchmarks have their limitations. Most enterprises run a mix of Windows and Linux VMs, and our benchmarks are not network and memory intensive enough. Remember that we developed them to test the newest multi-core processors from AMD and Intel: the two vendors tend to send us their CPUs with the highest core counts and the highest clock speeds within an acceptable power budget. And as 16GB or even 8GB DIMMs were way too expensive at the time, we tried to keep the memory footprint rather modest, as we would otherwise quickly run out of memory.
Enter vApus Mark Mixed beta. We designed our third virtualization benchmark to be ultra realistic:
- A mix of Windows and Linux VMs
- High memory use
- Heavy I/O with lots of network traffic (>1 Gbit/s) and disk I/O
A vApus Mark Mixed beta tile consists of:
- A LAMP virtual machine (Linux, Apache, MySQL, PHP) with four vCPUs and 4GB RAM
- A Microsoft SQL Server 2008 VM on Windows 2008 with eight vCPUs and 8GB of RAM
- A Zimbra VM with two vCPUs and 8GB of RAM
- A SPECjbb VM with eight vCPUs and 16GB of RAM
So each tile needs 22 vCPUs and 36GB of RAM. If we test with three tiles, we would theoretically need 108GB. Note that this is theoretical. For example, we run four instances of SPECjbb and each instance gets 4GB of RAM, but although those instances try to allocate 4GB of RAM each, this does not mean that they effectively use 16GB actively. The same is true for the other VMs.
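The tile arithmetic is simple enough to sanity-check:

```python
# Resource footprint of one vApus Mark Mixed beta tile: VM -> (vCPUs, GB RAM)
tile_vms = {
    "LAMP":      (4, 4),
    "MSSQL2008": (8, 8),
    "Zimbra":    (2, 8),
    "SPECjbb":   (8, 16),
}

vcpus = sum(c for c, _ in tile_vms.values())
ram   = sum(r for _, r in tile_vms.values())
print(vcpus, ram)   # 22 vCPUs, 36 GB per tile

for tiles in (2, 3):
    # 2 tiles: 44 vCPUs, 72GB allocated; 3 tiles: 66 vCPUs, 108GB allocated
    print(tiles, tiles * vcpus, tiles * ram)
```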
Virtualized Cluster Testing
To fully understand the impact of adding more RAM, we test with both two tiles (72GB allocated) and three tiles (108GB). At two tiles, there will be some RAM left in the 64GB setup. ESXi allocates very little memory for the file system caches ("cached" in Windows) and gives priority to the active memory pages, the pages that are actually used by the most important application inside the VM.
To keep the benchmark results easily readable, we standardized all performance to the performance of the system configured with the LRDIMMs (128GB) and two tiles. So that system always scores 100 (%). All other performance numbers are relative to that.
First we check the total throughput. This is the geometric average of the standardized (a percentage relative to our LRDIMM system) throughput of each VM.
The 15% higher bandwidth and slightly better latency of the RDIMMs at 1600MHz allow the RDIMM configured server to outperform the one with LRDIMMs by 6% when running two tiles. However, once we place four more VMs (three tiles) on top of the machine, things start to change. At that point, ESXi had to create balloon memory (10GB) and swap memory (4GB) on the 64GB RDIMM machine. So it is not like we went far beyond the 64GB of active memory; ESXi still managed to keep everything running.
The 128GB LRDIMM server is about 7% faster running three tiles, instead of 6% slower at two tiles. That is not spectacular by any means, but it is interesting to delve a little deeper to understand what is happening. To do so, we check out the average latency, which is calculated as follows: the average response time of each VM is divided by the response time of the same VM on the LRDIMM system (two tiles), and we then calculate the geometric mean of all those percentages.
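That normalization can be expressed compactly. A Python sketch (the response times below are made-up illustrations, not our measurements):

```python
import math

def geomean(xs):
    """Geometric mean, computed in log space for numerical stability."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def relative_latency(response_times, reference_times):
    """Average latency score: each VM's response time divided by the same
    VM's response time on the reference (LRDIMM, two tiles) system,
    then reduced with a geometric mean and expressed as a percentage."""
    ratios = [t / ref for t, ref in zip(response_times, reference_times)]
    return geomean(ratios) * 100

# Hypothetical per-VM response times (ms) for the four VMs of one tile:
reference = [120, 80, 200, 150]   # LRDIMM system, two tiles (the 100% baseline)
measured  = [150, 96, 260, 180]   # system under test
print(round(relative_latency(measured, reference), 1))
```

The geometric mean is the right choice here because it treats a VM that is twice as slow and one that is twice as fast as cancelling out, regardless of their absolute response times.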
As soon as we add more VMs, both systems suffer from higher response times. This shows that we made our test quite CPU intensive. The main culprit is SPECjbb, the VM that we added to save some time, as developing a virtualization test is always a long and complex undertaking. But trying to save time unfortunately reduced the realism of our test: SPECjbb runs the CPU at close to 100%, and as a result the CPU becomes the bottleneck at three tiles. In the final "vApus Mark Mixed" we will replace the SPECjbb test.
We decided to show you this data now, while we develop a completely real world benchmark. We are currently evaluating a real world Ruby on Rails website and a Drupal based site. Please feel free to share your opinion on virtualization benchmarking if you are active in this field. Until then, can the extra RAM capacity help even if your applications are CPU intensive? 17% better response times might not be impressive, but there is more to it.
Investigating VM performance
The performance per VM can tell us much more. Even if our test is too CPU oriented, investing in more memory clearly pays off. ESXi is very efficient in allocating memory: there was, for example, 3GB of RAM ("Memory Shared Common") saved by Transparent Page Sharing.
Nevertheless, the pressure on the memory system caused the SQL Server VM to crawl: throughput plunged to one third. As a result, the Zimbra VM benefited from the lower CPU usage of MS SQL Server 2008 and its performance increased. This is another sign that our test is a bit too CPU oriented. But even this corner case scenario tells us a lot, especially if we check out the latency numbers.
The latency impact was even worse. The MSSQL server response time was on average four times higher. Even more interesting is our LAMP site. Throughput remained the same, but response times were 3.5 times higher than with two tiles. The more spacious LRDIMM Supermicro server saw the response time double in the worst case.
Next stop is of course power and energy. First we check out idle power. Idle power is measured with one node and one power supply active.
For a system equipped with an LSI SAS 6G and a Mellanox ConnectX-3 FDR Infiniband controller, 168W per node is a very decent power level for our Supermicro 6027TR-D71FRF server. The HP DL380 G7, the most efficient server of the previous generation, needed 158W with 15 4GB DIMMs but with a 460W PSU.
The premium for the iMB buffer is about 1.5W per DIMM, which is a bit higher than expected (1W). Of course we measure at the wall, so some voltage regulation and PSU overhead is added.
As each power line in the datacenter has only a limited number of amps available, it is good to check power under heavy load.
The LRDIMM equipped server needs 19W more, or 2.4W per DIMM. Again, this is a bit higher than expected. To get the complete picture, we measured the total energy consumed during one hour of testing. All configurations got the same workload, but the LRDIMM equipped servers finished the jobs more quickly.
The result is that the LRDIMM equipped server consumes less with three tiles, while the 64GB RDIMM server consumes less with two tiles. Better performance results in lower energy use thanks to the advanced core sleep states of the Xeon E5 "Sandy Bridge EP".
Value for Money: LRDIMMs
Superior response times and slightly better energy consumption do not matter if the price ain't right. Last year, a server filled with 32GB LRDIMMs cost about the same as a decent sports car. That makes LRDIMMs attractive only to those who buy servers to run software with huge licensing fees.
What is the situation anno 2012? We decided to scan the market and jotted down an overview of the prices that we could find. We checked out both the OEMs and the rest of the market. For example, the IBM X3750M4 can make use of 16GB LRDIMMs and RDIMMS. The HP DL380 servers support 32GB LRDIMMs, 16GB HCDIMMs, and 16GB RDIMMs.
DIMM price comparison

| DIMM type | Product number | Remark | Price |
|---|---|---|---|
| 1333MHz 32GB RDIMM | Dell | In Dell's R720 | $2000 |
| 1333MHz 32GB LRDIMM | Dell | In Dell's R720 | $3800 |
| 1333MHz 32GB LRDIMM | HP 647885-B21 | HP memory | $2004 |
| 1333MHz 32GB LRDIMM | Samsung M386B4G70BM0-YH9 | various websites | $1100-1250 |
| 1333MHz 32GB RDIMM | Samsung M393B4G70AM0-CF8Q5 | various websites | $1400-1800 |
| 1600MHz 16GB RDIMM | HP 672631-B21 | HP memory | $455 |
| 1333MHz 16GB HCDIMM | HP 678279-B21 | HP exclusive option | $760 |
| 1333MHz 16GB RDIMM | IBM 49Y1562 | In IBM X3750M4 server | $435 |
| 1333MHz 16GB LRDIMM | IBM 49Y1566 | In IBM X3750M4 server | $324 |
| 1333MHz 16GB RDIMM LV | Samsung M393B2G70BH0-CH9 | Certified on Supermicro's Twin | $342 |
| 1333MHz 16GB LRDIMM LV | Samsung M386B2K70DM0-YH90 | Tested LR DIMM | $467 |
| 1333MHz 16GB LRDIMM LV | Crucial CT16G3ELSLQ41339 | Crucial LR DIMM | $277 |
| 1333MHz 8GB RDIMM | Samsung M393B1K70BM1-CH9 | various websites | $124 |
LRDIMMs are very well adopted in the market; HP, Dell and IBM offer them in various servers. However, the way the technology is adopted depends very much on the vendor. IBM offers 16GB LR DIMMs and 16GB RDIMMs, while Dell and HP only offer 32GB LRDIMMs.
Dell demands a huge premium for LRDIMMs: $3800 instead of $2000. In addition, the R720 can take no more than eight 32GB RDIMMs; you can only populate all 12 slots if you use 32GB LRDIMMs. The end result is that you get only 50% more capacity for twice the investment, which is not really an attractive proposition. For some reason, server vendors like to slow down progress instead of supporting it. Another example is the ridiculous premium that server vendors demand for relatively slow and outdated SSDs in their servers.
16GB LRDIMMs are competitively priced, but they are not really interesting in most servers, the IBM X3750M4 excepted. In all other servers, you can get faster 1600MHz dual rank RDIMMs, which offer better performance at a lower power level: 16GB dual rank RDIMMs use half as many, but twice as dense, memory chips as the quad rank 16GB LRDIMMs. 16GB HCDIMMs seem to be priced even higher and probably offer a much worse performance per watt ratio.
As far as we could see, HP does not offer any 32GB quad rank RDIMMs at this point in time, so HP positions the 32GB LRDIMM as the one and only high capacity option. The premium is significant but much more reasonable than Dell's: about $2000 per DIMM. Nevertheless, this makes LRDIMMs interesting only for those with huge software and hardware budgets.
Luckily we have Supermicro to put some price pressure on the tier one OEMs. Several resellers of Supermicro servers offer 32GB LRDIMMs for prices as "low" as $1100. Considering that typical 32GB quad ranked RDIMM parts cost at least 25% more and run at only 1066MHz, it looks like LRDIMMs really are the only attractive option for high capacity servers.
Most of you will probably go for the cheaper 8GB and 16GB parts. Notice however that 8GB parts start to lose their appeal. With prices hovering around $250-$400 and most servers being more memory limited than CPU limited, 16GB RDIMMs are a better option.
Total cost of acquisition is only one part of the overall cost story. If you are memory capacity limited, would it be interesting to invest in LRDIMMs? After all, running two LRDIMM equipped servers instead of four RDIMM servers is more energy efficient, and you need to invest less in servers. We did a quick TCO study, but before we can do that we should decide on which server platform to choose. How does the latest Supermicro compare with other similar solutions?
Value for Money: the Supermicro Server
It is pretty hard to compare the Supermicro Twin with other servers on the market. Firstly, HP and Dell do not fully disclose the pricing of their dense rack servers. Secondly, as each server vendor has its own chassis and form factors, it is very hard to find a close match.
By puzzling together the available information from the HP (US) and several Supermicro (US based) resellers, we were able to make a rather rough comparison. Take it with a grain of salt as both servers are not completely targeted at the same market, but they are similar and the targeted markets do overlap.
Server price comparison

| Component | Model | Remark | Price |
|---|---|---|---|
| Chassis | HP S6500 | Empty chassis | $2549 : 2 = $1274* |
| Node 1 | HP ProLiant SL250s Gen8 2U | 2x Xeon 2665, 8GB | $5659 |
| Node 2 | HP ProLiant SL250s Gen8 2U | 2x Xeon 2665, 8GB | $5659 |
| Total base system (HP) | | | $13593 |
| Chassis | Supermicro 6027TR-D71RF+ | Chassis with motherboards | $2500 |
| Extra | 4x Xeon 2665, 2x 8GB | To compare with HP | $6000 |
| Total base system (Supermicro) | | | $8500 |

* The chassis is 4U, with room for four 2U servers.
HP designed a 4U server in which you can fit four half width 2U rack servers. We divided the chassis cost by two, to get an idea of how two HP SL-servers compare with the Supermicro 2U Twin.
As always, such a proprietary "blade-ish" design comes with a premium. The result is that the HP server is about 50% more expensive. The next thing you have to do is fill the server with DIMMs.
DIMM price comparison

| DIMM type | Speed | Product number | Price |
|---|---|---|---|
| 8GB RDIMM | 1600MHz | HP 647899-B21 | $191 |
| 8GB RDIMM LV | 1333MHz LV | HP 647897-B21 | $146 |
| 16GB RDIMM | 1600MHz | HP 672631-B21 | $405 |
| 16GB RDIMM LV | 1333MHz LV | HP 647901-B21 | $386 |
| 32GB LRDIMM LV | 1333MHz LV | HP 647885-B21 | $2000 |
| 32GB LRDIMM LV | 1333MHz LV | Samsung | $1150 |
HP's pricing is pretty decent until we go up in DIMM capacity. The 8GB premium is around $40 per DIMM, which is relatively small in a $10000 server: at most you will be paying about $640 extra if you buy 16 DIMMs. Once you get to 16GB DIMMs, you are paying $115 more per DIMM, or a price premium of 40%. At $400 per DIMM, the DIMM investment starts to eat up a large share of the server budget.
Although the comparison is far from perfect, the conclusion is pretty straightforward. If hardware costs play an important role in your IT budget, Supermicro offers much more for your dollar. Two fully populated (16x16GB) Supermicro Twin servers will cost about $26k; a quad 4U HP S6500 server will run about $40k.
Of course, if the deal is part of a larger IT project involving consultancy and software costs, HP can make a case. Nevertheless, it must be said that the pricing of Supermicro's servers is very attractive.
We did a quick cost calculation based on the following assumptions:
- Your applications are memory limited not CPU limited
- A fully equipped system without memory costs about $8000
- You pay 12 cents per kWh
- The real yearly (and not only in the winter) PUE is about two, or if you don't own the datacenter, you pay twice the electricity costs (24 cents per kWh; datacenter clients often pay higher fees for energy costs)
We did a 3-year TCO calculation, without management costs. Managing half as many physical servers reduces management costs, but not by much: most of the administration cost depends on how many VMs you have to administer, which is the same in both cases.
| Cost | 32GB LRDIMMs | 16GB RDIMMs |
|---|---|---|
| Number of servers | 1 (2 nodes) | 2 (4 nodes) |
| Price per server | $8000 | $8000 |
| Max RAM per server | 1024GB | 512GB |
| Price per DIMM | $1100 (32GB) | $290 (16GB) |
| Number of DIMMs | 32 | 64 |
| Energy per server (kW) | 0.4 | 0.4 |
| Energy costs (3 years) | $2523 | $5046 |
| TCO w/o management costs | $45723 | $39606 |
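The table can be reproduced with a few lines of arithmetic, using the assumptions above (0.4kW per server, $0.12 per kWh, a PUE of 2, and a 3-year horizon):

```python
def tco_3yr(servers, server_price, dimms, dimm_price,
            kw_per_server=0.4, kwh_price=0.12, pue=2.0, years=3):
    """3-year TCO without management costs: hardware plus energy.
    The PUE factor doubles the raw electricity bill (cooling, losses)."""
    hardware = servers * server_price + dimms * dimm_price
    energy_kwh = servers * kw_per_server * 24 * 365 * years
    energy_cost = energy_kwh * kwh_price * pue
    return round(hardware + energy_cost)

print(tco_3yr(1, 8000, 32, 1100))   # 32GB LRDIMM config: 45723
print(tco_3yr(2, 8000, 64, 290))    # 16GB RDIMM config:  39606
```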
With the current prices, LRDIMMs cannot justify the cost by hardware alone. But the difference is not that large anymore (about 15%). On a 3-year basis, administering half as many servers could definitely reduce the administration costs, but it is very hard to calculate. It could amount to several thousand dollars though, depending on the local labor costs.
If you are paying a software license per server (Ansys comes to mind) and you are memory starved, 32GB LRDIMMs are very attractive. In all other cases, 16GB RDIMMs are most likely the best choice. Unfortunately, VMware decided to make their larger customers pay per vRAM used, so investing in the largest memory available does not reduce licensing costs at first sight when you are running an ESXi cluster. The situation is entirely different with Microsoft's Hyper-V, KVM, and Citrix Xen, however, and this customer unfriendly licensing could convince quite a few people to look elsewhere. In that case, LRDIMMs allow you to significantly reduce response times even when the CPU load is high. Our virtualization benchmarking scenario was among the worst for showing off the benefits of larger memory capacities, and we still noticed up to 2.5 times lower response times and 15% lower power consumption.
Back to the hardware. The iMB buffer technology has very little latency impact, so you should not worry about that. The end conclusion is that LRDIMM is a mature technology that offers 50 to 100% more memory capacity (or slightly higher bandwidth at the same capacity), but currently at a cost that is about three to four times higher. If your application is memory limited and you are paying hefty licenses per server, you should definitely consider it. LRDIMM prices are also falling: as we type this, industry sources tell us that the price of 32GB LRDIMMs will soon be around $800. If you are in the market for a new high capacity server, keep an eye on the DIMM pricing.
And what about the Supermicro SuperServers? Well, the Twins are typically not considered for virtualization purposes, but now that the latest ones come with 16 slots per node and 10Gb Ethernet, they become a worthy cost, space, and energy efficient alternative. Our Twin worked flawlessly with the latest ESXi 5, although with 8 DIMM slots per node (16 DIMMs total) it is not meant to be a virtualization server. We have high hopes that the SuperServer 6027TR-D70RF+ (32 DIMMs total) will be a very affordable alternative to the tier one OEM servers out there.
Best of all, most of the Twins can be found on the Hardware Compatibility Lists (HCLs) of VMware and other hypervisors. You can definitely save a lot of money: not only are the servers themselves very attractively priced, but you also avoid paying a "vendor specific" premium for high capacity memory.