HP's Moonshot 1500: Our Evaluation So Far

We have not tested it yet, but we have little doubt that HP's Moonshot 1500 is a great chassis. The server nodes get power and access to three different fabrics from a very advanced backplane, and they share power supplies, cooling, and management. HP brings together some of the best ideas of the blade and microserver worlds, and network and server management gets a lot simpler as a result.

But the server cartridge is a whole different story. Including a local hard disk is the ideal recipe for quickly rising management costs. Replacing a failed hard disk - still the most common failure inside the server rack - means sliding the very heavy chassis out as far as it will go, removing the access panel, pulling out the server cartridge, and only then swapping the disk. That is a long and costly procedure compared to simply pushing the release button of a hot-pluggable hard disk, which can be found in the front of most servers, including the Boston Viridis.

Secondly, the performance per watt is fantastic... if you compare it with old 1U servers. Consuming 850 W for 180 slow threads - almost 5 W per thread - is nothing to write home about. A modern low power blade, a design like Supermicro's Twin², or even HP's own SL series can offer 32 Xeon E5 threads for about 200 W. That is 6.25 W per "real Xeon" thread, and each of those threads is, even in very low IPC workloads, at least 3 times faster! Want even more proof that 5 W for a wimpy thread is nothing special? SeaMicro claims 3.2 kW for a complete system with hard disks, memory, and 512 Opteron Piledriver cores. That is 6.25 W per real heavy duty core!

Of course, those calculations are based on paper specs. But we have tested Calxeda's technology firsthand, and the Boston Viridis with Calxeda's EnergyCore went as low as 8.33 W for 4 threads, or a little more than 2 W per thread. Granted, that is without storage, but even if you add a 4 W 2.5-inch hard disk, you get about 3 W per thread!
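For those who want to check the arithmetic, the watt-per-thread figures above boil down to a few divisions. Here is a small Python sketch that reproduces them; all of the wattages and thread counts are the paper specs and measurements quoted in this article, nothing new.

    # Watts per thread for the systems discussed above; the inputs are the
    # (approximate) figures quoted in the text, not independent measurements.
    systems = {
        "HP Moonshot 1500 (180 Atom S1260 threads)": (850.0, 180),
        "Twin2/SL-class node (32 Xeon E5 threads)":  (200.0, 32),
        "SeaMicro system (512 Piledriver cores)":    (3200.0, 512),
        "Boston Viridis node (Calxeda, 4 threads)":  (8.33, 4),
        "Viridis node + 4 W 2.5-inch hard disk":     (8.33 + 4.0, 4),
    }

    for name, (watts, threads) in systems.items():
        print(f"{name:44s} {watts / threads:5.2f} W/thread")

Running it gives roughly 4.7, 6.25, 6.25, 2.1 and 3.1 W per thread, which is where the rounded numbers in the text come from.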

Still not convinced? Well, Intel's own benchmarks are pretty telling, to say the least. This slide can be found in the S1200 presentation:

Again, a relatively simple Atom S1260 server node needs about 20 W. But notice how little power the Xeon E3 needs. Even with 2 SSDs, 16 (!) GB of RAM, and 10 Gigabit Ethernet, you are looking at 60 W for 8 fast threads. Make the two systems comparable (a 10 Gb PHY easily consumes 5-8 W more than a 1 Gb one) and you get roughly 6.5 W per Xeon thread and 5 W per Atom thread.
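As a quick sanity check, here is the same normalization spelled out in a short Python sketch. The 5-8 W PHY delta is the rough estimate used above, not a measured value, so treat the result as a ballpark figure.

    # Strip the estimated extra power of the 10 GbE PHY from the Xeon E3 node so
    # both nodes are effectively on 1 GbE, then divide by thread count.
    xeon_e3_node_w = 60.0      # 8 threads, 2 SSDs, 16 GB RAM, 10 GbE (Intel's slide)
    atom_s1260_node_w = 20.0   # 4 threads, 1 GbE (Intel's slide)

    for phy_delta_w in (5.0, 8.0):   # assumed 1 GbE -> 10 GbE power delta
        xeon_w_per_thread = (xeon_e3_node_w - phy_delta_w) / 8
        print(f"Xeon E3, {phy_delta_w:.0f} W PHY delta removed: {xeon_w_per_thread:.2f} W/thread")

    print(f"Atom S1260: {atom_s1260_node_w / 4:.2f} W/thread")
    # Roughly 6.5-6.9 W per Xeon thread versus 5 W per Atom thread.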

Let us cut to the chase: the current Atom has a pretty bad performance-per-watt ratio, and the way the HP server cartridges are built today does not make it any better. Compare the mini-blade approach with Calxeda's EnergyCard approach or SeaMicro's "credit card servers" and you'll understand that there are better and more innovative ways to design microservers.

To sum it all up, the HP Moonshot is a good platform, but both SeaMicro and Calxeda already offer better-designed server nodes. And there are much better CPUs on the market for microservers: AMD's lowest power Piledrivers, Calxeda's A9-based EnergyCore, and Intel's own Xeon E3-1265L all offer a massively better performance-per-watt ratio. No matter what the software stack is, and no matter which metric matters most to you, one of those three will be able to beat the S1260. The Xeon offers the best single-threaded integer performance, the Opteron can offer the best floating-point performance (for HPC apps that can be recompiled), and if performance does not matter much, the ARM-based Calxeda EnergyCore sips much less power than the Atom.

Sure, the Atom S1200 can run on top of (64-bit) Windows Server and ESXi, something that is not possible today with the ARM-based Calxeda EnergyCore. But Windows Server is seldom seen in hyperscale datacenters, and neither is ESXi. ESXi is a lot more popular in the hosting market, but let us not forget that VT-x alone is not enough to run ESXi smoothly. We have become accustomed to very low virtualization overhead because CPU architects have reduced the VMexit and VMentry latencies and introduced technologies like Extended Page Tables combined with very large TLBs. Those improvements to virtualization performance have not been implemented in the Atom. Running ESXi on top of the S1260 with 8 GB of RAM might not be a very pleasant experience.
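To make the point concrete: on a Linux host you can quickly see which of these hardware-assist features a CPU actually exposes before picking it as a virtualization target. The sketch below assumes a Linux machine and uses the standard /proc/cpuinfo flag names (vmx, ept, vpid, flexpriority); it simply reports whether the features the article says the S1260 lacks are present.

    # Report the virtualization-related CPU flags exposed in /proc/cpuinfo (Linux, x86).
    # Basic VT-x ("vmx") is not enough for low overhead; EPT and VPID are what keep
    # VMexit and page-table walk costs down.
    FEATURES = {
        "vmx":          "basic VT-x",
        "ept":          "Extended Page Tables (nested paging)",
        "vpid":         "tagged TLB entries per VM (VPID)",
        "flexpriority": "APIC virtualization (FlexPriority)",
    }

    flags = set()
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags.update(line.split(":", 1)[1].split())
                break

    for flag, description in FEATURES.items():
        present = "yes" if flag in flags else "no"
        print(f"{flag:14s} {present:3s}  ({description})")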

To sum it up, we are not exactly thrilled with HP's CPU choices. Luckily, Calxeda is one of the prime partners of HP's Moonshot project. And the current cartridge platform will be outdated soon anyway.

Comments

  • dealcorn - Thursday, April 11, 2013 - link

    It makes me wonder why HP has been hogging almost all of Intel's S1200 production capacity. HP may think there is a use case where some customers will find Moonshot attractive.

    Is Briarwood (S12X9) the end of the road for Atom at 32 nm? The addition of many Crystal DMA engines to provide a hardware assist in RAID6 calculations lets Atom be a category killer (in a niche market). I find it funny that after all the criticism, the venerable Atom core is departing 32 nm as a (niche) category killer.
  • Ammohunt - Thursday, April 11, 2013 - link

    They should have named this product line Crapshoot; you would think they would learn from past dealings with Intel. As a career Systems Administrator I don't find this to be an attractive product compared to scaling density using 1-2U servers crammed with RAM running a modern hypervisor with 24 or more cores. At the same time, a 1-2U server can be re-purposed for dedicated tasks.
  • Spunjji - Friday, April 12, 2013 - link

    Pun win.
  • Jaybus - Friday, April 12, 2013 - link

    Which makes me question why an E5-2650L, a 1.8 GHz Sandy Bridge part, was used as a comparison. With the E5-2600 V2 series being launched soon, I think an Ivy Bridge E3 would have been a better comparison. The 10-core E5 V2 at 70 W will allow 20 Ivy Bridge cores (40 hyper-threads) at around 3 GHz and at least 256 GB of RAM in a 1U space. That will allow a lot of web server VMs from a 1U. Can these Atom and ARM systems run as many web servers in a 1U space? Performance per watt is important, but it doesn't necessarily translate to better performance per 1U of rack space, which is the more important metric for some of us.
  • Wilco1 - Friday, April 12, 2013 - link

    Well, according to Anand's Calxeda test, you need 2.7x as many Cortex-A9 cores as E5-2660 threads to equal it on web serving. With the Cortex-A15 being at least 50% faster than the A9, that reduces to 1.8x. Assuming the E5 V2 is 25% faster, it becomes 2.3x. The max density of the Moonshot is 45 cartridges x 4 quad-cores in 4.3U, so about 167 cores per 1U vs 40 threads for the E5 V2, i.e. it gives 1.8 times as much performance per 1U.
  • vFunct - Thursday, April 11, 2013 - link

    Can we have ARM SoCs with stacked 128 GB NAND flash chips on an interconnected grid already?

    It's obvious that this is where everything is headed. You don't need a giant cartridge/module when everything can be done in a stacked die. Or perhaps just add an ARM core/network interface to NAND flash.

    You could probably fit several thousand of them in a chassis. Maybe several hundred thousand in a rack.
  • wetwareinterface - Friday, April 12, 2013 - link

    and that would be a fire you could see from space. a stacked die? and several thousand in a chassis...

    would require a cooling solution based on vapor phase and massive heat exchangers to keep it from burning itself up.

    and further the arm cpu with the ability to access more than 4 GB of ram doesn't yet exist. tying slower flash to it isn't a solution either except in a san. and that would be fairly pointless as you'd hit a bottleneck on the network side so flash storage would be a pointless expense.
  • Wilco1 - Friday, April 12, 2013 - link

    The HP Moonshot server supports 7200 ARM cores already using air cooling. Given that a typical quad core node uses about 5 Watts, stacking flash and/or RAM is certainly feasible. This is pretty much what many mobile phone SoCs already do.

    Also ARM's with more than 4GB capability have been on the market for at least 6 months - Cortex-A15 supports 40-bit addressing.
  • wetwareinterface - Monday, April 15, 2013 - link

    okay i'll bite...

    first off, and i quote, "Don't need a giant cartridge/module when everything can be done in a stacked die." everything that module contains stacked up would equal a heat dissipation nightmare.

    or the second option of just adding an arm core and network interface on top of flash would net nothing but a slow waste of cash.

    and also the a15 has a 40 bit address space so it can see up to 1 TB of ram but each thread can only use 32bit of that address space so.... 4GB cap.

    and stacking the ram and or flash is feasible... but when you cram a bunch of modules together in the thousands you have heat dissipation problems. the tighter you group the heat sources the more geometrical progression for heat buildup becomes an issue. that heat has to go somewhere and with nowhere to go and no air volume to exchange with the more you have to go with extreme cooling solutions.

    phones get away with stacking the cpu and other elements because they don't have to run more than a few minutes accessing those elements at once so heat buildup doesn't become a problem in a typical use scenario. but it does cause issues when you use a phone outside the typical usage it sees.

    a server would melt down with all that being accessed at once and constantly if it were stacked and air cooled.
  • vFunct - Monday, April 15, 2013 - link

    If only there were some way to remove that heat, in a way that would "cool" the system..

    Also, the ARM A15 is the last ARM core ARM will ever design. They don't plan on making any future designs past that. There are no plans on making 64-bit ARM cores, ever.

    So, everything you say is right.
