Yesterday HP announced retail availability of two ARM-based servers, the ProLiant m400 and m800. Each is offered as a server cartridge for the Moonshot System, and a single 4.3U Moonshot chassis can hold 45 server cartridges. Usually higher numbers mean better, but in this case the m400 and m800 are so different that I wouldn’t consider them competitors. The m800 is focused on parallel compute and DSP, while the m400 is focused on compute, memory bandwidth, and IO bandwidth, and features the first 64-bit ARM processor to reach retail server availability.

HP ProLiant ARM Servers

| | m400 | m800 |
|---|---|---|
| Processors | 1 | 4 |
| Processor | AppliedMicro X-Gene, custom 64-bit ARMv8 | TI KeyStone II 66AK2H, Cortex-A15 ARMv7A + DSP |
| Compute cores per processor | 8 | 4 CPU + 8 DSP |
| Clock Speed | 2.4 GHz | 1.0 GHz |
| Cache Memory | Each core: 32KB L1 D$ and I$; each pair: 256KB L2; all cores: 8MB L3 | Each DSP core: 1MB L2 |
| Memory | Quad channel, 8 SODIMM slots, DDR3-1600 low voltage, max 64GB (8x8GB) | Single channel, 4 SODIMM slots, DDR3-1600 low voltage, max 32GB (4x8GB) |
| Network Controller | Dual 10GbE | Dual 1GbE |
| Storage | M.2 2280 | M.2 2242 |
| PCIe | 3.0 | 2.0 |

Starting with the m400, HP designed in a single AppliedMicro X-Gene SoC at 2.4 GHz. AppliedMicro has been discussing the X-Gene processor for several years now, and with this announcement becomes the first vendor to achieve retail availability of a 64-bit ARMv8 SoC other than Apple. Considering Apple doesn’t sell their processors stand-alone, this is a significant milestone. AppliedMicro has significantly beaten AMD’s A1100 processor to market, as AMD has not yet entered production. Marquee features of the X-Gene SoC include 8 custom 64-bit ARM cores, which at quad-issue should be higher performance than A57, quad channel DDR3 memory, and integrated PCIe 3.0 and dual 10GbE interfaces. Look out for a deep dive on the X-Gene SoC in a future article.

The m800 is a 32-bit ARM server containing four Texas Instruments KeyStone II 66AK2H SoCs at 1.0 GHz. Each KeyStone II SoC contains four A15 CPU cores alongside eight TI C66x DSP cores and single channel DDR3 memory, for a total of 16 CPU and 32 DSP cores. IO steps back to dual GbE and PCIe 2.0 interfaces. It is clear from the differences in these servers that m400 and m800 target different markets. There isn’t yet a best-of-both-worlds server combining the core count and memory + IO interfaces of the m400 and m800 together.
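As a back-of-the-envelope check on the core counts above, the short Python sketch below tallies cores and maximum memory per cartridge and for a full chassis. The class and function names are hypothetical, and a chassis filled with 45 identical cartridges at maximum memory is an assumption made for illustration, not a configuration HP has described.

```python
from dataclasses import dataclass

# From the article: one 4.3U Moonshot chassis holds 45 server cartridges.
CARTRIDGES_PER_CHASSIS = 45


@dataclass(frozen=True)
class Cartridge:
    """Hypothetical record of the per-cartridge figures quoted in the article."""
    name: str
    socs: int
    cpu_cores_per_soc: int
    dsp_cores_per_soc: int
    max_memory_gb: int

    @property
    def cpu_cores(self) -> int:
        # Total CPU cores on one cartridge.
        return self.socs * self.cpu_cores_per_soc

    @property
    def dsp_cores(self) -> int:
        # Total DSP cores on one cartridge.
        return self.socs * self.dsp_cores_per_soc


def chassis_totals(cartridge: Cartridge, count: int = CARTRIDGES_PER_CHASSIS) -> dict:
    """Totals for a chassis holding `count` identical cartridges (an assumption)."""
    return {
        "cpu_cores": cartridge.cpu_cores * count,
        "dsp_cores": cartridge.dsp_cores * count,
        "max_memory_gb": cartridge.max_memory_gb * count,
    }


# m400: one X-Gene SoC with 8 cores, up to 64GB.
M400 = Cartridge("m400", socs=1, cpu_cores_per_soc=8, dsp_cores_per_soc=0, max_memory_gb=64)
# m800: four KeyStone II SoCs, each with 4 A15 cores and 8 C66x DSP cores, up to 32GB.
M800 = Cartridge("m800", socs=4, cpu_cores_per_soc=4, dsp_cores_per_soc=8, max_memory_gb=32)

if __name__ == "__main__":
    for cartridge in (M400, M800):
        print(cartridge.name, cartridge.cpu_cores, cartridge.dsp_cores, chassis_totals(cartridge))
```

Under those assumptions, a chassis of m800 cartridges carries twice as many CPU cores as a chassis of m400 cartridges but half the maximum memory, which is another way of seeing why the two target different workloads.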

Each server is available with Ubuntu and IBM Informix database preinstalled, and will be demonstrated at ARM TechCon October 1-3 in Santa Clara, California.

Source: HP




  • ddriver - Tuesday, September 30, 2014

    "Atom CPU will beat this pref wise" - when? In 2020?
  • Wilco1 - Tuesday, September 30, 2014

    "Our next version will beat ARM, next year, we promise!" - we've heard that since 2008...

    Once again Intel aimed too low with Silvermont. Avoton is significantly slower than X-Gene in terms of CPU performance (4-way aggressive OoO vs 2-way partial OoO - no contest). The IO/Network performance is a fraction of the X-Gene version too, with just a dual 1GbE port per core, while X-Gene does dual 10GbE per core. X-Gene also has double the memory, so overall performance will be significantly better in any possible application.
  • fteoath64 - Wednesday, October 01, 2014

    True. Atom, now Bay Trail, is still slower than the ARM64 cores in CPU terms at the same clock speed. Power-consumption-wise, it is demolished by ARM. You can see that Intel is lowering the TDPs of the big-core Haswell to address the lower-power market, knowing the Bay Trail microarchitecture does not really cut it in terms of performance per watt.
  • JohanAnandtech - Thursday, October 02, 2014

    Can you back that up? I have not seen any sign that Intel's Bay Trail is slower in pure performance. Perf/watt is a different matter.
  • iwod - Thursday, October 02, 2014

    Exactly. Why are people doubting Intel on performance? On perf/watt at the server level (a few W to tens of W), Intel pretty much dominates the game. You want a lower-power CPU for file serving? Atom has it covered, and future Atoms will do even better. You want a little more CPU power? Broadwell-U is many times faster than even a top-end ARM64 CPU.

    So yes, Intel can't win the game in the mobile mW range. On the server side, with its software ecosystem, Intel has everything; it's only a matter of market and price fit.
  • Wilco1 - Friday, October 03, 2014

    People doubt Atom performance as it underperforms. Nobody is saying real Xeons are slow, just Atom.
  • Wilco1 - Friday, October 03, 2014

    You don't really believe a 2-way partially out-of-order core can beat a 4-way aggressive OoO one?

    If you want hard evidence, check how a 2.2GHz A15 beats a 2.6GHz Avoton on single-threaded performance by a good margin, both on integer and FP (ignore the hardware accelerated AES test):

    So given that A15 has ~30-35% higher IPC than Silvermont, A57 has > 30% better IPC than A15, and X-Gene is faster still, it is pretty obvious that X-Gene at 2.4GHz beats Avoton even if Avoton manages to turbo all cores to 2.6GHz. The large L3 cache and much faster memory system in X-Gene help as well, of course.
  • shodanshok - Friday, October 03, 2014

    Silvermont is not "partially out-of-order" or "moderately aggressive out-of-order".
    Please read David Kanter dissection here:
    In short: Silvermont OOOE is very aggressive, and its OOO memory capabilities are top notch.

    Regarding performance, I can not 100% predict which processor (X-Gene or Silvermont) is faster at single-thread level, but I bet X-Gene would be faster. However, Bay Trail (read: silvermont) is absolutely competitive with Cortex-A15, and generally faster. See and
    While they are mainly browser-based benchmarks, Bay Trail is always faster than Cortex-A15, often by a significant margin.

    Moreover, even the GeekBench3 score you provided shows Bay Trail at least on par with A15 in single-core integer performance, with 1234 points for Tegra K1 and 1321 (both chips had similar clocks).

    Anyway, the single most notable thing that instantly makes X-Gene a very interesting product is the two integrated 10GbE links. This is an area where Intel is (as often happens) too conservative, relying on external (and costly) network chips.

  • patrickjchase - Saturday, October 04, 2014

    Wilco is harping on the fact that Silvermont's FP/vector path is in-order, so in that sense he's right.

    With that said, over-obsession with size (or width) is as common in microarchitecture as in other domains. I've seen plenty of cases where cores with high theoretical or peak IPC get clobbered by designs that look much slower on paper. It often comes down to memory subsystem design and/or branch prediction.

    Without stooping to the "who's better" argument, the fact that we're having this debate at all is prima facie evidence that A15 is an underperformer (i.e. that its real-world performance is much lower than its width and clock rate suggest).
  • shodanshok - Saturday, October 04, 2014

    Are you sure Silvermont's FP path is in-order? From the David Kanter article posted above:

    "The floating point schedulers also have 8 entries each, but do not hold data. Instead, the input operands are read when the instruction is dispatched to the execution units (similar to Haswell’s unified scheduler) to minimize the movement of the larger 128-bit SSE data."

    If I remember correctly, the problem with Silvermont FP performance is that many instructions are not executed with single-cycle throughput. The Cortex-A15 FPU is probably faster... ;)

