Conclusions So Far

Of one thing we are sure: "the cheaper, smaller, higher volume option historically wins" is a very weak argument to make when claiming that ARM SoCs will overtake Intel in the server market. It is hard to make all of the puzzle pieces come together: performance, power, volume, and software. Low prices and volume are not enough. We would love to see some real competition in the server market, but Intel is a lot better positioned today to fend off attacks than the RISC players were back in the 90s.

The current ARM server SoCs are a lot more powerful than Calxeda's ECX-1000, but they do not face a hopelessly outdated Atom S1200 anymore. The Atom C2000 is a huge step forward and the Xeon E3 has continued to evolve in such a way that even eight of the best ARM cores cannot deliver more raw integer processing power than a quad-core E3 with SMT. Meanwhile, the Xeon-D will offer all the advantages of the high performance "Broadwell" architecture, the flexibility of Intel's Turbo Boost, Intel's excellent process technology, and the highly integrated Atom C2000 SoC in one very competitive package.

The first – albeit very rough – performance data indicates that the server ARMada is not ready (yet?) to take on the best Intel Xeons in a broad range of server applications, at least in terms of performance. However, the ARM challengers do have an opportunity. Despite the massive number of Intel SKUs, Intel's market segmentation is rather crude and assumes that all customers can easily be categorized into three (maybe four) large groups: For low budgets, get the low range Xeon E3 (e.g. E3-1220 v3). Pay a bit more and you get Hyper-Threading and higher clock speeds (E3-1240 v3). Pay slightly more and you get another speed bump. Pay much more and you get four memory channels. We'll throw in more cores and a larger cache as a bonus (Xeon E5).

What if I have a badly scaling HPC application (low core count) that needs a lot of memory bandwidth? There is no Xeon E3 with quad channel. What if I need massive amounts of memory but moderate processing power? The Xeon E3 only supports 32GB. What if my application needs lots of cores and bandwidth but does not benefit from large and slow LLC caches? There is no Xeon E5 for that; I can only choose one of the most expensive E5s. And these examples are not invented; applications like these exist in the real world and are not exotic exceptions. What if my application benefits from a certain hardware accelerator? Buy a few 100k of SoCs and we'll talk. Intel's market segmentation is based largely on the assumption that every need (I/O, caches, memory bandwidth, memory capacity) is proportional to processing power.

The ARM based challengers have the potential to serve those "odd" but relatively large markets better. The cost to develop new SoCs is lower and ARMv8 has the inherent RISC advantage of spending fewer transistors on ISA complexity. This reduces Intel's advantage of process technology leadership.

Cavium has a clear focus and targets the scale-out, telecom, and storage markets. We are very curious how the first chip which is specialized for "scale-out" applications will perform. It has been a long time since we have seen such a specialized SoC and it is crystal clear that performance will vary a lot depending on the application. Our first impression is that the chip will be ideal for running lots of network intensive virtual machines on top of a hypervisor, such as Xen or KVM.

AppliedMicro's X-Gene seems to target a much wider range of applications, attacking the Intel Xeon E3 and the fastest Atom C2000. The hardware accelerators and quad-channel memory should give it an edge in some server applications while staying close enough in others. Much will depend on how quickly the X-Gene 2 is available in real servers. The X-Gene 2 "ShadowCat" is already up and running, so we have high hopes.

Broadcom seems to have a similar approach. Broadcom is late but is a market leader with deep pockets and an impressive list of customers. The same is true for Qualcomm. But we need specs and not just broad and vague statements before we dedicate more words to the server plans of Qualcomm.

AMD's Opteron A1100 is definitely betting on undercutting Intel's low-end Xeons in price and features. Everything about it screams "time to market, inexpensive but proven low power design". The more ambitious AMD ARM SoCs will come later, however, as the current A1100 is missing a crucial feature: a link to the Freedom Fabric. The network fabric is a critical feature as OEMs can then build a low power, high performance networked micro server cluster. It was the strongest point of the Calxeda based servers as it kept power per node low, offered very low latency networking, and lowered the investments in expensive network gear (Cisco et al.). AMD is a well known brand with the enterprise folks and has a lot of unique server/HPC IP.

Last but not least, many enterprises in the IT world including HP, Facebook and Google want to see more competition in the server market. So all ARM licensees can count on some goodwill to make it happen.

We have been preparing on our side as well. We have developed several new benchmarks to test this new breed of servers. Hard numbers say more than just words, but you'll have to wait for part two of this series for those.


78 Comments

  • aryonoco - Wednesday, December 17, 2014 - link

    I just wanted to thank you Johan De Gelas for this very insightful and interesting article.

    Hugely enjoyed reading it and your thoughts on the subject.

    Good to see high quality content continue to be published at AT now that Anand has left.
  • JohanAnandtech - Wednesday, December 17, 2014 - link

    aryonoco, Jann Thanks for letting me know. A good motivation to always push a bit harder to make sure I don't let my readers down :-).
  • jann5s - Wednesday, December 17, 2014 - link

    Thank you Johan, for writing this very interesting article!
  • przemo_li - Wednesday, December 17, 2014 - link

    Very well written walk through current and possible CPU/SOC parts.

    Will there be similar piece for software?
    ARM (embedded) folks aren't famous for quality drivers/code.

    It must change, so it will change. But for now such overview would be great!
  • bobbozzo - Wednesday, December 17, 2014 - link

    Typo on page2:
    "(4 Slots x 8 DIMMs)" - change 8 to 8GB

    Thanks
  • bobbozzo - Wednesday, December 17, 2014 - link

    and page 4:
    "you will be able to choose between SoCs that have 100 Gbit Ethernet and 10GBit Ethernet."

    should 100 be 40?
  • bobbozzo - Wednesday, December 17, 2014 - link

    Page 12:
    "Most of them are the usual IPSec, TPC offloading engines"

    Should that be TCP?

    Also, are there still accelerators for AntiVirus engines and IDS/IPS search (there were some back in 2005).

    Thanks
  • bobbozzo - Wednesday, December 17, 2014 - link

    ...
    I guess that's what the RegEx would be useful for.

    However, not all IDS/IPS / A/V patterns use RegEx, and there are other means of acceleration.
  • eanazag - Wednesday, December 17, 2014 - link

    Welcome back Johan.

    Glad to see you're still writing here. Good stuff in the article.
  • JKflipflop98 - Wednesday, December 17, 2014 - link

    I simply don't get where this whole "microserver" thing is coming from.

    By the time you cluster up enough ARM processors to match the processing power of an Intel/AMD solution, you're burning just as much power and spent just as much money as you would have by using x86 in the first place. Except now you have to use some janky middleware solution because all your software is x86 and you're running on ARM cores.
