ARM Challenging Intel in the Server Market: An Overview

Author: Johan De Gelas

by Johan De Gelas on December 16, 2014 10:00 AM EST

AMD Opteron A1100

The 28nm eight-core AMD Opteron A1100 is a lot more modest and aims at the low-end Xeon E3s. Stephen has described the chip in more detail. To ensure a quick time to market, the AMD Opteron A1100 is built from existing building blocks already designed by ARM: the Cortex-A57 core and the Cache Coherent Network, or CCN.

AMD is one of the few vendors that uses the ARM interconnect. ARM put a lot of work into this design to enable ARM licensees to build SoCs with lots of accelerators and cores. CCN is thus a way of attaching all kinds of cores, processors, and co-processors ("accelerators") coherently to a fast crossbar, which also connects to four 64-bit memory controllers, integrated NICs, and L3 cache. CCN is very comparable to the ring bus found inside all Xeon processors beginning with "Sandy Bridge". The top model is the CCN-512, which supports up to 12 quad-core clusters. This could result in, for example, an SoC with 32 (8x4) A57 cores and four accelerators.
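The cluster arithmetic in that example can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not an ARM tool: the function name `ccn_configuration` and its parameters are hypothetical, and it assumes (as the 8 + 4 = 12 example seems to imply) that an accelerator takes a slot that could otherwise hold a quad-core cluster. The only figures taken from the text are the 12 clusters of the CCN-512 and the four cores per cluster.

```python
def ccn_configuration(total_slots, core_clusters, cores_per_cluster=4):
    """Split an interconnect's cluster slots between CPU clusters and accelerators.

    Hypothetical helper for illustration only. Assumes every slot holds
    either one CPU cluster or one accelerator.
    Returns (cpu_cores, accelerator_slots).
    """
    if not 0 <= core_clusters <= total_slots:
        raise ValueError("core_clusters must be between 0 and total_slots")
    cpu_cores = core_clusters * cores_per_cluster
    accelerator_slots = total_slots - core_clusters
    return cpu_cores, accelerator_slots


# CCN-512 example from the text: 12 slots, 8 of them used for quad-core clusters.
cores, accelerators = ccn_configuration(total_slots=12, core_clusters=8)
print(f"{cores} cores, {accelerators} accelerators")
```

Using all 12 slots for quad-core clusters would give 48 cores and leave no room for accelerators under the same assumption.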

AMD would not tell us which CCN they are using, but we suspect that it is the CCN-504. That CCN was available around the time work started on the Opteron A1100, and AMD mentions the ARM bus architecture AMBA 5 in their slides. It also makes sense: the CCN-504 supports up to 4 x 4 cores and supports the Cortex-A57.

It was rumored that the A1100 still used the CCI-400 interconnect, which is used by smartphone SoCs, but that interconnect uses the AMBA 4 architecture. Meanwhile the CCN-502 was announced in October 2014, way too late to be inside the A1100.

The AMD Opteron A1100 consists of four pairs of "standard" triple-issue Cortex-A57 cores, each pair sharing 1MB of L2 cache, plus 8MB of L3 cache.

The key differentiator is the cryptographic coprocessor that can accelerate RSA (the secure connection handshake), AES (encrypting the data you send and receive), and SHA (part of the authentication). Intel uses the PCIe Quick Assist 89xx-SCC add-in card or the special Intel Communication chipset to provide a cryptographic coprocessor. These coprocessors are mostly used in professional firewalls/routers. As far as we know, such cryptographic processors are of limited use in most HTTPS web services. Most modern x86 cores now support AES-NI, and these instructions are well supported in software. As a result, the current x86 CPUs from AMD and Intel outperform many coprocessors when it comes to real-world AES encryption and decryption of data streams.

A cryptographic coprocessor could still be useful for the RSA asymmetric encrypted handshake, but it remains to be seen if offloading the handshakes will really be faster than letting the CPU take care of it, as each offload operation causes all kinds of overhead (such as a system call). A cryptographic coprocessor running on the same coherent network as the main cores could be a lot more efficient than a PCIe device though. It has a lot of potential, but AMD could not give us much info on the current state of software support.
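The offload trade-off described here comes down to a simple inequality: offloading only wins when the coprocessor's own time plus the per-operation overhead is less than the time the CPU would need. A minimal Python sketch of that reasoning follows. The function names are hypothetical, and every number below is a made-up placeholder chosen to show the break-even point, not a measurement of the A1100 or of any Intel part.

```python
def offload_is_faster(cpu_time_us, accel_time_us, overhead_us):
    """True if handing one operation to the coprocessor beats doing it on the CPU.

    overhead_us covers everything around the operation itself:
    system call, data transfer, completion notification, and so on.
    """
    return accel_time_us + overhead_us < cpu_time_us


def break_even_overhead_us(cpu_time_us, accel_time_us):
    """Largest per-operation overhead at which offloading stops paying off."""
    return cpu_time_us - accel_time_us


# Placeholder figures (NOT measurements): one RSA handshake operation.
CPU_TIME_US = 1000    # assumed time on a general-purpose core
ACCEL_TIME_US = 200   # assumed time on the coprocessor

for label, overhead_us in [("high-overhead path", 900), ("low-overhead path", 100)]:
    verdict = "offload" if offload_is_faster(CPU_TIME_US, ACCEL_TIME_US, overhead_us) else "stay on CPU"
    print(f"{label}: overhead {overhead_us} us -> {verdict}")

print("break-even overhead:", break_even_overhead_us(CPU_TIME_US, ACCEL_TIME_US), "us")
```

The same model explains why a coprocessor on the coherent network could look better than a PCIe device: if its per-operation overhead is lower, more operations fall on the "offload" side of the break-even point.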

78 Comments

hojnikb - Tuesday, December 16, 2014 - link
Wow, I have never seen a motherboard that simple :)
CajunArson - Tuesday, December 16, 2014 - link
OK you devote another huge block of text to the typical x86 complexity myth* followed by: Oh, but the ARM chips are superior because they have special-purpose processors that overcome their complete lack of performance (both raw & performance per watt).

Uhm... WTF?? I need to have a proprietary, poorly documented add-on processor to make my software work well now? How is that a "standard"? How is requiring a proprietary add-on processor that's not part of any standard and requires boatloads of software cruft working in a "reduced instruction set architecture" exactly?

I might as well take the AVX instruction set for modern x86... which is leagues ahead of anything that ARM has available, and say that x86 is now a "RISC" architecture because the AVX part of x86 is just as clean or cleaner than anything ARM has. I'll just conveniently forget about the rest of x86 just like the ARM guys conveniently forget about all the non-standard "application accelerators" that are required to actually make their chips compete with last-year's Atoms.

* Maybe in a micro-controller setting where you are using a PIC or Arduino the x86 decoding is a real issue, but in a server? Please. Considering the only hard numbers you have show a 2013-model Atom beating a 2015-model ARM server processor, you'll have to try harder.
hlmcompany - Tuesday, December 16, 2014 - link
The article describes ARM chips as becoming more competitive, but still lagging behind...not that they're superior.
Kevin G - Tuesday, December 16, 2014 - link
The coprocessor idea is something that stems from mainframe philosophy. Historically things like IO requests and encryption were always handled by coprocessors in this market.

The reason coprocessors faded away outside of the mainframe market is that it was generally cheaper to do a software implementation. Now with power consumption being more critical than ever, coprocessors are seen as a means to lower overall platform power while increasing performance.

Philosophically, there is nothing that would prevent the x86 line from doing so and for the exact same reasons. In fact with PCIe based storage and NVMe on the horizon in servers, I can see Intel incorporating a coprocessor to do parity calculations for RAID 5/6 in their SoCs.
kepstin - Tuesday, December 16, 2014 - link
Intel has already added some instructions in avx and avx2 that vastly improve the performance of software raid5 and 6; the Haswell chip in my laptop has the Linux software raid implementation claiming 24GiB/s raid5 with avx, and 23GiB/s raid6 with avx2 (per core).
MrSpadge - Tuesday, December 16, 2014 - link
Of course additional power draw for more complex instruction decoding matters in servers: today they are driven by power-efficiency! The transistors may not matter as much, but in a multi-core environment they add up. Using the quoted statement from AMD of "only 10% more transistors" means one could place 11 RISC cores in the same area for the same cost as 10 otherwise identical x86 cores. Johan said it perfectly with "the ISA is not a game changer, but it matters".

And you completely misunderstood him regarding the accelerators. Intel is producing "CPUs for everyone" and hence only providing few accelerators or special instructions. In the ARM ecosystem it's obvious that vendors are searching niches and are willing to provide custom solutions for it - hence the chance is far higher that they provide some accelerator which might be game-changing for some applications.

This doesn't mean the architecture has to rely entirely on them, neither does it mean they have to be undocumented. The accelerators do not even have to be faster than software solutions, as long as they're easy enough to work with and provide significant power savings. Intel is doing just that with special-purpose hardware in their own GPUs.

And don't act as if much would have changed in the Atom space ever since 22 nm Silvermont cores appeared. It doesn't matter if it's from 2013 or 2015 - it's all just the same core.
OreoCookie - Tuesday, December 16, 2014 - link
What's with all the unnecessary piss and vinegar?

All CPU vendors rely increasingly on specialized silicon; newer Intel CPUs feature special crypto instructions (AES-NI) and Quick Sync, for instance. Adding special purpose hardware to augment the system (in the past usually done for performance reasons) is quite old, just think of hardware RAID cards and video »accelerators« (which are now called GPUs). The reason that Intel doesn't add more and more of these is that they build general purpose CPUs which are not optimized for a specific workload (the article gives a few examples). In other environments (servers, mobile) the workload is much more clearly defined, and you can indeed take advantage of accelerators.

The biggest advantage of ARM cpus is flexibility -- the ARM ecosystem is built on the idea to tailor silicon to your demands. This is also a substantial reason why Intel's efforts in the mobile market have been lackluster. Recently, Synology announced a new professional NAS (the DS2015xs) which was ARM-based rather than Intel-based. Despite its slower CPU cores, the throughput of this thing is massive -- in part, because it sports two (!) 10 GBit ethernet ports out of the box. Vendors are looking for niches where ARM-based servers could gain a foothold, so they are trying a lot of things and see what sticks.
goop666666 - Saturday, December 20, 2014 - link
LOL! Most of the comments here like this one seem to be written by people who think computers should all be like gaming machines or something.

Here's a tip: no-one cares about "complexity," "standardization," "RISC," or anything else you mention. All they care about in the target market for ARM server chips is price, performance and power, and I mean ALL THREE.

On this Intel cannot compete. They sell wildly overpriced legacy hardware propped up by massive R&D expenditures and they're wedded to that model. The rest of the industry is wedded to the new and cheap model. Just like how the industry moved to mobile devices and Intel stood still, this change will also wash over Intel while they sit still in denial.

There's a reason why Intel stock has gone no-where for years.
nlasky - Monday, December 22, 2014 - link
Jan 8, 2010, Intel stock price $20.83. Dec 19, 2010, Intel stock price $36.37. If by gone no-where for years you mean increased by 70% I guess you would be correct. Intel can't compete because they are wedded to their model? They have a profit margin of 20% and an operating margin of 27%. They could easily cut prices to compete with any ARM offerings. Servers have been around forever, unlike the mobile computing platform. Intel has an even larger stranglehold on this industry than ARM has in the mobile space. Here's a tip - stop spewing a bunch of uninformed nonsense just to make an argument.
nlasky - Monday, December 22, 2014 - link
*Dec 19, 2014
