X-Gene 1, Atom C2000 and Xeon E3: Exploring the Scale-Out Server World

Name: X-Gene 1, Atom C2000 and Xeon E3: Exploring the Scale-Out Server World
Item: X-Gene 1, Atom C2000 and Xeon E3: Exploring the Scale-Out Server World
Author: Johan De Gelas

by Johan De Gelas on March 9, 2015 2:00 PM EST

47 Comments | Add A Comment

47 Comments

Memory Subsystem: Latency

To measure latency, we use the open source TinyMemBench benchmark. The source was compiled for x86 with gcc 4.8.2 and optimization was set to "-O2". The measurement is described well by the manual of TinyMemBench:

Average time is measured for random memory accesses in the buffers of different sizes. The larger the buffer, the more significant the relative contributions of TLB, L1/L2 cache misses, and DRAM accesses become. All the numbers represent extra time, which needs to be added to L1 cache latency (4 cycles).

We tested with dual random read, as we wanted to see how the memory system coped with multiple read requests. To keep the graph readable we limited ourselves to the CPUs that were different.

The X-Gene's L2 cache offers slightly better latency than the Atom C2750. That is not surprising as the L2 cache is four times smaller: 256KB vs 1024KB. Still, considering Intel has a lot of experience in building very fast L2 caches and the fact that AMD was never able to match Intel's capabilities, AppliedMicro deserves kudos.

However, the L3 cache seems pretty mediocre: latency tripled and then quadrupled! We are measuring 11-15 cycle latency for the L2 (single read) to 50-80 cycles (single read, up to 100 cycles in dual read) for the L3. Of course, on the C2750 it gets much worse beyond the 1MB mark as that chip has no L3 cache. Still, such a slow L3 cache will hamper performance in quite a few situations. The reason for this is probably that X-Gene links the cores and L3 cache via a coherent network switch instead of a low-latency ring (Intel).

In contrast to the above SoCs, the smart prefetchers of the Xeon E3 keep the latency in check, even at high block sizes. The X-Gene SoC however has the slowest memory controller of the modern SoCs once we go off-chip. Only the old Atom "Saltwell" is slower, where latency is an absolute disaster once the L2 cache (512KB) is not able to deliver the right cachelines.

Bandwidth Single-Threaded Integer Performance

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

47 Comments

View All Comments

gdansk - Monday, March 9, 2015 - link
xgene is not looking so great. Even if it is 50% more efficient as they promise they'll still be behind Atom.
Samus - Monday, March 9, 2015 - link
HP Moonshot chassis are still *drool*
Krysto - Monday, March 9, 2015 - link
The main problem with the non-Intel systems is not only that they use older processes compared to Intel, but that they use older processes even compared to the rest of the non-Intel chip industry. AMD is typically always behind 1 process node among non-Intel chip makers. If they'd at least use the cutting edge processes as they become available from non-Intel processes, maybe they'd stand a chance, especially now that the gap in process technologies is shrinking.
Samus - Monday, March 9, 2015 - link
AMD simply isn't as bad as people continually make them out to be. Yes, they're "behind" Intel but it's all in the approach. We are talking about two engineering houses that share nothing in common but a cross licensing agreement. AMD has very competitive CPU's to Intel's i5's for nearly half the price, but yes, they use more power (at times 1/3 more.)

But facts are facts: AMD is the second high-tech CPU manufacture in the world. Not Qualcomm, not Samsung. It's pretty obvious AMD engineering talent spreads more diversity than anyone other than Intel, and potentially superior to Intel on GPU design (although this has obviously been shifting over the years as Intel hires more "GPU talent.")

AMD in servers is a hard pill to swallow though. If purchasing based on price alone, it can be a compelling alternative, but for rack space or low-energy computing?
Taneli - Tuesday, March 10, 2015 - link
AMD doesn't even make it in top 10 semiconductor companies in sales. Qualcomm is three, Samsung semicondutors six and Intel almost ten times the size of AMD.

Outside of the gaming consoles they are being completely overrun by competition.
owan - Tuesday, March 10, 2015 - link
I'm sorry, at one point I was an AMD fanboy, back when they actually deserved it based on their products, but you just sound like an apologist. Facts are the facts, FX processors aren't competitive with i5's in performance or power or performance/$ because they get smacked so hard they can't be cheap enough to make up for it. Their CPU designs are woefully out of date, their APU's are bandwidth starved and use way too much power to be useful in the one place they'd be great (mobile), and their lagging process tech means theres not much better coming on the horizon. I don't want to see them go, but at the rate ARM is eating up general computing share, it won't be long before AMD becomes completely irrelevant. It will be Intel vs. ARM and AMD will be an afterthought.
xenol - Wednesday, March 11, 2015 - link
Qualcomm is used in pretty much used in most cell phones in the US to the point you'd think Qualcomm is the only SoC manufacturer. I'm pretty sure that's also how it looks in most of the other markets as Korea. Plus even if their SoCs aren't being used, they're modems are heavily used.

If anything, Qualcomm is bigger than AMD. Or rather, Qualcomm is the Intel of the SoC market.
xenol - Wednesday, March 11, 2015 - link
[Response to myself since I can't edit]
Qualcomm's next major competitor is Apple. But that's about it.

Also I meant to say other markets except Korea.
CajunArson - Monday, March 9, 2015 - link
Bear in mind that the Atom parts were commercially available in 2013, so they are by no means brand-new technology and the 14nm Atom upgrades will definitely help power efficiency even if raw performance doesn't jump a whole lot.

Anandtech is also a bit behind the curve because Intel is about to release Xeon-D (8 Broadwell cores and integrated I/O in a 45 watt TDP, or lower), which is designed for exactly this type of workload and is going to massively improve performance in the low-power envelope sphere:

http://techreport.com/review/27928/intel-xeon-d-br...
SarahKerrigan - Monday, March 9, 2015 - link
14nm server Atom isn't coming.

http://www.eetimes.com/document.asp?doc_id=1325955

"Atom will become a consumer only SoC."

X-Gene 1, Atom C2000 and Xeon E3: Exploring the Scale-Out Server World

Memory Subsystem: Latency

Post Your Comment

47 Comments

View All Comments

gdansk - Monday, March 9, 2015 - link

Samus - Monday, March 9, 2015 - link

Krysto - Monday, March 9, 2015 - link

Samus - Monday, March 9, 2015 - link

Taneli - Tuesday, March 10, 2015 - link

owan - Tuesday, March 10, 2015 - link

xenol - Wednesday, March 11, 2015 - link

xenol - Wednesday, March 11, 2015 - link

CajunArson - Monday, March 9, 2015 - link

SarahKerrigan - Monday, March 9, 2015 - link

Log in

Don't have an account? Sign up now