The eMAG 8180: AppliedMicro's Legacy Skylark Core

While you’re reading this in 2020, and the eMAG Workstation was released in 2019, the CPU powering the system is actually quite old, tracing its roots back to AppliedMicro, defunct since 2017. Originally meant to be called the X-Gene3, the chip had been planned for the second half of 2017, but AppliedMicro went through several changes of ownership before the IP and designs ended up with Ampere Computing.

In that sense, the eMAG 8180 is more of a legacy design and quite distantly related to Ampere’s newer Altra system processors.

The Skylark cores in the eMAG 8180 are a custom design with X-Gene processor pedigree. It’s a 4-wide OOO core that’s relatively narrow by today’s standards, characterised by fairly high operating frequencies of 3 to 3.3GHz and a rather unusual cache hierarchy, in which cores are grouped in pairs that share a 256KB L2 cache.

At the chip level, the CPU is characterised by a large coherent network tying together all the CPU modules, the memory controllers, and a large 32MB L3 cache.

What’s surprising here is that the core-to-core latency across the whole chip isn’t bad at all, ranging from 68 to 73ns. While this certainly doesn’t keep up with more recent monolithic designs, this is an Armv8.0 core lacking CAS atomic operations, so these figures were measured with regular load-exclusive / store-exclusive sequences, which aren’t as fast. Routing coherency through the 32MB L3 cache certainly helps the system punch above its weight for a design of its time.

The CPU cores have 32KB L1 instruction and data caches, with access latencies of 5 cycles. The 256KB L2 caches have a 13-cycle access latency, while the 32MB L3 cache has massive access latencies of 45ns and more, much slower than any other comparable design out there.
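
To put the cycle counts and the nanosecond figure on a common scale, the conversion can be sketched as below. This is a minimal illustration rather than part of the original testing: the function names are made up for the example, and the 3.0GHz and 3.3GHz clocks are simply the two ends of the frequency range quoted earlier.

```python
# Illustrative helpers; the names are hypothetical and not taken from any
# benchmark tool. Clock speeds are assumed to be 3.0GHz and 3.3GHz.

def cycles_to_ns(cycles, freq_ghz):
    """Latency in nanoseconds for a cycle count at a clock given in GHz."""
    return cycles / freq_ghz


def ns_to_cycles(ns, freq_ghz):
    """Number of core cycles that elapse during `ns` nanoseconds."""
    return ns * freq_ghz


for freq in (3.0, 3.3):
    l1_ns = cycles_to_ns(5, freq)    # 5-cycle L1
    l2_ns = cycles_to_ns(13, freq)   # 13-cycle L2
    l3_cycles = ns_to_cycles(45, freq)  # 45ns is the lower bound quoted for L3
    print(f"{freq:.1f}GHz: L1 {l1_ns:.2f}ns, L2 {l2_ns:.2f}ns, "
          f"L3 at least {l3_cycles:.0f} cycles")
```

At either clock, 45ns works out to well over 100 core cycles, which is why the L3 figure stands out next to the 13-cycle L2.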

We note the core’s L1 TLB ends at 48 pages (192KB) and the L2 TLB at 1024 pages (4MB), after which page-miss access times increasingly result in worse latencies.
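
The coverage figures in parentheses follow from multiplying the entry count by the page size. A small sketch of that arithmetic, assuming 4KB pages (which is what the 192KB and 4MB figures imply); the helper name is hypothetical:

```python
# TLB "reach" = number of entries x page size.
# Assumption: 4KB pages, as implied by the 192KB / 4MB figures in the text.
KB = 1024
MB = 1024 * KB
PAGE_SIZE = 4 * KB


def tlb_reach_bytes(entries, page_size=PAGE_SIZE):
    """Bytes of address space a TLB level can map without a miss."""
    return entries * page_size


l1_reach = tlb_reach_bytes(48)     # L1 TLB: 48 entries
l2_reach = tlb_reach_bytes(1024)   # L2 TLB: 1024 entries

print(f"L1 TLB reach: {l1_reach // KB}KB")
print(f"L2 TLB reach: {l2_reach // MB}MB")
```

The same helper shows how larger pages would stretch the reach, for example `tlb_reach_bytes(1024, 64 * KB)` for a hypothetical 64KB page configuration.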

In contrast with the quite large cache access latencies, the DRAM access latency isn’t all that bad at around 137ns full random at 128MB depth.
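
"Full random" latency figures like this are typically obtained with a pointer chase, where every load depends on the result of the previous one and the visiting order is a random cycle through the buffer. The sketch below shows that general technique only; it is not the tool used for the numbers above, the function names are invented for the example, and in Python the interpreter overhead dominates the timing, so the measured values are not comparable to the article's.

```python
import random
import time


def build_random_cycle(n, seed=0):
    """Return a list `nxt` where following nxt[i] visits all n slots once
    before returning to the start (Sattolo's algorithm)."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)          # 0 <= j < i keeps it a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps, start=0):
    """Follow the chain for `steps` dependent loads; return the final index."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]
    return idx


def ns_per_step(nxt, steps):
    """Average wall-clock time per dependent access, in nanoseconds."""
    t0 = time.perf_counter_ns()
    chase(nxt, steps)
    return (time.perf_counter_ns() - t0) / steps


chain = build_random_cycle(1 << 16)
print(f"{ns_per_step(chain, len(chain)):.1f}ns per step (interpreter-bound)")
```

A native implementation would size the buffer to the test depth (128MB here) and space the elements at least a cache line apart, so that each step is a genuine miss at the level being measured.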

Single-core bandwidth of the Skylark cores isn’t too pretty: load and store bandwidth into the L1 and L2 seems to be limited to 8B/cycle each, with a combined 16B/cycle for concurrent loads and stores. The dip between the L2 and L3 is usually a sign of a bandwidth bottleneck when evicting/replacing a cacheline, and the load bandwidth at the DRAM level is also quite disappointing.
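
For a sense of scale, the per-cycle widths translate into peak GB/s figures as sketched below. This is back-of-the-envelope arithmetic, not a measurement: it assumes the core sustains its full width every cycle, uses the 3.0GHz and 3.3GHz clocks quoted earlier, counts 1GB as 10^9 bytes, and the function name is hypothetical.

```python
# Peak bandwidth implied by a bytes-per-cycle limit.
# bytes/cycle x 10^9 cycles/s (per GHz) = GB/s, with 1GB = 10^9 bytes.

def peak_gb_per_s(bytes_per_cycle, freq_ghz):
    """Upper bound on bandwidth if the width is sustained every cycle."""
    return bytes_per_cycle * freq_ghz


for freq in (3.0, 3.3):
    one_way = peak_gb_per_s(8, freq)     # loads only, or stores only
    combined = peak_gb_per_s(16, freq)   # concurrent loads and stores
    print(f"{freq:.1f}GHz: {one_way:.1f}GB/s one-way, "
          f"{combined:.1f}GB/s combined")
```

These are ceilings for the L1 and L2; real measured bandwidth would sit at or below them.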

Overall, the performance here is only half that of a more modern Arm core, but again, this is a 2015-2016 core design.


35 Comments


  • vFunct - Friday, May 22, 2020 - link

    They really need ARM systems that are a little higher than Raspberry-PI but a little lower than x86, perhaps in the $100-$200 price range, for personal network appliances.
  • Death666Angel - Friday, May 22, 2020 - link

    I'd be interested in what you would use that one for? And why exactly those specs? Lower power than x86 at "good enough" performance levels? If that is the base, why not do an undervolted / down clocked x86 build? Ryzen can get to some pretty great voltage/frequency levels. :D Or is it the ATX form factor as well? That one is a bit trickier, either go with a 12/19V native motherboard or get a nice pico PSU with ATX cables and a 12/19V input. :) Unless I'm way off base in my assumptions. :D
  • lmcd - Friday, May 22, 2020 - link

    I haven't used an RPi 4 yet but I'd be willing to bet the 4GB variant would meet vFunct's needs.
  • vFunct - Friday, May 22, 2020 - link

    Network file server with ZFS, Or, a mail server.

    Need storage & memory, but don't need intense CPU
  • Wilco1 - Friday, May 22, 2020 - link

    It's worth pointing out for future reviews that GCC 10 is out and shows a 10.5% performance gain on Neoverse N1: https://community.arm.com/developer/tools-software...
  • SarahKerrigan - Friday, May 22, 2020 - link

    Doesn't mean much for eMag, though.
  • Wilco1 - Saturday, May 23, 2020 - link

    Indeed, eMag is quite old, so it won't benefit nearly as much as the latest microarchitectures.
  • GreenReaper - Sunday, May 24, 2020 - link

    The graph at the end suggests that 10.0 was a significant regression for many tests, though, so that should probably be taken with a pinch of salt. <^_^>

    There are some tests (mostly vectorization-related) where it's really helped, though.
  • mrvco - Friday, May 22, 2020 - link

    Out of curiosity, how would the performance of the eMag compare to a typical single-board ARM computer? My reference point would be the RPi3 or 4, but there seem to be a variety of others ranging up to a couple hundred dollars with (allegedly) 'better' performance than the RPi.
