We only recently reported that Amazon was designing a custom server SoC based on Arm’s Neoverse N1 CPU platform, and the company has now officially announced the new Graviton2 processor as well as AWS instances based on the new hardware.


(Image: AWS re:Invent event, via Twitter)

The new Graviton2 SoC is a custom design by Amazon’s own in-house silicon design teams and is the successor to the first-generation Graviton chip. The new chip quadruples the core count from 16 to 64 cores and employs Arm’s newest Neoverse N1 cores. Amazon is using the highest-performance configuration available, with 1MB of L2 cache per core; all 64 cores are connected by a mesh fabric with 2TB/s of aggregate bandwidth and share 32MB of L3 cache.

Amazon claims the new Graviton2 chip can deliver up to 7x higher total performance across all cores than the first-generation-based A1 instances, up to 2x the performance per core, and up to 5x faster memory access compared to its predecessor. The chip comes in at a massive 30B transistors on a 7nm manufacturing node - if Amazon is using high-density libraries similar to those of mobile chips (it has no reason to use HPC libraries), then I estimate the chip to fall around 300-350mm² if I were forced to put out a figure.
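
For those curious how that rough figure falls out, below is a minimal back-of-the-envelope sketch; the assumed logic density range (85-100 MTr/mm² for a mobile-style 7nm implementation) is my own assumption, not a number Amazon has disclosed.

```python
# Rough die-area estimate for a 30B-transistor chip on TSMC 7nm.
# The density range (85-100 million transistors per mm^2) is an assumption
# based on mobile SoCs on the same node, not a figure Amazon has disclosed.
transistors = 30e9

for density_mtr_per_mm2 in (85, 100):
    area_mm2 = transistors / (density_mtr_per_mm2 * 1e6)
    print(f"At {density_mtr_per_mm2} MTr/mm^2: ~{area_mm2:.0f} mm^2")
# Prints ~353 mm^2 and ~300 mm^2, bracketing the 300-350mm² estimate above.
```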

The memory subsystem of the new chip is fed by 8 DDR4-3200 channels with support for hardware AES-256 memory encryption. The system’s peripherals are served by 64 PCIe 4.0 lanes.
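
As a quick sanity check on what eight DDR4-3200 channels imply, here is the peak theoretical bandwidth, assuming standard 64-bit (8-byte) channels - Amazon hasn't quoted this figure itself.

```python
# Peak theoretical DRAM bandwidth for 8 channels of DDR4-3200,
# assuming standard 64-bit (8-byte wide) channels.
channels = 8
transfers_per_sec = 3200e6   # DDR4-3200 = 3200 MT/s per channel
bytes_per_transfer = 8       # 64-bit channel width

peak_gb_s = channels * transfers_per_sec * bytes_per_transfer / 1e9
print(f"Peak theoretical bandwidth: ~{peak_gb_s:.1f} GB/s")  # ~204.8 GB/s
```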

Powered by the new generation processor, Amazon also detailed its new 6th generation instances M6g, R6g and C6g, offering various configurations up to the full 64 cores of the chip and up to 512GB of RAM for the memory-optimised instance variants. The instances offer 25Gbps of “enhanced networking” connectivity as well as 18Gbps of bandwidth to EBS (Elastic Block Store).
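
For readers who want to experiment once the instances are broadly available, a minimal boto3 sketch for launching one is shown below; the instance size, region, and AMI ID are placeholders/assumptions on my part, and the image must be an arm64 (AArch64) build rather than an x86_64 one.

```python
# Minimal sketch: launching a Graviton2-based M6g instance with boto3.
# The AMI ID is a placeholder and must point to an arm64 image (e.g. an
# AArch64 build of Amazon Linux 2); instance size and region are assumptions.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: substitute a real arm64 AMI
    InstanceType="m6g.16xlarge",      # assumed size exposing the full 64 vCPUs
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```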

Amazon is also making some very impressive benchmark comparisons against its fifth-generation instances, which use Intel’s Xeon Platinum 8175 processors running at up to 2.5GHz (a sketch of one way to interpret the per-vCPU figures follows the list):

  All of these performance enhancements come together to give these new instances a significant performance benefit over the 5th generation (M5, C5, R5) of EC2 instances. Our initial benchmarks show the following per-vCPU performance improvements over the M5 instances:
  • SPECjvm® 2008: +43% (estimated)
  • SPEC CPU® 2017 integer: +44% (estimated)
  • SPEC CPU 2017 floating point: +24% (estimated)
  • HTTPS load balancing with Nginx: +24%
  • Memcached: +43% performance, at lower latency
  • X.264 video encoding: +26%
  • EDA simulation with Cadence Xcelium: +54%
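
Amazon doesn't spell out the exact methodology behind these per-vCPU numbers; a plausible interpretation - and the one assumed in the short sketch below, with made-up scores - is total benchmark throughput divided by the vCPU count of each instance.

```python
# Hypothetical per-vCPU comparison: total throughput divided by vCPU count.
# The scores below are made-up placeholders, not actual SPEC or AWS results.
def per_vcpu_gain(total_m6g, vcpus_m6g, total_m5, vcpus_m5):
    """Per-vCPU improvement of an M6g run over an M5 run, as a fraction."""
    return (total_m6g / vcpus_m6g) / (total_m5 / vcpus_m5) - 1.0

# Example with placeholder scores for two 64-vCPU (16xlarge) instances:
gain = per_vcpu_gain(total_m6g=144.0, vcpus_m6g=64, total_m5=100.0, vcpus_m5=64)
print(f"Per-vCPU improvement: {gain:+.0%}")  # +44% with these placeholder numbers
```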

Amazon is making M6g instances with the new Graviton2 processor available in preview for non-production workloads, with a wider rollout expected in 2020.

The announcement is a big win for Amazon and especially for Arm’s endeavours in the server space, as Arm tries to surpass the value that the x86 incumbents are able to offer. Amazon says the new 6g instances are able to offer 40% higher performance/$ than the existing x86-based 5th generation platforms, which represents some drastic cost savings for the company and its customers.



43 Comments


  • Antony Newman - Tuesday, December 3, 2019

    Ultimate multicore performance for single SoC x86 is being limited by dark silicon on Intel 14nm.
    For a 64 Core Intel monster - they need their (Intel) 7nm process - or a multi SoC solution.

    When TSMC’s 5nm ovens are ready, Amazon will be able to use Arm’s next cores, which will close the per-core performance gap - but allow considerably more cores before bottlenecking occurs.

    A 128-core Arm Poseidon SoC on TSMC 5nm could very well eclipse a 64-core Intel CPU baked on Intel 7nm - and cost Amazon a fraction of the price.

    AJ
  • mdriftmeyer - Wednesday, December 4, 2019

    When TSMC's 5nm is ready AMD's future Zen cores will curb stomp anything ARM can offer, like they already do.

    Language is a funny thing: “New Generation of ARM-based instances powered by AWS Graviton2 processors offer 40% better price/performance than current x86-based instances.”

    A. That's 40% over previous Graviton processor nodes. BFD.
    B. Our upcoming x86-based instances drastically knee cap our current x86-based instances in price/performance but we won't say that as we're trying to sell our own schtick here.
  • Gondalf - Friday, December 6, 2019

    TSMC 5nm does not give enough of an area advantage over 7nm to allow Poseidon. 5nm is more like a half node.
  • mode_13h - Tuesday, December 3, 2019

    Just to nitpick, Purley is Intel's LGA 3647-based platform spec - not the core uArch or anything like that.
  • techbug - Friday, December 6, 2019

    How per-vCPU is calculated is totally over my head. Is it the total score on the Intel processor divided by the number of hardware threads (96, 2 * 48 threads/socket) compared against the ARM processor's score divided by 128 (2 * 64 threads/socket)?
  • name99 - Tuesday, December 3, 2019

    How many people buying AWS services care about latency rather than throughput?
    Sure, you need to hit a minimum per-core performance level, but once that's achieved what matters is the throughput/dollar (including e.g. rack volume and watts).

    Judging a design like this by metrics appropriate to the desktop is just silly.
  • ksec - Wednesday, December 4, 2019

    It doesn't matter: you get 1 thread per Intel vCPU, and 1 core per ARM vCPU. The units are the same. Not to mention a lot of clients and workloads like to have HT disabled.

    As long as the ARM vCPU is cheaper (which it is) and provides comparable performance (which it does; according to AWS it is 30% faster than a single-thread 3.1GHz Skylake), then that is all that matters.
  • Sychonut - Tuesday, December 3, 2019

    Now imagine this, but on 14++++++.
  • name99 - Tuesday, December 3, 2019

    The numbers seem a bit strange, Andrei. I assume we all agree that, while this is a nice step forward in the ARM server space, the individual cores are no Lightnings.
    So let's look at area; TSMC 7nm, so basically like for like:

    IF one chip has 32 cores (per yesterday's article) then one core (+support ie L3 etc) is ~10mm^2.
    Meanwhile Apple is about 16mm^2 (eyeballing it as about 1/6th of the die for the 2 large + small cores, plus L2s plus the large system cache).
    So Apple seems to be getting a LOT more out of their die... Even putting aside the small cores, their per-big-core figure (+ LOTS of cache) is ~8mm^2.

    Of course DRAM PHYs take some space, but mainly around the edges.
    So possibilities?
    - 64 cores on the die, not 32? AND/OR
    - LOTS of IO? A few ethernet phy's, some flash controllers, some USB and PCIe?
    - lots of the die devoted to GPU/NPU?

    The only way I can square it is that all three are likely true. Half the die is IO+GPU/NPU (which gets us to 5mm^2/core) AND there are actually 64 cores? WikiChip says an N1+L2 is supposed to be around 1.4mm^2 on 7nm, so throw in the L3 and the numbers kinda work out.
  • ksec - Wednesday, December 4, 2019

    They are 32 cores, not 64.

    I/O takes up more space and does not scale well with node changes. Yes, there are a lot of I/O needs for servers, especially PCIe lanes.
