SPEC - MT Performance (16xlarge 64vCPU)

While the core scaling figures are interesting from an academical standpoint, what’s even more interesting is seeing the absolute throughput numbers compared to the competition. We’re starting off with SPECrate results with 64-rate runs, fully utilising the vCPUs of the EC2 16xlarge instances.

Again, there’s the conundrum of the apples-and-oranges comparison between the Graviton2’s 64 physical cores versus the 32 cores plus SMT setups of the AMD and Intel platforms, but again, that’s how Amazon is positioning these systems in terms of throughput capacity and instance pricing. You could argue that if you can parallelise your workload above a certain amount of threads, it doesn’t matter on whether you can achieve the higher throughput through more cores or through mechanisms such as SMT. Remember, when talking about silicon die area, you could at minimum probably fit 2 N1 cores in the same area than an AMD Zen core or an Intel core (probably an even higher number in the latter comparison).

SPECint2006 Rate Estimated Scores (64 vCPU)

The Graviton2’s performance is absolutely impressive across the board, beating the Intel Cascade Lake system by quite larger margins in a lot of the workloads. AMD’s Epyc system here doesn’t fare well at all and is showing its age.

SPECfp2006(C/C++) Rate Estimated Scores (64 vCPU)

It’s particularly in the non-memory bound workloads that the Graviton2 manages to position itself significantly ahead, and here the advantage of having a two-fold physical core lead with essentially double the execution resources shows its benefits.

SPEC2006 Rate-64 Estimated Total (16xlarge)

In the overall SPECrate2006 results, the Graviton2 is shy of Arm’s projection of a 1300 score, but again the Amazon chip does clock in a bit lower and has less cache than what Arm had envisioned in their presentations a year ago.

Nevertheless, the Graviton2 has the performance lead here even against the Intel Cascade Lake based EC2 instances, which is quite surprising given the latter’s cost structure, and indicator of what to come later in the cost analysis.

SPECint2017 Rate Estimated Scores (64 vCPU)

Arm’s physical core count advantage here continues to show in the execution intensive workloads of SPECint2017, showcasing some very large performance leads in many workloads. The performance leap on important workloads such as 502.gcc again isn’t too great over the Intel system for example – Amazon and Arm definitely could do better here if the chip would have had more cache available.

SPECfp2017 Rate Estimated Scores (64 vCPU) (copy)

In SPECfp2017, there’s more workloads in which the Xeon system’s 2-socket setup with a 50% memory channel advantage does show up, able to result in more available bandwidth and thus give the more memory intensive workloads in this suite a good performance advantage over the Graviton2 system. Still, the Arm chip fares very competitively and does put the older AMD EPYC processor in its place, and yes again, we have to remind ourselves that things would be quite different here if we’d be able to include Rome in our charts.

SPEC2017 Rate-64 Estimated Total (16xlarge)

Overall, the Graviton2 system has an undisputed lead in the SPECint2017 suite, whilst just edging out on average the Xeon system in the FP suite, only losing out in situations where the Xeon’s higher memory bandwidth comes at play.

SPEC - Multi-Core Performance Scaling SPEC - MT Performance (4xlarge 16 vCPU)
Comments Locked

96 Comments

View All Comments

  • notladca - Tuesday, March 10, 2020 - link

    I would love to know if the product line has split within Annapurna. In other words whether Graviton2 has, like previous Annapurna SoCs, some interesting support around storage and networking for use in future Nitro. It's possible Amazon has some behind the scenes work going on with CCIX for future machines. For example integrating their Inferentia chip more closely with the SoC.

    Given the core count, it'd also be interesting to compare ML inference acceleration via fp16 and int8 dot product instructions per core vs use of GPU or Inferentia.
  • coder543 - Tuesday, March 10, 2020 - link

    One small bit of feedback: with that CPU topology chart, the coloration seems a little off. A difference of +/- 1 yields very different shades of red and orange, but the same difference on the green side of the spectrum yields no discernible difference in color? Personally, I think all of the 200 +/- 5 values in the first topology chart should be an almost uniform sea of orange/red. The important thing is the 150 difference in latency, not the +/- 1 latency, and the noise in the colors distracts the reader from the primary distinction. A lower signal to noise ratio.

    Also: what is the unit? nanoseconds? microseconds? milliseconds? I can’t figure it out, and it’s not labeled as far as I can tell.
  • Andrei Frumusanu - Tuesday, March 10, 2020 - link

    Nanoseconds, I'll add a remark.
  • sing_electric - Tuesday, March 10, 2020 - link

    My tin hat is telling me to be suspicious of Amazon's pricing here. When shopping for cloud computing, perf/$ becomes VERY alluring, but I have to wonder if Amazon is willing to let its Gravitron servers be a "loss leader," artificially lowering prices to get market share until Arm on server is well-established - before then raising prices to something closer to a economically sustainable number.
  • Andrei Frumusanu - Tuesday, March 10, 2020 - link

    Vertical integration is powerful. Amazon can share profits and margins division wide, not having to pay overhead to AMD/Intel.
  • sing_electric - Tuesday, March 10, 2020 - link

    True, but then Amazon has to pay for the ARM license and 100% of the development/production costs. I would be very surprised if they managed to *make money* on the 1st couple Graviton generations (especially if you factor in having to buy Annapurna), since you'd need to say "of the $X generated by Graviton metal, $Y would have been spent on EC2 anyways, meaning $Z is our actual gain," and that's... probably too much to ask at this stage.
  • rahvin - Tuesday, March 10, 2020 - link

    The costs you mention are nothing compared to what they pay right now with Intel or AMD with they 50% margins on top of the actual cost. IMO this initiative was born out of Intel's price increases from 2010 to now. By vertically integrating they have full control over the price structure and they have very good data on what kind of workloads are running so they can tailor the design.

    IMO it was just a question of time until Amazon tried to vertically integrate this like they've done with shipping and lots of other stuff. Bezos is following the Robber Barron growth model.
  • dotjaz - Wednesday, March 11, 2020 - link

    Huh? AMD has a gross margin of 40%, true. But keep in mind AWS has a operating margin of 30%, that mean AWS has a even higher gross margin than AMD, comparable to AMD's server department.
    Do you know what that means? For $1 of expenditure in to chip manufacturing, AWS expects to earn as much as AMD does. And since AWS don't have the volume as far as chip goes, their gross margin for chip investment will be lower, therefore not worth the investment if the decision is purely financial.

    But yes, the other point stands, AWS have better control of costing (with more leverage as well) and performance.
  • Wilco1 - Wednesday, March 11, 2020 - link

    For every $1 worth of silicon you could pay AMD $1.50, pay Intel $2 or pay TSMC $1 plus $0.20 internal development costs. Which works out best you think?
  • extide - Friday, March 13, 2020 - link

    It's not that simple. AMD and Intel can spread those development costs over vastly more processors. I mean we'll never know how it truly breaks down -- but I'd imagine Amazon has figure this all out and this will be pretty profitable for them.

Log in

Don't have an account? Sign up now