First Impressions

Due to bad luck and timing issues, we have not been able to test the latest Intel and AMD server CPUs in our most demanding workloads. However, the tests we were able to run show that AMD is offering a product that pushes past Intel on performance and steals the show on performance-per-dollar.

For those with little time: at the high end with socketed x86 CPUs, AMD offers 50 to 100% higher performance at a 40% lower price. Unless you go for the low-end server CPUs, there is no contest: AMD offers much better performance for a much lower price than Intel, with more memory channels and over 2x the number of PCIe lanes. These are also PCIe 4.0 lanes. What if you want more than 2 TB of RAM in your dual-socket server? The discount in favor of AMD just became 50%.
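To make the performance-per-dollar claim concrete, here is a minimal sketch of the arithmetic, using only the rounded figures quoted above (50 to 100% more performance, a 40% or 50% lower price). The function name `perf_per_dollar_ratio` is ours, purely for illustration, and the inputs are headline numbers rather than measured results for any specific SKU pair.

```python
def perf_per_dollar_ratio(perf_gain: float, price_cut: float) -> float:
    """Relative performance-per-dollar of chip A versus chip B.

    perf_gain: fractional performance advantage of A (0.5 means 50% faster).
    price_cut: fractional price advantage of A (0.4 means 40% cheaper).
    """
    if not 0.0 <= price_cut < 1.0:
        raise ValueError("price_cut must be in [0, 1)")
    # A delivers (1 + perf_gain) units of work for (1 - price_cut) units of money.
    return (1.0 + perf_gain) / (1.0 - price_cut)


if __name__ == "__main__":
    # Rounded figures from the text; illustrative only.
    for price_cut in (0.4, 0.5):
        low = perf_per_dollar_ratio(0.5, price_cut)
        high = perf_per_dollar_ratio(1.0, price_cut)
        print(f"{price_cut:.0%} cheaper: {low:.1f}x to {high:.1f}x performance per dollar")
```

Under these assumptions, the advantage works out to roughly 2.5x to 3.3x at a 40% lower price, and 3x to 4x once the discount reaches 50%.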

We can only applaud this with enthusiasm, as it empowers all the professionals who do not enjoy the same negotiating power as the Amazons, Azures, and other large-scale players of this world. Spend about $4k and you get 64 second generation EPYC cores. The 1P parts are even better deals for those on a tight budget.

So has AMD done the unthinkable? Beaten Intel by such a large margin that there is no contest? For now, based on our preliminary testing, that is the case. The launch of AMD's second generation EPYC processors is nothing short of historic, beating the competition by a large margin in almost every metric: performance, performance per watt and performance per dollar.  

Analysts in the industry have stated that AMD expects to double their share in the server market by Q2 2020, and there is every reason to believe that AMD will succeed. The AMD EPYC is an extremely attractive server platform with an unbeatable performance per dollar ratio. 

Intel's most likely immediate defense will be lowering its prices for a select number of important customers, which won't be made public. The company is also likely to showcase its 56-core Xeon Platinum 9200 series processors, which aren't socketed, are only available from a limited number of vendors, and are listed without pricing, so there's no firm way to determine the value of those processors. Ultimately, if Intel wanted a core-for-core comparison here, we would have expected them to reach out and offer a Xeon 9200 system to test. That didn't happen. But keep an eye on Intel's messaging in the next few months.

As you know, Ice Lake is Intel's most promising response, and that chip will be available sometime in mid-2020. Ice Lake promises 18% higher IPC and eight instead of six memory channels, and it should be able to offer 56 or more cores in a reasonable power envelope, as it will use Intel's most advanced 10 nm process. The big questions will be around the implementation of the design: whether it uses chiplets, how the memory works, and the frequencies it can reach.

Overall, AMD has done a stellar job. The city may be built on seven hills, but Rome's 8x8-core chiplet design is truly a cultural phenomenon of the semiconductor industry.

We'll be revisiting more big data benchmarks through August and September, and hopefully have individual chip benchmark reviews coming soon. Stay tuned for those as and when we're able to acquire the other hardware.

Can't wait? Then read our interview with AMD's SVP and GM of the Datacenter and Embedded Solutions Group, Forrest Norrod, where we talk about Naples, Rome, Milan, and Genoa. It's all coming up EPYC.

An Interview with AMD’s Forrest Norrod: Naples, Rome, Milan, & Genoa

184 Comments

  • MarcusTaz - Wednesday, August 7, 2019

    Another site's article that starts with an F stated that Rome runs hot and uses 1.4 volts, above TSMC's recommended 1.3 volts. Did you need to run 1.4 volts for these tests?
  • evernessince - Wednesday, August 7, 2019

    Well 1st, that 1.3v figure is from TSMC's mobile focused 7nm LPP node. Zen 2 is made on the high performance 7nm node, not the mobile focused LPP. Whatever publication you read didn't do their homework. TSMC has not published information on their high performance node and I think it rather arrogant to give AMD an F based on an assumption. As if AMD engineers are stupid enough to put dangerous voltages through their CPUs that would result in a company sinking lawsuit. It makes zero sense.

    FYI all AMD 3000 series processors go up to 1.4v stock. Given that these are server processors, they will run hot. After all, more cores = more heat. It's the exact same situation for Intel server processors. The only difference here is that AMD is providing 50 - 100% more performance in the same or less power consumption at 40% less cost.
  • DigitalFreak - Thursday, August 8, 2019

    You reading Fudzilla?
  • Kevin G - Wednesday, August 7, 2019

    AMD is back. They have the performance crown again and have decided to lap the competition with what can be described as an embarrassing price/performance comparison to Intel. The only thing they need to do is be able to meet demand.

    One thing I wish they would have done is added quad socket support. Due to the topology necessary, intersocket bandwidth would be a concern at higher core counts but if you just need lots of memory, those low end 8 core chips would have been fine (think memcache or bulk NVMe storage).

    With the topology improvements, I also would have liked AMD to try something creative: a quad chip + low clocked/low voltage Vega 20 in the same package all linked together via Infinity Fabric. That would be something stunning for HPC compute. I do see AMD releasing some GPU in a server socket at some point for this market as things have been aligning in this direction for some time.

    Supporting something like CCIX or OpenCAPI also would have been nice. A nod toward my previous point, even enabling Infinity Fabric to Vega 20 compute cards instead of PCIe 4.0 would have been yet another big step for AMD as that'd permit full coherency between the two chips without additional overhead.

    I think it would be foolish to ignore AVX-512 for Zen 3, even if the hardware they run it on continues to use 256 bit wide SIMD units. ISA parity is important even if it doesn't inherently show much of a performance gain (though considering the clock speed drops seen in Skylake-SP, if AMD could support AVX-512 at the clocks they're able to sustain at AVX2 on Zen 2, they might pull off an overall throughput win).

    With regards to Intel, they have Cooper Lake due later this year. If Intel was wise, they'd use that as a means to realign their pricing structure and ditch the memory capacity premium. Everything else Intel can do in the short term is flex their strong packaging techniques and push integrated accelerators: on package fabric, FPGA, Optane DIMMs etc. Intel can occupy several lucrative niches in important, growing fields with what they have in-house right now but they need to get them to market and at competitive prices. Otherwise it is AMD's game for the next 12 to 15 months until Ice Lake-SP arrives to bring back the competitive landscape. It isn't even certain that Intel can score a clean win either as Zen 3 based chips may start to arrive in the same time frame.
  • bobdvb - Thursday, August 8, 2019

    I think a four compute node, 2U, dual processor Epyc Rome combined with Mellanox ConnectX-6 VPI, should be quite frisky for HPC.
  • JohanAnandtech - Sunday, August 11, 2019

    "One thing I wish they would have done is added quad socket support. "
    Really? That is an extremely small niche market with very demanding customers. Why would you expect AMD to put so much effort into an essentially dead-end market?
  • KingE - Wednesday, August 7, 2019

    > While standalone compression and decompression are not real world benchmarks (at least as far as servers go), servers have to perform these tasks as part of a larger role (e.g. database compression, website optimization).

    Containerized apps are usually delivered via large, compressed filesystem layers. For latency-sensitive applications, e.g. scale-from-zero serverless, single- and lightly-threaded decompression performance is a larger-than-expected consideration.
  • RSAUser - Thursday, August 8, 2019

    Usually the decompression overhead is minimal there.
  • KingE - Thursday, August 8, 2019

    Sure, if you can amortize it over the life of a container, or can benefit from cached pulls. Otherwise, as is fairly common in an event-based 'serverless' architecture, it's a significant contributor to long-tail latency.
  • Thud2 - Wednesday, August 7, 2019

    Will socket-to-socket IF link bandwidth management allow for better dual GPU performance?