Analyzing Intel's Cascade Lake in the New Era of AI

Wrapping things up, let’s take stock of the second-generation Xeon Scalable’s performance and the features it brings to the table. With Cascade Lake, Intel has improved performance by 3 to 6%, improved security, fixed some incredibly important bugs/exploits, added new SIMD instructions, and improved the overall server platform. Nothing earth-shattering, but you get more for the same price and power envelope, so what’s not to like?

That would have been fine five years ago, when AMD did not have anything like the Zen(2) architecture, ARM vendors were still struggling with cores that offered painfully slow single-threaded performance, and deep learning was in its early stages. But this is not 2014, when Intel outperformed the nearest competition by a factor of 3! Ultimately, Cascade Lake delivers in areas where CPUs – and only CPUs – do well. But even with Intel’s DL Boost efforts, it’s not enough if the new chips have to go head-to-head with a GPU in a task the latter is actually good at.

The reality is that Intel's datacenter group is under tremendous pressure from all sides, and the numbers are showing it. For the first time in years, the group's revenue dropped, even as the overall server market kept growing.

It has been going on for a while, but as we’ve experienced firsthand, machine learning-based AI applications are being rolled out successfully, and they are a game changer for both software and hardware. As a result, future server CPU reviews will never quite be the same: it is no longer just Intel versus AMD or even ARM, but NVIDIA too. NVIDIA is extremely successful in the deep learning market, and confident enough to take on Intel in areas where Intel has dominated for years: HPC, machine learning, and even data processing. NVIDIA is ready to accelerate a much larger part of the data pipeline and a wider range of AI applications.

Features found in Intel Cascade Lake like DL Boost (VNNI) are the first attempts by Intel to push back – to cut away at the massive advantage that NVIDIA has in inference performance. Meanwhile, the next Xeon – Cooper Lake – will try to get closer to NVIDIA in training performance. 
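
To put DL Boost in concrete terms: VNNI essentially boils down to a handful of new AVX-512 instructions, most notably VPDPBUSD, which fuses the multiply-and-accumulate chain of an 8-bit dot product into a single operation. Below is a minimal sketch of that primitive, not Intel's library code; it assumes a VNNI-capable CPU and a compiler flag such as -mavx512vnni, and the values are purely illustrative:

    #include <immintrin.h>
    #include <cstdint>
    #include <cstdio>

    int main() {
        // 64 unsigned 8-bit "activations" and 64 signed 8-bit "weights".
        alignas(64) uint8_t act[64];
        alignas(64) int8_t  wgt[64];
        for (int i = 0; i < 64; ++i) { act[i] = 2; wgt[i] = 3; }

        __m512i a   = _mm512_load_si512(act);
        __m512i w   = _mm512_load_si512(wgt);
        __m512i acc = _mm512_setzero_si512();

        // VPDPBUSD: multiply u8 by s8 pairwise and accumulate groups of four
        // products into each of the 16 int32 lanes -- one instruction where
        // pre-VNNI AVX-512 cores needed a three-instruction sequence.
        acc = _mm512_dpbusd_epi32(acc, a, w);

        alignas(64) int32_t out[16];
        _mm512_store_si512(out, acc);
        printf("lane 0 = %d\n", out[0]);  // four products of 2*3 = 24
        return 0;
    }

In practice this instruction is reached through libraries such as Intel's MKL-DNN rather than hand-written intrinsics, but it is the building block that quantized (INT8) inference kernels on Cascade Lake are made of.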

Moving on: when we saw Intel’s next slide, we were left gasping for air.

The slide, boasting "leadership performance", also conveniently describes the markets where Intel is in a very vulnerable position, despite its currently dominant position in the datacenter. And although the slide focuses on the Intel Xeon 9200, it could just as easily be a slide for the high-end Platinum 8200 Xeons.

Intel points towards HPC, AI, and high-density infrastructure to sell its massively expensive Xeons. But as the market shifts away from traditional business intelligence towards machine learning, and towards more GPU-accelerated HPC, the market for high-end Xeons is shrinking. Intel has a very broad AI portfolio, from Movidius (edge inference) to the Nervana NNP (an ASIC for DL training), and it is going to need it to replace the Xeon in the segments where the latter is losing ground.

A midrange Xeon combined with a Nervana NNP coprocessor might work out well – and it would definitely be a better solution for most AI applications than a Xeon 9200. And the same is true for HPC: we are willing to bet that you are much better off with midrange Xeons and a fast NVIDIA GPU. And depending on where AMD's EPYC 2 pricing goes, even that might end up being debatable...

Comments (56)

  • ballsystemlord - Saturday, August 3, 2019

    Spelling and grammar errors:

    "But it will have a impact on total energy consumption, which we will discuss."
    "An" not "a":
    "But it will have an impact on total energy consumption, which we will discuss."

    "We our newest servers into virtual clusters to make better use of all those core."
    Missing "s" and missing word. I guessed "combine".
    "We combine our newest servers into virtual clusters to make better use of all those cores."

    "For reasons unknown to us, we could get our 2.7 GHz 8280 to perform much better than the 2.1 GHz Xeon 8176."
    The 8280 is only slightly faster in the table than the 8176. It is the 8180 that is missing from the table.

    "However, since my group is mostly using TensorFlow as a deep learning framework, we tend to with stick with it."
    Excess "with":
    "However, since my group is mostly using TensorFlow as a deep learning framework, we tend to stick with it."

    "It has been observed that using a larger batch can causes significant degradation in the quality of the model,..."
    Remove plural form:
    "It has been observed that using a larger batch can cause significant degradation in the quality of the model,..."

    "...but in many applications a loss of even a few percent is a significant."
    Excess "a":
    "...but in many applications a loss of even a few percent is significant."

    "LSTM however come with the disadvantage that they are a lot more bandwidth intensive."
    Add an "s":
    "LSTMs however come with the disadvantage that they are a lot more bandwidth intensive."

    "LSTMs exhibit quite inefficient memory access pattern when executed on mobile GPUs due to the redundant data movements and limited off-chip bandwidth."
    "pattern" should be plural because "LSTMs" is plural, I choose an "s":
    "LSTMs exhibit quite inefficient memory access patterns when executed on mobile GPUs due to the redundant data movements and limited off-chip bandwidth."

    "Of course, you have the make the most of the available AVX/AVX2/AVX512 SIMD power."
    "to" not "the":
    "Of course, you have to make the most of the available AVX/AVX2/AVX512 SIMD power."

    "Also, this another data point that proves that CNNs might be one of the best use cases for GPUs."
    Missing "is":
    "Also, this is another data point that proves that CNNs might be one of the best use cases for GPUs."

    "From a high-level workflow perfspective,..."
    A joke, or a misspelling?

    "... it's not enough if the new chips have to go head-to-head with a GPU in a task the latter doesn't completely suck at."
    Traditionally, AT has had no foul language.
    "... it's not enough if the new chips have to go head-to-head with a GPU in a task the latter is good at."

    "It is been going on for a while,..."
    "has" not "is":
    "It has been going on for a while,..."
  • ballsystemlord - Saturday, August 3, 2019

    Thanks for the cool article!
  • tmnvnbl - Tuesday, August 6, 2019

    Great read, especially liked the background and perspective next to the benchmark details
  • dusk007 - Tuesday, August 6, 2019

    Great Article.
    I wouldn't call Apache Arrow a database though. It is a data format, more akin to a file format like CSV or Parquet. It is not something that stores data for you and hands it back; it defines how to store data in memory, just as CSV or Parquet define how to store data in files. That makes access from different runtimes (TensorFlow, Spark, Pandas, ...) more efficient, with less redundancy and less overhead. (See the short sketch after the comments.)

    Love the article, I hope we get more of those. It also shows that huge performance optimizations are possible in this field through software alone. Often, though, renting compute in the cloud is cheaper than the man-hours required to optimize.
  • Emrickjack - Thursday, August 8, 2019

    Johan's first piece in 14 months! Looking forward to your Rome review
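
A footnote on the Apache Arrow discussion above: the following is a minimal sketch of what "an in-memory format, not a database" means in practice, using the stock Arrow C++ API (the column name and values are illustrative). The builder lays the values out in Arrow's columnar layout; any Arrow-aware runtime can then consume the same buffers without a conversion step:

    #include <arrow/api.h>
    #include <iostream>
    #include <memory>

    int main() {
        // Build one int64 column directly in Arrow's in-memory columnar
        // layout -- the RAM analogue of writing a Parquet or CSV file.
        arrow::Int64Builder builder;
        if (!builder.AppendValues({3, 6, 9}).ok()) return 1;

        std::shared_ptr<arrow::Array> column;
        if (!builder.Finish(&column).ok()) return 1;

        // A "table" is just named columns over those shared buffers.
        // Spark, pandas, TensorFlow I/O, and friends can all read them
        // without copying or re-encoding; Arrow itself stores nothing.
        auto schema = arrow::schema({arrow::field("latency_ms", arrow::int64())});
        auto table  = arrow::Table::Make(schema, {column});
        std::cout << table->ToString() << std::endl;
        return 0;
    }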
