Testing Notes

As the market stands, it is clear that alongside AMD and ARM, NVIDIA's professional offerings are a real threat to Intel's dominance in the datacenter and beyond. So for our testing today, we're going to focus on machine learning, and see just how Intel's new DL Boosted wares fare against the competition in the ML space.

On the Intel side of matters, of course, we're looking at the company's new Cascade Lake Xeon Scalable CPUs. The company provided two of their 28-core models: the 165 Watt Xeon Platinum 8176 of the first (Skylake-SP) generation, as well as the even faster 205 Watt Cascade Lake Xeon Platinum 8280.

As for Cascade Lake's GPU competition, we've tapped NVIDIA's latest "Turing" Titan RTX card. While these aren't truly datacenter cards, the fact that they're based on Turing means that they offer NVIDIA's very latest features. At the university I work for, our deep learning researchers use these GPUs to train AI models, as the Titan cards are affordable and have a lot of GPU memory available.

As an added bonus, Titan RTX cards can be used both for training (hybrid FP32/FP16) and for inference (FP16 and INT8). The current Tesla flagship is still based on NVIDIA's Volta architecture, whose Tensor Cores do not offer the INT8 mode that Turing brings to inference.
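
To make that "hybrid FP32/FP16" training mode concrete, below is a minimal PyTorch sketch of mixed-precision training with automatic mixed precision (AMP). It is purely illustrative and is not the benchmark code used in this article: the toy model, the synthetic data, and the hyperparameters are placeholders, and a CUDA-capable GPU such as the Titan RTX is assumed.

```python
# Minimal sketch of hybrid FP32/FP16 ("mixed precision") training in PyTorch.
# Matrix math runs in FP16 on the Tensor Cores where it is numerically safe,
# while master weights and loss scaling stay in FP32.
import torch
import torch.nn as nn

device = torch.device("cuda")  # assumes a CUDA GPU such as the Titan RTX
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling against FP16 underflow

# Synthetic stand-in for a real DataLoader
loader = [(torch.randn(64, 1024), torch.randint(0, 10, (64,))) for _ in range(10)]

for inputs, targets in loader:
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():      # forward pass in FP16 where safe
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()        # backward on the scaled loss
    scaler.step(optimizer)               # unscale gradients, then update
    scaler.update()
```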

Finally, not to be excluded, we've also included AMD's first-generation EPYC platform in all of our testing. AMD doesn't have an AI hardware strategy quite like Intel's – or inference-specific instructions like VNNI – but as of late the company has offered all sorts of surprises.
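
For readers unfamiliar with VNNI: Intel's DL Boost centers on the AVX-512 VNNI instruction VPDPBUSD, which multiplies unsigned 8-bit activations by signed 8-bit weights, sums each group of four products, and accumulates the result into a 32-bit lane, collapsing what used to be a three-instruction sequence into one. The NumPy sketch below models the arithmetic of a single 32-bit lane; it is only an illustration of the operation, not actual intrinsics and not the code used in our benchmarks.

```python
# Illustrative NumPy model of what a single 32-bit lane of AVX-512 VNNI's
# VPDPBUSD instruction computes: four u8 x s8 products summed and accumulated
# into an int32. Real INT8 inference code relies on compiler intrinsics or an
# INT8-enabled framework (e.g. Intel's MKL-DNN); this only shows the math.
import numpy as np

def vpdpbusd_lane(acc, a_u8, w_s8):
    """Return acc + sum of the four a_u8[i] * w_s8[i] products, as int32."""
    a = a_u8.astype(np.int32)   # unsigned 8-bit activations, widened
    w = w_s8.astype(np.int32)   # signed 8-bit weights, widened
    return np.int32(acc + np.dot(a, w))

acc = np.int32(0)
activations = np.array([12, 200, 7, 255], dtype=np.uint8)
weights = np.array([3, -5, 127, -128], dtype=np.int8)
print(vpdpbusd_lane(acc, activations, weights))  # one INT8 dot-product step
```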

Benchmark Configuration and Methodology

All of our testing was conducted on Ubuntu Server 18.04 LTS. You will notice that the DRAM capacity varies among our server configurations. This is of course a result of the fact that the Xeons have access to six memory channels, while the EPYC CPUs have eight. As far as we know, all of our tests fit in 128 GB, so DRAM capacity should not have much influence on performance, but it will have an impact on total energy consumption, which we will discuss.
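
For the curious, the capacity gap follows directly from the channel counts: with one 32 GB DIMM per channel, two 6-channel Xeons take 12 DIMMs (384 GB), while two 8-channel EPYCs take 16 DIMMs (512 GB). The short script below works out those numbers, plus the theoretical peak bandwidth per socket at the configured transfer rates (DDR4-2666 for the Xeons, DDR4-2400 for the EPYC); these are back-of-the-envelope figures, not measurements.

```python
# Back-of-the-envelope DRAM math for the two dual-socket configurations.
# Assumes one 32 GB DIMM per channel and a 64-bit (8-byte) bus per channel.
def dram_numbers(sockets, channels, dimm_gb, mt_per_s):
    capacity_gb = sockets * channels * dimm_gb
    bw_gb_s = channels * mt_per_s * 8 / 1000  # peak bandwidth per socket
    return capacity_gb, bw_gb_s

for name, channels, rate in [("Xeon (6ch, DDR4-2666)", 6, 2666),
                             ("EPYC (8ch, DDR4-2400)", 8, 2400)]:
    cap, bw = dram_numbers(sockets=2, channels=channels, dimm_gb=32, mt_per_s=rate)
    print(f"{name}: {cap} GB total, ~{bw:.0f} GB/s peak per socket")
```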

Last but not least, we want to note how the performance graphs have been color-coded. Orange is AMD's EPYC, dark blue is Intel's best (Cascade Lake/Skylake-SP), and light blue is the previous generation Xeons (Xeon E5 v4). Gray has been used for the soon-to-be-replaced Xeon v1.

Intel's Xeon "Purley" Server – S2P2SY3Q (2U Chassis)

CPU: Two Intel Xeon Platinum 8280 (2.7 GHz, 28c, 38.5 MB L3, 205W)
     or two Intel Xeon Platinum 8176 (2.1 GHz, 28c, 38.5 MB L3, 165W)
RAM: 384 GB (12x32 GB) Hynix DDR4-2666
Internal Disks: Samsung MZ7LM240 (boot disk), Intel SSD3710 800 GB (data)
Motherboard: Intel S2600WF (Wolf Pass baseboard)
Chipset: Intel Wellsburg B0
PSU: 1100W (80+ Platinum)

We enabled hyper-threading and Intel virtualization acceleration.

Xeon - NVIDIA Titan RTX Workstation

With some diplomacy, our AI researcher Pieter Bovijn at MCT was kind enough to run our tests on his deep learning workstation. Below you can find the specs.

CPU: Intel Xeon Gold 6152 (2.1 GHz, 22c, 30.25 MB L3, 140W)
RAM: 192 GB (6x32 GB) Samsung DDR4-2666
Internal Disks: Samsung MZ7LM240 (boot disk), Intel SSD3710 800 GB (data)
Motherboard: Supermicro SYS-7049A-T (Intel C621 chipset)
GPU: PNY Titan RTX 24 GB GDDR6
PSU: PWS-865-PQ

This is the only server in the test with a discrete GPU. 

AMD EPYC 7601 –  (2U Chassis)

CPU: Two AMD EPYC 7601 (2.2 GHz, 32c, 8x8 MB L3, 180W)
RAM: 512 GB (16x32 GB) Samsung DDR4-2666, running at 2400
Internal Disks: Samsung MZ7LM240 (boot disk), Intel SSD3710 800 GB (data)
Motherboard: AMD Speedway
PSU: 1100W (80+ Platinum)

Other Notes

Both servers are fed by a standard European 230V (16 Amps max.) power line. The room temperature is monitored and kept at 23°C by our Airwell CRACs.

56 Comments

  • Gondalf - Tuesday, July 30, 2019 - link

    Kudos to the article from a technical point of view :), a little less for the weak analysis of the server market. Johan says that Intel is slowing down in servers but that the server market is growing fast.
    Unfortunately it is not: Q1 this year was the worst quarter for the server market in 8 quarters, with growth of only 1%. Q2 will likely be on a negative trend; moreover, there is a general consensus that 2019 will be a negative year with a drop in global revenue.
    So the recent Intel drop is consistent with a drop in demand in China in Q2.

    It should be underlined that a GPU has to be driven by a host: wherever a GPU like a Tesla is up and running, there are one or two Xeons on the motherboard.
    A GPU is only an accelerator, and without a CPU it is useless. Intel's slides about the upcoming threat from competitors relate to the presence of AMD in HPC, IBM, and some sparse ARM-based SKUs for custom applications.
    A GPU is welcome; it helps to sell more Xeons.
  • eastcoast_pete - Tuesday, July 30, 2019 - link

    More a question than anything else: What is the state of AI-related computing on AMD (graphics) hardware? I know NVIDIA is very dominant, but is it mainly due to an existing software ecosystem?
  • BenSkywalker - Wednesday, July 31, 2019 - link

    AMD has two major hurdles to overcome when looking specifically at AI/ML on GPUs: essentially non-existent software support and essentially non-existent hardware support. AMD has chosen the route of focusing on general-purpose cores that perform solidly on a variety of traditional tasks, both in hardware and software. AI/ML benefit enormously from specialized hardware, which in turn takes specialized software to utilize.

    This entire article is stacking up $40k worth of Intel CPUs against a consumer nVidia part, and Intel gets crushed whenever nVidia can use its specialized hardware. Throw a few Tesla V100s in to give us something resembling price parity and Intel would be eviscerated.

    AMD needs tensor cores, a decade's worth of tools development, and a decade's worth of pipeline development (university training, integration into new systems, and build-out onto those systems; not hardware pipelines) in order to get to where nVidia is now, even if nVidia were standing still.

    The software ecosystem is the biggest problem long term: everyone working in the field uses CUDA whenever they can. Even if AMD mopped the floor with nVidia on the hardware side, their GPUs would need all the development tools nVidia has spent a decade building in order to get traction; and right now their GPUs are held back relative to nVidia's because of that specialized hardware.
  • abufrejoval - Tuesday, July 30, 2019 - link

    Some telepathy must be involved: Just a day or two before this appeared online, I was looking for Johan de Gelas' last appearance on AT in 2018 and thinking that it was high time for one of my favorite authors to publish something. Ever so glad you came out with the typical depth, quality and relevance!

    While GAFA and BATX seem to lead AI and the frameworks, their problems and solutions mostly fit their own needs, and as it turns out the vast majority of use cases cannot afford the depth and quality they require, nor do they benefit from it either: if the responsibility of your AI is to monitor for broken drill bits from vibration, sound, and normal and thermal visuals, the ability to identify cats in every shape and color has no benefit.

    The big guys typically need to solve a sharply defined problem in a single domain at a very high quality: they don't combine visual with audio, and the inherent context in time-series video is actually ignored, as their AIs stare at each frame independently, hunting for known faces or things to tag in order to correlate social graphs and products.

    Iterating over ML approaches, NN designs and adequate hyperparameters for training requires months even with clusters of DGX workstations and highly experienced ML experts. What makes all that effort worthwhile is that the inference part can then run at relatively low power on your mobile phone inside WeChat, Facebook, Instagram, Google keyboard/translate (or some other "innocent" background app) at billions of instances: trial and train until you have a single sufficiently good network design, in days, weeks or even months, and then you can deploy inference to billions of devices on battery power.

    Few of us smaller IT companies can replicate that, but again, few of us need to, because we have a vastly higher number of small problems to solve, with a few orders of magnitude less of a difference in training:inference effort: 1 Watt of difference makes or breaks the usability of an inference model on mobile target devices, while 100 Watts of difference in a couple of servers running a dozen instances of a less optimized yet well trained model won't justify an ML-expert team working through another five pizzas.

    As the complexity of your approach (e.g. XGBoost or RF) is perhaps much smaller, or your networks are much simpler than those of GAFA/BATX, you actually worry about how to scale in, not out, and how to batch dozens of training runs for model iteration and mix those with some QA or even production inference streams on GPUs, which Linux understands, or treats, little better than a printer with DMA.

    Intel quite simply understands that while you get famous with the results you get from training AIs, e.g. on GPUs, the money is made from inference at the lowest power and lowest operational overhead: Linux (or Unix for that matter) knows how to manage virtual memory (preferably uniform) and CPUs (preferably few); a memory hierarchy deeper than the manual for your VCR, and more types and numbers of cores than Unix's first hard disk had blocks, confuse it.

    But I'd dare say that AMD understood this much earlier and much better. When they came up with HSA on their first APUs, this GPGPU blend, which allowed switching the compute model with a function call, made CUDA look very brutish indeed.

    Writing code able to take full advantage of these GPGPU capabilities is still a nightmare, because high-level languages have abstraction levels far too low for what these APUs or VNNI CPUs can execute in a single clock cycle, but from the way I read it, the Infinity Fabric is about making those barriers as low as they can possibly be in terms of hardware and memory space.

    And RISC-V goes beyond what all x86 advocates still suffer from: An instruction set that's not designed for modular expandability.
  • FunBunny2 - Wednesday, July 31, 2019 - link

    "Trial and train until you have trained the single sufficiently good network design in days, weeks or even months and then you can deploy inference to billions of devices on battery power."

    When and if this capability is used for something useful, e.g. a cure for cancer, rather than yet another scheme to extract moolah from rubes, then I'll be interested.
  • keg504 - Tuesday, July 30, 2019 - link

    Why do you say on the testing page that AMD is colour coded in orange, and then put them in grey?
  • 808Hilo - Wednesday, July 31, 2019 - link

    Client/server renamed again...
    There is no AI. That stuff is very, very dumb. Look at the diagram above. Nothing new: data, a script does something, parsing and readout of vastly unimportant info. I have not seen a single meaningful AI app. It's now year 25 of the Internet and I am terribly bored. Next please.
  • J7SC_Orion - Wednesday, July 31, 2019 - link

    This explains very nicely why Intel has been raiding GPU staff and pouring resources into Xe Discrete Graphics... if you can't beat them, join them?
  • tibamusic.com - Saturday, August 3, 2019 - link

    Thank you very much.
  • Threska - Saturday, August 3, 2019 - link

    What a coincidence. The latest humble bundle is "Data Analysis & Machine Learning by O'Reilly"

    https://www.humblebundle.com/books/data-analysis-m...
