Intel's Xeon Cascade Lake vs. NVIDIA Turing: An Analysis in AI

by Johan De Gelas on July 29, 2019 8:30 AM EST

Testing Notes

As the market stands, it is clear that alongside AMD and ARM, NVIDIA's professional offerings are a real threat to Intel's dominance in the datacenter and beyond. So for our testing today, we're going to focus on machine learning, and see just how Intel's new DL Boosted wares fare against the competition in the ML space.

On the Intel side of matters, of course, we're looking at the company's Xeon Scalable CPUs. Intel provided two of its 28-core models: the 165 Watt Xeon Platinum 8176, a first-generation (Skylake-SP) part, and the faster 205 Watt Xeon Platinum 8280 from the new Cascade Lake generation.

As for Cascade Lake's GPU competition, we've tapped NVIDIA's latest "Turing" Titan RTX card. While it isn't truly a datacenter card, the fact that it's based on Turing means that it offers NVIDIA's very latest features. At the university where I work, our deep learning researchers use these GPUs for training AI models, as the Titan cards are affordable and have a lot of GPU memory.

As an added bonus, Titan RTX cards can be used for both training (mixed FP32/FP16 precision) and inference (FP16 and INT8). NVIDIA's flagship Tesla card is still based on the Volta architecture, whose Tensor Cores do not support INT8 for inference.

Finally, we've also included AMD's first-generation EPYC platform in all of our testing. AMD doesn't have a hardware strategy quite like Intel's, nor specific instructions like VNNI, but lately the company has offered all sorts of surprises.
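Since VNNI is central to Intel's DL Boost pitch, a rough scalar model of its core instruction may help. VPDPBUSD works per 32-bit lane: it multiplies four unsigned 8-bit values by four signed 8-bit values, sums the four products and adds the result to a 32-bit accumulator, work that previously took a three-instruction sequence (VPMADDUBSW, VPMADDWD, VPADDD). The Python below is only an illustrative sketch of those semantics, not Intel's implementation or an intrinsics API; the function names `vpdpbusd_lane` and `int8_dot` are hypothetical.

```python
from typing import Sequence


def vpdpbusd_lane(acc: int, a_u8: Sequence[int], b_s8: Sequence[int]) -> int:
    """Model one 32-bit lane of VPDPBUSD: four u8 x s8 products summed into an int32 accumulator."""
    if len(a_u8) != 4 or len(b_s8) != 4:
        raise ValueError("each lane consumes exactly four byte pairs")
    if not all(0 <= a <= 255 for a in a_u8):
        raise ValueError("activations must be unsigned 8-bit")
    if not all(-128 <= b <= 127 for b in b_s8):
        raise ValueError("weights must be signed 8-bit")
    total = acc + sum(a * b for a, b in zip(a_u8, b_s8))
    # The non-saturating form wraps around on 32-bit overflow (VPDPBUSDS saturates instead).
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total >= (1 << 31) else total


def int8_dot(a_u8: Sequence[int], b_s8: Sequence[int]) -> int:
    """Dot product of two byte vectors, consumed four elements per lane step."""
    if len(a_u8) != len(b_s8) or len(a_u8) % 4:
        raise ValueError("vector length must be a multiple of 4")
    acc = 0
    for i in range(0, len(a_u8), 4):
        acc = vpdpbusd_lane(acc, a_u8[i:i + 4], b_s8[i:i + 4])
    return acc


print(vpdpbusd_lane(10, [1, 2, 3, 4], [5, -6, 7, -8]))  # -8
print(int8_dot([255] * 8, [127] * 8))  # 259080
```

A 512-bit register holds 16 such lanes, so the real instruction performs 64 of these byte multiplies at once; the loop above only shows the arithmetic of a single lane.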

Benchmark Configuration and Methodology

All of our testing was conducted on Ubuntu Server 18.04 LTS. You will notice that the DRAM capacity varies among our server configurations. This is a result of the fact that each Xeon socket has six memory channels, while each EPYC socket has eight. As far as we know, all of our tests fit in 128 GB, so DRAM capacity should not have much influence on performance. It will, however, have an impact on total energy consumption, which we will discuss.
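The capacities follow directly from the channel counts. Assuming one 32 GB DIMM per memory channel, which matches the DIMM counts listed in the configurations below, a short calculation reproduces all three systems; `capacity_gb` is just an illustrative helper.

```python
DIMM_GB = 32  # every configuration in this review uses 32 GB modules


def capacity_gb(sockets: int, channels_per_socket: int, dimms_per_channel: int = 1) -> int:
    """Total DRAM when every memory channel is populated identically."""
    return sockets * channels_per_socket * dimms_per_channel * DIMM_GB


xeon_server = capacity_gb(sockets=2, channels_per_socket=6)        # 12 DIMMs
epyc_server = capacity_gb(sockets=2, channels_per_socket=8)        # 16 DIMMs
titan_workstation = capacity_gb(sockets=1, channels_per_socket=6)  # 6 DIMMs

print(xeon_server, epyc_server, titan_workstation)  # 384 512 192
```

Populating every channel is presumably done to give each platform its full memory bandwidth, which is why capacity ends up tracking channel count rather than being matched across platforms.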

Last but not least, we want to note how the performance graphs have been color-coded. Orange is AMD's EPYC, dark blue is Intel's best (Cascade Lake/Skylake-SP), and light blue is the previous-generation Xeons (Xeon E5 v4). Gray has been used for the soon-to-be-replaced Xeon v1.

Intel's Xeon "Purley" Server – S2P2SY3Q (2U Chassis)

CPU	Two Intel Xeon Platinum 8280 (2.7 GHz, 28c, 38.5MB L3, 205W) or two Intel Xeon Platinum 8176 (2.1 GHz, 28c, 38.5MB L3, 165W)
RAM	384 GB (12x32 GB) Hynix DDR4-2666
Internal Disks	SAMSUNG MZ7LM240 (boot disk), Intel SSD3710 800 GB (data)
Motherboard	Intel S2600WF (Wolf Pass baseboard)
Chipset	Intel C620 series (Lewisburg)
PSU	1100W PSU (80+ Platinum)

We enabled hyper-threading and Intel virtualization acceleration.

Xeon - NVIDIA Titan RTX Workstation

With some diplomacy, our AI researcher Pieter Bovijn at MCT was kind enough to let us test his deep learning workstation. Its specifications are listed below.

CPU	Intel Xeon Gold 6152 (2.1 GHz, 22c, 30.25MB L3, 140W)
RAM	192 GB (6x32 GB) Samsung DDR4-2666
Internal Disks	SAMSUNG MZ7LM240 (boot disk), Intel SSD3710 800 GB (data)
Motherboard	Supermicro SYS-7049A-T (Intel C621 chipset)
GPU	PNY TITAN RTX 24 GB GDDR6
PSU	PWS-865-PQ

This is the only system in our test with a discrete GPU.

AMD EPYC 7601 – (2U Chassis)

CPU	Two EPYC 7601 (2.2 GHz, 32c, 8x8MB L3, 180W)
RAM	512 GB (16x32 GB) Samsung DDR4-2666 @2400
Internal Disks	SAMSUNG MZ7LM240 (boot disk), Intel SSD3710 800 GB (data)
Motherboard	AMD Speedway
PSU	1100W PSU (80+ Platinum)

Other Notes

Both servers are fed by a standard European 230V (16 Amps max.) power line. The room temperature is monitored and kept at 23°C by our Airwell CRACs.

56 Comments

ballsystemlord - Saturday, August 3, 2019
Spelling and grammar errors:

"But it will have a impact on total energy consumption, which we will discuss."
"An" not "a":
"But it will have an impact on total energy consumption, which we will discuss."

"We our newest servers into virtual clusters to make better use of all those core."
Missing "s" and missing word. I guessed "combine".
"We combine our newest servers into virtual clusters to make better use of all those cores."

"For reasons unknown to us, we could get our 2.7 GHz 8280 to perform much better than the 2.1 GHz Xeon 8176."
The 8280 is only slightly faster in the table than the 8176. It is the 8180 that is missing from the table.

"However, since my group is mostly using TensorFlow as a deep learning framework, we tend to with stick with it."
Excess "with":
"However, since my group is mostly using TensorFlow as a deep learning framework, we tend to stick with it."

"It has been observed that using a larger batch can causes significant degradation in the quality of the model,..."
Remove plural form:
"It has been observed that using a larger batch can cause significant degradation in the quality of the model,..."

"...but in many applications a loss of even a few percent is a significant."
Excess "a":
"...but in many applications a loss of even a few percent is significant."

"LSTM however come with the disadvantage that they are a lot more bandwidth intensive."
Add an "s":
"LSTMs however come with the disadvantage that they are a lot more bandwidth intensive."

"LSTMs exhibit quite inefficient memory access pattern when executed on mobile GPUs due to the redundant data movements and limited off-chip bandwidth."
"pattern" should be plural because "LSTMs" is plural, I choose an "s":
"LSTMs exhibit quite inefficient memory access patterns when executed on mobile GPUs due to the redundant data movements and limited off-chip bandwidth."

"Of course, you have the make the most of the available AVX/AVX2/AVX512 SIMD power."
"to" not "the":
"Of course, you have to make the most of the available AVX/AVX2/AVX512 SIMD power."

"Also, this another data point that proves that CNNs might be one of the best use cases for GPUs."
Missing "is":
"Also, this is another data point that proves that CNNs might be one of the best use cases for GPUs."

"From a high-level workflow perfspective,..."
A joke, or a misspelling?

"... it's not enough if the new chips have to go head-to-head with a GPU in a task the latter doesn't completely suck at."
Traditionally, AT has had no language.
"... it's not enough if the new chips have to go head-to-head with a GPU in a task the latter is good at."

"It is been going on for a while,..."
"has" not "is":
"It has been going on for a while,..."
ballsystemlord - Saturday, August 3, 2019
Thanks for the cool article!
tmnvnbl - Tuesday, August 6, 2019
Great read, especially liked the background and perspective next to the benchmark details
dusk007 - Tuesday, August 6, 2019
Great Article.
I wouldn't call Apache Arrow a database though. It is a data format more akin to a file format like csv or parquet. It is not something that stores data for you and gives it to you. It is the how to store data in memory. Like CSV or Parquet are a "how to" store data in Files. More efficient, less redundancy, less overhead when accessed from different runtimes (Tensorflow, Spark, Pandas,..).

Love the article, I hope we get more of those. Also that huge performance optimizations are possible in this field just in software. Often renting compute in the cloud is cheaper than the man hours required to optimize though.
Emrickjack - Thursday, August 8, 2019
Johan's new piece in 14 months! Looking forward to your Rome review