Intel's Xeon Cascade Lake vs. NVIDIA Turing: An Analysis in AI

Author: Johan De Gelas

by Johan De Gelas on July 29, 2019 8:30 AM EST

Exploring Parallel HPC

HPC benchmarking, just like server software benchmarking, requires a lot of research. We are definitely not HPC experts, so we will limit ourselves to one HPC benchmark.

Developed by the Theoretical and Computational Biophysics Group at the University of Illinois Urbana-Champaign, NAMD is a set of parallel molecular dynamics codes for extreme parallelization on thousands of cores. NAMD is also part of SPEC CPU2006 FP.

To be fair, NAMD is mostly single precision. And, as you probably know, the Titan RTX was designed to excel at single precision workloads, so the NAMD benchmark is a good match for the Titan RTX, especially now that the NAMD authors state that:

Performance is markedly improved when running on Pascal (P100) or newer CUDA-capable GPUs.

Still, it is an interesting benchmark, as the NAMD binary is compiled with Intel ICC and optimized for AVX. For our testing, we used the "NAMD_2.13_Linux-x86_64-multicore" binary. This binary supports AVX instructions, but the only AVX-512 instructions it supports are the "special" ones for the Intel Xeon Phi. Therefore, we also compiled an AVX-512 ICC optimized binary. This way we can really measure how well the AVX-512 crunching power of the Xeon compares to NVIDIA’s GPU acceleration.

We used the most popular benchmark load, apoa1 (Apolipoprotein A1). The results are expressed in simulated nanoseconds per wall-clock day. We measure at 500 steps.
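
As a rough illustration of the unit used here, the sketch below converts a measured wall-clock time per simulation step into simulated nanoseconds per day. The timestep length and the timing value are hypothetical inputs chosen for the example, not figures from our test runs, and the function name is ours.

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000  # femtoseconds in one nanosecond


def ns_per_day(seconds_per_step: float, timestep_fs: float) -> float:
    """Simulated nanoseconds per wall-clock day.

    seconds_per_step: average wall-clock time for one MD step (measured).
    timestep_fs: simulated time advanced per step, in femtoseconds
                 (an assumption; it depends on the simulation config).
    """
    steps_per_day = SECONDS_PER_DAY / seconds_per_step
    return steps_per_day * timestep_fs / FS_PER_NS


# Hypothetical numbers: 0.01 s of wall-clock time per step, 2 fs timestep.
example = ns_per_day(seconds_per_step=0.01, timestep_fs=2.0)
print(f"{example:.2f} ns/day")
```

A higher ns/day figure means more simulated time per day of compute, so higher is better.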

NAMD Molecular Dynamics 2.13

Using AVX-512 boosts performance in this benchmark by 46%. But again, this software runs so much faster on a GPU, which is of course understandable. At best, the Xeon has 28 cores running at 2.3 GHz. Each core can perform 32 single precision floating point operations per cycle. All in all, the Xeon can do about 2 TFLOPs (2.3 GHz * 28 * 32). So a dual Xeon setup can do about 4 TFLOPs at the most. The Titan RTX, on the other hand, can do 16 TFLOPs, or 4 times as much. The end result is that NAMD runs 3 times faster on the Titan than on the dual Intel Xeon.
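
The back-of-the-envelope arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the theoretical peak calculation using the figures quoted in the text (28 cores, 2.3 GHz, 32 single precision operations per cycle per core, 16 TFLOPs for the Titan RTX); the function name is ours, and sustained clocks under AVX-512 load may differ from the figure used here.

```python
def peak_tflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak single precision throughput in TFLOPs."""
    gflops = cores * ghz * flops_per_cycle
    return gflops / 1000.0


# Figures quoted in the text.
xeon = peak_tflops(cores=28, ghz=2.3, flops_per_cycle=32)
dual_xeon = 2 * xeon
titan_rtx = 16.0  # TFLOPs, as quoted in the text

ratio = titan_rtx / dual_xeon

print(f"Single Xeon: {xeon:.2f} TFLOPs")
print(f"Dual Xeon:   {dual_xeon:.2f} TFLOPs")
print(f"Titan RTX / dual Xeon: {ratio:.1f}x")
```

Without rounding the intermediate values, the theoretical gap is slightly under 4x, while the measured NAMD gap is about 3x.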

56 Comments

ballsystemlord - Saturday, August 3, 2019
Spelling and grammar errors:

"But it will have a impact on total energy consumption, which we will discuss."
"An" not "a":
"But it will have an impact on total energy consumption, which we will discuss."

"We our newest servers into virtual clusters to make better use of all those core."
Missing "s" and missing word. I guessed "combine".
"We combine our newest servers into virtual clusters to make better use of all those cores."

"For reasons unknown to us, we could get our 2.7 GHz 8280 to perform much better than the 2.1 GHz Xeon 8176."
The 8280 is only slightly faster in the table than the 8176. It is the 8180 that is missing from the table.

"However, since my group is mostly using TensorFlow as a deep learning framework, we tend to with stick with it."
Excess "with":
"However, since my group is mostly using TensorFlow as a deep learning framework, we tend to stick with it."

"It has been observed that using a larger batch can causes significant degradation in the quality of the model,..."
Remove plural form:
"It has been observed that using a larger batch can cause significant degradation in the quality of the model,..."

"...but in many applications a loss of even a few percent is a significant."
Excess "a":
"...but in many applications a loss of even a few percent is significant."

"LSTM however come with the disadvantage that they are a lot more bandwidth intensive."
Add an "s":
"LSTMs however come with the disadvantage that they are a lot more bandwidth intensive."

"LSTMs exhibit quite inefficient memory access pattern when executed on mobile GPUs due to the redundant data movements and limited off-chip bandwidth."
"pattern" should be plural because "LSTMs" is plural, I choose an "s":
"LSTMs exhibit quite inefficient memory access patterns when executed on mobile GPUs due to the redundant data movements and limited off-chip bandwidth."

"Of course, you have the make the most of the available AVX/AVX2/AVX512 SIMD power."
"to" not "the":
"Of course, you have to make the most of the available AVX/AVX2/AVX512 SIMD power."

"Also, this another data point that proves that CNNs might be one of the best use cases for GPUs."
Missing "is":
"Also, this is another data point that proves that CNNs might be one of the best use cases for GPUs."

"From a high-level workflow perfspective,..."
A joke, or a misspelling?

"... it's not enough if the new chips have to go head-to-head with a GPU in a task the latter doesn't completely suck at."
Traditionally, AT has had no language.
"... it's not enough if the new chips have to go head-to-head with a GPU in a task the latter is good at."

"It is been going on for a while,..."
"has" not "is":
"It has been going on for a while,..."

ballsystemlord - Saturday, August 3, 2019
Thanks for the cool article!

tmnvnbl - Tuesday, August 6, 2019
Great read, especially liked the background and perspective next to the benchmark details

dusk007 - Tuesday, August 6, 2019
Great Article.
I wouldn't call Apache Arrow a database though. It is a data format, more akin to a file format like CSV or Parquet. It is not something that stores data for you and hands it back to you. It defines how to store data in memory, just as CSV or Parquet define how to store data in files. It is more efficient, with less redundancy and less overhead when the data is accessed from different runtimes (TensorFlow, Spark, Pandas, ...).

Love the article, I hope we get more of those. I also like that huge performance optimizations are possible in this field purely in software. Often, though, renting compute in the cloud is cheaper than the man-hours required to optimize.

Emrickjack - Thursday, August 8, 2019
Johan's new piece in 14 months! Looking forward to your Rome review
