CPU Performance: Intel's Own Claims

Before we get into the new AI benchmarks, let’s take a quick look at the usual CPU benchmarks and performance claims made available by Intel. 

For this comparison we'll focus on the second row – the first row compares the insanely priced, 400W dual-die Intel Xeon Platinum 9282 to the much more reasonable and widely available Intel Xeon Platinum 8180. The second row tells the real story: a few extra MHz and slightly higher RAM speeds result in a 3% (integer) to 5% (FP) performance increase over the first-generation Xeon Scalable parts. The larger boost in floating point performance is most likely because Intel's second-generation parts can use faster DDR4-2933 DIMMs, and hence offer more bandwidth to the cores.
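As a quick sanity check on that bandwidth argument, here is a back-of-the-envelope sketch (in Python, with an illustrative helper of our own) of peak theoretical DRAM bandwidth per socket, assuming the six memory channels that both Xeon Scalable generations provide:

    # Peak theoretical DRAM bandwidth: channels x transfer rate (MT/s) x 8 bytes per transfer
    def peak_bandwidth_gbs(channels, mt_per_s):
        return channels * mt_per_s * 1e6 * 8 / 1e9

    first_gen  = peak_bandwidth_gbs(6, 2666)  # 1st-gen Xeon Scalable, DDR4-2666
    second_gen = peak_bandwidth_gbs(6, 2933)  # 2nd-gen Xeon Scalable, DDR4-2933
    print(f"DDR4-2666: {first_gen:.1f} GB/s, DDR4-2933: {second_gen:.1f} GB/s, "
          f"a {100 * (second_gen / first_gen - 1):.0f}% increase")

That works out to roughly 128 GB/s versus 141 GB/s per socket, a 10% increase, which is consistent with the more bandwidth-sensitive FP workloads gaining a bit more than the integer ones.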

The midrange SKUs get a bigger boost, as some of the x2xx Xeon Scalable parts get more cores and more L3 cache than the previous x1xx parts. For example, the 6252 has 24 cores and 35.75 MB of L3, while the 6152 had 22 cores and 30.25 MB of L3.


The comparison with AMD's EPYC 7601, however, deserves our attention, as there's some interesting data here. Again, comparing a 400W, $50k dual-die CPU to a 180W, $4k one does not make any sense whatsoever, so we ignore the first line.

The Linpack numbers are not surprising: the more expensive Skylake SKUs add a 512-bit FMAC to the already existing dual 256-bit FMACs, offering up to 4 times more AVX throughput than AMD's EPYC. AMD's next generation will be a lot more competitive in this area, as each FP unit is now capable of 256-bit AVX instead of 128-bit.
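To put rough numbers on that claim, here is a minimal sketch of peak double-precision FLOPs per cycle per core, assuming two 512-bit FMA units on the high-end Skylake SKUs and two 128-bit FMA-capable FP pipes on the first-generation EPYC:

    # Peak DP FLOPs per cycle per core: FMA units x 64-bit lanes per vector x 2 ops per FMA
    def dp_flops_per_cycle(fma_units, vector_bits):
        return fma_units * (vector_bits // 64) * 2

    skylake_sp = dp_flops_per_cycle(2, 512)  # high-end Skylake-SP with AVX-512
    epyc_zen1  = dp_flops_per_cycle(2, 128)  # EPYC 7601 (Zen 1)
    print(f"Skylake-SP: {skylake_sp} FLOPs/cycle, Zen 1: {epyc_zen1} FLOPs/cycle, "
          f"a {skylake_sp // epyc_zen1}x advantage")

That is 32 versus 8 FLOPs per cycle per core, which is where the "up to 4 times" figure comes from, before accounting for clock speeds and AVX-512 frequency throttling.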

The image classification results clearly show that Intel is trying to convince people that some AI applications should simply run on a CPU, no GPU needed. Well, at least for now… 

Intel's claim that database performance is a lot better than on the EPYC is quite interesting, as we've previously pointed out that AMD's four NUMA dies on a single chip do have drawbacks. Quoting our Xeon Skylake vs EPYC review:

"Out of the box, the EPYC CPU is a rather mediocre transactional database CPU ... transactional databases will remain Intel territory for now."

In databases, cache (coherency) latency plays an important role. It will be interesting to see how well AMD has addressed this weakness in the second generation EPYC server chips. 

Comments

  • Bp_968 - Tuesday, July 30, 2019 - link

    Oh no, not 8 million, 8 *billion* (for the 8180 Xeon), and 19.2 *billion* for the last-gen AMD 32-core EPYC! I don't think they have released much info on the new EPYC yet, but it's safe to assume it's going to be 36-40 billion! (I don't know how many transistors are used in the I/O controller.)

    And like you said, the connections are crazy! The Xeon has a 5903 BGA connection, so it doesn't even use a socket; it's soldered to the board.
  • ozzuneoj86 - Sunday, August 4, 2019 - link

    Doh! Thanks for correcting the typo!

    Yes, 8 BILLION... it's incredible! It's even more difficult to fathom that these things, with billions of "things" in such a small area, are nowhere near as complex or versatile as a similarly sized living organism.
  • s.yu - Sunday, August 4, 2019 - link

    Well, current magnetic storage is far from the storage density of DNA, in that sense.
  • FunBunny2 - Monday, July 29, 2019 - link

    "As a single SQL query is nowhere near as parallel as Neural Networks – in many cases they are 100% sequential "

    hogwash. SQL, or rather the RM which it purports to implement, is embarrassingly parallel; these are set operations which care not a fig for order. the folks who write SQL engines, OTOH, are still stuck in C land. with SSD seq processing so much faster than HDD, app developers are reverting to 60s tape processing methods. good for them.
  • bobhumplick - Tuesday, July 30, 2019 - link

    so cpus will become more gpu like and gpus will become more cpu like. you got your avx in my cuda core. no, you got your cuda core in my avx......mmmmmm
  • bobhumplick - Tuesday, July 30, 2019 - link

    intel need to get those gpus out quick
  • Amiba Gelos - Tuesday, July 30, 2019 - link

    LSTM in 2019?
    At least try GRU or transformer instead.
    LSTM is notorious for its non-parallelizability, skewing the result toward the CPU.
  • Rudde - Tuesday, July 30, 2019 - link

    I believe that's why they benchmarked LSTM. They benchmarked GPU-stronghold CNNs to show great GPU performance and benchmarked LSTM to show great CPU performance.
  • Amiba Gelos - Tuesday, July 30, 2019 - link

    Recommendation pipelines already demonstrate the necessity of good CPUs for ML.
    Imho benching LSTM to showcase CPU perf is misleading. It is slow, performing equal to or worse than the alternatives, and got replaced by transformers and CNNs in NMT and NLP.
    Heck, why not WaveNet? That's a real-world app.
    I bet the CPU would perform even "better" lol.
