Intel’s View on AI: Do What NV Doesn't

On the whole, Intel has a good point that there is "a wide range of AI applications" – in other words, there is AI life beyond CNNs. In many real-life scenarios, traditional machine learning techniques outperform CNNs, and not all deep learning is done with the ultra-scalable CNNs. In other real-world cases, having massive amounts of RAM is a big performance advantage, both when training a model and when using it to infer from new data.

So despite NVIDIA’s massive advantage in running CNNs, high-end Xeons can offer a credible alternative in the data analytics market. To be sure, nobody expects the new Cascade Lake Xeons to outperform NVIDIA GPUs in CNN training, but there are lots of cases where Intel might be able to convince customers to invest in a more potent Xeon instead of an expensive Tesla accelerator:

  • Inference of AI models that require a lot of memory
  • "Light" AI models that do not require long training times
  • Data architectures where the batch or stream processing time is more important than the model training time
  • AI models that depend on traditional “non-neural network” statistical models

As a result, there might be an opportunity for Intel to keep NVIDIA at bay until it has a reasonable alternative of its own to NVIDIA’s GPUs for CNN workloads. Intel has been feverishly adding features to the Xeon Scalable family and optimizing its software stack to combat NVIDIA’s AI hegemony. That push includes optimized AI software such as Intel’s own distribution for Python, the Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), and the Intel Data Analytics Acceleration Library (DAAL), the last aimed mostly at traditional machine learning.
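
To illustrate the kind of "traditional machine learning" workload Intel is targeting here, the sketch below trains a CPU-friendly tree ensemble on synthetic tabular data. It is a minimal example rather than Intel's benchmark: the dataset, model, and parameters are arbitrary, and the commented-out daal4py patch call is an assumed entry point for Intel's DAAL-backed acceleration, not a verified API.

```python
# Minimal sketch of a CPU-friendly "traditional ML" workload: a tree ensemble on
# tabular data. Dataset and model choices here are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Intel's DAAL-backed packages can accelerate estimators like this one; the exact
# patching call depends on the installed package (assumed entry point, not verified):
# from daal4py.sklearn import patch_sklearn; patch_sklearn()

X, y = make_classification(n_samples=100_000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tree ensembles parallelize across all CPU cores (n_jobs=-1) and need no GPU.
clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```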

All told then, for the second generation of Intel’s Xeon Scalable processors, the company has added new AI hardware features under the Deep Learning Boost (DL Boost) name. This primarily means the Vector Neural Network Instructions (VNNI), which can do in one instruction what previously took three. Further down the line, Cooper Lake, the third-generation Xeon Scalable processor, will add support for bfloat16, further improving training performance.
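
To make the VNNI change concrete, the fused operation is an INT8 dot product that accumulates into 32-bit integers. The short NumPy sketch below models that arithmetic only; the function name, operand shapes, and random data are illustrative assumptions, not Intel's instruction or any intrinsic API.

```python
import numpy as np

# Rough NumPy model of the INT8 dot-product-and-accumulate that VNNI fuses into
# one instruction (previously a separate multiply / widen / add sequence).
# This sketches the arithmetic only; it is not Intel's instruction or intrinsic API.

def vnni_style_dot(acc, a_u8, b_s8):
    """acc: int32 accumulator lanes; a_u8/b_s8: 4 int8 operands per lane."""
    prod = a_u8.astype(np.int32) * b_s8.astype(np.int32)          # widen and multiply
    return acc + prod.reshape(-1, 4).sum(axis=1, dtype=np.int32)  # sum groups of 4, accumulate

rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=64, dtype=np.uint8)    # e.g. quantized activations (unsigned)
b = rng.integers(-128, 128, size=64, dtype=np.int8)  # e.g. quantized weights (signed)
acc = np.zeros(16, dtype=np.int32)                   # 32-bit accumulators

acc = vnni_style_dot(acc, a, b)
print(acc)
```

In the same spirit, bfloat16 keeps float32's 8-bit exponent range while shortening the mantissa, which is why it can speed up training with relatively little numerical disruption.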

In summary, Intel is trying to recapture the market for “lighter” AI workloads while making a firm stand in the rest of the data analytics market, all the while adding very specialized hardware (FPGAs, ASICs) to its portfolio. This is of critical importance to Intel's competitiveness in the IT market, as the company has repeatedly said that the Data Center Group (DCG), its “enterprise part”, is expected to be the main growth engine in the years ahead.

Comments

  • Bp_968 - Tuesday, July 30, 2019 - link

    Oh no, not 8 million, 8 *billion* (for the 8180 Xeon), and 19.2 *billion* for the last-gen AMD 32-core Epyc! I don't think they have released much info on the new Epyc yet, but it's safe to assume it's going to be 36-40 billion! (I don't know how many transistors are used in the I/O controller.)

    And like you said, the connections are crazy! The Xeon has a 5903 BGA connection, so it doesn't even socket; it's soldered to the board.
  • ozzuneoj86 - Sunday, August 4, 2019 - link

    Doh! Thanks for correcting the typo!

    Yes, 8 BILLION... it's incredible! It's even more difficult to fathom that these things, with billions of "things" in such a small area are nowhere near as complex or versatile as a similarly sized living organism.
  • s.yu - Sunday, August 4, 2019 - link

    Well the current magnetic storage is far from the storage density of DNA, in this sense.
  • FunBunny2 - Monday, July 29, 2019 - link

    "As a single SQL query is nowhere near as parallel as Neural Networks – in many cases they are 100% sequential "

    Hogwash. SQL, or rather the RM which it purports to implement, is embarrassingly parallel; these are set operations which care not a fig for order. The folks who write SQL engines, OTOH, are still stuck in C land. With SSD sequential processing so much faster than HDD, app developers are reverting to 60s tape-processing methods. Good for them.
  • bobhumplick - Tuesday, July 30, 2019 - link

    So CPUs will become more GPU-like and GPUs will become more CPU-like. You got your AVX in my CUDA core. No, you got your CUDA core in my AVX... mmmmmm
  • bobhumplick - Tuesday, July 30, 2019 - link

    Intel needs to get those GPUs out quick.
  • Amiba Gelos - Tuesday, July 30, 2019 - link

    LSTM in 2019?
    At least try GRU or a transformer instead.
    LSTM is notorious for its non-parallelizability, skewing the result toward the CPU.
  • Rudde - Tuesday, July 30, 2019 - link

    I believe that's why they benchmarked LSTM. They benchmarked the GPU stronghold, CNNs, to show great GPU performance, and benchmarked LSTM to show great CPU performance.
  • Amiba Gelos - Tuesday, July 30, 2019 - link

    Recommendation pipelines already demonstrate the necessity of good CPUs for ML.
    IMHO, benching LSTM to showcase CPU perf is misleading. It is slow, performing equally to or worse than the alternatives, and it got replaced by transformers and CNNs in NMT and NLP.
    Heck, why not WaveNet? That's a real-world app.
    I bet the CPU would perform even "better" lol.
