NVIDIA’s Answer: RAPIDS Brings GPUs to More Than CNNs

NVIDIA has proven more than once that it can outmaneuver the competition with excellent vision and strategy. NVIDIA understands that getting all neural networks to scale as well as CNNs do is not going to be easy, and that plenty of applications out there either run on methods other than neural networks, or are memory intensive rather than compute intensive.

At GTC Europe, NVIDIA launched a new data science platform for enterprise use, built on NVIDIA’s new “RAPIDS” framework. The basic idea is that GPU acceleration of the data pipeline should not be limited to deep learning.

cuDF, for example, allows data scientists to load data into GPU memory and batch process it, much like Pandas, the Python library for manipulating data. cuML is a (currently limited) collection of GPU-accelerated machine learning libraries. Eventually, most (all?) of the machine learning algorithms available in the scikit-learn toolkit should be GPU accelerated and available in cuML.
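To give an idea of the programming model, here is a minimal sketch of what that looks like in practice. The file name and column names are hypothetical placeholders, not taken from NVIDIA's materials; the point is that cuDF mirrors the Pandas API and cuML mirrors scikit-learn's estimator interface:

```python
# Minimal sketch: cuDF as a GPU-resident Pandas, cuML as a GPU scikit-learn.
# "data.csv" and the column names are hypothetical examples.
import cudf
from cuml.cluster import KMeans

# The DataFrame lives in GPU memory; the API mirrors Pandas
gdf = cudf.read_csv("data.csv")
gdf = gdf[gdf["amount"] > 0]                        # filtering runs on the GPU
means = gdf.groupby("category")["amount"].mean()    # so do aggregations

# cuML follows scikit-learn's fit/predict convention
kmeans = KMeans(n_clusters=8)
kmeans.fit(gdf[["amount", "quantity"]])
labels = kmeans.predict(gdf[["amount", "quantity"]])
```

The appeal is that a data scientist who knows Pandas and scikit-learn can move to the GPU without learning a new API, only a new import.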

NVIDIA also adopted Apache Arrow, a columnar in-memory data format. This is because GPUs operate on vectors, and as a result favor a columnar layout in memory.

By leveraging Apache Arrow as the common in-memory data layer, NVIDIA avoids a lot of serialization and copying overhead between the stages of the pipeline.
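A short sketch of why a shared columnar format matters: every stage of the pipeline reads the same Arrow buffers rather than converting rows through its own internal format. The column names below are hypothetical examples:

```python
# Sketch: Apache Arrow as the common currency between pipeline stages.
import cudf
import pyarrow as pa

# An Arrow table is columnar: each column is one contiguous buffer,
# which is exactly the layout GPU vector units want
table = pa.table({"user_id": [1, 2, 3], "spend": [9.5, 3.2, 7.8]})

# cuDF ingests the Arrow columns directly into GPU memory...
gdf = cudf.DataFrame.from_arrow(table)

# ...and hands results back as Arrow for the next stage,
# without a row-by-row serialization step in between
result = gdf.to_arrow()
```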

Making sure that there are GPU-accelerated versions of the typical Python libraries such as scikit-learn and Pandas is one step in the right direction. But Pandas is only suited for the lighter “data science exploration” tasks. By working with Databricks to make sure that RAPIDS also plugs into Spark, the heavy-duty distributed “data processing” framework, NVIDIA is taking the next step: breaking out of the “deep learning mostly” role and into the rest of the data pipeline.
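As a hedged sketch of what the Spark side can look like: the RAPIDS accelerator plugs into Spark's SQL engine so that unmodified DataFrame code can be offloaded to GPUs. The plugin class and config keys below reflect the RAPIDS Accelerator for Apache Spark as it was later released, and the dataset is a hypothetical example; none of this is from the article itself:

```python
# Sketch: enabling GPU acceleration in Spark via the RAPIDS plugin
# (assumes the RAPIDS accelerator jars are on the classpath).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("rapids-etl")
         .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
         .config("spark.rapids.sql.enabled", "true")
         .getOrCreate())

# The query itself is plain Spark; the plugin decides per-operator
# whether it can run on the GPU or must fall back to the CPU
df = spark.read.parquet("events.parquet")   # hypothetical dataset
df.groupBy("region").sum("revenue").show()
```

The design goal is that existing Spark jobs need configuration changes, not code changes, to benefit from GPUs.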

However, the devil is in the details. Adding GPUs to a framework that has been optimized for years to make optimal use of CPU cores and the massive amounts of RAM available in servers is not easy. Spark is built to run on a few tens of powerful server cores, not thousands of wimpy GPU cores. Spark has been optimized to run on clusters of server nodes, making them appear as one big pool of RAM and cores. Mixing two kinds of memory – RAM and GPU VRAM – while keeping the distributed compute nature of Spark intact will not be easy.

Secondly, cherry-picking the most GPU-friendly machine learning algorithms is one thing; making sure most of them run well on GPU-based machines is another. Lastly, GPUs will still have less memory than CPUs for the foreseeable future, and even coherent platforms won’t solve the problem that system RAM is a fraction of the speed of local VRAM.

Comments

  • tipoo - Monday, July 29, 2019 - link

    Fyi, when on page 2 and clicking "convolutional, etc" for page 3, it brings me back to the homepage
  • Ryan Smith - Monday, July 29, 2019 - link

    Fixed. Sorry about that.
  • Eris_Floralia - Monday, July 29, 2019 - link

    Johan's new piece in 14 months! Looking forward to your Rome review :)
  • JohanAnandtech - Monday, July 29, 2019 - link

    Just when you think nobody noticed you were gone. Great to come home again. :-)
  • Eris_Floralia - Tuesday, July 30, 2019 - link

Your coverage of server processors is great!
I can still remember the Nehalem, Barcelona, and especially the Bulldozer aftermath articles
  • djayjp - Monday, July 29, 2019 - link

    Not having a Tesla for such an article seems like a glaring omission.
  • warreo - Monday, July 29, 2019 - link

    Doubt Nvidia is sourcing AT these cards, so it's likely an issue of cost and availability. Titan is much cheaper than a Tesla, and I'm not even sure you can get V100's unless you're an enterprise customer ordering some (presumably large) minimum quantity.
  • olafgarten - Monday, July 29, 2019 - link

    It is available https://www.scan.co.uk/products/32gb-pny-nvidia-te...
  • abufrejoval - Tuesday, July 30, 2019 - link

    Those bottlenecks are over now and P100, V100 can be bought pretty freely, as well as RTX6000/8000 (Turings). Actually the "T100" is still missing and the closest siblings (RTX 6000/8000) might never get certified for rackmount servers, because they have active fans while the P100/V100 are designed to be cooled by server fans. I operate a handful of each and getting budget is typically the bigger hurdle than purchasing.
  • SSNSeawolf - Monday, July 29, 2019 - link

    I've been trying to find more information on Cascade Lake's AI/VNNI performance, but came up dry. Thanks, Johan. Eagerly putting this aside for my lunch reading today.
