RAPIDS 0.7: Well We’re Moving On Up…

This blog will show you how easy it is.

Google Colab is a great place to try out the complete RAPIDS suite and experiment with single-GPU jobs.

Finally, another Google milestone: last week, Google Dataproc announced RAPIDS integration as a new initialization action during their Cloud OnAir webinar, “New open-source tools in Cloud Dataproc.”

The integration is currently in early beta. With it, Dataproc users can quickly configure an NVIDIA GPU cluster in the Google environment to try RAPIDS at scale.

With Dataproc, you can leverage RAPIDS with Dask-XGBoost to use multiple GPUs and clusters of GPU nodes to train on larger problems.

Stay tuned for a detailed blog explaining how to get started.

In summary, RAPIDS is in the clouds — now on Azure, Databricks, and throughout Google platforms including: GCP DL VM, Colab, Dataproc, and KubeFlow.

In addition, for AWS and other NGC-Ready clouds, you can use the NGC container to quickly try RAPIDS.

So many places!

New Features and Improvements (… in usability and feature completeness!)

RAPIDS cuDF library 0.7 makes a bunch of things easier.

There are now cumulative sum, product, min, and max functions for series.

In fact, we did a complete overhaul of reduction operations on series on the C++ side.

This fixed several bugs, added null support, improved datatype flexibility, and increased aggregation coverage for libcudf.
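To make the semantics of the cumulative sum, product, min, and max functions concrete, here is a plain-Python sketch using `itertools.accumulate`. It is only a CPU reference for what each function returns on a small column, not the cuDF implementation, and the sample values are made up for illustration.

```python
from itertools import accumulate
import operator

# Hypothetical sample column; any numeric sequence works.
values = [3, 1, 4, 1, 5]

# Each result has the same length as the input;
# element i summarizes values[0] through values[i].
cumsum = list(accumulate(values))                 # running total
cumprod = list(accumulate(values, operator.mul))  # running product
cummin = list(accumulate(values, min))            # smallest value seen so far
cummax = list(accumulate(values, max))            # largest value seen so far

print(cumsum)   # [3, 4, 8, 9, 14]
print(cumprod)  # [3, 3, 12, 12, 60]
print(cummin)   # [3, 1, 1, 1, 1]
print(cummax)   # [3, 3, 4, 4, 5]
```

On a GPU series, the equivalent calls would presumably follow the pandas method names (`cumsum`, `cumprod`, `cummin`, `cummax`); treat that naming as an assumption and check the cuDF API docs for your release.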

We also added DataFrame.pop(), a great way to get a label column and a data matrix in one line.

For reshaping your data, we’ve added multi-index support, including join and groupby functionality (including on strings, which is truly amazing), and the DataFrame.melt() method.

See our cheatsheet for more info on melt.
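If you have not used melt before, the following plain-Python sketch shows the reshaping it performs: a wide table with one column per measurement becomes a long table with one row per measurement. The `melt` helper below is a hypothetical stand-in written for illustration, with parameter names borrowed from pandas; the exact cuDF signature is assumed to be similar but is not shown here.

```python
def melt(rows, id_vars, value_vars, var_name="variable", value_name="value"):
    """Reshape wide rows (a list of dicts) into long format."""
    long_rows = []
    # Walk column by column so the output is grouped by variable.
    for var in value_vars:
        for row in rows:
            out = {key: row[key] for key in id_vars}  # identifier columns are kept
            out[var_name] = var                       # former column name
            out[value_name] = row[var]                # former cell value
            long_rows.append(out)
    return long_rows


# Made-up sample data: one row per city, one column per measurement.
wide = [
    {"city": "A", "temp": 20, "humidity": 30},
    {"city": "B", "temp": 25, "humidity": 40},
]

long_table = melt(wide, id_vars=["city"], value_vars=["temp", "humidity"])
for row in long_table:
    print(row)
# {'city': 'A', 'variable': 'temp', 'value': 20}
# {'city': 'B', 'variable': 'temp', 'value': 25}
# {'city': 'A', 'variable': 'humidity', 'value': 30}
# {'city': 'B', 'variable': 'humidity', 'value': 40}
```

Two wide rows with two measurement columns become four long rows, which is the shape many plotting and groupby workflows expect.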

Min and max methods now support datetime columns.

One additional improvement to mention: more cuDF functions now support null / NA data.

cuDF gets better every release, and we’re looking forward to release 0.8.

A couple of things to get excited about are rolling window functionality and a GPU-accelerated to_csv() function.

In the RAPIDS cuML library, we’ve added two new methods on the Python side.

One is brand new: a coordinate descent solver to fit lasso and elastic net regressions.

The other is a big improvement: a completely rewritten single-GPU version of k-means built entirely on our machine learning primitives.
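For readers curious about what a coordinate descent solver for lasso and elastic net does, here is a minimal pure-Python sketch. It is not cuML's implementation. It assumes no intercept and the objective scaling used by scikit-learn, (1/(2n))·||y − Xw||² + alpha·l1_ratio·||w||₁ + 0.5·alpha·(1 − l1_ratio)·||w||², and the function names are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def coordinate_descent(X, y, alpha=0.1, l1_ratio=1.0, n_iter=100):
    """Fit elastic net weights by cycling over one coefficient at a time.

    l1_ratio=1.0 gives lasso; values below 1.0 mix in an L2 penalty.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, and w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero feature cannot contribute
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(col[i] * (residual[i] + w[j] * col[i]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1.0 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= delta * col[i]
            w[j] = new_w
    return w


# Made-up data: y depends only on the first feature (y = 2 * x1).
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.0, 2.0, -2.0, -2.0]

lasso_w = coordinate_descent(X, y, alpha=0.5, l1_ratio=1.0)
enet_w = coordinate_descent(X, y, alpha=0.5, l1_ratio=0.5)
print(lasso_w)  # approximately [1.5, 0.0]
print(enet_w)   # approximately [1.4, 0.0]
```

In both fits the irrelevant second feature is driven exactly to zero by the soft-threshold step, which is the sparsity property that makes lasso and elastic net useful for feature selection.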

Under the hood, we’ve done a lot to improve the code and have added C++ methods, like quasi-Newton solvers and random forests, that will be exposed in Python in a later release.

cuGraph continues to improve and refine its code base with an eye towards matching the NetworkX API.

Read the latest blog from the cuGraph team here.

New analytics for version 0.7 are (1) enhancements to Jaccard Similarity to allow comparison between any vertex pairs; (2) the addition of the Overlap Coefficient as an alternative to Jaccard; (3) Triangle Counting; (4) Subgraph Extraction; and (5) Renumbering.
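As a quick reference for three of these analytics, the sketch below computes Jaccard similarity, the overlap coefficient, and a triangle count on a tiny undirected graph in plain Python. It illustrates the definitions only; the function names and the edge list are made up for this example and are not the cuGraph API.

```python
from itertools import combinations


def build_adjacency(edges):
    """Map each vertex to the set of its neighbors (undirected graph)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    a, b = adj[u], adj[v]
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(adj, u, v):
    """Shared neighbors divided by the size of the smaller neighborhood."""
    a, b = adj[u], adj[v]
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def count_triangles(adj):
    """Count each triangle once by only visiting vertices in order u < v < w."""
    total = 0
    for u in adj:
        higher = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                total += 1
    return total


# Hypothetical graph: two triangles (0-1-2 and 1-2-3) plus a tail edge 3-4.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = build_adjacency(edges)

# Vertices 0 and 3 are not connected, yet they can still be compared.
print(jaccard(adj, 0, 3))    # 2 shared neighbors out of 3 distinct neighbors
print(overlap(adj, 0, 3))    # 1.0, since every neighbor of 0 is also a neighbor of 3
print(count_triangles(adj))  # 2
```

The pair (0, 3) shows why the overlap coefficient is a useful alternative: it reports full containment of one neighborhood in the other, while Jaccard is pulled down by the extra neighbor of vertex 3.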

Finally, you asked for it, we did it: better error handling! This required rewriting the Python bindings to the underlying libcudf library to use Cython, so low-level errors are now cleanly passed through to the end user for better diagnostics.

Version 0.7 took the first steps toward fully rewriting the bindings, and we’ll smooth out any resulting issues in subsequent releases.

The Rule of Two (again)

In my last blog, I talked about the “rule of two” when it came to CUDA version support in RAPIDS.

Beginning with version 0.7, we are instituting a new rule of two.

We will only be supporting two installation formats for the foreseeable future: conda and source installation.

After much thought and deliberation, we will not be supporting pip.

For more details on why we made this decision, please see this blog.

Getting Started Has Never Been Easier (… Have Your Piece of the Pie)

With our 0.7 release and in support of our growing community, we’d like to share our new Notebooks Extended repo on GitHub.

Read this blog to learn more.

You can think of this as the RAPIDS Community’s notebooks to provide data practitioners a place to grow their skills and teach others what they’ve learned.

We expect that Notebooks Extended will be the go-to place for the latest tips and tricks and where budding RAPIDS practitioners grow and master RAPIDS.

If you’re interested in cybersecurity use cases, you owe it to yourself to read the recent blog post.

And if you’re new to cuDF and Dask-cuDF, the blog walks you through everything from what those packages do at the highest level to building an optimized ETL pipeline for large data sets.

RAPIDS in the Financial News!

We were excited to see that RAPIDS was a key part of an NVIDIA effort to blow away the previous best score on the STAC A3 benchmark, a measure of backtesting performance, which is critical in the financial services sector.

To learn more about how RAPIDS is accelerating Python in banking, join us for a webinar on 6/13 with RAPIDS Senior Data Scientist @realpaulmahler.

Deep Learning with Tabular Data: RAPIDS with PyTorch

RAPIDS is making it possible to work with tabular data in deep learning.

In a recent post, we explore how traditional machine learning approaches like XGBoost compare in performance against deep neural networks (DNNs).

This first foray into deep learning for RAPIDS is a significant step that demonstrates we can achieve similar performance to XGBoost with DNNs using a reasonably simple model.

Looking to 0.8 and Beyond

In 0.8, we’re working hard to release a single-GPU implementation of random forests, and we’re laying the groundwork for multi-node, multi-GPU k-means and random forests in 0.9.

We’ve also been working with the OpenUCX community to integrate UCX into Dask.

This is coming along very well, and we should have our first version of this out in 0.8 with more optimizations and support in 0.9 and 0.10.

As long as We Live, It’s You and Me Baby

If you’ve been thinking about trying out RAPIDS, you can get started on Google Colab in seconds.

For returning users, there are many ways to try the latest release, the docs are improved, and numerous getting-started notebooks showcase the many new features in RAPIDS.

We’re excited for you to join the community.

If you like RAPIDS, please give it a star on GitHub, and file GitHub issues for problems or feature requests to make it even better.

See y’all in 6 weeks!