Democratizing AI: How to Gain Actionable Insights through Open Source

In this special guest feature, Ion Stoica, Co-founder of Anyscale, details the state of machine learning application creation in the enterprise today, the case for democratizing AI, and what this means for the future of work.

Ion is an original founder of Databricks and creator of the Apache Spark data processing framework.

He is a Professor in the EECS Department at University of California at Berkeley and leads the RISE Lab.

He is an ACM Fellow and has received numerous awards, including the SIGOPS Hall of Fame Award (2015), the SIGCOMM Test of Time Award (2011), and the ACM Doctoral Dissertation Award (2001).

Whether you like it or not, artificial intelligence (AI) is taking over the (business) world.

In fact, a study from McKinsey notes that 30 percent of organizations are conducting AI pilots, while nearly half have embedded at least one AI capability into their traditional business processes.

Adoption numbers are only going to increase with time.

The reason for this is pretty simple.

Over the past several years, organizations have amassed copious amounts of data.

Machines can sift through this data to drive optimal outcomes more quickly than any human could ever hope to, so businesses let machines do what they’re best at, while leaving high-level tasks for humans.

Unfortunately, for the majority of organizations, deploying AI for practical use cases is a daunting task.

On one hand, they have to be able to hire and retain incredible engineering talent, and on the other, they need to allocate the time and resources necessary to scale these new applications.

The Current Challenges

In order to take advantage of AI, the Googles of the world employ distributed computing experts and infrastructure teams to ensure that they have the computing power needed to scale their applications.

However, organizations with finite resources simply can’t afford to bring on the expertise and build the teams to help with their distributed computing needs.

So why are these AI applications so hard to scale? In a nutshell, the demands of the applications are growing faster than ever, much faster than the capabilities of individual computers.

This leaves no choice but to distribute these applications.

This means handling failures, implementing sophisticated resource allocation and scheduling algorithms to efficiently use available resources, and supporting an increasing variety of hardware accelerators (e.g., GPUs, TPUs).

Handling each of these problems in isolation is very difficult.

When put together they present a daunting challenge.

And as application demands continue their explosive growth, this challenge will only become harder.

Scaling Applications with Open Source Technologies

The use of open source is prevalent among machine learning researchers and engineers.

Indeed, machine learning libraries such as scikit-learn, TensorFlow, and PyTorch are some of the most popular open source projects.

Furthermore, Python has emerged as the standard language for developing AI applications.

While these open source libraries significantly simplify the development of AI applications on a single machine, the challenge of scaling these applications remains.

To cope with this challenge, organizations are increasingly turning to Ray, an open source framework for building distributed applications that provides a simple yet flexible Python API.

With distributed computing becoming the norm, Ray allows developers to easily scale applications from a laptop to a cluster, eliminating the need for in-house distributed computing expertise and resources.
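To make the laptop-to-cluster idea a little more concrete, here is a minimal sketch using Ray's core task API (`ray.init`, `ray.remote`, `ray.get`). The function names `square`, `sequential_squares`, and `parallel_squares` are hypothetical and chosen only for illustration, and the example assumes Ray has been installed separately (for instance with `pip install ray`).

```python
from typing import Iterable, List


def square(x: int) -> int:
    """Ordinary Python function; nothing here is Ray-specific."""
    return x * x


def sequential_squares(values: Iterable[int]) -> List[int]:
    """Single-process baseline: one call after another."""
    return [square(v) for v in values]


def parallel_squares(values: Iterable[int]) -> List[int]:
    """Same computation, expressed as Ray tasks (hypothetical helper)."""
    # Imported here so the baseline above still runs without Ray installed.
    import ray

    # With no arguments this starts a local Ray runtime on the current machine.
    ray.init(ignore_reinit_error=True)

    # Wrapping the function turns each call into a task that Ray schedules.
    remote_square = ray.remote(square)

    # .remote() returns immediately with a reference to the future result.
    refs = [remote_square.remote(v) for v in values]

    # ray.get blocks until the results are ready, preserving input order.
    return ray.get(refs)
```

Under these assumptions, `parallel_squares(range(8))` would return the same list as `sequential_squares(range(8))`. The point of the sketch is that the application logic in `square` is untouched; only the way it is invoked changes, and connecting to an existing cluster would be a change to the `ray.init` call rather than to the function itself.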

This effectively removes the primary barrier to entry for building scalable, distributed applications that has held organizations back from exploring and ultimately deploying AI and ML for enterprise use.

Ray does this by employing sophisticated techniques to seamlessly scale to large clusters and provide fault tolerance.
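From the developer's side, much of this surfaces as a few declarative options rather than hand-written scheduling or recovery code. The sketch below is an illustration under stated assumptions, not a description of Ray's internals: `normalize` and `normalize_on_cluster` are hypothetical names, while `num_cpus` and `max_retries` are options accepted by `ray.remote`. Ray must be installed separately for the second function to run.

```python
from typing import List, Sequence


def normalize(batch: Sequence[float]) -> List[float]:
    """Scale a batch of numbers so that they sum to 1."""
    total = sum(batch)
    return [x / total for x in batch]


def normalize_on_cluster(batches: Sequence[Sequence[float]]) -> List[List[float]]:
    """Run normalize() on every batch as a Ray task (hypothetical helper)."""
    # Imported here so normalize() remains usable without Ray installed.
    import ray

    ray.init(ignore_reinit_error=True)

    # num_cpus tells the scheduler what each task needs;
    # max_retries asks Ray to rerun a task if its worker process dies.
    task = ray.remote(num_cpus=1, max_retries=3)(normalize)

    # Launch one task per batch, then wait for all of the results.
    refs = [task.remote(batch) for batch in batches]
    return ray.get(refs)
```

The resource request and the retry policy are stated once, next to the function they apply to, and placement and re-execution are left to the framework. A task that needs an accelerator could request one in the same way with the `num_gpus` option.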

Furthermore, Ray can be deployed either in the public cloud or in on-premises clusters with Kubernetes, giving developers the scale and performance needed to drive true business value.

The Path Forward

While most businesses lack the engineering resources of the Googles of the world, this is no longer a barrier that has to hold them back from scaling and deploying large-scale AI applications.

By tapping into the power of the open-source community in an intelligent way, enterprises of all sizes can begin to turn their data into actionable insights that lead to added value and quick decision making.

These are crucial capabilities in today’s competitive industries and ones that are likely to turn up-and-coming companies into tomorrow’s household names.
