The Most Useful Machine Learning Tools of 2020

By Ian Xiao, Engagement Lead at Dessa TD; DR — Building good Machine Learning applications is like making Michelin-style dishes.

Having a well organized and managed kitchen is critical, but there are too many options to choose from.

In this article, I will highlight the tools I found useful in delivering professional projects, share a few thoughts and alternatives, and do a quick real-time survey (you can see what the community thinks after you participate).

Like any tooling discussion, the list is not exhaustive; but I try to focus on the most useful and simplest tools.

Welcome any feedback in the comment section or let me know if there are better alternatives I should mention.

Disclaimer: This post is not endorsed or sponsored.

I use the term Data Science and ML interchangeably.

Like What You Read? Follow me on Medium, LinkedIn, or Twitter.

Do you want to learn how to communicate and become an influential Data Scientist? Check out my “Influence with Machine Learning” guide.

   “How do I build good Machine Learning applications?”This question came up many times and in various forms during chats with aspiring data scientists in schools, professionals who are looking to switch, and team managers.

There are many aspects of delivering a professional data science project.

Like many others, I like to use the analogy of cooking in a kitchen: there is the ingredient (data), the recipe (design), the process of cooking (well, your unique approach), and finally, the actual kitchen (tools).

So, this article walks through my kitchen.

It highlights the most useful tools to design, develop, and deploy full-stack Machine Learning applications — solutions that integrate with systems or serve human users in Production environments.

If you want to know more about other aspects of delivering ML, check out my articles here.

   We live in a golden age.

If you search “ML tools” in Google or ask a consultant, you are likely to get something like this:  There are (too) many tools out there; the possible combination is infinite.

It’ can be confusing and overwhelming.

So, let me help you to shrink it down.

That said, there is no perfect setup.

It all depends on your needs and constraints.

So pick and choose accordingly.

My list prioritizes the following (not in order):Caveat: I use Python ???? 99% of the time.

So the tools work well with or are built with native Python.

I haven’t tested them with other programming languages, such as R or Java.

    A free and open-source relational database management system (RDBMS) emphasizing extensibility and technical standards compliance.

It is designed to handle a range of workloads, from single machines to data warehouses or Web services with many concurrent users.

 Alternatives: MySQL, SAS, IBM DB2, Oracle, MongoDB, Cloudera, GCP, AWS, Azure   Pipeline tools are critical to the speed and quality of development.

We should be able to iterate fast with minimum manual processing.

Here is a setup that works well, see my 12-Hour ML Challenge article for more details.

Every lazy data scientist should try this up early on in the project.

    It offers the distributed version control and source code management (SCM) functionality of Git, plus its own features.

It provides access control and several collaboration features such as bug tracking, feature requests, task management, and wikis for every project.

Alternative: DVC, BitBucket, GitLab   An integrated development environment (IDE) used in computer programming, specifically for the Python language.

It is developed by the Czech company JetBrains.

It provides code analysis, a graphical debugger, an integrated unit tester, integration with version control systems (VCSes), and supports web development with Django as well as Data Science with Anaconda.

Alternatives: Atom, Sublime   A framework makes it easy to write small tests, yet scales to support complex functional testing for applications and libraries.

It saves lots of time from manual testing.

If you need to test something every time you make changes to the code, automate it with Pytest.

Alternative: Unittest   CircleCI is a continuous integration and deployment tool.

It creates an automated testing workflow using remote dockers when you commit to Github.

Circle CI rejects any commit that does not pass the test cases set by PyTest.

This ensures code quality, especially when you work with a larger team.

Alternative: Jenkins, Travis CI, Github Action   A platform as a service (PaaS) that enables developers to build, run, and operate applications entirely in the cloud.

You can integrate with CircleCI and Github to enable automatic deployment.

Alternative: Google App Engine, AWS Elastic Compute Cloud, others   Streamlit is an open-source app framework for Machine Learning and Data Science teams.

It’s become one of my favourite tools in recent years.

Check out how I used it and the other tools in this section to create a movie and simulation app.

Alternative: Flask, Django, Tableau    Forget about Jupyter Notebook.

Yes, that’s right.

Jupyter was my go-to tool for exploring data, doing analysis, and experimenting with different data and modelling processes.

But I can’t remember how many times when:It’s frustrating ????????.

So, I use Streamlit to do early exploration and serve the final front-end — killing two birds with one stone.

The following is my typical screen setup.

PyCharm IDE on the left and result visualization on the right.

Give it a try.

 Alternative: Jupyter Notebook, Spyder from Anaconda, Microsoft Excel (seriously)   Like using actual knives, you should pick the right ones depending on the food and how you want to cut it.

There are general-purpose and specialty knives.

Be cautious.

 Using a specialty knife for sushi to cut bones will take a long time, although the sushi knife is shinier.

Pick the right tool to get the job done.

   The go-to framework for doing general Machine Learning in Python.

Enough said.

 Alternatives: none, period.

   An open-source machine learning library based on the Torch library.

Given the Deep Learning focus, it’s mostly used for applications such as computer vision and natural language processing.

It is primarily developed by Facebook’s AI Research lab (FAIR).

Recently, many well-known AI research institutes, such as Open AI, are using PyTorch as their standard tool.

Alternatives: Tensorflow, Keras, Fast.

ai   A toolkit for developing and comparing reinforcement learning algorithms.

It offers API and visual environments.

This is an active area the communities are building tools for.

Not many well-packaged tools are available yet.

Alternatives: many small projects, but not many are as well maintained as the Gym.

    A free tool that allows data scientists to set up experiments with a few snippets and surface the results to a web-based dashboard.

 Disclaimer: I worked at Dessa, the company that creates Altas.

Alternatives: ML Flow, SageMaker, Comet, Weights & Biases, Data Robot, Domino   Out of curiosity, what troubles you the most when finding the right tools? I’d love to hear your thoughts below.

It’s a live survey, so you see what the community thinks after you participate.

   As I mentioned, there is no perfect setup.

It all depends on your needs and constraints.

Here is another view of what tools are available and how they can work together.

    If you want to learn more about how to use these tools, the best way is to find a project to work on.

You can incorporate the tools in a current project or do a 12-hour ML challenge.

Not sure how? Check out how I created a user-empowered recommendation app with tools and processes discussed.

I look forward to seeing what you can create.

Please share it with the community and tag me on Twitter ????.

Like What You Read? Follow me on Medium, LinkedIn, or Twitter.

Also, do you want to learn business thinking and communication skills as a Data Scientist? Check out my “Influence with Machine Learning” guide.

   The Forgotten Algorithm Exploring Monte Carlo Simulation with StreamlitData Science is Boring How I cope with the boring days of deploying Machine Learning12-Hour ML Challenge How to build & deploy an ML app with Streamlit and DevOps toolsA Doomed Marriage of ML and Agile How not to apply Agile to an ML projectThe Last-Mile Problems of AI How to foster trust between humans and AIAnother AI Winter? How to deploy more ML solutions — five tactics  Bio: Ian Xiao is Engagement Lead at Dessa, deploying machine learning at enterprises.

He leads business and technical teams to deploy Machine Learning solutions and improve Marketing & Sales for the F100 enterprises.

Original.

Reposted with permission.

Related: var disqus_shortname = kdnuggets; (function() { var dsq = document.

createElement(script); dsq.

type = text/javascript; dsq.

async = true; dsq.

src = https://kdnuggets.

disqus.

com/embed.

js; (document.

getElementsByTagName(head)[0] || document.

getElementsByTagName(body)[0]).

appendChild(dsq); })();.

Leave a Reply