Guest Blog: Using Databricks, MLflow, and Amazon SageMaker at Brandless to Bring Recommendation Systems to Production

This is a guest blog from Adam Barnhard, Head of Data at Brandless, Inc., and Bing Liang, Data Scientist at Brandless, Inc.

Launched in July 2017, Brandless makes hundreds of high-quality items, curated for every member of your family and room of your home, and all sold at more accessible price points than similar products on the market.

We sell exclusively through our website, ship directly to our customers, and collaborate directly with our partners to manufacture our assortment.

These direct relationships provide a unique opportunity to capture and use data to better serve our customers and share their feedback with our partners.

The data team at Brandless is a small team of fewer than ten people (in a total company of ~115) covering centralized analytics, algorithm development, and data engineering.

Among the many responsibilities we have, we create systems and processes that allow us to utilize our large datasets to ensure each customer gets a personalized and optimized experience.

We are also responsible for making sure our business leaders are equipped with the proper data and analysis to make decisions.

Building Out the Analytics Stack

We use a variety of third-party, open-source, and in-house-built tools for our data stack.

We made these decisions based on the capabilities of what is available on the market and whether or not an in-house tool would serve as a competitive advantage for Brandless.

To implement our core stack, we utilized Amazon Redshift to house our data, Airflow to manage our Extract, Transform, and Load (ETL) jobs, and a variety of BI tools to present our metrics to stakeholders.
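To give a feel for the orchestration layer, here is a minimal sketch of what one such Airflow DAG might look like; the DAG ID and the extract/load callables are hypothetical placeholders rather than our actual pipeline code.

```python
# Minimal Airflow DAG sketch for a daily ETL job (illustrative only).
# The DAG ID and callables are hypothetical placeholders, and the
# PythonOperator import path shown here is the Airflow 1.x location.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def extract_orders(**context):
    # Placeholder: pull raw order data from the source system.
    pass


def load_to_redshift(**context):
    # Placeholder: transform the extract and load it into Amazon Redshift.
    pass


default_args = {"owner": "data-team", "retries": 1, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="daily_orders_etl",
    default_args=default_args,
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)
    extract >> load
```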

As the need for data and an optimized site experience grew, we set our sights on a production machine learning model: a product recommendation system that would serve up relevant products to customers as they visited our site.

As we started building the recommendation system in a local Python script, we quickly realized that our processing logic was more complex than anything an out-of-the-box model provided.

We needed to understand the difference between new and old products, as well as develop a complex set of post-processing logic.

Here is a diagram that outlines the model and processing flow:

[Figure: End-to-end machine learning workflow at Brandless.]

We pull a variety of data to train two separate models: a collaborative filter built on user-product purchase data (ALS on Apache Spark) and a content-based system built on product metadata.

Depending on the age of the product, we route between the two different models.

Next, we calculate cosine similarities between the input and the available recommendations.

Finally, we rank products based on similarity and other factors such as cross-category exposure.
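The snippet below is a highly simplified sketch of that routing-and-ranking idea; the product IDs, vectors, and new-product set are invented for illustration, while the real system works on full ALS factor matrices and metadata embeddings with additional post-processing.

```python
# Simplified sketch of the model routing and cosine-similarity ranking described above.
# All vectors, product IDs, and the new-product set are illustrative placeholders.
import numpy as np

# Hypothetical item representations from the two models: ALS latent factors
# (collaborative filter) and metadata embeddings (content-based).
als_factors = {
    "olive_oil": np.array([0.9, 0.1, 0.3]),
    "hand_soap": np.array([0.2, 0.8, 0.5]),
    "dish_soap": np.array([0.3, 0.7, 0.6]),
}
content_vectors = {
    "olive_oil": np.array([0.8, 0.2, 0.1]),
    "hand_soap": np.array([0.3, 0.9, 0.4]),
    "dish_soap": np.array([0.2, 0.8, 0.6]),
}
NEW_PRODUCTS = {"dish_soap"}  # too new to have meaningful purchase history


def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def recommend(query_product, top_k=2):
    # Route to the content-based vectors for new products, otherwise use the
    # collaborative-filtering factors, then rank candidates by cosine similarity.
    vectors = content_vectors if query_product in NEW_PRODUCTS else als_factors
    query = vectors[query_product]
    scored = [(p, cosine(query, v)) for p, v in vectors.items() if p != query_product]
    # The production ranker also applies post-processing such as cross-category
    # exposure rules before returning results.
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]


print(recommend("olive_oil"))
print(recommend("dish_soap"))
```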

We dreamed of using a system like Uber’s Michelangelo or Airbnb’s Bighead, but we weren’t able to find a set of tools that would meet our requirements.

Brandless has a relatively small (and highly utilized) engineering team, so we needed a system that could be built and maintained by the data team.

We had a fairly straightforward list of requirements:

- Easy sandbox to test a variety of models and complex pre- and post-processing.
- Model-management system to keep track of different model versions and iterations.
- Easy, engineering-minimal deployment capabilities.
- Simple A/B testing frameworks.
- Open-source software that we could contribute back to in the future (if possible).

How it Works

After exploring a variety of different systems and tools, we landed on the following toolkit:

- Databricks Unified Analytics Platform: To develop, iterate, and test custom-built models.
- MLflow: To log models and metadata, compare performance, and deploy to production.
- Amazon SageMaker: To host production models and run A/B tests on different models.

The solution feeds raw data from Amazon Redshift into the Databricks Unified Analytics Platform, where we train the recommendation models and develop custom pre- and post-processing logic.

We use the Databricks notebook functionality to collaborate in real time on model development and logic.

We also perform a bit of offline testing within the Databricks platform.

Next, we push the trained models and their metadata to our MLflow tracking server, which acts as the source of truth for our models.

MLflow stores the model hyperparameters and metadata, as well as the actual model artifacts.
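As a rough sketch of what a logged run looks like (the parameter names, metric, and wrapper class below are illustrative rather than our exact code):

```python
# Rough sketch of logging a custom recommender pipeline to the MLflow tracking server.
# Parameter names, the metric, and the wrapper class are illustrative placeholders.
import mlflow
import mlflow.pyfunc


class RecommenderWrapper(mlflow.pyfunc.PythonModel):
    """Wraps a trained recommender plus pre/post-processing as a pyfunc model."""

    def __init__(self, recommender):
        self.recommender = recommender

    def predict(self, context, model_input):
        # model_input arrives as a pandas DataFrame of user/product identifiers.
        return self.recommender.recommend(model_input)


class DummyRecommender:
    # Stand-in for the real trained pipeline so this sketch runs end to end.
    def recommend(self, model_input):
        return ["hand_soap", "dish_soap"]


with mlflow.start_run(run_name="als_plus_content_v2"):
    mlflow.log_param("als_rank", 32)
    mlflow.log_param("als_reg_param", 0.1)
    mlflow.log_metric("offline_precision_at_10", 0.21)
    mlflow.pyfunc.log_model(
        artifact_path="recommender",
        python_model=RecommenderWrapper(DummyRecommender()),
    )
```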

Once we have two different models sitting on the tracking server, we use the MLflow deploy commands to push these models into Amazon SageMaker.

Our environment is stored in a Docker container, and the custom-packaged model pipeline is sent to Amazon SageMaker’s inference platform.
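Pushing a logged run to SageMaker then looks roughly like the snippet below; the run ID, IAM role, and app name are placeholders, and the deployment API has changed across MLflow versions, so check the docs for the version you run.

```python
# Rough sketch of deploying a logged MLflow model to an Amazon SageMaker endpoint.
# The serving container is built and pushed ahead of time (MLflow provides a
# "mlflow sagemaker build-and-push-container" CLI command for this).
# Run ID, role ARN, and app name are placeholders; this reflects the MLflow 1.x API.
import mlflow.sagemaker

mlflow.sagemaker.deploy(
    app_name="product-recommender",            # SageMaker endpoint name
    model_uri="runs:/<RUN_ID>/recommender",    # model logged in the previous step
    execution_role_arn="arn:aws:iam::123456789012:role/sagemaker-execution-role",
    region_name="us-west-2",
    mode="replace",        # "add" attaches a second variant to an existing endpoint
    instance_type="ml.m5.large",
    instance_count=1,
)
```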

This final step also enables us to run multiple models in parallel, as shown below.

[Figure: Production machine learning workflow.]

Once we have pushed a model to production with Amazon SageMaker, we can use the A/B testing functionality to iterate through different models and understand which version performs best.

We push two different model variants to a single endpoint and use the UpdateEndpointWeightsAndCapacities API to set a specific weight for each variant.
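A minimal boto3 sketch of splitting traffic between two variants on one endpoint (the endpoint and variant names are placeholders):

```python
# Sketch of splitting traffic between two SageMaker production variants.
# The endpoint and variant names are placeholders.
import boto3

sagemaker_client = boto3.client("sagemaker", region_name="us-west-2")

sagemaker_client.update_endpoint_weights_and_capacities(
    EndpointName="product-recommender",
    DesiredWeightsAndCapacities=[
        {"VariantName": "model-a", "DesiredWeight": 0.5},
        {"VariantName": "model-b", "DesiredWeight": 0.5},
    ],
)
```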

Users on Brandless.com are then randomly assigned one of the model variants.

We record which variant each user receives, as well as the actions each customer takes after the recommendations are displayed.

Finally, we calculate model performance for each variant.
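Conceptually, the per-variant comparison boils down to something like this (the event-log columns below are hypothetical stand-ins):

```python
# Sketch of computing carousel interaction rate per model variant.
# The DataFrame columns are hypothetical stand-ins for our event logs.
import pandas as pd

events = pd.DataFrame(
    {
        "visitor_id": [1, 2, 3, 4, 5, 6],
        "variant": ["model-a", "model-a", "model-a", "model-b", "model-b", "model-b"],
        "interacted_with_carousel": [1, 0, 1, 1, 1, 0],
    }
)

# Fraction of visitors in each variant who interacted with a recommended-product carousel.
interaction_rate = events.groupby("variant")["interacted_with_carousel"].mean()
print(interaction_rate)
```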

Results

Since Brandless started using the Databricks-MLflow-Amazon SageMaker combination, the deployment process has evolved and become more efficient over time.

It originated as a process that required manual checks as we trained the model, pushed to MLflow, and deployed to Amazon SageMaker.

We now have a one-click process that automatically checks for errors at each step.

We run this process roughly once a week.

Since our initial deployment, we have tested and iterated through approximately 10 different models in production.

These versions have different model hyperparameters, utilize increasingly complex post-processing, or combine multiple models in a hierarchy.

We measure online performance by calculating the percentage of Brandless.com visitors that interact with our recommended-product carousels.

We have seen performance increases on all but one of these model versions, with an estimated 15% improvement overall in comparison to our original model.

The team has also used the Databricks-MLflow-Amazon SageMaker combination to move faster with development for other ML models.

These use cases range from customer service improvements to logistics optimization, and all of the models follow the same process.

Challenges and Learnings

We ran into a couple of challenges along the way! Below, we outline a few of these and what we learned:

Leave time for DevOps and bug fixes: Whether we were setting up proper AWS permissions or debugging an issue while alpha testing MLflow, we needed to leave more time for general debugging the first time the system was set up.

Each of the fixes and changes that we made applies to the system as a whole, meaning new models generally take less time from start to finish.

Optimize for latency: When we first deployed our model, we saw latency above 500ms due to complex Spark operations.

This was too slow for our use cases on Brandless.com.

To handle this, we built a system that pre-computed model outputs, ultimately bringing latency down below 100ms.

We now consider and plan for latency at the beginning of model development.

Dependency management: Due to the variety of environments that our process uses (multiple systems executing code, different Spark clusters, etc.), we occasionally run into problems managing our library and dataset dependencies.

To solve this problem, we wrap all of our custom code into a Python egg package and upload it into all systems.

This creates an extra step and can cause confusion across egg versions, so we hope to implement a fully-integrated container system in the future.
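For reference, packaging the shared code as an egg only requires a small setup.py built with "python setup.py bdist_egg"; the package name and dependencies below are hypothetical.

```python
# setup.py -- minimal packaging sketch; the package name and dependencies are hypothetical.
# Build the egg with: python setup.py bdist_egg
from setuptools import find_packages, setup

setup(
    name="brandless_reco_utils",
    version="0.1.0",
    packages=find_packages(),
    install_requires=["numpy", "pandas"],  # example runtime dependencies
)
```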

Summary

The Databricks-MLflow-SageMaker combination has allowed the Brandless data team to move much faster with development of other machine learning models.

We have used the system to build, schedule and serve a customer service optimization model, and we are currently working on a dynamic shipping fee model.

Looking forward, we plan to further build out the online testing system to provide better support for more advanced frameworks such as a multi-armed bandit.

We are continuing to optimize our deployment systems to add automation, alerting and monitoring.

We plan to contribute these upgrades to the MLflow open source project after we build and test internally.

Thanks for reading! Watch our story here, and please feel free to reach out to Adam Barnhard (adam@brandless.com) if you have any questions or would like to discuss further.

If you’re an existing Databricks user you can start using Managed MLflow right now.

Visit the Databricks Managed MLflow guide and the Quick Start notebook to get started.

If you’re not yet a Databricks user, visit databricks.com/mlflow to learn more and start a free trial of Databricks and Managed MLflow.

To learn more about open source MLflow, visit www.mlflow.org and join the community!
