Building Sales Prediction Web Application using Machine Learning Dataset

Introduction

There are plenty of resources on the internet about finding insights and training models on machine learning datasets, but very few articles on how to use these models to build actual applications.

So today we are going to learn this process by first training a video game sales prediction model using a dataset from a hackathon and then use the trained model for creating a basic app that gives us sales prediction based on user inputs.

This article is divided into sections that you can pick up one by one instead of trying to finish it in one go.

It took me a full week to finish the app from the point when I first picked up the dataset.

Therefore, take your own time and focus on learning various aspects of building the app rather than the final product.

If you are ready, then start your favorite music playlist in the background and let's begin…

Part 1: Generating the model

We are going to use a dataset from the Video Games sales prediction hackathon which ran on the MachineHack website.

First, create an account on MachineHack and register for the hackathon on this link.

Once registered go to the Data tab and download the zip file which will have three files viz Train, Test, and Sample Submission.

The next steps are covered in a Google Colab notebook, which you can open and clone from this link: Google Colab Notebook. If you would rather download and run the notebook locally or on another platform, download it from this GitHub link: Jupyter Notebook Link.

The first section in the notebook gives a brief overview of the problem statement.

Upload the downloaded files by running the next code cell, shown below:

    from google.colab import files

    uploaded = files.upload()

    for fn in uploaded.keys():
        print('User uploaded file "{name}" with length {length} bytes'.format(
            name=fn, length=len(uploaded[fn])))

In the next code cell, we import the required Python packages.

Most of them are pre-installed in Google Colab so there is no need to install any of them.

Since we cannot submit test data for evaluation now that the hackathon has ended, I am going to use only the data in Train.csv for the rest of this article.

Keep in mind that Train.csv has fewer rows than we would typically use to train a model properly.

However, for learning purposes, a smaller dataset is good enough.

Let's now dive into solving this machine learning problem…

Step 1: Identifying target and independent features

First, let's import Train.csv into a pandas dataframe and run df.head() to see the columns in the dataset.

Column values

From the dataframe, we can see that the target column is SalesInMillions and the rest of the columns are independent features.

Step 2: Cleaning the data set

First, we check for null values by counting the null entries in each column of the input dataframe with a .sum() call:

    #Output:
    ID                 0
    CONSOLE            0
    YEAR               0
    CATEGORY           0
    PUBLISHER          0
    RATING             0
    CRITICS_POINTS     0
    USER_POINTS        0
    SalesInMillions    0
    dtype: int64

We can see that there are no null values in the dataset.
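To make the idea of a per-column null count concrete, here is a minimal sketch that uses only the Python standard library instead of pandas. The two-row sample and the count_missing helper are made up for illustration and are not part of the notebook; only the column names come from Train.csv.

```python
import csv
import io

# Hypothetical two-row sample that reuses a few Train.csv column names.
SAMPLE_CSV = """ID,CONSOLE,YEAR,SalesInMillions
1,ps2,2008,1.78
2,,2012,0.22
"""


def count_missing(csv_text):
    """Return {column name: number of empty cells} for a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in counts:
            value = row.get(name)
            # Treat absent fields and blank strings as missing.
            if value is None or value.strip() == "":
                counts[name] += 1
    return counts


print(count_missing(SAMPLE_CSV))
```

A column with a non-zero count would need imputation or row removal before training; in this dataset every count is zero, so no such step is required.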

Next, we can drop the unnecessary ID column, since it plays no role in predicting sales, by running the command below:

    input = input.drop(columns=['ID'])

Next, we split the dataframe into training and test datasets using the train_test_split command:

    train, test = train_test_split(input, test_size=0.2, random_state=42, shuffle=True)

Step 3: Exploratory Data Analysis

Descriptive statistics

The df.shape command gives the total number of rows in the dataset, and the df.nunique() command gives the number of unique values in each column:

    CONSOLE              17
    YEAR                 23
    CATEGORY             12
    PUBLISHER           184
    RATING                6
    CRITICS_POINTS     1499
    USER_POINTS        1877
    SalesInMillions    2804

In the EDA section, we make use of the pandas profiling and matplotlib packages to generate graphs of various columns and observe their relationships with the target column.

A few insights gained from EDA are:

Sales were highest for the PS3 platform, followed by the Xbox 360.

Sales were highest for the action category and lowest for puzzles.

Sales were highest in the period from 2007 to 2011.

Usually, we would go for feature engineering or feature selection steps after EDA.

But we have few features, and our emphasis is on actually using the model, so we move straight on to the next steps.

However, keep in mind that the USER_POINTS and CRITICS_POINTS columns can be used to derive extra features.
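As a rough illustration of what deriving extra features from USER_POINTS and CRITICS_POINTS could look like, the sketch below adds two combined scores to a single record. The feature names POINTS_MEAN and POINTS_GAP and the example values are assumptions made for this sketch; the article itself does not engineer any features.

```python
def add_point_features(row):
    """Return a copy of `row` with two illustrative derived features."""
    critics = float(row["CRITICS_POINTS"])
    users = float(row["USER_POINTS"])
    enriched = dict(row)  # leave the caller's record untouched
    enriched["POINTS_MEAN"] = (critics + users) / 2  # hypothetical feature
    enriched["POINTS_GAP"] = critics - users         # hypothetical feature
    return enriched


# Made-up values, only to show the shape of the result.
example = {"CRITICS_POINTS": 2.5, "USER_POINTS": 1.5}
print(add_point_features(example))
```

Whether such features help would have to be checked against the RMSE on held-out data.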

Step 4: Building a model

We are going to use a CatBoost regression model for our dataset since we have several categorical features.

CatBoost can work on categorical features directly, so this skips the step of label encoding them.

First, we install the catboost package using the pip install command.

Then we create a list of categorical features, pass it to the model, and fit the model on the train dataset:

    import catboost as cat

    cat_feat = ['CONSOLE', 'CATEGORY', 'PUBLISHER', 'RATING']
    features = list(set(train.columns) - set(['SalesInMillions']))
    target = 'SalesInMillions'

    model = cat.CatBoostRegressor(random_state=100, cat_features=cat_feat, verbose=0)
    model.fit(train[features], train[target])

Step 5: Check model accuracy

First, we set aside the true target values from the test dataset:

    y_true = pd.DataFrame(data=test[target], columns=['SalesInMillions'])
    test_temp = test.drop(columns=[target])

Next, we run our trained model on the test dataset to get model predictions and check model accuracy:

    y_pred = model.predict(test_temp[features])

    from sklearn.metrics import mean_squared_error
    from math import sqrt

    rmse = sqrt(mean_squared_error(y_true, y_pred))
    print(rmse)
    #Output: 1.5555409360901584

We have an RMSE value of about 1.56, in the same units as SalesInMillions, which is pretty decent for a first model.

For more information about accuracy metrics in case of regression problems, you can refer to this article.
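For readers who want to see what the RMSE figure measures, here is a small standard-library sketch of the same formula: the square root of the mean of the squared prediction errors. The rmse helper and the sample numbers are illustrative only and are not taken from the notebook.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error of two equally long, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: two perfect predictions and one that is off by 2.
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.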

If you would like to improve the model further or try combining various models, you can refer to the approaches of the winners of this hackathon in this article: Analytics Vidhya.

Step 6: Save the model into a pickle file

We can now save our model into a pickle file:

    import pickle

    filename = 'finalized_model.sav'
    pickle.dump(model, open(filename, 'wb'))

Once the pickle file is saved, you can download it to your machine from the files section in the left sidebar of the Google Colab notebook.

Bonus Tips

Add more data: We can improve the model predictions by adding more data to the model. A few related datasets which we can use are on Kaggle.

Improve model efficiency: We can use a stack of combined models to improve model efficiency a bit further.
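Returning to the save step in Step 6, a quick way to gain confidence in a pickle file is to load it straight back and compare. The sketch below does that with a plain dictionary standing in for the trained model and a temporary directory standing in for the Colab file system; save_and_reload is a hypothetical helper, not part of the notebook.

```python
import os
import pickle
import tempfile


def save_and_reload(obj, path):
    """Pickle `obj` to `path`, then load it back and return the copy."""
    with open(path, "wb") as f:  # the context manager closes the file
        pickle.dump(obj, f)
    with open(path, "rb") as f:
        return pickle.load(f)


# A plain dict stands in for the trained model in this sketch.
stand_in_model = {"name": "demo", "weights": [0.1, 0.2]}

with tempfile.TemporaryDirectory() as tmp:
    restored = save_and_reload(stand_in_model, os.path.join(tmp, "finalized_model.sav"))

print(restored == stand_in_model)
```

Opening the file in a with block also guarantees the handle is closed before the file is downloaded. Keep in mind that a pickled CatBoost model needs the catboost package installed wherever it is loaded.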

If you have reached this step then give yourself a pat on the back because we just finished the first major section of our project.

Take a break for a while, do some stretches, change the song on your playlist, and then start on the next section of this article.

Part 2: Creating backend API from model

We are going to use Python Flask to create our backend APIs.

So first create a folder named server locally.

Also, install Python and the pip package manager on your machine if you have not done so yet.

Next, we need to create a virtual environment within our folder. You can find more details here: Python venv.

I am using python3 locally on Linux, so the command for creating a virtual environment for me is:

    python3 -m venv server

You can check out the relevant command for your OS and Python version in the same Python venv article.

Next, we activate our virtual environment by running the command:

    source server/bin/activate

Once that is done, we install the Flask pip package:

    pip install -U Flask

Next, create a file named app.py within the server folder with your preferred text editor and add the code below to create a basic API:

    from flask import Flask, jsonify, make_response, request, abort

    app = Flask(__name__)

    @app.route("/")
    def hello():
        return "Hello World!"

    if __name__ == "__main__":
        app.run()

Now open a terminal and run python3 app.py to start the server.

By default, this starts the server on port 5000.

In order to test the API, open this link in browser: http://localhost:5000/.

You should get Hello World printed in the browser.

If not then check if the API is running on a different port or if there is an error printed in the terminal while starting the API.

We are going to use a POST API call, so it is better to install the Postman tool before proceeding further.

We will use this tool to send a POST request to the server.

Next, we need to install the catboost, pandas and Flask-Cors pip packages by using this command:

    pip install catboost pandas Flask-Cors

Next, copy the trained model's pickle file (finalized_model.sav), which we downloaded at the end of Part 1, into the server folder.

Now update app.py with the following code:

    from flask import Flask, jsonify, make_response, request, abort
    import pandas as pd
    import catboost
    import pickle
    from flask_cors import CORS, cross_origin

    # Load the trained model saved at the end of Part 1
    model = pickle.load(open("finalized_model.sav", "rb"))

    app = Flask(__name__)
    app.config['CORS_HEADERS'] = 'Content-Type'
    cors = CORS(app)

    @app.errorhandler(404)
    def not_found(error):
        return make_response(jsonify({'error': 'Not found'}), 404)

    @app.route("/")
    def hello():
        return "Hello World!"

    @app.route("/get_prediction", methods=['POST', 'OPTIONS'])
    @cross_origin()
    def get_prediction():
        if not request.json:
            abort(400)
        # Build a one-row dataframe from the JSON body of the request
        df = pd.DataFrame(request.json, index=[0])
        # Put the columns in the order the model expects
        cols = ["CONSOLE", "RATING", "CRITICS_POINTS", "CATEGORY", "YEAR", "PUBLISHER", "USER_POINTS"]
        df = df[cols]
        return jsonify({'result': model.predict(df)[0]}), 201

    if __name__ == "__main__":
        app.run()

Right after the imports, the pickle.load call loads the trained model into our Python file.

The CORS(app) call initializes the CORS module to allow requests from client-side API calls.

The not_found function is an error handler for 404 responses: it sends a JSON error response when an undefined path is requested from the server.

The main point of interest for us is the get_prediction POST API.

The get_prediction method is where we get the data from the client side and provide a sales prediction in response.

Inside it, the pd.DataFrame call converts the data from the API request into a pandas dataframe.

Our model expects the columns to be in a particular order to provide a proper response.

Therefore the cols list specifies the column order, and df[cols] rearranges the columns into that order.

Finally, model.predict is used to get a prediction from the model, which is passed as a response to the client.
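The column-ordering step can be tried out without Flask or pandas. The sketch below checks that a request body contains all seven expected fields and returns them in the order the endpoint uses. The order_payload helper and the sample values are assumptions for illustration; only the column names and their order come from app.py.

```python
# Column order taken from the `cols` list in app.py.
EXPECTED_COLS = ["CONSOLE", "RATING", "CRITICS_POINTS", "CATEGORY",
                 "YEAR", "PUBLISHER", "USER_POINTS"]


def order_payload(payload):
    """Return the payload as a dict whose keys follow EXPECTED_COLS.

    Raises ValueError naming any expected field that is absent.
    """
    missing = [col for col in EXPECTED_COLS if col not in payload]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return {col: payload[col] for col in EXPECTED_COLS}


# Made-up request body with the keys in a different order.
sample = {"YEAR": 2008, "CONSOLE": "ps2", "USER_POINTS": 1.5,
          "CATEGORY": "action", "RATING": "E", "PUBLISHER": "ExamplePublisher",
          "CRITICS_POINTS": 2.5}
print(list(order_payload(sample).keys()))
```

A check like this could run before building the dataframe, so that a request with a missing field gets a clear 400 response instead of an unhandled KeyError.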

At this step, we are ready with the API locally.

We can test the API in the Postman client by sending a POST API call as shown in the screenshot.

A sample JSON which you can add in the body section of the above request can be found in the code gist here: Github Gist.

Make sure to select raw option in body and body type as JSON as shown in the screenshot along with request type as POST.
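If you prefer a script to Postman, the same request can be built with Python's standard library. This is a minimal sketch: build_prediction_request is a hypothetical helper, the field values are made up, and the base URL assumes the local server on port 5000 described earlier.

```python
import json
import urllib.request


def build_prediction_request(base_url, payload):
    """Build (but do not send) a JSON POST request for /get_prediction."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/get_prediction",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Made-up field values; the keys are the seven columns the API expects.
sample = {"CONSOLE": "ps2", "RATING": "E", "CRITICS_POINTS": 2.5,
          "CATEGORY": "action", "YEAR": 2008,
          "PUBLISHER": "ExamplePublisher", "USER_POINTS": 1.5}

req = build_prediction_request("http://localhost:5000/", sample)
print(req.get_method(), req.full_url)

# To send it while the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The Content-Type header plays the same role as choosing raw and JSON in Postman: without it, Flask does not parse the body as JSON.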

If everything works fine till this step, then congrats, you now have a backend API which gives a prediction from a trained model based on input parameters.

Bonus Tips

Modularise code: Writing all APIs in a single file is not a recommended approach for backend design. We can separate the routes and model imports into different folders to make the code more modular.

This will also let us scale the code in a manageable fashion if we introduce additional API routes in the future.

At this point, we can take another break, have a snack, and bookmark this article so that it's easy to come back to for the next section of the project.

Part 3: Deploying backend API to Heroku

Till now our API has worked locally, but we need to deploy it on a remote server so that it can be used from elsewhere.

For that, we are going to use Heroku as our API hosting platform.

I mainly referred to the article from Stack Abuse to deploy the app to Heroku.

We will cover the steps in brief, but if you get stuck in any of them, refer to the original article here: Stackabuse Article.

First, we install gunicorn using the terminal command:

    pip install gunicorn

Next, run the command below to store all the pip packages installed so far in the requirements.txt file:

    pip freeze > requirements.txt

You can refer to the requirements.txt file uploaded here for reference: Github link.

Next, create a file named Procfile in the server folder with the code below:

    web: gunicorn app:app

Now sign up on the Heroku website, create an app on the site, and install the Heroku CLI as described in the original article.

Next, log in to Heroku from your local terminal by running the command below:

    heroku login -i

Add your Heroku app git reference using the command below:

    heroku git:remote -a {your-project-name}

Now push the code to Heroku using the command below:

    git push heroku master

When the above command finishes, you will get the API URL in the terminal output, which we can now use to make calls from the client side.

We can also send an API request from the Postman app at this point to check that we receive the response properly, in the way described at the end of Part 2.

The codebase up to this point can be found in the Github repo here: Videogame Sales Backend.

We now have a proper working API hosted on a server.

If everything worked well then we can move ahead with the development of the client-side app.

If you encountered any problems, then mention your problems in the comments section.

Or reach out to me at the social handles mentioned at the end of the article.

Part 4: Creating a client-side app using React and Bootstrap

We will need Node.js installed and set up properly on our machine.

So download and install Node.js for your OS and system before proceeding further.

Also, installation of the yarn package manager is recommended: Yarn Installation.

Now create a new folder called frontend outside the server folder created in the previous step and go inside the frontend folder from the terminal.

Next, we create a new React application and start it by running the commands below in the terminal:

    npx create-react-app sales-prediction-app
    cd sales-prediction-app
    npm start

You should see a browser tab open up with the default React.js template app.

Now we need to open this project in our favorite editor (I am using VSCode) and start making changes to build our frontend application.

First, we need to import the relevant Bootstrap files in the index.html file found in the public folder within our app.

We need to add the files as per the instructions provided by the Bootstrap docs in the index.html file, as shown below. Copy the exact tags, with their full CDN URLs, from the Bootstrap docs:

    <head>
      ...
      <link rel="apple-touch-icon" href="%PUBLIC_URL%/logo192.png" />
      <link rel="stylesheet" href="https://stackpath.…css" integrity="sha384-9aIt2nRpC12Uk9gS9baDl411NQApFmC26EwAOH8WgZl5MYYxFfc+NcPb1dKGj7Sk" crossorigin="anonymous">
      ...
    </head>
    <body>
      ...
      <div id="root"></div>
      ...
      <script src="https://code.…js" integrity="sha384-DfXdz2htPH0lsSSs5nCTpuj/zy4C+OGpamoFVy38MVBnE+IbbVYUew+OrCXaRkfj" crossorigin="anonymous"></script>
      <script src="https://cdn.…net/npm/[email protected]/dist/umd/popper.…js" integrity="sha384-Q6E9RHvbIyZFJoft+2mJbHaEWldlvI9IOYy5n3zV9zzTtmI3UksdQRVvoxMfooAo" crossorigin="anonymous"></script>
      <script src="https://stackpath.…js" integrity="sha384-OgVRvuATP1z7JjHLkuOU7Xw704+h835Lr+6QL9UvYjZE3Ipu6Tp75j7Bh/kR0JKI" crossorigin="anonymous"></script>
      ...
    </body>

Our final UI is a collection of dropdown items, a single one of which is shown below.

We will create a JSON file named optionsSources.json within the src folder.

Each entry in the JSON file contains an object like the one below:

    {
      "CONSOLE": {
        "options": ["ps2", "x360", "ps3", "pc"],
        "icon": "????️",
        "dropDownPlaceholder": "Select Console"
      }
    }

The options shown in the dropdown are in the options array, while the icon and the label shown to the left of the dropdown selection are in the icon and dropDownPlaceholder keys.
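Since every dropdown depends on the same three keys, it can help to sanity-check the JSON file before wiring it into the UI. The sketch below is one way to do that from Python; check_option_sources is a hypothetical helper and the inline sample is a cut-down stand-in for the real file, with a placeholder icon.

```python
import json

REQUIRED_KEYS = {"options", "icon", "dropDownPlaceholder"}


def check_option_sources(text):
    """Parse JSON text and return a list of problems (empty if none)."""
    data = json.loads(text)
    problems = []
    for item_key, entry in data.items():
        if not isinstance(entry, dict):
            problems.append(f"{item_key}: entry must be an object")
            continue
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"{item_key}: missing {sorted(missing)}")
        elif not isinstance(entry["options"], list) or not entry["options"]:
            problems.append(f"{item_key}: options must be a non-empty list")
    return problems


# Stand-in for optionsSources.json; "*" replaces the real icon.
sample_json = json.dumps({
    "CONSOLE": {
        "options": ["ps2", "x360", "ps3", "pc"],
        "icon": "*",
        "dropDownPlaceholder": "Select Console",
    }
})
print(check_option_sources(sample_json))
```

An empty list means every entry has the keys the dropdown component reads.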

We need to create multiple dropdowns like this, so the complete JSON file to be added is as shown in the file here: Github.

Next, we need to implement the dropdown component within our app.

Create a folder called components within the src folder and create a file named OptionSelection.js within the components folder.

We are going to write a functional component which returns a dropdown item as shown below:

    import React, { useState } from 'react';
    import optionSources from '../optionsSources.json';

    function OptionSelection({ itemKey, setOptionInObject }) {
      const title = optionSources[itemKey].dropDownPlaceholder;
      const icon = optionSources[itemKey].icon;
      return (
        <div className="d-flex justify-content-start align-items-center mt-2 selection-item">
          <div className="option-label">
            <b><span role="img" aria-label="label-icon">{icon}</span>{` ${title}`}</b>
          </div>
          <div className="dropdown ml-4">
            <button className="btn btn-primary dropdown-toggle" type="button" id="dropdownMenuButton" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
              {title}
            </button>
            <div className="dropdown-menu" aria-labelledby="dropdownMenuButton">
              {renderOptionsDropdown()}
            </div>
          </div>
        </div>
      )
    }

    export default OptionSelection;

In the above component, we receive itemKey as a prop (param) value from the parent component in the function signature.

Let's assume that the itemKey received from the parent component is CONSOLE.

The title and icon constants hold the label and icon that we display at the left side of the dropdown.

The HTML tags in the return statement follow the Bootstrap docs on creating a dropdown.

Next, we need to implement the renderOptionsDropdown function, which is called within the return statement, as shown below:

    import optionSources from '../optionsSources.json';

    function OptionSelection({ itemKey, setOptionInObject }) {
      ...
      const renderOptionsDropdown = () => {
        const selectionOptions = optionSources[itemKey].options;
        return selectionOptions.map((selectionOption, index) => {
          return (
            <div
              className="dropdown-item pointer"
              key={`${index}${selectionOption}`}
              onClick={() => handleDropDownSelection(selectionOption)}
            >
              {selectionOption}
            </div>
          );
        })
      }
      ...
    }

Here we get the options array for a particular item from the optionSources JSON object and store it in the selectionOptions variable.

Then we iterate over the array using the map function and render the dropdown selection items.

The onClick handler on each item updates the selected value for the dropdown.

The function called in the onClick handler, handleDropDownSelection, is implemented as shown below:

    import React, { useState } from 'react';
    ...
    function OptionSelection({ itemKey, setOptionInObject }) {
      const [currentSelectedOption, setSelectedOption] = useState(null);
      const handleDropDownSelection = (consoleOption) => {
        setSelectedOption(consoleOption)
        setOptionInObject(itemKey, consoleOption)
      }
      ...
    }

We have imported the useState hook in the first line.

It is a built-in function of the React.js library which allows us to update values dynamically using the concept of state variables.

More info on the function can be found here: React useState docs.

Inside handleDropDownSelection, the setSelectedOption call updates the selected option for the dropdown.

The setOptionInObject call then passes the selected value back to the parent component for further processing.

The complete code for this component can be found here: Github.

Then we make changes in the App.js file within the src folder to import the option selection component above and make an API call to the server.

The complete code can be found here: Github.

We make the API call to the backend in the handleInputSubmission function as shown below:

    import React, { useState } from 'react';
    import axios from 'axios';

    function App() {
      ...
      const handleInputSubmission = () => {
        if (selectedObject && Object.keys(selectedObject).length === 7) {
          ...
          // axios POST call to process.env.REACT_APP_HEROKU_SERVER_URL (elided)
            .then(function (response) {
              setPredictionLoading(false)
              setModelPrediction(response.data.result)
            })
            .catch(function (error) {
              setPredictionLoading(false)
              setRequestFailed("Some error occurred while fetching prediction")
            });
        } else {
          setRequestFailed("Please select all fields before submitting request")
        }
      }
    }

We use the Axios npm module to make a POST API call to the backend Heroku server.

Make sure that you supply your own Heroku server URL for the process.env.REACT_APP_HEROKU_SERVER_URL placeholder in the request, so that you receive the response from your server API.

It's good practice to keep API URL variables in a .env file and set them on the deployment environment later.

More details can be found here: Env variables.

Find the complete source for the frontend app on Github here: Video game sales prediction frontend.

This leaves us with the final step: deploying our web app online.

So hold on to your patience and persistence a bit longer, and let's start the last step of our project.

Part 5: Deploying the client-side app to Netlify

Netlify is a platform for deploying static websites online with ease.

Its process for deploying applications made with the create-react-app module is quite easy.

We are going to make use of this service to host our web app online.

First, we need to create an account on Github.

Then we need to upload the frontend folder to a Github repo.

We can follow the steps shown in the official docs for uploading the project to Github: Official docs.

Once the project is live on GitHub, deploying to Netlify is a straightforward process if you follow their official docs here: Netlify Deploy.

If you used an environment variable for the server URL in the previous step, make sure to add it in the Netlify dashboard as shown in this doc.

Finally, we will have a live web app which looks like the one below:

Bonus Tips

Improve UI and color scheme: To be honest, the UI above is pretty basic.

It does not have a great color scheme (mainly because developers like me are not good designers).

You can improve the design and tweak the CSS for a better overall look of the web page.

End Notes

This finishes the process of creating a sales prediction web application from a machine learning hackathon dataset.

If you finished the project without any hiccups along the way, then kudos to your analytical and coding skills.

But if you do get stuck at any point, Google and StackOverflow are our best friends as usual.

If you still can't figure out the solution to your problems, feel free to mention them in the comments or reach out to me on LinkedIn or Twitter.

About the Author

Saurabh Mhatre

I have worked as a frontend developer in the industry for the last 5 years.

I have a part-time master's degree in data science from GreyAtom and have been learning data science concepts for the last 2 years.

In my spare time, I usually create side-projects, teach college students, and write technical articles.

Twitter: https://twitter.com/saurabhnative

LinkedIn: https://www.linkedin.com/in/saurabh-mhatre