Predicting the ‘Future’ with Facebook’s Prophet

Forecasting Medium’s statistics using Facebook’s Prophet library

Parul Pandey, Mar 22

Photo by Justin Clark on Unsplash

“When hundreds or even thousands of forecasts are made, it becomes important to let machines do the hard work of model evaluation and comparison while efficiently using human feedback to fix performance problems.” (Creators of Prophet)

Forecasting is a technique that uses historical data as inputs to make informed estimates of the direction of future trends.

It is an important and common data science task in organisations today.

Having prior knowledge of any event can help a company tremendously in the formulation of its goals, policies and planning.

However, producing high-quality and reliable forecasts comes with challenges of its own.

Forecasting is a complex phenomenon both for humans and for machines.

It also requires experienced time series analysts, who are quite rare.

Prophet is a tool that has been built to address these issues and provides a practical approach to forecasting “at scale”.

It intends to automate the common features of business time series by providing simple and tunable methods.

Prophet enables analysts from a variety of backgrounds to make more forecasts than they could produce manually.

Objective

Just as charity should begin at home, so should data science.

Today, we are surrounded by so much data right from our smartphones to fitness bands to smart TVs to our activity on the web.

We can easily analyse this data to know ourselves better.

This would also be better than repeatedly working with the age-old Titanic or Iris datasets.

I have been writing on Medium for quite some time now and I thought why not make use of this data.

Since Prophet appears so promising, I thought of trying it out to predict the number of views that my articles will get on Medium in the next few days.

Not only will this be interesting, but I will also get an idea about the popularity index of my articles in advance.

My Medium Stats for the last 30 Days

I will update this article at the end of April by posting the actual views that I get alongside the predictions, to check Prophet’s accuracy.

The code for this article can be accessed from the associated Github Repository or you can view it on my binder by clicking the image below.

The data has been taken from the Stats page of my Medium account.

I have the data from 2nd July 2018 till 21st March 2019, both dates inclusive.

An Overview of Prophet

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.

It works best with time series that have strong seasonal effects and several seasons of historical data.

Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
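To make the idea of an additive model concrete, here is a minimal NumPy sketch in which a series is built as the sum of a trend, a weekly seasonal term and a one-off event effect. All numbers are made up for illustration and the component shapes are assumptions; this is not Prophet's internal code.

```python
import numpy as np

# Hypothetical illustration only: none of these numbers come from the Medium data.
days = np.arange(28)                            # four weeks of daily observations
trend = 100.0 + 2.0 * days                      # a simple linear trend
weekly = 10.0 * np.sin(2 * np.pi * days / 7)    # a weekly seasonal pattern
holiday = np.zeros(len(days))
holiday[10] = 50.0                              # a one-off event effect on day 10

# Additive model: the components are simply summed.
y = trend + weekly + holiday
```

Because the components add up, each one can be inspected separately, which is what the component plots later in the article rely on.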

There is a white paper published by the creators of Prophet library.

The paper highlights two main scenarios which were the motivation for the research behind Prophet.

First, completely automatic forecasting techniques can be hard to tune and are often too inflexible to incorporate useful assumptions or heuristics.

Second, the analysts responsible for data science tasks throughout an organization typically have deep domain expertise about the specific products or services that they support, but often do not have training in time series forecasting.

Analysts who can produce high quality forecasts are thus quite rare because forecasting is a specialized skill requiring substantial experience.

Approach

Prophet follows an analyst-in-the-loop approach to business forecasting at scale.

Schematic view of the analyst-in-the-loop approach to forecasting at scale

This approach begins by modelling a time series using the parameters specified by analysts, producing forecasts and then evaluating them.

Whenever a performance issue or a need for human intervention crops up, these issues are flagged to human analysts so that they can then inspect the forecast and potentially adjust the model based on this feedback.

Advantages:

- Open source
- Accurate and fast
- Allows a large number of people to make forecasts, possibly without training in time series methods
- Tunable parameters
- Available for both Python and R

Installation

Prophet is open source software released by Facebook’s Core Data Science team.

It is available for download on CRAN and PyPI.

The complex statistical modelling is handled by the Stan library, which is a prerequisite for Prophet.

PyStan has its own installation instructions.

Install pystan with pip before using pip to install fbprophet.

On Windows, PyStan requires a compiler so you’ll need to follow the instructions.

The key step is installing a recent C++ compiler.

Go through the installation instructions below to get everything in place properly.

Installation: Prophet has two implementations, R and Python. Note the slight name difference for the Python package. (facebook.github.io)

In this article, I will use the Python API of Prophet.

Importing the dataset

The input to Prophet is always a dataframe with two columns: ds and y.

The ds (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.

The y column must be numeric and represents the measurement we wish to forecast.
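As a small illustration of that input format, the sketch below builds a tiny dataframe and brings it into the ds/y shape. The column names 'Date' and 'Views' follow the article; the three rows of values are made up.

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the article's data.
raw = pd.DataFrame({
    "Date": ["2018-07-02", "2018-07-03", "2018-07-04"],
    "Views": [12, 30, 25],
})

# Prophet expects exactly these two column names.
prophet_df = raw.rename(columns={"Date": "ds", "Views": "y"})
prophet_df["ds"] = pd.to_datetime(prophet_df["ds"])   # parse the datestamps
prophet_df["y"] = pd.to_numeric(prophet_df["y"])      # make sure y is numeric
```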

For this analysis, I shall be using an Excel file that contains the total ‘Views’ that my articles have garnered to date.

I started writing on Medium around July 2018 and have published about 53 posts since then.

    import pandas as pd
    import numpy as np
    from fbprophet import Prophet
    import matplotlib.pyplot as plt

    plt.style.use("fivethirtyeight")  # for pretty graphs

    df = pd.read_excel('medium_stats.xlsx')
    df.head()

Dataset

Analysing the datatypes

    df.dtypes

    Date     datetime64[ns]
    Views             int64
    dtype: object

Plotting to get insights

    df.set_index('Date').plot();

A plot of ‘Views’ with time.

The plot shows a more or less upward trend but we will have to do further analysis to arrive at a concrete conclusion.

Also, there appears to be a possible outlier around March 2019.

On inspecting the data, I found that this appears around 8th March which was when I published the article: From ‘R vs Python’ to ‘R and Python’.

There was a sudden spike on that day with maximum views reaching up to 17,688.

Views on 8th March 2019

Since this was just a single-day phenomenon, it can be considered an outlier.

The best way to handle outliers is to remove them — Prophet has no problem with missing data.

I will set the ‘Views’ values greater than 10,000 to NaN so that they have no bearing on the result.

    df.loc[(df['Views'] > 10000), 'Views'] = np.nan
    df.set_index('Date').plot();

Plot without outliers

This makes much more sense now as the trend doesn’t show abrupt spikes.

Making the Predictions

Making the dataset ‘Prophet’ compliant.

Let’s convert the data into the format desired by Prophet.

We shall rename ‘Date’ to ‘ds’ and ‘Views’ to ‘y’.

    df.columns = ['ds', 'y']
    df.head()

Prophet follows the sklearn model API wherein an instance of the Prophet class is created and then the fit and predict methods are called.

The model is instantiated as a new Prophet object, followed by calling its fit method and passing in the historical dataframe.

Seasonalities

Prophet will by default fit weekly and yearly seasonalities if the time series is more than two cycles long.

It will also fit daily seasonality for a sub-daily time series.

You can add other seasonalities (monthly, quarterly, hourly) if required.
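In the Python API, extra seasonalities are registered with the add_seasonality method (which takes a name, a period in days and a fourier_order) before calling fit. Prophet's documentation describes seasonal components as Fourier series, so for intuition here is a NumPy sketch of the kind of sine and cosine features such a component is built from. The helper fourier_features is hypothetical and written for this illustration; it is not Prophet's own code.

```python
import numpy as np

def fourier_features(t, period, order):
    """Return sin/cos columns for a seasonality with the given period (in days)."""
    t = np.asarray(t, dtype=float)
    columns = []
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)

days = np.arange(14)                                  # two weeks of daily steps
weekly = fourier_features(days, period=7, order=3)    # 3 sin/cos pairs, 6 columns
```

A higher order lets the seasonal curve change more quickly within a period, at the cost of a greater risk of overfitting.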

    m1 = Prophet(daily_seasonality=True)
    m1.fit(df)

Forecasting

By default, Prophet uses a linear model for its forecast, but a logistic growth model can also be used by passing the growth argument.

    # m = Prophet(growth='logistic')  # logistic growth also needs a 'cap' column in df
    # m.fit(df)

For forecasting, we need to tell Prophet how far to predict into the future.

For this, we need to make a dataframe for future predictions using make_future_dataframe.

I would only like to predict the views for the next 60 days.
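To see roughly what such a future dataframe contains, here is a pandas-only approximation: the historical dates followed by 60 further daily dates. It uses the date range stated earlier (2nd July 2018 to 21st March 2019) and is a sketch of the idea, not Prophet's implementation.

```python
import pandas as pd

# History covering the article's date range, one row per day.
history = pd.DataFrame({"ds": pd.date_range("2018-07-02", "2019-03-21", freq="D")})

# Extend by 60 daily periods starting the day after the last observation.
periods = 60
last_date = history["ds"].max()
future_dates = pd.date_range(last_date + pd.Timedelta(days=1), periods=periods, freq="D")
future = pd.concat([history, pd.DataFrame({"ds": future_dates})], ignore_index=True)
```

The last date in this frame is 20th May 2019, which is consistent with the final predicted rows discussed in this section.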

    future1 = m1.make_future_dataframe(periods=60)
    forecast1 = m1.predict(future1)
    forecast1.tail().T

The forecast is essentially a pandas dataframe with many columns.

The predict method will assign each row in future a predicted value which it names yhat, and the range is defined by yhat_lower and yhat_upper.

These ranges can be considered as uncertainty intervals.

To see the last 5 predicted values, we use the tail function.

I transpose the dataframe (with .T) so that all the columns are visible.

The screenshot below shows the predicted values from 16th May 2019 to 20th May 2019.

Last 5 predicted values

However, we are only interested in the yhat, yhat_lower and yhat_upper values; I have shown the full output above to highlight Prophet’s ability to infer various values from the given data.

Essentially, our desired dataframe will be as follows:

    forecast1[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

Plotting the Forecast

Visualisation, especially for time series data, conveys more than mere numbers.

Let’s plot the forecast by calling the Prophet.plot method and passing in the forecast dataframe.

    m1.plot(forecast1);

Plotting forecast for the next 60 days

We can also see the forecast components by using the Prophet.plot_components method.

This will show us the daily and weekly trends, which makes the picture clearer.

If you have data spanning several years, you can enable yearly_seasonality to observe the yearly trends as well.

    m1.plot_components(forecast1);

Weekly and daily trends for the forecast

The trend shows an increase in the views with time, which is good.

Also, more views are garnered from Monday to Friday than at the weekends.

The daily trend is more or less constant.

Holiday Effect

Holidays and events provide large, somewhat predictable shocks to many business time series and often do not follow a periodic pattern, so their effects are not well modelled by a smooth cycle.

Prophet gives the ability to include the fluctuations caused by these holidays or special events in the forecast model.

I have observed that there is a sudden spike in the views the day I publish my article.

This phenomenon lasts for about 4 to 5 days and then there is a decline.

So, I can consider the day of publishing my article as a holiday.

Also, I will define an upper_window consisting of 5 days for which there is an increased viewership.

“The columns lower_window and upper_window extend the holiday out to [lower_window, upper_window] days around the date.

For instance, if you wanted to include Christmas Eve in addition to Christmas you’d include lower_window=-1, upper_window=0.

If you wanted to use Black Friday in addition to Thanksgiving, you’d include lower_window=0, upper_window=1.”

Read more about holiday effect here.
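The window arithmetic can be illustrated with plain Python dates. The helper holiday_window below is hypothetical and written only for this illustration; it lists the days that a given [lower_window, upper_window] pair covers around a date.

```python
from datetime import date, timedelta

def holiday_window(day, lower_window=0, upper_window=0):
    """List the dates covered by [lower_window, upper_window] days around `day`."""
    return [day + timedelta(days=offset)
            for offset in range(lower_window, upper_window + 1)]

# Publication day plus the following five days (upper_window=5).
covered = holiday_window(date(2019, 3, 8), lower_window=0, upper_window=5)

# Christmas Eve in addition to Christmas (lower_window=-1, upper_window=0).
christmas = holiday_window(date(2018, 12, 25), lower_window=-1, upper_window=0)
```

With upper_window=5 the effect spans six days in total: the publication day itself and the five days after it.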

I will define a holiday dataframe that contains the dates and the holiday description.

We can also pass future dates in the dataframe.

Since I will publish this article on 23rd March 2019, I have also included that date in the dataframe.
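The holiday dataframe itself is not reproduced in the text, so here is a minimal sketch of what it could look like. The column names (holiday, ds, lower_window, upper_window) are the ones Prophet's holidays argument expects; the label 'publish' is an assumption, and only the two publication dates mentioned in this article are listed, whereas a real dataframe would contain every publication date.

```python
import pandas as pd

# Only the two publication dates mentioned in the article; the label is hypothetical.
publish_dates = ["2019-03-08", "2019-03-23"]

articles = pd.DataFrame({
    "holiday": ["publish"] * len(publish_dates),
    "ds": pd.to_datetime(publish_dates),
    "lower_window": [0] * len(publish_dates),   # no effect before publication
    "upper_window": [5] * len(publish_dates),   # increased views for 5 days after
})
```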

Once the dataframe has been created, holiday effects can be included in the forecast by passing them in with the holidays argument; the rest of the forecasting process remains the same as above:

    # articles is the holiday dataframe described above
    m2 = Prophet(holidays=articles, daily_seasonality=True).fit(df)
    future2 = m2.make_future_dataframe(periods=90)
    forecast2 = m2.predict(future2)
    m2.plot(forecast2);

Let’s plot the components again.

This time we will also see the trend for the holidays.

    m2.plot_components(forecast2);

Predicting Views for the next 15 days

Now that we have an idea about Prophet and its functionalities, let’s predict the future views that my Medium blog will get based on the historical data.

Let’s predict the views from 23rd March 2019 to 6th April 2019.

The process remains the same as above except that we shall introduce a parameter called mcmc_samples.

By default, Prophet will only return uncertainty in the trend and observation noise.

To get uncertainty in seasonality, you must do full Bayesian sampling.

This is done using the parameter mcmc_samples (which defaults to 0).

There are many other ways to tweak the model and I would encourage you to go through the documentation in detail.

    m3 = Prophet(holidays=articles, mcmc_samples=300).fit(df)
    future3 = m3.make_future_dataframe(periods=60)
    forecast3 = m3.predict(future3)

    # round the predictions to whole views
    forecast3["Views"] = (forecast3.yhat).round()
    forecast3["Views_lower"] = (forecast3.yhat_lower).round()
    forecast3["Views_upper"] = (forecast3.yhat_upper).round()

    # keep 23rd March to 6th April 2019 (ISO dates avoid parsing ambiguity)
    forecast3[(forecast3.ds > "2019-03-22") & (forecast3.ds < "2019-04-07")][["ds", "Views_lower", "Views", "Views_upper"]]

Projected Views for the next 15 Days

The model predicts around 5k views on the day this article is published.

However, the actual spike comes on Monday.

This is in conformity with our model which predicted that more views are observed on weekdays than on weekends.
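Once the actual views are in, the comparison with the predictions could be done along these lines. This is a sketch with made-up numbers, not the article's results; the forecast column names follow Prophet's output (yhat, yhat_lower, yhat_upper), and the choice of metrics (mean absolute error and interval coverage) is an assumption.

```python
import pandas as pd

# Hypothetical numbers for illustration; not the article's actual results.
dates = pd.to_datetime(["2019-03-23", "2019-03-24", "2019-03-25"])
forecast = pd.DataFrame({
    "ds": dates,
    "yhat": [5000.0, 4200.0, 5600.0],
    "yhat_lower": [4000.0, 3300.0, 4500.0],
    "yhat_upper": [6000.0, 5100.0, 6700.0],
})
actual = pd.DataFrame({"ds": dates, "y": [4800.0, 5300.0, 5900.0]})

# Align predictions and actuals on the date column.
merged = forecast.merge(actual, on="ds", how="inner")

# Mean absolute error of the point forecast.
mae = (merged["y"] - merged["yhat"]).abs().mean()

# Share of days on which the actual value fell inside the uncertainty interval.
inside = merged["y"].between(merged["yhat_lower"], merged["yhat_upper"])
coverage = inside.mean()
```

A coverage far below the nominal interval width would suggest the uncertainty intervals are too narrow for this series.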

Conclusion

This was an interesting study and it’ll be great to analyse the results when I get the actual views.

Having said that, Prophet does make the entire forecasting process easy and intuitive and also gives a lot of options.

The actual advantage of this model can only be assessed on large datasets but Prophet does enable forecasting a large number and a variety of time series problems — which is truly forecasting at scale.
