An End-to-End Project on Time Series Analysis and Forecasting with Python

Let’s get started!The DataWe are using Superstore sales data that can be downloaded from here.import warningsimport itertoolsimport numpy as npimport matplotlib.pyplot as pltwarnings.filterwarnings("ignore")'fivethirtyeight')import pandas as pdimport statsmodels.api as smimport matplotlibmatplotlib.rcParams['axes.labelsize'] = 14matplotlib.rcParams['xtick.labelsize'] = 12matplotlib.rcParams['ytick.labelsize'] = 12matplotlib.rcParams['text.color'] = 'k'There are several categories in the Superstore sales data, we start from time series analysis and forecasting for furniture sales.df = pd.read_excel("Superstore.xls")furniture = df.loc[df['Category'] == 'Furniture']We have a good 4-year furniture sales['Order Date'].min(), furniture['Order Date'].max()Timestamp(‘2014–01–06 00:00:00’), Timestamp(‘2017–12–30 00:00:00’)Data PreprocessingThis step includes removing columns we do not need, check missing values, aggregate sales by date and so on.cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID', 'Customer Name', 'Segment', 'Country', 'City', 'State', 'Postal Code', 'Region', 'Product ID', 'Category', 'Sub-Category', 'Product Name', 'Quantity', 'Discount', 'Profit']furniture.drop(cols, axis=1, inplace=True)furniture = furniture.sort_values('Order Date')furniture.isnull().sum()Figure 1furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()Indexing with Time Series Datafurniture = furniture.set_index('Order Date')furniture.indexFigure 2Our current datetime data can be tricky to work with, therefore, we will use the averages daily sales value for that month instead, and we are using the start of each month as the timestamp.y = furniture['Sales'].resample('MS').mean()Have a quick peek 2017 furniture sales data.y['2017':]Figure 3Visualizing Furniture Sales Time Series Datay.plot(figsize=(15, 6)) 4Some distinguishable patterns appear when we plot the data..Office SuppliesAccording to our data, there were way more number of sales from Office Supplies than from Furniture over the = df.loc[df['Category'] == 'Furniture']office = df.loc[df['Category'] == 'Office Supplies']furniture.shape, office.shape((2121, 21), (6026, 21))Data ExplorationWe are going to compare two categories’ sales in the same time period..This means combine two data frames into one and plot these two categories’ time series into one plot.cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID', 'Customer Name', 'Segment', 'Country', 'City', 'State', 'Postal Code', 'Region', 'Product ID', 'Category', 'Sub-Category', 'Product Name', 'Quantity', 'Discount', 'Profit']furniture.drop(cols, axis=1, inplace=True)office.drop(cols, axis=1, inplace=True)furniture = furniture.sort_values('Order Date')office = office.sort_values('Order Date')furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()office = office.groupby('Order Date')['Sales'].sum().reset_index()furniture = furniture.set_index('Order Date')office = office.set_index('Order Date')y_furniture = furniture['Sales'].resample('MS').mean()y_office = office['Sales'].resample('MS').mean()furniture = pd.DataFrame({'Order Date':y_furniture.index, 'Sales':y_furniture.values})office = pd.DataFrame({'Order Date': y_office.index, 'Sales': y_office.values})store = furniture.merge(office, how='inner', on='Order Date')store.rename(columns={'Sales_x': 'furniture_sales', 'Sales_y': 'office_sales'}, inplace=True)store.head()Figure 12plt.figure(figsize=(20, 8))plt.plot(store['Order Date'], store['furniture_sales'], 'b-', label = 'furniture')plt.plot(store['Order Date'], store['office_sales'], 'r-', label = 'office supplies')plt.xlabel('Date'); plt.ylabel('Sales'); plt.title('Sales of Furniture and Office Supplies')plt.legend();Figure 13We observe that sales of furniture and office supplies shared a similar seasonal addition, average daily sales for furniture are higher than those of office supplies in most of the months..Let’s find out when was the first time office supplies’ sales surpassed those of furniture’s.first_date = store.ix[np.min(list(np.where(store['office_sales'] > store['furniture_sales'])[0])), 'Order Date']print("Office supplies first time produced higher sales than furniture is {}.".format( supplies first time produced higher sales than furniture is 2014–07–01.It was July 2014!Time Series Modeling with ProphetReleased by Facebook in 2017, forecasting tool Prophet is designed for analyzing time-series that display patterns on different time scales such as yearly, weekly and daily..We will now join them together to compare their future forecasts.furniture_names = ['furniture_%s' % column for column in furniture_forecast.columns]office_names = ['office_%s' % column for column in office_forecast.columns]merge_furniture_forecast = furniture_forecast.copy()merge_office_forecast = office_forecast.copy()merge_furniture_forecast.columns = furniture_namesmerge_office_forecast.columns = office_namesforecast = pd.merge(merge_furniture_forecast, merge_office_forecast, how = 'inner', left_on = 'furniture_ds', right_on = 'office_ds')forecast = forecast.rename(columns={'furniture_ds': 'Date'}).drop('office_ds', axis=1)forecast.head()Figure 16Trend and Forecast Visualizationplt.figure(figsize=(10, 7))plt.plot(forecast['Date'], forecast['furniture_trend'], 'b-')plt.plot(forecast['Date'], forecast['office_trend'], 'r-')plt.legend(); plt.xlabel('Date'); plt.ylabel('Sales')plt.title('Furniture vs. Office Supplies Sales Trend');Figure 17plt.figure(figsize=(10, 7))plt.plot(forecast['Date'], forecast['furniture_yhat'], 'b-')plt.plot(forecast['Date'], forecast['office_yhat'], 'r-')plt.legend(); plt.xlabel('Date'); plt.ylabel('Sales')plt.title('Furniture vs. Office Supplies Estimate');Figure 18Trends and PatternsNow, we can use the Prophet Models to inspect different trends of these two categories in the data.furniture_model.plot_components(furniture_forecast);Figure 19office_model.plot_components(office_forecast);Figure 20Good to see that the sales for both furniture and office supplies have been linearly increasing over time and will be keep growing, although office supplies’ growth seems slightly stronger.The worst month for furniture is April, the worst month for office supplies is February.. More details

Leave a Reply