Time Series Analysis, Visualization & Forecasting with LSTM

Photo credit: UnsplashTime Series Analysis, Visualization & Forecasting with LSTMStatistics normality test, Dickey–Fuller test for stationarity, Long short-term memorySusan LiBlockedUnblockFollowFollowingMay 16The title says it all.

Without further ado, let’s roll!The DataThe data is the measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years that can be downloaded from here.

Different electrical quantities and some sub-metering values are available.

However, we are only interested in Global_active_power variable.

import numpy as npimport matplotlib.

pyplot as pltimport pandas as pdpd.


float_format', lambda x: '%.

4f' % x)import seaborn as snssns.

set_context("paper", font_scale=1.


set_style('white')import warningswarnings.

filterwarnings('ignore')from time import timeimport matplotlib.

ticker as tkrfrom scipy import statsfrom statsmodels.


stattools import adfullerfrom sklearn import preprocessingfrom statsmodels.


stattools import pacf%matplotlib inlineimport mathimport kerasfrom keras.

models import Sequentialfrom keras.

layers import Densefrom keras.

layers import LSTMfrom keras.

layers import Dropoutfrom keras.

layers import *from sklearn.

preprocessing import MinMaxScalerfrom sklearn.

metrics import mean_squared_errorfrom sklearn.

metrics import mean_absolute_errorfrom keras.

callbacks import EarlyStoppingdf=pd.


txt', delimiter=';')print('Number of rows and columns:', df.


head(5)Table 1The following data pre-processing and feature engineering steps need to be done:Merge Date & Time into one column and change to datetime type.

Convert Global_active_power to numeric and remove missing values (1.


Create year, quarter, month and day features.

Create weekday feature, “0” is weekend and “1” is weekday.

df['date_time'] = pd.

to_datetime(df['Date'] + ' ' + df['Time'])df['Global_active_power'] = pd.

to_numeric(df['Global_active_power'], errors='coerce')df = df.


to_datetime(df['date_time']) df['year'] = df['date_time'].

apply(lambda x: x.

year)df['quarter'] = df['date_time'].

apply(lambda x: x.

quarter)df['month'] = df['date_time'].

apply(lambda x: x.

month)df['day'] = df['date_time'].

apply(lambda x: x.


loc[:,['date_time','Global_active_power', 'year','quarter','month','day']]df.

sort_values('date_time', inplace=True, ascending=True)df = df.


apply(lambda row: row["date_time"].

weekday(),axis=1)df["weekday"] = (df["weekday"] < 5).

astype(int)print('Number of rows and columns after removing missing values:', df.

shape)print('The time series starts from: ', df.


min())print('The time series ends on: ', df.


max())After removing the missing values, the data contains 2,049,280 measurements gathered between December 2006 and November 2010 (47 months).

The initial data contains several variables.

We will here focus on a single value : a house’s Global_active_power history, that is, household global minute-averaged active power in kilowatt.

Statistical Normality TestThere are several statistical tests that we can use to quantify whether our data looks as though it was drawn from a Gaussian distribution.

And we will use D’Agostino’s K² Test.

In the SciPy implementation of the test, we will interpret the p value as follows.

p <= alpha: reject H0, not normal.

p > alpha: fail to reject H0, normal.

stat, p = stats.



3f, p=%.

3f' % (stat, p))alpha = 0.

05if p > alpha: print('Data looks Gaussian (fail to reject H0)')else: print('Data does not look Gaussian (reject H0)')We can also calculate Kurtosis and Skewness, to determine if the data distribution departs from the normal distribution.



Global_active_power);print( 'Kurtosis of normal distribution: {}'.



Global_active_power)))print( 'Skewness of normal distribution: {}'.



Global_active_power)))Figure 1Kurtosis: describes heaviness of the tails of a distributionNormal Distribution has a kurtosis of close to 0.

If the kurtosis is greater than zero, then distribution has heavier tails.

If the kurtosis is less than zero, then the distribution is light tails.

And our Kurtosis is greater than zero.

Skewness: measures asymmetry of the distributionIf the skewness is between -0.

5 and 0.

5, the data are fairly symmetrical.

If the skewness is between -1 and — 0.

5 or between 0.

5 and 1, the data are moderately skewed.

If the skewness is less than -1 or greater than 1, the data are highly skewed.

And our skewness is greater than 1.

First Time Series Plotdf1=df.




ylabel('Global active power')plt.




title('Global Active Power Time Series')sns.


show();Figure 2Apparently, this plot is not a good idea.

Don’t do this.

Box Plot of Yearly vs.

Quarterly Global Active Powerplt.





boxplot(x="year", y="Global_active_power", data=df)plt.


title('Box plot of Yearly Global Active Power')sns.




boxplot(x="quarter", y="Global_active_power", data=df)plt.


title('Box plot of Quarterly Global Active Power')sns.


tight_layout();Figure 3When we compare box plot side by side for each year, we notice that the median global active power in 2006 is much higher than the other years’.

This is a little bit misleading.

If you remember, we only have December data for 2006.

While apparently December is the peak month for household electric power consumption.

This is consistent with the quarterly median global active power, it is higher in the 1st and 4th quarters (winter), and it is the lowest in the 3rd quarter (summer).

Global Active Power Distributionplt.




title('Global Active Power Distribution')plt.


probplot(df['Global_active_power'], plot=plt);df1.


TFigure 4Normal probability plot also shows the data is far from normally distributed.

Average Global Active Power Resampled Over Day, Week, Month, Quarter and Yeartimeseries_plot.

pyFigure 5In general, our time series does not have a upward or downward trend.

The highest average power consumption seems to be prior to 2007, actually it was because we had only December data in 2007 and that month was a high consumption month.

In another word, if we compare year by year, it has been steady.

Plot Mean Global Active Power Grouped by Year, Quarter, Month and Daygrouped_plot.

pyFigure 6The above plots confirmed our previous discoveries.

By year, it was steady.

By quarter, the lowest average power consumption was in the 3rd quarter.

By month, the lowest average power consumption was in July and August.

By day, the lowest average power consumption was around 8th of the month (don’t know why).

Global Active Power by YearsThis time, we remove 2006.



loc[df['year'] != 2006], values = "Global_active_power", columns = "year", index = "month").

plot(subplots = True, figsize=(12, 12), layout=(3, 5), sharey=True);Figure 7The pattern is similar every year from 2007 to 2010.

Global Active Power Consumption in Weekdays vs.

 Weekendsdic={0:'Weekend',1:'Weekday'}df['Day'] = df.



figure(figsize=(9,4)) plt1=sns.


6,fliersize=3, data=df) a.

legend(loc='upper center', bbox_to_anchor=(0.

5, 1.

00), shadow=True, ncol=2)sns.

despine(left=True, bottom=True) plt.


tight_layout() plt.


set_visible(False);Figure 8The median global active power in weekdays seems to be lower than the weekends prior to 2010.

In 2010, they were identical.

Factor Plot of Global Active Power by Weekday vs.


factorplot('year','Global_active_power',hue='Day', data=df, size=4, aspect=1.

5, legend=False) plt.

title('Factor Plot of Global active power by Weekend/Weekday') plt.

tight_layout() sns.

despine(left=True, bottom=True) plt.

legend(loc='upper right');Figure 9Both weekdays and weekends follow the similar pattern over year.

In principle we do not need to check for stationarity nor correct for it when we are using an LSTM.

However, if the data is stationary, it will help with better performance and make it easier for the neural network to learn.

StationarityIn statistics, the Dickey–Fuller test tests the null hypothesis that a unit root is present in an autoregressive model.

The alternative hypothesis is different depending on which version of the test is used, but is usually stationarity or trend-stationarity.

Stationary series has constant mean and variance over time.

Rolling average and the rolling standard deviation of time series do not change over time.

Dickey-Fuller testNull Hypothesis (H0): It suggests the time series has a unit root, meaning it is non-stationary.

It has some time dependent structure.

Alternate Hypothesis (H1): It suggests the time series does not have a unit root, meaning it is stationary.

It does not have time-dependent structure.

p-value > 0.

05: Accept the null hypothesis (H0), the data has a unit root and is non-stationary.

p-value <= 0.

05: Reject the null hypothesis (H0), the data does not have a unit root and is stationary.


pyFigure 10From the above results, we will reject the null hypothesis H0, the data does not have a unit root and is stationary.

LSTMThe task here will be to predict values for a time series given the history of 2 million minutes of a household’s power consumption.

We are going to use a multi-layered LSTM recurrent neural network to predict the last value of a sequence of values.

If you want to reduce the computation time, and also get a quick result to test the model, you may want to resample the data over hour.

I will keep it is in minute.

The following data pre-processing and feature engineering need to be done before construct the LSTM model.

Create the dataset, ensure all data is float.

Normalize the features.

Split into training and test sets.

Convert an array of values into a dataset matrix.

Reshape into X=t and Y=t+1.

Reshape input to be 3D (num_samples, num_timesteps, num_features).


pyModel ArchitectureDefine the LSTM with 100 neurons in the first hidden layer and 1 neuron in the output layer for predicting Global_active_power.

The input shape will be 1 time step with 30 features.

Dropout 20%.

Use the MSE loss function and the efficient Adam version of stochastic gradient descent.

The model will be fit for 20 training epochs with a batch size of 70.


pyMake Predictionstrain_predict = model.

predict(X_train)test_predict = model.

predict(X_test)# invert predictionstrain_predict = scaler.

inverse_transform(train_predict)Y_train = scaler.

inverse_transform([Y_train])test_predict = scaler.

inverse_transform(test_predict)Y_test = scaler.

inverse_transform([Y_test])print('Train Mean Absolute Error:', mean_absolute_error(Y_train[0], train_predict[:,0]))print('Train Root Mean Squared Error:',np.

sqrt(mean_squared_error(Y_train[0], train_predict[:,0])))print('Test Mean Absolute Error:', mean_absolute_error(Y_test[0], test_predict[:,0]))print('Test Root Mean Squared Error:',np.

sqrt(mean_squared_error(Y_test[0], test_predict[:,0])))Plot Model Lossplt.



history['loss'], label='Train Loss')plt.


history['val_loss'], label='Test Loss')plt.

title('model loss')plt.



legend(loc='upper right')plt.

show();Figure 11Compare Actual vs.

PredictionFor me, every time step is one minute.

If you resampled the data over hour earlier, then every time step is one hour for you.

I will compare the actual and predictions for the last 200 minutes.

aa=[x for x in range(200)]plt.


plot(aa, Y_test[0][:200], marker='.

', label="actual")plt.

plot(aa, test_predict[:,0][:200], 'r', label="prediction")# plt.

tick_params(left=False, labelleft=True) #remove ticksplt.





ylabel('Global_active_power', size=15)plt.

xlabel('Time step', size=15)plt.


show();Figure 12LSTMs are amazing!Jupyter notebook can be found on Github.

Enjoy the rest of the week!Reference:Multivariate Time Series Forecasting with LSTMs in KerasNeural networks like Long Short-Term Memory (LSTM) recurrent neural networks are able to almost seamlessly model…machinelearningmastery.


. More details

Leave a Reply