Time Series in Python — Exponential Smoothing and ARIMA processes

Time Series in Python — Exponential Smoothing and ARIMA processesBenjamin EtienneBlockedUnblockFollowFollowingFeb 9TL;DR: In this article you’ll learn the basics steps to performing time-series analysis and concepts like trend, stationarity, moving averages, etc.

You’ll also explore exponential smoothing methods, and learn how to fit an ARIMA model on non-stationary data.

DefinitionA time series is a data sequence ordered (or indexed) by time.

It is discrete, and the the interval between each point is constant.

Properties and types of seriesTrend : A long-term increase or decrease in the data.

This can be seen as a slope (is doesn’t have to be linear) roughly going through the data.

Seasonality : A time series is said to be seasonal when it is affected by seasonal factors (hour of day, week, month, year, etc.


Seasonality can be observed with nice cyclical patterns of fixed frequency.

Cyclicity : A cycle occurs when the data exhibits rises and falls that are not of a fixed frequency.

These fluctuations are usually due to economic conditions, and are often related to the “business cycle”.

The duration of these fluctuations is usually at least 2 years.

Left: Series with a trend.

Right: Series exhibiting seasonal behaviorLeft: Cyclical series.

Right: Random walkThe golden rule: StationarityBefore going any further into our analysis, our series has to be made stationary.

Stationarity is the property of exhibiting constant statistical properties (mean, variance, autocorrelation, etc.


If the mean of a time-series increases over time, then it’s not stationary.

Transforms used for stationarizing data:De-trending : We remove the underlying trend in the series.

This can be done in several ways, depending on the nature of data :- Indexed data: data measured in currencies are linked to a price index or related to inflation.

Dividing the series by this index (ie deflating) element-wise is therefore the solution to de-trend the data.

– Non-indexed data: is it necessary to estimate if the trend is constant, linear or exponential.

The first two cases are easy, for the last one it is necessary to estimate a growth rate (inflation or deflation) and apply the same method as for indexed data.

Differencing : Seasonal or cyclical patterns can be removed by substracting periodical values.

If the data is 12-month seasonal, substracting the series with a 12-lag difference series will give a “flatter” seriesLogging : in the case where the compound rate in the trend is not due to a price index (ie the series is not measured in a currency), logging can help linearize a series with an exponential trend (recall that log(exp(x)) = x).

It does not remove an eventual trend whatsoever, unlike deflation.

Checking StationarityPlotting rolling statisticsPlotting rolling means and variances is a first good way to visually inspect our series.

If the rolling statistics exhibit a clear trend (upwards or downwards) and show varying variance (increasing or decreasing amplitude), then you might conclude that the series is very likely not to be stationary.

Augmented Dickey-Fuller TestThis test is used to assess whether or not a time-series is stationary.

Without getting into too much details about hypothesis testing, you should know that this test will give a result called a “test-statistic”, based on which you can say, with different levels (or percentage) of confidence, if the time-series is stationary or not.

KPSSThe KPSS (Kwiatkowski-Phillips-Schmidt-Shin) test tests for the null hypothesis that the series is trend stationary.

In other words, if the test statistic is above the X% confidence threshold, this means we reject this hypothesis and that the series is not trend-stationary with X% confidence.

A test statistic lower than the threshold will lead us to accept this hypothesis and conclude that the series is trend-stationary.

Autocorrelation plots (ACF & PACF)An autocorrelation (ACF) plot represents the autocorrelation of the series with lags of itself.

 A partial autocorrelation (PACF) plot represents the amount of correlation between a series and a lag of itself that is not explained by correlations at all lower-order lags.

Ideally, we want no correlation between the series and lags of itself.

Graphically speaking, we would like all the spikes to fall in the blue region.

As we can see, there are several spikes above the blue region, meaning there are correlations at lags 1, 2, 3 and 4.

Choosing a modelExponential smoothings methods are appropriate for non-stationary data (ie data with a trend and seasonal data).

ARIMA models should be used on stationary data only.

One should therefore remove the trend of the data (via deflating or logging), and then look at the differenced series.

Non-Seasonal smoothing or trend-fittingSimple Exponential Smoothing & Linear Exponential SmoothingWhen to use?> Few data points> Irregular dataWinter’s Seasonal SmoothingThe problem with Holt’s Linear trend method is that the trend is constant in the future, increasing or decreasing indefinitely.

For long forecast horizons, this can be problematic.

The damped trend method is therefore a method which add a dampening parameter so that the trend converges to a constant value in the future (it flattens the trend).

The parameter ????.is replaced by ????????When to use?>Data has a trend and is seasonal>Use the multiplicative version, unless the data has been logged before.

In this case, use the additive versionARIMAARIMA models (which include ARMA, AR and MA models) are a general class of models to forecast stationary time series.

ARIMA models are made of three parts:A weighted sum of lagged values of the series (Auto-regressive (AR) part)A weighted sum of lagged forecasted errors of the series (Moving-average (MA) part)A difference of the time series (Integrated (I) part)An ARIMA model is often noted ARIMA(p, d, q) where p represents the order of the AR part, d the order of differencing (“I” part), and q the order of the MA term.

1) Choosing the differencing orderThe first step of fitting an ARIMA model is to determine the differencing order to stationarize the series.

To do that, we look at the ACF and PACF plots, and keep in mind these two rules:“ — Rule 1 : If the series has positive autocorrelations out to a high number of lags, then it probably needs a higher order of differencing.

  — Rule 2 : If the lag-1 autocorrelation is zero or negative, or the autocorrelations are all small and patternless, then the series does not need a higher order of differencing.

If the lag-1 autocorrelation is -0.

5 or more negative, the series may be overdifferenced.

” (Robert Nau, Statistical Forecasting)We start by logging the data as the raw data exhibits an exponential trend:The logged series seems to be more flat, but is it stationary ?.Let’s compute a KPSS test to check this :The test statistic is above the critical values, we reject the null hypothesis, our series is not trend stationary.

However, in order to keep this article short, we will continue as if it were.

(— To have a trend-stationary series, we could for instance think of linearly regressing our logged series, and divide our series by the coefficient of the regression.

— )Let’s now look at the ACF plots for the logged-first-difference data:The “ACF — Logged data” chart shows non-stationary data, characterized by the slow linear decay in the spikes (cf rule 1 above).

Adding a 1st order difference gives a single negative spike at lag value 1.

According to rule#2, we don’t need to differentiate the series any further.

Let’s check our results by comparing a (0, 0, 0) and a (0, 1, 0) ARIMA model:The Akaike Information Criterion (AIC) is lower with the ARIMA(0,1,0), meaning this model is performing better than the ARIMA(0,0,0).

Let’s have a look at the residuals and check their variance:Which residuals do you think look better ?2) Choosing the MA orderNow we know we have to include a 1st order difference in our model, we need to choose the Moving-Average order.

This is done by looking at the differenced series (because we just saw that the first-order difference series was stationary).

Again, we look at our ACF and PACF plots, with this rule in mind:“ If the lag-1 autocorrelation of the differenced series ACF is negative, and/or there is a sharp cutoff, then choose a MA order of 1”.

Question : why choosing an MA order of 1, and not higher ?.Well it’s because we will proceed step by step.

If we observe autocorrelations at higher lags, and by looking at the autocorrelation residuals of our (1,1,0) model, we still observe these spikes, we can increase our MA order, though is usually not recommended to go beyond 2!Notice how the AIC has dropped again, and how the residuals variance decreased.

That’s a sign our (1,1,0) ARIMA is performing better than the (0,1,0) model !3) Choosing the AR orderNow you might wonder: Do we have to add an AR term at all ?.The answer is no.

In fact, you should add an AR term in the case where :“ If the lag-1 autocorrelation of the differenced series PACF is negative, and/or there is a sharp cutoff, then choose a AR order of 1”.

In our case, we observe negative lag-1 autocorrelations in both the ACF and the PACF.

Notice that the PACF of the differenced series shows two negative spikes at lags 1 and 2, meaning we could in theory push our AR order up to 2.

Here is what we get by fitting a (1,1,1) ARIMA:Adding an AR term has lowered the AIC and the variance went from 0.

155 to 0.

145, we’re doing good !.Should we add another AR term and pick a (2,1,1) ARIMA?.Let’s have a look at the ACF & PACF of the ARIMA(1,1,1) residuals:There are no significant autocorrelation lag values in the plots, our model doesn’t seem to need any additional AR terms.

You might point out the spike at lag 12: this could be a sign of seasonality.

We will look at it in the next part with SARIMA and SARIMAX.

Plotting predictionsLuckily, statsmodels has a nice API which allows us to plot predictions from ARIMA processes.

I highly recommend you put your data in a DataFrame with a DateTimeIndex, because the plot_predict() method really likes dates…In this case, we will choose 12 forecasting steps, and will set the dynamic keyword to True: this will force the algorithm to use its own forecasted values as the lagged values for future forecasts:Let’s have a look on a larger prediction window (34 forecasting steps):That’s all folks ! In the next article, I will discuss Regression models.

Hope you enjoyed it :)Additional lectures:I highly recommended the two below if you are interested in time-series forecasting and analysis:Forecasting, Principles and Practice (Hyndman & Athanasopoulos)Statistical Forecasting (Robert Nau)You can also check this article on Neural Turing Machines and this article on Attention-based Neural Networks !.. More details

Leave a Reply