Mathematical Intuitions for Stochastic Processes

Bridge to Time Series and Forecasting

Jagerynn Ting Verano, Jun 22

Overview

- the relationship between a stochastic process and time series data
- what stationary data is and why it is important
- how to check for non-stationarity
- how to convert a non-stationary process to stationary

Time Series and Stochastic Processes

Every time series is a stochastic process.

A stochastic process is a sequence of random variables, and viewed that way, each random variable in the sequence has its own mean and variance.

White noise, for example, is simply a sequence of i.i.d. random variables; you could imagine that they have constant means (expectations) and variances.

Mathematically speaking, a stochastic process is specified by the joint distribution of the full set of random variables.
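To make the white-noise intuition concrete, here is a minimal NumPy sketch (the sample size and seed are arbitrary choices for illustration): any stretch of the series has roughly the same mean and variance.

```python
import numpy as np

# White noise: i.i.d. draws, here from N(0, 1); the process's mean
# and variance do not change over time.
rng = np.random.default_rng(42)
noise = rng.normal(loc=0.0, scale=1.0, size=10_000)

# Sample statistics of any chunk should be close to 0 and 1.
first_half, second_half = noise[:5_000], noise[5_000:]
means = (first_half.mean(), second_half.mean())   # both near 0
variances = (first_half.var(), second_half.var()) # both near 1
```

The same check on, say, a series with a trend would show the chunk means drifting apart.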

The What and Why of Stationary Data

A process is stationary when the joint distribution of one set of random variables, e.g. the first through the third, is the same as the joint distribution of another set of the same size in the same process, e.g. the fourth through the sixth. This holds for sets of every size and at any offset, so every random variable is also identically distributed. It follows that the expectations and variances of these random variables are constant, and the auto-covariance function is not time-dependent, but instead depends solely on the lag spacing.

We aim for “strictly stationary” to extract properties from these random variables.

A strictly stationary process is useful in detecting patterns and forecasting into the future.

However, strict stationarity is rare in practice, and enforcing it is very restrictive.

A workaround for this is a weakly stationary process, which removes the rigidity of strictly stationary processes.

A process is considered weakly stationary if its expectation is the same throughout and its auto-covariance function depends only on lag spacing.

In other words, the weak definition relaxes the requirements of strict stationarity.
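The lag-only dependence of the auto-covariance can be checked numerically. Below is a minimal sketch (the `sample_autocovariance` helper and the white-noise test series are invented for illustration): for white noise, the auto-covariance is the variance at lag 0 and roughly zero at every other lag, no matter where in the series you look.

```python
import numpy as np

def sample_autocovariance(x, lag):
    """Sample autocovariance at a given lag: the average product of
    mean-removed values separated by `lag` steps."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    return np.mean(xc[:len(xc) - lag] * xc[lag:])

rng = np.random.default_rng(0)
x = rng.normal(size=50_000)  # white noise: a weakly stationary process

# gamma(0) is close to the variance (1); gamma(k) for k > 0 is close to 0.
gammas = [sample_autocovariance(x, k) for k in range(4)]
```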

Is my data non-stationary?

Note: In the real world, there is no cookie-cutter approach.

There are always alternative steps, more fitting approaches and a thousand critical details one could miss.

If anything, the instructional tone hereafter should be taken with a pinch of salt.

There are times when non-stationarity is not obvious.

Plotting rolling means and standard deviations over your original data can really help.

After visualizing the time series, statistical tests like the Dickey-Fuller test can be conducted to “confirm” your results.

The Solution: Seasonal Decomposition

[Figure: Seasonal Decomposition, from Anomaly.io]

A time series model can be written as a sum or product of several components. Statsmodels describes additive and multiplicative models as:

y(t) = Trend + Seasonality + Noise [additive]
y(t) = Trend * Seasonality * Noise [multiplicative]

According to Minitab: "Choose the multiplicative model when … the magnitude of the seasonal pattern increases as the data values increase … Choose the additive model when … the magnitude of the seasonal pattern does not change as the series goes up or down."

A naive approach to removing trend and seasonality in Python is statsmodels' statsmodels.tsa.seasonal.seasonal_decompose() function.

You can pass an argument to specify the model type.

The function plots the seasonality, trend and noise of the data.

It estimates the trend by applying a convolution filter (a moving average) and extracts the seasonal component by averaging the detrended series over each period.

The nice part is that the function allows you to visualize the trend, seasonal and residual component.

The not-so-nice part is that the function may well overfit the data and leave you with little to no residual.

When possible, more robust/sophisticated methods like STL (“Seasonal and Trend decomposition using Loess”) and X11 decomposition should be used.

Packages for these functions are available in R.

References

[1] Additive models and multiplicative models, Minitab 18 Support.
[2] R. J. Hyndman & G. Athanasopoulos, Forecasting: Principles and Practice, Chapter 6: Time Series Decomposition, OTexts.