Unfortunately, in the real world, this is not quite that simple.

Unlike a sinewave, a stock market time series is not any sort of specific static function which can be mapped.

The best property to describe the motion of a stock market time series would be a random walk.

As a stochastic process, a true random walk has no predictable patterns and so attempting to model it would be pointless.

Fortunately, there are ongoing arguments from many sides that a stock market isn’t a purely stochastic process, which allows us to theorize that the time series may well have some kind of hidden pattern.

And it is these hidden patterns that LSTM deep networks are prime candidates to predict.

The data this example will be using is the sp500.csv file in the data folder.

This file contains the Open, High, Low, Close prices as well as the daily Volume of the S&P 500 Equity Index from January 2000 to September 2018.

In the first instance, we will only create a single dimensional model using the Close price only.

Adapting the config.json file to reflect the new data, we will keep most of the parameters the same.

One change is needed, however: unlike the sinewave, whose values only ranged between -1 and +1, the Close price is a constantly moving absolute price of the stock market.

This means that if we tried to train the model on this data without normalizing it, it would most likely never converge.

To combat this we will take each n-sized window of training/testing data and normalize each one to reflect percentage changes from the start of that window (so the data at point i=0 will always be 0).

We’ll use the following equations to normalize, and subsequently de-normalize at the end of the prediction process to get a real-world number out of the prediction:

n = normalized list [window] of price changes
p = raw list [window] of adjusted daily return prices

Normalization:

$$n_i = \frac{p_i}{p_0} - 1$$

De-Normalization:

$$p_i = p_0 (n_i + 1)$$

We have added the normalise_windows() function to our DataLoader class to do this transformation, and a Boolean normalise flag in the config file controls whether these windows are normalized.

    def normalise_windows(self, window_data, single_window=False):
        '''Normalise window with a base value of zero'''
        normalised_data = []
        window_data = [window_data] if single_window else window_data
        for window in window_data:
            normalised_window = []
            for col_i in range(window.shape[1]):
                # express every value in the column as a fractional change from the first row
                normalised_col = [((float(p) / float(window[0, col_i])) - 1) for p in window[:, col_i]]
                normalised_window.append(normalised_col)
            # reshape and transpose array back into original multidimensional format
            normalised_window = np.array(normalised_window).T
            normalised_data.append(normalised_window)
        return np.array(normalised_data)

With the windows normalized, we can now run the model in the same way that we ran it against our sinewave data.
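The de-normalization step is not part of normalise_windows(), so here is a minimal standalone sketch of the round trip for a single one-dimensional window. The function names and the example prices are made up for illustration and are not taken from the framework.

```python
import numpy as np

def normalise_window(prices):
    """Express each price as a fractional change from the first price in the window."""
    prices = np.asarray(prices, dtype=float)
    return prices / prices[0] - 1.0

def denormalise_window(normalised, p0):
    """Invert the normalisation: recover prices from fractional changes and the base price p0."""
    return p0 * (np.asarray(normalised, dtype=float) + 1.0)

window = [100.0, 102.0, 99.0, 105.0]          # made-up prices for illustration
n = normalise_window(window)                   # approximately [0.0, 0.02, -0.01, 0.05]
restored = denormalise_window(n, window[0])    # recovers the original prices
```

Note that de-normalizing a prediction requires keeping hold of the base price p0 of the window the prediction was made from.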

We have, however, made an important change when running this data: instead of using our framework’s model.train() method, we are using the model.train_generator() method which we have created.

We are doing this because we have found that it is easy to run out of memory when trying to train on large datasets, as the model.train() function loads the full dataset into memory and then applies the normalizations to each window in-memory, easily exhausting the available memory.

So instead we utilized the fit_generator() function from Keras to allow for dynamic training of the dataset, using a Python generator to draw the data, which means memory utilization is reduced dramatically.
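To illustrate the idea, here is a minimal sketch of a Python generator that builds and normalizes one batch of windows at a time, so the full set of windows never has to exist in memory at once. It is not the framework’s generate_train_batch(): the function name, the wrap-around behaviour, and the synthetic price series are assumptions made for this example.

```python
import math
import numpy as np

def generate_batches(prices, seq_len, batch_size):
    """Yield (x, y) batches of normalised windows forever, building each window on demand."""
    prices = np.asarray(prices, dtype=float)
    n_windows = len(prices) - seq_len       # same count as the steps_per_epoch formula uses
    i = 0
    while True:
        x_batch, y_batch = [], []
        for _ in range(batch_size):
            if i >= n_windows:
                i = 0                        # wrap around so the generator never runs dry
            window = prices[i:i + seq_len]
            window = window / window[0] - 1.0            # fractional change from window start
            x_batch.append(window[:-1, np.newaxis])      # inputs, shape (seq_len - 1, 1)
            y_batch.append(window[-1])                   # target: last point of the window
            i += 1
        yield np.array(x_batch), np.array(y_batch)

prices = np.linspace(100.0, 200.0, 500)     # synthetic stand-in for the Close column
gen = generate_batches(prices, seq_len=50, batch_size=32)
x, y = next(gen)                             # x has shape (32, 49, 1), y has shape (32,)
steps_per_epoch = math.ceil((len(prices) - 50) / 32)
```

A generator like this would be handed to fit_generator() together with steps_per_epoch; in more recent Keras releases, fit() accepts generators directly.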

The code below details the new run thread for running three types of predictions (point-by-point, full sequence, and multiple sequences).

    import json
    import math
    import os

    # DataLoader, Model, plot_results and plot_results_multiple are assumed to be
    # imported or defined elsewhere in the framework

    with open('config.json', 'r') as f:
        configs = json.load(f)

    data = DataLoader(
        os.path.join('data', configs['data']['filename']),
        configs['data']['train_test_split'],
        configs['data']['columns']
    )

    model = Model()
    model.build_model(configs)

    # in-memory training data (not used by the generator-based training below)
    x, y = data.get_train_data(
        seq_len=configs['data']['sequence_length'],
        normalise=configs['data']['normalise']
    )

    # out-of-memory generative training
    steps_per_epoch = math.ceil(
        (data.len_train - configs['data']['sequence_length']) / configs['training']['batch_size']
    )
    model.train_generator(
        data_gen=data.generate_train_batch(
            seq_len=configs['data']['sequence_length'],
            batch_size=configs['training']['batch_size'],
            normalise=configs['data']['normalise']
        ),
        epochs=configs['training']['epochs'],
        batch_size=configs['training']['batch_size'],
        steps_per_epoch=steps_per_epoch
    )

    x_test, y_test = data.get_test_data(
        seq_len=configs['data']['sequence_length'],
        normalise=configs['data']['normalise']
    )

    predictions_multiseq = model.predict_sequences_multiple(x_test, configs['data']['sequence_length'], configs['data']['sequence_length'])
    predictions_fullseq = model.predict_sequence_full(x_test, configs['data']['sequence_length'])
    predictions_pointbypoint = model.predict_point_by_point(x_test)

    plot_results_multiple(predictions_multiseq, y_test, configs['data']['sequence_length'])
    plot_results(predictions_fullseq, y_test)
    plot_results(predictions_pointbypoint, y_test)

The corresponding config.json:

    {
        "data": {
            "filename": "sp500.csv",
            "columns": [
                "Close"
            ],
            "sequence_length": 50,
            "train_test_split": 0.85,
            "normalise": true
        },
        "training": {
            "epochs": 1,
            "batch_size": 32
        },
        "model": {
            "loss": "mse",
            "optimizer": "adam",
            "layers": [
                {
                    "type": "lstm",
                    "neurons": 100,
                    "input_timesteps": 49,
                    "input_dim": 1,
                    "return_seq": true
                },
                {
                    "type": "dropout",
                    "rate": 0.2
                },
                {
                    "type": "lstm",
                    "neurons": 100,
                    "return_seq": true
                },
                {
                    "type": "lstm",
                    "neurons": 100,
                    "return_seq": false
                },
                {
                    "type": "dropout",
                    "rate": 0.2
                },
                {
                    "type": "dense",
                    "neurons": 1,
                    "activation": "linear"
                }
            ]
        }
    }

Running the data on a single point-by-point prediction as mentioned above gives something that matches the returns pretty closely.

But this is slightly deceptive.

Upon a closer examination, the prediction line is made up of singular prediction points that have had the whole prior true history window behind them.

Because of that, the network doesn’t need to know much about the time series itself other than that each next point most likely won’t be too far from the last point.

So even if it gets the prediction for the point wrong, the next prediction will then factor in the true history and disregard the incorrect prediction, yet again allowing for an error to be made.
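This effect can be illustrated without any neural network at all. The sketch below uses a synthetic random walk (not the S&P 500 data) and compares a naive "next value equals last value" forecast against a forecast that ignores recent history; the series, seed, and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(0.0, 1.0, size=2000))   # synthetic random walk

actual = walk[1:]
persistence = walk[:-1]                              # forecast: next value equals last true value
mean_guess = np.full_like(actual, walk.mean())       # forecast: ignore recent history entirely

mse_persistence = np.mean((actual - persistence) ** 2)
mse_mean = np.mean((actual - mean_guess) ** 2)
print(mse_persistence, mse_mean)
```

The persistence forecast scores a far lower error simply because it is re-anchored to the true series at every step, which is why a close visual match on a point-by-point plot says little about genuine predictive power.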

Whilst this might not initially sound promising for exact forecasts of the next price point, it does have some important uses.

Whilst it doesn’t know what the exact next price will be, it does give a very accurate representation of the range that the next price should be in.

This information can be used in applications like volatility forecasting (being able to predict a period of high or low volatility in the market can be extremely advantageous for a particular trading strategy). Moving away from trading, it could also be used as a good indicator for anomaly detection.

Anomaly detection could be achieved by predicting the next point, then comparing it to the true data when it comes in; if the true value is significantly different from the predicted point, an anomaly flag could be raised for that data point.
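A minimal sketch of that comparison is shown below. The function name, the three-standard-deviation threshold, and the toy data are assumptions for illustration; what counts as "significantly different" would need to be chosen for the application at hand.

```python
import numpy as np

def flag_anomalies(predicted, actual, n_sigmas=3.0):
    """Return a boolean mask marking points whose prediction error is unusually large."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    errors = actual - predicted
    threshold = n_sigmas * errors.std()
    # flag points whose error lies more than n_sigmas standard deviations from the mean error
    return np.abs(errors - errors.mean()) > threshold

predicted = np.zeros(100)                    # toy predictions
actual = np.tile([0.01, -0.01], 50)          # toy true values: small alternating moves
actual[40] = 0.5                             # one injected outlier
mask = flag_anomalies(predicted, actual)
anomalous_indices = np.flatnonzero(mask)     # array([40])
```

In practice the threshold would more sensibly be calibrated on errors from a held-out period rather than on the same data being screened, since large outliers inflate the standard deviation they are measured against.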

S&P500 point-by-point prediction

Moving on to the full sequence prediction, this seems to be the least useful prediction for this type of time series (at least when trained on this model with these hyperparameters).

We can see a slight bump at the start of the prediction where the model followed a momentum of some sort; however, very quickly the model decided that the optimal pattern was to converge onto some equilibrium of the time series.

At this stage, this might seem like it doesn’t offer much value, however, mean reversion traders might step in there to proclaim that the model is simply finding the mean that the price series will revert to when volatility is removed.

S&P500 full sequence prediction

Lastly, we have made a third type of prediction for this model, something I call a multi-sequence prediction.

This is a variation on the full sequence prediction in the sense that it still initializes the testing window with test data, predicts the next point over that, and makes a new window with the next point.

However, once it reaches a point where the input window is made up fully of past predictions it stops, shifts forward one full window length, resets the window with the true test data, and starts the process again.
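The procedure can be sketched as follows. This is not the framework’s predict_sequences_multiple() method: the function name and the predict_next callable are hypothetical, and the sketch assumes single-feature windows of shape (window_size, 1) that are each shifted one step from the previous one.

```python
import numpy as np

def multi_sequence_predict(predict_next, data, prediction_len):
    """Predict prediction_len steps ahead, then jump forward and re-seed with true data."""
    sequences = []
    for i in range(len(data) // prediction_len):
        # seed the window with true test data
        frame = np.array(data[i * prediction_len], dtype=float)
        predicted = []
        for _ in range(prediction_len):
            next_value = predict_next(frame)
            predicted.append(next_value)
            # slide the window: drop the oldest point, append the prediction
            frame = np.vstack([frame[1:], [[next_value]]])
        sequences.append(predicted)
    return sequences

# toy data: windows of length 4 over the series 0, 1, 2, ..., 19
series = np.arange(20.0)
data = np.array([series[i:i + 4] for i in range(len(series) - 4)])[:, :, np.newaxis]

# stand-in "model" that simply repeats the last value it sees
last_value = lambda frame: float(frame[-1, 0])
seqs = multi_sequence_predict(last_value, data, prediction_len=4)
```

With a trained network, predict_next would wrap the model’s prediction call; the stand-in used here only serves to show the windowing and reset logic.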

In essence, this gives multiple trend-line like predictions over the test data to be able to analyze how well the model can pick up future momentum trends.

S&P500 multi-sequence prediction

We can see from the multi-sequence predictions that the network does appear to be correctly predicting the trends (and amplitude of trends) for a good majority of the time series.

Whilst not perfect, it does give an indication of the usefulness of LSTM deep neural networks in sequential and time series problems.

Greater accuracy could most likely be achieved with careful hyperparameter tuning.

MULTIDIMENSIONAL LSTM PREDICTION

So far our model has only taken in single dimensional inputs (the “Close” price in the case of our S&P500 dataset).

But with more complex datasets there naturally exist many different dimensions for sequences which can be used to enhance the dataset and hence enhance the accuracy of our model.

In the case of our S&P500 dataset, we have Open, High, Low, Close and Volume, which make up five possible dimensions.

The framework we have developed allows for multi-dimensional input datasets to be used, so all we need to do to utilize this is to edit the columns value and the first lstm layer’s input_dim value appropriately to run our model.

In this case, I will run the model using two dimensions: “Close” and “Volume”.

    {
        "data": {
            "filename": "sp500.csv",
            "columns": [
                "Close",
                "Volume"
            ],
            "sequence_length": 50,
            "train_test_split": 0.85,
            "normalise": true
        },
        "training": {
            "epochs": 1,
            "batch_size": 32
        },
        "model": {
            "loss": "mse",
            "optimizer": "adam",
            "layers": [
                {
                    "type": "lstm",
                    "neurons": 100,
                    "input_timesteps": 49,
                    "input_dim": 2,
                    "return_seq": true
                },
                {
                    "type": "dropout",
                    "rate": 0.2
                },
                {
                    "type": "lstm",
                    "neurons": 100,
                    "return_seq": true
                },
                {
                    "type": "lstm",
                    "neurons": 100,
                    "return_seq": false
                },
                {
                    "type": "dropout",
                    "rate": 0.2
                },
                {
                    "type": "dense",
                    "neurons": 1,
                    "activation": "linear"
                }
            ]
        }
    }

S&P500 multi-dimensional multi-sequence prediction using “Close” & “Volume”

We can see that with the second “Volume” dimension added alongside the “Close”, the output prediction gets more granular.

The predictor trend lines seem to be more accurate at predicting slight upcoming dips, not only the prevailing trend from the start, and the accuracy of the trend lines also seems to improve in this particular case.
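To make the link between the columns list, sequence_length, input_timesteps and input_dim concrete, here is a minimal sketch of how two-column data turns into model inputs. The make_windows() function and the synthetic Close and Volume series are assumptions for illustration, and the sketch assumes the first listed column is the prediction target.

```python
import numpy as np

def make_windows(data, seq_len):
    """Split a (n_rows, n_cols) array into normalised windows; the target is column 0."""
    data = np.asarray(data, dtype=float)
    x, y = [], []
    for i in range(len(data) - seq_len):
        window = data[i:i + seq_len]
        window = window / window[0] - 1.0   # each column normalised by its own first row
        x.append(window[:-1])               # all columns, first seq_len - 1 timesteps
        y.append(window[-1, 0])             # last value of the first column only
    return np.array(x), np.array(y)

close = np.linspace(1000.0, 1200.0, 200)    # synthetic stand-in for Close
volume = np.linspace(1e9, 2e9, 200)         # synthetic stand-in for Volume
x, y = make_windows(np.column_stack([close, volume]), seq_len=50)
print(x.shape, y.shape)                     # (150, 49, 2) (150,)
```

The last two axes of x correspond to input_timesteps (sequence_length minus one) and input_dim (the number of columns), which is why adding “Volume” only requires changing input_dim from 1 to 2.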

CONCLUSION

Whilst this article aims to give a working example of LSTM deep neural networks in practice, it has only scratched the surface of their potential and application in sequential and temporal problems.

As of writing, LSTMs have been successfully used in a multitude of real-world problems, from classical time series issues as described here, to text auto-correct, anomaly detection and fraud detection, to forming a core part of self-driving car technologies being developed.

There are currently some limitations with using the vanilla LSTMs described above. Specifically, in the case of a financial time series, the series itself has non-stationary properties which are very hard to model (although advancements have been made in using Bayesian Deep Neural Network methods for tackling non-stationarity of time series).

For some applications, it has also been found that newer advancements in attention-based mechanisms for neural networks have outperformed LSTMs (and LSTMs coupled with these attention-based mechanisms have outperformed either on their own).

As of now, however, LSTMs provide significant advancements on more classical statistical time series approaches in being able to model relationships non-linearly and to process data with multiple dimensions.

The code for this framework can be found in the following GitHub repo (it assumes Python version 3.5.x and the required versions in the requirements.txt file; deviating from these versions might cause errors): DiogoRibeiro7/Medium-Blog (github.com).