Forecasting Variable Travel Expenses with 95% Accuracy

The goal is to use the data up to 2017 to predict total spend on expenses in 2018.

Feature Engineering

The dataset contains the start and end dates of each trip, so we ask AuDaS to calculate the trip duration, as this is likely to have a strong impact on total spend. We then remove the $ signs from the monetary columns with a RegEx transform so that they can be treated as numbers. Next, we drop the columns containing the individual variable expenses to avoid data leakage: they are already included in the total spend that we want to predict. After these steps, AuDaS detects that some values are missing and suggests to the user how to handle them.

Throughout this process, AuDaS adds each step to the workflow, which is kept as an audit trail. You can go back to previous versions of the dataset or export the workflow. In our case, we export the workflow to our test set, which contains the pension fund's 2018 expenses. This automatically reproduces the data preparation steps, so the model can be deployed on the test set as soon as it has been trained.

Data Exploration

Now that the data is clean, we can open the histogram view to extract initial insights. We can also change the scale to make sparsely distributed values visible. Our immediate takeaways are that Toronto is the most common destination and that Board members travel the most. Beyond this there is no obvious pattern, which is why we use machine learning to uncover more intricate relationships.

Automated Modelling

We ask AuDaS to build a regression model to predict the total spend. AuDaS automatically withholds a balanced 10% of the training set as a hold-out for final validation. It also trains the model using 10-fold cross-validation to reduce the risk of overfitting. Together, these checks help ensure that models trained by AuDaS generalise well in production. Once we are happy with the setup, we launch the training with the Start button.

The training uses Mind Foundry's proprietary Bayesian Optimiser, OPTaaS, which allows AuDaS to navigate the large search space of possible regression pipelines efficiently. AuDaS provides full transparency of the chosen pipeline, model and parameter values, as well as performance statistics. It also reports feature relevance for the best model found. In this case, accommodation and air fare spend, the trip purpose and whether the destination city is London are the strongest predictors of variable expenses. The CIO title also appears to be a good indicator of total spend.
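The duration feature from the Feature Engineering step can be sketched in plain Python. This is a minimal illustration, not AuDaS's implementation. Several details are assumptions: the column names `start_date` and `end_date`, the ISO date format, and the convention that a same-day trip lasts one day.

```python
from datetime import date

def trip_duration_days(start: str, end: str) -> int:
    """Return the trip length in days, counting both the start and end day.

    Assumes ISO-formatted dates (YYYY-MM-DD); a same-day trip counts as 1 day.
    """
    d0 = date.fromisoformat(start)
    d1 = date.fromisoformat(end)
    if d1 < d0:
        raise ValueError(f"end date {end} is before start date {start}")
    return (d1 - d0).days + 1

# Illustrative records with hypothetical column names.
trips = [
    {"start_date": "2017-03-01", "end_date": "2017-03-04"},
    {"start_date": "2017-06-10", "end_date": "2017-06-10"},
]
for trip in trips:
    trip["duration_days"] = trip_duration_days(trip["start_date"], trip["end_date"])
```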
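The RegEx clean-up of the monetary columns might look like the sketch below. Its pattern also strips thousands separators and whitespace. That goes beyond the $-sign removal the article describes and is an assumption about how the amounts are formatted. The column names are hypothetical.

```python
import re

# Matches "$" signs, thousands separators and stray whitespace (assumed formatting).
CURRENCY_NOISE = re.compile(r"[$,\s]")

def parse_amount(raw):
    """Turn a string such as '$1,234.50' into 1234.5; empty values become None."""
    if raw is None:
        return None
    cleaned = CURRENCY_NOISE.sub("", str(raw))
    return float(cleaned) if cleaned else None

row = {"airfare": "$1,234.50", "accommodation": "$310.00", "meals": ""}
cleaned_row = {col: parse_amount(val) for col, val in row.items()}
```

Returning `None` for empty cells keeps them visible as missing values rather than silently turning them into zero spend.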
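Dropping the leaking columns means removing every per-category variable expense that is already summed into the target. The article does not list the dataset's actual columns, so the names here (`meals`, `incidentals`, `other_transport`, `total_spend`) are hypothetical.

```python
# Hypothetical names for the variable-expense components that are already
# summed into the target column, total_spend.
VARIABLE_EXPENSE_COLUMNS = {"meals", "incidentals", "other_transport"}

def drop_columns(rows, columns):
    """Return new row dicts without the given columns (absent ones are ignored)."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]

rows = [{"airfare": 900.0, "accommodation": 450.0, "meals": 120.0,
         "incidentals": 15.0, "total_spend": 1485.0}]
model_input = drop_columns(rows, VARIABLE_EXPENSE_COLUMNS)
```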
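The article does not show AuDaS's specific advice on missing values. One common option is sketched below: count the gaps per column, then fill a numeric column with its median. Both the choice of median imputation and the column name are assumptions.

```python
from statistics import median

def is_missing(value):
    return value is None or value == ""

def missing_counts(rows):
    """Count missing values per column (only columns with at least one gap)."""
    counts = {}
    for row in rows:
        for col, val in row.items():
            if is_missing(val):
                counts[col] = counts.get(col, 0) + 1
    return counts

def impute_median(rows, column):
    """Fill gaps in a numeric column with the median of its observed values."""
    observed = [r[column] for r in rows if not is_missing(r.get(column))]
    fill = median(observed)  # raises StatisticsError if the column is entirely empty
    return [r if not is_missing(r.get(column)) else {**r, column: fill} for r in rows]

rows = [{"accommodation": 300.0}, {"accommodation": None},
        {"accommodation": 500.0}, {"accommodation": 100.0}]
gaps = missing_counts(rows)
filled = impute_median(rows, "accommodation")
```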
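Exporting the workflow works because each preparation step is recorded once and then replayed, in the same order, on new data. The hypothetical `Workflow` class below illustrates that pattern; it is not the AuDaS API. The two steps and their column names are simplified stand-ins.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]

class Workflow:
    """An ordered, replayable list of data-preparation steps (hypothetical API)."""

    def __init__(self) -> None:
        self.steps: List[Tuple[str, Step]] = []  # doubles as an audit trail

    def add(self, description: str, step: Step) -> None:
        self.steps.append((description, step))

    def apply(self, rows: List[Row]) -> List[Row]:
        for _description, step in self.steps:
            rows = step(rows)
        return rows

wf = Workflow()
wf.add("strip $ from airfare",
       lambda rows: [{**r, "airfare": float(str(r["airfare"]).lstrip("$"))} for r in rows])
wf.add("drop variable-expense columns",
       lambda rows: [{k: v for k, v in r.items() if k != "meals"} for r in rows])

train_2017 = [{"airfare": "$900", "meals": "$80"}]
test_2018 = [{"airfare": "$650", "meals": "$40"}]
prepared_train = wf.apply(train_2017)
prepared_test = wf.apply(test_2018)
```

Some steps learn something from the data, such as the median used for imputation. Those steps should store the value computed on the 2017 training data and reuse it on the 2018 set, rather than recomputing it there.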
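A text histogram can approximate the histogram view with its switchable scale. The destination counts below are made up for illustration and are not taken from the pension fund data. Using `log(1 + count)` for the log scale is an assumption about what the scale toggle does.

```python
import math
from collections import Counter

def text_histogram(values, log_scale=False, width=40):
    """Render category counts as text bars, largest first.

    With log_scale=True, bar lengths follow log(1 + count), which keeps rare
    categories visible next to a dominant one.
    """
    counts = Counter(values)
    scale = math.log1p if log_scale else float
    top = scale(max(counts.values()))
    lines = []
    for label, count in counts.most_common():
        bar = "#" * max(1, round(width * scale(count) / top))
        lines.append(f"{label:<10} {bar} {count}")
    return lines

# Made-up sample for illustration only, not the pension fund's data.
destinations = ["Toronto"] * 5 + ["Montreal"] * 2 + ["London"]
linear = text_histogram(destinations)
logged = text_histogram(destinations, log_scale=True)
print("\n".join(logged))
```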
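The validation scheme from Automated Modelling sets aside a 10% hold-out, then runs 10-fold cross-validation on the remaining data. The sketch below uses a one-feature least-squares line as a stand-in for the pipelines AuDaS searches over. The article does not say how AuDaS balances its hold-out, so the plain random split is an assumption. The data is synthetic.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def mae(model, pairs):
    """Mean absolute error of a fitted line on (x, y) pairs."""
    a, b = model
    return mean(abs(y - (a + b * x)) for x, y in pairs)

def holdout_then_kfold(pairs, k=10, holdout_frac=0.1, seed=0):
    """Set aside a hold-out split, cross-validate on the rest, then score the hold-out."""
    data = list(pairs)
    random.Random(seed).shuffle(data)
    n_hold = max(1, int(len(data) * holdout_frac))
    holdout, train = data[:n_hold], data[n_hold:]
    if len(train) < k:
        raise ValueError("need at least k training rows")
    fold_errors = []
    for i in range(k):
        val = train[i::k]  # rows whose index j satisfies j % k == i form fold i
        fit = [p for j, p in enumerate(train) if j % k != i]
        model = fit_line([x for x, _ in fit], [y for _, y in fit])
        fold_errors.append(mae(model, val))
    final_model = fit_line([x for x, _ in train], [y for _, y in train])
    return mean(fold_errors), mae(final_model, holdout)

# Synthetic, noise-free data: spend = 200 + 150 * duration_days.
pairs = [(d, 200 + 150 * d) for d in range(1, 41)]
cv_mae, holdout_mae = holdout_then_kfold(pairs)
```

On real data both errors will be non-zero. A hold-out error well above the cross-validation estimate is a warning sign of overfitting.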
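The article does not say how AuDaS computes feature relevance. Permutation importance is one common, model-agnostic way to obtain a similar ranking: shuffle one feature at a time and measure how much the error grows. The sketch below swaps in that technique with a toy model; the feature names are hypothetical.

```python
import random
from statistics import mean

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MAE when each feature column is shuffled.

    A larger increase means the model relies more on that feature.
    """
    rng = random.Random(seed)

    def error(rows):
        return mean(abs(t - predict(r)) for t, r in zip(y, rows))

    baseline = error(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(error(X_perm) - baseline)
        importances.append(mean(increases))
    return importances

# Toy model over hypothetical features [airfare, accommodation, unrelated_noise].
def toy_predict(row):
    return 1.2 * row[0] + 0.8 * row[1] + 0.0 * row[2]

gen = random.Random(1)
X = [[gen.uniform(100, 1000), gen.uniform(50, 500), gen.uniform(0, 1)] for _ in range(20)]
y = [toy_predict(row) for row in X]
scores = permutation_importance(toy_predict, X, y)
```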
