# Car De-registrations in Singapore: Can they be Predicted?

The data is a monthly series, with the first time point being Feb 1990. For illustration, the training and test data split is shown as the points to the left and right of the purple dashed line in the time series above.

## Part 2: Fitting an ARIMA Model

The first model I would like to try is an ARIMA model. For more technical explanations of ARIMA, I found the first link on a Google search of “What is ARIMA” to be most informative!

My task here is to specify the three ARIMA order parameters (p, d, q). This can be done manually, which is what I will do next, or automatically in R using the auto.arima function in the forecast package.

To fit an ARIMA model, the data must first be stationary. To confirm that the raw series is not, I plotted its ACF and PACF and ran an Augmented Dickey-Fuller (ADF) test of stationarity:

    Augmented Dickey-Fuller Test
    Dickey-Fuller = -1.268, Lag order = 6, p-value = 0.8854
    alternative hypothesis: stationary

Figure 2: ACF and PACF plots of the Cat B vehicle de-registration

The high p-value of the ADF test means we cannot reject the null hypothesis of a unit root, which is consistent with the series being non-stationary. From the ACF/PACF plots, we can infer two things:

- In the ACF plot, the steadily decreasing pattern indicates persistent correlation between the Cat B vehicle de-registrations and their lags, which is typical of a non-stationary series.
- A steadily decreasing pattern in the ACF and a sharp drop in the PACF after lag 3 suggest a pattern of order 3, which I read as an MA pattern (q = 3). Strictly speaking, an ACF that tails off while the PACF cuts off is the textbook signature of an AR term, so p = 3 would be the more conventional reading.

To transform my non-stationary series into a stationary one, I will difference the series once and run the same procedure as above to test whether the differenced series is now stationary.

    Augmented Dickey-Fuller Test
    Dickey-Fuller = -6.3372, Lag order = 6, p-value = 0.01
    alternative hypothesis: stationary

Figure 3: ACF and PACF plots of the Cat B vehicle de-registration after 1st differencing

Not bad. After differencing the series once, the ADF test now rejects the null of non-stationarity (p-value = 0.01), but the ACF plot still shows some signs of correlation with lags, with a recurring pattern every 2 lags.

Perhaps I could try setting p or q to 12 and see how that goes, or I could have taken the natural log of the series before fitting an ARIMA model to it (in hindsight I should have done this earlier). Nonetheless, allow me to use the current model to forecast the next four months for illustration's sake, noting that this model probably isn’t the most correctly specified one.

Figure 5: Actual values in black, fitted ARIMA values in red, new forecasted values in blue

Well, the in-sample fit seems quite promising. What about the out-of-sample performance? Zooming in to the top right part of the graph (in blue) yields:

Figure 6: Forecasted values for April to July 2018 using the ARIMA model in blue, actual figures in black

The forecasted values for April to July 2018 seem a little bit off, but hey, I guess that is expected.

For the Cat B vehicle de-registration data, I included up to 12 lags, and for the rest I included up to 5 lags. Let us have a look at the in-sample performance of the Random Forest model:

Figure 7: Actual values of Cat B vehicle de-registration in black, fitted RF model values in red, forecasted values in blue

Not bad! What about the out-of-sample performance?

Figure 8: Actual values in black, forecasted RF values in red

The RF model seems to be doing a lot better at forecasting the April to July 2018 values. As you can see, at least for April and May the forecasted values are very close to the actual values.
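Two of the data-preparation steps used in this post, first differencing for the ARIMA model and lagged values as predictors for the Random Forest, can be sketched in a few lines of plain Python. This is only an illustration, not the code behind the figures (the analysis itself was done in R): the helper names `difference` and `make_lagged_rows` are hypothetical, and the numbers in `toy_series` are made up rather than taken from the de-registration data.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag (first differencing when lag=1)."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def make_lagged_rows(series, n_lags):
    """Build (features, target) pairs for a supervised model.

    For each time point t, the features are the n_lags previous values,
    ordered from lag 1 (most recent) to lag n_lags, and the target is series[t].
    The first n_lags points are dropped because they lack a full history.
    """
    rows = []
    for t in range(n_lags, len(series)):
        features = [series[t - k] for k in range(1, n_lags + 1)]
        rows.append((features, series[t]))
    return rows


# Made-up monthly counts, purely for illustration (NOT the real Cat B data).
toy_series = [120, 135, 150, 160, 158, 170, 185, 190]

diffed = difference(toy_series)                # month-on-month changes
rows = make_lagged_rows(toy_series, n_lags=3)  # the post uses 12 lags for Cat B

print(diffed)
print(rows[0])
```

With 12 lags on a monthly series, the first year of observations is used only as history, so the training set is 12 rows shorter than the raw series. Each row could then be passed to any regression model, a random forest included.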