Forecast combination can improve forecasts – but where are the software implementations?
It isn’t a well-kept secret anymore that forecast combination often improves forecast accuracy. It has been over four decades since Bates and Granger published their famous paper, and a large number of different approaches have since been suggested in the forecasting literature.
Despite the topic’s popularity in research and practice, there is a severe lack of implementations of forecast combination methods in R (and, I assume, in other statistical software as well). In a recent blog post, Rob Hyndman addresses this topic and introduces two packages (opera and forecastHybrid) that cover forecast combination to some extent, though in a specialized manner that does not give the user much flexibility over the method being applied. In addition, regression-based methods (OLS, CLS, and LAD) are included in the ForecastCombinations package. But that’s about it.
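The regression-based idea is easy to sketch in base R: regress the realized values on the individual forecasts and use the fitted coefficients as combination weights. Here is a minimal toy illustration (simulated data; this is the generic OLS-combination idea, not the ForecastCombinations API):

```r
# Toy OLS forecast combination (illustration only, simulated data).
set.seed(1)
obs <- cumsum(rnorm(50))                       # toy target series
fc  <- cbind(m1 = obs + rnorm(50, sd = 0.5),   # three imperfect forecasters
             m2 = obs + rnorm(50, sd = 1.0),
             m3 = obs + rnorm(50, sd = 1.5))

fit      <- lm(obs ~ fc)   # combination weights (and intercept) via least squares
combined <- fitted(fit)    # in-sample combined forecast

# In sample, the OLS combination fits at least as well as any single model,
# since each individual forecast is itself a feasible weight vector.
mse <- function(e) mean(e^2)
sapply(1:3, function(i) mse(obs - fc[, i]))
mse(obs - combined)
```

Note that exactly this estimation step breaks down when the forecast columns are perfectly collinear, which is the error discussed below.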
Currently available packages are a good start – but neither extensive nor user-friendly.
I first came across the forecast combination topic when I worked on my article on disaggregated inflation forecasting with Paul Kattuman. We had produced forecasts with several methods: Naive, ARIMA, ETS, Damped Trend, and the recently proposed Dynamic Optimised Theta Model. We found that there is no single best model: ARIMA models did best in periods of high volatility, but the exponential smoothing-type models did better in periods of low volatility. One cloudy afternoon (well, it’s always cloudy in Cambridge) we found ourselves thinking: Why don’t we try to combine forecasts?
That is how I came across the ForecastCombinations package, and I was excited to test it. So I put my inflation forecasts in a data frame, called the function, and voilà… got an error message. After checking the input data, it quickly became clear that these specific forecasts were highly collinear, which is not that surprising given that the majority of them were produced with exponential smoothing-type methods. What was a little surprising, though: why does a package that relies so heavily on linear regression not deal with collinearity issues before estimating the combined forecast? Don’t get me wrong, I like the package; it is useful and, to my knowledge, was the first implementation of forecast combination in R. Still, it could do a much better job of catching and preventing errors. While an error like this is easily resolved for us as researchers, these methods are also interesting for practitioners and should be coded in a user-friendly way, meaning they should proactively avoid errors while leaving as much flexibility to the user as possible.
The GeomComb R package: A flexible and user-friendly implementation
Over the past few months, Gernot Roetzer (from Trinity College Dublin) and I have worked on the GeomComb package, which aims to achieve exactly these two things: flexibility and user-friendliness. First, we included pre-cleaning of the forecast data in the package itself. Forecast combination often relies on survey forecasts such as the ECB’s Survey of Professional Forecasters, which frequently include missing values and, more often than not (if the number of input forecasts is high), have collinearity issues. Our methods check the input data for both of these potential problems. In the case of missing values, users can choose between removing the affected forecast models and using smooth spline imputation. Perfect collinearity is avoided by checking the rank of the matrix of predictors; if it is not of full rank, the algorithm removes, from the group of models responsible for the perfect collinearity, the one with the worst individual fit.
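The rank check can be sketched in a few lines of base R using the pivoted QR decomposition. This is a simplified illustration of the idea, not GeomComb’s actual code: for brevity it drops the column(s) that the QR pivoting flags as redundant, rather than selecting the worst-fitting model within the collinear group.

```r
# Detecting and removing a perfectly collinear forecast column (sketch).
set.seed(42)
f1 <- rnorm(30); f2 <- rnorm(30)
fc <- cbind(f1 = f1, f2 = f2, f3 = 0.5 * f1 + 0.5 * f2)  # f3 is redundant

qr_fc <- qr(fc)                     # pivoted QR reveals the numerical rank
if (qr_fc$rank < ncol(fc)) {
  # qr() pivots the linearly dependent column(s) to the end; drop them.
  drop <- qr_fc$pivot[(qr_fc$rank + 1):ncol(fc)]
  message("Removing collinear column(s): ",
          paste(colnames(fc)[drop], collapse = ", "))
  fc <- fc[, -drop, drop = FALSE]
}
# fc now has full column rank and is safe for regression-based combination.
```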
Second, we leave the choice of method to the user, providing several simple combination methods (simple average, median, trimmed mean, winsorized mean), regression-based combination methods (OLS, CLS, LAD), as well as geometric combination methods based directly on the mean square prediction error (MSPE) matrix of the forecasts or on its eigendecomposition (Bates/Granger, Granger/Newbold, Inverse Rank, Standard Eigenvector, Bias-Corrected Eigenvector, Trimmed Eigenvector, Trimmed Bias-Corrected Eigenvector). For inexperienced users who do not need this flexibility and are mostly interested in a “one-size-fits-all” forecast, we also provide an automated combination function that selects the best of these 14 forecast combination methods based on in-sample fit.
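To give a flavor of the MSPE-based schemes, the classic Bates/Granger approach weights each forecast inversely to its mean squared prediction error. A toy sketch in base R (simulated data; this is the textbook formula, not the package’s exact implementation):

```r
# Bates/Granger-style inverse-MSPE combination weights (illustration only).
set.seed(7)
obs <- cumsum(rnorm(40))                      # toy target series
fc  <- cbind(a = obs + rnorm(40, sd = 0.4),   # three forecasters of
             b = obs + rnorm(40, sd = 0.8),   # varying accuracy
             c = obs + rnorm(40, sd = 1.2))

mspe    <- colMeans((obs - fc)^2)             # per-model mean squared error
weights <- (1 / mspe) / sum(1 / mspe)         # normalize to sum to one
combined <- as.vector(fc %*% weights)

# Weights are positive and sum to one; lower-MSPE models get larger weights.
round(weights, 3)
```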
A fully functional development version of the package is available on GitHub. For a future version, we plan to add sparse estimation and rolling-window approaches for the included methods. Please leave a comment if you have questions about the package or suggestions for features we should include in future updates!