Overview

My research focus is applied forecasting, in particular the following topics:

  • Hierarchical Time Series Forecasting (Disaggregated Modelling of Financial/Economic Aggregates)
  • Ensemble Forecasting Methods
  • Time-Varying Correlation/Asset Co-Movement (e.g. sparse network analysis – LASSO/Elastic Nets)
  • Time Series Clustering
  • Innovation Diffusion Modelling
  • Statistical Software Development (R and Python)

Below is a selection of my research publications and projects. You can contact me about potential research collaboration here.

Publications

  • Forecast Combinations in R Using the ForecastComb Package

With Eran Raviv & Gernot R. Roetzer
Published in: The R Journal (2018) 10:2, pages 262-281

Summary:  This paper introduces the R package ForecastComb. The aim is to provide researchers and practitioners with a comprehensive implementation of the most common ways in which forecasts can be combined. The current version of the package covers 15 popular estimation methods for creating a combined forecast, including simple methods, regression-based methods, and eigenvector-based methods. It also includes useful tools to deal with common challenges of forecast combination (e.g., missing values in component forecasts, or multicollinearity), and to rationalize and visualize the combination results.

Published Version:  Link
R Package on CRAN:  Link
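
As a rough illustration of two of the method families named in the summary, the sketch below combines component forecasts with a simple average and with regression-based (OLS) weights. It is written in Python with NumPy, does not use or reproduce the ForecastComb API, and all function names and data are hypothetical.

```python
import numpy as np

def simple_average(forecasts):
    """Equal-weight combination: mean across the component forecasts (columns)."""
    return forecasts.mean(axis=1)

def ols_weights(actual, forecasts):
    """Regression-based weights: regress actuals on an intercept and the component forecasts."""
    X = np.column_stack([np.ones(len(actual)), forecasts])
    coef, *_ = np.linalg.lstsq(X, actual, rcond=None)
    return coef  # [intercept, w_1, ..., w_k]

def ols_combine(coef, forecasts):
    """Apply previously estimated OLS weights to new component forecasts."""
    return coef[0] + forecasts @ coef[1:]

# Synthetic (made-up) target and three noisy component forecasts.
rng = np.random.default_rng(0)
actual = rng.normal(size=200).cumsum()
forecasts = np.column_stack([
    actual + rng.normal(scale=1.0, size=200),
    actual + rng.normal(scale=2.0, size=200),
    actual + rng.normal(scale=0.5, size=200),
])

# Estimate weights on a training window, combine on a hold-out window.
train, test = slice(0, 150), slice(150, 200)
coef = ols_weights(actual[train], forecasts[train])
avg_fc = simple_average(forecasts[test])
ols_fc = ols_combine(coef, forecasts[test])
```

Estimating the weights on one window and applying them to another mirrors the usual out-of-sample evaluation of combination methods; eigenvector-based methods are not shown here.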

Working Papers & Ongoing Research

  • Hierarchical Modelling and Forecasting System for Inflation Rate and Volatility

With Paul Kattuman
(Currently preparing resubmission for International Journal of Forecasting)

Abstract:  Using the monthly data underlying the Retail Prices Index for the UK, we analyse the dynamics of the inflation rate and its volatility. We examine patterns in the time-varying covariation among product-level inflation rates that aggregate up to industry-level inflation rates that in turn aggregate up to the overall inflation rate. The aggregate inflation volatility closely tracks the time path of this covariation, which is seen to be driven primarily by the variances of common shocks shared by all products, and by the covariances between idiosyncratic product-level shocks. We formulate a forecasting system that comprises models for the mean inflation rate and its variance, and exploit the index structure of the aggregate inflation rate using the hierarchical time series framework. Using a dynamic model selection approach to forecasting, we obtain forecasts that are between 9 and 155% more accurate than a SARIMA-GARCH(1,1) model for the aggregate inflation volatility.

Author Manuscript:  Link
Presented at:  36th International Symposium on Forecasting, RSS International Conference 2016, Vienna Congress on Mathematical Finance

  • Hierarchical Healthcare Demand Forecasting & Forecast Combination

With Paul Kattuman & Stefan Scholtes

Outline:  Accurate forecasting of healthcare demand is essential for efficient staffing of temporary workers in hospitals. Extant healthcare forecasting relies heavily on aggregate forecasting. Using a large disaggregated dataset supplied by a hospital, we show that exploiting the patterns (trend, seasonality, comovement with explanatory variables) at disaggregated levels, e.g., divisions or primary specialties, can significantly improve forecasting accuracy, if aggregated optimally through hierarchical time series (HTS) forecasting. Building on the forecasts obtained from the HTS modelling, we explore the value of simple, geometric, as well as regression-based forecast combination techniques.

Methods:  HTS, Forecast Combination, Distributed Lag Models
(Preliminary results available on request)
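
To make the idea of aggregating disaggregated forecasts concrete, the following minimal sketch shows bottom-up aggregation and OLS reconciliation for a toy two-level hierarchy. The outline does not state which reconciliation method is used, so the OLS projection is an assumption chosen as one standard variant; the hierarchy, function names, and numbers are hypothetical.

```python
import numpy as np

# Hypothetical hierarchy: total = A + B.
# Rows of the summing matrix S: total, A, B; columns: bottom-level series A, B.
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

def bottom_up(bottom_forecasts):
    """Aggregate bottom-level forecasts to every level of the hierarchy."""
    return S @ bottom_forecasts

def ols_reconcile(base_forecasts):
    """Project independently produced base forecasts for all series onto
    the set of coherent forecasts: S (S'S)^{-1} S' y_hat."""
    P = np.linalg.solve(S.T @ S, S.T)  # (S'S)^{-1} S'
    return S @ (P @ base_forecasts)

# Made-up base forecasts for total, A, B; incoherent because 60 + 40 != 105.
base = np.array([105.0, 60.0, 40.0])
reconciled = ols_reconcile(base)
coherent_from_bottom = bottom_up(np.array([60.0, 40.0]))
```

After reconciliation the forecast for the total equals the sum of the forecasts for A and B, so information from every level contributes to each reconciled forecast.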

  • The Effect of Open-Access Publishing on Research Impact: Time Series Feature Extraction

With Rupert Gatti, Paul Kattuman & Cameron Neylon

Outline:  Innovation diffusion trajectories are traditionally modelled using a differential equations-based approach that fits the long-term behavior of a diffusion curve reasonably well, but is not suitable for short-term forecasting or for modelling diffusion features of interest (saddles, takeoff). While the popular Bass model is well suited for describing the long-term trajectory, it is not capable of extracting interesting short-term features. Using a sample of 722 time series that describe the daily evolution of article views for all articles that were published in ‘Nature Communications’ in the first half of 2013, we analyze the effect of open-access publishing on research impact. Using state-space modelling and time series clustering based on Dynamic Time Warping (a similarity measure that allows for shifts and stretches along the time axis), we identify distinct view trajectories. In separate analyses of the ‘Open Access’ and ‘Subscription’ subsamples, we find evidence that open-access articles receive far more total views, and that even a change to open access years after publication has a significant, permanent positive effect on article views. We further explore the directional causality between article views and related social media activity, controlling for a wide range of author- and paper-specific explanatory variables.

Methods:  Unobserved Components Models, Time Series Clustering, Bayesian Poisson VAR, Multinomial Logit
(Preliminary results available on request)
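
For readers unfamiliar with Dynamic Time Warping, here is a minimal sketch of the textbook dynamic-programming recursion for the DTW distance between two series. It is illustrative only: it is not the project's code, it uses an absolute-difference local cost without any window constraint (both assumptions), and the example series are made up.

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic Time Warping distance with absolute-difference local cost."""
    n, m = len(x), len(y)
    # cost[i, j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # repeat y[j-1]
                                 cost[i, j - 1],      # repeat x[i-1]
                                 cost[i - 1, j - 1])  # advance both
    return cost[n, m]

# Two made-up trajectories with the same shape, one delayed by a period.
a = [0, 1, 2, 3, 2, 1]
b = [0, 0, 1, 2, 3, 2, 1]
d = dtw_distance(a, b)
```

A matrix of pairwise DTW distances between all series can then be passed to a standard clustering algorithm, which is what makes the measure useful for grouping trajectories that differ mainly in timing.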

  • Time-Varying Correlation in Product-Level Inflation: A Network Analysis

With Paul Kattuman

Outline:  Motivated by the literature on divergent trends in micro and macro volatility (e.g., Comin and Mulani, 2006 – “Divergent Trends in Aggregate and Firm Volatility”) that documents covariances as the main drivers of aggregate volatility, we aim to explore the main linkages across the 85 products that form the UK Retail Prices Index. Using adaptive elastic nets (a sparse estimation method that mitigates the documented weaknesses of LASSO estimation for correlation networks), we compute the contemporaneous partial correlation network, as well as the Granger correlation network (that can show lead/lag relationships), and combine the two into a long-run partial correlation measure (concept based on Barigozzi and Brownlees, 2014 – “Network Estimation for Time Series”). This information allows us to recombine the bottom-level components of the RPI into groups that reflect the input-output structure of the economy and to evaluate the usefulness of this regrouping for hierarchical inflation forecasting.

Methods:  Long-Run Partial Correlation, Granger Causality, (Adaptive) Elastic Net, Eigenvector Centrality
(Preliminary results available on request)
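
The contemporaneous partial correlation network mentioned in the outline is built from partial correlations, which can be read off the inverse covariance (precision) matrix. The sketch below shows only that relationship using a dense matrix inverse; it does not implement the adaptive elastic net estimator, it assumes many more observations than series, and the data are simulated.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation matrix from the inverse sample covariance matrix.

    rho_ij = -k_ij / sqrt(k_ii * k_jj), where K is the precision matrix.
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# Simulated chain x -> y -> z: x and z are correlated, but only through y.
rng = np.random.default_rng(42)
x = rng.normal(size=20000)
y = x + rng.normal(size=20000)
z = y + rng.normal(size=20000)
data = np.column_stack([x, y, z])

pcorr = partial_correlations(data)
marginal = np.corrcoef(data, rowvar=False)
```

In this chain the marginal correlation between x and z is clearly positive while their partial correlation is close to zero, which is why a partial correlation network has an edge only where a direct link remains after conditioning on all other series.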

Graded Degree Papers (Theses)

  • Essays in Hierarchical Time Series Forecasting and Forecast Combination

(submitted as PhD Dissertation to University of Cambridge)

Grade:  Awarded Doctor of Philosophy — graded by Prof. Stefan Scholtes and Prof. Rustam Ibragimov

Abstract:  This PhD dissertation comprises three original contributions to empirical forecasting research.

The first essay contributes to the literature on hierarchical time series (HTS) modelling by proposing a disaggregated forecasting system for both the inflation rate and its volatility. Using the monthly data underlying the Retail Prices Index for the UK, we analyse the dynamics of the inflation process. We examine patterns in the time-varying covariation among product-level inflation rates that aggregate up to industry-level inflation rates that in turn aggregate up to the overall inflation rate. The aggregate inflation volatility closely tracks the time path of this covariation, which is seen to be driven primarily by the variances of common shocks shared by all products, and by the covariances between idiosyncratic product-level shocks. We formulate a forecasting system that comprises models for the mean inflation rate and its variance, and exploit the index structure of the aggregate inflation rate using the HTS framework. Using a dynamic model selection approach to forecasting, we obtain forecasts that are between 9 and 155% more accurate than a SARIMA-GARCH(1,1) model for the aggregate inflation volatility.

The second essay is on improving forecasts using forecast combinations. The paper documents the software implementation of the open-source R package for forecast combination that we coded and published on the official R package repository, CRAN. The GeomComb package is the only R package that covers a wide range of different popular forecast combination methods. We implement techniques from three broad categories: (a) simple non-parametric methods, (b) regression-based methods, and (c) geometric (eigenvector) methods, allowing for static or dynamic estimation of each approach. Using S3 classes/methods in R, the package provides a user-friendly environment for applied forecasting, implementing solutions for typical issues related to forecast combination (multicollinearity, missing values, etc.), criterion-based optimisation for several parametric methods, and post-fit functions to rationalise and visualise estimation results. The package has been listed in the CRAN Task Views for Time Series Analysis and for Official Statistics. The brief empirical application in the paper illustrates the package’s functionality by estimating forecast combination techniques for monthly UK electricity supply.

The third essay introduces HTS forecasting and forecast combination to a healthcare staffing context. Healthcare budget growth in the UK has slowed and no longer keeps pace with growing demand for hospital services, which has made efficient cost planning increasingly crucial for hospitals, in particular for staff, which accounts for more than half of hospitals’ expenses. Such planning is facilitated by accurate forecasts of patient census and churn. Using a dataset of more than 3 million observations from a large UK hospital, we show how HTS forecasting can improve forecast accuracy by using information at different levels of the hospital hierarchy (aggregate, emergency/electives, divisions, specialties), compared to the naïve benchmark: the seasonal random walk model applied to the aggregate. We show that forecast combination can improve accuracy even more in some cases, and leads to lower forecast error variance (decreasing forecasting risk). We propose a comprehensive parametric approach to using forecasts in a nurse staffing model that aims to minimise cost while ensuring that care requirements (e.g. nurse hours per patient day thresholds) are met.

This research was supported through grants by ESRC, Cambridge Trusts, the Qualcomm Trust, and St. John’s College.

Methods:  Hierarchical Time Series Forecasting, Forecast Combination, Dynamic Model Selection
Author Manuscript:  Link

  • The Use of Time-Series Methods for Diffusion Modelling: An Evaluation

(submitted as First-Year PhD Progress Report, graded on a pass/fail basis)

Grade:  Pass (without required corrections) — graded by Prof. Andrew Harvey and Dr. Vincent Mak

Abstract:  The ability to describe, explain, and predict the diffusion of innovations in a social system is crucial: understanding the dynamic drivers of the diffusion process is a necessity for successful innovation management. This paper sets out to evaluate the extant modelling techniques in the field and introduces state-space modelling as a powerful holistic approach to diffusion modelling. A formal theoretical framework for state-space modelling in a diffusion context is provided. The empirical part of the study suggests that a state-space approach is superior for describing and forecasting diffusion processes (when compared to the popular Bass and Logistic growth models, as well as ARIMA models), and that it can also explain such processes well by accommodating regressors and intervention variables in the model framework. Furthermore, we introduce a formal systematic test (within the state-space framework) for the saddle effect that is a feature of many diffusion processes.

Methods:  Unobserved Component Models, Bass Model, Logistic Growth, Gompertz Growth, ARIMA
Author Manuscript:  Link
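
As background on the Bass benchmark named in the abstract, the sketch below evaluates the standard closed-form Bass curve, F(t) = (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t}), where p is the coefficient of innovation and q the coefficient of imitation. The parameter values and market size are illustrative assumptions, not estimates from the paper, and the state-space models evaluated there are not shown.

```python
import numpy as np

def bass_cumulative(t, p, q):
    """Cumulative adoption fraction F(t) of the Bass diffusion model."""
    e = np.exp(-(p + q) * np.asarray(t, dtype=float))
    return (1.0 - e) / (1.0 + (q / p) * e)

def bass_peak_time(p, q):
    """Time at which the adoption rate peaks (only meaningful for q > p)."""
    return np.log(q / p) / (p + q)

# Illustrative (assumed) parameters: innovation p, imitation q, market size m.
p, q, m = 0.03, 0.38, 1000.0
t = np.arange(0, 21)
cumulative = m * bass_cumulative(t, p, q)   # cumulative adopters by period
per_period = np.diff(cumulative)            # new adopters in each period
t_star = bass_peak_time(p, q)
```

The curve is smooth and single-peaked by construction, which is consistent with the abstract's point that features such as saddles need a more flexible framework.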

  • Disaggregating Stock Index Return Volatility: A Variance Decomposition Study

(submitted as MSc Dissertation to Department of Statistics, University of Oxford)

Grade:  Distinction

Abstract:  Reflecting the growing importance of volatility in economics and finance, there is a large empirical literature in the field that is devoted to estimating and forecasting conditional volatility. The dominant empirical approach hinges on the ARCH/GARCH family of volatility models, numerous extensions of which have been applied to time series data arising in a wide variety of contexts. The success of these models has led to their being applied without discrimination to series such as returns to individual stocks, as well as to returns from stock indices. More generally, the current practice in volatility modelling does not differentiate between models that apply to contemporaneous aggregates of sets of disaggregate variables (such as stock indices, inflation rates, national growth rates) and models that apply to their components (returns to individual stocks, prices of individual goods, growth rates of individual firms).

There is obvious scope for improvement in models for aggregate variables that take note of component volatilities, component weights in the aggregate, and the covariation of the components, all of which are observed. In this dissertation we take note of the generating process for volatility behavior of aggregates in the hope that it will lead to better explanations of stylized facts about volatility and more accurate forecasts. To illustrate the value of this approach we study the volatility behavior of the major German stock index, DAX, for the period between September 2012 and July 2014. We find that covariance between the returns of its constituent stocks is the dominant driver of index volatility and specify a causal model. In addition, taking note of the fact that the covariation between stock returns is often driven by unobserved bubbles, we estimate unobserved component models and illustrate the value of this approach.

Methods:  Variance Decomposition, Unobserved Component Models, GARCH Models, Distributed Lag Models
Author Manuscript:  Link
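
The decomposition behind this argument can be sketched numerically: for an index with fixed weights w, the variance of the index return equals the weighted sum of component variances plus the weighted sum of pairwise covariances. The example below is a hedged illustration on simulated returns with a shared shock; it assumes constant, equal weights and is not the dissertation's model or data.

```python
import numpy as np

def decompose_index_variance(returns, weights):
    """Split Var(sum_i w_i r_i) into a variance part and a covariance part.

    variance part   = sum_i w_i^2 Var(r_i)
    covariance part = sum_{i != j} w_i w_j Cov(r_i, r_j)
    """
    cov = np.cov(returns, rowvar=False)
    w = np.asarray(weights, dtype=float)
    weighted = np.outer(w, w) * cov
    variance_part = np.trace(weighted)
    covariance_part = weighted.sum() - variance_part
    return variance_part, covariance_part

# Simulated returns for 30 hypothetical stocks driven by one common shock.
rng = np.random.default_rng(1)
common = rng.normal(size=(500, 1))
returns = 0.8 * common + rng.normal(scale=0.5, size=(500, 30))
weights = np.full(30, 1 / 30)

var_part, cov_part = decompose_index_variance(returns, weights)
index_variance = np.var(returns @ weights, ddof=1)
```

With many components and a common shock, the covariance part dominates the variance part, which is the kind of pattern the abstract reports for the index studied.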