Forecasting

Vegan haruspicy

June 16, 2015 — October 8, 2022

model selection

regression

signal processing

statistics

stochastic processes

time series

Niceties of time series prediction where the thing to be predicted is the future. Filed under forecasting because, in machine learning terminology, prediction is a general term that does not necessarily imply extrapolation into the future.

🏗 handball to Rob Hyndman.

1 Recursive estimation

See recursive identification for generic theory of learning under the distribution shift induced by a moving parameter vector.

2 Model selection

Rob Hyndman explains how to cross-validate time series models that use only the lagged observations. Cosma Shalizi mentions the sample splitting problem for time series in post-selection inference and has supervised student work on it, notably Lunde (2019).
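A minimal sketch of one common scheme, rolling-origin evaluation with an expanding training window. That this is the scheme to use is my assumption, and it is not necessarily the procedure in the linked post. The names `rolling_origin_splits` and `naive_forecast` are hypothetical, and the last-value forecaster is only a placeholder for a real model.

```python
from typing import Iterator, List, Tuple


def rolling_origin_splits(
    n_obs: int, min_train: int, horizon: int = 1, step: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) with an expanding training window.

    Every test index is strictly later than every training index, so the
    model is never fitted on the future it is asked to predict.
    """
    origin = min_train
    while origin + horizon <= n_obs:
        yield list(range(origin)), list(range(origin, origin + horizon))
        origin += step


def naive_forecast(history: List[float], horizon: int) -> List[float]:
    """Placeholder model (an assumption): repeat the last observed value."""
    return [history[-1]] * horizon


series = [3.0, 4.0, 6.0, 5.0, 7.0, 8.0, 10.0, 9.0]
errors = []
for train_idx, test_idx in rolling_origin_splits(len(series), min_train=4, horizon=2):
    history = [series[i] for i in train_idx]
    actual = [series[i] for i in test_idx]
    predicted = naive_forecast(history, horizon=len(actual))
    errors.extend(abs(a - p) for a, p in zip(actual, predicted))

# Average absolute error over all forecast origins and horizons.
mean_absolute_error = sum(errors) / len(errors)
```

Swapping `naive_forecast` for each candidate model and comparing the resulting errors gives a crude model-selection loop.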

For a different emphasis upon the same problem, consider statistical learning theory, or model ensembles in time series.

Or: would we like to try model mixing in this setting? See ensemble methods.

3 Calibration of probabilistic forecasts

See calibration.

4 Training data intake

A common pattern in training time series models is that each one predicts the next observations from the previous observations, which is not how a classic data loader works in machine learning. The future times at which the predictions are evaluated are the horizon, and the past observations used to make those predictions are the history. For patterns to handle this in neural networks in particular, see Recurrent neural networks.
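A minimal sketch of that history/horizon windowing in plain Python. The function name `make_windows` and its parameters are hypothetical, and real data loaders usually add batching, shuffling of windows and covariates on top.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], history_len: int, horizon_len: int
) -> List[Tuple[List[float], List[float]]]:
    """Slice one series into (history, horizon) training pairs.

    Each pair holds `history_len` consecutive observations followed by
    the `horizon_len` observations that come immediately after them.
    """
    pairs = []
    last_start = len(series) - history_len - horizon_len
    for start in range(last_start + 1):
        split = start + history_len
        history = list(series[start:split])
        horizon = list(series[split:split + horizon_len])
        pairs.append((history, horizon))
    return pairs


windows = make_windows([1, 2, 3, 4, 5, 6], history_len=3, horizon_len=2)
# windows == [([1, 2, 3], [4, 5]), ([2, 3, 4], [5, 6])]
```

A series shorter than `history_len + horizon_len` yields no pairs at all, which is worth checking before training.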

5 Software

Not comprehensive, just noting some useful time series forecasting models/packages as I encounter them. Peter Cotton attempts to collate Popular Python Time Series Packages.

5.1 Tidyverse time series analysis and forecasting packages

A good first stop.

You can find a presentation on these tools by Rob Hyndman.

tsibble: Tidy Temporal Data Frames and Tools [CRAN]
tsibbledata: Example datasets for tsibble [CRAN]
feasts: Feature Extraction And Statistics for Time Series [CRAN]
fable: Forecasting Models for Tidy Time Series [CRAN]
sugrrants: Supporting Graphs for Analysing Time Series. Tools for plotting temporal data using the tidyverse and grammar of graphics framework. [CRAN]
gravitas: Explore Probability Distributions for Bivariate Temporal Granularities. [CRAN]

5.2 scikit-learn style: Nixtla, sktime, darts

Rob J Hyndman, in Python implementations of time series forecasting and anomaly detection, recommends Nixtla and sktime as good implementations. Nixtla is big! Rob recommends starting from

statsforecast: Automatic ARIMA and ETS forecasting
hierarchicalforecast: Hierarchical forecasting
tsfeatures: Time series features

Darts (source) (Herzen et al. 2022) aims to be a scikit-learn for time series forecasting. It includes many other algorithms, such as prophet, N-HiTS (Challu et al. 2022), N-BEATS (Oreshkin et al. 2020a) and DeepAR (Salinas, Flunkert, and Gasthaus 2019).

5.3 prophet

prophet (R/Python/Stan):

is a procedure for forecasting time series data. It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily periodicity data with at least one year of historical data. Prophet is robust to missing data, shifts in the trend, and large outliers.
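For reference, the additive model in that blurb is written in Taylor and Letham (2017) roughly as below. Here g is the (possibly non-linear) trend, s the periodic seasonality, h the holiday effects and ε an error term. The notation is taken from the paper, not from the quoted description.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```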

Commentary:

Is Facebook’s “Prophet” the Time-Series Messiah, or Just a Very Naughty Boy? via Sean J. Taylor on Twitter

This post rips Prophet (a forecasting package I helped create) to shreds and I agree with most of it🥲. I always suspected the positive feedback was mostly from folks who’d had good results—conveniently the author has condensed many bad ones into one place.

5.4 Silverkite

Hosseini et al. (2021)

The Greykite library provides flexible, intuitive and fast forecasts through its flagship algorithm, Silverkite.

Silverkite algorithm works well on most time series, and is especially adept for those with changepoints in trend or seasonality, event/holiday effects, and temporal dependencies. Its forecasts are interpretable and therefore useful for trusted decision-making and insights.

The Greykite library provides a framework that makes it easy to develop a good forecast model, with exploratory data analysis, outlier/anomaly preprocessing, feature extraction and engineering, grid search, evaluation, benchmarking, and plotting. Other open source algorithms can be supported through Greykite’s interface to take advantage of this framework, as listed below.

5.5 Causal impact

🏗 find out how Causal impact works. (Based on Brodersen et al. (2015).)

5.6 asap

asap:

Automatic Smoothing for Attention Prioritization in Time Series

ASAP automatically smooths time series plots to remove short-term noise while retaining large-scale deviations.
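As I understand the ASAP paper, the idea is to pick a moving-average window that minimizes roughness (the spread of first differences) without reducing the kurtosis of the series, so that large deviations survive the smoothing. The following is a brute-force sketch of that criterion under my own assumptions. It is not the asap library's implementation, and all function names are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def moving_average(xs: Sequence[float], window: int) -> List[float]:
    """Simple moving average over every full window of the given length."""
    return [fmean(xs[i:i + window]) for i in range(len(xs) - window + 1)]


def roughness(xs: Sequence[float]) -> float:
    """Standard deviation of first differences: jagged plots score high."""
    return pstdev([b - a for a, b in zip(xs, xs[1:])])


def kurtosis(xs: Sequence[float]) -> float:
    """Fourth standardized moment: large when a few points deviate strongly."""
    mean = fmean(xs)
    variance = fmean([(x - mean) ** 2 for x in xs])
    if variance == 0:
        return 0.0
    return fmean([(x - mean) ** 4 for x in xs]) / variance ** 2


def choose_window(xs: Sequence[float], max_window: int) -> int:
    """Search all windows up to max_window for the smoothest one that keeps kurtosis."""
    target = kurtosis(xs)
    best_window, best_roughness = 1, roughness(xs)
    for window in range(2, max_window + 1):
        smoothed = moving_average(xs, window)
        if len(smoothed) < 3:
            break  # too few points left to judge roughness
        candidate = roughness(smoothed)
        if kurtosis(smoothed) >= target and candidate < best_roughness:
            best_window, best_roughness = window, candidate
    return best_window


# Alternating noise with one large deviation that should stay visible.
noisy = [0, 1, 0, 1, 0, 1, 0, 1, 10, 1, 0, 1, 0, 1, 0, 1]
window = choose_window(noisy, max_window=5)
smoothed = moving_average(noisy, window)
```

The exhaustive search is only for clarity. It becomes slow on long series with many candidate windows.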

6 Makridakis competitions

The M4 dataset is a collection of 100,000 time series used for the fourth edition of the Makridakis forecasting Competition. The M4 dataset consists of time series of yearly, quarterly, monthly and other (weekly, daily and hourly) data, which are divided into training and test sets. The minimum numbers of observations in the training set are 13 for yearly, 16 for quarterly, 42 for monthly, 80 for weekly, 93 for daily and 700 for hourly series. The participants were asked to produce the following numbers of forecasts beyond the available data that they had been given: six for yearly, eight for quarterly, 18 for monthly series, 13 for weekly series and 14 and 48 forecasts respectively for the daily and hourly ones.
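To mimic that setup on a series of your own, one option is to hold out the last few points using the same horizons. This is a minimal sketch: the numbers are the ones quoted above, while the names `M4_HORIZONS`, `M4_MIN_TRAIN` and `holdout_split` are hypothetical and not part of any M4 tooling.

```python
from typing import List, Sequence, Tuple

# Forecast horizons and minimum training lengths per frequency, as quoted above.
M4_HORIZONS = {
    "yearly": 6, "quarterly": 8, "monthly": 18,
    "weekly": 13, "daily": 14, "hourly": 48,
}
M4_MIN_TRAIN = {
    "yearly": 13, "quarterly": 16, "monthly": 42,
    "weekly": 80, "daily": 93, "hourly": 700,
}


def holdout_split(
    series: Sequence[float], frequency: str
) -> Tuple[List[float], List[float]]:
    """Hold out the final M4-style horizon of a series as a test set."""
    horizon = M4_HORIZONS[frequency]
    if len(series) - horizon < M4_MIN_TRAIN[frequency]:
        raise ValueError("series too short for an M4-style split")
    return list(series[:-horizon]), list(series[-horizon:])


yearly = list(range(19))  # 13 training points plus 6 held-out points
train, test = holdout_split(yearly, "yearly")
```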

Now we are up to M5 and M6 is cooking.

Mcompetitions (M Forecasting Competitions)

7 Micropredictions.org

micropredictions is a quixotic project my colleagues have forwarded to me. Included here as a spur. The micropredictions FAQ says:

What’s microprediction you say?

The act of making thousands of predictions of the same type over and over again. Microprediction can

- Clean and enrich live data
- Alert you to outliers and anomalies
- Provide you short term forecasts
- Identify patterns in model residuals

Moreover it can be combined with patterns from Control Theory and Reinforcement Learning to

- Engineer low cost but tailored intelligent applications

Often enough AI is microprediction, albeit bundled with other mathematical or application logic.

You publish a live data value.

The sequence of these values gets predicted by a swarm of algorithms.

Anyone can write a crawler that tries to predict many different streams.

Microprediction APIs make it easy to:

- Separate the act of microprediction from other application logic.
- Invite contribution from other people and machines
- Benefit from other data you may never have considered.

… Let’s say your store is predicting sales and I’m optimising an HVAC system across the street. Your feature space and mine probably have a lot in common.

I am unclear how the datastreams, as set up, incorporate domain knowledge and private side information, which seems the hallmark of natural intelligence and, e.g., science. Perhaps they feel domain knowledge is a bug standing in the way of truly general artificial intelligence? If I had free time I might try to get a better grip on what they are doing, whoever they are.

Alternatively, they are coming at this from a chartist quant perspective and data are best considered as sort-of-anonymous streams of numbers, the better to attract disinterested competition.

8 Incoming

Darts:
- Transfer Learning for Time Series Forecasting with Darts
- Time Series Made Easy in Python
ARiMA is not Sufficient, an interview on S. Wang, Li, and Lim (2021)

9 References

Agarwal, Amjad, Shah, et al. 2018. “Time Series Analysis via Matrix Estimation.” arXiv:1802.09064 [Cs, Stat].

Alquier, Li, and Wintenberger. 2013. “Prediction of Time Series by Statistical Learning: General Losses and Fast Rates.” Dependence Modeling.

Alquier, and Wintenberger. 2012. “Model Selection for Weakly Dependent Time Series Forecasting.” Bernoulli.

Ben Taieb, and Atiya. 2016. “A Bias and Variance Analysis for Multistep-Ahead Time Series Forecasting.” IEEE Transactions on Neural Networks and Learning Systems.

Bergmeir, Hyndman, and Koo. 2018. “A Note on the Validity of Cross-Validation for Evaluating Autoregressive Time Series Prediction.” Computational Statistics & Data Analysis.

Bousquet, and Warmuth. 2001. “Tracking a Small Set of Experts by Mixing Past Posteriors.” In Computational Learning Theory. Lecture Notes in Computer Science.

Box, Jenkins, Reinsel, et al. 2016. Time Series Analysis: Forecasting and Control. Wiley Series in Probability and Statistics.

Brodersen, Gallusser, Koehler, et al. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” The Annals of Applied Statistics.

Broersen. 2006. Automatic Autocorrelation and Spectral Analysis.

Carvalho, and Tanner. 2006. “Modeling Nonlinearities with Mixtures-of-Experts of Time Series Models.” International Journal of Mathematics and Mathematical Sciences.

Challu, Olivares, Oreshkin, et al. 2022. “N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting.” arXiv:2201.12886 [Cs].

Chen, and Liu. 2000. “Mixture Kalman Filters.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Chevillon. 2007. “Direct Multi-Step Estimation and Forecasting.” Journal of Economic Surveys.

Commandeur, and Koopman. 2007. An Introduction to State Space Time Series Analysis.

Commandeur, Koopman, and Ooms. 2011. “Statistical Software for State Space Methods.” Journal of Statistical Software.

Cox, Gudmundsson, Lindgren, et al. 1981. “Statistical Analysis of Time Series: Some Recent Developments [with Discussion and Reply].” Scandinavian Journal of Statistics.

Dahl, and Bonilla. 2019. “Sparse Grouped Gaussian Processes for Solar Power Forecasting.” arXiv:1903.03986 [Cs, Stat].

Ding, Tarokh, and Yang. 2018. “Model Selection Techniques: An Overview.” IEEE Signal Processing Magazine.

Frühwirth-Schnatter. 2006. Finite Mixture and Markov Switching Models. Springer Series in Statistics.

Frühwirth-Schnatter, and Pamminger. 2010. “Model-Based Clustering of Categorical Time Series.” Bayesian Analysis.

———. n.d. “Bayesian Clustering of Categorical Time Series Using Finite Mixtures of Markov Chain Models.”

Gerstenberger, Wiemer, Jones, et al. 2005. “Real-Time Forecasts of Tomorrow’s Earthquakes in California.” Nature.

Granger, and Joyeux. 1980. “An Introduction to Long-Memory Time Series Models and Fractional Differencing.” Journal of Time Series Analysis.

Herzen, Lässig, Piazzetta, et al. 2022. “Darts: User-Friendly Modern Machine Learning for Time Series.” Journal of Machine Learning Research.

Hosseini, Yang, Chen, et al. 2021. “A Flexible Forecasting Model for Production Systems.” arXiv:2105.01098 [Stat].

Huerta, Jiang, and Tanner. 2001. “Discussion: Mixtures of Time Series Models.” Journal of Computational and Graphical Statistics.

———. 2003. “Time Series Modeling Via Hierarchical Mixtures.” Statistica Sinica.

Hurvich. 2002. “Multistep Forecasting of Long Memory Series Using Fractional Exponential Models.” International Journal of Forecasting, Forecasting Long Memory Processes.

Hyndman. 2020. “A Brief History of Forecasting Competitions.” International Journal of Forecasting, M4 Competition.

Jacobs. 2011. “Adapting to Non-Stationarity with Growing Predictor Ensembles.”

Kurniasih. n.d. “Knowledge Management of Agricultural Prophecy in the Manuscript of Sundanese Society in Tasikmalaya District of West Java Indonesia.”

Kuznetsov, and Mohri. 2014. “Forecasting Non-Stationary Time Series: From Theory to Algorithms.”

———. 2015. “Learning Theory and Algorithms for Forecasting Non-Stationary Time Series.” In Advances in Neural Information Processing Systems.

Lunde. 2019. “Sample Splitting and Weak Assumption Inference For Time Series.” arXiv:1902.07425 [Math, Stat].

Lunde, and Shalizi. 2017. “Bootstrapping Generalization Error Bounds for Time Series.” arXiv:1711.02834 [Math, Stat].

Makridakis, Spiliotis, and Assimakopoulos. 2020. “The M4 Competition: 100,000 Time Series and 61 Forecasting Methods.” International Journal of Forecasting, M4 Competition.

Moradkhani, Sorooshian, Gupta, et al. 2005. “Dual State–Parameter Estimation of Hydrological Models Using Ensemble Kalman Filter.” Advances in Water Resources.

Morvai, Yakowitz, and Györfi. 1996. “Nonparametric Inference for Ergodic, Stationary Time Series.” The Annals of Statistics.

Nicholson, Wilms, Bien, et al. 2020. “High Dimensional Forecasting via Interpretable Vector Autoregression.” Journal of Machine Learning Research.

Oreshkin, Carpov, Chapados, et al. 2020a. “N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting.” arXiv:1905.10437 [Cs, Stat].

———, et al. 2020b. “Meta-Learning Framework with Applications to Zero-Shot Time-Series Forecasting.”

Phillips. 1987. “Composite Forecasting: An Integrated Approach and Optimality Reconsidered.” Journal of Business & Economic Statistics.

Prado, Ferreira, and West. 2021. Time Series: Modeling, Computation, and Inference. Texts in Statistical Science.

Runge, Donner, and Kurths. 2015. “Optimal Model-Free Prediction from Multivariate Time Series.” Physical Review E.

Ryabko. 2009. “On Finding Predictors for Arbitrary Families of Processes.” arXiv:0912.4883 [Cs, Math, Stat].

Salinas, Flunkert, and Gasthaus. 2019. “DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks.” arXiv:1704.04110 [Cs, Stat].

Smith. 2000. “Disentangling Uncertainty and Error: On the Predictability of Nonlinear Systems.” In Nonlinear Dynamics and Statistics.

Sornette. 2009. “Dragon-Kings, Black Swans and the Prediction of Crises.” arXiv:0907.4290 [Physics].

Sugihara. 1994. “Nonlinear Forecasting for the Classification of Natural Time Series.” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

Taieb, Taylor, and Hyndman. 2017. “Coherent Probabilistic Forecasts for Hierarchical Time Series.” In PMLR.

Taleb. 2018. “Election Predictions as Martingales: An Arbitrage Approach.” Quantitative Finance.

Taylor, James W. 2008. “Using Exponentially Weighted Quantile Regression to Estimate Value at Risk and Expected Shortfall.” Journal of Financial Econometrics.

Taylor, Sean J., and Letham. 2017. “Forecasting at Scale.” e3190v2.

Uematsu. 2015. “Penalized Likelihood Estimation in High-Dimensional Time Series Models and Its Application.” arXiv:1504.06706 [Math, Stat].

Wang, Shixiong, Li, and Lim. 2021. “Why Are the ARIMA and SARIMA Not Sufficient.”

Wang, Wei, Rothschild, Goel, et al. 2015. “Forecasting Elections with Non-Representative Polls.” International Journal of Forecasting.

Wen, Torkkola, and Narayanaswamy. 2017. “A Multi-Horizon Quantile Recurrent Forecaster.” arXiv:1711.11053 [Stat].

Werbos. 1988. “Generalization of Backpropagation with Application to a Recurrent Gas Market Model.” Neural Networks.

Werner, Helmstetter, Jackson, et al. 2010. “Adaptively Smoothed Seismicity Earthquake Forecasts for Italy.” Annals of Geophysics.

West. 1993. “Mixture Models, Monte Carlo, Bayesian Updating and Dynamic Models.”

Wood, Rosen, and Kohn. 2011. “Bayesian Mixtures of Autoregressive Models.” Journal of Computational and Graphical Statistics.

Zeevi, Assaf J., Meir, and Adler. 1996. “Time Series Prediction Using Mixtures of Experts.” In Proceedings of the 9th International Conference on Neural Information Processing Systems. NIPS’96.

Zeevi, Assaf, Meir, and Adler. 1999. “Non-Linear Models for Time Series Using Mixtures of Autoregressive Models.”