A Bayesian Model for Football Scores using the Bookmakers Odds

Leonardo Egidi, Francesco Pauli, Nicola Torelli

The plots provided in the links below are derived from a draft paper in which a hierarchical Bayesian model is proposed for the home and away goals, modeled respectively through two conditionally independent Poisson distributions. The Poisson rates are convex combinations of two parameters, weighted by a certain probability, and these parameters refer to two separate sources of information: the historical data and the bookmakers’ betting odds. Preliminary simulations and predictions show good predictive accuracy on hold-out data and good efficiency in terms of betting strategies.
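To make the rate construction concrete, here is a minimal Python sketch. It assumes that the weight `p` multiplies the history-based rate and `1 - p` the odds-based rate; the function names and all numeric inputs are hypothetical, not taken from the paper, and the rates are fixed numbers here rather than random quantities in a hierarchical model. The sketch blends the two rates, then turns the two conditionally independent Poisson scores into home-win, draw and away-win probabilities by summing the joint score probabilities over a truncated grid.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def blended_rate(rate_hist: float, rate_odds: float, p: float) -> float:
    """Convex combination of a history-based rate and an odds-based rate.

    p is the (hypothetical) weight on the historical component, 0 <= p <= 1.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p * rate_hist + (1.0 - p) * rate_odds

def outcome_probs(lam_home: float, lam_away: float, max_goals: int = 15):
    """Home win / draw / away win probabilities for two independent Poisson scores.

    The infinite double sum is truncated at max_goals goals per side.
    """
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        px = poisson_pmf(x, lam_home)
        for y in range(max_goals + 1):
            pxy = px * poisson_pmf(y, lam_away)  # independence: joint = product
            if x > y:
                home += pxy
            elif x == y:
                draw += pxy
            else:
                away += pxy
    return home, draw, away

# Hypothetical inputs: one rate from historical data, one implied by the odds.
lam_h = blended_rate(rate_hist=1.4, rate_odds=1.7, p=0.5)  # about 1.55
lam_a = blended_rate(rate_hist=1.1, rate_odds=0.9, p=0.5)  # about 1.0
print(outcome_probs(lam_h, lam_a))
```

Setting `p = 1` or `p = 0` recovers a purely historical or purely odds-based rate, which is what makes the convex combination a natural way to let the data decide how much to trust each source.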

See the graphs


Andreas Groll, Christophe Ley, Gunther Schauberger, Hans Van Eetvelde

In this work, we compare three different modeling approaches for the scores of soccer matches with regard to their predictive performance, based on all matches from the four previous FIFA World Cups (2002–2014): Poisson regression models, random forests and ranking methods. While the former two are based on the teams’ covariate information, the latter estimates ability parameters that best reflect the current strength of the teams. In this comparison, the best-performing prediction methods on the training data turn out to be the ranking methods and the random forests. However, we show that by combining the random forest with the team ability parameters from the ranking methods, used as an additional covariate, we can improve the predictive power substantially. This combination is therefore chosen as the final model; based on its estimates, the 2018 FIFA World Cup is simulated repeatedly and winning probabilities are obtained for all teams. The model slightly favors Spain over the defending champion, Germany. Additionally, we provide survival probabilities for all teams at all tournament stages, as well as the most probable tournament outcome.
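The "simulate repeatedly, then count winners" step can be illustrated with a small Monte Carlo sketch in Python. This is not the authors' combined random-forest model: as a hypothetical stand-in, each side's expected goals come from `exp(base + ability[A] - ability[B])` with made-up ability values for four placeholder teams, a drawn knockout match is settled by a fair coin flip (standing in for extra time and penalties), and the bracket length is assumed to be a power of two.

```python
import math
import random
from collections import Counter

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def play_match(team_a, team_b, ability, rng, base=0.2):
    """Simulate one knockout match and return the winner's name.

    Hypothetical rate model: log E[goals of A] = base + ability[A] - ability[B].
    A draw is settled by a fair coin flip.
    """
    diff = ability[team_a] - ability[team_b]
    goals_a = sample_poisson(math.exp(base + diff), rng)
    goals_b = sample_poisson(math.exp(base - diff), rng)
    if goals_a != goals_b:
        return team_a if goals_a > goals_b else team_b
    return team_a if rng.random() < 0.5 else team_b

def simulate_bracket(bracket, ability, n_sims=20000, seed=1):
    """Replay a single-elimination bracket n_sims times; return win frequencies."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_sims):
        alive = list(bracket)
        while len(alive) > 1:  # pair neighbours and keep each match winner
            alive = [play_match(alive[i], alive[i + 1], ability, rng)
                     for i in range(0, len(alive), 2)]
        wins[alive[0]] += 1
    return {team: wins[team] / n_sims for team in bracket}

# Hypothetical ability parameters for placeholder teams (not paper estimates).
ability = {"A": 0.6, "B": 0.4, "C": 0.2, "D": 0.0}
probs = simulate_bracket(["A", "D", "B", "C"], ability)
for team, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {p:.3f}")
```

Counting how far each team gets in every replay, instead of only who wins, would give stage-by-stage survival probabilities in the same way.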

Scientific paper

RUSSIA WORLD CUP 2018: a multinomial logistic model: Who wins the world cup?

Leonardo Egidi, Nicola Torelli

In this document, a Bayesian multinomial logistic model for the 2018 FIFA World Cup in Russia is presented. Non-statisticians may skip the model characteristics and focus on the predictions, whereas statisticians may enjoy the model formulation and the fitting steps outlined in the Model section.
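The summary above does not give the model's linear predictors, so the following Python sketch only illustrates the generic mechanics of a multinomial logit, under stated assumptions: three outcomes (win, draw, loss for the first-listed team), "draw" as the reference category, a single hypothetical covariate `x` (for example, a strength difference), and made-up coefficient draws standing in for a posterior sample. The function names are hypothetical. In a Bayesian fit, predictive probabilities are averaged over posterior draws, which is what `posterior_predictive` does here.

```python
import math

OUTCOMES = ("win", "draw", "loss")  # from the first-listed team's perspective

def multinomial_logit_probs(x, coefs):
    """Softmax probabilities with "draw" as the baseline category.

    coefs maps "win" and "loss" to hypothetical (intercept, slope) pairs for a
    single covariate x; the baseline's linear predictor is fixed at zero.
    """
    eta = {"draw": 0.0}
    for outcome, (alpha, beta) in coefs.items():
        eta[outcome] = alpha + beta * x
    top = max(eta.values())  # subtract the maximum to avoid overflow in exp
    expo = {k: math.exp(v - top) for k, v in eta.items()}
    total = sum(expo.values())
    return {k: expo[k] / total for k in OUTCOMES}

def posterior_predictive(x, draws):
    """Average the outcome probabilities over (hypothetical) posterior draws."""
    acc = dict.fromkeys(OUTCOMES, 0.0)
    for coefs in draws:
        for k, p in multinomial_logit_probs(x, coefs).items():
            acc[k] += p / len(draws)
    return acc

# Made-up coefficient draws and a made-up covariate value x = 0.5.
draws = [
    {"win": (-0.1, 0.9), "loss": (-0.1, -0.9)},
    {"win": (0.0, 1.1), "loss": (0.0, -1.1)},
    {"win": (-0.2, 0.7), "loss": (-0.2, -0.7)},
]
print(posterior_predictive(0.5, draws))
```

An ordered logit, which treats loss < draw < win as ordered categories, would be a different modeling choice; the sketch uses the unordered multinomial form named in the text.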

Read the document


Luca Malfatti, Gianluca Rosso

The aim of this document is to analyze, by means of descriptive statistics, group B of the 2017/2018 Piedmontese regional championship (category “Giovanissimi” FB), in which 26 games were played. The study was developed within the Quant4Sport project.

Read the document