Dixon-Coles Model for Football Predictions

Published on April 11, 2020, 6:29 a.m. - Sports: Football

This article evaluates a betting strategy based on the Dixon-Coles model on historical data using Python code. The model goes back to a scientific paper published in 1997, in which Dixon and Coles proposed a mathematical model for predicting the results of football (soccer) matches.

If you are interested in the original publication - it is available online here.

As a quick summary: the basic idea behind the Dixon-Coles model is that an attack and a defense rate are derived for every team from historical data. As in a Poisson model, the probability of any score in a football match can be calculated from the attack and defense strengths, along with a correction for low-scoring results. Comparing the odds from the Dixon-Coles model with the odds offered by betting exchanges or bookmakers might be a starting point for a value betting strategy.
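Written out, the summary above corresponds roughly to the following formulas. This is a sketch in the log-parametrisation used by the Python code in this article (attack $\alpha$, defense $\beta$, home advantage $\gamma$, correction $\rho$); the symbol names are chosen here for illustration, and the original paper writes the rates in a multiplicative form.

```latex
P(X = x, Y = y) = \tau_{\lambda,\mu}(x, y)\,
    \frac{\lambda^{x} e^{-\lambda}}{x!}\,
    \frac{\mu^{y} e^{-\mu}}{y!}

\lambda = \exp(\alpha_{\text{home}} + \beta_{\text{away}} + \gamma), \qquad
\mu = \exp(\alpha_{\text{away}} + \beta_{\text{home}})

\tau_{\lambda,\mu}(x, y) =
\begin{cases}
1 - \lambda\mu\rho & x = 0,\ y = 0 \\
1 + \lambda\rho    & x = 0,\ y = 1 \\
1 + \mu\rho        & x = 1,\ y = 0 \\
1 - \rho           & x = 1,\ y = 1 \\
1                  & \text{otherwise}
\end{cases}
```

Here $X$ and $Y$ are the home and away goals. With $\rho = 0$ the correction $\tau$ equals 1 everywhere and the model reduces to two independent Poisson distributions.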

There are great resources available online that will help you get started with coding up a Dixon-Coles model. opisthokonta is a good starting point using the R programming language, whilst dashee87.github.io focuses on a Python implementation.

In this article I will re-use parts of dashee87's implementation and add a backtest at the end of it using Pinnacle odds.

Obtaining parameters for the Dixon-Coles model

First, I will load the football data into a pandas dataframe. For the backtest I will be using Premier League data for the 2018/19 season:


import pandas as pd

df = pd.read_csv(
    "http://www.football-data.co.uk/mmz4281/1819/E0.csv",
    parse_dates=["Date"],
    dayfirst=True,  # dates in the file are formatted as dd/mm/yyyy
    usecols=["Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG", "PSH", "PSD", "PSA"],
)

I will then re-use dashee87's implementation to calculate the attack and defense strength for each team as well as the rho correction and the home advantage factor. Fitting is done on the first 50 football matches of the season:


import numpy as np
from scipy.stats import poisson
from scipy.optimize import minimize


def rho_correction(x, y, lambda_x, mu_y, rho):
    # Dixon-Coles adjustment for the four low-scoring results; 1.0 elsewhere
    if x == 0 and y == 0:
        return 1 - (lambda_x * mu_y * rho)
    elif x == 0 and y == 1:
        return 1 + (lambda_x * rho)
    elif x == 1 and y == 0:
        return 1 + (mu_y * rho)
    elif x == 1 and y == 1:
        return 1 - rho
    else:
        return 1.0


def dc_log_like(x, y, alpha_x, beta_x, alpha_y, beta_y, rho, gamma):
    # expected goals of the home team (incl. home advantage gamma) and the away team
    lambda_x, mu_y = np.exp(alpha_x + beta_y + gamma), np.exp(alpha_y + beta_x)
    # log-likelihood of one match: correction term plus two Poisson terms
    return (np.log(rho_correction(x, y, lambda_x, mu_y, rho)) +
            poisson.logpmf(x, lambda_x) + poisson.logpmf(y, mu_y))

warmup_matches = 50


def solve_parameters(dataset, debug=False, init_vals=None, options=None,
                     constraints=None, **kwargs):
    if options is None:
        options = {'disp': True, 'maxiter': 100}
    teams = np.sort(dataset['HomeTeam'].unique())
    # every team must appear both as home and as away team in the dataset
    away_teams = np.sort(dataset['AwayTeam'].unique())
    if not np.array_equal(teams, away_teams):
        raise ValueError("Home and away team lists differ")
    n_teams = len(teams)
    if constraints is None:
        # identifiability constraint: attack parameters sum to the number of teams
        constraints = [{'type': 'eq', 'fun': lambda x: sum(x[:n_teams]) - n_teams}]
    if init_vals is None:
        # random initialisation of model parameters
        init_vals = np.concatenate((np.random.uniform(0, 1, n_teams),   # attack strength
                                    np.random.uniform(-1, 0, n_teams),  # defence strength
                                    np.array([0, 1.0])  # rho (score correction), gamma (home advantage)
                                    ))

    def estimate_parameters(params):
        score_coefs = dict(zip(teams, params[:n_teams]))
        defend_coefs = dict(zip(teams, params[n_teams:(2*n_teams)]))
        rho, gamma = params[-2:]
        log_like = [dc_log_like(row.FTHG, row.FTAG, score_coefs[row.HomeTeam], defend_coefs[row.HomeTeam],
                     score_coefs[row.AwayTeam], defend_coefs[row.AwayTeam], rho, gamma) for row in dataset.itertuples()]
        # minimize() minimises, so return the negative log-likelihood
        return -sum(log_like)
    opt_output = minimize(estimate_parameters, init_vals, options=options, constraints=constraints, **kwargs)
    if debug:
        # sort of hacky way to investigate the output of the optimisation process
        return opt_output
    else:
        return dict(zip(["attack_"+team for team in teams] + 
                        ["defence_"+team for team in teams] +
                        ['rho', 'home_adv'],
                        opt_output.x))
    
params = solve_parameters(df[:warmup_matches])

Deriving a betting strategy based on the Dixon-Coles model

With the Dixon-Coles model the probability of each score line in a football match can be calculated. If you would like to calculate the probability that one team wins, you would simply need to sum up all the probabilities of scores where the team scores more goals than the opponent, e.g. 1:0, 2:0, 2:1, 3:0, 3:1,... The probability of a draw is the sum of probabilities for 0:0, 1:1, 2:2, ..., simply the diagonal of the goals matrix.
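As a minimal sketch of this summation, the snippet below builds a score matrix from two independent Poisson distributions and reads off the three outcome probabilities. The expected goals `home_mean` and `away_mean` are hypothetical values chosen only for illustration, and the rho correction is left out to keep the example short.

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical expected goals, chosen only for illustration
home_mean, away_mean = 1.5, 1.0
max_goals = 10

goals = np.arange(max_goals + 1)
home_pmf = poisson.pmf(goals, home_mean)
away_pmf = poisson.pmf(goals, away_mean)

# score_matrix[i, j] = P(home scores i goals, away scores j goals)
score_matrix = np.outer(home_pmf, away_pmf)

# below the diagonal: row index (home goals) > column index (away goals)
home_win = np.tril(score_matrix, -1).sum()
# the diagonal: equal number of goals
draw = np.trace(score_matrix)
# above the diagonal: away goals > home goals
away_win = np.triu(score_matrix, 1).sum()

print(home_win, draw, away_win)
```

Because the matrix is truncated at `max_goals`, the three probabilities sum to slightly less than 1; with 10 goals as the cut-off the missing mass is negligible for typical scoring rates.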

The parameters of the Dixon-Coles model are used to calculate the probabilities of a home win, a draw and an away win for the remaining matches of the 2018/19 Premier League season:


def calc_means(param_dict, homeTeam, awayTeam):
    return [np.exp(param_dict['attack_'+homeTeam] + param_dict['defence_'+awayTeam] + param_dict['home_adv']),
            np.exp(param_dict['defence_'+homeTeam] + param_dict['attack_'+awayTeam])]


def dixon_coles_simulate_match(params_dict, homeTeam, awayTeam, max_goals=10):
    team_avgs = calc_means(params_dict, homeTeam, awayTeam)
    team_pred = [[poisson.pmf(i, team_avg) for i in range(0, max_goals+1)] for team_avg in team_avgs]
    output_matrix = np.outer(np.array(team_pred[0]), np.array(team_pred[1]))
    # use the rho of the parameters passed in, not the global `params`
    correction_matrix = np.array([[rho_correction(home_goals, away_goals, team_avgs[0],
                                                   team_avgs[1], params_dict['rho']) for away_goals in range(2)]
                                   for home_goals in range(2)])
    output_matrix[:2,:2] = output_matrix[:2,:2] * correction_matrix
    return np.sum(np.tril(output_matrix, -1)), np.sum(np.diag(output_matrix)), np.sum(np.triu(output_matrix, 1))

df_after_warmup = df[warmup_matches:]
predictions = pd.DataFrame(
    [dixon_coles_simulate_match(params, row["HomeTeam"], row["AwayTeam"], max_goals=10) for index, row in df_after_warmup.iterrows()],
    columns=["HomeProba", "DrawProba", "AwayProba"]
)
predictions.set_index(df_after_warmup.index, inplace=True)
df_with_predictions = pd.concat([df_after_warmup, predictions], axis=1)
df_with_predictions


A simple value betting strategy is applied against the odds of the bookmaker Pinnacle. The 1x2 odds are in the PSH, PSD and PSA columns. A 1-point back bet is placed when the Dixon-Coles probability of an outcome is larger than the probability implied by the odds (1 divided by the decimal odds). The outcomes are checked in the order home win, draw, away win, so at most one bet is placed per match:


def bet(x):
    if x["HomeProba"] > 1/x["PSH"]:  # home probability vs. home odds
        if x["FTHG"] > x["FTAG"]:
            return x["PSH"] - 1
        else:
            return -1
    elif x["DrawProba"] > 1/x["PSD"]:
        if x["FTHG"] == x["FTAG"]:
            return x["PSD"] - 1
        else:
            return -1
    elif x["AwayProba"] > 1/x["PSA"]:
        if x["FTHG"] < x["FTAG"]:
            return x["PSA"] - 1
        else:
            return -1
    else:
        return 0
    
df_with_predictions["profit"] = df_with_predictions.apply(bet, axis=1)
df_with_predictions["profit_cumsum"] = df_with_predictions["profit"].cumsum()
df_with_predictions
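The comparison behind the bet function can also be looked at in isolation. The sketch below uses hypothetical helper names (`implied_probability`, `expected_profit`, `is_value_bet`) and made-up numbers; it assumes decimal odds and ignores commission and stake limits.

```python
def implied_probability(decimal_odds):
    # probability at which a bet at these odds breaks even
    return 1.0 / decimal_odds


def expected_profit(model_proba, decimal_odds, stake=1.0):
    # win stake * (odds - 1) with probability p, lose the stake otherwise
    return stake * (model_proba * decimal_odds - 1.0)


def is_value_bet(model_proba, decimal_odds):
    # a bet has positive expected profit exactly when p > 1 / odds
    return model_proba > implied_probability(decimal_odds)


# made-up example: the model says 50%, the bookmaker offers odds of 2.2
example_proba, example_odds = 0.5, 2.2
print(implied_probability(example_odds))             # roughly 0.4545
print(is_value_bet(example_proba, example_odds))     # True
print(expected_profit(example_proba, example_odds))  # roughly 0.1 points per 1-point stake
```

Note that the implied probabilities of the three outcomes add up to more than 1 because the odds contain the bookmaker's margin, so the model has to beat that margin before a bet qualifies.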

In the run reported here, the strategy lost 26.93 points over the 2018/19 season. However, several improvements can be made to the model and the betting strategy. The parameters of the Dixon-Coles model could be re-estimated during the season as more data becomes available. Another approach is to put more weight on recent matches when fitting the parameters (time decay). The parameters of the Dixon-Coles model might also serve as input to other models, such as machine learning models.
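The time-decay idea could be sketched as follows. The helper `time_decay_weights` and the decay rate `xi` are hypothetical: the function assigns each match a weight of exp(-xi * t), where t is the number of days between the match and the fitting date, and the value of `xi` would need to be tuned on data.

```python
import numpy as np
import pandas as pd


def time_decay_weights(match_dates, fit_date, xi=0.002):
    """Exponential weights exp(-xi * t), with t in days before fit_date.

    xi is a hypothetical decay rate; larger values discount old matches faster.
    """
    days_ago = (pd.Timestamp(fit_date) - pd.to_datetime(match_dates)).dt.days
    return np.exp(-xi * days_ago)


# two made-up match dates, weighted as of 1 September 2018
example_dates = pd.Series(pd.to_datetime(["2018-08-10", "2018-09-01"]))
example_weights = time_decay_weights(example_dates, "2018-09-01", xi=0.01)
print(example_weights)
```

In the fitting step, each match's log-likelihood term would then be multiplied by its weight before summing, so that recent matches influence the parameters more than older ones.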
