Use a Rating Model to Predict Tennis Matches

Published on Aug. 17, 2020, 6:17 a.m. - Sports: Tennis

Rating systems are a popular way to rank players and teams, especially in sports and games such as tennis, chess or Go. In this article, I use Python to build rating systems for tennis betting and evaluate them on historical data.

What is a rating system in sports?

In sports, rating systems rank individual players or teams by skill or strength. They are typically based on past results, from which a rating is derived for each player or team. Before a player's or team's first match, the rating is initialised with a fixed value. Ratings are then updated continuously to reflect recent matches, so at any point in time a numerical value, the so-called rating, can be calculated for each player or team. Ratings are typically published as tables, but in this article I will also use charts to show how the ratings evolve over time.

One example is the Universal Tennis Rating (UTR), a popular tennis rating. On their site you can find a table with the ratings of tennis players. At the time of writing (17/08/20), Novak Djokovic and Bianca Andreescu are rated as the best tennis players.

In this article, Python is used to develop rating models. We start with a naive rating system and then look at well-established rating models such as Elo and Glicko. Data from tennis-data.co.uk is used to develop and test the models. A simple betting strategy based on the rating models is developed in Python and tested on historical data.

Build your own tennis rating model with Python

Before I start with the development of rating models I will just import some Python packages and load the data from tennis-data.co.uk into a single pandas DataFrame:

import io
import pandas as pd
import requests
from typing import Tuple
import zipfile
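# Jupyter magic to render charts inline; remove it when running as a plain script
%matplotlib inline

def get_df(
    years: Tuple[int, ...] = (2018, 2019, 2020)
) -> pd.DataFrame:
    """Download one season file per year and return all matches in one DataFrame."""
    frames = []
    for year in years:
        url = f"http://www.tennis-data.co.uk/{year}/{year}.zip"
        r = requests.get(url)
        r.raise_for_status()  # fail loudly on a bad download
        z = zipfile.ZipFile(io.BytesIO(r.content))
        xlfile = z.open(f"{year}.xlsx")
        ydf = pd.read_excel(
            xlfile,
            usecols=["Date", "Winner", "Loser", "B365W", "B365L"],
            parse_dates=["Date"]
        )
        frames.append(ydf)
    # concatenate once at the end instead of growing a DataFrame inside the loop
    return pd.concat(frames, ignore_index=True)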

df = get_df()
df.head()

The DataFrame should then have the following structure, with the date of each tennis match, the two players and the Bet365 odds for the winner and the loser (B365W, B365L):

        Date         Winner           Loser  B365W  B365L
0 2017-12-31  Dolgopolov O.  Schwartzman D.   2.20   1.61
1 2017-12-31   De Minaur A.      Johnson S.   2.75   1.40
2 2018-01-01    Harrison R.        Mayer L.   1.61   2.20
3 2018-01-01       Ebden M.       Tiafoe F.   2.50   1.50
4 2018-01-01      Zverev M.      Smith J.P.   1.40   2.75

Next, I would like to start with a very simple rating model: if a tennis player wins a match, one point is added to his rating; if he loses, his rating is reduced by one point. A player's rating at any time is simply the number of wins minus the number of losses.

# naive approach
point_assignment = {
    "Winner": 1,
    "Loser": -1
}
rating = pd.DataFrame(
    columns = pd.concat([df["Winner"], df["Loser"]]).unique(),
    index=df["Date"].unique(),
    data=0,
)

for index, row in df.iterrows():
    for r, point in point_assignment.items():
        rating.loc[row["Date"], row[r]] += point
        
rating = rating.cumsum()
rating.sort_values(by=rating.index[-1], axis=1, ascending=False, inplace=True)
rating.plot(figsize=(15, 8))

Naive Tennis Rating System
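The same wins-minus-losses rating can also be computed without the row-by-row loop. The following is only a minimal sketch on a made-up match list; the player names, the dates and the helper column names Player and Points are assumptions for illustration and do not come from tennis-data.co.uk.

```python
import pandas as pd

# Toy match list (hypothetical players and dates).
matches = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-02"]),
    "Winner": ["A", "C", "A"],
    "Loser": ["B", "D", "C"],
})

# One row per (date, player): +1 for a win, -1 for a loss.
points = pd.concat([
    matches[["Date", "Winner"]].rename(columns={"Winner": "Player"}).assign(Points=1),
    matches[["Date", "Loser"]].rename(columns={"Loser": "Player"}).assign(Points=-1),
])

# Net points per day and player, then the running total over time.
toy_rating = points.pivot_table(
    index="Date", columns="Player", values="Points", aggfunc="sum", fill_value=0
).cumsum()
print(toy_rating)
```

Each row of toy_rating holds the rating of every player at the end of that day, which is the same shape as the rating DataFrame used for the chart.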

Here we can see one issue with such a rating model: rating inflation. Over time, the difference between the maximum and the minimum rating increases, which makes it difficult to convert ratings or rating differences into odds. Again, a naive approach is to normalise the ratings every day to a range between 0 and 1:

Naive Tennis Rating System Normalised

rating = rating.subtract(rating.min(axis=1), axis=0).div(rating.max(axis=1)-rating.min(axis=1), axis=0)
rating.sort_values(by=rating.index[-1], axis=1, ascending=False, inplace=True)
rating.plot(figsize=(15, 8))
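To make the daily normalisation easier to follow, here is a small sketch on a toy table. The three players and their ratings are made up; only the subtract/div pattern mirrors the line above.

```python
import pandas as pd

# Toy cumulative ratings for three hypothetical players on three days.
toy = pd.DataFrame(
    {"A": [1, 2, 4], "B": [-1, -2, -4], "C": [0, 1, 0]},
    index=pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
)

row_min = toy.min(axis=1)               # weakest rating of each day
row_range = toy.max(axis=1) - row_min   # spread between best and worst

# Min-max scaling per day: the best player gets 1, the worst gets 0.
normalised = toy.subtract(row_min, axis=0).div(row_range, axis=0)
print(normalised)
```

One thing to keep in mind: on a day where all players have the same rating, the range is zero and the division produces NaN instead of a rating.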

Converting ratings into betting odds

Next, I would like to use the rating model as part of a tennis betting strategy. For this, I need to convert the ratings of the model into winning probabilities. In my naive model, the strongest player has the maximum rating of 1 and the weakest player the minimum of 0. To obtain the winning probability, I simply divide the player's rating by the sum of the ratings of both players. The odds are then just the inverse of the winning probabilities:

def get_rating(date, selection):
    try:
        return rating.loc[date - pd.Timedelta(days=1), selection]
    except KeyError:
        return 0.5

# .copy() avoids pandas' SettingWithCopyWarning when adding columns to the slice
live = df[df["Date"] >= "2019-01-01"].copy()
for selection in ("Winner", "Loser"):
    live[f"Rating{selection}"] = live.apply(lambda x: get_rating(x["Date"], x[selection]), axis=1)
# winning probability = own rating / sum of both ratings; odds = 1 / probability
live["CalculatedOddsWinner"] = 1 / (live["RatingWinner"] / (live["RatingWinner"] + live["RatingLoser"]))
live["CalculatedOddsLoser"] = 1 / (live["RatingLoser"] / (live["RatingWinner"] + live["RatingLoser"]))
live.head()

           Date       Winner          Loser  B365W  B365L  RatingWinner  RatingLoser  CalculatedOddsWinner  CalculatedOddsLoser
2642 2019-01-01     Kudla D.       Fritz T.   2.62   1.44      0.228070     0.350877              2.538462             1.650000
2643 2019-01-01    Chardy J.    Struff J.L.   2.10   1.66      0.333333     0.263158              1.789474             2.266667
2644 2019-01-01    Murray A.   Duckworth J.   1.28   3.50      0.298246     0.210526              1.705882             2.416667
2645 2019-01-01   Kyrgios N.    Harrison R.   1.40   2.75      0.473684     0.280702              1.592593             2.687500
2646 2019-01-01  Tsonga J.W.  Kokkinakis T.   2.25   1.57      0.263158     0.228070              1.866667             2.153846
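The conversion from ratings to odds can be checked by hand for a single match. The sketch below uses a hypothetical helper, ratings_to_odds, which is not part of the code above, and feeds it the two ratings from the first row of the table.

```python
import math


def ratings_to_odds(rating_a, rating_b):
    """Return fair decimal odds (odds_a, odds_b) from two non-negative ratings."""
    total = rating_a + rating_b
    if total <= 0:
        raise ValueError("at least one rating must be positive")
    prob_a = rating_a / total
    prob_b = rating_b / total
    # A probability of zero corresponds to infinitely long odds.
    odds_a = 1 / prob_a if prob_a > 0 else math.inf
    odds_b = 1 / prob_b if prob_b > 0 else math.inf
    return odds_a, odds_b


# Ratings of Kudla D. and Fritz T. from the first row of the table.
odds_winner, odds_loser = ratings_to_odds(0.228070, 0.350877)
print(round(odds_winner, 3), round(odds_loser, 3))
```

Because the two probabilities always add up to 1, these odds contain no bookmaker margin, which is why they can be compared directly with the bookmaker's prices when looking for value.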

For the backtest on historical tennis data, I only place a bet if the bookmaker's odds are larger than the odds derived from my rating model. Flat staking is used with 1 point per selection. Only matches from 1st January 2019 onwards are considered for the evaluation; the matches prior to this date give the algorithm some time to initialise and calculate more appropriate ratings. It is very important not to introduce a look-ahead bias into the backtest: you should only use the ratings that are available prior to the match and not the ones that have been updated with the match itself.

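def bet(x):
    # Flat stake of 1 point; back a player only if the bookmaker pays more than the model's odds
    if x["B365W"] > x["CalculatedOddsWinner"]:
        return x["B365W"] - 1  # bet on the eventual winner: profit is odds minus stake
    if x["B365L"] > x["CalculatedOddsLoser"]:
        return -1  # bet on the eventual loser: the stake is lost
    return 0  # no value on either side: no bet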
    
live["Profit"] = live.apply(lambda x: bet(x), axis=1)
live["Profit"].cumsum().plot()
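The rule about only using ratings from before the match can be illustrated with a small lookup sketch. It is an alternative to the exact previous-day lookup used above, not the code behind the reported results: the helper rating_before and the toy ratings are assumptions, and Series.asof is used so that a missing calendar day falls back to the latest earlier rating instead of the default.

```python
import pandas as pd

# Toy daily ratings with a gap: there is no row for 2019-01-02 (values are made up).
toy_rating = pd.DataFrame(
    {"Player A": [0.40, 0.55, 0.60], "Player B": [0.70, 0.65, 0.50]},
    index=pd.to_datetime(["2018-12-31", "2019-01-01", "2019-01-03"]),
)


def rating_before(date, player, default=0.5):
    """Latest rating dated before `date`, or `default` if the player has none."""
    day_before = pd.Timestamp(date) - pd.Timedelta(days=1)
    # asof returns the last non-NaN value at or before the given timestamp.
    value = toy_rating[player].asof(day_before)
    return default if pd.isna(value) else float(value)


print(rating_before("2019-01-03", "Player A"))  # taken from the 2019-01-01 row
print(rating_before("2018-12-31", "Player A"))  # nothing earlier, so the default
```

The rating stored for the match day itself is never read, so a result cannot leak into the odds that are used to bet on it. Note that asof expects the index to be sorted by date.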

The outcome of the backtest is disappointing but not a real surprise: with such a simple model you are not able to beat the bookie and would have accumulated a loss of 268.6 points. The Pearson correlation between the odds of the model and the odds of the bookie is 0.45 for both the winner and the loser.
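For completeness, this is roughly how the two summary numbers of a backtest can be computed with pandas. The DataFrame below is a toy with made-up values, so its output has nothing to do with the figures quoted above; only the column names follow the article.

```python
import pandas as pd

# Toy backtest result (made-up numbers, not the article's data).
toy = pd.DataFrame({
    "B365W": [1.40, 2.10, 1.28, 2.62],
    "CalculatedOddsWinner": [1.59, 1.79, 1.71, 2.54],
    "Profit": [0.0, 1.10, -1.0, 1.62],
})

# Total profit in points with flat stakes of 1 point per bet.
total_profit = toy["Profit"].sum()

# Series.corr uses the Pearson method by default.
correlation = toy["B365W"].corr(toy["CalculatedOddsWinner"])

print(f"profit: {total_profit:.2f} points, correlation: {correlation:.2f}")
```

Applied to the backtest, the same two calls would be made on the live DataFrame, once for the winner columns and once for the loser columns.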

Building an Elo rating system for tennis with Python

After the simple rating model, I will now focus on a more established rating system in the hope of getting a better result: the Elo rating system, which was developed by the physics professor Arpad Elo.

There are Python packages available that contain an implementation of the Elo rating model, such as elo or EloPy. However, the model itself doesn't seem too complex, so I decided to quickly implement it myself. I feel that I get a much better understanding of a model when I write my own code instead of just using a readily available API.

Two functions are defined for the Elo rating: the first one calculates the winning probability given the ratings of the two opponents, and the second one is the formula to update the rating. I iterate over all the matches, and after each match the Elo rating of the two players is updated. The ratings are initialised with a value of 1500. The pandas forward fill (ffill) is used to complete the missing ratings.

# Elo rating

def winning_prob(r_player, r_opponent):
    return 10**(r_player/400) / ( 10**(r_player/400) + 10**(r_opponent/400) )


def update_rating(rating, score, expected_score, K=16):
    return rating + K * (score - expected_score)


rating = pd.DataFrame(
    columns = pd.concat([df["Winner"], df["Loser"]]).unique(),
    index=df["Date"].unique(),
    dtype=float,  # keep the ratings numeric so they can be plotted
)

current_rating = {}
for index, row in df.iterrows():
    for selection in ("Winner", "Loser"):
        try:
            # most recent rating of the player; .iloc[-1] selects by position
            current_rating[selection] = rating[row[selection]].dropna().iloc[-1]
        except IndexError:
            # no earlier match: start from the initial rating
            current_rating[selection] = 1500
    winner_prob = winning_prob(current_rating["Winner"], current_rating["Loser"])
    rating.loc[row["Date"], row["Winner"]] = update_rating(current_rating["Winner"], 1, winner_prob)
    rating.loc[row["Date"], row["Loser"]] = update_rating(current_rating["Loser"], 0, 1-winner_prob)

rating.ffill(inplace=True)
rating.plot(figsize=(15, 8))

Elo Rating System for Tennis
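The update loop can also be written with a plain dictionary that holds each player's current rating, which avoids searching the DataFrame for the last known value. This is only a sketch under the same assumptions as above (K of 16, initial rating of 1500); the function run_elo and the toy matches are hypothetical.

```python
from collections import defaultdict


def winning_prob(r_player, r_opponent):
    # Algebraically the same as 10**(r_player/400) / (10**(r_player/400) + 10**(r_opponent/400)).
    return 1 / (1 + 10 ** ((r_opponent - r_player) / 400))


def run_elo(matches, k=16, initial=1500):
    """Apply Elo updates to (winner, loser) pairs given in chronological order."""
    ratings = defaultdict(lambda: initial)
    for winner, loser in matches:
        expected_winner = winning_prob(ratings[winner], ratings[loser])
        # The winner gains exactly what the loser gives up.
        delta = k * (1 - expected_winner)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


toy_matches = [("A", "B"), ("A", "C"), ("B", "C")]
final = run_elo(toy_matches)
print(final)
```

After the first toy match both players start from 1500, the expected score is 0.5, and the ratings move to 1508 and 1492. Because every update is symmetric, the sum of all ratings stays at 1500 times the number of players.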

The chart with the rating over time looks a bit similar to the one of the previous rating model. Again, rating inflation is an issue here, but there are also other challenges. In this example I simply select a K value of 16. The K value determines by how much the rating is adjusted at every update, so selecting it is a trade-off between putting more weight on recent matches and not being too sensitive to the latest results. Another issue is player inactivity: the rating stays constant if a player does not participate in any match.
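To get a feeling for the K value, the sketch below compares the rating change after a single upset for a few values of K. The ratings of 1500 and 1700 and the helper rating_change are made up for illustration.

```python
def winning_prob(r_player, r_opponent):
    return 1 / (1 + 10 ** ((r_opponent - r_player) / 400))


def rating_change(r_player, r_opponent, score, k):
    """Elo adjustment for one match: K times (actual score - expected score)."""
    return k * (score - winning_prob(r_player, r_opponent))


# A 1500-rated underdog beats a 1700-rated favourite.
for k in (8, 16, 32):
    gain = rating_change(1500, 1700, score=1, k=k)
    print(f"K={k:>2}: underdog gains {gain:.1f} points")
```

The underdog's expected score is about 0.24, so the gain is roughly 6, 12 and 24 points for K of 8, 16 and 32. The adjustment scales linearly with K: a larger K reacts faster to form changes but also lets a single surprising result move the rating a lot.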

We will now use the Elo ratings to derive odds for the winner of the tennis match:

# calculate odds for Elo
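# .copy() avoids pandas' SettingWithCopyWarning when adding columns to the slice
live = df[df["Date"] >= "2019-01-01"].copy()
for selection in ("Winner", "Loser"):
    live[f"Rating{selection}"] = live.apply(lambda x: get_rating(x["Date"], x[selection]), axis=1)
# the column names must match the ones read by bet(): CalculatedOddsWinner / CalculatedOddsLoser
live["CalculatedOddsWinner"] = 1 / winning_prob(live["RatingWinner"], live["RatingLoser"])
live["CalculatedOddsLoser"] = 1 / winning_prob(live["RatingLoser"], live["RatingWinner"])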
live.head()

           Date       Winner          Loser  B365W  B365L  RatingWinner  RatingLoser  CalculatedOddsWinner  CalculatedOddsLoser
2642 2019-01-01     Kudla D.       Fritz T.   2.62   1.44   1486.354191  1524.375587              2.244668             1.803427
2643 2019-01-01    Chardy J.    Struff J.L.   2.10   1.66   1517.900391  1524.942959              2.041373             1.960270
2644 2019-01-01    Murray A.   Duckworth J.   1.28   3.50   1512.969943  1474.196447              1.799956             2.250068
2645 2019-01-01   Kyrgios N.    Harrison R.   1.40   2.75   1559.722152  1501.656042              1.715871             2.396900
2646 2019-01-01  Tsonga J.W.  Kokkinakis T.   2.25   1.57   1494.368434  1480.952349              1.925678             2.080290
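Since the odds are the inverse of the Elo winning probability, they reduce to a short closed form that depends only on the rating difference: odds = 1 + 10^((r_opponent - r_player) / 400). The sketch below checks this against the first row of the table; the function name elo_odds is an assumption and not part of the code above.

```python
def elo_odds(r_player, r_opponent):
    """Fair decimal odds implied by the Elo winning probability."""
    return 1 + 10 ** ((r_opponent - r_player) / 400)


# Ratings of Kudla D. and Fritz T. from the first row of the table.
odds_winner = elo_odds(1486.354191, 1524.375587)
odds_loser = elo_odds(1524.375587, 1486.354191)
print(round(odds_winner, 4), round(odds_loser, 4))
```

A rating gap of about 38 points therefore already separates the two prices by more than 0.4, and two equally rated players both get odds of exactly 2.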

Again we use the same tennis betting strategy as before: a back bet with a stake of 1 point is placed per selection if the odds of the bookie are larger than the odds of our model:

live["Profit"] = live.apply(lambda x: bet(x), axis=1)
live["Profit"].cumsum().plot()

The backtest for the Elo rating system has a slightly better outcome than the naive rating model but is still far away from generating any profit: the result is a loss of 265.6 points. The correlation between the odds of the bookie and the model is 0.59 for the winner and 0.69 for the loser, which means that the Elo model reflects the odds of the bookie better than our previously developed naive rating model.

Glicko rating system for tennis implemented in Python

Looking at the drawbacks of the Elo rating model, I thought it would make sense to try out a more sophisticated rating system, and I ended up looking at the Glicko rating system. In addition to the rating, it introduces a rating deviation which reflects how certain the model is about a player's rating.

I followed the Wikipedia article on the Glicko rating system and implemented it in Python. As the rating period I chose one month, which means that the players' ratings are updated every month.

# Glicko https://en.wikipedia.org/wiki/Glicko_rating_system
import math

c = ((350**2 - 50**2)/100)**0.5
q = math.log(10)/400


def g(rdi):
    return 1 / (1 + 3 * q**2 * rdi**2 / math.pi**2)**0.5


def E(rdi, r0, ri):
    return 1 / ( 1 + 10**(g(rdi)*(r0-ri)/-400))


def d2(rdi, r0, ri):
    s = sum( [g(rdi[i])**2 * E(rdi[i], r0, ri[i]) * (1 - E(rdi[i], r0, ri[i])) for i in range(len(rdi))] )
    return 1 / (q**2 * s)


def determine_rating_deviation(rd0, rating_period):
    return min((rd0**2 + c**2*rating_period)**0.5, 350)


def determine_new_rating(r0, rd, ri, rdi, si):
    s = sum([(g(rdi[i])*(si[i] - E(rdi[i], r0, ri[i]))) for i in range(len(rdi))])
    if s == 0:
        return r0
    return r0 + q / (1/rd**2 + 1/d2(rdi, r0, ri)) * s


def determine_new_rating_deviation(rd, d2):
    return ((1/rd**2 + 1/d2)**(-1))**0.5
    
    
rating = pd.DataFrame(
    columns = pd.concat([df["Winner"], df["Loser"]]).unique(),
    index=pd.date_range(start=df["Date"].min(), end=df["Date"].max(), freq='MS'),
)
rating.iloc[0] = 1500

rating_deviation = pd.DataFrame(
    columns = pd.concat([df["Winner"], df["Loser"]]).unique(),
    index=pd.date_range(start=df["Date"].min(), end=df["Date"].max(), freq='MS'),
)
rating_deviation.iloc[0] = 350

d2_by_player = {}

for index, row in rating[1:].iterrows():
    print(index)
    previous_index = rating.index[rating.index.get_loc(index) - 1]
    for player in rating.columns:
        player_mask = ((df["Winner"] == player) | (df["Loser"] == player))
        last_competition = df.loc[(df["Date"] < previous_index) & player_mask]
        r0 = rating.loc[previous_index, player]
        rd0 = rating_deviation.loc[previous_index, player]
        time_since_last_competition = (index - last_competition["Date"].tolist()[-1]).days if len(last_competition) else 999
        rd = determine_rating_deviation(rd0, time_since_last_competition/30) 
        rating_deviation.loc[index, player] = rd
    for player in rating.columns:
        player_mask = ((df["Winner"] == player) | (df["Loser"] == player))
        r0 = rating.loc[previous_index, player]
        rd = rating_deviation.loc[index, player]
        # >= keeps matches played on the first day of the rating period
        mask = (df['Date'] >= previous_index) & (df['Date'] < index) & player_mask
        matches_in_period = df.loc[mask].copy()
        if len(matches_in_period):
            matches_in_period["Opponent"] = matches_in_period.apply(lambda x: x["Winner"] if x["Loser"] == player else x["Loser"], axis=1)
            matches_in_period["ri"] = matches_in_period.apply(lambda x: rating.loc[previous_index, x["Opponent"]] , axis=1)
            matches_in_period["rdi"] = matches_in_period.apply(lambda x: rating_deviation.loc[previous_index, x["Opponent"]] , axis=1)
            matches_in_period["si"] = matches_in_period.apply(lambda x: 1 if x["Winner"] == player else 0, axis=1)
            new_rating = determine_new_rating(r0, rd, matches_in_period["ri"].tolist(), matches_in_period["rdi"].tolist(), matches_in_period["si"].tolist())
            d2_by_player[player] = d2(matches_in_period["rdi"].tolist(), r0, matches_in_period["ri"].tolist())
        else:
            new_rating = determine_new_rating(r0, rd, [], [], [])
            d2_by_player[player] = -1
        rating.loc[index, player] = new_rating
    for player in rating.columns:
        rd = rating_deviation.loc[index, player]
        if d2_by_player[player] > 0:
            new_rd = determine_new_rating_deviation(rd, d2_by_player[player])
        else:
            new_rd = rd
        rating_deviation.loc[index, player] = new_rd

rating.plot(figsize=(15, 8))
rating_deviation.plot(figsize=(15, 8))

With this code I obtain the following charts for the ratings and rating deviations:

Glicko Rating System for Tennis

Glicko Rating Deviation for Tennis

Rating inflation seems to be less of a problem here. After a short warm-up period the range of the ratings seems pretty stable. The rating deviations are capped at 350, the value used to initialise them.
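To see how a single Glicko update behaves, here is a compact sketch for one player and one rating period. It uses the same formulas as the code above, but the function glicko_update is a hypothetical helper, and the input numbers are, as far as I recall, the worked example that is commonly quoted together with the description of the system, so treat them as illustrative.

```python
import math

Q = math.log(10) / 400


def g(rd):
    """Down-weights an opponent whose rating is uncertain (large RD)."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r, r_opp, rd_opp):
    """Expected score of the player against one opponent."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def glicko_update(r, rd, opponents):
    """opponents: list of (rating, rd, score) tuples for one rating period."""
    if not opponents:
        return r, rd  # no matches: rating and deviation stay as they are
    d2_inverse = Q**2 * sum(
        g(rd_i) ** 2 * expected(r, r_i, rd_i) * (1 - expected(r, r_i, rd_i))
        for r_i, rd_i, _ in opponents
    )
    precision = 1 / rd**2 + d2_inverse
    surprise = sum(
        g(rd_i) * (s_i - expected(r, r_i, rd_i)) for r_i, rd_i, s_i in opponents
    )
    return r + Q / precision * surprise, math.sqrt(1 / precision)


# Player rated 1500 (RD 200): one win and two losses in the period.
new_r, new_rd = glicko_update(
    1500, 200, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]
)
print(round(new_r, 1), round(new_rd, 1))
```

The rating drops to roughly 1464 and the rating deviation shrinks from 200 to about 151: playing matches makes the model more certain about the player, whatever the results are.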

# calculate odds Glicko
def get_rating(date, selection):
    return rating.loc[(rating.index <= date)].iloc[-1][selection]


# .copy() avoids pandas' SettingWithCopyWarning when adding columns to the slice
live = df[df["Date"] >= "2019-01-01"].copy()
for selection in ("Winner", "Loser"):
    live[f"Rating{selection}"] = live.apply(lambda x: get_rating(x["Date"], x[selection]), axis=1)
live["CalculatedOddsWinner"] = 1 / winning_prob(live["RatingWinner"], live["RatingLoser"])
live["CalculatedOddsLoser"] = 1 / winning_prob(live["RatingLoser"], live["RatingWinner"])
live.head()

           Date       Winner          Loser  B365W  B365L  RatingWinner  RatingLoser  CalculatedOddsWinner  CalculatedOddsLoser
2642 2019-01-01     Kudla D.       Fritz T.   2.62   1.44   1412.131914  1528.456157              2.953487             1.511905
2643 2019-01-01    Chardy J.    Struff J.L.   2.10   1.66   1522.339777  1583.856534              2.424925             1.701792
2644 2019-01-01    Murray A.   Duckworth J.   1.28   3.50   1501.907316  1183.392904              1.159850             7.255846
2645 2019-01-01   Kyrgios N.    Harrison R.   1.40   2.75   1598.869593  1468.640148              1.472527             3.116282
2646 2019-01-01  Tsonga J.W.  Kokkinakis T.   2.25   1.57   1397.705261  1386.527033              1.937680             2.066462

The same procedure as before is used to backtest a betting strategy based on the Glicko model:

# backtest Glicko
live["Profit"] = live.apply(lambda x: bet(x), axis=1)
live["Profit"].cumsum().plot()

The result I obtained was a loss of 316 points, which is worse than the previous backtests. The correlation between the odds of the model and the bookie odds is 0.5 for the winner and 0.6 for the loser. To be fair, after reading about different rating models I had expected the Glicko model to yield a superior result compared to Elo or my naive rating model, and hence was a bit surprised. It might well be that the parameters are not optimised: I just picked one month as the rating period and set c to the value from the wiki article. It could make sense to investigate the parameter settings further or maybe also try another time frame.
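What the choice of c means in practice can be seen from how the rating deviation grows while a player is inactive. The sketch below reuses the formula from determine_rating_deviation; the helper name inflate_rd and the starting deviation of 50 are assumptions for illustration.

```python
# c as in the code above: an RD of 50 grows back to 350 after 100 rating periods.
C = ((350**2 - 50**2) / 100) ** 0.5


def inflate_rd(rd0, periods, c=C, rd_max=350):
    """Rating deviation after `periods` rating periods without a match."""
    return min((rd0**2 + c**2 * periods) ** 0.5, rd_max)


for periods in (0, 1, 12, 100):
    print(periods, round(inflate_rd(50, periods), 1))
```

With one month as the rating period, a very certain rating (RD of 50) is only back at full uncertainty after 100 months, a bit more than eight years, while a full year of inactivity raises the RD to about 130. A larger c or a longer rating period would let the uncertainty grow faster, which is one of the parameters that could be tuned.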

Summary on rating models for tennis betting

From the experiments here I can conclude that a simple rating model alone does not make a successful betting strategy. However, there are multiple ways to further improve the performance of the rating models, such as parameter optimisation. It might also be possible to enrich the ratings with other betting data and maybe use them as an input for a machine learning model that is capable of predicting the outcome of a tennis match. If you have any experience with rating models for tennis, please let me know in the comments.
