Published on Aug. 17, 2020, 6:17 a.m. - Sports: Tennis

Rating systems are popular for ranking players and teams, especially in sports such as tennis, chess or Go. In this article, Python is used to build a rating system for tennis betting, which is then evaluated on historical data.

In sports, rating systems are used **to rank individual players or teams by their skill or strength**. Rating systems are typically based on past results, from which a rating is derived for each player or team. Before the first match of a player or team, the rating is initialised with a certain value. Ratings are then continuously updated to reflect recent matches, so that at any point in time a numerical value, the so-called rating, can be calculated for the player or team. Ratings are typically presented and published in tables, but in this article I will also use charts to show how the ratings evolve over time.

One example of such a rating is the Universal Tennis Rating (UTR), a popular tennis rating whose website publishes tables with the ratings of tennis players. At the time of writing (17/08/20), Novak Djokovic and Bianca Andreescu are rated as the best male and female tennis player, respectively.

In this article, **Python** is used to develop rating models. We start with a naive rating system and then look at well-established rating models such as **Elo** and **Glicko**. Data from tennis-data.co.uk is used to develop and test the models. A simple betting strategy based on the rating models is implemented in Python and tested on historical data.

Before I start developing the rating models, I import some Python packages and load the data from tennis-data.co.uk into a single pandas DataFrame:

```
import io
import zipfile
from typing import Tuple

import pandas as pd
import requests

# Jupyter/IPython magic to show plots inline; remove it when running as a plain script
%matplotlib inline


def get_df(years: Tuple[int, ...] = (2018, 2019, 2020)) -> pd.DataFrame:
    """Download the yearly results from tennis-data.co.uk and combine them."""
    frames = []
    for year in years:
        url = f"http://www.tennis-data.co.uk/{year}/{year}.zip"
        r = requests.get(url)
        r.raise_for_status()
        # the archive contains the Excel workbook {year}.xlsx
        z = zipfile.ZipFile(io.BytesIO(r.content))
        with z.open(f"{year}.xlsx") as xlfile:
            ydf = pd.read_excel(
                xlfile,
                usecols=["Date", "Winner", "Loser", "B365W", "B365L"],
                parse_dates=["Date"],
            )
        frames.append(ydf)
    return pd.concat(frames, ignore_index=True)


df = get_df()
df.head()
```

The DataFrame should then have the following structure, with the date of the match, the winner and the loser, and the Bet365 odds for each of them:

| | Date | Winner | Loser | B365W | B365L |
|---|---|---|---|---|---|
| 0 | 2017-12-31 | Dolgopolov O. | Schwartzman D. | 2.20 | 1.61 |
| 1 | 2017-12-31 | De Minaur A. | Johnson S. | 2.75 | 1.40 |
| 2 | 2018-01-01 | Harrison R. | Mayer L. | 1.61 | 2.20 |
| 3 | 2018-01-01 | Ebden M. | Tiafoe F. | 2.50 | 1.50 |
| 4 | 2018-01-01 | Zverev M. | Smith J.P. | 1.40 | 2.75 |

Next, I would like to start off with a very simple rating model: if a tennis player wins a match, one point is added to his rating; if he loses the match, his rating is reduced by one point. The rating of a player at any time is then simply the number of wins minus the number of losses.

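```
# naive approach: +1 point for a win, -1 point for a loss
point_assignment = {
    "Winner": 1,
    "Loser": -1,
}
# one column per player, one row per match date, all starting at 0
rating = pd.DataFrame(
    columns=pd.concat([df["Winner"], df["Loser"]]).unique(),
    index=df["Date"].unique(),
    data=0,
)
# record the points earned by each player on each date ...
for index, row in df.iterrows():
    for r, point in point_assignment.items():
        rating.loc[row["Date"], row[r]] += point
# ... and accumulate them to get wins minus losses over time
rating = rating.cumsum()
# order the columns (and the legend) by the latest rating
rating.sort_values(by=rating.index[-1], axis=1, ascending=False, inplace=True)
rating.plot(figsize=(15, 8))
```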
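The iterrows loop above is easy to follow but can be slow. A vectorised alternative is sketched below, assuming the same Date, Winner and Loser columns as in the tennis-data.co.uk DataFrame; the function name `naive_rating` and the toy results are hypothetical and only illustrate the wins-minus-losses calculation.

```python
import pandas as pd

# Hypothetical toy results in the same layout as the tennis-data.co.uk DataFrame
matches = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-03"]),
    "Winner": ["Player A", "Player B", "Player A", "Player C"],
    "Loser": ["Player C", "Player D", "Player B", "Player A"],
})


def naive_rating(matches: pd.DataFrame) -> pd.DataFrame:
    """Cumulative wins minus losses per player, one row per match date."""
    # long table of (Date, Player, Points): +1 for every win, -1 for every loss
    wins = matches[["Date", "Winner"]].rename(columns={"Winner": "Player"}).assign(Points=1)
    losses = matches[["Date", "Loser"]].rename(columns={"Loser": "Player"}).assign(Points=-1)
    points = pd.concat([wins, losses], ignore_index=True)
    # sum the points per date and player, pivot to one column per player ...
    daily = points.pivot_table(index="Date", columns="Player", values="Points",
                               aggfunc="sum", fill_value=0)
    # ... and accumulate them over time
    return daily.cumsum()


rating_toy = naive_rating(matches)
print(rating_toy.iloc[-1].to_dict())
```

Building a long table of points first avoids one `.loc` update per match; the resulting frame has the same shape as the `rating` DataFrame of the loop (dates as rows, players as columns).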

Here we can see one issue with such a rating model: **rating inflation**. Over time, the gap between the maximum and the minimum rating keeps growing, which makes it difficult to convert ratings or rating differences into odds. Again, a naive fix is to normalise the ratings every day to a range between 0 and 1:

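```
# rescale each day's ratings to [0, 1]: (rating - daily minimum) / (daily maximum - daily minimum)
daily_min = rating.min(axis=1)
daily_max = rating.max(axis=1)
rating = rating.subtract(daily_min, axis=0).div(daily_max - daily_min, axis=0)
rating.sort_values(by=rating.index[-1], axis=1, ascending=False, inplace=True)
rating.plot(figsize=(15, 8))
```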
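Written as a formula, the normalised rating of player $i$ on day $t$, with the minimum and maximum taken over all players on that day, is:

```latex
\tilde{r}_{i,t} = \frac{r_{i,t} - \min_j r_{j,t}}{\max_j r_{j,t} - \min_j r_{j,t}}
```

The best-rated player of the day therefore always has a normalised rating of 1 and the worst-rated a rating of 0, regardless of how far apart their raw ratings are.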

Next, I would like to use the rating model as part of a tennis betting strategy. For this, I need to convert the ratings of the model into winning probabilities. After normalisation, the strongest player has a rating of 1 and the weakest a rating of 0. To obtain the winning probability, I simply divide a player's rating by the sum of the ratings of both players. The odds are then just the inverse of the winning probabilities:

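```
def get_rating(date, selection):
    # rating from the previous day, so that the match itself is not included;
    # falls back to 0.5 if there are no ratings for that day
    try:
        return rating.loc[date - pd.Timedelta(days=1), selection]
    except KeyError:
        return 0.5


# .copy() avoids pandas' SettingWithCopyWarning when adding columns
live = df[df["Date"] >= "2019-01-01"].copy()
for selection in ("Winner", "Loser"):
    live[f"Rating{selection}"] = live.apply(lambda x: get_rating(x["Date"], x[selection]), axis=1)
# winning probability = own rating / sum of both ratings; odds = 1 / probability
rating_sum = live["RatingWinner"] + live["RatingLoser"]
live["CalculatedOddsWinner"] = 1 / (live["RatingWinner"] / rating_sum)
live["CalculatedOddsLoser"] = 1 / (live["RatingLoser"] / rating_sum)
live.head()
```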

| | Date | Winner | Loser | B365W | B365L | RatingWinner | RatingLoser | CalculatedOddsWinner | CalculatedOddsLoser |
|---|---|---|---|---|---|---|---|---|---|
| 2642 | 2019-01-01 | Kudla D. | Fritz T. | 2.62 | 1.44 | 0.228070 | 0.350877 | 2.538462 | 1.650000 |
| 2643 | 2019-01-01 | Chardy J. | Struff J.L. | 2.10 | 1.66 | 0.333333 | 0.263158 | 1.789474 | 2.266667 |
| 2644 | 2019-01-01 | Murray A. | Duckworth J. | 1.28 | 3.50 | 0.298246 | 0.210526 | 1.705882 | 2.416667 |
| 2645 | 2019-01-01 | Kyrgios N. | Harrison R. | 1.40 | 2.75 | 0.473684 | 0.280702 | 1.592593 | 2.687500 |
| 2646 | 2019-01-01 | Tsonga J.W. | Kokkinakis T. | 2.25 | 1.57 | 0.263158 | 0.228070 | 1.866667 | 2.153846 |
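As a quick check of the conversion, the odds in the first row of the table (Kudla D. against Fritz T.) can be reproduced from the two normalised ratings alone. The helper name `naive_odds` is hypothetical; the ratings are taken from the table.

```python
def naive_odds(rating_a: float, rating_b: float) -> tuple:
    """Decimal odds for A and B: p_A = r_A / (r_A + r_B), odds = 1 / p."""
    total = rating_a + rating_b
    prob_a = rating_a / total
    prob_b = rating_b / total
    return 1 / prob_a, 1 / prob_b


# first row of the table: Kudla D. (0.228070) against Fritz T. (0.350877)
odds_kudla, odds_fritz = naive_odds(0.228070, 0.350877)
print(round(odds_kudla, 4), round(odds_fritz, 4))  # 2.5385 1.65
```

Because the two probabilities always add up to 1, the model odds contain no bookmaker margin.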

For the backtest on historical tennis data, I only place a bet if the odds available at the bookmaker are larger than the odds derived from my rating model. Flat staking is used with 1 point per selection. Only matches from 1 January 2019 onwards are used for the evaluation; the matches before this date give the algorithm some time to initialise and calculate more appropriate ratings. It is very important not to introduce a **look-ahead bias** in your backtest: you should only use the ratings that are available before the match, not the ones that are already updated with the result of the match itself.
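A sketch of a rating lookup that enforces this rule, assuming ratings are stored as in this article (one row per date, one column per player). The helper `rating_before` and the toy ratings are hypothetical. Unlike a lookup of exactly the previous calendar day, it also returns the latest rating after days without any matches, and it falls back to a default for players without a rating yet.

```python
import pandas as pd


def rating_before(rating: pd.DataFrame, date: pd.Timestamp, player: str, default: float) -> float:
    """Latest rating of `player` strictly before `date` (hypothetical helper).

    Only rows dated before the match are considered, so the result of the
    match itself can never leak into its own prediction.
    """
    if player not in rating.columns:
        return default
    history = rating.loc[rating.index < date, player].dropna()
    return float(history.iloc[-1]) if len(history) else default


# hypothetical ratings of one player after matches on three different days
toy = pd.DataFrame(
    {"Player A": [1500.0, 1508.0, 1516.0]},
    index=pd.to_datetime(["2019-01-01", "2019-01-03", "2019-01-07"]),
)
print(rating_before(toy, pd.Timestamp("2019-01-07"), "Player A", 1500.0))  # 1508.0
```

The rating stored for 2019-01-07 already contains the result of that day's match, which is why the lookup for a match on that day returns the previous value.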

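```
def bet(x):
    # back the eventual winner if the bookie's odds beat the model: profit = odds - 1
    if x["B365W"] > x["CalculatedOddsWinner"]:
        return x["B365W"] - 1
    # back the eventual loser if the bookie's odds beat the model: the stake is lost
    if x["B365L"] > x["CalculatedOddsLoser"]:
        return -1
    # no value on either side: no bet
    return 0


live["Profit"] = live.apply(bet, axis=1)
live["Profit"].cumsum().plot()
```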
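To see the staking rule at work, the same rule can be applied by hand to the five rows of the naive-model table shown above. This only illustrates the profit calculation on five matches and is not part of the backtest result.

```python
import pandas as pd

# bookmaker odds and model odds of the first five rows of the naive-model table
rows = pd.DataFrame({
    "B365W": [2.62, 2.10, 1.28, 1.40, 2.25],
    "B365L": [1.44, 1.66, 3.50, 2.75, 1.57],
    "CalculatedOddsWinner": [2.538462, 1.789474, 1.705882, 1.592593, 1.866667],
    "CalculatedOddsLoser": [1.650000, 2.266667, 2.416667, 2.687500, 2.153846],
})


def bet(x) -> float:
    # back the eventual winner: a 1-point stake returns odds - 1 profit
    if x["B365W"] > x["CalculatedOddsWinner"]:
        return x["B365W"] - 1
    # back the eventual loser: the 1-point stake is lost
    if x["B365L"] > x["CalculatedOddsLoser"]:
        return -1.0
    # no value on either side: no bet
    return 0.0


rows["Profit"] = rows.apply(bet, axis=1)
print(rows["Profit"].round(2).tolist())  # [1.62, 1.1, -1.0, -1.0, 1.25]
```

In the Murray and Kyrgios matches the model sees value on the eventual loser, so the stake is lost; the other three bets back the winner.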

The outcome of the backtest is disappointing but not a real surprise: with such a simple model you are not able to beat the bookie and would have accumulated a loss of 268.6 points. The Pearson correlation between the odds of the model and the odds of the bookie is 0.45 for both the winner and the loser.
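A sketch of how such a correlation can be computed with pandas, assuming the bookmaker odds and the model odds are two columns of the `live` DataFrame. For illustration, only the five winner rows from the naive-model table are used, so the value is not comparable to the 0.45 reported for the full period.

```python
import numpy as np
import pandas as pd

# winner odds of the bookmaker and of the naive model (first five rows of the table)
bookie = pd.Series([2.62, 2.10, 1.28, 1.40, 2.25])
model = pd.Series([2.538462, 1.789474, 1.705882, 1.592593, 1.866667])

# Series.corr computes the Pearson correlation by default
r = bookie.corr(model)
# the same coefficient from NumPy's correlation matrix
r_np = np.corrcoef(bookie, model)[0, 1]
print(round(r, 3), round(r_np, 3))
```

On the full backtest this would be `live["B365W"].corr(live["CalculatedOddsWinner"])` and the equivalent for the loser columns.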

After the simple rating model, I will now focus on a more established rating system in the hope of getting a better result: the Elo rating system, which was developed by the physics professor Arpad Elo.

There are Python packages that implement the Elo rating model, such as elo or EloPy. However, the model itself is not very complex, so I decided to implement it myself. I find that I get a much better understanding of a model when writing my own code instead of just using a readily available API.

Two functions are defined for the Elo rating: the first one calculates the winning probability given the ratings of the two opponents, the second one is the formula to update the rating. I iterate over all the matches and update the Elo ratings of both players after each match. The ratings are initialised with a value of 1500, and pandas' forward fill (ffill) is used to complete the missing ratings.

```
# Elo rating
def winning_prob(r_player, r_opponent):
    # expected score of the player against the opponent
    return 10**(r_player / 400) / (10**(r_player / 400) + 10**(r_opponent / 400))


def update_rating(rating, score, expected_score, K=16):
    # move the rating towards the actual result, scaled by K
    return rating + K * (score - expected_score)


rating = pd.DataFrame(
    columns=pd.concat([df["Winner"], df["Loser"]]).unique(),
    index=df["Date"].unique(),
    dtype=float,
)
current_rating = {}
for index, row in df.iterrows():
    for selection in ("Winner", "Loser"):
        # latest known rating of the player, or 1500 if they have not played yet
        history = rating[row[selection]].dropna()
        current_rating[selection] = history.iloc[-1] if len(history) else 1500
    winner_prob = winning_prob(current_rating["Winner"], current_rating["Loser"])
    rating.loc[row["Date"], row["Winner"]] = update_rating(current_rating["Winner"], 1, winner_prob)
    rating.loc[row["Date"], row["Loser"]] = update_rating(current_rating["Loser"], 0, 1 - winner_prob)
# carry each rating forward to the days on which the player did not play
rating.ffill(inplace=True)
rating.plot(figsize=(15, 8))
```
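For reference, `winning_prob` and `update_rating` implement the standard Elo formulas, where $R_A$ and $R_B$ are the current ratings, $E_A$ is the expected score of player A and $S_A$ the actual score (1 for a win, 0 for a loss):

```latex
E_A = \frac{10^{R_A/400}}{10^{R_A/400} + 10^{R_B/400}} = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad E_B = 1 - E_A

R_A' = R_A + K\,(S_A - E_A)
```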

The chart with the ratings over time looks similar to the one of the previous rating model. Again, rating inflation is an issue, but there are other challenges too: in this example I simply chose a **K value** of 16. The K value determines how much the rating is adjusted at every update, and choosing it is a trade-off between putting more weight on recent matches and not being too sensitive to the latest results. Another issue is player inactivity: the rating stays constant while a player does not play any matches.
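To illustrate the trade-off, the sketch below computes how many points a 1400-rated player gains by beating a 1600-rated player for a few K values. The two Elo functions from above are redefined so the snippet runs on its own; the ratings are hypothetical.

```python
def winning_prob(r_player: float, r_opponent: float) -> float:
    # expected score of the player, as in the Elo code above
    return 10 ** (r_player / 400) / (10 ** (r_player / 400) + 10 ** (r_opponent / 400))


def update_rating(rating: float, score: float, expected_score: float, K: float = 16) -> float:
    return rating + K * (score - expected_score)


# an upset: a 1400-rated player beats a 1600-rated player
expected = winning_prob(1400, 1600)
for K in (8, 16, 32):
    gain = update_rating(1400, 1, expected, K) - 1400
    print(K, round(gain, 1))
```

The update is linear in K, so doubling K doubles every rating change. With K = 16 the underdog gains roughly 12 points, and the favourite loses the same amount because $E_B = 1 - E_A$.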

We will now use the Elo ratings to derive odds for both players of each tennis match:

```
# calculate odds for Elo
# get_rating() from the naive model is reused: it looks up the previous day's
# rating and falls back to 0.5 if that day has no ratings
live = df[df["Date"] >= "2019-01-01"].copy()
for selection in ("Winner", "Loser"):
    live[f"Rating{selection}"] = live.apply(lambda x: get_rating(x["Date"], x[selection]), axis=1)
# the column names must match the ones used in bet()
live["CalculatedOddsWinner"] = 1 / winning_prob(live["RatingWinner"], live["RatingLoser"])
live["CalculatedOddsLoser"] = 1 / winning_prob(live["RatingLoser"], live["RatingWinner"])
live.head()
```

| | Date | Winner | Loser | B365W | B365L | RatingWinner | RatingLoser | CalculatedOddsWinner | CalculatedOddsLoser |
|---|---|---|---|---|---|---|---|---|---|
| 2642 | 2019-01-01 | Kudla D. | Fritz T. | 2.62 | 1.44 | 1486.354191 | 1524.375587 | 2.244668 | 1.803427 |
| 2643 | 2019-01-01 | Chardy J. | Struff J.L. | 2.10 | 1.66 | 1517.900391 | 1524.942959 | 2.041373 | 1.960270 |
| 2644 | 2019-01-01 | Murray A. | Duckworth J. | 1.28 | 3.50 | 1512.969943 | 1474.196447 | 1.799956 | 2.250068 |
| 2645 | 2019-01-01 | Kyrgios N. | Harrison R. | 1.40 | 2.75 | 1559.722152 | 1501.656042 | 1.715871 | 2.396900 |
| 2646 | 2019-01-01 | Tsonga J.W. | Kokkinakis T. | 2.25 | 1.57 | 1494.368434 | 1480.952349 | 1.925678 | 2.080290 |

Again, we use the same tennis betting strategy as before: a back bet with a stake of 1 point is placed per selection if the odds of the bookie are larger than the odds of our model:

```
live["Profit"] = live.apply(bet, axis=1)
live["Profit"].cumsum().plot()
```

The backtest for the Elo rating system has a slightly better outcome than the naive rating model but is still far from generating any profit: the result is a loss of 265.6 points. The correlation between the odds of the bookie and the model is 0.59 for the winner and 0.69 for the loser, which means that the Elo model reflects the odds of the bookie better than our previously developed naive rating model.

Given the drawbacks of the Elo rating model, I thought it would make sense to try a more sophisticated rating system and ended up with the Glicko rating system. In addition to the ratings, it introduces a rating deviation (RD), which reflects how certain the model is about a player's rating.

I followed the Wikipedia article on the Glicko rating system and implemented it in Python. As rating period I chose one month, which means that the players' ratings are updated every month.
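For reference, these are the Glicko update steps as implemented by the functions in the code, for a player with rating $r_0$ and rating deviation $RD_0$ who plays opponents with ratings $r_i$ and deviations $RD_i$ and scores $s_i$ (1 for a win, 0 for a loss); $t$ is the number of rating periods since the player's last match and $c = \sqrt{(350^2 - 50^2)/100} \approx 34.6$:

```latex
q = \frac{\ln 10}{400}, \qquad
g(RD_i) = \frac{1}{\sqrt{1 + 3 q^2 RD_i^2 / \pi^2}}, \qquad
E_i = \frac{1}{1 + 10^{-g(RD_i)\,(r_0 - r_i)/400}}

RD = \min\left(\sqrt{RD_0^2 + c^2 t},\ 350\right), \qquad
d^2 = \left(q^2 \sum_i g(RD_i)^2\, E_i\, (1 - E_i)\right)^{-1}

r' = r_0 + \frac{q}{1/RD^2 + 1/d^2} \sum_i g(RD_i)\,(s_i - E_i), \qquad
RD' = \sqrt{\left(\frac{1}{RD^2} + \frac{1}{d^2}\right)^{-1}}
```

In the code, `g`, `E` and `d2` correspond to $g$, $E_i$ and $d^2$, while `determine_rating_deviation`, `determine_new_rating` and `determine_new_rating_deviation` compute $RD$, $r'$ and $RD'$.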

```
# Glicko https://en.wikipedia.org/wiki/Glicko_rating_system
import math

# c is chosen so that an RD of 50 grows back to 350 after 100 rating periods without matches
c = ((350**2 - 50**2) / 100) ** 0.5
q = math.log(10) / 400


def g(rdi):
    return 1 / (1 + 3 * q**2 * rdi**2 / math.pi**2) ** 0.5


def E(rdi, r0, ri):
    # expected score against an opponent with rating ri and rating deviation rdi
    return 1 / (1 + 10 ** (g(rdi) * (r0 - ri) / -400))


def d2(rdi, r0, ri):
    s = sum(g(rdi[i])**2 * E(rdi[i], r0, ri[i]) * (1 - E(rdi[i], r0, ri[i])) for i in range(len(rdi)))
    return 1 / (q**2 * s)


def determine_rating_deviation(rd0, rating_period):
    # the RD grows with the number of rating periods without a match, capped at 350
    return min((rd0**2 + c**2 * rating_period) ** 0.5, 350)


def determine_new_rating(r0, rd, ri, rdi, si):
    s = sum(g(rdi[i]) * (si[i] - E(rdi[i], r0, ri[i])) for i in range(len(rdi)))
    if s == 0:
        return r0
    return r0 + q / (1 / rd**2 + 1 / d2(rdi, r0, ri)) * s


def determine_new_rating_deviation(rd, d2):
    return ((1 / rd**2 + 1 / d2) ** (-1)) ** 0.5


players = pd.concat([df["Winner"], df["Loser"]]).unique()
# one rating period per month, starting on the first day of each month
periods = pd.date_range(start=df["Date"].min(), end=df["Date"].max(), freq="MS")
rating = pd.DataFrame(columns=players, index=periods, dtype=float)
rating.iloc[0] = 1500
rating_deviation = pd.DataFrame(columns=players, index=periods, dtype=float)
rating_deviation.iloc[0] = 350
d2_by_player = {}
for index, row in rating[1:].iterrows():
    print(index)  # progress indicator
    previous_index = rating.index[rating.index.get_loc(index) - 1]
    # step 1: inflate each player's RD according to the time since their last match
    for player in rating.columns:
        player_mask = (df["Winner"] == player) | (df["Loser"] == player)
        last_competition = df.loc[(df["Date"] < previous_index) & player_mask]
        rd0 = rating_deviation.loc[previous_index, player]
        if len(last_competition):
            time_since_last_competition = (index - last_competition["Date"].iloc[-1]).days
        else:
            time_since_last_competition = 999
        rd = determine_rating_deviation(rd0, time_since_last_competition / 30)
        rating_deviation.loc[index, player] = rd
    # step 2: update the ratings with the matches of the previous rating period
    for player in rating.columns:
        player_mask = (df["Winner"] == player) | (df["Loser"] == player)
        r0 = rating.loc[previous_index, player]
        rd = rating_deviation.loc[index, player]
        # matches from the start of the previous period up to (excluding) the current one
        mask = (df["Date"] >= previous_index) & (df["Date"] < index) & player_mask
        matches_in_period = df.loc[mask].copy()
        if len(matches_in_period):
            matches_in_period["Opponent"] = matches_in_period.apply(lambda x: x["Winner"] if x["Loser"] == player else x["Loser"], axis=1)
            # opponents' ratings and RDs at the start of the period
            matches_in_period["ri"] = matches_in_period["Opponent"].map(rating.loc[previous_index])
            matches_in_period["rdi"] = matches_in_period["Opponent"].map(rating_deviation.loc[previous_index])
            matches_in_period["si"] = (matches_in_period["Winner"] == player).astype(int)
            ri, rdi, si = (matches_in_period[col].tolist() for col in ("ri", "rdi", "si"))
            new_rating = determine_new_rating(r0, rd, ri, rdi, si)
            d2_by_player[player] = d2(rdi, r0, ri)
        else:
            new_rating = determine_new_rating(r0, rd, [], [], [])
            d2_by_player[player] = -1
        rating.loc[index, player] = new_rating
    # step 3: shrink the RD of every player who played in the period
    for player in rating.columns:
        rd = rating_deviation.loc[index, player]
        if d2_by_player[player] > 0:
            new_rd = determine_new_rating_deviation(rd, d2_by_player[player])
        else:
            new_rd = rd
        rating_deviation.loc[index, player] = new_rd
rating.plot(figsize=(15, 8))
rating_deviation.plot(figsize=(15, 8))
```
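As a sanity check for an implementation like the one above, the rating update can be run on the example Mark Glickman uses in his description of the Glicko system: a player rated 1500 (RD 200) beats an opponent rated 1400 (RD 30) and loses to opponents rated 1550 (RD 100) and 1700 (RD 300), which should give a new rating of about 1464 and a new RD of about 151.4. The compact re-implementation below is a sketch with hypothetical function names; it starts from the already inflated RD, so the time-based RD step is not included.

```python
import math

q = math.log(10) / 400


def g(rd: float) -> float:
    # attenuation factor: opponents with uncertain ratings (large RD) count less
    return 1 / math.sqrt(1 + 3 * q**2 * rd**2 / math.pi**2)


def expected(r0: float, ri: float, rdi: float) -> float:
    # expected score against an opponent with rating ri and deviation rdi
    return 1 / (1 + 10 ** (-g(rdi) * (r0 - ri) / 400))


def glicko_update(r0: float, rd: float, results: list) -> tuple:
    """One rating-period update; results holds (opponent rating, opponent RD, score) tuples."""
    inv_d2 = q**2 * sum(g(rdi) ** 2 * expected(r0, ri, rdi) * (1 - expected(r0, ri, rdi))
                        for ri, rdi, _ in results)
    total = sum(g(rdi) * (si - expected(r0, ri, rdi)) for ri, rdi, si in results)
    denom = 1 / rd**2 + inv_d2
    return r0 + q / denom * total, math.sqrt(1 / denom)


new_r, new_rd = glicko_update(1500, 200, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)])
print(round(new_r), round(new_rd, 1))  # 1464 151.4
```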

With this code I obtain the following charts for the ratings and rating deviations:

Rating inflation seems to be less of a problem here: after a short warm-up period, the range of the ratings seems pretty stable. The rating deviations settle at values below 350, the value used to initialise them.

```
# calculate odds Glicko
def get_rating(date, selection):
    # rating of the latest rating period starting on or before the match date;
    # it only includes matches played before that period started
    return rating.loc[rating.index <= date].iloc[-1][selection]


live = df[df["Date"] >= "2019-01-01"].copy()
for selection in ("Winner", "Loser"):
    live[f"Rating{selection}"] = live.apply(lambda x: get_rating(x["Date"], x[selection]), axis=1)
# the Elo formula winning_prob() converts the Glicko ratings into odds
live["CalculatedOddsWinner"] = 1 / winning_prob(live["RatingWinner"], live["RatingLoser"])
live["CalculatedOddsLoser"] = 1 / winning_prob(live["RatingLoser"], live["RatingWinner"])
live.head()
```
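The odds above are derived with the Elo formula `winning_prob`, which ignores the rating deviations. Glickman's description of the Glicko system also gives an expected score that combines the RDs of both players, so that uncertain ratings pull the winning probability towards 50%. The sketch below shows this alternative; it is not what was used for the backtest, the function name `glicko_win_prob` is hypothetical, and the RD values in the example are made up (the ratings are those of Murray A. and Duckworth J. from the table).

```python
import math

q = math.log(10) / 400


def g(rd: float) -> float:
    # attenuation factor: 1 for RD = 0, smaller for larger RDs
    return 1 / math.sqrt(1 + 3 * q**2 * rd**2 / math.pi**2)


def glicko_win_prob(r_a: float, rd_a: float, r_b: float, rd_b: float) -> float:
    """Expected score of A against B, using the combined RD of both players."""
    combined_rd = math.sqrt(rd_a**2 + rd_b**2)
    return 1 / (1 + 10 ** (-g(combined_rd) * (r_a - r_b) / 400))


# Murray A. against Duckworth J. with hypothetical RDs for both players
for rd in (0, 100, 300):
    p = glicko_win_prob(1501.907316, rd, 1183.392904, rd)
    print(rd, round(p, 3), round(1 / p, 3))
```

With both RDs set to 0 this reduces to the Elo formula and reproduces the odds of 1.159850 in the table; larger RDs make the odds of the favourite longer.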

| | Date | Winner | Loser | B365W | B365L | RatingWinner | RatingLoser | CalculatedOddsWinner | CalculatedOddsLoser |
|---|---|---|---|---|---|---|---|---|---|
| 2642 | 2019-01-01 | Kudla D. | Fritz T. | 2.62 | 1.44 | 1412.131914 | 1528.456157 | 2.953487 | 1.511905 |
| 2643 | 2019-01-01 | Chardy J. | Struff J.L. | 2.10 | 1.66 | 1522.339777 | 1583.856534 | 2.424925 | 1.701792 |
| 2644 | 2019-01-01 | Murray A. | Duckworth J. | 1.28 | 3.50 | 1501.907316 | 1183.392904 | 1.159850 | 7.255846 |
| 2645 | 2019-01-01 | Kyrgios N. | Harrison R. | 1.40 | 2.75 | 1598.869593 | 1468.640148 | 1.472527 | 3.116282 |
| 2646 | 2019-01-01 | Tsonga J.W. | Kokkinakis T. | 2.25 | 1.57 | 1397.705261 | 1386.527033 | 1.937680 | 2.066462 |

The same procedure as before is used to backtest a betting strategy based on the Glicko model:

```
# backtest Glicko
live["Profit"] = live.apply(bet, axis=1)
live["Profit"].cumsum().plot()
```

The result I obtained was a loss of 316 points, which is worse than in the previous backtests. The correlation between the odds of the model and the bookie odds is 0.5 for the winner and 0.6 for the loser. To be fair, after reading about different rating models I expected the Glicko model to yield a better result than Elo or my naive rating model, so I was a bit surprised. It might well be that the parameters are not optimised: I just picked one month as rating period and set c to the value from the Wikipedia article. It could make sense to investigate the parameter settings further, or to try another time frame.

From these experiments I conclude that a simple rating model alone does not make a successful betting strategy. However, there are several ways to improve the performance of the rating models, such as parameter optimisation. It might also be possible to enrich the ratings with other betting data and use them as input for a machine learning model that predicts the outcome of a tennis match. If you have any experience with rating models for tennis, please let me know in the comments.
