Reverse Engineering Betting Strategies by Analysing Starting Price Bets

This post explains how a simple analysis of bets placed at the starting price revealed a successful betting strategy, which really surprised us!

Some betting exchanges, such as Betfair, offer the option to place bets at the starting price. In an earlier blog post we already warned against using starting price bets. One of the reasons is that they might reveal a successful betting strategy. This is exactly what this article shows.

Reverse Engineering Betting Strategies

The idea behind this blog post is that a bot places bets at a betting exchange at characteristic odds limits, at typical times (e.g. x minutes before the off) or using a characteristic staking plan. I deploy a simple script that collects this data from the exchange and saves it in a database for analysis. Different analysis techniques can then be used to identify patterns in the data. In this example I focus on principal component analysis (PCA) to visualize and explore the data. Typical patterns should then point to the betting strategies that are applied in the market. It might also be possible to train a machine learning model on top of the data, but that is something for a future blog post.

The following picture illustrates the approach:

Bets at Betfair Starting Price

For UK horse races, Betfair allows users to place bets at the starting price (Betfair Starting Price, BSP), which means that the bets are matched at the beginning of the race. For both back and lay bets a limit can be set, and users can view the amounts and limits on the ladder interface as shown in the following picture.

The problem is that this data is visible to the public and also accessible via the Betfair API.

Scraping the Data from Betfair

With a simple Python script it is possible to connect to the Betfair API and request the price data for starting price bets. A cron job runs every minute and polls the API for races starting within the next 2 hours, so the sampling rate is one snapshot per minute. The raw data is saved to a PostgreSQL table with the following structure:

id | timestamp           | marketId    | marketStartTime     | eventName      | selectionId | runnerName      | side | price   | size
1  | 2020-02-15 12:15:35 | 1.168759002 | 2020-02-15 13:15:00 | Ascot 15th Feb | 26781748    | Sporting John   | back | 1.01    | 309.95
2  | 2020-02-15 12:15:35 | 1.168759002 | 2020-02-15 13:15:00 | Ascot 15th Feb | 26781748    | Sporting John   | back | 1.80    | 18.07
3  | 2020-02-15 12:15:35 | 1.168759002 | 2020-02-15 13:15:00 | Ascot 15th Feb | 26781748    | Sporting John   | back | 2.00    | 12.05
4  | 2020-02-15 12:15:35 | 1.168759002 | 2020-02-15 13:15:00 | Ascot 15th Feb | 26781748    | Sporting John   | back | 2.66    | 7.16
5  | 2020-02-15 12:15:35 | 1.168759002 | 2020-02-15 13:15:00 | Ascot 15th Feb | 26781748    | Sporting John   | lay  | 1000.00 | 114.97
6  | 2020-02-15 12:15:35 | 1.168759002 | 2020-02-15 13:15:00 | Ascot 15th Feb | 18286520    | Master Debonair | back | 1.01    | 1117.47
7  | 2020-02-15 12:15:35 | 1.168759002 | 2020-02-15 13:15:00 | Ascot 15th Feb | 18286520    | Master Debonair | back | 1.73    | 12.05
8  | 2020-02-15 12:15:35 | 1.168759002 | 2020-02-15 13:15:00 | Ascot 15th Feb | 18286520    | Master Debonair | back | 1.80    | 15.66
9  | 2020-02-15 12:15:35 | 1.168759002 | 2020-02-15 13:15:00 | Ascot 15th Feb | 18286520    | Master Debonair | back | 2.66    | 15.54
10 | 2020-02-15 12:15:35 | 1.168759002 | 2020-02-15 13:15:00 | Ascot 15th Feb | 18286520    | Master Debonair | lay  | 1000.00 | 12.05
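
As a rough illustration of how one polled response could be flattened into rows like the ones above, here is a minimal sketch. It is not the original script. It assumes a market book dictionary shaped like a Betfair `listMarketBook` response requested with starting price data, where each runner carries an `sp` object with `backStakeTaken` and `layLiabilityTaken` lists; these field names are my reading of the API and should be checked against the official documentation. The function name `flatten_sp_ladder` is hypothetical, and the network call, authentication and the lookup of event and runner names are left out.

```python
from datetime import datetime


def flatten_sp_ladder(market_book, polled_at):
    """Flatten one market book dict into rows: one row per runner, side and price limit."""
    rows = []
    for runner in market_book.get("runners", []):
        sp = runner.get("sp") or {}
        # Assumption: back-side SP bets are listed under "backStakeTaken" and
        # lay-side SP bets under "layLiabilityTaken", each as {"price", "size"}.
        for side, key in (("back", "backStakeTaken"), ("lay", "layLiabilityTaken")):
            for level in sp.get(key, []):
                rows.append({
                    "timestamp": polled_at,
                    "marketId": market_book["marketId"],
                    "selectionId": runner["selectionId"],
                    "side": side,
                    "price": level["price"],
                    "size": level["size"],
                })
    return rows


# Hand-written sample that reuses a few values from the table above.
sample_book = {
    "marketId": "1.168759002",
    "runners": [
        {
            "selectionId": 26781748,
            "sp": {
                "backStakeTaken": [
                    {"price": 1.01, "size": 309.95},
                    {"price": 1.8, "size": 18.07},
                ],
                "layLiabilityTaken": [{"price": 1000.0, "size": 114.97}],
            },
        },
    ],
}

rows = flatten_sp_ladder(sample_book, datetime(2020, 2, 15, 12, 15, 35))
for row in rows:
    print(row)
```

Each resulting dictionary maps onto one database row; the remaining columns (marketStartTime, eventName, runnerName) would have to be joined in from the market metadata.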

Analyzing the Betting Data

Before the actual analysis, the raw data is transformed into a more suitable format. The script polls the accumulated amount placed at each odds limit. For the analysis I am more interested in the changes between consecutive polls, i.e. the first difference of each series, so I convert the raw data into the following structure with a Python script:

id     | market      | event_name     | event_dt            | selection_id | selection_name | time_offset_seconds | side | price | size
269547 | 1.168759002 | Ascot 15th Feb | 2020-02-15 13:15:00 | 26781748     | Sporting John  | 7107                | 0    | 1.01  | 24.09
269548 | 1.168759002 | Ascot 15th Feb | 2020-02-15 13:15:00 | 26781748     | Sporting John  | 6686                | 0    | 1.01  | 2.89
269549 | 1.168759002 | Ascot 15th Feb | 2020-02-15 13:15:00 | 26781748     | Sporting John  | 6627                | 0    | 1.01  | 2.41
269550 | 1.168759002 | Ascot 15th Feb | 2020-02-15 13:15:00 | 26781748     | Sporting John  | 5006                | 0    | 1.01  | 23.50
269551 | 1.168759002 | Ascot 15th Feb | 2020-02-15 13:15:00 | 26781748     | Sporting John  | 4765                | 0    | 1.01  | 3.62
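
The differencing step could look roughly like the sketch below. It is an illustration under assumptions, not the original conversion script: the function name `ladder_increments` is hypothetical, the first observed amount of a series is treated as an increment from zero, and negative differences (for example from cancelled bets) are kept as they are. The cumulative sample values are made up so that the differences match the first two rows of the table above.

```python
from collections import defaultdict


def ladder_increments(snapshots):
    """snapshots: iterable of (seconds_before_off, selection_id, side, price, cumulative_size)."""
    # Group the cumulative sizes into one series per (selection, side, price).
    series = defaultdict(list)
    for offset, selection_id, side, price, size in snapshots:
        series[(selection_id, side, price)].append((offset, size))

    increments = []
    for (selection_id, side, price), points in series.items():
        # A larger offset means an earlier poll, so sort descending to move forward in time.
        points.sort(key=lambda p: p[0], reverse=True)
        previous = 0.0  # assumption: nothing was placed before the first poll
        for offset, size in points:
            delta = round(size - previous, 2)
            if delta != 0:
                increments.append((selection_id, offset, side, price, delta))
            previous = size
    return increments


# Made-up cumulative amounts for one selection at price 1.01 on side 0.
snapshots = [
    (7107, 26781748, 0, 1.01, 24.09),
    (7047, 26781748, 0, 1.01, 24.09),  # unchanged, so no increment is emitted
    (6686, 26781748, 0, 1.01, 26.98),
]

increments = ladder_increments(snapshots)
for row in increments:
    print(row)
```

Polls in which the accumulated amount did not change produce no row, which keeps the transformed table much smaller than the raw one.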

Identifying Clusters with Principal Component Analysis

The assumption is that betting bots place bets at characteristic times before the off and use characteristic stakes and odds limits. The data is high dimensional: for a range of 120 minutes before the off I have the amount placed at every price increment for both the back and the lay side. My goal is to plot the data in two dimensions as a scatter plot in which each point represents a selection (horse). Principal component analysis (PCA) is a technique for reducing high-dimensional data to a few dimensions. With the scikit-learn implementation of PCA I get the following chart:
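
The dimensionality reduction described above can be sketched as follows. This is a simplified illustration, not the code behind the chart: the 10-minute time buckets, the coarse price buckets in `PRICE_EDGES`, the helper names `feature_vector` and `project_2d`, and the reading of side 0 as back and side 1 as lay are all assumptions, and the sample bets are toy values. Only `sklearn.decomposition.PCA` is taken from the text.

```python
import bisect

import numpy as np
from sklearn.decomposition import PCA

PRICE_EDGES = [2.0, 5.0, 20.0]  # hypothetical coarse price buckets (4 buckets)
TIME_BUCKETS = 12               # 120 minutes in 10-minute buckets (assumption)


def feature_vector(bets):
    """bets: list of (seconds_before_off, side, price, size) for one selection."""
    grid = np.zeros((TIME_BUCKETS, 2, len(PRICE_EDGES) + 1))
    for offset, side, price, size in bets:
        t = min(int(offset // 600), TIME_BUCKETS - 1)  # time bucket index
        p = bisect.bisect_right(PRICE_EDGES, price)    # price bucket index
        grid[t, side, p] += size                       # side: 0 = back, 1 = lay (assumed)
    return grid.ravel()


def project_2d(bets_by_selection):
    """Return the selection ids and their coordinates on the first two principal components."""
    ids = sorted(bets_by_selection)
    X = np.vstack([feature_vector(bets_by_selection[i]) for i in ids])
    coords = PCA(n_components=2).fit_transform(X)
    return ids, coords


# Toy data: four selections with different timing, side and price patterns.
sample = {
    1: [(7000, 0, 1.01, 24.0), (6600, 0, 1.01, 3.0)],
    2: [(6900, 0, 1.01, 20.0), (300, 1, 1000.0, 12.0)],
    3: [(120, 0, 3.5, 50.0), (60, 0, 4.0, 50.0)],
    4: [(100, 0, 3.0, 45.0), (3000, 1, 10.0, 5.0)],
}

ids, coords = project_2d(sample)
print(ids)
print(coords.shape)
```

The two columns of `coords` are the x and y values for the scatter plot, one point per selection.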

The next step was to look at the clusters that formed in more detail. One cluster had a positive return, so I looked through the selections in that cluster, trying to figure out what they had in common and why they showed similar behaviour in the way BSP bets were placed. It quickly became obvious that the cluster was formed by horses making their debut. A separate backtest of this simple strategy showed impressive returns over the past couple of years. Please have a look at the "Back the Newcomer in Horse Racing" betting strategy for more details.
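
One way to compare clusters by return is sketched below. The post does not state how the return was measured, so this is only an assumption: a level stake of one unit is backed on every selection at its BSP, commission is ignored, and the cluster labels are taken as given (assigned by eye or by any clustering method). The function name `return_per_cluster` and the sample values are made up for illustration.

```python
from collections import defaultdict


def return_per_cluster(selections):
    """selections: iterable of (cluster_label, bsp, won) tuples.

    Returns the average profit per one-unit back bet for each cluster.
    """
    profit = defaultdict(float)
    count = defaultdict(int)
    for cluster, bsp, won in selections:
        # A winning back bet pays (odds - 1) units, a losing one costs the stake.
        profit[cluster] += (bsp - 1.0) if won else -1.0
        count[cluster] += 1
    return {cluster: profit[cluster] / count[cluster] for cluster in profit}


# Made-up example: two clusters with two selections each.
sample_selections = [
    ("A", 3.0, True),
    ("A", 5.0, False),
    ("B", 2.0, False),
    ("B", 4.0, False),
]

returns = return_per_cluster(sample_selections)
print(returns)
```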

Conclusions

Try to avoid deploying betting bots that use the starting price of a betting exchange in a deterministic manner. Some obfuscation, such as placing bets at random times or splitting them into random stakes, might help. Alternatively, simply drip-feed your bets into the market prior to the off without using starting price bets at all.
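
The obfuscation idea can be sketched as a small scheduling helper. This is a hypothetical illustration only: the function name `randomised_schedule`, the weight range and the window length are assumptions, and minimum bet sizes or other exchange rules are not taken into account.

```python
import random


def randomised_schedule(total_stake, n_parts, window_seconds, rng=None):
    """Split total_stake into n_parts random stakes placed at random times.

    Returns a list of (seconds_before_off, stake) tuples, earliest bet first.
    """
    rng = rng or random.Random()
    # Random weights in [0.5, 1.5] avoid extremely small parts.
    weights = [rng.uniform(0.5, 1.5) for _ in range(n_parts)]
    scale = total_stake / sum(weights)
    stakes = [round(w * scale, 2) for w in weights[:-1]]
    # The last part absorbs the rounding error so the stakes add up to the total.
    stakes.append(round(total_stake - sum(stakes), 2))
    # Random placement times inside the window, sorted from earliest to latest.
    offsets = sorted((rng.uniform(0, window_seconds) for _ in range(n_parts)), reverse=True)
    return list(zip(offsets, stakes))


schedule = randomised_schedule(100.0, 5, 7200, rng=random.Random(42))
for seconds_before_off, stake in schedule:
    print(f"{seconds_before_off:8.1f} s before the off: stake {stake:.2f}")
```

Passing a seeded `random.Random` makes a schedule reproducible for testing; in live use the seed would be left out so that the pattern differs from race to race.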

Scraping data from betting exchange markets and analysing it can be a viable approach to developing a betting strategy through reverse engineering. In the example above, a strategy that was successful over the past couple of years was revealed. Simply copying the strategy would surely reduce the edge, and profits would diminish over time. However, the approach can lead to new ideas, refinements and new developments. One example for further development is training a machine learning model on top of the BSP data that we continue to collect, which is something for another blog post.
