How to Use Betfair's Historical Data

As a Betfair customer you can download Betfair's historical exchange data. However, processing the data for analysis might be a bit of a challenge which will be discussed in this article.

The betting exchange Betfair offers historical data for download including time-stamped price data for various sports. Customers can select and download the data at https://historicdata.betfair.com either in the basic, advanced or pro category. Whilst basic data is currently for free, it only covers 1 timestamp every minute. For advanced or pro you would need to pay some money to get higher resolution data (every second for advanced or 50ms frequency for pro).

Extracting the tar Archive

Once you selected the data of interest you can download a tar archive and save it as data.tar on your system. In this post I will use the soccer data for June 2020 which has around 184MB and includes around 78.000 markets - according to Betfair's site. After downloading the data was actually larger than indicated on their website.

You might wonder how to navigate the folder structure to actually find the data. If you are familiar with Betfair's data model you might identify that the folder represent the year, month, date, event id and market id. For every market there is a archive (bz2) file, that actually contains the market data.

Historical Betfair Exchange Data Folder Structure

Looking at the Market Data Files (bz2 Files)

Next we will address the question how we can actually view the data. After extracting a bz2 file and opening it with the editor you will see something like the following:

{"op":"mcm","clk":"1234","pt":1234,"mc":[{"id":"1.1234","marketDefinition":{"bspMarket":false,"turnInPlayEnabled":true,"persistenceEnabled":true,"marketBaseRate":1.2,"eventId":"1234","eventTypeId":"1","numberOfWinners":1}}]}
{"op":"mcm","clk":"12345","pt":12345,"mc":[{"id":"1.1234","rc":[{"ltp":1.9,"id":1}]}]}
{"op":"mcm","clk":"12345","pt":12345,"mc":[{"id":"1.1234","rc":[{"ltp":1.6,"id":2}]}]}
{"op":"mcm","clk":"12345","pt":12345,"mc":[{"id":"1.1234","rc":[{"ltp":4.3,"id":1}]}]}
{"op":"mcm","clk":"12345","pt":12345,"mc":[{"id":"1.1234","rc":[{"ltp":6.9,"id":1}]}]}
{"op":"mcm","clk":"12345","pt":12345,"mc":[{"id":"1.1234","rc":[{"ltp":4.1,"id":1}]}]}

You might be surprised by the data format and processing the data for some useful analysis might impose a challenge. Betfair provides a pdf with an explanation of the data format.

Processing Betfair's Historical Exchange Data with Python

Since the raw data might be very handy for analysis I would like to develop a Python script that can extract the last price traded for selections in a betting market.

import pandas as pd
import json
from bz2 import BZ2File

file_name = "1.170604916.bz2"

with BZ2File(file_name, "r") as f:
    lines = f.readlines()

price_changes = []

for line in lines:
    content = json.loads(line)
    market_changes = content.get("mc", [])
    for market_change in market_changes:
        market_id = market_change.get("id")
        market_definition = market_change.get("marketDefinition")

        runner_changes = market_change.get("rc")
        if runner_changes:
            for runner_change in runner_changes:
                print(runner_change)
                price_changes.append([content.get("pt"), runner_change.get("id"), runner_change.get("ltp")])

df = pd.DataFrame(price_changes, columns=["timestamp", "selection_id", "price"])
df.to_csv("prices.csv", index=False)

With the Python script I basically extract a bz2 file into memory and iterate over all the lines. For every line the JSON is parsed and then I check for market changes and filter further for runner changes. Every runner change is added to a list and then a Pandas dataframe object is created from that list. In the end a csv file is created containing the timestamp, the selection id and the last price traded. The csv file looks like the following extract and is in a format that can be easily worked with in Excel for instance:

  timestamp selection_id price
0 123456789 1 3.75
1 123456789 2 3.65
2 123456789 3 1.94

Saving Betfair's Historical Data for Analysis

If you would like to save the raw data or some processed data that is still close to the raw data and if you would like to keep the JSON format then a NoSQL JSON storage such as MongoDB might be a good option. PostgreSQL now also has some JSON support and might be a viable option to store the data. Again, it depends a lot on how the extracted data looks like and what kind of analysis you'd like to do.

If your data processing pipeline generates some relational data then again PostgreSQL might be a solid option to save the processed data. If you prefer to analyse and view the data in software like Excel, then saving the data as csv file might be the best option. It seems that Betfair also published a Excel file on their github that includes a VBA macro to process the raw historical data and draws a chart with the last price traded per selection.

Summary

For many customers Betfair's Historical Exchange data might not be very intuitive to use due to the data format and the complexity of the data structure. However, the great detail of the data allows all kind of backtests and with some code the data can be transformed into simple csv-like extracts that can be easily analysed in Excel or similar software to successfully test betting systems on historical data.

Do you like our content? Please share with your friends!

Share on Facebook Share on Twitter

Comments

No comments published yet.

Please log in to leave a comment.