Genetic Programming to Create a Betting Strategy

Published on Jan. 21, 2021, 4:12 p.m. - Sports: Horse Racing

Genetic Programming belongs to the field of artificial intelligence and can be used in sports betting to automatically develop a betting strategy: You start with a set of random betting rules and evolve them through selection, mutation and cross-over to find a strategy that you can use to make some profit.

With the advance of computational power, new algorithms were developed in the field of artificial intelligence (AI). One example of such an AI algorithm is genetic programming. The idea behind genetic programming is that you have computer programs that evolve over time in a manner similar to evolution in biology: programs are mutated and crossed over, and a selection process allows only the most promising ones to survive.

Genetic programming can be used along with historical betting data to automatically create betting strategies. There are easy-to-use frameworks that allow quick prototyping in the field of evolutionary algorithms. Both genetic programming and genetic algorithms can also be used in the context of sports betting trading.

Definition of Genetic Programming (GP)

In genetic programming you start with a population of individuals. In our example every individual is basically a betting rule. Initially, those individuals are randomly generated which means that we have a set of random betting rules.

The individuals are then evaluated according to some scoring. We use historical data to evaluate the betting rules and check how much profit we could realise in the past.

Depending on the score we make modifications to the population:

  • Selection
    The process of selection makes sure that only a subset of the population is carried over to the next generation. Individuals that do not score well are removed and do not survive. In the context of a betting strategy this means that only those betting rules that yield a good result are carried over into the new generation.
  • Mutation
    Mutation means that an individual is modified in some parts and added to the new generation. In the betting context this might mean that a single filter is removed or added.
  • Cross-Over
    Cross-over means that individuals are combined to generate a new individual. In the betting context this means that two betting rules are combined in some way to create a new betting rule.

All the steps mentioned above form an iterative process: we create multiple generations of individuals that are scored and modified over time. At the end we should have found an individual that is a good solution to the defined problem. In our case we start off with random betting rules and hopefully end up with a successful betting rule or strategy.
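The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the DEAP setup used later in the article: the toy history, the filter pool (FILTERS) and the helper names score, mutate, crossover and evolve are all assumptions made up for this example. A betting rule is modelled as a set of filters that must all pass.

```python
import random

# Toy history, made up for illustration: (bsp, days_since_last_run, won)
HISTORY = [
    (2.5, 10, True), (4.0, 30, False), (8.0, 14, True), (15.0, 60, False),
    (3.2, 7, False), (6.5, 21, False), (21.0, 90, False), (1.9, 12, True),
]

# Hypothetical pool of filters; a rule backs a runner only if all of its filters pass
FILTERS = {
    "short_price": lambda bsp, delay: bsp < 5.0,
    "long_price": lambda bsp, delay: bsp >= 5.0,
    "recent_run": lambda bsp, delay: delay <= 14,
    "long_break": lambda bsp, delay: delay > 14,
}

def score(rule):
    """Profit of backing every runner that passes the rule (stake 1, 2% commission)."""
    profit = 0.0
    for bsp, delay, won in HISTORY:
        if all(FILTERS[name](bsp, delay) for name in rule):
            profit += 0.98 * (bsp - 1) if won else -1.0
    return profit

def mutate(rule, rng):
    """Mutation: add or remove one randomly chosen filter."""
    return rule ^ {rng.choice(sorted(FILTERS))}

def crossover(a, b, rng):
    """Cross-over: the child keeps each filter found in a parent with probability 1/2."""
    return frozenset(name for name in sorted(a | b) if rng.random() < 0.5)

def evolve(generations=20, size=8, seed=0):
    rng = random.Random(seed)
    names = sorted(FILTERS)
    # Start with random betting rules
    population = [frozenset(n for n in names if rng.random() < 0.5) for _ in range(size)]
    for _ in range(generations):
        # Selection: only the better half survives
        survivors = sorted(population, key=score, reverse=True)[: size // 2]
        children = []
        while len(survivors) + len(children) < size:
            if rng.random() < 0.5:
                children.append(mutate(rng.choice(survivors), rng))
            else:
                children.append(crossover(*rng.sample(survivors, 2), rng))
        population = survivors + children
    return max(population, key=score)

best = evolve()
print(sorted(best), score(best))
```

Because the survivors are copied unchanged into the next generation, the best score in the population can never get worse from one generation to the next in this sketch.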

Genetic Programming vs Genetic Algorithms

In the area of evolutionary algorithms, which is a field of artificial intelligence, we can differentiate between several types of algorithms. In general their approach is very similar: individuals are created as possible solutions to a problem, their fitness is evaluated using a scoring function, and they evolve further through mutation, selection and cross-over to find a better solution to the problem.

Genetic Algorithms (GA) is the most popular approach in the field of evolutionary algorithms. Genetic algorithms use the same principle of evolution (selection, mutation and cross-over) to search the solution space for a given problem.

For Genetic Programming (GP) we have a problem statement and the solution is a computer program. Using genetic programming we evolve computer programs (functions) to find the one that best fits the problem. In our context the programs are functions that indicate whether or not we should place a bet on a certain selection.
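One common way to picture the difference is sketched below. In a GA-style setup the individual is often a fixed-length list of parameters for a rule template that a human wrote, whereas in GP the individual is the rule expression itself, here stored as a nested tuple. The names ga_rule, OPS, evaluate and tree are hypothetical and only serve this illustration; DEAP uses its own tree representation.

```python
import operator

# GA-style individual: parameters for a fixed, hand-written rule template
def ga_rule(params, bsp, delay):
    max_price, min_delay = params
    return bsp < max_price and delay > min_delay

# GP-style individual: the rule is a tree of operators, inputs and constants
OPS = {"lt": operator.lt, "and": operator.and_, "not": operator.not_}

def evaluate(node, inputs):
    if isinstance(node, tuple):      # ("op", child, child, ...)
        op, *children = node
        return OPS[op](*(evaluate(child, inputs) for child in children))
    if isinstance(node, str):        # named input such as "bsp"
        return inputs[node]
    return node                      # numeric constant

# The same rule expressed both ways: price below 10 and more than 20 days since last run
params = (10.0, 20.0)
tree = ("and", ("lt", "bsp", 10.0), ("lt", 20.0, "delay"))

print(ga_rule(params, 3.4, 46.0), evaluate(tree, {"bsp": 3.4, "delay": 46.0}))
```

Evolution can only tune the two numbers in the GA variant, while in the GP variant it can also change the structure of the tree, for example by swapping a whole subtree.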

Frameworks for Genetic Programming

There are plenty of evolutionary algorithm frameworks that make it easy to work with genetic programming. You don't need to specify everything from scratch and can rely on heavily used and well tested code to quickly prototype.

In the Python ecosystem there is a framework called DEAP (Distributed Evolutionary Algorithms in Python) that allows you to solve problems with genetic algorithms and genetic programming.

Example: Create a Betting Strategy with Genetic Programming

I would now like to show an example of how genetic programming can be used to develop a sports betting strategy. I will use genetic programming in the context of horse racing, but it could equally be used for other sports such as football, tennis, etc.

The data used to develop the genetic programs is horse racing data that I extracted from the historical starting price data published by Betfair. The data is in a CSV file and has the following structure:

market       | event_id  | event_name | event_dt            | selection_id | selection_name   | win_lose | bsp         | delay | previous_win | prior
pricesirewin | 153079544 | 2m Mdn Hrd | 2019-01-01 12:00:00 | 30567        | Be My Dream      | False    | 137.106990  | 4.0   | False        | 15
pricesirewin | 153079544 | 2m Mdn Hrd | 2019-01-01 12:00:00 | 8145916      | Notebook         | True     | 3.424157    | 46.0  | False        | 15
pricesirewin | 153079544 | 2m Mdn Hrd | 2019-01-01 12:00:00 | 11930841     | Claregate Street | False    | 187.381682  | 66.0  | False        | 15
pricesirewin | 153079544 | 2m Mdn Hrd | 2019-01-01 12:00:00 | 12236144     | Willwams         | False    | 1000.000000 | 205.0 | False        | 15
pricesirewin | 153079544 | 2m Mdn Hrd | 2019-01-01 12:00:00 | 12956422     | Aunty Audrey     | False    | 177.643615  | 31.0  | False        | 15

First we import the required packages. We use the Python package DEAP as a framework to derive betting rules following a genetic programming approach:

import operator
import math
import random
import itertools
import pandas as pd
import numpy
from deap import algorithms
from deap import base
from deap import creator
from deap import tools
from deap import gp

Next, the data is loaded into a Pandas data frame. I split the data into one frame with the 2019 data and one with the 2020 data. The data from 2019 is used to derive the betting strategy. Later, the strategy is also evaluated on the 2020 data.

df = pd.read_csv("20200120_horses_extract.csv", parse_dates=["event_dt"])
# .copy() so that columns can be added later without a SettingWithCopyWarning
df_2019 = df[df.event_dt.dt.year == 2019].copy()
df_2020 = df[df.event_dt.dt.year == 2020].copy()

For the genetic programming approach we need to define which operations can be used by the betting rules. In this example I only focus on back bets at Betfair starting prices (BSP). With a stake of 1, the profit is BSP - 1 if the selection wins, reduced by a 2% exchange commission, and -1 if it loses.

pset = gp.PrimitiveSetTyped("MAIN", (float, float, int), bool, "IN")

def protectedDiv(left, right):
    # Division that returns 1 instead of raising on division by zero
    try:
        return left / right
    except ZeroDivisionError:
        return 1

pset.addPrimitive(operator.and_, [bool, bool], bool)
pset.addPrimitive(operator.or_, [bool, bool], bool)
pset.addPrimitive(operator.not_, [bool], bool)
pset.addPrimitive(operator.add, [float,float], float)
pset.addPrimitive(operator.sub, [float,float], float)
pset.addPrimitive(operator.mul, [float,float], float)
pset.addPrimitive(protectedDiv, [float,float], float)

def if_then_else(condition, output1, output2):
    return output1 if condition else output2

pset.addPrimitive(operator.lt, [float, float], bool)
pset.addPrimitive(operator.eq, [float, float], bool)
pset.addPrimitive(if_then_else, [bool, float, float], float)

pset.addEphemeralConstant(f"rand{random.random()}", lambda: random.random() * 100, float)
pset.addTerminal(False, bool)

creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMin)

To evaluate the programs that are generated by the algorithm I define a function that calculates the profit. Actually it is the negative profit as the program tries to minimize the evaluation function:

toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=2)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)

def back_profit(bsp, win_lose):
    return 0.98*(bsp-1) if win_lose else -1

def neg_profit(individual):
    # Transform the tree expression in a callable function
    f = toolbox.compile(expr=individual)
    # Calculate the profit
    profit = sum([back_profit(row["bsp"], row["win_lose"]) for index, row in df_2019.iterrows() if f(row["bsp"], row["delay"], row["prior"] )])
    return -profit,

toolbox.register("evaluate", neg_profit)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)
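To make the select step more concrete, here is a plain-Python sketch of the idea behind tournament selection with a tournament size of 3. It is an illustration only and not DEAP's code: the function tournament_select and its fitness argument are names assumed for this example. Lower values win because the evaluation function is the negative profit.

```python
import random

def tournament_select(population, fitness, k, tournsize=3, rng=random):
    """Pick k individuals; each one is the best of `tournsize` randomly drawn entrants."""
    chosen = []
    for _ in range(k):
        # Entrants are drawn with replacement from the whole population
        entrants = [rng.choice(population) for _ in range(tournsize)]
        # Minimisation: the entrant with the lowest fitness value wins
        chosen.append(min(entrants, key=fitness))
    return chosen

# Toy example: individuals are represented directly by their negative profit
negative_profits = [-253.3, 0.0, 146.2, 5576.6, -197.9]
selected = tournament_select(negative_profits, lambda x: x, k=5, rng=random.Random(1))
print(selected)
```

Good individuals tend to be picked several times while poor ones still have a small chance to get through, which keeps some diversity in the population.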

As a last step I fix the random seed for reproducibility and run the genetic programming algorithm. I use a small population of only 10 individuals.

pop = toolbox.population(n=10)
hof = tools.HallOfFame(1)
stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("avg", numpy.mean)
stats.register("std", numpy.std)
stats.register("min", numpy.min)
stats.register("max", numpy.max)

algorithms.eaSimple(pop, toolbox, 0.5, 0.2, 10, stats, halloffame=hof, verbose=True)

On my laptop it took a couple of minutes to run the genetic programming code with an initial population of 10 individuals, evolving it for 10 generations. During processing I received the following logs. The first column is the generation number, nevals is the number of individuals evaluated in that generation, and the remaining columns show statistics (avg, std, min and max) of the evaluation function.

gen  nevals  avg       std      min       max
0    10      2970.25   2511.2   -253.268  5576.59
1    7       1032.23   2196.5   -253.268  5576.59
2    6       381.618   1217.83  -197.905  4030.88
3    6       504.593   1586.43  -197.905  5260.64
4    5       -20.6305  59.1444  -197.905  0
5    6       1039.51   2182.3   -197.905  5538.71
6    8       1520.45   2418.34  -197.905  5260.64
7    5       543.07    1578.47  -197.905  5260.64
8    7       -7.24258  77.755   -197.905  146.239
9    9       1055.86   2167.3   -197.905  5514.19
10   5       -21.8665  59.0049  -197.905  0

Remember that our evaluation function is the negative profit for back betting with stake 1. Looking at the column with the minimum value the smallest number is around -253 which means that the best individual would have achieved a profit of around 253 points on the training data. When inspecting the most promising betting rule I get the following output:

'lt(IN0, IN1)'

Given the order of the inputs passed to the function (IN0 is bsp, IN1 is delay), this means the following: back all selections in UK / IRE horse racing where the starting price (BSP) is lower than the number of days since the selection's last race. To be fair, this seems pretty random and doesn't make a lot of sense at first glance. Anyway, my goal is just to illustrate how genetic programming can be used to derive betting rules rather than to derive a great strategy, so I don't want to spend too much time on analysing and interpreting this specific rule.
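As a small sanity check, the rule can be translated by hand into a plain Python function and applied to the five sample rows shown earlier. The function name rule and the tuple layout of rows are assumptions for this sketch; back_profit is repeated from above so that the snippet runs on its own.

```python
def back_profit(bsp, win_lose):
    # Stake 1, 2% commission on winnings
    return 0.98 * (bsp - 1) if win_lose else -1

def rule(bsp, delay, prior):
    # Hand translation of 'lt(IN0, IN1)': IN0 is bsp, IN1 is delay, IN2 (prior) is unused
    return bsp < delay

# (selection_name, bsp, delay, prior, win_lose) copied from the sample rows
rows = [
    ("Be My Dream", 137.106990, 4.0, 15, False),
    ("Notebook", 3.424157, 46.0, 15, True),
    ("Claregate Street", 187.381682, 66.0, 15, False),
    ("Willwams", 1000.000000, 205.0, 15, False),
    ("Aunty Audrey", 177.643615, 31.0, 15, False),
]

selected = [row for row in rows if rule(row[1], row[2], row[3])]
profit = sum(back_profit(row[1], row[4]) for row in selected)
print([row[0] for row in selected], round(profit, 4))
```

In this single race only Notebook passes the filter, because its starting price of about 3.42 is below its 46 days since the last run.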

There are also plenty of options to vary this approach. I only evolved the betting rules for 10 generations, starting from 10 individuals. By increasing these values you explore additional betting rules, some of which are potentially more profitable. You can also allow additional operations to be considered in the betting rules, or restrict them. The data that is used to evaluate the betting rules plays a major role. You could think of adding additional columns with information on the horses.

However, I still want to check the potential of this rule on the data for 2020. Remember that the horse racing data from 2019 was used to train the model. In 2019 the profit would have been around 253 units. Now let's check the 2020 data:

df_2020["back_profit"] = df_2020.apply(lambda x: back_profit(x["bsp"], x["win_lose"]), axis=1)
df_2020[df_2020.bsp < df_2020.delay].back_profit.sum()

It is quite impressive that the profit is also positive for 2020: it came out at around 646 units. A full backtest on the 2019/2020 period looks like the following:

[Figure: full backtest of the rule over the 2019/2020 period]

What are your thoughts on genetic programming? Are you using any evolutionary algorithms for sports betting? Please let us know in the comments or contact us directly if you have any feedback.
