Simple Evolutionary Genetic Optimisation code in Python

A step-through of a simple, minimalist framework for evolutionary genetic optimisation in Python, with examples.

Full uninterrupted code available in this gist

For an introduction to genetic optimisation see my previous post

Summary

Evolutionary Genetic Optimisation is an algorithm that creates and breeds generations of potential solutions, with the idea that each generation becomes on average better than the last. It borrows ideas from the biological process of evolution, where the more ‘fit’ solutions in a generation are more likely to be selected as parents of a child solution that is a mix of the parents.

Genes and Genomes

First, we need a way of representing solutions and sets of solutions. We’ll borrow some terms from biology:

  • Genome: How a solution is represented. We’ll use a list.
  • Gene: The smallest parts that make up a solution: the possible elements that can appear in the list.
  • Individual: Used interchangeably with ‘genome’ and ‘solution’.
  • Population: A set of individuals. We’ll store these in a list (so it’s a list of lists).
  • Fitness Function: A function that takes an individual and returns a numeric value representing how ‘good’ the solution is.
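As a quick illustration of these terms (a toy sketch with made-up values, not part of the framework itself):

```python
# A hypothetical individual: one genome, represented as a list of genes
individual = [3, 1, 4, 1, 5]

# A population: a list of individuals (a list of lists)
population = [[3, 1, 4, 1, 5], [2, 7, 1, 8, 2]]

# A fitness function: maps an individual to a number (here, its sum)
fitness_func = lambda ind: sum(ind)

print(fitness_func(individual))  # 14
```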

Example Optimisation Problem

  • Genome: List of 10 numbers
  • Genes: Digits 0 – 9
  • Fitness Function: The sum of the numbers in the list

We’ll cover the other parameters as we build the functions that will eventually be used by the ‘optimise’ function.

genes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
genome_length = 10
fitness_func = lambda x: sum(x)

Initialisation

We create an initial population with randomised genes, and a function that sorts a population by its fitnesses. Note that the initialisation lines of code will later be placed inside the optimise function.

def population_fitnesses(population, fitness_func):
  # Takes a population list and returns the population (sorted by fitness, best first) plus the fitness list
  fitnesses = [fitness_func(i) for i in population]
  fitnesses, population = zip(*sorted(zip(fitnesses, population), reverse=True))
  return list(population), list(fitnesses)

# Initialisation and order by fitnesses
population = [random.choices(genes, k=genome_length) for _ in range(population_size)]
population, fitnesses = population_fitnesses(population, fitness_func)
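To see the sorting in action, here’s a small standalone check (repeating the function so the snippet runs on its own, with a hypothetical three-individual population and the built-in sum as the fitness function):

```python
def population_fitnesses(population, fitness_func):
  # Takes a population list and returns the population (sorted by fitness, best first) plus the fitness list
  fitnesses = [fitness_func(i) for i in population]
  fitnesses, population = zip(*sorted(zip(fitnesses, population), reverse=True))
  return list(population), list(fitnesses)

population, fitnesses = population_fitnesses([[1, 1], [9, 9], [5, 5]], sum)
print(population)  # [[9, 9], [5, 5], [1, 1]]
print(fitnesses)   # [18, 10, 2]
```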

Create Child

We’ll define a function to handle the selection, crossover and mutation processes that go into creating new children for the next generation.

def create_child(population, genes, tournament_size, mutation_rate):
  # Breed two parents, returning (possibly mutated) offspring
  parent1 = select(population, tournament_size)
  parent2 = select(population, tournament_size)
  child = crossover(parent1, parent2)
  child = mutate(child, mutation_rate, genes)
  return child

Select

We’ll use tournament selection to pick which genomes in a population become parents. Tournament selection randomly chooses a subset (of size tournament_size) of the population and returns its fittest member. Since our population is kept sorted by fitness (best first), the fittest member of the sample is simply the one with the lowest index, which is why min suffices below.

def select(population, tournament_size):
  # Pick a parent as the winner of a tournament; population is sorted best first, so the lowest sampled index wins
  assert tournament_size <= len(population)
  tournament_winner_index = min(random.sample(range(len(population)), tournament_size))
  parent = population[tournament_winner_index]
  return parent.copy()
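A quick standalone check of the selection behaviour (restating the function so the snippet runs on its own): when tournament_size equals the population size, the whole population is sampled, so the best individual must win.

```python
import random

def select(population, tournament_size):
  # Population is sorted best first, so the lowest sampled index wins the tournament
  assert tournament_size <= len(population)
  tournament_winner_index = min(random.sample(range(len(population)), tournament_size))
  parent = population[tournament_winner_index]
  return parent.copy()

# Hypothetical population, already sorted best first: [9, 9] is fittest
population = [[9, 9], [5, 5], [1, 1]]

# With tournament_size equal to the population size, the best must win
print(select(population, 3))  # [9, 9]
```

Note that select returns a copy, so the child-creation steps that follow can mutate the result without corrupting the parent still in the population.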

Crossover

We’ll use single point crossover to pick an index in the genome to slice the second parent onto the first.

def crossover(parent1, parent2):
  # Returns results of single point crossover of the parents
  crossover_point = random.choice(range(len(parent1)))
  child = parent1.copy()
  child[crossover_point:] = parent2[crossover_point:]
  return child
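The splice is easiest to see with two artificially distinct parents (a standalone sketch restating the function): the child is always parent1 up to the crossover point, then parent2 from there on.

```python
import random

def crossover(parent1, parent2):
  # Single point crossover: keep parent1 up to the point, parent2 from it onwards
  crossover_point = random.choice(range(len(parent1)))
  child = parent1.copy()
  child[crossover_point:] = parent2[crossover_point:]
  return child

# With all-1s and all-2s parents the split point is easy to spot
child = crossover([1] * 6, [2] * 6)
print(child)  # e.g. [1, 1, 2, 2, 2, 2]
```

Because crossover_point can be at most len(parent1) - 1, at least the final gene always comes from parent2.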

Mutate

Mutation randomly changes genes in a child’s genome to another possibility. The idea is to introduce novel variations into the population.

def mutate(child, mutation_rate, genes):
  # Each gene independently has a mutation_rate chance of being replaced with a random gene (possibly the same one)
  for gene in range(len(child)):
    if random.random() < mutation_rate:
      child[gene] = random.choice(genes)
  return child
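The two rate extremes make the behaviour easy to verify (a standalone sketch restating the function with toy values): a rate of 0 leaves the child untouched, while a rate of 1 re-rolls every gene, since random.random() returns values in [0, 1) and is therefore always below 1.

```python
import random

def mutate(child, mutation_rate, genes):
  # Each position independently has a mutation_rate chance of being re-rolled from genes
  for i in range(len(child)):
    if random.random() < mutation_rate:
      child[i] = random.choice(genes)
  return child

genes = [0, 1]
print(mutate([5, 5, 5], 0.0, genes))  # [5, 5, 5]
print(mutate([5, 5, 5], 1.0, genes))  # every element re-rolled to 0 or 1
```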

Optimise

Now we can combine our functions into an iterative process that creates and evolves generations of solutions until we hit the iteration limit. Note how we keep the best individuals of each generation unchanged rather than replacing them (elitism), and how we stop early if the optional max_fitness has been reached.

def optimise(iterations, genes, genome_length, fitness_func
             , population_size, elites_size, tournament_size, mutation_rate, max_fitness=None):
  # Initialisation and order by fitnesses
  population = [random.choices(genes, k=genome_length) for _ in range(population_size)]
  population, fitnesses = population_fitnesses(population, fitness_func)

  # Iterate through generations of populations
  for generation in range(iterations):
    new_population = [p.copy() for p in population]

    # Keep the best elites, replace rest of population with new children
    for p in range(elites_size, population_size):
      new_population[p] = create_child(population, genes, tournament_size, mutation_rate)

    # Compute fitnesses and sort population by them
    population, fitnesses = population_fitnesses(new_population, fitness_func)
    print(f"Gen: {generation+1}, Best: {population[0]} with Fitness: {fitnesses[0]}")

    # Check if we've passed the max fitness stopping condition
    if max_fitness is not None and fitnesses[0] >= max_fitness:
      break

  return population[0], fitnesses[0]

Example: List of numbers

We now just need to change the genes, genome_length and fitness_func to use the framework for different problems:

if __name__ == "__main__":
  iterations = 8
  population_size = 200
  elites_size = 5
  tournament_size = 10
  mutation_rate = 0.1

  genes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
  genome_length = 10
  fitness_func = lambda x: sum(x)

  best, best_fitness = optimise(iterations, genes, genome_length, fitness_func
                                , population_size, elites_size, tournament_size, mutation_rate)
  print(f"Best: {best} with Fitness: {best_fitness}")
Gen: 1, Best: [7, 5, 7, 9, 9, 9, 7, 8, 8, 8] with Fitness: 77
Gen: 2, Best: [9, 9, 9, 6, 9, 6, 9, 7, 8, 7] with Fitness: 79
Gen: 3, Best: [9, 9, 9, 9, 9, 9, 7, 8, 8, 8] with Fitness: 85
Gen: 4, Best: [9, 9, 9, 9, 9, 9, 9, 9, 9, 7] with Fitness: 88
Gen: 5, Best: [9, 9, 9, 9, 9, 9, 9, 9, 9, 8] with Fitness: 89
Gen: 6, Best: [9, 9, 9, 9, 9, 9, 9, 9, 9, 9] with Fitness: 90
Gen: 7, Best: [9, 9, 9, 9, 9, 9, 9, 9, 9, 9] with Fitness: 90
Gen: 8, Best: [9, 9, 9, 9, 9, 9, 9, 9, 9, 9] with Fitness: 90
Best: [9, 9, 9, 9, 9, 9, 9, 9, 9, 9] with Fitness: 90

After 6 generations the solution converges to 90. If we had known in advance that the best fitness (or the best we’d be satisfied with) was 90, without knowing what the solution itself looked like, we could have passed 90 as the optional max_fitness parameter to optimise.

Example: Short Password Cracker

Let’s try a more interesting example. We’ll pretend we’re cracking a password, using the sum of the absolute differences between the Unicode code points of the fixed password string and our attempt as the distance. As we want the distance to be 0, we’ll define fitness_func as the negative distance so that we’re still optimising towards a maximum.
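Concretely, here is the same fitness computed for a few hypothetical attempts (a standalone sketch that indexes the genome directly rather than joining it first): an exact match scores 0, and each character that is one code point off costs 1.

```python
password = "pass123"

# Negative sum of absolute code point differences, position by position
fitness_func = lambda x: -sum(abs(ord(password[i]) - ord(x[i])) for i in range(len(x)))

print(fitness_func(list("pass123")))  # 0  (exact match)
print(fitness_func(list("pass124")))  # -1 ('4' is one code point from '3')
print(fitness_func(list("oass123")))  # -1 ('o' is one code point from 'p')
```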

  genes = "abcdefghijklmnopqrstuvwxyz123456789"
  genome_length = 7
  fitness_func = lambda x: sum([-abs(ord("pass123"[i]) - ord(''.join(x)[i])) for i in range(len(x))])
Gen: 1, Best: ['p', 'g', 'r', 'y', '8', '2', '1'] with Fitness: -22
Gen: 2, Best: ['p', 'a', 'o', 'v', '6', '1', '5'] with Fitness: -15
Gen: 3, Best: ['p', 'a', 'o', 's', '1', '8', '3'] with Fitness: -10
Gen: 4, Best: ['p', 'a', 'o', 's', '1', '2', '1'] with Fitness: -6
Gen: 5, Best: ['o', 'a', 's', 't', '2', '1', '3'] with Fitness: -4
Gen: 6, Best: ['p', 'a', 'r', 's', '1', '1', '3'] with Fitness: -2
Gen: 7, Best: ['p', 'a', 's', 's', '1', '2', '1'] with Fitness: -2
Gen: 8, Best: ['p', 'a', 's', 's', '1', '2', '3'] with Fitness: 0
Best: ['p', 'a', 's', 's', '1', '2', '3'] with Fitness: 0

We luckily get there on the 8th generation.

Example: Long Password Cracker

In this example we know what the best fitness should be (0), so we can set a high iterations limit (like 1000) and pass max_fitness=0 to optimise so that it stops as soon as the password is found.

if __name__ == "__main__":
  iterations = 1000
  population_size = 200
  elites_size = 5
  tournament_size = 10
  mutation_rate = 0.1

  genes = "abcdefghijklmnopqrstuvwxyz123456789"
  genome_length = 24
  fitness_func = lambda x: sum([-abs(ord("pleasedontcrackmepass123"[i]) - ord(''.join(x)[i])) for i in range(len(x))])

  best, best_fitness = optimise(iterations, genes, genome_length, fitness_func
                                , population_size, elites_size, tournament_size, mutation_rate, 0)
  print(f"Best: {best} with Fitness: {best_fitness}")

The algorithm finds it hard to begin with…

Gen: 1, Best: ['x', 'g', 'g', '1', 'k', 'b', '5', 'm', 's', 'o', 'c', 'g', 'b', 'n', 'p', 'j', 'f', 'q', 'e', 'y', 'o', '2', '1', 'l'] with Fitness: -239
Gen: 2, Best: ['g', 'k', 'b', 'p', 'q', 'a', 'c', 'm', 'i', 'p', 'c', 'g', 'b', 'n', 'p', 'j', 'f', 'q', 'e', 'l', 'n', '9', 'g', '8'] with Fitness: -161
Gen: 3, Best: ['n', 'o', 'g', 'e', 'z', 'x', 'g', 'w', 'r', 'j', 'f', 'u', 'b', 'n', 'p', 'j', 'f', 'w', 'j', 'q', 'u', '4', '7', '2'] with Fitness: -118

… but gets there on the 108th generation and stops there.

Gen: 107, Best: ['p', 'l', 'e', 'a', 's', 'e', 'd', 'o', 'n', 't', 'c', 'r', 'a', 'c', 'k', 'm', 'e', 'p', 'a', 's', 's', '1', '3', '3'] with Fitness: -1
Gen: 108, Best: ['p', 'l', 'e', 'a', 's', 'e', 'd', 'o', 'n', 't', 'c', 'r', 'a', 'c', 'k', 'm', 'e', 'p', 'a', 's', 's', '1', '2', '3'] with Fitness: 0
Best: ['p', 'l', 'e', 'a', 's', 'e', 'd', 'o', 'n', 't', 'c', 'r', 'a', 'c', 'k', 'm', 'e', 'p', 'a', 's', 's', '1', '2', '3'] with Fitness: 0

Conclusion

Evolutionary genetic optimisation offers a neat way to converge on a solution through survival of the fittest. I hope this code helps demonstrate how it can be done in a minimalist, clean framework.

Appendix: Code in Full

Full uninterrupted code available in this gist or below:

import random


def select(population, tournament_size):
  # Pick a parent as the winner of a tournament; population is sorted best first, so the lowest sampled index wins
  assert tournament_size <= len(population)
  tournament_winner_index = min(random.sample(range(len(population)), tournament_size))
  parent = population[tournament_winner_index]
  return parent.copy()


def create_child(population, genes, tournament_size, mutation_rate):
  # Breed two parents, returning (possibly mutated) offspring
  parent1 = select(population, tournament_size)
  parent2 = select(population, tournament_size)
  child = crossover(parent1, parent2)
  child = mutate(child, mutation_rate, genes)
  return child


def crossover(parent1, parent2):
  # Returns results of single point crossover of the parents
  crossover_point = random.choice(range(len(parent1)))
  child = parent1.copy()
  child[crossover_point:] = parent2[crossover_point:]
  return child


def mutate(child, mutation_rate, genes):
  # Each gene independently has a mutation_rate chance of being replaced with a random gene (possibly the same one)
  for gene in range(len(child)):
    if random.random() < mutation_rate:
      child[gene] = random.choice(genes)
  return child


def population_fitnesses(population, fitness_func):
  # Takes a list of individuals and returns the population (sorted by fitness, best first) plus the fitness list
  fitnesses = [fitness_func(i) for i in population]
  fitnesses, population = zip(*sorted(zip(fitnesses, population), reverse=True))
  return list(population), list(fitnesses)


def optimise(iterations, genes, genome_length, fitness_func
             , population_size, elites_size, tournament_size, mutation_rate, max_fitness=None):
  # Initialisation and order by fitnesses
  population = [random.choices(genes, k=genome_length) for _ in range(population_size)]
  population, fitnesses = population_fitnesses(population, fitness_func)

  # Iterate through generations of populations
  for generation in range(iterations):
    new_population = [p.copy() for p in population]

    # Keep the best elites, replace rest of population with new children
    for p in range(elites_size, population_size):
      new_population[p] = create_child(population, genes, tournament_size, mutation_rate)

    # Compute fitnesses and sort population by them
    population, fitnesses = population_fitnesses(new_population, fitness_func)
    print(f"Gen: {generation+1}, Best: {population[0]} with Fitness: {fitnesses[0]}")

    # Check if we've passed the max fitness stopping condition
    if max_fitness is not None and fitnesses[0] >= max_fitness:
      break

  return population[0], fitnesses[0]


if __name__ == "__main__":
  iterations = 20
  population_size = 200
  elites_size = 5
  tournament_size = 10
  mutation_rate = 0.1

  genes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
  genome_length = 10
  fitness_func = lambda x: sum(x)

  best, best_fitness = optimise(iterations, genes, genome_length, fitness_func
                                , population_size, elites_size, tournament_size, mutation_rate)
  print(f"Best: {best} with Fitness: {best_fitness}")
