Differential evolution

The following is a re-upload of a post in my previous blog (which was destroyed by mistake).


Came across an interesting idea during ICAPS 2016. I first learnt of the term "differential evolution" from the Q&A sessions of 2~3 talks. It turns out to be a genetic algorithm for optimising multi-dimensional real-valued functions that are not readily differentiable1.

To my pleasant surprise, there was actually a talk on it. These guys2 adapted differential evolution to a finitely generated algebraic group setting to optimise a permutation problem (i.e. Find the best permutation which optimises the function $f($permutation$)$)3.

In this post, I will cover the following:

  • Differential evolution
  • Genetic operations
  • Genetic algorithm parameters
  • Adaptation to finitely generated algebraic groups

Differential evolution

If you know genetic algorithms, differential evolution is not much different. It has a few main steps (that are amenable to tons of variations). For explanation's sake, let us use the function $f(x,y) = -((x+5)^2 + 3 * \sqrt{|y+2| + 7})$ as an example.

  • Encode parameters to be optimised into an agent/gene.
    E.g. Values of $x$ and $y$. An agent could then be something as simple as a tuple such as [x = 3, y = 2].

  • At all times, maintain a population of $N$ agents.
    E.g. We always maintain 100 possible value assignments to $x$ and $y$.

  • At each iteration/generation, perform some genetic operations to modify the population.
    The goal is to emulate genetic modification and "survival of the fittest" in order to increase the "overall fitness level" of the population. The hope is that the "best fitness" also increases as a side effect.

  • Terminate the iteration loop when certain terminating condition(s) is/are satisfied. One common terminating condition is to fix a maximum number of iterations, so that the program doesn't run forever.

  • Use the "fittest" agent as the solution to the optimisation problem.

The fitness of an agent is the value of $f($parameters encoded by agent$)$. Depending on whether it is a minimisation/maximisation problem setting, the lower/higher the value, the fitter the agent.
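
Putting these steps together, a minimal sketch of the loop in Python might look like the following, using the example $f(x, y)$ above. The jitter-and-keep step inside the loop is just a stand-in for the genetic operations described in the next section, and every helper name here is my own.

```python
import math
import random

def f(x, y):
    # The example objective: f(x, y) = -((x + 5)^2 + 3 * sqrt(|y + 2| + 7))
    return -((x + 5) ** 2 + 3 * math.sqrt(abs(y + 2) + 7))

def fitness(agent):
    # An agent is simply a tuple of the encoded parameters (x, y);
    # higher is fitter since we are maximising f.
    return f(*agent)

def evolve(n_agents=100, max_iterations=1000):
    # Maintain a population of N agents, each a random assignment to (x, y).
    population = [(random.uniform(-50, 50), random.uniform(-50, 50))
                  for _ in range(n_agents)]
    for _ in range(max_iterations):  # terminate after a fixed number of iterations
        new_population = []
        for x, y in population:
            # Stand-in genetic operation: jitter the agent, then keep the fitter
            # of the original and the jittered copy ("survival of the fittest").
            candidate = (x + random.gauss(0, 1), y + random.gauss(0, 1))
            new_population.append(max((x, y), candidate, key=fitness))
        population = new_population
    return max(population, key=fitness)  # the fittest agent is the answer

print(evolve())  # should end up close to (x, y) = (-5, -2)
```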

Genetic operations

The 2 common ones are crossover and mutation.

Crossover

Crossover is the act of combining 2 or more agents into a hybrid one. This is usually done by slicing up the parent agents and recombining the pieces.

For example, given [x = 3, y = 2] and [x = 10, y = -5], we can produce 2 possible hybrid agents [x = 3, y = -5] and [x = 10, y = 2]. There are then many ways to decide which 2 of these 4 agents join the next generation of the population - one can keep the 2 "fittest", or make some kind of weighted random choice.
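
A concrete sketch with the tuple agents from the running example (the helper name crossover is mine, not from any library):

```python
def crossover(agent1, agent2):
    # One-point crossover on 2-tuples: swap the y values to form two hybrids.
    return (agent1[0], agent2[1]), (agent2[0], agent1[1])

print(crossover((3, 2), (10, -5)))  # ((3, -5), (10, 2))
```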

Mutation

Mutation is the act of modifying one or more parts of a single agent, while keeping the other parts intact.

For example, given [x = 3, y = 2], we can produce a mutated agent [x = 5, y = 2] if we choose to modify the $x$ value. In higher-dimensional settings, one may choose to modify more than 1 parameter in a single mutation step as well.
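
A matching sketch for mutation (again with my own helper name; Gaussian noise is just one of many possible mutation rules):

```python
import random

def mutate(agent, index=None):
    # Perturb the coordinate at `index` (picked at random if not given),
    # keeping the other coordinates intact.
    agent = list(agent)
    if index is None:
        index = random.randrange(len(agent))
    agent[index] += random.gauss(0, 2)
    return tuple(agent)

print(mutate((3, 2), index=0))  # e.g. (4.7, 2): only the x value was modified
```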

Differential evolution (Wikipedia's version)

In differential evolution, for every agent $x$,

  • 3 other distinct agents $a$, $b$, and $c$ are picked (all different from $x$ and from each other)
  • A hybrid agent $z$ is then formed via $z = a + F * (b-c)$. $F$ is a parameter.
  • A fixed mutation point $r$ is chosen (i.e. At this point, the new agent will always be mutated)
  • A new agent $y$ is formed as a mixture of values from $x$ and $z$, depending on the parameter $CR$ and on $r$ (from the previous step)
  • The fitter of $x$ (the original agent) and $y$ (the newly formed agent) is put into the next generation of the population

In this case, crossover and mutation are kind of performed together. Crossover occurs in the formation of $z$ and mutation is forced to occur at position $r$. The result of these operations yields $y$. The selection process is then a greedy "survival of the fitter" between $x$ and $y$.
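
A minimal sketch of this per-generation update for tuple agents in a maximisation setting; it follows Wikipedia's description rather than any particular library, and the function name is mine.

```python
import random

def de_step(population, fitness, F=0.8, CR=0.9):
    # One generation of differential evolution (Wikipedia-style), maximising `fitness`.
    dims = len(population[0])
    new_population = []
    for i, x in enumerate(population):
        # Pick 3 other agents a, b, c, all distinct from x and from each other.
        others = [p for j, p in enumerate(population) if j != i]
        a, b, c = random.sample(others, 3)
        # "Crossover": the hybrid z = a + F * (b - c), computed coordinate-wise.
        z = tuple(a[d] + F * (b[d] - c[d]) for d in range(dims))
        # Forced mutation point r: position r always takes its value from z.
        r = random.randrange(dims)
        # Mix x and z: take z[d] at d == r or with probability CR, else keep x[d].
        y = tuple(z[d] if (d == r or random.random() < CR) else x[d] for d in range(dims))
        # Greedy selection: the fitter of x and y goes into the next generation.
        new_population.append(max(x, y, key=fitness))
    return new_population
```

With `fitness` from the earlier sketch, repeatedly calling `de_step` on the population gives the full loop.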

Genetic algorithm parameters

These parameters are used to tune the genetic algorithm. They vary greatly and usually depend on the choice of genetic operators employed. Common examples include:

  • Number of agents maintained in a population
  • Maximum number of iterations
  • Number of slicing points during crossover
  • Probability of performing mutation
  • Number of points of mutation
  • Etc.

Differential evolution (Wikipedia's version)

  • $N$: Size of the population
    In general, the larger the population, the more diverse it tends to be, but each iteration will take longer. Due to the genetic operation procedure, we need $N \geq 4$ in order to be able to pick 4 unique agents at each step.

  • $F$: Differential weight
    This parameter controls how the "crossover" step ($z = a + F * (b-c)$) is done. In my code, I named this $DW$ to avoid confusion with the input function $f$.

  • $CR$: Crossover probability
    This parameter controls how many values will be from the newly generated agent $z$. The other values will come from the previous agent $x$.
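
For experimentation, some often-quoted starting points from the general DE literature look roughly like this; treat them as assumptions to tune, not values taken from the talk or the paper.

```python
# Typical starting values for the Wikipedia-style DE above (tune per problem).
N = 50                 # population size; must be at least 4
DW = 0.8               # differential weight F, usually chosen in [0, 2]
CR = 0.9               # crossover probability, in [0, 1]
MAX_ITERATIONS = 1000  # simple termination condition
```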

Adaptation to finitely generated algebraic groups

In their own words:

The main contribution is the algebraic design of the differential mutation operator which allows to extend the "contour matching" property of numerical DE to every combinatorial search space representable by a finitely generated group.

By their account, they are the first to do so. The experimental results were quite encouraging as well - it beat several state-of-the-art results on a widely accepted benchmark suite.

They modified differential evolution as follows:

  • Old agent $x$ is labelled as $\pi$.
  • No "pick a random index to force mutation"
  • Adaptive $F$ value, which can change across iterations. This idea is based on this paper, where $F$ changes based on provided $f_l$, $f_u$ and $\tau_1$ parameter values (a sketch of such a rule follows this list).
  • Modify $a + F * (b-c)$ to work on finitely generated groups to yield a new agent $v$
  • Two-point splicing crossover between $v$ and $\pi$ to produce 2 agents $v'$ and $v''$, then pick the fitter of the 2.
  • Select between old agent $\pi$ and $v'$/$v''$ to enter the next generation of population
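
The post does not spell out the adaptive rule, but $f_l$, $f_u$ and $\tau_1$ are the parameter names used by a well-known self-adaptive scheme (jDE), which resamples each agent's $F$ roughly as sketched below; whether the referenced paper uses exactly this rule is best checked against the paper itself.

```python
import random

def adapt_F(F_old, f_l=0.1, f_u=0.9, tau_1=0.1):
    # jDE-style rule (defaults are the commonly used values): with probability
    # tau_1, resample F uniformly from [f_l, f_l + f_u]; otherwise keep it.
    # Each agent carries its own F value across iterations.
    if random.random() < tau_1:
        return f_l + random.random() * f_u
    return F_old
```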

Details are in their pseudocode (Figures 1 and 2 in the paper).

Most of the modifications are pretty straightforward. I think the most interesting part is: Modify $a + F * (b-c)$ to work on finitely generated groups to yield a new agent $v$.

Consider the group of length-4 binary sequences (0000, 0001, 0010, 0011, ... , 1110, 1111) generated by the four single-bit flips $f_1, \ldots, f_4$ (bit positions numbered from the right). Its Cayley graph with respect to these generators is the 4-dimensional hypercube, where each edge corresponds to flipping a single bit.

Now, $a + F * (b-c)$ works as follows:

  • Pick 2 agents $b$ and $c$, say 1100 and 1011
  • Find the shortest sequence of operations (bit flips) that transforms one into the other. This may not be unique.
    In this example, we chose $z = b - c = \langle f_3, f_1, f_2 \rangle$, which has length 3.
  • Suppose $F = \frac{1}{2}$, then $k = \lceil F * |z| \rceil = \lceil \frac{3}{2} \rceil = 2$. So, $F * (b-c) = F * z = \langle f_3, f_1 \rangle$.
    Geometrically, this corresponds to a rounded-up truncation of the operation sequence.
  • Pick another agent $a$, say 0000
  • Perform the operation $F*z$ on it to get the new mutated agent.
    In the example, we get: 0101
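
A minimal sketch of this truncated-difference operation on the bit-flip group, with bit positions numbered from the right as in the example; all function names are mine, and since shortest flip sequences are not unique, the result can legitimately differ from the worked example above.

```python
import math

def flip(bits, position):
    # Apply generator f_position: flip the bit at `position` (1-indexed from the right).
    return bits ^ (1 << (position - 1))

def difference(b, c, length=4):
    # b - c: one shortest sequence of flips transforming c into b.
    # Shortest sequences are not unique; this one lists positions in increasing
    # order, so it may differ from the ordering chosen in the worked example.
    return [p for p in range(1, length + 1) if (b ^ c) & (1 << (p - 1))]

def scale(F, flips):
    # F * z: keep only the first ceil(F * |z|) flips of the sequence.
    return flips[:math.ceil(F * len(flips))]

def apply_flips(a, flips):
    # a + F * (b - c): apply the truncated flip sequence to agent a.
    for p in flips:
        a = flip(a, p)
    return a

b, c, a = 0b1100, 0b1011, 0b0000
z = difference(b, c)               # flips at positions 1, 2, 3
v = apply_flips(a, scale(0.5, z))  # keep ceil(3/2) = 2 flips, apply them to a
print(format(v, '04b'))            # prints 0011 here; the post's ordering <f_3, f_1> gives 0101
```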

Going further

I have written a Python3 implementation based on Wikipedia's pseudocode here. Feel free to play around with it with different functions $f$. The agent representation is just a tuple containing the parameter values. You may also wish to implement the finite group variant after reading the paper2.

References and Footnotes

  1. If a function is differentiable, one would employ gradient descent techniques to optimise it.

  2. Paper: Algebraic Differential Evolution Algorithm for the Permutation Flowshop Scheduling Problem with Total Flowtime Criterion

  3. The actual problem was to find the best sequence of jobs to execute, which reduces to "the best permutation of job IDs".