Mathematically, optimisation can be thought of as finding the minimum value of some function, \(f(x)\), subject to some conditions, \(c_{i}(x)\), i.e. $$ \text{minimise} \enspace f(x)$$ $$ \text{subject to} \enspace c_{i}(x) < b_{i} $$ If we want to maximise a function we can always minimise its negative, \(-f(x)\). For example, we may want to find the minimum of \(f(x)=(x-1)^2\) given the condition \(c_{1}(x) = x \leq 0\). The unconstrained minimum of \(f\) is \(0\) at \(x=1\), but since we have the condition that \(x\) must be negative or zero, the constrained minimum is instead \(f(0)=1\) at \(x=0\).
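As a concrete (if brute-force) sanity check, a simple grid search over the feasible region \(x \leq 0\) recovers this constrained minimum; the grid bounds and spacing here are arbitrary choices for illustration.

```python
def f(x):
    return (x - 1) ** 2

# Candidate x values satisfying the constraint x <= 0 (grid from -5.00 to 0.00)
candidates = [i / 100 for i in range(-500, 1)]

# The best feasible point on the grid
best = min(candidates, key=f)
print(best, f(best))  # x = 0.0 with f(0) = 1.0: the constrained minimum
```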

A genetic optimisation algorithm is inspired by the process of evolution. We have a "fitness score", which is the function to be minimised or maximised, and we maintain a "population" of solutions \(\mathcal{X}_{t} = \{x_{1},x_{2},\ldots,x_{N}\}\) which, over time, we select, breed, and mutate. The hope is that, just as in evolution, the population will in time converge to only highly scoring solutions.

Each iteration, \(t\), starts with a population of proposal solutions \(\mathcal{X}_{t}=\{x_{1},x_{2},\ldots,x_{N}\}\). When \(t=0\) (the first iteration) we set the population to some randomly chosen values, say \(\mathcal{X}_{0}=\{10,-5,2,-56\}\).
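The random initialisation can be sketched as follows; the sampling range \([-100, 100]\) and the population size are arbitrary assumptions for illustration.

```python
import random

random.seed(0)  # fixed seed so the example is reproducible

N = 4  # population size (assumed)
# Randomly initialised population, analogous to X_0 = {10, -5, 2, -56} above
population = [random.randint(-100, 100) for _ in range(N)]
print(population)
```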

Repeating this process, we will eventually converge to the highest scoring solution, \(0\).
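Putting the pieces together, here is a minimal sketch of the full loop, assuming a fitness \(f(x) = -x^2\) whose best score is \(0\) at \(x=0\); the midpoint crossover, Gaussian mutation, and elitism step are all illustrative choices, not details fixed by the text above.

```python
import math
import random

def fitness(x):
    return -x * x  # assumed example score: maximised (value 0) at x = 0

def select_index(scores, beta_s=1.0):
    # Boltzmann selection: p(x) proportional to exp(beta_s * f(x))
    m = max(scores)  # shift by the maximum for numerical stability
    weights = [math.exp(beta_s * (s - m)) for s in scores]
    return random.choices(range(len(scores)), weights=weights)[0]

def evolve(population, generations=100, beta_s=1.0):
    for _ in range(generations):
        scores = [fitness(x) for x in population]
        # Elitism: carry the current best member over unchanged
        new_pop = [population[scores.index(max(scores))]]
        while len(new_pop) < len(population):
            a = population[select_index(scores, beta_s)]
            b = population[select_index(scores, beta_s)]
            child = 0.5 * (a + b)          # breed: midpoint crossover
            child += random.gauss(0, 0.5)  # mutate: small Gaussian noise
            new_pop.append(child)
        population = new_pop
    return population

random.seed(1)
pop = evolve([10.0, -5.0, 2.0, -56.0])
best = max(pop, key=fitness)
```

Because the elitism step never discards the best member, the best score is non-decreasing from one generation to the next.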

A note: the separation of

A better choice, and certainly a more general one, is to imagine a probability \(p(\boldsymbol{x},t)\) of the selection or rejection of \(\boldsymbol{x}\) at time \(t\). If we say \(p(\boldsymbol{x},t)\) is the probability of removing \(\boldsymbol{x}\) from the population, it makes sense to ask that this probability is large when the fitness is low and small when the fitness is high. One popular choice is to use a Boltzmann distribution, defined as $$ p(x) = \frac{e^{-\beta x}}{\sum_{x}e^{-\beta x}} $$ where the parameter \(\beta\) is the inverse temperature: the larger \(\beta\) is, the more the probability mass concentrates on small values of \(x\).
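A removal distribution of this form can be sketched directly, substituting the fitness \(f(\boldsymbol{x})\) for \(x\) in the exponent and normalising over the current population's scores; the function name and example scores are my own.

```python
import math

def boltzmann_removal_probs(scores, beta):
    """p(x) proportional to exp(-beta * f(x)): the lower the fitness, the likelier removal."""
    m = min(scores)  # shift by the minimum so exponents stay bounded (stability only)
    weights = [math.exp(-beta * (s - m)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Lower-scoring members get higher removal probability
probs = boltzmann_removal_probs([1.0, 2.0, 3.0], beta=1.0)
```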

For selection we want the probability to be high when the score is high, which we can achieve by using a negative temperature, or equivalently by defining a \(\beta_{s}\geq 0\) and dropping the \(-\)'s in the exponential terms, like so $$ p_{selection}(\boldsymbol{x},t) = \frac{e^{\beta_{s} f(\boldsymbol{x})}}{\sum_{\boldsymbol{y}\in \mathcal{X}_{t}}e^{\beta_{s} f(\boldsymbol{y})}},\enspace \beta_{s} \geq 0 $$ The sum is now over \(\mathcal{X}_{t}\) to represent the fact that we are normalising over the population at time \(t\). This means we are considering how "fit" a given \(\boldsymbol{x}\) is relative to the rest of the current population.
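The selection probability differs only in the sign of the exponent; a sketch (the shift by the maximum score is a numerical-stability detail, not part of the formula):

```python
import math

def boltzmann_selection_probs(scores, beta_s):
    """p(x) proportional to exp(beta_s * f(x)): the higher the fitness, the likelier selection."""
    m = max(scores)  # shift by the maximum so the exponents never overflow
    weights = [math.exp(beta_s * (s - m)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Higher-scoring members get higher selection probability
probs = boltzmann_selection_probs([1.0, 2.0, 3.0], beta_s=1.0)
```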

Breeding is more interesting because we need a probability over pairs of solutions, i.e. a joint distribution \(p_{breed}(\boldsymbol{x},\boldsymbol{y},t)\) rather than a probability over single members.

Note this will allow a member of the population to breed with itself! Indeed the most probable breeding pair is the top scoring member with itself. This can be left as is, or countered by setting the probability to zero when \(\boldsymbol{x}=\boldsymbol{y}\); to do this the joint probability can be multiplied by a factor \(1-\delta_{\boldsymbol{x}\boldsymbol{y}}\), where \(\delta_{\boldsymbol{x}\boldsymbol{y}}\) is the Kronecker delta (equal to \(1\) when \(\boldsymbol{x}=\boldsymbol{y}\) and \(0\) otherwise), with the normalisation adjusted to match $$ p_{breed}(\boldsymbol{x},\boldsymbol{y},t) = \bigl(1-\delta_{\boldsymbol{x}\boldsymbol{y}}\bigr)\frac{e^{\beta_{s} f(\boldsymbol{x})}+e^{\beta_{s} f(\boldsymbol{y})}}{\sum_{\boldsymbol{x'}\in \mathcal{X}_{t}}\sum_{\boldsymbol{y'}\in \mathcal{X}_{t}}\bigl(1-\delta_{\boldsymbol{x'}\boldsymbol{y'}}\bigr)\biggl(e^{\beta_{s} f(\boldsymbol{x'})}+e^{\beta_{s} f(\boldsymbol{y'})}\biggr)},\enspace \beta_{s} \geq 0 $$
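For a small population the pair distribution, with the diagonal zeroed out to forbid self-breeding, can be enumerated explicitly; the Kronecker-delta factor becomes a simple `i == j` skip, and the names and example values are illustrative.

```python
import math

def breeding_pair_probs(population, scores, beta_s):
    """p(x, y) proportional to (1 - delta_xy) * (exp(beta_s f(x)) + exp(beta_s f(y)))."""
    m = max(scores)  # shift for numerical stability
    w = [math.exp(beta_s * (s - m)) for s in scores]
    pairs, weights = [], []
    for i in range(len(population)):
        for j in range(len(population)):
            if i == j:
                continue  # the (1 - delta) factor: no self-breeding
            pairs.append((population[i], population[j]))
            weights.append(w[i] + w[j])
    total = sum(weights)
    return pairs, [wt / total for wt in weights]

# Example with scores from the assumed fitness f(x) = -x^2
pairs, probs = breeding_pair_probs([10.0, -5.0, 2.0], [-100.0, -25.0, -4.0], beta_s=0.1)
```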