1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296
/*!
# Genetic Algorithm Scheduler
This is a multi-dimensional genetic algorithm based optimizer written in rust with multi-threading.
## Distinguishing Features
- supports highly dimensional fitness scores with multiple mechanisms to order candidates
- does not require a stable ordering between candidates
- supports both constraint violations and fitness scores
- takes advantage of const generics
This optimizer was designed for a scheduling system, and will work best for problems that have similar characteristics. You don't have to use it for shift scheduling, it should work well for anything that meets the following characteristics:
- there are a large number of fitness functions rather than a single number or even a few numbers to be optimized. In our case, how well the schedule fits each preference for each employee is treated as an fitness function. Given that the employee can specify a suitability for each shift, this means that the number of fitness scores is larger than the length of the chromosone.
- variables are discrete rather than continuous. In our case, there are a small fixed number of employees that can be booked per shift.
- the "chromosone" has a fixed length.
- the position in the chromosone (aka the locus) is significant. (In some other problem domains, the locus is less significant and the ordering of the genes is more significant). This limitation can be overcome by writing significantly different [FitnessFunction]'s.
- fitness functions are relatively cheap to compute. Parallelism is at the population rather than the game level.
Some of these characteristics are more hard coded than others. For example, you can adjust the [FitnessFunction]'s, [Crossover] and [Mutation] functions to make genome subsequences more significant than locus position.
## Terminology, Structure and Algorithm.
Every potential solution is called a [Candidate]. Each [Candidate] has a chromosone which defines the solution. The chromosone is an array of unsigned integers called genes. Each gene occupies a position in the array called the locus. For ease of computation, genes and locii are not enums, but adjacent unsized integers. Mapping these integers to values is outside the scope of this library. See the examples for details.
At the time of creation each [Candidate] evaluates all of its [FitnessFunction]'s to determine the [Candidate]'s suitability. Alongside the [FitnessFunction]'s, [Constraint]'s are also evaluated and number of [Constraint] violations stored.
Because each [Candidate] has many [FitnessFunction] values, [Candidate]'s cannot be trivially and stably ordered. Instead, two candidates are compared in a [Game], and repeated [Game]'s are run across a population in a [Tournament] to order the [Candidate]'s by [FitnessFunction]'s and [Constraint]'s.
All current [Game] implementations order by [Constraint] violations before [FitnessFunction] scores. In other words, only if two candidates have the same number of constraint violations are the fitness scores considered. Most [Game]'s and [Tournament]'s are not stable, and have a strong stochastic component.
A set of [Candidate]'s is called a population.
Each cycle through the algorithm is called a [Gas::generation]. A generation creates a new population from an existing population. Each generation consists of the following steps:
1. Run a [Tournament] to order the candidates.
2. Loop for each new child:
1. Select two parents. Parent selection is biased by [Tournament] score and prefers selecting dissimilar parents.
2. Choose a [Crossover] algorithm to run on the two parents to create a child.
3. Choose a [Mutation] algorithm to run on the child
Typically the GA is tuned so that Null [Crossover] and [Mutation] algorithms are
often chosen. The Null algorithms simply clone a parent rather than performing a
crossover or mutation.
Selection of the [Crossover] and [Mutation] algorithms are not done stochastically. Weights are configured for each algorithm, and the algorithms are cycled respecting the weights. Therefore if the sum of the weights in [Gas::crossovers] have common factors with the sum of the weights in [Gas::mutations], some combinations of crossover and mutation may never be chosen. In some cases you may wish to deliberately trigger this effect, but in most cases you should probably ensure that the both sum of weights do not have a common factor.
Looping the [Gas::generation] until stagnation is reached is called a [Gas::cycle]. There are three stages to each cycle: seeding, running and finalizing. See the docs for [Gas::cycle] for more details
Multiple [Gas::cycle]'s are run in parallel in a [pool]. The pool may be terminated
early by setting the [CycleProgress::sigint] flag. When a [Gas::cycle] stops TBD.
# Setup and Configuration
There is a simple full example in examples/schedule and a minimal configuration in the `#[cfg(test)] Gas::dut()`.
## Example
Let's play Mastermind! We've got 6 colors and 4 positions, so the chromosone type parameters are `<N=4, NSYMS=6>`.
If we were really playing the game, our two fitness functions would be the number of white pegs and number of black pegs. However, we'll use this example to show why it is advantageous to have a lot of fitness functions.
So instead of having the number of black pegs as the fitness function, we'll return 4 separate fitness functions for each possible black peg, and 6 separate fitness functions for each possible white peg.
Here's our fitness function for the black pegs. We'll use the delta between the desired and actual as the score. For delta, 0 is the best score, but GAS optimizes for highest score. So for the score here we return the negated delta -- zero is the highest negative number.
```
#
# use gas::chromosone::Gene;
#
# use gas::fitness::{FitnessFunction, FitnessName};
#
pub struct Black<const N: usize, const NSYMS: usize> {
pub answer: [Gene; N]
}
impl<const N: usize, const NSYMS: usize> FitnessFunction<N, NSYMS> for Black<N, NSYMS> {
fn nscores(&self) -> usize { N }
fn run(&self, chromosone: &[Gene; N]) -> Vec<f64> {
std::iter::zip(chromosone, self.answer).map(|(guess, desired)|
-(*guess as f64 - desired as f64).abs()
).collect()
}
}
assert_eq!(Black::<4,6>{answer:[4,3,2,1]}.run(&[4,0,2,0]), vec![0.0, -3.0, 0.0, -1.0]);
```
For the white scores, we could similarly create a [FitnessFunction] that returns 6 scores, counting each gene and returning how close each count is to the count in the answer. But there is already a FitnessFunction that does this in the library: [fitness::ColorCount].
Once you have the chromosone and the [`FitnessFunction`] defined, you can configure the optimizer by creating a [Gas] object.
```
#
# use gas::chromosone::Gene;
#
# use gas::fitness::{FitnessFunction, FitnessName};
#
# pub struct Black<const N: usize, const NSYMS: usize> {
# pub answer: [Gene; N]
# }
#
# impl<const N: usize, const NSYMS: usize> FitnessFunction<N, NSYMS> for Black<N, NSYMS> {
# fn nscores(&self) -> usize { N }
#
# fn run(&self, chromosone: &[Gene; N]) -> Vec<f64> {
# std::iter::zip(chromosone, self.answer).map(|(guess, desired)|
# -(*guess as f64 - desired as f64).abs()
# ).collect()
# }
# }
#
# use gas::Gas;
# use gas::fitness::{FitnessConfig, self};
# use gas::constraints::{ConstraintConfig, self};
# use gas::game;
# use gas::mutation::{MutationConfig, self};
# use gas::tournaments;
# use gas::crossover::{self, CrossoverConfig};
let gas = Gas {
fitness: FitnessConfig::new(vec![
Box::new(Black::<4, 6>{answer: [4,3,2,1]}),
Box::new(fitness::ColorCount::<4,6>::new(1, vec![0], vec![vec![0],vec![1],vec![1],vec![1],vec![1],vec![0]], &[""], 1.0)),
]),
constraints: ConstraintConfig::new(vec![]),
cycle_tournament: Box::new(tournaments::SingleElimination::new(game::Full::new())),
final_tournament: Box::new(tournaments::FullSeason::new(game::Full::new())),
crossovers: CrossoverConfig::new(vec![(1, Box::new(crossover::Null::new()))]),
mutations: MutationConfig::new(vec![
(1, Box::new(mutation::Null::new())),
(1, Box::new(mutation::Mutate::<4, 6>::new(1))),
(1, Box::new(mutation::Rotate::<4, 6>::new(1))),
]),
taboo_distance: 1,
population_size: 10,
};
```
## The Chromosone
The chromosone is configured through const generics. Virtually every object in the system takes two const generics as template parameters: `N` and `NSYMS`. `N` is the length of the chromosone, and `NSYMS` is the number of genetic symbols there are. The chromosone is defined as `[Gene; N]` where Gene is in the range `0..NSYMS`. In chromosone.rs Gene is defined as a u8 since there are very few genetic algorithm problems where NSYMS is greater than 256. Theoretically the `pub type Gene = u8` could be templated as well, if you need it done, feel free to ask or open a PR.
## The `Gas` object
The algorithm is configured by creating a [Gas] object. This object specifies the fitness functions, constraints, mutation and crossover algorithms, et cetera.
### Fitness Functions
It's quite likely that the available fitness functions will not be suitable for your project. A good set of fitness functions are critical to the success of your optimization. It's easy to spend lots of effort on algorithm tuning, but without a solid base you're unlikely to get good results.
[`FitnessFunction`]'s return a Vec of floats for the scores, taking a chromosone as input.
### Constraints
[`Constraint`]'s are much like fitness functions, except they return vectors of booleans, specifying whether the chromosone is valid or invalid. Any candidates which have more constraint violations than others will lose any competitions. A boolean doesn't provide much guidance to the optimizer, so providing a fitness function that will indicate if a chromosone is close to violating a constraint will be helpful.
### Crossovers and Mutations
The set of [`Crossover`] and [`Mutation`] operators provided in the example are probably a good starting point for your problem. One important consideration is to provide a good number of Null operators for both.
### Games and Tournaments
A [`Game`] replaces the simple fitness score competition in most Genetic Algorithms, so it is an interesting area for experiment. The one used in the example worked best for us. A [`Tournament`] is used to rank candidates.
### Taboo Distance
This ensures genetic diversity
### Stagnation parameters
There are some constants in the [`Gas::cycle`] function that are used to determine when a cycle of the algorithm has stagnated. They perhaps should be parameterized, but for now can
### nthreads and the Pool
Once you have a [`Gas`] configured, you can start it via [`Gas::cycle`], or instead you can set up [`Pool`] to run several populations in parallel. [Gas::cycle] takes a [CycleProgress] as a parameter -- you can monitor the progress of the optimizer by monitoring the [CycleProgress] from a separate thread.
```
#
# use gas::chromosone::Gene;
#
# use gas::fitness::{FitnessFunction, FitnessName};
#
# pub struct Black<const N: usize, const NSYMS: usize> {
# pub answer: [Gene; N]
# }
#
# impl<const N: usize, const NSYMS: usize> FitnessFunction<N, NSYMS> for Black<N, NSYMS> {
# fn nscores(&self) -> usize { N }
#
# fn run(&self, chromosone: &[Gene; N]) -> Vec<f64> {
# std::iter::zip(chromosone, self.answer).map(|(guess, desired)|
# -(*guess as f64 - desired as f64).abs()
# ).collect()
# }
# }
#
# use gas::Gas;
# use gas::fitness::{FitnessConfig, self};
# use gas::constraints::{ConstraintConfig, self};
# use gas::game;
# use gas::mutation::{MutationConfig, self};
# use gas::tournaments;
# use gas::crossover::{self, CrossoverConfig};
#
# let gas = Gas {
# fitness: FitnessConfig::new(vec![
# Box::new(Black::<4, 6>{answer: [4,3,2,1]}),
# Box::new(fitness::ColorCount::<4,6>::new(1, vec![0; 4], vec![vec![0],vec![1],vec![1],vec![1],vec![1],vec![0]], &[""], 1.0)),
# ]),
# constraints: ConstraintConfig::new(vec![]),
# cycle_tournament: Box::new(tournaments::SingleElimination::new(game::Full::new())),
# final_tournament: Box::new(tournaments::FullSeason::new(game::Full::new())),
# crossovers: CrossoverConfig::new(vec![(1, Box::new(crossover::Null::new()))]),
# mutations: MutationConfig::new(vec![
# (1, Box::new(mutation::Null::new())),
# (1, Box::new(mutation::Mutate::<4, 6>::new(1))),
# (1, Box::new(mutation::Rotate::<4, 6>::new(1))),
# ]),
# taboo_distance: 1,
# population_size: 10,
# };
# use std::sync::atomic::AtomicBool;
# use std::sync::Arc;
# use gas::gas::cycle::CycleProgress;
let sigint = Arc::new(AtomicBool::new(false));
let mut progress = CycleProgress::<4, 6>::new(&gas, &sigint);
let solution = gas.cycle(&mut progress);
assert_eq!(solution.chromosone, [4,3,2,1]);
```
# References
<https://www.researchgate.net/profile/Marek-Gutowski/publication/216486175_Biology_Physics_Small_Worlds_and_Genetic_Algorithms/links/09e415003ec1521171000000/Biology-Physics-Small-Worlds-and-Genetic-Algorithms.pdf?origin=publication_detail>
<https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7259647/>
# Future
## unwrap
This library makes extensive use of unwrap. This can be justified by the fact that the data given to it is programmatic and the configuration set at compile time, so theoretically pretty much any error is programmatic rather than usage. But really it's just because unwrap style programming is easier. I do plan on eventually putting in proper error handling, but there are always higher priorities. I deliberately used unwrap to make searching and replacing them easier. Although technically there are a lot of vector and array accesses using `[]` that would be safer using `.get()`.
## further development
This library was designed to be used in a multi-population network. Some effort was made to implement a primitive network topology between the populations, but this did not improve the results and was abandoned. This should be explored in more depth. See <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7259647/> for a discussion of potential network topologies.
*/
pub mod candidate;
pub mod chromosone;
pub mod constraints;
pub mod crossover;
pub mod fitness;
pub mod game;
pub mod gas;
pub mod helpers;
pub mod mutation;
pub mod pool;
pub mod rando;
pub mod tournaments;
pub use crate::gas::Gas;
#[cfg(doc)]
use crate::gas::cycle::CycleProgress;
#[cfg(doc)]
use candidate::Candidate;
#[cfg(doc)]
use constraints::Constraint;
#[cfg(doc)]
use crossover::Crossover;
#[cfg(doc)]
use fitness::FitnessFunction;
#[cfg(doc)]
use game::Game;
#[cfg(doc)]
use mutation::Mutation;
#[cfg(doc)]
use pool::Pool;
#[cfg(doc)]
use tournaments::Tournament;