30 Jan. 2012
Genetic Search Algorithms are used to solve problems by computing in ways a human wouldn't consider. It purposefully uses a biological genetics approach, creating a survival of the fittest scenario based on a mostly random starting point.
This article provides background information about the algorithm and several variations, based on our own implementation in javascript for our Zjramble5 word search game. We also show code examples from that implementation.
Our implementation of the Genetic Search Algorithm is highly specific. Though the generic, canonical variation can be used for lots of search problems, we found that some variations worked better for our Zjramble5 game than others.
The Question
Our game gives the player a board filled with 16 letters: 4 letters horizontal by 4 letters vertical. If we limit the words to those between 3 and 9 letters in length (inclusive), there's approximately 720,000 unique combinations.
Somehow we must map a random choice of words to a letter board, in such a way that gives the player the opportunity to achieve as high a score as possible within the available time.
The Search Domain
The words we wished to find come from an extensible vocabulary. In our case, the vocabulary is transmitted as JSON. The game should neither offer nor allow other words than those in the vocabulary (or the chosen category there-in).
Our vocabularies don't contain 720,000 words... the largest one contains close to 1800.
Setting up the Genetic Search
We need:
A pool of genes in Javascript is as simple as declaring an array variable.
Done.
To generate random genes, we must first define what constitutes a gene. Since our game sports a letter board that requires 16 letters, we decided to define our gene as a structure that holds 16 letters in a run. Later on, we add some additional information.
Now those who know the canonical version of the genetic search may wonder why we didn't encode the gene run to bits, rather than letters. The reason for that is complexity: it would have required additional encoding and decoding methods to figure out a gene's letters and determining its survival fitness. Methods that take time. Time is limited in our game. Can't make the player wait minutes for a new game board.
So we created a method to generate gene runs. Its outcome should be an array of 16 letters. What should be its source? When we started, we took the English alfabet. Simple enough. Later on, we specialised that and said, no, we can't take every letter without bias. We should first determine some random selection of words from the vocabulary, to serve as a search domain, and reduce that into a weighed selection of at least 16 letters. The outcome there-of, we pass to a run generator:
(Provide your own error checks and handling: above method holds no handling for when Letters.length < 16.)
Next we wish to determine survival fitness per gene. It tells us, how often this gene should spawn children. Fitness in our game is calculated by the amount of words and the word score, that can be created from the gene run, based on the previously selected words. Sounds easy enough.. but to do that, we need a method that will create and test words like a player would... we need a path walker. Each path within the letter grid needs testing to every single word from the selection. With close to 720,000 paths one can imagine we tried several optimizations.
Then we totalled the scores and word amounts and weighed how well each gene faired compared to the total. That would indicate its fitness. Fitness is translated into an amount of times that this gene will be present in the spawn roulette. The greater its fitness, the more copies go into the roulette, the better chance it has to be selected as a parent for procreation, thus the more likely it will pass on its genetic make-up.
After filling the spawn roulette, we must look ahead: the new spawn is going to end up in the same gene pool. What can we do to prevent infinite growth? We must deplete the pool. Just like in biology, specimens die. In programming, we have several options:
Now we have a roulette filled with weighed genes. We perform a stochastic remainder sampling, which means that we randomly remove 2 genes from the roulette and spawn those into 2 children which we add to the gene pool, and repeat that until either the roulette is empty or the pool reached a desired gene count.
Because the spawn roulette is filled with multiple copies of fit genes, based on each gene's fitness, selecting 2 random genes from the roulette will at some point result in receiving 2 copies of the same gene. We chose to allow for that chance.
Spawning child genes from parents takes 3 forms: a random cross-over, a random flip, and a random mutation. The random cross-over is executed by defining a random point between gene run start and end. Into the first part, we copy an equal part from the father. Into the second part, we copy an equal part from the mother. Spawning 2 genes from the same parents, but switched around (like the code example above shows), creates more chances of fitness.
For the random mutation we introduce 1 new letter into the gene run. This letter was not taken from the parents, but from the weighed letters we determined before. In the canonical version of the algorithm, one would generate a random bit. We've tested how many random letters to add, and found the best results with 1 random letter per child.
The random flip is something we chose to omit, for testing showed it had a bad influence on overall fitness. In principle we'd throw dice for every single letter, and if that dice hits 6, we'd flip its source from father to mother or vice versa. We've tested several variations and flip ratios before deciding to omit it completely. For your own search domains and fitness determination, it may work quite well, so do implement and test it.
Obviously this breaks whenever a gene holds less than 16 letters, and break it should, for that scenario should never happen.
(If you wonder why we pass the length of the Letters array next to the actual Letters array: it's a speed optimisation choice. The same goes for prepopulating the child array rather than using child.push().)
And all of the above is tied in together using a generator loop. The real-world problem with such a loop turned out to be the repeat interval: how do we choose an interval length in such a way that each interval gives ample time for one complete pool refill? We've tried several methods, from simple trial and error to CPU profiling.
It showed us, that the devices to where our game gets distributed, use wildly differing CPUs and javascript engines. We had to add checks inside the generator, to skip a repeat if the generator was still running.
In the end we abandoned the interval guestimations, and replaced the loop by a series of repeated time-out calls, meaning the generator would call itself when finished, eliminating the need to check whether or not the generator was running already.
There you have it: a Genetic Search Algorithm implemented in javascript, running a word game that lets players find words from a pre-selected vocabulary, using a 4 by 4 letter grid.
Zjramble5, a word game like Boggle and Scramble.
The Question
Our game gives the player a board filled with 16 letters: 4 letters horizontal by 4 letters vertical. If we limit the words to those between 3 and 9 letters in length (inclusive), there's approximately 720,000 unique combinations.
Somehow we must map a random choice of words to a letter board, in such a way that gives the player the opportunity to achieve as high a score as possible within the available time.
The Search Domain
The words we wished to find come from an extensible vocabulary. In our case, the vocabulary is transmitted as JSON. The game should neither offer nor allow other words than those in the vocabulary (or the chosen category there-in).
Our vocabularies don't contain 720,000 words... the largest one contains close to 1800.
Setting up the Genetic Search
We need:
- 1 pool of genes;
- 1 method to generate random genes;
- 1 method to determine gene score and thereby, survival fitness;
- 1 method to add fit genes to a weighed spawn roulette;
- 1 method to remove unfit genes from the gene pool;
- 1 method to take genes from the roulette, and add their spawn into the pool;
- 1 method to spawn child genes from fit parent genes;
- 1 loop to repeat all of the above until satisfied.
A pool of genes in Javascript is as simple as declaring an array variable.
var pool = [];
Done.
To generate random genes, we must first define what constitutes a gene. Since our game sports a letter board that requires 16 letters, we decided to define our gene as a structure that holds 16 letters in a run. Later on, we add some additional information.
pool[ i ] = ({run: "abcdefghijklmnop", score: 0, amount: 0, fitness: 0});
Now those who know the canonical version of the genetic search may wonder why we didn't encode the gene run to bits, rather than letters. The reason for that is complexity: it would have required additional encoding and decoding methods to figure out a gene's letters and determining its survival fitness. Methods that take time. Time is limited in our game. Can't make the player wait minutes for a new game board.
So we created a method to generate gene runs. Its outcome should be an array of 16 letters. What should be its source? When we started, we took the English alfabet. Simple enough. Later on, we specialised that and said, no, we can't take every letter without bias. We should first determine some random selection of words from the vocabulary, to serve as a search domain, and reduce that into a weighed selection of at least 16 letters. The outcome there-of, we pass to a run generator:
function runFromLetters( Letters ) {
var run = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
lettersCopy = [].concat( Letters ),
r = 0, s = '', i = 0;
while ( lettersCopy.length && i < 16 ) {
r = Math.floor( lettersCopy.length * Math.random() );
s = lettersCopy.splice( r, 1 );
run[ i ] = ( s );
i += 1;
}
return run;
}
(Provide your own error checks and handling: above method holds no handling for when Letters.length < 16.)
Next we wish to determine survival fitness per gene. It tells us, how often this gene should spawn children. Fitness in our game is calculated by the amount of words and the word score, that can be created from the gene run, based on the previously selected words. Sounds easy enough.. but to do that, we need a method that will create and test words like a player would... we need a path walker. Each path within the letter grid needs testing to every single word from the selection. With close to 720,000 paths one can imagine we tried several optimizations.
var Gene = pool[ i ];
var geneScore = calculateScore( Gene.run );
Gene.amount = geneScore.amount;
Gene.score = geneScore.score;
Then we totalled the scores and word amounts and weighed how well each gene faired compared to the total. That would indicate its fitness. Fitness is translated into an amount of times that this gene will be present in the spawn roulette. The greater its fitness, the more copies go into the roulette, the better chance it has to be selected as a parent for procreation, thus the more likely it will pass on its genetic make-up.
while( Gene.fitness ) {
roulette.push( Gene );
Gene.fitness -= 1;
}
After filling the spawn roulette, we must look ahead: the new spawn is going to end up in the same gene pool. What can we do to prevent infinite growth? We must deplete the pool. Just like in biology, specimens die. In programming, we have several options:
- We can have all ancestors die (empty the pool): every generation will start completely fresh. This is the canonical variation. It yielded bad results in our case.
- We can let some genes live; to be specific, those genes that scored above average. This is the hill-climbing variation. This worked well in our case. We tested several lower and upper bounds for determining how much a gene should score above average to survive.
- And then at some point you find that your pool delivers too low a score and you disrupt it and start over. We chose a disruption that consisted of removing the middle 80% of the pool, after 5 generations void of progress.
Now we have a roulette filled with weighed genes. We perform a stochastic remainder sampling, which means that we randomly remove 2 genes from the roulette and spawn those into 2 children which we add to the gene pool, and repeat that until either the roulette is empty or the pool reached a desired gene count.
function refillPool( roulette, wantedPoolSize ) {
var father = [], mother = [], child = [],
r = 0, s = roulette.length, geneScore;
while ( pool.length < wantedPoolSize && s > 0 ) {
r = Math.floor( s * Math.random() );
father = roulette.splice( r, 1 ).run;
s -= 1;
r = Math.floor( s * Math.random() );
mother = roulette.splice( r, 1 ).run;
s -= 1;
child = spawn( father, mother );
geneScore = calculateScore( child );
pool.push( {
gene: child,
score: geneScore.score,
amount: geneScore.amount
} );
child = spawn( mother, father );
geneScore = calculateScore( child );
pool.push( {
gene: child,
score: geneScore.score,
amount: geneScore.amount
} );
}
return true;
}
Because the spawn roulette is filled with multiple copies of fit genes, based on each gene's fitness, selecting 2 random genes from the roulette will at some point result in receiving 2 copies of the same gene. We chose to allow for that chance.
Spawning child genes from parents takes 3 forms: a random cross-over, a random flip, and a random mutation. The random cross-over is executed by defining a random point between gene run start and end. Into the first part, we copy an equal part from the father. Into the second part, we copy an equal part from the mother. Spawning 2 genes from the same parents, but switched around (like the code example above shows), creates more chances of fitness.
For the random mutation we introduce 1 new letter into the gene run. This letter was not taken from the parents, but from the weighed letters we determined before. In the canonical version of the algorithm, one would generate a random bit. We've tested how many random letters to add, and found the best results with 1 random letter per child.
The random flip is something we chose to omit, for testing showed it had a bad influence on overall fitness. In principle we'd throw dice for every single letter, and if that dice hits 6, we'd flip its source from father to mother or vice versa. We've tested several variations and flip ratios before deciding to omit it completely. For your own search domains and fitness determination, it may work quite well, so do implement and test it.
function spawn( father, mother, Letters, lengthOfLetters ) {
var child = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
i = 0, r = 0, q = 16;
r = Math.floor( Math.random( q ) );
for( i = 0; i < q; i += 1 ) {
if ( i < r ) {
child[ i ] = father[ i ];
} else if ( i == r ) {
child[ i ] = Letters[
Math.floor( Math.random( lengthOfLetters ) )
];
} else {
child[ i ] = mother[ i ];
}
}
return child;
}
Obviously this breaks whenever a gene holds less than 16 letters, and break it should, for that scenario should never happen.
(If you wonder why we pass the length of the Letters array next to the actual Letters array: it's a speed optimisation choice. The same goes for prepopulating the child array rather than using child.push().)
And all of the above is tied in together using a generator loop. The real-world problem with such a loop turned out to be the repeat interval: how do we choose an interval length in such a way that each interval gives ample time for one complete pool refill? We've tried several methods, from simple trial and error to CPU profiling.
It showed us, that the devices to where our game gets distributed, use wildly differing CPUs and javascript engines. We had to add checks inside the generator, to skip a repeat if the generator was still running.
In the end we abandoned the interval guestimations, and replaced the loop by a series of repeated time-out calls, meaning the generator would call itself when finished, eliminating the need to check whether or not the generator was running already.
There you have it: a Genetic Search Algorithm implemented in javascript, running a word game that lets players find words from a pre-selected vocabulary, using a 4 by 4 letter grid.
Zjramble5, a word game like Boggle and Scramble.
Need problem solving?
Talk to me. Let's meet for coffee or over lunch. Mail me at “omegajunior at protonmail dot com”.