Geometry of fitness landscapes

July 27, 2011 — June 17, 2015

evolution

geometry

optimization

topology

Suspiciously similar content

What “shape” is the fitness landscape explored by agents in an evolutionary process? In simple optimisation problems without interaction? In multi-agent systems with interactions between agents? (i.e. with niche construction) In actually existing nature, in all its chaotic glory?

Typically, genetic algorithms are implemented with fixed-length genomes floating values in the range \([0,1]\) — the landscape is \([0,1]^n\) for a genome of length \(n\). (In e.g. John-Holland-style classic GAs it’s \(\{0,1\}^n\), and in terrestrial life, something like \(\{G,A,T,C\}^n\).) More generally these can be free monoids, that is, arbitrary-length strings, or possibly defined on an infinite or uncountable field.

In real evolutionary systems, your average slice of nature red in tooth’n’claw, having got these genomes, you have a lot of steps to go. You then express them as a phenotype, which can be itself quite hard to predict from the genotype (protein folding, epigenetics, general environmentally conditioned expression or suppression), and then throw it into an environment wherein its success may be determined by chance, some intrinsic degree of functionality, or some highly specific combination of those and interactions with its peers, with, that is, the entire ecosystem. So the similarity of two genotypes can be wildly complex; and the relationship between that and its actual probability of reproducing can be grossly dependent on the environment and other phenotypes. Typical in-silico systems cut out that messy phenotypic business and focus on genotypic selection, but it can still be pretty wild there.

OK, so with those disclaimers, what kind of regularities can we expect to find?

Consider general evolutionary processes, say, genetic programming — is \(\mathbb{R}^n\) still the most natural space in which to embed our landscape? There is still that trivial mapping from the free monoid depicting the genome to \(\mathbb{R}^n\), but now there are problem-domain- and encoding-specific “folds”, cases where the symbols in the string can be swapped or substituted without changing the functional form of the algorithm, and which can be known a priori given said encoding and search-space. (Anyone who uses genetic programming for symbolic regression is used to, e.g. getting both \(\sin(x)\) and \(-\sin(-x)\) as solutions, or \(x+y\) and \(y+x\).) How do the likelihoods of these degenerate solutions increase with the genome length? Is it still exponential in genome length, as with the volume of space encompassed by a non-hierarchical GA?

Further, consider evolutionary processes that include significant levels of niche construction, where evolution becomes path-dependent. Is there still some notion of fitness landscape that can be made rigorous for these algorithms, or some mapping between phenotypic and genotypic fitnesses that captures the same function as the fitness landscape? I suspect this problem is well-explored, but I’m missing the keywords to find it. A jelly bean for you if you can tell me, so I can tell my old ecology lecture what to show on the slides for the lecture on path-dependence.

I know people must occasionally toy with these areas, but Google Scholar has not helped me thus far. Hints?