Saturday, September 05, 2009

Genomicron -- Simulates mutation and natural selection of genomes

The Silverlight application below simulates mutation and natural selection of genomes. You can start the simulation now by pressing the Evolve Genome button and then read on.

Note that there are two views to this application: the Simulation View and the Mutation View. The Mutation View shows what happens when a coding sequence is mutated. Press the Mutation View tab and take a look. The simulation will keep running while you are in the Mutation View.

While in the Mutation View, press the Initialize button to get a starting coding sequence. Then press the Mutate button repeatedly to see what happens to a coding sequence when random mutations are applied.

Configuration

This application lets the user configure the types of mutation, the length of mutations, and the percentage of the genome that gets mutated. The length of a mutation is given by length = Max - (Max - Median) * log2(random + 1) / 31, where random is a random integer between 0 and 4,294,967,295. This formula favors shorter mutations but allows very long ones to happen: when random is at its midpoint (2,147,483,647), log2(random + 1) is 31 and the length equals Median, and only values of random near 0 produce lengths close to Max. The returned length is rounded to the nearest integer, except that all values less than 1.5 are rounded to 1. This allows the median unrounded mutation length to be a positive number less than 1, which in turn affects the percentage of single-point mutations.
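The length formula above can be sketched in a few lines of Python. This is only an illustration, not the Genomicron source (which is written in C#): the names mutation_length and random_mutation_length are made up for this example, and rounding halves upward is an assumption, since the post only says "rounded to the nearest integer".

```python
import math
import random

def mutation_length(max_len, median_len, r):
    """Mutation length for a random 32-bit integer r (0 .. 4,294,967,295).

    Follows the formula in the post:
        length = Max - (Max - Median) * log2(r + 1) / 31
    """
    raw = max_len - (max_len - median_len) * math.log2(r + 1) / 31
    # Everything below 1.5 (including negative values) becomes 1.
    if raw < 1.5:
        return 1
    # Assumption: halves round upward; the post does not specify the rounding mode.
    return math.floor(raw + 0.5)

def random_mutation_length(max_len, median_len, rng=random):
    """Draw r uniformly from 0 .. 2**32 - 1 and convert it to a length."""
    return mutation_length(max_len, median_len, rng.getrandbits(32))

if __name__ == "__main__":
    # Hypothetical settings: Max = 1000, Median = 10.
    print([random_mutation_length(1000, 10) for _ in range(10)])
```

With these hypothetical settings, r = 0 gives the maximum length of 1000, r = 2,147,483,647 gives the median of 10, and the largest values of r fall below 1.5 and are clamped to 1.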

About the simulation

In each generation, ten child genomes are created by applying random mutations to a parent genome. The child genome with the largest average coding sequence length is selected to be the parent of the next generation. The goal of this simulation is to see whether selecting for the largest average coding sequence length can produce a genome whose coding sequences are comparable in length to human coding sequences. Human genes are known to be about 1000 codons long on average.
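The selection loop described above can be sketched as follows. This is a toy model under stated assumptions, not the Genomicron implementation: the genome is a plain string of A, C, G and T, a coding sequence is taken to be a run of codons between stop codons in a single reading frame, and mutate applies only point substitutions. All function names and the mutation rate are hypothetical.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}

def coding_lengths(genome):
    """Lengths, in codons, of the runs between stop codons in reading frame 0."""
    lengths, run = [], 0
    for i in range(0, len(genome) - 2, 3):
        if genome[i:i + 3] in STOP_CODONS:
            if run:
                lengths.append(run)
            run = 0
        else:
            run += 1
    if run:
        lengths.append(run)
    return lengths

def average_coding_length(genome):
    """Fitness measure: mean coding sequence length in codons."""
    lengths = coding_lengths(genome)
    return sum(lengths) / len(lengths) if lengths else 0.0

def mutate(genome, rng, rate=0.001):
    """Toy mutation: each base is replaced by a random base with probability rate."""
    return "".join(rng.choice("ACGT") if rng.random() < rate else base
                   for base in genome)

def next_parent(parent, rng, children=10):
    """Create the children and keep the one with the largest average coding length."""
    offspring = [mutate(parent, rng) for _ in range(children)]
    return max(offspring, key=average_coding_length)

if __name__ == "__main__":
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(30000))
    for generation in range(50):
        genome = next_parent(genome, rng)
    print(average_coding_length(genome))
```

Changing the children argument is the sketch's counterpart to changing the number of descendants per generation.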

Source Code

Click here to download the source code. The program is a Silverlight application written in C# using Visual Studio 2008.

7 Comments:

At 11:16 PM, September 05, 2009, Anonymous Paul W said...

Looks fairly good so far.

To be more realistic, you need to have more than 10 entities in any generation. With such a small population it will take a long time to evolve anything.

Having a larger population (say 1,000 entities), selecting the top 100, and having them produce 10 offspring each would better reflect the dynamics of a real population.

By only selecting 1 out of a small population, you are not allowing for competition between "families". Thus you are not really simulating how evolution works.

 
At 3:34 PM, September 07, 2009, Blogger Randy Stimpson said...

Paul W,

Let’s suppose I make the changes you suggested, and not much changes. What conclusions would you draw? Did you have a chance to try out the Mutation View?

 
At 8:44 AM, September 10, 2009, Anonymous Bjoern said...

Does your simulation include something like gene (and/or genome) duplications?

 
At 11:15 AM, September 10, 2009, Blogger Randy Stimpson said...

Hi Bjoern,

No, the simulation does not include gene duplications. I don't see how it would make a difference in trying to evolve average gene length. The simulation does allow for duplication of large sections of a gene.

 
At 10:00 PM, September 27, 2009, Anonymous Paul W said...

"Let’s suppose I make the changes you suggested, and not much changes. What conclusions would you draw? Did you have a chance to try out the Mutation View?"

The problem is one of population. Because of the way you described it, you are only using a population of 10.

As I said in the previous thread, you need to have a large population.

I tried the mutation view, and this is worse as you only have a population of 1.

You obviously understand a fair bit about statistics, so you would understand the problems that occur with small sample sizes.

Strictly speaking, evolution (as in the algorithm) does not require any randomness (variation can also be non-random), but it needs to be included here as in biological evolution it does occur. But because we are introducing randomness, we have to be more acutely aware of the problems that occur with small sample sizes.

 
At 12:39 PM, September 28, 2009, Blogger Randy Stimpson said...

Paul,

The Mutation View isn’t even a population of one. A population of one genome consists of 10,000 coding sequences. The Mutation View is there to explain what happens when a sequence is randomly mutated rather than serve as a simulation.

I agree that variation can be non-random or at least semi-random. Recombination is an example of semi-random mutation, where mutations are randomly selected from a restricted set of choices. This gives organisms the ability to adapt. I suspect that there are other types of semi-random genetic mutations waiting to be discovered. However, adaptation is not what is being debated. What is being debated is the idea that evolution can result from natural selection of random mutations.

The idea that evolution is the result of non-random mutations is what evolution by intelligent design suggests. That’s also what software developers like you and me do.

Paul, you didn’t answer my question, so let me rephrase it. Suppose we work with an exponentially larger population size and the results are basically the same. Would this cast doubt on your belief in evolution by random mutation and natural selection?

 
At 5:20 PM, October 18, 2009, Blogger Randy Stimpson said...

Paul,

I am a little bit disappointed that you didn’t answer. Anyway, since I can increase the number of descendants by modifying one line of code in Genomicron, I did. If you try Genomicron with the default settings, the average coding sequence length peaks at 57. If you increase the number of descendants exponentially to 100, the average coding sequence length peaks at 68. If you increase the number of descendants to 1000, the average coding sequence length peaks at 79. Unfortunately, it takes several hours of simulation to peak out with 1000 descendants, so it isn’t practical to keep increasing the number of descendants exponentially. I think we can see a trend though: when we increase the number of descendants exponentially, it only has a linear effect on where the average coding sequence length will peak. In other words, during each generation we could pick only one out of a trillion and still not peak at an average coding sequence length of 200 codons.
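The arithmetic behind that extrapolation can be checked with a short Python sketch. It assumes the three reported peaks (57, 68 and 79 at 10, 100 and 1000 descendants) continue along the same straight line in log10 of the number of descendants; that is an assumption drawn from only three data points, and the function name is made up for this example.

```python
import math

# Peak average coding sequence lengths reported in the comment above,
# keyed by the number of descendants per generation.
OBSERVED_PEAKS = {10: 57, 100: 68, 1000: 79}

def extrapolated_peak(descendants):
    """Extend the reported trend, assuming it stays linear in log10(descendants)."""
    n_lo, n_hi = 10, 1000
    peak_lo, peak_hi = OBSERVED_PEAKS[n_lo], OBSERVED_PEAKS[n_hi]
    # Gain in peak length per tenfold increase in descendants (11 codons here).
    slope = (peak_hi - peak_lo) / (math.log10(n_hi) - math.log10(n_lo))
    return peak_lo + slope * (math.log10(descendants) - math.log10(n_lo))

if __name__ == "__main__":
    for n in (10, 100, 1000, 10**12):
        print(n, round(extrapolated_peak(n)))
```

Under this assumed trend, each tenfold increase adds 11 codons, so a trillion descendants would peak at about 178 codons, which is still under 200.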

 
