Friday, March 13, 2009

The Probability of Information

I recently searched YouTube for "evolution probability" and found a number of atheist apologists making variations of the same stupid argument that the guy in this video is making. In this blog entry I explain why this guy and blogger Lord Runolfr get it wrong.



Consider the following paragraph:

Mathematics is the language of science. To succeed in science, one must use mathematics. Thus high quality science depends on high quality mathematics.

This paragraph contains 151 characters. It contains information which can be understood by those who can read English. If we take the character set to be all upper and lower case letters plus space, comma and period, there are 55 characters in all. The probability that this paragraph would be generated randomly is 1 in 55 to the 151st power. If I punch 55 to the 151st power into my calculator I get 6.23E+262, which is a huge number. On the other hand, consider the random sequence of characters below:

jqloHdfnecUmfnpwlkiayfnsklufnmB,sloodmsioi dkmeijdp.jdsewydnsj. kYsdflklgagdkjqlknc lgjha lkjdjhbadkghlkjfajdlkjalkgddopauneo aoij Lkdn ienonismdngsdfg

This sequence of characters also has 151 characters and has the same probability of being generated as the well-formed paragraph above. So what’s the big deal about the first sequence? Well, if you are asking which of the two sequences is more probable, you are asking the wrong question. That's where blogger Lord Runolfr and the dude starring in the YouTube video above get it wrong. The first sequence of characters has three properties that the second sequence does not:

1) It consists of properly spelled English words.

2) It is grammatically correct.

3) It makes sense.
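
Before looking more closely at those three properties, here is a minimal Python sketch for checking the counts quoted above: the length of the example paragraph, the size of the 55-character alphabet, and the size of 55 to the 151st power. The names PARAGRAPH, ALPHABET, log10_sequence_count and sci_notation are illustrative choices made for this sketch, not part of any library. The calculation works with base-10 logarithms so that the same approach also handles fractional exponents.

```python
import math
import string

# The example paragraph from this post, split across lines for readability.
PARAGRAPH = (
    "Mathematics is the language of science. "
    "To succeed in science, one must use mathematics. "
    "Thus high quality science depends on high quality mathematics."
)

# Alphabet assumed in the post: 52 letters plus space, comma and period.
ALPHABET = string.ascii_letters + " ,."


def log10_sequence_count(alphabet_size, length):
    """Return log10 of alphabet_size ** length."""
    return length * math.log10(alphabet_size)


def sci_notation(log10_value):
    """Split a base-10 logarithm into (mantissa, exponent)."""
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return mantissa, exponent


if __name__ == "__main__":
    n = len(PARAGRAPH)
    k = len(ALPHABET)
    mantissa, exponent = sci_notation(log10_sequence_count(k, n))
    print(f"{n} characters drawn from an alphabet of {k}")
    print(f"{k}^{n} is about {mantissa:.2f}E+{exponent}")
```

Any one specific 151-character string, meaningful or not, has the same 1-in-55^151 chance under this model, which is why the comparison that matters is between classes of strings rather than individual strings.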

To illustrate the last property, consider the sentence: Mathematics is the job of refrigerators. This sentence is grammatically correct but it doesn't make sense. If you have any doubts about the grammar, type the sentence into Microsoft Word and see if its grammar checker flags any problems. So the question we should be asking is this: What is the probability that a random character generator would produce a grammatically correct sequence of characters that makes sense? The answer is too hard to calculate, but it is still infinitesimal. Even computing the probability of satisfying (1) is difficult and beyond the scope of this blog. However, we can compute the probability of something similar to (1), which can serve as an upper bound.

Let’s approximate the probability that a sequence of about 150 characters will satisfy property (1) above. To make the approximation let’s make the following assumptions:

1) The average word has a length of seven characters.
2) There are 100,000 seven-character sequences that represent words.

Now there are 26 to the 7th power possible seven-character sequences; that is more than 8 billion. If only 100,000 of those sequences are words, the odds that a seven-character sequence is a word are about 1 in 80,000. Given that the average word length is seven characters, and words are followed by a space, there are an average of 150/(7+1) = 18.75 words in a 150-character sequence. So you could say that the odds that a 150-character sequence is made up entirely of properly spelled words are less than 1 in 80,000 to the 18.75 power. Plug this into your calculator and you will find the odds to be less than 1 in 8.5E+91. Remember, we are only trying to compute the probability of a simplified model of property (1). The probability of satisfying properties (1), (2) and (3) would actually be much less.
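
The arithmetic in this simplified model can be sketched in a few lines of Python. This is only an illustration under the two assumptions stated above (seven-letter words, 100,000 of them) plus a 26-letter lower-case alphabet; the constant names and the word_model function are hypothetical choices for this sketch. It reports the final odds as a power of ten, both for the exact ratio 26^7 / 100,000 and for the rounded figure of 80,000.

```python
import math

ALPHABET_SIZE = 26       # lower-case letters only, as in the model above
WORD_LENGTH = 7          # assumed average word length
WORD_COUNT = 100_000     # assumed number of seven-letter words
SEQUENCE_LENGTH = 150    # characters in the whole sequence


def word_model():
    """Return the intermediate figures of the simplified word model."""
    # Number of possible seven-letter sequences.
    sequences = ALPHABET_SIZE ** WORD_LENGTH
    # A random seven-letter sequence is a word with odds of 1 in this many.
    odds_per_word = sequences / WORD_COUNT
    # Each word is followed by one space, so it occupies WORD_LENGTH + 1 slots.
    words = SEQUENCE_LENGTH / (WORD_LENGTH + 1)
    return {
        "sequences": sequences,
        "odds_per_word": odds_per_word,
        "words": words,
        # log10 of (odds per word) ** (number of words)
        "log10_odds_exact": words * math.log10(odds_per_word),
        "log10_odds_rounded": words * math.log10(80_000),
    }


if __name__ == "__main__":
    result = word_model()
    print(f"seven-letter sequences: {result['sequences']:,}")
    print(f"odds per word: 1 in {result['odds_per_word']:,.0f}")
    print(f"words per sequence: {result['words']}")
    print(f"log10 of final odds (exact ratio): {result['log10_odds_exact']:.2f}")
    print(f"log10 of final odds (80,000): {result['log10_odds_rounded']:.2f}")
```

The base-10 exponent printed on the last line can be compared directly with the calculator result.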

Someone will point out that I haven’t actually said anything about biology or evolution yet and that I am arguing by analogy. And I get that. What I am talking about here is the nature of information, whether it is written language, a computer program or a genome. Each information type has its own set of properties that make it essentially impossible to be generated randomly.

ADDENDUM

This is funny. Even Kenneth Miller, a biology professor at Brown University, gets it wrong.



ADDENDUM 2013

The above video has been withdrawn from YouTube. I wonder why.

21 Comments:

At 11:32 PM, March 13, 2009, Blogger Samuel Skinner said...

You do realize that chemical processes, unlike letters, are inherently ordered?

What specifically are you criticizing? Amino acid sequences? Gene formation? Mutations?

 
At 1:41 AM, March 14, 2009, Blogger Randy Stimpson said...

Hi Samuel,

Thank you for visiting my blog. I am asserting that the probability that a random sequence represents a significant amount of information is essentially zero. In the case of nucleotide sequences that means they would represent a viable life form or part of a life form. Are you trying to suggest that nucleotide sequences are inherently ordered to represent viable life forms?

 
At 3:41 AM, March 14, 2009, Anonymous Anonymous said...

Give a set of monkeys a typewriter each, and given enough time they will always end up writing a perfect copy of William Shakespeare's complete works, and all other books that have ever been written.

Being "Essentially Zero" means you yourself just admitted that the chance is NOT ZERO, otherwise there would be no need to add essentially in front of the Zero.

Even with incredibly low odds of something happening, it will happen given a long enough span of time.

And the universe has been around for a long, long, looong time...

 
At 4:44 AM, March 14, 2009, Anonymous Anonymous said...

Are you trying to suggest that nucleotide sequences are inherently ordered to represent viable life forms?

Of course they are, because mutations only operate on the gametes of viable parents. See, that's why it's called "evolution". And if you want to argue that they weren't initially so ordered, then you have retreated to abiogenesis and have abandoned the argument for intelligent design and against evolution.

Also, you know nothing of information, probability, or the meaning of "impossible", and you utterly failed to grasp the point of the videos you posted and utterly failed to refute them.

Do you have any reason to think that you are either particularly intelligent or particularly knowledgeable? You are clearly neither -- in fact clearly quite the opposite -- yet you claim to have shown Ken Miller wrong, and you go to pharyngula and who knows where else to argue with people who demonstrably are intelligent and knowledgeable. It's foolish arrogance, a very low moral trait.

 
At 5:01 AM, March 14, 2009, Anonymous Anonymous said...

Even with incredibly low odds of something happening, it will happen given a long enough span of time.

But Randy unwittingly (all he's capable of, it seems) argues against his own position by pointing out that all sequences must be valid and "make sense" -- only sequences that yield viable organisms are allowed. This radically reduces the number of allowed sequences, greatly increasing the chances that a random mutation (change) produces another viable sequence. Evolution is nothing like monkeys typing at a typewriter -- there, every document, every sentence, every character is independent, disconnected, produced from scratch. But evolution consists of random changes to viable sequences, and all results that aren't viable are discarded, so viable success builds upon viable success -- we don't call it "evolution" for nothing. And evolution is nothing like trying to produce Shakespeare -- there is no goal. Every resulting organism is born of viable organisms, and the system just hums away, producing viable organisms of various sorts -- and there are many many sorts. It's really rather simple conceptually, and there is nothing about it that is "essentially impossible".

 
At 7:09 AM, March 14, 2009, Blogger Unknown said...

I don't know if this has been said yet, but if not it needs reiterating. The main flaw in Randy's argument is even more basic than the fact that the model he is using bears little resemblance to the system he is trying to make an analogy to.
The main flaw is that he is using an upper bound to argue that something is unlikely. This might be ok if he gave, or attempted to calculate, a good lower bound or gave us an idea of how far away a good lower bound was. A good lower bound being one that is within an acceptable margin of error of the actual value, say 1% or better yet a way to get within epsilon, of the real value.
Because we don't have a good lower bound, the actual value may be 1 out of 1, which is clearly not the case, but the point is we have no idea of how close to that the real value is, so we can say nothing about the unlikeliness of a string of 150 characters containing all 7 letter words other than we think it's unlikely but we can't prove it.

Now the model in your analogy is poor because it is significantly more complex than the system you are comparing it to. In a sentence you said you need the string of letters to spell things, you need grammar, and it needs to make sense. All of which are true. But for a string of DNA there is only 1 word length, 3 letters; there are only 4 letters vs the 26+ in the English alphabet; and most importantly every possible combination of those 4 letters in sets of 3 is a word. Also there is no 3rd factor in coding from DNA; all it needs to do is code a useful protein to have a significant amount of information, so either grammar or making sense needs to go. I'll choose grammar since it's my least favorite.
Now the question of whether a string of words makes sense is a very hard one, in DNA or English. But it is doubtful that making sense in English or DNA can be put into an analogy. In English words have meaning and some strings of words never make sense, but how can you say that a protein coded by DNA will never be a useful one? Ignoring everything else, that part is shaky. But combining it with everything else, your analogy is useless without a significant amount of cleanup and argument on your part.

 
At 11:21 AM, March 14, 2009, Anonymous Anonymous said...

You run through some probability thought experiments. What does any of that have to do with the non-randomness of natural selection?

Are you trying to say that DNA is not ordered to make viable life forms? That's silly, DNA is not random.

Shouldn't your programming experience help you to understand how DNA is in fact, not designed by an intelligence?

 
At 6:07 AM, March 15, 2009, Blogger Lord Runolfr said...

And here is my reply.

 
At 3:27 PM, March 16, 2009, Blogger Randy Stimpson said...

This is what happens when you poke a stick into a hive of atheists. LOL. Except for the comments made by blogger Lord Runolfr on his blog, there is practically nothing above worth responding to, but I will respond to some of it anyway.

Debaser said: "Are you trying to say that DNA is not ordered to make viable life forms? That's silly, DNA is not random."

How does one have a discussion with someone who insists on the most absurd interpretation of what you have to say? Of course the DNA of a viable life form is organized. However, if you were to take the DNA of a viable life form, and cut it to pieces, it isn’t going to reassemble itself into the DNA of a viable life form.

Adam, I could be wrong, but you seem to be unaware that the probability of an event is a real number between 0 and 1 so I can’t even respond to your math comments and expect you to understand what I am saying.

You also said that my "analogy is poor because it is significantly more complex than the system you are comparing it to." You can hardly conclude that DNA is less complex than English because it consists of only four types of nucleotides and because codons are only three nucleotides long. That’s like saying a computer program isn’t complex because it is just 1s and 0s.

I’ll respond to Lord Runolfr on his blog.

Lastly, the only world in which Kenneth Miller’s argument, and others like it, can make mathematical sense is in a world where every possible DNA sequence represents a viable life form.

 
At 9:30 PM, March 16, 2009, Anonymous Anonymous said...

Of course the DNA of a viable life form is organized?

you just asked:
Are you trying to suggest that nucleotide sequences are inherently ordered to represent viable life forms?

and you say I'm taking the most absurd interpretation of what you have to say? It's not my fault most of what you say is absurd.

Absurd would be something like:
However, if you were to take the DNA of a viable life form, and cut it to pieces, it isn’t going to reassemble itself into the DNA of a viable life form.

Sure I guess not, if I cut one of your computer programs into pieces, it isn't going to reassemble into a viable program either. What are you getting at? When did things self re-assembling after being hacked into bits come into play? And would you mind answering my first question?

By the way, thanks for calling the people who post in your blog "barely worth responding to". That's really nice of you, considering atheists coming in and posting is about all the traffic you get.

 
At 12:11 AM, March 17, 2009, Blogger Randy Stimpson said...

Debaser, I'll be dealing with your question regarding natural selection in a followup blog entry.

 
At 1:29 AM, March 19, 2009, Anonymous Anonymous said...

"This is what happens when you poke a stick into a hive of atheists. "

This is what happens when you're a moron who posts a transparently bad argument to where intelligent, informed people hang out.

"there is practically nothing above worth responding to"

Asshole.

 
At 3:41 PM, March 19, 2009, Blogger Randy Stimpson said...

Did you just call me an asshole or were you signing your name? LOL. Some of you Pharyngula atheists like to play rough don't you? No offense taken. But if you want to play rough you should expect to be roughed up a little yourself. Nevertheless, since I would prefer to keep my blog out of the gutter I should have been more polite by saying there is nothing above worth responding to that I don't plan to respond to in my followup blog entry.

 
At 5:07 PM, March 19, 2009, Anonymous Anonymous said...

I think it was pretty clear which he was doing. Signing your posts is something old people do because they think it's a letter or something. You can LOL after the "I know you are but what am I" if you want, but if that's your being "rough" ID, I tell you what: you can go as over the top as you want, I don't think it's gonna be a problem.

You posted an off-topic comment in a pharyngula thread, specifically asking people to come over to your blog about this post. We bothered to come and look at it, and have commented on it. Is anyone posting here now NOT because of that? Show some basic respect or you'll quickly have your Zero comment posts back.

A combination of letters or words making sense in a sentence is far less probable than a combination of genes coding for a protein.

The problem is that sentences in English are complex on a level which requires human brains and language. DNA does not require the assistance of an intelligent mind to put together working, complex systems.

 
At 10:20 AM, March 23, 2009, Blogger Randy Stimpson said...

Debaser,

It is my understanding that a protein is coded by a single gene, not a combination of genes. Perhaps you are confusing genes with nucleotides.

Also a gene consists of a sequence of codons (nucleotide triplets) and in order for a sequence of codons to code for a protein there must at minimum be a start codon and a stop codon. Of the 64 possible codons, only one is a start codon and three are stop codons. So not just any sequence of codons codes for a protein. In other words, genetic code has its own "grammar". Also just because we have a sequence that codes a protein doesn't mean it codes a useful protein. It's extremely unlikely that a coded protein would be useful.

Also consider that the average gene is 1210bp in length or about 403 codons long. Given that 3 out of 64 codons are stop codons and only one codon is a start codon, coding sequences that extended for 403 codons before encountering a stop codon would be rare. The longest gene is over 10,000 codons long. What do you think the probability is of even one of those existing in a DNA sequence 3 billion nucleotides long? Also consider that having a start codon is just a necessary condition for beginning a coding sequence. Nearby coding sequences and initiation factors are also required to start translation. My point is that DNA has its own "grammar" and scientists so far have only discovered the most obvious parts of that syntax. DNA which represents a viable lifeform is far more complex than the English language.
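
As a rough check on how rare such long stop-free runs would be, here is a minimal Python sketch. It assumes, purely for illustration, that every codon is drawn independently and uniformly from the 64 possible codons, which real genomes do not do; the function name log10_prob_no_stop is a hypothetical choice for this sketch. Under that assumption the chance that a run of n codons contains no stop codon is (61/64) to the nth power.

```python
import math

TOTAL_CODONS = 64
STOP_CODONS = 3


def log10_prob_no_stop(codons):
    """log10 of the chance that `codons` uniformly random codons
    contain no stop codon (independence is an assumption)."""
    p_not_stop = (TOTAL_CODONS - STOP_CODONS) / TOTAL_CODONS
    return codons * math.log10(p_not_stop)


if __name__ == "__main__":
    for n in (403, 10_000):
        print(f"{n} codons: probability is about 10^{log10_prob_no_stop(n):.1f}")
```

This covers only the stop-codon condition for a single run starting at a fixed position; it says nothing about start codons, reading frames, or how many positions a long sequence offers.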

 
At 5:10 PM, March 24, 2009, Anonymous Anonymous said...

The point I (and everyone else, I think) was getting at was not that DNA was more simple than English. The point was that your analogy is a poor one (because the probability of randomly generating a grammatically correct sentence in English is silly orders of magnitude higher than randomly generating a viable protein in DNA -- still high, but ultimately beside the point), and the conclusion you draw from it is also incorrect (that DNA has its own set of properties that make it impossible to be generated randomly).

You seem to equate being "generated randomly" with the whole DNA sequence being laid down at once. That is not accurate. Each living organism inherits a high-fidelity copy of DNA from its parents (or parent if you're going the asexual way). Most variations are highly selected against.

It isn't that DNA isn't complex, but you can't ignore natural selection's role in this process. NS has built up a staggering level of complexity, but that was over billions of years, with much attrition. Part of the power of NS is its ability to generate a stunning variety of life without a guiding force or pre-determined goal shaping it.

 
At 10:06 AM, March 25, 2009, Blogger Randy Stimpson said...

Debaser,
You don’t know the difference between a gene and a nucleotide; so I doubt you are in a position to argue that it’s more likely for a useful gene to be randomly generated than for a short paragraph to be randomly generated. But that’s not what this blog entry was about. This blog entry is about the probability of information and its purpose is to refute arguments like the one made by Kenneth Miller. Refuting his wrong-headed probability experiment doesn’t end the debate about evolution, nor does it address the issue of natural selection or other supporting arguments or evidence in favor of evolution. However, if you have an honest bone in your body, you won’t ever hear yourself making the same stupid argument that Kenneth Miller made, or one like it, to support your view that evolution is true.

 
At 4:39 PM, March 25, 2009, Anonymous Anonymous said...

ID, you are correct, a gene only codes for a single protein. I was thinking in the more lay sense of the word gene, which is wrong and distracting in this example. My bad. That accidental use of a plural doesn't mean I'm wrong, however.

Everyone knows your post has been about the possibility of information, that is why we have all been talking about your analogy -- it's weak. It doesn't prove your point. It doesn't refute Ken Miller's argument. You just go and do the exact same thing Ken Miller talks about. What you don't do is address the actual nub of his argument. Yes, DNA is complex. The chance of the human genome being generated randomly is fantastically small. So what? That isn't what evolution says happened. Are you aware of that? Of how the inter-relatedness of all life shows how complexity and diversity arose over billions of years?

And you are right, Ken Miller's 47 second explanation of why the post you went and did anyway was wrong doesn't end the debate. The debate has been over for about 150 years. I'm sorry you were born too late to argue this angle of probability.

I didn't actually watch the Ken Miller video on your post until you mentioned him in your last response. I wish I had earlier. You don't refute him. You simply went and did what he said doesn't work. The problem is that simply calculating how improbable it would be for DNA to arise by chance, all at once, is meaningless.
That is true because of what you say this post isn't about, natural selection. NS is the reason finding out the chance of DNA arising at random doesn't matter. NS can slowly generate increasing complexity from simpler forms, ones far too complex to ever hope to arrive by chance alone, and without any intelligent guiding force behind it.
But look at the amazing creatures alive on this planet at the moment. Such amazing illusion of design, but still illusion. Dawkins' the Blind Watchmaker is all about this, and is a great read. I believe Climbing Mount Improbable also talks about this, but I have not read it yet.

 
At 11:35 AM, March 26, 2009, Blogger Randy Stimpson said...

Debaser,

I know I have applied the math correctly. I have a Masters degree in Applied Mathematics from the University of Washington. I can say with authority that Kenneth Miller’s math does not apply to the situation. Let me say this one more time: the only world in which Kenneth Miller’s argument, and others like it, can make mathematical sense is in a world where every possible DNA sequence represents a viable life form.

 
At 5:38 PM, March 26, 2009, Anonymous Anonymous said...

I didn't say you did the math wrong. Your masters degree in math does not mean your argument is correct, or that you get to dismiss my objections with a wave of your diploma. I don't think you are stupid, I think you are wrong, and that your arguments fail.

How does your Masters in applied math give you (according to you, no less) the authority to say Ken Miller is wrong about evolution? "Wait wait" you might object, "I'm just talking about his probability example". Yeah, but isn't this whole stillborn exercise merely to argue from incredulity about what scientists DO say about evolution (it just couldn't happen)?

The Ken Miller clip talks about how calculating the probability of DNA coming into existence by random chance today does not show that evolution is wrong (and therefore was designed by an intelligence). It's a misleading exercise, well intended or not. No amount of mathematics gets around the fact that biology does not say that DNA arose, as it is today, by random chance in the past.

DNA as it exists today would not form by random chance any more than you could form an airplane by sending a tornado through a junkyard. It doesn't mean that human flight can't ever happen.

The chance of successfully juggling for the first time, using knives, is extremely low. That doesn't mean that juggling can't possibly be done.

You have not refuted Ken Miller's point, only given another example of what he says doesn't work. You just put another fish into his barrel.

Insisting that every sequence of DNA has to make a viable life form for Miller's point to be valid is weird. As nothing sacred put it pretty well a few posts ago:
"evolution consists of random changes to viable sequences, and all results that aren't viable are discarded, so viable success builds upon viable success -- we don't call it "evolution" for nothing."

It's kind of like me asserting, "for your example to make mathematical sense, every viable combination of letters in English has to form a viable paragraph." Say what? You're trying to trump biological fact with mathematical probabilities. It doesn't work.

 
At 4:57 AM, September 03, 2009, Blogger Tony Lloyd said...

Let’s go back to Peirce and his symbols. There are:

- Icons
- Indexes
- Symbols

An “icon” is a direct representation of something, (Like the picture of a man on the door of the gents).

An "index" is a causal consequence. Peirce uses (for some weird reason) the example of a bullet hole in an egg. From the shape of the hole we gain information about what happened (some idiot shot an egg!). We gain information about the cause from the effects of it.

A "symbol" is intended by someone. Letter “H” is not an icon, it doesn’t directly represent something. It is an index, it arises causally from someone pressing the “H”. However, primarily, it’s part of a language and an attempt to communicate – it doesn’t just mean something, somebody means something by it.

So we have three types of information, iconical, indexical and symbolic.

As far as the “probability” of this information is concerned, indexical information is spectacularly likely. It happens all over the place, all the time.

“Each information type has its own set of properties that make it essentially impossible to be generated randomly.”

It may not be generated “randomly” but indexical information is generated all the time by non-directed processes. The sun shines, as a result we know which side of a flat stone was facing the ground: the cool side. We know that the tree was struck by lightning because of the charred stump that is left. We have information about the ages of rocks because of the ratios of various isotopes. Let’s just look at that last example: indexical information is created more or less solely by the passage of time.

Any statement about the “improbability” perceived in the first quote depends on whether we would expect just that string. And why on earth would we expect just that? We would only expect just that string if someone intended it. If no one had intended it we would not expect any particular string; the numerator of the probability fraction increases to equal the denominator and we have certainty: no symbolic information, but certainty. It is the symbolic information that is unlikely.


Now, what sort of information do we need for evolution?

If we assume that indexical information is sufficient then the ID “argument from information” fails.

If we assume that symbolic information is necessary for life then we assume that someone “meant” life, we assume the conclusion and the argument begs the question.

 
