Tuesday, June 26, 2007

Junk DNA is a myth

Probably one of the most absurd scientific ideas that I have ever read about is the idea that approximately 97% of human DNA is junk. Wikipedia (June 26, 2007) says that:

"About 97% of the human genome has been designated as "junk", including most sequences within introns and most intergenic DNA. While much of this sequence may be an evolutionary artifact that serves no present-day purpose, some is believed to function in ways that are not currently understood. Moreover, the conservation of some junk DNA over many millions of years of evolution may imply an essential function."

It's hard for me to believe that any thoughtful person could believe such an absurd theory. On what authority do I make such a claim? Well, not much. I am not a geneticist or a molecular biologist. In fact, I have only read a couple of books on genetics. However, as a software developer I have a rough idea of how many bytes of code are needed to make complex software programs, and I am amazed that something as complicated as a human being is encoded in as little as 3.2 billion base pairs of DNA.

To be more specific, since the DNA alphabet consists of 4 nucleobases, we can represent a nucleobase with 2 bits of data. This means that 4 base pairs can be represented by a byte of data and approximately 4 million base pairs can be represented by a megabyte of data. This means that the entire human genome can be represented by only 800MB of code. From my 25+ years of experience as a software developer, this would have to be highly efficient code. To suggest that 97% of DNA is junk implies the implausible -- that only 24MB of DNA is not junk. By comparison, the Microsoft Word executable has a size of 12MB, which is half as much information (and that doesn't even count all the DLLs and operating system code that it relies on).
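The arithmetic above can be checked with a quick sketch. The names are made up for illustration, and "MB" is assumed to mean a decimal megabyte (1,000,000 bytes), since the post doesn't say which convention it uses.

```python
# Back-of-the-envelope check of the genome-size arithmetic.
# Assumption: 1 MB = 1,000,000 bytes (decimal megabytes).

BASE_PAIRS = 3_200_000_000   # approximate size of the human genome
BITS_PER_BASE = 2            # 4 nucleobases -> 2 bits each
BITS_PER_BYTE = 8
BYTES_PER_MB = 1_000_000
NON_JUNK_FRACTION = 0.03     # what remains if 97% is "junk"


def genome_size_mb(base_pairs=BASE_PAIRS):
    """Return the size in MB of a genome stored at 2 bits per base."""
    total_bytes = base_pairs * BITS_PER_BASE / BITS_PER_BYTE
    return total_bytes / BYTES_PER_MB


total_mb = genome_size_mb()
non_junk_mb = total_mb * NON_JUNK_FRACTION

print(f"whole genome: {total_mb:.0f} MB")
print(f"non-junk 3%:  {non_junk_mb:.0f} MB")
```

This only measures raw storage at 2 bits per base. It says nothing about how much of that storage does useful work.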

I think it's more probable that we haven't yet discovered all of the biological information required to produce humans and other forms of life.

What about repetitive DNA sequences?

31 Comments:

At 6:54 AM, June 28, 2007, Blogger Lord Runolfr said...

Perhaps you can figure out the answer to this question.

If the non-coding DNA in a species is all actually necessary, why does an onion need about five times more non-coding DNA than a human?

 
At 10:11 PM, June 30, 2007, Blogger Randy Stimpson said...

Hello Lord Runolfr. Thank you for commenting on my blog.

Actually onions have about twice as much DNA as humans. And if onions have junk DNA it doesn't imply that humans do too.

From a software engineering perspective, I could believe that onions have junk DNA, but not humans.

 
At 1:19 PM, July 13, 2007, Blogger Randy Stimpson said...

On second thought, maybe I better make sure to include onions in my diet.

How Can I Eat to Optimize My Genetic Potential for Good Health

 
At 2:45 PM, July 16, 2007, Blogger Lord Runolfr said...

You appear to be setting a double standard. If you can accept the existence of non-coding DNA in onions, why can't you accept it in humans?

 
At 7:20 PM, July 16, 2007, Blogger Randy Stimpson said...

I haven't accepted the existence of junk DNA in onions, nor am I denying it. However, based on my experience dealing with complexity as a software engineer, it's not plausible that humans have a significant amount of junk DNA, and I have already explained this in my blog.

I could speculate that perhaps the DNA of onions also serves a purpose for the animals that eat it -- after all, it is a food source. See How Healthy Food Builds Health and Onions.

Your line of reasoning seems to go like this:

Onions have more DNA than humans, therefore, onions must have junk DNA. Since onions have junk DNA humans must have junk DNA.

Since there is no logic to this line of reasoning I suspect you have other reasons for believing that humans have junk DNA. What are they?

 
At 12:35 PM, July 19, 2007, Blogger Lord Runolfr said...

Since there is no logic to this line of reasoning I suspect you have other reasons for believing that humans have junk DNA. What are they?

My reason for believing that "junk DNA" exists is the consensus of trained biologists that much of the DNA in any species does not code for any protein or serve any other discernible function.

Anyone claiming that all that non-coding DNA serves a purpose bears the burden of showing just what purpose it serves. Do you write thousands of lines of code in a program because "it might come in handy some day", even though it serves no purpose in the current application?

 
At 4:14 PM, July 20, 2007, Blogger Randy Stimpson said...

Richard Dawkins once said:

Science doesn't work by vote and it doesn't work by authority.

And it's often the case that the opinions of many are based on the conclusions drawn by a respected few -- who may be wrong.

The conclusion that most DNA is junk is an assumption. According to Wikipedia: It was generally assumed that the sequence in any given intron is junk DNA with no function. Just because we don't understand it doesn't mean it has no purpose. OMECIHUATLACEUCUCYOTICIHUATI

This will be my last comment on this post, however, I intend to follow up with another post entitled How We know What We Know About DNA that will address the burden of proof you mentioned. I hope you will continue to add thoughtful comments on that post.

As a side note, I don't write thousands of lines of code just because it might come in handy one day although I have worked with some programmers lacking business sense who do. I do write a massive amount of error handling code which only gets executed in rare circumstances, and that will have some relevance in my followup post.

 
At 1:00 PM, December 01, 2007, Blogger Doppelganger said...

"From a software engineering perspective. I could believe that onions have junk DNA, but not humans."

Why?

And what special insights into genomes and genetics does being a software engineer provide? And how? How much experinece/education do software engineers generally get in the course fo their educations/careers?

 
At 5:55 AM, December 12, 2007, Blogger Marion Delgado said...

in general you are pursuing the sort of looney-tunes argument that could disprove the existence/possibility of an Apple II.

moreover, you really badly do not get evolution. or even that your argument from incredulity about growing people strictly from DNA (which is, in fact, not quite what happens) is a sideline not relevant to the discussion.

This is very sad.

 
At 4:54 PM, December 13, 2007, Blogger Randy Stimpson said...

Hi Marion,

I don't understand your line of reasoning. How can my argument against Junk DNA be extended to disprove the existence of an Apple II? Maybe you can elaborate on this a little more.

 
At 6:39 PM, December 13, 2007, Blogger Randy Stimpson said...

Doppelganger said: what special insights into genomes and genetics does being a software engineer provide?

I have spent a significant portion of my career working on embedded systems such as on board flight computers, data acquisition and control instruments and bar code readers. To design these things software, mechanical and electrical engineers were required to pool their knowledge to accomplish the task. No one person or group of persons belonging to only one engineering discipline would be able to do it.

In like manner, to untangle the mysteries of the human genome, which has a much grander design than any of the things I have worked on, it will require the combined knowledge of people with specialized knowledge from several disciplines. A computer scientist with a grasp of information theory and design patterns could certainly bring something to the table that a Professor of Biology like you could not.

I would wager that it would be easier for a group of biology professors to design a kidney dialysis machine or a prosthetic hand than to decode the human genome.

Doppelganger said: How much experinece[sic]/education do software engineers generally get in the course fo[sic] their educations/careers?

So what’s your point?

 
At 7:01 PM, December 13, 2007, Blogger Randy Stimpson said...

Darn ... I messed up that prosthetic hand link.

 
At 2:25 AM, December 26, 2007, Blogger Randy Stimpson said...

On his blog Professor Scott Page [Doppelganger above] criticizes this blog entry on Junk DNA. His argument includes ad hominem attacks, psychoanalysis, insults, misrepresentations, poor logic and some information. Others chime in with condescending remarks. Even Professor Larry Morgan chimes in from his blog to add insult. I try to be nice and stick to the topic of the debate but can’t resist the occasional sic.

 
At 9:12 AM, February 24, 2008, Blogger Rob said...

All eucaryotic organisms are proposed to contain so-called junk DNA.

Humans are eucaryotic organisms.

Therefore, humans contain junk DNA.

 
At 9:42 AM, February 24, 2008, Blogger Smokey said...

"At 6:39 PM, December 13, 2007, Intelligent Designer said...
"I have spent a significant portion of my career working on embedded systems such as on board flight computers, data acquisition and control instruments and bar code readers."

That's nice, but what does it have to do with DNA?

"To design these things software, mechanical and electrical engineers were required to pool their knowledge to accomplish the task."

To do molecular genetics, scientists and software engineers were required to pool their knowledge. None of the software engineers who actually did this work agree with you. How do you explain that?

"No one person or group of persons belonging to only one engineering discipline would be able to it."

But again, NONE of the engineers who have ACTUALLY WORKED on the genome project agree with you. How do you explain that?

"In like manner, to untangle the mysteries the human genome, which has a much grander design than any of the things I have worked on,..."

Your argument is circular. Your hypothesis is one of design, so assuming it at the outset is intellectually dishonest.

"... it will require the combined knowledge of people with specialized knowledge from several disciplines."

Will? It HAS. None of the people from the disciplines related to yours who do the actual work agree with you. How do you discount their experience?

"A computer scientist with a grasp of information theory and design patterns could certainly bring something to the table that a Professor of Biology like you could not."

Many, many computer scientists with a grasp of information theory and design patterns have already worked in the field of genomics. NONE OF THEM AGREE WITH YOUR CONCLUSIONS.

"I would wager that it would be easier for a group of biology professors to design a kidney dialysis machine or a prosthetic hand than to decode the human genome."

"Decode the human genome" is creationist gobbledygook. You're left with the reality that plenty of software engineers have contributed to this field, and no one who agrees with you has done anything but produce empty rhetoric.

 
At 11:22 AM, February 24, 2008, Blogger Jim said...

"And to think that something as complicated as a human being is encoded in only 3 billion base pairs of DNA is astounding."

Are you familiar with the concept of the cell? Since you're a software engineer, you might want to try to think of a developing human as a recursive algorithm. Back when I was a computer engineer, I used that principle to write a maze-traversal program in about 60 instructions of assembly code. Is that a miracle? For that matter, do you look at the Mandelbrot set and declare that no relatively simple equation could have produced it?
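A minimal sketch of the Mandelbrot point, for anyone who wants to see it: the whole picture comes from repeating z -> z*z + c. The grid bounds, iteration cap and function names below are illustrative choices, not anything taken from the comment.

```python
def escapes(c, max_iter=50):
    """Return True if iterating z -> z*z + c pushes |z| past 2 within max_iter steps."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return True
    return False


def render(width=60, height=20):
    """Draw a coarse ASCII picture: '#' for points that never escaped."""
    rows = []
    for j in range(height):
        y = -1.2 + 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            x = -2.0 + 2.6 * i / (width - 1)
            row += " " if escapes(complex(x, y)) else "#"
        rows.append(row)
    return "\n".join(rows)


print(render())
```

The rule itself fits in one line. The intricate boundary comes from applying it over and over, which is the sense in which a short description can yield a complicated result.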

 
At 4:29 PM, February 24, 2008, Blogger Free Lunch said...

First, the professor's name is Moran, not Morgan. He is an excellent critic of anyone who gets things wrong. He does not limit his critiques to creationists.

Second, you seem to miss that development of cells in a multi-cellular organism is controlled in part by the environment in which the development is taking place. You are mistaken to assume that everything that is necessary to develop a complex organism can be found directly in the genetic code.

Third, if you want to claim that non-coding DNA has a purpose or use, you need to show us what it is. You appear to have jumped to the conclusion that scientists haven't considered any of your comments.

 
At 10:48 PM, February 24, 2008, Anonymous Anonymous said...

The main thing you are missing here is that DNA is not a program. You could consider the laws of chemistry an executable and the DNA a data set compressed with a lossy, and also not particularly efficient, method, and you'd be closer.

 
At 8:19 AM, February 25, 2008, Blogger dnaworks said...

One of the biggest problems with your argument is that DNA is not equivalent to machine code or assembly language - it is equivalent to 4th or 5th (or higher) GL languages. If you really are a software developer, you know that even one line of Java/C++/SmallTalk code can produce effects that require orders of magnitude more at the actual physical layer. Taking into account the properties of amino acids and post-processing, the information in DNA is amplified exponentially and as such, your idea that 3% of human DNA is insufficient to explain activity is .... flawed.

 
At 4:18 PM, February 25, 2008, Anonymous Anonymous said...

My two cents: You put too much significance in the term "junk". When it was first discovered that the majority of the genome did not translate to protein, the term was coined because, at first, that's all they thought DNA did: translate to protein. Biology has come a long way since then and we know now that telomeres regulate cell division, certain operons regulate expression of genes, introns are involved in RNAi regulation, etc. You need to read a more current genetics book. "Junk" is not meant as a pejorative term.

 
At 5:49 PM, February 25, 2008, Blogger Randy Stimpson said...

Here is an interesting article on how introns serve as a guide to making nerve cell electrical channels.

 
At 9:00 AM, February 26, 2008, Anonymous Anonymous said...

Your intuitive attempt to relate computer programs to DNA fails because computers are not living things and their internal components (including their software) do not evolve, because they don't have the characteristics necessary for evolution (such as reproduction).

But, since you are a computer programmer, are you familiar with Genetic Algorithms? If you Google the phrase "Genetic Algorithms" you will find literally thousands of web pages discussing how random inputs, which are reproduced and then fed through a selection process, create complex designs within a relatively short time. These algorithms are an exciting new field of Artificial Intelligence and may, at some future time, be the source of the majority of software designs.
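For readers who have not seen one, here is a minimal sketch of the idea, under stated assumptions: a toy fitness function (count the 1 bits), selection plus mutation only with no crossover, and made-up names and parameters throughout. It is not taken from any particular genetic algorithm library.

```python
import random

GENOME_LENGTH = 20      # bits per individual (illustrative)
POPULATION_SIZE = 30    # individuals per generation (illustrative)
MUTATION_RATE = 0.05    # chance of flipping each bit when copying


def fitness(genome):
    """Toy fitness: the number of 1 bits in the genome."""
    return sum(genome)


def mutate(genome, rng):
    """Copy a genome, flipping each bit with probability MUTATION_RATE."""
    return [bit ^ 1 if rng.random() < MUTATION_RATE else bit for bit in genome]


def evolve(generations=40, seed=0):
    """Run the loop and return the best fitness seen in each generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(GENOME_LENGTH)]
                  for _ in range(POPULATION_SIZE)]
    history = []
    for _ in range(generations):
        # Selection: keep the fitter half of the population unchanged.
        population.sort(key=fitness, reverse=True)
        survivors = population[:POPULATION_SIZE // 2]
        # Reproduction: each survivor leaves one mutated copy.
        offspring = [mutate(parent, rng) for parent in survivors]
        population = survivors + offspring
        history.append(max(fitness(g) for g in population))
    return history


history = evolve()
print("best fitness, first generation:", history[0])
print("best fitness, last generation: ", history[-1])
```

Because the fitter half survives unchanged each round, the best score can never go down. Random copying errors supply the variation and selection keeps whatever helps.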

 
At 10:06 AM, February 26, 2008, Anonymous Anonymous said...

I think it's better to stick to biology. But, let's go with the computer analogy.

Most of it could still be junk.

Think about the file format for a word processor. Consider a typical letter: Nearly all the information is in the text, which could be rendered in a tiny amount of 8-bit ASCII. The rest is really junk if all you care about is what the words are. The word processor lets you do other things, sure, and uses other information for that-- but most of that is entirely unrelated to the content of the text.

 
At 1:48 AM, September 13, 2008, Blogger Yeller said...

The idea of comparing DNA structure to computer code is an interesting one, but it doesn't hold much water as a direct comparison.

I've only taken high school bio, but it's my understanding that geneticists don't fully understand how DNA works. The absence of a genetic sequence could be as important as its presence. The so-called "junk DNA" could simply be placeholders that prevent the development of certain features.

Also, the cell uses a chemical system to read the "code", a system with a level of complexity much higher than that of a computer. It stands to reason that the cell would need less code than software to perform the same task.

The idea of comparing software with DNA is interesting, and perhaps we can use knowledge of genetics to find methods of improving our current software and computers in general, but to use this as justification seems absurd and borderline irresponsible.

In response to the argument about programming an artificial kidney being easier than decoding the corresponding DNA, I strongly disagree.

It may be easier to write a program for the basic function. But for each specific cell you have more subtle information that needs to be passed on: the basic structure of the cell, the antigens and such that allow the cells to be passed over by the immune system, the repair functions.

While having people from different disciplines might speed things up, it is still well within the realm of a single discipline, just as an engineering team (or science team) could handle understanding the on board flight computers many engineers created. The simple fact is that understanding something that already exists is much easier than creating the same thing. Creation requires knowledge of what can and cannot work and why. Understanding only requires knowledge of how and why something works.

Knowledge of computer systems and computer algorithms does not mean you have even basic knowledge of how all systems work.

Before you make a claim (this goes for EVERYONE), boil everything down to its simplest form, take note of all assumptions made, and justify each and every one of them. Occam knew what he was talking about.

 
At 1:01 PM, July 09, 2009, Anonymous EvoDevo said...

Just a quick note on the information content in DNA... it’s a well-established fact that human beings have 3.2 billion base pairs of DNA in the nuclear genome (there's a bit more inside mitochondria). However, if you want to determine the coding potential of our DNA, you need to think about how the code is actually read. First off, DNA is an anti-parallel molecule, which means that 3.2 billion base pairs can be read in either direction, producing completely different sets of proteins. Second of all, in terms of protein code, the information is read three bases at a time, such that each three-letter combination (codon) corresponds to an amino acid. The code turns out to be quite degenerate, meaning that individual amino acids can be specified by multiple three-letter combinations. To really see how "smart" DNA is, consider this question: how many codons are present in a DNA sequence 15 base pairs in length? The correct answer is 26, if you consider all 6 reading frames. Further, this information can be edited in many ways before it's used to produce a protein. So perhaps 3-4% of our genomes is a lot more information than most people think!
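The count of 26 can be reproduced with a short sketch. The 15-base sequence below is a made-up example and the function names are illustrative. The code only counts complete triplets in each of the six frames. It makes no claim about which frames a cell actually reads.

```python
# Map each base to its complement on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq, offset):
    """Return the complete 3-base codons of seq, starting at the given offset."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frame_codons(seq):
    """Return the codons in all 6 reading frames (3 offsets on each strand)."""
    frames = {}
    for name, strand in (("forward", seq), ("reverse", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{name}+{offset}"] = codons(strand, offset)
    return frames


example = "ATGGCCATTGTAATG"  # hypothetical 15-base sequence
frames = six_frame_codons(example)
total = sum(len(frame) for frame in frames.values())

for name, frame in frames.items():
    print(name, frame)
print("total codons:", total)  # 5 + 4 + 4 per strand, so 26 in all
```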

 
At 12:17 AM, July 14, 2009, Blogger Randy Stimpson said...

EvoDevo, I don't think what you said is accurate. I am under the impression that DNA is only read in one direction during transcription. Also a promoter sequence and a start codon determine where transcription begins so you don't actually have overlapping codon frames.

 
At 2:18 PM, May 29, 2010, Anonymous Anonymous said...

As a programmer you must have commented out parts of your code many times when those parts weren't needed anymore or you had to change something. So does DNA. DNA, over time, has undergone cumulative evolution. The code itself evolved. Most of the 'old code' has remained. Parts of the old code get reactivated now and then. Strangely, this can lead to evolution even though it is old code, when certain conditions are met.
There is no junk code. There are only parts of the code being read and used and other parts being inhibited or not read. Furthermore, code ultimately generates more complex designs (tissue, organs, etc.). If, let's say, certain parts of the code generate structures which are 'dominant' (found in more places in the organism), those parts of the code will be used instead of the parts which create very unique features. Nature will favour the majority. Furthermore, genes are selected from those genes which cooperate better with other genes to form the complex projects. Nature will prefer and use the genes which are more 'cooperative' (the other ones are not discarded, they simply are not used).

 
At 11:16 AM, January 12, 2012, Blogger mytechpeople said...

Good thinking here. Thank you, all.

 
At 11:35 AM, July 26, 2015, Anonymous Anonymous said...

The genetic code is a software program just like any other software program. Only in this case it is the assembly language code of a self-building machine. Just like any other computer software program, the genetic code too has a program logic which can be deciphered and modified.
The human body is a fractal object, that is, it is an object whose evolution is guided by one simple mathematical algorithm repeating itself over and over again. Hence what starts out initially as a simple object becomes increasingly complex with every successive iteration.

 
At 12:42 AM, March 20, 2018, Blogger Tom said...

The way in which the DNA in a cell and the environment in which it grows interact to create a living organism is very different from the way in which a computer executes a computer program. There are some useful analogies to be drawn, but to believe that "genetic code is a software program just like any other software program" serves only to reveal the writer's near-total ignorance of biology.

Why do people with superficial knowledge think they have to share their ignorance with the world?

 
