
Posted on 12-Feb-2016


Kolmogorov: the complexity of an object is the length of the shortest computer program that creates the object
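Kolmogorov complexity itself cannot be computed, but the size of a compressed file gives an upper bound on it (plus the fixed size of the decompressor). A minimal sketch, assuming Python's standard `zlib` compressor as a stand-in for "the shortest program" and using made-up DNA strings, shows that a repetitive sequence has a much shorter description than a random one of the same length:

```python
import random
import zlib


def compressed_length(s: str) -> int:
    """Bytes zlib needs to store s: an upper bound (up to the fixed size of a
    decompressor) on Kolmogorov complexity, which itself is uncomputable."""
    return len(zlib.compress(s.encode("ascii"), 9))


rng = random.Random(0)  # fixed seed so the example is reproducible
repetitive = "ACGT" * 2500  # 10,000 bases produced by a tiny "program"
random_dna = "".join(rng.choice("ACGT") for _ in range(10_000))

print(compressed_length(repetitive))  # small: the repeat is easy to describe
print(compressed_length(random_dna))  # much larger: no short description found
```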

The Human Genome, and Human Complexity

Yoni Toker

Viewpoint: GENE NUMBER: What If There Are Only 30,000 Human Genes? (Jean-Michel Claverie)

Humans: ~30,000 genes

Worm (Caenorhabditis elegans): ~20,000 genes

Are we not much more complicated than worms?

Science, 16 February 2001: Vol. 291, no. 5507, pp. 1255-1257

Mapping of the Human genome

1953: Rosalind Franklin, James Watson and Francis Crick discover the double-helical structure of DNA.

Mid-1980s: Human Genome Project suggested

Objections to the Human Genome Project

• Too hard: the human genome is 3e+9 base pairs long, and a lab (in the 1980s) could sequence 500 base pairs a day:

$$\frac{3\times10^{9}\ \text{base pairs}}{500\ \text{base pairs/day} \times 365\ \text{days/year}} \approx 16{,}000\ \text{years}$$
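The back-of-the-envelope estimate can be checked directly; the three inputs below are the figures given on the slide:

```python
GENOME_BP = 3e9     # human genome size used on the slide
BP_PER_DAY = 500    # throughput of a 1980s lab, per the slide
DAYS_PER_YEAR = 365

years = GENOME_BP / BP_PER_DAY / DAYS_PER_YEAR
print(f"{years:,.0f} years")  # 16,438 years, i.e. the slide's ~16,000
```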


• Too expensive!

• Not the way to do biology:

Biology is hypothesis-driven experimentation, not a fishing expedition


1990: Human Genome Project announced. Goal: sequence the entire human genome in 15 years, with a budget of $3 billion

Comparison: LHC budget ~$5 billion; aircraft carrier ~$10 billion


1998: Only 5% of the genome sequenced

Celera: "I will decode the entire human genome in just 3 years, with a budget of only $300 million"
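For comparison with the 500 base pairs a day of a 1980s lab, the budgets and timelines on these slides imply the following cost per base pair and daily throughput. This rough sketch assumes, purely for illustration, that each budget is spent on a single pass over 3e+9 base pairs:

```python
GENOME_BP = 3e9  # genome size used on the slides

# (budget in US dollars, planned duration in years), as stated on the slides
plans = {
    "Human Genome Project (1990 plan)": (3e9, 15),
    "Celera (1998 claim)": (300e6, 3),
}

for name, (budget, years) in plans.items():
    dollars_per_bp = budget / GENOME_BP
    bp_per_day = GENOME_BP / (years * 365)
    print(f"{name}: ${dollars_per_bp:.2f} per base pair, "
          f"{bp_per_day:,.0f} base pairs per day")
```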

Sequencing small pieces of DNA

[Figure: primer extension along a DNA template strand]

F. Sanger et al., Nature 265, 687 (1977).

E. C. Strauss, J. A. Kobori, G. Siu, L. E. Hood, Anal. Biochem. 154, 353 (1986).

[Figure, continued: extension products of increasing length (T G, T G C, ..., T G C A T T)]

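The sequencing slides show a primer extended along a template and fragments of increasing length. A minimal sketch of the chain-termination idea behind Sanger sequencing, under simplifying assumptions: one reaction per terminating base, every possible stopping point represented, strand orientation (5'/3') and primer length ignored, and hypothetical helper names (`new_strand`, `termination_lanes`, `read_gel`). Sorting all fragments by length, as a gel does, reads out the newly built strand:

```python
# Toy chain-termination (Sanger-style) simulation; all names are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def new_strand(template: str) -> str:
    """Strand the polymerase builds from the primer: each base pairs with the
    template base opposite it (template written in the order it is read)."""
    return "".join(COMPLEMENT[base] for base in template)


def termination_lanes(template: str) -> dict[str, list[int]]:
    """One reaction per chain-terminating base X: fragments stop wherever X is
    incorporated, so each lane holds fragment lengths (bases added after the primer)."""
    strand = new_strand(template)
    return {x: [i + 1 for i, b in enumerate(strand) if b == x] for x in "ACGT"}


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Sort all fragments by length, shortest first, and read off each band's lane."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = termination_lanes("ACGTAA")
print(lanes)            # {'A': [4], 'C': [3], 'G': [2], 'T': [1, 5, 6]}
print(read_gel(lanes))  # TGCATT
```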

Sequencing Large DNAs: the whole-genome shotgun method
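In a shotgun approach the DNA is broken into many random fragments, each fragment is sequenced, and the genome is reassembled from overlaps between the reads. The sketch below uses a simple greedy overlap merge on a handful of hypothetical, error-free reads; it is not the assembler Celera used, and real assemblers must cope with sequencing errors and with repeats that defeat a greedy strategy:

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`,
    or 0 if no such overlap is at least `min_len` long."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in `a`
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap.
    Returns the remaining contigs (a single string if everything joined up)."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # A read contained inside another read adds no information.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: return the separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["GATTA", "TACAG", "AGGCT", "GGCTT"]  # hypothetical error-free reads
print(greedy_assemble(reads, min_len=2))  # ['GATTACAGGCTT']
```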

Fierce competition... comes to a draw: on June 26, 2000, President Clinton, with J. Craig Venter (left) and Francis Collins, announces completion of "the first survey of the entire human genome."

Technology is getting better: Solexa sequencing

Technology is getting better!

[Figure: size of the largest sequencing and synthesis projects (bp, log scale from 10^0 to 10^10) vs. year of publication, 1960-2000]


Oligonucleotide Synthesis

• 1) De-Blocking: dichloroacetic acid (DCA) or trichloroacetic acid (TCA) in dichloromethane (DCM)

DMT = dimethoxytrityl


• 2) Base Condensation


• 3) Capping

• 4) Oxidation
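The four steps repeat once for every base added. A toy simulation, assuming a fixed per-cycle coupling efficiency (a hypothetical parameter, not a figure from the slides) and ignoring the chemistry and the real 3'-to-5' direction of synthesis, shows why the yield of full-length product falls with length and why chains that fail to couple are capped instead of being allowed to grow into deletion products. With the first base already on the support, the expected full-length fraction is efficiency^(length - 1):

```python
import random


def simulate_synthesis(sequence: str, coupling_efficiency: float = 0.99,
                       n_chains: int = 10_000, seed: int = 0) -> float:
    """Toy model of the four-step cycle; returns the fraction of chains that
    reach full length. `coupling_efficiency` is an assumed parameter."""
    rng = random.Random(seed)
    chains = [sequence[0]] * n_chains  # first base already attached to the support
    capped = [False] * n_chains
    for base in sequence[1:]:          # one cycle per additional base
        for k in range(n_chains):
            if capped[k]:
                continue               # capped chains never grow again
            # 1) De-blocking: strip the DMT group (assumed always to succeed here)
            # 2) Base condensation: the incoming DMT-protected base couples...
            if rng.random() < coupling_efficiency:
                chains[k] += base
            else:
                # 3) Capping: ...and chains that failed to couple are blocked,
                #    so they cannot pick up later bases and carry a deletion
                capped[k] = True
            # 4) Oxidation: stabilises the new linkage (not modelled)
    return sum(chain == sequence for chain in chains) / n_chains


target = "ACGT" * 5  # hypothetical 20-mer
print(simulate_synthesis(target, coupling_efficiency=0.98, n_chains=20_000))
print(0.98 ** (len(target) - 1))  # analytic expectation, about 0.68
```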


DNA Synthesis

Genetic Code

4 bases, 20 amino acids

Example:

CCG → Proline

Every 3 bases (a codon) code for one amino acid
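Why three bases? Pairs of bases give only 4^2 = 16 combinations, too few for 20 amino acids, while triplets give 4^3 = 64. A minimal sketch of translation using the standard genetic code (64 codons listed in TCAG order, stop codons marked `*`), checked against the slide's example CCG → proline; `translate` is a hypothetical helper name:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in one-letter amino-acid form; codons are enumerated
# in TCAG order with the first base varying slowest. "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Read the DNA three bases at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(4 ** 2, 4 ** 3)                # 16 64
print(translate("CCG"))              # P (proline)
print(translate("ATGCCGTGGTAA"))     # MPW: Met-Pro-Trp, then stop
```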

From DNA to Proteins

Some of the things we learned

• Human genome contains 3e+9 base pairs

• Less than 2% of the genome codes for proteins

• Average gene length: ~3,000 base pairs

• Number of genes: ~30,000

• 98% of genes are identical between all people: only 1-2% of genes are responsible for traits such as eye color and genetic diseases…

Species                                     Genome size (million bp)   Number of genes
Human                                       2,900                      30,000
Fruit fly (Drosophila melanogaster)         120                        13,601
Baker's yeast (Saccharomyces cerevisiae)    12                         6,275
Worm (Caenorhabditis elegans)               97                         19,000
E. coli                                     4.1                        4,800
Arabidopsis (Arabidopsis thaliana)          125                        25,000

Genome Size
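Dividing the gene counts in the species table by genome size makes the comparison explicit: genome size and gene number grow very differently. The figures are the ones in the table; the script only does the arithmetic:

```python
# (genome size in millions of base pairs, number of genes), from the species table
GENOMES = {
    "Human": (2900, 30_000),
    "Fruit fly": (120, 13_601),
    "Baker's yeast": (12, 6_275),
    "Worm": (97, 19_000),
    "E. coli": (4.1, 4_800),
    "Arabidopsis": (125, 25_000),
}

density = {name: genes / mbp for name, (mbp, genes) in GENOMES.items()}
for name, d in sorted(density.items(), key=lambda item: item[1]):
    print(f"{name:15s} {d:8.1f} genes per million bp")

human_mbp, human_genes = GENOMES["Human"]
worm_mbp, worm_genes = GENOMES["Worm"]
print(f"human/worm genome size ratio: {human_mbp / worm_mbp:.1f}")     # 29.9
print(f"human/worm gene number ratio: {human_genes / worm_genes:.2f}")  # 1.58
```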


Viewpoint: GENE NUMBER: What If There Are Only 30,000 Human Genes? (Jean-Michel Claverie)

• Are we really more complicated than flies and worms?

• 30,000 is much more complicated than 20,000

• Gene number isn’t everything

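30,000 is much more complicated than 20,000: if each gene is simply on or off, 30,000 genes allow $2^{30{,}000}$ combinations versus $2^{20{,}000}$ for 20,000 genes, a ratio of

$$\frac{2^{30{,}000}}{2^{20{,}000}} = 2^{10{,}000} \approx 10^{3000}$$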
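Python's arbitrary-precision integers can check this arithmetic exactly. The sketch assumes each gene is simply on or off, so N genes give 2^N combinations:

```python
import math

human_states = 2 ** 30_000  # on/off combinations of 30,000 genes
worm_states = 2 ** 20_000   # on/off combinations of 20,000 genes
ratio = human_states // worm_states  # exactly 2**10_000

print(ratio == 2 ** 10_000)    # True
print(len(str(ratio)))         # 3011 digits, so the ratio is about 10**3010
print(10_000 * math.log10(2))  # about 3010.3
```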

Gene Number isn’t everything

mRNA

30,000 genes, but more than 85,000 mRNA species

Alternative splicing, mRNA editing
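Alternative splicing lets one gene produce several different mRNAs. A minimal sketch, assuming a hypothetical gene whose first and last exons are always kept and whose three middle exons can each be kept or skipped independently (real splicing is far more constrained), counts the possible isoforms: k optional exons give 2^k variants.

```python
from itertools import combinations


def splice_variants(first: str, optional: list[str], last: str) -> list[tuple[str, ...]]:
    """Every mRNA isoform obtained by keeping or skipping each optional exon
    (exon order preserved); the first and last exons are always kept."""
    variants = []
    for k in range(len(optional) + 1):
        for kept in combinations(optional, k):
            variants.append((first, *kept, last))
    return variants


isoforms = splice_variants("E1", ["E2", "E3", "E4"], "E5")  # hypothetical exons
print(len(isoforms))  # 8 = 2**3
for iso in isoforms:
    print("-".join(iso))
```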

Vertebrate immune system [Figure labels: gene sites, antibody]

Complexity comes from more sophisticated regulation mechanisms!

More sophisticated methods of gene expression and regulation

mRNA editing

Proteins change their function:
• Number of sugars attached
• Folding/unfolding
• …

mRNA

Genetic Networks

Claverie: every gene is connected on average to 4-5 other genes

We are not much more complicated than an airplane!

But: genetic networks follow a power-law distribution

Genetic Networks

[Figure: distribution of the number of connections per gene]

Average is not very meaningful!
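With a power-law distribution most genes have very few connections while a few hubs have hundreds, so the mean describes almost no actual gene. A sketch with a hypothetical distribution P(k) ∝ k^-2 for 1 to 1000 connections (exponent and cutoff chosen for illustration, not fitted to data) gives a mean of roughly 4.5, in line with Claverie's 4-5, yet the median gene has a single connection:

```python
def power_law(gamma: float = 2.0, k_max: int = 1000) -> dict[int, float]:
    """P(k) proportional to k**-gamma for k = 1..k_max connections (assumed parameters)."""
    weights = {k: k ** -gamma for k in range(1, k_max + 1)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def mean_and_median(dist: dict[int, float]) -> tuple[float, int]:
    """Mean of the distribution and the smallest k whose cumulative probability reaches 1/2."""
    mean = sum(k * p for k, p in dist.items())
    cumulative = 0.0
    for k in sorted(dist):
        cumulative += dist[k]
        if cumulative >= 0.5:
            return mean, k
    raise ValueError("probabilities do not sum to 1")


dist = power_law()
mean, median = mean_and_median(dist)
hubs = sum(p for k, p in dist.items() if k >= 100)
print(f"mean {mean:.2f} connections, median {median}")
print(f"share of genes with at least 100 connections: {hubs:.4f}")
```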

Summary

Human Genome Project
• Decoding the “parts list” of humans

• Extraordinary technological advances

Complexity: Genome is just the beginning

Aim High! Dream On!

• Sequence more and more organisms

• Creation of synthetic life

• Find the genes for genetic diseases

• Reconstruct the tree of life

• Learn more of nature’s tricks

• DNA nanotechnology

• Producing clean energy, sequestering CO2…