
Comparative genomics

Date post: 03-Jan-2016
Page 1: Comparative genomics

Comparative genomics

Haixu Tang

School of Informatics

Page 2: Comparative genomics

WGS of human genome

• 2001 Two assemblies of initial human genome sequences published

– International Human Genome Project

– Celera Genomics: WGS approach

Page 3: Comparative genomics

• 1995 Haemophilus influenzae sequenced

• 1997 E. coli sequenced

• 1998 Complete sequence of the Caenorhabditis elegans genome

• 2000 Complete sequence of the euchromatic portion of the Drosophila melanogaster genome

Model organisms

Page 4: Comparative genomics

Why model organisms?

• Testing and improvements of genome sequencing technology and strategy

Page 5: Comparative genomics

• 1993 Whole genome shotgun sequencing proposed (J. C. Venter)

• 1995 Haemophilus influenzae sequenced ~1.5-2 Mbp

• 1995 Automated fluorescent sequencing instruments and robotic operations (Perkin-Elmer, Inc.)

• 1996 Yeast sequenced

• 1996 Double barrelled sequencing

• 1997 E. coli sequenced ~4 Mbp

• 1998 Complete sequence of the Caenorhabditis elegans genome ~100 Mbp

• 1998 Whole genome shotgun sequencing (Weber & Myers)

• 2000 Complete sequence of the euchromatic portion of the Drosophila melanogaster genome ~180 Mbp

Model organisms

Page 6: Comparative genomics

Why model organisms?

• Testing and improvements of genome sequencing technology and strategy

• Model organisms have important biological implications themselves.

Page 7: Comparative genomics

• 1995 Haemophilus influenzae sequenced (infectious disease)

• 1996 Yeast sequenced (industry and biology)

• 1997 E. coli sequenced (industry and biotechnology)

• 1998 Complete sequence of the Caenorhabditis elegans genome (multi-cellular organism, development)

• 2000 Complete sequence of the euchromatic portion of the Drosophila melanogaster genome (genetics, entomology)

Model organisms

Page 8: Comparative genomics

Why model organisms?

• Testing and improvements of genome sequencing technology and strategy.

• Model organisms have important biological implications themselves.

• Genome sequences provide useful information to study genome function and evolution.

Page 9: Comparative genomics

• 1995 Haemophilus influenzae sequenced (Bacterial)

• 1996 Yeast sequenced (Uni-cellular)

• 1997 E. coli sequenced (Bacterial)

• 1998 Complete sequence of the Caenorhabditis elegans genome (Multi-cellular organism, nematode)

• 2000 Complete sequence of the euchromatic portion of the Drosophila melanogaster genome (Multi-cellular organism, insect)

Model organisms

Page 10: Comparative genomics

• 2001 Human genome

• 2002 Mouse genome

– Initial sequencing and comparative analysis of the mouse genome

• 2003 Rat genome

• 2004 Chicken genome (first bird)

• 2005 Chimpanzee genome

Model mammalian and vertebrate genomes

Page 11: Comparative genomics

Comparative genomics

• Solving biological problems by comparing genomic sequences

– Function of genes and genomes

– Evolution of genes and genomes

• Data driven approaches

– Computational methods are the core

Page 12: Comparative genomics

Which genomes to sequence?

• Species having important biological applications

• For comparative genomics studies

– Functional consideration

• Evolutionarily divergent genomes → conserved elements, e.g. human vs. mouse (~75% identical)

• Evolutionarily close genomes → divergent elements, e.g. human vs. chimpanzee (98.4% identical)

– Evolutionary consideration

• Specific evolutionary puzzles, e.g. whole genome duplication in yeast
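The "% identical" figures above depend on how identity is counted. As a minimal sketch, assuming two sequences that are already aligned to the same length and assuming gapped columns are simply ignored (one common convention, not necessarily the one behind the slide's numbers), percent identity could be computed as follows. The function name and the toy sequences are illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns in which both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap (assumed convention).
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy alignment (made-up sequences): 7 of 8 ungapped columns match.
toy_identity = percent_identity("ACGT-ACGT", "ACGTTACCT")
```

Counting gaps as mismatches instead would lower the value, which is one reason identity figures quoted for the same pair of genomes can differ between studies.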

Page 13: Comparative genomics

Ongoing eukaryotic genome projects

• http://igweb.integratedgenomics.com/ERGO_supplement/genomes_eukarya.html

• >20 yeasts, insects (12 Drosophila, 2 mosquitoes, silkworm), flea, sea urchin, frog, fish (zebrafish, Fugu), mammals (mouse, rat, dog, cow, pig, monkey, etc.), plants (Arabidopsis, rice (>2), maize, etc.)

Page 14: Comparative genomics

Comparative genomics: case studies

• Gene function and evolution

• Gene-gene relationship

• Genome evolution

Page 15: Comparative genomics

• Orthologues: a pair of genes whose last common ancestor node in the gene tree is a speciation event

• Paralogues: a pair of genes whose last common ancestor node in the gene tree is a duplication event

Homologue relationships of genes
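The two definitions can be applied mechanically once a gene tree with labeled internal nodes is available. The sketch below assumes a hypothetical minimal `Node` structure in which each internal node is tagged "speciation" or "duplication"; the toy tree (one duplication followed by a speciation in each copy) is made up for illustration and is not the tree from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A node in a rooted gene tree (hypothetical minimal structure)."""
    label: str                     # gene name for leaves
    event: Optional[str] = None    # "speciation" or "duplication" (internal nodes)
    children: List["Node"] = field(default_factory=list)


def path_to(root: Node, leaf: str) -> Optional[List[Node]]:
    """Return the nodes from the root down to the named leaf, or None."""
    if not root.children:
        return [root] if root.label == leaf else None
    for child in root.children:
        sub = path_to(child, leaf)
        if sub is not None:
            return [root] + sub
    return None


def relation(root: Node, gene_a: str, gene_b: str) -> str:
    """Classify a gene pair by the event at its last common ancestor node."""
    if gene_a == gene_b:
        raise ValueError("need two different genes")
    path_a, path_b = path_to(root, gene_a), path_to(root, gene_b)
    if path_a is None or path_b is None:
        raise ValueError("both genes must be leaves of the tree")
    lca = None
    for x, y in zip(path_a, path_b):
        if x is not y:
            break
        lca = x  # deepest node shared by both root-to-leaf paths so far
    return "orthologues" if lca.event == "speciation" else "paralogues"


# Toy tree: an ancestral duplication, then a speciation within each copy.
tree = Node("root", "duplication", [
    Node("copy1", "speciation", [Node("H1"), Node("M1")]),
    Node("copy2", "speciation", [Node("H2"), Node("M2")]),
])
```

In this toy tree, `relation(tree, "H1", "M1")` gives "orthologues", while `relation(tree, "H1", "H2")` and `relation(tree, "H1", "M2")` give "paralogues", because those pairs trace back to the duplication at the root.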

Page 16: Comparative genomics

Homologue Relationships

[Figure: gene tree drawn along a time axis, with two duplication events and one speciation event; node labels A, A1, A2, M1, M2, M2’, H1, H2; gene pairs are marked as orthologues, inparalogues or outparalogues.]

Page 17: Comparative genomics

Functional implications

• Orthologous genes → typically the same function in different species

• Paralogous genes → often different functions

Page 18: Comparative genomics

Yeast species

• 5-20 million years of divergence

• Sufficient conservation to align

• Sufficient divergence to identify conserved functional elements

[Figure: species tree with leaves cerevisiae, paradoxus, mikatae, bayanus, glabrata, castellii, lactis, gossypii, waltii, hansenii, albicans, lipolytica, crassa, graminearum, grisea, nidulans, pombe; branch points labeled ~5M and ~20M.]

Page 19: Comparative genomics

Large scale genome evolution

• Most genes have a clear match

• Clear blocks of synteny

Page 20: Comparative genomics
Page 21: Comparative genomics

Human–chimpanzee comparisons

• Positive selection: a sequence change that increases fitness in a species is subject to positive selection. As a consequence, the change normally becomes fixed, leading to adaptive evolution of that species.

Page 22: Comparative genomics

Genome vs. Genes

• The whole genome sequence tells us not only which genes exist in a genome, but also which genes are absent from it (e.g., deleted).

Page 23: Comparative genomics

Phylogenetic profile analysis

• A non-homology-based approach to gene function prediction

• The phylogenetic profile of a gene is a string encoding the presence or absence of the gene in every sequenced genome

• The phylogenetic profiles of genes involved in the same biological process are often “similar”, since the genes may co-evolve.

Page 24: Comparative genomics

Phylogenetic profile analysis

• Phylogenetic profile (against N genomes)

– For each gene X in a target genome (e.g., E. coli), build a phylogenetic profile as follows

– If gene X has a homolog in genome #i, the ith bit of X’s phylogenetic profile is “1”; otherwise it is “0”
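The construction rule above can be sketched in a few lines. The sketch assumes that homology has already been decided upstream (for example by a sequence similarity search) and is handed over as a mapping from each gene to the set of genomes containing a homolog. The names `phylogenetic_profile`, `genomes` and `homologs`, and the toy data, are hypothetical.

```python
from typing import Dict, List, Set


def phylogenetic_profile(gene: str, genomes: List[str],
                         homologs: Dict[str, Set[str]]) -> str:
    """Return a bit string whose i-th bit is '1' if `gene` has a homolog
    in genomes[i], and '0' otherwise."""
    present_in = homologs.get(gene, set())
    return "".join("1" if g in present_in else "0" for g in genomes)


# Toy input: four reference genomes and the homolog calls for two genes.
genomes = ["genome_1", "genome_2", "genome_3", "genome_4"]
homologs = {
    "geneX": {"genome_1", "genome_3"},
    "geneY": {"genome_1", "genome_2", "genome_3", "genome_4"},
}
profile_x = phylogenetic_profile("geneX", genomes, homologs)  # "1010"
profile_y = phylogenetic_profile("geneY", genomes, homologs)  # "1111"
```

The order of `genomes` must be the same for every gene, since profiles are only comparable bit by bit.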

Page 25: Comparative genomics

Phylogenetic profile analysis

• Example – phylogenetic profiles based on 89 genomes

orf1034:1110110110010111110100010100000000111100011111110110111010101
orf1036:1011110001000001010000010010000000010111101110011011010000101
orf1037:1101100110000001110010000111111001101111101011101111000010100
orf1038:1110100110010010110010011100000101110101101111111111110000101
orf1039:1111111111111111111111111111111111111111101111111111111111101
orf104: 1000101000000000000000101000000000110000000000000100101000100
orf1040:1110111111111101111101111100000111111100111111110110111111101
orf1041:1111111111111111110111111111111101111111101111111111111111101
orf1042:1110100101010010010110000100001001111110111110101101100010101
orf1043:1110100110010000010100111100100001111110101111011101000010101
orf1044:1111100111110010010111010111111001111111111111101101100010101
orf1045:1111110110110011111111111111111101111111101111111111110010101
orf1046:0101100000010001011000000111110000010100000001010010100000000
orf1047:0000000000000001000010000001000100000000000000010000000000000
orf105: 0110110110100010111101101010111001101100101111100010000010001
orf1054:0100100110000001100001000100000000100100100001000100100000000

Genes with similar phylogenetic profiles tend to have related functions or to be functionally linked (D. Eisenberg and colleagues, 1999)
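To act on this observation, profiles have to be compared. The slides do not say which similarity measure was used; the sketch below assumes plain Hamming distance (the number of genomes in which two genes differ in presence or absence) with a cutoff chosen by the user. The function names and the toy profiles are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def hamming(p: str, q: str) -> int:
    """Number of positions at which two equal-length profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return sum(a != b for a, b in zip(p, q))


def similar_pairs(profiles: Dict[str, str],
                  max_dist: int) -> List[Tuple[str, str, int]]:
    """List gene pairs whose profiles differ in at most `max_dist` positions."""
    hits = []
    for (g1, p1), (g2, p2) in combinations(sorted(profiles.items()), 2):
        d = hamming(p1, p2)
        if d <= max_dist:
            hits.append((g1, g2, d))
    return hits


# Toy profiles over six genomes (made up for illustration).
toy_profiles = {
    "geneA": "110100",
    "geneB": "110101",
    "geneC": "001011",
}
linked = similar_pairs(toy_profiles, max_dist=1)
```

Here only geneA and geneB are reported, with distance 1. The same functions accept the orf profiles listed above as input. Genes present in nearly every genome have near-identical profiles regardless of function, so a raw distance cutoff is only a starting point.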

Page 26: Comparative genomics

Genome evolution

• Genome rearrangement

• Whole genome duplication

Page 27: Comparative genomics

Turnip vs Cabbage: Look and Taste Different

• Although cabbages and turnips share a recent common ancestor, they look and taste different

Page 28: Comparative genomics

Turnip vs Cabbage: Comparing Gene Sequences Yields No Evolutionary Information

Page 29: Comparative genomics

Turnip vs Cabbage: Different mtDNA Gene Order

• Gene order comparison:

[Figure: gene order diagrams labeled “Before” and “After”.]

Evolution is manifested as the divergence in gene order

Page 30: Comparative genomics

Comparative Genomic Architecture of Human and Mouse Genomes

To locate where the corresponding gene lies in humans, the relative architecture of the human and mouse genomes was analyzed.

Page 31: Comparative genomics

Types of Rearrangements

Reversal: 1 2 3 4 5 6 → 1 2 -5 -4 -3 6

Translocation: 1 2 3 | 4 5 6 → 1 2 6 | 4 5 3

Fusion: two chromosomes → 1 2 3 4 5 6

Fission: 1 2 3 4 5 6 → two chromosomes
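The four operations can be written down directly if a chromosome is modeled as a list of signed integers, where the sign encodes gene orientation. This is a minimal sketch under that assumption; the function names and index conventions (Python slice positions) are chosen here for illustration.

```python
from typing import List, Tuple

Chromosome = List[int]  # signed gene identifiers; the sign is the orientation


def reversal(chrom: Chromosome, i: int, j: int) -> Chromosome:
    """Reverse the segment chrom[i:j] and flip the sign of each gene in it."""
    return chrom[:i] + [-g for g in reversed(chrom[i:j])] + chrom[j:]


def translocation(a: Chromosome, b: Chromosome,
                  i: int, j: int) -> Tuple[Chromosome, Chromosome]:
    """Exchange the tails a[i:] and b[j:] between two chromosomes."""
    return a[:i] + b[j:], b[:j] + a[i:]


def fusion(a: Chromosome, b: Chromosome) -> Chromosome:
    """Join two chromosomes end to end."""
    return a + b


def fission(chrom: Chromosome, i: int) -> Tuple[Chromosome, Chromosome]:
    """Split one chromosome into two at position i."""
    return chrom[:i], chrom[i:]


# The reversal and translocation examples from the slide:
rev = reversal([1, 2, 3, 4, 5, 6], 2, 5)            # [1, 2, -5, -4, -3, 6]
trans = translocation([1, 2, 3], [4, 5, 6], 2, 2)   # ([1, 2, 6], [4, 5, 3])
```

Fission undoes fusion when the split point is the original join, e.g. `fission(fusion([1, 2, 3, 4], [5, 6]), 4)` returns the two input chromosomes.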

Page 32: Comparative genomics

Comparative Genomic Architectures: Mouse vs Human Genome

• Humans and mice have similar genomes, but their genes are ordered differently

• ~245 rearrangements

– Reversals

– Fusions

– Fissions

– Translocations

Page 33: Comparative genomics

Hypothesis (1997): Whole Genome Duplication

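[Figure: species tree with leaves cerevisiae, paradoxus, mikatae, bayanus, glabrata, castellii, lactis, gossypii, waltii, hansenii, albicans, lipolytica, crassa, graminearum, grisea, nidulans, pombe; a “?” marks the hypothesized duplication, labeled ~100M.]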

Page 34: Comparative genomics

Hypothetical resolution of WGD

• A 1:2 mapping where

– nearly every region in species Y would correspond to two sister regions in S. cerevisiae

– the two sister regions in S. cerevisiae would contain ordered interleaving subsequences of the genes in the corresponding region of species Y

– nearly every region of S. cerevisiae would correspond to one region of species Y, and thus be paired to a sister region in S. cerevisiae
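The "ordered interleaving subsequences" condition can be checked for a single region. The sketch assumes each S. cerevisiae gene has already been relabeled with the name of its match in species Y, so that a region is just an ordered list of gene names. The function names and the toy region are hypothetical.

```python
from typing import List, Sequence


def is_ordered_subsequence(sub: Sequence[str], full: Sequence[str]) -> bool:
    """True if the genes in `sub` appear in `full` in the same relative order."""
    it = iter(full)
    # `g in it` advances the iterator, so each match must follow the previous one.
    return all(g in it for g in sub)


def fits_one_to_two(region_y: List[str],
                    sister_1: List[str], sister_2: List[str]) -> bool:
    """Check that both sister regions are ordered subsequences of region_y."""
    return (is_ordered_subsequence(sister_1, region_y)
            and is_ordered_subsequence(sister_2, region_y))


def coverage(region_y: List[str],
             sister_1: List[str], sister_2: List[str]) -> float:
    """Fraction of region_y genes kept in at least one sister region."""
    kept = set(sister_1) | set(sister_2)
    return sum(g in kept for g in region_y) / len(region_y)


# Toy region: g3 is kept in both sisters, every other gene in exactly one.
region_y = ["g1", "g2", "g3", "g4", "g5", "g6"]
sister_1 = ["g1", "g3", "g4"]
sister_2 = ["g2", "g3", "g5", "g6"]
ok = fits_one_to_two(region_y, sister_1, sister_2)
frac = coverage(region_y, sister_1, sister_2)
```

A real analysis would also have to tolerate local rearrangements and genes lost from both copies, so the strict order test here is best read as the idealized case described in the bullets.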

Page 35: Comparative genomics
Page 36: Comparative genomics

Hypothesis (1997): Whole Genome Duplication

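[Figure: species tree with leaves cerevisiae, paradoxus, mikatae, bayanus, glabrata, castellii, lactis, gossypii, waltii, hansenii, albicans, lipolytica, crassa, graminearum, grisea, nidulans, pombe; a “?” marks the hypothesized duplication, labeled ~100M.]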

Page 37: Comparative genomics

Aligning the S. cerevisiae and K. waltii genomes

• Most regions in K. waltii mapped to two regions in S. cerevisiae, each containing matches to only a subset of the K. waltii genes

Page 38: Comparative genomics

Duplication covers the whole S. cerevisiae genome

Page 39: Comparative genomics

What happens to genes post WGD?

• 12% (457) of paralogous gene pairs were retained

• 76 of the 457 gene pairs (17%) show accelerated protein evolution

