Post on 05-Feb-2016
Comparative Genomics
Bioinformatic Tools for Comparative Genomics of Vectors
Overview
• Comparing Genomes
• Homologies and Families
• Sequence Alignments
Comparative Genomics
Allows us to achieve a greater understanding of vertebrate evolution
Tells us what is common and what is unique between different species at the genome level
The function of human genes and other regions may be revealed by studying their counterparts in lower organisms
Helps identify both coding and non-coding genes and regulatory elements
Sequence Conservation Over Time
Non-Coding Regions
Large stretches of non-coding regions in vertebrates
Regulatory regions of:
• Developmental genes
• Transcription factors
• miRNA
Kikuta et al., Genome Research, May 2007
Methods of Alignment - Ensembl
BLASTZ-net (comparison on nucleotide level) is used for species that are evolutionarily close, e.g. human – mouse
Translated BLAT (comparison on amino acid level) is used for evolutionarily more distant species, e.g. human – zebrafish
PECAN global alignment is used for multispecies alignments
Why Compare Genomes?
We can better understand evolution/speciation
We can find important, functional regions of the sequence (codons, promoters, regulatory regions)
It can help us locate genes in other species that are missing or not well-defined (also through comparison and alignments).
Quality control!
Evolution at the DNA Level
…ACTGACATGTACCA…
…AC----CATGCACCA…
Mutation
Sequence edits
Rearrangements
Deletion
Inversion
Translocation
Duplication
Comparing Genomes
• Mammals have roughly 3 billion base pairs in their genomes
• Over 98% of human genes are shared with primates, with more than 95-98% similarity between genes.
• Even the fruit fly shares 60% of its genes with humans! (March 2000)
• Comparing human and mouse:
  • 40% of the human genome aligns with mouse
  • 24% of the human genome is missing in mouse (also mouse-specific sequences)
Improving Gene Quality
Comparative genomics predicts one long transcript.
Pseudogene recovery
[Figure: alignment of human, mouse, rat, dog and cow sequences; labels: chr 3, chr X]
We find 67 confident cases where a human protein is closer to the ancestor than any extant species in the alignment
How Does Ensembl Predict Homology?
• Uses all the species
• Prediction pipeline: Begins with BLAST and sequence clustering
• Compares gene relationships to species relationships
BSR: Blast Score Ratio. When two proteins P1 and P2 are compared, BSR = score(P1, P2) / max(self-score(P1), self-score(P2)). The default threshold used in the initial clustering step is 0.33.
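The Blast Score Ratio defined above can be sketched in a few lines. This is a minimal illustration, not Ensembl's code: the function names are made up here, the scores are hypothetical, and treating the 0.33 threshold as inclusive is an assumption.

```python
def blast_score_ratio(score_p1_p2, self_score_p1, self_score_p2):
    """BSR = score(P1, P2) / max(self-score(P1), self-score(P2))."""
    denominator = max(self_score_p1, self_score_p2)
    if denominator <= 0:
        raise ValueError("self-scores must be positive")
    return score_p1_p2 / denominator


# Default threshold quoted on the slide for the initial clustering step.
BSR_THRESHOLD = 0.33


def passes_bsr(score_p1_p2, self_score_p1, self_score_p2,
               threshold=BSR_THRESHOLD):
    """Return True when the pair's BSR reaches the threshold.

    Assumption: the comparison is inclusive (>=); the slide does not say.
    """
    bsr = blast_score_ratio(score_p1_p2, self_score_p1, self_score_p2)
    return bsr >= threshold


# Hypothetical scores, for illustration only.
print(blast_score_ratio(150.0, 400.0, 500.0))
print(passes_bsr(150.0, 400.0, 500.0))
print(passes_bsr(200.0, 400.0, 500.0))
```

Dividing by the larger of the two self-scores keeps the ratio between 0 and 1 and penalises hits that cover only a small part of the longer protein.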
Orthologue / Paralogue Prediction Algorithm
(1) Load the longest translation of each gene from all species used in Ensembl.
(2) Run WUBLASTp+SW of every gene against every other (both self and non-self species) in a genome-wide manner.
(3) Build a graph of gene relations based on Best Reciprocal Hits (BRH) and Blast Score Ratio (BSR) values.
(4) Extract the connected components (=single linkage clusters), each cluster representing a gene family.
(5) For each cluster, build a multiple alignment based on the protein sequences using MUSCLE.
(6) For each aligned cluster, build a phylogenetic tree using PHYML. An unrooted tree is obtained at this stage.
(7) Reconcile each gene tree with the species tree to call duplication events on internal nodes and root the tree (TreeBeST).
(8) From each gene tree, infer gene pairwise relations of orthology and paralogy types.
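Steps (3) and (4) amount to finding the connected components of a graph. The sketch below uses a union-find structure and assumes the edges have already passed the BRH/BSR filters; the gene identifiers are hypothetical, and this is not the Ensembl pipeline's own implementation.

```python
from collections import defaultdict


def single_linkage_clusters(genes, edges):
    """Group genes into connected components of the gene-relation graph.

    genes: iterable of gene identifiers.
    edges: iterable of (gene_a, gene_b) pairs already accepted by the
           BRH/BSR filters (the filtering itself is not shown here).
           Every gene named in an edge must also appear in `genes`.
    """
    parent = {g: g for g in genes}

    def find(g):
        # Walk to the root, shortening the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for g in parent:
        clusters[find(g)].add(g)
    # Sort members and clusters so the result is deterministic.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical gene identifiers, for illustration only.
genes = ["hsA", "mmA", "rnA", "hsB", "mmB", "hsC"]
edges = [("hsA", "mmA"), ("mmA", "rnA"), ("hsB", "mmB")]
families = single_linkage_clusters(genes, edges)
print(families)
```

Single linkage means one accepted edge is enough to merge two groups, so "hsA" and "rnA" end up in the same family even though they are only linked through "mmA".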
Species Tree
Anopheles gambiae
Aedes aegypti
Drosophila melanogaster
Dasypus novemcinctus
Loxodonta africana
Echinops telfairi
Tupaia belangeri
Homo sapiens
Pan troglodytes
Macaca mulatta
Otolemur garnettii
Mus musculus
Rattus norvegicus
Spermophilus tridecemlineatus
Cavia porcellus
Oryctolagus cuniculus
Erinaceus europaeus
Myotis lucifugus
Canis familiaris
Felis catus
Bos taurus
Monodelphis domestica
Ornithorhynchus anatinus
Gallus gallus
Xenopus tropicalis
Gasterosteus aculeatus
Oryzias latipes
Takifugu rubripes
Tetraodon nigroviridis
Danio rerio
Ciona intestinalis
Ciona savignyi
Caenorhabditis elegans
Saccharomyces cerevisiae
Species and Gene Trees
Phylogenetic Tree Reconciliation: the Species/Gene Tree Problem
Dufayard et al., ERCIM News No. 43, October 2000
Genes/Species Tree reconciliation: TreeBeST
Reconciliation
[Figure: a species tree and an unrooted gene tree over the leaves M, R and H; the reconciled gene tree (leaves M, R, H and M’, R’, H’) with internal nodes marked as duplication or speciation nodes and branches marked as gene losses]
Viewing Trees in Ensembl
GeneView page
GeneTreeView
Types of Homologues
Orthologs: any pairwise gene relation where the ancestor node is a speciation event
Paralogs: any pairwise gene relation where the ancestor node is a duplication event
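These two definitions can be applied mechanically once a gene tree is rooted and its internal nodes are labelled. The sketch below assumes a toy representation (nested tuples of the form `(event, left, right)` with string leaves) and hypothetical gene names; it illustrates the definitions only and is not how TreeBeST stores trees.

```python
from itertools import product

VALID_EVENTS = {"speciation", "duplication"}


def leaves(node):
    """Return the leaf names found under a node."""
    if isinstance(node, str):
        return [node]
    _event, left, right = node
    return leaves(left) + leaves(right)


def pairwise_relations(node, relations=None):
    """Label each pair of leaves by the event at their last common ancestor."""
    if relations is None:
        relations = {}
    if isinstance(node, str):
        return relations
    event, left, right = node
    if event not in VALID_EVENTS:
        raise ValueError(f"unknown event label: {event!r}")
    kind = "ortholog" if event == "speciation" else "paralog"
    # Leaves on opposite sides of this node have it as their ancestor node.
    for a, b in product(leaves(left), leaves(right)):
        relations[frozenset((a, b))] = kind
    pairwise_relations(left, relations)
    pairwise_relations(right, relations)
    return relations


# Hypothetical reconciled tree: one duplication, then a speciation in
# each of the two resulting copies.
tree = ("duplication",
        ("speciation", "human_A", "mouse_A"),
        ("speciation", "human_B", "mouse_B"))
rel = pairwise_relations(tree)
print(rel[frozenset(("human_A", "mouse_A"))])
print(rel[frozenset(("human_A", "human_B"))])
print(rel[frozenset(("human_A", "mouse_B"))])
```

In this toy tree, human_A and mouse_B are paralogs even though they come from different species, because their ancestor node is the duplication.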
Orthologue and Paralogue Types
ortholog_one2one
ortholog_one2many
ortholog_many2many
apparent_ortholog_one2one
within_species_paralog
between_species_paralog
Orthologues on GeneView
What is ‘1 to 1’?
What is ‘1 to many’?
Protein Families
How: Cluster proteins for every isoform (transcript) in every species.
Why: Predict a function for ‘novel’ genes/proteins
Understand gene relationships
Protein Dataset
More than 1,800,000 proteins clustered:
All Ensembl protein predictions from all supported species: 895,070 protein predictions
All metazoan (animal) proteins in UniProt:
• 96,030 UniProtKB/Swiss-Prot
• 892,0208 UniProtKB/TrEMBL
Clustering Strategy
BLASTP all-versus-all comparison
Markov clustering
For each cluster:
Calculation of multiple sequence alignments with ClustalW
Assignment of a consensus description
Where are Families shown? ProtView
Link to FamilyView
Where are Families shown? FamilyView
Ensembl family members within human
Ensembl family members in other species
JalView multiple alignments
• Comparing Genomes
• Homologies and Families
• Sequence Alignments
Aligning Whole Genomes - Why?
• To identify homologous regions
• To spot problematic gene predictions
• Conserved regions could be functional
• To define syntenic regions (long regions of DNA sequence where order and orientation are highly conserved)
Aligning large genomic sequences
Should find all highly similar regions between two sequences
Should allow for segments without similarity, rearrangements etc.
Issues:
• Heavy process
• Scalability, as more and more genomes are sequenced
• Time constraint
Whole Genome Multiple Alignments
Enredo
• Defines orthology map (co-linear regions)
• Supports segmental duplications
Pecan
• Consistency-based multiple aligner
• Optimized to cope with long DNA sequences
Ortheus
• Ancestral sequence reconstructor
• Infers the history of insertions and deletions
In ContigView...
Multiple Alignments using PECAN
Currently 2 sets:
• 10 amniota vertebrates
• 7 eutherian mammals
To come… the fish!
Alignment Strategy
Use all coding exons
Get sets of best reciprocal hits
Create orthology maps
Build multiple global alignments
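The "best reciprocal hits" step can be sketched as follows. This is a minimal illustration under stated assumptions: hit scores are supplied as plain dictionaries, the exon identifiers are hypothetical, and ties are resolved by keeping the first hit seen; none of this is taken from the actual pipeline.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict of {(query, subject): score}.
    Ties keep the first hit encountered (an arbitrary choice made here).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _score) in best.items()}


def best_reciprocal_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where each is the other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical exon identifiers and scores, for illustration only.
human_vs_mouse = {
    ("hs_ex1", "mm_ex1"): 90,
    ("hs_ex1", "mm_ex2"): 40,
    ("hs_ex2", "mm_ex2"): 70,
    ("hs_ex3", "mm_ex2"): 60,
}
mouse_vs_human = {
    ("mm_ex1", "hs_ex1"): 88,
    ("mm_ex2", "hs_ex2"): 71,
    ("mm_ex2", "hs_ex3"): 65,
}
pairs = best_reciprocal_hits(human_vs_mouse, mouse_vs_human)
print(pairs)
```

Here hs_ex3 is dropped: its best mouse hit is mm_ex2, but mm_ex2's best human hit is hs_ex2, so the relation is not reciprocal.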
View Alignments: ContigView
In the Detailed View Panel:
View Conservation: ContigView
Click on a Pink Bar for AlignSliceView… export alignments
AlignSliceView
GeneSeqalignView
MultiContigView
Comparison of chromosomes in multiple species.
(Links from SyntenyView, ContigView, CytoView)
Export Alignments in BioMart
Choose ‘Compara pairwise alignments’
Syntenic Regions
Genome alignments are compiled into larger syntenic regions
Alignments are clustered together when the relative distance between them is less than 100 kb and order and orientation are consistent
Any clusters less than 100 kb are discarded
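The two rules above can be sketched as a simple chaining pass. Several details are assumptions made for this illustration rather than statements about Ensembl's implementation: all blocks are taken to lie on one pair of chromosomes, the 100 kb distance is checked on both genomes, overlapping blocks break a cluster, and cluster size is measured on the reference genome. The `Block` type and coordinates are hypothetical.

```python
from typing import List, NamedTuple

MAX_GAP = 100_000   # neighbouring alignments must be less than 100 kb apart
MIN_SIZE = 100_000  # clusters shorter than 100 kb are discarded


class Block(NamedTuple):
    ref_start: int
    ref_end: int
    tgt_start: int
    tgt_end: int
    strand: int  # +1 same orientation, -1 inverted


def can_extend(prev: Block, nxt: Block) -> bool:
    """True when nxt continues prev with consistent order and orientation."""
    if prev.strand != nxt.strand:
        return False
    ref_gap = nxt.ref_start - prev.ref_end
    if prev.strand == 1:
        tgt_gap = nxt.tgt_start - prev.tgt_end
    else:
        # On the reverse strand the target coordinates decrease.
        tgt_gap = prev.tgt_start - nxt.tgt_end
    return 0 <= ref_gap < MAX_GAP and 0 <= tgt_gap < MAX_GAP


def syntenic_regions(blocks: List[Block]) -> List[List[Block]]:
    """Chain blocks along the reference, then drop short clusters."""
    clusters: List[List[Block]] = []
    for block in sorted(blocks, key=lambda b: b.ref_start):
        if clusters and can_extend(clusters[-1][-1], block):
            clusters[-1].append(block)
        else:
            clusters.append([block])
    return [c for c in clusters
            if c[-1].ref_end - c[0].ref_start >= MIN_SIZE]


# Hypothetical coordinates, for illustration only.
blocks = [
    Block(0, 40_000, 1_000_000, 1_040_000, 1),
    Block(60_000, 120_000, 1_050_000, 1_110_000, 1),
    Block(500_000, 520_000, 3_000_000, 3_020_000, 1),
    Block(530_000, 560_000, 2_960_000, 2_990_000, -1),
]
regions = syntenic_regions(blocks)
print(len(regions), len(regions[0]))
```

The first two blocks are chained into one 120 kb region; the last two each start a new cluster (a large gap, then a change of orientation) and are discarded for being shorter than 100 kb.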
Enredo
Anchors
500,000 anchors for mammals (more than 1 anchor per 10 kb)
Supports segmental duplications!
Covers 90% of the human protein-coding genes
(Hsap-Mmus-Rnor-Cfam-Btau)
SyntenyView
Human chromosome
Mouse chromosomes
Orthologues
CytoView
Syntenic blocks
Summary
View Homology in pages such as GeneView, ProtView, SyntenyView, GeneTreeView, or BioMart
View Protein Family information in FamilyView
View Alignments in ContigView, GeneSeqalignView, or through BioMart