Page 1: P HYLO P AT : AN UPDATED VERSION OF THE PHYLOGENETIC PATTERN DATABASE CONTAINS GENE NEIGHBORHOOD Presenter: Reihaneh Rabbany Presented in Bioinformatics.

PHYLOPAT: AN UPDATED VERSION OF THE PHYLOGENETIC PATTERN DATABASE CONTAINS GENE NEIGHBORHOOD

Presenter: Reihaneh Rabbany

Presented in Bioinformatics Course (CMPUT 606),

Instructed by Prof. Guohui Lin,

Computing Science Department,

University of Alberta,

Winter 2009

Tim Hulsen et al., Nucleic Acids Research, 2009, Vol. 37, Database issue

INTRODUCTION

Phylogenetic patterns

Show the presence or absence of certain genes in a set of whole-genome sequences

Can be used to determine sets of genes that occur only in certain evolutionary branches

Have become more common as increasing amounts of orthology data have become available

Phylogenetic pattern search tools are available for querying proteins, but not for querying genes
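A phylogenetic pattern can be pictured as a string with one character per species: '1' where the gene is present and '0' where it is absent. The minimal Python sketch below builds such a string from a fixed species order and uses exact pattern matching to find genes that occur only in a chosen branch. The four-species order and the gene-to-species table are hypothetical stand-ins for the 39-species order PhyloPat actually uses.

```python
from typing import Dict, List, Set

def phylogenetic_pattern(species_order: List[str], present_in: Set[str]) -> str:
    """Return a 0/1 string with one character per species in species_order."""
    return "".join("1" if sp in present_in else "0" for sp in species_order)

def genes_only_in(species_order: List[str], gene_species: Dict[str, Set[str]],
                  branch: Set[str]) -> List[str]:
    """Return genes whose pattern is exactly: present in branch, absent elsewhere."""
    target = phylogenetic_pattern(species_order, branch)
    return sorted(gene for gene, species in gene_species.items()
                  if phylogenetic_pattern(species_order, species) == target)

# Hypothetical species order and toy gene-to-species table.
SPECIES = ["human", "chimpanzee", "mouse", "fruitfly"]
GENES = {
    "geneA": {"human", "chimpanzee", "mouse", "fruitfly"},
    "geneB": {"human", "chimpanzee"},
    "geneC": {"mouse"},
}

print(phylogenetic_pattern(SPECIES, GENES["geneB"]))          # 1100
print(genes_only_in(SPECIES, GENES, {"human", "chimpanzee"}))  # ['geneB']
```

Fixing the species order once is what makes patterns comparable as plain strings; the same idea underlies the regular-expression queries discussed later.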

PHYLOPAT

PhyloPat is a database that makes it possible to query the Ensembl database using any phylogenetic pattern

Functionalities:

Gene neighborhood view

Anticorrelating patterns

Support for Entrez Gene IDs

Direct sequence retrieval of the members of a phylogenetic lineage

ENSEMBL

Human genome: 3 billion base pairs, 35,000 genes

The genome alone is of little use unless the locations and relationships of individual genes can be identified

Manual annotation is slow; Ensembl annotates genomes automatically instead

Ensembl (freely accessible): sequence data is fed into a software "pipeline", which creates a set of predicted gene locations and saves them in a MySQL database

Originally focused on human; now also includes mouse, fruit fly, zebrafish, plants, fungi and others

PHYLOPAT - DATABASE CONTENT

A set of phylogenetic lineages

Complete set of orthologies collected for the genes of all 39 species in Ensembl:

741 species pairs, 815,452 genes, 19,010,478 orthologous relationships

11,446,546 one-to-one, 4,588,300 one-to-many, 2,975,632 many-to-many
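The counts on this slide are internally consistent, which a quick check confirms: 741 is the number of unordered pairs that can be formed from 39 species, and the three relationship types add up to the stated total of orthologous relationships. A short Python check, with the figures copied from the slide:

```python
from math import comb

N_SPECIES = 39
ONE_TO_ONE = 11_446_546
ONE_TO_MANY = 4_588_300
MANY_TO_MANY = 2_975_632

species_pairs = comb(N_SPECIES, 2)  # unordered pairs: 39 * 38 / 2
total_orthologies = ONE_TO_ONE + ONE_TO_MANY + MANY_TO_MANY

print(species_pairs)      # 741
print(total_orthologies)  # 19010478
```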

Ensembl ortholog detection pipeline:

Similarity values by best reciprocal hits and best score ratio (WU-BLASTP)

Graph of gene relations and clustering

Multiple alignment (MUSCLE)

Phylogenetic tree (TreeBeST)

Orthologous relationships
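The first step of the pipeline pairs genes by best reciprocal hits: gene a in species A and gene b in species B are paired when each is the other's top-scoring match. A simplified Python sketch, assuming similarity scores (for example, BLASTP bit scores) have already been computed and are stored as nested dictionaries. It covers only the reciprocal-best-hit idea, not the best score ratio, clustering, alignment or tree steps of the actual Ensembl pipeline, and all gene names and scores are made up.

```python
from typing import Dict, Set, Tuple

# query gene -> {target gene: similarity score}
Scores = Dict[str, Dict[str, float]]

def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its highest-scoring target (ties: first one seen)."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}

def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}

# Hypothetical scores between two species.
A_VS_B = {"a1": {"b1": 250.0, "b2": 90.0}, "a2": {"b1": 180.0, "b2": 60.0}}
B_VS_A = {"b1": {"a1": 245.0, "a2": 175.0}, "b2": {"a2": 55.0, "a1": 40.0}}

print(reciprocal_best_hits(A_VS_B, B_VS_A))  # {('a1', 'b1')}
```

Here a2's best hit is b1, but b1 prefers a1, so (a2, b1) is not reciprocal; this is why reciprocal best hits alone tend to miss one-to-many and many-to-many relationships, which the later tree-based steps are meant to recover.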

PHYLOPAT - DATABASE CONSTRUCTION

Generating phylogenetic lineages

Determining the evolutionary order of the species using the NCBI Taxonomy phylogenetic tree

Phylogenetic lineages:

For each gene in the first species:

Look for orthologs in the other species

Add all orthologs to the phylogenetic lineage

Check the orthologs themselves for further orthologs, until no additional orthologies are found for any of the genes in the lineage

Repeat for all genes in all 39 species that are not yet connected to any phylogenetic lineage
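The lineage-building loop above is essentially a search for connected components in the orthology graph. A minimal Python sketch using breadth-first search, assuming orthology is given as a symmetric adjacency map and that genes are visited species by species in the evolutionary order; the gene names and links are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

def build_lineages(genes_in_order: Iterable[str],
                   orthologs: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group genes into lineages by following orthology links transitively."""
    assigned: Set[str] = set()
    lineages: List[Set[str]] = []
    for seed in genes_in_order:
        if seed in assigned:
            continue  # already connected to a phylogenetic lineage
        lineage = {seed}
        queue = deque([seed])
        while queue:  # ends when no additional orthologs are found
            gene = queue.popleft()
            for other in orthologs.get(gene, set()):
                if other not in lineage:
                    lineage.add(other)
                    queue.append(other)
        assigned |= lineage
        lineages.append(lineage)
    return lineages

# Hypothetical genes (h = human, c = chimp, m = mouse, f = fly),
# listed species by species; orthology links are symmetric.
GENES_IN_ORDER = ["h1", "h2", "c1", "c2", "m1", "m2", "f1"]
ORTHOLOGS = {
    "h1": {"c1", "m1"}, "c1": {"h1"},
    "m1": {"h1", "f1"}, "f1": {"m1"},
    "h2": {"c2"}, "c2": {"h2"},
}

lineages = build_lineages(GENES_IN_ORDER, ORTHOLOGS)
print([sorted(lin) for lin in lineages])  # [['c1', 'f1', 'h1', 'm1'], ['c2', 'h2'], ['m2']]
```

Note that f1 joins h1's lineage even though it is only orthologous to m1: following orthologs of orthologs is what makes the lineage transitive. A gene with no orthologs (m2) ends up in a lineage of its own.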

WEB APPLICATION

A web interface to query the PhyloPat MySQL database

Phylogenetic lineages

Phylogenetic patterns
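PhyloPat stores its data in MySQL, which supports the REGEXP operator natively. To keep the example self-contained, the sketch below uses Python's built-in sqlite3 module instead and registers a REGEXP function by hand (SQLite evaluates `X REGEXP Y` as a call to `regexp(Y, X)`). The table and column names (lineage, lineage_id, pattern) and the four-species patterns are hypothetical, not PhyloPat's actual schema.

```python
import re
import sqlite3

def regexp(pattern: str, value: str) -> bool:
    """Called by SQLite for 'value REGEXP pattern'."""
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE lineage (lineage_id TEXT PRIMARY KEY, pattern TEXT)")
conn.executemany(
    "INSERT INTO lineage VALUES (?, ?)",
    [("L1", "1111"), ("L2", "1100"), ("L3", "0011")],
)

# Lineages present in every species, using the same expression as '^1+$'.
rows = conn.execute(
    "SELECT lineage_id FROM lineage WHERE pattern REGEXP ? ORDER BY lineage_id",
    ("^1+$",),
).fetchall()
print(rows)  # [('L1',)]
```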

OMNIPRESENT - OLIGOPRESENT - POLYPRESENT GENES

Omnipresent

Genes present in all 39 species: phylogenetic pattern ‘111111111111111111111111111111111111111’ (or MySQL regular expression ‘^1+$’)

688 omnipresent genes, which most likely have important functions, since they are present in all species

Oligopresent

Genes that exist in only one or two species: can show which species are evolutionarily most related

Polypresent

Genes that are missing in only one or two species: a measure of evolutionary relatedness
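The three categories can be read directly off a pattern string by counting present and absent species. A small Python sketch, assuming patterns are plain 0/1 strings in a fixed species order; the regular expression mirrors the slide's ‘^1+$’, and the "other" label for patterns outside the three groups is a hypothetical addition for completeness.

```python
import re

def classify(pattern: str) -> str:
    """Label a 0/1 phylogenetic pattern by how many species carry the gene."""
    n_present = pattern.count("1")
    n_absent = pattern.count("0")
    if re.fullmatch(r"1+", pattern):  # same test as MySQL '^1+$'
        return "omnipresent"
    if 1 <= n_present <= 2:
        return "oligopresent"
    if 1 <= n_absent <= 2:
        return "polypresent"
    return "other"  # hypothetical label for everything else

for p in ["1" * 39, "1" + "0" * 38, "0" + "1" * 38, "1" * 20 + "0" * 19]:
    print(classify(p))
```

With 39 species the oligopresent and polypresent tests cannot both apply; with very few species they could overlap, and the order of the checks would then decide the label.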

ANTICORRELATING PATTERNS

Patterns that are exactly opposite

Phylogenetic lineages with anticorrelating patterns can be functionally completely different, but could also be highly similar in function:

‘000000000000000010111001111001111110010’

‘111111111111111101000110000110000001101’

These genes can be analogous, i.e. performing a similar function without being evolutionarily related
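Finding anticorrelating lineages amounts to looking up each pattern's bitwise complement. A minimal Python sketch, assuming patterns are 0/1 strings keyed by hypothetical lineage IDs; the two example patterns are the 39-species pair from this slide, built piecewise to keep them readable.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def complement(pattern: str) -> str:
    """Flip every presence (1) to absence (0) and vice versa."""
    return pattern.translate(str.maketrans("01", "10"))

def anticorrelating_pairs(patterns: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return pairs of lineage IDs whose patterns are exact complements."""
    by_pattern: Dict[str, List[str]] = defaultdict(list)
    for lineage_id, pattern in patterns.items():
        by_pattern[pattern].append(lineage_id)
    pairs = []
    for lineage_id, pattern in sorted(patterns.items()):
        for other in by_pattern.get(complement(pattern), []):
            if lineage_id < other:  # report each pair only once
                pairs.append((lineage_id, other))
    return pairs

# The two patterns from the slide (39 species each).
PATTERN_A = "0" * 16 + "1011100111100111111001" + "0"
PATTERN_B = "1" * 16 + "0100011000011000000110" + "1"
PATTERNS = {"lineage_a": PATTERN_A, "lineage_b": PATTERN_B, "lineage_c": "1" * 39}

print(anticorrelating_pairs(PATTERNS))  # [('lineage_a', 'lineage_b')]
```

Indexing patterns in a dictionary makes each complement lookup constant-time on average, instead of comparing every pair of lineages.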

GENE NEIGHBORHOOD

Inferring ‘true’ orthology: orthologous conservation of gene neighborhood

Human gene ENSG00000134398 has two predicted orthologs in chimpanzee: gene ENSPTRG00000007893 and gene ENSPTRG00000009535

Only the gene neighborhood of ENSPTRG00000007893 corresponds to that of the human gene, for nine of the nearest neighbors

Inferring functional annotation: building hypotheses about the processes or pathways that genes might be involved in
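The neighborhood argument can be made concrete by counting how many of a gene's nearest neighbors have orthologs next to each candidate ortholog; the candidate with more conserved neighbors is the likelier ‘true’ ortholog. A toy Python sketch, assuming genes are given as ordered lists per chromosome. The gene names, the window size k and the data are invented for illustration and are not the Ensembl genes on the slide.

```python
from typing import Dict, List, Set

def neighbors(chromosome: List[str], gene: str, k: int) -> List[str]:
    """Return up to k genes on each side of gene along an ordered chromosome."""
    i = chromosome.index(gene)
    return chromosome[max(0, i - k):i] + chromosome[i + 1:i + 1 + k]

def conserved_neighbors(query_nbrs: List[str], candidate_nbrs: List[str],
                        orthologs: Dict[str, Set[str]]) -> int:
    """Count query neighbors with at least one ortholog among the candidate's neighbors."""
    candidate_set = set(candidate_nbrs)
    return sum(1 for g in query_nbrs if orthologs.get(g, set()) & candidate_set)

# Toy data: one human chromosome and two chimpanzee chromosomes (all names invented).
HUMAN = ["hA", "hB", "hX", "hC", "hD"]
CHIMP_1 = ["cA", "cB", "cX1", "cC", "cD"]
CHIMP_2 = ["cP", "cX2", "cQ"]
ORTHOLOGS = {"hA": {"cA"}, "hB": {"cB"}, "hC": {"cC"}, "hD": {"cD"},
             "hX": {"cX1", "cX2"}}  # hX has two predicted orthologs

human_nbrs = neighbors(HUMAN, "hX", k=2)
score_1 = conserved_neighbors(human_nbrs, neighbors(CHIMP_1, "cX1", k=2), ORTHOLOGS)
score_2 = conserved_neighbors(human_nbrs, neighbors(CHIMP_2, "cX2", k=2), ORTHOLOGS)
print(score_1, score_2)  # 4 0
```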

FASTA-FORMAT SEQUENCE FILES

Both the pattern search output and the gene neighborhood view contain links to FASTA files of the peptide sequences
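A FASTA file lists each sequence as a '>' header line followed by the sequence itself, usually wrapped at a fixed width. A minimal Python sketch of writing peptide sequences in that format; the identifier and sequence are placeholders, and the 60-character width is a common convention rather than necessarily what PhyloPat uses.

```python
from typing import Dict

def to_fasta(sequences: Dict[str, str], width: int = 60) -> str:
    """Format {identifier: sequence} as FASTA text, wrapping sequences at width."""
    lines = []
    for identifier, seq in sequences.items():
        lines.append(">" + identifier)
        # Slice the sequence into fixed-width chunks.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

fasta = to_fasta({"peptide_1": "MKT" * 25})  # placeholder 75-residue sequence
print(fasta, end="")
```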

DISCUSSION AND CONCLUSION

PhyloPat is useful in orthology detection, evolutionary studies and gene annotation

Complex queries: it is possible to specify

A set of species that should be included (1), a set of species that should be excluded (0), and a set of species whose presence is indifferent (*)

Regular expression queries

Easy-to-use web interface

Relies on only one database (Ensembl)
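A query built from 1, 0 and * characters maps naturally onto a regular expression, with * becoming a character class that accepts either value. A Python sketch of that translation, assuming one query character per species in the same order as the patterns; the function names are hypothetical, and this is not necessarily how PhyloPat builds its queries internally.

```python
import re

# One query character per species: 1 = must be present, 0 = must be absent,
# * = presence does not matter.
_TRANSLATION = {"1": "1", "0": "0", "*": "[01]"}

def query_to_regex(query: str) -> str:
    """Translate a 1/0/* species query into an anchored regular expression."""
    try:
        body = "".join(_TRANSLATION[c] for c in query)
    except KeyError as err:
        raise ValueError(f"unexpected query character: {err.args[0]!r}") from None
    return "^" + body + "$"

def matches(query: str, pattern: str) -> bool:
    """Check whether a 0/1 phylogenetic pattern satisfies the query."""
    return re.fullmatch(query_to_regex(query), pattern) is not None

print(query_to_regex("1*0"))                         # ^1[01]0$
print(matches("1*0", "100"), matches("1*0", "111"))  # True False
```

Because the output is an ordinary anchored regular expression, the same string could in principle be passed to a database REGEXP operator rather than to Python's re module.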

DISCUSSION AND CONCLUSION (CONT.)

Gene neighborhood view:

Locating evolutionarily related genomic clusters of genes

Detecting the ‘true orthologs’ within large sets of predicted orthologs

Functionally annotating less well-known genes

PhyloPat will be updated with each major Ensembl release (as species are added) to ensure up-to-date and reliable phylogenetic lineages

LINEAGE INFORMATION OF PP000255

QUESTIONS
