Institute of Bioinformatics, National Yang-Ming University

Date post: 04-Feb-2022

Genome annotation with Ensembl

Institute of Bioinformatics, National Yang-Ming University

Xosé Mª Fernández
European Bioinformatics Institute

December 2004

Outline of talk

• High level overview of Ensembl
  – Making genomes useful

• Outline workshop
  – New web code
  – DAS, display your own data
  – Modify EnsMart
  – BLAST/SSAHA
  – Comparing genomes
  – Customising Ensembl

• Outlook
  – Manual annotation
  – Other features

We make genomes useful

Making genomes useful

• Interpretation
  – Where are the interesting parts of the genome?
  – What do they do?
  – How are they related to elements in other genomes?

• Access
  – for bench biologists
  – for non-programming mid-scale groups
  – for good programming groups

Access… bench biologists

• Mainly via the web
• Web site designed for non-programming biologists who are not especially genome-aware
  – Simple things are simple to find
  – Graphical displays and overviews
  – Consistency of layout, colour and text

Ensembl website: Role

– Visual display of Ensembl data
  • A graphical, intuitive display for biologists
– “Public face” of Ensembl
  • Contact point for the project
– Local site installation
  • Free, open-source, supported
– A framework on which to hang user data
  • DAS and data upload
  • Local data integration via data adaptors
– Web-based tools
  • Display tools, primer selection, Anopheles gene name and transposon submission, etc.

Architecture

• Encapsulates
  – Input
  – Output
  – Ensembl API
  – Rendering

• Improves
  – Maintainability
  – Flexibility
  – Code re-use

[Figure: architecture diagram. Labels: Client browsers; Apache / mod_perl web servers; View script; Input, Data, Renderer, Output; Ensembl API, Bioperl; MySQL RDBMS (core, lite, est, snp).]
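
As a rough illustration of the encapsulation this slide lists, the Python sketch below keeps input parsing, data access and rendering in separate pieces, wired together by a small view function. The Ensembl web code itself runs under Apache / mod_perl; every name here (Request, parse_input, DataAdaptor, render_html, view) and the toy gene data are hypothetical and stand in for the idea only, not for the Ensembl API.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Request:
    """Parsed input: the region the user asked to see."""
    chromosome: str
    start: int
    end: int


def parse_input(query: dict) -> Request:
    """Input layer: validate the raw query parameters."""
    start, end = int(query["start"]), int(query["end"])
    if start > end:
        raise ValueError("start must not exceed end")
    return Request(query["chr"], start, end)


class DataAdaptor:
    """Data layer: hypothetical stand-in for an API that hides the database."""

    def __init__(self, genes):
        self._genes = genes  # list of (name, chromosome, start, end)

    def genes_in(self, req: Request):
        # Keep genes on the same chromosome that overlap the requested region.
        return [g for g in self._genes
                if g[1] == req.chromosome and g[2] <= req.end and g[3] >= req.start]


def render_html(genes) -> str:
    """Rendering layer: turn data into markup and nothing else."""
    rows = "".join(f"<li>{escape(name)} ({start}-{end})</li>"
                   for name, _chrom, start, end in genes)
    return f"<ul>{rows}</ul>"


def view(query: dict, adaptor: DataAdaptor) -> str:
    """View script: wires input, data and renderer together."""
    return render_html(adaptor.genes_in(parse_input(query)))


# Made-up genes, purely for illustration.
adaptor = DataAdaptor([
    ("geneA", "1", 100, 500),
    ("geneB", "1", 900, 1200),
    ("geneC", "2", 100, 500),
])
page = view({"chr": "1", "start": "400", "end": "950"}, adaptor)
print(page)
```

Because the renderer only sees plain data, a second renderer (for example a text or image one) could be swapped in without touching the input or data layers, which is one way to read the maintainability and code re-use points.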

Access… mid scale groups

• Wanting to work with 50 to 1,000 genes, regions, expression data
• Little in-house programming
  – Some web views designed for this group
  – EnsMart focused on this group
    • Mix and match queries
    • “Instant” refresh of selected set
    • Output to Excel, FASTA, HTML table

Mart database

• De-normalised
• Tables with ‘redundant’ information
• Query-optimised
• Fast and flexible
• Ideal for data mining
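
To make the de-normalisation idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the EnsMart schema: the table names (gene, snp, gene_snp_mart), columns and rows are invented for illustration. It only shows how repeating gene information on every SNP row lets a mining query read one table without joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalised layout: each fact is stored once, so queries need joins.
cur.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT, chromosome TEXT);
CREATE TABLE snp  (snp_id INTEGER PRIMARY KEY, gene_id INTEGER, allele TEXT);
INSERT INTO gene VALUES (1, 'geneA', '1'), (2, 'geneB', '2');
INSERT INTO snp  VALUES (10, 1, 'A/G'), (11, 1, 'C/T'), (12, 2, 'G/T');
""")

# De-normalised "mart" table: gene columns are repeated on every SNP row.
cur.execute("""
CREATE TABLE gene_snp_mart AS
SELECT g.name AS gene_name, g.chromosome, s.snp_id, s.allele
FROM gene g JOIN snp s ON s.gene_id = g.gene_id
""")

# A typical filter-and-list query now touches a single table.
rows = cur.execute(
    "SELECT gene_name, snp_id, allele FROM gene_snp_mart "
    "WHERE chromosome = ? ORDER BY snp_id",
    ("1",),
).fetchall()
print(rows)
```

The cost is the 'redundant' storage the slide mentions: 'geneA' and its chromosome appear once per SNP, which is acceptable for a read-mostly database that is rebuilt from the normalised source.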

There are other ways… MartShell

Command-line interface to Mart, written in Java. It works with a Mart Query Language.

MartExplorer

BLAST/SSAHA

BLAST/SSAHA

• Different web interfaces exist for sequence comparison over genome scales
• Ensembl’s BlastView is a generic/modular interface that integrates several databases and methods
• BlastView has been extended to integrate tightly with the Ensembl web site
• Server-side state maintenance mechanisms provide a high-performance/flexible framework for the UI

Access… large scale groups

• Full use of the genome, by experienced bioinformaticians
• Complete openness of the group
  – Open data
  – Open software
  – Open MySQL server on the internet
  – Expect everything to be portable
  – Participate in standards and adopt other standards (DAS, UCSC upload)

Ensembl – Open source

Freely available
Community development

– 51 Ensembl installs worldwide
– Both public and commercial, e.g.
  Gramene (CSHL)
  Fugu-sg (ICMB)
  Ciona-sg (Temasek)

Uploading data to Ensembl

Display of uploaded data

Comparing genomes

Many Genomes

[Figure: diagram of genomes and resources. Labels: Vertebrate Compara, Worm Compara, Diptera Compara, InterPro; Human, Chimp, Mouse, Rat, Chicken, Zebrafish, Takifugu, Tetraodon, C. elegans, C. briggsae, Drosophila, Anopheles, Honey bee.]

Many more genomes

• Ciona (C. savignyi and C. intestinalis)
• Rhesus
• Sea Urchin, Platynereis…
• Aedes, Ixodes… (vectors)

• High level overview of Ensembl
  – Making genomes useful

• Outline workshop
  – New web code
  – DAS, display your own data
  – Modify EnsMart
  – BLAST/SSAHA
  – Comparing genomes
  – Customising Ensembl

• Outlook
  – Manual annotation
  – Other features

Future plans

• New data
  – More species
  – Variation data
  – Comparative data
• More integrated views
  – GeneSNPView
  – Comparative ContigView
• More focused tool displays
  – primer & haplotype selection
• Greater integration of user data
  – Gene & Protein DAS

Challenges

• What is the right way to calculate evolutionary relationships between these genomes?
  – How different is the gene build for each new genome?
• Is there novel information to be deduced from the set of related genomes?
• How do we integrate “close” genomes and genome variation?

Manual Curation

• People are the best at
  – Resolving conflicting heterogeneous information
  – Recognising “out of the ordinary” biology
• For high investment genomes an automated pipeline with human intervention is the endgame
  – Human and Mouse

Vega

• Vega is the collection of manually annotated human and other vertebrate genome data
  – Reuses Ensembl database and website technology
  – Reuses Ensembl pipelines for Sanger annotation

Two types of variation data

Natural
• Limitless
• Dense markers required
• Need for optimal experimental design (HapMap)
• Human and Anopheles

Managed
• Limited strain number
• Light density adequate for some uses
• (dense for complete dataset)
• Mouse, Rat

Variation data (now)

• dbSNP centric
  – Key data: SNP position and allele
  – Calculate derived properties (coding SNP, amino acid change)
• Provide views on ContigView and TransView
• Provide selection via EnsMart
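
The "derived properties" step can be sketched in a few lines of Python: given a coding sequence, a SNP position and the alternative allele, look up the reference and altered codons in the standard genetic code and report the amino acid change. This is a simplified illustration, not Ensembl's implementation; the function name snp_consequence, its 1-based position convention and the example sequence are all assumptions made here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snp_consequence(cds: str, cds_pos: int, alt: str) -> dict:
    """Classify a SNP at 1-based position cds_pos of a coding sequence."""
    idx = cds_pos - 1
    codon_start = idx - idx % 3          # first base of the affected codon
    offset = idx - codon_start           # position of the SNP within the codon
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return {
        "codon_change": f"{ref_codon}/{alt_codon}",
        "aa_change": ref_aa if ref_aa == alt_aa else f"{ref_aa}/{alt_aa}",
        "synonymous": ref_aa == alt_aa,
    }


cds = "ATGGAAGTGTAA"  # made-up coding sequence: Met-Glu-Val-stop
missense = snp_consequence(cds, 5, "T")  # GAA -> GTA changes Glu to Val
silent = snp_consequence(cds, 6, "G")    # GAA -> GAG still encodes Glu
print(missense)
print(silent)
```

A production version would also need to handle strand, splicing across exons and alternative genetic codes, none of which this sketch attempts.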

Variation data (expected)

• Recombination variability and population history of a species provide for optimal experimental design
  – “HapMap”
• Have to add individual, cohort, population and genotype concepts
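
One possible shape for these added concepts is sketched below in Python. The slide does not describe a schema, so the classes (Individual, Population, Genotype), their fields and the helper allele_frequencies are all hypothetical; the point is only that once genotypes are tied to individuals and individuals to populations, per-population allele frequencies become a simple aggregation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Individual:
    name: str


@dataclass
class Population:
    """A named group of individuals; a cohort could be modelled the same way."""
    name: str
    members: set = field(default_factory=set)


@dataclass(frozen=True)
class Genotype:
    """The two alleles one individual carries at one SNP."""
    snp_id: str
    individual: Individual
    alleles: tuple  # e.g. ("A", "G")


def allele_frequencies(genotypes, population, snp_id):
    """Allele frequencies for one SNP within one population."""
    counts = Counter(
        allele
        for g in genotypes
        if g.snp_id == snp_id and g.individual in population.members
        for allele in g.alleles
    )
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()} if total else {}


# Made-up individuals and genotypes, purely for illustration.
ind1, ind2, ind3 = Individual("ind1"), Individual("ind2"), Individual("ind3")
pop = Population("popA", {ind1, ind2})
genotypes = [
    Genotype("snp1", ind1, ("A", "G")),
    Genotype("snp1", ind2, ("A", "A")),
    Genotype("snp1", ind3, ("G", "G")),  # ind3 is not in popA, so it is ignored
]
freqs = allele_frequencies(genotypes, pop, "snp1")
print(freqs)
```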

Variation data (future)

• Allow for inexpensive hyper-dense genotype determination of large cohorts
• Integrate population substructure, close species and individual variation
  – Understanding positive and negative selection

Other genomic features

There are more than genes!

• RNA genes
  – “well known” structural RNA genes
  – Newer miRNA genes
  – Pseudogenes/duplications a massive headache
• Cis-regulatory motifs
  – Transcriptional motifs
  – RNA processing motifs
• Yet unknown other stuff

Comparative genomics…

• Action of negative selection should let us see these features
  – Honest research problem: how does one expect promoters to evolve?
  – Overlapping signals, e.g. splicing enhancers in exons

Thanks

Ensembl Team

Database Schema and Core API: Arne Stabenau, Yuan Chen, Ian Longden, Craig Melsopp, Glenn Proctor, Daniel Ríos, Guy Slater

Distributed Annotation System: Andreas Kähäri

Project Leader: Ewan Birney (EBI), Tim Hubbard (Sanger)

Ensembl Web Team: James Stalker, Fiona Cunningham, James Smith

Vega Web Team: Patrick Meidl, Steve Trevanion

Analysis and Annotation Pipeline: Val Curwen, Steve Searle, Dan Andrews, Mario Caccamo, Laura Clarke, Martin Hammond, Jan-Hinnerk Vogel, Kevin Howe, Vivek Iyer, Kerstin Jekosch, Felix Kokocinski, Simon White

User Support: Xosé Mª Fernández, Michael Schuster

Comparative Genomics: Abel Ureta-Vidal, Javier Herrero Sánchez, Jessica Severin, Cara Woodwark

EnsMart & BioMart: Arek Kasprzyk, Damian Keefe, Darin London, Damian Smedley
