
Ensembl Compara Perl API Stephen Fitzgerald stephenf/edinburgh-workshop/ EBI - Wellcome Trust Genome...

Date post: 28-Mar-2015
Ensembl Compara Perl API Stephen Fitzgerald http://www.ebi.ac.uk/~stephenf/edinburgh-workshop/ EBI - Wellcome Trust Genome Campus, UK compara
Page 1: Ensembl Compara Perl API Stephen Fitzgerald stephenf/edinburgh-workshop/ EBI - Wellcome Trust Genome Campus, UK compara.

Ensembl Compara Perl API

Stephen Fitzgerald

http://www.ebi.ac.uk/~stephenf/edinburgh-workshop/

EBI - Wellcome Trust Genome Campus, UK

compara


What is Ensembl Compara?

A single database which contains precalculated comparative genomics data

Access via perl API and mysql

A production system for generating that database (not in this presentation)


Compara data

Raw genomic sequence

Whole genome alignments (tBLAT, BlastZ-net, PECAN)

46 species in Ensembl release-52

Syntenic regions (based on BlastZ-net)

Raw Protein Alignments

Protein Family clusters

Protein trees

Gene orthology / paralogy predictions


Compara database & the Ensembl core databases

Since there is minimal primary data inside Compara, the links to the external core DBs must be re-established to gain full access to the data

Example: compara_52 must be linked with the Ensembl core_52 databases

Proper REGISTRY configuration is critical

Or load_registry_from_db is probably the best choice here


The Compara Perl API

Written in Object-Oriented Perl

Used to retrieve data from and store data into the ensembl-compara database

Generalized to extend to non-Ensembl genomic data (Uniprot)

Follows the same ‘Data Object’ & ‘Object Adaptor’ DBAdaptor design as the other Ensembl APIs


Compara object model overview

[Diagram: Compara object model, with vertical layer labels PRIMARY DATA, ANALYSIS and RESULTS]

Objects shown: NCBITaxon, GenomeDB, DnaFrag, Member, MethodLinkSpeciesSet, GenomicAlign, GenomicAlignBlock, SyntenyRegion, DnaFragRegion, Homology, Family, Attribute, ProteinTree, AlignedMember


Primary data

GenomeDB: relates to a particular Ensembl core DB
    Methods: name(), assembly(), genebuild(), taxon()
    Fetch methods (on the adaptor): fetch_by_name_assembly(), fetch_by_registry_name(), fetch_by_Slice(), fetch_all()

DnaFrag: represents a “top level” SeqRegion
    Methods: name(), length(), genome_db(), slice(), coord_system_name()
    Fetch methods (on the adaptor): fetch_by_Slice(), fetch_by_GenomeDB_and_name()

Member: lists all Ensembl genes + SwissProt + SPTrEMBL
    Methods: source_name(), stable_id(), genome_db(), taxon(), sequence(), get_all_peptide_Members(), get_longest_peptide_Member(), gene_member()
    Fetch methods (on the adaptor): fetch_by_source_stable_id()


Analysis

MethodLinkSpeciesSet provides a handle to isolate specific data from the shared tables (homology, genomic_align_block)

MethodLink: each individual analysis in Compara is tagged with a unique name called a method_link_type
    BLASTZ_NET, TRANSLATED_BLAT, PECAN, SYNTENY, FAMILY, ENSEMBL_ORTHOLOGUES, ENSEMBL_PARALOGUES, PROTEIN_TREES

SpeciesSet: the set of species, as (a ref. to) an array of GenomeDBs

Fetch methods (on the adaptor): fetch_by_method_link_type_GenomeDBs(), fetch_by_method_link_type_registry_aliases()

Methods: name(), method_link_type(), species_set(), source()


Exercises

http://www.ebi.ac.uk/~stephenf/edinburgh-workshop/ComparaAPI.html

GenomeDB
1. Find out the versions of the human and mouse genomes in the database
2. Print the names of all the GenomeDBs in the database

DnaFrag
1. Get the DnaFrag for chromosome 1 of the macaque genome (using a genome_db object as an argument)
2. Get the DnaFrag for chromosome X of the mouse genome (using a core slice object as an argument)

MethodLinkSpeciesSet
1. Find out how many analyses are stored in the database
2. Get the name of the MethodLinkSpeciesSet corresponding to the BlastZ-net analysis for human and mouse
3. Get the names of all the species using the mlss corresponding to the Pecan analyses


GenomeDB example code

use strict;
use Bio::EnsEMBL::Registry;

my $reg = "Bio::EnsEMBL::Registry";

# Register every database found on the public Ensembl MySQL server
$reg->load_registry_from_db(
    -host => "ensembldb.ensembl.org",
    -user => "anonymous");

# "Multi" is the registry species alias used for the Compara database
my $genome_db_adaptor = $reg->get_adaptor("Multi", "compara", "GenomeDB");

my $genome_db = $genome_db_adaptor->fetch_by_registry_name("human");

print "Name :", $genome_db->name, "\n";
print "Assembly :", $genome_db->assembly, "\n";
print "GeneBuild :", $genome_db->genebuild, "\n";


GenomeDB example code

$> perl genome_db1.pl

Homo sapiens NCBI36 2006-08-Ensembl
Mus musculus NCBIM36 2006-04-Ensembl


DnaFrag example code

use strict;
use Bio::EnsEMBL::Registry;

my $reg = "Bio::EnsEMBL::Registry";

$reg->load_registry_from_db(
    -host => "ensembldb.ensembl.org",
    -user => "anonymous");

my $genome_db_adaptor = $reg->get_adaptor("Multi", "compara", "GenomeDB");

my $genome_db = $genome_db_adaptor->fetch_by_registry_name("human");

my $dnafrag_adaptor = $reg->get_adaptor("Multi", "compara", "DnaFrag");

# Look up the DnaFrag by its GenomeDB and its sequence region name
my $dnafrag = $dnafrag_adaptor->fetch_by_GenomeDB_and_name($genome_db, "13");

print "Name :", $dnafrag->name, "\n";
print "Length :", $dnafrag->length, "\n";
print "CoordSystem :", $dnafrag->coord_system_name, "\n";


DnaFrag example code

$> perl test1.pl
Name :13
Length :114142980
CoordSystem :chromosome


MethodLinkSpeciesSet example code

use strict;
use Bio::EnsEMBL::Registry;

my $reg = "Bio::EnsEMBL::Registry";

$reg->load_registry_from_db(
    -host => "ensembldb.ensembl.org",
    -user => "anonymous");

my $mlssa = $reg->get_adaptor("Multi", "compara", "MethodLinkSpeciesSet");

# Select the analysis by its method_link_type and the registry aliases of the species
my $mlss = $mlssa->fetch_by_method_link_type_registry_aliases(
    "BLASTZ_NET", ["human", "mouse"]);

print $mlss->name, "\n";

print "type: ", $mlss->method_link_type, "\n";

# species_set() returns a reference to an array of GenomeDB objects
my $species_set = $mlss->species_set();

foreach my $this_genome_db (@$species_set) {
    print $this_genome_db->name(), "\n";
}


MethodLinkSpeciesSet example code

$> perl method_link_species_set.pl
H.sap-M.mus blastz-net (on H.sap)


Genomic Alignments

BlastZ-Net: used to compare closely related pairs of species
    BlastZ-raw -> BlastZ-chain -> BlastZ-net

Translated BLAT: used to compare more distant pairs of species

Pecan: multiple global alignments
    all vs all coding exons wublastp -> Mercator -> Pecan on each syntenic block


GenomicAlignBlock

GenomicAlignBlock
    represents a genomic alignment
    contains 1 GenomicAlign per sequence
    fetch_all_by_MethodLinkSpeciesSet_Slice($mlss, $slice)
    Methods: method_link_species_set(), score(), length(), perc_id(), get_all_GenomicAligns(), get_SimpleAlign()

GenomicAlign
    dnafrag(), genome_db(), get_Slice(), dnafrag_start(), dnafrag_end(), dnafrag_strand(), aligned_sequence()


GenomicAlignBlock

$all_GAlign = $GABlock->get_all_GenomicAligns()    (returns $arrayref)
$Simplealign = $GABlock->get_SimpleAlign()    (returns $object)

$Simplealign: a bioperl object which contains the whole alignment - can be printed in various formats using bioperl modules

$Galign: an object which represents only one of the sequences in the alignment

Hsap.X.1223-1230: ACCTTC-A <- $ga
Cfam.X.1390-1395: ACC--CGA <- $ga
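
The relationship between an aligned sequence and its original (ungapped) sequence can be illustrated without a database connection. The following is a minimal sketch in plain Perl that uses the two aligned rows from the slide as literal strings; the helper names ungap and count_identical_columns are hypothetical and are not part of the Compara API.

```perl
use strict;
use warnings;

# Aligned rows copied from the slide; '-' marks a gap column.
my %aligned = (
    Hsap => "ACCTTC-A",
    Cfam => "ACC--CGA",
);

# Hypothetical helper: delete gap characters to recover the ungapped sequence.
sub ungap {
    my ($aligned_seq) = @_;
    (my $original = $aligned_seq) =~ tr/-//d;
    return $original;
}

# Hypothetical helper: count columns where both rows carry the same base.
sub count_identical_columns {
    my ($seq1, $seq2) = @_;
    die "aligned sequences must have equal length\n"
        unless length($seq1) == length($seq2);
    my $identical = 0;
    for my $i (0 .. length($seq1) - 1) {
        my $x = substr($seq1, $i, 1);
        my $y = substr($seq2, $i, 1);
        $identical++ if $x eq $y && $x ne '-';
    }
    return $identical;
}

for my $species (sort keys %aligned) {
    printf "%s aligned: %s  ungapped: %s\n",
        $species, $aligned{$species}, ungap($aligned{$species});
}
printf "identical columns: %d of %d\n",
    count_identical_columns($aligned{Hsap}, $aligned{Cfam}),
    length($aligned{Hsap});
```

With real data, the aligned string would come from aligned_sequence() on each GenomicAlign rather than from a literal.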


Synteny

Based on BlastZ-net alignments

SyntenyRegionAdaptor
    fetch_all_by_MethodLinkSpeciesSet_Slice(), fetch_all_by_MethodLinkSpeciesSet_DnaFrag()
    Methods: get_all_DnaFragRegions(), method_link_species_set()

DnaFragRegion
    slice(), dnafrag(), dnafrag_start(), dnafrag_end(), dnafrag_strand()


Exercises

http://www.ebi.ac.uk/~stephenf/edinburgh-workshop/ComparaAPI.html

GenomicAlignBlock
1. Fetch all the BLASTZ_NET alignments between the first 130K nucleotides of the human chromosome X and the mouse genome.
2. Print the exact location of the alignment blocks.
3. Compare the original and the aligned sequences.
4. Find the BLASTZ_NET alignments between human gene BRCA2 and the mouse genome.
5. Print the BLASTZ_NET alignments between the rat gene ECSIT and the mouse genome.
6. Print the PECAN multiple alignments between the rat gene ECSIT and 11 other amniote vertebrates.
7. Print the constrained-element alignments within the rat ECSIT locus (use the constrained elements generated from the 12-way alignments).

Synteny
1. Get the human-mouse syntenic map for human chromosome X.


GenomicAlignBlock example code

[...]
my $slice_adaptor = $reg->get_adaptor("human", "core", "Slice");
my $slice = $slice_adaptor->fetch_by_region("chromosome", "12", 1e4, 2e4);

my $gaba = $reg->get_adaptor("Multi", "compara", "GenomicAlignBlock");

# Fetch every alignment block of this analysis that overlaps the slice
my $genomic_align_blocks = $gaba->fetch_all_by_MethodLinkSpeciesSet_Slice(
    $method_link_species_set, $slice);

foreach my $this_gab (@$genomic_align_blocks) {
    # One GenomicAlign per aligned sequence in the block
    my $all_gas = $this_gab->get_all_GenomicAligns();
    foreach my $this_ga (@$all_gas) {
        print $this_ga->genome_db->name(), ":", $this_ga->get_Slice()->name(), "\n";
        print $this_ga->aligned_sequence(), "\n";
    }
    print "\n";
}


GenomicAlignBlock example code

$> perl gab.pl
Mus musculus:chromosome:NCBIM37:6:121449987:121450302:-1
CCTCTTAATAAACATTATTGTCAA[…]
Homo sapiens:chromosome:NCBI36:12:19128:19507:1
CCTCTTAATAAGCACACATATCCT[..]
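
The slice names in this output follow the Ensembl slice name layout of six colon-separated fields: coordinate system, assembly version, sequence region, start, end and strand. As a small sketch of how to read them, the plain-Perl snippet below splits the two names shown above and derives the length of each region; parse_slice_name is a hypothetical helper, not an API method.

```perl
use strict;
use warnings;

# Hypothetical helper: split a slice name into its six fields and add the
# region length (coordinates are 1-based and inclusive).
sub parse_slice_name {
    my ($name) = @_;
    my ($coord_system, $version, $seq_region, $start, $end, $strand) =
        split /:/, $name;
    return {
        coord_system => $coord_system,
        version      => $version,
        seq_region   => $seq_region,
        start        => $start,
        end          => $end,
        strand       => $strand,
        length       => $end - $start + 1,
    };
}

# The two slice names printed by the example above.
for my $name ("chromosome:NCBIM37:6:121449987:121450302:-1",
              "chromosome:NCBI36:12:19128:19507:1") {
    my $loc = parse_slice_name($name);
    printf "%s %s:%d-%d (strand %d), %d bp\n",
        @{$loc}{qw(version seq_region start end strand length)};
}
```

For these two names the derived lengths are 316 bp (mouse) and 380 bp (human). Inside a script that already holds a Slice object, the same values are available through the object's accessors, so string parsing is mainly useful for post-processing printed output.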


Synteny example code

[...]
my $synteny_region_adaptor = $reg->get_adaptor("Multi", "compara", "SyntenyRegion");

my $synteny_regions = $synteny_region_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice(
    $human_mouse_synteny_method_link_species_set, $human_slice);

foreach my $this_synteny_region (@$synteny_regions) {
    # Each SyntenyRegion holds one DnaFragRegion per species
    my $these_dnafrag_regions = $this_synteny_region->get_all_DnaFragRegions();
    foreach my $this_dnafrag_region (@$these_dnafrag_regions) {
        print $this_dnafrag_region->dnafrag->genome_db->name, ": ",
            $this_dnafrag_region->slice->name, "\n";
    }
    print "\n";
}


Homology

(e! 38):
    Orthologue predictions based on ‘best reciprocal blast hits’
    Paralogues for a selected set of species
    No global view of the evolution history of the gene considered

e! 39+:
    Orthologues and paralogues are inferred from protein trees
    Phylogeny: Orthology/Paralogy in one go


BSR: Blast Score Ratio. When 2 proteins P1 and P2 are compared, BSR = score(P1,P2) / max(self-score(P1), self-score(P2)). The default threshold used in the initial clustering step is 0.33.
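
The BSR definition above can be made concrete with a short worked example. This is a minimal sketch in plain Perl: the scores are invented for illustration, blast_score_ratio is a hypothetical helper, and treating a pair as linked when its BSR is at or above the threshold is an assumption about how the threshold is applied, not something the slide states.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Default threshold quoted on the slide.
my $BSR_THRESHOLD = 0.33;

# Hypothetical helper implementing the ratio as defined on the slide:
# pairwise score divided by the larger of the two self-scores.
sub blast_score_ratio {
    my ($score_p1_p2, $self_score_p1, $self_score_p2) = @_;
    return $score_p1_p2 / max($self_score_p1, $self_score_p2);
}

# Invented scores: [pairwise score, self-score of P1, self-score of P2].
my @examples = ([120, 300, 400], [200, 300, 400]);

for my $scores (@examples) {
    my $bsr = blast_score_ratio(@$scores);
    # Assumption: a pair passes when its BSR reaches the threshold.
    printf "score=%d self=(%d,%d) BSR=%.2f %s\n", @$scores, $bsr,
        $bsr >= $BSR_THRESHOLD ? "passes" : "fails";
}
```

Dividing by the larger self-score keeps the ratio between 0 and 1 for typical scores and penalises hits that cover only a small part of the longer protein.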


Homology types


Homology

Homology object
    contains 1 pair of Member/Attribute per gene/protein
    fetch_all_by_Member(), fetch_all_by_MethodLinkSpeciesSet(), fetch_all_by_Member_MethodLinkSpeciesSet()
    Methods: method_link_species_set(), description(), subtype(), perc_id(), get_all_Member_Attribute(), get_SimpleAlign()


Family

Compara computes gene family clusters

Runs on all Ensembl transcripts plus all Uniprot/SWISSPROT and Uniprot/SPTREMBL metazoan proteins

The algorithm is based on:
    All vs all blastp
    MCL clustering
    Muscle multiple aligner

Results stored in the family and family_member tables


Family

Family object
    contains 1 pair of Member/Attribute per gene/protein
    fetch_all_by_Member()
    Methods: method_link_species_set(), description(), description_score(), get_all_Member_Attribute(), get_SimpleAlign()


Exercises

http://www.ebi.ac.uk/~stephenf/edinburgh-workshop/ComparaAPI.html

Members
1. Find the Member corresponding to SwissProt protein O93279
2. Find the Member for the human gene BRCA2
3. Find all the peptide Members corresponding to the human gene CTDP1

Homology
1. Get all the predicted homologues for the human gene BRCA2
2. Get all the mouse orthologues predicted for the human gene CTDP1

Family
1. Get the family predicted for the human gene BRCA2
2. Get the alignments corresponding to the family of the human gene HBEGF


Member example code

use strict;
use Bio::EnsEMBL::Registry;

my $reg = "Bio::EnsEMBL::Registry";

$reg->load_registry_from_db(
    -host => "ensembldb.ensembl.org",
    -user => "anonymous");

my $member_adaptor = $reg->get_adaptor("Multi", "compara", "Member");

# Members are identified by a source name plus a stable ID
my $member = $member_adaptor->fetch_by_source_stable_id(
    "ENSEMBLGENE", "ENSG00000000971");

print "All proteins:\n";
my $all_peptide_members = $member->get_all_peptide_Members();

foreach my $this_peptide (@$all_peptide_members) {
    print $this_peptide->stable_id(), "\n";
}


Member example code

$> perl test2.pl
All proteins:
ENSP00000356399
ENSP00000356398
ENSP00000352658


Homology example code

[...]
my $ma = $reg->get_adaptor("Multi", "compara", "Member");
my $member = $ma->fetch_by_source_stable_id("ENSEMBLGENE", "ENSG00000000971");

my $homology_adaptor = $reg->get_adaptor("Multi", "compara", "Homology");

my $homologies = $homology_adaptor->fetch_all_by_Member($member);

foreach my $this_homology (@$homologies) {
    print $this_homology->description, "\n";
    # Each entry is a [Member, Attribute] pair
    my $member_attributes = $this_homology->get_all_Member_Attribute();
    foreach my $this_mem_attr (@$member_attributes) {
        my ($this_member, $this_attribute) = @$this_mem_attr;
        print $this_member->genome_db->name, " ",
            $this_member->source_name, " ",
            $this_member->stable_id, "\n";
    }
    print "\n";
}


Family example code

[...]
my $ma = $reg->get_adaptor("Multi", "compara", "Member");
my $member = $ma->fetch_by_source_stable_id("ENSEMBLGENE", "ENSG00000000971");

my $family_adaptor = $reg->get_adaptor("Multi", "compara", "Family");
my $families = $family_adaptor->fetch_all_by_Member($member);

foreach my $this_family (@$families) {
    print $this_family->description, "\n";
    # Each entry is a [Member, Attribute] pair
    my $member_attributes = $this_family->get_all_Member_Attribute();
    foreach my $this_mem_attr (@$member_attributes) {
        my ($this_member, $this_attribute) = @$this_mem_attr;
        print $this_member->taxon->binomial, " ",
            $this_member->source_name, " ",
            $this_member->stable_id, "\n";
    }
    print "\n";
}


Getting More Information

perldoc – Viewer for inline API documentation.
    shell> perldoc Bio::EnsEMBL::Compara::GenomeDB
    shell> perldoc Bio::EnsEMBL::Compara::DBSQL::MemberAdaptor

online at: http://www.ensembl.org/

Tutorial document: cvs: ensembl-compara/docs/ComparaTutorial.pdf

ensembl-dev mailing list: [email protected]

Exercise solutions: http://www.ebi.ac.uk/~stephenf/edinburgh-workshop/solutions.html


Ensembl-dev mailing list and HelpDesk

ensembl-dev mailing list is great for questions around the API and the DB

HelpDesk is very helpful

Give detailed info on what you are trying to do

Check that you have the modules installed ($PERL5LIB pointing to them)
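
One way to run that last check before asking for help is a short script that tries to load the API modules from the current @INC (which includes $PERL5LIB). This is a minimal sketch: module_available is a hypothetical helper, and the two module names are the ones that appear elsewhere in these slides.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the module can be loaded from @INC, else 0.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(Bio::EnsEMBL::Registry Bio::EnsEMBL::Compara::GenomeDB)) {
    printf "%-35s %s\n", $module,
        module_available($module) ? "found" : "NOT found (check \$PERL5LIB)";
}
```

Including this output in a message to the mailing list or HelpDesk shows at a glance whether the problem is an installation issue or an API question.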


Ensembl Team

Systems & Support: Guy Coates, Tim Cutts, Shelley Goddard

Functional Genomics: Paul Flicek, Yuan Chen, Stefan Gräf, Nathan Johnson, Daniel Rios

Leaders: Ewan Birney (EBI), Tim Hubbard (Sanger Institute)

Research: Damian Keefe, Guy Slater, Michael Hoffman, Alison Meynert, Benedict Paten, Daniel Zerbino

VectorBase Annotation: Martin Hammond, Dan Lawson, Karyn Megy

Zebrafish Annotation: Kerstin Jekosch, Mario Caccamo, Ian Sealy

Analysis and Annotation Pipeline: Val Curwen, Steve Searle, Bronwen Aken, Julio Banet, Laura Clarke, Sarah Dyer, Jan-Hinnerck Vogel, Kevin Howe, Felix Kokocinski, Stephen Rice, Simon White

Comparative Genomics: Javier Herrero, Kathryn Beal, Benoît Ballester, Stephen Fitzgerald, Albert Vilella, Leo Gordon

Web Team: James Smith, Fiona Cunningham, Anne Parker, Steve Trevanion (VEGA)

Outreach: Xosé M Fernández, Bert Overduin, Giulietta Spudich, Michael Schuster

Distributed Annotation System (DAS): Eugene Kulesha

BioMart: Arek Kasprzyk, Damian Smedley, Richard Holland, Syed Haldar

Database Schema and Core API: Glenn Proctor, Ian Longden, Patrick Meidl, Andreas Kähäri


A special case of ortholog

