Swan River foreshore, Perth, Western Australia
University of Western Australia Biomedical, Biomolecular and Chemical Sciences
Ian Small Murray Badger David Day Harvey Millar
Steve Smith Barry Pogson Jim Whelan
ARC Centre Plant Energy Biology
SUBA: SUBcellular location database for Arabidopsis proteins
Sandra Tanz and Ian Castleden
4th March 2011
Why protein localisation?
• Contributes towards the understanding of protein function and of biological inter-relationships, i.e. only proteins in the same location can interact.
• Separate subcellular locations often represent distinct cellular environments: proteins share similar attributes and play roles in defining the function of a subcellular compartment.
• To build hypotheses or models: large-scale phenotyping screens, microarray experiments and protein-protein interaction assays rely on protein localisation info.
How to localise proteins?
• Prediction
• In vitro uptake (imports)
• In vivo (GFP)
• Enzyme activity measurements
• Western blot
• Immunogold labeling
• Subcellular proteomics (MS)
• Protein-protein interaction
Images modified from Millar et al., 2009
SUBA: SUBcellular location database for Arabidopsis proteins
What does SUBA document?
Venn diagram of experimentally localised proteins: GFP (2135) and MS (6398); 1193 by GFP only, 942 by both, 5456 by MS only

                                               SUBA II (2007)    SUBA III (2011)
Combined sub-location data                     250’719           1’022’040
Calls by PPI                                   0                 6673
Calls by experiments (GFP, MS)                 8273              19’528
Distinct proteins localised by GFP and/or MS   4531              8533
Bioinformatic predictions by                   10 predictors     24 predictors
NEW!
Data mining
• Search of the NCBI PubMed (Medline) and Entrez (GenBank) databases using keywords
• Alert via Email
• Search each publication to extract localisation information = fully curated data
SUBA III interface http://suba.plantenergy.uwa.edu.au/
SUBA III flatfile
Analysis of SUBA III data – on the way…
Do data become more or less consistent over time?
Experimental data (MS vs GFP)
• How reliable are experimental localisation data? Has the overlap of data changed with increasing data sets?
How reliable are GFP localisation data?
Total GFP localisations confirmed by MS (Venn diagram): GFP (2554) and MS (9016); 1844 GFP only, 710 in both, 8306 MS only
Total GFP localisations disputed by MS (Venn diagram): GFP (1844) and MS (74172); 1386 GFP only, 458 in both, 73714 MS only
1386 neither confirmed nor disputed
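The two Venn diagrams appear to split the 2554 GFP localisations into three groups: 710 confirmed by MS, 458 disputed by MS, and 1386 neither confirmed nor disputed. That reading is an interpretation of the slide figures (the overlaps are taken as the confirmed and disputed counts), not something the slide states outright. Under that assumption, a minimal Python sketch of the resulting shares:

```python
# Counts read from the two Venn diagrams (an interpretation of the slide).
gfp_total = 2554
confirmed = 710    # overlap in the "confirmed by MS" diagram
disputed = 458     # overlap in the "disputed by MS" diagram
neither = 1386     # GFP localisations with no confirming or conflicting MS call

# The three groups should account for every GFP localisation.
assert confirmed + disputed + neither == gfp_total


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


shares = {
    "confirmed": share(confirmed, gfp_total),
    "disputed": share(disputed, gfp_total),
    "neither": share(neither, gfp_total),
}
for name, value in shares.items():
    print(f"{name}: {value}%")
```

On these numbers, more than half of the GFP localisations have no MS evidence either way, so the confirmed and disputed shares describe only the subset of proteins observed by both methods.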
Analysis of SUBA III data – on the way…
Do data become more or less consistent over time?
Experimental data (MS vs GFP)
• How reliable are experimental localisation data? Has the overlap of data changed with increasing data sets?
• Does evidence for multiple locations mean the protein is dual targeted/dynamic or is it a false positive?
Prediction vs experimental data
• How reliable are predictors today?
PPI data
• What do PPI data tell us about sub-cellular location?
• Organellar proteome: Can we discover novel organellar proteins?
SUBA under the hood
http://library.duke.edu/digitalcollections/gedney.KY0180/pg.1/
• Why a Web interface?
• GeneInvestigator, Mapman
• AHM chemicals (Apache JPA)
• For the foreseeable future databases are going to be “Web” based (HTTP, JavaScript, HTML, CSS)
• Need to be maintained by a minimum number of developers (i.e. one!)
http://www.guistuff.com/
SUBA Tables (predictors)
SUBA Tables (“original” sources) http://www.ce4csb.org/amigo/
SUBA Tables (publications)
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=18453549
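The efetch URL on this slide suggests that publication details are pulled from NCBI by PubMed ID. A minimal Python sketch of that idea follows. The helper names `efetch_url` and `article_title` and the trimmed `SAMPLE` document are illustrative assumptions, not SUBA code; only the URL parameters (`db`, `retmode`, `id`) come from the slide.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EFETCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pubmed_id):
    """Build the efetch URL for one PubMed record, as shown on the slide."""
    params = {"db": "pubmed", "retmode": "xml", "id": str(pubmed_id)}
    return EFETCH + "?" + urlencode(params)


def article_title(xml_text):
    """Return the text of the first ArticleTitle element, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find(".//ArticleTitle")
    return None if node is None else "".join(node.itertext())


# Hypothetical, heavily trimmed stand-in for an efetch response;
# the title is a placeholder, not the real record.
SAMPLE = """<PubmedArticleSet><PubmedArticle><MedlineCitation>
<PMID>18453549</PMID>
<Article><ArticleTitle>Placeholder title</ArticleTitle></Article>
</MedlineCitation></PubmedArticle></PubmedArticleSet>"""

print(efetch_url(18453549))
print(article_title(SAMPLE))
```

In practice the XML would come from fetching the URL rather than from an inline string; the sketch parses a local sample so that it runs without network access.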
SUBA Tables (automation)
Julian Tonti-Filippini
Why Bother?
SELECT suba3.suba3.*, suba3.src_ppi_1.*
FROM suba3.suba3
LEFT OUTER JOIN suba3.src_ppi AS src_ppi_1
  ON suba3.suba3.locus = src_ppi_1.`locusA`
WHERE EXISTS (
  SELECT 1 FROM suba3.src_ppi
  WHERE suba3.suba3.locus = suba3.src_ppi.`locusA`
    AND suba3.src_ppi.`locusB` IN ('AT3G62420.1'))
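The query above combines two things: the EXISTS clause picks proteins that have at least one interaction with the listed partner, and the LEFT OUTER JOIN then returns all interactions of those proteins. The sketch below reproduces that shape with Python's `sqlite3` module on an in-memory database. The table layout is simplified (no `suba3.` schema prefix) and every row is made up for illustration, apart from the partner locus taken from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suba3 (locus TEXT PRIMARY KEY, mwt REAL);
CREATE TABLE src_ppi (locusA TEXT, locusB TEXT, pubmed INTEGER);
-- Made-up rows for illustration only.
INSERT INTO suba3 VALUES ('AT1G00001.1', 50000), ('AT1G00002.1', 90000);
INSERT INTO src_ppi VALUES
  ('AT1G00001.1', 'AT3G62420.1', 1),
  ('AT1G00001.1', 'AT1G00002.1', 2),
  ('AT1G00002.1', 'AT1G00001.1', 3);
""")

QUERY = """
SELECT suba3.locus, ppi.locusB
FROM suba3
LEFT OUTER JOIN src_ppi AS ppi ON suba3.locus = ppi.locusA
WHERE EXISTS (
    SELECT 1 FROM src_ppi
    WHERE suba3.locus = src_ppi.locusA
      AND src_ppi.locusB IN (?)
)
ORDER BY ppi.locusB
"""

# Only AT1G00001.1 interacts with AT3G62420.1, but both of its
# interactions come back because the join is not restricted by the filter.
rows = conn.execute(QUERY, ("AT3G62420.1",)).fetchall()
print(rows)
```

Writing this by hand for every combination of search fields is what makes a query layer attractive: the filter decides which proteins match, while the join decides how much related data is returned with them.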
“denormalisation” src_msms
Suba2
Suzanne M. Embury and Peter M.D. Gray
Computational Systems Biology Centre of Excellence
@suba.json
def query(filter, offset=0, limit=1000):
    return Session().query(Suba3).filter(json2sqla(filter)) \
        .offset(offset).limit(limit)
http://suba.plantenergy.uwa.edu.au/cgi/suba.py/query?filter=['Suba3.ppi.locusB','in',['AT1G04234.1'],'AND','mwt','gt',80000.0]&offset=0&limit=1000
{success: True, result: [
  { locus: 'AT1G54321.1', mwt: 81454, ….
    ppi: [{locusA: 'AT1G54321.1', locusB: 'AT1G04234.1', pubmed: 14567845}]},
  { locus: 'AT1G63021.1', mwt: 91454, ….
    ppi: [{locusA: 'AT1G63021.1', locusB: 'AT1G04234.1', pubmed: 34567767}]},
  … ] }
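The `filter` parameter in the query URL is a flat list of `field, operator, value` triples joined by connectives such as `'AND'`. The slides do not show how `json2sqla` turns that list into a database query, so the sketch below is not that function: it is a hypothetical in-memory evaluator that applies the same filter shape to plain Python dictionaries, to make the semantics concrete. The operator names other than `in` and `gt`, the left-to-right evaluation order, and the second record are all assumptions.

```python
import operator

# 'in' and 'gt' appear in the example URL; the others are assumed.
OPS = {
    "in": lambda value, choices: value in choices,
    "gt": operator.gt,
    "lt": operator.lt,
    "eq": operator.eq,
}


def values_at(record, path):
    """Collect every value reachable via a dotted path, descending into lists."""
    parts = path.split(".")
    if parts[0] == "Suba3":      # drop the leading table name, if present
        parts = parts[1:]
    current = [record]
    for part in parts:
        found = []
        for item in current:
            child = item.get(part)
            if isinstance(child, list):
                found.extend(child)
            elif child is not None:
                found.append(child)
        current = found
    return current


def matches(record, flt):
    """Evaluate [field, op, value, 'AND'|'OR', field, op, value, ...] left to right."""
    result = None
    connective = None
    i = 0
    while i < len(flt):
        field, op, operand = flt[i:i + 3]
        # A term holds if any value found at the path satisfies the operator.
        term = any(OPS[op](v, operand) for v in values_at(record, field))
        if result is None:
            result = term
        elif connective == "AND":
            result = result and term
        else:
            result = result or term
        connective = flt[i + 3] if i + 3 < len(flt) else None
        i += 4
    return bool(result)


records = [
    # First record follows the example response on the slide.
    {"locus": "AT1G54321.1", "mwt": 81454,
     "ppi": [{"locusA": "AT1G54321.1", "locusB": "AT1G04234.1"}]},
    # Hypothetical record: same partner, but below the mass threshold.
    {"locus": "AT1G00001.1", "mwt": 50000,
     "ppi": [{"locusA": "AT1G00001.1", "locusB": "AT1G04234.1"}]},
]
flt = ["Suba3.ppi.locusB", "in", ["AT1G04234.1"], "AND", "mwt", "gt", 80000.0]
hits = [r["locus"] for r in records if matches(r, flt)]
print(hits)
```

The condition on a nested field such as `ppi.locusB` is treated as "any interaction matches", which mirrors the EXISTS subquery used on the SQL side.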
(Near) Future
• The large number of predictors often give conflicting predictions… what to do?
• Bayesian analysis…
Acknowledgements
Ian Small Harvey Millar
Joshua Heazlewood Julian Tonti-Filippini
Thanks for your attention!!