
A Prlic - BioJava update

Posted: 10 May 2015
Uploaded by: jan-aerts
Description:
Presentation by Prlic at BOSC2012 "BioJava Update"
Transcript
Page 1: A Prlic - BioJava update

How to use BioJava to calculate one billion protein structure alignments at the RCSB PDB website

Andreas Prlić

Page 2

My Two Hats

RCSB PDB / BioJava

Page 3

www.pdb.org

Overview

[Chart: number of released entries per year]

Page 4

Some of the things you can do at the RCSB PDB site

• Advanced queries

• Custom reports

• Visualization

• Education section

• Comparisons across PDB, based on sequence and 3D structure similarities

[Screenshots: Jmol, LigandExplorer, custom report]

Page 5


Systematic Structural Alignment

Objective: Find novel relationships

Example: Green Fluorescent Protein (GFP)

• Nidogen-1: similar 11-stranded beta-barrel and internal helices

• 3 Å RMSD, only 9% sequence identity

• Nidogen-1: component of basement membrane, no chromophore

• GFP and NID-1 may share a common ancestor
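The GFP/Nidogen-1 example combines two standard measures: RMSD (structural distance) and sequence identity. A minimal Java sketch of both, assuming the two structures are already superposed and the aligned residues are already paired up. The class and method names are hypothetical, not BioJava API, and percent identity is computed here over gap-free aligned columns, which is only one of several common conventions.

```java
/** Hypothetical helper: RMSD and percent identity for an existing alignment (not BioJava API). */
public final class AlignmentStats {

    /**
     * RMSD between two equally long lists of already superposed 3D points
     * (e.g. aligned C-alpha atoms). No superposition is performed here.
     */
    public static double rmsd(double[][] a, double[][] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("need two non-empty point lists of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int k = 0; k < 3; k++) {
                double d = a[i][k] - b[i][k];
                sum += d * d; // squared distance, accumulated per coordinate
            }
        }
        return Math.sqrt(sum / a.length); // root of the mean squared distance per atom pair
    }

    /** Percent identity over aligned columns; columns with a '-' gap in either sequence are skipped. */
    public static double percentIdentity(String s1, String s2) {
        if (s1.length() != s2.length()) {
            throw new IllegalArgumentException("aligned sequences must have equal length");
        }
        int aligned = 0;
        int identical = 0;
        for (int i = 0; i < s1.length(); i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if (c1 == '-' || c2 == '-') {
                continue;
            }
            aligned++;
            if (c1 == c2) {
                identical++;
            }
        }
        return aligned == 0 ? 0.0 : 100.0 * identical / aligned;
    }

    public static void main(String[] args) {
        double[][] p = {{0, 0, 0}, {1, 0, 0}};
        double[][] q = {{0, 0, 3}, {1, 0, 3}};
        System.out.println(rmsd(p, q)); // 3.0
        System.out.println(percentIdentity("ACD-E", "ACFGE")); // 75.0
    }
}
```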

Page 6

Open Science Grid

Based on the FATCAT (rigid) algorithm: Yuzhen Ye & Adam Godzik. Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics, vol. 19, suppl. 2, pp. ii246-ii255, 2003.
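FATCAT builds an alignment by chaining aligned fragment pairs (AFPs). As a rough illustration only, the sketch below chains AFPs with a simple dynamic program: an AFP may follow another if it starts after the previous one ends in both proteins, and each connection costs a flat, hypothetical penalty. This is a simplification and not FATCAT's scoring: the published algorithm uses a more detailed connection score that accounts for the geometry of consecutive AFPs, and its flexible variant allows twists, none of which is modeled here. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Simplified, hypothetical AFP chaining; not FATCAT's actual scoring. */
public final class AfpChainSketch {

    /** An AFP: residues i..i+len-1 of protein 1 are matched to j..j+len-1 of protein 2. */
    public record Afp(int i, int j, int len, double score) {}

    /** b may follow a if it starts after a ends in both proteins (no overlap, no crossing). */
    static boolean canFollow(Afp a, Afp b) {
        return b.i() >= a.i() + a.len() && b.j() >= a.j() + a.len();
    }

    /** Highest-scoring chain, charging a flat (hypothetical) penalty per connection. */
    public static List<Afp> bestChain(List<Afp> afps, double connectPenalty) {
        List<Afp> sorted = new ArrayList<>(afps);
        // Sorting by start position guarantees every possible predecessor comes first.
        sorted.sort(Comparator.comparingInt(Afp::i).thenComparingInt(Afp::j));
        int n = sorted.size();
        double[] best = new double[n]; // best chain score ending at AFP k
        int[] prev = new int[n];       // predecessor of k in that chain, -1 if none
        int bestEnd = -1;
        for (int k = 0; k < n; k++) {
            best[k] = sorted.get(k).score();
            prev[k] = -1;
            for (int m = 0; m < k; m++) {
                if (canFollow(sorted.get(m), sorted.get(k))) {
                    double s = best[m] - connectPenalty + sorted.get(k).score();
                    if (s > best[k]) {
                        best[k] = s;
                        prev[k] = m;
                    }
                }
            }
            if (bestEnd < 0 || best[k] > best[bestEnd]) {
                bestEnd = k;
            }
        }
        // Walk the predecessor links back from the best end point.
        List<Afp> chain = new ArrayList<>();
        for (int k = bestEnd; k >= 0; k = prev[k]) {
            chain.add(sorted.get(k));
        }
        Collections.reverse(chain);
        return chain;
    }

    public static void main(String[] args) {
        List<Afp> afps = List.of(
                new Afp(0, 0, 8, 10.0),
                new Afp(10, 12, 8, 10.0),
                new Afp(5, 30, 8, 12.0));
        System.out.println(bestChain(afps, 2.0).size()); // 2
    }
}
```

In the example, chaining the first two fragments scores 10 - 2 + 10 = 18, which beats the single higher-scoring but incompatible third fragment (12).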

Systematic comparisons of representative chains from 40% sequence identity clusters

• 22,000 sequence clusters

• 33,000 representative domains
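To give a feel for the scale of systematic comparisons: the number of distinct unordered pairs among n representatives is n(n-1)/2. The slides do not spell out exactly which sets were compared against which, or how the one-billion total was accumulated, so the values below are illustrative arithmetic on the slide's cluster and domain counts, not the project's actual job count. The class name is hypothetical.

```java
/** Illustrative arithmetic only: distinct pairs in an all-vs-all comparison. */
public final class PairCount {

    /** Unordered pairs among n items, excluding self-comparisons: n(n-1)/2. */
    public static long unorderedPairs(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(unorderedPairs(22_000)); // 241989000
        System.out.println(unorderedPairs(33_000)); // 544483500
    }
}
```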

Page 7

PDB Custom Job Management

[Diagram: custom job management at the PDB sends out instructions to clients and writes results to disk; Java clients can run anywhere, e.g. on the Open Science Grid]
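The job-management diagram describes a master/worker arrangement in which clients receive instructions and the server stores results. The in-process sketch below mimics it, assuming a pull model: worker threads stand in for remote Java clients and poll a shared queue until it is empty. All class and method names are hypothetical, the alignment step is a placeholder returning a dummy score, and results are collected in memory rather than written to disk. Real clients would talk to the server over a network, which is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical in-process sketch of a job server handing out pairwise jobs to clients. */
public final class JobServerSketch {

    /** One unit of work: compare two chains (identifiers are placeholders). */
    public record Job(String chain1, String chain2) {}

    /** What a client reports back for a finished job. */
    public record Result(Job job, double score) {}

    private final ConcurrentLinkedQueue<Job> pending = new ConcurrentLinkedQueue<>();
    private final ConcurrentLinkedQueue<Result> finished = new ConcurrentLinkedQueue<>();

    public void submit(String chain1, String chain2) {
        pending.add(new Job(chain1, chain2));
    }

    /** Server side: hand out the next instruction; null means no work is left. */
    Job nextJob() {
        return pending.poll();
    }

    /** Server side: accept a result (a real server would write it to disk). */
    void report(Result result) {
        finished.add(result);
    }

    /** Placeholder for the real structure alignment; returns a dummy score in [0, 99]. */
    static double placeholderAlign(Job job) {
        return Math.abs((job.chain1() + job.chain2()).hashCode() % 100);
    }

    /** Starts nClients worker threads that keep pulling jobs until the queue is empty. */
    public List<Result> runWithClients(int nClients) {
        ExecutorService clients = Executors.newFixedThreadPool(nClients);
        for (int c = 0; c < nClients; c++) {
            clients.submit(() -> {
                Job job;
                while ((job = nextJob()) != null) {
                    report(new Result(job, placeholderAlign(job)));
                }
            });
        }
        clients.shutdown();
        try {
            clients.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(finished);
    }

    public static void main(String[] args) {
        String[] reps = {"1ABC.A", "2DEF.A", "3GHI.B", "4JKL.A"}; // made-up chain IDs
        JobServerSketch server = new JobServerSketch();
        for (int a = 0; a < reps.length; a++) {
            for (int b = a + 1; b < reps.length; b++) {
                server.submit(reps[a], reps[b]);
            }
        }
        System.out.println(server.runWithClients(3).size()); // 6
    }
}
```

Letting clients pull work, rather than having the server push it, means a client can join or drop out at any time without the server tracking its capacity, which is one reason such a design suits opportunistic grid resources.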

Page 8

Initial calculation on a frozen snapshot of the PDB: ~170k CPU hours on OSG

Incremental weekly updates (~1-2 million alignments): <1000 CPU hours

Code: www.biojava.org

1 billion alignments freely available at www.rcsb.org
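The CPU figures on this slide allow some quick arithmetic. Taking the weekly budget as at most 1000 CPU hours for 1 to 2 million alignments, the average cost is at most about 3.6 or 1.8 CPU seconds per alignment respectively, and the initial ~170k CPU hours amount to roughly 19 CPU-years. These are derived upper bounds and conversions, not measured timings; the class name is hypothetical.

```java
/** Back-of-the-envelope arithmetic on the CPU figures quoted on the slide. */
public final class CpuBudget {

    /** Average CPU seconds per alignment if cpuHours were spread over the given number of alignments. */
    public static double cpuSecondsPerAlignment(double cpuHours, double alignments) {
        return cpuHours * 3600.0 / alignments;
    }

    /** Converts CPU hours to CPU years, using 365-day years. */
    public static double cpuYears(double cpuHours) {
        return cpuHours / (24.0 * 365.0);
    }

    public static void main(String[] args) {
        // Weekly update: under 1000 CPU hours for ~1-2 million alignments,
        // so these values are upper bounds on the average cost.
        System.out.println(cpuSecondsPerAlignment(1000, 1_000_000)); // 3.6
        System.out.println(cpuSecondsPerAlignment(1000, 2_000_000)); // 1.8
        // Initial run: ~170k CPU hours, roughly 19.4 CPU years.
        System.out.println(cpuYears(170_000));
    }
}
```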

Page 9

BioJava

• Major rewrite - BioJava 3

Page 10

BioJava 1 vs. BioJava 3

[Diagram comparing the two versions. Components shown: core data model; symbols/alphabets, counts, distributions; genome/sequencing; multiple sequence alignment; structure alignment; Modfinder; amino acid properties; protein disorder; HMMER3 web service; NCBI web service; parsers (GenBank/EMBL/BLAST)]

Page 11

Acknowledgments

• Spencer Bliven

• Peter Rose

• Phil Bourne

• all contributors

• A. Yates, J. Jacobsen, P. Troshin, M. Chapman, J. Gao, C.H. Koh, S. Foisy, R. Holland, G. Rimsa, M. Heuer, H. Brandstaetter-Mueller, S. Willis

RCSB PDB / BioJava

Funding: RCSB PDB, Google Summer of Code, Open Science Grid

