Systems biology - Bioinformatics on complete biological systems

Post on 10-May-2015

Lars Juhl Jensen

Systems biology

Bioinformatics on complete biological systems

can a biologist fix a radio?

Lazebnik, Biochemistry, 2004

one gene

one postdoc

knockout phenotype

name the gene

Lazebnik, Biochemistry, 2004

all aspects

one gene

high-throughput biology

one technology

one lab

all genes

one aspect

systems biology

complete systems

all aspects

all genes

systems-level properties

two subfields

mathematical modeling

small systems

data integration

large systems

mathematical modeling

small systems

Chen, Mol. Biol. Cell, 2004

many equations

Chen, Mol. Biol. Cell, 2004

simulation

Chen, Mol. Biol. Cell, 2004

many parameters

Chen, Mol. Biol. Cell, 2004

requires detailed knowledge
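
As a rough illustration of what simulating such a model involves, the sketch below integrates a single rate equation with the forward Euler method. The equation, the parameter names `k_syn` and `k_deg`, and their values are illustrative assumptions, not taken from the Chen et al. model, which couples many such equations and depends on many parameters.

```python
def simulate(k_syn, k_deg, x0=0.0, dt=0.01, t_end=50.0):
    """Integrate dx/dt = k_syn - k_deg * x with forward Euler steps."""
    x = x0
    trajectory = [x]
    steps = int(round(t_end / dt))
    for _ in range(steps):
        dxdt = k_syn - k_deg * x  # synthesis minus degradation
        x += dxdt * dt
        trajectory.append(x)
    return trajectory


# Hypothetical parameter values, chosen only for illustration.
trajectory = simulate(k_syn=2.0, k_deg=0.5)
print(f"final concentration: {trajectory[-1]:.3f}")
```

With these values the concentration settles at the steady state k_syn / k_deg = 4.0. Even this one-equation toy needs two rate constants, which hints at why larger models require detailed knowledge.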

network biology

association networks

guilt by association
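
Guilt by association can be sketched as a majority vote: an uncharacterized gene is assigned the annotation most common among its network neighbors. The gene names, the network, and the annotations below are hypothetical toy data, and real methods weight the votes by link confidence.

```python
from collections import Counter


def predict_function(gene, network, annotations):
    """Return the most common annotation among a gene's annotated neighbors."""
    votes = Counter(
        annotations[neighbor]
        for neighbor in network.get(gene, ())
        if neighbor in annotations
    )
    if not votes:
        return None  # no annotated neighbors, so no prediction
    return votes.most_common(1)[0][0]


# Hypothetical toy network: gene -> set of associated genes.
network = {
    "geneX": {"geneA", "geneB", "geneC"},
    "geneY": set(),
}
annotations = {
    "geneA": "cell cycle",
    "geneB": "cell cycle",
    "geneC": "DNA repair",
}

print(predict_function("geneX", network, annotations))
```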

protein networks

STRING

>1100 organisms

~2.6 million proteins

Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011

Exercise 1

Go to http://string-db.org

Query for CDC28 in budding yeast

Try different evidence views

Show only high-confidence links

Show only experimental evidence
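
The two filters in this exercise can be mimicked on a downloaded link table. The sketch below assumes each link carries a combined score and per-channel scores between 0 and 1. The partner names and all scores are invented for illustration, and the 0.7 cutoff is an assumed threshold for "high confidence".

```python
# Hypothetical links: (protein_a, protein_b, combined_score, channel_scores).
links = [
    ("CDC28", "partnerA", 0.95, {"experiments": 0.90, "textmining": 0.60}),
    ("CDC28", "partnerB", 0.80, {"experiments": 0.00, "textmining": 0.80}),
    ("CDC28", "partnerC", 0.45, {"experiments": 0.45, "textmining": 0.00}),
]


def high_confidence(links, cutoff=0.7):
    """Keep links whose combined score reaches the cutoff."""
    return [link for link in links if link[2] >= cutoff]


def with_evidence(links, channel, cutoff=0.7):
    """Keep links supported at the cutoff by one evidence channel alone."""
    return [link for link in links if link[3].get(channel, 0.0) >= cutoff]


print([link[1] for link in high_confidence(links)])
print([link[1] for link in with_evidence(links, "experiments")])
```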

genomic context

gene fusion

Korbel et al., Nature Biotechnology, 2004

operons

Korbel et al., Nature Biotechnology, 2004

bidirectional promoters

Korbel et al., Nature Biotechnology, 2004

phylogenetic profiles

Korbel et al., Nature Biotechnology, 2004
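
A phylogenetic profile records which genomes contain a gene; genes with similar profiles tend to be functionally linked. A minimal sketch, assuming profiles are stored as sets of genome identifiers and using Jaccard similarity as one possible measure (the gene and genome names are hypothetical):

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (sets of genomes)."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


# Hypothetical profiles: gene -> genomes in which the gene is present.
profiles = {
    "gene1": {"genome1", "genome2", "genome5"},
    "gene2": {"genome1", "genome2", "genome5"},
    "gene3": {"genome3", "genome4"},
}

print(jaccard(profiles["gene1"], profiles["gene2"]))  # identical profiles
print(jaccard(profiles["gene1"], profiles["gene3"]))  # no shared genomes
```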

a real example

Cell

Cellulosomes

Cellulose

experimental data

gene coexpression

protein interactions

Jensen & Bork, Science, 2008

genetic interactions

Beyer et al., Nature Reviews Genetics, 2007

curated knowledge

complexes

pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

chemical networks

STITCH

STRING + 300k chemicals

drugs

metabolites

known drug targets

high-throughput assays

metabolic pathways

Exercise 2

Go to http://stitch-db.org

Query for TYMS in human

What is the role of thymidylate?

What is the role of dUMP?

What is the role of pemetrexed?

many databases

different formats

different identifiers

variable quality

not comparable

hard work

quality scores

von Mering et al., Nucleic Acids Research, 2005

calibrate vs. gold standard
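
Calibration against a gold standard can be sketched as follows: group the predicted links by raw score and, for each group, measure the fraction that also appear in the gold standard. That fraction then serves as a comparable quality score across data sources. The pairs, raw scores, and bin edges below are invented, and the fixed-bin approach is a simplification chosen for illustration.

```python
def calibrate(scored_pairs, gold_positives, bin_edges):
    """For each raw-score bin, return the fraction of pairs in the gold standard."""
    n_bins = len(bin_edges) - 1
    totals = [0] * n_bins
    hits = [0] * n_bins
    for pair, raw_score in scored_pairs:
        for i in range(n_bins):
            if bin_edges[i] <= raw_score < bin_edges[i + 1]:
                totals[i] += 1
                hits[i] += pair in gold_positives
                break
    # None marks bins that received no pairs.
    return [h / t if t else None for h, t in zip(hits, totals)]


# Hypothetical raw scores from one evidence source.
scored_pairs = [
    (frozenset({"a", "b"}), 0.9),
    (frozenset({"a", "c"}), 0.8),
    (frozenset({"b", "c"}), 0.2),
    (frozenset({"c", "d"}), 0.1),
]
# Hypothetical gold standard of trusted associations.
gold_positives = {
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"c", "d"}),
}

print(calibrate(scored_pairs, gold_positives, bin_edges=[0.0, 0.5, 1.01]))
```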

missing most of the data

text mining

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

comprehensive lexicon

cyclin dependent kinase 1

CDK1

CDC2

flexible matching

spaces and hyphens

cyclin dependent kinase 1

cyclin-dependent kinase 1

orthographic variation

CDC2

hCdc2

“black list”

SDS
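
The pieces above can be sketched as a small dictionary-based tagger: a lexicon of names, matching that tolerates spaces versus hyphens and case differences, an optional "h" prefix as one form of orthographic variation, and a black list of names that should never be tagged. The lexicon contents, the identifier `CDK1` used as the normalized form, and the regular-expression approach are assumptions made for illustration.

```python
import re

# Tiny illustrative lexicon: name -> normalized identifier (assumed).
LEXICON = {
    "cyclin dependent kinase 1": "CDK1",
    "CDK1": "CDK1",
    "CDC2": "CDK1",
    "SDS": "SDS",  # included only to show the effect of the black list
}
BLACK_LIST = {"SDS"}


def name_pattern(name):
    """Build a case-insensitive pattern that lets spaces and hyphens vary."""
    tokens = re.split(r"[ -]+", name)
    body = r"[ -]?".join(re.escape(token) for token in tokens)
    # Optional leading "h" covers variants such as hCdc2.
    return re.compile(r"\bh?" + body + r"\b", re.IGNORECASE)


def tag(text):
    """Return sorted (start, end, matched_text, identifier) tuples."""
    hits = []
    for name, identifier in LEXICON.items():
        if name in BLACK_LIST:
            continue  # black-listed names are never tagged
        for match in name_pattern(name).finditer(text):
            hits.append((match.start(), match.end(), match.group(0), identifier))
    return sorted(hits)


text = "hCdc2, also called cyclin-dependent kinase 1, was run on an SDS gel."
for hit in tag(text):
    print(hit)
```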

information extraction

count co-mentioning

within documents

within paragraphs

within sentences

scoring scheme
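
One way to picture such a scoring scheme: every co-mention of two entities adds to their score, and co-mentions in the same sentence count more than those in the same paragraph, which in turn count more than those merely in the same document. The weights and the entity names below are assumptions for illustration, not the values used by any particular resource.

```python
from collections import defaultdict
from itertools import combinations

# Assumed weights: closer co-mentions count more.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 4.0}


def comention_scores(documents):
    """Score entity pairs from nested text.

    documents: list of documents, each a list of paragraphs, each a list of
    sentences, each sentence a set of recognized entity identifiers.
    """
    scores = defaultdict(float)
    for paragraphs in documents:
        doc_entities = set()
        for sentences in paragraphs:
            par_entities = set()
            for sentence in sentences:
                for pair in combinations(sorted(sentence), 2):
                    scores[pair] += WEIGHTS["sentence"]
                par_entities |= sentence
            for pair in combinations(sorted(par_entities), 2):
                scores[pair] += WEIGHTS["paragraph"]
            doc_entities |= par_entities
        for pair in combinations(sorted(doc_entities), 2):
            scores[pair] += WEIGHTS["document"]
    return dict(scores)


# Hypothetical document: two paragraphs with already-tagged sentences.
documents = [
    [
        [{"geneA", "drugB"}, {"geneA"}],
        [{"diseaseC"}],
    ],
]

for pair, score in sorted(comention_scores(documents).items()):
    print(pair, score)
```

Here geneA and drugB share a sentence and therefore score highest, while diseaseC is linked to both only at the document level.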

proteins

small molecules

compartments

tissues

phenotypes

diseases

adverse drug reactions

organisms

environments

Exercise 3

Go to http://diseases.jensenlab.org

Find TYMS disease associations

Inspect the text-mining evidence

Find genes linked to colorectal cancer

Explore the gene network

text corpus

~22 million abstracts

no access

~4 million full-text articles

augmented browsing

Reflect

browser add-on

real-time text mining

Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009

O’Donoghue et al., Journal of Web Semantics, 2010

localization and disease

suite of web resources

common backend database

curated knowledge

experimental data

text mining

computational predictions

unified identifiers

quality scores

visualization

COMPARTMENTS

compartments.jensenlab.org

TISSUES

tissues.jensenlab.org

more to come

summary

bioinformatics

more than alignment

data/text mining

save you much time

questions?