STRING: Protein networks from data and text mining

Lars Juhl Jensen

9.6 million proteins

functional associations

genomic context

gene fusion

Korbel et al., Nature Biotechnology, 2004

gene neighborhood


phylogenetic profiles


experimental data

gene coexpression

physical interactions

Jensen & Bork, Science, 2008

curated knowledge

protein complexes

pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

many databases

different formats

different identifiers

variable quality

not comparable

hard work

parsers

mapping files

quality scores

von Mering et al., Nucleic Acids Research, 2005

score calibration

von Mering et al., Nucleic Acids Research, 2005

implicit weighting by quality

common scale
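The slides do not spell out how the calibration is done, so the following is only a minimal sketch of the general idea: raw scores from one evidence source are binned, and each bin is mapped to the fraction of its pairs that a benchmark set supports. The function names, the equal-size binning, and the toy data are assumptions made for illustration, not the published procedure.

```python
def calibrate(raw_scores, is_true, n_bins=2):
    """Build a lookup table: lowest raw score in each bin -> benchmark precision.

    raw_scores: raw scores from one evidence source (any scale).
    is_true:    1 if the pair is supported by the benchmark, else 0.
    Equal-size bins are an assumption of this sketch.
    """
    ranked = sorted(zip(raw_scores, is_true))
    size = max(1, len(ranked) // n_bins)
    table = []
    for start in range(0, len(ranked), size):
        chunk = ranked[start:start + size]
        lower = chunk[0][0]
        precision = sum(t for _, t in chunk) / len(chunk)
        table.append((lower, precision))
    return table


def calibrated_score(raw, table):
    """Return the precision of the highest bin whose lower bound is <= raw."""
    score = table[0][1]
    for lower, precision in table:
        if raw >= lower:
            score = precision
    return score


# Toy data, invented for illustration only.
raw = [1, 2, 3, 4, 10, 20, 30, 40]
truth = [0, 0, 0, 1, 1, 1, 0, 1]
table = calibrate(raw, truth, n_bins=2)
```

Because every source ends up expressed as a benchmark precision, noisy sources receive low scores automatically, which is one way to read "implicit weighting by quality" and "common scale".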

missing most of the data

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

comprehensive lexicon

cyclin dependent kinase 1

CDC2

orthographic variation

spaces and hyphens

cyclin dependent kinase 1

cyclin-dependent kinase 1

prefixes and suffixes

CDC2

hCdc2

“black list”

SDS
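The lexicon, orthographic variation, and black list steps above can be sketched as a small dictionary lookup. This is a toy illustration rather than the actual tagger: the lexicon entries, the `normalize` and `lookup` helpers, and the rule that a leading "h" may be stripped are all assumptions made for the example.

```python
import re

# Hypothetical toy lexicon: surface name -> identifier (illustrative only).
LEXICON = {
    "cyclin dependent kinase 1": "CDK1",
    "CDC2": "CDK1",
    "SDS": "SDS",  # a gene symbol that collides with a common abbreviation
}

# Names that usually mean something else in text and should be ignored.
BLACKLIST = {"sds"}


def normalize(name):
    """Lowercase and drop spaces and hyphens so spelling variants collide."""
    return re.sub(r"[\s-]+", "", name.lower())


NORMALIZED = {normalize(name): ident for name, ident in LEXICON.items()}


def lookup(candidate):
    """Return the identifier for a candidate name, or None if unrecognized."""
    key = normalize(candidate)
    if key in BLACKLIST:
        return None
    if key in NORMALIZED:
        return NORMALIZED[key]
    # Assumed prefix rule: a leading "h" (as in hCdc2) may be stripped.
    if key.startswith("h") and key[1:] in NORMALIZED:
        return NORMALIZED[key[1:]]
    return None
```

With this sketch, `lookup("cyclin-dependent kinase 1")` and `lookup("hCdc2")` both resolve to the same entry, while `lookup("SDS")` is rejected by the black list.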

co-mentioning

counting

within documents

within paragraphs

within sentences
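Counting co-mentions at the three levels above might look like the following sketch. It assumes paragraphs are separated by blank lines, sentences end in ".", "!" or "?", and names are matched as plain substrings; a real pipeline would use the name tagger instead. The function name and the toy text are invented for the example.

```python
import re
from collections import Counter
from itertools import combinations


def comention_counts(document, names):
    """Count name pairs co-mentioned within the document, paragraphs, and sentences."""
    counts = {"document": Counter(), "paragraph": Counter(), "sentence": Counter()}

    def pairs_in(text):
        # Naive substring matching; sorted so each pair has one canonical order.
        found = sorted(name for name in names if name in text)
        return combinations(found, 2)

    counts["document"].update(pairs_in(document))
    for paragraph in re.split(r"\n\s*\n", document):
        counts["paragraph"].update(pairs_in(paragraph))
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            counts["sentence"].update(pairs_in(sentence))
    return counts


# Toy text, invented for illustration only.
text = "CDC2 binds CCNB1. WEE1 is a kinase.\n\nWEE1 phosphorylates CDC2."
counts = comention_counts(text, ["CDC2", "CCNB1", "WEE1"])
```

Keeping the three counters separate lets a later scoring step weight a shared sentence more heavily than a shared document.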

quality scores

score calibration

combine all evidence

Szklarczyk et al., Nucleic Acids Research, 2015

string-db.org
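The slides do not give the formula for combining evidence. One simple possibility, once all scores are calibrated probabilities on a common scale, is to treat the evidence channels as independent and compute the probability that at least one of them is right. The sketch below assumes exactly that and leaves out any correction for prior probability; the function name is hypothetical.

```python
import math


def combine(scores):
    """Combine per-channel probabilities, assuming the channels are independent.

    Returns 1 - prod(1 - s_i): the probability that at least one channel
    correctly supports the association.
    """
    return 1.0 - math.prod(1.0 - s for s in scores)


# Two channels at 0.5 each give a combined score of 0.75.
combined = combine([0.5, 0.5])
```

The combined score is never lower than the strongest single channel, and several weak channels can add up to a confident association.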

make it available

web resource

download files

REST API

Bioconductor package

Cytoscape App
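As an illustration of the REST API route, the sketch below only builds a request URL and does not contact the server. It assumes the URL pattern `https://string-db.org/api/<format>/network` with `identifiers` and `species` parameters; the helper name is hypothetical, and the current API documentation on string-db.org should be checked before relying on it.

```python
from urllib.parse import urlencode


def network_url(identifiers, species, fmt="tsv"):
    """Build a network request URL for the given protein names (assumed pattern)."""
    params = {
        # Multiple identifiers are joined by a carriage return (%0D when encoded).
        "identifiers": "\r".join(identifiers),
        "species": species,  # NCBI taxonomy identifier, e.g. 9606 for human
    }
    return f"https://string-db.org/api/{fmt}/network?{urlencode(params)}"


url = network_url(["CDK1", "CCNB1"], 9606)
```

The resulting URL can be fetched with any HTTP client.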

Acknowledgments

Damian Szklarczyk

Michael Kuhn

Andrea Franceschini

Milan Simonovic

Alexander Roth

Sune Pletscher-Frankild

John “Scooter” Morris

Christian von Mering

Peer Bork