STRING
Protein networks from data and text mining
Lars Juhl Jensen
9.6 million proteins
functional associations
genomic context
gene fusion
Korbel et al., Nature Biotechnology, 2004
gene neighborhood
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
experimental data
gene coexpression
physical interactions
Jensen & Bork, Science, 2008
curated knowledge
protein complexes
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
many databases
different formats
different identifiers
variable quality
not comparable
hard work
parsers
mapping files
quality scores
von Mering et al., Nucleic Acids Research, 2005
score calibration
von Mering et al., Nucleic Acids Research, 2005
implicit weighting by quality
common scale
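A minimal sketch of what score calibration could look like, assuming each raw score comes with a benchmark label saying whether the association is confirmed by a trusted reference set. The slides do not spell out the procedure; the equal-size binning, the function names and the toy numbers are illustrative assumptions.

```python
from bisect import bisect_right

def fit_calibration(raw_scores, labels, bin_size=4):
    """Sort (raw score, label) pairs, cut them into bins of bin_size,
    and record the fraction of benchmark-confirmed pairs in each bin."""
    pairs = sorted(zip(raw_scores, labels))
    thresholds, precisions = [], []
    for start in range(0, len(pairs), bin_size):
        chunk = pairs[start:start + bin_size]
        thresholds.append(chunk[0][0])  # lowest raw score in this bin
        precisions.append(sum(label for _, label in chunk) / len(chunk))
    return thresholds, precisions

def calibrated(raw, thresholds, precisions):
    """Look up the calibrated score of the bin that a raw score falls into."""
    i = bisect_right(thresholds, raw) - 1
    return precisions[max(i, 0)]  # scores below the first bin use the first bin

# Toy data (hypothetical): raw scores from one source, 1 = confirmed by benchmark
raw = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
thresholds, precisions = fit_calibration(raw, labels)
print(calibrated(6.5, thresholds, precisions))
```

Because every source is mapped to the same kind of number (the fraction of confirmed pairs), scores from different sources become comparable, and low-quality sources end up with low scores. Binned estimates are not guaranteed to increase with the raw score; a real calibration would usually fit a smooth curve.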
missing most of the data
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
comprehensive lexicon
cyclin dependent kinase 1
CDC2
orthographic variation
spaces and hyphens
cyclin dependent kinase 1
cyclin-dependent kinase 1
prefixes and suffixes
CDC2
hCdc2
“black list”
SDS
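The lexicon, orthographic variation and black list can be sketched as a small dictionary-based tagger. This is an illustration only, not the tagger behind STRING: the toy lexicon, the `SDS` entry, the function names and the rule of allowing a single-letter prefix are assumptions, and suffixes are not handled.

```python
import re

# Toy lexicon (hypothetical): entity identifier -> known names
LEXICON = {
    "CDK1": ["cyclin dependent kinase 1", "CDC2"],
    "SDS": ["SDS"],
}
# Names that are too ambiguous in running text to be tagged
BLACKLIST = {"sds"}

def build_pattern(name):
    """Build a case-insensitive regex for one lexicon name that accepts
    a space, a hyphen, or nothing between its tokens."""
    tokens = re.split(r"[\s-]+", name.strip())
    core = r"[\s-]?".join(re.escape(t) for t in tokens)
    # Optional one-letter prefix such as the "h" in hCdc2 (an assumption)
    return re.compile(
        r"(?<![A-Za-z0-9])[a-z]?" + core + r"(?![A-Za-z0-9])",
        re.IGNORECASE,
    )

def tag(text, lexicon=LEXICON, blacklist=BLACKLIST):
    """Return sorted (start, end, entity) tuples for every lexicon match."""
    hits = []
    for entity, names in lexicon.items():
        for name in names:
            if name.lower() in blacklist:
                continue  # skip black-listed names entirely
            for m in build_pattern(name).finditer(text):
                hits.append((m.start(), m.end(), entity))
    return sorted(hits)

text = "hCdc2 is also called cyclin-dependent kinase 1; samples were run on SDS gels."
for start, end, entity in tag(text):
    print(text[start:end], "->", entity)
```

In this example both spellings resolve to the same entity, while the black-listed SDS is ignored even though the lexicon contains it.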
co-mentioning
counting
within documents
within paragraphs
within sentences
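Counting co-mentions at the three levels can be sketched as follows. This is a rough illustration that assumes paragraphs are separated by blank lines and sentences end with ".", "!" or "?"; the naive whole-word name matching and all function names are assumptions. How the counts are weighted and turned into scores is not covered by the slides and is not shown here.

```python
import re
from collections import Counter
from itertools import combinations

def entities_in(text, names):
    """Return the set of names mentioned in text (naive whole-word match)."""
    return {n for n in names if re.search(r"\b" + re.escape(n) + r"\b", text)}

def comention_counts(documents, names):
    """Count, for each pair of names, how many documents, paragraphs
    and sentences mention both."""
    counts = {"document": Counter(), "paragraph": Counter(), "sentence": Counter()}
    for doc in documents:
        paragraphs = [p for p in doc.split("\n\n") if p.strip()]
        sentences = [
            s for p in paragraphs
            for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()
        ]
        units = {"document": [doc], "paragraph": paragraphs, "sentence": sentences}
        for level, chunks in units.items():
            for chunk in chunks:
                found = sorted(entities_in(chunk, names))
                for pair in combinations(found, 2):
                    counts[level][pair] += 1
    return counts

# Toy corpus (hypothetical text)
docs = ["CDK1 binds CCNB1. WEE1 is a kinase.\n\nWEE1 inhibits CDK1."]
counts = comention_counts(docs, ["CDK1", "CCNB1", "WEE1"])
print(counts["sentence"])
```

Pairs that co-occur only at the document level are weaker evidence than pairs sharing a sentence, which is why the three levels are kept apart.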
quality scores
score calibration
combine all evidence
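One simple way to combine calibrated scores from several evidence channels is a "noisy-OR", sketched below. It assumes that each score is a probability and that the channels are independent; the slides do not give the formula, and any correction for prior probability is left out. The function name is hypothetical.

```python
from math import prod

def combine_scores(scores):
    """Combine per-channel probabilities as 1 - prod(1 - s_i),
    i.e. the probability that at least one channel is right."""
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    return 1.0 - prod(1.0 - s for s in scores)

# Two channels, each 50% confident, give 75% combined
print(combine_scores([0.5, 0.5]))
```

The combined score is never lower than the best single channel, so weak evidence from several sources can add up to a confident association.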
Szklarczyk et al., Nucleic Acids Research, 2015
string-db.org
make it available
web resource
download files
REST API
Bioconductor package
Cytoscape App
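As an illustration of the REST API route, the sketch below only builds a request URL and sends nothing. The endpoint layout (`/api/tsv/network`) and the parameter names `identifiers` and `species` are written from memory of the public STRING documentation and should be checked against the current API reference before use; the helper name is hypothetical.

```python
from urllib.parse import urlencode

BASE = "https://string-db.org/api"

def network_url(identifiers, species, output_format="tsv"):
    """Build a URL for a network query (assumed endpoint and parameters).
    Multiple identifiers are joined with a carriage return, which
    urlencode turns into %0D."""
    params = {
        "identifiers": "\r".join(identifiers),
        "species": species,  # NCBI taxonomy ID, e.g. 9606 for human
    }
    return f"{BASE}/{output_format}/network?{urlencode(params)}"

print(network_url(["CDK1", "CCNB1"], species=9606))
```

The resulting URL can be fetched with any HTTP client; the tab-separated response is then easy to load into a table.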
Acknowledgments
Damian Szklarczyk
Michael Kuhn
Andrea Franceschini
Milan Simonovic
Alexander Roth
Sune Pletscher-Frankild
John “Scooter” Morris
Christian von Mering
Peer Bork