Large-scale integration of data and text
Lars Juhl Jensen
association networks
guilt by association
Szklarczyk et al., Nucleic Acids Research, 2015 (string-db.org)
Korbel et al., Nature Biotechnology, 2004
bidirectional promoters
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
Cell
Cellulosomes
Cellulose
experimental data
gene coexpression
physical interactions
Jensen & Bork, Science, 2008
genetic interactions
Beyer et al., Nature Reviews Genetics, 2007
curated knowledge
Letunic & Bork, Trends in Biochemical Sciences, 2008
different formats
different identifiers
von Mering et al., Nucleic Acids Research, 2005
calibrate vs. gold standard
von Mering et al., Nucleic Acids Research, 2005
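Calibration against a gold standard can be pictured as turning raw evidence scores into the fraction of gold-standard pairs observed at each score level. The following is a minimal sketch under stated assumptions, not the procedure from the cited paper: the equal-size binning, the function names `calibrate` and `to_probability`, and the input layout are all hypothetical.

```python
def calibrate(labelled_scores, bin_size):
    """Build a calibration curve from (raw_score, is_true_positive) tuples.

    Pairs are ranked by raw score and cut into bins of bin_size; each bin
    yields (lowest raw score in the bin, fraction of true positives).
    """
    ranked = sorted(labelled_scores, key=lambda item: item[0], reverse=True)
    curve = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        precision = sum(1 for _, is_true in chunk if is_true) / len(chunk)
        curve.append((chunk[-1][0], precision))
    return curve


def to_probability(raw_score, curve):
    """Return the precision of the highest bin whose lower bound is reached.

    Scores below every bin fall back to the lowest bin (an assumption).
    """
    for lower_bound, precision in curve:
        if raw_score >= lower_bound:
            return precision
    return curve[-1][1] if curve else 0.0


# Toy data: raw scores for pairs that the gold standard labels true or false.
labelled = [(0.9, True), (0.8, True), (0.7, False),
            (0.6, True), (0.5, False), (0.4, False)]
curve = calibrate(labelled, bin_size=3)
```

Because every evidence type is mapped onto the same probability-like scale, scores from different sources become comparable, which is the point of calibrating before combining them.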
homology-based transfer
Franceschini et al., Nucleic Acids Research, 2013
missing most of the data
as smart as a dog
teach it specific tricks
named entity recognition
comprehensive lexicon
cyclin dependent kinase 1
flexible matching
cyclin dependent kinase 1
cyclin-dependent kinase 1
orthographic variation
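Flexible matching of this kind can be sketched by letting spaces and hyphens between the tokens of a lexicon name be interchangeable or absent, and by ignoring case. This is an illustrative assumption about one way to do it, not the tagger behind the resources cited here; `build_pattern` and `find_names` are hypothetical names.

```python
import re


def build_pattern(name):
    """Compile a regex that tolerates space/hyphen variation between tokens."""
    tokens = re.split(r"[\s\-]+", name.strip())
    body = r"[\s\-]?".join(re.escape(token) for token in tokens)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def find_names(text, lexicon):
    """Return sorted (start, end, lexicon name) tuples for every match in text."""
    hits = []
    for canonical in lexicon:
        for match in build_pattern(canonical).finditer(text):
            hits.append((match.start(), match.end(), canonical))
    return sorted(hits)


lexicon = ["cyclin dependent kinase 1"]
text = "Cyclin-dependent kinase 1 is active; cyclin dependent kinase 10 is not listed."
hits = find_names(text, lexicon)
```

The word boundaries keep "kinase 1" from matching inside "kinase 10", so only the first mention in the example text is reported.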
information extraction
within paragraphs
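The simplest form of extraction within paragraphs is to count how often two recognized entities are mentioned in the same paragraph. The sketch below assumes named entity recognition has already produced one set of identifiers per paragraph; the function name `cooccurrence_counts` and the input layout are hypothetical, and any scoring or weighting on top of the raw counts is left out.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(paragraph_entities):
    """Count, for each unordered entity pair, the paragraphs mentioning both.

    paragraph_entities: iterable of collections of entity identifiers,
    one collection per paragraph.
    """
    counts = Counter()
    for entities in paragraph_entities:
        # Sorting gives each unordered pair a single canonical key.
        for first, second in combinations(sorted(set(entities)), 2):
            counts[(first, second)] += 1
    return counts


paragraphs = [
    {"CYC1", "HAP1"},
    {"CYC1", "CYC7", "HAP1"},
    {"CYC7"},
]
counts = cooccurrence_counts(paragraphs)
```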
NLP: Natural Language Processing
grammatical analysis
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
Saric et al., Proceedings of ACL, 2004
related web resources
STRING + 300k chemicals
Kuhn et al., Nucleic Acids Research, 2014 (stitch-db.org)
Binder et al., Database, 2014 (compartments.jensenlab.org)
Santos et al., submitted, 2015 (tissues.jensenlab.org)
Frankild et al., Methods, 2015 (diseases.jensenlab.org)
general framework
curated knowledge
experimental data
computational predictions
common identifiers
Swiss army knife syndrome
targeted resources
common infrastructure
medical data mining
Jensen et al., Nature Reviews Genetics, 2012
civil registration system
established in 1968
Jensen et al., Nature Reviews Genetics, 2012
national discharge registry
6.2 million patients
119 million diagnoses
Jensen et al., Nature Reviews Genetics, 2012
guilt by association
Jensen et al., Nature Reviews Genetics, 2012
confounding factors
type of hospital encounter
Jensen et al., Nature Communications, 2014
“known unknowns”
“unknown unknowns”
reporting biases
matched controls
temporal correlations
Jensen et al., Nature Communications, 2014
trajectory networks
Jensen et al., Nature Communications, 2014
complex networks
Jensen et al., Nature Communications, 2014
direct medical implications
medical text mining
pharmacovigilance
unstructured data
comprehensive lexicon
Clozapine
clozapin
clossapin
klozapine
chlosapin
chlosapine
chlozapin
chlozapine
klossapin
closapine
klozapin
klosapin
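Spelling variants like these can be folded onto one lexicon entry by reducing each word to a coarse spelling key. The substitution rules below (k to c, ch to c, z to s, ss to s, drop a final e) are hypothetical and chosen only to cover the variants listed above; they are not the method of the cited work and would over-merge names in a real drug lexicon.

```python
def spelling_key(word):
    """Reduce a word to a coarse key that ignores common spelling variation."""
    key = word.lower()
    # Order matters: "k" becomes "c" before "ch" is collapsed to "c".
    for old, new in (("k", "c"), ("ch", "c"), ("z", "s"), ("ss", "s")):
        key = key.replace(old, new)
    return key[:-1] if key.endswith("e") else key


def build_index(drug_names):
    """Map each spelling key to its canonical drug name."""
    return {spelling_key(name): name for name in drug_names}


def lookup(word, index):
    """Return the canonical drug name for a possibly misspelled word, or None."""
    return index.get(spelling_key(word))


index = build_index(["Clozapine"])
variants = ["clozapin", "clossapin", "klozapine", "chlosapin", "chlosapine",
            "chlozapin", "chlozapine", "klossapin", "closapine", "klozapin",
            "klosapin"]
resolved = [lookup(variant, index) for variant in variants]
```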
adverse drug events
rule-based system
Eriksson et al., Drug Safety, 2014
Drug introduction
Drug discontinuation
Identification start
Adverse event
Negative modifier
Indication
Pre-existing condition
Adverse drug reaction
Possible adverse drug reaction
ADR of additional drug
Eriksson et al., Drug Safety, 2014
new adverse drug reactions
Eriksson et al., Drug Safety, 2014
Drug substance        ADE                  p-value
Chlordiazepoxide      Nystagmus            4.0e-8
Simvastatin           Personality changes  8.4e-8
Dipyridamole          Visual impairment    4.4e-4
Citalopram            Psychosis            8.8e-4
Bendroflumethiazide   Apoplexy             8.5e-3
estimate ADR frequencies
Eriksson et al., Drug Safety, 2014
Acknowledgments
STRING/STITCH: Michael Kuhn, Damian Szklarczyk, Andrea Franceschini, Milan Simonovic, Alexander Roth, Sune Pletscher-Frankild, Jianyi Lin, Pablo Minguez, Christian von Mering, Peer Bork
Text mining: Sune Pletscher-Frankild, Jasmin Saric, Evangelos Pafilis, Alberto Santos, Janos Binder, Kalliopi Tsafou, Heiko Horn, Michael Kuhn, Reinhardt Schneider, Sean O'Donoghue
EHR mining: Anders Boeck Jensen, Robert Eriksson, Peter Bjødstrup Jensen, Andreas Bok Andersen, Sabrina Gade Ellesøe, Henriette Schmock, Tudor Oprea, Pope Moseley, Thomas Werge, Søren Brunak