One tagger, many uses: Illustrating the power of dictionary-based named entity recognition

Posted on 28-Jan-2018

Lars Juhl Jensen @larsjuhljensen

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

dictionary

genes / proteins

diseases

expansion rules

prefixes and suffixes

curated blacklist

SDS
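
The ingredients listed above (a dictionary, expansion rules, prefixes and suffixes, and a curated blacklist) can be sketched in a few lines. This is a minimal illustration, not the C++ tagger's actual implementation: the dictionary entries, identifiers, the hyphen/space expansion rule, the "h" prefix, and the "p" suffix are all assumptions made for the example. Blacklisting "SDS" assumes the slide refers to an acronym that is too ambiguous to tag reliably.

```python
import re

# Toy dictionary mapping lowercase names to identifiers (all hypothetical).
DICTIONARY = {
    "cdc2": "gene:1",
    "sds": "gene:2",
    "alzheimer disease": "disease:1",
}
BLACKLIST = {"sds"}   # names judged too ambiguous to tag (assumed example)
PREFIXES = ("h",)     # assumed rule: species prefix, as in "hCdc2"
SUFFIXES = ("p",)     # assumed rule: protein suffix, as in "Cdc2p"


def expand(name):
    """Return orthographic variants of a name (assumed expansion rule)."""
    variants = {name}
    match = re.fullmatch(r"([a-z]+)(\d+)", name)
    if match:
        letters, digits = match.groups()
        variants.add(f"{letters}-{digits}")
        variants.add(f"{letters} {digits}")
    return variants


def build_index():
    """Expand every non-blacklisted dictionary name into a lookup table."""
    index = {}
    for name, identifier in DICTIONARY.items():
        if name in BLACKLIST:
            continue
        for variant in expand(name):
            index[variant] = identifier
    return index


def lookup(candidate, index):
    """Match a candidate exactly, or after stripping one prefix or suffix."""
    if candidate in index:
        return index[candidate]
    for prefix in PREFIXES:
        if candidate.startswith(prefix) and candidate[len(prefix):] in index:
            return index[candidate[len(prefix):]]
    for suffix in SUFFIXES:
        if candidate.endswith(suffix) and candidate[:-len(suffix)] in index:
            return index[candidate[:-len(suffix)]]
    return None


def tag(text, index, max_words=3):
    """Return (start, end, identifier) for the longest matches, left to right."""
    words = [(m.group().lower(), m.start(), m.end())
             for m in re.finditer(r"[A-Za-z0-9-]+", text)]
    matches = []
    i = 0
    while i < len(words):
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(word for word, _, _ in words[i:i + n])
            identifier = lookup(candidate, index)
            if identifier:
                matches.append((words[i][1], words[i + n - 1][2], identifier))
                i += n
                break
        else:
            i += 1
    return matches
```

Calling `tag(text, build_index())` returns character offsets together with identifiers, so the same function can feed different downstream uses without changes.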

software

C++ tagger

>1000 abstracts / second

70–80% recall

80–90% precision
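
For readers unfamiliar with the two figures above: recall is the fraction of true mentions that the tagger finds, and precision is the fraction of tagged mentions that are correct. A minimal sketch of the arithmetic, assuming matches are compared as exact (start, end, identifier) tuples against a manually annotated gold standard. The function name and the matching criterion are illustrative assumptions, not the evaluation protocol behind the numbers on the slide.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall from two collections of annotations."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```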

open source: bitbucket.org/larsjuhljensen/tagger/

Docker: hub.docker.com/r/larsjuhljensen/tagger/

web service: tagger.jensenlab.org

community resources

Extract: extract.jensenlab.org

STRING: string-db.org

DISEASES: diseases.jensenlab.org

Cytoscape

curated knowledge

experimental data

co-occurrence text mining

Medline abstracts
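
Co-occurrence text mining can be illustrated with a simple count: once each abstract has been reduced to the set of entities tagged in it, pairs that appear together more often than expected by chance become candidate associations. The sketch below is a simplified stand-in, assuming a plain observed-over-expected ratio at the abstract level. It is not the scoring scheme used by STRING or DISEASES, and the function names and identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count single and pairwise entity occurrences over documents.

    docs is an iterable of sets of entity identifiers, one set per abstract.
    """
    single = Counter()
    pair = Counter()
    n_docs = 0
    for entities in docs:
        n_docs += 1
        ordered = sorted(set(entities))
        single.update(ordered)
        pair.update(combinations(ordered, 2))  # keys are sorted tuples
    return n_docs, single, pair


def observed_over_expected(a, b, n_docs, single, pair):
    """Ratio of observed co-mentions to the count expected under independence."""
    expected = single[a] * single[b] / n_docs if n_docs else 0.0
    observed = pair[tuple(sorted((a, b)))]
    return observed / expected if expected else 0.0
```

A ratio well above 1 suggests that the two entities are mentioned together more often than their individual frequencies would predict.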

<1 km

15 million full-text articles

Westergaard et al., bioRxiv, 2017

~50% more associations

electronic health records

Jensen et al., Nature Reviews Genetics, 2012

in Danish

dictionary

drugs

adverse events

in Danish

named entity recognition

temporal correlations

[Figure: timeline from drug introduction to drug discontinuation, with the identification start marked. Adverse event mentions are labeled as: negative modifier, indication, pre-existing condition, adverse drug reaction, possible adverse drug reaction, or ADR of additional drug.]

Eriksson et al., Drug Safety, 2014

find novel associations

summary

broadly applicable

keep it simple

free tools

Acknowledgments

Evangelos Pafilis, Sune Pletscher-Frankild, Nadezhda Doncheva, Damian Szklarczyk, Michael Kuhn, Robert Eriksson, John “Scooter” Morris, Tudor Oprea, Christian von Mering, Peer Bork, Christos Arvanitidis, Søren Brunak