Date post: | 20-Jan-2016 |
Category: | Documents |
Upload: | clara-quinn |
Lars Juhl Jensen
Biomedical text mining
exponential growth
~45 seconds per paper
information retrieval
named entity recognition
augmented browsing
text corpora
information extraction
information retrieval
find the relevant papers
ad hoc retrieval
user-specified query
“yeast AND cell cycle”
PubMed
indexing
fast lookup
stemming
word endings
dynamic query expansion
MeSH terms
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1
and this modification served as a priming step to promote subsequent
Cdc5-dependent Swe1 hyperphosphorylation and degradation
no tool will find that
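The indexing, stemming and ad hoc retrieval steps listed above can be sketched in a few lines. This is a minimal illustration, not how PubMed is implemented: the `stem` function is a deliberately crude stand-in for a real stemmer, the three documents are invented, and all function names are hypothetical.

```python
import re
from collections import defaultdict

def stem(word):
    """Crude stemmer (an assumption for illustration): strip a plural 's'."""
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def tokenize(text):
    """Lower-case the text, split it into alphanumeric tokens, and stem each one."""
    return [stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]

def build_index(docs):
    """Inverted index: stemmed token -> set of document ids (gives fast lookup)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search_and(index, query):
    """Return the ids of documents that contain every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Invented toy corpus
docs = {
    1: "Yeast cell cycle regulation by cyclins",
    2: "Cell cycles in human cancer cells",
    3: "Yeast genetics and metabolism",
}
index = build_index(docs)
print(search_and(index, "yeast cell cycle"))
print(search_and(index, "cell cycles"))
```

Because stemming maps "cycles" and "cycle" to the same index entry, the second query matches both documents 1 and 2. Query expansion with MeSH terms is not covered by this sketch.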
named entity recognition
computer
as smart as a dog
teach it specific tricks
identify the concepts
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1
and this modification served as a priming step to promote subsequent
Cdc5-dependent Swe1 hyperphosphorylation and degradation
comprehensive lexicon
proteins
chemicals
compartments
tissues
diseases
organisms
CDC2
cyclin dependent kinase 1
orthographic variation
upper- and lower-case
CDC2
Cdc2
spaces and hyphens
cyclin dependent kinase 1
cyclin-dependent kinase 1
prefixes and postfixes
CDC2
hCDC2
“black list”
SDS
scalable implementation
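A dictionary-based tagger that tolerates the orthographic variation listed above (case, spaces versus hyphens, a species prefix such as "h") and honours a black list could look roughly like the following. It is a sketch under stated assumptions: the lexicon entries, the entity identifiers, the prefix list and the function names are all hypothetical, and the scan over every lexicon name is not the scalable implementation the slides refer to.

```python
import re

def normalize(text):
    """Lower-case the text and treat hyphens and runs of whitespace as one space."""
    return re.sub(r"[\s\-]+", " ", text.lower()).strip()

def build_lexicon(entries, blacklist):
    """Map each normalized name to its entity id, skipping black-listed names."""
    blocked = {normalize(b) for b in blacklist}
    lexicon = {}
    for entity_id, names in entries.items():
        for name in names:
            key = normalize(name)
            if key not in blocked:
                lexicon[key] = entity_id
    return lexicon

def tag(text, lexicon, prefixes=("h",)):
    """Return sorted (offset, matched text, entity id) tuples.

    Offsets refer to the normalized text, not to the original string.
    """
    norm = normalize(text)
    prefix_pattern = "(?:" + "|".join(re.escape(p) for p in prefixes) + ")?"
    hits = []
    for name, entity_id in lexicon.items():
        # Require non-alphanumeric context on both sides so names do not match inside words.
        pattern = r"(?<![a-z0-9])" + prefix_pattern + re.escape(name) + r"(?![a-z0-9])"
        for m in re.finditer(pattern, norm):
            hits.append((m.start(), m.group(0), entity_id))
    return sorted(hits)

# Hypothetical lexicon; identifiers are placeholders, not real database ids.
entries = {
    "CDC2": ["CDC2", "cyclin dependent kinase 1"],
    "SDS": ["SDS"],
}
lexicon = build_lexicon(entries, blacklist=["SDS"])
text = "hCDC2 (cyclin-dependent kinase 1) was resolved on an SDS gel; Cdc2 levels rose."
hits = tag(text, lexicon)
for hit in hits:
    print(hit)
```

The black list keeps "SDS" from being tagged as a protein in a sentence where it refers to the gel reagent, at the cost of also missing genuine mentions of that name.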
text corpora
>10 km
<10 hours
most use Medline
~22 million abstracts
few use full-text articles
no access
PDF files
layout-aware extraction
millions of full-text articles
information extraction
formalize the facts
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1
and this modification served as a priming step to promote subsequent
Cdc5-dependent Swe1 hyperphosphorylation and degradation
two approaches
co-mentioning
counting
within documents
within paragraphs
within sentences
co-mentioning score
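The counting at document, paragraph and sentence level can be illustrated as follows. This is only a sketch: the weights are hypothetical, the input is assumed to be already tagged with entity names, and a real co-mentioning score would typically also normalize for how often each entity is mentioned overall, which is omitted here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical weights: closer co-mentions contribute more to the score.
WEIGHTS = {"document": 1, "paragraph": 2, "sentence": 3}

def pairs(entities):
    """All unordered pairs of distinct entities, in a canonical (sorted) order."""
    return combinations(sorted(set(entities)), 2)

def comention_scores(corpus, weights=WEIGHTS):
    """Sum weighted co-mention counts over a corpus.

    corpus: list of documents; document: list of paragraphs;
    paragraph: list of sentences; sentence: set of entity names.
    """
    scores = Counter()
    for document in corpus:
        doc_entities = set()
        for paragraph in document:
            par_entities = set()
            for sentence in paragraph:
                for pair in pairs(sentence):
                    scores[pair] += weights["sentence"]
                par_entities |= set(sentence)
            for pair in pairs(par_entities):
                scores[pair] += weights["paragraph"]
            doc_entities |= par_entities
        for pair in pairs(doc_entities):
            scores[pair] += weights["document"]
    return scores

# One invented document with two paragraphs
corpus = [
    [
        [{"Cdc28", "Swe1"}, {"Cdc5"}],  # paragraph 1: two sentences
        [{"Clb2"}],                     # paragraph 2: one sentence
    ]
]
scores = comention_scores(corpus)
for pair, score in scores.most_common():
    print(pair, score)
```

In this toy corpus Cdc28 and Swe1 share a sentence, so they collect the sentence, paragraph and document weights, whereas Cdc28 and Clb2 only share the document.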
NLP
Natural Language Processing
grammatical analysis
part-of-speech tagging
multiword detection
semantic tagging
sentence parsing
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
extract stated facts
high precision
poor recall
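The trade-off between high precision and poor recall can be seen even in a toy extractor. The sketch below does not perform part-of-speech tagging or sentence parsing; it swaps in two hand-written regular-expression patterns built around cue verbs, with a hypothetical gene lexicon and hypothetical function names, purely to show why extracting explicitly stated facts misses anything phrased differently.

```python
import re

# Toy lexicon for illustration only
GENES = {"CYC1", "CYC7", "HAP1", "Cdc28", "Swe1"}

# Hypothetical cue-verb patterns
PASSIVE = re.compile(r"(?P<targets>.+?)\s+is controlled by\s+(?P<agent>\w+)")
ACTIVE = re.compile(
    r"(?P<agent>\w+)(?:\s+\([^)]*\))?\s+(?:directly\s+)?phosphorylated\s+(?P<target>\w+)"
)

def genes_in(text):
    """Return the known gene names in the text, in order of appearance."""
    return [t for t in re.findall(r"\w+", text) if t in GENES]

def extract(sentence):
    """Return (agent, relation, target) triples for the two supported patterns."""
    facts = []
    m = PASSIVE.search(sentence)
    if m and m.group("agent") in GENES:
        for target in genes_in(m.group("targets")):
            facts.append((m.group("agent"), "controls", target))
    m = ACTIVE.search(sentence)
    if m and m.group("agent") in GENES and m.group("target") in GENES:
        facts.append((m.group("agent"), "phosphorylates", m.group("target")))
    return facts

print(extract("The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1"))
print(extract("Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1"))
print(extract("Swe1 is a substrate of Cdc28"))
```

The first two sentences yield the stated facts, while the third expresses a related fact in wording that neither pattern covers and therefore yields nothing.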
Exercise
Go to http://diseases.jensenlab.org
Find TYMS disease associations
Inspect the text-mining evidence
Look for examples of synonym usage
Find genes linked to colorectal cancer
thank you!