Date post: 17-Dec-2015
An NLP Ecosystem for Development and Use of Natural Language Processing in the Clinical Domain Wendy W. Chapman, PhD Division of Biomedical Informatics University of California, San Diego Integrating Data for Analysis, Anonymization, and Sharing
Transcript
Page 1: An NLP Ecosystem for Development and Use of Natural Language Processing in the Clinical Domain Wendy W. Chapman, PhD Division of Biomedical Informatics.

An NLP Ecosystem for Development and Use of Natural Language Processing in the Clinical Domain

Wendy W. Chapman, PhD

Division of Biomedical Informatics
University of California, San Diego

Integrating Data for Analysis, Anonymization, and Sharing

Page 2

Overview

• The promise of natural language processing (NLP)

• Challenges of developing NLP in the clinical domain

• Challenges in applying NLP in the clinical domain

• iDASH

• Opportunities for sharing and collaboration in NLP

Page 3

NLP Success

“Fresh off its butt-kicking performance on Jeopardy!, IBM’s supercomputer "Watson" has enrolled in medical school at Columbia University,” New York Daily News, February 18, 2011

“IBM's computer could very well herald a whole new era in medicine." ComputerWorld, February 17, 2011

Dr. Watson??

Page 4

Clinical NLP Since the 1960s

Why has clinical NLP had little impact on clinical care?

Page 5

Barriers to Development

• Sharing clinical data difficult
  – Have not had shared datasets for development and evaluation
  – Modules trained on general English not sufficient
• Insufficient common conventions and standards for annotations
  – Data sets are unique to a lab
  – Not easily interchangeable

Page 6

• Limited collaboration
  – Clinical NLP applications silos and black boxes
  – Have not had open source applications
• Reproducibility is formidable
  – Open source release not always sufficient
  – Software engineering quality not always great
  – Mechanisms for reproducing results are sparse

Page 7

Overview

• The promise of natural language processing (NLP)

• Challenges of developing NLP in the clinical domain

• Challenges in applying NLP in the clinical domain

• Developing an NLP ecosystem on iDASH

Page 8

Security & Privacy Concerns

• Clinical texts have many patient identifiers
  – 18 HIPAA identifiers
    • Names
    • Addresses
• Items not regulated by HIPAA
  – tight end for the Steelers
• Unique cases
  – 50-year-old woman who is pregnant
• Sensitive information
  – HIV status

Institutions are reluctant to share data

Page 9

Lack of user-centered development and scalability
– Perceived cost of applying NLP outweighs the perceived benefit (Len D’Avolio)

Page 10

Overview

• The promise of natural language processing (NLP)

• Challenges of developing NLP in the clinical domain

• Challenges in applying NLP in the clinical domain

• Developing an NLP ecosystem on iDASH

Page 11

iDASH

• integrating Data
• Analysis
• Anonymization
• Sharing

Data

Computational Resources

Software/Tools

Page 12

Disincentives to Share

• ‘Scooping’ by faster analysts
• Exposure of potential errors in data
• Resources for preparing data submissions
• Maintaining data
• Interacting with potential users takes time
• Threat of privacy breach when human subjects are involved
  – Do not have policies in place
  – Fallible de-identification, anonymization algorithms

iDASH aims to minimize these disincentives

Page 13

nlp-ecosystem.ucsd.edu

Page 14

Privacy preserving

• Access control
• De-identification
• Query counts
• Artificial data generators

Digital Informed consent

HIPAA &/or FISMA Compliant Cloud

Customizable DUAs

Informed Consent Registry

Page 15

2011 summer internship program funded by NIH U54HL108460

NLP Ecosystem

Data

MT Samples

Tools & Services

Collaborative Development Tools

Virtual Machines

Evaluation Workbench

Education

Bibliography

Tutorials

Research Resources

Guidelines

Schemas

De-Identification

UCSD Clinical Data

TextVect

Annotation Admin & eHOST

Registry

Page 16

Tools & Services

Collaborative Knowledge Authoring

Virtual Machines

Evaluation Workbench

De-Identification

TextVect

Annotation Environment

Increase access to NLP

Decrease Burden of Developing NLP

Collaborative Effort to Build Ecosystem

Registry

Page 17

orbit

Increase ability to find NLP tools

Page 18

Registry: orbit.nlm.nih.gov

Len D’Avolio, Dina Demner-Fushman

Page 19

De-identification service

Increase access to clinical text

Page 20

De-identification

• Several available de-identification modules
• Need to adapt to local text
  – Efficient
  – Secure
• Customizable ensemble de-identification system
  – Build a de-identified corpus
  – Incorporate existing de-id modules
  – Launch as virtual machine
  – Iterative training, evaluation, and modification by user
    • Correct mistakes
    • Add regular expressions

Brett South, Stephane Meystre, Oscar Fernandez, Danielle Mowery
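
The "add regular expressions" step in the de-identification loop might look roughly like the sketch below. It is not the iDASH de-identification service: the pattern set, the placeholder tags, and the `deidentify` function are assumptions made for illustration. A real ensemble would combine trained modules with far more local rules and would still miss some identifiers.

```python
import re

# Hypothetical, deliberately small pattern set (an assumption for this sketch).
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
    ("MRN", re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE)),
]


def deidentify(text, extra_patterns=()):
    """Replace every pattern match with a [TAG] placeholder.

    extra_patterns lets a user add site-specific rules after reviewing
    mistakes, mirroring the iterative loop described on the slide.
    """
    for tag, pattern in list(PATTERNS) + list(extra_patterns):
        text = pattern.sub(f"[{tag}]", text)
    return text


note = "Seen on 03/14/2011, MRN: 123456. Call 555-123-4567."
print(deidentify(note))

# A locally added rule for an identifier that HIPAA does not list.
local_rules = [("TEAM", re.compile(r"Steelers"))]
print(deidentify("tight end for the Steelers", local_rules))
```

Keeping the local rules separate from the shared pattern list is one way to let each site adapt to its own text without changing the common modules.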

Page 21

TextVect

Increase access to textual features

Page 22

TextVect

NLM: Abhishek Kumar

Page 23

collaborative Knowledge Authoring Support Service (cKASS)

Decrease the Burden of Customizing an NLP Application

Page 24

Customizing an IE App

User’s Concepts
Cough
Dyspnea
Infiltrate on CXR
Wheezing
Fever
Cervical Lymphadenopathy

IE Output

Map

Page 25

Customizing an IE App

User’s Concepts
Cough
Dyspnea
Infiltrate on CXR
Wheezing
Fever
Cervical Lymphadenopathy

IE Output
Dry cough
Productive cough
Cough
Hacking cough
Bloody cough

Which concepts?

Page 26

Customizing an IE App

User’s Concepts
Cough
Dyspnea
Infiltrate on CXR
Wheezing
Fever
Cervical Lymphadenopathy

IE Output
Temp 38.0C
Low-grade temperature

What is a fever?

Page 27

Customizing an IE App

User’s Concepts
Cough
Dyspnea
Infiltrate on CXR
Wheezing
Fever
Cervical Lymphadenopathy

IE Output
NECK: no adenopathy
Disorder: adenopathy
Negation: negated

Section mapping
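
The three customization questions on the preceding slides (which concepts, what counts as a fever, and section mapping) can be sketched as a small mapping layer between IE output and the user's concepts. Everything here is an assumption made for illustration: the tables, the 38.0 C threshold, and the function names are not part of cKASS or of any particular IE system.

```python
# Hypothetical synonym table from IE output strings to the user's concepts.
SYNONYMS = {
    "dry cough": "Cough",
    "productive cough": "Cough",
    "hacking cough": "Cough",
    "bloody cough": "Cough",
    "cough": "Cough",
}

# Assumed section rule: adenopathy reported under NECK maps to the
# user's more specific concept.
SECTION_RULES = {("NECK", "adenopathy"): "Cervical Lymphadenopathy"}

# Assumption: the user decides where "fever" starts.
FEVER_THRESHOLD_C = 38.0


def map_finding(text, section=None, negated=False):
    """Return (user_concept, present), or None if nothing maps."""
    key = text.lower()
    concept = SECTION_RULES.get((section, key)) or SYNONYMS.get(key)
    if concept is None:
        return None
    return concept, not negated


def is_fever(temp_c, threshold=FEVER_THRESHOLD_C):
    """Apply the user's own definition of fever to a measured temperature."""
    return temp_c >= threshold


print(map_finding("adenopathy", section="NECK", negated=True))
print(map_finding("Dry cough"))
print(is_fever(38.0))
```

The point of the sketch is that each table encodes a decision only the user can make, which is why the slides frame customization as a knowledge-authoring burden.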

Page 28

KOS-IE
Knowledge Organization Systems for Information Extraction

Page 29

Compile information helpful for IE

Page 30

User KB

NLP Tools

Physician, Radiologist, Nurse, Clinical Researcher, Knowledge Engineer

Decision Support System

Shared KB

External KB

Collaborative Knowledge Base Development: cKASS

LQ Wang, M Conway, F Fana, M Tharp, D Hillert

Page 31

Knowledge Authoring

Augment user KB with lexical variants, synonyms, and related concepts

• User-driven authoring
  – Top-down: Provide access to external knowledge sources
    • UMLS, SPECIALIST Lexicon, BioPortal
  – Bottom-up: Annotate to derive synonyms
• Recommendation-based authoring
  – Generate lexical variants
  – Mine external knowledge sources
  – Mine patient records
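
As a rough illustration of "generate lexical variants", the sketch below applies a few naive string rules to a term. It stands in for, and is much weaker than, a real lexical resource such as the SPECIALIST Lexicon. The function name and the rules are assumptions, not the cKASS implementation.

```python
def lexical_variants(term):
    """Return a sorted list of simple surface variants of a term.

    Naive rules only: lowercasing, hyphen/space alternation, and a
    plural formed by appending "s" to the whole term.
    """
    base = term.lower().strip()
    variants = {base}
    variants.add(base.replace("-", " "))
    variants.add(base.replace(" ", "-"))
    if not base.endswith("s"):
        variants.add(base + "s")
    return sorted(variants)


print(lexical_variants("Low-grade temperature"))
print(lexical_variants("Cough"))
```

Variants produced this way would be recommendations for a user to accept or reject, since string rules alone also generate forms that never occur in clinical text.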

Page 32

Evaluation workbench

Decrease the Burden of Evaluation & Error Analysis

Page 33

Evaluation Workbench

• Compare the output of two NLP annotators on clinical text
  • NLP system vs human annotation
• View annotations
• Calculate outcome measures
• Drill down to all levels of annotation
  • Document-level
• Perform error analysis
  • Future versions will support formal error analysis
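
One common set of outcome measures for comparing an NLP system against human annotation is precision, recall, and F1 over exactly matching annotations. The sketch below assumes each annotation is a tuple `(doc_id, start, end, label)`. That representation and the function name are illustrative choices, not the Evaluation Workbench's own data model.

```python
def outcome_measures(reference, system):
    """Precision, recall, and F1 for exact-match annotations.

    reference: annotations from the human annotator (treated as truth).
    system: annotations produced by the NLP system.
    """
    reference, system = set(reference), set(system)
    tp = len(reference & system)
    precision = tp / len(system) if system else 0.0
    recall = tp / len(reference) if reference else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "tp": tp,
        "fp": len(system - reference),
        "fn": len(reference - system),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


human = [("r1", 0, 10, "cough"), ("r1", 20, 25, "fever"), ("r2", 5, 12, "dyspnea")]
nlp = [("r1", 0, 10, "cough"), ("r2", 5, 12, "dyspnea"), ("r2", 30, 36, "fever")]
measures = outcome_measures(human, nlp)
print(measures)
```

The false-positive and false-negative sets (`system - reference` and `reference - system`) are the natural starting point for the error analysis the slide mentions.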

Page 34

Levels of Annotation

• Document – Report classified as Shigellosis

• Group – Section classified as Past Medical History Section

• Utterance – Group of text classified as Sentence

• Snippet – “chest pain” classified as CUI 058273

• Word – “pain” classified as noun

• Token – “.” classified as EOS marker
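
The six levels above could be represented with one annotation record that carries its level, its character span, and its classification. The class and field names below are assumptions for illustration, and the classification strings are copied from the slide.

```python
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    DOCUMENT = "document"
    GROUP = "group"
    UTTERANCE = "utterance"
    SNIPPET = "snippet"
    WORD = "word"
    TOKEN = "token"


@dataclass
class Annotation:
    level: Level
    start: int  # character offset, inclusive
    end: int  # character offset, exclusive
    classification: str
    attributes: dict = field(default_factory=dict)


def covered_text(doc, ann):
    """Return the text span an annotation covers."""
    return doc[ann.start:ann.end]


doc = "Patient reports chest pain."
snippet = Annotation(Level.SNIPPET, 16, 26, "CUI 058273")
word = Annotation(Level.WORD, 22, 26, "noun")
token = Annotation(Level.TOKEN, 26, 27, "EOS marker")

for ann in (snippet, word, token):
    print(ann.level.value, repr(covered_text(doc, ann)), ann.classification)
```

Using one record type for all levels is what makes it possible to drill down from a document-level classification to the snippets and tokens beneath it.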


Page 35

Document & annotations

Outcome Measures for Selected Annotations

Select Classifications to View

Report List

Attributes for Selected Annotation

Relationships for Selected Annotation

VA and ONC SHARP: Christensen, Murphy, Frabetti, Rodriguez, Savova

Page 36

Annotation Environment

Decrease the Burden of Annotation

Page 37

Challenges to Annotating

• Time consuming
  – Recruiting & training annotators for high agreement
• Expensive
  – Domain experts especially expensive
  – Need for annotation by multiple people
• Challenging to design annotation task
  – How many annotators?
  – How should I quantify quality of annotations?
• Logistically challenging
  – Managing files and batches of reports
  – Setting up annotation tool
• Reinventing the wheel
  – Hasn’t someone created a schema for this before?
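
One widely used answer to "How should I quantify quality of annotations?" is chance-corrected agreement between two annotators, for example Cohen's kappa. The slides do not say which measure iDASH uses, so the sketch below is only an example, and the function name is an assumption.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
print(cohen_kappa(annotator_1, annotator_2))
```

Here the annotators agree on three of four items (0.75) while chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.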

Page 38

How can we reduce the burden of annotation?

Page 39

iDASH Annotation Environment

Annotation Admin

eHOST

Web application

iDASH cloud

Client app on your computer

VA, SHARP, and NIGMS: S Duvall, B South, G Savova, N Elhadad, H Hochheiser

Goal: provide an environment to decrease the burden of annotation for research and application

Annotator Registry

Page 40

Annotator Registry

• Enlist for annotation
• Certify for annotation tasks
  – Personal health information
  – Part-of-speech tagging
  – UMLS mapping
• Set pay rate
• Searchable
• Available for inclusion in new annotation task

http://idash.ucsd.edu/nlp-annotator-registry

Page 41

Annotation Admin: Intended Users & Uses

Users
• NLP researchers
• Annotation administrators

Uses
• Manage annotation projects – who annotates what
  – Currently done with hundreds of files on hard drive
• Integrate with annotation tool (eHOST)
  – Download batches of raw reports to annotators
  – Upload and store annotated reports
• Manage simple annotation projects
• Facilitate distributed annotation
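
The "who annotates what" bookkeeping can be sketched as splitting reports into batches and assigning each batch to several annotators. This is not Annotation Admin's actual logic: the function names, the round-robin policy, and the default of two annotators per batch are all assumptions for illustration.

```python
from itertools import cycle


def make_batches(report_ids, batch_size):
    """Split a list of report IDs into consecutive batches."""
    return [report_ids[i:i + batch_size]
            for i in range(0, len(report_ids), batch_size)]


def assign_batches(batches, annotators, per_batch=2):
    """Assign each batch to `per_batch` distinct annotators, round-robin.

    Assumes the annotator names are unique.
    """
    if per_batch > len(annotators):
        raise ValueError("not enough annotators for the requested overlap")
    pool = cycle(annotators)
    return {index: [next(pool) for _ in range(per_batch)]
            for index, _ in enumerate(batches)}


reports = ["r1", "r2", "r3", "r4", "r5"]
batches = make_batches(reports, batch_size=2)
assignment = assign_batches(batches, ["ann_a", "ann_b", "ann_c"])
print(batches)
print(assignment)
```

Assigning every batch to at least two annotators is what makes agreement measurable later, at the cost of more annotation time.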

Page 42

1. Assign annotators to a task

Annotation Admin

Page 43

2. Create a Schema

Brett South
eHOST does this too, so there is some redundancy
Page 44

3. Assign users and set time expectations

Page 45

4. Keep track of progress

Page 46

Tools & Services

Collaborative Knowledge Authoring

Virtual Machines

Evaluation Workbench

De-Identification

TextVect

Annotation Environment

Increase access to NLP

Decrease Burden of Developing NLP

Collaborative Effort to Build Resources

Registry

Page 47

Conclusion

• More demand for EHR data
  – NLP has potential to extend value of narrative clinical reports
• There have been many barriers
  – To development
  – To deployment
• Recent developments facilitate collaboration & sharing
  – Common annotation conventions
  – Privacy algorithms
  – Shared datasets
  – Hosted environments
• iDASH hopes to facilitate
  – Development of NLP
  – Application of NLP

Page 48

Questions | Discussion

Division of Biomedical Informatics
University of California, San Diego

Integrating Data for Analysis, Anonymization, and Sharing

[email protected]

iDASH/ShARe Workshop on Annotation
September 29, 2012
La Jolla, CA

