Data Mining: Ontologies

Date post: 03-Jun-2018
Upload: johngagliano
  • 8/12/2019 Data Mining: Ontologies

    1/17

    Faculty of Computer Science

    2006 CMPUT 605 March 31, 2013

    Towards Applying Text Mining and Natural Language Processing for Biomedical Ontology Acquisition

    Inniss T., Light M., Thomas G., Lee J., Grassi M., Williams A.

    TMBIO (2006)

    John G

    2006

    Department of Computing Science

    CMPUT 605

    Focus

    Ontology for describing age-related macular degeneration (AMD)

    Comparison of the accuracy of three methods for ontology acquisition:

      Natural Language Processing (NLP)
      Text Mining (SAS Text Miner)
      Human Expert

    Manual and ad hoc knowledge acquisition

    IDOCS (Intelligent Distributed Ontology Consensus System)

    Introduction

    No common, standardized vocabulary exists for classifying disease types for certain eye diseases

    Clinicians, dispersed geographically, may use different terms to describe the same condition

    The research aims to extract feature and attribute descriptions for the vocabulary of AMD and build an ontology from them

    Related Work

    Much research since the 1990s on applying NLP techniques in medicine, biomedicine, etc.

    NLP and text data mining are recognized as playing an important role in this endeavor

    Research has focused on online repositories such as Medline and PubMed

    NLP systems developed: MedLEE, UMLS, GENIES, etc.

    IDOCS

    Methodology

    Four clinical experts in retinal diseases were enlisted to view 100 sample eye images of AMD

    The experts were in different geographic locations

    They described their observations using digital voice recorders, with no artificially imposed vocabulary constraints

    Another retinal expert manually parsed the transcribed text: extracting keywords, organizing the keywords into categories, etc.

    Methodology: NLP

    NLP: used for information extraction and automatic summarization

    Identify short sequences of words whose meaning goes over and above the meaning composed directly from their parts, e.g. "extreme programming"

    Ngram Statistics Package (NSP) used for collocation discovery on bigrams

    Word-pair associations measured by pointwise mutual information (PMI)

    Methodology: NLP

    A larger PMI indicates a larger degree of association between the words
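    As a rough illustration of the measure, PMI for a word pair (x, y) is commonly defined as log2( P(x, y) / (P(x) P(y)) ), where P(x, y) is the relative frequency of the bigram and P(x), P(y) are the relative frequencies of the individual words. The sketch below computes it for adjacent word pairs in plain Python. It is not the NSP implementation, and the function name `bigram_pmi` and the sample sentence are made up for the example.

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """Return {(w1, w2): PMI} for every adjacent word pair in tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())   # number of word occurrences
    n_bi = sum(bigrams.values())     # number of adjacent pairs
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi          # relative frequency of the pair
        p_x = unigrams[w1] / n_uni   # relative frequency of the first word
        p_y = unigrams[w2] / n_uni   # relative frequency of the second word
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Hypothetical transcript fragment, not taken from the study data.
tokens = "soft drusen near the macula and soft drusen near the fovea".split()
scores = bigram_pmi(tokens)

# List the three pairs with the highest PMI.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(score, 2))
```

    On a sample this small the ranking is dominated by pairs of rare words, which is a known tendency of PMI; collocation tools usually apply a minimum frequency cutoff for that reason.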

    Methodology: Text Mining (SAS Text Miner)

    A collection of documents (corpus) is the input to any text mining algorithm

    The corpus is broken into tokens, or terms (tokens in a particular language)

    Term weighting measures: Entropy, Inverse Document Frequency (IDF), Global Frequency-IDF (GF-IDF), Normal, and None (global weight of 1)
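    To make the weighting measures concrete, here is a minimal sketch of three of them using common textbook formulations: IDF as log2(n / d) + 1, GF-IDF as g / d, and entropy as 1 + sum(p * log2 p) / log2(n), where n is the number of documents, d the number of documents containing the term, g the term's total frequency, and p its share of g in each document. These formulas are assumptions for illustration; SAS Text Miner's exact definitions may differ. The function name `global_term_weights` and the sample documents are hypothetical, and Normal weighting is left out.

```python
import math
from collections import Counter

def global_term_weights(docs):
    """docs: list of token lists. Returns {term: {measure: weight}}."""
    n = len(docs)
    per_doc = [Counter(doc) for doc in docs]
    terms = set().union(*per_doc)
    weights = {}
    for term in terms:
        # Frequencies of the term in the documents that contain it.
        freqs = [c[term] for c in per_doc if c[term] > 0]
        gf = sum(freqs)   # global frequency
        df = len(freqs)   # document frequency
        # Sum of p * log2(p) over documents; zero or negative.
        ent = sum((f / gf) * math.log2(f / gf) for f in freqs)
        weights[term] = {
            "idf": math.log2(n / df) + 1,
            "gfidf": gf / df,
            "entropy": 1 + ent / math.log2(n) if n > 1 else 1.0,
            "none": 1.0,  # global weight of 1
        }
    return weights

# Hypothetical tokenized documents, not taken from the study data.
docs = [
    ["drusen", "macula", "drusen"],
    ["drusen", "hemorrhage"],
    ["drusen", "pigment", "macula"],
]
weights = global_term_weights(docs)
for term in sorted(weights):
    print(term, {k: round(v, 3) for k, v in weights[term].items()})
```

    Under these formulations a term concentrated in one document gets the highest entropy and IDF weights, while a term spread across every document is weighted down.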

    Results: Human Experts

    Results: NLP

    Results: Text Miner

    Frequency weight: None

    Term weight: Normal

    Comparison

    Thus text mining is a viable and effective method for determining the vocabulary to describe a particular disease

    Text mining found many of the terms that NLP found

    The human expert is the best ground truth
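    One simple way to compare an automated method against the expert's term list is set overlap. The sketch below treats the expert list as ground truth and reports precision, recall, and the terms on each side of the difference. The function name `compare_to_ground_truth` and both term lists are invented for illustration and are not results from the study.

```python
def compare_to_ground_truth(found, truth):
    """Compare a method's term set against a ground-truth term set."""
    found, truth = set(found), set(truth)
    hits = found & truth
    return {
        "precision": len(hits) / len(found) if found else 0.0,
        "recall": len(hits) / len(truth) if truth else 0.0,
        "missed": truth - found,   # expert terms the method did not find
        "extra": found - truth,    # method terms absent from the expert list
    }

# Hypothetical term lists, not taken from the study data.
expert_terms = {"drusen", "hemorrhage", "pigment", "scar"}
text_mining_terms = {"drusen", "hemorrhage", "pigment", "atrophy"}

result = compare_to_ground_truth(text_mining_terms, expert_terms)
print(result)
```

    Terms in "extra" are not necessarily errors: they are candidates for descriptors the expert may have overlooked, so they are worth reviewing rather than discarding.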

    Ontology Generation

    Conclusion and Future Work

    Human experts are the best, but they did miss some key descriptors

    Text mining and NLP can enhance feature generation by catching descriptors that the experts miss

    As a consequence, a more robust vocabulary can be generated

    Extension: evaluate the effectiveness of the automated tools, text mining and NLP

    Different weighting schemes will be tried in the future

    Thank You For Your Attention!

