Date post: | 26-Jan-2015 |
Upload: | loretta-auvil |
SEASR and UIMA
National Center for Supercomputing Applications University of Illinois at Urbana-Champaign
Mike Haberman
UIMA
Unstructured Information Management Architecture
UIMA to SEASR
SEASR
UIMA + P.O.S. tagging
Four analysis engines process a document and record part-of-speech (POS) information.
OpenNLP Tokenizer
OpenNLP PosTagger
OpenNLP SentenceDetector
POSWriter
Serialization of the UIMA CAS
UIMA Structured data
• POSWriter is a CAS Consumer
– Extracted data from the CAS
– Ready for import into SEASR
UIMA + P.O.S. tagging: step 1
UIMA + P.O.S. tagging: step 2
UIMA + P.O.S. tagging: step 3
UIMA + P.O.S. tagging: step 4
UIMA Structured data
• Two SEASR examples using UIMA POS data
– Frequent patterns (association rules) on nouns (fpgrowth)
– Sentiment analysis on adjectives
UIMA to SEASR: Experiment I
• Finding patterns
SEASR + UIMA: Frequent Patterns
Frequent Pattern Analysis on nouns
• Goal:
– Discover a cast of characters within the text
– Discover nouns that frequently occur together
• character relationships
Frequent Patterns: nouns
• Use of item sets in fpgrowth
• What’s new:
– handling sparse item sets
TransactionId  ItemA  ItemB  ItemC
1              0      1      1
2              1      1      1
3              1      0      1
4              1      0      0
...
Frequent Patterns: nouns
• What’s new:
– handling sparse item sets
Transaction
{A,B,C}
{X,Y}
{F,E,A,C,E}
{A,Z,X,U,I,O}
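The two layouts above can be related with a short sketch. This is only an illustration, not SEASR's implementation: the function name `dense_to_transactions` is hypothetical, and the rows are the four from the 0/1 table. Each transaction keeps only the items that are present, which is what makes the representation practical when the vocabulary (every noun in a novel) is large and most entries would be 0.

```python
def dense_to_transactions(header, rows):
    """Turn 0/1 rows into sets holding only the items that are present."""
    return [
        {item for item, flag in zip(header, row) if flag}
        for row in rows
    ]

header = ["ItemA", "ItemB", "ItemC"]
rows = [
    (0, 1, 1),
    (1, 1, 1),
    (1, 0, 1),
    (1, 0, 0),
]

transactions = dense_to_transactions(header, rows)
for tid, items in enumerate(transactions, start=1):
    print(tid, sorted(items))
```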
http://repository.seasr.org/Datasets/POS/
tomSawyer.NN.is, tomSawyer.NNP.is, uncleTom.NN.is, uncleTom.NNP.is
• Reads UIMA’s CAS consumer output
– URL of the UIMA data source
Frequent Patterns: nouns
SEASR Flow
http://repository.seasr.org/Meandre/Locations/1.4/Demo-UIMA/repository.ttl (similar to fpgrowth demo)
{word=tom}
{word=answer}
{word=tom}
{word=lady,word=spectacles,word=room,word=thing,word=boy,word=state,word=pair,word=pride,word=heart,word=style,word=service,word=pair,word=stove-lids,word=moment,word=furniture}
{word=bed,word=broom,word=breath,word=punches,word=nothing,word=cat}
{word=aunt,word=polly,word=moment,word=laugh}
{word=boy,word=anything,word=aint,word=tricks,word=fools,word=fools,word=can't,word=dog,word=tricks,word=goodness,word=days,word=body,word=dander,word=minute,word=lick,word=duty,word=boy,word=lord,word=truth,word=goodness,word=spare,word=rod,word=child,word=good,word=book,word=sin,word=suffering,word=old,word=scratch,word=laws-a-me,word=sister,word=boy,word=thing,word=heart,word=conscience,word=heart,word=breaks,word=well-a-well,word=man,word=woman,word=days,word=trouble,word=scripture,word=hookey,word=evening,word=southwestern,word=afternoon,word=saturdays,word=boys,word=holiday,word=work,word=anything,word=duty,word=ruination,word=child}
Enter number of sentences to group
Enter support: 10%
Frequent Patterns: visualization
Analysis of Tom Sawyer: 10-paragraph window, support set to 10%
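The roles of the window size and the support threshold can be sketched as follows. This is a brute-force pair counter, not the fpgrowth algorithm the SEASR flow uses, and the function names and the toy noun lists are invented for illustration. Consecutive sentences are merged into one transaction per window, and a pair of nouns is kept only if it appears in at least `min_support` of the windows.

```python
from collections import Counter
from itertools import combinations

def group_into_windows(sentences, size):
    """Merge every `size` consecutive sentences into one set of nouns."""
    return [
        set().union(*sentences[i:i + size])
        for i in range(0, len(sentences), size)
    ]

def frequent_pairs(transactions, min_support):
    """Return noun pairs whose share of transactions is >= min_support."""
    counts = Counter()
    for items in transactions:
        # sorted() gives each pair one canonical ordering
        counts.update(combinations(sorted(items), 2))
    total = len(transactions)
    return {
        pair: n / total
        for pair, n in counts.items()
        if n / total >= min_support
    }

# Invented toy data: the nouns of six sentences
sentences = [
    ["tom"], ["aunt", "polly"], ["tom", "aunt"],
    ["boy", "cat"], ["tom", "polly"], ["aunt", "boy"],
]
windows = group_into_windows(sentences, 2)
pairs = frequent_pairs(windows, 0.5)
for pair, support in sorted(pairs.items()):
    print(pair, round(support, 2))
```

A larger window produces fewer, larger transactions, so more pairs clear the same support threshold; a smaller window favors nouns that occur close together.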
Frequent Patterns: nouns
• Recap: SEASR flow information
• The repository location is:
– http://repository.seasr.org/Meandre/Locations/1.4/Demo-UIMA/repository.ttl
• Reads UIMA’s CAS consumer output
– Select file/url of the UIMA data source
– http://repository.seasr.org/Datasets/POS tomSawyer.NN.is, tomSawyer.NNP.is, uncleTom.NN.is, uncleTom.NNP.is
• Similar to fpgrowth demo
UIMA + SEASR: Frequent Patterns
• Extensions
– Analysis for separate chapters
• Discover new relationships that occur over small windows
– Adjectives, Adverbs
• Common, repeating word usage, phrases
– Entity Extraction: Dates, Locations, Geo
UIMA to SEASR: Experiment II
• Sentiment Analysis
UIMA + SEASR: Sentiment Analysis
• Classifying text based on its sentiment
– Determining the attitude of a speaker or a writer
– Determining whether a review is positive/negative
UIMA + SEASR: Sentiment Analysis
• Ask: What emotion is being conveyed within a body of text?
– Look at only adjectives (UIMA POS)
• Lots of issues, challenges, and “but …” caveats
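Selecting only the adjectives from POS-tagged text can be sketched like this. The sketch assumes Penn Treebank tags (the data set names above use `NN` and `NNP`, which suggests that tag set) and a hypothetical input format of `(word, tag)` pairs; the actual layout of the UIMA CAS consumer output is not shown in the slides.

```python
# Penn Treebank adjective tags: base, comparative, superlative
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}

def adjectives(tagged_tokens):
    """Keep the lowercased words whose tag marks them as adjectives."""
    return [word.lower() for word, tag in tagged_tokens if tag in ADJECTIVE_TAGS]

# Hypothetical tagger output for one sentence
tagged = [
    ("My", "PRP$"), ("mother", "NN"), ("was", "VBD"), ("not", "RB"),
    ("a", "DT"), ("hateful", "JJ"), ("person", "NN"), (".", "."),
]
found = adjectives(tagged)
print(found)
```

Note that the filter discards "not", which is one of the challenges of looking at adjectives alone.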
UIMA + SEASR: Sentiment Analysis
• Need to Answer:
– What emotions to track?
– How to measure/classify an adjective to one of the selected emotions?
– How to visualize the results
UIMA + SEASR: Sentiment Analysis
• Which emotions:
– http://en.wikipedia.org/wiki/List_of_emotions
– http://changingminds.org/explanations/emotions/basic%20emotions.htm
– http://www.emotionalcompetency.com/recognizing.htm
• Parrott’s classification (2001)
– six core emotions
– Love, Joy, Surprise, Anger, Sadness, Fear
UIMA + SEASR: Sentiment Analysis
• How to classify adjectives:
– Lots of metrics we could use …
• Lists of adjectives already classified
– http://www.derose.net/steve/resources/emotionwords/ewords.html
– Need a “nearness” metric for missing adjectives
– How about the thesaurus game?
UIMA + SEASR: Sentiment Analysis
• Using only a thesaurus, find a path between two words
– no antonyms
– no colloquialisms or slang
UIMA + SEASR: Sentiment Analysis
• How to get from delightful to rainy?
['delightful', 'fair', 'balmy', 'moist', 'rainy']
• sexy to joyless?
['sexy', 'provocative', 'blue', 'joyless']
• bitter to lovable?
['bitter', 'acerbic', 'tangy', 'sweet', 'lovable']
UIMA + SEASR: Sentiment Analysis
• Use this game as a metric for measuring a given adjective to one of the six emotions.
• Assume the longer the path, the “farther away” the two words are.
• Address some of the issues
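The thesaurus game amounts to a shortest-path search over synonym links. A minimal sketch using breadth-first search follows; the function name and the tiny graph are illustrative only (the links are taken from the delightful-to-rainy example), and the links are treated as directed because the slides do not say whether every synonym link can be followed in both directions.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest list of words, or None."""
    if start == goal:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for neighbor in graph.get(word, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = word
            if neighbor == goal:
                # Walk back through the parents to rebuild the path
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None

# Toy synonym links, a stand-in for a real thesaurus
SYNONYMS = {
    "delightful": ["fair"],
    "fair": ["balmy"],
    "balmy": ["moist"],
    "moist": ["rainy"],
}

path = shortest_path(SYNONYMS, "delightful", "rainy")
print(path)           # ['delightful', 'fair', 'balmy', 'moist', 'rainy']
print(len(path) - 1)  # 4 hops: the "distance" between the two words
```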
UIMA + SEASR: Sentiment Analysis
• SynNet: a traversable graph of synonyms (adjectives)
SynNet: rainy to pleasant
UIMA + SEASR: Sentiment Analysis
• SynNet Metrics
• Common nodes
• Path length
• Symmetry: a->b->c and c->b->a
• Link strength:
• tangy->sweet
• sweet->lovable
• Use of slang or informal usage
UIMA + SEASR: Sentiment Analysis
• Common Nodes
• Depth of common nodes
UIMA + SEASR: Sentiment Analysis
• Symmetry of path in common nodes
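The slides do not define the common-nodes metric precisely. One possible reading, sketched below under that assumption, is the set of words reachable from both endpoints within a given number of synonym hops. The helper names are hypothetical, the links are assumed to be undirected, and the toy graph is built from two of the example paths for "animal".

```python
def build_graph(paths):
    """Build an undirected synonym graph from lists of consecutive words."""
    graph = {}
    for path in paths:
        for a, b in zip(path, path[1:]):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

def neighborhood(graph, word, depth):
    """Words reachable from `word` in at most `depth` hops (word excluded)."""
    seen = {word}
    frontier = {word}
    for _ in range(depth):
        frontier = {n for w in frontier for n in graph.get(w, ())} - seen
        seen |= frontier
    return seen - {word}

def common_nodes(graph, a, b, depth):
    """Words that lie within `depth` hops of both a and b."""
    return neighborhood(graph, a, depth) & neighborhood(graph, b, depth)

graph = build_graph([
    ["animal", "gross", "shocking", "fearful"],
    ["animal", "gross", "grievous", "sorrowful"],
])
print(common_nodes(graph, "fearful", "sorrowful", 1))
print(common_nodes(graph, "fearful", "sorrowful", 2))
```

In this toy graph the two emotion words share nothing at depth 1 and share "gross" at depth 2, so the depth at which common nodes first appear gives a second signal alongside path length.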
UIMA + SEASR: Sentiment Analysis
• Find the shortest path between adjective and each emotion:
• ['delightful', 'beatific', 'joyful']
• ['delightful', 'ineffable', 'unspeakable', 'fearful']
• Pick the emotion with shortest path length
• tie breaking procedures
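Picking the nearest emotion can be sketched as a loop over shortest-path lengths. The names below are illustrative, the graph holds only the two example paths for "delightful", and the tie-breaking rule (the emotion listed first wins) is an assumption, since the slides do not spell out the actual procedure.

```python
from collections import deque

def path_length(graph, start, goal):
    """Number of hops on the shortest path, or None if unreachable."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            return distances[word]
        for neighbor in graph.get(word, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[word] + 1
                queue.append(neighbor)
    return None

def classify(graph, adjective, emotion_words):
    """Pick the emotion word closest to `adjective`.

    Ties keep the earlier entry in emotion_words (an assumed rule).
    Returns (emotion, hops), or None if no emotion is reachable.
    """
    best = None
    for emotion in emotion_words:
        hops = path_length(graph, adjective, emotion)
        if hops is not None and (best is None or hops < best[1]):
            best = (emotion, hops)
    return best

# Toy synonym links from the two example paths
SYNONYMS = {
    "delightful": ["beatific", "ineffable"],
    "beatific": ["joyful"],
    "ineffable": ["unspeakable"],
    "unspeakable": ["fearful"],
}

label = classify(SYNONYMS, "delightful", ["joyful", "fearful"])
print(label)  # ('joyful', 2)
```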
UIMA + SEASR: Sentiment Analysis
• Not a perfect solution
– still need context to get quality
• Vain
– ['vain', 'insignificant', 'contemptible', 'hateful']
– ['vain', 'misleading', 'puzzling', 'surprising']
• Animal
– ['animal', 'sensual', 'pleasing', 'joyful']
– ['animal', 'bestial', 'vile', 'hateful']
– ['animal', 'gross', 'shocking', 'fearful']
– ['animal', 'gross', 'grievous', 'sorrowful']
• Negation
– “My mother was not a hateful person.”
UIMA + SEASR: Sentiment Analysis
• A word about WordNet
• http://wordnetweb.princeton.edu/
• English nouns, verbs, adjectives and adverbs organized into sets of synonyms (synsets)
UIMA + SEASR: Sentiment Analysis
• Adjective islands
• There is no path from delightful to happy
• happy: {beaming, beamy, effulgent, felicitous, glad, happy, radiant, refulgent, well-chosen}
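An island is a connected component of the synonym graph: words in different components have no path between them. The sketch below finds components in a toy undirected graph. The function names are hypothetical and the edges are a small hand-picked subset (three links from the "happy" set above and two from the delightful-to-joyful path), not real WordNet data.

```python
def build_graph(edges):
    """Undirected graph: every word maps to the set of its neighbors."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def islands(graph):
    """Group words into connected components using depth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            word = stack.pop()
            if word in component:
                continue
            component.add(word)
            stack.extend(graph[word] - component)
        seen |= component
        components.append(component)
    return components

graph = build_graph([
    ("happy", "glad"), ("happy", "felicitous"), ("happy", "radiant"),
    ("delightful", "beatific"), ("beatific", "joyful"),
])
components = islands(graph)
for component in components:
    print(sorted(component))
```

When an adjective and an emotion word fall in different components, no path length exists, so the emotion list needs a representative word inside each large island (for example "joyful" rather than "happy" in this toy graph).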
UIMA + SEASR: Sentiment Analysis
• Process Overview
• Extract the adjectives (UIMA POS analysis)
• Read in adjectives (SEASR library)
• Label each adjective (SynNet)
• Summarize windows of adjectives
• lots of experimentation here
• Visualize the windows
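The summarizing step can be sketched as counting emotion labels per window. This is one simple option among the many the slide alludes to: the windows here are fixed-size and non-overlapping, the function name is hypothetical, and the label sequence is invented.

```python
from collections import Counter

def summarize_windows(labels, size):
    """Count emotion labels in consecutive, non-overlapping windows."""
    return [
        Counter(labels[i:i + size])
        for i in range(0, len(labels), size)
    ]

# Invented sequence: one emotion label per adjective, in text order
labels = ["joy", "joy", "fear", "sadness", "joy", "anger", "fear", "fear"]

summaries = summarize_windows(labels, 4)
for index, counts in enumerate(summaries):
    print(index, dict(counts))
```

Each window's counts can then be plotted in sequence to show how the mix of emotions shifts through the text.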
UIMA + SEASR: Sentiment Analysis
• Visualization
• New SEASR visualization component
• Based on the Flare ActionScript library
• http://flare.prefuse.org/
• Still in development
• http://demo.seasr.org:1714/public/resources/data/emotions/ev/EmotionViewer.html
UIMA + SEASR: Sentiment Analysis
• Extensions
• Adverbs, nouns, verbs
• Analysis of metrics, etc
• Goal and Relevancy
• Two new components
• SynNet
• Flash-based visualization of sequential data