
Visualizing Association Rules for Text Mining - Sangjik Lee Pak Chung Wong, Paul Whitney, Jim Thomas...

Date post: 21-Dec-2015
Page 1: Visualizing Association Rules for Text Mining - Sangjik Lee Pak Chung Wong, Paul Whitney, Jim Thomas Pacific Northwest National Laboratory.

Visualizing Association Rules for Text Mining

- Sangjik Lee

Pak Chung Wong, Paul Whitney, Jim Thomas

Pacific Northwest National Laboratory

Page 2

Introduction

• An association rule in data mining is an implication of the form X -> Y, where X is a set of antecedent items and Y is the consequent item.

• For years researchers have developed many tools to visualize association rules.

• However, few of these tools can handle more than dozens of rules, and none of them can effectively manage rules with multiple antecedents.

• Thus, it is extremely difficult to visualize and understand the association information of a large data set even when all the rules are available.

Page 3

Association

• Powerful data analysis technique that appears frequently in data mining literature.

• An example association rule of a supermarket database is that 80% of the people who buy diapers and baby powder also buy baby oil.
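
To make the 80% figure concrete, here is a minimal sketch of how the support and confidence of such a rule could be computed. The transactions are invented for illustration and are not data from the talk; the helper names `support` and `confidence` are likewise assumptions.

```python
# Toy transactions, invented for illustration only (not data from the talk).
transactions = [
    {"diapers", "baby powder", "baby oil"},
    {"diapers", "baby powder", "baby oil", "milk"},
    {"diapers", "baby powder", "baby oil"},
    {"diapers", "baby powder", "baby oil", "bread"},
    {"diapers", "baby powder"},
    {"diapers", "milk"},
    {"bread", "milk"},
    {"baby oil"},
    {"bread"},
    {"milk", "baby oil"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent. Assumes the antecedent occurs at least once."""
    antecedent = set(antecedent)
    both = support(antecedent | {consequent}, transactions)
    return both / support(antecedent, transactions)


rule_support = support({"diapers", "baby powder", "baby oil"}, transactions)
rule_confidence = confidence({"diapers", "baby powder"}, "baby oil", transactions)
print(f"support={rule_support:.0%}, confidence={rule_confidence:.0%}")
```

In this toy data, five baskets contain both diapers and baby powder, and four of those also contain baby oil, which gives the 80% confidence quoted in the example.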

Page 4

• The system was developed to support text mining and visualization research on large unstructured document corpora.

• The focus is to study the relationships and implications among topics, or descriptive concepts, that are used to characterize a corpus.

• The goal is to discover important association rules within a corpus such that the presence of a set of topics in an article implies the presence of another topic.

Page 5

• For example, one might learn in headline news that whenever the words “Greenspan” and “inflation” occur, it is highly probable that the stock market is also mentioned.

• Demonstrate the results using a news corpus with more than 3000 articles collected from open sources.
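
The kind of rule described above can be found, at toy scale, with a brute-force search over topic sets. The sketch below is only an illustration: the articles, the thresholds, and the function `mine_rules` are all assumptions, and it is not the mining algorithm used by the system in the talk (a real corpus would call for something like Apriori rather than exhaustive enumeration).

```python
from itertools import combinations

# Hypothetical topic sets, one per article (invented; not the news corpus from the talk).
articles = [
    {"greenspan", "inflation", "stock market"},
    {"greenspan", "inflation", "stock market", "interest rates"},
    {"greenspan", "inflation", "stock market"},
    {"greenspan", "interest rates"},
    {"inflation", "oil"},
    {"election", "senate"},
]


def mine_rules(articles, min_support, min_confidence, max_antecedent=2):
    """Brute-force search for rules X -> y with a single consequent topic y."""
    n = len(articles)
    topics = sorted(set().union(*articles))
    rules = []
    for size in range(1, max_antecedent + 1):
        for antecedent in combinations(topics, size):
            a_set = set(antecedent)
            a_count = sum(1 for art in articles if a_set <= art)
            if a_count == 0:
                continue  # antecedent never occurs, so no rule can be formed
            for y in topics:
                if y in a_set:
                    continue
                both = sum(1 for art in articles if a_set <= art and y in art)
                sup, conf = both / n, both / a_count
                if sup >= min_support and conf >= min_confidence:
                    rules.append((antecedent, y, sup, conf))
    return rules


rules = mine_rules(articles, min_support=0.5, min_confidence=0.9)
for antecedent, y, sup, conf in rules:
    print(f"{' + '.join(antecedent)} -> {y}  (support {sup:.2f}, confidence {conf:.2f})")
```

With these made-up articles, the rule "greenspan + inflation -> stock market" passes both thresholds, mirroring the headline-news example.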

Page 6

Current Technology

• Two-Dimensional Matrix

Page 7

Current Technology

• Directed Graph

Page 8

Current Technology

• Directed Graph

• This technique works well when only a few items (nodes) and associations (edges) are involved.

• An association graph can quickly turn into a tangled display with as few as a dozen rules.
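
One way to see why a plain directed graph struggles with multiple antecedents is to flatten each rule into item-to-item edges. The sketch below assumes that simple encoding (one edge per antecedent item), which is an illustration rather than the exact layout of any particular tool; the function `rules_to_edges` and the rules themselves are hypothetical.

```python
def rules_to_edges(rules):
    """Flatten each rule (antecedent items -> consequent) into item-to-item edges."""
    return {
        (item, consequent)
        for antecedent, consequent in rules
        for item in antecedent
    }


# One rule with a two-item antecedent ...
one_joint_rule = [(("greenspan", "inflation"), "stock market")]
# ... versus two separate single-antecedent rules.
two_single_rules = [
    (("greenspan",), "stock market"),
    (("inflation",), "stock market"),
]

edges_joint = rules_to_edges(one_joint_rule)
edges_single = rules_to_edges(two_single_rules)
print(sorted(edges_joint))
print("same picture:", edges_joint == edges_single)
```

Under this encoding the two rule sets produce identical edges, so the grouping of antecedent items is lost, and every additional rule adds more edges between the same nodes.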

Page 9

A Novel Visualization Technique

• To visualize many-to-one association rules

• Instead of using the tiles of a 2D matrix to show the item-to-item association rules, use the matrix to depict the rule-to-item relationship.

Page 10

A visualization of item associations with support >= 0.4% and confidence >= 50%

Page 11

A Novel Visualization Technique (Continued)

• The rows of the matrix floor represent the items (or topics in the context of text mining).

• The columns represent the item associations.

• The blue and red blocks of each column (rule) represent the antecedent and the consequent of the rule. The identities of the items are shown along the right side of the matrix.

• The confidence and support levels of the rules are given by the corresponding bar charts in different scales at the far end of the matrix.
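
The rule-to-item layout described above can be sketched as a small text grid: one row per item, one column per rule, with "A" standing in for a blue antecedent block and "C" for a red consequent block. The rules, their support and confidence values, and the function `rule_item_matrix` are hypothetical; the actual system renders this as a 3D display.

```python
# Hypothetical rules: (antecedent topics, consequent topic, support, confidence).
rules = [
    (("greenspan", "inflation"), "stock market", 0.012, 0.85),
    (("oil", "opec"), "inflation", 0.006, 0.60),
    (("greenspan",), "interest rates", 0.020, 0.55),
]


def rule_item_matrix(rules):
    """Return (items, grid) where grid[row][col] marks item `row` in rule `col`."""
    items = sorted({t for ante, cons, _, _ in rules for t in (*ante, cons)})
    grid = [["." for _ in rules] for _ in items]
    for col, (ante, cons, _, _) in enumerate(rules):
        for topic in ante:
            grid[items.index(topic)][col] = "A"  # antecedent (blue block on the slide)
        grid[items.index(cons)][col] = "C"       # consequent (red block on the slide)
    return items, grid


items, grid = rule_item_matrix(rules)
for name, row in zip(items, grid):
    print(" ".join(row), " ", name)  # item identities along the right side

# Per-rule metadata, which the slide shows as bar charts at the far end.
for col, (_, _, sup, conf) in enumerate(rules):
    print(f"rule {col}: support {sup:.1%}, confidence {conf:.0%}")
```

Because each rule occupies its own column, an antecedent can mark any number of rows without the rules interfering with one another.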

Page 12

A Novel Visualization Technique - Advantage

• There is virtually no upper limit on the number of items in an antecedent.

• We can analyze the distributions of the association rules (horizontal axis) as well as the items within (vertical axis) simultaneously.

• The identity of individual items within an antecedent group is clearly shown.

• Because all the metadata are plotted at the far end and the heights of the columns are scaled so that the front columns do not block the rear ones, few occlusions occur.

Page 15

Conclusion and future work

• Applied the new technique to a text mining system to analyze a large text corpus.

• The results indicate that our design can easily handle hundreds of multiple-antecedent association rules in a 3D display.

• Long-term goal is to integrate many of the tools and techniques into a single visualization environment that provides time sequence analysis, hypothesis explanation, and document summarization.

Page 16

References

• Pak Chung Wong, Paul Whitney, and Jim Thomas. Visualizing Association Rules for Text Mining. In Graham Wills and Daniel Keim, editors, Proceedings of IEEE Information Visualization '99, Los Alamitos, CA, 1999. IEEE CS Press.

• Pak Chung Wong, Wendy Cowley, Harlan Foote, Elizabeth Jurrus, and Jim Thomas. Visualizing Sequential Patterns for Text Mining. Proceedings IEEE Information Visualization 2000, Salt Lake City, Utah, Oct 8-13, 2000.

• Nancy E. Miller, Pak Chung Wong, Mary Brewster, and Harlan Foote. TOPIC ISLANDS - A Wavelet-Based Text Visualization System. In David Ebert, Hans Hagen, and Holly Rushmeier, editors, Proceedings IEEE Visualization '98, pages 189-196, New York, NY, Oct 18-23, 1998. ACM Press.

