Dynamic Topic Modeling for Spatiotemporal Event Extraction: Probabilistic Approaches and The Dim Sum Process

Computing & Information Sciences, Kansas State University

University of Kansas Center for Business Analytics Research (CBAR) Seminar

Laboratory for Knowledge Discovery in Databases

Wednesday, 04 September 2013

William H. Hsu, Laboratory for Knowledge Discovery in Databases, Kansas State University

http://www.kddresearch.org

Acknowledgements

Kansas State: Wesam Elshamy, Ming Yang, Surya Teja Kallumadi, Majed Alsadhan

Illinois: Chengxiang Zhai, Jiawei Han, Kevin Chang, Dan Roth

iQGateway: Praveen Koduru, Krishna Kumar Vallyatodi

Motivation: Thematic Mapping [1]: Summarizing News from the Web

Based on

NLP Group NER Toolkit © 2005-2010 Stanford University

Simile © 2003-2010 Massachusetts Institute of Technology

Google Maps © 2007-2010 Tele Atlas, Inc. and Google, Inc.

http://fingolfin.user.cis.ksu.edu/timemap2gs

Motivation: Thematic Mapping [2]: HealthMap

http://healthmap.org

© 2006 – 2013 Brownstein, J. & Freifeld, C.

Motivation: Thematic Mapping [4]: TextMap & Topic Models

© 2011 – 2012 TextMap.org

Motivation: Thematic Mapping [5]: Existing Systems & Limitations

Volkova, S., Caragea, D., Hsu, W. H., Drouhard, J., & Fowles, L. (2010). Boosting Biomedical Entity Extraction by Using Syntactic Patterns for Semantic Relation Discovery. Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2010).

See also: Volkova, S. (2010). Entity Extraction, Animal Disease-Related Event Recognition and Classification from Web. M.S. thesis, Kansas State University.

Outline

Simultaneous Topic Enumeration & Formation

Topic Modeling: Static (Atemporal) to Dynamic

Continuous Time vs. Variable Number of Topics

Dim Sum Process for Hybrid STEF

Dynamic Topic Modeling Test Bed

News Monitoring: Geotagging & Timelines

Recent Results

STEF & Heterogeneous Info Network Analysis

Timeline Formation: General Task Illustrated

Elshamy (2012)

Simultaneous Topic Enumeration & Formation (STEF)

Adapted from Elshamy (2012)

Time t: 3 extant topics

Time t + k: 2 extant topics

Topic Modeling [1]: Basic Task (Static)

Elshamy (2012)
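The slide's figure did not survive extraction. As a minimal sketch of the static task, assuming latent Dirichlet allocation (LDA) as the static model and collapsed Gibbs sampling as the inference method (neither is confirmed by the slide text), the code below repeatedly resamples a topic for every token while keeping per-document and per-topic counts. The function `lda_gibbs`, the toy corpus, and the values of `alpha`, `beta`, and `iters` are hypothetical.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of word strings.
    Returns (doc_topic_counts, topic_word_counts, assignments).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    ndk = [[0] * num_topics for _ in docs]          # tokens per (document, topic)
    nkw = [Counter() for _ in range(num_topics)]    # tokens per (topic, word)
    nk = [0] * num_topics                           # tokens per topic
    z = []                                          # topic of each token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling it.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return ndk, nkw, z


# Hypothetical toy corpus with two loosely separated themes.
docs = [
    ["cattle", "virus", "outbreak", "cattle"],
    ["vote", "election", "poll", "vote"],
    ["virus", "outbreak", "herd"],
    ["poll", "election", "candidate"],
]
ndk, nkw, z = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(nkw):
    print(k, counts.most_common(3))
```

The static model has a fixed number of topics and no notion of time; the dynamic models later in the talk relax one or both of those assumptions.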

Topic Modeling [2]: Understanding Plate Notation

Adapted from Elshamy (2012)

Topic Modeling [3]: Hyperparameters (Another Model)

Adapted from Elshamy (2012)
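A Dirichlet hyperparameter controls how concentrated a draw is. The sketch below illustrates this in general terms rather than reproducing the model on the slide: it samples a symmetric Dirichlet by normalizing Gamma variates and compares the average largest component for a small and a large concentration `alpha`. The helper names and settings are hypothetical.

```python
import random


def sample_symmetric_dirichlet(alpha, k, rng):
    """Draw from Dirichlet(alpha, ..., alpha) by normalizing k Gamma(alpha, 1) variates."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [x / total for x in g]


def mean_max_weight(alpha, k=10, draws=2000, seed=0):
    """Average size of the largest component; values near 1 mean sparse mixtures."""
    rng = random.Random(seed)
    return sum(max(sample_symmetric_dirichlet(alpha, k, rng)) for _ in range(draws)) / draws


sparse = mean_max_weight(0.1)   # small alpha: mass tends to pile onto a few topics
smooth = mean_max_weight(10.0)  # large alpha: mass spread nearly evenly (about 1/k each)
print(f"alpha=0.1: {sparse:.2f}   alpha=10: {smooth:.2f}")
```

In a topic model the same knob sets the prior on per-document topic mixtures (and, with a separate hyperparameter, on per-topic word distributions).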

Continuous Time vs. Variable Number of Topics

Elshamy (2012)

State of the Field

Goal

Events from Text: Markov Model for Topic Detection & Tracking

Adapted from Elshamy (2012)

Continuous-Time Dynamic Topic Model (cDTM)

Elshamy (2012)
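In the cDTM, each topic's natural parameters evolve as Brownian motion, so the variance of the change between two observations grows in proportion to the time elapsed between them; this is what lets the model handle irregularly spaced timestamps. The sketch below only simulates that prior for a single topic and performs no inference. `drift_topic`, `variance_rate`, the four-word vocabulary, and the timestamps are hypothetical.

```python
import math
import random


def drift_topic(natural_params, elapsed, variance_rate, rng):
    """One Brownian-motion step: each word's natural parameter moves by
    Normal(0, variance_rate * elapsed), so longer gaps allow larger drift."""
    sd = math.sqrt(variance_rate * elapsed)
    return [b + rng.gauss(0.0, sd) for b in natural_params]


def softmax(params):
    """Map natural parameters to a word distribution (numerically stable)."""
    m = max(params)
    exps = [math.exp(p - m) for p in params]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: one topic over a 4-word vocabulary, observed at
# irregular timestamps (in days).
rng = random.Random(42)
timestamps = [0.0, 0.5, 3.0, 3.1, 10.0]
beta = [0.0, 0.0, 0.0, 0.0]
trajectory = [softmax(beta)]
for prev, curr in zip(timestamps, timestamps[1:]):
    beta = drift_topic(beta, curr - prev, variance_rate=0.05, rng=rng)
    trajectory.append(softmax(beta))

for t, dist in zip(timestamps, trajectory):
    print(t, [round(p, 3) for p in dist])
```

Because the step variance scales with elapsed time, the short 3.0 to 3.1 gap changes the word distribution much less, on average, than the 3.1 to 10.0 gap.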

Discrete-Time Online Hierarchical Dirichlet Process (oHDP)

Elshamy (2012)
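HDP-based models such as oHDP leave the number of topics open by placing Dirichlet-process priors on topic weights, and online HDP works with a truncated stick-breaking (GEM) representation of those weights. The sketch below draws truncated stick-breaking weights and counts how many exceed a cutoff. `stick_breaking`, `effective_topics`, the truncation level of 100, and the 0.01 threshold are assumptions for illustration.

```python
import random


def stick_breaking(gamma, truncation, rng):
    """Truncated GEM(gamma) weights: repeatedly break a Beta(1, gamma) fraction
    off the remaining stick; the last weight takes whatever is left."""
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, gamma)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)
    return weights


def effective_topics(weights, threshold=0.01):
    """Count topics whose weight exceeds a (hypothetical) cutoff."""
    return sum(1 for w in weights if w > threshold)


rng = random.Random(7)
for gamma in (1.0, 5.0):
    w = stick_breaking(gamma, truncation=100, rng=rng)
    print(f"gamma={gamma}: {effective_topics(w)} topics above 0.01")
```

A larger concentration `gamma` tends to spread mass over more topics, so under such a prior the number of topics in use is inferred from the data rather than fixed in advance.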

Continuous-Time Infinite Dynamic Topic Model (CIDTM)

Elshamy (2012)

HealthMap Redux: Thematic Mapping, Health Informatics, & Epidemiology

http://healthmap.org

© 2006 – 2013 Brownstein, J. & Freifeld, C.

Thematic Mapping Tasks [1]: Entities

Example: CNN, 2007 Foot-and-Mouth Disease (http://bit.ly/3gof6o)

Tests have confirmed a second foot-and-mouth outbreak in southern England, the government announced, raising fears that the highly contagious animal virus is spreading.

Chief Veterinary Officer Debby Reynolds said Tuesday that tests showed a herd of cattle had been infected.

The animals were culled Monday evening after showing signs of the disease.

Update Summarization: A second foot-and-mouth disease infection, in a herd of cattle in southern England, was responded to by culling on Monday evening and announced by Debby Reynolds on Tuesday.

(The second report after an earlier one, hence “update”.)

Compare: Recognizing Textual Entailment

Hypothesis: A foot-and-mouth disease infection was reported the day after culling. (True.)

Thematic Mapping Tasks [2]: Aspects

© 2008 C. Zhai

University of Illinois

http://sifaka.cs.uiuc.edu/ir/

Thematic Mapping Tasks [3]: Location & Disambiguation

• Current off-the-shelf applications run into place-name ambiguity problems.

© 2008 W. Elshamy

Thematic Mapping Tasks [4]: Time & Timelines

Search phrase: “smallpox”

© 2007 – 2009 Google, Inc.

Thematic Mapping Tasks [5]: Timeline Reconstruction

Murphy, Hsu, Elshamy, Kallumadi, & Volkova (2012)

Recent Results [1]: Meth Lab Mapping

Hsu, Abduljabbar, Osuga, Lu, & Elshamy (2012)

Recent Results [2]: Visual Analytics

Hsu, Abduljabbar, Osuga, Lu, & Elshamy (2012)

[Figure: bar chart by county (Edwards, Stevens, Hodgeman, Norton, Ness, Pottawatomie, Finney, Douglas, Montgomery); vertical axis from 0 to 450]

Recent Results [3]: Topic Proportions

Hsu, Abduljabbar, Osuga, Lu, & Elshamy (2012)

Sentiment Analysis Tasks: Polarity

http://dslreports.com

© 1999 – 2012 dslreports.com
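The slide shows only a screenshot. One common baseline for polarity, offered here as an assumption and not as what dslreports.com or the talk used, is a lexicon-based scorer: count positive and negative cue words and flip the sign of a cue word that follows a negator. The word lists and the function `polarity` are hypothetical placeholders.

```python
import re

# Hypothetical cue-word lists; a real system would use a curated lexicon.
POSITIVE = {"good", "great", "fast", "reliable", "happy"}
NEGATIVE = {"bad", "slow", "outage", "terrible", "unhappy"}
NEGATORS = {"not", "never", "no"}


def polarity(text):
    """Label text 'positive', 'negative', or 'neutral' by summing cue-word scores.

    A negator flips the sign of the next cue word that follows it.
    """
    score = 0
    negate = False
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATORS:
            negate = True
            continue
        value = int(token in POSITIVE) - int(token in NEGATIVE)
        if value:
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("The connection is not reliable and support is slow"))
```

Lexicon scorers are easy to audit but miss sarcasm and domain-specific usage, which is one reason scope and polarity recognition appear as an open aim later in the talk.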

Aggregation & OLAP: Wikipedia Infobox as Fact Table

Infobox: Albert Einstein (© 2001 – 2010 Wikimedia Foundation)

Q: Where can this information be found?

A: It depends…

How much formatting does the source page have?

Is it marked up (machine-readable)?

Is the markup semantically rich?
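Treating an infobox as a fact-table row amounts to reading its `| key = value` fields, after which rows can be aggregated along a dimension as in OLAP. The sketch below assumes simple single-line MediaWiki fields and ignores nested templates, links, and multi-line values. `parse_infobox`, `roll_up`, and the simplified sample text are illustrative, not an existing extraction tool.

```python
from collections import Counter


def parse_infobox(wikitext):
    """Collect single-line '| key = value' fields into one dict (one fact-table row).

    Nested templates, links, and multi-line values are not handled.
    """
    row = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            key, value = line[1:].split("=", 1)
            row[key.strip()] = value.strip()
    return row


def roll_up(rows, dimension):
    """Count fact rows per value of one dimension (a one-level OLAP roll-up)."""
    return Counter(row.get(dimension, "(missing)") for row in rows)


# Simplified, illustrative infobox text (real infoboxes use date templates, links, etc.).
SAMPLE = """{{Infobox scientist
| name       = Albert Einstein
| birth_date = 14 March 1879
| fields     = Physics
}}"""

row = parse_infobox(SAMPLE)
print(row)
```

How well this works depends on exactly the questions above: the richer and more consistent the markup, the less guesswork the parser needs.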

Opinion Mapping Example [1]: Health Blogs on Chronic Disease

Opinion Mapping Example [2]: New Entities & Relationships

Opinion Mapping Example [3]: Polarity

http://twitrratr.com/search/EuroHCIR

© 2012 Twitrratr

Opinion Mapping Example [4]: Aims & Approach

Aim 1 – Extend Algorithms to Detect New:

Entities: Diseases, Treatments, Complications

Relationships: Adverse Reactions, Controversies

Aim 2 – Domain-Specific Ontology

Symptoms, Disease Attributes

Treatments, Complications

Comparisons

Aim 3 – Better Recognition of Scope, Polarity

User Groups: Goals & Primary Use Cases

Goal: Thematic Opinion Map (Choropleth, etc.)

User groups: experienced users (policymakers, health professionals) and individual stakeholders (patients, activists, voters)

Primary Use Case: Infographics as IE Views

http://bit.ly/fu04zf

© 2011 Mediabistro

Are Germans really the happiest Twitter users by country, Tennesseans by U.S. state?
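A choropleth such as the thematic opinion map named in the goal shades each region by the class its value falls into. One simple classing scheme, assumed here for illustration and not taken from the talk, is equal-interval binning. The function `equal_interval_classes` and the region values in the usage line are hypothetical.

```python
def equal_interval_classes(values, k):
    """Map each region to one of k equal-width classes (0 = lowest) for shading."""
    lo, hi = min(values.values()), max(values.values())
    width = (hi - lo) / k or 1.0  # all values equal: put every region in class 0
    return {
        region: min(int((v - lo) / width), k - 1)  # the maximum lands in class k - 1
        for region, v in values.items()
    }


# Hypothetical per-region scores, e.g. mean opinion polarity per state.
print(equal_interval_classes({"A": 0.0, "B": 50.0, "C": 100.0}, k=4))
```

Equal intervals are easy to read but can leave most regions in one class when values are skewed; quantile or natural-breaks classing are common alternatives, and the choice can change how "happiest" regions appear on the map.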

