Computing & Information Sciences, Kansas State University
University of Kansas Center for Business Analytics Research Seminar
Laboratory for Knowledge Discovery in Databases
University of Kansas CBAR
Wednesday, 04 September 2013
William H. Hsu
Laboratory for Knowledge Discovery in Databases, Kansas State University
http://www.kddresearch.org
Acknowledgements
Kansas State: Wesam Elshamy, Ming Yang, Surya Teja Kallumadi, Majed Alsadhan
Illinois: Chengxiang Zhai, Jiawei Han, Kevin Chang, Dan Roth
iQGateway: Praveen Koduru, Krishna Kumar Vallyatodi
Dynamic Topic Modeling for Spatiotemporal Event Extraction: Probabilistic Approaches and the Dim Sum Process
Based on
NLP Group NER Toolkit © 2005-2010 Stanford University
Simile © 2003-2010 Massachusetts Institute of Technology
Google Maps © 2007-2010 Tele Atlas, Inc. and Google, Inc.
Motivation: Thematic Mapping [1]: Summarizing News from the Web
http://fingolfin.user.cis.ksu.edu/timemap2gs
http://healthmap.org
© 2006 – 2013 Brownstein, J. & Freifeld, C.
Motivation: Thematic Mapping [2]: HealthMap
http://healthmap.org
© 2006 – 2013 Brownstein, J. & Freifeld, C.
Motivation: Thematic Mapping [3]: HealthMap
© 2011 – 2012 TextMap.org
Motivation: Thematic Mapping [4]: TextMap & Topic Models
Volkova, S., Caragea, D., Hsu, W. H., Drouhard, J., & Fowles, L. (2010). Boosting Biomedical Entity Extraction by using Syntactic Patterns for Semantic Relation Discovery. Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2010).
See also: Volkova, S. (2010). Entity Extraction, Animal Disease-related Event Recognition and Classification from Web. M.S. thesis, Kansas State University.
Motivation: Thematic Mapping [5]: Existing Systems & Limitations
Simultaneous Topic Enumeration & Formation
Topic Modeling: Static (Atemporal) to Dynamic
Continuous Time vs. Variable Number of Topics
Dim Sum Process for Hybrid STEF
Dynamic Topic Modeling Test Bed
News Monitoring: Geotagging & Timelines
Recent Results
STEF & Heterogeneous Info Network Analysis
Outline
Timeline Formation: General Task Illustrated
Elshamy (2012)
Simultaneous Topic Enumeration & Formation (STEF)
Adapted from Elshamy (2012)
Time t: 3 extant topics
Time t + k: 2 extant topics
Topic Modeling [1]: Basic Task (Static)
Elshamy (2012)
Topic Modeling [2]: Understanding Plate Notation
Adapted from Elshamy (2012)
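A plate diagram can be read as nested loops over a generative process. The sketch below assumes the diagram on this slide is the standard latent Dirichlet allocation (LDA) model, which the extracted text does not confirm. The vocabulary, the two topics, and all function names are hypothetical and chosen only for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(num_docs, doc_len, alpha, topics, rng):
    """Generate documents as lists of (topic index, word index) pairs."""
    corpus = []
    for _ in range(num_docs):                        # outer plate: documents
        theta = sample_dirichlet(alpha, rng)         # per-document topic proportions
        doc = []
        for _ in range(doc_len):                     # inner plate: word positions
            z = sample_categorical(theta, rng)       # latent topic assignment
            w = sample_categorical(topics[z], rng)   # observed word
            doc.append((z, w))
        corpus.append(doc)
    return corpus

# Hypothetical toy vocabulary and topic-word distributions (assumptions).
vocab = ["outbreak", "cattle", "virus", "election", "vote", "senate"]
topics = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],
]
rng = random.Random(0)
corpus = generate_corpus(num_docs=3, doc_len=8, alpha=[0.5, 0.5], topics=topics, rng=rng)
for doc in corpus:
    print(" ".join(vocab[w] for _, w in doc))
```

Each plate in the diagram corresponds to one `for` loop, and each shaded (observed) node corresponds to the word index `w`; inference runs this process in reverse to recover `theta` and `z` from the words alone.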
Topic Modeling [3]: Hyperparameters (Another Model)
Adapted from Elshamy (2012)
Continuous Time vs. Variable Number of Topics
Elshamy (2012)
State of the Field
Goal
Events from Text: Markov Model for Topic Detection & Tracking
Adapted from Elshamy (2012)
Continuous-Time Dynamic Topic Model (cDTM)
Elshamy (2012)
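In a continuous-time dynamic topic model, a topic's natural parameters drift by Brownian motion, so the variance of the change between two observations scales with the time elapsed between them. The sketch below illustrates only that prior for a single topic over irregular timestamps; it is not the inference procedure, and the function names, timestamps, and variance value are assumptions.

```python
import math
import random

def softmax(values):
    """Map natural parameters to a probability distribution over words."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def evolve_topic(beta0, timestamps, variance, rng):
    """Return one word distribution per timestamp for a drifting topic."""
    beta = list(beta0)
    previous = timestamps[0]
    trajectory = [softmax(beta)]
    for t in timestamps[1:]:
        gap = t - previous                    # elapsed time since last observation
        scale = math.sqrt(variance * gap)     # standard deviation grows with sqrt(gap)
        beta = [b + rng.gauss(0.0, scale) for b in beta]
        trajectory.append(softmax(beta))
        previous = t
    return trajectory

# Hypothetical values: four words, irregularly spaced document times.
beta0 = [1.0, 0.5, 0.0, -0.5]
timestamps = [0.0, 0.5, 3.0, 3.1]
trajectory = evolve_topic(beta0, timestamps, variance=0.2, rng=random.Random(0))
for t, dist in zip(timestamps, trajectory):
    print(t, [round(p, 3) for p in dist])
```

Because the step size depends on the gap rather than on a fixed epoch length, documents do not need to be binned into equal time slices, which is the contrast with discrete-time models drawn on the neighboring slides.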
Discrete Time Online Hierarchical Dirichlet Process (oHDP)
Elshamy (2012)
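Dirichlet-process models leave the number of topics open and let it grow with the data. A minimal sketch of that idea is the single-level Chinese restaurant process below. It illustrates the prior only; it is not the hierarchical model or the online inference used by oHDP, and the function name and parameter values are assumptions.

```python
import random

def chinese_restaurant_process(num_items, concentration, rng):
    """Assign items to clusters sequentially; return (assignments, cluster sizes)."""
    assignments = []
    sizes = []
    for n in range(num_items):
        # An existing cluster k is chosen with probability sizes[k] / (n + concentration);
        # a new cluster is opened with probability concentration / (n + concentration).
        weights = sizes + [concentration]
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if k == len(sizes):
            sizes.append(1)       # open a new cluster (a new topic)
        else:
            sizes[k] += 1         # join an existing cluster
        assignments.append(k)
    return assignments, sizes

assignments, sizes = chinese_restaurant_process(
    num_items=30, concentration=1.5, rng=random.Random(0)
)
print("clusters used:", len(sizes))
print("cluster sizes:", sizes)
```

The number of clusters is an outcome of the process rather than an input; a larger `concentration` tends to produce more clusters.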
Continuous-Time Infinite Dynamic Topic Model (CIDTM)
Elshamy (2012)
http://healthmap.org
© 2006 – 2013 Brownstein, J. & Freifeld, C.
HealthMap Redux: Thematic Mapping, Health Informatics, & Epidemiology
Thematic Mapping Tasks [1]: Entities
Example: CNN, 2007 Foot-and-Mouth Disease (http://bit.ly/3gof6o)
Tests have confirmed a second foot-and-mouth outbreak in southern England, the government announced, raising fears that the highly contagious animal virus is spreading.
Chief Veterinary Officer Debby Reynolds said Tuesday that tests showed a herd of cattle had been infected.
The animals were culled Monday evening after showing signs of the disease.
Update Summarization: A second foot-and-mouth disease infection in a herd of cattle in southern England was responded to by culling on Monday evening and announced by Debby Reynolds on Tuesday. (Second since earlier report – hence “update”.)
Compare: Recognizing Textual Entailment
A foot-and-mouth disease infection was reported the day after culling. (True.)
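The CNN example above lists the announcement before the culling, although the culling happened first. A minimal sketch of putting such sentences into event order, and of checking the entailment hypothesis, is given below. It assumes that weekday names are the only temporal cue and that both events fall in the same week; a real system would use a temporal tagger, and the function name is hypothetical.

```python
import re

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
WEEKDAY_PATTERN = re.compile(r"\b(" + "|".join(WEEKDAYS) + r")\b")

def order_by_weekday(sentences):
    """Return (weekday, sentence) pairs sorted by the weekday's position in the week."""
    tagged = []
    for sentence in sentences:
        match = WEEKDAY_PATTERN.search(sentence)
        if match:                                   # skip sentences with no weekday
            tagged.append((match.group(1), sentence))
    return sorted(tagged, key=lambda pair: WEEKDAYS.index(pair[0]))

sentences = [
    "Chief Veterinary Officer Debby Reynolds said Tuesday that tests showed "
    "a herd of cattle had been infected.",
    "The animals were culled Monday evening after showing signs of the disease.",
]
timeline = order_by_weekday(sentences)
for day, sentence in timeline:
    print(day, "|", sentence)

# Entailment check for "reported the day after culling":
# the two weekday positions must differ by exactly one.
days = [WEEKDAYS.index(day) for day, _ in timeline]
reported_day_after_culling = days[1] - days[0] == 1
print(reported_day_after_culling)
```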
Thematic Mapping Tasks [2]: Aspects
© 2008 C. Zhai
University of Illinois
http://sifaka.cs.uiuc.edu/ir/
• Current off-the-shelf applications run into ambiguity problems
Thematic Mapping Tasks [3]: Location & Disambiguation
© 2008 W. Elshamy
Search phrase: “smallpox”
© 2007 – 2009 Google, Inc.
Thematic Mapping Tasks [4]: Time & Timelines
Thematic Mapping Tasks [5]: Timeline Reconstruction
Murphy, Hsu, Elshamy, Kallumadi, & Volkova (2012)
Recent Results [1]: Meth Lab Mapping
Hsu, Abduljabbar, Osuga, Lu, & Elshamy (2012)
Recent Results [2]: Visual Analytics
Hsu, Abduljabbar, Osuga, Lu, & Elshamy (2012)
[Figure: bar chart by Kansas county (Edwards, Stevens, Hodgeman, Norton, Ness, Pottawatomie, Finney, Douglas, Montgomery); value axis from 0 to 450]
Recent Results [3]: Topic Proportions
Hsu, Abduljabbar, Osuga, Lu, & Elshamy (2012)
Sentiment Analysis Tasks: Polarity
http://dslreports.com
© 1999 – 2012 dslreports.com
Aggregation & OLAP: Wikipedia Infobox as Fact Table
Infobox: Albert Einstein
© 2001 – 2010 Wikimedia Foundation
Q: Where can this information be found?
A: It depends…
How much formatting does source page have?
Marked up? (Machine-readable?)
Semantically rich markup?
Albert Einstein © 2001 – 2010 Wikimedia Foundation
Opinion Mapping Example [1]: Health Blogs on Chronic Disease
Opinion Mapping Example [2]: New Entities & Relationships
Opinion Mapping Example [3]: Polarity
http://twitrratr.com/search/EuroHCIR
© 2012 Twitrratr
Opinion Mapping Example [4]: Aims & Approach
Aim 1 – Extend Algorithms to Detect New:
Entities: Diseases, Treatments, Complications
Relationships: Adverse Reactions, Controversies
Aim 2 – Domain-Specific Ontology
Symptoms, Disease Attributes
Treatments, Complications
Comparisons
Aim 3 – Better Recognition of Scope, Polarity
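Aim 3 concerns the scope of a negation: which words a negator such as "not" applies to. The sketch below is a toy lexicon-based scorer that flips polarity for a fixed window of tokens after a negator. The word lists, the window size, and the function name are all assumptions made for illustration; this is not the approach used in the work described here.

```python
# Hypothetical toy lexicons (assumptions, not a real sentiment resource).
POSITIVE = {"good", "better", "effective", "helpful"}
NEGATIVE = {"bad", "worse", "painful", "ineffective"}
NEGATORS = {"not", "no", "never"}

def polarity(text, scope=2):
    """Sum +1 per positive and -1 per negative word, flipping the sign of
    words that fall within `scope` tokens after a negator."""
    tokens = [tok.strip(".,!?;:").lower() for tok in text.split()]
    score = 0
    flip_remaining = 0
    for tok in tokens:
        if tok in NEGATORS:
            flip_remaining = scope      # open a new negation window
            continue
        value = (tok in POSITIVE) - (tok in NEGATIVE)
        if flip_remaining > 0:
            value = -value              # inside the negation scope
            flip_remaining -= 1
        score += value
    return score

print(polarity("This treatment is effective."))
print(polarity("This treatment is not effective."))
print(polarity("The new drug was not very helpful, but the old one was bad."))
```

A fixed window is the crudest possible model of scope: it mishandles long-distance negation and contrastive clauses, which is the kind of error that better scope recognition would aim to reduce.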
User Groups: Goals & Primary Use Cases
Goal: Thematic Opinion Map (Choropleth, etc.)
User Groups
Experienced: policymakers, health professionals
Individual stakeholders: patients, activists, voters
Primary Use Case: Infographics as IE Views
http://bit.ly/fu04zf
© 2011 Mediabistro
Are Germans really the happiest Twitter users by country, Tennesseans by U.S. state?