Date post: 27-Jun-2020

Spatio-Textual Analytics

Mohsin Iqbal¹,², Torben Bach Pedersen¹, Esteban Zimányi², Matteo Lissandrini¹

¹ Aalborg University, Aalborg

² Université Libre de Bruxelles, Brussels

Background & Motivation

Spatio-Textual Cube

Objectives

Limitations of existing spatio-textual analytics:

• Lacks a formalization and definition of the spatio-textual cube

• No support for OLAP over spatio-textual data

• No framework for interactively exploring regions for major activities, events and discussion topics

Hypothesis

Analyzing structured and unstructured data together produces better, more detailed and more powerful insights than analyzing either kind of data alone.

Defining and Formalizing a Spatio-Textual Cube

• Defining dimensions and hierarchies for spatial and textual data

• Spatio-Textual Measures

• Spatio-Textual OLAP Operations

Building a Region Exploration Framework

• Supports analysis of spatio-textual data combined with traditional data

• Links external resources

• Allows comparing regions and finding similar ones

• Provides a mechanism for ranking similar regions

• Provides magnifying-glass-like functionality

User Engagement & Opinions

• Deeper Customer Understanding

• Smarter & Targeted Market Campaigns

• Informed Product Opportunities

• Affective Offerings

Spatio-Textual Operations & Experiments

An $n$-dimensional ST-Cube schema $CS_{stc}$ is a tuple $CS_{stc} = (D, M, F)$, with a set of dimensions $D = \{d_{time}, d_{location}, d_{text}, d_4, \ldots, d_n\}$, a set of measures $M = \{m_1, m_2, m_3, \ldots, m_k\}$, and a fact type $F$.
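To make the tuple concrete, the following Python sketch encodes $CS_{stc} = (D, M, F)$ as an immutable record and checks that $d_{time}$, $d_{location}$ and $d_{text}$ are present, treating them as mandatory because the definition lists them explicitly. The class name `STCubeSchema`, the string names used for dimensions and measures, and the extra `user` dimension are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass

# d_time, d_location and d_text are treated here as mandatory in every ST-Cube schema.
MANDATORY_DIMENSIONS = ("time", "location", "text")


@dataclass(frozen=True)
class STCubeSchema:
    """Sketch of an n-dimensional ST-Cube schema CS_stc = (D, M, F)."""
    dimensions: tuple[str, ...]  # D = {d_time, d_location, d_text, d_4, ..., d_n}
    measures: tuple[str, ...]    # M = {m_1, m_2, ..., m_k}
    fact_type: str               # F

    def __post_init__(self) -> None:
        missing = [d for d in MANDATORY_DIMENSIONS if d not in self.dimensions]
        if missing:
            raise ValueError(f"schema lacks mandatory dimensions: {missing}")
        if len(set(self.dimensions)) != len(self.dimensions):
            raise ValueError("dimension names must be unique")

    @property
    def n(self) -> int:
        """Number of dimensions, i.e. the n of the n-dimensional cube."""
        return len(self.dimensions)


# Hypothetical 4-dimensional cube over tweets with an extra 'user' dimension (d_4).
schema = STCubeSchema(
    dimensions=("time", "location", "text", "user"),
    measures=("top_k_keywords", "keyword_density"),
    fact_type="Tweet",
)
print(schema.n)  # 4
```

Further dimensions $d_5, \ldots, d_n$ are simply additional entries in `dimensions`.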

Spatio-Textual Dimensions

• Spatial Dimension

• Textual Dimension

Spatio-Textual Dimension Hierarchies

• τ → Day → Month → Quarter → Year

• τ → Second → Minute → Hour

• λ → City → Region → Country

• φ → Term → Theme → Topic → Concept
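A roll-up along these hierarchies can be sketched as follows, assuming that the location λ has already been resolved to a city name (for example by reverse geocoding) and that φ is already split into terms. The time levels are computed directly from τ; the location and text levels use child-to-parent lookup tables whose tiny contents here are hypothetical placeholders for GeoNames- and WordNet-derived data, and the function names are illustrative only.

```python
from datetime import datetime

# Time levels are derived directly from the timestamp tau, so no lookup tables are needed.
TIME_LEVELS = {
    "second": lambda t: t.strftime("%Y-%m-%d %H:%M:%S"),
    "minute": lambda t: t.strftime("%Y-%m-%d %H:%M"),
    "hour": lambda t: t.strftime("%Y-%m-%d %H"),
    "day": lambda t: t.strftime("%Y-%m-%d"),
    "month": lambda t: t.strftime("%Y-%m"),
    "quarter": lambda t: f"{t.year}-Q{(t.month - 1) // 3 + 1}",
    "year": lambda t: str(t.year),
}

# Hypothetical child -> parent tables: City -> Region -> Country and Term -> Theme -> Topic -> Concept.
LOCATION_PARENT = {"Aalborg": "North Denmark Region", "North Denmark Region": "Denmark"}
TEXT_PARENT = {"goal": "football", "football": "sports", "sports": "recreation"}


def roll_up_time(tau: datetime, level: str) -> str:
    """Map a timestamp to its member at the requested time level."""
    return TIME_LEVELS[level](tau)


def roll_up(member: str, parent: dict, steps: int) -> str:
    """Climb `steps` levels in a child -> parent hierarchy."""
    for _ in range(steps):
        member = parent[member]
    return member


tau = datetime(2020, 6, 27, 14, 35, 9)
print(roll_up_time(tau, "quarter"))            # 2020-Q2
print(roll_up("Aalborg", LOCATION_PARENT, 2))  # Denmark
print(roll_up("goal", TEXT_PARENT, 3))         # recreation
```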

Spatio-Textual Measures

• Top-k keywords discussed within a geographical region

• Keyword density in a region

• Most frequent keywords in an area defined by a polygon
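The three measures can be sketched over a list of posts, each given as planar `(x, y)` coordinates plus a keyword list. The sketch represents the region as a polygon, uses the ray-casting point-in-polygon test and the shoelace area formula, and assumes that keyword density means keyword occurrences per unit of polygon area; that definition, the function names and the sample data are assumptions for illustration. Real longitude/latitude data would need a projection or a geodesic area computation.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray casting: True if (x, y) lies inside the polygon given as a list of vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_area(polygon):
    """Shoelace formula for a simple polygon in planar coordinates."""
    s = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


def keyword_counts(posts, polygon):
    """Count keyword occurrences over the posts located inside the polygon."""
    counts = Counter()
    for x, y, keywords in posts:
        if point_in_polygon(x, y, polygon):
            counts.update(keywords)
    return counts


def top_k_keywords(posts, polygon, k):
    """Most frequent k keywords in the polygon."""
    return [kw for kw, _ in keyword_counts(posts, polygon).most_common(k)]


def keyword_density(posts, polygon, keyword):
    """Occurrences of `keyword` per unit of polygon area (assumed definition)."""
    return keyword_counts(posts, polygon)[keyword] / polygon_area(polygon)


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
posts = [
    (0.5, 0.5, ["pizza", "concert"]),
    (1.5, 1.0, ["pizza"]),
    (1.0, 1.5, ["concert", "pizza"]),
    (3.0, 3.0, ["pizza", "rain"]),  # outside the square
]
print(top_k_keywords(posts, square, 1))         # ['pizza']
print(keyword_density(posts, square, "pizza"))  # 0.75
```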

A Spatio-Textual Object $Obj_{st}$ is formally defined as $Obj_{st} = \langle \tau, \lambda, \phi \rangle$, where τ, λ and φ represent the timestamp, location and textual components, respectively.
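A direct encoding of $Obj_{st} = \langle \tau, \lambda, \phi \rangle$ as a Python record is sketched below. Because `lambda` is a reserved word in Python, the location field is named `lam`. The `(longitude, latitude)` layout of λ, the representation of φ as a tuple of lowercase tokens, and the helper `from_post` with its naive whitespace tokenization are assumptions, not details given on the poster.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class STObject:
    """Spatio-textual object Obj_st = <tau, lambda, phi>."""
    tau: datetime             # timestamp component
    lam: tuple[float, float]  # location component, assumed (longitude, latitude)
    phi: tuple[str, ...]      # textual component, assumed already tokenized

    @classmethod
    def from_post(cls, tau: datetime, lon: float, lat: float, text: str) -> "STObject":
        """Build an object from a raw post with naive lowercase whitespace tokenization."""
        return cls(tau, (lon, lat), tuple(text.lower().split()))


# Hypothetical post near Aalborg (coordinates approximate).
obj = STObject.from_post(datetime(2020, 6, 27, 14, 35), 9.92, 57.05, "Concert at the harbour")
print(obj.phi)  # ('concert', 'at', 'the', 'harbour')
```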

[Figure: hierarchy diagram with top level T (Top)]

Future Work - Region Exploration

• Grid-Based Hierarchy

• Semantic-Based Hierarchy

• Individual-Based Hierarchy

• Majority-Based Hierarchy

• Importance-Based Hierarchy

• Custom Hierarchy

• Exploring a geographical region and finding similar regions

• A magnifying-glass-like system to identify the general trends, events and popular discussion topics of a region of interest

• Parameter selection to capture the characteristics of any region

• Comparison of different regions (e.g., for future business opportunities)

• A ranking function for similar-region queries

• Evaluation of the proposed techniques using real-world data and use cases
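The ranking function for similar-region queries is left open here. One common choice, used below purely as an illustrative stand-in, is to describe each region by its keyword-frequency vector and rank candidate regions by cosine similarity to the query region. The region names, keyword profiles and function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse keyword-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_similar_regions(query: str, profiles: dict, k: int) -> list:
    """Return the k regions most similar to `query`, best first, as (name, score) pairs."""
    q = profiles[query]
    scored = [(name, cosine(q, prof)) for name, prof in profiles.items() if name != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical keyword profiles of three regions.
profiles = {
    "A": Counter({"beach": 5, "surf": 3, "pizza": 1}),
    "B": Counter({"beach": 4, "surf": 2}),
    "C": Counter({"museum": 6, "pizza": 2}),
}
print([name for name, _ in rank_similar_regions("A", profiles, 2)])  # ['B', 'C']
```

Cosine similarity ignores the absolute volume of posts, so a small and a large region with the same topic mix still rank as similar; whether that is desirable depends on the use case.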

• Aggregation Operations combine more than one spatio-textual object: $O_{agg}(obj_{st_1}, obj_{st_2}, \ldots, obj_{st_n}) \rightarrow obj_{st}'$, e.g., Top-k topics, union of regions (e.g., polygons) and most frequent keywords in a region

• Comparison Operations compare two or more spatio-textual objects for relevance: $O_{com}(obj_{st_1}, obj_{st_2}, \ldots, obj_{st_n}) \rightarrow \{true, false\}$

• Numeric Operations take more than one spatio-textual object and return a real value: $O_n(obj_{st_1}, \ldots, obj_{st_n}) \rightarrow \mathbb{R}$
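The three operation types can be sketched over a minimal object type. The concrete semantics are assumptions, not the authors' definitions: `o_agg` returns an aggregated object made of the covered time span, a bounding box (a crude stand-in for a true polygon union) and the top-k keywords; `o_n` returns the mean pairwise Jaccard similarity of the textual components; and `o_com` declares objects relevant when every pair reaches a hypothetical Jaccard threshold.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class STObject:
    tau: datetime             # timestamp
    lam: tuple[float, float]  # planar (x, y) location
    phi: tuple[str, ...]      # keywords


def jaccard(a, b) -> float:
    """Jaccard similarity of two keyword collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def o_agg(*objs: STObject, k: int = 2) -> dict:
    """Aggregation: combine several objects into one aggregated object."""
    xs = [o.lam[0] for o in objs]
    ys = [o.lam[1] for o in objs]
    counts = Counter(kw for o in objs for kw in o.phi)
    return {
        "tau": (min(o.tau for o in objs), max(o.tau for o in objs)),  # covered time span
        "lam": (min(xs), min(ys), max(xs), max(ys)),                    # bounding box
        "phi": tuple(kw for kw, _ in counts.most_common(k)),            # top-k keywords
    }


def o_n(*objs: STObject) -> float:
    """Numeric operation: mean pairwise Jaccard similarity of the textual components."""
    pairs = list(combinations(objs, 2))
    if not pairs:
        raise ValueError("need at least two objects")
    return sum(jaccard(a.phi, b.phi) for a, b in pairs) / len(pairs)


def o_com(*objs: STObject, threshold: float = 0.3) -> bool:
    """Comparison: True if every pair of objects is at least `threshold`-similar."""
    return all(jaccard(a.phi, b.phi) >= threshold for a, b in combinations(objs, 2))


o1 = STObject(datetime(2020, 6, 1, 10), (0.0, 0.0), ("pizza", "concert"))
o2 = STObject(datetime(2020, 6, 2, 12), (2.0, 1.0), ("pizza",))
o3 = STObject(datetime(2020, 6, 3, 9), (1.0, 3.0), ("concert", "pizza", "rain"))
print(o_agg(o1, o2, o3)["phi"])  # ('pizza', 'concert')
print(o_n(o1, o2))               # 0.5
print(o_com(o1, o2, o3))         # True
```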

• Extension of OnLine Analytical Processing (OLAP) operations to spatio-textual OLAP (STOLAP) using spatio-textual operations

• Preaggregation and materialization of spatio-textual measures and cubes (space-time trade-off) for efficient analysis

• Comparison of the proposed preaggregation and partial-materialization technique with full-materialization and no-materialization baselines

• Experimental evaluation using a real-world Twitter dataset (8.5M)

• ST-Cube modeling using a snowflake schema in MS SQL Server

• Spatial dimension implementation using the GeoNames dataset¹

• Textual dimension implementation using the WordNet² knowledge source

¹ http://download.geonames.org/export/dump/

² https://wordnet.princeton.edu
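A snowflake-schema ST-Cube with a partially materialized aggregate can be sketched in miniature with Python's built-in `sqlite3` module, used here only as a stand-in for MS SQL Server. The location dimension is normalized into city, region and country tables, and only the (city, day, term) level is preaggregated; a country-level roll-up computed from that preaggregate matches the one computed from the base facts. All table names, column names and rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE country (country_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE region  (region_id INTEGER PRIMARY KEY, name TEXT,
                      country_id INTEGER REFERENCES country(country_id));
CREATE TABLE city    (city_id INTEGER PRIMARY KEY, name TEXT,
                      region_id INTEGER REFERENCES region(region_id));
CREATE TABLE term    (term_id INTEGER PRIMARY KEY, term TEXT);
CREATE TABLE fact_post_term (post_id INTEGER, day TEXT,
                      city_id INTEGER REFERENCES city(city_id),
                      term_id INTEGER REFERENCES term(term_id));

INSERT INTO country VALUES (1, 'Denmark'), (2, 'Belgium');
INSERT INTO region  VALUES (1, 'North Denmark Region', 1), (2, 'Brussels-Capital Region', 2);
INSERT INTO city    VALUES (1, 'Aalborg', 1), (2, 'Brussels', 2);
INSERT INTO term    VALUES (1, 'concert'), (2, 'pizza');
INSERT INTO fact_post_term VALUES
  (1, '2020-06-27', 1, 1), (1, '2020-06-27', 1, 2),
  (2, '2020-06-27', 1, 1), (3, '2020-06-28', 2, 2);

-- Partial materialization: preaggregate only the (city, day, term) level.
CREATE TABLE agg_city_day_term AS
  SELECT city_id, day, term_id, COUNT(*) AS n
  FROM fact_post_term GROUP BY city_id, day, term_id;
""")

# Roll-up from city to country through the normalized (snowflaked) location tables.
# {source} and {measure} are filled only with the fixed strings below, never user input.
ROLLUP_TO_COUNTRY = """
SELECT co.name, t.term, SUM({measure}) AS n
FROM {source} f
JOIN city ci    ON ci.city_id = f.city_id
JOIN region r   ON r.region_id = ci.region_id
JOIN country co ON co.country_id = r.country_id
JOIN term t     ON t.term_id = f.term_id
GROUP BY co.name, t.term
ORDER BY co.name, t.term
"""
from_preagg = con.execute(ROLLUP_TO_COUNTRY.format(source="agg_city_day_term", measure="f.n")).fetchall()
from_base = con.execute(ROLLUP_TO_COUNTRY.format(source="fact_post_term", measure="1")).fetchall()
print(from_preagg)  # [('Belgium', 'pizza', 1), ('Denmark', 'concert', 2), ('Denmark', 'pizza', 1)]
```

Materializing one low level and deriving coarser levels from it on demand is one point on the space-time trade-off between full materialization and no materialization.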

Analyze customer response to a marketing campaign and compare its effect on sales in different regions

[Figure: ST-Cube architecture. Input data: Tweets, Check-Ins, Snaps & FB Posts, etc. Components: Cube Modeling, ST-Measures, Preaggregations & Cube Materialization, STOLAP Operations, ST-Cube Query Processing Engine, Performance Optimization. Example questions: Most suitable area for advertisement? Most successful advertising campaign?]
