Scalable Top-k Spatio-Temporal Term Querying

Date post:	24-Feb-2016

Anders Skovsgaard, Darius Šidlauskas, Christian S. Jensen (Aarhus University). PowerPoint presentation.



Transcript


Introduction | Adaptive Frequent Item Aggregator (AFIA) | Empirical Study

Introduction

Motivation
What’s the “talk of the town” at a specific time interval? The top-k most popular terms.

Scalability
We seek a solution that is capable of supporting the entire world, a long history of content, and a stream with much higher rates than what Twitter currently sees (5,000 tweets/sec).

Problem Definition

– A set of spatio-temporal objects. Each object has:
  • a point location (latitude and longitude),
  • a text document (a set of terms),
  • a timestamp.

– A score of a term for a set of objects.

Input: a top-k scored terms query, consisting of:
  • the number k of top terms to return,
  • a rectangular range,
  • a time interval.

Output:
  • the top-k scored terms from the objects that fall within the query's range and time interval,
  • an integer guaranteeing that this many of the first returned terms are the ones with the highest scores (the remaining terms are approximate).
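The problem definition can be made concrete with a brute-force sketch that scans every object and counts terms exactly. It is only an illustration of the query semantics, not the poster's method. The names `STObject`, `TopKQuery`, and `exact_top_k` are hypothetical, and the score is assumed here to be the number of matching objects whose document contains the term, since the poster text does not spell out the score function.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class STObject:
    """Hypothetical spatio-temporal object: location, set of terms, timestamp."""
    lat: float
    lon: float
    terms: FrozenSet[str]
    timestamp: float


@dataclass(frozen=True)
class TopKQuery:
    """Hypothetical query: k, a rectangular range, and a time interval."""
    k: int
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    t_start: float
    t_end: float


def exact_top_k(objects: Iterable[STObject], q: TopKQuery) -> List[Tuple[str, int]]:
    """Return the k highest-scoring terms among objects inside the query.

    Assumption: score(term) = number of matching objects containing the term.
    """
    counts: Counter = Counter()
    for o in objects:
        in_range = q.lat_min <= o.lat <= q.lat_max and q.lon_min <= o.lon <= q.lon_max
        in_time = q.t_start <= o.timestamp <= q.t_end
        if in_range and in_time:
            counts.update(o.terms)  # each term counted once per object
    # Sort by descending count, then alphabetically, so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[: q.k]


# Made-up sample data for illustration only.
sample = [
    STObject(56.10, 10.20, frozenset({"storm", "flooding"}), 10.0),
    STObject(56.20, 10.10, frozenset({"storm"}), 20.0),
    STObject(40.70, -74.00, frozenset({"sandy"}), 15.0),             # outside the range
    STObject(56.15, 10.15, frozenset({"flooding", "storm"}), 99.0),  # outside the interval
]
query = TopKQuery(k=2, lat_min=56.0, lat_max=57.0, lon_min=10.0, lon_max=11.0,
                  t_start=0.0, t_end=50.0)
print(exact_top_k(sample, query))  # [('storm', 2), ('flooding', 1)]
```

Because this scan is exact, every returned term is guaranteed; the guarantee integer in the output only becomes interesting once counts are approximated.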

Adaptive Frequent Item Aggregator (AFIA)

• Dynamic summaries: extend SpaceSaving [1] to adjust dynamically to the incoming stream.
• Multiple spatio-temporal granularities (Figures 2 and 3).
• The top-k spatio-temporal query processor, including:
  ✓ Support for ad-hoc spatio-temporal ranges.
  ✓ Computing the integer that captures which part of the result is exact rather than approximate.
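For orientation, here is a minimal sketch of plain SpaceSaving [1], the summary that AFIA extends. It does not implement AFIA's dynamic adjustment or the multiple granularities. The class and method names are hypothetical, and two choices are assumptions of this sketch: ties for the minimum counter are broken in favor of the oldest counter, and a term counts as "guaranteed" when its lower bound (count minus error Δ) is at least the count of the term ranked just outside the top k.

```python
from typing import Dict, List, Tuple


class SpaceSaving:
    """Plain SpaceSaving with at most m counters (not AFIA's dynamic variant)."""

    def __init__(self, m: int) -> None:
        if m < 1:
            raise ValueError("m must be at least 1")
        self.m = m
        # term -> [count c, error delta]; dicts keep insertion order.
        self.counters: Dict[str, List[int]] = {}

    def offer(self, term: str) -> None:
        entry = self.counters.get(term)
        if entry is not None:
            entry[0] += 1                      # monitored term: increment
        elif len(self.counters) < self.m:
            self.counters[term] = [1, 0]       # free counter: exact count
        else:
            # Evict the minimum counter (oldest on ties) and inherit its count
            # as the new term's possible overestimation.
            victim = min(self.counters, key=lambda t: self.counters[t][0])
            floor = self.counters.pop(victim)[0]
            self.counters[term] = [floor + 1, floor]

    def ranked(self) -> List[Tuple[str, int, int]]:
        """Terms as (term, count, delta), highest count first."""
        rows = [(t, c, d) for t, (c, d) in self.counters.items()]
        return sorted(rows, key=lambda r: (-r[1], r[0]))

    def guaranteed_prefix(self, k: int) -> int:
        """Number of leading top-k terms guaranteed to be in the true top-k."""
        rows = self.ranked()
        if len(rows) > k:
            bound = rows[k][1]                 # count of the (k+1)-th term
        elif len(rows) == self.m:
            bound = rows[-1][1]                # unmonitored terms may reach the minimum
        else:
            bound = 0                          # summary not full: counts are exact
        guaranteed = 0
        for _, count, delta in rows[:k]:
            if count - delta < bound:
                break
            guaranteed += 1
        return guaranteed


# First eight terms of the stream shown in Fig. 1, with an assumed m = 4.
stream = ["evacuation", "evacuation", "sandy", "evacuation",
          "sandy", "storm", "hurricane", "flooding"]
ss = SpaceSaving(m=4)
for term in stream:
    ss.offer(term)
print(ss.ranked())
print(ss.guaranteed_prefix(2))  # 1
```

Under these assumptions the summary ends with evacuation (c=3, Δ=0), sandy (c=2, Δ=0), flooding (c=2, Δ=1), and hurricane (c=1, Δ=0), which matches the counter values listed for Fig. 1. For k=2 only the first term is guaranteed, because flooding's lower bound of 1 is below sandy's count of 2.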

Empirical Study

Data
All geo-tagged posts from Twitter’s Streaming API during May 2013. The total number of tweets is 110,426,053 (41 tweets/second on average).
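The stated average rate can be checked with a few lines of arithmetic. This sketch assumes the collection covered all 31 days of May without gaps; the constant and function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_IN_MAY = 31                 # assumption: the full month was collected
TOTAL_TWEETS = 110_426_053       # total reported on the poster


def average_rate(total: int, days: int) -> float:
    """Average number of items per second over the given number of days."""
    return total / (days * SECONDS_PER_DAY)


rate = average_rate(TOTAL_TWEETS, DAYS_IN_MAY)
print(f"{rate:.1f} tweets/second")  # 41.2 tweets/second
```

The result rounds to the 41 tweets/second quoted for the data set, far below the 5,000 tweets/second quoted for Twitter's full stream.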

Baselines
• SS: (approximate) frequent item aggregation using SpaceSaving [1].
• HT: (exact) frequent item counting using a hash table.

Conclusion
• AFIA’s throughput exceeds Twitter’s current average rate by a factor of 4–10.
• One month of dynamic summaries requires 120 GB of memory.
• The lowest observed accuracy was 97%.

References

[1] A. Metwally, D. Agrawal, and A. El Abbadi. Efficient computation of frequent and top-k elements in data streams. ICDT, 2005.


Anders Skovsgaard, Aarhus University
Darius Šidlauskas, Aarhus University
Christian S. Jensen, Aarhus University

MADALGO – Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation

Fig. 2: Multiple Spatial Granularities

Fig. 3: Multiple Temporal Granularities

Fig. 5: Accuracy at Different Spatio-Temporal Granularities

Fig. 6: Average Number of Counters Maintained at Different Spatio-Temporal Granularities

Fig. 7: Large Scale Stream Processing

Fig. 8: Top-k Query Processing

Fig. 4: Merging of Summaries

Fig. 1: Dynamic Summaries
[Figure content: an active summary and an archived summary, each marked with a targeted-k boundary. Counters shown: evacuation c=3, Δ=0; sandy c=2, Δ=0; flooding c=2, Δ=1; hurricane c=1, Δ=0; flooding c=1, Δ=0. Stream: evacuation, evacuation, sandy, evacuation, sandy, storm, hurricane, flooding, flooding.]
