
Scalable Top-k Spatio-Temporal Term Querying

Date post: 24-Feb-2016

Introduction · Adaptive Frequent Item Aggregator (AFIA) · Empirical Study

Motivation
What’s the “talk of the town” during a specific time interval?

The top-k most popular terms.

Scalability
We seek a solution that is capable of supporting the entire world, a long history of content, and a stream with much higher rates than what Twitter currently sees (5,000 tweets/sec).

Problem Definition

Data: a set of spatio-temporal objects. Each object has:
• a point location (latitude and longitude),
• a text document (a set of terms),
• a timestamp.

A scoring function gives the score of a term for a set of objects.

Input: a top-k scored terms query, consisting of:
• k, the number of top terms to return,
• a rectangular range,
• a time interval.

Output:
• the top-k scored terms from the objects that fall within the query's range and time interval,
• an integer guaranteeing that the first that-many terms in the result are exactly those with the highest scores (the remaining terms are approximate).
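The problem definition above can be illustrated with a brute-force evaluator. This is only a minimal sketch, not the poster's method: the names `STObject`, `Query`, and `exact_top_k` are hypothetical, and the score of a term is assumed to be the number of matching objects that contain it, since the poster text does not spell out the scoring function.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class STObject:
    """A spatio-temporal object: point location, set of terms, timestamp."""
    lat: float
    lon: float
    terms: frozenset
    timestamp: float


@dataclass(frozen=True)
class Query:
    """A top-k query: k, a rectangular range, and a time interval."""
    k: int
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float
    t_start: float
    t_end: float


def matches(obj, q):
    """True if the object lies inside the query rectangle and time interval."""
    return (q.min_lat <= obj.lat <= q.max_lat
            and q.min_lon <= obj.lon <= q.max_lon
            and q.t_start <= obj.timestamp <= q.t_end)


def exact_top_k(objects, q):
    """Scan all objects and return the k highest-scored (term, score) pairs.

    Assumption: score = number of matching objects containing the term.
    """
    counts = Counter()
    for obj in objects:
        if matches(obj, q):
            counts.update(obj.terms)
    # Sort by descending score, then alphabetically for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:q.k]
```

A full scan like this is exact but touches every object for every query, which is the cost that pre-aggregated summaries are meant to avoid.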

Adaptive Frequent Item Aggregator (AFIA)

• Dynamic summaries: extend SpaceSaving [1] to adjust dynamically to the incoming stream.
• Multiple spatio-temporal granularities (Figures 2 and 3).
• A top-k spatio-temporal query processor, including:
   – support for ad-hoc spatio-temporal ranges,
   – computing the integer that captures which part of the result is exact rather than approximate.
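For readers unfamiliar with SpaceSaving [1], the building block that the dynamic summaries extend, the following is a minimal sketch of the basic algorithm only. It does not include AFIA's dynamic resizing, archiving, or multiple granularities. The class name `SpaceSaving` and its methods are hypothetical, and the linear scan for the minimum counter is a simplification of the linked "stream summary" structure used in [1].

```python
class SpaceSaving:
    """Basic SpaceSaving summary: at most `capacity` monitored terms.

    Each monitored term keeps a count c and a maximum overestimation
    error delta (written as c and Δ in the poster's Fig. 1).
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.counters = {}  # term -> [count, error]

    def offer(self, term):
        """Process one occurrence of `term` from the stream."""
        entry = self.counters.get(term)
        if entry is not None:
            entry[0] += 1                      # already monitored
        elif len(self.counters) < self.capacity:
            self.counters[term] = [1, 0]       # free slot available
        else:
            # Evict a term with the minimum count; the newcomer inherits
            # that count as its possible overestimation error.
            victim = min(self.counters, key=lambda t: self.counters[t][0])
            min_count = self.counters.pop(victim)[0]
            self.counters[term] = [min_count + 1, min_count]

    def top(self, k):
        """Return up to k (term, count, error) triples, highest count first."""
        ranked = sorted(self.counters.items(),
                        key=lambda kv: (-kv[1][0], kv[0]))
        return [(t, c, d) for t, (c, d) in ranked[:k]]


summary = SpaceSaving(capacity=4)
for term in ["evacuation", "evacuation", "sandy", "evacuation",
             "sandy", "storm", "hurricane", "flooding"]:
    summary.offer(term)
print(summary.top(4))
```

With a capacity of four, the eighth term ("flooding") evicts a term with count 1 and enters with c=2, Δ=1, which agrees with the counter values listed under Fig. 1. A term's true frequency is at least c − Δ and at most c, which is what makes it possible to reason about which part of a result is exact.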

Empirical Study

Data
All geo-tagged posts from Twitter’s Streaming API during May 2013. The total number of tweets is 110,426,053 (an average of 41 tweets/second).
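The quoted average rate follows directly from the tweet total and the length of the month. A quick check, assuming the collection covered all 31 days of May:

```python
TOTAL_TWEETS = 110_426_053
SECONDS_IN_MAY = 31 * 24 * 60 * 60  # assumption: the full month was covered

average_rate = TOTAL_TWEETS / SECONDS_IN_MAY
print(f"{average_rate:.1f} tweets/second")  # prints "41.2 tweets/second"
```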

Baselines
• SS: (approximate) frequent item aggregation using SpaceSaving [1].
• HT: (exact) frequent item counting using a hash table.
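An exact hash-table count such as HT can serve as ground truth when judging an approximate result. The sketch below is an illustration under stated assumptions: the poster text does not define its accuracy measure, so the overlap fraction used here is only one plausible choice, and the function names and the `approx` list are hypothetical.

```python
from collections import Counter


def exact_top_k_terms(stream, k):
    """Exact counting with a hash table: one counter per distinct term."""
    counts = Counter(stream)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def top_k_overlap(approx_terms, exact_terms):
    """Fraction of the exact top-k terms that the approximate result found.

    Assumption: this overlap measure is illustrative only.
    """
    if not exact_terms:
        return 1.0
    return len(set(approx_terms) & set(exact_terms)) / len(exact_terms)


stream = ["evacuation", "evacuation", "sandy", "evacuation", "sandy",
          "storm", "hurricane", "flooding", "flooding"]
exact = exact_top_k_terms(stream, 3)
approx = ["evacuation", "sandy", "hurricane"]  # made-up approximate answer
print(exact, top_k_overlap(approx, exact))
```

The hash table needs one counter per distinct term, so its memory grows with the vocabulary, whereas a SpaceSaving summary is bounded by its capacity.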

Conclusion
• AFIA’s throughput exceeds Twitter’s current average rate by a factor of 4–10.
• One month of dynamic summaries requires 120 GB of memory.
• The lowest observed accuracy was 97%.

References
[1] A. Metwally, D. Agrawal, and A. El Abbadi. Efficient computation of frequent and top-k elements in data streams. In ICDT, 2005.

Anders Skovsgaard, Darius Šidlauskas, Christian S. Jensen
Aarhus University
MADALGO – Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation

Fig. 1: Dynamic Summaries. Stream: evacuation, evacuation, sandy, evacuation, sandy, storm, hurricane, flooding, flooding. Archived Summary: evacuation (c=3, Δ=0), sandy (c=2, Δ=0), flooding (c=2, Δ=1), hurricane (c=1, Δ=0). Active Summary: flooding (c=1, Δ=0). Figure labels: m, targeted-k.

Fig. 2: Multiple Spatial Granularities

Fig. 3: Multiple Temporal Granularities

Fig. 4: Merging of Summaries

Fig. 5: Accuracy at Different Spatio-Temporal Granularities

Fig. 6: Average Number of Counters Maintained at Different Spatio-Temporal Granularities

Fig. 7: Large Scale Stream Processing

Fig. 8: Top-k Query Processing
