Date post: 16-Dec-2015
Marc van Kreveld (and Giri Narasimhan) Department of Information and Computing Sciences Utrecht University

Are the people clustered in this room? How do we define a cluster?

In spatial data mining we have objects/entities with a location given by coordinates

Cluster definitions involve distance between locations. How do we define distance?

• Determine whether clustering occurs
• Determine the degree of clustering
• Determine the clusters
• Determine the largest cluster
• Determine the largest empty region
• Determine the outliers

Are the men clustered? Are the women clustered?

Is there a co-location of men and women?

Determine regions favored exclusively by women. Men? Loners? Couples? Families?

Determine empty regions.

As before, we may be interested in:
• whether there is co-location
• the degree of co-location
• the largest co-location
• the co-locations themselves
• the objects not involved in co-location
• regions with no (or little) co-location

Locations have a time stamp

Interesting patterns involve space and time

Anomalies?

Entities with a trajectory (time-stamped motion path)

Interesting patterns involve subgroups with similar heading, expected arrival, joint motion, ...

• n entities = trajectories; n = 10 – 100,000
• t time steps; t = 10 – 100,000
• input size is nt
• m size of subgroup (unknown); m = 10 – 100,000

• Tracked animals (buffalo, birds, ...)
• Tracked people (potential terrorists)
• Tracked GSMs (e.g. for traffic purposes)
• Trajectories of tornadoes
• Sports scene analysis (players on a soccer field)


What is the location visited by most entities?

location = circular region of specified radius

[Figures: the same question illustrated with two candidate locations, one visited by 4 entities and one visited by 3 entities]

• Compute the buffer of each trajectory
• Compute the arrangement of the buffers and the cover count of each cell

[Figure: arrangement of trajectory buffers, with cells labeled by cover counts 0, 1, and 2]

One trajectory has t time stamps; its buffer can be computed in O(t log t) time

All buffers can be computed in O(nt log t) time

The arrangement can be computed in O(nt log (nt) + k) time, where k = O((nt)^2) is the complexity of the arrangement

Cell cover counts are determined in O(k) time

Total: O(nt log (nt) + k) time

If the most visited location is visited by m entities, this is O(nt log (nt) + ntm)

Note: input size is nt; n entities, each with a location at t moments
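
A minimal sketch of the cover-count idea, not the arrangement algorithm itself: a point lies in the buffer of a trajectory exactly when its distance to the polygonal line is at most the radius, so the cover count of a cell can be read off at any point inside it. The function names and the finite list of candidate locations (used here instead of the full arrangement) are assumptions made for illustration; every trajectory is assumed to have at least two vertices.

```python
import math

def dist_point_segment(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the closest point stays on the segment.
    u = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))

def cover_count(center, trajectories, r):
    """Number of trajectories whose buffer of radius r contains center."""
    return sum(
        any(dist_point_segment(center, a, b) <= r for a, b in zip(traj, traj[1:]))
        for traj in trajectories
    )

def most_visited(trajectories, r, candidates):
    """Candidate location covered by the most trajectory buffers."""
    return max(candidates, key=lambda c: cover_count(c, trajectories, r))
```

Evaluating only a grid of candidates can miss small cells of the arrangement, which is why the exact method computes the arrangement of the buffers.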

Spatial data
• n points (locations)
• Distance is important: clustering pattern
• Presence of attributes (e.g. man/woman): co-location patterns

Spatio-temporal data
• n trajectories, each has t time steps
• Distance is time-dependent: flock pattern, meet pattern
• Heading and speed are important and are also time-dependent

Also co-location pattern
Discovered simply by overlay

E.g., occurrences of oaks on different soil types


What if it is known that the entities only occur in regions of a certain type?

[Figures: bird nests with the radius of a cluster, in the situation without subdivision and in the situation with a land-water subdivision]

[Figure: burglary locations; labels: house, car]

Determine clusters in point sets that are sensitive to the geographic context (at least, for the relevant aspects)

Assume that a set of regions is given and that points can only lie inside these regions. How should we define clusters?

Joint research with Joachim Gudmundsson (NICTA, Sydney) and Giri Narasimhan (U of F, Miami), 2006

Given a set P of points, a set F of regions, a radius r and a subset size m, a region-restricted cluster is a subset P' ⊆ P inside a circle C where
• P' has size at least m
• C has radius at most 2r
• C contains at most πr^2 area of regions of F

[Figure: a circle of radius ≤ 2r whose summed region area is ≤ πr^2, next to a circle of radius r]

Given a set P of n points, a set F of polygons with n_f edges in total, and values for r and m, report all region-restricted clusters of exactly m points

• Exactly m points?
• “Real” clustering (partition)?
• Outliers?

Exactly m points? Every cluster with more than m points consists of clusters with m points with smaller circles

[Figure: example with m = 5]


1. Determine all smallest circles with m points of P inside

2. Test if the radius is ≤ r (report) or > 2r (discard)

3. If the radius is in between, determine the area of regions of F inside
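
The three steps can be sketched as a small decision function. This assumes the area bound of the cluster definition is πr^2, the area of a disk of radius r, which is what lets a circle of radius at most r be reported without an area computation. The name `classify_circle` and the callable `area_of_F_inside` are hypothetical; computing that area is the expensive part and is not shown here.

```python
import math

def classify_circle(radius, r, area_of_F_inside):
    """Decide what to do with one smallest circle around m points.

    area_of_F_inside is a zero-argument callable (hypothetical) returning
    the area of the regions of F inside the circle. It is only called
    when the radius alone does not settle the question.
    """
    if radius <= r:
        return "report"    # circle area <= pi r^2, so the area bound holds
    if radius > 2 * r:
        return "discard"   # violates the radius bound
    # Radius in (r, 2r]: the area of F inside the circle decides.
    return "report" if area_of_F_inside() <= math.pi * r * r else "discard"
```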


1. Determine all minimal circles with m points of P inside

2. Determine all minimal circles with 3 points of P inside

[Figure: ordinary (order-1) Voronoi diagram]


1. Determine all smallest circles with m points of P inside

• Use (m-2)-th order Voronoi diagram: cells where the same (m-2) points are closest

• Its vertices are centers of smallest circles around exactly m points
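
For small inputs, candidate circles can also be enumerated by brute force. The sketch below is a stand-in for the higher-order Voronoi diagram, not an implementation of it: it tries every triple of points, takes the circle through them, and keeps the circles whose closed disk holds exactly m points. This takes O(n^4) time instead of O(nm log n). The function names and the tolerance `eps` are assumptions.

```python
import math
from itertools import combinations

def circumcircle(a, b, c):
    """Center and radius of the circle through a, b, c (None if collinear)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.hypot(ax - ux, ay - uy)

def circles_with_m_points(points, m, eps=1e-9):
    """All circumcircles of point triples whose closed disk holds exactly m points."""
    found = []
    for a, b, c in combinations(points, 3):
        circle = circumcircle(a, b, c)
        if circle is None:
            continue
        center, radius = circle
        inside = sum(math.dist(center, q) <= radius + eps for q in points)
        if inside == m:
            found.append((center, radius))
    return found
```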

[Figures: order-1 (ordinary), order-2, and order-3 Voronoi diagrams]

The m-th order Voronoi diagram (or (m-2)-th) has O(nm) cells, edges, and vertices

It can be constructed in O(nm log n) time

So we get O(nm) smallest circles with m points inside; for each we also know the radius


2. Test if the radius is ≤ r (report) or > 2r (discard)

Trivial in O(1) time per circle, so in O(nm) time overall

3. Determine the area of regions of F inside

Brute force: O(n_f) time per circle, so in O(n m n_f) time overall

Complication: This need not give all region-restricted clusters!
• Need to compute area of F inside a circle with moving center
• Requires solving high-degree polynomials

The anti-climax: we cannot give an exact algorithm!

If we take squares instead of circles, we can deal with the problem ...

3. Determine the area of regions of F inside

Brute force: O(n_f) time per square, so in O(n m n_f) time overall

The total time for steps 1, 2, and 3 is
O(nm log n) + O(nm) + O(n m n_f) = O(nm log n + n m n_f) time
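
The brute-force step for one square can be sketched as follows: clip every polygon of F against the four sides of an axis-aligned square (Sutherland-Hodgman clipping) and sum the areas with the shoelace formula, which touches each edge of F a constant number of times. The function names and the representation of F as lists of (x, y) vertices of simple, pairwise disjoint polygons are assumptions for illustration.

```python
def polygon_area(poly):
    """Absolute area of a simple polygon (shoelace formula)."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def clip_halfplane(poly, axis, bound, keep_less):
    """Clip poly to coordinate[axis] <= bound (or >= bound if not keep_less)."""
    def inside(p):
        return p[axis] <= bound if keep_less else p[axis] >= bound
    out = []
    for p, q in zip(poly, poly[1:] + poly[:1]):
        if inside(p):
            out.append(p)
        if inside(p) != inside(q):
            # Edge pq crosses the boundary line: add the crossing point.
            u = (bound - p[axis]) / (q[axis] - p[axis])
            out.append((p[0] + u * (q[0] - p[0]), p[1] + u * (q[1] - p[1])))
    return out

def area_in_square(polygons, cx, cy, half):
    """Total area of the polygons inside the square centered at (cx, cy)."""
    sides = ((0, cx + half, True), (0, cx - half, False),
             (1, cy + half, True), (1, cy - half, False))
    total = 0.0
    for poly in polygons:
        clipped = list(poly)
        for axis, bound, keep_less in sides:
            clipped = clip_halfplane(clipped, axis, bound, keep_less)
        total += polygon_area(clipped)
    return total
```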

3. Determine the area of regions of F inside

Using a suitable data structure (only possible for squares): O(log^2 n_f) time per square, so in O(nm log^2 n_f) time overall

The total time becomes O(nm log n + n_f log^2 n_f + nm log^2 n_f), where
• O(nm log n) is the order-(m-2) VD construction
• O(n_f log^2 n_f) is the preprocessing of the data structure
• O(nm log^2 n_f) is the total query time in the data structure

The squares solution generalizes to regular polygons (e.g. 20-gons)

An approximation of the radius within (1+ε)r gives an O(n/ε^2 + n_f log^2 n_f + n log n_f / (m ε^2)) time algorithm

[Figure: 16-gon]

Open problems:
• Develop a region-restricted version of k-means clustering, single link clustering, ...
• Region-restricted co-location?
• Replace region-restricted by a gradual model

[Figure: gradual model; typical: 0, 2, 5, and 8 per unit; clusters]

n trajectories, each with t time steps: n polygonal lines with t vertices

Already looked at most visited location


Patterns in trajectories

• Flock: near positions of (sub)trajectories for some subset of the entities during some time

• Convergence: same destination region for some subset of the entities

• Encounter: same destination region with same arrival time for some subset of the entities

• Similarity of trajectories

• Same direction of movement, leadership, ...

[Figure: examples of a flock and a convergence]

Patterns in trajectories

• Flocking, convergence, encounter patterns
– Laube, van Kreveld, Imfeld (SDH 2004)
– Gudmundsson, van Kreveld, Speckmann (ACM GIS 2004)
– Benkert, Gudmundsson, Huebner, Wolle (ESA 2006)
– ...

• Similarity of trajectories
– Vlachos, Kollios, Gunopulos (ICDE 2002)
– Shim, Chang (WAIM 2003)
– ...

• Lifelines, motion mining, modeling motion
– Mountain, Raper (GeoComputation 2001)
– Kollios, Sclaroff, Betke (DM&KD 2001)
– Frank (GISDATA 8, 2001)
– ...

Patterns in trajectories

• Flock: near positions of (sub)trajectories for some subset of the entities during some time
– clustering-type pattern
– different definitions are used

• Given: radius r, subset size m, and duration T, a flock is a subset of size m that is inside a (moving) circle of radius r for a duration T

Patterns in trajectories

• Longest flock: given a radius r and subset size m, determine the longest time interval for which m entities were within each other’s proximity (circle radius r)

[Figure: trajectories over time steps 0 to 8 with m = 3; longest flock in [1.8, 6.4]]

Patterns in trajectories

• Meet: near some position of (sub)trajectories for some subset of the entities
– clustering-type pattern

• Given: radius r, subset size m, and duration T, a meet is a subset of size m that is inside a (stationary) circle of radius r for a duration T (for a flock, the circle was “moving”)
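
A minimal sketch of the meet definition for one given stationary circle, under simplifying assumptions that are not part of the slides: positions are only checked at the sampled time steps, the subset inside the circle is allowed to change (variable subset), and the center is supplied by the caller rather than searched for. `traj[j][s]` is taken to be the position of entity j at time step s.

```python
import math

def longest_meet_at(center, traj, r, m):
    """Longest run of consecutive time steps during which at least m
    entities are within distance r of the fixed center."""
    steps = len(traj[0])
    best = run = 0
    for s in range(steps):
        inside = sum(math.dist(center, entity[s]) <= r for entity in traj)
        run = run + 1 if inside >= m else 0   # extend the run or restart it
        best = max(best, run)
    return best
```

With a fixed subset, the same m entities would have to stay inside for the whole run, which generally gives a shorter pattern.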


Patterns in trajectories

• The same subset required for a flock or meet?

Example: meet with m = 4; duration is 3+ time steps or 4+ time steps?

Patterns in trajectories

[Figure: examples for m = 3 of a flock and a meet, each with a fixed subset and with a variable subset]

Patterns in trajectories

Exact results (input size is nt)

         fixed subset                    variable subset
flock    NP-hard                         O(t n^3 log n)
meet     O(n^4 t^2 log n + n^2 t^3)      O(n^4 t^2 log n + n^2 t^3)

Patterns in trajectories

• A radius-2 approximation of the longest flock can be computed in time O(t n^2 log n)

... meaning: if the longest flock of size m for radius r has duration T, then we surely find a flock of size m and duration at least T for radius 2r

[Figure: the longest flock for r, and a flock for 2r that is at least as long]

Patterns in trajectories

Approximate radius results (input size is nt)

                fixed subset                                variable subset
flock   apx     O(t n^2 log n), factor 2                    O((t n^2 log n) / ε^2), factor 2+ε
        exact   NP-hard                                     O(t n^3 log n)
meet    apx     O((t n^2 log n) / (m ε^2)), factor 1+ε      O((t n^2 log n) / (m ε^2)), factor 1+ε
        exact   O(n^4 t^2 log n + n^2 t^3)                  O(n^4 t^2 log n + n^2 t^3)

Fixed subset flock

• It is NP-complete to decide if a graph has a subgraph with m nodes that is a clique

• For every node of the graph, make an entity with a trajectory

[Figure: a graph on nodes v1, ..., v7 and the corresponding trajectories, with distance r marked; annotation: all nodes not adjacent to v1 go here; v1 is not adjacent to v4, v5, and v7]

The trajectories have a fixed flock of size m and full duration if and only if the graph has a clique of size m

[Figure: the graph on v1, ..., v7 and its trajectories, with annotations "v4 not in flock" and "v4 in flock"; flock {v4, v5, v7} of (full) duration 23 (3·7+2) and size 3]

Fixed subset flock

• Longest fixed flock is NP-hard
• Max clique has no constant-factor approximation unless P = NP, so we cannot approximate the duration, nor the flock size
• The reduction applies for all radii < 2r

Flock and meet algorithms

• Go into 3D (space-time) for algorithms

[Figure: a flock and a meet in space-time, with the time axis running from 0 to 4]

Fixed subset flock, approximation

• An efficient radius-2 approximation algorithm for the longest fixed flock exists

• Idea: if some vi is in the longest flock, then all other entities of that flock are within distance 2r from vi

[Figure: column of radius 2r centered at vi, containing the flock with vi]

Fixed subset flock, approximation

• For each vj, we can determine the O(t) time intervals where vj is in the column of vi

• Maintain the intersections for all entities in an augmented tree in O(t n log n) time

• Do this for all columns (role of vi) and report the longest overall pattern

Total: O(t n^2 log n) time
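
The column idea can be illustrated with a discrete-time sketch. It is a simplification and not the algorithm above: it only looks at the sampled time steps, ignores the motion in between, and uses a quadratic scan over start and end steps instead of an augmented tree. `traj[j][s]` is assumed to be the position of entity j at step s; the function name is hypothetical.

```python
import math

def longest_fixed_flock_2r(traj, r, m):
    """Returns (steps, start, end, members): the longest run of consecutive
    time steps during which a fixed set of at least m entities stays within
    distance 2r of some center entity i (which is itself a member)."""
    n, steps = len(traj), len(traj[0])
    best = (0, None, None, ())
    for i in range(n):
        # near[j][s]: is entity j inside the column of radius 2r around i at step s?
        near = [[math.dist(traj[i][s], traj[j][s]) <= 2 * r for s in range(steps)]
                for j in range(n)]
        for start in range(steps):
            alive = set(range(n))   # entities inside the column at every step so far
            for end in range(start, steps):
                alive = {j for j in alive if near[j][end]}
                if len(alive) < m:
                    break
                if end - start + 1 > best[0]:
                    best = (end - start + 1, start, end, tuple(sorted(alive)))
    return best
```

Every set reported this way fits in a disk of radius 2r around the center entity, which is where the factor 2 in the radius comes from.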

Variable subset flock, exact

• The subset that forms the flock may change entities, but must stay of size m

• Any flock subset at any instant has a disk D of radius r with at least 2 entities on the boundary (the defining entities)

[Figure: disk of radius r with two defining entities on its boundary]


Variable subset flock, exact

• Two entities define two cylinders through time by tracing the two possible radius r disks
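
At a single instant, the two possible disks through two entities follow from elementary geometry: their centers lie on the perpendicular bisector of the two positions. The sketch below computes them; evaluating it over time for two moving entities traces the two cylinders. The function name is an assumption.

```python
import math

def disk_centers(p, q, r):
    """Centers of the radius-r disks that have both p and q on the boundary.
    Returns an empty list if p == q or if p and q are more than 2r apart."""
    (px, py), (qx, qy) = p, q
    dx, dy = qx - px, qy - py
    d = math.hypot(dx, dy)
    if d == 0.0 or d > 2 * r:
        return []
    mx, my = (px + qx) / 2, (py + qy) / 2
    # Distance from the midpoint of pq to each center (Pythagoras).
    h = math.sqrt(max(0.0, r * r - (d / 2) ** 2))
    ox, oy = -dy / d * h, dx / d * h   # offset along the perpendicular bisector
    return [(mx + ox, my + oy), (mx - ox, my - oy)]
```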



Variable subset flock, exact

• A critical moment is where another entity is on the boundary of the disk; it may go outside or inside

Variable subset flock, exact

• At a critical moment:
– a variable subset flock may start (m entities)
– a variable subset flock may stop (<m entities)
– three pairs of defining entities have disks that coincide

• There are also critical moments when two entities are at distance exactly 2r

• Between two time steps t_i and t_{i+1} there are O(n^3) critical moments; in total there are O(n^3 t) critical moments
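
The critical moments of the second kind, where two entities are at distance exactly 2r, can be found by solving a quadratic equation. The sketch assumes that both entities move linearly between two consecutive time steps (trajectories as polygonal lines) and uses a local time u in [0, 1] for that step; the function name is hypothetical. Call it with d = 2r.

```python
import math

def times_at_distance(p0, p1, q0, q1, d):
    """Entity P moves linearly from p0 to p1 and entity Q from q0 to q1
    while u runs over [0, 1]. Returns the sorted values of u at which
    the distance between P and Q equals d."""
    # Relative position is w(u) = a + u * b; solve |w(u)|^2 = d^2.
    ax, ay = p0[0] - q0[0], p0[1] - q0[1]
    bx = (p1[0] - p0[0]) - (q1[0] - q0[0])
    by = (p1[1] - p0[1]) - (q1[1] - q0[1])
    A = bx * bx + by * by
    B = 2 * (ax * bx + ay * by)
    C = ax * ax + ay * ay - d * d
    if A == 0.0:
        return []   # constant relative position: the distance never changes
    disc = B * B - 4 * A * C
    if disc < 0:
        return []   # the entities never get to distance d
    root = math.sqrt(disc)
    sols = sorted({(-B - root) / (2 * A), (-B + root) / (2 * A)})
    return [u for u in sols if 0.0 <= u <= 1.0]
```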

Variable subset flock, exact

• Let the O(n^3 t) critical moments be the nodes in a directed acyclic graph G

• Edges of G are between two consecutive critical moments of the same two defining entities
– directed from earlier to later
– weight is time between critical moments
– only if at least m entities are inside the disk

A longest variable subset flock is a maximum weight path in G
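
A maximum weight path in a directed acyclic graph can be found with one pass over the nodes in topological order. The sketch below assumes the nodes are already numbered by time, so that every edge goes from a lower to a higher index, and that weights are non-negative durations. The edge-list representation is an assumption, not the data structure used in the slides.

```python
from collections import defaultdict

def max_weight_path(num_nodes, edges):
    """edges: list of (u, v, weight) with u < v.
    Returns (total_weight, path) for a maximum weight path."""
    out = defaultdict(list)
    for u, v, w in edges:
        out[u].append((v, w))
    best = [0.0] * num_nodes    # best[v]: maximum weight of a path ending at v
    prev = [None] * num_nodes   # predecessor of v on that path
    for u in range(num_nodes):  # earliest critical moment first
        for v, w in out[u]:
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                prev[v] = u
    end = max(range(num_nodes), key=best.__getitem__)
    path = [end]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return best[end], path[::-1]
```

The pass is linear in the number of nodes and edges of G.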

Variable subset flock, exact

• The graph G can be built in O(t n^3 log n) time
• A maximum weight path can be found in O(t n^3 log n) time

Patterns in trajectories, summary

• Flock and meet patterns require algorithms in 3-dimensional space (space-time)

• Exact algorithms are inefficient, so they are only suitable for smaller data sets

• Approximation can reduce the running time by one or two orders of magnitude

Patterns in trajectories, summary

                fixed subset                                variable subset
flock   apx     O(t n^2 log n), factor 2                    O((t n^2 log n) / ε^2), factor 2+ε
        exact   NP-hard                                     O(t n^3 log n)
meet    apx     O((t n^2 log n) / (m ε^2)), factor 1+ε      O((t n^2 log n) / (m ε^2)), factor 1+ε
        exact   O(n^4 t^2 log n + n^2 t^3)                  O(n^4 t^2 log n + n^2 t^3)

Future research on longest trajectories

• Faster exact and approximation algorithms
• Better approximation factors
• Remove restriction of fixed shape of flocking region (compact or elongated both possible during same flock)
• Longest duration convergence

[Figure: longest convergence]



With an exact definition of a spatial or spatio-temporal pattern, geometric algorithms can be used to compute all patterns

Many known structures from computational geometry are useful (Voronoi diagrams, arrangements, ...)

Since the (exact) algorithms may be inefficient, approximation may be a solution


What patterns must be detected in practice (both spatial and spatio-temporal)?

What is the most appropriate definition (formalization) of these?

Spatial association rules, auto-correlation, irregularities, classification, ... and other computable things in spatial/spatio-temporal data mining

