Date post: 18-Dec-2015

Computational Geometry and Spatial Data Mining

Marc van Kreveld

Department of Information and Computing Sciences

Utrecht University


Two-part presentation

• Morning: Introduction to computational geometry with examples from spatial data mining

• Afternoon: Geometric algorithms for spatial data mining (and spatio-temporal data mining)


Spatial data mining and computation

• “Geographic data mining involves the application of computational tools to reveal interesting patterns in objects and events distributed in geographic space and across time” (Miller & Han, 2001) [ data analysis ? ]

• Large data sets and the attempt to carefully define interesting patterns (to avoid finding non-interesting patterns) mean that advanced algorithms are needed for efficiency


Introduction to CG

• Some words on algorithms and efficiency

• Computational geometry algorithms through examples from spatial data mining
– Voronoi diagrams and clustering
– Arrangements and largest clusters
– Approximation for the largest cluster


Algorithms and efficiency

• You may know it all already:

• Please look bored if you know all of this

• Please look bewildered if you haven’t got a clue what I’m talking about


Algorithms

• Computational problems have an input size, denoted by n
– A set of n numbers
– A set of n points in the plane (2n coordinates)
– A simple polygon with n vertices
– A planar subdivision with n vertices

• A computational problem defines desired output in terms of the input


Algorithms

• Examples of computational problems:
– Given a set of n numbers, put them in sorted order
– Given a set of n points, find the two that are closest
– Given a simple polygon P with n vertices and a point q, determine if q is inside P

[Figure: a simple polygon P and a query point q]


Algorithms

• An algorithm is a scheme (sequence of steps) that always gives the desired output from the given input

• An algorithm solves a computational problem

• An algorithm is the basis of an implementation


Algorithms

• An algorithm can be analyzed for its running time efficiency

• Efficiency is expressed using O(..) notation, which gives the scaling behavior of the algorithm
– O(n) time: the running time doubles (roughly) if the input size doubles
– O(n^2) time: the running time quadruples (roughly) if the input size doubles
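The doubling behavior can be checked by counting steps rather than measuring seconds. The sketch below is only an illustration: the two functions are hypothetical stand-ins for "one pass over the input" and "look at every pair", not algorithms from the slides.

```python
def linear_steps(n):
    """Count the steps of a single pass over n items (an O(n) pattern)."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def quadratic_steps(n):
    """Count the steps of examining every unordered pair (an O(n^2) pattern)."""
    steps = 0
    for i in range(n):
        for _ in range(i + 1, n):
            steps += 1
    return steps


for n in (500, 1000):
    print(n, linear_steps(n), quadratic_steps(n))
```

Going from n = 500 to n = 1000 doubles the first count exactly and multiplies the second by slightly more than 4, which is the "roughly" in the bullets above.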


Algorithms

• Why big-Oh notation?
– Because it is machine-independent
– Because it is programming language-independent
– Because it is compiler-independent

unlike running time in seconds

It is only algorithm/method-dependent


Algorithms

• Algorithms research is concerned with determining the most efficient algorithm for each computational problem. Example: polygon triangulation
– Until ~1978: O(n^2) time
– Until 1990: O(n log n) time
– Now: O(n) time


Algorithms

• For some problems, no efficient algorithm is known to exist

• Approximation algorithms may be an option. E.g. TSP
– Exact: exponential time
– 2-approx: O(n log n) time
– 1.5-approx: O(n^3) time
– (1+ε)-approx: O(n^(1/ε)) time


Voronoi diagrams and clustering

• A Voronoi diagram stores proximity among points in a set


Voronoi diagrams and clustering

• Single-link clustering attempts to maximize the distance between any two points in different sets




Voronoi diagrams and clustering

• Algorithm (point set P; desired: k clusters):
– Compute Voronoi diagram of P
– Take all O(n) neighbors and sort by distance
– While #clusters > k do
  • Take nearest neighbor pair p and q
  • If they are in different clusters, then merge them and decrement #clusters (else, do nothing)
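The merge loop above can be sketched with a union-find structure. This is a minimal sketch under a stated assumption: instead of the O(n) Voronoi neighbor pairs, it sorts all O(n^2) point pairs (the "slightly less easy" variant), so the Voronoi step is swapped out. The function name `single_link_clusters` is hypothetical.

```python
from itertools import combinations
from math import dist


def single_link_clusters(points, k):
    """Merge closest pairs until k clusters remain; return a label per point."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumption: all pairs are candidates. The slides use only the O(n)
    # Voronoi neighbor pairs here, which is what gives O(n log n) overall.
    pairs = sorted(
        combinations(range(len(points)), 2),
        key=lambda ij: dist(points[ij[0]], points[ij[1]]),
    )

    clusters = len(points)
    for i, j in pairs:
        if clusters <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:          # different clusters: merge and decrement
            parent[ri] = rj
            clusters -= 1
    return [find(i) for i in range(len(points))]


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
print(single_link_clusters(pts, 2))
```

Two points share a label exactly when they ended up in the same cluster; the label values themselves are arbitrary representatives.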


Voronoi diagrams and clustering

• Analysis; n points in P:
– Compute Voronoi diagram: O(n log n) time
– Sort by distance: O(n log n) time
– While loop that merges clusters: O(n log n) time (using union-find structure)

• Total: O(n log n) + O(n log n) + O(n log n) = O(n log n) time


Voronoi diagrams and clustering

• What would an “easy” algorithm have given?
– really easy: O(n^3) time
– slightly less easy: O(n^2 log n) time

[Plot: running time against n (axis ticks 100, 200, 300) for the curves 1000 n log n, 10 n^2 log n, and n^3]
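The three curves in the plot can be tabulated directly. The sketch below assumes base-2 logarithms (the slide does not state the base) and takes the constants 1000 and 10 from the plot labels; the function name `costs` is hypothetical.

```python
from math import log2


def costs(n):
    """Evaluate the three cost curves from the plot at input size n."""
    return {
        "1000 n log n": 1000 * n * log2(n),
        "10 n^2 log n": 10 * n * n * log2(n),
        "n^3": n ** 3,
    }


for n in (100, 200, 300):
    row = costs(n)
    print(n, {name: round(value) for name, value in row.items()})
```

At n = 100 the first two curves coincide, because 1000 n = 10 n^2 there; beyond that point the O(n log n) curve is the lowest of the three despite its large constant.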


Computing Voronoi diagrams

• By plane sweep

• By randomized incremental construction

• By divide-and-conquer

all give O(n log n) time


Computing Voronoi diagrams

• Study the geometry, find properties
– 3-point empty circle ⇔ Voronoi vertex
– 2-point empty circle ⇔ Voronoi edge
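The 3-point property can be tested directly: three sites define a Voronoi vertex when the circle through them contains no other site. The sketch below is a brute-force check, not part of any of the O(n log n) constructions; the names `circumcenter` and `is_voronoi_vertex` and the tolerance `eps` are hypothetical.

```python
from math import dist


def circumcenter(a, b, c):
    """Center of the circle through a, b, c, or None if they are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy)


def is_voronoi_vertex(a, b, c, points, eps=1e-9):
    """True if the circle through a, b, c has no other point strictly inside."""
    center = circumcenter(a, b, c)
    if center is None:
        return False
    radius = dist(center, a)
    others = (p for p in points if p not in (a, b, c))
    return all(dist(center, p) >= radius - eps for p in others)


sites = [(0, 0), (2, 0), (0, 2), (5, 5)]
print(is_voronoi_vertex(sites[0], sites[1], sites[2], sites))
```

With the sites above the circumcenter is (1, 1) and the circle is empty; adding a site at (1, 1) would make the same test fail.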


Computing Voronoi diagrams

• Some geometric properties are needed, regardless of the computational approach

• Other geometric properties are only needed for some approaches


Computing Voronoi diagrams

• Fortune’s sweep line algorithm (1987)
– An imaginary line moves from left to right
– The Voronoi diagram is computed while the known space expands (left of the line)


Computing Voronoi diagrams

• Beach line: boundary between known and unknown, a sequence of parabolic arcs
– Geometric property: the beach line is y-monotone, so it can be stored in a balanced binary tree
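Why the beach line is y-monotone can be seen from its formula. For a vertical sweep line at x = s, the points equidistant from a site (px, py) and the line form a parabola x = (s + px)/2 - (y - py)^2 / (2 (s - px)), and the beach line is the rightmost of these parabolas at each height y. The sketch below evaluates that envelope by brute force; it illustrates the geometry only and is not the balanced-tree representation Fortune's algorithm uses. The function names are hypothetical.

```python
def parabola_x(site, sweep_x, y):
    """x-coordinate, at height y, of the point equidistant from site and sweep line."""
    px, py = site
    return (sweep_x + px) / 2 - (y - py) ** 2 / (2 * (sweep_x - px))


def beach_line_x(sites, sweep_x, y):
    """Beach line at height y: the rightmost parabola among sites already passed."""
    passed = [s for s in sites if s[0] < sweep_x]
    return max((parabola_x(s, sweep_x, y) for s in passed), default=float("-inf"))


sites = [(0, 0), (1, 3)]
for y in (0, 1, 2, 3):
    print(y, beach_line_x(sites, sweep_x=2, y=y))
```

Because the result is a single x-value for every y, each horizontal line crosses the beach line exactly once, which is the monotonicity the tree relies on.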


Computing Voronoi diagrams

• Events: changes to the beach line = discovery of Voronoi diagram features
– Point events




Computing Voronoi diagrams

• Events: changes to the beach line = discovery of Voronoi diagram features
– Circle events




Computing Voronoi diagrams

• Events: changes to the beach line = discovery of Voronoi diagram features
– Only point events and circle events exist


Computing Voronoi diagrams

• For n points, there are
– n point events
– at most 2n circle events


Computing Voronoi diagrams

• Handling an event takes O(log n) time due to the balanced binary tree that stores the beach line. With at most n + 2n = 3n events, this gives O(n log n) time in total


Intermediate summary

• Voronoi diagrams are useful for clustering (among many other things)

• Voronoi diagrams can be computed efficiently in the plane, in O(n log n) time

• The approach is plane sweep (by Fortune)

Figures from the on-line animation of Allan Odgaard & Benny Kjær Nielsen


Arrangements and largest clusters

• Suppose we want to identify the largest subset of points that is in some small region
– formalize “region” to circle
– formalize “small” to radius r

[Figure: a circle of radius r; place the circle to maximize point containment]


Arrangements and largest clusters

• Bad idea: Try m = 1, 2, ... and test every subset of size m

• Not so bad idea: for every 3 points, compute the smallest enclosing circle, test the radius and test the other points for being inside


Arrangements and largest clusters

• Bad idea analysis: A set of n points has roughly (n choose m) = O(n^m) subsets of size m

• Not so bad idea analysis: n points give (n choose 3) = O(n^3) triples of points. Each can be tested in O(n) time, giving an O(n^4) time algorithm


Arrangements and largest clusters

• The placement space of circles of radius r

A circle C of radius r contains a point p if and only if the center of C lies inside a circle of radius r centered at p


Arrangements and largest clusters

• The placement space of circles of radius r

Circles with center here contain 2 points of P

Circles with center here contain 3 points of P


Arrangements and largest clusters

• Maximum point containment is obtained for circles whose center lies in the most covered cell of the placement space
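The placement-space view suggests a simple brute-force check: a most covered cell has a vertex of the arrangement on its boundary (or is a whole circle), so it is enough to try the input points and the intersection points of pairs of radius-r circles as centers. The sketch below does this in O(n^3) time. It is a stand-in for the arrangement-based algorithm in the slides, not that algorithm; the function name and the tolerance `eps` are hypothetical, and the point list is assumed to be non-empty.

```python
from math import dist, sqrt


def max_points_in_circle(points, r, eps=1e-9):
    """Largest number of points coverable by one closed circle of radius r."""
    candidates = list(points)  # covers the case of an isolated radius-r circle
    for i, p in enumerate(points):
        for q in points[i + 1:]:
            d = dist(p, q)
            if 0 < d <= 2 * r:
                # The circles of radius r around p and q intersect at the
                # midpoint of pq shifted by h along the normal of pq.
                mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
                h = sqrt(max(r * r - (d / 2) ** 2, 0.0))
                ux, uy = (q[1] - p[1]) / d, (p[0] - q[0]) / d
                candidates.append((mx + h * ux, my + h * uy))
                candidates.append((mx - h * ux, my - h * uy))
    return max(
        sum(1 for p in points if dist(c, p) <= r + eps) for c in candidates
    )


pts = [(0, 0), (1, 0), (0, 1), (5, 5)]
print(max_points_in_circle(pts, r=1.0))
```

The tolerance is there because the two points that define a candidate center lie exactly on the circle boundary and would otherwise be lost to rounding.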


Computing the most covered cell

• Compute the circle arrangement in a topological data structure

• Fill the cells by the cover value by traversal of the arrangement

[Figure: circle arrangement with cells labeled by their cover values (0 to 3)]

The value to be assigned to a cell is +/- 1 of its (known) neighbor


Computing the most covered cell

• Compute the circle arrangement:
– by plane sweep: O(n log n + k log n) time
– by randomized incremental construction in O(n log n + k) time

where k is the complexity of the arrangement; k = O(n^2)

If the maximum coverage is denoted m, then k = O(nm) and the running time is O(n log n + nm)


Computing the most covered cell

• Randomized incremental construction:
– Put circles in random order
– “Glue” them into the topological structure for the arrangement with vertical extensions

Every cell has ≤ 4 sides (2 vertical and 2 circular)


Computing the most covered cell

Every cell has ≤ 4 sides (2 vertical and 2 circular)

Trace a new circle from its leftmost point to glue it into the arrangement. Because every cell has at most 4 sides, the exit from any cell can be determined in O(1) time


Computing the most covered cell

• Randomized analysis can show that adding one circle C takes O(log n + k’ ) time, where k’ is the number of intersections with C

• The whole algorithm takes O(n log n + k) time, where k = Σ k’ is the arrangement size

• The O(n + k) vertical extensions can be removed in O(n + k) time


Computing the most covered cell

• Traverse the arrangement (e.g., depth-first search) to fill the cover numbers in O(n + k) time
– +1 when stepping into a circle
– -1 when stepping out of a circle
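The +1/-1 rule is easiest to see in one dimension, where "circles" become intervals and the "cells" are the gaps between endpoints. The sketch below is that 1D analog only, written as an assumption-labeled illustration; the planar traversal in the slides walks a topological structure instead of a sorted list. The function name `max_cover_1d` is hypothetical.

```python
def max_cover_1d(intervals):
    """Maximum number of closed intervals covering a single point of the line."""
    events = []
    for left, right in intervals:
        events.append((left, 0, +1))   # stepping into an interval
        events.append((right, 1, -1))  # stepping out of an interval
    # At equal coordinates, entries (tag 0) sort before exits (tag 1), so
    # intervals that merely touch still count as overlapping at that point.
    events.sort()

    cover = best = 0
    for _, _, delta in events:
        cover += delta          # neighbor's value plus or minus one
        best = max(best, cover)
    return best


print(max_cover_1d([(0, 2), (1, 3), (1.5, 4)]))
```

Each cell's value is derived from its already-known neighbor in constant time, which is why the traversal cost is linear in the size of the structure.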


Intermediate summary

• The largest cluster for a circle of radius r can be computed in O(n log n + nm) time if it has m entities

• We use arrangement construction and traversal

• The technique for arrangement construction is randomized incremental construction (Mulmuley, 1990)


Largest cluster for approximate radius

• Suppose the specified radius r for a cluster is not so strict, e.g. it may be 10% larger; in general, any radius up to (1+ε) r is acceptable

[Figure: circles of radius r and (1+ε) r; place the circle to maximize point containment]

If the largest cluster of radius r has m entities, we must guarantee to find a cluster of at least m entities and radius at most (1+ε) r


Approximate radius clustering

• The idea: snap the entity locations to grid points of a well-chosen grid

Snapping should not move points too much: less than εr /4

A grid spacing of εr /4 works


Approximate radius clustering

• The idea: snap the entity locations to grid points of a well-chosen grid

[Figure: grid points labeled with the number of entities snapped to them (1 or 2)]

For each grid point, collect and add the count of all grid points within distance (1+ε/2) r
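The snapping and collecting steps can be sketched as follows. This is a minimal sketch under stated assumptions: entities are snapped to the nearest point of a grid with spacing εr/4, and each occupied grid point pushes its count to every grid point within distance (1+ε/2) r. The function name, the return format, and the small floating-point tolerance are hypothetical choices, not taken from the slides.

```python
from collections import Counter


def approx_largest_cluster(points, r, eps):
    """Return (count, center) for the grid point with the highest collected count."""
    g = eps * r / 4                                   # grid spacing
    # Snap every entity to its nearest grid point, stored as integer indices.
    snapped = Counter((round(x / g), round(y / g)) for x, y in points)

    reach = (1 + eps / 2) * r / g                     # collection radius in grid units
    k = int(reach)
    offsets = [
        (di, dj)
        for di in range(-k, k + 1)
        for dj in range(-k, k + 1)
        if di * di + dj * dj <= reach * reach + 1e-9
    ]

    # Each occupied grid point gives its count to O(1/eps^2) grid points.
    collected = Counter()
    for (i, j), count in snapped.items():
        for di, dj in offsets:
            collected[(i + di, j + dj)] += count

    (ci, cj), best = max(collected.items(), key=lambda item: item[1])
    return best, (ci * g, cj * g)


pts = [(0, 0), (1, 0), (0, 1), (5, 5)]
print(approx_largest_cluster(pts, r=1.0, eps=0.5))
```

The work per occupied grid point depends only on ε, not on n, which is where the linear dependence on the number of entities comes from.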


Approximate radius clustering

[Figure: the same labeled grid, showing one grid point whose collected count is 10]


Approximate radius clustering

[Figure: the same labeled grid, showing collected counts at several grid points (10, 6, 8, 9, 1, 1)]


Approximate radius clustering

• Claim: a largest approximate radius cluster is given by the highest count

[Figure: the same labeled grid with collected counts; the highest count shown is 10]


Approximate radius clustering

• Let C_opt be a radius-r circle with the most entities inside

• Due to the grid spacing, there is a grid point within distance εr /4 of the center of C_opt, and this grid point must have a collected count. Let C be the circle of radius (1+ε/2) r centered at this grid point


Approximate radius clustering

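• Snapping moves entities at most εr /4

• C and C_opt differ in radius by εr /2, which covers the center offset (at most εr /4) plus the snapping distance (at most εr /4), so no point in C_opt can have moved outside C

• Snapped points inside C have their origins inside a circle of radius at most (1+ε/2) r + εr /4 ≤ (1+ε) r, so no points too far from C can have entered C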


Approximate radius clustering

• Intuition: We use the ε in different places
– Snapping points
– Trying only circle centers on grid points

... and we guarantee to test a circle that contains all entities in the optimal circle, but not other entities too far away


Approximate radius clustering

• Efficiency analysis
– n entities: each gives a count to O(1/ε^2) grid cells
– in O(n /ε^2) time we have all collected counts and hence the largest count


Exact or approximate?

• O(n log n + nm) versus O(n /ε^2) time

• In practice: What is larger: m or 1 /ε^2 ?
– If the largest cluster is expected to be fairly small, then the exact algorithm is fine
– If the largest cluster may be large and we don’t care about the precise radius, the approximate radius algorithm is better


Concluding this session

• Basic computational geometry ...

Voronoi diagrams, arrangements, ε-approximation techniques

... is already useful for spatial data mining

• Afternoon: spatial and spatio-temporal data mining and more geometric algorithms

