
Greedy Map Generalization by Iterative Point Removal


Yanzhe Chen
Shanghai Jiao Tong University
Shanghai, China
[email protected]

Yin Wang
Facebook
Menlo Park, CA, USA
[email protected]

Rong Chen, Haibo Chen, Binyu Zang
Shanghai Jiao Tong University
Shanghai, China
{rongchen,haibochen,byzang}@sjtu.edu.cn

ABSTRACT

This paper describes a map generalization program we submitted to the ACM SIGSPATIAL Cup 2014. In this competition, the goal is to remove as many points as quickly as possible from a set of polygonal lines, subject to two constraints: the topological relationships among the lines must not change, and the relationships between a set of control points and the lines must not change. Inspired by the Visvalingam-Whyatt algorithm, we iteratively examine successive triplets along each line, and remove the middle point if no control point or point of another line is in the associated triangle. Based on the features of the training datasets, we further introduce many optimization techniques to speed up the computation.

Categories and Subject Descriptors

H.2.8 [Database Management]: Applications—Spatial databases and GIS

General Terms

Algorithms, Experimentation, Performance

Keywords

map generalization, spatial index, spatial query

1. INTRODUCTION AND OVERVIEW

Geometry generalization is the well-known problem of selecting the information on a map in a way that adapts to the scale of the display medium of the map. It filters out unnecessary cartographic detail while maintaining the map's purpose and the actuality of the object being mapped. In the SIGSPATIAL Cup 2014, we consider a special case of map generalization. The input is a set of polygonal lines that bound polygonal regions, and a set of control points. The objective is to simplify the lines by removing their middle points while preserving the topological relationships among

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. Copyright is held by the owner/author(s). SIGSPATIAL '14, November 04-07, 2014, Dallas/Fort Worth, TX, USA. ACM 978-1-4503-3131-9/14/11. http://dx.doi.org/10.1145/2666310.2666422

Figure 1: The middle point of a successive triplet can be removed if its associated triangle is empty, point C in this case.

the lines, as well as the relationships between the control points and the lines. The competition is evaluated by the number of points removed divided by the computation time, subject to a penalty on lines violating topological constraints and a required minimum number of points to be removed.

There are two classical algorithms for map generalization. The Ramer-Douglas-Peucker algorithm [1] recursively divides the polygonal line and preserves the point that is furthest from the line segment between the two endpoints, if the distance exceeds a threshold. The Visvalingam-Whyatt algorithm [4] iteratively eliminates the point with the smallest triangle formed by it and its two neighbor points. Neither of these algorithms takes into account the topological relationships among lines and between lines and control points. Therefore we cannot apply these algorithms directly to our map generalization problem.
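For reference, the core of Visvalingam-Whyatt can be sketched as follows. This is a minimal, naive O(n^2) illustration without the heap usually used in practice, and without the topology checks our problem requires; the function and parameter names are ours, not from the paper:

```python
def triangle_area(a, b, c):
    """Absolute area of the triangle abc (half the cross product magnitude)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])) / 2.0

def visvalingam_whyatt(line, keep):
    """Repeatedly drop the interior point whose triangle has the smallest area."""
    line = list(line)
    while len(line) > keep:
        areas = [triangle_area(line[i - 1], line[i], line[i + 1])
                 for i in range(1, len(line) - 1)]
        del line[1 + areas.index(min(areas))]  # remove least-significant point
    return line
```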

Inspired by the Visvalingam-Whyatt algorithm, however, we observe that it is safe to remove a point on a polygonal line if the triangle formed by the point and its two neighbors does not contain any control point or point from other lines. Figure 1 explains this idea. Point B cannot be removed because its associated triangle ABC contains a red control point. Point D cannot be removed because triangle CDE contains a point of another line. Point C can be safely removed because triangle BCD is empty. Therefore we can examine all triangles associated with successive triplets of each polygonal line, and remove the middle point when the triangle is empty.
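The emptiness test reduces to a point-in-triangle predicate. A minimal sketch using cross-product signs (the paper uses boost::geometry::within; this equivalent is ours, with illustrative names):

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); sign tells which side of oa the point b is on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(p, a, b, c):
    """True if point p lies inside or on the boundary of triangle abc."""
    d1 = cross(a, b, p)
    d2 = cross(b, c, p)
    d3 = cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # p is inside iff it is not on both sides at once
    return not (has_neg and has_pos)
```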

Multiple iterations of the above procedure can eliminate more points. Figure 2 illustrates the idea. On the left side, there are two lines, ABC and DEF. We cannot remove B in the first iteration because E is inside triangle ABC, but it can be removed after we remove E. On the right side, we cannot remove B in the first iteration because control point P is inside triangle ABC, but C can be removed. The second iteration then removes B.

Figure 2: Iterations remove more points.

Figure 3: Our greedy algorithm is not optimal.

Table 1: Basic statistics of the provided datasets.

              lines   inner points   control points
Dataset1         27            992               26
Dataset2         46          1,564              127
Dataset3        476          8,531              151
Dataset4      1,353         28,014              356
Dataset5      2,331         28,323            1,607

Overall, our map generalization algorithm takes the following steps.

1. For each polygonal line, examine all successive triplets and remove the middle point if the triplet has no control point or point from other lines in its associated triangle.

2. Repeat Step 1 until no more points can be removed.
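The two steps above can be sketched as follows. This is a simplified, index-free illustration (the real implementation uses the spatial index described in Section 2); `lines` (a list of mutable point lists) and `control_points` are illustrative names, and the brute-force obstacle scan stands in for an R-Tree query:

```python
def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(p, a, b, c):
    d = [cross(a, b, p), cross(b, c, p), cross(c, a, p)]
    return not (any(x < 0 for x in d) and any(x > 0 for x in d))

def generalize(lines, control_points):
    """Iteratively drop middle points whose associated triangle is empty."""
    changed = True
    while changed:                      # Step 2: repeat until a fixed point
        changed = False
        for i, line in enumerate(lines):
            j = 1
            while j < len(line) - 1:    # Step 1: scan successive triplets
                a, b, c = line[j - 1], line[j], line[j + 1]
                # obstacles: all control points plus points of other lines
                obstacles = control_points + [
                    p for k, other in enumerate(lines) if k != i for p in other
                ]
                if not any(in_triangle(p, a, b, c) for p in obstacles):
                    del line[j]         # triangle is empty: remove middle point
                    changed = True
                else:
                    j += 1
    return lines
```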

Our map generalization algorithm is greedy and does not necessarily yield the minimally simplified solution. Figure 3 shows an example. If we remove point B first, both points C and D must remain to keep the red control point underneath the line. The optimal solution in this case is to keep just B. We choose this suboptimal greedy algorithm for a good balance between minimal simplification and computation speed.

Next, we describe various optimization techniques in Section 2 and present our evaluation results in Section 3. Section 4 concludes the paper.

2. OPTIMIZATION TECHNIQUES

In this section, we explain several optimization techniques employed by our program. Our optimizations are empirical in nature, based on the characteristics of the training datasets. Table 1 shows the datasets provided by the competition. The first three are training datasets given before the submission, and the last two are testing datasets released after the submission.

Our program finishes each of the datasets in at most tens of milliseconds after optimization. This time scale is too small for reliable measurement: operating system scheduling and disk hiccups can significantly affect the computation time. Sometimes even the overhead introduced by the measurement code can dominate the overall computation. Therefore, we created synthetic large datasets by shift-cloning the provided datasets [6]. We prefer cloning the provided datasets over generating random ones because we want to preserve the data characteristics for proper optimization. Since each dataset appears to have different characteristics, e.g., a different ratio between lines and control points, we clone each dataset to about 200 MB and use all of them in experiments. Table 2 shows the cloned datasets, where a name like Dataset1_x4500 means Dataset1 shift-cloned 4,500 times.

Table 2: Large shift-cloned datasets.

                  size of line file   size of point file
Dataset1 x4500             198 MB              21.5 MB
Dataset2 x3000             208 MB              70.3 MB
Dataset3 x500              204 MB              13.8 MB
Dataset4 x150              197 MB              9.79 MB
Dataset5 x150              221 MB              44.3 MB

Figure 4: R-Tree building time (s) with different node splitting policies (quadratic, rstar, linear).

Figure 5: R-Tree querying time (s) with different node splitting policies (quadratic, rstar, linear).
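Shift-cloning as we use it can be sketched as tiling translated copies of a dataset so that copies do not overlap. This is a simplified illustration of the idea under our assumptions, not the generator from [6], and the names are ours:

```python
def shift_clone(points, copies):
    """Tile `copies` translated copies of `points` along the x-axis.

    The shift is the dataset's x-extent plus a gap, so copies never overlap
    and the local data characteristics of each copy are preserved.
    """
    xs = [p[0] for p in points]
    width = max(xs) - min(xs) + 1.0  # 1.0 is an arbitrary gap, a tuning knob
    return [(p[0] + k * width, p[1]) for k in range(copies) for p in points]
```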

2.1 Spatial Index

Checking whether there is any point in a triangle is the most time-consuming part of our algorithm. A spatial index is the key to achieving optimal performance. We employ an R-Tree [2] in our implementation to index all points, and then retrieve all points within the bounding box of a given triangle. For each point returned, we use the boost::geometry::within function to check if it is inside the triangle.
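The filter-and-refine pattern this describes can be sketched as follows. Here a uniform grid stands in for the R-Tree purely for illustration (the actual implementation uses Boost's R-Tree); all names and the cell size are assumptions:

```python
from collections import defaultdict

CELL = 1.0  # grid cell size; a tuning knob, not from the paper

def build_grid(points):
    """Bucket points by integer cell coordinates (a stand-in for an R-Tree)."""
    grid = defaultdict(list)
    for p in points:
        grid[(int(p[0] // CELL), int(p[1] // CELL))].append(p)
    return grid

def candidates(grid, a, b, c):
    """Filter step: return all points in cells overlapping the triangle's bounding box."""
    xs, ys = [a[0], b[0], c[0]], [a[1], b[1], c[1]]
    out = []
    for gx in range(int(min(xs) // CELL), int(max(xs) // CELL) + 1):
        for gy in range(int(min(ys) // CELL), int(max(ys) // CELL) + 1):
            out.extend(grid.get((gx, gy), []))
    return out  # refine step: run the exact point-in-triangle test on these only
```

Only the (usually few) candidates returned by the filter step need the exact containment test, which is what makes the triangle check fast.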

Minimizing both coverage and overlap is crucial to the performance of the R-Tree. Different R-Tree variants employ different heuristics to split overflowing nodes, trying to minimize coverage and overlap [3]. We examined the different node-splitting policies implemented in the Boost library, quadratic, rstar, and linear, for both the tree building and querying time, shown in Figures 4 and 5. We can see that the node splitting policy has little effect on query performance, but makes a substantial difference on

Figure 6: Incorrect result by ignoring inner points.

Table 3: Correctness rate of ignoring inner points.

              total lines   correct lines   correctness rate
Dataset1               27              27             100.0%
Dataset2               46              45              97.8%
Dataset3              476             473              99.4%
Dataset4            1,353           1,346              99.5%
Dataset5            2,331           2,317              99.4%

Table 4: Building and querying time including/excluding inner points in indexing.

Time (ms)     all points   excluding inner points
Dataset1              41                        1
Dataset2             117                        1
Dataset3           4,582                        2
Dataset4          35,605                        7
Dataset5          80,483                       10
