Page 1: [ACM Press the 18th SIGSPATIAL International Conference - San Jose, California (2010.11.02-2010.11.05)] Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic

A Lagrangian Approach for Storage of Spatio-Temporal Network Datasets: A Summary of Results

Michael R. Evans∗, University of Minnesota

KwangSoo Yang, University of Minnesota

James M. Kang, University of Minnesota

Shashi Shekhar, University of Minnesota

ABSTRACT

Given a set of operators and a spatio-temporal network, the goal of the Storing Spatio-Temporal Networks (SSTN) problem is to produce an efficient data storage method that minimizes disk I/O access costs. Storing and accessing spatio-temporal networks is increasingly important in many societal applications such as transportation management and emergency planning. This problem is challenging due to strains on traditional adjacency list representations when storing temporal attribute values from the sizable increase in length of the time-series. Current approaches for the SSTN problem focus on orthogonal partitioning (e.g., snapshot, longitudinal, etc.), which may produce excessive I/O costs when performing traversal-based spatio-temporal network queries (e.g., route evaluation, arrival time prediction, etc.) due to the desired nodes not being allocated to a common page. We propose a Lagrangian-Connectivity Partitioning (LCP) technique to efficiently store and access spatio-temporal networks that utilizes the interaction between nodes and edges in a network. Experimental evaluation using the Minneapolis, MN road network showed that LCP outperforms traditional orthogonal approaches.

Categories and Subject Descriptors

H.2.3 [Storage Model]: Spatio-Temporal Network Storage Model

General Terms

Design, Performance, Experimentation

Keywords

Spatio-Temporal Networks, Storage Methods, File Structure, Spatio-Temporal Databases

∗First Authors: Michael R. Evans and KwangSoo Yang

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
GIS ’10, 03-NOV-2010, San Jose CA, USA
Copyright 2010 ACM 978-1-4503-0428-3/10/11 ...$10.00.

1. INTRODUCTION

This paper proposes a new data storage method for storing spatio-temporal networks into data files for use in database systems. Our work is motivated by the growing number and size of real-world spatio-temporal networks.

1.1 Motivation

Analyzing movement in spatio-temporal networks is important in many societal applications such as transportation, distribution of electricity and gas, and evacuation route planning. The ability to efficiently store, process and analyze spatio-temporal networks with large time series data would provide benefit to a wide variety of applications.

Airlines connect thousands of destinations across the world through various ‘routes’ between airports. Maintaining accurate records of route performance is key to evaluating and ensuring timely airline service, along with analyzing potential effects from various delays. Figure 1 (a) shows the various routes available from MSP airport to other cities in the United States. In order to measure route characteristics, such as average delay, each flight along the route is recorded with parameters such as flight time, delay, causes, etc. This flight information creates a spatio-temporal network from these airline routes, allowing historical average queries, such as the one shown in Figure 1 (b), to be answered. Other, more complex queries, such as how delay on a particular route affects connecting flights, can also be analyzed with this data. These large spatio-temporal networks, with a large amount of temporal attribute data (flight instances), can benefit from efficient secondary storage techniques.

The U.S. natural gas pipeline network, shown in Figure 2, is a transmission and distribution grid for transporting natural gas across the continental United States. Underground storage is used for efficient and reliable delivery across this network. The transmission of natural gas through the network and its various storage tanks (edges and nodes with dynamic gas levels) is monitored to ensure adequate support for short-term peaking and volatile swing demands for gas that occur on a daily and even hourly basis. Storing and optimizing the transport of natural gas is a large business and can benefit from the analysis of large spatio-temporal networks [1].

The Federal Highway Administration [2] is recording traffic data of major roads and highways using sensors such as Loop Detectors, among others, across the United States. Depending on the type of sensor, traffic levels are recorded every minute, as shown in Figure 3. The Mobility Monitoring Program (MMP), started in 2000 by the Texas Transportation Institute, aimed to evaluate the use of sensors for traffic information around the United States. By 2003, MMP was receiving traffic sensor data from over 30 cities and 3,000 miles of highway, with sensor readings occurring roughly every 30 seconds. This data is then recorded 24 hours a day, 365 days a year, resulting in millions of time steps per year for each sensor. MMP published a report citing the need for processing and storage of historical traffic data, and how it may benefit traffic management [23].

(a) Delta Airline Routes from MSP. Courtesy: www.airlineroutemaps.com

(b) Example Route Statistics. Courtesy: www.flightstats.com

Figure 1: Airline travel information as a spatio-temporal network.

In this paper, we consider storage and access of STN data used by traversal-based road transportation applications. These applications pose queries on STNs such as shortest path and route evaluation based on traffic levels. A route evaluation query calculates travel time along a given route for a given start time, motivated by ‘commuter’ movement where a person headed to work at 8:00 am has a few different known routes and is looking for the shortest travel time. Companies such as NAVTEQ [3] are beginning to provide traffic information for road networks where travel-time for an edge is specified for all distinct five minute time-intervals of a week. However, industry is relying on an aggregation of these spatio-temporal network datasets, specifically using lossy compression, turning the recorded traffic information into “speed profiles”, in order to reduce the magnitude of the data. These profiles are then used to represent a subset of road segments. Interesting events (e.g., interesting or anomalous traffic patterns) may be missed due to this lossy compression. While such compression is useful for some applications, it may be desirable to capture and utilize the full granularity of the data as it is recorded.

Figure 2: The U.S. natural gas pipeline network. [1]

Figure 3: Traffic speed measurements over 30 days on a portion of highway. Courtesy: [2]

1.2 Spatio-Temporal Networks (STN)

A spatio-temporal network (STN) can be represented as a spatial graph with temporal attributes. Spatial elements of the graph are a finite set of nodes and a finite set of edges connecting nodes. Temporal attributes are represented by discrete time steps, shown as a snapshot series in Figure 4. In the figure, time steps 1, 2, 3, and 4 illustrate the progression of time in the network, and the corresponding effect on edge attribute values. For example, at Time=1, edge AC has a value of 2. In the next time step, Time=2, the edge value decreases to 1, indicating a reduced travel cost for the edge. Consider a car traveling from node A to node D starting at time step 1. As the car traverses AC, it takes 2 time steps to reach C. As time progresses, travel time edge attributes may change; note that the travel time edge attribute of AC changes from time step 1 to 2.

Table 1 lists the operators defined for spatio-temporal networks as described in [11]. This paper mainly focuses on Lagrangian movement queries, such as route evaluation.

Node Retrieval: In Figure 4, every node in every time step has a boolean attribute representing whether the car is currently at that node. Calling getNode(C) returns a series of 4 booleans indicating whether the car is present at the corresponding time step, e.g., (False, False, True, False). Calling getNode(C, 1), which specifies a time instant, returns a single boolean with the attribute at that moment, in this case, False.

Table 1: Access Operators for Spatio-Temporal Networks from [11]

                 atTime                       atAllTime               atEarliest
Node             getNode(node,time)           getNode(node)           getNodeEarliest(node,time)
Edge             getEdge(node1,node2,time)    getEdge(node1,node2)    getEdgeEarliest(node1,node2,time)
Route            getRoute(node1,node2,time)   getRoute(node1,node2)   getRouteEarliest(node1,node2,time)
Evaluate Route   evalRoute(route,time)        evalRoute(route)        -
Graph            getGraph(time)               getGraph()              -

[Figure 4 image: four snapshots, Time = 1 to Time = 4, of a graph with nodes A, B, C and D and per-edge travel time values]

Figure 4: Snapshot model of a spatio-temporal network

Edge Retrieval: In Figure 4, each edge has a series of scalar attribute values representing traversal time corresponding to each time instant. Calling getEdge(A,C) returns the traversal attribute series of edge AC, e.g., (2,1,1,1). Calling getEdge(A,C, 1) returns the traversal attribute of edge AC at time instant 1, e.g., (2).
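The node and edge retrieval operators can be sketched in a few lines of Python. This is an illustrative sketch only: the in-memory dictionaries and the function names get_node and get_edge are assumptions made for the example, not the storage structure studied in this paper. The two series used are the ones quoted in the text.

```python
from typing import Dict, List, Optional, Tuple, Union

# Series quoted in the text: edge AC has travel times (2, 1, 1, 1) and the
# car-presence flags at node C are (False, False, True, False).
# Holding them in plain dictionaries is an assumption made for illustration.
EDGE_SERIES: Dict[Tuple[str, str], List[int]] = {("A", "C"): [2, 1, 1, 1]}
NODE_SERIES: Dict[str, List[bool]] = {"C": [False, False, True, False]}


def get_edge(src: str, dst: str, time: Optional[int] = None) -> Union[int, List[int]]:
    """atAllTime when time is None, otherwise atTime (time steps start at 1)."""
    series = EDGE_SERIES[(src, dst)]
    return list(series) if time is None else series[time - 1]


def get_node(node: str, time: Optional[int] = None) -> Union[bool, List[bool]]:
    """atAllTime when time is None, otherwise atTime (time steps start at 1)."""
    series = NODE_SERIES[node]
    return list(series) if time is None else series[time - 1]
```

Omitting the time argument corresponds to the atAllTime column of Table 1, and supplying it corresponds to the atTime column.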

Route Evaluation Operation: As mentioned above, industry is beginning to offer route evaluation services motivated by commuters' desire to check morning traffic along favorite routes. Given a route A → C → D starting at time step 1, evalRoute(ACD, 1) retrieves the traversal time based on the temporal edge properties of A → C at time step 1 and C → D at time step 3. If no starting time is given, evalRoute(ACD) returns a series of traversal times, one for each time instant taken as the starting time, e.g., (3,2,2,2).

Since we use the route evaluation query extensively in our experiments, we provide the pseudocode in Algorithm 1. Line (1) saves the original starting time, which is needed when calculating total trip time. Line (2) iterates through each edge in the given route. Line (3) gets the desired edge with the getEdge operator, defined as the connection between a node and its next connection in route R. The current time value is updated with the travel time taken when moving across edge E in Line (4), and finally the total travel time is calculated in Line (6).
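A runnable Python sketch of the route evaluation logic follows. Several details are assumptions made for the example: the route is given as a node sequence, the travel times for edge CD are set to 1 at every step (a choice consistent with the results quoted above, but not taken from Figure 4), and when the traveler arrives after the last recorded time step the final recorded value is reused.

```python
from typing import Dict, List, Sequence, Tuple

# AC = (2, 1, 1, 1) is quoted in the text. CD = (1, 1, 1, 1) is an assumption.
TRAVEL_TIME: Dict[Tuple[str, str], List[int]] = {
    ("A", "C"): [2, 1, 1, 1],
    ("C", "D"): [1, 1, 1, 1],
}


def travel_time(src: str, dst: str, time: int) -> int:
    """Travel time of edge (src, dst) when entered at the given time step."""
    series = TRAVEL_TIME[(src, dst)]
    # Assumption: beyond the recorded horizon, reuse the last recorded value.
    return series[min(time, len(series)) - 1]


def eval_route(route: Sequence[str], time: int) -> int:
    """Total travel time along the route when departing at the given time step."""
    start_time = time
    for src, dst in zip(route, route[1:]):
        time += travel_time(src, dst, time)
    return time - start_time


def eval_route_all(route: Sequence[str], num_steps: int) -> List[int]:
    """One travel time per possible departure step (the atAllTime variant)."""
    return [eval_route(route, t) for t in range(1, num_steps + 1)]
```

Under these assumptions, eval_route(["A", "C", "D"], 1) yields 3 and eval_route_all(["A", "C", "D"], 4) yields [3, 2, 2, 2], matching the example values given for evalRoute(ACD, 1) and evalRoute(ACD).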

Route Retrieval: This operation represents the shortest path traversed between nodes, ignoring travel time if no start time is given. However, if a start time is given, a time dependent shortest path will be found. The details of these and other routing queries, which are out of the scope of this paper, can be found in [11].

Algorithm 1 Pseudocode for Route Evaluation

Inputs:
• R: A sequence of nodes and edges
• time: Departure time

Outputs:
• Travel Time

evalRoute(R, time)
1: startTime = time
2: for each edge in R do
3:   E = getEdge(edge.srcNode, edge.sinkNode, time)
4:   time = time + travel time of E
5: end for
6: return time - startTime

1.3 Problem Statement

The problem of storing spatio-temporal networks (SSTN) can be formalized as follows: given a spatio-temporal network and a set of spatio-temporal operations, find a storage scheme that minimizes the I/O costs of operations. The input to this problem is a spatio-temporal network (e.g., Figure 4) and a set of operations, listed in Table 1. We formally define the Storage of Spatio-Temporal Network (SSTN) problem as follows:

Input:
• A spatio-temporal network S
• A set of operations O

Output:
• Data file containing S stored across data pages

Objective:
• Minimize data page access for operations in O

Constraints:
• S is too large for storage in main memory.
• Preserve temporal edge attribute information.

Due to the increasing size of spatio-temporal network datasets, potentially containing hundreds of thousands of nodes and millions of time steps, we assume that main memory cannot hold these networks. Therefore we focus on secondary storage techniques of STNs for database systems. A secondary storage method for database systems is composed of a data file (consisting of data pages) and an indexing method. A data file, the output of a storage method, consists of a partitioned STN across a set of data pages. For example, a storage scheme may assign each snapshot in Figure 4 to a different data page. A secondary index can be built on the data file. For example, an unclustered B+tree [21] can be used to identify the data record needed for a query given keys like (node-id, time) or (edge-id, time). Once a data page is retrieved, the page may be reused on subsequent data record retrievals if those records happen to be collocated on that data page. An efficient output of the SSTN problem should reduce data page retrieval for the operations in Table 1 through this collocation of relevant data.

In a database environment, the I/O cost to answer queries is determined by the number of pages which are transferred between disks and main memory. If topologically related nodes can be stored physically in the same data page, the retrieval of data pages is reduced, resulting in lower disk I/O. Thus, data partitioning plays a crucial role in decreasing the I/O cost of an access method. An important observation is that as the storage space decreases, the I/O cost is also reduced, because the same data occupies fewer pages.

Challenges: Storing spatio-temporal network datasets is not a straightforward task due to the complexity of incorporating temporal data into the network and the careful analysis required to reduce the disk I/O. In spatio-temporal network database systems, the accessibility of data records is constrained by network topology as well as temporal access patterns. One of the keys to improving storage methods for spatio-temporal network datasets is understanding and capturing the space-time interaction that occurs between nodes and edges in a network. Developing partitioning strategies that account for this interaction is crucial for improving the efficiency of storing and accessing spatio-temporal networks.

1.4 Related Work and Limitations

Current methods for storing spatio-temporal data have focused on orthogonal partitioning, e.g., snapshot and longitudinal partitioning. In essence, the data is segmented along a single dimension, either time or object. For example, snapshot partitioning groups together data of the same time step, whereas longitudinal partitioning groups data based on the object (e.g., a node) and its entire time series. Spatio-temporal operators such as route evaluation do not follow orthogonal partitions and therefore may benefit from a different kind of partitioning for storage.

[Figure 5 image: snapshots of the network (nodes A, B, C, D) at Time = 1 to Time = 4, split between Disk Page 1 and Disk Page 2 of the data file, with a secondary index]

Figure 5: Snapshot storage of a STN

Snapshot Partitioning for spatio-temporal network storage can be represented by a snapshot graph, such as Figure 4. Snapshot storage techniques partition data into pages using geometry [13, 19] or connectivity [22] methods. Figure 5 shows a snapshot storage approximation, with the data page partitioning visualized with dashed lines and page numbering, using a small graph where multiple snapshots fit inside a data page. However, producing a time stamped static graph at each time step leads to high I/O cost when executing queries such as evalRoute(route, time) in Table 1, due to the need to frequently access data pages as the STN is traversed. In this Snapshot example, calling evalRoute(ACD, 1) requires first accessing the traversal time attribute of edge AC at t = 1, stored on Data Page 1. Next, edge CD at t = 3, stored on Data Page 2, is needed to complete the route evaluation. Thus, under snapshot partitioning, the route evaluation operation has to access two different data pages.

Longitudinal Partitioning for a spatio-temporal network is based on the adjacency-list main memory storage structure used by [6, 11]. Each node is stored with its attribute information and all outgoing edges and their attribute information. This orthogonal storage solution, as shown in Figure 6, also suffers from increasing disk I/O when evaluating routes in spatio-temporal networks using operators such as evalRoute(route, time) in Table 1. This example network has a short time series compared to its graph size, allowing multiple node records (with adjacency list) to fit inside a data page. However, if the time series were longer, a node record might not fit into one data page and would be split across multiple pages. This is due to the long time series being stored with each node, resulting in only a small number of nodes being stored on each data page.

As with Snapshot partitioning, when using the Longitudinal method, calling evalRoute(ACD, 1) requires first accessing the traversal time attribute of edge AC at t = 1, stored on Data Page 1, and then accessing edge CD at t = 3, stored on Data Page 2, to complete the route evaluation. Again, the route evaluation operation has to access two different data pages due to the node-based orthogonal partitioning. CCAM [22] did not consider STNs; however, it uses node-centric storage techniques. Therefore, if CCAM were applied to a spatio-temporal network, it may end up with a storage method resembling Longitudinal Partitioning, because the long time series characteristic of STNs causes node-centric storage to fill entire data pages with a single node's information.

The limitations of orthogonal approaches such as Snapshot and Longitudinal stem from their inability to capture spatio-temporal movement access patterns. In other sciences, spatio-temporal movement is formalized via a Lagrangian frame of reference [14] attached to a user moving through space over time. For example, evaluation of route ACD at different start times will retrieve the following subsets of edges: (AC at t = 1, CD at t = 3), (AC at t = 2, CD at t = 3), (AC at t = 3, CD at t = 4), etc. Either orthogonal approach will require a new data page access for each edge, as they group entirely by either time or space. Our proposed use of a Lagrangian frame of reference is intended to move toward capturing such non-orthogonal access patterns.
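The page accesses described above can be reproduced with a small Python sketch. The page layouts are assumptions read off the running example (time steps 1 and 2 on page 1 and time steps 3 and 4 on page 2 for Snapshot; edges stored with their source node, with nodes A and B on page 1 and C and D on page 2 for Longitudinal), and the travel times for edge CD are assumed to be 1 at every step. The function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# AC = (2, 1, 1, 1) is quoted in the text. CD = (1, 1, 1, 1) is an assumption.
TRAVEL_TIME: Dict[Tuple[str, str], List[int]] = {
    ("A", "C"): [2, 1, 1, 1],
    ("C", "D"): [1, 1, 1, 1],
}
EdgeRecord = Tuple[str, str, int]  # (source node, sink node, time step)


def edges_touched(route: Sequence[str], time: int) -> List[EdgeRecord]:
    """Edge records read while evaluating the route from the given start time."""
    touched = []
    for src, dst in zip(route, route[1:]):
        touched.append((src, dst, time))
        series = TRAVEL_TIME[(src, dst)]
        time += series[min(time, len(series)) - 1]
    return touched


def snapshot_page(src: str, dst: str, time: int) -> int:
    # Assumed layout: time steps 1-2 on page 1, time steps 3-4 on page 2.
    return 1 if time <= 2 else 2


def longitudinal_page(src: str, dst: str, time: int) -> int:
    # Assumed layout: an edge is stored with its source node;
    # nodes A and B share page 1, nodes C and D share page 2.
    return 1 if src in ("A", "B") else 2


def pages_read(route: Sequence[str], time: int,
               page_of: Callable[[str, str, int], int]) -> Set[int]:
    """Distinct data pages needed to evaluate the route."""
    return {page_of(*record) for record in edges_touched(route, time)}
```

With these layouts, evaluating route ACD from time step 1 reads the records (AC, t = 1) and (CD, t = 3) and touches pages {1, 2} under both schemes, as in the walkthrough above. Results for other start times depend on the assumed layouts and are not taken from the text.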

1.5 Contribution

In this paper, we propose a novel storage and access method called Lagrangian-Connectivity Partitioning (LCP) using the following key concepts: non-orthogonal STN partitioning, a sub-node database record format and Lagrangian use of time-expanded networks for storage. Non-orthogonal STN partitioning groups temporally connected information for storage. The new sub-node database record format allows this temporally connected information, with variable time information, to be stored on a single data page. Lastly, LCP partitions data from a Lagrangian point of reference using a time-expanded graph representation of the data, allowing graph partitioning to divide the network into spatio-temporal groups and thereby attempting to capture “movement” through a spatio-temporal network. The result is a stored STN in a data file that requires less disk I/O for the spatio-temporal operators listed in Table 1.


[Figure 6 image: node records for Node A, Node B, Node C and Node D, each with its outgoing edges and their time series, laid out across disk pages of the data file, with a secondary index]

Figure 6: Longitudinal storage of a STN

In summary, our contributions are as follows:

• Proposed a Lagrangian-based storage and access method for spatio-temporal networks.

• Proposed a cost model to estimate data I/O cost for Lagrangian queries on stored spatio-temporal networks.

• Experimentally evaluated proposed storage method and cost model against traditional approaches.

1.6 Scope and Outline

We propose a new method for storing spatio-temporal networks into a data file with efficient data clustering for spatio-temporal operators. Secondary indexing techniques are not considered, as they can be applied on top of the data file.

This paper focuses on route evaluation operations on spatio-temporal networks. For simplicity of discussion, it does not examine compression techniques (e.g., sharing time-series among edges, aggregation of time-series, etc.), workload related to shortest-path computations, choice of graph partitioning algorithms, and the infinite nature of time.

Industry is examining and implementing approaches for storing spatio-temporal networks based on orthogonal partitioning and sharing of time-series between edges and other compression techniques. These “speed profiles” are not examined in this paper, as we focus on full data storage as it is recorded from sensors. In addition, the intent of our work is not to evaluate industry choices but to explore conceptual ideas relevant to storage of spatio-temporal networks.

The paper is organized as follows. In Section 2 we introduce our proposed approach, followed by our cost model. Section 3 gives our experimental evaluation, followed by related work in Section 4. Our concluding remarks are in Section 5.

2. PROPOSED APPROACH

We use a Lagrangian representation of a STN through a model called a time-expanded network [9] (TEN). A TEN is a spatio-temporal network model that replicates each node along the time set such that a time varying attribute is represented between replicated nodes. Figure 7 illustrates the spatio-temporal network displayed in Figure 4 as a time-expanded network.

The TEN is used as a representation of spatio-temporal connectivity of the data. It allows for partitioning of the temporal attribute data (travel time values of edges) based on Lagrangian connectivity. It also helps illustrate how orthogonal approaches partition temporal attribute data inefficiently for Lagrangian queries. The time-expanded network used for the partitioning decisions is not stored on disk; only the node, edge and temporal attribute data coming from the input STN are stored.
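As an illustration of the time-expanded representation, the Python sketch below derives movement edges from per-edge travel time series. The function name build_ten, the choice to drop arrivals past the last time step, and the travel times for edge CD are assumptions made for the example. Wait edges are not generated.

```python
from typing import Dict, List, Tuple

TenNode = Tuple[str, int]  # (node id, time step), e.g. ("A", 1) for A1


def build_ten(travel_time: Dict[Tuple[str, str], List[int]],
              num_steps: int) -> List[Tuple[TenNode, TenNode]]:
    """Movement edges ((u, t), (v, t + cost)) of a time-expanded network."""
    edges = []
    for (u, v), series in sorted(travel_time.items()):
        for t in range(1, num_steps + 1):
            arrive = t + series[t - 1]
            # Assumption: arrivals beyond the last time step are dropped.
            if arrive <= num_steps:
                edges.append(((u, t), (v, arrive)))
    return edges


# AC = (2, 1, 1, 1) is quoted in the text. CD = (1, 1, 1, 1) is an assumption.
example_ten = build_ten({("A", "C"): [2, 1, 1, 1], ("C", "D"): [1, 1, 1, 1]}, 4)
```

In this example the movement edges include A1 to C3 and C3 to D4, the pair of traversals used by evalRoute(ACD, 1).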

[Figure 7 image: time-expanded network with nodes A, B, C and D replicated as A1-A4, B1-B4, C1-C4 and D1-D4 along a time step axis from 1 to 4]

Figure 7: STN as a time-expanded network

Our approach, Lagrangian-Connectivity Partitioning (LCP), utilizes a time-expanded graph to capture non-orthogonal access patterns of route evaluation operations, along with a novel data record based on sub-nodes. This section details each of our main contributions and provides a cost model for estimating disk I/O based on STN operators (e.g., Table 1).

We propose an asynchronous time series record format for storing our temporal attribute data. Data can be stored based on either synchronous or asynchronous time series. Synchronous time grouping, or clustering data within a set time series, stores some number of nodes and edges with a set time interval; see Figure 8. Each page in both orthogonal approaches, Snapshot and Longitudinal, represents a synchronous time interval, either one time step or the entire time series in this example. The Snapshot method stores all its data relating to a single time step in a record, whereas the Longitudinal method stores a temporal attribute's entire time series (broken into multiple records if needed). Asynchronous time grouping allows data from disjoint time intervals to be grouped and stored on the same data page. This model, which we refer to as sub-nodes, allows for storage of disjoint temporal attribute data in the same record. This is useful because route evaluation queries seldom access temporal data orthogonally.

2.1 Lagrangian-Connectivity Partitioning

The Lagrangian-Connectivity Partitioning (LCP) method optimizes disk storage based upon traversal across spatio-temporal networks. Data is stored using the sub-node record design, allowing for non-orthogonal temporal information to be stored on a data page.

When a spatio-temporal network is represented as a modified time-expanded graph that focuses on Lagrangian connections between nodes (movement edges), a min-cut graph partitioning [5] algorithm creates partitions that cluster nodes by minimizing the cuts of these movement edges. This results in LCP collocating connected spatio-temporal nodes together on data pages, stored as sub-node records. We expect that LCP will result in more efficient I/O when using STN operators from Table 1 or queries composed of them.

[Figure 8 image: two panels showing the time-expanded network (A1-A4, B1-B4, C1-C4, D1-D4 over time steps 1 to 4) assigned to disk pages 1 to 4]

(a) Snapshot Partitioning

(b) Longitudinal Partitioning

Figure 8: Orthogonal partitioning of Spatio-Temporal Networks.

To illustrate, we call the route evaluation operation evalRoute(ACD,1) on each of the partitioning methods. With orthogonal partitioning, Snapshot or Longitudinal, whenever an edge is traversed (e.g., A1 to C3 and C3 to D4), a disk I/O is needed to retrieve the data page containing the record for the next node. However, with Lagrangian-Connectivity Partitioning, traversing from node A1 to C3 and then C3 to D4 requires only one data page, as all relevant sub-node records are collocated on the same data page. The pseudocode is shown in Algorithm 2.

[Figure 9 image: the time-expanded network (A1-A4, B1-B4, C1-C4, D1-D4 over time steps 1 to 4) assigned to disk pages 1 to 4 by LCP]

Figure 9: Lagrangian-Connectivity Partitioning

Algorithm 2 Pseudocode for the LCP Method

Inputs:

• A set of nodes V

• A set of edges E

• A set of travel times T

• P: size of data page

Outputs:

• Data Pages written to disk

LCP
1: numPages = estimate num of pages using V, E, T and P
2: TEN[] = create time-expanded network from V, E, T
3: Part[] = run graph partitioning on edges in TEN using numPages
4: for each partition in Part[] do
5:     SN[] = create sub-nodes from partition
6:     for each sub-node in SN do
7:         RID = write sub-node to a data page
8:     end for
9: end for

The input for the pseudocode is a spatio-temporal network consisting of nodes, edges and time values for each edge, along with the physical page size for storage on disk. The output is a data file consisting of the data pages, containing records of node and edge information. The min-cut graph partitioning algorithm used requires a predefined number of partitions; therefore Line (1) estimates the number of pages needed based on the size of the spatio-temporal network and the size of the data page. Line (2) expands the spatio-temporal network into a time-expanded network for the min-cut partitioning in Line (3). The result of the partitioning method is an array of partitions of the time-expanded network. Iterating through these partitions with Lines (4-9), each partition is converted into a set of sub-nodes in Line (5), and each sub-node is written to a data page in Line (7).
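As a rough illustration of the time-expansion step (Line 2 of Algorithm 2), the following Python sketch generates only the movement (Lagrangian) edges of a time-expanded network. The function name, the input layout, and the travel-time values are assumptions made for this example; travel times are expressed in whole time steps and were chosen so that the edges A1 to C3 and C3 to D4 from the running example appear.

```python
def build_movement_edges(edges, travel_time, num_steps):
    """Return the movement (Lagrangian) edges of a time-expanded network.

    edges: iterable of (u, v) node pairs.
    travel_time: dict mapping (u, v) to a list with one travel time (in whole
        time steps) per departure step 1..num_steps.
    A movement edge links sub-node (u, t) to (v, t + travel time) when the
    arrival still falls inside the time horizon. Wait edges such as
    (u, t) -> (u, t + 1) are intentionally not generated.
    """
    movement = []
    for u, v in edges:
        for t in range(1, num_steps + 1):
            arrival = t + travel_time[(u, v)][t - 1]
            if arrival <= num_steps:
                movement.append(((u, t), (v, arrival)))
    return movement


# Hypothetical input: two road segments and four time steps.
edges = [("A", "C"), ("C", "D")]
travel_time = {("A", "C"): [2, 2, 1, 1], ("C", "D"): [1, 1, 1, 1]}
movement = build_movement_edges(edges, travel_time, num_steps=4)
for src, dst in movement:
    print(src, "->", dst)
```

The resulting edge list is what a graph partitioner would consume in Line 3; the partitioning itself is not shown here.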

Figure 9 illustrates the results of LCP applied to the same input STN as in the orthogonal partitioning methods. A min-cut partitioning was run on the modified time-expanded graph (wait edges are removed to emphasize Lagrangian connectivity) and the result was then stored as sub-nodes on four data pages. For efficiency, we used a bulk load operation, which sorts these statements by block number and inserts them into data pages. Physically, sub-node data records are stored in data pages and a B+tree index is created to support retrieve operations.
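The bulk load step can be pictured as a sort followed by a grouping. The sketch below is a simplified illustration, not the implementation used in the experiments: the function name, the (block number, record) input shape, and the sample block assignment are all hypothetical, and the B+tree index is omitted.

```python
from itertools import groupby
from operator import itemgetter


def bulk_load(assignments):
    """Group sub-node records into data pages by block number.

    assignments: iterable of (block_number, record) pairs in arbitrary order.
    Returns a list of (block_number, [records]) sorted by block number.
    The sort is stable, so records keep their input order within a block.
    """
    ordered = sorted(assignments, key=itemgetter(0))
    return [(block, [record for _, record in group])
            for block, group in groupby(ordered, key=itemgetter(0))]


# Hypothetical partitioner output: sub-node records tagged with a block number.
assignments = [(2, "B1"), (1, "A1"), (1, "C3"), (2, "B2"), (1, "D4")]
pages = bulk_load(assignments)
print(pages)
```

Sorting first means each data page is written once, in order, instead of being revisited for every individual insert.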

2.2 Cost Model

Traditional spatial networks use a connectivity ratio to measure predicted disk I/O [22]. We extend this connectivity ratio to formulate a spatio-temporal measurement we call LRatio (Lagrangian connectivity Ratio). LRatio measures the connectivity along time and space in an STN. In Equation 1, Lagrangian edges refer to edges connecting nodes through time, such as the edges displayed in a time-expanded network. This metric ignores the 'wait' edges in a time-expanded network. Maximizing LRatio minimizes disk I/O for STN Lagrangian operators.

\[
\mathrm{LRatio} = \frac{\text{Total number of unsplit Lagrangian edges}}{\text{Total number of Lagrangian edges}} \tag{1}
\]
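Equation 1 can be computed directly once every sub-node has a page assignment. The sketch below assumes that an edge is "unsplit" when both of its endpoints are stored on the same data page; the function name, the sample movement edges, and the three layouts are hypothetical and serve only to illustrate the ratio.

```python
def lratio(movement_edges, page_of):
    """Fraction of movement edges whose two endpoints share a data page."""
    if not movement_edges:
        raise ValueError("LRatio is undefined without Lagrangian edges")
    unsplit = sum(1 for src, dst in movement_edges
                  if page_of[src] == page_of[dst])
    return unsplit / len(movement_edges)


# Hypothetical movement edges between sub-nodes (node, time step).
MOVEMENT = [(("A", 1), ("C", 3)), (("C", 3), ("D", 4)),
            (("B", 1), ("D", 2)), (("B", 2), ("C", 4))]
SUB_NODES = {sub for edge in MOVEMENT for sub in edge}

# Orthogonal layouts from Figure 8: page = time step, or page = node.
snapshot = {(n, t): t for n, t in SUB_NODES}
longitudinal = {(n, t): n for n, t in SUB_NODES}

# Assumed non-orthogonal layout (page numbers are made up for illustration).
lcp_assumed = {("A", 1): 1, ("C", 3): 1, ("D", 4): 1,
               ("B", 1): 2, ("D", 2): 2, ("B", 2): 2, ("C", 4): 3}

for name, layout in [("Snapshot", snapshot), ("Longitudinal", longitudinal),
                     ("LCP (assumed)", lcp_assumed)]:
    print(name, lratio(MOVEMENT, layout))
```

Because every movement edge changes both the node and the time step, the two orthogonal layouts in this toy example split every edge, whereas the assumed layout keeps three of the four edges on one page.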

Intuitively, the average number of disk accesses (DA) for an STN route, given in Equation 2, can be expressed as a function of the LRatio and the number of accessed nodes:

\[
DA(\mathit{route}) = 1 + (1 - \mathrm{LRatio}) \times (\mathit{RouteLength} - 1) \tag{2}
\]

where RouteLength is the number of edges traversed in the route. To be specific, for a route with n edges, the first edge is accessed by one disk I/O. The remaining n − 1 edges are accessed by (n − 1) ∗ (1 − LRatio) disk I/Os, because each edge has a (1 − LRatio) chance of causing an additional disk I/O. This approximation is reasonable because, in a time-expanded network, 1) accessing nodes causes a page fault only when the access method meets a split edge and 2) all nodes are always accessed in increasing temporal order.
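A short worked sketch of Equation 2 follows. The function name and the sample LRatio values are ours and are chosen only to show the two extremes and one intermediate case.

```python
def expected_disk_accesses(lratio, route_length):
    """Equation 2: expected page accesses for a route of `route_length` edges."""
    if route_length < 1:
        raise ValueError("route_length must be at least 1")
    if not 0.0 <= lratio <= 1.0:
        raise ValueError("lratio must lie in [0, 1]")
    return 1 + (1 - lratio) * (route_length - 1)


# Every edge split: one access per edge.
print(expected_disk_accesses(0.0, 10))
# No edge split: the whole route is served from one page.
print(expected_disk_accesses(1.0, 10))
# Three quarters of the edges unsplit, 21-edge route: 1 + 0.25 * 20.
print(expected_disk_accesses(0.75, 21))
```

The model is linear in route length, with a slope of (1 − LRatio), so a higher LRatio flattens the growth in page accesses as routes get longer.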

3. EXPERIMENTAL EVALUATION

In this section we evaluate our proposed method against traditional approaches using route evaluation queries and relevant STN operators from Table 1. All experiments were performed on an Intel Core 2 Duo CPU machine running Microsoft Windows XP with 4GB of RAM.

Figure 10: Minneapolis, MN road network [4]

3.1 Experiment Setup

Figure 11: Experimental Setup. [Diagram components: Minneapolis Road Network; Traffic Generation (time series length, traffic distribution); Minneapolis Road Network with Traffic Data; Clustering Methods (1. Snapshot, 2. Longitudinal, 3. Lagrangian-Connectivity; disk block size); Data Pages; Generate Route (route length); Routes; Route Evaluation (buffers); Page Access Analysis (LCR, I/O cost).]

Figure 11 gives our experimental setup. Using a Minneapolis, MN roadmap from the Minnesota Department of Transportation [4], we created and stored three data files, one generated by LCP and the other two using orthogonal partitioning methods. These stored networks were then evaluated with a route evaluation workload, specific to each experiment, and the results were sent for analysis.

The dataset consisted of 1,140 nodes and 3,764 edges. The edge data were classified into three types: highways, county roads, and side streets. The travel time attribute series was synthetically generated based on various Gaussian distributions for each road type and took into account activity levels over a day, e.g., rush hour. We set the time-series length for the STN to 288 instants (5-minute slots, i.e., one day) and generated the time-expanded network based on replicated nodes and travel times.

Route Evaluation: In our study, we focused on a popular query distinct to spatio-temporal networks. This query is an encapsulation of the operator first mentioned in Table 1 and returns the travel time between two nodes given a set route and start time. This query represents real-world queries calculating estimated travel time for a given route at a given start time.

3.2 LCP Approximation: ATSS

We performed experiments using three different STN storage candidates. The first two, the orthogonal Snapshot and Longitudinal methods, are compared against the LCP method. A fourth candidate, the aggregated time-stamped snapshot (ATSS) method, is presented here as a simple alternative to LCP, attempting non-orthogonal partitioning without the need to create a time-expanded graph.

ATSS partitions the network based on static network connectivity using CCAM [22], and then the time-series information is divided into temporal chunks, similar to the Snapshot method, only with multiple sequential time instants instead of one time instant. The temporal information for each node is segmented based on this time interval length, and the proposed non-orthogonal record format is used to store the nodes with their temporal subsets.
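The ATSS grouping rule can be sketched as a key made of a spatial cluster and a temporal chunk. In the Python sketch below, the function name is ours, the cluster assignment is a hypothetical stand-in for the output of CCAM, and we assume for simplicity that each (cluster, chunk) group fits on one data page.

```python
# Hypothetical spatial clusters standing in for the output of CCAM.
CLUSTER_OF = {"A": 0, "B": 0, "C": 1, "D": 1}


def atss_page_key(node, t, interval, cluster_of=CLUSTER_OF):
    """Return (spatial cluster, temporal chunk) for sub-node (node, t).

    Time steps are 1-based; `interval` is the number of consecutive time
    steps aggregated into one chunk.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return (cluster_of[node], (t - 1) // interval)


# Four nodes, four time steps, interval of 2 time steps.
keys = {atss_page_key(n, t, interval=2) for n in "ABCD" for t in range(1, 5)}
print(sorted(keys))
```

With an interval of 1 the temporal part of the key behaves like Snapshot partitioning, and with an interval equal to the full time-series length it keeps each cluster's entire history together, which is the trade-off the interval parameter controls.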

Figure 12: Aggregated Time-Stamped Snapshot. [Sub-nodes A1 to D4 (nodes A-D, time steps 1-4) mapped to disk pages 1-4.]

The ATSS method can be seen as a trade-off between Snapshot and Longitudinal partitioning. Because the network is sliced by time interval, the interval length must be supplied as a parameter; in Figure 12 the time interval parameter is 2 time steps. It is difficult to determine an adequate value for this parameter. The strength of the aggregated time-stamped snapshot model is that it is relatively practical when the travel time is fairly uniform. The main disadvantage is that the time interval value that yields the best performance cannot be determined in advance.

3.3 Experimental Results

In our experiments, we compare the LCP storage method against the Snapshot and Longitudinal approaches, and discuss results. Lastly, we compare LCP and ATSS in Experiment 4 as the only non-orthogonal partitioning methods.

Figure 13: Experiment 1 - The effect of the route length. Note that Snapshot and Longitudinal are overlapping. [Plot: average number of pages accessed vs. route length (10 to 40); series: Snapshot, Longitudinal, LCP.]

Experiment 1: Effect of Route Length.
To evaluate query performance, we varied route length in terms of nodes traversed along the route, and compared the number of data pages accessed by the three storage methods, Snapshot, Longitudinal and LCP. We used 1,500 randomly generated simple-path routes over the STN and varied the route length from 10 to 40 edges traversed. We used 4KB block size and one buffer cache. The number of buffers showed no effect on performance due to the progression of time and the properties of simple-paths, in that neither time steps nor nodes can be revisited.

Figure 13 shows the performance comparison of all three models for different route lengths. For all methods, the number of data page accesses for route evaluation queries increases with the route length. However, Snapshot and Longitudinal partitioning perform worse than LCP at every route length in this experiment. This is due to the orthogonal partitioning of data, which requires accessing a new data page for every next node accessed in a path. Snapshot accesses a new data page for every node because it stores each time instant on separate pages. Longitudinal does so because of the long time series used, which fills entire disk pages with a single node's attribute data. Only LCP requires fewer data page accesses than node accesses.

Experiment 2: Size of Data Page.
This experiment evaluates the effect of varying the size of the data page on data page access. If more information can be stored on each data page, the disk access efficiency of the orthogonal storage methods could improve, provided that temporally connected information ends up collocated on a single page, as LCP is designed to do. Figure 14 shows average I/O costs and the LRatio as we varied the block size from 2 KB to 16 KB. We observed an improvement in disk I/O efficiency for LCP, as expected, due to storing more temporally connected information on a data page. With the Longitudinal partitioning model, increasing the size of the disk block allowed more than one partition to fit on a data page, resulting in an improvement in data page access and LRatio measurement.

Figure 14: Experiment 2 - Effects of varying the size of data pages. (a) Data Page Access [plot: average number of pages accessed vs. block size (2, 4, 8, 16 KB); series: Snapshot, Longitudinal, LCP]. (b) LRatio [plot: Lagrangian Connectivity Ratio vs. block size (2, 4, 8, 16 KB); series: LCP, Longitudinal, Snapshot].

Figure 15: Experiment 3 - Accuracy of the cost model in a Lagrangian path evaluation. [Plot: average number of pages accessed vs. Lagrangian Connectivity Ratio (0.66 to 0.82); series: measured page accesses, predicted cost model.]

Experiment 3: Cost Model Evaluation.
We measured the accuracy of the LRatio cost model from Section 2.2. To measure LRatio, we read all data pages stored in a data file for each storage method and counted split and unsplit movement edges. Our experimental results matched the predicted I/O cost from LRatio within 10% (Figure 15). This gives some credence to the LRatio metric as a cost estimator for STN operators, and we therefore use the LRatio measurement as a performance metric in our experiments.

Experiment 4: Evaluating ATSS.
This experiment compares the LCP and ATSS methods. The intent of this experiment is to evaluate the effects of different kinds of non-orthogonal partitioning. Figure 16(a) shows the ATSS method compared to the LCP method. ATSS was run with six different time interval parameter values, which range from an interval length of 2 time steps to 20. Note that if the time window is too long or too short, performance degrades. Also note that LCP consistently outperforms ATSS without any need for 'parameter tuning'. The result is also visualized in Figure 16(b) using the LRatio cost model. The LRatio metric varies depending on the time interval parameter selected for the ATSS partitioning. This demonstrates how ATSS is sensitive to parameter tuning and the composition of the dataset.

Figure 16: Experiment 4 - Comparison of Non-Orthogonal Methods. (a) LCP vs ATSS [plot: average number of pages accessed vs. route length (10 to 40); series: ATSS_02, ATSS_04, ATSS_06, ATSS_08, ATSS_10, ATSS_20, LCP]. (b) LRatio of ATSS [plot: Lagrangian Connectivity Ratio vs. length of time slots (0 to 20); series: LCP, ATSS].

Figure 17: Experiment 5 - Changing the Spatio-Temporal Network. (a) edge/node ratio [plot: Lagrangian Connectivity Ratio vs. edge/node ratio (2 to 4.5); series: Lagrangian]. (b) time steps length [plot: Lagrangian Connectivity Ratio vs. number of time slots (200 to 1600); series: Lagrangian].

Experiment 5: Changing the Spatio-Temporal Network.
In this experiment, the spatial connectivity and temporal length were varied on a synthetic dataset in order to gauge robustness. The effect of the length of time steps on I/O costs is shown in Figure 17(b). We used different time series lengths and therefore changed the number of pages used based on the increased data. As can be seen, the time series length does not affect performance. This property is desirable when a user stores a long time series incrementally. That is, we can slice a long time series and store each section individually. In Figure 17(a), we varied the number of connections between each node, referred to as the edge/node ratio or connectivity ratio. A typical road network has low connectivity, between 2 and 3 edges per node. Preliminary experiments show LRatio may suffer as the connectivity ratio increases.

4. RELATED WORK

A broader set of related work for the SSTN problem is summarized by Table 2. Much work has been done in the geometric space; indexes for both space and space-time have been popular for years [16, 19, 13, 6, 10, 12]. However, geometrical approaches (e.g., Euclidean distance) may not be ideal for spatial networks. CCAM [22] demonstrated that geometric partitioning is less efficient when dealing with networks and that topological connectivity may be better. In connectivity-based partitioning, orthogonal partitioning methods, such as the Longitudinal or Snapshot method, could capture network connectivity based on either space or time independently. For instance, CCAM considered only spatial connectivity when partitioning and storing spatial networks, and therefore would likely emulate the Longitudinal method when given a spatio-temporal network, especially one with large time series such as the ones mentioned in this paper.

Table 2: Related work for Spatio-Temporal Network

            | Spatial                     | Spatio-Temporal
Geometric   | R-Tree [13], Quad Tree [19] | MVR-Tree, TPR-Tree [16]
Topological | CCAM [22]                   | LCP, Snapshot, Longitudinal

Both the Longitudinal and Snapshot methods store data in a synchronous mode, that is, either space or time is fixed and data are arranged sequentially. In CCAM [22], temporal information is not considered, as it is a node-based partitioning; a naive extension would be to use synchronous time grouping for each node, either with some time interval or the entire time series (see Figure 18). The synchronous data structure, however, cannot capture spatio-temporal connectivity such as Lagrangian connectivity. Our sub-node storage design focuses on non-orthogonal spatio-temporal node storage based on Lagrangian connectivity. This allows multiple nodes, connected through Lagrangian edges, to be on the same data page and reduces I/O cost.

Figure 18: Related work in Record Formats for Time Series Storage. [Diagram labels: Data Grouping; Synchronous; Full History (CCAM, Long. Part.); Partial History (ATSS); Asynchronous (LCP).]

Topological ordering, traversal partitioning, and graph partitioning have been used to optimize access methods of graphs [8, 15, 17, 18]. Topological ordering and traversal partitioning, however, need preprocessing to lay out the graph and are slower than graph partitioning. In the graph partitioning literature [7, 20], a multi-level partitioning algorithm, which balances cluster size and minimizes the cut, is one of the more efficient methods, and we used Metis [5] as the multi-level partitioner to group nodes by Lagrangian connectivity.


5. CONCLUSIONS AND FUTURE WORK

Spatio-temporal networks are becoming increasingly popular for a variety of important societal applications such as transportation management, fuel distribution, airline management, electrical grid usage analysis, etc. Traditional approaches for the SSTN problem have focused on orthogonal partitioning (e.g., snapshot, longitudinal, etc.) of the network, which produces significant I/O costs when performing Lagrangian movement queries (e.g., route evaluation). We proposed a Lagrangian-Connectivity Partitioning (LCP) method to efficiently store and access spatio-temporal networks that utilizes the interaction between nodes and edges in a network. We introduced a sub-node record format for storing STN data based on non-orthogonal spatio-temporal network partitioning. Experimental evaluation of LCP demonstrated significant improvements over previous work.

Our immediate future work is to expand our experiments to a real-world dataset with hundreds of thousands of nodes, edges and time steps. Another interesting experiment would evaluate the lossy compression (speed profiles) approach. Useful comparisons would examine travel time accuracy and assess which interesting events in the data may be lost through compression. Other future intentions for this work include exploring more complex queries, such as time-dependent shortest path computation. Lastly, the chosen min-cut graph partitioning method was used due to its simplicity and available implementation. Other, more complex graph partitioning algorithms may be more efficient at capturing unique characteristics of spatio-temporal networks.

6. ACKNOWLEDGMENTS

We would like to thank the National Science Foundation and the US Department of Defense for their support with the following grants: NSF III-CXT IIS-0713214, NSF IGERT DGE-0504195, NSF CRI:IAD CNS-0708604 and USDOD HM1582-08-1-0017, USDOD HM1582-07-1-2035. We thank ESRI for open conversations regarding spatio-temporal networks. We are particularly thankful to the reviewers for their helpful comments, especially in relation to suggesting the comparison with compression techniques such as "speed profiles". We also extend thanks to the University of Minnesota Spatial Databases and Spatial Data Mining Research Group for their comments. We would like to thank Kim Koffolt for improving the readability of this paper.

7. REFERENCES

[1] EIA. "U.S. Natural Gas Pipelines". www.eia.doe.gov/pub/oil_gas/natural_gas/analysis_publications/ngpipeline/index.html.

[2] Federal Highway Administration, www.fhwa.dot.gov.

[3] NAVTEQ, www.navteq.com.

[4] Minnesota Department of Transportation, www.dot.state.mn.us/.

[5] A. Abou-Rjeili and G. Karypis. Multilevel algorithms for partitioning power-law graphs. In Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International, page 10, 2006.

[6] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network flows. Prentice Hall, Inc., Englewood Cliffs, New Jersey, 1993.

[7] C. Alpert and A. Kahng. Multiway partitioning via geometric embeddings, orderings, and dynamic programming. IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, 14(11):1342–1358, 1995.

[8] J. Banerjee, W. Kim, S. J. Kim, and J. F. Garza. Clustering a DAG for CAD databases. volume 14, page 1684, Nov. 1988.

[9] L. R. Ford and D. R. Fulkerson. Constructing maximal dynamic flows from static flows. Operations Research, 6(3):419–433, 1958. DOI: 10.1287/opre.6.3.419.

[10] A. Frank, S. Grumbach, R. Gueting, C. Jensen, M. Koubarakis, N. Lorentzos, Y. Manolopoulos, E. Nardelli, B. Pernici, H. Schek, et al. Chorochronos: A research network for spatiotemporal database systems. SIGMOD Record, 28(3):12–21, 1999.

[11] B. George, S. Kim, and S. Shekhar. Spatio-temporal network databases and routing algorithms: A summary of results. In SSTD, pages 460–477, 2007.

[12] B. George and S. Shekhar. Time-aggregated graphs for modeling spatio-temporal networks. Journal on Data Semantics XI, pages 191–212, 2008.

[13] A. Guttman. R-trees: a dynamic index structure for spatial searching. In Proceedings of the 1984 ACM SIGMOD international conference on Management of data, pages 47–57. ACM, 1984.

[14] J. Herrera and A. Bayen. Incorporation of Lagrangian measurements in freeway traffic state estimation. Transportation Research Part B: Methodological, 2009.

[15] E. G. Hoel, W.-L. Heng, and D. Honeycutt. High performance multimodal networks. In SSTD, pages 308–327, 2005.

[16] L.-V. Nguyen-Dinh, W. G. Aref, and M. F. Mokbel. Spatio-temporal access methods: Part 2 (2003 - 2010). IEEE Data Eng. Bull., 33(2):46–55, 2010.

[17] D. Papadias, J. Zhang, N. Mamoulis, and Y. Tao. Query processing in spatial network databases. In Proceedings of the 29th international conference on Very large data bases - Volume 29, page 813. VLDB Endowment, 2003.

[18] H. Samet. Foundations of multidimensional and metric data structures. Morgan Kaufmann, 2006.

[19] H. Samet. Spatial data structures. In ACM SIGGRAPH 2007 courses, page 1. ACM, 2007.

[20] S. Schaeffer. Graph clustering. Computer Science Review, 1(1):27–64, 2007.

[21] S. Shekhar and S. Chawla. Spatial Databases: A Tour. Prentice Hall, 2003. ISBN 013-017480-7.

[22] S. Shekhar and D.-R. Liu. CCAM: A connectivity-clustered access method for networks and network computations. IEEE Trans. Knowl. Data Eng., 9(1):102–119, 1997.

[23] S. Turner, R. Margiotta, and T. Lomax. Lessons learned: monitoring highway congestion and reliability using archived traffic detector data. US Department of Transportation, Federal Highway Administration, 2004.
