
Oriented Online Route Recommendation for Spatial Crowdsourcing Task Workers*

Yu Li, Man Lung Yiu, and Wenjian Xu

Department of Computing, Hong Kong Polytechnic University, Hong Kong
{csyuli, csmlyiu, cswxu}@comp.polyu.edu.hk

Abstract. Emerging spatial crowdsourcing platforms enable workers (i.e., the crowd) to complete spatial crowdsourcing tasks (such as taking photos or conducting citizen journalism) that are associated with rewards and tagged with both time and location features. In this paper, we study the problem of recommending, online, an optimal route for a crowdsourcing worker, such that he can (i) reach his destination on time and (ii) receive the maximum reward from tasks along the route. We show that no optimal online algorithm exists for this problem. Therefore, we propose several heuristics, together with powerful pruning rules to speed up our methods. Experimental results on real datasets show that our proposed heuristics are very efficient, and return routes that contain 82–91% of the optimal reward.

1 Introduction

Spatial crowdsourcing platforms¹,² publish crowdsourcing tasks that are associated with rewards and tagged with spatial / temporal attributes (e.g., location, release time and deadline). To complete a task, a worker must reach the task’s location before its deadline. Popular tasks include taking photos, reporting activities / accidents, and verifying data on-site.

Regarding the matching between tasks and workers, existing approaches on spatial crowdsourcing can be divided into: (i) the server-centric mode [15,16], where the server assigns tasks to workers based on their reported locations / regions, or (ii) the worker-centric mode [3,7,10], where the server publishes its tasks and lets workers choose any task freely. In this paper, we adopt the worker-centric mode as it protects the location privacy of the worker [10] and enables the worker to choose tasks autonomously from the different crowdsourcing platforms with which he has registered.

The closest work to ours is the maximum task scheduling (MTS) problem [10]. It returns a route that covers the maximum number of tasks (in a worker’s specified region, e.g., his city). Since [10] considers the MTS problem at a snapshot, it would not update the worker’s route when new tasks arrive. We illustrate this in Figure 1a. Assume that we use the Manhattan distance and each grid cell takes one time unit to travel. Each task pi is tagged with its release time and deadline. Suppose that the worker starts from s at time 0. The MTS route is s → p1 → p2. The solution in [10] would not update the route when new tasks are released (e.g., p3, p4).

* The research is partly supported by grant GRF 152201/14E from Hong Kong RGC.
¹ www.clickworker.com/en/mobile-crowdsourcing
² features.en.softonic.com/mobile-crowdsourcing-does-it-work

[Figure 1: two grid maps (x: 0–5, y: 0–3) showing the start s and the tasks p1 [0-5], p2 [0-7], p3 [2-7], p4 [3-7]; panel (b) also shows the destination d, annotated “Arrive here before time 8”.]

Fig. 1. Route recommendation for the worker: each task pi with [release time - deadline]. (a) snapshot route by MTS [10]; (b) online route by our method.

In this paper, we wish to support two extra requirements compared to [10]: (R1) update the worker’s route online with respect to newly released tasks and (R2) align with the worker’s trip, i.e., reaching a destination before an expected time. It is important to support R1 in order to assign a worker as many tasks as possible. New spatial crowdsourcing tasks are indeed being released continuously in real systems³. We also consider the requirement R2 as the worker may have planned his own activities, e.g., reaching a specified destination by an expected time [17]. Such a worker is willing to take crowdsourcing tasks along his trip provided that he can arrive at his destination on time.

To this end, we study the online route recommendation problem for spatial crowdsourcing workers, by taking requirements R1 and R2 into consideration. Figure 1b illustrates the route recommended by our method. Suppose that the worker starts from s at time 0 and plans to arrive at home (5, 0) at time 8. At time 0, the worker is recommended to take the task p2. When new tasks are released (e.g., p3, p4), the worker is recommended to take them. In summary, our recommended route is s → p2 → p3 → p4 → d, which covers 3 tasks and reaches the destination d on time.

To the best of our knowledge, this paper is the first to tackle the online route recommendation problem for spatial crowdsourcing workers with destination and arrival time constraints. Our contributions are as follows:

– We show that no algorithm can achieve a non-zero competitive ratio [2] in our online problem, meaning that the number of tasks found by any online algorithm may be arbitrarily small compared to the optimal offline solution.

– We propose two categories of heuristics (GetNextTask and Re-Route) that offer trade-offs between the response time and the number of tasks. GetNextTask greedily selects the next task to complete, so it incurs a short response time. On the other hand, Re-Route produces a route with more tasks as it conducts a complete search to update the optimal route with respect to newly released tasks.

– We further propose pruning rules to reduce the response time of Re-Route.

Experiments on real datasets show that our methods take less than 1 second to update the route, and return routes that contain 82–91% of the optimal number of tasks.

The remainder of this paper is organized as follows. We formally define our problem in Section 2. Then, we illustrate our proposed heuristics in Section 3 and present optimization techniques in Section 4. In Section 5, we test the performance of our proposed techniques on both real and synthetic datasets. Section 6 highlights the related work. Finally, we conclude our paper in Section 7.

³ www.clickworker.com/en/clickworkerjob, www.lionbridge.com

2 Problem Statement

We first introduce some terminology and then define our problem formally.

Definition 1 (Task p). We denote a task by $p_{sid,kid} = (loc, [t_p^-, t_p^+])$, where loc is the task’s location, and $t_p^-$ and $t_p^+$ are the release time and deadline of the task, respectively. The subscripts sid and kid denote the task’s server ID and task ID, respectively. A worker may complete p and collect the reward⁴ if he can reach p.loc before $t_p^+$.

Definition 2 (Query q). We denote a query q by $q = (s, d, [t_q^-, t_q^+])$. s and d are the worker’s start and destination locations, respectively. $t_q^-$ and $t_q^+$ are the start time from s and the expected arrival time at d, respectively.

Definition 3 (Travel Time τ). We denote the travel time as $\tau(v, u) = \frac{dist(v, u)}{speed_q}$, where dist(v, u) is the distance⁵ between v and u, and $speed_q$ is the (constant) travel speed of the worker for q. $\tau(R)$ denotes the travel time along a route R (via vertices on R).

With the above terminology, we are ready to define our problem formally below.

Problem 1 (Oriented Online Route Recommendation (OnlineRR)). Let a worker’s query be $q = (s, d, [t_q^-, t_q^+])$. OnlineRR aims to find a route such that it covers the maximum number of tasks and the worker can arrive at d by $t_q^+$. It may update the route according to the worker’s live location and the new tasks released by crowdsourcing servers.

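[Figure 2: block diagram with the crowdsourcing task servers, the route recommender, and the worker; the edges are labeled “tasks” (between each server and the route recommender) and “q” / “Ropt” (between the worker and the route recommender).]

Fig. 2. System architecture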

We adopt the system architecture depicted in Figure 2. Spatial crowdsourcing servers publish new spatial crowdsourcing tasks. A worker may install our route recommender on his mobile device (smartphone). The route recommender is responsible for: (i) collecting task information from different servers continuously, and (ii) recommending / updating a route based on the worker’s current location and available tasks.

⁴ The reward of a task can be collected by the same worker only once. Similar to [10], we assume that each task has a unit reward and can be completed immediately.

⁵ Our method can be applied to any distance function provided that it satisfies the triangle inequality, such as Euclidean distance, Manhattan distance, and road network distance.

3 Online Route Recommendation

First, we prove in Section 3.1 that no online algorithm can achieve a non-zero competitive ratio in OnlineRR. Then, we propose two categories of heuristic approaches for OnlineRR in Sections 3.2 and 3.3.

3.1 Competitive Analysis

We use the competitive ratio [2] to measure the performance of online algorithms. Since OnlineRR is a maximization problem, the competitive ratio CR is defined as:

$CR = \min_{e \in E} \frac{count(R_{alg}(e))}{count(R_{opt}(e))}$    (1)

where E denotes the set of all problem instances, $R_{alg}(e)$ is the route recommended by an online algorithm alg for instance e, $R_{opt}(e)$ is the optimal route for instance e (cf. Definition 4), and $count(R_*(e))$ means the number of tasks on $R_*(e)$.

Definition 4 (Optimal route $R_{opt}(e)$ for OnlineRR). Given a problem instance e, we denote its optimal route by $R_{opt}(e)$, which is obtained under the assumption that the information of all tasks is known in advance (even before their release times).

We show our competitive analysis below. It applies to any online algorithm, including both deterministic algorithms and randomized algorithms.

Theorem 1. No online algorithm has a non-zero competitive ratio for OnlineRR.

Proof. Since $CR = \min_{e \in E} \frac{count(R_{alg}(e))}{count(R_{opt}(e))}$, it suffices to find a specific instance (i.e., the adversary) that makes CR as low as possible. Without loss of generality, in the following proof, we consider only locations on the positive half line $[0, +\infty)$. For the query, we set $t_q^- = 0$, $s = 0$, $t_q^+ = 10$, $d = 7$. Assume that $speed_q = 1$, that is, $\tau_e(v, u) = |v - u|$. We simply denote a task p by $(p.loc, [t_p^-, t_p^+])$.

At time 0, the adversary releases a task p1 = (3, [0, 3]). At time m = 3, the adversary checks the worker’s current location (say x), and then decides to further release n tasks accordingly. There are two cases: (1) x = 0, or (2) x > 0. We show that the adversary can release those n tasks to make CR arbitrarily small.

Case 1: x = 0. In this case, the adversary will release tasks $p_{2 \le i \le n+1} = (2, [3, 4])$ (see Figure 3a). The worker cannot complete these tasks, since he cannot reach them before their deadlines, and thus $count(R_{alg}) = 0$. But if all tasks are known in advance, the worker can wait at position 2 until all tasks are released and finish them at time m = 3. In this case, the competitive ratio is: CR = 0/n = 0.

[Figure 3: two number lines from 0 to d = 7 with p1: [0-3] at position 3. In (a) the worker is at x = 0 and the tasks p2...n+1: [3-4] are at position 2; in (b) the worker is at x > 0 and the tasks p2...n+1: [3-∞] are at position 0.]

Fig. 3. At time m = 3, adversaries release tasks $p_{2 \le i \le n+1}$ with [release time - deadline]. (a) Case 1: the worker cannot reach the location of $p_{2 \le i \le n+1}$ before their deadline (i.e., time 4); (b) Case 2: the worker cannot proceed to $p_{2 \le i \le n+1}$ and arrive at d on time.

Case 2: x > 0. In this case, the adversary would release n tasks $p_{2 \le i \le n+1} = (0, [m, \infty])$ (see Figure 3b). As $m + x + d > m + d = 10 = t_q^+$, the worker cannot proceed to position 0 at time m; otherwise, he cannot reach d before $t_q^+$. So, the worker can finish at most one task (p1), and only if he moves directly to position 3 at time 0. However, if all tasks are known in advance, the worker could stay at 0 until time m = 3 to finish tasks $p_{2 \le i \le n+1}$, and thus $count(R_{opt}) = n$. Therefore, $CR \le 1/n \to 0$ because n can be arbitrarily large.

3.2 Greedy Task Approach

In this section, we present a greedy approach that incurs low response times.

The greedy approach works as follows. Initially, it calls GetNextTask (cf. Algorithm 1) to find the first task for the worker. Given the set of available⁶ tasks P and the worker’s location $s_{now}$ at current time $t_{now}$, GetNextTask greedily selects the task with the best score $\psi_p$. Upon reaching the chosen task, GetNextTask is invoked again to get the next task, and this repeats until the worker reaches d.

Algorithm 1 Get next best task
algorithm GetNextTask (Query $q = (s_{now}, d, [t_{now}, t_q^+])$, Set of available tasks P)
1: Cand ← compute the set of feasible tasks from P    ▷ apply Equation 2
2: if Cand ≠ ∅ then
3:     $p_{next}$ ← choose p ∈ Cand with best score $\psi_p$    ▷ $\psi_p$ is a heuristic function
4:     Return $p_{next}$
5: else
6:     Apply policy $P_{stay}$ or $P_{go}$ until Cand ≠ ∅ or $t_{now} + \tau(s_{now}, d) = t_q^+$

Due to the tasks’ deadlines and the worker’s expected arrival time (cf. Definitions 1, 2), the worker may complete a task p if: (i) he can reach p.loc before $t_p^+$, and (ii) he can reach d no later than $t_q^+$. Therefore, we call a task feasible if it satisfies:

$\tau(s_{now}, p) + \tau(p, d) \le t_q^+ - t_{now}$ and $t_{now} + \tau(s_{now}, p) \le t_p^+$    (2)

If there is no feasible task for q, the worker may stay or move based on a pre-defined policy (cf. Line 6 in Algorithm 1). In the policy $P_{go}$, the worker simply moves towards the destination d. In the policy $P_{stay}$, the worker waits at $s_{now}$ until $t_{now} + \tau(s_{now}, d) = t_q^+$. When new feasible tasks are released, we resume the search and invoke GetNextTask to obtain the next task.
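The feasibility test of Equation 2 and the greedy selection in Algorithm 1 can be sketched as follows. This is a minimal illustration, not the authors' C++ implementation: the `Task` class, the function names, the unit speed, the Manhattan metric, and the sample coordinates are all assumptions made for the example. The score is minimized, which matches both the nearest-neighbor and the earliest-deadline scores.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Task:
    name: str
    loc: Point
    release: float   # t_p^-
    deadline: float  # t_p^+


def travel_time(a: Point, b: Point, speed: float = 1.0) -> float:
    """tau(a, b) under the Manhattan distance (an assumption of this sketch)."""
    return (abs(a[0] - b[0]) + abs(a[1] - b[1])) / speed


def is_feasible(p: Task, s_now: Point, t_now: float, d: Point, t_q_plus: float) -> bool:
    """Equation 2: reach p by its deadline and still reach d by t_q^+."""
    to_p = travel_time(s_now, p.loc)
    return (to_p + travel_time(p.loc, d) <= t_q_plus - t_now
            and t_now + to_p <= p.deadline)


def get_next_task(tasks: Iterable[Task], s_now: Point, t_now: float, d: Point,
                  t_q_plus: float,
                  score: Callable[[Task, Point], float]) -> Optional[Task]:
    """Return the released, feasible task with the smallest score, or None."""
    cand = [p for p in tasks
            if p.release <= t_now and is_feasible(p, s_now, t_now, d, t_q_plus)]
    return min(cand, key=lambda p: score(p, s_now)) if cand else None


def score_nn(p: Task, s_now: Point) -> float:
    """Nearest-neighbor score: travel time from the current location."""
    return travel_time(s_now, p.loc)


def score_ed(p: Task, s_now: Point) -> float:
    """Earliest-deadline score: the task's deadline."""
    return p.deadline


# Hypothetical instance: start (0, 0), destination (5, 0), arrival deadline 8.
TASKS = [Task("a", (1, 0), 0, 5), Task("b", (2, 1), 0, 3),
         Task("c", (0, 4), 0, 7), Task("e", (1, 1), 2, 7)]
print(get_next_task(TASKS, (0, 0), 0, (5, 0), 8, score_nn).name)
print(get_next_task(TASKS, (0, 0), 0, (5, 0), 8, score_ed).name)
```

A `None` result corresponds to Line 6 of Algorithm 1, where the caller would apply the stay or go policy until a feasible task appears.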

We illustrate several heuristics for computing the score $\psi_p$. Figure 4a shows the map of tasks, which are labeled with release times and deadlines, and Figure 4b shows the result route of each heuristic. In this example, we use the query q = (s, d, [0, 10]), the policy $P_{stay}$, and the Manhattan distance.

⁶ Available tasks are tasks released before the current time $t_{now}$.

[Figure 4a: grid map (x: 0–7, y: 0–4) showing s, d, and the tasks p1 [0-5], p2 [0-10], p3 [0-10], p4 [2-10], p5 [1-3], p6 [3-10], p7 [0-2], p8 [3-10], p9 [4-10].]

(a) map of tasks with [release time - deadline]

Heuristic | Route
G-NN | ⟨s, p7, p5, d⟩
G-ED | ⟨s, p7, p5, d⟩
G-MCS | ⟨s, p1, p4, p6, d⟩
Re-Route | ⟨s, p1, p2, p3, p8, d⟩

(b) result routes (with $P_{stay}$)

Route | # of tasks | Route | # of tasks | Route | # of tasks
⟨s, p1, d⟩ | 1 | ⟨s, p2, d⟩ | 1 | ⟨s, p3, d⟩ | 1
⟨s, p7, d⟩ | 1 | ⟨s, p1, p2, d⟩ | 2 | ⟨s, p1, p3, d⟩ | 2
⟨s, p2, p3, d⟩ | 2 | ⟨s, p7, p1, d⟩ | 2 | ⟨s, p1, p2, p3, d⟩ | 3
⟨s, p3, p1, d⟩ | not feasible | · · · | not feasible | · · · | not feasible

(c) all possible routes known at $t_{now} = 0$

Fig. 4. Example of query q = (s, d, [0, 10]) in OnlineRR (using Manhattan distance)

Nearest Neighbor Heuristic (G-NN). It chooses the feasible task nearest to the worker’s current location $s_{now}$, and thus sets $\psi_p = \tau(s_{now}, p)$. In Figure 4, G-NN produces the route ⟨s, p7, p5, d⟩.

Earliest Deadline Heuristic (G-ED). It chooses the task with the earliest deadline, and thus sets $\psi_p = t_p^+$. In Figure 4, G-ED recommends the route ⟨s, p7, p5, d⟩.

Maximum Candidate Space Heuristic (G-MCS). It chooses the task p that maximizes the future search space of feasible tasks (Equation 2). The future search space is obtained under the assumption that p has just been completed. The shape of the space differs for different distance metrics, but we can use a general Monte Carlo approach [20] to compare sizes. If a specific distance metric is used, then the exact candidate space size can be calculated. Taking the Euclidean distance as an example, the space size is the area of the ellipse shown in Figure 5a, and thus we can calculate the score $\psi_p$ using the equations in Figure 5b.

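[Figure 5a: shaded ellipse with foci $s'_{now}$ and d, semi-axes $rA_p$ and $rB_p$, containing every point p' with $\tau(s'_{now}, p') + \tau(p', d) \le t_q^+ - t'_{now}$.]

(a) search space (in shade)

$\psi_p = \pi \cdot rA_p \cdot rB_p$
$rA_p = (t_q^+ - t'_{now}) / 2$
$rB_p = \sqrt{rA_p^2 - (\tau(s'_{now}, d) / 2)^2}$
where $t'_{now} = t_{now} + \tau(s_{now}, p)$ and $s'_{now} = p.loc$

(b) search space size calculation

Fig. 5. Feasible candidates search space for Euclidean distance metric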
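The ellipse-area score of Figure 5b translates almost directly into code. The sketch below assumes the Euclidean metric; the function names, the `speed` parameter, and the convention of returning 0 when the destination is no longer reachable (no real semi-minor axis) are assumptions of this illustration rather than details stated in the paper.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def euclid_time(a: Point, b: Point, speed: float = 1.0) -> float:
    """tau(a, b) under the Euclidean distance."""
    return math.dist(a, b) / speed


def mcs_score(p_loc: Point, s_now: Point, t_now: float, d: Point,
              t_q_plus: float, speed: float = 1.0) -> float:
    """Area of the feasible ellipse after completing the task at p_loc."""
    t_prime = t_now + euclid_time(s_now, p_loc, speed)    # t'_now
    r_a = (t_q_plus - t_prime) / 2                        # semi-major axis rA_p
    half_focal = euclid_time(p_loc, d, speed) / 2         # tau(s'_now, d) / 2
    if r_a < half_focal:
        return 0.0  # assumed convention: d is unreachable, empty space
    r_b = math.sqrt(r_a ** 2 - half_focal ** 2)           # semi-minor axis rB_p
    return math.pi * r_a * r_b


# Hypothetical query: s = (0, 0), d = (6, 0), t_q^+ = 10, current time 0.
print(mcs_score((3, 0), (0, 0), 0, (6, 0), 10))
print(mcs_score((3, 4), (0, 0), 0, (6, 0), 10))
```

A task lying on the way to d leaves a larger ellipse than a detour of the same deadline, which is why G-MCS prefers it. The score is expressed in squared time units, so it is only meaningful for comparing candidates of the same query.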

We illustrate how G-MCS works in Figure 4. At time 0, the feasible tasks are p1, p2, p3, p7. Since p1 has the highest score ($\psi_{p1}$), p1 is chosen to be visited. When the worker reaches p1, a new task p4 is released while p7 expires, so the set of feasible tasks becomes {p2, p3, p4}. Then p4 is chosen as it has the highest score ($\psi_{p4}$). Upon reaching p4, the algorithm selects p6 as it has the best score among {p3, p6, p8}. After completing task p6, there are no more feasible tasks. After waiting for two more time units, the worker moves toward d. In summary, G-MCS obtains the route ⟨s, p1, p4, p6, d⟩.

3.3 Complete Search for Route Approach

In this section, we present a complete search approach that tends to find more tasks than the heuristics in Section 3.2.

Specifically, we formulate the following SnapshotRR problem, which takes the current query and the set of available tasks as input. Then, we solve SnapshotRR by enumerating all possible routes and obtaining the one with the maximum number of tasks.

Problem 2 (Snapshot Route Recommendation (SnapshotRR)). Given a query $q = (s_{now}, d, [t_{now}, t_q^+])$ at the current snapshot $t_{now}$, SnapshotRR aims to find a route such that it covers the maximum number of tasks and the worker can arrive at d by $t_q^+$.

We illustrate this approach for the query q = (s, d, [0, 10]) in Figure 4. At time 0, we apply Equation 2 and obtain the set of feasible tasks: P = {p1, p2, p3, p7}. Figure 4c shows all possible routes (known at time 0). The optimal route at time 0 is ⟨s, p1, p2, p3, d⟩.

We propose a simple optimization to solve SnapshotRR in Algorithm 2. At Line 3, we check whether there exists a new feasible task p (that was not available in the previous call of Algorithm 2). If such p exists, we must solve SnapshotRR again. Otherwise, the best route remains the same as in the previous call, so we need not solve SnapshotRR again.
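The enumeration behind SnapshotRR can be sketched as a brute-force search over task orderings. This is only an illustration of the complete search idea: the function names, the Manhattan metric at unit speed, and the coordinates below are assumptions of the example (they are not read from Figure 4), and all tasks are assumed to be already released at the snapshot.

```python
from itertools import permutations
from typing import Dict, Sequence, Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    """Travel time at unit speed under the Manhattan distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def route_is_feasible(order: Sequence[str], tasks: Dict[str, Tuple[Point, int]],
                      s: Point, d: Point, t_now: int, t_q_plus: int) -> bool:
    """Walk the route; every task must be reached by its deadline, d by t_q^+."""
    t, here = t_now, s
    for name in order:
        loc, deadline = tasks[name]
        t += manhattan(here, loc)
        if t > deadline:
            return False
        here = loc
    return t + manhattan(here, d) <= t_q_plus


def snapshot_rr_bruteforce(tasks: Dict[str, Tuple[Point, int]], s: Point,
                           d: Point, t_now: int, t_q_plus: int) -> Tuple[str, ...]:
    """Return a feasible task ordering of maximum length (empty if none)."""
    names = list(tasks)
    for k in range(len(names), 0, -1):        # try the longest routes first
        for order in permutations(names, k):
            if route_is_feasible(order, tasks, s, d, t_now, t_q_plus):
                return order
    return ()


# Hypothetical snapshot: name -> (location, deadline).
TASKS = {"p1": ((1, 0), 5), "p2": ((2, 1), 8), "p3": ((3, 0), 8), "p7": ((0, 2), 2)}
print(snapshot_rr_bruteforce(TASKS, (0, 0), (4, 0), 0, 8))
```

The number of orderings grows factorially with the number of tasks, which is the cost that the bi-directional search and pruning rules of Section 4 aim to reduce.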

Algorithm 2 Complete search for the result route
algorithm Re-Route (Query $q = (s_{now}, d, [t_{now}, t_q^+])$, Set of available tasks P)
1: Let $P_{prev}$ be the set of available tasks in the previous call
2: if P ≠ ∅ then
3:     if ∃p ∈ P − $P_{prev}$ such that p is feasible then    ▷ Equation 2
4:         R ← Solve SnapshotRR(q, P)    ▷ conduct complete search
5: else
6:     Apply policy $P_{stay}$ or $P_{go}$ until P ≠ ∅ or $t_{now} + \tau(s_{now}, d) = t_q^+$

We proceed to illustrate how Re-Route works in the example in Figure 4. At time 0, Re-Route computes the route R0 = ⟨s, p1, p2, p3, d⟩, and then the worker moves along R0 to p1. Upon reaching p1, a new feasible task p4 is found, so Re-Route re-calculates the route as R1 = ⟨p1, p2, p3, d⟩. When the worker reaches p2, a new feasible task p8 is found, so Re-Route updates the route to R2 = ⟨p2, p3, p8, d⟩. After reaching p8, a new task p9 is found but it is not feasible. Thus, Re-Route does not compute the route again (cf. Line 3 in Algorithm 2). Eventually, the worker moves to d. In summary, the actual route traveled by the worker is ⟨s, p1, p2, p3, p8, d⟩. It covers more tasks than the other heuristics (cf. Figure 4b).

Since it is expensive to solve SnapshotRR by enumerating all possible routes, we will present optimizations to solve SnapshotRR efficiently in Section 4.

4 Optimization for SnapshotRR

We adapt the bi-directional search algorithm for the Orienteering Problem with Time Windows (OPTW) [19] to solve our problem. For brevity in discussion, we use $q = (s, d, [t_q^-, t_q^+])$ instead of $q = (s_{now}, d, [t_{now}, t_q^+])$. We conduct bi-directional search for SnapshotRR in three steps:

Step 1: Search sub-routes in the forward direction (from s) and store them in $\overrightarrow{IR}$
Step 2: Search sub-routes in the backward direction (from d) and store them in $\overleftarrow{IR}$
Step 3: Join sub-routes between $\overrightarrow{IR}$ and $\overleftarrow{IR}$

According to Pruning Rule 1, the bi-directional search can reduce the search space. However, the method in [19] does not exploit spatial properties in our problem. In this section, we develop more effective pruning rules to accelerate bi-directional search on SnapshotRR.

Pruning Rule 1 (Half travel time bound property proved in [19]) In the forward (or backward) route searching from vertex s (or d), only routes R with $\tau(R) \le \tau_{max}/2$ are maintained and extended, where $\tau_{max} = t_q^+ - t_q^-$.

4.1 Forward Search and Backward Search

In this section, we elaborate on the forward search (Step 1) and discuss adaptations for the backward search (Step 2) at the end. In the following discussion, for simplicity, we use R instead of $\overrightarrow{R}$ to represent a sub-route found in the forward search (which will be stored in $\overrightarrow{IR}$).

We first introduce the sub-route concept and its extension operation. Then, we propose a pruning rule and a search strategy to speed up the computation. In the following, we denote the set of vertices as V = P ∪ {s, d}, where P is the set of available tasks.

Sub-route Extension.

We denote a path from s to v ∈ V as a sub-route $R_v$, which contains four attributes $R_v = (\tau(R_v), B_{R_v}, C_{R_v}, v)$.

– $\tau(R_v)$ represents the travel time along $R_v$ (i.e., from s to v).
– $B_{R_v}$ stores the sequence of tasks visited so far on the sub-route $R_v$. We denote the profit of $R_v$ as $|B_{R_v}|$ because all tasks have the same reward.
– $C_{R_v}$ is a set of candidate vertices (that are feasible for visiting in future), and its calculation is discussed in Equation 5.

During route search, for each vertex v, we store all sub-routes of the form $R_v$ into a set $IR_v$. In addition, we only consider feasible routes. Recall that $\tau(R_v)$ represents the travel time (along $R_v$) from s to v. According to Equation 2, a sub-route $R_v$ is said to be feasible if:

$\tau(R_v) \le t_v^+ - t_q^-$ and $\tau(R_v) \le t_q^+ - t_q^-$    (3)

where $t_v^+$ is the deadline for vertex v when v is a task, or ∞ when v ∈ {s, d}.

For each vertex $u \in C_{R_v}$, we can extend $R_v$ with an arc (v, u) to form a new sub-route $R_u$. The components of $R_u = (\tau(R_u), B_{R_u}, C_{R_u}, u)$ are calculated as follows:

$B_{R_u} \leftarrow \langle B_{R_v}, v \rangle$ and $\tau(R_u) \leftarrow \tau(R_v) + \tau_e(v, u)$    (4)

The set $C_{R_u}$ contains each candidate vertex p that satisfies:

$p \in C_{R_v}$ (♥) and $p \notin B_{R_u}$ (♦)
$\tau(u, p) \le t_p^+ - t_q^- - \tau(R_u)$ and $\tau(u, p) \le (t_q^+ - t_q^-)/2 - \tau(R_u)$ (♣,♠,♦)
$\tau(u, p) + \tau(p, d) \le t_q^+ - t_q^- - \tau(R_u)$ (♣,♥)    (5)

which involve the constraints in Equation 4 (♣), Equation 3 (♠), the triangle inequality (♥), the constraint that each task can be visited only once (♦), the worker’s arrival time $t_q^+$ (♥), and Pruning Rule 1 (♦).

We illustrate sub-route extension in Figure 6. Assume that q = (s, d, [0, 10]) and P = {p1, p2, · · · , p7}. We consider the Manhattan distance in this example. First, we compute the candidate set of s. By Pruning Rule 1, we only consider tasks within 10/2 = 5 units from s (i.e., tasks in the dotted diamond in Figure 6). Thus, tasks p3, p7 are not feasible. The tasks p4 and p5 are not feasible as they violate constraints on the task’s deadline and the worker’s arrival time, respectively. Thus, we obtain the candidate set of s as $C_s$ = {p1, p2, p6}, and compute the sub-route for s as $R_s = (0, ∅, C_s, s)$. Next, we append arcs (s, p1), (s, p2), (s, p6) to $R_s$ to generate three new sub-routes: R1 = (1, ⟨p1⟩, {p2, p6}, p1), R2 = (3, ⟨p2⟩, {p6}, p2), R6 = (5, ⟨p6⟩, ∅, p6).
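The extension step (Equations 4 and 5) can be sketched as below. Several things here are assumptions of the illustration: the `SubRoute` tuple and the `extend` function are hypothetical names, the coordinates are chosen only so that the travel times match the sub-routes R1, R2, R6 quoted above (they are not read from Figure 6), and the visited list B is taken to include the newly reached task u, which is the convention the worked example follows.

```python
from typing import Dict, FrozenSet, NamedTuple, Tuple

Point = Tuple[int, int]


class SubRoute(NamedTuple):
    tau: int                  # travel time tau(R) from s to `end`
    visited: Tuple[str, ...]  # B_R: tasks collected so far, in order
    cand: FrozenSet[str]      # C_R: tasks that may still be appended
    end: str                  # last vertex of the sub-route


def tt(a: Point, b: Point) -> int:
    """Manhattan travel time at unit speed."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def extend(r: SubRoute, u: str, loc: Dict[str, Point], deadline: Dict[str, int],
           t_minus: int, t_plus: int) -> SubRoute:
    """Append arc (r.end, u) and recompute the candidate set (Equation 5)."""
    tau_u = r.tau + tt(loc[r.end], loc[u])        # Equation 4
    visited = r.visited + (u,)                    # assumed: B includes u itself
    horizon = t_plus - t_minus
    cand = frozenset(
        p for p in r.cand
        if p not in visited                                            # visit once
        and tt(loc[u], loc[p]) <= deadline[p] - t_minus - tau_u         # deadline of p
        and tt(loc[u], loc[p]) <= horizon / 2 - tau_u                   # Pruning Rule 1
        and tt(loc[u], loc[p]) + tt(loc[p], loc["d"]) <= horizon - tau_u  # arrival at d
    )
    return SubRoute(tau_u, visited, cand, u)


# Hypothetical coordinates reproducing the travel times of the example.
LOC = {"s": (0, 0), "p1": (0, 1), "p2": (1, 2), "p6": (3, 2), "d": (6, 4)}
DEADLINE = {"p1": 10, "p2": 10, "p6": 10}
R_S = SubRoute(0, (), frozenset({"p1", "p2", "p6"}), "s")
R_1 = extend(R_S, "p1", LOC, DEADLINE, 0, 10)
print(R_1)
```

Because each new candidate set is filtered from the parent's candidate set, it can only shrink along a sub-route, which keeps later extensions cheap.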

[Figure 6: grid map (x: 0–6, y: 0–4) showing s, d, and the tasks p1 to p7; every task is labeled [0-10] except p4, which is labeled [0-2].]

Fig. 6. Example query q = (s, d, [0, 10]) for SnapshotRR problem (using Manhattan distance)

Dominate Test Pruning.

We develop the following pruning rule to further reduce the search space.

Pruning Rule 2 (Dominating Pruning) Let $R_v = (\tau(R_v), B_{R_v}, C_{R_v}, v)$ and $R'_v = (\tau(R'_v), B_{R'_v}, C_{R'_v}, v)$ be two feasible routes associated with v. We can prune $R'_v$ if:

$\tau(R_v) \le \tau(R'_v)$ and $|C_{R'_v} \cap B_{R_v}| \le |B_{R_v}| - |B_{R'_v}|$

Proof. Among all full routes with $R'_v$ as the prefix, let $R'_{opt} = \langle s, B_{R'_v}, R'_{tail}, d \rangle$ be the maximum reward route. With the given condition $\tau(R_v) \le \tau(R'_v)$, after traveling along $R_v$, we can still follow all tasks in $R'_{tail}$ and arrive at d by $t_q^+$. There exists a route $R_{exist} = \langle s, B_{R_v}, R_{tail}, d \rangle$ where $R_{tail} = R'_{tail} - B_{R_v}$. $R_{exist}$ ensures that the reward of each task is gained at most once as $B_{R_v}$ and $R_{tail}$ have no common tasks.

Since $R'_{tail} \subseteq C_{R'_v}$, we have $|R'_{tail}| = |R_{tail}| + |R'_{tail} \cap B_{R_v}| \le |R_{tail}| + |C_{R'_v} \cap B_{R_v}|$.

By combining the above with the given condition $|C_{R'_v} \cap B_{R_v}| \le |B_{R_v}| - |B_{R'_v}|$, we derive: $|B_{R_v}| + |R_{tail}| \ge |B_{R'_v}| + |C_{R'_v} \cap B_{R_v}| + |R_{tail}| \ge |B_{R'_v}| + |R'_{tail}|$. As the reward of $R_{exist}$ (extended from $R_v$) is greater than or equal to that of $R'_{opt}$ (extended from $R'_v$), we can prune the sub-route $R'_v$.
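The dominance condition of Pruning Rule 2 is a two-line predicate. The sketch below uses hypothetical names (`SubRoute`, `dominates`, `remove_dominated`); how ties between mutually dominating sub-routes are broken is an assumption of this illustration (the sub-route already in the pool is kept).

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class SubRoute(NamedTuple):
    tau: int                  # travel time tau(R)
    visited: Tuple[str, ...]  # B_R
    cand: FrozenSet[str]      # C_R
    end: str                  # end vertex v


def dominates(r: SubRoute, r_prime: SubRoute) -> bool:
    """True if r_prime can be pruned in favor of r (both end at the same vertex)."""
    overlap = len(r_prime.cand & set(r.visited))   # |C_{R'} intersect B_R|
    return (r.tau <= r_prime.tau
            and overlap <= len(r.visited) - len(r_prime.visited))


def remove_dominated(pool: List[SubRoute], new: SubRoute) -> List[SubRoute]:
    """Insert `new` unless it is dominated; drop pool members that it dominates."""
    if any(dominates(r, new) for r in pool):
        return pool
    return [r for r in pool if not dominates(new, r)] + [new]


# The two sub-routes at p2 from iteration 2 of the running example.
OLD = SubRoute(3, ("p2",), frozenset({"p6"}), "p2")
NEW = SubRoute(3, ("p1", "p2"), frozenset({"p6"}), "p2")
print(remove_dominated([OLD], NEW))
```

Here the new sub-route reaches p2 at the same time as the old one but has already collected one more task, so the old one is discarded, as in the example.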

Search Strategy.

Our strategy is to identify sub-routes with better reward values in order to utilize Pruning Rule 2. To do so, we introduce the concept of upper bound reward:

Definition 5 (Vertex upper bound reward $\$^+_v$). Given a sub-route $R_v = (\tau(R_v), B_{R_v}, C_{R_v}, v)$, we define its upper bound reward as $\$^+_{R_v} = |B_{R_v}| + |C_{R_v}|$. The upper bound reward of vertex v ∈ V is defined as: $\$^+_v = \max\{\$^+_{R_v} \mid R_v \in \overrightarrow{IR}_v\}$.

Initially, we begin the search from a sub-route at s. We iteratively extend the sub-routes found so far and apply Pruning Rule 2 to discard unpromising sub-routes. During the search, we employ a heap H to process vertices in descending order of $\$^+_v$.

We illustrate this method on the example in Figure 6 and show the running steps in Table 1. Iteration 1 corresponds to the extension of the sub-route $R_s$ at s, which we have discussed before. We obtain three new sub-routes R1, R2, R6, insert them into their corresponding route sets $\overrightarrow{IR}_p$, and also insert p1, p2, p6 into H. In each subsequent iteration, we extract the vertex v ∈ H with the largest $\$^+_v$, and extend its sub-routes $R_v$ in descending order of $|B_{R_v}|$.

In iteration 2, we generate a new sub-route (3, ⟨p1, p2⟩, {p6}, p2) and apply Pruning Rule 2 to discard the previous sub-route at p2, i.e., (3, ⟨p2⟩, {p6}, p2). Similarly, the previous sub-routes for p6, (5, ⟨p6⟩, ∅, p6) and (5, ⟨p1, p6⟩, ∅, p6), are pruned in iterations 2 and 3, respectively.

Table 1. Forward space search

Iteration | Selected Vertex | Extended Route R | Modified IR | Heap H
1 | s | (0, ∅, {p1, p2, p6}, s) | $\overrightarrow{IR}_{p1}$ = {(1, ⟨p1⟩, {p2, p6}, p1)}; $\overrightarrow{IR}_{p2}$ = {(3, ⟨p2⟩, {p6}, p2)}; $\overrightarrow{IR}_{p6}$ = {(5, ⟨p6⟩, ∅, p6)} | (p1, 3), (p2, 2), (p6, 1)
2 | p1 | (1, ⟨p1⟩, {p2, p6}, p1) | $\overrightarrow{IR}_{p2}$ = {(3, ⟨p1, p2⟩, {p6}, p2)}; $\overrightarrow{IR}_{p6}$ = {(5, ⟨p1, p6⟩, ∅, p6)} | (p2, 3), (p6, 2)
3 | p2 | (3, ⟨p1, p2⟩, {p6}, p2) | $\overrightarrow{IR}_{p6}$ = {(5, ⟨p1, p2, p6⟩, ∅, p6)} | (p6, 3)
4 | p6 | ∅ | ∅ | ∅

$\overrightarrow{IR}$: (5, ⟨p1, p2, p6⟩, ∅, p6), (3, ⟨p1, p2⟩, {p6}, p2), (1, ⟨p1⟩, {p2, p6}, p1), (0, ∅, {p1, p2, p6}, s)

Algorithm 3 Forward search
function RouteSearchFW(Query $q = (s, d, [t_q^-, t_q^+])$, Vertex set V = P ∪ {s, d})
▷ Initialization
1: Create an empty set $\overrightarrow{IR}_v$ for each vertex v ∈ V to store sub-routes associated with v
2: Calculate the candidate vertex set $C_s$ of s    ▷ Equation 5
3: $\overrightarrow{IR}_s$ ← {(0, ∅, $C_s$, s)}
4: Create a max-heap H ← {(s, $|C_s|$)} to store vertices whose routes will be extended
▷ Repeatedly generate feasible sub-routes
5: while H ≠ ∅ do
6:     (v, v.ub) ← Extract-Max(H)    ▷ Searching strategy
7:     Sort routes R ∈ $\overrightarrow{IR}_v$ in the descending order of $|B_R|$    ▷ Searching strategy
8:     for all $R_v$ ∈ $\overrightarrow{IR}_v$ do
9:         for all u ∈ $C_{R_v}$ do
10:            $R_u$ ← Extend($R_v$, q, u)    ▷ Equation 4, 5, Pruning Rule 1
11:            RemoveDominate($\overrightarrow{IR}_u$, $R_u$)    ▷ Pruning Rule 2
12:            if $R_u$ ∈ $\overrightarrow{IR}_u$ then    ▷ $R_u$ not pruned
13:                if (u, u.ub) ∉ H then
14:                    Insert (u, $\$^+_{R_u}$) into H
15:                else
16:                    u.ub ← max{u.ub, $\$^+_{R_u}$}
17: Return $\overrightarrow{IR}$ ← all routes in each nonempty $\overrightarrow{IR}_v$

The forward search terminates when H becomes empty, i.e., no sub-routes can be extended. It returns the set $\overrightarrow{IR}$ of all surviving sub-routes.

Algorithm 3 gives the pseudocode of route search in the forward direction. It is self-explanatory and summarizes what we have discussed above.

Backward Search. Route space search in the backward direction is similar to that in the forward direction. The pruning rules, searching strategies, and dominance testing discussed for forward search can be modified for backward search directly.

4.2 Route Join

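In this section, we elaborate on how to join sub-routes obtained in the forward search and the backward search. Let $\overrightarrow{R}_v = (\tau(\overrightarrow{R}_v), B_{\overrightarrow{R}_v}, C_{\overrightarrow{R}_v}, v)$ and $\overleftarrow{R}_u = (\tau(\overleftarrow{R}_u), B_{\overleftarrow{R}_u}, C_{\overleftarrow{R}_u}, u)$ be two sub-routes in the forward and the backward directions, respectively. They are feasible to be joined if:

$\tau(\overrightarrow{R}_v) + \tau(v, u) \le u.t_p^+$ and $\tau(\overrightarrow{R}_v) + \tau(\overleftarrow{R}_u) + \tau(v, u) \le t_q^+ - t_q^-$
$B_{\overrightarrow{R}_v} \cap B_{\overleftarrow{R}_u} = ∅$    (6)

We denote the joined route as $R_{join} = \langle s, B_{\overrightarrow{R}_v}, rev(B_{\overleftarrow{R}_u}), d \rangle$, where $rev(B_{\overleftarrow{R}_u})$ refers to the list of vertices in $B_{\overleftarrow{R}_u}$ in reversed order. Its reward is $|B_{\overrightarrow{R}_v}| + |B_{\overleftarrow{R}_u}|$.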

We develop two optimization techniques to accelerate the join procedure. First, we apply Pruning Rule 3 to skip the feasibility check (cf. Equation 6) for pairs of sub-routes. Second, we sort sub-routes in the descending order of their $|B_R|$. This helps us find a tighter $\$_{best}$ earlier, and in turn boosts the power of Pruning Rule 3.

Table 2. Route join

sub-routes sorted in the descending order of $|B_R|$
$\overrightarrow{IR}$: (5, ⟨p1, p2, p6⟩, ∅, p6), (3, ⟨p1, p2⟩, {p6}, p2), (1, ⟨p1⟩, {p2, p6}, p1), (0, ∅, {p1, p2, p6}, s)
$\overleftarrow{IR}$: (5, ⟨p7, p3, p6⟩, ∅, p6), (2, ⟨p7, p3⟩, {p6}, p3), (1, ⟨p7⟩, {p3, p6}, p7), (0, ∅, {p3, p6, p7}, d)

route join iterations
iteration | candidate $\overrightarrow{R}$ | candidate $\overleftarrow{R}$ | join result $R_{join}$ | $\$_{best}$
1 | (5, ⟨p1, p2, p6⟩, ∅, p6) | (5, ⟨p7, p3, p6⟩, ∅, p6) | not feasible (Equation 6) | 0
2 | (5, ⟨p1, p2, p6⟩, ∅, p6) | (2, ⟨p7, p3⟩, {p6}, p3) | R = ⟨s, p1, p2, p6, p3, p7, d⟩ | 5
· · · | · · · | · · · | skipped (Pruning Rule 3) | 5

optimal route for this snapshot: R = ⟨s, p1, p2, p6, p3, p7, d⟩

Pruning Rule 3 (Reward bound pruning) Let $\$_{best}$ be the maximum reward on all joined routes found so far. If $|B_{\overrightarrow{R}}| + |B_{\overleftarrow{R}}| \le \$_{best}$, then we need not join $\overrightarrow{R}$ and $\overleftarrow{R}$.

Continuing with the example in Figure 6, we illustrate the join procedure in Table 2. First, we sort forward sub-routes $\overrightarrow{R} \in \overrightarrow{IR}$ and backward sub-routes $\overleftarrow{R} \in \overleftarrow{IR}$ in descending order of $|B_R|$. For each pair of $\overrightarrow{R}$ and $\overleftarrow{R}$, if it survives Pruning Rule 3, then we conduct the feasibility check and then join the pair. After joining the forward sub-route $\overrightarrow{R}$ = (5, ⟨p1, p2, p6⟩, ∅, p6) with the backward sub-route $\overleftarrow{R}$ = (2, ⟨p7, p3⟩, {p6}, p3), we update $\$_{best}$ to 5. All remaining pairs are pruned according to Pruning Rule 3. The best route (known at this snapshot) is ⟨s, p1, p2, p6, p3, p7, d⟩.
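The join step with Pruning Rule 3 can be sketched as a nested loop over the two sorted lists. This is an illustrative sketch only: sub-routes are reduced to (travel time, visited tasks, end vertex) triples, the feasibility test is Equation 6 exactly as stated, and the coordinates are hypothetical values chosen so that the travel times agree with the sub-routes listed in Table 2.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, int]
SubRoute = Tuple[int, Tuple[str, ...], str]   # (tau, B, end vertex)


def tt(a: Point, b: Point) -> int:
    """Manhattan travel time at unit speed."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def join_routes(fwd: List[SubRoute], bwd: List[SubRoute], loc: Dict[str, Point],
                deadline: Dict[str, float], horizon: float
                ) -> Tuple[int, Optional[Tuple[str, ...]]]:
    """Return (best reward, best joined route); the route is None if no pair joins."""
    fwd = sorted(fwd, key=lambda r: len(r[1]), reverse=True)
    bwd = sorted(bwd, key=lambda r: len(r[1]), reverse=True)
    best_reward, best_route = 0, None
    for tau_f, b_f, v in fwd:
        for tau_b, b_b, u in bwd:
            if len(b_f) + len(b_b) <= best_reward:
                break   # Pruning Rule 3: later backward sub-routes are no longer
            gap = tt(loc[v], loc[u])
            if (tau_f + gap <= deadline.get(u, math.inf)   # Equation 6, line 1
                    and tau_f + tau_b + gap <= horizon
                    and not set(b_f) & set(b_b)):          # Equation 6, line 2
                best_reward = len(b_f) + len(b_b)
                best_route = ("s",) + b_f + b_b[::-1] + ("d",)
    return best_reward, best_route


LOC = {"s": (0, 0), "p1": (0, 1), "p2": (1, 2), "p6": (3, 2),
       "p3": (4, 4), "p7": (5, 4), "d": (6, 4)}
DEADLINE = {name: 10 for name in ("p1", "p2", "p3", "p6", "p7")}
FWD = [(5, ("p1", "p2", "p6"), "p6"), (3, ("p1", "p2"), "p2"),
       (1, ("p1",), "p1"), (0, (), "s")]
BWD = [(5, ("p7", "p3", "p6"), "p6"), (2, ("p7", "p3"), "p3"),
       (1, ("p7",), "p7"), (0, (), "d")]
print(join_routes(FWD, BWD, LOC, DEADLINE, 10))
```

Sorting both lists by reward first means the inner loop can stop at the first pair that fails the reward bound, since every later backward sub-route is no longer than the current one.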

5 Experiment

This section studies the effectiveness and efficiency of our proposed methods on both real and synthetic datasets.

5.1 Experimental Setting

We first introduce the datasets used in experiments, and then describe the performance measures for algorithms.

Datasets.

Real datasets. Similar to [10], we obtain real check-in data from Foursquare⁷ and convert them to crowdsourcing tasks in our problem. Specifically, we collect check-in data for New York City (NYC) and Los Angeles County (LA) in a month (September 2012). For each day in that month, we use all check-in items within a 90-minute duration. We take check-in items at the same location as a single task, and set its release time and deadline to the earliest and the latest check-in time, respectively⁸. We measure the travel time τ(v, u) as the Euclidean distance between two locations divided by the average speed. We use a walking speed of 6 km/h for NYC (whose map size, 789 km², is small), and a driving speed of 60 km/h for LA (whose map size, 10,570 km², is large).

⁷ https://foursquare.com/
⁸ For each location with only one check-in item (say, at time t), we choose its deadline randomly in $[t, t_q^+]$, where $t_q^+$ refers to the query’s deadline.

Table 3. Experiment parameters

Parameter | Default | Range
total number of tasks | 100 | 20, 50, 100, 200, 500
$t_q^+ - t_q^-$ [minutes] | 90 | 30, 60, 90, 120, 150
Gaussian x | 0.1 | 0.05, 0.1, 0.25, 0.5

Synthetic datasets. As NYC and LA have similar result trends (see Figure 7), we use the map domain of LA to generate synthetic datasets. For each synthetic task, we randomly choose its release time $t_p^-$ in $[t_q^-, t_q^+]$ and then choose its deadline $t_p^+$ in the range $[t_p^-, t_q^+]$, as we consider queries of the form $q = (s, d, [t_q^-, t_q^+])$ in our experiments. We generate two types of datasets. In each uniform dataset (UNI), task locations are randomly chosen within the map domain. In each Gaussian dataset (GAU), task locations are generated based on four Gaussian bells, with the standard deviation of each Gaussian bell being x times the map domain length. The parameter values for the number of tasks and the Gaussian standard deviation x are shown in Table 3.

Platform and Performance Measures. We implemented our methods (G-NN, G-MCS, G-ED, Re-Route) in C++, and conducted experiments on an Ubuntu 11.10machine with a 3.4 GHz Intel Core i7-3770 processor and 16 GB RAM.

We use queries of the form $q = (s, d, [t_q^-, t_q^+])$, where $t_q^+ - t_q^- = 90$ minutes by default. We randomly choose s, d in the map domain such that τ(s, d) = 45 minutes. The parameter values for $t_q^+ - t_q^-$ are given in Table 3.
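One possible way to draw such a query is rejection sampling: pick s uniformly, place d at the distance that takes exactly 45 minutes at the given speed in a random direction, and retry if d leaves the domain. This is only a sketch under assumptions (square domain, uniform direction, hypothetical function name `generate_query`); the paper does not state how s and d are sampled.

```python
import math
import random


def generate_query(domain_length, speed_kmh, direct_minutes=45.0,
                   period_minutes=90.0, t_q_minus=0.0, rng=None):
    """Return (s, d, (t_q_minus, t_q_plus)) with tau(s, d) = direct_minutes."""
    rng = rng or random.Random()
    radius_km = speed_kmh * direct_minutes / 60.0
    if radius_km > domain_length * math.sqrt(2.0):
        raise ValueError("no pair of points in the domain is that far apart")
    while True:
        s = (rng.uniform(0, domain_length), rng.uniform(0, domain_length))
        angle = rng.uniform(0.0, 2.0 * math.pi)
        d = (s[0] + radius_km * math.cos(angle),
             s[1] + radius_km * math.sin(angle))
        # Keep the pair only if the destination stays inside the map domain.
        if 0.0 <= d[0] <= domain_length and 0.0 <= d[1] <= domain_length:
            return s, d, (t_q_minus, t_q_minus + period_minutes)
```

With the LA driving speed of 60 km/h, s and d end up 45 km apart, so the worker has 45 of the default 90 minutes left for detours to tasks.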

In each experiment, we run a set Q of 50 queries and report (i) the quality ratio for Q, and (ii) the average response time per call of a method. Specifically, we define the quality ratio of a method as:

\[
\text{quality ratio} = \frac{1}{|Q|} \cdot \sum_{q \in Q} \frac{count(R_{method}(q))}{count(R_{opt}(q))}
\]

where q is a query in Q, $R_{method}(q)$ is the route for q found by our method, and $R_{opt}(q)$ is the optimal route for q found by an offline method that knows all tasks in advance (see footnote 9).
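As a small illustration of this measure, the sketch below averages the per-query ratios. The function name and the list-based inputs are hypothetical, and it assumes every optimal route covers at least one task so that the denominator is non-zero (the paper does not discuss the zero case).

```python
def quality_ratio(method_counts, optimal_counts):
    """Average over queries of count(R_method(q)) / count(R_opt(q)).

    method_counts[i] and optimal_counts[i] are the numbers of tasks on the
    method's route and on the optimal route for the i-th query.
    Assumption: every entry of optimal_counts is positive.
    """
    if not method_counts or len(method_counts) != len(optimal_counts):
        raise ValueError("need one count per query for both routes")
    ratios = [m / o for m, o in zip(method_counts, optimal_counts)]
    return sum(ratios) / len(ratios)
```

For two queries where the method covers 4 of 5 and 5 of 5 optimal tasks, the quality ratio is (0.8 + 1.0) / 2 = 0.9.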

We have tested the effects of the policies $P_{stay}$ and $P_{go}$ (cf. Section 3.2) on our methods. For the same method, the quality ratios under $P_{stay}$ and $P_{go}$ differ by only 0.01–0.02. Thus, we use $P_{stay}$ as the default policy in our methods.

5.2 An Experiment on Real Datasets

We plot the performance of methods on real datasets (LA and NYC) on each day from Sep/21/2012 to Sep/30/2012 in Figure 7. Within the query period, LA and NYC contain 60 and 40 tasks on average, respectively. The optimal routes $R_{opt}$ in LA and NYC

9 As mentioned in Definition 4, $R_{opt}(q)$ is obtained under the assumption that the information of all tasks is known in advance at time $t_q^-$. With this assumption, OnlineRR becomes a special case of SnapshotRR where tasks can have release times later than $t_q^-$, and the approach for SnapshotRR can then be used to find $R_{opt}(q)$.

[Figure 7 plots omitted. Panels: (a) quality ratio (NYC), (b) response time per call (NYC), (c) quality ratio (LA), (d) response time per call (LA). x-axis: day ID (0 to 9). y-axis: quality ratio (0 to 1) in (a) and (c); response time per call in seconds (log scale, 1e-06 to 1) in (b) and (d). Methods: Re-Route, G-MCS, G-NN, G-ED.]

Fig. 7. Performance on real datasets

cover 10 and 5 tasks on average, respectively. Figures 7a and 7c show the quality ratio of the methods on NYC and LA, respectively. Re-Route outperforms the other methods and achieves 0.82–0.91 quality. G-MCS is the second best and obtains 0.70–0.84 quality. Although Re-Route incurs a higher response time, it takes less than 1 second per call, as depicted in Figures 7b and 7d. We consider such time acceptable for crowdsourcing workers. For example, for the LA dataset, Re-Route is called 10 times (on average) during the query period (90 minutes). Observe that the time per call (at most 1 second) is negligible compared to the average travel time between two tasks (90/10 = 9 minutes).

5.3 Scalability Experiments on Synthetic Datasets

Effect of task distribution. Figure 8 depicts the performance of methods on GAU datasets with standard deviation x and on a UNI dataset. As illustrated in Table 4a, a more skewed dataset (i.e., with smaller x) leads to an optimal route with higher reward because tasks in the same cluster are close together. Since our methods can also find routes with higher reward on a more skewed dataset, the quality ratio does not vary much (see Figure 8a). Re-Route again outperforms other methods on the quality ratio. On the other hand, a more skewed dataset induces more feasible candidate tasks in Re-Route, and thus it incurs higher response time. Nevertheless, Re-Route takes at most around 1 second per call in Figure 8b, which is acceptable for crowdsourcing workers.

Since the trend on quality is consistent across different task distributions, we only use UNI datasets in the remaining experiments.

[Figure 8 plots omitted. Panels: (a) quality ratio, (b) response time per call. x-axis: dataset (GAU0.05, GAU0.1, GAU0.25, GAU0.5, UNI). y-axis: quality ratio (0.2 to 1) in (a); response time per call in seconds (log scale, 1e-05 to 10) in (b). Methods: G-ED, G-NN, G-MCS, Re-Route.]

Fig. 8. Effect of task distribution

[Figure 9 plots omitted. Panels: (a) quality ratio (UNI), (b) response time per call (UNI). x-axis: total number of tasks (20, 50, 100, 200, 500). y-axis: quality ratio (0 to 1) in (a); response time per call in seconds (log scale, 1e-06 to 0.1) in (b). Methods: Re-Route, G-MCS, G-NN, G-ED.]

Fig. 9. Effect of the total number of tasks

Table 4. Reward on the optimal route

Task distribution   Parameter                            Parameter values         Reward of $R_{opt}$
Gaussian            (a) standard deviation x             0.05, 0.1, 0.25, 0.5     12.57, 9.39, 6.84, 4.72
Uniform             (b) total number of tasks            20, 50, 100, 200, 500    1.7, 3.14, 5.26, 7.94, 13.2
Uniform             (c) query period $t_q^+ - t_q^-$     30, 60, 90, 120, 150     1.62, 3.26, 5.26, 6.92, 8.92

Effect of total number of tasks. When the total number of tasks increases, both the optimal route (cf. Table 4b) and our methods’ routes cover more tasks. Thus, the quality ratio is independent of the total number of tasks, as shown in Figure 9a. The response time of Re-Route increases slightly with the total number of tasks (see Figure 9b), but it is still within 0.1 seconds per call.

Effect of the query period $t_q^+ - t_q^-$. As the query period $t_q^+ - t_q^-$ widens, more tasks become feasible and thus the optimal route contains more tasks, as shown in Table 4c. We plot the performance of the methods with respect to $t_q^+ - t_q^-$ in Figure 10. The quality ratio is independent of $t_q^+ - t_q^-$, as our methods are also able to find routes with more tasks. The response time per call of Re-Route remains acceptable.

[Figure 10 plots omitted. Panels: (a) quality ratio (UNI), (b) response time per call (UNI). x-axis: $t_q^+ - t_q^-$ (30, 60, 90, 120, 150). y-axis: quality ratio (0 to 1) in (a); response time per call in seconds (log scale, 1e-05 to 0.1) in (b). Methods: Re-Route, G-MCS, G-NN, G-ED.]

Fig. 10. Effect of the query period $t_q^+ - t_q^-$

[Figure 11 plots omitted. Panels: (a) response time per call (UNI), (b) response time per call (GAU). y-axis: response time per call in seconds (log scale, 10^-4 to 10^2). x-axis groups: 50, 60, 70, 80, 90 in (a) and 5, 10, 20, 30, 40 in (b), each with a DISABLE (D) and an ENABLE (E) bar. Bar labels as extracted (D / E): (a) 2.4 / 0.0039, 3.6 / 0.0041, 12.0 / 0.0042, 300+ / 0.0045, 300+ / 0.0045; (b) 0.0012 / 0.0008, 2.4 / 0.0013, 2.5 / 0.0022, 300+ / 0.0034, 300+ / 0.0041.]

Fig. 11. Effect of pruning rules on Re-Route (E for ENABLE, D for DISABLE)

Effect of Pruning Rules on Re-Route. We proceed to test the effect of optimization techniques (cf. Section 4) on the response time per call of Re-Route. We consider two variations of Re-Route: (i) DISABLE applies only pruning rule 1 (in Ref. [19]), and (ii) ENABLE applies all three pruning rules in Section 4.

As DISABLE is very slow, we scale down the total number of tasks in this experiment, and terminate it if it takes more than 300 seconds per call. We show the response time per call of DISABLE and ENABLE on both UNI and GAU datasets in Figure 11. Observe that ENABLE runs much faster than DISABLE, implying that our pruning rules are able to shrink the search space significantly.

6 Related Work

Spatial crowdsourcing is an emerging topic in crowdsourcing research. Existing research is divided into the server-centric mode [9, 15, 16, 18, 22] and the worker-centric mode [3, 7, 10]. We focus on the latter, as discussed in the introduction. However, [3, 7] do not consider the influence of the worker’s travel time, which is critical in our OnlineRR problem. The closest work to ours is [10], which selects a route with the maximum number of tasks for a worker. However, [10] does not discuss how to update a route with respect to online task arrivals. Also, it does not consider the worker’s destination and deadline.

Our OnlineRR problem is related to the orienteering problem [13, 23]. The orienteering problem is a variant of the selective traveling salesman problem [11], where (i) not all requests need to be completed, and (ii) the cost is the sum of the total travel time and the penalty of rejected requests. The orienteering problem is well studied [13, 23], but only a few works [5, 8, 12, 19] consider the Orienteering Problem with Time Windows (OPTW), in which each request has a time window. Those works focus on the offline scenario rather than the online scenario. While there exist approximation algorithms for offline OPTW [5, 8, 12], OnlineRR is an online problem and does not permit any online algorithm to achieve a non-zero competitive ratio.

Righini et al. [19] propose an exact bi-directional search algorithm for OPTW, which can be adapted to solve our SnapshotRR problem. Unlike our solution, this algorithm does not exploit spatial properties to prune unpromising sub-routes. In Section 4, we have developed two pruning rules and a search strategy that are specific to SnapshotRR.

Other related route planning problems include the trip planning problem [17] and the optimal sequenced route problem [21]. They require finding the shortest route that passes through specific types of points of interest. In contrast, our problem needs to maximize the number of tasks on a route subject to the tasks’ deadlines and the worker’s deadline.

The OnlineRR problem is also related to the online traveling salesman problem (OL-TSP) [4, 14]. Few works have studied OL-TSP with each request having a deadline [6, 24]. While OL-TSP aims to minimize the travel distance, our OnlineRR problem aims to maximize the number of tasks on a route. Moreover, the above works on OL-TSP do not consider the worker’s destination and deadline. Finally, our problem is similar to an online job-scheduling problem whose tasks have dependent setup costs [1]. However, this problem does not exploit the spatial properties as in OnlineRR.

7 Conclusion

In this paper, we study the oriented online route recommendation (OnlineRR) problem for spatial crowdsourcing task workers. We prove that no online algorithm can achieve a non-zero competitive ratio for OnlineRR. We then propose several heuristics for OnlineRR and optimizations to speed up the computation. According to our experimental findings, Re-Route produces routes with the highest quality (0.82–0.91) in acceptable response time per call (0.1–1 s), whereas G-MCS returns routes with the second highest quality (0.70–0.84) in real time (below 1 ms). Workers who prefer to save smartphone battery power may choose G-MCS, as it has a lower computation cost. In the future, we will extend OnlineRR to consider task diversity and task novelty.

References

1. A. Allahverdi, C. T. Ng, T. C. E. Cheng, and M. Y. Kovalyov. A survey of scheduling problems with setup times or costs. European Journal of Operational Research, pages 985–1032, 2008.

2. A. Borodin and R. El-Yaniv. Online Computation and Competitive Analysis. Cambridge University Press, 1998.

3. F. Alt, A. S. Shirazi, A. Schmidt, U. Kramer, and Z. Nawaz. Location-based crowdsourcing: extending crowdsourcing to the real world. In NordiCHI, pages 13–22, 2010.

4. G. Ausiello, E. Feuerstein, S. Leonardi, L. Stougie, and M. Talamo. Algorithms for the on-line travelling salesman. Algorithmica, pages 560–581, 2001.

5. N. Bansal, A. Blum, S. Chawla, and A. Meyerson. Approximation algorithms for deadline-TSP and vehicle routing with time-windows. In Symposium on Theory of Computing, pages 166–174, 2004.

6. M. Blom, S. O. Krumke, W. E. D. Paepe, and L. Stougie. The online TSP against fair adversaries. INFORMS Journal on Computing, pages 138–148, 2001.

7. M. F. Bulut, Y. S. Yilmaz, and M. Demirbas. Crowdsourcing location-based queries. In PERCOM Workshops, pages 513–518, 2011.

8. C. Chekuri and N. Korula. Approximation algorithms for orienteering with time windows. CoRR, 2007.

9. Z. Chen, R. Fu, Z. Zhao, Z. Liu, L. Xia, L. Chen, P. Cheng, C. C. Cao, Y. Tong, and C. J. Zhang. gMission: A general spatial crowdsourcing platform. PVLDB, pages 1629–1632, 2014.

10. D. Deng, C. Shahabi, and U. Demiryurek. Maximizing the number of worker’s self-selected tasks in spatial crowdsourcing. In SIGSPATIAL, pages 314–323, 2013.

11. H. A. Eiselt, M. Gendreau, and G. Laporte. Location of facilities on a network subject to a single-edge failure. Networks, pages 231–246, 1992.

12. G. N. Frederickson and B. Wittman. Approximation algorithms for the traveling repairman and speeding deliveryman problems. Algorithmica, pages 1198–1221, 2012.

13. D. Gavalas, C. Konstantopoulos, K. Mastakas, and G. E. Pantziou. A survey on algorithmic approaches for solving tourist trip design problems. J. Heuristics, pages 291–328, 2014.

14. P. Jaillet and M. R. Wagner. Online routing problems: Value of advanced information as improved competitive ratios. Transportation Science, pages 200–210, 2006.

15. L. Kazemi and C. Shahabi. GeoCrowd: enabling query answering with spatial crowdsourcing. In SIGSPATIAL, pages 189–198, 2012.

16. L. Kazemi, C. Shahabi, and L. Chen. GeoTruCrowd: trustworthy query answering with spatial crowdsourcing. In SIGSPATIAL, pages 304–313, 2013.

17. F. Li, D. Cheng, M. Hadjieleftheriou, G. Kollios, and S.-H. Teng. On trip planning queries in spatial databases. In SSTD, pages 273–290, 2005.

18. L. Pournajaf, L. Xiong, V. S. Sunderam, and S. Goryczka. Spatial task assignment for crowd sensing with cloaked locations. In MDM, pages 73–82, 2014.

19. G. Righini and M. Salani. Decremental state space relaxation strategies and initialization heuristics for solving the orienteering problem with time windows with dynamic programming. Comput. Oper. Res., 36:1191–1203, 2009.

20. R. Y. Rubinstein and D. P. Kroese. Simulation and the Monte Carlo method. John Wiley & Sons, 2011.

21. M. Sharifzadeh, M. R. Kolahdouzan, and C. Shahabi. The optimal sequenced route query. VLDB J., pages 765–787, 2008.

22. H. To, G. Ghinita, and C. Shahabi. A framework for protecting worker location privacy in spatial crowdsourcing. PVLDB, pages 919–930, 2014.

23. P. Vansteenwegen, W. Souffriau, and D. V. Oudheusden. The orienteering problem: A survey. European Journal of Operational Research, pages 1–10, 2011.

24. X. Wen, Y. Xu, and H. Zhang. Online traveling salesman problem with deadlines and service flexibility. Journal of Combinatorial Optimization, pages 1–18, 2013.
