Enabling Efficient WiFi-Based Vehicular Content Distribution
Da Zhang, Member, IEEE, and Chai Kiat Yeo
Abstract—For better road safety and driving experience, content distribution for vehicle users through roadside Access Points (APs)
becomes an important and promising complement to 3G and other cellular networks. In this paper, we introduce Cooperative Content
Distribution System for Vehicles (CCDSV) which operates upon a network of infrastructure APs to collaboratively distribute contents to
moving vehicles. CCDSV solves several important issues in a practical system, like the robustness to mobility prediction errors, limited
resources of APs and the shared content distribution. Our system organizes the cooperative APs into a novel structure, namely, the
contact map which is based on the vehicular contact patterns observed by APs. To fully utilize the wireless bandwidth provided by APs,
we propose a representative-based prefetching mechanism, in which a set of representative APs are carefully selected and then share
their prefetched data with others. The selection process explicitly takes into account each AP's storage capacity, storage status, inter-AP bandwidth, and the traffic loads on the backhaul links. We apply network coding in CCDSV to augment the distribution of shared
contents. The selection of shared contents to be prefetched on an AP is based on the storage status of neighboring APs in the contact
map in order to increase the information utility of each prefetched data piece. Through extensive simulations, CCDSV proves its
effectiveness in vehicular content distribution under various scenarios.
Index Terms—Content distribution, vehicular networks, roadside access points, prefetching
1 INTRODUCTION
CONTENT distribution to vehicular users through wireless network access is emerging as a necessity to facilitate
better road safety and enhance the driving experience. The types of contents can include electronic newspapers, advertisements, road-situation reports, maps with traffic statistics, music or movie clips, etc. The cellular network (e.g., 3G), mainly adopted due to its ubiquitous availability, is experiencing explosive growth in subscribers and in demand for multimedia contents, thus risking being pushed to its capacity limit [1]. Meanwhile, WiFi-based Access Points (APs), as a promising complement and augmentation to cellular networks, have shown their feasibility in content distribution for vehicles [2], [3]. These APs can be RoadSide Units (RSUs) deployed intentionally by network service providers and government departments, or hotspots that are installed in roadside shops or buildings and configured for public access. These APs are characterized by short-range coverage (hundreds of meters), relatively cheap and easy deployment, and high data access rates (a theoretical data rate of 600 Mbps in the latest IEEE 802.11n).
A typical architecture of a WiFi-based vehicular content distribution system, as illustrated in Fig. 1, is made up of a network of interconnected APs, which are geographically deployed near the roads, run the customized protocols for cooperation, and are also equipped with local storage. The APs can communicate with each other through backhaul links to the Internet or via high-speed LANs. Data-origin servers are the content providers, providing vehicular users with both shared (popular) and private contents.
However, such a network access scheme poses many challenges for the design of an effective content distribution system for vehicles: 1) a single vehicle-AP contact duration is quite limited (typically tens of seconds) due to the fast vehicle speed and the AP's short coverage range, thus constraining the data transfer opportunities; 2) the response latency of a remote data-origin server on the Internet can waste the valuable contact duration, especially for a heavily loaded server and a congested or long-delay route; 3) the wireless bandwidth between the AP and vehicles can be bottlenecked by the AP's backhaul path to the server on the Internet, considering that most WiFi-based access protocols operate in the order of 10 Mbps while a median wired bandwidth of 5 Mbps to residences is reported in [4].
To address the above challenges in vehicular content distribution, several recent works [5], [6], [7], [8], [9] adopted the prefetching technique, which has a long history and is widely applied in computer architecture, CPU design, web access, etc. In these works, the requested data is prefetched onto the APs ahead along the driving trajectory of the requesting vehicle, which can then download the prefetched data at high throughput when the connection is established, without resorting to the remote server or being bottlenecked by the AP's backhaul link. Such an apparently simple technique, however, requires careful design when applied to large-scale, real systems, as described in the following section.
1.1 Related Works and Motivation
In this section, we capture some important issues arising from the design of a prefetching system for vehicular content distribution and investigate for each issue the related works and their limitations, which motivate our proposal.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 3, MARCH 2013 479

The authors are with the Centre for Multimedia and Network Technology (CeMNet), School of Computer Engineering, Nanyang Technological University, N4-B2c-06, Nanyang Ave, Singapore 639798. E-mail: [email protected], [email protected].

Manuscript received 2 Feb. 2012; revised 17 Apr. 2012; accepted 21 Apr. 2012; published online 4 May 2012. Recommended for acceptance by J. Cao. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPDS-2012-02-0078. Digital Object Identifier no. 10.1109/TPDS.2012.142.
1.1.1 Mobility Prediction and Prefetching
The content distribution systems based on prefetching need
the mobility prediction component to predict the vehicular trajectory and the APs that will be connected by the vehicle in the near future [5], [6], [7], [8]. Then the system prefetches the requested contents on these APs. Thus, the
prediction accuracy directly affects the system performance,
because a false prediction would make the vehicle miss the prefetched data.
However, the accuracy of mobility prediction1 is difficult
to maintain at a high level and varies largely from scenario
to scenario [10], [11], especially in the highly dynamic vehicular environment. Moreover, intelligent AP selection [12], [13], which is based on ever-changing AP metrics such as signal strength or available wireless bandwidth, further degrades prediction accuracy when a vehicle is exposed to multiple surrounding APs. To the best of our knowledge, almost all prior works design the prefetching and mobility prediction components separately,
without explicitly considering whether the prefetchingcomponent can still work properly and keep the system
performance at a reasonable level under low-accuracy
mobility prediction.

All the works in [5], [6], [7], and [8] predict and prefetch
data on either only one AP that is most likely connected
next by the vehicle or a series of APs that are most likely
connected successively in the future. Hence, the system performance totally relies on the accuracy of mobility
prediction. In [14], to reduce the association time during
handoff between two APs, the system prefetches context information onto all the possible APs that would be next
connected, thus increasing the robustness of system performance against prediction error. However, a small piece of context information consumes far fewer resources than the large bundles of data in our content distribution system, where such a strategy would inefficiently utilize the AP's storage as well as backhaul bandwidth due to excessive redundant prefetching.
1.1.2 Storage Resource on APs
Each AP needs local storage to hold the prefetched data. Although large-volume, low-cost storage hardware is commonly available today, the storage resource on an AP can still be limited due to the following reasons:
1. large multimedia content sizes;
2. prolonged dwelling time of prefetched data on APs due to the delay-tolerant pattern of access from vehicles;
3. increased data volume to be prefetched due to enhancements in the wireless transmission rate; and
4. huge user demands over large and busy areas.
However, most of the related works [5], [7], [8], [15] assume unlimited storage resources. The work in [6] is one of the few that take the storage limitation into consideration. It uses a centralized content manager that periodically executes the content distribution algorithm to decide which APs cache which parts of the requested files while satisfying the storage constraint. However, the AP itself does not have a local storage management component and delegates all management to the central controller. The drawback lies in scalability, as the burden is imposed on the central controller, which has to execute frequently to adapt to the changing user demands.
1.1.3 Shared Content Distribution
The vehicular download process usually continues across multiple APs due to the high vehicular speed and small AP coverage, and thus it is common to divide the original file, especially large multimedia content, into multiple pieces and store these pieces along the specific APs ahead [5], [16]. With the objective of effectively using system resources such as the storage and backhaul bandwidth of APs, we expect that the data pieces stored on APs for one vehicle can incidentally be useful and contribute to the download progress of as many passing vehicles as possible, thus reducing the cost of additional data fetches. Such an expectation becomes even stronger given the fact that most vehicular users share the same interest in specific types of contents, like road-situation reports, weather reports, advertisements, electronic newspapers, popular websites, etc.
However, simple division of the original content is less effective here since it usually leaves many duplicate pieces among APs as the shared contents are being downloaded. For example, suppose vehicles V1 and V2 share the same interest in one file and download its first half (which is divided into two pieces) from APs A and B, respectively. If V1 contacts B next on its route, the first half cached for V2 on B would be useless for V1 because it already holds that part, and B has to launch a new data fetch from the remote server. In contrast, encoded contents can do much better than simple division. Among the encoding methods, network coding [17] has proven its benefits for information sharing in vehicular content distribution [8], [9]. A brief introduction to practical network coding is given in Section 5. Through network coding, each coded piece encodes the information of all pieces in a file and remains useful as long as it is independent of the pieces already
1. The purpose of mobility prediction is to predict vehicle-AP contacts. Thus, we use vehicle-AP contact prediction and mobility prediction interchangeably hereinafter.
Fig. 1. Architecture of WiFi-based vehicular content distribution.
collected by the vehicle. However, adopting network coding in a naive way can still introduce duplicate data pieces among the APs along a vehicle's route. An illustrative example of this claim is shown in Section 5.1 and Fig. 5.
2 SYSTEM OVERVIEW AND CONTRIBUTIONS
In order to address the important issues raised in Section 1.1, we propose CCDSV, a prefetching system built upon cooperative APs for speeding up content distribution to vehicular users and efficiently utilizing the storage and backhaul bandwidth of APs. In other words, CCDSV attempts to maximize the vehicular download performance obtained from the infrastructure of APs while minimizing both the data and control traffic introduced into the set of APs.
We use Fig. 2 to show a high-level view of how CCDSV works from AP A's perspective. Here, AP A receives a request for a file, say M, from an associated vehicle and then transmits data of M from either A's local storage or the remote server hosting M. At the same time, A forecasts whether the download of M can be completed before the vehicle drives out of the coverage range. If not, A is responsible for selecting and notifying those APs which will probably be contacted ahead by the requesting vehicle to prefetch (parts of) the uncompleted portion. Fig. 2 shows all the probable to-be-contacted APs within 3 hops ahead (abbreviated as lookahead-APs) along the vehicle's path, in a tree structure with transition probabilities marked on the edges. In general, A needs to consider prefetching on the APs within k hops (k ≥ 1) ahead, usually more than just one hop, in order to give these APs enough time to complete the prefetch before the requesting vehicle's arrival.
For the selection of prefetching APs, we consider two cases in Fig. 2: 1) A issues a prefetching notification to B, F, …, and J, which is the predicted most probable vehicle-AP contact sequence; 2) A issues prefetching notifications to all the lookahead-APs to obtain the maximum cache hit probability. These two cases correspond to the two extremes of the spectrum, and each faces disadvantages. The first case costs the least resources but has the worst robustness against mobility prediction errors, which make the prefetched data useless and thus waste the invested resources (storage, backhaul bandwidth, etc.). The second case is suitable if storage and backhaul bandwidth are sufficiently abundant; otherwise, such excessive redundant prefetching risks overwhelming these resources: the ejected prefetched data of other sessions may degrade the overall performance. Moreover, the overwhelming traffic introduced into the APs can bring down the throughput of the users currently being served due to the shared backhaul link.
We propose a representative-based prefetching approach that strikes a good balance between the two extreme cases above. This approach selects a subset of lookahead-APs as representatives for prefetching while the others remain as "client APs." Only representatives prefetch, and each representative is responsible for a (possibly empty) group of clients. In the case that the requesting vehicle ends up contacting a client AP, the requested data is fetched from the corresponding representative rather than from the data-origin server, based on the fact that communicating with a geographically near AP generally enjoys a higher throughput than with a remote server [18].
During the system design, we seek to answer the following questions:
. How to construct the set of lookahead-APs, along with the transition probabilities, from each AP's perspective? (Section 3)
. How to manage the finite storage, i.e., which set of objects to replace to obtain sufficient space? (Sections 4.1.2 and 4.2)
. How to select the representative APs to optimize the overall system performance? (Section 4.1)
. How much and which part of the requested file is to be prefetched onto the selected representative APs? (Sections 4.1.3 and 5)
Our contributions in designing CCDSV are as follows:
. We define and construct the contact map, an overlay structure on top of the network of APs that encodes the vehicle-AP contact patterns (Section 3), in order to accurately predict the future contact APs and feed comprehensive information to the prefetching component.
. We propose a novel representative-based prefetching strategy in Section 4.1 in order to increase the stability of the system performance in the presence of varied mobility prediction accuracy and, at the same time, control the overhead incurred.
. We devise a series of algorithms in Sections 4.1.2 and 4.2 to manage the storage where prefetched contents and cached contents coexist. The costs of ejecting objects of both content types are formally defined, helping the algorithms minimize the risk of overall performance degradation due to content ejection.
. While recognizing the benefits of network coding in shared content distribution, we propose, in Section 5, selecting the prefetch contents based on the metric RankSum of the neighboring APs in the contact map, in order to increase the information utility under practical network coding.
Fig. 2. High-level overview of CCDSV operations from AP A's perspective (the dashed curve is the predicted most probable vehicle-AP contact sequence B → F → J). The transition probability is marked beside each edge.
. We implement CCDSV in the NS-2 simulator and evaluate its performance in various aspects through extensive simulations in Section 6. The results prove the effectiveness of CCDSV for vehicular content distribution.
3 CONTACT MAP
In this section, we introduce the concept of contact map,
which is an overlay structure on top of the network of APs
and encodes the observed patterns of vehicle-AP contacts.
Contact map is used in predicting a vehicle’s potential
contact AP(s) ahead on the route and the respective
transition probabilities to them. The predicted APs and
probabilities then form the building blocks in the algorithm
for representative-AP selection. Furthermore, the encoded
contact patterns in the contact map are also necessary in the
process of selecting prefetch contents (Section 5.1).

During a trip, a vehicle sequentially associates with a series of roadside APs, forming a contact sequence σ = (ap_{i1}, ap_{i2}, …). Such contact sequences are observed from the AP side and are used to extract the contact map. In essence, the contact map can be modeled as a directed graph G = (V, E). The vertices, V = {ap_i | i = 1, 2, …, N}, are the APs in the infrastructure, and a directed edge e_ij ∈ E between two neighboring vertices corresponds to a transition of a vehicle's contact from ap_i to ap_j. Each edge e_ij is also associated with a set C_ij of contexts, called transition contexts, which records the context information captured with the transition ap_i → ap_j.

The contact map is flexible and can include various types of knowledge as transition contexts, such as the previous contact AP before the transition ap_i → ap_j, the driving trajectory, or the driver's profile (e.g., vehicle ID, daily driving habits, etc.). The last two contexts require additional devices such as a GPS device or navigation system and also pose privacy issues; we leave the study of them for future work. Thus, for the sake of availability and simplicity, we adopt the previous contact AP before the transition as the context. Hence, the set of transition contexts C_ij associated with edge e_ij is a set of APs such that C_ij = {ap_k | ap_k ∈ V and ∃ σ = (…, ap_k, ap_i, ap_j, …)}. Fig. 3 gives an example scenario and its extracted contact map.

The transition context can help to better predict contact transitions by differentiating vehicles rather than treating them uniformly. Generally, the more knowledge about the driving information reflected in the transition contexts, the more accurate the transition probability prediction. In fact, the context used here, i.e., the previous contact AP, provides a hint on the driving direction.
3.1 Contact Map Construction
The contact map with transition contexts can be expressed as the set {(e_ij, C_ij) | ap_i, ap_j ∈ V, e_ij ∈ E}. We maintain the contact map in a distributed way, with each AP, say ap_i, storing its outgoing neighbors, NB⁺(i) = {ap_k | e_ik ∈ E}, along with the corresponding transition contexts C_ik, where ap_k ∈ NB⁺(i). The union of C_ik over all ap_k ∈ NB⁺(i) is the set of i's incoming neighbors, NB⁻(i) = {ap_j | e_ji ∈ E}. We refer to this information, stored at each AP, as the local view. The local views of all the APs jointly constitute the whole contact map.
In practice, we use a table with three columns, [next_ap, previous_ap, probability], to represent the local view. Assume a vehicle, V, sequentially contacts three APs, ap_s, ap_i, and ap_t. Through V's feedback, ap_i observes the contact sequence (…, ap_s, ap_i, ap_t, …) and then updates the row of its local view table corresponding to the 2-tuple (ap_t, ap_s): it updates the probability if there already exists a row with (next_ap, previous_ap) = (ap_t, ap_s); otherwise, it adds a new row. The probability of this row is calculated as

Prob(ap_i → ap_t | ap_s) = N(ap_i → ap_t | ap_s) / N(ap_s → ap_i),  (3.1)

where N(ap_i → ap_t | ap_s) is the number of times the tuple (ap_t, ap_s) has been observed (initially set to 1 when the tuple is first added) and N(ap_s → ap_i) is the sum of the observation counts of all rows whose previous_ap is ap_s.
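The local-view bookkeeping described above can be sketched as follows. This is an illustrative sketch, not the paper's implementation; the class name `LocalView` and its dictionary layout are our assumptions.

```python
from collections import defaultdict

class LocalView:
    """Sketch of one AP's local view table: rows keyed by
    (next_ap, previous_ap) holding observation counts, with
    probabilities computed per Eq. (3.1)."""

    def __init__(self):
        # (next_ap, prev_ap) -> number of times this transition was observed
        self.counts = defaultdict(int)

    def observe(self, prev_ap, next_ap):
        """Record feedback: a vehicle that arrived from prev_ap
        contacted next_ap after this AP."""
        self.counts[(next_ap, prev_ap)] += 1

    def prob(self, next_ap, prev_ap):
        """Prob(ap_i -> next_ap | prev_ap): observed count divided by
        the total observations whose previous AP is prev_ap."""
        total = sum(n for (t, s), n in self.counts.items() if s == prev_ap)
        if total == 0:
            return 0.0
        return self.counts[(next_ap, prev_ap)] / total

view = LocalView()
for prev_ap, next_ap in [("S", "T"), ("S", "T"), ("S", "W")]:
    view.observe(prev_ap, next_ap)
print(view.prob("T", "S"))  # 2 of the 3 observed transitions from S went to T
```

A real AP would persist this table and prune stale rows; the sketch keeps only the counting logic that Eq. (3.1) requires.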
In reality, the infrastructure of APs can change dynamically [13], because new APs can be deployed and existing APs may disappear due to system failures or signal obstruction. To quickly adapt to such dynamics, we improve the basic (3.1) with a time-window technique. Its principle is to segment the whole observation period into a series of smaller time windows and to place the highest emphasis on the most recent observations while gradually decreasing the emphasis on the preceding ones. The transition probability during time window j is expressed as

Prob(ap_i → ap_t | ap_s, j) = N_i(ap_i → ap_t | ap_s, j) / N_i(ap_s → ap_i, j),  (3.2)
where the notation N_i(·) is the same as in (3.1), except that it is observed within time window j. Note that the index of the time window starts from the most recent window. We write the above probability as P_i^j(s, t) for brevity. Now, (3.1) can be improved and expressed as a weighted average of P_i^j(s, t) over a certain number m of recent time windows:

Prob(ap_i → ap_t | ap_s) = w_k P_i^k(s, t) + w_{k−1} P_i^{k−1}(s, t) + … + w_{k−m+1} P_i^{k−m+1}(s, t)
                        = Σ_{j=k−m+1}^{k} w_j P_i^j(s, t),  (3.3)
where k is the index of the most recent time window and w_j is the weight of the jth window. Different weight-selection methods (e.g., linearly or exponentially decreasing weights) discard the history data at different rates. Here, we simply give the ith time window the weight w_i = i / Σ_{j=1}^{m} j, where i ∈ [1, m].

Fig. 3. Contact map. (a) An example road map and AP deployment; (b) the extracted contact map with transition contexts. As an example, AP B's outgoing edges are labeled with the respective transition contexts.
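The time-window weighting of Eq. (3.3) can be sketched as below; the function names and the convention that per-window probabilities are supplied oldest-first are our assumptions.

```python
def window_weights(m):
    """Linearly increasing weights w_i = i / (1 + 2 + ... + m) for
    i in [1, m]; the largest weight goes to the most recent window."""
    total = m * (m + 1) // 2
    return [i / total for i in range(1, m + 1)]

def windowed_prob(per_window_probs):
    """Weighted average of the per-window probabilities P_i^j(s, t),
    listed oldest-first, per Eq. (3.3)."""
    weights = window_weights(len(per_window_probs))
    return sum(w * p for w, p in zip(weights, per_window_probs))

# An old probability of 0.9 fades when the three most recent windows
# observe 0.2: the estimate is 0.1*0.9 + 0.2*0.2 + 0.3*0.2 + 0.4*0.2 = 0.27.
print(windowed_prob([0.9, 0.2, 0.2, 0.2]))
```

Exponentially decreasing weights would forget history faster; the linear scheme above is the one the text adopts.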
3.2 Predicting Next-Hop Transitions and Probabilities
The prediction function at an AP, say ap_i, returns a set of potential APs, R(i), to be contacted next, along with the respective transition probability P_i(j) for each ap_j ∈ R(i). For vehicles which can provide the identity of the previous contact AP, say ap_s, the predicting AP ap_i can directly make the prediction by querying its local view table with the keyword ap_s to obtain the result list.

Otherwise, the prediction function of ap_i returns all the APs in NB⁺(i) (here R(i) = NB⁺(i)), with the transition probability for each one given by

P_i(x) = Σ_{ap_y ∈ NB⁻(i)} P_i(x | y) · Prob(ap_y → ap_i),
where Prob(ap_y → ap_i) = N(ap_y → ap_i) / Σ_{ap_j ∈ NB⁻(i)} N(ap_j → ap_i),  ap_x ∈ NB⁺(i).  (3.4)
Equation (3.4) is actually a total probability taking intoaccount all the observed previous contact APs.
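When the previous contact AP is unknown, Eq. (3.4) marginalizes over all observed previous APs. A minimal sketch, with hypothetical function and argument names:

```python
def predict_without_context(cond_prob, arrival_counts):
    """Total-probability prediction per Eq. (3.4).

    cond_prob[(x, y)] -> P_i(x | y): probability of transiting to x
                         given the vehicle arrived from y.
    arrival_counts[y] -> N(ap_y -> ap_i): observed arrivals from y.
    Returns P_i(x) for every candidate next AP x.
    """
    total_arrivals = sum(arrival_counts.values())
    result = {}
    for (x, y), p in cond_prob.items():
        prior = arrival_counts[y] / total_arrivals  # Prob(ap_y -> ap_i)
        result[x] = result.get(x, 0.0) + p * prior
    return result

cond = {("T", "S"): 0.7, ("W", "S"): 0.3, ("T", "R"): 0.2, ("W", "R"): 0.8}
arrivals = {"S": 60, "R": 40}
print(predict_without_context(cond, arrivals))  # T and W both end up at 0.5
```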
3.3 Constructing Lookahead-APs
In order to give the APs ahead sufficient time to complete the prefetching, it is sometimes not enough to consider prefetching only on the next-contact APs; APs further ahead must also be included. Take Fig. 2 as an example, where we assume (A, B, E) is a contact sequence of the requesting vehicle that is currently associated with A. If the expected transition time between B and E, i.e., the expected time between when a vehicle hands off to B and when it then hands off to E, is shorter than the time needed to complete prefetching at E, then such prefetching at E, instead of being delayed until B is contacted, needs to start now through A's notification. Based on this rule, A recursively expands the set of lookahead-APs, as illustrated in Algorithm 1.
Algorithm 1: Construction of lookahead-APs at AP u
Output: L_u and E_u.
  L_u ← ∅; E_u ← ∅;
  foreach i in R(u) do
    L_u ← L_u ∪ {i};
    E_u ← E_u ∪ {(u, i, P_u(i))};
    SetUpLookaheadAPs(i, 0);
  end foreach

  // Definition of function SetUpLookaheadAPs
  SetUpLookaheadAPs(current_ap, time) {
    foreach j in R(current_ap) do
      if PV(j)/BW(j) ≥ ΔT(current_ap, j) + time then
        L_u ← L_u ∪ {j};
        E_u ← E_u ∪ {(current_ap, j, P_current_ap(j))};
        // recursive call
        SetUpLookaheadAPs(j, ΔT(current_ap, j) + time);
      end if
    end foreach
  }
The algorithm is from the perspective of AP u. The outputs are two sets, both from u's perspective: L_u denotes the set of lookahead-APs, and E_u is the set of 3-tuples (i, j, prob) denoting the transition from AP i to j with probability prob. Together, L_u and E_u constitute a subgraph of the contact map, rooted at u, with edge weights. PV(j) is the data volume to be prefetched at AP j. BW(j) is the expected available down-link bandwidth of AP j when downloading from the data-origin server. Thus, PV(j)/BW(j) is the expected time for AP j to complete prefetching. ΔT(i, j) is the expected time of transition (i.e., hand-off) from AP i to j.
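Under the stated rule (include an AP whenever its prefetch cannot finish before the vehicle's expected arrival), Algorithm 1 can be sketched in Python as follows. The callable interfaces R, P, PV, BW, and dT are our assumptions about how an implementation would expose the contact map and link estimates; the recursion terminates as long as hand-off times are positive and PV/BW is finite.

```python
def build_lookahead(u, R, P, PV, BW, dT):
    """Sketch of Algorithm 1: collect the lookahead-APs L_u and the
    weighted transition edges E_u from AP u's perspective.

    R(i)     -> iterable of candidate next-contact APs of i
    P(i, j)  -> transition probability from i to j
    PV(j)    -> data volume to prefetch at j
    BW(j)    -> expected down-link bandwidth of j to the origin server
    dT(i, j) -> expected hand-off time from i to j
    """
    L, E = set(), set()

    def expand(current, elapsed):
        for j in R(current):
            # Start prefetching at j now only if the prefetch would not
            # finish before the vehicle's expected arrival at j.
            if PV(j) / BW(j) >= dT(current, j) + elapsed:
                L.add(j)
                E.add((current, j, P(current, j)))
                expand(j, dT(current, j) + elapsed)

    for i in R(u):
        # Next-contact APs are always included.
        L.add(i)
        E.add((u, i, P(u, i)))
        expand(i, 0.0)
    return L, E

# Chain u -> a -> b -> c with 10 s prefetches and 6 s hand-offs:
# b is included (10 >= 6) but c is not (10 < 12).
R_map = {"u": ["a"], "a": ["b"], "b": ["c"], "c": []}
L, E = build_lookahead("u", R_map.__getitem__, lambda i, j: 1.0,
                       lambda j: 10.0, lambda j: 1.0, lambda i, j: 6.0)
print(sorted(L))  # ['a', 'b']
```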
4 REPRESENTATIVE-BASED PREFETCHING APPROACH
Representative-based AP selection aims to optimize the
selection of a set of prefetching APs such that the system can
provide requesting vehicles with the maximum gain in download performance. Essentially, our representative-based approach organizes the lookahead-APs into an overlay consisting of clusters with representative APs as cluster heads and client APs as cluster members, as shown in Fig. 4. The cluster structure allows APs that have not prefetched the requested data to immediately locate the "nearest" data source (i.e., the corresponding cluster
heads) with constant query delay. This property is crucial
for high-speed vehicle users whose limited contact duration
with APs can thus be saved without long delay for querying
and locating the requested data.
4.1 Algorithms for Representative AP Selection
The frequently used notations and symbols are summarized
in Table 1.
4.1.1 Vehicular Download Volume and Performance Gain
To quantify the performance gain and loss, we define the metric vehicular download volume (VDV), i.e., the data volume a vehicle can download from an AP. We then define D(i, j) as the VDV at AP i when it fetches data from entity j (either the remote data-origin server or a representative AP) to satisfy the download request, and express it as

D(i, j) = min(B^b_ji, B^w_i) · T^con_i − B^w_i · RTT^b_ij · I_i(j),
where I_i(j) = 1 if j ≠ i, and I_i(j) = 0 if j = i,  (4.1)

where B^w_i is the available wireless bandwidth of i and T^con_i is the expected connection duration of a vehicle with i. B^b_ji and RTT^b_ij are the bandwidth and round-trip time, respectively, achieved between j and i through i's backhaul link. When j = i, AP i has the requested data and can directly reply to the request with locally stored data; thus RTT^b_ii = 0 and B^b_ii = B^w_i. The term after the minus sign accounts for the fact that no data is downloaded while sending the request to, and waiting for the reply from, entity j.

Fig. 4. Organizing the lookahead-APs into clusters with representative APs as cluster heads and client APs as cluster members.
Let a requesting vehicle's current contact AP be AP_c and the set of AP_c's lookahead-APs be L_c = {AP_k1, AP_k2, …, AP_kn}. A selection vector, U = [u_1, u_2, …, u_n], encodes AP_c's decision of which APs in L_c are selected as representatives, which as clients, and which representative each client is attached to. Let u_i = i if AP_i is a representative, and u_i = j (j ≠ i) if AP_i is a client that selects AP_j as its representative. We define the expected VDV that the group of APs in L_c can provide to the requesting vehicle as it passes through L_c, as a function of the selection vector U:

DG(U) = Σ_{AP_i ∈ L_c, u_i = i} P_c(i) · D(i, i) + Σ_{AP_i ∈ L_c, u_i = i} Σ_{AP_j ∈ L_c, u_j = i} P_c(j) · D(j, i),  (4.2)
where P_c(i) is the predicted probability of transiting from AP_c to AP_i, which may require multiplying several one-hop transition probabilities. For example, in Fig. 2, P_A(F) = P_A(B) · P_B(F) = 0.42.

DG(U) measures the gross gain of prefetching based on the selection vector U and does not consider the performance loss due to possible cache replacement. The first term of the sum represents the expected VDV gain when the requesting vehicle contacts a representative AP and directly downloads the data from it. The second term represents the expected VDV gain when the vehicle ends up contacting a client AP and requesting the data from the corresponding representative.
4.1.2 Storage Management for Prefetched Contents
Owing to the finite storage capacity, prefetching for onevehicle can cause the replacement of some contents alreadyprefetched for other vehicles and thus risk degrading theoverall performance in terms of VDV. Thus in the following,
we describe the storage management algorithm for the
prefetched contents and quantify the loss in content
replacement for each AP.

When facing insufficient storage
management algorithm here tries to eject a set of prefetched
objects with minimum loss in terms of VDV. The
representative-based approach somewhat complicates the
calculation of loss in VDV. The VDV loss of ejecting a
prefetched object from an AP would occur if either that AP
or any of its clients ends up being contacted by the vehicle
that requests the ejected object and it equals the reduction of
VDV when the finally contacted AP refetches that object
from the data-origin server. Let the set O_k = {O_1, O_2, …, O_m} denote the prefetched objects in AP_k and a corresponding replacement vector V = [v_1, v_2, …, v_m] denote which objects are to be replaced (v_i = 1 if O_i is replaced and v_i = 0 if O_i is retained). Let the size of the object O_e (∈ O_k) to be ejected be S(O_e), and let h_k(O_e) be the index of the AP which notified AP_k to prefetch O_e. Then the VDV loss resulting from ejecting O_e from AP_k can be expressed as a function of the size of that ejected object:

DL_k(S(O_e)) = Σ_{AP_j ∈ {M^Oe_k ∪ AP_k}} P_{h_k(O_e)}(j) · (D(j, k) − D(j, origin)),  (4.3)

where M^Oe_k is the set of client APs taking AP_k as representative with respect to object O_e. This equation reflects that the loss of ejecting an object from AP_k may also be imposed on the other APs that take AP_k as representative for that ejected object. The actual loss depends on which AP among the set {M^Oe_k ∪ AP_k} is finally contacted by the requesting vehicle, and the probability of contacting each AP is expressed as P_{h_k(O_e)}(j).

Our replacement algorithm finds a V that leaves minimum VDV loss and enough space for the contents to be prefetched. This can be modeled as an optimization problem:

minimize: DL_k(PV(k)) = Σ_{O_i ∈ O_k} v_i · DL_k(S(O_i))
subject to: Σ_{i=1}^{m} v_i S(O_i) ≥ PV(k),  v_i ∈ {0, 1},

where PV(k) is the prefetch volume, i.e., the size of the contents to be prefetched at AP_k. This optimization problem is equivalent to the classical knapsack problem, which is NP-hard. Hence, a heuristic is developed to approximate the optimal V. We define the normalized DL (NDL), which is the loss of creating one unit of free space by ejecting O_i from AP_k, as

NDL_i = DL_k(S(O_i)) / S(O_i).  (4.4)

The prefetched objects in storage are sorted in increasing order of NDL. Replacement starts from the object with the smallest NDL and continues until PV(k) units of space are freed up. The VDV loss at AP_k, DL_k(PV(k)), is then the sum of the DL of each ejected object.
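The greedy heuristic above admits a short sketch; the container layout (the `sizes` and `loss` dictionaries) is our assumption. Python's `sorted` is stable, so ties in NDL keep their original order.

```python
def select_victims(sizes, loss, needed):
    """Eject prefetched objects in increasing order of normalized loss
    NDL_i = DL_k(S(O_i)) / S(O_i) until `needed` units are freed.

    sizes: object id -> S(O_i); loss: object id -> DL_k(S(O_i)).
    Returns the victim list and the total VDV loss DL_k(PV(k)).
    """
    order = sorted(sizes, key=lambda o: loss[o] / sizes[o])  # ascending NDL
    victims, freed, total_loss = [], 0, 0.0
    for o in order:
        if freed >= needed:
            break
        victims.append(o)
        freed += sizes[o]
        total_loss += loss[o]
    return victims, total_loss

sizes = {"a": 4, "b": 2, "c": 6}
loss = {"a": 8.0, "b": 1.0, "c": 3.0}   # NDL: a = 2.0, b = 0.5, c = 0.5
print(select_victims(sizes, loss, needed=5))  # (['b', 'c'], 4.0)
```

As with most greedy knapsack heuristics, the last victim may over-free space; the approximation is what makes the per-request cost tractable at each AP.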
TABLE 1Frequently Used Notations and Symbols
4.1.3 Estimating the Prefetch Volume
Here, we describe how to determine the value for prefetch
volume (PV), i.e., the size of the content to be prefetched,
which is mentioned in the previous section. The prefetch
volume is upper bounded by the vehicle-AP contact capacity
(CC), which is the maximum data volume a moving vehicle can download from an AP during their connection.
Contact capacity varies based on many factors, like the
number of vehicles being served, vehicle speed, channel
quality, etc. [19]. We estimate the contact capacity based on
the cumulative distribution function (CDF) of the historically observed values, and the p-th percentile is chosen as the estimate. In the simulation, we set p to 85 percent, which proves to be a proper choice in two respects: it avoids insufficient prefetching, and it excludes the rare outliers with large values, thereby saving both storage and backhaul bandwidth.
Since the representative AP is also responsible for a set of
client APs, it should choose the prefetch volume to be the
maximum among the estimated contact capacities of those
client APs together with itself. Hence, for a certain selection
vector U, APi needs to prefetch
$$PV(i) = \max_{u_j = i} CC(j). \quad (4.5)$$
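The percentile-based estimate and Eq. (4.5) can be sketched as below. The helper names and the nearest-rank percentile rule are our assumptions; `selection` encodes the vector U as a mapping from each AP to its representative.

```python
import math

def estimate_cc(samples, p=85):
    """Estimate the vehicle-AP contact capacity as the p-th percentile
    (nearest-rank rule) of historically observed per-contact download
    volumes; the paper uses p = 85 to avoid insufficient prefetching
    while excluding rare large outliers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def prefetch_volume(rep, selection, cc):
    """PV(i) = max_{j: u_j = i} CC(j) (Eq. 4.5): a representative must
    prefetch enough for the most demanding member of its cluster,
    itself included. `selection` maps AP id -> representative id."""
    return max(v for j, v in cc.items() if selection[j] == rep)

history = [3.0, 4.5, 5.0, 6.0, 6.5, 7.0, 8.0, 9.0, 12.0, 40.0]
cc_est = estimate_cc(history)          # 85th percentile discards the 40.0 outlier
cc = {1: 5.0, 2: 7.5, 3: 6.0}
selection = {1: 1, 2: 1, 3: 1}         # APs 2 and 3 take AP 1 as representative
pv1 = prefetch_volume(1, selection, cc)
```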
4.1.4 Optimizing the Overall Prefetching Gain
Now we can formulate the objective of maximizing the
overall vehicular download volume into an optimization
problem
$$\text{maximize}\quad DG(U) - \sum_{u_i = i} DL_i(PV(i)) \quad (4.6)$$
$$\text{subject to}\quad u_i \in [1, n],\quad \forall i: u_i = u_{u_i}, \quad (4.7)$$
$$u_i \ne i \ \text{ if } BTL_i > threshold,$$
$$u_i = i \ \text{ if } AP_i \text{ has the requested content},$$
$$PV(i) = \max_{u_j = i} CC(j), \quad (4.8)$$
where DG(U), expressed in (4.2), is not expanded here for brevity. Equation (4.6) expresses the net gain of VDV, excluding the possible loss due to cache replacement. Equation (4.7) limits the cluster radius (in Fig. 4) to one hop. Equation (4.8) indicates that AP_i would not be selected for prefetching if its backhaul link is heavily loaded. The Backhaul Traffic Load (BTL) of AP_i is defined as the ratio of the total throughput of AP_i's current flows to its backhaul link capacity. The total throughput of current flows can be calculated by monitoring the packet arrival rate at the application layer, while the backhaul link capacity can be measured using available tools like [20].
To find the optimal selection of prefetching APs, the algorithm needs to compute the net VDV gain for each possible combination of variables in vector U. If variable u_i were binary (i.e., AP_i is either selected as representative or not), the search space would already be of order O(2^n). In reality u_i ∈ [1, n], so the search space grows much faster than in the binary case as n increases. Therefore, we propose a heuristic algorithm, shown in Algorithm 2, to efficiently solve the prefetching-AP selection problem.
Algorithm 2. Representative-AP Selection by AP_c
Output: selection vector U = [u_i]_{1×n}; prefetch volume vector Y = [y_i]_{1×n}
1.  foreach u_i ∈ U do  /* Initialize U */
2.    if AP_i has the requested content then
3.      u_i ← i  // fixed as representative
4.    else if BTL_i ≥ threshold then
5.      u_i ← −1  // undetermined but forbidden to be representative
6.    else
7.      u_i ← 0  // undetermined
8.  foreach y_i ∈ Y do  /* Initialize Y */
9.    y_i ← PV(i)
10. Define matrix G = [g_ij]_{n×n}
11. for i = 1 to n do  /* Initialize G */
12.   for j = 1 to n do
13.     if i == j then
14.       g_ii ← P_c(i) · [D(i, i) − DL_i(PV(i))]
15.     else
16.       t ← max(PV(i), PV(j))
17.       g_ij ← P_c(j) · [D(j, i) − DL_i(t)]
18. while (∃i ∈ [1, n] : u_i ≤ 0) do  /* loop until all in U are determined */
19.   (r, s) ← argmax_{(i,j)} (g_ii + g_ij)
20.   if u_r == −1 or (u_r ≠ 0 and u_r ≠ r) then
21.     Continue
22.   u_r ← r, u_s ← r
23.   y_r ← max(y_r, PV(s)), y_s ← 0
24.   for j = 1 to n do
25.     if y_r ≥ PV(j) and j ≠ s then
26.       g_rj ← P_c(j) · D(j, r)
27.     else if y_r < PV(j) and j ≠ s then
28.       g_rj ← P_c(j) · [D(j, r) + DL_r(y_r) − DL_r(PV(j))]
29. Return U and Y
A matrix G = [g_ij] is defined, where g_ij records the VDV at AP_j when it is a client taking AP_i as representative. On the main diagonal of G, g_ii records the VDV at AP_i when it is a representative itself. Our heuristic algorithm is essentially greedy: at each iteration from lines 18 to 28, it attempts to select a pair of APs in a representative-client relationship with the maximum sum of VDV gains (g_ii + g_ij in line 19). Note that the two APs in the selected pair may be the same; in that case, the AP is a representative with no client selected. When pair (r, s) is selected, we need to update (lines 24 to 28) the elements in the r-th row of G due to the correlation among them. For example, if AP_s selects AP_r as representative, the cost (VDV loss) of another AP, AP_j, selecting AP_r as representative would be reduced accordingly (lines 26 and 28), because this cost is already (partially) included in the pair (r, s). The algorithm terminates when all u_i in U are determined. Then the selection vector U and the prefetch volume vector Y are returned as results. The heuristic algorithm has a complexity of O(n^2), which is much more efficient than the optimal algorithm. In Section 6, we will
ZHANG AND KIAT YEO: ENABLING EFFICIENT WIFI-BASED VEHICULAR CONTENT DISTRIBUTION 485
show that the heuristic algorithm can achieve near-optimal performance.
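A runnable sketch of the greedy selection in Algorithm 2 is given below. It restates the iteration slightly (explicitly skipping already-assigned clients and forbidden representatives so the loop is well defined), uses 0-based indices and a toy linear loss function DL; all names and the toy instance are illustrative, not the paper's code.

```python
def select_representatives(n, Pc, D, DL, PV, BTL, has_content, threshold):
    """Greedy sketch of Algorithm 2. Pc[i]: contact probability of AP_i;
    D[j][i]: expected download volume at AP_j when AP_i holds the data;
    DL(i, v): VDV loss of freeing v units on AP_i; PV: prefetch volumes;
    BTL: backhaul traffic loads. Returns (U, Y)."""
    UNDET, FORBID = -2, -1          # sentinels (paper: 0 and -1, 1-based)
    U = [UNDET] * n
    for i in range(n):
        if has_content[i]:
            U[i] = i                # fixed as representative
        elif BTL[i] >= threshold:
            U[i] = FORBID           # may not become a representative
    Y = list(PV)
    # G[i][j]: VDV gain at client AP_j with AP_i as representative;
    # G[i][i]: gain at AP_i when it represents itself.
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                G[i][i] = Pc[i] * (D[i][i] - DL(i, PV[i]))
            else:
                G[i][j] = Pc[j] * (D[j][i] - DL(i, max(PV[i], PV[j])))
    while any(u < 0 for u in U):
        best, pair = None, None     # pick the pair with maximal joint gain
        for i in range(n):
            if U[i] == FORBID or (U[i] >= 0 and U[i] != i):
                continue            # i cannot (or can no longer) represent
            for j in range(n):
                if j != i and U[j] >= 0:
                    continue        # j is already assigned
                gain = G[i][i] + (G[i][j] if j != i else 0.0)
                if best is None or gain > best:
                    best, pair = gain, (i, j)
        if pair is None:
            break
        r, s = pair
        U[r], U[s] = r, r
        Y[r] = max(Y[r], PV[s])
        if s != r:
            Y[s] = 0
        for j in range(n):          # update row r (lines 24-28)
            if j == s:
                continue
            extra = DL(r, PV[j]) - DL(r, Y[r]) if Y[r] < PV[j] else 0.0
            G[r][j] = Pc[j] * (D[j][r] - extra)
    return U, Y

# toy instance: AP0 holds the content, AP1's backhaul is overloaded
Pc = [0.5, 0.3, 0.2]
D = [[10, 8, 8], [8, 10, 8], [8, 8, 10]]
DL = lambda i, v: 0.1 * v           # toy replacement-loss function
U, Y = select_representatives(3, Pc, D, DL, PV=[2, 2, 2],
                              BTL=[0.1, 0.9, 0.1],
                              has_content=[True, False, False],
                              threshold=0.8)
# AP0 ends up representing all three APs
```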
4.2 More on Storage Management: Prefetching and Caching
Other than prefetching contents in advance for the requesting vehicle, each AP also caches the contents that have been accessed historically by vehicles, in order to speed up future access. The rationale for caching is that the specific contents historically accessed through an AP reflect their local popularity among the vehicles that passed that AP. In this section, we describe how to manage the storage where prefetched contents and cached contents exist simultaneously. Prefetched contents are the ones stored for potential access and yet to be accessed so far; they become cached contents once they are accessed by vehicles passing the hosting AP. Owing to the different properties of these two types of contents, it is difficult to manage them as a whole. Thus, in our system, we manage the cached and prefetched contents separately, i.e., newly prefetched (cached) contents can only replace prefetched (cached) ones when there is not enough storage space.
Cached objects are further categorized into two types: active and idle. An AP notified to prefetch an object does not need to do so if the intended object has already been cached; it only needs to set that object to "active." The other cached objects are considered "idle." An active cached object reverts to idle when the requesting vehicle drives away from the caching AP and its clients.
Assume AP_k has a set of cached objects C_k = {C_1, C_2, ..., C_n}. For each cached object, we define two rates: the local access rate r_l and the activation rate r_a. The local access rate reflects the relative popularity of an object among the locally cached ones and is expressed as $r_{l_i} = f_{l_i} / (\sum_{j \in [1,n]} f_{l_j})$, where f_{l_i} denotes the number of local accesses to object C_i during a recent time period T. The activation rate is defined as $r_{a_i} = f_{a_i} / (\sum_{j \in [1,n]} f_{a_j})$, where f_{a_i} is the number of times object C_i is activated (from idle to active) by other APs during T. A high activation rate of an object indicates that the caching AP is an ideal representative for that object and also implicitly reflects the object's popularity in this neighborhood.
Ejecting a cached object that will be accessed locally by an associated vehicle causes VDV loss, while ejecting an object that will be activated causes no VDV loss but leads to an additional cost in re-prefetching. We measure this re-prefetching cost as the normalized prefetch latency (NPL), which for an object C_i on AP_k is defined as
$$NPL_i = \frac{S(C_i)/B^b_{sk}}{T^{wait}_{ik}}, \quad (4.9)$$
where B^b_{sk} is the achievable bandwidth for AP_k fetching data from the data-origin server. The normalization constant T^{wait}_{ik} is the expected waiting time from when C_i on AP_k is activated until C_i is eventually accessed (by the requesting vehicle or a client AP). The normalized prefetch latency can be perceived as measuring how quickly the data to be prefetched can be prepared before the access.
For ejecting an idle cached object, we combine the VDV
loss and NPL into a single function to define the total cost
$$Cost\big(C_i^{idle}\big) = r_{l_i} \cdot (1 - D(k, origin)) + p_c \cdot r_{a_i} \cdot NPL_i,$$
where origin is the data-origin server from which AP_k fetches C_i, and p_c (0 < p_c < 1) is a penalty coefficient giving a lighter weight to the cost measured by NPL, because VDV loss is much more directly detrimental to our objective of maximizing vehicle users' satisfaction.
For active cached objects, we should also include (4.4) to reflect the additional loss at the client APs that take AP_k as representative:
$$Cost\big(C_i^{active}\big) = NDL_i + r_{l_i} \cdot (1 - D(k, origin)) + p_c \cdot r_{a_i} \cdot NPL_i.$$
Similar to prefetched-object replacement, we arrange the cached objects in increasing order of their normalized cost, Cost(C_i)/S(C_i), and continue replacing the objects from the one with the smallest normalized cost until enough units of space are freed up.
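The two cost functions and the normalized-cost ordering can be sketched as follows; the tuple layout, the toy rates, and the value of the penalty coefficient p_c are our assumptions for illustration.

```python
def eject_order(cached, d_origin, pc=0.5):
    """Rank AP_k's cached objects for replacement by normalized cost
    Cost(C_i)/S(C_i). Each entry of `cached` is
    (name, size, state, r_l, r_a, npl, ndl): state is 'idle' or
    'active'; r_l and r_a are the local-access and activation rates;
    npl is the normalized prefetch latency of Eq. (4.9); ndl (Eq. 4.4)
    is charged only for active objects. d_origin is D(k, origin) and
    pc (0 < pc < 1) is the penalty coefficient."""
    def norm_cost(obj):
        name, size, state, r_l, r_a, npl, ndl = obj
        cost = r_l * (1 - d_origin) + pc * r_a * npl
        if state == "active":
            cost += ndl             # extra loss at client APs
        return cost / size
    return [obj[0] for obj in sorted(cached, key=norm_cost)]

# toy cache: a popular-but-idle object vs. an active representative copy
cached = [("C1", 2, "idle",   0.6, 0.1, 0.5, 0.0),
          ("C2", 2, "active", 0.2, 0.5, 0.5, 0.8)]
order = eject_order(cached, d_origin=0.4)   # C1 is ejected first
```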
5 APPLYING NETWORK CODING IN CCDSV
As discussed in Section 1.1.3, network coding helps increase
the information utility of the stored contents on APs, i.e., the
higher the information utility of a piece of data, the more vehicles find it useful for their download process. However, adopting network coding in a
naive way can still introduce duplicate data pieces among the APs along a vehicle's route, thus reducing the information utility and the effective resource usage. Before showing an
illustrative example and our solutions, we first briefly
describe the implementation of network coding in CCDSV.
Implementation. Chou et al. [17] propose a practical network coding method which bridges the gap between initial theoretical works and implementation in realistic systems. We follow the concepts and methods proposed in
[17] in our work. CCDSV encodes at the piece level, and the data transmissions between the data-origin server and the APs, as well as among APs, are all in units of pieces. The data-origin server divides the original file into N generations, each of which is further divided into M pieces (M is the generation size). A piece can consist of one or multiple packets. By network coding, the server generates an encoded piece by linearly combining the set of pieces in one generation with random coefficients: $p'_i = \sum_{j=1}^{M} c_j\, p_{ij}$, where p_{ij} is the j-th piece in the i-th generation. For decoding purposes, the encoding vector [c_1, c_2, ..., c_M] needs to be
transmitted together. The benefits of piece-level network
coding lie in that both computation complexity and
communication overheads can be flexibly kept at a reason-
able level for different file sizes, by tuning the piece size and
the generation size [21]. After collecting M independent
encoded pieces of a generation together with the corre-
sponding encoding vectors, the vehicle can recover the
original contents of that generation by solving a set of linear
equations. The entire original file can be recovered after
collecting all the generations.
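A minimal sketch of generation-based random linear coding is shown below. For simplicity it works over the prime field GF(257) rather than the paper's GF(2^8) (which would need table-based multiplication), so decoding is plain Gaussian elimination with modular inverses; all names are ours, not CCDSV's implementation.

```python
import random

P = 257  # prime field GF(257) stands in for GF(2^8) in this sketch

def encode(generation, rng):
    """One encoded piece from a generation of M pieces: a random
    linear combination sum_j c_j * p_j (mod P), shipped together
    with its encoding vector [c_1, ..., c_M]."""
    M = len(generation)
    coeffs = [rng.randrange(P) for _ in range(M)]
    piece = [sum(c * p[s] for c, p in zip(coeffs, generation)) % P
             for s in range(len(generation[0]))]
    return coeffs, piece

def decode(coded, M):
    """Recover the original M pieces from >= M encoded pieces by
    solving the linear system over GF(P); dependent extra pieces are
    skipped automatically by the pivot search."""
    A = [list(c) + list(p) for c, p in coded]   # augmented matrix
    for col in range(M):
        piv = next(r for r in range(col, len(A)) if A[r][col])
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], P - 2, P)        # modular inverse
        A[col] = [x * inv % P for x in A[col]]
        for r in range(len(A)):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [(x - f * y) % P for x, y in zip(A[r], A[col])]
    return [row[M:] for row in A[:M]]

rng = random.Random(1)
gen = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]        # one generation, M = 3 pieces
coded = [encode(gen, rng) for _ in range(6)]   # redundant encoded pieces
recovered = decode(coded, M=3)
```

Because the coefficients are random, any M independent encoded pieces suffice, which is exactly the property the prefetch content selection in Section 5.1 exploits.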
5.1 Network Coding-Based Prefetch Content Selection
In this section, we address the question of which parts of the requested file, encoded under network coding, are prefetched onto the selected representative APs. Since the encoding is done at the piece level for each generation of a file, the APs prefetch data in units of pieces. The random combination of pieces in network coding makes no piece in the same generation special or indispensable: the vehicle can recover the original generation as long as enough independent pieces are collected. Thus, we need not consider which specific pieces of a generation are prefetched on the APs, but only how many pieces from each generation are prefetched, under the constraint of the total prefetch volume defined in (4.5).
For the vehicle that triggers the prefetching, the content selection is actually simple: prefetch pieces that are disjoint with the ones the requesting vehicle has already collected. Note that two sets of pieces are "disjoint" if they are either independent pieces in the same generation or belong to different generations. However, we expect the prefetched data to also incidentally satisfy the demands of other vehicles passing by later, in order to increase the information utility of each prefetched content and thus save system resources. Although network coding potentially helps on this point, we still need to design a systematic approach to realize such potential as fully as possible.
We first describe how to represent each AP's storage status, i.e., how much data is in storage for each generation. One accurate method is to use the matrix of encoding vectors of the stored pieces. However, examining and operating on the matrix is time-consuming, and exchanging the matrix with other APs is bandwidth-consuming [22]. Thus, we use a mathematical property of the matrix, its rank, to trade off some accuracy for a quick and succinct reflection of the storage status. The rank indicates the number of independent pieces currently stored for a specific generation. The storage status is then defined as R_i(f, k), which denotes the rank of the pieces of file f's k-th generation stored on AP_i. Note that a storage status with R_i(f, k) = GS(f, k), where GS(·) is the generation size of file f's k-th generation, completely represents the information necessary for recovering the original contents of f's k-th generation.
The example in Fig. 5 illustrates the impact of prefetch content selection. Assume a file f has two generations G_1 and G_2, each with generation size GS(f, 1) = GS(f, 2) = 6. Now, AP_B is asked to prefetch three pieces of f for a vehicle V, which has already collected three independent pieces of G_1. The storage status is marked next to each AP. Assume all the pieces stored on A, C, D, E are independent of each other. Now we consider two casual selections: (I) B prefetches three pieces from G_1, i.e., R_B(f, 1) = 3; (II) B prefetches three pieces from G_2, i.e., R_B(f, 2) = 3. Both are suitable for V, because both provide it disjoint pieces. However, when considering two AP-contact sequences $\tau_1$ = {A, B, C} and $\tau_2$ = {D, E, B}, we notice that these casual selections are less effective in terms of information utility. Selection (I) would generate R_B(f, 1) + R_A(f, 1) + R_C(f, 1) − GS(f, 1) = 2 duplicate pieces of G_1 on $\tau_1$; thus the vehicles following $\tau_1$ would find two pieces on the downstream AP C useless, since A and B already provide all the data needed to recover G_1. Similarly, selection (II) would generate R_B(f, 2) + R_D(f, 2) + R_E(f, 2) − GS(f, 2) = 1 duplicate piece on $\tau_2$. The optimal selection for B is to prefetch one piece from G_1 and two pieces from G_2. Such a selection generates no duplicate pieces on either $\tau_1$ or $\tau_2$ while satisfying the requirement on the prefetch volume.
To optimize the information utility of the prefetched contents, the prefetching AP should factor in all the contact sequences passing through it, the occurrence probability of each sequence, and the storage status of each AP on each sequence. Obviously, such a process is computation-intensive. Thus, we devise a lightweight heuristic algorithm based only on the storage status of the neighboring APs in the contact map. For an AP ap_x, we define the RankSum

$$RS(x, f, k) = \sum_{ap_i \in NB(x)} R_i(f, k), \quad (5.1)$$

to reflect the overall storage status in ap_x's neighborhood pertaining to file f's k-th generation. Here, NB(·) denotes the set of neighbors in the contact map. We arrange the generations that the requesting vehicle has not completed in increasing order of RS. The prefetching then starts from the generation with the smallest RS and continues until the required prefetch volume has been prefetched, with the number of prefetched pieces for each generation capped at that generation's size. Intuitively, prefetching pieces from the generation with the smaller RankSum results in a lower probability that these prefetched pieces duplicate other pieces stored in the neighborhood.
6 PERFORMANCE EVALUATION
In this section, we implement CCDSV in NS-2.34 [23] and evaluate its performance under two typical realistic vehicular scenarios. We compare CCDSV with other existing systems from various aspects and also evaluate CCDSV under several alternative system design options.
6.1 Methodology
We feed NS-2.34 with realistic vehicular traces generated by the multiagent microscopic traffic simulator (MMTS) [24] developed at ETH Zurich. This traffic simulator is able to simulate public and private traffic over real regional road maps of Switzerland with a high level of realism. The entire set of traces includes around 260,000 vehicles over a period of 24 hours in an area of 250 km by 260 km. For manageable simulation time, we highlight two
Fig. 5. Prefetch content selection.
smaller areas in Zurich: Oberstrass and Zentrum. We take Oberstrass (4,000 m × 7,000 m) as a dense scenario with 520 vehicles moving in it and 32 access points deployed along the roads, and Zentrum (4,685 m × 4,010 m) as a sparse scenario with 293 vehicles and 29 APs. These two scenarios are depicted in Figs. 6 and 7, where the curves indicate the trajectories of vehicles and the black squares are the deployed APs. The transmission range of each AP is about 230 meters and the transmission bit rate is set at 11 Mbps [3]. The bandwidth of the backhaul link from an AP to the Internet (data-origin server) is 3 Mbps. Throughout the simulation, the data transmission between AP and vehicle is based on TCP. When the vehicle receives enough encoded packets to recover the original file, it immediately schedules a file-acknowledgment destined to the transmitting AP, which stops the transmission after receiving that acknowledgment.
The contents on the data-origin server are requested according to a Zipf-like distribution, which is commonly used to model Web content popularity [25]. We arrange N (= 500 in our setting) files on the server in increasing order of popularity. The probability of the i-th file being requested follows $P(i; \alpha) = (1/i^{\alpha}) / (\sum_{k=1}^{N} 1/k^{\alpha})$, where α is a parameter characterizing the skewness of the Zipf-like distribution. In our setting, we choose α = 0.8 according to the results in [25]. Each file's size (in MBytes) is chosen randomly from the set {4, 5, 6, 7, 8}. We fix the generation size and generation number of each file at 10 and 15, respectively; the piece size scales accordingly with the file size. The encoding coefficients are randomly chosen from the Galois field GF(2^8). A vehicle requests a new file only when the previous one is completely satisfied. Similar to the study in [26], we describe the storage capacity of each AP in terms of the relative storage size, defined as the ratio of the storage size to the total size of all files requested by the vehicle users. The default relative storage size in our simulation is set at 5 percent, assumed to be the same for all APs.
We are mainly interested in evaluating the following two metrics:
. Download rate. The average download throughput (in Kbps) perceived by vehicular users driving across the deployed APs.
. Backhaul traffic. The average data amount per second (in Kbps) flowing into each AP through the backhaul link during the simulation. The backhaul traffic introduced by the prefetching service is expected to be as low as possible to avoid overloading the APs; the saved bandwidth can be exploited to host more data services.
In addition to the performance evaluations in the following sections, we also conduct another set of simulations to evaluate the storage usage and the impact of storage sizes under various algorithms in Appendices A and B, respectively, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.142.
6.2 Performance Comparison
We compare the proposed CCDSV with other existing systems from two aspects: prefetching-AP selection and encoding method.
6.2.1 Prefetching AP Selection
We compare our representative-based AP selection method with three other methods:
. basic: performs no prefetching and serves as a baseline case,
. most-likely: used in [5], [7], and [8]; prefetches at the APs along the most likely driving trajectory among the lookahead-APs, and
. on-all: used in [14]; prefetches on all the lookahead-APs.
We implement the representative-based method with both heuristic and optimal algorithms:
. rep-heu: refer to Algorithm 2, and
. rep-opt: refer to (4.6); this serves as the performance upper bound for rep-heu.
Figs. 8 and 9 show the comparison results for download rate and backhaul traffic under different percentages of active vehicular users in the Oberstrass and Zentrum scenarios. Active vehicular users are defined as the percentage of vehicles that have content requests; a higher percentage of active vehicular users indicates a higher workload on the system.
In the figures for download rate, we can see that rep-heu performs quite close to the optimal algorithm, which proves the effectiveness of the proposed heuristic. Both rep-heu and rep-opt achieve a higher download rate than the other three methods under all percentages of active vehicular users. As the active users increase, rep-heu and rep-opt also degrade in download throughput much more slowly than most-likely and on-all. This is because our representative-based method successfully prevents the frequent replacement of prefetched objects from nullifying the benefits of prefetching, by selecting as representatives the APs with small object
Fig. 6. Oberstrass area with 32 APs and 520 vehicles.
Fig. 7. Zentrum area with 29 APs and 293 vehicles.
replacement loss and light traffic loads. The download rate of on-all is the most sensitive to the number of active users and drops fast as the latter increases.
Next, we look at the figures for introduced backhaul traffic. As the active vehicular users grow, the traffic injected into the APs through the backhaul link increases for all methods. The representative-based methods (rep-heu and rep-opt) show a graceful, slow increase, with less than 30 percent load on the backhaul link even under 100 percent active users, while on-all has the fastest increase and costs nearly 50 percent of the backhaul bandwidth. When the system has a small workload (active users between 10 and 30 percent), the backhaul traffic introduced by the representative-based methods is similar to that of on-all, since the former tend to select all the potential APs as representatives when system resources are abundant. When active users increase above 30 percent in our case, the selection becomes more and more prudent, resulting in remarkable savings in backhaul traffic.
6.2.2 Encoding Methods
In Section 5, we describe the benefits of network coding in our content distribution system and also discuss its potential advantages over erasure coding, which is a type of source coding. Now, we quantitatively show the benefits of network coding in terms of both download rate and backhaul traffic.
We compare our network coding with two other methods.
. Chunk-based (used in [5]): this method simply divides the file into chunks and keeps the original data form without any coding. The prefetching AP will prefetch one or multiple chunks. In the simulation, we assume each file is divided into 10 chunks, which equals the generation size in network coding.
. Erasure-coding (used in [6]): in the simulation, we divide each file into 10 chunks (the same as chunk-based and network-coding) and set the redundancy factor to 2, i.e., any 10 out of 20 encoded chunks can recover the original file. A brief description of erasure coding as well as the comparison between network coding and erasure coding is elaborated in Appendix C, which is available in the online supplemental material.
Figs. 10 and 11 show the comparison results. In Fig. 10, for both areas, network-coding has the highest download rate while chunk-based is the worst. Owing to the dramatically reduced duplicate data pieces on APs, coding (erasure or network) generates a much higher download rate than the non-coding method (chunk-based), especially under a large percentage of active vehicular users. When active
Fig. 9. Zentrum area: download rate and backhaul traffic under different selection of prefetching APs.
Fig. 10. Download rate in the Oberstrass and Zentrum areas under different coding methods.
Fig. 8. Oberstrass area: download rate and backhaul traffic under different selection of prefetching APs.
users grow to 70 percent, the download rate of network-coding is 10.5 percent higher than that of erasure-coding and 43 percent higher than that of chunk-based.
Another direct benefit of network-coding is the reduction of the introduced backhaul traffic, achieved by increasing the utility of each stored piece of data. Fig. 11 shows that network-coding reduces the backhaul traffic, on average, by 40 percent compared to chunk-based and by 12 percent compared to erasure-coding.
6.3 Impact of Mobility Prediction Accuracy
To support our claim that the representative-based method in CCDSV is robust to the mobility prediction accuracy and can maintain stable system performance, we evaluate the system under different accuracies. However, such accuracy is not controllable in our system. Thus, we design a tunable predictor (T-predictor) that can deterministically set the prediction accuracy via a parameter p, defined as the fraction of predictions that correctly identify the next-contact AP. The T-predictor makes predictions based on the generated vehicle-AP contact trace instead of the learning process. Hence, the T-predictor knows exactly a vehicle's 1) potential next-contact APs, $\Phi$ = {ap_1, ap_2, ..., ap_n}, arranged in decreasing order of transition probability; and 2) actual next-contact AP, $ap_t \in \Phi$.
For brevity, we plot the download rate and backhaul traffic only for the Oberstrass area with 50 percent active vehicular users. For comparison with the representative-based method, we also plot the on-all and most-likely methods. The results are shown in Fig. 12. We can observe that, as the prediction accuracy grows, the download rate of most-likely increases markedly, matching the expectation that its performance depends largely on the predictor's accuracy. on-all does not change with the accuracy, because it selects all the potential APs regardless of the prediction accuracy. Both rep-heu and rep-opt maintain the download rate at a high level even at an accuracy of 10 percent. The download rate increases steadily as the prediction accuracy varies from 10 to 100 percent, since the representative-based method also explicitly factors the transition probabilities into the selection of representative APs.
When the prediction accuracy reaches 100 percent, most-likely always places the prefetched data at the actual next-contact AP in advance. Even in this ideal case, the representative-based method still achieves a better download rate than most-likely. This can be explained as follows: the accurately predicted AP may be a hot one, busily responding to a large number of prefetching requests; the correctly placed data can then evict some prefetched objects, whose loss may offset part of the performance gain. In contrast, the representative-based method can, in the case of hot APs, select a lightly loaded AP nearby as the representative to help the hot AP prefetch as well as transmit data. The overall results show that the representative-based prefetching-AP selection method benefits from improved prediction accuracy and is also robust to its variation.
In Fig. 12b, as the prediction accuracy increases, the backhaul traffic introduced by both rep-heu and rep-opt decreases. This is because the set of selected representatives shrinks and concentrates on the APs with high transition probability as the prediction accuracy increases.
6.4 Impact of Prefetch Contents Selection
Our rank-sum-based selection method tries to approach the maximal information utility of the prefetched contents. One of its main benefits is reduced backhaul traffic, i.e., a lower backhaul bandwidth cost and load on the data-origin servers. In this part, we compare the average backhaul traffic introduced by rank-sum with two other selection methods.
. Sequential. The prefetched pieces are selected from incomplete generations sequentially.
. Random. Each time, it randomly selects an incomplete generation and prefetches pieces from it; the process continues until the prefetch volume is reached.
Fig. 11. Backhaul traffic in the Oberstrass and Zentrum areas under different coding methods.
Fig. 12. Download rate and backhaul traffic in Oberstrass, under 50 percent active vehicular users and different prediction accuracies.
For a comprehensive comparison, we also measure and include in the results the case with prefetching but no network coding (no-coding). The results are shown in Figs. 14 and 13.
As Fig. 14 shows, rank-sum reduces the backhaul traffic, on average, by around 50 percent compared to no-coding, under all three percentages of active users in both areas. Under a small percentage of active users (e.g., 30 percent), the traffic reduction of rank-sum compared to sequential and random is limited, around tens of Kbps. This is because, when the amount of stored contents in the network of APs is relatively small, the probability of duplicate pieces existing among nearby APs is low regardless of the selection method used. However, as the percentage increases, the traffic reduction due to the informed selection in rank-sum grows (up to 22 percent under 70 percent active users) compared to the other two casual selection methods. We can also observe that random performs better than sequential selection, owing to the introduced randomness, which helps reduce the probability of a vehicle finding duplicate pieces among nearby APs.
In Fig. 13, we notice that rank-sum shows a dramatic improvement in download rate compared to the other three methods, especially under a high percentage of active users. This is mainly because both the frequency and the loss of replacing prefetched objects on APs are lowered as a result of the reduced traffic loads and the smaller amount of data needed for prefetching.
6.5 Protocol Overheads and Effectiveness
Protocol overheads. Control messages in CCDSV are used in several ways: 1) a vehicle notifies the previously contacted AP of the transition contexts; 2) when selecting representative APs, an AP communicates with its lookahead-APs; and 3) neighboring APs in the contact map periodically exchange storage status with one another. In this part, we measure the protocol overheads, i.e., how many control messages on average are introduced into each AP per second during the simulation. Table 2 shows the results. The last row gives the percentage of control messages relative to the average backhaul traffic. We can see that even under a 70 percent data request percentage, the control messages account for only 2.18 percent of the average backhaul traffic.
Protocol effectiveness. An indication of the protocol's effectiveness is whether the system introduces too much useless prefetched data that would never be retrieved. This useless portion would burden system resources, such as backhaul link bandwidth and storage space. Here, we define the Utilization Rate to measure the percentage of the prefetched data on an AP that is finally retrieved, either by vehicles or by the client APs. The higher the utilization rate, the more effective the protocol. Table 3 shows the results. We can see that the utilization rate is maintained at a high level (>70%) in all cases, and as the data request percentage increases, the prefetched data is better utilized.
7 CONCLUSION
In this paper, we propose CCDSV, a cooperative content distribution system for vehicles through infrastructure APs. CCDSV is designed to achieve efficient cooperation among the network of APs so that vehicular users can effectively utilize the opportunistically encountered and short-lived AP connections. A structure called the contact map is maintained in a distributed manner on top of the APs, learning and predicting the potential vehicle-AP contacts. With the representative-based scheme, CCDSV carefully selects, from the predicted set of lookahead-APs, the APs to perform prefetching, in order to avoid overloading an AP's backhaul link and ejecting prefetched/cached data whose loss may overwhelm the prefetching benefits. CCDSV distributes contents encoded by network coding, and to maximize the information utility under network coding, it selects each piece of prefetched content based on the storage status (reflected by the metric rank-sum) of the neighboring APs in the contact map. We summarize the system control flow of CCDSV in Appendix D, which is available in the online supplemental material. The simulation results under various scenarios prove the effectiveness of CCDSV in many performance aspects.
Fig. 13. Backhaul traffic in Oberstrass and Zentrum Area, under different content selection methods.
Fig. 14. Download rate in Oberstrass and Zentrum Area, under different content selection methods.
TABLE 2: Protocol Overheads in Oberstrass Area
TABLE 3: Utilization Rate of Prefetched Data
REFERENCES
[1] The New York Times, "Customers Angered as iPhones Overload AT&T," http://www.nytimes.com/2009/09/03/technology/companies/03att.html, 2012.
[2] B. Hull, V. Bychkovsky, Y. Zhang, K. Chen, M. Goraczko, A. Miu, E. Shih, H. Balakrishnan, and S. Madden, "CarTel: A Distributed Mobile Sensor Computing System," Proc. Fourth Int'l Conf. Embedded Networked Sensor Systems (SenSys '06), pp. 125-138, 2006.
[3] J. Eriksson, H. Balakrishnan, and S. Madden, "Cabernet: Vehicular Content Delivery Using WiFi," Proc. 14th ACM Int'l Conf. Mobile Computing and Networking, pp. 199-210, 2008.
[4] M. Dischinger, A. Haeberlen, K. Gummadi, and S. Saroiu, "Characterizing Residential Broadband Networks," Proc. ACM SIGCOMM Conf. Internet Measurement (IMC), pp. 24-26, 2007.
[5] P. Deshpande, A. Kashyap, C. Sung, and S. Das, "Predictive Methods for Improved Vehicular WiFi Access," Proc. Seventh Int'l Conf. Mobile Systems, Applications, and Services (MobiSys), pp. 263-276, 2009.
[6] Y. Huang, Y. Gao, K. Nahrstedt, and W. He, "Optimizing File Retrieval in Delay-Tolerant Content Distribution Community," Proc. IEEE 29th Int'l Conf. Distributed Computing Systems (ICDCS), pp. 308-316, 2009.
[7] B. Chen and M. Chan, "MobTorrent: A Framework for Mobile Internet Access from Vehicles," Proc. IEEE INFOCOM, pp. 1404-1412, 2009.
[8] U. Shevade, Y.-C. Chen, L. Qiu, Y. Zhang, V. Chandar, M.K. Han, H.H. Song, and Y. Seung, "Enabling High-Bandwidth Vehicular Content Distribution," Proc. ACM Int'l Conf. Emerging Networking Experiments and Technologies (CoNEXT), pp. 23:1-23:12, 2010.
[9] D. Zhang and C.K. Yeo, "A Cooperative Content Distribution System for Vehicles," Proc. IEEE Global Telecomm. Conf. (Globecom '11), 2011.
[10] A.J. Nicholson and B.D. Noble, "Breadcrumbs: Forecasting Mobile Connectivity," Proc. ACM MobiCom, pp. 46-57, 2008.
[11] L. Song, D. Kotz, R. Jain, and X. He, "Evaluating Location Predictors with Extensive Wi-Fi Mobility Data," Proc. IEEE INFOCOM, vol. 2, pp. 1414-1424, 2004.
[12] A.J. Nicholson, Y. Chawathe, M.Y. Chen, B.D. Noble, and D. Wetherall, "Improved Access Point Selection," Proc. Fourth Int'l Conf. Mobile Systems, Applications and Services (MobiSys '06), pp. 233-245, 2006.
[13] J. Pang, B. Greenstein, M. Kaminsky, D. McCoy, and S. Seshan, "WiFi-Reports: Improving Wireless Network Selection with Collaboration," Proc. Seventh Int'l Conf. Mobile Systems, Applications, and Services (MobiSys '09), pp. 123-136, 2009.
[14] A. Mishra, M. Shin, and W. Arbaugh, "Context Caching Using Neighbor Graphs for Fast Handoffs in a Wireless Network," Proc. IEEE INFOCOM, vol. 1, 2004.
[15] A. Balasubramanian, B.N. Levine, and A. Venkataramani, "Enhancing Interactive Web Applications in Hybrid Networks," Proc. ACM MobiCom, pp. 70-80, 2008.
[16] M. Fiore and J. Barcelo-Ordinas, "Cooperative Download in Urban Vehicular Networks," Proc. IEEE Sixth Int'l Conf. Mobile Adhoc and Sensor Systems (MASS '09), pp. 20-29, 2009.
[17] P. Chou, Y. Wu, and K. Jain, "Practical Network Coding," Proc. Ann. Allerton Conf. Comm. Control and Computing, vol. 41, pp. 40-49, 2003.
[18] Akamai White Papers, "Leveraging the Edge: Delivering Unmatched Performance for Large File Downloads," http://www.akamai.com/dl/whitepapers/leveraging_edge_wp.pdf, 2012.
[19] W. Tan, W. Lau, O. Yue, and T. Hui, "Analytical Models and Performance Evaluation of Drive-thru Internet Systems," IEEE J. Selected Areas in Comm., vol. 29, no. 1, pp. 207-222, Jan. 2011.
[20] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, and L. Cottrell, "pathChirp: Efficient Available Bandwidth Estimation for Network Paths," Proc. Passive and Active Measurement Workshop, vol. 4, 2003.
[21] S. Lee, U. Lee, K. Lee, and M. Gerla, "Content Distribution in VANETs Using Network Coding: The Effect of Disk I/O and Processing O/H," Proc. Fifth Ann. IEEE Comm. Soc. Conf. Sensor, Mesh and Ad Hoc Comm. and Networks (SECON '08), pp. 117-125, 2008.
[22] M. Li, Z. Yang, and W. Lou, "CodeOn: Cooperative Popular Content Distribution for Vehicular Networks Using Symbol Level Network Coding," IEEE J. Selected Areas in Comm., vol. 29, no. 1, pp. 223-235, Jan. 2011.
[23] The VINT Project, "The Network Simulator - ns-2," http://www.isi.edu/nsnam/ns/index.html, 2012.
[24] V. Naumov, R. Baumann, and T. Gross, "An Evaluation of Inter-Vehicle Ad Hoc Networks Based on Realistic Vehicular Traces," Proc. MobiHoc, vol. 6, pp. 108-119, 2006.
[25] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, "Web Caching and Zipf-Like Distributions: Evidence and Implications," Proc. IEEE INFOCOM, vol. 1, pp. 126-134, 1999.
[26] X. Tang and S. Chanson, "Coordinated En-Route Web Caching," IEEE Trans. Computers, vol. 51, no. 6, pp. 595-607, June 2002.
[27] J. Byers, M. Luby, and M. Mitzenmacher, "A Digital Fountain Approach to Asynchronous Reliable Multicast," IEEE J. Selected Areas in Comm., vol. 20, no. 8, pp. 1528-1540, Oct. 2002.
Da Zhang received the BEng degree in computer software engineering from Tianjin University, China, in 2008. Currently, he is working toward the PhD degree in the School of Computer Engineering at the Nanyang Technological University (NTU), Singapore. His research interests include vehicular networks, mobile computing and ad hoc network security. He is a member of the IEEE.
Chai Kiat Yeo received the BEng (Hons) and MSc degrees, both in electrical engineering, in 1987 and 1991, respectively, from the National University of Singapore, and the PhD degree from the School of Electrical and Electronics Engineering, Nanyang Technological University (NTU), Singapore, in 2007. She was a principal engineer with Singapore Technologies Electronics and Engineering Limited prior to joining NTU in 1993. She was the deputy director of the Centre for Multimedia and Network Technology (CeMNet) in Nanyang Technological University, Singapore, before her current appointment as the associate chair (Academic) with the School of Computer Engineering, NTU. Her current research interests include ad hoc and mobile networks, delay tolerant networks, overlay networks, speech processing and enhancement.
492 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 3, MARCH 2013