Enabling Efficient WiFi-Based Vehicular Content Distribution
Da Zhang, Member, IEEE, and Chai Kiat Yeo
Abstract—For better road safety and driving experience, content distribution for vehicle users through roadside Access Points (APs)
becomes an important and promising complement to 3G and other cellular networks. In this paper, we introduce Cooperative Content
Distribution System for Vehicles (CCDSV) which operates upon a network of infrastructure APs to collaboratively distribute contents to
moving vehicles. CCDSV solves several important issues in a practical system, like the robustness to mobility prediction errors, limited
resources of APs and the shared content distribution. Our system organizes the cooperative APs into a novel structure, namely, the
contact map which is based on the vehicular contact patterns observed by APs. To fully utilize the wireless bandwidth provided by APs,
we propose a representative-based prefetching mechanism, in which a set of representative APs are carefully selected and then share
their prefetched data with others. The selection process explicitly takes into account each AP's storage capacity, storage status, inter-AP bandwidth, and the traffic loads on the backhaul links. We apply network coding in CCDSV to augment the distribution of shared
contents. The selection of shared contents to be prefetched on an AP is based on the storage status of neighboring APs in the contact
map in order to increase the information utility of each prefetched data piece. Through extensive simulations, CCDSV proves its
effectiveness in vehicular content distribution under various scenarios.
Index Terms—Content distribution, vehicular networks, roadside access points, prefetching
1 INTRODUCTION
CONTENT distribution to vehicular users through wireless network access is emerging as a necessity to facilitate
better road safety and enhance the driving experience. The types of contents can include electronic newspapers, advertisements, road-situation reports, maps with traffic statistics, music or movie clips, etc. The cellular network (e.g., 3G), mainly adopted due to its ubiquitous availability, is experiencing explosive growth in subscribers and in demand for multimedia contents, thus risking being pushed to its capacity limit [1]. Meanwhile, WiFi-based Access Points (APs), as a promising complement and augmentation to cellular networks, have shown their feasibility in content distribution for vehicles [2], [3]. These APs can be RoadSide Units (RSUs) deployed intentionally by network service providers and government departments, or hotspots that are installed in roadside shops or buildings and configured for public access. These APs are characterized by short-range coverage (hundreds of meters), relatively cheap and easy deployment, and high data access rates (a theoretical data rate of 600 Mbps in the latest IEEE 802.11n).
A typical architecture of a WiFi-based vehicular content distribution system, as illustrated in Fig. 1, is made up of a network of interconnected APs, which are geographically deployed near the roads, run the customized protocols for cooperation, and are also equipped with local storage. The APs can communicate with each other through backhaul links to the Internet or via high-speed LANs. Data-origin servers are the content providers, providing vehicular users with both shared (popular) and private contents.
However, such a network access scheme poses many challenges for the design of an effective content distribution system for vehicles: 1) a single vehicle-AP contact duration is quite limited (typically tens of seconds) due to the fast vehicle speed and the AP's short coverage range, thus constraining the data transfer opportunities; 2) the response latency of a remote data-origin server on the Internet can waste the valuable contact duration, especially for a heavily loaded server and a congested or long-delay route; 3) the wireless bandwidth between the AP and vehicles can be bottlenecked by the AP's backhaul path to the server on the Internet, considering that most WiFi-based access protocols operate in the order of 10 Mbps while a median wired bandwidth of 5 Mbps to residences is reported in [4].
To address the above challenges in vehicular content distribution, several recent works [5], [6], [7], [8], [9] adopted the prefetching technique, which has a long history and is widely applied in computer architecture, CPU design, web access, etc. In these works, the requested data is prefetched onto the APs ahead along the driving trajectory of the requesting vehicle, which can then download the prefetched data at high throughput when the connection is established, without resorting to the remote server or being bottlenecked by the AP's backhaul link. Such an apparently simple technique, however, requires careful design when applied to large-scale, real systems, as described in the following section.
1.1 Related Works and Motivation
In this section, we capture some important issues arising from the design of a prefetching system for vehicular content distribution and investigate for each issue the related works and their limitations, which motivate our proposal.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 3, MARCH 2013 479

The authors are with the Centre for Multimedia and Network Technology (CeMNet), School of Computer Engineering, Nanyang Technological University, N4-B2c-06, Nanyang Ave, Singapore 639798. E-mail: [email protected], [email protected].

Manuscript received 2 Feb. 2012; revised 17 Apr. 2012; accepted 21 Apr. 2012; published online 4 May 2012. Recommended for acceptance by J. Cao. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPDS-2012-02-0078. Digital Object Identifier no. 10.1109/TPDS.2012.142.
1.1.1 Mobility Prediction and Prefetching
The content distribution systems based on prefetching need
the mobility prediction component to predict the vehicular trajectory and the APs that will be connected by the vehicle in the near future [5], [6], [7], [8]. Then the system prefetches the requested contents on these APs. Thus, the
prediction accuracy directly affects the system performance,
because a false prediction would make the vehicle miss the prefetched data.
However, the accuracy of mobility prediction1 is difficult
to maintain at a high level and varies largely from scenario
to scenario [10], [11], especially in the highly dynamic vehicular environment. Moreover, intelligent AP selection [12], [13], which is based on ever-changing AP metrics such as signal strength or available wireless bandwidth, further degrades prediction accuracy when a vehicle is exposed to multiple surrounding APs. To the best of our knowledge, almost all prior works design the prefetching and mobility prediction components separately,
without explicitly considering whether the prefetchingcomponent can still work properly and keep the system
performance at a reasonable level under low-accuracy
mobility prediction.

All the works in [5], [6], [7], and [8] predict and prefetch
data on either only one AP that is most likely connected
next by the vehicle or a series of APs that are most likely
connected successively in the future. Hence, the system performance totally relies on the accuracy of mobility
prediction. In [14], to reduce the association time during
handoff between two APs, the system prefetches context information onto all the possible APs that would be next
connected, thus increasing the robustness of system performance against prediction error. However, a small piece of context information consumes far fewer resources than the large bundles of data in our content distribution system, where such a strategy would inefficiently utilize the AP's storage as well as backhaul bandwidth due to excessive redundant prefetching.
1.1.2 Storage Resource on APs
Each AP needs local storage to hold the prefetched data. Although large-volume, low-cost storage hardware is commonly available today, the storage resource on an AP can still be limited due to the following reasons:
1. large multimedia content sizes;
2. prolonged dwelling time of prefetched data on APs due to the delay-tolerant pattern of access from vehicles;
3. increased data volume to be prefetched due to enhancements in the wireless transmission rate; and
4. huge user demands over large and busy areas.
However, most of the related works [5], [7], [8], [15] assume unlimited storage resources. The work in [6] is one of the few that take the storage limitation into consideration. It uses a centralized content manager that periodically executes the content distribution algorithm to decide which APs cache which parts of the requested files while satisfying the storage constraint. However, the AP itself does not have a local storage management component and delegates all management to the central controller. The drawback lies in scalability, as the burden is imposed on the central controller, which has to execute frequently to adapt to the changing user demands.
1.1.3 Shared Content Distribution
The vehicular download process usually continues across multiple APs due to the high vehicular speed and small AP coverage, and thus it is common to divide the original file, especially large multimedia content, into multiple pieces and store these pieces along the specific APs ahead [5], [16]. With the objective of effectively using system resources such as the storage and backhaul bandwidth of APs, we expect that the data pieces stored on APs for one vehicle can incidentally be useful and contribute to the download progress of as many passing vehicles as possible, thus reducing the cost of additional data fetches. Such an expectation becomes even stronger given the fact that most vehicular users share the same interest in specific types of contents, like road-situation reports, weather reports, advertisements, electronic newspapers, popular websites, etc.
However, simple division of the original content is less effective here since it usually leaves many duplicate pieces among APs as the shared contents are being downloaded. For example, suppose vehicles V1 and V2 share the same interest in one file and download its first half (which is divided into two pieces) from APs A and B, respectively. If V1 contacts B next on its route, the first half cached for V2 on B would be useless for V1 because it already holds that part, and B has to launch a new data fetch from the remote server. In contrast, encoded contents can do much better than simple division. Among the encoding methods, network coding [17] has proven its benefits for information sharing in vehicular content distribution [8], [9]. A brief introduction to practical network coding is given in Section 5. Through network coding, each coded piece encodes the information of all pieces in a file and remains useful as long as it is independent of the pieces already
1. The purpose of mobility prediction is to predict vehicle-AP contacts. Thus, we use vehicle-AP contact prediction and mobility prediction interchangeably hereinafter.
Fig. 1. Architecture of WiFi-based vehicular content distribution.
collected by the vehicle. However, adopting network coding in a naive way can still introduce duplicate data pieces among the APs along a vehicle's route. An illustrative example of this claim is shown in Section 5.1 and Fig. 5.
2 SYSTEM OVERVIEW AND CONTRIBUTIONS
In order to address the important issues raised in Section 1.1, we propose CCDSV, a prefetching system built upon cooperative APs for speeding up content distribution to vehicular users and efficiently utilizing the storage and backhaul bandwidth of APs. In other words, CCDSV attempts to maximize the vehicular download performance obtained from the infrastructure of APs while minimizing both the data and control traffic introduced into the set of APs.
We use Fig. 2 to show a high-level view of how CCDSV works from AP A's perspective. Here, AP A receives a request for a file, say M, from an associated vehicle and then transmits data of M from either A's local storage or the remote server hosting M. At the same time, A forecasts whether the download of M can be completed before the vehicle drives out of the coverage range. If not, A is responsible for selecting and notifying those APs which will probably be contacted ahead by the requesting vehicle to prefetch (parts of) the uncompleted portion. Fig. 2 shows all the probable to-be-contacted APs within 3 hops ahead (abbreviated as lookahead-APs) along the vehicle's path, in a tree structure with transition probabilities marked on the edges. In general, A needs to consider prefetching on the APs within k hops (k ≥ 1) ahead, usually more than just one hop, in order to give these APs enough time to complete the prefetch before the requesting vehicle's arrival.
For the selection of prefetching APs, we consider two cases in Fig. 2: 1) A issues a prefetching notification to B, F, …, and J, which is the predicted most probable vehicle-AP contact sequence; 2) A issues prefetching notifications to all the lookahead-APs to obtain the maximum cache hit probability. These two cases correspond to the two extremes of the spectrum, and each faces disadvantages. The first case costs the least resources but has the worst robustness against mobility prediction errors, which make the prefetched data useless and thus waste the invested resources (storage, backhaul bandwidth, etc.). The second case is suitable if storage and backhaul bandwidth are sufficiently abundant; otherwise, such excessive redundant prefetching risks overwhelming these resources: the ejected prefetched data of other sessions may degrade the overall performance. Moreover, the overwhelming traffic introduced into the APs can bring down the throughput of the users currently being served due to the shared backhaul link.
We propose a representative-based prefetching approach that strikes a good balance between the two extreme cases above. This approach selects a subset of lookahead-APs as representatives for prefetching while the others remain as "client APs." Only representatives prefetch, and each representative is responsible for a (possibly empty) group of clients. In the case that the requesting vehicle ends up contacting a client AP, the requested data is fetched from the corresponding representative rather than from the data-origin server, based on the fact that communicating with a geographically near AP generally enjoys a higher throughput than with a remote server [18].
During the system design, we seek to answer the following questions:
. How to construct the set of lookahead-APs, along with the transition probabilities, from each AP's perspective? (Section 3)
. How to manage the finite storage, i.e., which set of objects to replace to obtain sufficient space? (Sections 4.1.2 and 4.2)
. How to select the representative APs to optimize the overall system performance? (Section 4.1)
. How much and which part of the requested file is to be prefetched onto the selected representative APs? (Sections 4.1.3 and 5)
Our contributions in designing CCDSV are as follows:
. We define and construct the contact map, an overlay structure on top of the network of APs that encodes the vehicle-AP contact patterns (Section 3), in order to accurately predict the future contact APs and feed comprehensive information to the prefetching component.
. We propose a novel representative-based prefetching strategy in Section 4.1 in order to increase the stability of the system performance in the presence of varied mobility prediction accuracy and, at the same time, control the overhead incurred.
. We devise a series of algorithms in Sections 4.1.2 and 4.2 to manage the storage where prefetched contents and cached contents coexist. The costs of ejecting objects of both content types are formally defined, helping the algorithms minimize the risk of overall performance degradation due to content ejection.
. While recognizing the benefits of network coding in shared content distribution, we propose, in Section 5, selecting the prefetch contents based on the metric RankSum of the neighboring APs in the contact map, in order to increase the information utility under practical network coding.
Fig. 2. High-level overview of CCDSV operations from AP A's perspective (the dashed curve is the predicted most probable vehicle-AP contact sequence B → F → J). The transition probability is marked beside each edge.
. We implement CCDSV in the NS-2 simulator and evaluate its performance in various aspects through extensive simulations in Section 6. The results prove the effectiveness of CCDSV for vehicular content distribution.
3 CONTACT MAP
In this section, we introduce the concept of contact map,
which is an overlay structure on top of the network of APs
and encodes the observed patterns of vehicle-AP contacts.
Contact map is used in predicting a vehicle’s potential
contact AP(s) ahead on the route and the respective
transition probabilities to them. The predicted APs and
probabilities then form the building blocks in the algorithm
for representative-AP selection. Furthermore, the encoded
contact patterns in the contact map are also necessary in the
process of selecting prefetch contents (Section 5.1).

During a trip, a vehicle sequentially associates with a series of roadside APs, forming a contact sequence σ = (ap_{i1}, ap_{i2}, …). Such contact sequences are observed from the AP side and are used to extract the contact map. In essence, the contact map can be modeled as a directed graph G = (V, E). The vertices, V = {ap_i | i = 1, 2, …, N}, are the APs in the infrastructure, and a directed edge e_ij ∈ E between two neighboring vertices corresponds to a transition of a vehicle's contact from ap_i to ap_j. Each edge e_ij is also associated with a set C_ij of contexts, called transition contexts, which records the context information captured with the transition ap_i → ap_j.

The contact map is flexible and can include various types of knowledge as transition contexts, such as the previous contact AP before the transition ap_i → ap_j, the driving trajectory, or the driver's profile (e.g., vehicle ID, daily driving habits, etc.). The last two contexts require additional devices such as a GPS device or navigation system and also pose privacy issues; we leave the study of them for future work. Thus, for the sake of availability and simplicity, we adopt the previous contact AP before the transition as the context. Hence, the set of transition contexts C_ij associated with edge e_ij is a set of APs such that C_ij = {ap_k | ap_k ∈ V and ∃ σ = (…, ap_k, ap_i, ap_j, …)}. Fig. 3 gives an example scenario and its extracted contact map.

The transition context can help to better predict contact transitions by differentiating vehicles rather than treating them uniformly. Generally, the more knowledge about the driving information reflected in the transition contexts, the more accurate the transition probability prediction. In fact, the context used here, i.e., the previous contact AP, provides a hint on the driving direction.
3.1 Contact Map Construction
The contact map with transition contexts can be expressed as the set {(e_ij, C_ij) | ap_i, ap_j ∈ V, e_ij ∈ E}. We maintain the contact map in a distributed way, with each AP, say ap_i, storing its outgoing neighbors, NB⁺(i) = {ap_k | e_ik ∈ E}, along with the corresponding transition contexts C_ik, where ap_k ∈ NB⁺(i). The union of C_ik over all ap_k ∈ NB⁺(i) is the set of i's incoming neighbors, NB⁻(i) = {ap_j | e_ji ∈ E}. We refer to this information, stored at each AP, as the local view. The local views of all the APs jointly constitute the whole contact map.
In practice, we use a table with three columns, [next_ap, previous_ap, probability], to represent the local view. Assume a vehicle, V, sequentially contacts three APs, ap_s, ap_i, and ap_t. Through V's feedback, ap_i observes the contact sequence (…, ap_s, ap_i, ap_t, …) and then updates the row of its local view table corresponding to the 2-tuple (ap_t, ap_s): it updates the probability if there already exists a row with (next_ap, previous_ap) = (ap_t, ap_s); otherwise, it adds a new row. The probability of this row is calculated as

Prob(ap_i → ap_t | ap_s) = N(ap_i → ap_t | ap_s) / N(ap_s → ap_i),  (3.1)

where N(ap_i → ap_t | ap_s) is the number of times the tuple (ap_t, ap_s) has been observed (initially set to 1 when the tuple is first added) and N(ap_s → ap_i) is the sum of the observation counts of all rows whose previous_ap is ap_s.
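The local-view bookkeeping described above can be sketched as follows. This is an illustrative sketch, not the paper's implementation; the class name `LocalView` and its dictionary layout are our assumptions.

```python
from collections import defaultdict

class LocalView:
    """Sketch of one AP's local view table: rows keyed by
    (next_ap, previous_ap) holding observation counts, with
    probabilities computed per Eq. (3.1)."""

    def __init__(self):
        # (next_ap, prev_ap) -> number of times this transition was observed
        self.counts = defaultdict(int)

    def observe(self, prev_ap, next_ap):
        """Record feedback: a vehicle that arrived from prev_ap
        contacted next_ap after this AP."""
        self.counts[(next_ap, prev_ap)] += 1

    def prob(self, next_ap, prev_ap):
        """Prob(ap_i -> next_ap | prev_ap): observed count divided by
        the total observations whose previous AP is prev_ap."""
        total = sum(n for (t, s), n in self.counts.items() if s == prev_ap)
        if total == 0:
            return 0.0
        return self.counts[(next_ap, prev_ap)] / total

view = LocalView()
for prev_ap, next_ap in [("S", "T"), ("S", "T"), ("S", "W")]:
    view.observe(prev_ap, next_ap)
print(view.prob("T", "S"))  # 2 of the 3 observed transitions from S went to T
```

A real AP would persist this table and prune stale rows; the sketch keeps only the counting logic that Eq. (3.1) requires.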
In reality, the infrastructure of APs can change dynamically [13], because new APs can be deployed and existing APs may disappear due to system failures or signal obstruction. To quickly adapt to such dynamics, we improve the basic (3.1) with a time-window technique. Its principle is to segment the whole observation period into a series of smaller time windows and to place the highest emphasis on the most recent observations while gradually decreasing the emphasis on the preceding ones. The transition probability during time window j is expressed as

Prob(ap_i → ap_t | ap_s, j) = N_i(ap_i → ap_t | ap_s, j) / N_i(ap_s → ap_i, j),  (3.2)
where the notation N_i(·) is the same as in (3.1), except that it is observed within time window j. Note that the index of the time window starts from the most recent window. We write the above probability as P_i^j(s, t) for brevity. Now, (3.1) can be improved and expressed as a weighted average of P_i^j(s, t) over a certain number m of recent time windows:

Prob(ap_i → ap_t | ap_s) = w_k P_i^k(s, t) + w_{k−1} P_i^{k−1}(s, t) + … + w_{k−m+1} P_i^{k−m+1}(s, t)
                        = Σ_{j=k−m+1}^{k} w_j P_i^j(s, t),  (3.3)
where k is the index of the most recent time window and w_j is the weight of the jth window. Different weight-selection methods (e.g., linearly or exponentially decreasing weights) discard the history data at different rates. Here, we simply give the ith time window the weight w_i = i / Σ_{j=1}^{m} j, where i ∈ [1, m].

Fig. 3. Contact map. (a) An example road map and AP deployment; (b) the extracted contact map with transition contexts. As an example, AP B's outgoing edges are labeled with the respective transition contexts.
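The time-window weighting of Eq. (3.3) can be sketched as below; the function names and the convention that per-window probabilities are supplied oldest-first are our assumptions.

```python
def window_weights(m):
    """Linearly increasing weights w_i = i / (1 + 2 + ... + m) for
    i in [1, m]; the largest weight goes to the most recent window."""
    total = m * (m + 1) // 2
    return [i / total for i in range(1, m + 1)]

def windowed_prob(per_window_probs):
    """Weighted average of the per-window probabilities P_i^j(s, t),
    listed oldest-first, per Eq. (3.3)."""
    weights = window_weights(len(per_window_probs))
    return sum(w * p for w, p in zip(weights, per_window_probs))

# An old probability of 0.9 fades when the three most recent windows
# observe 0.2: the estimate is 0.1*0.9 + 0.2*0.2 + 0.3*0.2 + 0.4*0.2 = 0.27.
print(windowed_prob([0.9, 0.2, 0.2, 0.2]))
```

Exponentially decreasing weights would forget history faster; the linear scheme above is the one the text adopts.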
3.2 Predicting Next-Hop Transitions and Probabilities
The prediction function at an AP, say ap_i, returns a set of potential APs, R(i), to be contacted next, along with the respective transition probability P_i(j) for each ap_j ∈ R(i). For vehicles which can provide the identity of the previous contact AP, say ap_s, the predicting AP ap_i can directly make the prediction by querying its local view table with the keyword ap_s to obtain the result list.

Otherwise, the prediction function of ap_i returns all the APs in NB⁺(i) (here R(i) = NB⁺(i)), with the transition probability for each one given by

P_i(x) = Σ_{ap_y ∈ NB⁻(i)} P_i(x | y) · Prob(ap_y → ap_i),
where Prob(ap_y → ap_i) = N(ap_y → ap_i) / Σ_{ap_j ∈ NB⁻(i)} N(ap_j → ap_i),  ap_x ∈ NB⁺(i).  (3.4)
Equation (3.4) is actually a total probability taking intoaccount all the observed previous contact APs.
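When the previous contact AP is unknown, Eq. (3.4) marginalizes over all observed previous APs. A minimal sketch, with hypothetical function and argument names:

```python
def predict_without_context(cond_prob, arrival_counts):
    """Total-probability prediction per Eq. (3.4).

    cond_prob[(x, y)] -> P_i(x | y): probability of transiting to x
                         given the vehicle arrived from y.
    arrival_counts[y] -> N(ap_y -> ap_i): observed arrivals from y.
    Returns P_i(x) for every candidate next AP x.
    """
    total_arrivals = sum(arrival_counts.values())
    result = {}
    for (x, y), p in cond_prob.items():
        prior = arrival_counts[y] / total_arrivals  # Prob(ap_y -> ap_i)
        result[x] = result.get(x, 0.0) + p * prior
    return result

cond = {("T", "S"): 0.7, ("W", "S"): 0.3, ("T", "R"): 0.2, ("W", "R"): 0.8}
arrivals = {"S": 60, "R": 40}
print(predict_without_context(cond, arrivals))  # T and W both end up at 0.5
```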
3.3 Constructing Lookahead-APs
In order to give the APs ahead sufficient time to complete the prefetching, it is sometimes not enough to consider prefetching only on the next-contact APs; APs further ahead must also be included. Take Fig. 2 as an example, where we assume (A, B, E) is a contact sequence of the requesting vehicle that is currently associated with A. If the expected transition time between B and E, i.e., the expected time between when a vehicle hands off to B and when it then hands off to E, is shorter than the time needed to complete prefetching at E, then such prefetching at E, instead of being delayed until B is contacted, needs to start now through A's notification. Based on this rule, A recursively expands the set of lookahead-APs, as illustrated in Algorithm 1.
Algorithm 1: Construction of lookahead-APs at AP u
Output: L_u and E_u.
  L_u ← ∅; E_u ← ∅;
  foreach i in R(u) do
    L_u ← L_u ∪ {i};
    E_u ← E_u ∪ {(u, i, P_u(i))};
    SetUpLookaheadAPs(i, 0);
  end foreach

  // Definition of function SetUpLookaheadAPs
  SetUpLookaheadAPs(current_ap, time) {
    foreach j in R(current_ap) do
      if PV(j)/BW(j) ≥ ΔT(current_ap, j) + time then
        L_u ← L_u ∪ {j};
        E_u ← E_u ∪ {(current_ap, j, P_current_ap(j))};
        // recursive call
        SetUpLookaheadAPs(j, ΔT(current_ap, j) + time);
      end if
    end foreach
  }
The algorithm is from the perspective of AP u. The outputs are two sets, both from u's perspective: L_u denotes the set of lookahead-APs, and E_u is the set of 3-tuples (i, j, prob) denoting the transition from AP i to j with probability prob. Together, L_u and E_u constitute a subgraph of the contact map, rooted at u, with edge weights. PV(j) is the data volume to be prefetched at AP j. BW(j) is the expected available down-link bandwidth of AP j when downloading from the data-origin server. Thus, PV(j)/BW(j) is the expected time for AP j to complete prefetching. ΔT(i, j) is the expected time of transition (i.e., hand-off) from AP i to j.
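Under the stated rule (include an AP whenever its prefetch cannot finish before the vehicle's expected arrival), Algorithm 1 can be sketched in Python as follows. The callable interfaces R, P, PV, BW, and dT are our assumptions about how an implementation would expose the contact map and link estimates; the recursion terminates as long as hand-off times are positive and PV/BW is finite.

```python
def build_lookahead(u, R, P, PV, BW, dT):
    """Sketch of Algorithm 1: collect the lookahead-APs L_u and the
    weighted transition edges E_u from AP u's perspective.

    R(i)     -> iterable of candidate next-contact APs of i
    P(i, j)  -> transition probability from i to j
    PV(j)    -> data volume to prefetch at j
    BW(j)    -> expected down-link bandwidth of j to the origin server
    dT(i, j) -> expected hand-off time from i to j
    """
    L, E = set(), set()

    def expand(current, elapsed):
        for j in R(current):
            # Start prefetching at j now only if the prefetch would not
            # finish before the vehicle's expected arrival at j.
            if PV(j) / BW(j) >= dT(current, j) + elapsed:
                L.add(j)
                E.add((current, j, P(current, j)))
                expand(j, dT(current, j) + elapsed)

    for i in R(u):
        # Next-contact APs are always included.
        L.add(i)
        E.add((u, i, P(u, i)))
        expand(i, 0.0)
    return L, E

# Chain u -> a -> b -> c with 10 s prefetches and 6 s hand-offs:
# b is included (10 >= 6) but c is not (10 < 12).
R_map = {"u": ["a"], "a": ["b"], "b": ["c"], "c": []}
L, E = build_lookahead("u", R_map.__getitem__, lambda i, j: 1.0,
                       lambda j: 10.0, lambda j: 1.0, lambda i, j: 6.0)
print(sorted(L))  # ['a', 'b']
```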
4 REPRESENTATIVE-BASED PREFETCHING APPROACH
Representative-based AP selection aims to optimize the
selection of a set of prefetching APs such that the system can
provide requesting vehicles with the maximum gain in download performance. Essentially, our representative-based approach organizes the lookahead-APs into an overlay consisting of clusters with representative APs as cluster heads and client APs as cluster members, as shown in Fig. 4. The cluster structure allows APs that have not prefetched the requested data to immediately locate the "nearest" data source (i.e., the corresponding cluster
heads) with constant query delay. This property is crucial
for high-speed vehicle users whose limited contact duration
with APs can thus be saved without long delay for querying
and locating the requested data.
4.1 Algorithms for Representative AP Selection
The frequently used notations and symbols are summarized
in Table 1.
4.1.1 Vehicular Download Volume and Performance Gain
To quantify the performance gain and loss, we define the metric vehicular download volume (VDV), i.e., the data volume a vehicle can download from an AP. We then define D(i, j) as the VDV at AP i when it fetches data from entity j (either the remote data-origin server or a representative AP) to satisfy the download request, and express it as

D(i, j) = min(B^b_ji, B^w_i) · T^con_i − B^w_i · RTT^b_ij · I_i(j),
where I_i(j) = 1 if j ≠ i, and I_i(j) = 0 if j = i,  (4.1)

where B^w_i is the available wireless bandwidth of i and T^con_i is the expected connection duration of a vehicle with i. B^b_ji and RTT^b_ij are the bandwidth and round-trip time, respectively, achieved between j and i through i's backhaul link. When j = i, AP i has the requested data and can directly reply to the request with locally stored data; thus RTT^b_ii = 0 and B^b_ii = B^w_i. The term after the minus sign accounts for the fact that no data is downloaded while sending the request to, and waiting for the reply from, entity j.

Fig. 4. Organizing the lookahead-APs into clusters with representative APs as cluster heads and client APs as cluster members.
Let a requesting vehicle's current contact AP be AP_c and the set of AP_c's lookahead-APs be L_c = {AP_k1, AP_k2, …, AP_kn}. A selection vector, U = [u_1, u_2, …, u_n], encodes AP_c's decision of which APs in L_c are selected as representatives, which as clients, and which representative each client is attached to. Let u_i = i if AP_i is a representative, and u_i = j (j ≠ i) if AP_i is a client that selects AP_j as its representative. We define the expected VDV that the group of APs in L_c can provide to the requesting vehicle as it passes through L_c, as a function of the selection vector U:

DG(U) = Σ_{AP_i ∈ L_c, u_i = i} P_c(i) · D(i, i) + Σ_{AP_i ∈ L_c, u_i = i} Σ_{AP_j ∈ L_c, u_j = i} P_c(j) · D(j, i),  (4.2)
where P_c(i) is the predicted probability of transiting from AP_c to AP_i, which may require multiplying several one-hop transition probabilities. For example, in Fig. 2, P_A(F) = P_A(B) · P_B(F) = 0.42.

DG(U) measures the gross gain of prefetching based on the selection vector U and does not consider the performance loss due to possible cache replacement. The first term of the sum represents the expected VDV gain when the requesting vehicle contacts a representative AP and directly downloads the data from it. The second term represents the expected VDV gain when the vehicle ends up contacting a client AP and requesting the data from the corresponding representative.
4.1.2 Storage Management for Prefetched Contents
Owing to the finite storage capacity, prefetching for onevehicle can cause the replacement of some contents alreadyprefetched for other vehicles and thus risk degrading theoverall performance in terms of VDV. Thus in the following,
we describe the storage management algorithm for the
prefetched contents and quantify the loss in content
replacement for each AP.

When facing insufficient storage
management algorithm here tries to eject a set of prefetched
objects with minimum loss in terms of VDV. The
representative-based approach somewhat complicates the
calculation of loss in VDV. The VDV loss of ejecting a
prefetched object from an AP would occur if either that AP
or any of its clients ends up being contacted by the vehicle
that requests the ejected object and it equals the reduction of
VDV when the finally contacted AP refetches that object
from the data-origin server. Let the set O_k = {O_1, O_2, …, O_m} denote the prefetched objects in AP_k and a corresponding replacement vector V = [v_1, v_2, …, v_m] denote which objects are to be replaced (v_i = 1 if O_i is replaced and v_i = 0 if O_i is retained). Let the size of the object O_e (∈ O_k) to be ejected be S(O_e), and let h_k(O_e) be the index of the AP which notified AP_k to prefetch O_e. Then the VDV loss resulting from ejecting O_e from AP_k can be expressed as a function of the size of that ejected object:

DL_k(S(O_e)) = Σ_{AP_j ∈ {M^Oe_k ∪ AP_k}} P_{h_k(O_e)}(j) · (D(j, k) − D(j, origin)),  (4.3)

where M^Oe_k is the set of client APs taking AP_k as representative with respect to object O_e. This equation reflects that the loss of ejecting an object from AP_k may also be imposed on the other APs that take AP_k as representative for that ejected object. The actual loss depends on which AP among the set {M^Oe_k ∪ AP_k} is finally contacted by the requesting vehicle, and the probability of contacting each AP is expressed as P_{h_k(O_e)}(j).

Our replacement algorithm finds a V that leaves minimum VDV loss and enough space for the contents to be prefetched. This can be modeled as an optimization problem:

minimize: DL_k(PV(k)) = Σ_{O_i ∈ O_k} v_i · DL_k(S(O_i))
subject to: Σ_{i=1}^{m} v_i S(O_i) ≥ PV(k),  v_i ∈ {0, 1},

where PV(k) is the prefetch volume, i.e., the size of the contents to be prefetched at AP_k. This optimization problem is equivalent to the classical knapsack problem, which is NP-hard. Hence, a heuristic is developed to approximate the optimal V. We define the normalized DL (NDL), which is the loss of creating one unit of free space by ejecting O_i from AP_k, as

NDL_i = DL_k(S(O_i)) / S(O_i).  (4.4)

The prefetched objects in storage are sorted in increasing order of NDL. Replacement starts from the object with the smallest NDL and continues until PV(k) units of space are freed up. The VDV loss at AP_k, DL_k(PV(k)), is then the sum of the DL of each ejected object.
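The greedy heuristic above admits a short sketch; the container layout (the `sizes` and `loss` dictionaries) is our assumption. Python's `sorted` is stable, so ties in NDL keep their original order.

```python
def select_victims(sizes, loss, needed):
    """Eject prefetched objects in increasing order of normalized loss
    NDL_i = DL_k(S(O_i)) / S(O_i) until `needed` units are freed.

    sizes: object id -> S(O_i); loss: object id -> DL_k(S(O_i)).
    Returns the victim list and the total VDV loss DL_k(PV(k)).
    """
    order = sorted(sizes, key=lambda o: loss[o] / sizes[o])  # ascending NDL
    victims, freed, total_loss = [], 0, 0.0
    for o in order:
        if freed >= needed:
            break
        victims.append(o)
        freed += sizes[o]
        total_loss += loss[o]
    return victims, total_loss

sizes = {"a": 4, "b": 2, "c": 6}
loss = {"a": 8.0, "b": 1.0, "c": 3.0}   # NDL: a = 2.0, b = 0.5, c = 0.5
print(select_victims(sizes, loss, needed=5))  # (['b', 'c'], 4.0)
```

As with most greedy knapsack heuristics, the last victim may over-free space; the approximation is what makes the per-request cost tractable at each AP.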
TABLE 1Frequently Used Notations and Symbols
4.1.3 Estimating the Prefetch Volume
Here, we describe how to determine the value for prefetch
volume (PV), i.e., the size of the content to be prefetched,
which is mentioned in the previous section. The prefetch
volume is upper bounded by the vehicle-AP contact capacity
(CC), which is the maximum data volume a moving vehicle can download from an AP during their connection.
Contact capacity varies based on many factors, like the
number of vehicles being served, vehicle speed, channel
quality, etc. [19]. We estimate the contact capacity based on
the cumulative distribution function (CDF) of the historically observed values, and the p-th percentile is chosen as the estimate. In the simulation, we set p to 85 percent, which proves to be a proper choice in two respects: it avoids insufficient prefetching, and it excludes the rare outliers with large values, thereby saving both storage and backhaul bandwidth.
Since the representative AP is also responsible for a set of
client APs, it should choose the prefetch volume to be the
maximum among the estimated contact capacities of those
client APs together with itself. Hence, for a certain selection
vector U, APi needs to prefetch
$$PV(i) = \max_{u_j = i} CC(j). \quad (4.5)$$
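The percentile-based estimate and Eq. (4.5) can be sketched as below. The helper names and the nearest-rank percentile rule are our assumptions; `selection` encodes the vector U as a mapping from each AP to its representative.

```python
import math

def estimate_cc(samples, p=85):
    """Estimate the vehicle-AP contact capacity as the p-th percentile
    (nearest-rank rule) of historically observed per-contact download
    volumes; the paper uses p = 85 to avoid insufficient prefetching
    while excluding rare large outliers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def prefetch_volume(rep, selection, cc):
    """PV(i) = max_{j: u_j = i} CC(j) (Eq. 4.5): a representative must
    prefetch enough for the most demanding member of its cluster,
    itself included. `selection` maps AP id -> representative id."""
    return max(v for j, v in cc.items() if selection[j] == rep)

history = [3.0, 4.5, 5.0, 6.0, 6.5, 7.0, 8.0, 9.0, 12.0, 40.0]
cc_est = estimate_cc(history)          # 85th percentile discards the 40.0 outlier
cc = {1: 5.0, 2: 7.5, 3: 6.0}
selection = {1: 1, 2: 1, 3: 1}         # APs 2 and 3 take AP 1 as representative
pv1 = prefetch_volume(1, selection, cc)
```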
4.1.4 Optimizing the Overall Prefetching Gain
Now we can formulate the objective of maximizing the
overall vehicular download volume into an optimization
problem
$$\text{maximize}\quad DG(U) - \sum_{u_i = i} DL_i(PV(i)) \quad (4.6)$$
$$\text{subject to}\quad u_i \in [1, n],\quad \forall i: u_i = u_{u_i}, \quad (4.7)$$
$$u_i \ne i \ \text{ if } BTL_i > threshold,$$
$$u_i = i \ \text{ if } AP_i \text{ has the requested content},$$
$$PV(i) = \max_{u_j = i} CC(j), \quad (4.8)$$
where DG(U), expressed in (4.2), is not expanded here for brevity. Equation (4.6) expresses the net gain of VDV, excluding the possible loss due to cache replacement. Equation (4.7) limits the cluster radius (in Fig. 4) to one hop. Equation (4.8) indicates that AP_i would not be selected for prefetching if its backhaul link is heavily loaded. The Backhaul Traffic Load (BTL) of AP_i is defined as the ratio of the total throughput of AP_i's current flows to its backhaul link capacity. The total throughput of current flows can be calculated by monitoring the packet arrival rate at the application layer, while the backhaul link capacity can be measured using available tools like [20].
To find the optimal selection of prefetching APs, the algorithm needs to compute the net VDV gain for each possible combination of variables in vector U. If variable u_i were binary (i.e., AP_i is either selected as representative or not), the search space would already be of order O(2^n). In reality u_i ∈ [1, n], so the search space grows much faster than in the binary case as n increases. Therefore, we propose a heuristic algorithm, shown in Algorithm 2, to efficiently solve the prefetching-AP selection problem.
Algorithm 2. Representative-AP Selection by AP_c
Output: selection vector U = [u_i]_{1×n}; prefetch volume vector Y = [y_i]_{1×n}
1.  foreach u_i ∈ U do  /* Initialize U */
2.    if AP_i has the requested content then
3.      u_i ← i  // fixed as representative
4.    else if BTL_i ≥ threshold then
5.      u_i ← −1  // undetermined but forbidden to be representative
6.    else
7.      u_i ← 0  // undetermined
8.  foreach y_i ∈ Y do  /* Initialize Y */
9.    y_i ← PV(i)
10. Define matrix G = [g_ij]_{n×n}
11. for i = 1 to n do  /* Initialize G */
12.   for j = 1 to n do
13.     if i == j then
14.       g_ii ← P_c(i) · [D(i, i) − DL_i(PV(i))]
15.     else
16.       t ← max(PV(i), PV(j))
17.       g_ij ← P_c(j) · [D(j, i) − DL_i(t)]
18. while (∃i ∈ [1, n] : u_i ≤ 0) do  /* loop until all in U are determined */
19.   (r, s) ← argmax_{(i,j)} (g_ii + g_ij)
20.   if u_r == −1 or (u_r ≠ 0 and u_r ≠ r) then
21.     Continue
22.   u_r ← r, u_s ← r
23.   y_r ← max(y_r, PV(s)), y_s ← 0
24.   for j = 1 to n do
25.     if y_r ≥ PV(j) and j ≠ s then
26.       g_rj ← P_c(j) · D(j, r)
27.     else if y_r < PV(j) and j ≠ s then
28.       g_rj ← P_c(j) · [D(j, r) + DL_r(y_r) − DL_r(PV(j))]
29. Return U and Y
A matrix G = [g_ij] is defined, where g_ij records the VDV at AP_j when it is a client taking AP_i as representative. On the main diagonal of G, g_ii records the VDV at AP_i when it is a representative itself. Our heuristic algorithm is essentially greedy: at each iteration from lines 18 to 28, it attempts to select a pair of APs in a representative-client relationship with the maximum sum of VDV gains (g_ii + g_ij in line 19). Note that the two APs in the selected pair may be the same; in that case, the AP is a representative with no client selected. When pair (r, s) is selected, we need to update (lines 24 to 28) the elements in the r-th row of G due to the correlation among them. For example, if AP_s selects AP_r as representative, the cost (VDV loss) of another AP, AP_j, selecting AP_r as representative would be reduced accordingly (lines 26 and 28), because this cost is already (partially) included in the pair (r, s). The algorithm terminates when all u_i in U are determined. Then the selection vector U and the prefetch volume vector Y are returned as results. The heuristic algorithm has a complexity of O(n^2), which is much more efficient than the optimal algorithm. In Section 6, we will
ZHANG AND KIAT YEO: ENABLING EFFICIENT WIFI-BASED VEHICULAR CONTENT DISTRIBUTION 485
show that the heuristic algorithm can achieve near-optimal performance.
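A runnable sketch of the greedy selection in Algorithm 2 is given below. It restates the iteration slightly (explicitly skipping already-assigned clients and forbidden representatives so the loop is well defined), uses 0-based indices and a toy linear loss function DL; all names and the toy instance are illustrative, not the paper's code.

```python
def select_representatives(n, Pc, D, DL, PV, BTL, has_content, threshold):
    """Greedy sketch of Algorithm 2. Pc[i]: contact probability of AP_i;
    D[j][i]: expected download volume at AP_j when AP_i holds the data;
    DL(i, v): VDV loss of freeing v units on AP_i; PV: prefetch volumes;
    BTL: backhaul traffic loads. Returns (U, Y)."""
    UNDET, FORBID = -2, -1          # sentinels (paper: 0 and -1, 1-based)
    U = [UNDET] * n
    for i in range(n):
        if has_content[i]:
            U[i] = i                # fixed as representative
        elif BTL[i] >= threshold:
            U[i] = FORBID           # may not become a representative
    Y = list(PV)
    # G[i][j]: VDV gain at client AP_j with AP_i as representative;
    # G[i][i]: gain at AP_i when it represents itself.
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                G[i][i] = Pc[i] * (D[i][i] - DL(i, PV[i]))
            else:
                G[i][j] = Pc[j] * (D[j][i] - DL(i, max(PV[i], PV[j])))
    while any(u < 0 for u in U):
        best, pair = None, None     # pick the pair with maximal joint gain
        for i in range(n):
            if U[i] == FORBID or (U[i] >= 0 and U[i] != i):
                continue            # i cannot (or can no longer) represent
            for j in range(n):
                if j != i and U[j] >= 0:
                    continue        # j is already assigned
                gain = G[i][i] + (G[i][j] if j != i else 0.0)
                if best is None or gain > best:
                    best, pair = gain, (i, j)
        if pair is None:
            break
        r, s = pair
        U[r], U[s] = r, r
        Y[r] = max(Y[r], PV[s])
        if s != r:
            Y[s] = 0
        for j in range(n):          # update row r (lines 24-28)
            if j == s:
                continue
            extra = DL(r, PV[j]) - DL(r, Y[r]) if Y[r] < PV[j] else 0.0
            G[r][j] = Pc[j] * (D[j][r] - extra)
    return U, Y

# toy instance: AP0 holds the content, AP1's backhaul is overloaded
Pc = [0.5, 0.3, 0.2]
D = [[10, 8, 8], [8, 10, 8], [8, 8, 10]]
DL = lambda i, v: 0.1 * v           # toy replacement-loss function
U, Y = select_representatives(3, Pc, D, DL, PV=[2, 2, 2],
                              BTL=[0.1, 0.9, 0.1],
                              has_content=[True, False, False],
                              threshold=0.8)
# AP0 ends up representing all three APs
```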
4.2 More on Storage Management: Prefetching and Caching
Other than prefetching contents in advance for the requesting vehicle, each AP also caches the contents that have been accessed historically by vehicles, in order to speed up future access. The rationale for caching is that the specific contents historically accessed through an AP reflect their local popularity among the vehicles that passed that AP. In this section, we describe how to manage the storage where prefetched contents and cached contents exist simultaneously. Prefetched contents are the ones stored for potential access and yet to be accessed so far; they become cached contents once they are accessed by vehicles passing the hosting AP. Owing to the different properties of these two types of contents, it is difficult to manage them as a whole. Thus, in our system, we manage the cached and prefetched contents separately, i.e., newly prefetched (cached) contents can only replace prefetched (cached) ones when there is not enough storage space.
Cached objects are further categorized into two types: active and idle. An AP notified to prefetch an object does not need to do so if the intended object has already been cached; it only needs to set that object to "active." The other cached objects are considered "idle." An active cached object reverts to idle when the requesting vehicle drives away from the caching AP and its clients.
Assume AP_k has a set of cached objects C_k = {C_1, C_2, ..., C_n}. For each cached object, we define two rates: the local access rate r_l and the activation rate r_a. The local access rate reflects the relative popularity of an object among the locally cached ones and is expressed as $r_{l_i} = f_{l_i} / (\sum_{j \in [1,n]} f_{l_j})$, where f_{l_i} denotes the number of local accesses to object C_i during a recent time period T. The activation rate is defined as $r_{a_i} = f_{a_i} / (\sum_{j \in [1,n]} f_{a_j})$, where f_{a_i} is the number of times object C_i is activated (from idle to active) by other APs during T. A high activation rate of an object indicates that the caching AP is an ideal representative for that object and also implicitly reflects the object's popularity in this neighborhood.
Ejecting a cached object that will be accessed locally by an associated vehicle causes VDV loss, while ejecting an object that will be activated causes no VDV loss but leads to an additional cost in re-prefetching. We measure this re-prefetching cost as the normalized prefetch latency (NPL), which for an object C_i on AP_k is defined as
$$NPL_i = \frac{S(C_i)/B^b_{sk}}{T^{wait}_{ik}}, \quad (4.9)$$
where B^b_{sk} is the achievable bandwidth for AP_k fetching data from the data-origin server. The normalization constant T^{wait}_{ik} is the expected waiting time from when C_i on AP_k is activated until C_i is eventually accessed (by the requesting vehicle or a client AP). The normalized prefetch latency can be perceived as measuring how quickly the data to be prefetched can be prepared before the access.
For ejecting an idle cached object, we combine the VDV
loss and NPL into a single function to define the total cost
$$Cost\big(C_i^{idle}\big) = r_{l_i} \cdot (1 - D(k, origin)) + p_c \cdot r_{a_i} \cdot NPL_i,$$
where origin is the data-origin server from which AP_k fetches C_i, and p_c (0 < p_c < 1) is a penalty coefficient giving a lighter weight to the cost measured by NPL, because VDV loss is much more directly detrimental to our objective of maximizing vehicle users' satisfaction.
For active cached objects, we should also include (4.4) to reflect the additional loss at the client APs that take AP_k as representative:
$$Cost\big(C_i^{active}\big) = NDL_i + r_{l_i} \cdot (1 - D(k, origin)) + p_c \cdot r_{a_i} \cdot NPL_i.$$
Similar to prefetched-object replacement, we arrange the cached objects in increasing order of their normalized cost, Cost(C_i)/S(C_i), and continue replacing the objects from the one with the smallest normalized cost until enough units of space are freed up.
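The two cost functions and the normalized-cost ordering can be sketched as follows; the tuple layout, the toy rates, and the value of the penalty coefficient p_c are our assumptions for illustration.

```python
def eject_order(cached, d_origin, pc=0.5):
    """Rank AP_k's cached objects for replacement by normalized cost
    Cost(C_i)/S(C_i). Each entry of `cached` is
    (name, size, state, r_l, r_a, npl, ndl): state is 'idle' or
    'active'; r_l and r_a are the local-access and activation rates;
    npl is the normalized prefetch latency of Eq. (4.9); ndl (Eq. 4.4)
    is charged only for active objects. d_origin is D(k, origin) and
    pc (0 < pc < 1) is the penalty coefficient."""
    def norm_cost(obj):
        name, size, state, r_l, r_a, npl, ndl = obj
        cost = r_l * (1 - d_origin) + pc * r_a * npl
        if state == "active":
            cost += ndl             # extra loss at client APs
        return cost / size
    return [obj[0] for obj in sorted(cached, key=norm_cost)]

# toy cache: a popular-but-idle object vs. an active representative copy
cached = [("C1", 2, "idle",   0.6, 0.1, 0.5, 0.0),
          ("C2", 2, "active", 0.2, 0.5, 0.5, 0.8)]
order = eject_order(cached, d_origin=0.4)   # C1 is ejected first
```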
5 APPLYING NETWORK CODING IN CCDSV
As discussed in Section 1.1.3, network coding helps increase
the information utility of the stored contents on APs, i.e., the
higher the information utility of a piece of data, the more vehicles find it useful for their download process. However, adopting network coding in a
naive way can still introduce duplicate data pieces among the APs along a vehicle's route, thus reducing the information utility and the effective resource usage. Before showing an
illustrative example and our solutions, we first briefly
describe the implementation of network coding in CCDSV.
Implementation. Chou et al. [17] propose a practical network coding method which bridges the gap between initial theoretical works and implementation in realistic systems. We follow the concepts and methods proposed in
[17] in our work. CCDSV encodes at the piece level, and the data transmissions between the data-origin server and the APs, as well as among APs, are all in units of pieces. The data-origin server divides the original file into N generations, each of which is further divided into M pieces (M is the generation size). A piece can consist of one or multiple packets. By network coding, the server generates an encoded piece by linearly combining the set of pieces in one generation with random coefficients: $p'_i = \sum_{j=1}^{M} c_j\, p_{ij}$, where p_{ij} is the j-th piece in the i-th generation. For decoding purposes, the encoding vector [c_1, c_2, ..., c_M] needs to be
transmitted together. The benefits of piece-level network
coding lie in that both computation complexity and
communication overheads can be flexibly kept at a reason-
able level for different file sizes, by tuning the piece size and
the generation size [21]. After collecting M independent
encoded pieces of a generation together with the corre-
sponding encoding vectors, the vehicle can recover the
original contents of that generation by solving a set of linear
equations. The entire original file can be recovered after
collecting all the generations.
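A minimal sketch of generation-based random linear coding is shown below. For simplicity it works over the prime field GF(257) rather than the paper's GF(2^8) (which would need table-based multiplication), so decoding is plain Gaussian elimination with modular inverses; all names are ours, not CCDSV's implementation.

```python
import random

P = 257  # prime field GF(257) stands in for GF(2^8) in this sketch

def encode(generation, rng):
    """One encoded piece from a generation of M pieces: a random
    linear combination sum_j c_j * p_j (mod P), shipped together
    with its encoding vector [c_1, ..., c_M]."""
    M = len(generation)
    coeffs = [rng.randrange(P) for _ in range(M)]
    piece = [sum(c * p[s] for c, p in zip(coeffs, generation)) % P
             for s in range(len(generation[0]))]
    return coeffs, piece

def decode(coded, M):
    """Recover the original M pieces from >= M encoded pieces by
    solving the linear system over GF(P); dependent extra pieces are
    skipped automatically by the pivot search."""
    A = [list(c) + list(p) for c, p in coded]   # augmented matrix
    for col in range(M):
        piv = next(r for r in range(col, len(A)) if A[r][col])
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], P - 2, P)        # modular inverse
        A[col] = [x * inv % P for x in A[col]]
        for r in range(len(A)):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [(x - f * y) % P for x, y in zip(A[r], A[col])]
    return [row[M:] for row in A[:M]]

rng = random.Random(1)
gen = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]        # one generation, M = 3 pieces
coded = [encode(gen, rng) for _ in range(6)]   # redundant encoded pieces
recovered = decode(coded, M=3)
```

Because the coefficients are random, any M independent encoded pieces suffice, which is exactly the property the prefetch content selection in Section 5.1 exploits.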
5.1 Network Coding-Based Prefetch Content Selection
In this section, we address the question of which parts of the requested file, encoded under network coding, are prefetched onto the selected representative APs. Since the encoding is done at the piece level for each generation of a file, the APs prefetch data in units of pieces. The random combination of pieces in network coding makes no piece in the same generation special or indispensable: the vehicle can recover the original generation as long as enough independent pieces are collected. Thus, we need not consider which specific pieces of a generation are prefetched on the APs, but only how many pieces from each generation are prefetched, under the constraint of the total prefetch volume defined in (4.5).
For the vehicle that triggers the prefetching, the content selection is actually simple: prefetch pieces that are disjoint with the ones the requesting vehicle has already collected. Note that two sets of pieces are "disjoint" if they are either independent pieces in the same generation or belong to different generations. However, we expect the prefetched data to also incidentally satisfy the demands of other vehicles passing by later, in order to increase the information utility of each prefetched content and thus save system resources. Although network coding potentially helps on this point, we still need to design a systematic approach to realize such potential as fully as possible.
We first describe how to represent each AP's storage status, i.e., how much data is in storage for each generation. One accurate method is to use the matrix of encoding vectors of the stored pieces. However, examining and operating on the matrix is time-consuming, and exchanging the matrix with other APs is bandwidth-consuming [22]. Thus, we use a mathematical property of the matrix, its rank, to trade off some accuracy for a quick and succinct reflection of the storage status. The rank indicates the number of independent pieces currently stored for a specific generation. The storage status is then defined as R_i(f, k), which denotes the rank of the pieces of file f's k-th generation stored on AP_i. Note that a storage status with R_i(f, k) = GS(f, k), where GS(·) is the generation size of file f's k-th generation, completely represents the information necessary for recovering the original contents of f's k-th generation.
The example in Fig. 5 illustrates the impact of prefetch content selection. Assume a file f has two generations G_1 and G_2, each with generation size GS(f, 1) = GS(f, 2) = 6. Now, AP_B is asked to prefetch three pieces of f for a vehicle V, which has already collected three independent pieces of G_1. The storage status is marked next to each AP. Assume all the pieces stored on A, C, D, E are independent of each other. Now we consider two casual selections: (I) B prefetches three pieces from G_1, i.e., R_B(f, 1) = 3; (II) B prefetches three pieces from G_2, i.e., R_B(f, 2) = 3. Both are suitable for V, because both provide it disjoint pieces. However, when considering two AP-contact sequences $\tau_1$ = {A, B, C} and $\tau_2$ = {D, E, B}, we notice that these casual selections are less effective in terms of information utility. Selection (I) would generate R_B(f, 1) + R_A(f, 1) + R_C(f, 1) − GS(f, 1) = 2 duplicate pieces of G_1 on $\tau_1$; thus the vehicles following $\tau_1$ would find two pieces on the downstream AP C useless, since A and B already provide all the data needed to recover G_1. Similarly, selection (II) would generate R_B(f, 2) + R_D(f, 2) + R_E(f, 2) − GS(f, 2) = 1 duplicate piece on $\tau_2$. The optimal selection for B is to prefetch one piece from G_1 and two pieces from G_2. Such a selection generates no duplicate pieces on either $\tau_1$ or $\tau_2$ while satisfying the requirement on the prefetch volume.
To optimize the information utility of the prefetched contents, the prefetching AP should factor in all the contact sequences passing through it, the occurrence probability of each sequence, and the storage status of each AP on each sequence. Obviously, such a process is computation-intensive. Thus, we devise a lightweight heuristic algorithm based only on the storage status of the neighboring APs in the contact map. For an AP ap_x, we define the RankSum

$$RS(x, f, k) = \sum_{ap_i \in NB(x)} R_i(f, k), \quad (5.1)$$

to reflect the overall storage status in ap_x's neighborhood pertaining to file f's k-th generation. Here, NB(·) denotes the set of neighbors in the contact map. We arrange the generations that the requesting vehicle has not completed in increasing order of RS. The prefetching then starts from the generation with the smallest RS and continues until the required prefetch volume has been prefetched, with the number of prefetched pieces for each generation capped at that generation's size. Intuitively, prefetching pieces from the generation with the smaller RankSum results in a lower probability that these prefetched pieces duplicate other pieces stored in the neighborhood.
6 PERFORMANCE EVALUATION
In this section, we implement CCDSV in NS-2.34 [23] and evaluate its performance under two typical realistic vehicular scenarios. We compare CCDSV with other existing systems from various aspects and also evaluate CCDSV under several alternative system design options.
6.1 Methodology
We feed NS-2.34 with realistic vehicular traces generated by the multiagent microscopic traffic simulator (MMTS) [24] developed at ETH Zurich. This traffic simulator is able to simulate public and private traffic over real regional road maps of Switzerland with a high level of realism. The entire set of traces includes around 260,000 vehicles over a period of 24 hours in an area of 250 km by 260 km. For manageable simulation time, we highlight two
Fig. 5. Prefetch content selection.
smaller areas in Zurich: Oberstrass and Zentrum. We take Oberstrass (4,000 m × 7,000 m) as a dense scenario with 520 vehicles moving in it and 32 access points deployed along the roads, and Zentrum (4,685 m × 4,010 m) as a sparse scenario with 293 vehicles and 29 APs. These two scenarios are depicted in Figs. 6 and 7, where the curves indicate the trajectories of vehicles and the black squares are the deployed APs. The transmission range of each AP is about 230 meters and the transmission bit rate is set at 11 Mbps [3]. The bandwidth of the backhaul link from an AP to the Internet (data-origin server) is 3 Mbps. Throughout the simulation, the data transmission between AP and vehicle is based on TCP. When the vehicle receives enough encoded packets to recover the original file, it immediately schedules a file-acknowledgment destined to the transmitting AP, which stops the transmission after receiving that acknowledgment.
The contents on the data-origin server are requested according to a Zipf-like distribution, which is commonly used to model Web content popularity [25]. We arrange N (= 500 in our setting) files on the server in increasing order of popularity. The probability of the i-th file being requested follows $P(i; \alpha) = (1/i^{\alpha}) / (\sum_{k=1}^{N} 1/k^{\alpha})$, where α is a parameter characterizing the skewness of the Zipf-like distribution. In our setting, we choose α = 0.8 according to the results in [25]. Each file's size (in MBytes) is chosen randomly from the set {4, 5, 6, 7, 8}. We fix the generation size and generation number of each file at 10 and 15, respectively; the piece size scales accordingly with the file size. The encoding coefficients are randomly chosen from the Galois field GF(2^8). A vehicle requests a new file only when the previous one is completely satisfied. Similar to the study in [26], we describe the storage capacity of each AP in terms of the relative storage size, defined as the ratio of the storage size to the total size of all files requested by the vehicle users. The default relative storage size in our simulation is set at 5 percent, assumed to be the same for all APs.
We are mainly interested in evaluating the following two metrics:
. Download rate. The average download throughput (in Kbps) perceived by vehicular users driving across the deployed APs.
. Backhaul traffic. The average data amount per second (in Kbps) flowing into each AP through the backhaul link during the simulation. The backhaul traffic introduced by the prefetching service is expected to be as low as possible to avoid overloading the APs; the saved bandwidth can be exploited to host more data services.
In addition to the performance evaluations in the following sections, we also conduct another set of simulations to evaluate the storage usage and the impact of storage sizes under various algorithms in Appendices A and B, respectively, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.142.
6.2 Performance Comparison
We compare the proposed CCDSV with other existing systems from two aspects: prefetching-AP selection and encoding method.
6.2.1 Prefetching AP Selection
We compare our representative-based AP selection method with three other methods:
. basic: performs no prefetching and serves as a baseline case,
. most-likely: used in [5], [7], and [8]; prefetches at the APs along the most likely driving trajectory among the lookahead-APs, and
. on-all: used in [14]; prefetches on all the lookahead-APs.
We implement the representative-based method with both heuristic and optimal algorithms:
. rep-heu: refer to Algorithm 2, and
. rep-opt: refer to (4.6); this serves as the performance upper bound for rep-heu.
Figs. 8 and 9 show the comparison results for download rate and backhaul traffic under different percentages of active vehicular users in the Oberstrass and Zentrum scenarios. Active vehicular users are defined as the percentage of vehicles that have content requests; a higher percentage of active vehicular users indicates a higher workload on the system.
In the figures for download rate, we can see that rep-heu performs quite close to the optimal algorithm, which proves the effectiveness of the proposed heuristic. Both rep-heu and rep-opt achieve a higher download rate than the other three methods under all percentages of active vehicular users. As the active users increase, rep-heu and rep-opt also degrade in download throughput much more slowly than most-likely and on-all. This is because our representative-based method successfully prevents the frequent replacement of prefetched objects from nullifying the benefits of prefetching, by selecting as representatives the APs with small object
Fig. 6. Oberstrass area with 32 APs and 520 vehicles.
Fig. 7. Zentrum area with 29 APs and 293 vehicles.
replacement loss and light traffic loads. The download rate of on-all is the most sensitive to the number of active users and drops fast as the latter increases.
Next, we look at the figures for introduced backhaul traffic. As the active vehicular users grow, the traffic injected into the APs through the backhaul link increases for all methods. The representative-based methods (rep-heu and rep-opt) show a graceful, slow increase, with less than 30 percent load on the backhaul link even under 100 percent active users, while on-all has the fastest increase and costs nearly 50 percent of the backhaul bandwidth. When the system has a small workload (active users between 10 and 30 percent), the backhaul traffic introduced by the representative-based methods is similar to that of on-all, since the former tend to select all the potential APs as representatives when system resources are abundant. When active users increase above 30 percent in our case, the selection becomes more and more prudent, resulting in remarkable savings in backhaul traffic.
6.2.2 Encoding Methods
In Section 5, we describe the benefits of network coding in our content distribution system and also discuss its potential advantages over erasure coding, which is a type of source coding. Now, we quantitatively show the benefits of network coding in terms of both download rate and backhaul traffic.
We compare our network coding with two other methods.
. Chunk-based (used in [5]): this method simply divides the file into chunks and keeps the original data form without any coding. The prefetching AP will prefetch one or multiple chunks. In the simulation, we assume each file is divided into 10 chunks, which equals the generation size in network coding.
. Erasure-coding (used in [6]): in the simulation, we divide each file into 10 chunks (the same as chunk-based and network-coding) and set the redundancy factor to 2, i.e., any 10 out of 20 encoded chunks can recover the original file. A brief description of erasure coding as well as the comparison between network coding and erasure coding is elaborated in Appendix C, which is available in the online supplemental material.
Figs. 10 and 11 show the comparison results. In Fig. 10, for both areas, network-coding has the highest download rate while chunk-based is the worst. Owing to the dramatically reduced duplicate data pieces on APs, coding (erasure or network) generates a much higher download rate than the non-coding method (chunk-based), especially under a large percentage of active vehicular users. When active
Fig. 9. Zentrum area: download rate and backhaul traffic under different selection of prefetching APs.
Fig. 10. Download rate in the Oberstrass and Zentrum areas under different coding methods.
Fig. 8. Oberstrass area: download rate and backhaul traffic under different selection of prefetching APs.
users grow to 70 percent, the download rate of network-coding is 10.5 percent higher than that of erasure-coding and 43 percent higher than that of chunk-based.
Another direct benefit of network-coding is the reduction of the introduced backhaul traffic, achieved by increasing the utility of each stored piece of data. Fig. 11 shows that network-coding reduces the backhaul traffic, on average, by 40 percent compared to chunk-based and by 12 percent compared to erasure-coding.
6.3 Impact of Mobility Prediction Accuracy
To support our claim that the representative-based method in CCDSV is robust to the mobility prediction accuracy and can maintain stable system performance, we evaluate the system under different accuracies. However, such accuracy is not controllable in our system. Thus, we design a tunable predictor (T-predictor) that can deterministically set the prediction accuracy via a parameter p, defined as the fraction of predictions that correctly identify the next-contact AP. The T-predictor makes predictions based on the generated vehicle-AP contact trace instead of the learning process. Hence, the T-predictor knows exactly a vehicle's 1) potential next-contact APs, $\Phi$ = {ap_1, ap_2, ..., ap_n}, arranged in decreasing order of transition probability; and 2) actual next-contact AP, $ap_t \in \Phi$.
For brevity, we plot the download rate and backhaul traffic only for the Oberstrass area with 50 percent active vehicular users. For comparison with the representative-based method, we also plot the on-all and most-likely methods. The results are shown in Fig. 12. We can observe that, as the prediction accuracy grows, the download rate of most-likely increases markedly, matching the expectation that its performance depends largely on the predictor's accuracy. on-all does not change with the accuracy, because it selects all the potential APs regardless of the prediction accuracy. Both rep-heu and rep-opt maintain the download rate at a high level even at an accuracy of 10 percent. The download rate increases steadily as the prediction accuracy varies from 10 to 100 percent, since the representative-based method also explicitly factors the transition probabilities into the selection of representative APs.
When the prediction accuracy reaches 100 percent, most-likely always places the prefetched data at the actual next-contact AP in advance. Even in this ideal case, the representative-based method still achieves a better download rate than most-likely. This can be explained as follows: the accurately predicted AP may be a hot one, busily responding to a large number of prefetching requests; the correctly placed data can then evict some prefetched objects, whose loss may offset part of the performance gain. In contrast, the representative-based method can, in the case of hot APs, select a lightly loaded AP nearby as the representative to help the hot AP prefetch as well as transmit data. The overall results show that the representative-based prefetching-AP selection method benefits from improved prediction accuracy and is also robust to its variation.
In Fig. 12b, as the prediction accuracy increases, the backhaul traffic introduced by both rep-heu and rep-opt decreases. This is because the set of selected representatives shrinks and concentrates on the APs with high transition probability as the prediction accuracy increases.
6.4 Impact of Prefetch Contents Selection
Our rank-sum-based selection method tries to approach the maximal information utility of the prefetched contents. One of its main benefits is reduced backhaul traffic, i.e., a lower backhaul bandwidth cost and load on the data-origin servers. In this part, we compare the average backhaul traffic introduced by rank-sum with two other selection methods.
. Sequential. The prefetched pieces are selected from incomplete generations sequentially.
. Random. Each time, it randomly selects an incomplete generation and prefetches pieces from it; the process continues until the prefetch volume is reached.
Fig. 11. Backhaul traffic in the Oberstrass and Zentrum areas under different coding methods.
Fig. 12. Download rate and backhaul traffic in Oberstrass, under 50 percent active vehicular users and different prediction accuracies.
For a comprehensive comparison, we also measure and include in the results the case with prefetching but no network coding (no-coding). The results are shown in Figs. 14 and 13.
As Fig. 14 shows, rank-sum reduces the backhaul traffic, on average, by around 50 percent compared to no-coding, under all three percentages of active users in both areas. Under a small percentage of active users (e.g., 30 percent), the traffic reduction of rank-sum compared to sequential and random is limited, around tens of Kbps. This is because, when the amount of stored contents in the network of APs is relatively small, the probability of duplicate pieces existing among nearby APs is low regardless of the selection method used. However, as the percentage increases, the traffic reduction due to the informed selection in rank-sum grows (up to 22 percent under 70 percent active users) compared to the other two casual selection methods. We can also observe that random performs better than sequential selection, owing to the introduced randomness, which helps reduce the probability of a vehicle finding duplicate pieces among nearby APs.
In Fig. 13, we notice that rank-sum shows a dramatic improvement in download rate compared to the other three methods, especially under a high percentage of active users. This is mainly because both the frequency and the loss of replacing prefetched objects on APs are lowered as a result of the reduced traffic loads and the smaller amount of data needed for prefetching.
6.5 Protocol Overheads and Effectiveness
Protocol overheads. Control messages in CCDSV are used in several ways: 1) a vehicle notifies the previously contacted AP of the transition contexts; 2) when selecting representative APs, an AP communicates with its lookahead-APs; and 3) neighboring APs in the contact map periodically exchange storage status with one another. In this part, we measure the protocol overheads, i.e., how many control messages on average are introduced into each AP per second during the simulation. Table 2 shows the results. The last row gives the percentage of control messages relative to the average backhaul traffic. We can see that even under a 70 percent data request percentage, the control messages account for only 2.18 percent of the average backhaul traffic.
Protocol effectiveness. An indication of the protocol's effectiveness is whether the system introduces too much useless prefetched data that would never be retrieved. This useless portion would burden system resources, such as backhaul link bandwidth and storage space. Here, we define the Utilization Rate to measure the percentage of the prefetched data on an AP that is finally retrieved, either by vehicles or by the client APs. The higher the utilization rate, the more effective the protocol. Table 3 shows the results. We can see that the utilization rate is maintained at a high level (>70%) in all cases, and as the data request percentage increases, the prefetched data is better utilized.
7 CONCLUSION
In this paper, we propose CCDSV, a cooperative content distribution system for vehicles through infrastructure APs. CCDSV is designed to achieve efficient cooperation among the network of APs so that vehicular users can effectively utilize the opportunistically encountered and short-lived AP connections. A structure called the contact map is maintained in a distributed manner on top of the APs, learning and predicting the potential vehicle-AP contacts. With the representative-based scheme, CCDSV carefully selects, from the predicted set of lookahead-APs, the APs to perform prefetching, in order to avoid overloading an AP's backhaul link and ejecting prefetched/cached data whose loss may overwhelm the prefetching benefits. CCDSV distributes contents encoded by network coding, and to maximize the information utility under network coding, it selects each piece of prefetched content based on the storage status (reflected by the metric rank-sum) of the neighboring APs in the contact map. We summarize the system control flow of CCDSV in Appendix D, which is available in the online supplemental material. The simulation results under various scenarios prove the effectiveness of CCDSV in many performance aspects.
Fig. 13. Backhaul traffic in Oberstrass and Zentrum Area, under different content selection methods.
Fig. 14. Download rate in Oberstrass and Zentrum Area, under different content selection methods.
TABLE 2: Protocol Overheads in Oberstrass Area
TABLE 3: Utilization Rate of Prefetched Data
REFERENCES
[1] The New York Times, "Customers Angered as iPhones Overload AT&T," http://www.nytimes.com/2009/09/03/technology/companies/03att.html, 2012.
[2] B. Hull, V. Bychkovsky, Y. Zhang, K. Chen, M. Goraczko, A. Miu, E. Shih, H. Balakrishnan, and S. Madden, "CarTel: A Distributed Mobile Sensor Computing System," Proc. Fourth Int'l Conf. Embedded Networked Sensor Systems (SenSys '06), pp. 125-138, 2006.
[3] J. Eriksson, H. Balakrishnan, and S. Madden, "Cabernet: Vehicular Content Delivery Using WiFi," Proc. 14th ACM Int'l Conf. Mobile Computing and Networking, pp. 199-210, 2008.
[4] M. Dischinger, A. Haeberlen, K. Gummadi, and S. Saroiu, "Characterizing Residential Broadband Networks," Proc. ACM SIGCOMM Conf. Internet Measurement (IMC), pp. 24-26, 2007.
[5] P. Deshpande, A. Kashyap, C. Sung, and S. Das, "Predictive Methods for Improved Vehicular WiFi Access," Proc. Seventh Int'l Conf. Mobile Systems, Applications, and Services (MobiSys), pp. 263-276, 2009.
[6] Y. Huang, Y. Gao, K. Nahrstedt, and W. He, "Optimizing File Retrieval in Delay-Tolerant Content Distribution Community," Proc. IEEE 29th Int'l Conf. Distributed Computing Systems (ICDCS), pp. 308-316, 2009.
[7] B. Chen and M. Chan, "MobTorrent: A Framework for Mobile Internet Access from Vehicles," Proc. IEEE INFOCOM, pp. 1404-1412, 2009.
[8] U. Shevade, Y.-C. Chen, L. Qiu, Y. Zhang, V. Chandar, M.K. Han, H.H. Song, and Y. Seung, "Enabling High-Bandwidth Vehicular Content Distribution," Proc. ACM Int'l Conf. Emerging Networking Experiments and Technologies (CoNEXT), pp. 23:1-23:12, 2010.
[9] D. Zhang and C.K. Yeo, "A Cooperative Content Distribution System for Vehicles," Proc. IEEE Global Telecomm. Conf. (Globecom '11), 2011.
[10] A.J. Nicholson and B.D. Noble, "Breadcrumbs: Forecasting Mobile Connectivity," Proc. ACM MobiCom, pp. 46-57, 2008.
[11] L. Song, D. Kotz, R. Jain, and X. He, "Evaluating Location Predictors with Extensive Wi-Fi Mobility Data," Proc. IEEE INFOCOM, vol. 2, pp. 1414-1424, 2004.
[12] A.J. Nicholson, Y. Chawathe, M.Y. Chen, B.D. Noble, and D. Wetherall, "Improved Access Point Selection," Proc. Fourth Int'l Conf. Mobile Systems, Applications and Services (MobiSys '06), pp. 233-245, 2006.
[13] J. Pang, B. Greenstein, M. Kaminsky, D. McCoy, and S. Seshan, "WiFi-Reports: Improving Wireless Network Selection with Collaboration," Proc. Seventh Int'l Conf. Mobile Systems, Applications, and Services (MobiSys '09), pp. 123-136, 2009.
[14] A. Mishra, M. Shin, and W. Arbaugh, "Context Caching Using Neighbor Graphs for Fast Handoffs in a Wireless Network," Proc. IEEE INFOCOM, vol. 1, 2004.
[15] A. Balasubramanian, B.N. Levine, and A. Venkataramani, "Enhancing Interactive Web Applications in Hybrid Networks," Proc. ACM MobiCom, pp. 70-80, 2008.
[16] M. Fiore and J. Barcelo-Ordinas, "Cooperative Download in Urban Vehicular Networks," Proc. IEEE Sixth Int'l Conf. Mobile Adhoc and Sensor Systems (MASS '09), pp. 20-29, 2009.
[17] P. Chou, Y. Wu, and K. Jain, "Practical Network Coding," Proc. Ann. Allerton Conf. Comm. Control and Computing, vol. 41, pp. 40-49, 2003.
[18] Akamai White Papers, "Leveraging the Edge: Delivering Unmatched Performance for Large File Downloads," http://www.akamai.com/dl/whitepapers/leveraging_edge_wp.pdf, 2012.
[19] W. Tan, W. Lau, O. Yue, and T. Hui, "Analytical Models and Performance Evaluation of Drive-thru Internet Systems," IEEE J. Selected Areas in Comm., vol. 29, no. 1, pp. 207-222, Jan. 2011.
[20] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, and L. Cottrell, "pathChirp: Efficient Available Bandwidth Estimation for Network Paths," Proc. Passive and Active Measurement Workshop, vol. 4, 2003.
[21] S. Lee, U. Lee, K. Lee, and M. Gerla, "Content Distribution in VANETs Using Network Coding: The Effect of Disk I/O and Processing O/H," Proc. Fifth Ann. IEEE Comm. Soc. Conf. Sensor, Mesh and Ad Hoc Comm. and Networks (SECON '08), pp. 117-125, 2008.
[22] M. Li, Z. Yang, and W. Lou, "CodeOn: Cooperative Popular Content Distribution for Vehicular Networks Using Symbol Level Network Coding," IEEE J. Selected Areas in Comm., vol. 29, no. 1, pp. 223-235, Jan. 2011.
[23] The VINT Project, "The Network Simulator - ns-2," http://www.isi.edu/nsnam/ns/index.html, 2012.
[24] V. Naumov, R. Baumann, and T. Gross, "An Evaluation of Inter-Vehicle Ad Hoc Networks Based on Realistic Vehicular Traces," Proc. MobiHoc, vol. 6, pp. 108-119, 2006.
[25] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, "Web Caching and Zipf-Like Distributions: Evidence and Implications," Proc. IEEE INFOCOM, vol. 1, pp. 126-134, 1999.
[26] X. Tang and S. Chanson, "Coordinated En-Route Web Caching," IEEE Trans. Computers, vol. 51, no. 6, pp. 595-607, June 2002.
[27] J. Byers, M. Luby, and M. Mitzenmacher, "A Digital Fountain Approach to Asynchronous Reliable Multicast," IEEE J. Selected Areas in Comm., vol. 20, no. 8, pp. 1528-1540, Oct. 2002.
Da Zhang received the BEng degree in computer software engineering from Tianjin University, China, in 2008. Currently, he is working toward the PhD degree in the School of Computer Engineering at the Nanyang Technological University (NTU), Singapore. His research interests include vehicular networks, mobile computing and ad hoc network security. He is a member of the IEEE.
Chai Kiat Yeo received the BEng (Hons) and MSc degrees, both in electrical engineering, in 1987 and 1991, respectively, from the National University of Singapore, and the PhD degree from the School of Electrical and Electronics Engineering, Nanyang Technological University (NTU), Singapore, in 2007. She was a principal engineer with Singapore Technologies Electronics and Engineering Limited prior to joining NTU in 1993. She was the deputy director of the Centre for Multimedia and Network Technology (CeMNet) in Nanyang Technological University, Singapore, before her current appointment as the associate chair (Academic) with the School of Computer Engineering, NTU. Her current research interests include ad hoc and mobile networks, delay tolerant networks, overlay networks, speech processing and enhancement.
492 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 3, MARCH 2013