
38 PERVASIVE computing Published by the IEEE CS and IEEE ComSoc ■ 1536-1268/06/$20.00 © 2006 IEEE

INTELLIGENT TRANSPORTATION SYSTEMS

Enhancing Security and Privacy in Traffic-Monitoring Systems

This architecture separates data from identities by splitting communication from data analysis. Data suppression techniques can help prevent data mining algorithms from reconstructing private information from anonymous database samples.

Baik Hoh, Marco Gruteser, and Hui Xiong
Rutgers University

Ansaf Alrabady
General Motors

Intelligent transportation systems increasingly depend on probe vehicles to monitor traffic: they can automatically report position, travel time, traffic incidents, and road surface problems to a telematics service provider. This kind of traffic-monitoring system could provide good coverage and timely information on many more roadways than is possible with a fixed infrastructure such as cameras and loop detectors. This approach also promises significant reductions in infrastructure cost because the system can exploit the sensing, computing, and communications devices already installed in many modern vehicles.

Although these applications can improve travelers’ safety, optimize traffic, and provide a new revenue source for car manufacturers, they also raise questions about privacy. Because users’ vehicles provide data samples that include their current positions and user identities to location-monitoring services, the drivers can be tracked. Unfortunately, anonymous data collection doesn’t solve this privacy problem. First, it conflicts with security (in particular, data integrity), which requires user identification. Second, even if location samples are anonymous, users can be reidentified through data mining techniques (so-called inference attacks).

The architecture we propose meets these privacy and data integrity requirements. It addresses privacy by separating the communication and authentication tasks (which rely on pseudonyms or identities) from data analysis and sanitization (which require access to detailed position information). Because these functions share only well-defined messages, only anonymous position information is available at the traffic-monitoring service.

We conducted a case study to evaluate how vulnerable this information is to inference attacks, and found that clustering techniques can automatically identify many vehicles’ likely home locations in a typical suburban scenario. This is grounds for concern, because attackers could link home locations to household names using geocoded address databases to identify drivers. However, the results also show that data suppression techniques such as reducing sampling frequency effectively lower such reidentification risks.

Traffic-monitoring with probe vehicles

Our traffic-monitoring application estimates travel time for different routes using real-time traffic flow information. It derives that information from probe vehicle speed on different road segments. Probe vehicles monitor the road environment through in-vehicle sensors.1

Xiaowen Dai and her colleagues have determined that we can derive useful traffic flow information if 5 percent of vehicles act as probes.2 This approach promises reduced infrastructure installation and maintenance costs while extending sensing coverage to less-traveled roadways.

Figure 1 shows a typical traffic-monitoring architecture comprising three entities: probe vehicles, a communication service provider (CSP), and a telematics service provider (TSP), perhaps a subsidiary of a vehicle manufacturer. Telematics is the use of GPS technology integrated with computers and mobile communication technology in vehicles. The TSP connects vehicles to a main server, perhaps through base stations leased from CSPs. Probe vehicles carry GPS receivers and communication infrastructure such as cellular links to periodically report data records (latitude, longitude, time, and speed parameters) to the traffic information system. From this information, the system can estimate current mean vehicle speed and then feed it into navigation systems or use it to build a real-time congestion map (for example, by calculating a congestion index). It can also use vehicle speed to estimate traffic density and volume using Greenshields’ equation.3 The system can then broadcast estimated traffic information to subscribers through a Web interface, where drivers can access it through their navigation systems or from home or office computers.
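The density and volume estimate mentioned above can be sketched as follows. This is a minimal illustration, assuming the standard linear Greenshields speed-density relation v = v_f (1 - k / k_j); the function names and the free-flow speed and jam density values are illustrative assumptions, not parameters taken from the article.

```python
def greenshields_density(speed_kmh, free_flow_kmh, jam_density_veh_per_km):
    """Density k = k_j * (1 - v / v_f), from the linear Greenshields model.

    free_flow_kmh (v_f) and jam_density_veh_per_km (k_j) are road-specific
    calibration values; the ones used in the example are assumed.
    """
    # Clamp the speed ratio so noisy readings cannot yield negative density.
    ratio = min(max(speed_kmh / free_flow_kmh, 0.0), 1.0)
    return jam_density_veh_per_km * (1.0 - ratio)


def greenshields_flow(speed_kmh, free_flow_kmh, jam_density_veh_per_km):
    """Traffic volume q = k * v, in vehicles per hour."""
    k = greenshields_density(speed_kmh, free_flow_kmh, jam_density_veh_per_km)
    return k * speed_kmh


# Mean probe speed of 50 km/h on a road with an assumed free-flow speed of
# 100 km/h and an assumed jam density of 120 vehicles/km.
density = greenshields_density(50.0, 100.0, 120.0)
flow = greenshields_flow(50.0, 100.0, 120.0)
```

Under this model, volume peaks at half the free-flow speed, so the same volume can correspond to either a congested or an uncongested state; the reported speed is what disambiguates the two.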

Security and privacy challenges

The primary security and privacy challenges that traffic-monitoring applications face are to ensure the integrity of data samples containing speed and position information and to maintain privacy for the drivers who supply the samples. (For more information, see the “Related Work in Security and Privacy” sidebar.)

Data integrity

The integrity of the computed congestion index relies on genuine speed and position data from the probe vehicles. Malfunctioning probes and malicious parties who modify sensor readings can affect data integrity. Although malicious attacks on traffic monitoring might sound far-fetched, they appear quite plausible if you consider the gray-market devices people now buy to reduce travel time (such as infrared transmitters to change traffic lights). These new devices might manipulate the congestion index to divert traffic away from a road to reduce a particular driver’s travel time or toward a particular roadway to increase revenue at a particular store. Other service providers might also try to dilute the information quality of a competing traffic-monitoring service.

Various entities can compromise data integrity:

• Compromised vehicles. Drivers or third parties could modify the hardware or software to report incorrect vehicle positions or speed readings. (Such modifications have occurred in European trucks’ tachographs, which are supposed to record vehicles’ driving times and speed to let authorities check adherence to mandatory driver rest periods.4)

• Impostor devices. A device could spoof other authorized devices. This compromise is of particular concern in the form of a Sybil attack,5 in which a device illegitimately claims multiple identities. Traffic-monitoring accuracy will degrade more if many vehicles simultaneously report incorrect information.

• Network intermediaries. The transmission of vehicle data over wireless and wired communication links enables intermediate network entities to modify reports.

Privacy

Proactively addressing privacy concerns in the architecture increases the potential for users to adopt the traffic-monitoring service and reduces the risk of public data-handling mishaps. Location information collected by probe vehicles raises privacy concerns because it’s often precise enough to pinpoint the buildings that drivers visited, at least in suburban areas where each building has its own parking lot. Reconstructing an individual’s route could provide a detailed movement profile that allows sensitive inferences. For example, recurring visits to a medical clinic could indicate illness; visits to activist organizations could hint at political opinions. While everyone’s location traces deserve protection, those of political leaders, celebrities, or business leaders would likely undergo particular scrutiny. For example, frequent meetings between chief executives might indicate a pending merger or acquisition, which is highly desirable information for competitors and stock market speculators.

Figure 1. Traffic-monitoring architecture comprising three entities: probe vehicles, a communication service provider, and a telematics service provider. (Diagram labels: GPS; probe vehicles Alice and Bob; GPS traces; communication service provider; telematics service provider; road conditions; colored congestion map, 0-10.)

Various entities can compromise privacy:

• Eavesdroppers. Unauthorized third parties could monitor network transmissions for vehicle position readings and unique identifiers that would let them track vehicles. In particular, third parties could monitor wireless transmissions around particularly sensitive locations to record which vehicles recurrently visit this area. Network identifiers, such as the international mobile subscriber identifiers (IMSI) in the GSM (Global System for Mobile Communications) cell phone system, help identify recurring visits.

• Spyware. People with access to the on-board vehicle system could install software that directly reports vehicle positions to unauthorized network servers.

• Insiders. Privacy breaches through insiders are particularly insidious at the traffic-monitoring server (we’ll describe this in detail later), which receives and stores reports from large numbers of different vehicles. Although access control mechanisms provide some protection, several individuals, such as system administrators, typically have root access to the system. (On 8 April 2006, Information Week posted a chronology of data breaches reported since the ChoicePoint incident. ChoicePoint, which maintains and sells background files on every American adult by selecting from public and private records, reported on 15 February 2005 that its 145,000 customers were at risk for identity theft. Most incidents were due to current, authorized employees in the victim companies.)

There’s tension between integrity and privacy requirements. A true privacy compromise requires not only determining the places visited but also identifying the vehicle and driver. Thus, privacy can be significantly enhanced if the vehicles anonymously report position and speed information. The system can better maintain integrity, however, if the vehicles identify themselves and their identities can be authenticated. Strong authentication combined with an authority that issues and registers vehicle identities can prevent a Sybil attack, perhaps the main concern with regard to data integrity.

Architecture for anonymous data collection

To resolve the tension between data integrity and privacy, the architecture assigns the authentication and filtering functions and the actual data analysis to separate entities. One entity knows the vehicle’s identity but can’t access precise position and speed information; the other entity knows position and speed but not identity. The architecture also relies on encryption to prevent eavesdropping, tamper-proof hardware to reduce the risk of node compromise and spyware installation, and data sanitization to further strengthen data integrity.

Related Work in Security and Privacy

To allow authentication of messages from vehicles while maintaining a degree of anonymity in vehicular networks, Maxim Raya and Jean-Pierre Hubaux propose frequently changing the signing keys.1 To implement this, you could preload each vehicle with numerous anonymous public/private key pairs. Using asymmetric keys can eliminate the key agreement step in this solution. But storing keys in vehicles might allow a Sybil attack unless trusted computing hardware protects the keys. To implement this, Bryan Parno and Adrian Perrig propose installing reanonymizers (hardware that issues a fresh ID in response to valid, temporary certificates) in stoplights or tollbooths at regular intervals to refresh vehicles’ anonymous keys.2 These solutions are primarily intended for vehicle-to-vehicle communications. Our proposed architecture for vehicle-to-infrastructure communication uses fewer keys and enables easier key revocation when vehicles’ keys have been compromised.

Researchers have also addressed the question of balancing security and privacy in application areas such as electronic cash and electronic voting. Pioneered by David Chaum and Eugène Van Heyst,3 many proposals rely on group signatures that can verify a message sender’s group membership while maintaining the sender’s anonymity. Dan Boneh and his colleagues reduced the size of group signatures, potentially enabling their use in challenging wireless network environments.4 Although group signatures provide a possible alternative solution, they also require a trustworthy third party to enable key revocation (to determine the original signer) and lead to large message overhead of about 200 bytes per signature. We’ve opted for less complex cryptographic primitives.

REFERENCES

1. M. Raya and J.-P. Hubaux, “The Security of Vehicular Ad Hoc Networks,” Proc. 3rd ACM Workshop Security of Ad Hoc and Sensor Networks (SASN 05), ACM Press, 2005, pp. 11–21.

2. B. Parno and A. Perrig, “Challenges in Securing Vehicular Networks,” Proc. 4th Workshop Hot Topics in Networks (HotNets-IV), ACM SIGCOMM, 2005; www.acm.org/sigs/sigcomm/HotNets-IV/program.html.

3. D. Chaum and E. Van Heyst, “Group Signatures,” Advances in Cryptology: EUROCRYPT 91, LNCS 547, Springer, 1991, pp. 257–265.

4. D. Boneh, X. Boyen, and H. Shacham, “Short Group Signatures,” 24th Ann. Int’l Cryptology Conf. (CRYPTO 04), LNCS 3152, Springer, 2004, pp. 41–55.

Figure 2 illustrates the entities and cryptographic schemes involved in transmitting a data sample from a vehicle. We distinguish the communication server (CS) from the traffic server (TS). The CS, which a CSP provides, maintains network connections and authenticates users but doesn’t access location and speed data. The TS receives anonymous data from the CS, decrypts and sanitizes it, and computes the real-time congestion maps. In a real implementation, a cellular-phone service provider could provide the CS, and a TSP could provide the traffic server. The two parties would likely enter a contractual relationship as business partners that would prohibit the exchange of any privacy-sensitive information beyond that specified in this architecture. To further increase user confidence, an independent agency could audit the information exchange between these parties.

One pair of keys enables encryption between the TS and vehicles. Every vehicle knows the TS’s public key KTS and uses it to encrypt a location sample. We refer to this encrypted message as a data segment (DS). Because the DS can be decrypted only with the TS’s private key, this layer of encryption protects location privacy against eavesdroppers.

The CS shares a separate symmetric key Kveh with each vehicle and knows the network identifiers (such as IMSI in GSM networks). Using this key, the CS can authenticate incoming data samples and ensure that authorized probe vehicles are transmitting them. If they’re valid, the CS then removes all network identifiers and the vehicle’s message authentication code (MAC) and attaches its own MAC, computed with a third key KCS established between the TS and the CS.
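The two hops described above can be sketched as follows. This is a minimal illustration rather than the authors’ implementation: HMAC-SHA256 stands in for the keyed hash Hash{DS, K}, the data segment is treated as opaque bytes already encrypted under KTS (the public-key step is omitted), and the dictionary layout and function names are hypothetical.

```python
import hashlib
import hmac


def keyed_hash(data_segment: bytes, key: bytes) -> bytes:
    # Assumption: HMAC-SHA256 plays the role of Hash{DS, K}.
    return hmac.new(key, data_segment, hashlib.sha256).digest()


def vehicle_send(data_segment: bytes, imsi: str, k_veh: bytes) -> dict:
    # data_segment is assumed to be E(M, KTS), produced elsewhere.
    return {"imsi": imsi, "ds": data_segment,
            "mac": keyed_hash(data_segment, k_veh)}


def cs_forward(msg: dict, vehicle_keys: dict, k_cs: bytes):
    """Authenticate the vehicle, strip identifiers, and re-MAC with KCS."""
    k_veh = vehicle_keys.get(msg["imsi"])
    if k_veh is None:
        return None  # unknown vehicle
    if not hmac.compare_digest(msg["mac"], keyed_hash(msg["ds"], k_veh)):
        return None  # forged or modified report
    # The forwarded message carries no network identifier.
    return {"ds": msg["ds"], "mac": keyed_hash(msg["ds"], k_cs)}


def ts_accept(msg: dict, k_cs: bytes) -> bool:
    """The TS checks only that the message came through the CS."""
    return hmac.compare_digest(msg["mac"], keyed_hash(msg["ds"], k_cs))


k_veh, k_cs = b"vehicle-secret", b"cs-ts-secret"
report = vehicle_send(b"<encrypted sample>", "imsi-001", k_veh)
forwarded = cs_forward(report, {"imsi-001": k_veh}, k_cs)
```

The CS never needs KTS’s private half, so it authenticates the sender without being able to read the location sample, while the TS sees the sample without learning who sent it.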

Key distribution and storage

The proposed architecture requires storing Kveh in vehicles. If an intruder can easily extract secret vehicle keys from multiple cars, the intruder could insert large numbers of incorrect data samples into the traffic-monitoring system. Thus the key should be stored in tamper-proof hardware. The TS’s public key KTS, on the other hand, need not be stored in tamper-proof hardware as long as users (or vehicles) can verify its integrity and authenticity.

Vehicles’ keys initially can be embedded by the manufacturer and updated during regular government vehicle inspections or regular maintenance. This lets the manufacturer or inspection agency replace keys if they’ve been compromised. If more frequent key updates are necessary, we can extend the architecture to allow over-the-air provisioning of new keys.

A sanitizer for traffic-monitoring systems

The cryptographic authentication mechanisms can address Sybil attacks (provided that the keys are hard to generate) and message modifications by network intermediaries, but they can’t prevent incorrect reports from compromised vehicles. So, the TS should sanitize received data.

Figure 2. Traffic-monitoring architecture to ensure data integrity and anonymous data collection. (Diagram labels: the vehicle holds (Kveh, KTS); the communication server holds (Kveh, KCS); the traffic server holds (KCS, K'TS). Message (1) carries the data segment E(M, KTS) and the message authentication code Hash{DS, Kveh}. Message (2) carries the data segment and the message authentication code Hash{DS, KCS}.)

Cryptographic function descriptions
M: Location sample(s)
E(M, K): Encrypt message M with symmetric key K
Hash{DS}: Produce hash from data segment DS
Hash{DS, K}: Keyed hash function

Key assignment
Kveh: Vehicle’s symmetric key
KCS: A symmetric key from the communication server
KTS, K'TS: Public and corresponding private keys for the traffic server

Techniques for sanity checking include outlier detection, consistency checking, and rule-based classification.6,7 We can leverage these techniques to build a sanitizer component for traffic-monitoring systems. For example, the sanitizer could test data integrity by comparing an anonymous vehicle’s claimed speed on a specific road segment at a specific time with

• statistics that other vehicles report in the same situation,

• statistics collected one month earlier in the same situation, or

• adjacent location sample data reported by the same vehicle.

For example, if a malicious vehicle sends a fake message reporting low speed (a severe traffic jam) but the sanitizer finds that most probe vehicles on the same road segment at a similar time report high speed, the system can easily detect this as an unreliable message.
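The first comparison in the list above (a claimed speed against what other vehicles report on the same segment) can be sketched as a simple robust outlier test. This is one possible check, not the sanitizer the article evaluates; the median-absolute-deviation rule, the threshold values, and the function name are assumptions chosen for illustration.

```python
from statistics import median


def is_plausible(claimed_speed, peer_speeds, max_dev=3.0, min_peers=5):
    """Return False if claimed_speed is an outlier among peer reports.

    peer_speeds: speeds (m/s) reported by other probes on the same road
    segment in the same time window. max_dev and min_peers are assumed
    tuning parameters.
    """
    if len(peer_speeds) < min_peers:
        return True  # too little evidence to reject the report
    med = median(peer_speeds)
    # Median absolute deviation: a spread estimate that a few bad
    # reports cannot easily distort.
    mad = median([abs(s - med) for s in peer_speeds])
    scale = max(mad, 1.0)  # floor so identical peer speeds still allow noise
    return abs(claimed_speed - med) <= max_dev * scale


peers = [27.0, 29.0, 30.0, 31.0, 28.0, 30.0]  # free-flowing traffic
jam_claim_ok = is_plausible(2.0, peers)       # fake "traffic jam" report
normal_claim_ok = is_plausible(28.0, peers)
```

A median-based test tolerates a minority of compromised vehicles; it would fail if attackers controlled most reports on a segment, which is where the historical and same-vehicle comparisons would help.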

We can extend the system to actively blacklist vehicles that submit apparently incorrect data. Because only the CS maintains identities, the TS must return the message with the incorrect data to the CS. The CS in turn looks up this message’s originator (this requires buffering messages for a certain time window) and drops all further messages from this vehicle until the CS can establish its integrity through other means.
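The buffering and lookup step can be sketched as follows. This is a hypothetical illustration of the bookkeeping at the CS: the class name, the use of a SHA-256 digest of the data segment as the lookup key, and the 600-second window are all assumptions.

```python
import hashlib


class CommunicationServerBuffer:
    """Remembers which vehicle sent each recent data segment."""

    def __init__(self, window_s=600.0):
        self.window_s = window_s   # assumed buffering window, in seconds
        self._origin = {}          # digest -> (vehicle_id, received_at)
        self.blacklist = set()

    def _expire(self, now):
        self._origin = {d: (v, t) for d, (v, t) in self._origin.items()
                        if now - t <= self.window_s}

    def record(self, vehicle_id, data_segment, now):
        """Return False (drop the message) if the vehicle is blacklisted."""
        if vehicle_id in self.blacklist:
            return False
        self._expire(now)
        digest = hashlib.sha256(data_segment).hexdigest()
        self._origin[digest] = (vehicle_id, now)
        return True

    def report_bad(self, data_segment, now):
        """Called when the TS returns a message it judged incorrect."""
        self._expire(now)
        entry = self._origin.get(hashlib.sha256(data_segment).hexdigest())
        if entry is None:
            return None  # outside the window; originator is unknown
        self.blacklist.add(entry[0])
        return entry[0]
```

The window length is a trade-off: a longer window lets the TS flag messages later, but it also means the CS retains the link between identities and (encrypted) data segments for longer.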

Discussion

As we’ve already said, this architecture can provide privacy guarantees against basic eavesdropping and insider attacks through encryption and the separation of identity and position information. More sophisticated intrusions at the CSP and the TSP are also possible.

CS integrity

The proposed architecture assumes that the CS is trustworthy with respect to data integrity. The architecture provides no cryptographic protection against the CS spoofing, replaying, or dropping messages. To relax this trustworthiness assumption, the sanitizer could easily filter replayed messages at the TS, because no two messages should contain the identical GPS time stamp and position. We could also add a basic degree of protection against spoofed messages through an additional symmetric key, KINT, that all vehicles and the TS share. Vehicles can use this key to generate a MAC for each location update message that the TS can verify without being able to identify the vehicle. However, vehicles and the TS would need to update this key regularly because a key shared by many vehicles is difficult to keep secret. Identifying dropped messages proves most difficult.
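The replay check described above amounts to remembering which (time stamp, position) pairs the TS has already accepted. A minimal sketch, assuming decrypted samples expose a time stamp, latitude, and longitude; the class name is hypothetical, and a real deployment would also need to bound the memory used.

```python
class ReplayFilter:
    """Rejects a sample whose time stamp and position were seen before."""

    def __init__(self):
        # Unbounded here for simplicity; old entries could be evicted once
        # their time stamps fall outside the acceptance window.
        self._seen = set()

    def accept(self, timestamp, lat, lon):
        key = (timestamp, lat, lon)
        if key in self._seen:
            return False  # identical time stamp and position: a replay
        self._seen.add(key)
        return True


replay_filter = ReplayFilter()
first = replay_filter.accept(1150000000, 42.3314, -83.0458)
second = replay_filter.accept(1150000000, 42.3314, -83.0458)
```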

For a more comprehensive solution, the TSP should continuously monitor the traffic data’s quality by cross-checking with other data sources and monitoring consumer complaints. Monitoring should let the TSP identify if the CS has inserted a continuous bias in the data. It might also make an additional authentication key (KCS) unnecessary.

In this architecture, we’ve deliberately emphasized privacy protection, because privacy leaks are often more difficult to identify than integrity problems. Because the CSP and TSP will enter a mutually beneficial contractual relationship, both parties will want to maintain data integrity and monitor the possibility of insider attacks. Individual drivers, however, have fewer resources to verify that their private data hasn’t been compromised.

Location privacy at the CSP

The proposed architecture provides location privacy to drivers with respect to the CSP, because only the TSP knows the secret key to decrypt the GPS samples. Although the CSP could probably use wireless network localization methods to obtain the mobile node’s position, these methods would be significantly less accurate.

For example, cell phone localization techniques in the US were designed to Federal Communications Commission specifications. The E911 Phase II mandate states that a system should be able to locate 67 percent of calls within 100 meters and 95 percent of calls within 300 meters. So, we can expect commonly used technologies such as Uplink Time Difference of Arrival to provide an order-of-magnitude less accuracy than in-vehicle GPS, which typically achieves better than 10-meter accuracy. Assisted GPS technology, which relies on GPS chips in cell phone handsets, might be more precise. A-GPS could be easily disabled, however, for in-vehicle deployment.

Location privacy at the TSP

Because the TSP database mixes anonymous location samples from all vehicles, private information is hard to extract. If multiple vehicles cross paths, discerning which sample belongs to which vehicle is difficult. Nevertheless, breaches can still occur: intruders can access decrypted location samples from the TSP’s database.

Here are two risk scenarios arising from data mining techniques in which privacy might be compromised even if an anonymous data collection architecture is deployed:

• Home identification. An intruder might identify a home’s location from probe vehicle drivers as a first step toward identifying a particular driver.

• Target tracking. An intruder might reconstruct paths from anonymous traces and use them to link the driver to sensitive places that he or she visited.

Although these techniques are most useful in conjunction (a privacy compromise requires both identifying the driver and acquiring sensitive information about the individual), we concentrate here on home identification.

Home identification. Clustering can be an effective tool for home identification.8 Clustering analysis provides insight into data by dividing objects into groups (clusters) such that the objects in a cluster are more similar to each other than to the objects in other clusters.

Consider the case in which an authorized, legitimate, but malicious employee at the TSP accesses GPS trace data collected as defined in our proposed architecture. Since location samples are anonymous, at most, the adversary can obtain a collection of GPS location samples without user identity. In addition, because of measurement inaccuracies and the possibility of using different parking spots, the exact endpoints of a GPS trace might differ by hundreds of feet, even though a vehicle visits the same place.

However, clustering techniques can smooth out such noisy GPS traces and allow automatic identification of repeatedly visited places. Specifically, clustering algorithms can automatically group a set of location samples that likely belong to the same destination: anonymized location samples with low-to-zero speed might be candidates for endpoints, and the centroid of this cluster of endpoints provides a good estimate of the destination. We can improve the estimate’s accuracy by knowing road topology as provided by digital road maps such as those from Google Maps or MapQuest. In our clustering practice, we’ve developed a set of heuristic rules to filter out irrelevant location samples. For instance, we can differentiate stationary from moving GPS location samples by looking at GPS speed information. Also, we can use time information to distinguish home locations from other kinds of destinations. If the marked time is from 4 p.m. to midnight and we detect no subsequent moving GPS location samples before the next morning, the destination is more likely to be a home than a workplace.

Indeed, place identification is a general technique to extract potentially sensitive information about a driver’s habits and interests.9 This information is also directly related to the ability to identify a driver from anonymous traces. Generally, the more information intruders know about a data subject (workplace, home location, gym visits, favorite restaurant, and so on), the more likely they can identify that driver. We believe that home identification provides the highest risk, because there’s usually a one-to-one mapping between a typical suburban home and a household, and home owners and occupants are public knowledge through telephone white pages or real estate records.

Target tracking. Adversaries can use target tracking to reconstruct paths from anonymous samples or segments,10,11 especially once they’ve identified a home location. Privacy risks go beyond knowing a home location once they’ve linked potentially sensitive information or places to this home. Target tracking lets an adversary follow the traces reported by a vehicle to other locations, thereby linking information about other places to the driver’s identity. However, these techniques don’t work well in urban areas because buildings, bridges, and tunnels often block GPS signals; they’re more effective in suburban areas, which contain less dense GPS traces.

Sampling frequency and home identification: a case study

This case study analyzes the effectiveness of home identification techniques on the TSP’s data sets. Our objective is to shed light on the most serious privacy question raised in the discussion of our architecture. Is anonymous data collection enough to protect user privacy? If not, what sampling frequency provides enough data without unduly raising the privacy risk? Intuitively, the privacy risk decreases when the system operates with lower sampling frequency (the frequency with which probe vehicles provide position updates). So, a judicious choice of the sampling frequency is critical in our proposed architecture. Operating at reduced sampling frequency is a basic data suppression technique that you can derive from known concepts such as mix zones12 or cloaking techniques.13 In this case study, we consider how effectively this basic approach reduces the home identification risk. Of course, the challenge in devising a suppression technique is to improve privacy while not unnecessarily reducing service quality. Other researchers have looked at the effect on service quality,2 so our study concentrates on privacy aspects.
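Reducing the sampling frequency can be sketched as dropping samples that arrive too soon after the last one kept. This is a minimal illustration, assuming each vehicle’s samples are time-ordered tuples of (time stamp in seconds, latitude, longitude); the function name and the 4-minute interval in the example are assumptions, not values from the study.

```python
def downsample(samples, min_interval_s):
    """Keep a sample only if min_interval_s has passed since the last kept one.

    samples: one vehicle's (timestamp_s, lat, lon) tuples, sorted by time.
    """
    kept = []
    last_kept_time = None
    for sample in samples:
        timestamp = sample[0]
        if last_kept_time is None or timestamp - last_kept_time >= min_interval_s:
            kept.append(sample)
            last_kept_time = timestamp
    return kept


# Ten one-minute samples reduced to one sample every four minutes.
trace = [(60 * i, 42.30 + 0.001 * i, -83.00) for i in range(10)]
sparse = downsample(trace, min_interval_s=240)
```

Fewer samples near a trip’s endpoints mean fewer low-speed points for a clustering attack to work with, which is the effect the case study sets out to measure.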

Our study uses a data set containing GPS traces from vehicles driving in the larger Detroit, Michigan, area. For privacy reasons, we had no specific information about the vehicles or drivers except that the drivers were volunteers. Each GPS sample comprises vehicle ID, time stamp, longitude, latitude, speed, and heading information. Each vehicle records a GPS sample every minute while its ignition is switched on, for a period of one week. This means that the traces contain both spatial and temporal gaps. No data is provided while the vehicle is parked with its ignition switched off. In addition, data was unavailable when the GPS receiver was acquiring or had lost a satellite fix (for example, because of obstruction from high-rise buildings).

Clustering-based homeidentification algorithm

For our home identification algo-rithm, we use a k-means clustering algo-rithm on anonymous location samplesto identify frequently visited places. Wethen refine the resulting clusters usingseveral heuristics. First, a set of anony-mous location samples near a homelikely have low to zero speed. Second,vehicles are often parked overnight athomes. Specifically, the key steps of the

algorithm are the following:

1. Drop location samples with speeds above 1 meter/second from the samples of all vehicles (the remaining samples contain the candidate trip endpoints).

2. Select a target region of interest to improve computational efficiency, and drop samples outside this region.

3. Apply the k-means pairwise clustering algorithm to samples in the target region and store the returned cluster centroids.

4. Filter the candidate home locations out of all centroids using heuristic A, arrival time, and heuristic B, zoning information.
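Steps 1 and 2 amount to two simple filters over the anonymous samples. The following is a minimal sketch, not the implementation used in the study. The `GpsSample` type, the `candidate_endpoints` function, and the rectangular `region` tuple are hypothetical names introduced for illustration; the sketch assumes that speed is reported in meters per second and that the target region is a longitude/latitude bounding box.

```python
from dataclasses import dataclass

# Hypothetical record type: the vehicle ID is omitted because the
# algorithm operates on anonymous location samples.
@dataclass(frozen=True)
class GpsSample:
    timestamp: float   # seconds since some epoch (assumed unit)
    longitude: float   # degrees
    latitude: float    # degrees
    speed: float       # meters/second (assumed unit)
    heading: float     # degrees

SPEED_LIMIT_MPS = 1.0  # threshold from step 1

def candidate_endpoints(samples, region):
    """Keep slow samples (step 1) that lie inside the target region (step 2).

    region is an assumed (min_lon, min_lat, max_lon, max_lat) bounding box.
    """
    min_lon, min_lat, max_lon, max_lat = region
    return [
        s for s in samples
        if s.speed <= SPEED_LIMIT_MPS
        and min_lon <= s.longitude <= max_lon
        and min_lat <= s.latitude <= max_lat
    ]
```

The surviving samples are the candidate trip endpoints that step 3 clusters.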

Step 3 repeats to calculate the centroids of clusters until it finally groups all location samples into the optimum number of clusters. The k-means pairwise clustering algorithm in step 3 doesn't have a priori knowledge of the optimum number of clusters at the initial run. So, it uses all locations obtained after step 2 as initial clusters and keeps merging close ones into fewer clusters at each run. The merging process stops when every centroid has all its elements within a certain distance (Dth) on the average. Dth should be chosen according to the local home density. If the home density is high, keep Dth small enough to distinguish the home locations of drivers living near each other. In our simulations, we use a value of 100 m for this threshold, which we derived from the region's actual home density.
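The merging procedure can be sketched as follows. This is one possible reading of the description above, not the code used in the study: it starts with every sample as its own cluster, repeatedly merges the pair of clusters with the closest centroids, and refuses any merge that would push the average member-to-centroid distance above Dth. The function names are hypothetical, and the sketch assumes that coordinates have already been projected to planar meters. The exhaustive pair search is chosen for readability and would be too slow for large data sets.

```python
import math

def centroid_of(points):
    """Arithmetic mean of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def mean_spread(points, center):
    """Average distance from the points to the given center."""
    return sum(math.dist(p, center) for p in points) / len(points)

def merge_clusters(points, d_th):
    """Greedy pairwise merging; returns the final cluster centroids.

    points: (x, y) tuples in planar meters (assumed projection).
    d_th:   maximum allowed average member-to-centroid distance.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None  # (centroid gap, i, j) of the closest mergeable pair
        for i in range(len(clusters)):
            ci = centroid_of(clusters[i])
            for j in range(i + 1, len(clusters)):
                merged = clusters[i] + clusters[j]
                if mean_spread(merged, centroid_of(merged)) > d_th:
                    continue  # merging would violate the threshold
                gap = math.dist(ci, centroid_of(clusters[j]))
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        if best is None:
            break  # no remaining pair can be merged
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid_of(c) for c in clusters]
```

With `d_th=100`, two groups of samples that lie about a kilometer apart stay in separate clusters, while samples within a few meters of each other collapse into a single centroid.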

Filtering with heuristic A eliminates all centroids that don't have any evening visits. We define an evening visit as a location sample arriving between 4 p.m. and midnight. Filtering with heuristic B eliminates all centroids outside residential areas. In our experiments, we've eliminated centroids outside residential areas by manually inspecting satellite imagery (using Google Earth). You could automate this process by obtaining geographic-information-system data sets with city zoning information.
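Heuristic A reduces to a time-of-day test on the samples assigned to each centroid. The sketch below is illustrative only. The names `is_evening_visit` and `filter_by_evening_visits` and the dictionary layout are assumptions, as is the choice to treat time stamps as local time and to exclude midnight itself from the evening window.

```python
from datetime import datetime, time

EVENING_START = time(16, 0)  # 4 p.m.; the window runs until midnight

def is_evening_visit(arrival: datetime) -> bool:
    """True if the sample's local time falls in [16:00, 24:00)."""
    return arrival.time() >= EVENING_START

def filter_by_evening_visits(arrivals_by_centroid):
    """Keep centroids that have at least one evening visit.

    arrivals_by_centroid: assumed mapping from a centroid (x, y) to the
    list of local-time datetimes of the samples in its cluster.
    """
    return [
        centroid
        for centroid, arrivals in arrivals_by_centroid.items()
        if any(is_evening_visit(a) for a in arrivals)
    ]
```

Heuristic B has no comparable sketch here because it depends on zoning data that the study inspected manually.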

Because real home addresses were unavailable (we omitted driver identities in the data set for privacy reasons), we manually inspected the unmodified week-long traces overlaid on satellite images to identify plausible home locations as an experiment baseline. To make the evaluation feasible, we analyzed a subset of the region covered by the 239 traces in the data set (each trace corresponds to one driver). The subset contained the two residential regions (together a 25 × 25 km area) marked with rectangles in figure 3. We found 65 plausible homes through manual inspection. We then compared the automated algorithms to the results of the manual inspection. For the algorithm evaluation, we considered a home correctly identified if the algorithm and manual inspection gave the same answer. The results indicate whether the inspection task could be automated for mass surveillance purposes.

Because no real ground truth was available, the experiment doesn't definitively

Figure 3. Plausible home locations in two target regions (the red rectangles), identified through manual inspection. The study considered 65 homes in a 25 × 25 km area.

answer whether we identified the drivers' real home locations. The 65 reference locations we chose manually, however, each contained a single home that stood out as a likely home location: the drivers visited this location much more frequently at night than others. So, we believe manual inspection provided a reasonable approximation of real home positions.

To examine the effectiveness of reducing sampling frequency, we measured the home identification rate (how many homes out of 65 we correctly detected) and the false-positive rate (how many of the estimated home locations were incorrect) at several sample intervals. False positives can be caused by many vehicles waiting at traffic lights or the cluster centroid shifting to a neighbor's house because of inaccurate location reports. In addition to the standard 1-minute sample interval (which produces one location sample per minute), we considered 2-, 4-, and 10-minute intervals.
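The two measurements and the reduced sampling intervals can be illustrated with a short sketch. It is not the evaluation code used in the study. The `downsample` and `evaluate` functions are hypothetical, and the `match_radius_m` parameter is an assumption: the study compared algorithm output with manual inspection and does not state a numeric matching distance. Downsampling is time-based rather than index-based because the traces contain gaps.

```python
import math

def downsample(trace, interval_s):
    """Keep a sample only if at least interval_s seconds have passed
    since the previously kept sample.

    trace: list of (timestamp_s, x, y) tuples (assumed layout).
    """
    kept, last = [], None
    for sample in sorted(trace, key=lambda s: s[0]):
        if last is None or sample[0] - last >= interval_s:
            kept.append(sample)
            last = sample[0]
    return kept

def evaluate(estimated, reference, match_radius_m):
    """Return (home identification rate, false-positive rate).

    estimated: centroids reported as homes, as (x, y) in meters.
    reference: manually identified homes, as (x, y) in meters.
    An estimate counts as correct if its nearest reference home lies
    within match_radius_m (an assumed matching rule).
    """
    matched, false_positives = set(), 0
    for est in estimated:
        nearest = min(range(len(reference)),
                      key=lambda i: math.dist(est, reference[i]))
        if math.dist(est, reference[nearest]) <= match_radius_m:
            matched.add(nearest)
        else:
            false_positives += 1
    identification_rate = len(matched) / len(reference)
    false_positive_rate = (false_positives / len(estimated)
                           if estimated else 0.0)
    return identification_rate, false_positive_rate
```

Running the clustering pipeline on `downsample(trace, 240)` for every trace, for example, would approximate the 4-minute setting.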

Results

Figure 4 shows that at the standard rate, the home identification algorithm correctly located about 85 percent of the homes, albeit also returning a large number of false positives. Reducing the sampling frequency decreases the home identification rate to 40 percent for the 4-minute interval, with similar false-positive rates. The trend isn't clearly linear (for example, home identification with 10-minute intervals performed better than with 4-minute intervals because it happened to generate fewer candidate centroids after clustering), but the results indicate that data suppression algorithms can reduce the home identification risk and thereby increase privacy. Also, although the home identification intrusion technique we evaluated suffered from many false positives, it can at least be effective for automated prefiltering, followed by manual inspection to remove false positives.

The degree of privacy protection this architecture provides depends on judiciously choosing the frequency with which probes send in their position updates. Sampling frequencies higher than one sample per minute, as frequently considered for traffic-monitoring applications,1,2,14 allow data mining techniques to reidentify many of the probe vehicles. To provide a high degree of privacy protection, traffic-monitoring systems should operate at sampling intervals of at least several minutes or employ more sophisticated data suppression mechanisms that can optimize both privacy and data quality.

ACKNOWLEDGMENTS

This work was supported in part by the US National Science Foundation under grant CNS-0524475.

REFERENCES

1. T. Ishizaka, A. Fukuda, and S. Narupiti, “Evaluation of Probe Vehicle System by Using Micro Simulation Model and Cost Analysis,” J. Eastern Asia Soc. Transportation Studies, vol. 6, 2005, pp. 2502–2514.

2. X. Dai, M. Ferman, and R. Roesser, “A Simulation Evaluation of a Real-Time Traffic Information System Using Probe Vehicles,” Proc. 2003 IEEE Intelligent Transportation Systems, vol. 1, IEEE Press, 2003, pp. 475–480.

3. B. Greenshields, “A Study of Traffic Capacity,” Highway Research Board Proc., vol. 14, 1935, pp. 448–477.

4. R. Anderson, Security Engineering: A Guide to Building Dependable Distributed Systems, John Wiley & Sons, 2001, pp. 222–225.

5. J.R. Douceur, “The Sybil Attack,” Proc. 1st Int’l Workshop Peer-to-Peer Systems (IPTPS 02), 2002; www.cs.rice.edu/Conferences/IPTPS02/101.pdf.

6. N. Adam, V. Janeja, and V. Atluri, “Neighborhood Based Detection of Anomalies in High Dimensional Spatio-Temporal Sensor Datasets,” Proc. 2004 ACM Symp. Applied Computing, ACM Press, 2004, pp. 576–583.

7. D. Beneventano et al., “Consistency Checking in Complex Object Database Schemata with Integrity Constraints,” Proc. IEEE Trans. Knowledge and Data Eng., vol. 10, no. 4, 1998, pp. 576–598.

Figure 4. Results of the home identification algorithm using four different sampling intervals (1, 2, 4, and 10 minutes). Axes: home identification rate (percent) and false positive (percent).

8. A.K. Jain and R.C. Dubes, Algorithms for Clustering Data, Prentice Hall, 1988.

9. D. Ashbrook and T. Starner, “Using GPS to Learn Significant Locations and Predict Movement across Multiple Users,” Personal Ubiquitous Computing, vol. 7, no. 5, 2003, pp. 275–286.

10. D. Reid, “An Algorithm for Tracking Multiple Targets,” IEEE Trans. Automatic Control, vol. 24, no. 6, 1979, pp. 843–854.

11. M. Gruteser and B. Hoh, “On the Anonymity of Periodic Location Samples,” Proc. 2nd Int’l Conf. Security in Pervasive Computing, LNCS 3674, Springer, 2005, pp. 179–192.

12. A. Beresford and F. Stajano, “Mix Zones: User Privacy in Location-Aware Services,” IEEE Workshop Pervasive Computing and Comm. Security (PerSec 04), 2004; www.cl.cam.ac.uk/~fms27/papers/2004-BeresfordSta-mix.pdf.

13. M. Gruteser and X. Liu, “Protecting Privacy in Continuous Location-Tracking Applications,” IEEE Security and Privacy, vol. 2, no. 2, 2004, pp. 28–34.

14. R. Cayford and T. Johnson, “Operational Parameters Affecting Use of Anonymous Cell Phone Tracking for Generating Traffic Information,” Proc. Inst. of Transportation Studies 82nd TRB Ann. Meeting, vol. 1, no. 3, 2003.

the AUTHORS

Baik Hoh is a PhD student in the Electrical and Computer Engineering Department and WINLAB (Wireless Information Network Laboratory) at Rutgers University. His research includes privacy-enhancing technologies and mobile worm and virus quarantine. He received his MS in electrical and computer engineering from the Korea Advanced Institute of Science and Technology. He’s a student member of the IEEE Computer Society and the ACM. Contact him at WINLAB, Rutgers Univ., 671 Rte. 1 S., North Brunswick, NJ 08902; [email protected].

Marco Gruteser is an assistant professor at WINLAB, Rutgers University. His research interests include location-aware networking and building reliable, secure, and privacy-aware communication systems for vehicular networks. He received his PhD in computer science from the University of Colorado at Boulder. He is a member of the IEEE Computer Society and the ACM. Contact him at WINLAB, Rutgers Univ., 671 Rte. 1 S., North Brunswick, NJ 08902; [email protected].

Hui Xiong is an assistant professor of computer information systems in the Management Science and Information Systems Department at Rutgers University. His research interests include data mining, spatial databases, statistical computing, and geographic information systems. He received his PhD in computer science from the University of Minnesota. He is coeditor of Clustering and Information Retrieval and co-editor in chief of the Encyclopedia of Geographical Information Science. He is a member of the IEEE Computer Society and the ACM. Contact him at Rutgers Univ., Ackerson Hall, 200K, 180 University Ave., Newark, NJ 07102; [email protected].

Ansaf Alrabady is a senior research engineer at General Motors. His main research is in communication security and embedded system development for automotive applications. He received his PhD in computer engineering from Wayne State University. In 2001, he received the Automotive Hall of Fame Young Leadership and Excellence award for his contributions to the automotive industry. Contact him at General Motors, 30500 Mound Rd., Warren, MI 48090-9055; [email protected].
