USING CELLPHONE O-D DATA FOR REGIONAL TRAVEL MODEL VALIDATION

Sumit Bindra
RSG
55 Railroad Row, White River Junction, VT 05001
Tel: 802-295-4999 Fax: 802-295-1006; Email: [email protected]

Word count: 3,774 words text + 8 tables/figures x 250 words (each) = 5,774 words

August 1, 2015

ABSTRACT

This paper summarizes how cellphone location data can be used to validate regional travel models. It presents a detailed view of how the cellphone data is collected and summarized and what its strengths and weaknesses are. A comparison of cellphone location data with the Syracuse travel model is presented, followed by a list of recommendations derived from this comparison.

Keywords: Cellphone Location Data, Travel Model, AirSage

INTRODUCTION

The Syracuse Metropolitan Transportation Council (SMTC) maintains the regional travel demand model for areas including the City of Syracuse in Onondaga County and other neighboring towns. The model area includes 1,185 internal and external zones and extends 45 miles north-south and 35 miles east-west. The regional travel demand model is a traditional trip-based, four-step model that runs on the TransCAD platform and was built by RSG. The existing model has been calibrated and validated against average daily traffic (ADT) counts, vehicle miles traveled (VMT), trip length distributions, and screen line counts. With the goal of improving the calibration and validation process, SMTC decided to conduct a region-wide Origin-Destination (OD) study using cellphone location data. The data collection effort was completed by AirSage, a firm which collects and analyzes real-time mobile signals to provide anonymous data on the location and movement of mobile devices. This data set provides insight into where people are located and how they move about over time. AirSage’s WiSE (Wireless Signal Extraction) technology extracts data from wireless carrier networks, as generated by devices in the normal course of operation (e.g., making phone calls, texting, surfing the Web). Mobile devices frequently communicate with the network, both during use and when the device is in idle mode. AirSage technology anonymizes the data stream, ensuring user privacy, and performs multiple stages of analysis to monitor the location and movement of mobile devices, and thus the population of mobile users.

RSG was assigned the task of assessing whether the zone-to-zone flows from the model results and from cellphone movements (which were assumed to be indicative of the regional population) matched well with each other. At the end of that study, RSG advised that it is appropriate to use AirSage data for model calibration and validation at spatial levels that are an aggregation of the model Travel Analysis Zone (TAZ) level. It was observed that as the resolution of the data comparison increases (becomes more disaggregate), the comparison becomes worse. RSG also noted that AirSage measures some trip purposes more accurately than others (HBW vs. NHB), which is due to the clustering algorithms inherent to AirSage’s data collection and aggregation process. RSG also presented a list of precautions that should be taken when using AirSage data for select link analysis and for developing external-to-external or external-to-internal matrices for use in travel models.

For this paper, AirSage’s data collection, expansion and summarization processes and methodology are reviewed using publicly available information. Key areas of the process have been reviewed, including:

1. Device location processing
2. Activity pattern analysis / location generation
3. Population synthesis and trip analysis
4. Data aggregation and packaging

An overview of each of these steps is presented in the next section along with a list of “things-to-remember” while performing the comparative analysis between AirSage and regional travel model outputs. This is followed by a detailed comparative analysis of the AirSage data and the travel model outputs across various dimensions, including large, medium and small aggregation levels, trip purposes, time of day and trip length frequencies.

AIRSAGE METHODOLOGY OVERVIEW

This section presents an overview of the various steps and assumptions in the AirSage data collection, expansion and summarization process. It is important to note that this review is based on the literature AirSage provides along with the data to the agencies purchasing the data, and on the papers and presentations that various agency staff members have produced on the topic. This section is divided into five parts that are discussed below in detail.

Device Location Processing

AirSage documentation says that “Time-stamped locations (latitude/longitude) are generated for each mobile device (e.g. a cellphone), utilizing the network signaling data generated each time a mobile device interacts with the mobile network. Interaction with the network comes in many forms including sending and receiving text messages or receiving updates or streaming data to/from mobile devices. “Processed Sightings” are created using this information in addition to factoring in the quality of the device and removing any static that might occur within the network that has the potential to obscure the data.”

Staff at the National Capital Region Transportation Planning Board (TPB) and the Metropolitan Washington Council of Governments (COG) noted in their presentation (1) that trip movements are identified by time and distance criteria, namely:

1. Trip O-Ds must be at least 1.2-1.5 km (0.75-0.93 miles) apart;
2. If a device stops at a location for 5+ minutes, a destination is assumed.

These “assumptions” are appropriate and logical in the context of converting cellular locations to trips made by a person. They do, however, introduce errors when the resulting trips are compared to the outputs of a regional travel model. For example, in a dense city center or downtown-like environment, the distance threshold can miss short trips where both the origin (e.g. a person’s apartment) and the destination (e.g. an office nearby) are within a mile of each other. While these trips will be part of the travel model, AirSage can treat both ends as belonging to the same cluster and ignore the trip, thus potentially underreporting trips. Similarly, designating a 5+ minute stop as a destination can break what the model treats as a single HBW trip into an HBO trip and an NHB or WBO trip in the AirSage summary if a person stops for coffee on the way to work. These assumptions can make the travel model output look worse when compared to the AirSage data and should be treated as one of the areas of weakness with regard to this comparison.
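The effect of the two criteria can be illustrated with a minimal Python sketch. This is not AirSage's algorithm, which is not public; the names `Sighting`, `find_stops` and `trips_between`, the 0.2 km "same place" radius and the sample coordinates are all assumptions made for illustration. Only the 5-minute dwell and the 1.2 km minimum distance come from the criteria above.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

MIN_DWELL_MIN = 5.0   # a stop of 5+ minutes implies a destination (criterion 2)
MIN_TRIP_KM = 1.2     # lower end of the 1.2-1.5 km O-D threshold (criterion 1)
SAME_PLACE_KM = 0.2   # hypothetical radius for "same location"; not from the source


@dataclass
class Sighting:
    t_min: float  # minutes since the start of the day
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def find_stops(sightings):
    """Return (lat, lon) for each run of nearby sightings lasting 5+ minutes."""
    stops, i = [], 0
    while i < len(sightings):
        first, j = sightings[i], i
        # Extend the run while the next sighting stays close to the first one.
        while (j + 1 < len(sightings)
               and haversine_km(first.lat, first.lon,
                                sightings[j + 1].lat, sightings[j + 1].lon) <= SAME_PLACE_KM):
            j += 1
        if sightings[j].t_min - first.t_min >= MIN_DWELL_MIN:
            stops.append((first.lat, first.lon))
        i = j + 1
    return stops


def trips_between(stops):
    """Keep consecutive stop pairs that are at least MIN_TRIP_KM apart."""
    trips = []
    for origin, dest in zip(stops, stops[1:]):
        if haversine_km(*origin, *dest) >= MIN_TRIP_KM:
            trips.append((origin, dest))
    return trips


# Illustrative day: home, a coffee stop about 0.5 km away, then the office.
HOME, CAFE, OFFICE = (43.0000, -76.0), (43.0045, -76.0), (43.0300, -76.0)
day = [
    Sighting(0, *HOME), Sighting(10, *HOME),         # at home
    Sighting(15, *CAFE), Sighting(22, *CAFE),        # 7-minute coffee stop
    Sighting(28, 43.0150, -76.0),                    # single ping en route, no dwell
    Sighting(35, *OFFICE), Sighting(500, *OFFICE),   # at work
]
stops = find_stops(day)
trips = trips_between(stops)
print(len(stops), "stops,", len(trips), "trip(s)")
```

In this made-up day, three stops are found but only one trip survives: the short home-to-coffee leg falls under the distance threshold, and the remaining trip starts at the coffee stop rather than at home, which is how a single modeled HBW trip can end up reported under a different purpose.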

Activity Pattern Analysis and Point Generation

In its report about activity pattern analysis and point generation, AirSage mentions that “All of the “Device Locations” (Home, Work, etc.) for a device are determined over the course of four to six weeks. The data are run through a series of pattern recognition and statistical clustering algorithms to determine repeated and irregular trip patterns and primary activity locations for a device. These patterns and locations are used to classify trip purpose.”

AirSage also mentions that a home location is defined as the place where a subscriber (of the cellular device) spends most of their time between 9:00 pm and 6:00 am, and a work location is determined by looking at where subscribers spend the majority of their days between 9:00 am and 5:00 pm. All remaining locations, with a 5+ minute stop inside a mile-wide radius, are defined as “others”, and the trip legs are formulated around these to arrive at a daily trip pattern. These assumptions are reasonable for most areas, except for zones with a high population of evening or night shift workers or college students, who do not fit these assumptions. These “unusual” groups rarely form a large part of a travel model and thus should not affect the comparison too much. Overall, this location tagging and cluster analysis is a strong feature of this dataset.
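The time-window rule can be sketched as follows. This is a simplified illustration, not AirSage's clustering: it assumes each sighting has already been assigned to a location label, and it counts sightings as a stand-in for time spent. The function names and sample data are hypothetical.

```python
from collections import Counter


def in_window(hour, start, end):
    """True if hour is in [start, end), allowing windows that wrap past midnight."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


def most_common_location(sightings, start, end):
    """Location label seen most often within the window, or None if never seen."""
    counts = Counter(loc for hour, loc in sightings if in_window(hour, start, end))
    return counts.most_common(1)[0][0] if counts else None


def infer_home_work(sightings):
    """Apply the 9 pm-6 am (home) and 9 am-5 pm (work) windows described above."""
    home = most_common_location(sightings, 21, 6)
    work = most_common_location(sightings, 9, 17)
    return home, work


# (hour of day, location label) pairs; labels are illustrative only.
day_worker = [(22, "A"), (23, "A"), (2, "A"), (10, "B"), (11, "B"), (14, "B"), (19, "C")]
night_worker = [(22, "W"), (1, "W"), (4, "W"), (10, "H"), (13, "H"), (15, "H")]

print(infer_home_work(day_worker))    # home and work as expected
print(infer_home_work(night_worker))  # workplace "W" is tagged as home
```

For the night shift worker, whose true workplace is "W" and true home is "H", the rule swaps the two labels. This is the kind of misclassification the paragraph above anticipates for shift workers and students.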

Population Synthesis

In its report about population synthesis, AirSage mentions that “Using the observed sample devices, the movements for a full population is synthesized based on the penetration rates and device quality. Penetration rate is the ratio of number of resident devices observed by AirSage in a given census tract to the 2010 census population. Device quality refers to the number of daily sightings observed for each device. This factor feeds a model which adjusts for the probability of missing trips due to limited visibility of some devices.”

The population synthesis feature could be one of the main strengths of the cellular data. It could also explain why, in most cases, outputs from a well-calibrated travel model match AirSage data very closely at aggregate levels. However, one factor that is not mentioned in the AirSage documentation is the percentage of people owning smartphones. AirSage can only collect a person’s location if the cellular device interacts with the network. Smartphones interact with the network far more frequently (for calls, texts, internet access, location services, etc.) than traditional phones. Thus, an area with a very low percentage of smartphones can skew the data. AirSage mentions that device quality is used in factoring for the probability of missing trips, but details of that “modelling” are unavailable to the general public or the agencies and thus cannot be commented on. Agencies should request from AirSage a copy of the unadjusted data so that they can apply expansion factors based on their local knowledge of the area or selectively remove areas with very low penetration rates.
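If an agency did obtain unadjusted data, a simple tract-level expansion could look like the sketch below. It uses only the penetration rate definition quoted above; the device-quality adjustment is omitted because its details are not public. The 5% cutoff, the tract names and all counts are hypothetical values chosen for illustration.

```python
MIN_PENETRATION = 0.05  # hypothetical cutoff for "very low" penetration; not from the source


def penetration_rate(resident_devices, census_population):
    """Resident devices observed in a tract divided by its census population."""
    return resident_devices / census_population


def expand_trips(observed_trips, resident_devices, census_population):
    """Scale observed device trips up to the tract population.

    Returns None when penetration is below the cutoff, so the tract can be
    reviewed or dropped rather than expanded by a very large factor.
    """
    if penetration_rate(resident_devices, census_population) < MIN_PENETRATION:
        return None
    return observed_trips * census_population / resident_devices


# tract name -> (resident devices, census population, observed trips); made-up values
tracts = {
    "tract_1": (1200, 6000, 300),
    "tract_2": (40, 4000, 12),
}
expanded = {
    name: expand_trips(trips, devices, population)
    for name, (devices, population, trips) in tracts.items()
}
print(expanded)
```

Here tract_1 has a penetration rate of 20%, so its 300 observed trips expand by a factor of 5, while tract_2 (1% penetration) is flagged instead of being expanded by a factor of 100.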

Trip Analysis

In its report, AirSage mentions that “Each trip is analyzed and classified into various interesting categories such as resident class of subscriber, trip purpose, time of day and day of week.”

In essence, based on the home, work and other locations of the cellular device derived from the 4-6 weeks of preliminary observation and clustering analysis, a trip purpose can be assigned to each trip. Since the AirSage data cannot identify any specific location type other than home and work based on the clustering analysis, results from travel models with trip types such as home based school or shopping (HBS) have to be aggregated with home based other (HBO) trips for comparative purposes. Thus, for most travel models the three main trip purposes that can be compared with the AirSage data are home based work (HBW), HBO, and non-home based (NHB) trips. The time of day and day of week component is a clear advantage of the AirSage data and can be used in validating the time of day component of travel models.
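A minimal sketch of this purpose logic is given below, assuming each trip end has already been tagged as "home", "work" or "other". The function name and the model purpose codes in the lookup table are illustrative assumptions, not AirSage's or SMTC's actual schema.

```python
def classify_purpose(origin_type, dest_type):
    """Map a pair of trip-end types ('home', 'work', 'other') to HBW, HBO or NHB."""
    ends = {origin_type, dest_type}
    if "home" in ends:
        return "HBW" if "work" in ends else "HBO"
    return "NHB"


# Collapse a model's finer purposes onto the three comparable ones.
# The model-side codes listed here are examples only.
MODEL_TO_COMPARABLE = {"HBW": "HBW", "HBO": "HBO", "HBS": "HBO", "NHB": "NHB"}


def collapse_model_trips(trips_by_purpose):
    """Sum model trips into the HBW / HBO / NHB groups used for comparison."""
    totals = {"HBW": 0.0, "HBO": 0.0, "NHB": 0.0}
    for purpose, trips in trips_by_purpose.items():
        totals[MODEL_TO_COMPARABLE[purpose]] += trips
    return totals


# A commute with a coffee stop becomes one HBO leg and one NHB leg.
legs = [classify_purpose("home", "other"), classify_purpose("other", "work")]
print(legs)
print(collapse_model_trips({"HBW": 100, "HBO": 200, "HBS": 50, "NHB": 150}))
```

Collapsing the model purposes before comparing keeps both datasets on the same three categories, at the cost of losing any school or shopping detail the model carries.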

One suggested use of the AirSage data is for generating external-to-internal (and vice versa) and external-to-external (EE) trip matrices for use in travel models. These matrices are traditionally derived from license plate or Bluetooth surveys, and this can appear to be a perfect place to use the AirSage data. There are, however, important issues with the use of AirSage data for developing these matrices, which are discussed here.

A study area has to be defined prior to commencing the AirSage data analysis so that the devices in the area can be designated as belonging to a resident (someone living in the study area) or a visitor, i.e. someone whose cellular device is seen for the first time in an “external zone”. AirSage suggests that external zones be defined as a 30 to 45 minute travel time buffer around the study area. At the outer edges of a travel model, these external zones can stretch 30-40 miles in each direction, thus potentially adding many external-to-external trips to the data set that never pass through the study area. Where these external zones include mid to large size cities with trips to and from each other, this error can be amplified substantially. In addition, when factors are applied to expand the sample of trips from cellular devices, the population of the study area alone is used. In that case, these EE trips will form a larger than usual share of the total trips in the dataset, even though they either should not be part of it or should have been expanded using different population factors.

A potential solution to this issue is to scale these EE trips based on the population of the areas where they originate, or to include only those that pass through the internal zones. However, none of this has been specifically identified or discussed in the AirSage documentation and thus it remains an area of concern. It is also important for the agencies to be aware of any special events (that attract many visitors) in the study area during the preliminary investigation or final data collection time frames, and to avoid them unless capturing them is specifically desired.

Data Aggregation and Processing

The device type (resident or visitor), time of the year, time period during the day, day of the week, and trip purpose are all segments that AirSage data can be divided into. Every additional data segmentation increases the cost of procuring the data. Thus, agencies must pay close attention to the level of segmentation they need for comparing their model outputs to the AirSage data or for using this data as an input to their travel models.

Other segmentations that are missing from the AirSage data but equally important for developing or validating a travel model include travel mode, auto occupancy, and vehicle classification. Thus, while the AirSage dataset can be useful in transportation planning and model development, it cannot be treated as a wholesale replacement for any particular traditional data collection activity.

With these strengths and weaknesses in mind, the rest of the report will focus on comparing the AirSage data to the SMTC travel model outputs.

AIRSAGE DATA COMPARISON TO TRAVEL MODEL OUTPUTS

This section focuses on the various dimensions across which AirSage data was compared to the SMTC four-step travel model outputs. The AirSage data for SMTC was collected in October 2013 for an average weekday. The data was further segmented by three time periods (AM peak, PM peak and off peak) and three trip types (HBW, HBO and NHB). Comparisons across these dimensions are presented in the sections below.

Comparison at Aggregate Levels by Trip Purposes

FIGURE 1 presents the comparison of town-to-town flows in Syracuse, with the AirSage trips on one axis and the Model trips on the other. When comparing total town-to-town trips, the correlation between the two data sources is very strong (0.95), but the comparison is slightly less robust when trips are separated by purpose. When similar charts are prepared for TAZ-to-TAZ flows, the comparison also looks worse; thus it is concluded that AirSage data is only well suited for comparison with travel models at aggregate levels.
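The aggregation effect can be reproduced with a small Python sketch that sums zone-to-zone flows to town-to-town flows and computes the Pearson correlation at both levels. The zone numbers, town labels and flows below are constructed for illustration so that zone-level errors offset each other within each town pair; they are not SMTC or AirSage data, and the helper names are assumptions.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def aggregate(flows, zone_to_town):
    """Sum (origin zone, destination zone) flows to (origin town, destination town)."""
    out = defaultdict(float)
    for (o, d), trips in flows.items():
        out[(zone_to_town[o], zone_to_town[d])] += trips
    return dict(out)


def flow_correlation(a, b):
    """Correlate two flow tables over the union of their O-D pairs (missing = 0)."""
    pairs = sorted(set(a) | set(b))
    return pearson([a.get(p, 0.0) for p in pairs], [b.get(p, 0.0) for p in pairs])


zone_to_town = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C"}
airsage = {(1, 3): 10, (2, 3): 30, (3, 5): 5, (4, 5): 25, (5, 1): 50, (5, 2): 10}
model = {(1, 3): 30, (2, 3): 10, (3, 5): 20, (4, 5): 10, (5, 1): 40, (5, 2): 20}

r_taz = flow_correlation(airsage, model)
r_town = flow_correlation(aggregate(airsage, zone_to_town), aggregate(model, zone_to_town))
print(f"zone level r = {r_taz:.2f}, town level r = {r_town:.2f}")
```

In this constructed example the town-level totals agree exactly while the zone-level flows are only weakly correlated, which mirrors the pattern reported above in an exaggerated form.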

FIGURE 1 Town to Town Flows in Syracuse – Total and by Trip Purpose

TABLE 1 presents the breakdown of the AirSage and Travel Model trips by purpose. Total AirSage trips are 9% lower than in the model. AirSage has 53% more home based work (HBW) trips than the model, while its NHB trips are 48% lower, which was in contrast to some of the initial assumptions made about this comparison.
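The percent differences quoted above are measured relative to the model. The short sketch below recomputes them, together with the purpose shares, from the trip totals reported in TABLE 1; the function and variable names are illustrative.

```python
# Trip totals by purpose as reported in TABLE 1.
airsage = {"HBW": 484, "HBO": 1061, "NHB": 360}
model = {"HBW": 316, "HBO": 1084, "NHB": 694}


def pct_diff(airsage_trips, model_trips):
    """Percent difference of AirSage relative to the model."""
    return 100.0 * (airsage_trips - model_trips) / model_trips


def shares(trips_by_purpose):
    """Percent share of each purpose in the total."""
    total = sum(trips_by_purpose.values())
    return {p: 100.0 * t / total for p, t in trips_by_purpose.items()}


for purpose in airsage:
    print(purpose, round(pct_diff(airsage[purpose], model[purpose])))
print("TOTAL", round(pct_diff(sum(airsage.values()), sum(model.values()))))
print({p: round(s) for p, s in shares(airsage).items()})
print({p: round(s) for p, s in shares(model).items()})
```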

TABLE 1 AirSage Trips vs Model Trips

Purpose   AirSage Trips   AirSage Percent   Model Trips   Model Percent   Percent Diff
HBW                 484               25%           316             15%            53%
HBO               1,061               56%         1,084             52%            -2%
NHB                 360               19%           694             33%           -48%
TOTAL             1,905              100%         2,094            100%            -9%

A logical way to judge either data source was to compare the results with established industry standards and with other areas where AirSage has provided cellphone O-D data. FIGURE 2 presents this comparison with the National Cooperative Highway Research Program (NCHRP), the National Capital Region Transportation Planning Board (TPB) (1) model and the Mobile Area Transportation Study (MATS). AirSage appears capable of identifying Home-Based Other (HBO) trip purposes but less capable of differentiating between HBW and NHB trip purposes.

FIGURE 2 Comparing Trip Purpose Percent Shares

An interesting check on the HBW trip destinations can be made by plotting the total HBW trips with destinations in a zone against the total employment in that zone. Ideally, one would expect a linear correlation between the two, since employment is the only attraction for HBW trips to a zone.

In FIGURE 3, the left side shows this plot for the travel model, with a correlation of 0.84 between the two. The same plot using the AirSage data shows that work trip destinations are not correlated with zone-level employment, and there is no general trend of increasing HBW destinations with an increase in employment. This contradicts a well-established and expected relationship and does not lend any support to the trip purpose information in the AirSage data.

FIGURE 3 HBW Destinations vs Employment

Comparison for Special Zones of Interest

This study also looked at the comparison between the two datasets for one special area: Syracuse University (SU). SMTC provided additional information about the university, stating that at the time of the AirSage data collection effort, SU had approximately 20,000 students and 5,000 staff members. TABLE 2 presents the comparison of AirSage and Model trips by purpose for this zone. It is apparent that, potentially due to its trip-defining thresholds and clustering algorithms, AirSage is underreporting trips in this zone by a substantial amount.

TABLE 2 AirSage Trips vs Model Trips for Syracuse University

Purpose   AirSage Trips   Model Trips
HBW               9,300        15,300
HBO               9,400        25,000
NHB               3,900        25,500
TOTAL            22,600        65,800

Comparison for Time of Day Partitions

Time of day patterns are considered to be one of the strongest attributes of cellphone location data for use in travel models. The comparison between the model and AirSage time of day partitions showed a lot of similarity between the two. When making this comparison, agencies should pay attention to how each of the time periods is defined in the model and in AirSage. The results are shown in FIGURE 4.

FIGURE 4 Time of Day Distributions

Comparison of Trip Length Frequency Distribution

A trip length frequency distribution (TLFD) comparison between the model and AirSage datasets was also performed for all trips and by trip purpose. While the TLFD for total trips compared well with the model (FIGURE 5), the comparisons by trip purpose did not look promising. As a test, a combined TLFD chart of HBW and NHB trips was prepared and is shown below, further strengthening the idea that AirSage had issues separating the two trip purposes in this area.

FIGURE 5 TLFD Comparison
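A TLFD comparison of this kind can also be summarized numerically. The sketch below bins trip lengths into shares and computes a coincidence ratio (the overlap of two distributions divided by their union). The coincidence ratio is a measure commonly used for TLFD comparisons in travel modeling; it is offered here as an illustration and is not stated to be the measure used in this study. The bin width, number of bins and sample trip lengths are hypothetical.

```python
def tlfd(lengths_miles, bin_width=1.0, n_bins=4):
    """Share of trips in each length bin; the last bin collects all longer trips.

    Assumes a non-empty list of non-negative trip lengths.
    """
    counts = [0] * n_bins
    for length in lengths_miles:
        idx = min(int(length // bin_width), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def coincidence_ratio(p, q):
    """Sum of bin-wise minima over sum of bin-wise maxima (1.0 = identical)."""
    overlap = sum(min(a, b) for a, b in zip(p, q))
    union = sum(max(a, b) for a, b in zip(p, q))
    return overlap / union


# Made-up trip lengths: the model includes a sub-mile trip, the cellphone
# sample does not (as a minimum distance threshold would imply).
model_lengths = [0.5, 1.5, 2.5, 2.5]
airsage_lengths = [1.5, 2.5, 3.5, 3.5]

model_tlfd = tlfd(model_lengths)
airsage_tlfd = tlfd(airsage_lengths)
cr = coincidence_ratio(model_tlfd, airsage_tlfd)
print(model_tlfd, airsage_tlfd, round(cr, 3))
```

A single summary number like this makes it easier to compare the total-trip TLFD with the purpose-specific ones, although it should be read alongside the charts rather than in place of them.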

Important Note on Performing Select Link Analysis

SMTC also purchased transient or select link data in order to test the effectiveness of using AirSage data for such analysis. The comparison of the select link analysis results from the model with the AirSage data was not promising, for several important reasons described here.

What makes the AirSage data poorly suited for select link analysis is the way cellular device locations are collected by AirSage. AirSage collects cellphone locations only when the device interacts with the cellular network, i.e. at the start/end of a call, when sending a text message, or during data transfer or internet access. If the device is simply on but not interacting with the network in the abovementioned ways, the location of the device cannot be determined by AirSage. For example, in FIGURE 6, where each dot represents a network “ping”, Device 1 will be registered by AirSage but Device 2 will not, even though both are travelling on the same link with cellphones. Thus, for short segments, it is possible that the cellular device never interacts with the network while travelling through those segments and therefore never registers its location as being on the select link.

Another source of error in select link analysis of short segments is that AirSage will also register cellular devices that interact with the network while travelling on nearby links. There is no good way of filtering out these unwanted devices/trips unless only devices that register their location multiple times on the select link are used in the final dataset. Thus, it is not advisable to use the AirSage data for select link analysis with short segments.

If agencies do decide to do so for short segments, it is suggested that they study the location of cellular network towers and their coverage and pick a link long enough to pass through the coverage areas of at least two cell towers. This is because AirSage can also register a cellular device’s location when it changes cell towers, and this will ensure that most vehicles travelling on the select link are registered.

FIGURE 6 AirSage Registered Devices

CONCLUSION

Comparing the Syracuse Metropolitan Transportation Council (SMTC) four-step travel demand model with the AirSage data yielded several important results and conclusions. These are presented below:

1. Aggregation is good; disaggregation is bad: Looking at zone-to-zone flows vs. town-to-town flows, it was evident that comparison of AirSage and travel model outputs is best done at aggregate levels. There are several sources of error in both at the zone level that can make the comparison appear worse. Since it is also more expensive for agencies to collect the data at a disaggregate level, agencies should think about the lowest level of resolution their stakeholders would be happy with and develop zones accordingly.

2. Get creative with external zone boundaries: If external zones are too small, external trips might not register with AirSage because the cellphone never interacts with the network while travelling through that zone. On the other hand, if the external zones are too big, and especially if they include mid to large towns or cities, they might register trips that are completely unrelated to the study area. It is thus important for agencies to be creative with, and aware of, the external zone boundaries.

3. Think about what trip purposes you really need and why: In theory, AirSage should identify HBW and HBO trips well because of its clustering algorithm and the preparation done in advance of data collection. However, there are cases where a substantial population of shift workers or students would throw this clustering algorithm off. The cellphone devices do not come with any personal or identifiable information, and thus it would be extremely difficult for AirSage to account for these trips appropriately. Agencies should therefore be aware of these groups in their study areas and assess whether they truly need the trip purpose information and are willing to live with these known issues.

4. Select link analysis should only be done on long links and with care: As discussed earlier, AirSage collects cellphone locations only when the device interacts with the cellular network, i.e. at the start/end of a call, when sending a text message, or during data transfer or internet access. Thus, if this dataset is to be used for select link analysis, the link should be substantially long and preferably isolated.


REFERENCES

1. Milone, Ronald. Initial analysis of AirSage O-D cellular data for the TPB modeled area. July 2014. www.mwcog.org/uploads/committee-documents/ZV1YW1Zc20140718142637.pdf. Accessed January 5, 2015.

