
Constructing an Automated Bus Origin-Destination Matrix Using Farecard and Global Positioning System...

Constructing an Automated Bus Origin–Destination Matrix Using Farecard and Global Positioning System Data in São Paulo, Brazil

Janine M. Farzin

New York City Transit, 2 Broadway, D25.111, New York, NY 10004. jfarzin@alum.mit.edu.

Transportation Research Record: Journal of the Transportation Research Board, No. 2072, Transportation Research Board of the National Academies, Washington, D.C., 2008, pp. 30–37. DOI: 10.3141/2072-04

This paper outlines the process used to create an origin–destination matrix (ODM) in São Paulo, Brazil, with data available from automated data collection (ADC) systems. Prior work to develop ODMs using ADC systems is reviewed; however, the São Paulo case differs substantially from these. The approach used in this paper addresses a more complex bus network than has been approached before, uses a platform for integrating more data than were available in previous applications, and applies raw global positioning system–coordinate data to determine the location of buses for assigning an origin zone. Previously documented destination-inference techniques are used to assign destinations to each trip. Overall, the ODM development process uses three data sources: bus stops, automatic vehicle location data, and automated fare collection data. The results of the electronically generated ODM are analyzed and compared with results from prior household surveys. This analysis suggests that the larger sample size available from electronic farecard records enables more comprehensive, detailed ODMs than those generated from traditional household survey data.

Large sets of electronically generated travel data, such as those from electronic farecards or Global Positioning System (GPS) transmissions, can serve as inputs to the transportation planning process. For example, one widely used transportation-planning tool is an origin–destination matrix (ODM), which quantifies transport demand between geographic regions in a city during a particular period of time. Since the ODM quantifies passenger use in the system, automated fare collection (AFC) data can be valuable for creating improved ODMs with more detailed passenger trip information than a traditional survey provides. However, using these data requires development of new technical tools, as shown in this paper.

This paper first provides background information on the transport system in the city of São Paulo, Brazil, and describes recent applications of automated ODMs in large cities. Next, it discusses the steps used to create an ODM for the public bus transit system in São Paulo, outlines the development process, and provides detailed information about specific steps. Finally, some results from a new, automated 2006 ODM are reviewed and compared with the ODM performed in 1997, which was based on traditional household surveys.

SÃO PAULO

Nearly 11 million people live in São Paulo, the largest city in South America (1). However, the city has more than 60% more daily trips on its public transit system (11.5 million in 2002) than does New York City (currently 7 million), which has a comparable urban population (2, 3). In São Paulo, the compact metro system carries 1.5 million passengers per day, and the public bus system, the backbone of the city’s transportation infrastructure, carries nearly 10 million passengers per day (2).

Because of increasing road-based congestion in the metropolitan region, São Paulo’s city government made the bus system an investment priority starting in 2001. Since then, the government has addressed informal transit providers (that is, owner-operators of minibuses without fixed routes, schedules, or fares) and integrated nearly all of them into the formal, public operating system. In addition, they renovated the fleet and made infrastructure investments in several terminals and bus rapid transit corridors. Finally, they adopted new technologies such as AFC and automatic vehicle location (AVL) systems, which are contributing to improved operations and customer service. These new technologies were originally implemented to help with accounting and oversight functions, respectively, but the data available also present an opportunity for improved planning.

ODMs

The ODM is an input to the traditional transportation planning process and gives the number of trips between geographic zones in a region. In São Paulo, the geographic scale of “zone” was adopted to develop the matrix. Currently, there are 390 zones in the metropolitan area, 271 of which lie within the city limits. In the urban core, a zone is about the size of a neighborhood; that is, between eight and 15 city blocks across in each direction. Peripheral zones are substantially larger than zones in the urban core, as fewer inhabitants occupy the outlying areas. Values within the ODM represent the number of passengers traveling from one zone to another.

The ODM is important because it quantifies demand geographically and temporally for transport services, providing information for service planning and operations analysis. For example, in strategic planning applications, the number of passengers traveling in a future study year can be estimated in the ODM. The ODM can then be used in a transportation planning model to determine if the infrastructure in future years serves the estimated demand. In addition, if data are available to create more frequent ODMs for a shorter-term planning horizon, information about user patterns could influence fleet composition, service frequency, or routes.
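As a minimal sketch of the structure described in this section, an ODM can be held as a count of trips for each pair of origin and destination zones. The function name `build_odm` and the zone numbers are hypothetical and serve only as illustration; they are not taken from the São Paulo application.

```python
from collections import Counter


def build_odm(trips):
    """Count trips for each (origin_zone, destination_zone) pair.

    `trips` is an iterable of (origin_zone, destination_zone) tuples.
    """
    return Counter((origin, destination) for origin, destination in trips)


# Hypothetical zone numbers, for illustration only.
trips = [(7, 41), (7, 41), (7, 42), (41, 7)]
odm = build_odm(trips)
print(odm[(7, 41)])  # number of trips recorded from zone 7 to zone 41
```

A pair that never occurs simply reads as zero, which matches an empty cell of the matrix.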

Traditionally, the data used to create an ODM have been collected by sampling households and documenting each trip made by each member of the household on the prior day. These household surveys are then statistically expanded to capture all trips in the entire urban area, allowing for analysis by zone or across the entire region, or both. The survey results typically show travel demand patterns across all modes and provide urban demographic information (related to household composition and income). Because household surveys are expensive to perform, they are not carried out frequently and usually collect only the minimum amount of data necessary to fill all of the cells of an ODM. In São Paulo, a regionwide origin–destination (O-D) survey has been performed every 10 years since 1967, jointly sponsored by the state-run Metrô, the city-run São Paulo Transporte S/A bus agency, and the Municipal Transport Secretary. The consistency and scope of these surveys have provided a wealth of information for the region, especially about general trends since 1967. Nonetheless, in a developing city, the regional landscape can change significantly over a 10-year period, and between surveys, planners are left without up-to-date information reflecting travel patterns.

CREATING A NEW ODM

While transit agencies are implementing new automated data collection (ADC) systems to reduce costs or standardize recordkeeping, the new data can be applied to create planning tools such as electronically generated ODMs. This method has several advantages when compared with the traditional survey-based method of generating ODMs (4). Existing electronic data have a low marginal cost because they are typically procured for other applications in the agency, although they are usually only available in a raw format. In addition, in the case of AFC data, the transit ridership sample is larger and more frequently available than that from a survey, especially compared with large multimodal surveys in which the composition of transit ridership may be low. An ODM built from ADC-systems data can complement the urban demographic information provided from surveys. In addition, data from each of the collection methods can be used to validate or test the other.

With such benefits, the interest in creating ODMs from ADC systems has grown in the past several decades. The first instance of using farecard-based data to create O-D patterns was at Bay Area Rapid Transit (BART) in the 1980s. Buneman used rail-only, closed-system AFC data for performance measurement (“closed system” means that a swipe is needed to both enter and exit the system, so O-Ds are directly available) (5). Next, both the Chicago Transit Authority (CTA) in Illinois and New York City Transit (NYCT) developed applications for creating ODMs using models that would accommodate an open-system structure (6–8). In an open system, in which passengers do not swipe their card upon exiting the system, both CTA and NYCT inferred passenger destinations from algorithms based on the origins of subsequent transit trips taken during the same day. CTA used some bus transfer information to improve the rail-only matrix, but it did not include an estimation of bus O-D patterns (9).

Both the CTA and NYCT ODMs were initially rail-based; however, each organization has continued to develop its capabilities to create ODMs for bus trips as well. The CTA is now using reliable AVL data (based on a combination of GPS, odometer, and gyroscope readings) to assign origin locations to farecard swipes (from the AFC system) for the ODM. Cui published an application of this method, creating an ODM for a single bus route in Chicago (4). A system-level application is currently in development. Without AVL data, NYCT is currently using schedule data for inferring bus location upon passenger entry, according to operations planning staff familiar with the project. In addition, NYCT is working to overcome aggregated data collection methods, which provide AFC records at only 6-min intervals.

SÃO PAULO’S ODM APPLICATION

As noted, the documented methods to create rail-based ODMs are being expanded to include bus trips in several cities. However, in each case, the available resources and transport network create unique challenges. Similar to bus ODM development in New York and Chicago done over the past year, a working application has been developed simultaneously in São Paulo and is presented here. As mentioned above, the bus network is the backbone of transit movements in São Paulo, and the volume of daily passenger movements and GPS transmissions necessitates stronger data management tools than those used in other projects. The AFC data are stored in an Oracle 9i database, and AVL data are stored in a Microsoft SQL Server; however, necessary AVL data are transferred to Oracle for developing the automated planning application, which is written using PL/SQL, Oracle’s development language. In addition, the reliance on AVL data in the format of raw GPS coordinates to determine bus location during boarding has not been addressed before. More details about each development step are provided below.

Three primary data sources are combined to develop the bus ODM in São Paulo: onboard AVL transmissions, geographic bus stop location data, and AFC electronic farecard entry data. The AVL and AFC data are becoming more valuable as their use increases. AVL equipment is installed on one-tenth of the fleet (in a concentrated area of the city); however, implementation over the entire fleet is expected by mid-2008. In addition, since the introduction of the electronic farecard in 2004, the share of passengers using the card has steadily risen. On the day sampled for ODM development described in this paper, more than 75% of passengers used an electronic farecard for boarding. (The card is a contactless smartcard made of sturdy plastic and could be used for years without replacement.) The major steps behind the construction of the ODM and combining these data sets are presented in Figure 1 and described in detail below. First, each of the three major data sets is cleaned and minimized to retain only the necessary fields included in the figure.

The values of the final field, “zone,” on the AFC farecard use table are assigned to farecard entries by joining the three data sets, as shown in Figure 1. First, the latitude and longitude of bus stops and GPS transmissions are joined to determine the closest bus stop to each GPS transmission. Second, based on the time a passenger entered a particular bus, the closest bus stop is associated with each passenger. Through these database joins, the zone where each customer swiped his or her farecard is determined. Third, using inference techniques developed in Chicago, the destinations of each known origin are determined (7). Because AVL installation is limited, the final step, a system-level expansion, is not yet complete, but it is noted in the figure as a necessary step for citywide ODM completion. Each of the steps noted here is discussed in more detail below.

ODM Development Step 1. Joining AVL Data to Specific Bus Stops

Several subprocesses support each of the primary steps noted above. For the first step, to join AVL data with the bus stops, each data set is prepared independently, with considerable attention spent on the GPS transmissions.

Cleaning Data Sets

First, to create the bus stop table, bus stops are joined geographically to zones in the city using a spatial join in traditional GIS software. Once combined, this table is imported into Oracle, where the application is developed.

Next, to prepare the AVL table, a trip column is added so that duplicate GPS transmissions can be removed. Repeat transmissions or bouncing signals cause duplicate transmissions, and depending on which columns of data are affected, these entries are eliminated from the AVL table. About 1,400 buses have functioning GPS units; 1 day of AVL data currently contains 3.3 million transmission records. After removing duplicate entries from the table, 2.2 million records remain. The three types of duplicate records found in the AVL data are shown in Table 1.

Note that in excluding the duplicate transmissions for a bus in the same place over a period of time (case 3 from Table 1), some discretion is necessary. If the transmissions are sequential, then only one record should be retained; however, if a bus coincidentally transmits from the same location on a different trip, then both records are valid. To perform this differentiation, the new “trip” column is added to the AVL data set. However, because the table has millions of records, the update is performed algorithmically, as follows:

1. Get the GPS transmission table with all available fields.
2. Put the transmissions in ascending order according to
   – Vehicle number,
   – Line number, and
   – Transmission time.
3. For each record, check the values as follows and assign a current trip number.
   – If the vehicle or line changes, then set the current trip number to one.
   – Or if the direction changes, then set the current trip number to the prior trip number plus one.
   – Or assign the current trip number equal to the most recent trip number.
   – Update the trip number column value with the current trip number.
   – Go to next record; repeat.
4. Save table and exit.
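The numbered steps above can be sketched in Python as follows. This is an illustrative translation only, not the PL/SQL used in the São Paulo application; the dictionary keys (`vehicle`, `line`, `direction`, `time`) and the sample values are assumptions made for the example.

```python
from operator import itemgetter


def assign_trip_numbers(records):
    """Return the records sorted by vehicle, line, and time, each with a 'trip' number.

    Each record is a dict with the (hypothetical) keys
    'vehicle', 'line', 'direction', and 'time'.
    """
    ordered = sorted(records, key=itemgetter("vehicle", "line", "time"))
    result = []
    prev = None
    trip = 0
    for rec in ordered:
        if prev is None or rec["vehicle"] != prev["vehicle"] or rec["line"] != prev["line"]:
            trip = 1   # new vehicle or line: restart numbering
        elif rec["direction"] != prev["direction"]:
            trip += 1  # direction change: next trip
        # otherwise keep the most recent trip number
        result.append({**rec, "trip": trip})
        prev = rec
    return result


# Hypothetical sample records, for illustration only.
records = [
    {"vehicle": 11006, "line": "A", "direction": 1, "time": "14:10:00"},
    {"vehicle": 11006, "line": "A", "direction": 1, "time": "14:02:00"},
    {"vehicle": 11006, "line": "A", "direction": 2, "time": "14:30:00"},
    {"vehicle": 11007, "line": "A", "direction": 2, "time": "07:23:13"},
]
numbered = assign_trip_numbers(records)
```

Because the records are sorted first, a single pass is enough, which is what makes the update practical on a table with millions of rows.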

Joining Data

Finally, the prepared bus stop table and cleaned AVL table are joined using latitude and longitude coordinates. While this join is geographic in nature, the data sets have too many records for acceptable performance with available GIS software. Instead, an algorithm is developed in Oracle that simulates the geographic nature of the join by matching an AVL record to a bus stop that is within 100 to 115 m.

During this process, if one AVL record or GPS transmission joins with two or more stops (that are very close), assume that is acceptable because the adjacent stops will likely be in the same zone (the level of detail used throughout the analysis). If two or more GPS transmissions from the same vehicle and trip attempt to join with one bus stop, prevent this from happening by first selecting the closest transmission; if there is more than one, select the earliest one. Finally, if no GPS transmission joins with a bus stop, that is acceptable; assume the bus did not make that stop on a particular trip. Figure 2 geographically illustrates how the PL/SQL script joins AVL records (GPS transmissions) with bus stops.
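The join rules in the preceding paragraph can be sketched as below. This is a naive illustration that compares every transmission with every stop, not the Oracle algorithm itself; the field names are assumptions, and the buffer uses the 0.001° combined latitude and longitude difference that the paper gives for the Oracle join.

```python
def join_transmissions_to_stops(transmissions, stops, buffer_deg=0.001):
    """Match GPS transmissions to bus stops inside a small buffer.

    transmissions: dicts with (hypothetical) keys 'vehicle', 'trip', 'time', 'lat', 'lon'
    stops:         dicts with (hypothetical) keys 'stop', 'lat', 'lon', 'zone'
    Returns {(vehicle, trip, stop): {'time': ..., 'zone': ...}}.
    """
    best = {}
    for t in transmissions:
        for s in stops:
            d = abs(t["lat"] - s["lat"]) + abs(t["lon"] - s["lon"])
            if d > buffer_deg:
                continue  # outside the buffer: no join
            key = (t["vehicle"], t["trip"], s["stop"])
            candidate = (d, t["time"])  # closest first, then earliest
            if key not in best or candidate < best[key][0]:
                best[key] = (candidate, t, s["zone"])
    return {key: {"time": t["time"], "zone": zone} for key, (_, t, zone) in best.items()}


# Hypothetical coordinates and times (seconds), for illustration only.
stops = [
    {"stop": "S1", "lat": -23.5000, "lon": -46.6000, "zone": 7},
    {"stop": "S2", "lat": -23.5100, "lon": -46.6000, "zone": 41},
]
transmissions = [
    {"vehicle": 11006, "trip": 4, "time": 100, "lat": -23.5004, "lon": -46.6002},
    {"vehicle": 11006, "trip": 4, "time": 160, "lat": -23.5001, "lon": -46.6001},
    {"vehicle": 11006, "trip": 4, "time": 120, "lat": -23.5001, "lon": -46.6001},
    {"vehicle": 11006, "trip": 4, "time": 400, "lat": -23.5050, "lon": -46.6000},
]
joined = join_transmissions_to_stops(transmissions, stops)
```

A transmission may appear under more than one stop, and a stop with no transmission in range simply has no entry, mirroring the two cases the text treats as acceptable.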

FIGURE 1 Data and steps used to construct São Paulo’s ODM. [Figure elements: (AVL) GPS transmissions table (bus, trip number, time, lat/lon); bus stops table (stop, lat/lon, zone); (AFC) farecard use table (card ID, bus, time, zone); numbered steps 1, 2, and 3; destination inference algorithm; system-level ridership expansion.]

TABLE 1 Duplicate Entries Originally Found in AVL GPS Transmissions Table

Case 1
Columns with duplicate data: Vehicle, line, direction, time, latitude and longitude
Corresponding physical situation: Repeat transmission
Number of duplicates: 74,000
Solution: Exclude duplicates so that one original record remains in data set.

Case 2
Columns with duplicate data: Vehicle, line, direction, time (different latitude and longitude positions)
Corresponding physical situation: Signal may be bouncing off of buildings, sending multiple transmissions from multiple latitude and longitude positions
Number of duplicates: 125,000
Solution: Ignore. Only the event closest to a bus stop will be retained, and it significantly increases complexity to discern which transmission is correct.

Case 3
Columns with duplicate data: Vehicle, line, direction, latitude and longitude (different, sequential transmission time)
Corresponding physical situation: Bus sitting in one location over time but continuing to send transmissions
Number of duplicates: 1.01 million
Solution: Only keep the earliest time entry. Delete other records from same vehicle and location.
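One possible way to apply the Table 1 solutions in a single pass is sketched below. It assumes the trip column has already been added and that "sequential" means consecutive records once the table is sorted; the field names and sample values are hypothetical, and this is not the procedure documented for the São Paulo application.

```python
def remove_duplicates(rows):
    """Drop case 1 and case 3 duplicates; leave case 2 records untouched.

    rows: dicts with (hypothetical) keys
    'vehicle', 'line', 'direction', 'trip', 'time', 'lat', 'lon'.
    """
    ordered = sorted(rows, key=lambda r: (r["vehicle"], r["line"], r["time"]))
    kept = []
    prev_key = None
    for r in ordered:
        key = (r["vehicle"], r["line"], r["direction"], r["trip"], r["lat"], r["lon"])
        # A record that repeats the previous record's position on the same trip
        # is either an exact repeat (case 1) or a dwelling bus (case 3).
        if key != prev_key:
            kept.append(r)
        prev_key = key
    return kept


# Hypothetical sample rows, for illustration only.
base = {"vehicle": 11006, "line": "A", "direction": 1, "trip": 1, "lat": -23.5, "lon": -46.6}
rows = [
    {**base, "time": 100},                               # original record
    {**base, "time": 100},                               # case 1: exact repeat
    {**base, "time": 130},                               # case 3: bus still in place
    {**base, "time": 160, "lat": -23.501},               # bus has moved
    {**base, "time": 900, "direction": 2, "trip": 2},    # same place, different trip
]
cleaned = remove_duplicates(rows)
```

Including the trip number in the comparison key is what keeps the last record, which shares a location with the first but belongs to a different trip.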


To create the geographic buffer in Oracle, use the following formula:

$$\left( \lvert \text{t.lat} - \text{s.lat} \rvert + \lvert \text{t.lon} - \text{s.lon} \rvert \right) \le 0.001$$

where

t.lat = latitude of the transmission,
s.lat = latitude of the bus stop,
t.lon = longitude of the transmission, and
s.lon = longitude of the bus stop.


Here, 0.001 represents 0.001° of combined latitude or longitude. In São Paulo, at approximately 23.5° latitude, 1° of longitude equals 101.8 km (63.3 mi) and 1° of latitude equals 111.4 km (69.2 mi). Therefore, 0.001° of longitude corresponds with 101.8 m (334 ft), and 0.001° of latitude corresponds with 111.4 m (365 ft). When the difference between latitudes and longitudes (at 23.5° latitude) is calculated, if the combined difference is no more than 0.001° of latitude or longitude, then the distance on the ground is always less than or equal to 111.4 m (365 ft). Large bus platforms of about 100 m (328 ft) in length exist in São Paulo, so this assumption represents a reasonable distance for capturing buses at a bus stop.
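The bound stated above can be checked numerically. The sketch below uses the paper's conversion figures (111.4 m and 101.8 m per 0.001° of latitude and longitude) and a flat-plane approximation, which is an assumption that is reasonable only at these short distances; the function names are hypothetical.

```python
import math

M_PER_DEG_LAT = 111_400.0  # metres per degree of latitude (figure from the text)
M_PER_DEG_LON = 101_800.0  # metres per degree of longitude near 23.5° (figure from the text)


def within_buffer(t_lat, t_lon, s_lat, s_lon, limit_deg=0.001):
    """Buffer test: combined absolute latitude and longitude difference."""
    return abs(t_lat - s_lat) + abs(t_lon - s_lon) <= limit_deg


def ground_distance_m(dlat_deg, dlon_deg):
    """Approximate ground distance for small offsets (flat-plane assumption)."""
    return math.hypot(dlat_deg * M_PER_DEG_LAT, dlon_deg * M_PER_DEG_LON)


def max_distance_on_buffer_edge(limit_deg=0.001, steps=1000):
    """Largest ground distance among points whose combined offset equals the limit."""
    return max(
        ground_distance_m(limit_deg * i / steps, limit_deg * (steps - i) / steps)
        for i in range(steps + 1)
    )


print(round(max_distance_on_buffer_edge(), 1))
```

The largest distance occurs when the whole 0.001° is spent on latitude, which is why the text quotes the latitude figure as the upper bound; an offset entirely in longitude reaches only about 101.8 m.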

ODM Development Step 2. Join GPS Transmission Zone Table with Specific Farecard Entries

Now that each AVL record is affiliated with a particular bus stop, the stop also has a zone assignment associated with it. By associating a passenger entry (or an AFC record) with an AVL record (according to bus number and time of day), the zone that passengers are in when they swipe their card on the bus is available. This second step (from Figure 1) of joining AFC with AVL data is a critical component in developing the ODM and is discussed below.

Cleaning Data Sets

Similar to what is done in Step 1, both tables are first cleaned and minimized before joining the data sets, reducing the number of columns and records that the computer has to process and increasing efficiency. Because there are about 10 million farecard entries per day on the bus system, and a complete installation of GPS on the bus fleet will produce millions of GPS transmissions per day, eliminating excess data can have a significant impact on processing time.

First, from the AVL–bus stop combined table, several excess columns can be eliminated. The only columns to keep are

• Vehicle,
• Time and date of transmission,
• Trip number, and
• Zone.

At this level of aggregation, eliminating the specific bus stop information and retaining only the zone association leaves several excess records in the table. At this point, only records for which a bus on a particular trip enters a new zone are kept. If a bus is driving in a zone and passes several stops in that zone, all but the first record can be eliminated. This record reduction is illustrated in Figure 3.
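The record reduction can be sketched as a single pass that keeps a row only when the vehicle, trip, or zone differs from the previous row. The function name is hypothetical, the rows are assumed to be sorted by vehicle and time already, and the sample rows are modeled on the values shown in Figure 3.

```python
def reduce_to_zone_entries(rows):
    """Keep only the first record each time a bus on a trip enters a new zone.

    rows: (vehicle, time, trip, zone) tuples, already sorted by vehicle and time.
    """
    kept = []
    prev_key = None
    for vehicle, time, trip, zone in rows:
        key = (vehicle, trip, zone)
        if key != prev_key:  # new vehicle, new trip, or new zone
            kept.append((vehicle, time, trip, zone))
        prev_key = key
    return kept


# Sample rows modeled on Figure 3.
rows = [
    (11006, "14:02:00", 3, 7),
    (11006, "14:08:00", 4, 7),
    (11006, "14:10:00", 4, 7),
    (11006, "14:14:00", 4, 41),
    (11006, "14:20:00", 4, 42),
    (11006, "14:23:00", 4, 42),
    (11006, "14:24:00", 4, 42),
    (11007, "07:23:13", 1, 121),
]
reduced = reduce_to_zone_entries(rows)
```

Comparing only with the immediately preceding row means that a bus re-entering a zone later in the same trip still produces a new record.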

FIGURE 2 Geographic illustration of joining GPS transmissions with bus stops. [Legend: GPS transmission joined to bus stop; other GPS transmissions; bus stop; buffer area around bus stop; roads; zone boundary.]

AVL.Vehicle  AVL.Date/Time      AVL.Trip  BusStop.Zone
11006        9/5/2006 14:02:00  3         7
11006        9/5/2006 14:08:00  4         7
11006        9/5/2006 14:10:00  4         7
11006        9/5/2006 14:14:00  4         41
11006        9/5/2006 14:20:00  4         42
11006        9/5/2006 14:23:00  4         42
11006        9/5/2006 14:24:00  4         42
11007        9/5/2006 07:23:13  1         121
…            …                  …         …

Only retain table records for each: new zone, new vehicle, new trip.

FIGURE 3 Step 2 data preparation: reducing records in the AVL–bus stop table.


In addition, the AFC data are reduced. Once all excess columns are eliminated, only the following columns remain:

• Card ID,
• Bus ID,
• Transfer number, and
• Time and date.

Initially, the transfer number is used to find farecard swipes that represent trip starts (where the transfer number equals zero) and to eliminate excess records. Eventually, a more comprehensive ODM could utilize transfer information to see trip paths; however, at this time, only trip origins are used. About one-third of all system entries are transfers, and after eliminating these records, 6.3 million AFC records representing unique trip entries remain.

After this filter, the transfer number column is no longer necessary. Only the card ID, bus ID, and time columns are kept in the AFC table. Also, because this application was developed before all AVL installations were completed, only AFC data on buses with AVL equipment are needed. After eliminating unnecessary farecard entries, 658,000 AFC records remain in the table.

Joining Data

At this point, the AVL and AFC tables are ready to be joined. The approach for this join was developed in Chicago and is implemented here (9). For each farecard entry, the vehicle is identified for joining the AFC to the AVL data set; then, the most recent GPS transmission before the farecard swipe is matched. This AVL record will tell what zone the bus is traveling in when the farecard is swiped, and the respective zone will now be affiliated with the farecard entry. No zone is joined to a particular farecard swipe if more than 1 h has passed since the most recent AVL record for that vehicle. Because some buses with GPS were presumably not functioning during revenue service, 505,000 AFC records join with AVL data (and receive an origin zone assignment).
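The matching rule in this paragraph (most recent transmission for the same vehicle, no more than 1 h old) can be sketched with a sorted per-vehicle index and a binary search. The names and sample values are hypothetical, times are expressed as seconds after midnight for simplicity, and a transmission at exactly the swipe time is treated as eligible, which is an assumption the text does not settle.

```python
from bisect import bisect_right
from collections import defaultdict

MAX_GAP_S = 3600  # 1 h limit from the text


def build_zone_index(avl_rows):
    """avl_rows: (vehicle, time_s, zone). Returns vehicle -> (sorted times, zones)."""
    by_vehicle = defaultdict(list)
    for vehicle, time_s, zone in avl_rows:
        by_vehicle[vehicle].append((time_s, zone))
    index = {}
    for vehicle, entries in by_vehicle.items():
        entries.sort()
        index[vehicle] = ([t for t, _ in entries], [z for _, z in entries])
    return index


def origin_zone(index, vehicle, swipe_time_s, max_gap_s=MAX_GAP_S):
    """Zone of the latest transmission at or before the swipe, or None if no match."""
    if vehicle not in index:
        return None  # bus without AVL records
    times, zones = index[vehicle]
    pos = bisect_right(times, swipe_time_s) - 1
    if pos < 0 or swipe_time_s - times[pos] > max_gap_s:
        return None  # nothing before the swipe, or the last record is too old
    return zones[pos]


# Hypothetical records: 14:02, 14:08, and 14:14 expressed in seconds.
index = build_zone_index([(11006, 50520, 7), (11006, 50880, 41), (11006, 51240, 42)])
print(origin_zone(index, 11006, 50900))
```

Swipes that return `None` correspond to the farecard entries that receive no origin zone in the join described above.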

ODM Development Step 3. Destination Inference

As shown in Figure 1, after the AFC records have a zone associated with them, the trip destination is inferred. This step follows assumptions used in previous applications (6, 7), which are summarized here. Assume that

• The destination of a trip is in the same zone as the origin of the following trip and
• The last trip of the day always ends in the same zone as the origin of the first trip of the day (the home zone), regardless of how many trips were taken during the day.

An application was developed to automate the destination-inference process.
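The two assumptions can be sketched as follows. This is a minimal illustration, not the application referred to above; the tuple layout and sample values are hypothetical, and cards with a single swipe are given no destination here because neither assumption applies to them.

```python
from collections import defaultdict


def infer_destinations(swipes):
    """Infer a destination zone for each trip from the card's other trips that day.

    swipes: (card_id, time, origin_zone) tuples for one day.
    Returns (card_id, time, origin_zone, destination_zone or None) tuples.
    """
    by_card = defaultdict(list)
    for card_id, time, zone in swipes:
        by_card[card_id].append((time, zone))
    trips = []
    for card_id, entries in by_card.items():
        entries.sort()
        for i, (time, origin) in enumerate(entries):
            if len(entries) == 1:
                destination = None  # single trip: destination stays unknown
            else:
                # Next trip's origin; the last trip wraps around to the first origin.
                destination = entries[(i + 1) % len(entries)][1]
            trips.append((card_id, time, origin, destination))
    return trips


# Hypothetical swipes, for illustration only.
swipes = [("A", 800, 7), ("A", 1700, 42), ("B", 900, 41)]
trips = sorted(infer_destinations(swipes))
```

The modulo index handles both assumptions in one expression: every trip looks at the following swipe, and the last trip of the day looks back at the first.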

At this point, for the first time, an ODM can be created to summarize trips that have an O-D zone assignment. This matrix includes only O-Ds from cards with more than one trip taken during the course of the day. However, the sample can be expanded to include all daily entries, including passengers that entered the system only one time during the day.

The matrix expansion to account for trips with unknown destinations uses a simple formula. For each origin zone, the total trips to each destination zone are summarized, and trips with unknown destinations are allocated to each of the destination zones in the same proportion as the original data. After this expansion is complete, all farecard users with origin zone assignments are included and now have an inferred destination. The ODM has 505,000 records at completion. When complete data are available, the automated application is equipped to handle several million records simultaneously.
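The proportional allocation described above can be sketched as below. The function name, the dictionary layout, and the numbers are hypothetical; the result is kept as fractional trip counts, and origins that have unknown-destination trips but no known destinations are left unallocated, a case the text does not address.

```python
from collections import defaultdict


def expand_unknown_destinations(known, unknown_by_origin):
    """Spread unknown-destination trips across destinations in proportion to known trips.

    known:             {(origin, destination): trips with an inferred destination}
    unknown_by_origin: {origin: trips whose destination is unknown}
    Returns {(origin, destination): expanded (possibly fractional) trips}.
    """
    totals = defaultdict(float)
    for (origin, _), count in known.items():
        totals[origin] += count
    expanded = {}
    for (origin, destination), count in known.items():
        extra = unknown_by_origin.get(origin, 0) * count / totals[origin]
        expanded[(origin, destination)] = count + extra
    return expanded


# Hypothetical counts, for illustration only.
known = {(7, 41): 30, (7, 42): 10, (41, 7): 5}
unknown = {7: 8}
expanded = expand_unknown_destinations(known, unknown)
```

In this example, zone 7 has 40 known trips split 30 to 10, so its 8 unknown trips are split 6 to 2, and the total for origin zone 7 rises from 40 to 48.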

Note that when AVL equipment is installed over the entire fleet, and the records in the ODM are not geographically clustered (toward regions with AVL equipment), the current ODM can be expanded systemwide as well, accounting for all passengers using the system during the day. Passenger totals per line or bus trip can be used to validate a systemwide expansion. For the current ODM, local-level results are available; these are reviewed below.

Other ODM Development Notes

While final results are available, some outstanding issues from matrixdevelopment remain; their resolution could improve the existingmodel. Three issues are reviewed briefly here so that results can bebetter understood.

First, the geographic files of some bus lines have duplicate busstops on some lines. For example, the same stop may be included atthe start and finish of some routes or during the duration of the routeto allow a bus to circle through a neighborhood and return to a mainstreet and continue. These bus stop duplications (representing 0.4%of more than 84,000 bus stops) were deleted from bus line files toallow a seamless join to the AVL table in Step 1. At the aggregatezonal level, results are not affected by this omission of duplicatestops; however, if the scope of the ODM is narrowed to provide O-Ddata at a more refined level of detail, then this omission of some busstops would hinder results and need to be resolved.

Second, in São Paulo (unlike in the United States), passengers are notrequired to swipe their farecard and pass through an on-bus turnstileimmediately upon entering the bus. Some seats exist in the front, non-paid portion of the bus, and passengers may sit or stand momentarily(or for several minutes) before passing through the turnstile and enter-ing the paid onboard area. In this analysis, there are no assumptionsmade that the time when passengers swipe their farecards is analogousto the time that they entered the bus; this may not be true. However, anassumption is made that when aggregating O-D results to a zonal level,passengers will pass through the turnstile while the bus is still withinthe same zone as they originally boarded. However, no testing orstatistical validation has been performed to verify this assumption.

The third concern related to the ODM is that because the entire bus fleet does not yet have GPS (and GPS installation has been limited to one clustered geographic region), a systemwide sample of O-D volumes and patterns is not yet available. Therefore, a systemwide expansion to estimate all bus trips in the system has not yet been performed. The eventual expansion will be similar to the expansions performed for O-D household survey data, and well-documented methods can be used (4).

RESULTS

The ODM results are based on AVL and AFC data available from May 9, 2006. In total, the ODM reflects about 5% of all transit trips in São Paulo, more than 6% of bus entries (including transfers), and 8% of total bus trips (excluding transfers). Nonetheless, because most of these O-Ds are concentrated in one area of the city, a reasonable data set of the use patterns within that region is available.


For comparison with the 2006 ADC results, the 1997 regionwide O-D Household Survey results provided the most recent ODM data available. Both matrices are aggregated to the same zonal level, and the zone structure has not been significantly modified between the two analysis years. First, results are reviewed; next, additional considerations among the ODM sample years are discussed.

Analysis of 2006 Results and Comparison with 1997 ODM

First, total trips from the northwest area of São Paulo to destinations throughout the city are compared in Figure 4. In these maps, all of the trips that start in any zone inside the northwest corner of the city are summarized, and their respective destinations across the city illustrated. A darker shade is used to indicate zones where more trips terminate (the same color scale was used in both maps). In both 1997 and 2006, most of the trips that started in the northwest area also ended there. However, in the 1997 map, more trips to the downtown are visible (the higher density of trips is seen by comparing the darker central zones in 1997 to the lighter zones in 2006). This is likely caused by incomplete trip information in 2006; in that data set, trips on buses without AVL are not included, and transfer segments are not analyzed (as described in the next section). Finally, the total trips included in each analysis year are within 15%. Fewer trips are recorded in 2006, when higher volumes likely existed because of population growth and improved services, but this difference can be attributed to cash-paying passengers entering the system in 2006 who are not included in the 2006 ODM. (As mentioned, these passengers accounted for less than a quarter of ridership on May 9.) Overall, the differences between analysis years are expected because of the incomplete nature of the 2006 matrix; however, strong trip patterns in the northwest area are similar across analysis years. Despite having incomplete data available at this time, the approach for developing the 2006 matrix is reasonable based on trip totals for the northwest region and expected variation in trip patterns, as seen in the visual comparison of results offered in Figure 4.
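The area-level comparison described above can be sketched in Python as follows. This is an illustrative sketch only: the dictionary representation of an ODM, the function names, and the toy trip values are assumptions, not the data structures or figures used in the study.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical in-memory ODM: (origin_zone, destination_zone) -> trips
ODM = Dict[Tuple[int, int], float]


def destinations_from_area(odm: ODM, origin_zones: Iterable[int]) -> Dict[int, float]:
    """Sum trips by destination zone for all trips starting in the given origin zones."""
    origins = set(origin_zones)
    totals: Dict[int, float] = {}
    for (origin, destination), trips in odm.items():
        if origin in origins:
            totals[destination] = totals.get(destination, 0.0) + trips
    return totals


def relative_difference(reference: float, other: float) -> float:
    """Signed difference of `other` from `reference`, as a fraction of `reference`."""
    return (other - reference) / reference


# Toy values for illustration only; these are not the São Paulo matrices.
odm_1997: ODM = {(1, 1): 600, (1, 2): 300, (2, 9): 100, (9, 1): 50}
odm_2006: ODM = {(1, 1): 580, (1, 2): 250, (2, 9): 40, (9, 1): 70}
area_zones = [1, 2]  # stand-in for the zones that make up the northwest area

dest_1997 = destinations_from_area(odm_1997, area_zones)
dest_2006 = destinations_from_area(odm_2006, area_zones)
gap = relative_difference(sum(dest_1997.values()), sum(dest_2006.values()))
print(f"2006 total differs from 1997 by {gap:.1%}")
```

The per-destination dictionaries correspond to the shaded values in each map, and the relative difference of their sums corresponds to the comparison of total trips between analysis years.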

Next, a more detailed level of analysis is presented for the northwest area, shown in Figure 5. Here, central zone 208 (marked in Figure 5 with a star) is chosen in the region where AVL data are available, and all the trips from that zone to other surrounding zones in the same area are given. The maps show the destinations across the area; zones with darker shading represent more trip destinations, and the same scale was used in both maps. As in Figure 4, the total volume between the two analysis years is close (within 5%), with fewer trips available in the incomplete 2006 ODM. Also, the maps show corresponding patterns: most of the trips from zone 208 terminate in the same zone or in adjacent zones. Finally, though, the most significant result is that the ADC-based matrix shows small O-D flows to some periphery zones in the region. One major critique of household surveys is that when few passengers are sampled and their trips extrapolated to a systemwide level, many smaller flow patterns will not appear in the data. Here, the 2006 data use a much larger sample (from the AFC system); therefore, O-D patterns are more detailed and likely a better reflection of flows within the system.

FIGURE 4 Comparison of (a) 1997 and (b) 2006 northwest-area origin trips to the rest of the city. [Maps titled "Destinations for Trips with Origin in NW Area"; shading classes: 13,200 to 45,800; 4,600 to 13,199; 700 to 4,599; 200 to 699; 0 to 199; metro stations, area boundary, and zone boundaries marked.]

Finally, Table 2 shows the number of passengers included in the ODM that transfer during their trip to the rail network in São Paulo. While transfer patterns were not analyzed in detail during this analysis, a quick query using the farecard ID and origin segment of trips included in the 2006 ODM can provide planners with detailed transfer information in the future. Supplemental bus and multimodal transfer information can be analyzed in much greater detail for both operations and strategic planning purposes and can be provided from automated ODM development.
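A farecard-based transfer query of the kind mentioned above might look like the following Python sketch. The record layouts (a set of farecard IDs present in the bus ODM, and rail entries as pairs of farecard ID and rail line) are hypothetical, and the sketch counts each card at most once per rail line without checking the time between the bus and rail swipes; the actual query used to produce Table 2 is not described in the paper.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def bus_to_rail_transfers(
    odm_card_ids: Iterable[str],
    rail_entries: Iterable[Tuple[str, str]],
) -> Counter:
    """Count distinct ODM farecards that also appear at each rail line that day."""
    in_odm = set(odm_card_ids)
    counted: Set[Tuple[str, str]] = set()
    totals: Counter = Counter()
    for card_id, rail_line in rail_entries:
        key = (card_id, rail_line)
        if card_id in in_odm and key not in counted:
            counted.add(key)  # a card is counted once per rail line
            totals[rail_line] += 1
    return totals


# Toy records for illustration only.
odm_cards = ["c1", "c2", "c3"]
rail_swipes = [("c1", "3"), ("c2", "3"), ("c1", "3"), ("c9", "1"), ("c3", "A")]
transfers = bus_to_rail_transfers(odm_cards, rail_swipes)
print(dict(transfers))
```

A production query would likely also apply the transfer time window and join on the origin segment of each trip, so that transfers can be reported by origin zone as well as by rail line.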

Considerations for Comparing 1997 and 2006 ODMs

While comparing results from the 1997 and 2006 ODMs, differences between the analysis years should be mentioned. First, the route structure of the bus network was likely adjusted between 1997 and 2006. New operating consortiums were formed around 2002, and while they serve areas where previous routes operated, some changes can be anticipated, especially given the increasing growth in the region between these two analysis years. The differences between the respective networks were not analyzed in this study.

Also, the 2006 results capture only O-D patterns of passengers using electronic farecards. Although the majority of passengers today use farecards, cash payments are still accepted on buses. Cash-payment trips will not be available in the ODM until a systemwide expansion has been completed. Because farecards allow up to three free transfers within 2 h of the first system swipe, passengers using farecards may be more inclined toward transferring, and a system expansion may be biased toward those patterns. The percentage of passengers included in the 2006 ODM that transferred was nearly 50%, compared with a 34% system average.

Transportation Research Record 2072

Finally, only passengers using a bus with AVL equipment will be captured in the 2006 ODM. Within the northwest area of São Paulo, transfers and return trips will likely occur on a bus with AVL. However, for passengers that transfer to buses from different areas of the city without AVL, their return trip will likely not have an AVL record at its origin. This means that a destination is not immediately inferred, and when a destination is estimated based on other trips with similar origins (in Step 3), the total trip distance traveled will likely be underestimated; that is, destinations will be assigned in proportion to destinations for trips with repeated access to buses with GPS installed. Despite these setbacks in the initial results, comparing the 2006 ODM with prior data still allows consideration of the feasibility of the new development process.
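The proportional assignment behind this underestimation can be illustrated with a short Python sketch. The function name, the dictionary layout, and the numbers are assumptions chosen for illustration; the sketch shows only the general idea of distributing trips with unknown destinations according to the observed destination shares for the same origin zone, not the exact procedure of Step 3.

```python
from typing import Dict


def assign_unknown_destinations(
    known: Dict[int, float], unknown_trips: float
) -> Dict[int, float]:
    """Spread trips with no inferred destination across the destination zones
    already observed for the same origin, in proportion to the observed trips."""
    total = sum(known.values())
    if total == 0:
        raise ValueError("no known destinations for this origin zone")
    return {
        dest: trips + unknown_trips * trips / total
        for dest, trips in known.items()
    }


# Toy values: 100 trips from one origin zone with inferred destinations,
# plus 20 trips whose destination could not be inferred.
known_destinations = {208: 60.0, 209: 30.0, 150: 10.0}
expanded = assign_unknown_destinations(known_destinations, 20.0)
print(expanded)
```

Because the observed shares come only from trips that return to a bus with AVL, zones outside the AVL region rarely appear in `known` and so receive few or none of the reassigned trips. This is the mechanism by which longer trips, and therefore total distance traveled, are likely underestimated.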

FIGURE 5 Comparison of (a) 1997 and (b) 2006 trips from zone 208 to other zones in the northwest area. [Maps titled "Destinations for Trips with Origin in Central NW Zone"; shading classes: 5,150 to 5,160; 1,120 to 5,149; 220 to 1,119; 70 to 219; 1 to 69; 0; area boundary and zone boundaries marked.]

TABLE 2 Passengers in Sample Bus ODM That Transfer to São Paulo’s Rail Network During the Day

Rail Line   Terminal Names                      Rail Network   Bus-to-Rail Transfers
1           Tucuruvi–Jabaquara                  Metrô (a)       3,546
2           Vila Madalena–Alto do Ipiranga      Metrô           2,264
3           Barra Funda–Corinthians/Itaquera    Metrô          10,014
5           Capão Redondo–Largo Treze           Metrô               0
A           Jundiaí–Luz                         CPTM (b)        2,525
B           Amador Bueno–Júlio Prestes          CPTM            3,121
C           Osasco–Jurubatuba                   CPTM              747
D           Luz–Rio Grande da Serra             CPTM                0
E           Luz–Estudantes                      CPTM              233
F           Brás–Calmon Viana                   CPTM               34

(a) Companhia do Metropolitano de São Paulo (Metropolitan Company of São Paulo): the urban subway network.
(b) Companhia Paulista de Trens Metropolitanos (Metropolitan Train Company of the State of São Paulo): the suburban commuter rail network.


CONCLUSION

As hardware and data storage prices fall, many transport agencies are storing large quantities of ADC systems data. However, they are not necessarily taking advantage of simultaneous improvements in processing speed to discover new planning and measurement applications. Although this paper outlines the development of one such application to produce a traditional planning input, the ODM, it is only the beginning of a shift in the transportation field toward turning all of these new data into useful information.

In summary, this paper shows the development steps used to create a bus ODM in São Paulo by combining bus stop, AVL, and AFC data sets. The methods used draw on previous work done in Chicago; however, some differences in system structure and available data led to variations in the development details and process, including the reliance on raw GPS data and the increased volume of available and anticipated data. Initial results show that the ODM patterns are reasonable and that using data from the AFC system provides a larger sample set than traditional survey data, resulting in more specific results at the most detailed analysis level.

Many improvements can still be made to the system, such as adding transfer-level information to the ODM and verifying available bus stop data. In addition, after AVL installation is completed across the bus fleet, the matrix will need a systemwide expansion to account for passengers not using farecards. Some questions about sample bias will likely arise at that time. Nonetheless, this first iteration of the ODM provides a foundation for future work and outlines introductory steps for other agencies interested in building similar tools.

ACKNOWLEDGMENTS

The author thanks the U.S. Department of State, the Institute of International Education, and the Fulbright Foundation for providing funding for this research. The author also thanks São Paulo Transporte S/A for its cooperation in completing a local case study and specifically thanks the staff of the Information Technology department and the president’s office, as well as Ana Odila Paiva Souza.

REFERENCES

1. Instituto Brasileiro Geograficas e Estatisticas (IBGE). População Residente, Segundo as Unidades da Federação e Municípios. www.ibge.gov.br/seculoxx/estatisticas_populacionais.shtm; see Table “populacao2000aeb_s2_009_a_009i” in População Download Parte 2. Accessed Nov. 16, 2006.

2. São Paulo Metrô. Aferição de Pesquisa Origen e Destino na Região Metropolitana, Características das Viagens. www.metro.sp.gov.br/empresa/pesquisas/afericao_da_pesquisa/afericao_da_pesquisa_01.shtml. Accessed March 20, 2006.

3. The MTA Network, New York City Metropolitan Transportation Authority. Public Transportation for the New York Region. www.mta.info/mta/network.htm. Accessed March 20, 2006.

4. Cui, A. Bus-Passenger Origin–Destination Matrix Estimation Using Automated Data Collection Systems. MS thesis. Massachusetts Institute of Technology, Cambridge, 2006.

5. Buneman, K. Automated and Passenger-Based Transit Performance Measures. In Transportation Research Record 992, TRB, National Research Council, Washington, D.C., 1984, pp. 23–28.

6. Rahbee, A., and D. Czerwinski. Using Entry-Only Automatic Fare Collection Data to Estimate Rail Transit Passenger Flows at the CTA. Proc., 2002 Transport Chicago Conference, Chicago, Ill., 2002.

7. Zhao, J. The Planning and Analysis Implications of Automated Data Collection Systems: Rail Transit OD Matrix Inference and Path Choice Modeling Examples. MS thesis. Massachusetts Institute of Technology, Cambridge, 2004.

8. Barry, J. J., R. Newhouser, A. Rahbee, and S. Sayeda. Origin and Destination Estimation in New York City Using Automated Fare System Data. In Transportation Research Record: Journal of the Transportation Research Board, No. 1817, Transportation Research Board of the National Academies, Washington, D.C., 2002, pp. 183–187.

9. Zhao, J., A. Rahbee, and N. H. M. Wilson. Estimating a Passenger Trip Origin-Destination Matrix Using Automatic Data Collection Systems. Computer-Aided Civil and Infrastructure Engineering, Vol. 22, No. 5, July 2007, pp. 376–387.

This paper reflects the views of the author, who is responsible for the facts and accuracy of the research presented.

The Public Transportation Marketing and Fare Policy Committee sponsored publication of this paper.

