On-line Spatio-Temporal Analysis of Network Data and Road Developments
CASA Conference, April 13 2010
Tao Cheng (tao.cheng@ucl.ac.uk) + Team
Department of Civil, Environmental & Geomatic Engineering, UCL
Outline
• Introduction
– Background and aim
• Methodology – integrated ST Data Mining
– Statistical approach
– Machine learning
– Visualization
– Simulation
• Programme and Progress
Background
• Large cities are increasingly crowded - population & mobility
• Traffic congestion affects both the economy and daily life.
• It is difficult and expensive to increase the capacity of the road network.
City of London
• Traffic levels in the Congestion Charging Zone are falling but congestion levels are rising.
• cost of congestion - £3 billion per year
• Mayor’s traffic priorities – reduce congestion and smooth traffic flows
• Removal of western extension of CC (27/11/2008)
• Olympic Games 2012 – travel time to London Olympic sites
Why is this?
• Reduction in network capacity?
• Reallocations of capacity to other uses?
• Reduced resilience of the network?
Aim - To understand the traffic congestion in central London
• To quantitatively measure road network performance
• To understand causes of traffic congestion
– association between traffic and interventions
• traffic flow, speed/journey time
• incidents, road works, signal changes and bus lane changes
• Case study – London
Challenge (1) - Network Complexity
1) Dynamics
2) Spatial dependence
3) Spatio-temporal interactions
4) Heterogeneity
Challenge (2) - Data issues
• massive – 20GB monthly
• multi-sourced, related to 5 different networks
• different scales (density & frequency)
• variable data quality
• contain conflicts, errors, mistakes and gaps
DATA COVERAGE
London Road Networks
Cordons: Central, Inner, Outer
Screenlines: Thames, Northern, five radials, four peripherals
Traffic Flow Surveys
• NMC (National manual count annual data)
• ATC (Automatic Count) – 20 MB
• different time periods, intervals and accuracy
Traffic speed (and hence journey time) data
• MCOS (Moving Car Observer Surveys)
– Centre, Inner, and Outer
– least accurate of the datasets
• ITIS (GPS vehicle tracking system) - 2GB
– major A roads and bus routes in town with 2000 probes
– medium accuracy
• ANPR (Automatic Number Plate Recognition) – 6 GB
– main roads in the central and west extensions of CCZs
– 5-minute intervals, 5 vehicle groups
– high accuracy
– available since March 2008
• At least 5 networks – boundaries do not fully align
LTIS incident and event data - 20MB
• works, hazards, accidents, signal faults, special events, breakdowns, security, and other causes
• DfT have all these data as map or as text files
- Minimal, Moderate, Serious or Severe: subjective? unrecorded? not geocoded?
not broadcast on the traffic Link website, creating problems in analysis and reporting.
There are uncertainties and gaps in the intervention data
Methodology - ISTDM
What’s new - (1) data-driven, top-down
Transition in data availability
• Data scarcity:
– High cost
– Low volume
– Intensive validation
• Data abundance:
– High volume
– Multiple kinds and sources
– Extensive application
Transition in modelling approach
• Bottom-up:
– Mechanistic
– Explicit representation of behaviour (origin, destn, model, time …)
– System properties by aggregation
• Top-down:
– Phenomenological
– Describe system gross of all behavioural responses
– Direct to objectives
What’s new: (2) integrated space and time
Existing ST analysis methods
• time series analysis + spatial correlation
• spatial statistics + the time dimension
• time series analysis + artificial neural networks
ST dependence ≠ space + time
Integrated modelling of ST is needed –
• seamless & simultaneous
• ST-association/autocorrelation
What’s new: (3) hybrid/quantitative approach
• combine regression analysis with machine learning
- improve the sensitivity and explanatory power
• study the heterogeneity and scale of road performance
- optimal scale for monitoring
• Quantitative assessment of network performance
- Sensible decision making & policy evaluation
Principle of ST Modelling
Space-time data = global (deterministic) space-time trends + local (stochastic) space-time variations
$$z_i(t) = \mu_i(t) + e_i(t)$$
• $z_i(t)$ - the observation of the data series at spatial location $i$ and at time $t$;
• $\mu_i(t)$ - space-time patterns that explain large-scale deterministic space-time trends and can be expressed as a nonlinear function in space and time.
• $e_i(t)$ - the residual term, a zero mean space-time correlated error that explains small-scale stochastic space-time variations.
Cheng, Wang, Li (forthcoming, Geographical Analysis)
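The decomposition above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the deterministic trend of one link is approximated by the mean of all observations at the same position within a period (for example the same time of day). The function name `decompose` and the toy data are hypothetical and are not taken from the slides.

```python
from statistics import mean

def decompose(series, period):
    """Split a series z(t) into a periodic trend mu(t) and a residual e(t).

    Assumption: the trend is the mean of all observations that share the
    same position within the period. This is an illustrative estimator,
    not the nonlinear trend model used by the authors.
    """
    profile = [mean(series[k::period]) for k in range(period)]
    mu = [profile[t % period] for t in range(len(series))]
    e = [z - m for z, m in zip(series, mu)]
    return mu, e

# Toy journey times (minutes) for one link: two "days" of four intervals.
z = [2.0, 3.0, 4.0, 2.5, 2.2, 3.4, 3.6, 2.3]
mu, e = decompose(z, period=4)
```

By construction `z[t] == mu[t] + e[t]`, and the residuals average to zero within each position of the period, which mirrors the zero-mean residual term in the slide.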
Model 1 - STARIMA - Spatio-Temporal Auto-Regressive Integrated Moving Average
$$z_i(t) = \sum_{k=1}^{p}\sum_{h=0}^{m_k}\phi_{kh} W^{(h)} z_i(t-k) - \sum_{l=1}^{q}\sum_{h=0}^{n_l}\theta_{lh} W^{(h)} \varepsilon_i(t-l) + \varepsilon_i(t)$$
(Pfeifer P E and Deutsch S J, 1980)
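A STARIMA model expresses each observation as a weighted sum of time-lagged, space-lagged observations and errors. The sketch below computes a one-step forecast for a small network under that general form. The function names, the three-link network, and all coefficient values are hypothetical and chosen only to make the arithmetic easy to follow; parameter estimation is not shown.

```python
def matvec(W, x):
    """Multiply a square weight matrix W by a vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def starima_forecast(z_hist, eps_hist, W, phi, theta):
    """One-step forecast of z(t).

    z_hist[k-1]   : vector z(t-k) for k = 1..p
    eps_hist[l-1] : vector eps(t-l) for l = 1..q
    W[h]          : spatial weight matrix of order h (W[0] is the identity)
    phi[k-1][h]   : autoregressive coefficient at time lag k, space lag h
    theta[l-1][h] : moving-average coefficient at time lag l, space lag h
    """
    out = [0.0] * len(z_hist[0])
    for k, coeffs in enumerate(phi):
        for h, c in enumerate(coeffs):
            lagged = matvec(W[h], z_hist[k])
            out = [o + c * v for o, v in zip(out, lagged)]
    for l, coeffs in enumerate(theta):
        for h, c in enumerate(coeffs):
            lagged = matvec(W[h], eps_hist[l])
            out = [o - c * v for o, v in zip(out, lagged)]
    return out

# Three links in a row; first-order neighbours are the adjacent links.
W0 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
W1 = [[0, 1, 0], [0.5, 0, 0.5], [0, 1, 0]]
forecast = starima_forecast(
    z_hist=[[10.0, 20.0, 30.0]],   # z(t-1)
    eps_hist=[[1.0, 0.0, -1.0]],   # eps(t-1)
    W=[W0, W1],
    phi=[[0.5, 0.3]],              # hypothetical coefficients
    theta=[[0.2, 0.0]],
)
```

With these toy values the forecast is approximately `[10.8, 16.0, 21.2]`: half of each link's own previous value, plus 0.3 times the neighbour average, minus 0.2 times the previous error.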
Our approach – Integrated modelling of ST
Model 1 – STARIMA
$$z_i(t) = \sum_{k=1}^{p}\sum_{h=0}^{m_k}\phi_{kh} W^{(h)} z_i(t-k) - \sum_{l=1}^{q}\sum_{h=0}^{n_l}\theta_{lh} W^{(h)} \varepsilon_i(t-l) + \varepsilon_i(t)$$
• define weights based upon spatial distance and spatial adjacency
• consider anisotropy
• able to model spatially continuous phenomena
Tao Cheng, Jiaqiu Wang, Xia Li, 2010, A hybrid approach for space-time series of environmental data, Geographical Analysis (forthcoming)
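One possible way to define weights from both spatial distance and spatial adjacency is sketched below: inverse-distance weights are kept only for adjacent links and each row is normalised to sum to one. The inverse-distance form, the function name `spatial_weights`, and the toy matrices are assumptions for illustration; the slides do not specify the weighting function.

```python
def spatial_weights(distance, adjacency):
    """Row-normalised weights: inverse distance, restricted to adjacent links.

    distance[i][j]  : distance between links i and j (positive for i != j)
    adjacency[i][j] : 1 if link j is treated as a neighbour of link i, else 0
    """
    n = len(distance)
    W = []
    for i in range(n):
        raw = [
            1.0 / distance[i][j] if (i != j and adjacency[i][j]) else 0.0
            for j in range(n)
        ]
        total = sum(raw)
        W.append([r / total if total > 0 else 0.0 for r in raw])
    return W

distance = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
W = spatial_weights(distance, adjacency)
```

Because `adjacency` does not have to be symmetric, a directed version (for example, counting only upstream links as neighbours) is one simple way to represent anisotropy in this sketch.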
Model 2 - ANN - Artificial Neural Networks
(Mandic D P and Chambers JA, 2001)
SFNN – spatial interpolation
DRNN – time series analysis
a. static neuron: $\hat{z}_i = \sum_{j=1}^{n} w_{ji} z_j + b$
b. dynamic neuron: $\hat{z}(t) = w^{i} z(t) + w^{l} \hat{z}(t-1) + b$
Cheng, Wang (2008, TGIS); Cheng, Wang (2009, CEUS)
• ANN for space-time trend analysis
[Equation: ANN estimate of the space-time trend $\hat{\mu}_i(t)$ as a nonlinear function of location $i$ and time $t$]
Tao Cheng, Jiaqiu Wang, 2009, Accommodating Spatial Associations in DRNN for Space-Time Analysis, Computers, Environment and Urban Systems, 33, 409-418.
Model 2 - STANN
Space-Time Neuron: $z_i(t) = \sum_{j=1}^{n} w^{i}_{ij}\, z_j(t-1) + w^{l}\, z_i(t-1) + b_i$
• One step implementation of ANN + STARIMA
• Accommodate ST associations in ANN
• Deal with nonlinearity & heterogeneity in BP learning
Jiaqiu Wang, Tao Cheng, STANN – Modeling Space-Time Series by Artificial Neural Networks, International Journal of Geographical Information Science, under review
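The idea of a space-time neuron, which combines inputs from spatial neighbours with the neuron's own lagged value, can be sketched as a single forward pass. This is an illustration only: the `tanh` activation, the argument names, and the toy weights are assumptions, and the sketch does not reproduce the STANN architecture or its back-propagation training.

```python
import math

def space_time_neuron(z_prev, i, neighbours, w_in, w_lag, bias,
                      activation=math.tanh):
    """Output of a hypothetical space-time neuron for location i.

    z_prev     : values of all locations at the previous time step
    neighbours : indices of the spatial neighbours of location i
    w_in       : mapping neighbour index -> input weight (spatial part)
    w_lag      : weight on the location's own previous value (temporal part)
    bias       : bias term
    """
    spatial = sum(w_in[j] * z_prev[j] for j in neighbours)
    temporal = w_lag * z_prev[i]
    return activation(spatial + temporal + bias)

z_prev = [0.2, 0.5, 0.1]  # scaled journey times at t-1 (toy values)
out = space_time_neuron(
    z_prev, i=1, neighbours=[0, 2],
    w_in={0: 0.4, 2: 0.6}, w_lag=0.5, bias=0.0,
)
```

Here the pre-activation value is 0.4·0.2 + 0.6·0.1 + 0.5·0.5 = 0.39, so `out` equals `tanh(0.39)`. Passing `activation=lambda v: v` gives the linear case.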
Model 3 – SVM - Support Vector Machines
SVC & SVR (Vapnik et al, 1996)
Model 3 - STSVR
J.Q. Wang, T. Cheng*, J. Haworth
• Nonlinear Spatio-Temporal Regression by SVM
• Develop ST kernel function
• Overcome over-fitting in STANN
• Deal with errors
• Model nonlinearity & heterogeneity
Jiaqiu Wang, Tao Cheng, James Haworth, Space-Time Kernels, submitted to Spatial Data Handling (SDH) 2010, Hong Kong, May 26-28.
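A space-time kernel can be built in several ways; one common construction, shown here only as an assumption, multiplies a Gaussian kernel on spatial distance by a Gaussian kernel on time difference. The product of two valid kernels is itself a valid kernel. The function names and bandwidths `sigma_s` and `sigma_t` are hypothetical, and this is not claimed to be the kernel developed for STSVR.

```python
import math

def st_kernel(a, b, sigma_s=1.0, sigma_t=1.0):
    """Product of a Gaussian spatial kernel and a Gaussian temporal kernel.

    a, b : space-time points given as (x, y, t)
    """
    (xa, ya, ta), (xb, yb, tb) = a, b
    d2 = (xa - xb) ** 2 + (ya - yb) ** 2
    k_space = math.exp(-d2 / (2 * sigma_s ** 2))
    k_time = math.exp(-((ta - tb) ** 2) / (2 * sigma_t ** 2))
    return k_space * k_time

def gram_matrix(points, **kw):
    """Kernel matrix for a list of space-time points."""
    return [[st_kernel(p, q, **kw) for q in points] for p in points]

points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
K = gram_matrix(points, sigma_s=1.0, sigma_t=2.0)
```

A matrix like `K` could be handed to any SVR implementation that accepts a precomputed kernel; separate bandwidths let similarity decay at different rates in space and in time.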
Other methods
• Geographically Weighted Regression (GWR)
– -> STWR?
• Permutation Scan Statistics (PSS)
– -> STPSS? (or STC)
• Exploratory Visualization (DM) + ST + OLAP
– -> STOEV?
• Simulation (Multi-scales)
– -> STMSS?
Progress
GUI: A Web-based Platform for Dynamic Visualization, Simulation, Analysis (OLAP)
Tool Boxes: Integrated Spatio-Temporal Data Mining (Matlab+..?)
– 2.1 Pattern Clustering (JH/BA)
– 2.2 Transition
– 2.3 Intervention Analysis
– 2.4 Performance Prediction
– 2.5 Model Updating
STARIMA, DRNN, SVM, GWR
Database/Platform (Oracle + ArcGIS) (ANPR, GPS, ITLS, …. based on ITN)
STANDARD Platform Structure
STARIMA for Journey Time Prediction in London
Study area
London Arterial Road Map
Pattern analysis of journey time
[Figure: panels (a) and (b), journey time (min)] The distribution plot of 33 Mondays' journey times on link 605 during 07:00 to 19:00 (2009 Jan. – Aug.)
Space-time analysis
[Figure: Space-time Autocorrelation Function of Sample Series at 7:00-10:00; Space-time Autocorrelation Function after Seasonal Difference; Space-time Autocorrelation Function after nonseasonal difference (axes: Lag, space-time autocorrelations)]
[Figure: (a) R605 Periodogram, maximum at freq=0.020833, period=48; (b) R605 Cumulative Periodogram (axis: frequency)]
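The periodogram peak reported on this slide (freq = 0.020833) corresponds to a period of 1/0.020833 ≈ 48 sampling intervals. The sketch below shows, on a synthetic series, how such a peak can be located and how a seasonal difference at that period removes the cycle. The series, its amplitude, and the function names are assumptions for illustration and do not reproduce the link R605 data.

```python
import math

def periodogram(x):
    """Raw periodogram of a mean-centred series via a direct DFT."""
    n = len(x)
    m = sum(x) / n
    xc = [v - m for v in x]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(xc))
        im = sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(xc))
        freqs.append(k / n)
        power.append((re * re + im * im) / n)
    return freqs, power

def seasonal_difference(x, period):
    """Return x(t) - x(t - period) for t >= period."""
    return [x[t] - x[t - period] for t in range(period, len(x))]

# Synthetic journey-time series (seconds) with a cycle of 48 intervals.
n, period = 480, 48
x = [200 + 30 * math.sin(2 * math.pi * t / period) for t in range(n)]

freqs, power = periodogram(x)
peak_freq = freqs[power.index(max(power))]
peak_period = 1 / peak_freq
diffed = seasonal_difference(x, period)
```

For this synthetic series the peak falls at a frequency of 1/48, and the seasonally differenced series is numerically zero, which is the behaviour a seasonal difference is meant to produce before the space-time autocorrelation function is re-examined.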
Results
[Figure: journey time (sec), actual versus 5-, 10-, 15- and 20-min predictions, 7:00 to 10:00 on 2009 Aug 17 and 2009 Aug 24]
[Figure: second panel showing the 5-, 10-, 15- and 20-min prediction series over the same periods]
Wang, Cheng, Heydecker (2010, IMA)
Accuracy
Prediction accuracy at different prediction intervals

Forecasting horizon                     5 min    10 min   15 min   20 min
Number of valid predictions             96       93       86       70
Mean relative error                     0.07%    0.25%    0.44%    0.81%
Standard deviation of relative error    0.16%    0.38%    0.77%    1.27%

Comparison of results from extended STARIMA model and standard STARIMA model (Kamarianakis and Prastacos, 2005) at 5 min interval

Model               Number of valid predictions   Mean relative error   Standard deviation of relative error
Extended STARIMA    96                            0.07%                 0.16%
Standard STARIMA    95                            0.11%                 0.41%
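The two accuracy measures in these tables can be computed as in the sketch below. It assumes that relative error means the absolute difference between predicted and actual journey time divided by the actual value, and that the population standard deviation is used; the slides do not state either choice. The numbers are toy values, not the study's data.

```python
from statistics import mean, pstdev

def relative_error_summary(actual, predicted):
    """Mean and standard deviation of |predicted - actual| / actual."""
    rel = [abs(p - a) / a for a, p in zip(actual, predicted)]
    return mean(rel), pstdev(rel)

actual = [200.0, 210.0, 190.0]      # observed journey times (sec), toy values
predicted = [202.0, 207.0, 190.0]   # model predictions (sec), toy values
mre, sd = relative_error_summary(actual, predicted)
print(f"mean relative error = {mre:.2%}, standard deviation = {sd:.2%}")
```

Reporting both figures per forecasting horizon, as in the first table, shows how accuracy and its spread change as the horizon grows.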
Visualization of traffic congestion in space-time
Figure 1. Delay at 9 am on 12th April 2009
Cheng, Emmonds, Tanaksaranond, Sonoiki (2010, GISRUK)
Figure 2. Delay at 9:15am on 12th April 2009
LCAP 15 January 2010 8:00-10:00 am
Isosurface
High delay value (red color)
Sideview
Topview
Detection of Emerging Spatio-Temporal Outliers on Network
Cheng, Anbaroglu (2010, SDH)
James Haworth
Department of Civil, Environmental & Geomatic Engineering, UCL
Multi-scale analysis of road network performance
• Using spatio-temporal data mining techniques to look for patterns in congestion at varying spatial and temporal scales
• What patterns can be observed in inbound and outbound congestion...
– Daily? Weekly? Seasonally?...
• Identification of recurrent and non-recurrent congestion in London
EP/G023212/1
Understanding Road Congestion as an Emergent Property of Traffic Networks
Macroscopic: Flow and economic models based on ‘known’ road capacity
Microscopic: Individual behaviours simplistic and route choice macroscopic-driven
Picture credits: DOT California, PTV, Paramics
Formal model of Emergence
SPREAD OF CONGESTION
Link level
What causes congestion to emerge at link level?
What is the effect of road layout?
Junction level
Are junctions the key source of congestion?
What choices are available to drivers?
Network level
How does congestion spread to the whole network?
Manley, Cheng (2010, IMCIC)
STANDARD Website – http://standard.cege.ucl.ac.uk
Conclusion - Network Complexity
Can we predict/mitigate emergence (congestion) of Road Network?
Understand
Detect
Model
Simulation
Acknowledgements
National High-tech R&D Program (863 Program)
NSF China