© 2010 IBM Corporation
IBM Research - Ireland - 2014
© 2014 IBM Corporation
xStream Data Fusion for Transport
Smarter Cities Technology Centre
IBM Research - Ireland
Multi-Sensor Data Fusion for Travel Time Estimation
Description: Intelligent Operation for Transport backend for the real-time fusion of traffic-related sensor data.
Significance: Transport operators are looking to revamp their control centers, with a focus on:
• Improving situational awareness of the transport network and increasing data granularity on its performance
• Maximizing accuracy and minimizing computational latency
• Maximizing efficiency of investment and minimizing dependence on on-street equipment
The backend processes and disseminates information to road users and traffic managers, and provides information that supports better decisions affecting the above KPIs.
Impact: Achieves broader coverage and higher accuracy by combining legacy sensor types.
Applications: Deployed and evaluated over a 4-month period in London in the context of a competitive bid organized by Transport for London.
Motivation
• Why Data Fusion?
– Separately, information sources have drawbacks:
  • Automatic Number Plate Recognition (ANPR): available only after the fact, not in real time (1 h delay); availability issues; high operating costs
  • Induction loops (SCOOT): do not translate directly into the desired KPIs (e.g. travel times); require traffic flow models and continuous (re)calibration
  • Opportunistic sources (e.g. smartphones): many open questions, such as the longevity of the technology, social concerns, ICO regulations, …
  • All: spatiotemporally sparse and uncertain
Motivation
• Why Data Fusion?
– Together, information sources complement each other:
  • E.g. traffic models translate SCOOT occupancy into travel times, and Bluetooth/ANPR validate and calibrate the traffic model
  • Achieve wider coverage and finer information granularity
  • Increase trust in information, eliminate error bias introduced by individual sensor types
  • Increase robustness against sensor failures
Example - Fusing Travel Times with Volumetric Information
[Figure, left panel: Travel Time (Bluetooth, ANPR); axes: travel time (s) vs. P(travel time)]
[Figure, right panel: Volume/Density Fundamental Curves (induction loops, CCTV, …); axes: density vs. volume (v/h)]
Example - Fusing Travel Times with Volumetric Information
Data fusion offers a scalable and systematic way to capture and exploit the relation between data sources of different types.
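As a rough illustration of combining two sources for the same link, the sketch below fuses a direct travel-time estimate (e.g. from Bluetooth/ANPR) with one derived from a loop-based traffic model. It uses generic inverse-variance weighting of independent Gaussian estimates; this is an assumption chosen for illustration, not necessarily the method used in xStream. The `Estimate` class, the `fuse` function, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    mean: float      # travel time in seconds
    variance: float  # uncertainty in seconds squared


def fuse(estimates):
    """Inverse-variance weighted fusion of independent estimates.

    Each estimate is weighted by its precision (1 / variance), so the
    more certain source contributes more to the fused travel time.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    precision = sum(1.0 / e.variance for e in estimates)
    mean = sum(e.mean / e.variance for e in estimates) / precision
    return Estimate(mean=mean, variance=1.0 / precision)


# Illustrative values only (not from the London deployment).
bluetooth = Estimate(mean=110.0, variance=400.0)   # sparse, noisy samples
loop_model = Estimate(mean=100.0, variance=100.0)  # model-based estimate
fused = fuse([bluetooth, loop_model])
print(f"fused travel time: {fused.mean:.1f} s (variance {fused.variance:.1f})")
```

The fused variance is never larger than the smallest input variance, which is one simple way to see why combining sensor types can increase trust in the result.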
Denoising - e.g. Bluetooth, using the frequency of detections from the same devices
• Sources of noise: Bluetooth MAC address clones detected at multiple locations, variable detection ranges, casual drivers who stop frequently for non-traffic-related reasons
• Commuters are a more reliable source of information. They can be identified over time from their regular travel patterns.
[Figure: hours elapsed between consecutive detections]
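The commuter idea above can be sketched as a simple frequency filter: keep only detections from devices seen on several distinct days. This is a minimal sketch under stated assumptions; the function names, the `(device_id, timestamp)` record layout, and the `min_days` threshold are hypothetical and not taken from the deployed system.

```python
from collections import defaultdict
from datetime import datetime


def regular_devices(detections, min_days=3):
    """Return the IDs of devices detected on at least `min_days` distinct days."""
    days_seen = defaultdict(set)
    for device_id, timestamp in detections:
        days_seen[device_id].add(timestamp.date())
    return {dev for dev, days in days_seen.items() if len(days) >= min_days}


def denoise(detections, min_days=3):
    """Keep only detections from devices that look like regular commuters."""
    keep = regular_devices(detections, min_days)
    return [(dev, ts) for dev, ts in detections if dev in keep]


# Illustrative data: "aa" appears on three days, "bb" twice on a single day.
detections = [
    ("aa", datetime(2014, 3, 3, 8, 5)),
    ("aa", datetime(2014, 3, 4, 8, 7)),
    ("aa", datetime(2014, 3, 5, 8, 2)),
    ("bb", datetime(2014, 3, 4, 14, 30)),
    ("bb", datetime(2014, 3, 4, 15, 10)),
]
filtered = denoise(detections)
print(sorted({dev for dev, _ in filtered}))
```

A fuller version might also check the time of day or the elapsed hours between consecutive detections, as the figure suggests, rather than the day count alone.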
xStream Data Fusion Solution
[Architecture diagram; component labels: Intelligent Operation Transportation; GIS Transform & Adapters; Data assimilation; Traffic Flow Models; Data Fusion; Interpolation, Prediction; Denoising; Infrastructure and data models]
Generalized Additive Models
Flexible, versatile class of statistical models:
• Different types of input variables: categorical (weekday), continuous (temperature), ...
• Non-linear effects of input variables (covariates)
• Applicable to various domains (Energy, Transport, Water, ...)
Human-understandable, robust:
• No “black box” → easy to validate
• Representation of expert domain knowledge
• Straightforward analysis of uncertainty and outlier events
Efficient learning algorithms (batch and streams)
Y_t: dependent output variable
X_t^i: independent input variables (covariates)
f_i: transfer function
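The symbols above match the standard additive form of a GAM. Assuming an identity link, and adding an intercept \beta_0, a covariate count p, and a noise term \varepsilon_t (none of which appear in the extracted slide text), the model can be written as:

```latex
Y_t = \beta_0 + \sum_{i=1}^{p} f_i\left(X_t^i\right) + \varepsilon_t
```

Each f_i is a smooth function of a single covariate, so its fitted shape can be plotted and inspected individually.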
Big Data Platform
[Architecture diagram; labels: DBMS; stream computing approach; MapReduce; KPIs; real-time; offline; model]
TfL - Transport for London - Results
• Prediction results: the fitted GAM explains on average 74.5% of the variation in the data
Scenario              5-min ahead   30-min ahead
True (seconds)        102.3         102.3
Predicted (seconds)   108.6         112.7
RMSE (1 day)          11.8          16.6
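For reference, the sketch below shows how a root-mean-square error like the one in the table is typically computed, assuming "RMSE (1 day)" means the error over all predictions made during one day. The `rmse` helper and the sample values are hypothetical and are not the TfL data.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between observed and predicted travel times."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative travel times in seconds (made up for this example).
actual = [100.0, 104.0, 98.0, 110.0]
predicted = [103.0, 100.0, 98.0, 105.0]
error = rmse(actual, predicted)
print(f"RMSE: {error:.2f} s")
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, which is consistent with the longer 30-minute horizon showing the larger value.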