AGENT-BASED MODELING, SIMULATION, AND CONTROL—
SOME APPLICATIONS IN TRANSPORTATION
Montasir Abbas, Virginia Tech
(with contributions from past and present VT-SCORES students, including:
Zain Adam, Sahar Ghanipoor-Machiani, Linsen Chong, and Milos Mladenovic)
Workshop III: Traffic Control
New Directions in Mathematical Approaches for Traffic Flow Management
IPAM
October 27, 2015
Presentation Outline
• Agent-based modeling… what? why? and how?
• What is the learning framework? What are the techniques?
• Examples of learning:
  • Controller agents
  • Driver behavior agents
  • Vehicle agents
• What if we don’t incorporate learning?
• Conclusions
Background
Learning
• Can we predict a condition or a behavior/response from a wealth of data?
• Can we model and interpret a phenomenon in a state-action framework?
• The same input data can lead to different performance measures, and we are the reason!
Motivation
[Figure: from naturalistic data of varying traffic behavior and maneuvers, through detailed behavioral trajectory data (X (m) vs. Y (m) vehicle positions at t = 0 s (reference time), t = 2 s, and t = 4 s), to trained agents driving a VISSIM simulation through an advanced VISSIM API–agent interface]
A Learning Framework
[Diagram: a state diagram S with states 1–6 and other states, and a policy P mapping each state to an action]
Learning Techniques
• Machine Learning
• Q-Learning
• Reinforcement Learning
• Etc.
Q-Learning
Acting on the environment, receiving rewards, and selecting actions to reach a goal.
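For reference, the standard one-step Q-learning update with learning rate $\alpha$ and discount rate $\gamma$ is:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$$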
Application: Dilemma Zone Problem
• Application of learning to controllers and to humans
• Controllers making decisions
• Humans learning from mistakes
To stop or not to stop? That is the question!
To stop
To go
Controller Agent—Learning the Policy
Environment’s state variables:
- Total number of vehicles in the DZ
Agent’s actions:
- End the green
- Extend the green
Reward:
- Vehicles caught in the DZ
Q-learning algorithm parameters:
- Learning rate: 0.01
- Discount rate: 0.5
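A minimal sketch of this controller agent's Q-learning loop with the parameters above; the ε-greedy exploration rate and the sign convention of the reward (penalizing vehicles caught in the DZ) are assumptions, not values from the slides:

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.01, 0.5          # learning rate and discount rate (from the slide)
EPSILON = 0.1                     # exploration rate (assumed, not specified)
ACTIONS = ["end_green", "extend_green"]

# Q-table: state = number of vehicles in the dilemma zone -> action values
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(state):
    """Epsilon-greedy selection over the Q-table."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(Q[state], key=Q[state].get)

def update(state, action, next_state, vehicles_caught):
    """One-step Q-learning update; the reward penalizes vehicles caught in the DZ."""
    reward = -vehicles_caught
    best_next = max(Q[next_state].values())
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```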
Off-line and Online Learning
• Find P* with simulation
• Update the Q-table with real data
Markovian Traffic State Estimation
[Plot: time to max-out (sec), 0–40, on the horizontal axis]
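The slides do not detail the estimation procedure; a minimal sketch of one way to estimate a Markovian traffic state model, assuming traffic states are discretized (e.g., vehicle-count bins) and observed as a sequence:

```python
import numpy as np

def estimate_transition_matrix(state_seq, n_states):
    """Count observed state-to-state transitions and row-normalize
    into a Markov transition probability matrix."""
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(state_seq[:-1], state_seq[1:]):
        counts[s, s_next] += 1
    rows = counts.sum(axis=1, keepdims=True)
    # Rows with no observations stay all-zero instead of dividing by zero
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)
```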
Human Learning Model (Brain Analogy)
[Diagram: an episodic memory (the dataset, subject to memory decay) feeds a semantic memory of state–action (stop/go) knowledge; working memory updates a Q-table into a trained Q-table via E-greedy action selection; procedural memory captures propensity; distractions and emotions influence the process]
Dealing with High State Dimensionality
(Naturalistic driving behavior study)*
• Training input: traffic states and actions
• Training output: acceleration and steering
• Input variables discretized using fuzzy sets
• Continuous actions are generated from discrete actions
• Uses all the safety-critical events available in training
*Safety and Mobility Agent-based Reinforcement-learning Traffic Simulation Add-on Module (SMART SAM)
NFACRL Framework
$S_i$ = the $i$-th input variable (state variable)
$K$ = number of input variables
$NM_i$ = number of fuzzy sets or membership functions for $S_i$
$M_i^{a(i)}$ = the $a(i)$-th fuzzy set or membership function for the $i$-th input variable
$R_j$ = the $j$-th fuzzy rule
$N$ = number of fuzzy rules
$\lambda_j$ = weight between the $j$-th fuzzy rule and the critic
$w_{qj}$ = weight between the $j$-th fuzzy rule and action $q$
$V$ = critic value
$A_q$ = output of the $q$-th action
where $i = 1,\dots,K$; $a(i) = 1,\dots,NM_i$; $j = 1,\dots,N$; and $q = 1,\dots,P$.
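A minimal sketch of the NFACRL forward pass under these definitions, assuming Gaussian membership functions, product-based rule firing, and firing-strength-weighted sums for the critic and action outputs (common choices suggested by the weight definitions, but not spelled out in the slides):

```python
import numpy as np

def gaussian_mf(x, centers, widths):
    """Gaussian membership functions (an assumed, common choice)."""
    return np.exp(-((x - centers) / widths) ** 2)

def nfacrl_forward(state, centers, widths, lam, w):
    """state: (K,) inputs; centers, widths: (N, K) per-rule membership
    parameters; lam: (N,) rule-to-critic weights; w: (P, N) rule-to-action
    weights. Returns the critic value V and the P action outputs A_q."""
    phi = np.prod(gaussian_mf(state, centers, widths), axis=1)  # rule firing strengths
    phi = phi / (phi.sum() + 1e-12)                             # normalized activations
    V = lam @ phi                                               # critic value
    A = w @ phi                                                 # action outputs
    return V, A
```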
Applications and Cross Validation
• Test the heterogeneity of the drivers
• Training: use the data from Agent A in training, with its behavioral rules as output
• Validation: apply the output rules of Agent A to Driver B
• The heterogeneity of Agents A and B is represented by the degree of accuracy in validation
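The degree of accuracy in the comparisons below is reported as R-square; a minimal sketch of computing it between a naturalistic action trajectory and the agent's reproduced one:

```python
import numpy as np

def r_square(naturalistic, agent):
    """Coefficient of determination between an observed (naturalistic)
    action trajectory and the agent's reproduced trajectory."""
    y, yhat = np.asarray(naturalistic), np.asarray(agent)
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```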
Agent A: Event 1
[Plots: longitudinal action estimation (acceleration (g) vs. time (0.1 s)) and lateral action estimation (yaw angle (rad) vs. time (0.1 s)), naturalistic vs. agent]
Agent A: Event 2
[Plots: longitudinal and lateral action estimation, naturalistic vs. agent]
Driver Agent B
[Plots: longitudinal and lateral action estimation, naturalistic vs. agent]
Driver Agent A: Own Behavior
[Plots: longitudinal and lateral action estimation, naturalistic vs. agent]
Driver B: Own Behavior
[Plots: longitudinal and lateral action estimation, naturalistic vs. agent]
Driver A: Using Behavior from B
[Plots: longitudinal and lateral action estimation, naturalistic vs. agent]
Driver B: Using Behavior from A
• Heterogeneity is clear
[Plots: longitudinal and lateral action estimation, naturalistic vs. agent]
Mega-Agent Behavior
• Mega-Agent behaves as Driver B
[Plots: longitudinal and lateral action estimation, naturalistic vs. agent]
Comparison of Mega-Agent to Cross Validation Result
• Degree of accuracy: R-square

Event     Agent A (long / lat)   Agent B (long / lat)   Mega (long / lat)
Event A   0.98 / 0.967           0.81 / 0.83            0.98 / 0.95
Event B   0.82 / 0.60            0.97 / 0.92            0.97 / 0.90
But… Why NOT Statistical Modeling?
• Pooling heterogeneous drivers into one model would lead to wrong conclusions!
Future CV/AV Applications
• Multi-modal applications: modeling, simulation, and optimization
• Accounting for different priorities, including emergency vehicles
• Utilization of the computing capabilities of CV/AV
• Linking arterial control to freeway management scenarios
• Characterizing and changing network performance
[Diagram: three priority-level (PL) selection schemes within a microscopic simulation framework for system evaluation: user-controlled (token-based PL selection), AI-controlled (PL selection based on performance), and high-priority (pre-set PL based on vehicle type)]
Multi-agent System Framework
[Diagram: vehicle agents in an ABM system comprising a reservation matrix, revocation-enabled FIFO, trajectory adjustment, and fuel and emission optimization; inputs: road and vehicle characteristics, user and system requirements, and system configuration and ABMS rules; output: performance measures]
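The slides name the components but not their internals; a minimal sketch of a space–time reservation matrix with a revocation-enabled FIFO, assuming reservations are granted in request order unless a strictly higher-priority vehicle revokes a lower-priority holder (class and method names are hypothetical):

```python
class ReservationMatrix:
    """Sketch of a space-time reservation grid over intersection conflict
    tiles; structure and semantics are assumptions, not the authors' code."""
    def __init__(self, n_tiles, horizon):
        # grid[tile][t] holds (vehicle_id, priority) or None
        self.grid = [[None] * horizon for _ in range(n_tiles)]

    def request(self, vehicle_id, priority, cells):
        """FIFO with revocation: grant the (tile, t) cells if every current
        holder has strictly lower priority, revoking those holders."""
        holders = {self.grid[tile][t] for tile, t in cells} - {None}
        if any(p >= priority for _, p in holders):
            return False, set()     # an earlier or higher-priority holder keeps it
        revoked = {vid for vid, _ in holders}
        for tile, t in cells:
            self.grid[tile][t] = (vehicle_id, priority)
        return True, revoked        # revoked vehicles must re-request
```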
• RD: the required delay for a vehicle after arriving at the intersection until higher-priority vehicles clear all conflict tiles
• Rather than driving at constant speed and then coming to a complete stop for a duration of RD before resuming speed, a vehicle follows a modified trajectory, parameterized by (t1, a1, t2, a2), that delays its arrival by RD; a solved sketch follows below
• “Here I Am” message fields: state, PI, RD, time, (t1, a1, t2, a2)
[Diagrams: speed vs. time and distance vs. time for the constant-speed, stop-and-wait, and modified trajectories]
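A minimal sketch of solving for such a modified trajectory, assuming a single decelerate-then-accelerate profile at fixed comfort limits a1 < 0 and a2 > 0 (the default values are assumptions) that returns the vehicle to its initial speed v0:

```python
import math

def delay_trajectory(v0, rd, a1=-1.5, a2=1.0):
    """Find t1 (decelerating at a1 < 0) and t2 (accelerating at a2 > 0)
    so the vehicle returns to speed v0 and reaches the stop line exactly
    rd seconds later than it would at constant speed v0.
    a1 and a2 are assumed comfort limits (m/s^2), not slide values."""
    d = -a1
    # Distance deficit of the maneuver is 0.5*d*t1^2*(1 + d/a2);
    # dividing by v0 gives the arrival delay, set equal to rd.
    t1 = math.sqrt(2.0 * rd * v0 / (d * (1.0 + d / a2)))
    t2 = d * t1 / a2                  # a1*t1 + a2*t2 = 0 restores v0
    if v0 + a1 * t1 < 0:              # lowest speed reached must stay >= 0
        raise ValueError("rd too large for these comfort limits")
    return t1, t2

# Example: delay arrival by 4 s from an approach speed of 15 m/s
# t1, t2 = delay_trajectory(15.0, 4.0)
```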
Negotiating an Intersection
Experiment Setup
• Simulating high and low priority levels in some approaches
• Tabulated delay values and vehicle trajectories for different approaches
[Figures: time–space diagrams (distance vs. time) for Phase 2 and Phase 4, showing individual vehicle trajectories]
Experiment Results
• Agents adapt by forming dense platoons to pass through large gaps more efficiently
• Interesting emergent behavior can be observed from simple interaction rules
• Low-priority agents are sensitive to the traffic demand level
• Frequent EV calls re-synchronize the EV approach
Scenario demand per phase and EV share:

Scenario   Ph1  Ph2  Ph3  Ph4  Ph5  Ph6  Ph7  Ph8   % EV, Ph2
PL           1    2    1    3    1    2    1    3    4
10         200  200  200  200  200  200  200  200    0
11         400  400  400  400  400  400  400  400    0
12         400  600  400  600  400  600  400  600    0
13         400  800  400  800  400  800  400  800    0
14         200  200  200  200  200  200  200  200   10
15         400  400  400  400  400  400  400  400   20
16         400  600  400  600  400  600  400  600   30

[Chart: performance measure (0–250) by scenario (10–16), one series per phase (1–8)]
Concluding Remarks
• Intelligent agents can capture individual learning, and agent-based modeling can capture the emergent system behavior
• Think state-action framework… it can explain a lot of things
• Win the chess game, not just the next move