
AGENT-BASED MODELING, SIMULATION, AND CONTROL—

SOME APPLICATIONS IN TRANSPORTATION

Montasir Abbas, Virginia Tech

(with contributions from past and present VT-SCORES students, including:

Zain Adam, Sahar Ghanipoor-Machiani, Linsen Chong, and Milos Mladenovic)

Workshop III: Traffic Control

New Directions in Mathematical Approaches for Traffic Flow Management

IPAM

October 27, 2015

Presentation Outline

• Agent-based modeling… what? why? and how?

• What is the learning framework? What are the techniques?

• Examples of learning:

  • Controller agents

  • Driver behavior agents

  • Vehicle agents

• What if we don’t incorporate learning?

• Conclusions

2

Background

3

Learning

• Can we predict a condition or a behavior/response from a wealth of data?

• Can we model and interpret a phenomenon in a state-action framework?

• The same input data can lead to different performance measures, and we are the reason!

4

Motivation

[Figure: varying traffic behavior maneuvers observed in naturalistic data, shown as vehicle trajectories in X (m) and Y (m) at t = 0 s (reference time), t = 2 s, and t = 4 s. Detailed behavioral data trajectories are used to train agents, which run in VISSIM simulation through an advanced VISSIM API-agent interface.]

5

A Learning Framework

6

[Figure: state diagram S (states 1 through 6 plus other states) with state-action transitions, and the corresponding policy P.]

Learning Techniques

• Machine Learning

• Q-Learning

• Reinforcement Learning

• Etc.

7


Q-Learning

8

Acting on the environment, receiving rewards, and selecting actions to reach a goal.
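To make this loop concrete, here is a minimal tabular Q-learning sketch; the toy states, actions, and parameter values are illustrative assumptions, not taken from the case studies that follow.

```python
# Minimal tabular Q-learning sketch: act, observe a reward, and nudge
# Q(s, a) toward the reward plus the discounted value of the best action
# in the next state. All names and numbers here are illustrative only.
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

def greedy_action(Q, state):
    # Exploit: pick the action with the highest learned value.
    return max(Q[state], key=Q[state].get)

# Example: two states, two actions, one observed transition.
Q = {"s1": {"a": 0.0, "b": 0.0}, "s2": {"a": 0.0, "b": 0.0}}
q_update(Q, "s1", "a", reward=1.0, next_state="s2")
```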

Application: Dilemma Zone Problem

• Application of learning to controllers and to humans

• Controllers making decisions

• Humans learning from mistakes

9

To stop or not to stop? That is the question!

10

To stop

11

To go

12

Controller Agent—Learning the Policy

13

14

Environment’s state variable:
- Total number of vehicles in the DZ (dilemma zone)

Agent’s actions:
- End the green
- Extend the green

Reward:
- Based on the number of vehicles caught in the DZ

Q-learning algorithm parameters:
- Learning rate: 0.01
- Discount rate: 0.5
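A hedged sketch of how this controller agent could be wired up, using the state, actions, reward, and parameters listed above. The environment hooks (observe_dz_count, apply, vehicles_caught_in_dz) are hypothetical placeholders for simulation or field-controller calls, not actual VISSIM or controller API functions.

```python
import random
from collections import defaultdict

# Dilemma-zone controller agent sketch: state = number of vehicles in the DZ,
# actions = end or extend the green, reward penalizes vehicles caught in the DZ.
ALPHA, GAMMA = 0.01, 0.5                 # learning and discount rates from the slide
ACTIONS = ("END_GREEN", "EXTEND_GREEN")
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(state, epsilon=0.1):
    # E-greedy: explore occasionally, otherwise take the best known action.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(Q[state], key=Q[state].get)

def step(env):
    state = env.observe_dz_count()        # vehicles in the DZ now (placeholder hook)
    action = choose_action(state)
    env.apply(action)                     # end or extend the green (placeholder hook)
    reward = -env.vehicles_caught_in_dz() # fewer caught vehicles is better
    next_state = env.observe_dz_count()
    best_next = max(Q[next_state].values())
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```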

Off-line and Online Learning

• Find the optimal policy P* off-line with simulation

• Update the Q-table online with real data

15

Markovian Traffic State Estimation

[Figure: Markov-chain estimation of the traffic state as a function of time to max-out (sec).]

Human Learning Model

[Figure: brain analogy of the driver stop/go learning process, with components for episodic memory (the event dataset), working memory where the Q table is updated, memory decay, semantic memory (the trained Q table), e-greedy state-action (stop/go) selection, procedural memory (propensity), and distractions and emotions.]
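A rough sketch of how the diagram's e-greedy choice and memory decay might be coded for the stop/go decision. The decay-all-values scheme and the numeric values are assumptions for illustration, not the study's calibrated human-learning model.

```python
import random

EPSILON = 0.05   # exploration rate for the e-greedy stop/go choice (assumed)
DECAY = 0.99     # per-episode memory-decay factor (assumed)

def decide(Q, state):
    # E-greedy: occasionally explore, otherwise follow the trained Q table.
    if random.random() < EPSILON:
        return random.choice(("stop", "go"))
    return max(("stop", "go"), key=lambda a: Q.get((state, a), 0.0))

def update(Q, state, action, reward, alpha=0.1):
    # Memory decay: older experience fades slightly before the new episode
    # reinforces the value of the action that was actually taken.
    for key in Q:
        Q[key] *= DECAY
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward - old)
```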

Dealing with High State Dimensionality

(Naturalistic driving behavior study)*

• Training input: traffic states and actions

• Training output: acceleration and steering

• Input variables discretized using fuzzy sets (a sketch follows below)

• Continuous actions are generated from discrete actions

• Uses all the safety-critical events available in training

17

*Safety and Mobility Agent-based Reinforcement-learning Traffic Simulation Add-on Module (SMART SAM)
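As a small illustration of the fuzzy-set discretization mentioned above; triangular membership functions and the breakpoints below are assumptions, since the slides do not specify the membership shapes used in the study.

```python
# Triangular membership function: 0 outside [left, right], peaking at center.
def triangular(x, left, center, right):
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

# Example: degree to which a relative speed of 12 m/s belongs to each
# fuzzy set of one state variable (breakpoints are illustrative only).
SETS = {"small": (0.0, 5.0, 10.0), "medium": (5.0, 10.0, 15.0), "large": (10.0, 15.0, 20.0)}
memberships = {name: triangular(12.0, *abc) for name, abc in SETS.items()}
# memberships -> {'small': 0.0, 'medium': 0.6, 'large': 0.4}
```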

NFACRL Framework

Notation:

$S_i$ = the $i$-th input variable (state variable)
$K$ = number of input variables
$NM_i$ = number of fuzzy sets or membership functions for $S_i$
$M_i^{a(i)}$ = the $a(i)$-th fuzzy set or membership function for the $i$-th input variable
$R_j$ = the $j$-th fuzzy rule
$N$ = number of fuzzy rules
$\lambda_j$ = weight between the $j$-th fuzzy rule and the critic
$w_{qj}$ = weight between the $j$-th fuzzy rule and action $q$
$V$ = critic value
$A_q$ = output of the $q$-th action

where $i = 1, \dots, K$; $a(i) = 1, \dots, NM_i$; $j = 1, \dots, N$; and $q = 1, \dots, P$.
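Putting the notation together, a sketch of the NFACRL forward pass: each rule's firing strength comes from the membership values of the state variables it references, and the critic value V and action outputs A_q are weighted sums over the rules. The product t-norm and the normalization step are assumptions about the combination operators, which the slide does not spell out.

```python
import numpy as np

def nfacrl_outputs(rule_memberships, lam, w):
    """rule_memberships: (N, K) membership values used by each of the N rules;
    lam: (N,) rule-to-critic weights (lambda_j);
    w: (P, N) rule-to-action weights (w_qj)."""
    phi = np.prod(rule_memberships, axis=1)   # firing strength of each rule R_j
    phi = phi / phi.sum()                     # normalize strengths (assumed)
    V = float(lam @ phi)                      # critic value
    A = w @ phi                               # output A_q of each action q
    return V, A

# Example with N=3 rules, K=2 state variables, and P=2 discrete actions.
V, A = nfacrl_outputs(np.array([[0.6, 0.4], [0.3, 0.9], [0.1, 0.2]]),
                      lam=np.array([0.5, 1.0, -0.2]),
                      w=np.array([[1.0, 0.0, 0.5], [0.2, 0.8, 0.1]]))
```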

Applications and Cross Validation

• Test the heterogeneity of the drivers

• Training: use the data from Agent A in training, with its behavioral rules as output

• Validation: take the output rules of Agent A and apply them to Driver B

• The heterogeneity of Agents A and B is represented by the degree of accuracy in validation

19

Agent A: Event 1

20

[Figure: longitudinal action estimation (acceleration in g vs. time in 0.1 s steps) and lateral action estimation (yaw angle in rad vs. time), naturalistic vs. agent.]

Agent A: Event 2

21

[Figure: longitudinal and lateral action estimation (acceleration and yaw angle vs. time), naturalistic vs. agent.]

Driver Agent B

22

[Figure: longitudinal and lateral action estimation (acceleration and yaw angle vs. time), naturalistic vs. agent.]

Driver Agent A: Own Behavior

23

[Figure: longitudinal and lateral action estimation (acceleration and yaw angle vs. time), naturalistic vs. agent.]

Driver B: Own Behavior

24

[Figure: longitudinal and lateral action estimation (acceleration and yaw angle vs. time), naturalistic vs. agent.]

Driver A: Using Behavior from B

25

[Figure: longitudinal and lateral action estimation (acceleration and yaw angle vs. time), naturalistic vs. agent.]

Driver B: Using Behavior from A

• Heterogeneity is clear

26

[Figure: longitudinal and lateral action estimation (acceleration and yaw angle vs. time), naturalistic vs. agent.]

Mega-Agent Behavior

• Mega-Agent behaves as Driver B

27

[Figure: longitudinal and lateral action estimation (acceleration and yaw angle vs. time), naturalistic vs. agent.]

Comparison of the Mega-Agent to the Cross-Validation Results

• Degree of accuracy: R-square

28

           Agent A          Agent B          Mega-Agent
Event      long     lat     long     lat     long     lat
Event A    0.98     0.967   0.81     0.83    0.98     0.95
Event B    0.82     0.6     0.97     0.92    0.97     0.9
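For reference, the degree-of-accuracy values in the table can be computed with the standard coefficient of determination between the naturalistic series and the agent-reproduced series; whether the study used exactly this formulation is not stated on the slide.

```python
import numpy as np

def r_square(observed, predicted):
    # R^2 = 1 - SS_res / SS_tot for, e.g., an acceleration or yaw-angle trace.
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```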

But…Why NOT Statistical Modeling?

• Fitting one aggregate statistical model to heterogeneous drivers would lead to wrong conclusions!

29

Future CV/AV Applications

• Multi-modal applications: modeling, simulation, and optimization

• Accounting for different priorities, including emergency vehicles

• Utilization of the computing capabilities of CV/AV

• Linking arterial control to freeway management scenarios

• Characterizing and changing network performance

30

31

Multi-agent System Framework

[Figure: three priority-level (PL) selection modes (user-controlled: token-based PL selection; AI-controlled: PL selection based on performance; high-priority: pre-set PL based on vehicle type) within a microscopic simulation framework for system evaluation. Vehicle agents and the ABM system include a reservation matrix, a revocation-enabled FIFO, trajectory adjustment, and fuel and emission optimization; road and vehicle characteristics, user and system requirements, and system configuration/ABMS rules are inputs, and performance measures are the outputs.]
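A hedged sketch of one way the reservation matrix and revocation-enabled FIFO listed above could work together: requests are served in arrival order, but a conflicting reservation held by a lower-priority vehicle can be revoked. The class name, method names, and the convention that a larger PL value means higher priority are all assumptions for illustration.

```python
class ReservationMatrix:
    """Space-time tiles of the intersection; each cell holds at most one
    reservation, recorded as (vehicle_id, priority_level)."""

    def __init__(self):
        self.reservations = {}   # (tile, time_step) -> (vehicle_id, pl)

    def request(self, vehicle_id, pl, cells):
        # cells: the (tile, time_step) pairs the vehicle's trajectory occupies.
        conflicts = [c for c in cells if c in self.reservations]
        if all(self.reservations[c][1] < pl for c in conflicts):
            for c in conflicts:              # revoke lower-priority holders
                del self.reservations[c]
            for c in cells:
                self.reservations[c] = (vehicle_id, pl)
            return True                      # reservation granted
        return False                         # denied: delay by RD and retry
```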

Multi-agent System Framework

32

RD (Required Delay): the delay required for a vehicle after arriving at the intersection until higher-priority vehicles clear all conflict tiles.

Rather than driving at constant speed and then coming to a complete stop for a duration of RD before resuming, a vehicle follows a modified trajectory (defined by t1, a1, t2, a2) that delays its arrival by RD.

[Figure: speed-time and distance-time diagrams of the adjusted trajectory; the "Here I Am" message carries State, PI, RD, Time, and t1, a1, t2, a2.]
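One way to realize such a trajectory is a single slow-down: decelerate at a1 for t1 and accelerate back to the original speed at a2 for t2, choosing t1 so the distance lost equals v0 times RD. The closed form below is a sketch under that assumed profile shape, not the deck's actual ABMS trajectory-adjustment algorithm.

```python
import math

def delayed_profile(v0, rd, a1=1.5, a2=1.0):
    """v0: approach speed (m/s); rd: required delay (s);
    a1: deceleration magnitude (m/s^2); a2: acceleration back to v0 (m/s^2).
    Returns (t1, a1, t2, a2) as in the 'Here I Am' message."""
    # Distance lost by the slow-down, relative to constant speed v0, is
    # 0.5 * a1 * t1**2 * (1 + a1 / a2); setting it equal to v0 * rd gives t1.
    t1 = math.sqrt(2.0 * v0 * rd / (a1 * (1.0 + a1 / a2)))
    if a1 * t1 > v0:
        raise ValueError("delay too large for a non-stopping profile")
    t2 = a1 * t1 / a2            # time needed to accelerate back to v0
    return t1, a1, t2, a2

# Example: a 15 m/s vehicle that must delay its arrival by RD = 4 s.
t1, a1, t2, a2 = delayed_profile(v0=15.0, rd=4.0)   # t1 ~ 5.7 s, t2 ~ 8.5 s
```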

33

Negotiating an Intersection

34

Experiment Setup

• Simulating high and low priority levels on some approaches

• Tabulated delay values and vehicle trajectories for different approaches

35

[Figure: time-space diagrams (distance vs. time) for Phase 2 and Phase 4, showing individual vehicle trajectories with their arrival times and delay values.]

Experiment Results

• Agents adapt by forming dense platoons to pass through large gaps more efficiently

• Interesting emergent behavior can be observed from simple interaction rules

• Low-priority agents are sensitive to the traffic demand level

• Frequent EV calls re-synch the EV approach

36

                     Phase
Scenario    1     2     3     4     5     6     7     8     % EV, Ph2
PL          1     2     1     3     1     2     1     3     4
10          200   200   200   200   200   200   200   200   0
11          400   400   400   400   400   400   400   400   0
12          400   600   400   600   400   600   400   600   0
13          400   800   400   800   400   800   400   800   0
14          200   200   200   200   200   200   200   200   10
15          400   400   400   400   400   400   400   400   20
16          400   600   400   600   400   600   400   600   30

[Figure: per-phase results for scenarios 10 through 16.]

Concluding Remarks

• Intelligent agents can capture individual learning, and agent-based modeling can capture the emergent system behavior

• Think in a state-action framework… it can explain a lot of things

• Win the chess game, not just the next move

37