Research Article

Intelligent Ramp Control for Incident Response Using Dyna-Q Architecture
Chao Lu (1, 2), Yanan Zhao (1), and Jianwei Gong (1)
(1) School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China
(2) Institute for Transport Studies, University of Leeds, Leeds LS2 9JT, UK
Correspondence should be addressed to Chao Lu; tscllu@gmail.com
Received 18 June 2015; Revised 22 September 2015; Accepted 28 September 2015
Academic Editor: Dongsuk Kum
Copyright © 2015 Chao Lu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Reinforcement learning (RL) has shown great potential for motorway ramp control, especially under congestion caused by incidents. However, existing applications, which are limited to single-agent tasks and based on Q-learning, have inherent drawbacks for dealing with coordinated ramp control problems. To solve these problems, a Dyna-Q based multiagent reinforcement learning (MARL) system named Dyna-MARL has been developed in this paper. Dyna-Q is an extension of Q-learning that combines model-free and model-based methods to obtain the benefits of both. The performance of Dyna-MARL is tested on a simulated motorway segment in the UK with real traffic data collected during AM peak hours. The test results, compared with Isolated RL and noncontrolled situations, show that Dyna-MARL achieves superior performance in improving traffic operation with respect to increasing total throughput and reducing total travel time and CO2 emission. Moreover, with a suitable coordination strategy, Dyna-MARL can maintain a highly equitable motorway system by balancing the travel time of road users from different on-ramps.
1. Introduction
Traffic congestion occurs when the traffic demand on a road network approaches or exceeds its available capacity. Even a slight loss of balance between demand and capacity on motorways can lead to long travel delays, high energy consumption, and severe environmental problems. Therefore, how to alleviate traffic congestion and maintain the demand-capacity balance has become one of the main concerns of the transport community. To this end, a number of traffic control devices, such as variable speed limits (VSL), variable message signs (VMS), and ramp control systems, have been developed under the umbrella of intelligent transportation systems (ITS). Among these advanced systems, ramp control (also known as ramp metering) has been widely used and has proved to be an effective control method for different kinds of congestion on motorways [1].
Generally, traffic congestion can be classified into two categories: recurrent congestion and nonrecurrent congestion. Recurrent congestion is caused by daily traffic operation with temporarily increased traffic demand in peak hours [2]. Considering the daily peak traffic on motorways, recurrent congestion is the main concern of many existing ramp control systems. For instance, fixed-time systems (also known as pretimed systems) use historical data collected during daily peak hours to generate control strategies offline and trigger these strategies at fixed times (e.g., morning or evening peak hours) each day [1]. Local traffic-responsive systems, such as the demand-capacity method, ALINEA [3], and its variations [4], can respond to real-time traffic and keep the outflow or road density of the motorway mainline close to some target value (e.g., road capacity or critical density). Usually, these target values should be defined in advance according to the so-called fundamental diagram, which is derived from daily traffic data. To deal with network-wide problems, traffic-responsive systems have been extended to coordinated ramp control systems, such as Flow [5], System Wide Adaptive Ramp Metering (SWARM) [6], and Zone algorithms [7]. Similar to local traffic-responsive systems, these coordinated systems also attempt to make the outflow of the motorway mainline approach a predetermined target value, which is usually the road capacity. Another group of systems focuses on formulating different control scenarios as optimisation problems and using optimal control techniques (e.g., model predictive control) to solve them. The purpose of these systems is to maximise or minimise an objective function rather than to achieve a predefined target value. Examples of these systems can be found in [8–12], where macroscopic traffic flow models were combined with control systems to formulate optimal control problems.

Hindawi Publishing Corporation
Mathematical Problems in Engineering
Volume 2015, Article ID 896943, 16 pages
http://dx.doi.org/10.1155/2015/896943
Although the aforementioned systems have shown their effectiveness in different scenarios, recurrent congestion is still their main focus, and none of them includes a component that can deal with nonrecurrent congestion. Unlike recurrent congestion, which is caused by increased traffic demand in peak hours, nonrecurrent congestion is mainly induced by incidents, and thus it is usually referred to as incident-induced congestion [2, 13]. Traffic incidents are nonrecurrent events, such as road accidents, vehicle breakdowns, and unexpected obstacles, that may block one or more lanes of the motorway mainline. The temporary lane blockage interrupts the normal operation of traffic flow and leads to a rapid reduction of road capacity [14]. In this case, fixed-time and simple traffic-responsive systems, which depend on information collected from daily traffic operation or on a predefined target value, are not applicable. Therefore, more sophisticated systems that can respond to incidents are required. During the last decades, a series of such ramp control systems have been designed, most of which are based on optimisation techniques. For example, an optimal control structure using a simple macroscopic traffic flow model was proposed in [15] to deal with incident-induced congestion. A more complex system that considers dynamic incident duration was developed in [16]; it can be solved by linear programming. In the research presented in [17, 18], both lane-changing and queuing behaviour during the incident were incorporated into a modelling structure and solved by a stochastic optimal control system. Although these systems are based on different technologies, they all need a model to predict traffic conditions and use these predictions to accomplish the control process.
Model-based methods usually adapt poorly when a mismatch emerges between the simulation model and the real controlled environment [19–21]. To overcome this limitation, another optimisation-based method, reinforcement learning (RL), was introduced to the ramp control area. This method is based on the Markov decision process (MDP) and dynamic programming (DP) and can approximately solve the optimisation problem through continuous learning without any models. The first ramp control system using RL to solve incident-induced problems was developed in [19, 22]. This system adopted the basic RL algorithm, Q-learning, to alleviate traffic congestion caused by incidents. After this work, several Q-learning systems considering both local (e.g., [23, 24]) and coordinated (e.g., [25, 26]) control problems were proposed. However, Q-learning can only learn from real interactions with the traffic operation and cannot make full use of historical data (or models). Because of this limitation, Q-learning usually learns slowly and needs a great number of trials to obtain the best control strategy in complex scenarios such as incident-induced congestion [27]. This problem is even worse in coordinated ramp control problems, whose state and action spaces grow exponentially, leading to the so-called “curse of dimensionality” [28]. One solution to speed up the learning process and deal with incidents efficiently was proposed in our previous work [27, 29]. That system used the Dyna-Q architecture to combine model-free Q-learning with a model-based method and can be used to accomplish single-agent tasks.
In this paper, the previous single-agent system is extended to a multiagent case that can deal with a network-wide problem involving multiple ramp controllers. We refer to this system as Dyna-MARL; it adopts a multiagent RL (MARL) strategy based on the Dyna-Q architecture. The rest of this paper is organised as follows. Section 2 briefly introduces the basics of RL, including the single-agent and multiagent cases. The architecture of Dyna-MARL is described in Section 3. After that, Sections 4 and 5 give a detailed description of the models, elements, and related algorithm of Dyna-MARL. The simulation experiments and relevant results are discussed in Section 6. Finally, Section 7 gives some conclusions and introduces future work.
2. Reinforcement Learning
RL is a subclass of machine learning. In the following subsections, two kinds of RL problems, namely, single-agent and multiagent RL, will be briefly introduced.
2.1. Single-Agent RL. The problem of single-agent RL is usually defined as an MDP that can be represented by a tuple $(S, P, R, C)$ [30]. $S$ is the state space used to describe the external environment. $C$ is the control action set containing the executable actions of the agent. $P$ is the state transition probability: for a state pair $s, s' \in S$, $P_c(s, s')$ represents the probability of reaching state $s'$ after executing action $c$ at state $s$. $R: S \times C \rightarrow \mathbb{R}$ is the reward function, and $R(s, c)$ denotes the immediate reward after taking action $c$ at state $s$. Based on these definitions, the Q value is defined for each state-action pair $(s, c)$ as
$$Q^{\pi}(s, c) = E\left\{\sum_{n=0}^{\infty} \gamma^{n} R\left(s_{k+n+1}, c_{k+n+1}\right) \,\middle|\, s_k = s,\ c_k = c\right\} \tag{1}$$
where $k$ is the time index and $n$ is the number of time steps; $s_k \in S$ and $c_k \in C$ are the environment state and the executed control action at time step $k$, respectively. $\gamma \in [0, 1]$ is the discount factor, which indicates the importance of future rewards; in $\gamma^n$, $n$ is the exponent. $\pi$ is the policy that generates the sequence of actions. The optimal policy can be obtained by maximising the Q value.
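As a small numerical illustration of (1), the sketch below evaluates the discounted sum inside the expectation for one sampled, finite reward sequence. Truncating the infinite sum at the length of the sequence, the function name, and the reward values are assumptions made only for illustration.

```python
def discounted_return(rewards, gamma):
    """Return sum_n gamma**n * rewards[n] for a finite reward sequence.

    rewards[n] plays the role of R(s_{k+n+1}, c_{k+n+1}) in (1); the
    infinite sum is truncated at len(rewards).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    for n, r in enumerate(rewards):
        total += (gamma ** n) * r
    return total


# Hypothetical negative rewards (penalties) observed over four steps.
print(discounted_return([-0.5, -0.4, -0.3, -0.2], 0.9))
```

With $\gamma = 0.9$, the fourth reward is weighted by $0.9^3 = 0.729$, so later penalties count for less than immediate ones; $\gamma = 0$ keeps only the first reward.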
The most widely used algorithm in the literature for estimating the maximum Q value is Q-learning [31]. Using the update equation below, Q-learning can maximise the Q value for each state-action pair:
$$Q_{k+1}(s_k, c_k) = Q_k(s_k, c_k) + \alpha\left[R_k(s_k, c_k) + \gamma \max_{c_{k+1}} Q_k(s_{k+1}, c_{k+1}) - Q_k(s_k, c_k)\right] \tag{2}$$
where $Q_{k+1}(s_k, c_k)$ and $Q_k(s_k, c_k)$ are the Q values for the state-action pair $(s_k, c_k)$ at step $k+1$ and step $k$, respectively, and $Q_k(s_{k+1}, c_{k+1})$ is the Q value for the state-action pair $(s_{k+1}, c_{k+1})$ at step $k$. $\alpha \in [0, 1]$ is the learning rate. Both $\gamma$ and $\alpha$ can be tuned for different problems.
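A tabular form of update (2) can be sketched in a few lines. This is a generic illustration rather than the authors' code: the Q table is a dictionary keyed by (state, action) with unseen pairs defaulting to 0, and the state labels and numbers are hypothetical.

```python
from collections import defaultdict


def q_learning_update(Q, s, c, reward, s_next, actions, alpha, gamma):
    """Apply update (2) to Q[(s, c)] in place and return the new value.

    Q maps (state, action) pairs to values; unseen pairs default to 0.0.
    """
    best_next = max(Q[(s_next, c_next)] for c_next in actions)
    td_error = reward + gamma * best_next - Q[(s, c)]
    Q[(s, c)] += alpha * td_error
    return Q[(s, c)]


# Hypothetical example; actions are the metering rates of Section 5.2.
actions = [4, 6, 8, 10, 12, 14, 16, 18, 20]
Q = defaultdict(float)
Q[("s1", 12)] = 0.3  # an earlier estimate for the next state
new_value = q_learning_update(Q, "s0", 8, reward=-0.5, s_next="s1",
                              actions=actions, alpha=0.1, gamma=0.9)
print(new_value)  # 0.1 * (-0.5 + 0.9 * 0.3 - 0.0), about -0.023
```

The bracketed term in (2) is the temporal-difference error; $\alpha$ controls how far each step moves the old estimate towards the new target.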
2.2. Multiagent Scenarios. In multiagent scenarios, the MDP of the single-agent case can be extended to a stochastic game (SG), or Markov game, in which a group of agents try to obtain some equilibrium solutions through coordination or competition [28].
In the absence of competition, all agents involved in a game share the common goal of maximising the global Q value, which forms a coordinated MARL problem. In this case, the policy optimisation is determined by the actions executed by all agents.
To solve a coordinated MARL problem, the Q-learning update equation (2) can be easily extended to represent the global Q value update [28]:
$$\begin{aligned} Q_{k+1}(s_k, c^k_1, \ldots, c^k_n) ={}& Q_k(s_k, c^k_1, \ldots, c^k_n) + \alpha\Big[R_k(s_k, c^k_1, \ldots, c^k_n) \\ &+ \gamma \max_{c^{k+1}_1, \ldots, c^{k+1}_n} Q_k(s_{k+1}, c^{k+1}_1, \ldots, c^{k+1}_n) - Q_k(s_k, c^k_1, \ldots, c^k_n)\Big] \end{aligned} \tag{3}$$
The only difference from (2) is that $Q$ and $R$ in (3) relate to the $n$ actions $c_1, \ldots, c_n$ executed by $n$ agents rather than to a single action $c$.
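To make the joint-action form of (3) concrete, the sketch below keeps one global table indexed by the state and the tuple of all agents' actions, so the max in (3) must scan every combination of actions. The function name and the toy numbers are hypothetical; with the 9 metering rates of Section 5.2 per agent, the number of joint actions is $9^n$.

```python
import itertools
from collections import defaultdict


def joint_q_update(Q, s, joint_c, reward, s_next, action_sets, alpha, gamma):
    """Apply update (3) to the global table Q[(s, (c_1, ..., c_n))].

    action_sets lists the n agents' action sets; the max enumerates every
    joint action, so its cost is the product of the action-set sizes.
    """
    best_next = max(
        Q[(s_next, joint_next)] for joint_next in itertools.product(*action_sets)
    )
    Q[(s, joint_c)] += alpha * (reward + gamma * best_next - Q[(s, joint_c)])
    return Q[(s, joint_c)]


actions = [4, 6, 8, 10, 12, 14, 16, 18, 20]
# Size of the joint action space for 1 to 4 agents: 9, 81, 729, 6561.
for n_agents in range(1, 5):
    print(n_agents, sum(1 for _ in itertools.product(*[actions] * n_agents)))
```

The loop shows why a single global table quickly becomes impractical as agents are added, which motivates the decomposition discussed next.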
2.3. Solutions for Coordinated MARL. It can be seen from (3) that, as the number of agents grows, the number of action combinations and the resulting computational complexity increase exponentially, which may make the problem unsolvable within a required time limit [28]. Therefore, a commonly used method is to decompose the global Q value into several local Q values, each of which can be maximised by a few relevant agents rather than by all agents [32]. Based on this distributed method, several strategies have been proposed. In [28], these strategies are classified into three categories: coordination-based, coordination-free, and indirect coordination strategies.
Coordination-based strategies need local Q values to be updated according to the actions executed by all relevant agents (named joint actions) at each time step [28]. The decision-making process of each agent is based on the information received from all other related agents through sufficient communication, which complicates the problem. On the other hand, coordination-free (or independent) strategies, such as the distributed Q-learning algorithm, make each agent update the corresponding local Q values based only on its own actions [33]. Therefore, each agent makes its decisions independently without increasing computational complexity. However, this computational efficiency comes at the expense of guaranteed convergence [32]. Indirect coordination strategies try to find a balance between these two approaches. With indirect strategies, each agent maintains models of its cooperative partners and updates local Q values without knowing all the information of the other agents at each step [28]. Given high-quality models, this method can reduce the problem complexity and guarantee convergence with limited coordination.
3. Dyna-Q Based Indirect Coordination Strategy
Because of the benefits introduced in the above section, the indirect coordination strategy has been applied in [34] to solve urban traffic control problems. In that work, each agent maintains a model for estimating the action selection probability of its neighbours and uses this information to optimise its control strategy. In this paper, we extend this method to motorway systems by applying the Dyna-Q architecture.
Under the Dyna-Q architecture, a modified macroscopic flow model, the asymmetric cell transmission model (ACTM), and the Q-learning algorithm are combined to deal with coordinated MARL problems. In this section, the application of Dyna-Q will be introduced.
3.1. Dyna-Q Architecture. The Dyna-Q architecture is an extension of standard Q-learning that integrates planning, acting, and learning [30]. Unlike Q-learning, which learns from real experience without a model, Dyna-Q learns a model and uses this model to guide the agent [35]. After the real experience is captured, two loops run to learn optimal policies that obtain the maximum Q value in the Dyna-Q architecture (see Figure 1).
In loop I, direct RL is the standard Q-learning process, which interacts with the real external environment. Loop II contains two main tasks: (1) model learning, which improves model accuracy by obtaining new knowledge from real experience, and (2) planning, which is the same process as direct RL except that it uses experience generated by the model. Acting is the action execution process.
By applying a model, the agent can predict the reactions of its external environment and of other agents before executing a specific action, which gives the agent an opportunity to update the Q value before receiving the real feedback. Simultaneously, direct RL runs to update the Q value through real interaction. Therefore, the optimal policy is learned from both real experience and predictions. By using this strategy, Dyna-Q can learn faster than Q-learning in many situations [30].

[Figure 1: Dyna-Q architecture. Value/policy, experience, and model, linked by acting, direct RL, model learning, and planning.]

[Figure 2: System architecture. Agent architecture: experience ((i) real traffic arrival and departure rates of sections $i$ and $i+1$; (ii) information from agent $i$: $c^k_i$, $q^k_{i,\mathrm{on}}$, $d^k_{j,\mathrm{main}}$, $d^k_{j,\mathrm{off}}$), model ((i) ACTM with estimated traffic arrival and departure rates of sections $i$ and $i+1$; (ii) action probability $p(c^k_i \mid s^k_{i+1})$), and Q value $Q^{k+1}_{i+1}(s^k_{i+1}, c^k_i, c^k_{i+1})$. Agent coordination: agents $i$, $i+1$, and $i+2$ controlling sections $i$, $i+1$, and $i+2$, which are divided into cells $j$ to $j+8$.]
Although a model is maintained in the Dyna-Q architecture, the whole system differs from model-based control methods such as model predictive control (MPC). The model in the Dyna-Q architecture is a complementary component that is used to speed up the learning process and simplify the coordination of agents. The optimal control actions are learnt from both real and simulated experience. Without models, the Dyna-Q architecture is equivalent to Q-learning and can still work as a model-free system. MPC, on the other hand, depends on the model, which means it cannot work without one. Therefore, Dyna-Q can be considered a combination of model-free and model-based methods [27].
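The interplay of the two loops in Figure 1 can be sketched as a generic tabular Dyna-Q agent in the spirit of [30]. This is not the authors' Dyna-MARL code: it assumes a deterministic table model that simply memorises the latest (reward, next state) observed for each visited state-action pair, whereas Dyna-MARL uses the ACTM and a neighbour action model as its models. The class name, the default of 10 planning updates, and the state labels are all hypothetical.

```python
import random
from collections import defaultdict


class DynaQAgent:
    """Generic tabular Dyna-Q: direct RL, model learning, and planning."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1,
                 n_planning=10, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.n_planning = n_planning
        self.Q = defaultdict(float)   # (state, action) -> Q value
        self.model = {}               # (state, action) -> (reward, next_state)
        self.rng = random.Random(seed)

    def act(self, s):
        """Acting: epsilon-greedy choice from the current Q table."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda c: self.Q[(s, c)])

    def _backup(self, s, c, r, s_next):
        """Q-learning backup shared by direct RL and planning, as in (2)."""
        best_next = max(self.Q[(s_next, c2)] for c2 in self.actions)
        self.Q[(s, c)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, c)])

    def learn(self, s, c, r, s_next):
        """Process one real transition through loops I and II."""
        # Loop I: direct RL on the real experience.
        self._backup(s, c, r, s_next)
        # Loop II, model learning: memorise the latest observed outcome.
        self.model[(s, c)] = (r, s_next)
        # Loop II, planning: replay simulated experience drawn from the model.
        visited = list(self.model)
        for _ in range(self.n_planning):
            ps, pc = self.rng.choice(visited)
            pr, ps_next = self.model[(ps, pc)]
            self._backup(ps, pc, pr, ps_next)


# Hypothetical states and rewards; actions are metering rates in veh/min.
agent = DynaQAgent(actions=[4, 8, 12, 16, 20])
agent.learn("congested", 8, -0.6, "recovering")
agent.learn("recovering", 12, -0.3, "free")
print(agent.act("congested"))
```

Setting `n_planning=0` removes loop II and leaves plain Q-learning, which mirrors the point above that Dyna-Q still works as a model-free system without its model.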
3.2. System Architecture. Each agent in the motorway control system is designed on the basis of the Dyna-Q architecture and controls one prespecified motorway section.
A simplified motorway segment is shown in Figure 2 for analysis. This segment contains three motorway sections ($i$, $i+1$, $i+2$), with detectors located at their boundaries. Each motorway section is divided into a number of cells ($j$, $j+1$, ..., $j+8$) according to its layout and geometric features. Generally, three kinds of cells exist on the motorway: on-ramp cells, which are linked with on-ramps ($j+2$, $j+5$, $j+8$); off-ramp cells, which are linked with off-ramps ($j$, $j+3$, $j+6$); and normal cells ($j+1$, $j+4$, $j+7$). In this paper, each motorway section is assumed to have at most one on-ramp cell.
The typical Dyna-Q architecture presented in Figure 2 is detailed here for each agent. Taking agent $i+1$ as an example, the experience consists of the traffic arrival and departure rates observed by the detectors of motorway section $i+1$, as well as the information received from agent $i$, and is used to improve the models. In the model component, two models are maintained. An asymmetric cell transmission model (ACTM) with estimated traffic arrival and departure rates is used to simulate the traffic flow dynamics in the relevant motorway sections. A probability model of the action selection of agent $i$ at the current state is updated for the subsequent planning process.

[Figure 3: Fundamental diagram during the incident. (a) Incident extent within the critical section. (b) Flow-density diagram for normal and incident conditions, marking $d^{\max}_{j,\mathrm{main}}$, $d^{\mathrm{In},\max}_{j,\mathrm{main}}$, $w_j$, $w^{\mathrm{In}}_j$, $\rho_{j,C}$, $\rho^{\mathrm{In}}_{j,C}$, $\rho_{j,J}$, and $\rho^{\mathrm{In}}_{j,J}$.]
To reduce the complexity of MARL, as in many real applications, some conventions are used to restrict the action selection of an agent [28]. Specifically, in our design, each agent communicates only with its spatial neighbours. For instance, agent $i+1$ receives the control action and traffic information from agent $i$ and sends its own information to agent $i+2$. For the case shown in Figure 2, we assume that motorway section $i$ is the critical section where an incident occurs. In this situation, agent $i$ plays a more important role than the other agents in dealing with the incident. Agent $i$ can be considered the chief controller, which makes decisions according to its own knowledge of the traffic and incident situation. The other agents should regulate their control policies based on the reaction of agent $i$.
Therefore, two kinds of Q values are defined for the two kinds of agents. If motorway section $i$ is the critical section, the Q value of agent $i$ depends only on its own state and action space and can be updated by (2).
If motorway section $i$ is a normal section without incidents, the Q value of agent $i$ can be calculated by
$$\begin{aligned} Q^{k+1}_i(s^k_i, c^k_{i-1}, c^k_i) ={}& Q^k_i(s^k_i, c^k_{i-1}, c^k_i) + \alpha\Big[R^k_i(s^k_i, c^k_{i-1}, c^k_i) \\ &+ \gamma \max_{c^{k+1}_i} \sum_{c^{k+1}_{i-1}} p(c^{k+1}_{i-1} \mid s^{k+1}_i) \cdot Q^k_i(s^{k+1}_i, c^{k+1}_{i-1}, c^{k+1}_i) - Q^k_i(s^k_i, c^k_{i-1}, c^k_i)\Big], \\ p(c^{k+1}_{i-1} \mid s^{k+1}_i) ={}& \frac{\mathrm{count}(s^{k+1}_i, c^{k+1}_{i-1})}{\sum_{c_{i-1} \in C_{i-1}} \mathrm{count}(s^{k+1}_i, c_{i-1})} \end{aligned} \tag{4}$$
where $R^k_i(s^k_i, c^k_{i-1}, c^k_i)$ is the immediate reward obtained by agent $i$ at time step $k$ when $c^k_{i-1}$ and $c^k_i$ are the actions executed by agents $i-1$ and $i$, respectively. Similarly, $Q^{k+1}_i(s^k_i, c^k_{i-1}, c^k_i)$ and $Q^k_i(s^k_i, c^k_{i-1}, c^k_i)$ are the Q values of agent $i$ at steps $k+1$ and $k$, respectively. $C_{i-1}$ is the action set of agent $i-1$. $\mathrm{count}(s^{k+1}_i, c^{k+1}_{i-1})$ returns the number of visits to the state-action pair $(s^{k+1}_i, c^{k+1}_{i-1})$. Thus, $p(c^{k+1}_{i-1} \mid s^{k+1}_i)$ is the probability of agent $i-1$ selecting action $c^{k+1}_{i-1}$ at state $s^{k+1}_i$. The models and related symbols shown in Figure 2 will be specified in the following section.
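The indirect coordination in (4) can be sketched for one normal-section agent as follows. The agent keeps visit counts of its upstream neighbour's actions per state, turns them into the empirical probabilities $p(c_{i-1} \mid s_i)$, and uses them to take an expectation over the neighbour's next action before maximising over its own. The class and method names are hypothetical, and returning uniform probabilities for a state with no recorded neighbour actions is an added assumption (the ratio in (4) is undefined there).

```python
from collections import defaultdict


class NeighbourAwareQ:
    """Q table of a normal-section agent i, following the structure of (4)."""

    def __init__(self, own_actions, neighbour_actions, alpha=0.1, gamma=0.9):
        self.own_actions = list(own_actions)              # C_i
        self.neighbour_actions = list(neighbour_actions)  # C_{i-1}
        self.alpha, self.gamma = alpha, gamma
        self.Q = defaultdict(float)      # (s_i, c_{i-1}, c_i) -> Q value
        self.counts = defaultdict(int)   # (s_i, c_{i-1}) -> visit count

    def observe_neighbour(self, s, c_prev):
        """Record that agent i-1 chose c_prev while agent i was in state s."""
        self.counts[(s, c_prev)] += 1

    def p_neighbour(self, c_prev, s):
        """Empirical p(c_prev | s) from the visit counts."""
        total = sum(self.counts[(s, c)] for c in self.neighbour_actions)
        if total == 0:
            # Assumption: uniform probabilities before any observation in s.
            return 1.0 / len(self.neighbour_actions)
        return self.counts[(s, c_prev)] / total

    def update(self, s, c_prev, c, reward, s_next):
        """One update: max over own actions of the expectation over c_{i-1}."""
        expected_best = max(
            sum(self.p_neighbour(cp, s_next) * self.Q[(s_next, cp, c_next)]
                for cp in self.neighbour_actions)
            for c_next in self.own_actions
        )
        key = (s, c_prev, c)
        self.Q[key] += self.alpha * (reward + self.gamma * expected_best - self.Q[key])
        return self.Q[key]


# Hypothetical usage with two metering rates per agent.
nq = NeighbourAwareQ(own_actions=[8, 12], neighbour_actions=[8, 12])
for observed in (12, 12, 8):
    nq.observe_neighbour("s1", observed)
print(nq.p_neighbour(12, "s1"))  # 2 of the 3 observations were 12
```

Because only the neighbour's action frequencies are stored, the table grows with $|C_{i-1}| \cdot |C_i|$ per state rather than with the full joint action space of (3).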
4. Modified Asymmetric Cell Transmission Model
A first-order macroscopic traffic flow model, the asymmetric cell transmission model (ACTM), is applied as one of the models in the Dyna-Q architecture. This model is derived from the widely used cell transmission model (CTM) [36] and has been used for ramp control problems [11, 37]. In this paper, we modify the ACTM to incorporate the traffic dynamics under incident conditions.
4.1. Traffic Dynamics during the Incident. As shown in Figure 3(a), when an incident happens in the critical section, one or more lanes of the motorway will be blocked, depending on the incident extent. Because of the lane blockage, the incident may reduce the normal road capacity and spatial storage space, which produces a new relationship between traffic flow and road density, that is, the fundamental diagram presented in Figure 3(b). As suggested by [38], additional parameters can be used to adjust the fundamental diagram for incident situations. We introduce three parameters, $\lambda_1, \lambda_2, \lambda_3 \in [0, 1]$, to reflect these new dynamics. They are defined as $\lambda_1 = v^{\mathrm{In}}_j / v_j$, $\lambda_2 = w^{\mathrm{In}}_j / w_j$, and $\lambda_3 = d^{\mathrm{In},\max}_{j,\mathrm{main}} / d^{\max}_{j,\mathrm{main}}$, where $v_j$ and $w_j$ are the free-flow speed and congestion wave speed of cell $j$, $d^{\max}_{j,\mathrm{main}}$ is the maximum departure flow of cell $j$, and $v^{\mathrm{In}}_j$, $w^{\mathrm{In}}_j$, and $d^{\mathrm{In},\max}_{j,\mathrm{main}}$ are the same three variables during the incident. $\rho_{j,C}$ and $\rho^{\mathrm{In}}_{j,C}$ are the critical densities, and $\rho_{j,J}$ and $\rho^{\mathrm{In}}_{j,J}$ are the jam densities, for normal and incident situations, respectively.
4.2. Modified ACTM. Given the three incident-related parameters, the traffic dynamics in each cell can be derived from the fundamental diagram illustrated in Figure 3(b) and represented by the following equations.
Departure rates of the mainline and on-ramp:

$$\begin{aligned} d^k_{j,\mathrm{main}} &= \min\left\{\lambda_1 \cdot \frac{v_j}{l_j} \cdot \left(q^k_{j,\mathrm{main}} + \theta_j \cdot d^k_{j,\mathrm{on}} \cdot \Delta t - d^k_{j,\mathrm{off}} \cdot \Delta t\right),\ \lambda_2 \cdot \frac{w_{j-1}}{l_j} \cdot \left(q^{\max}_{j-1,\mathrm{main}} - q^k_{j-1,\mathrm{main}} - \theta_j \cdot d^k_{j-1,\mathrm{on}} \cdot \Delta t\right),\ \lambda_3 \cdot d^{\max}_{j,\mathrm{main}}\right\}, \\ d^k_{j,\mathrm{on}} &= \begin{cases} \min\left\{\dfrac{q^k_{j,\mathrm{on}} + a^k_{j,\mathrm{on}} \cdot \Delta t}{\Delta t},\ \dfrac{\eta_j \cdot \left(q^{\max}_{j,\mathrm{main}} - q^k_{j,\mathrm{main}}\right)}{\Delta t},\ c^k_i\right\}, & \text{if } j \text{ is a metered on-ramp cell}, \\ \min\left\{\dfrac{q^k_{j,\mathrm{on}} + a^k_{j,\mathrm{on}} \cdot \Delta t}{\Delta t},\ \dfrac{\eta_j \cdot \left(q^{\max}_{j,\mathrm{main}} - q^k_{j,\mathrm{main}}\right)}{\Delta t}\right\}, & \text{if } j \text{ is an unmetered on-ramp cell} \end{cases} \end{aligned} \tag{5}$$
Conservation of the mainline and on-ramp:

$$\begin{aligned} q^{k+1}_{j,\mathrm{main}} &= q^k_{j,\mathrm{main}} + \Delta t \cdot \left(a^k_{j,\mathrm{main}} + d^k_{j,\mathrm{on}} - d^k_{j,\mathrm{main}} - d^k_{j,\mathrm{off}}\right), \\ q^{k+1}_{j,\mathrm{on}} &= q^k_{j,\mathrm{on}} + \Delta t \cdot \left(a^k_{j,\mathrm{on}} - d^k_{j,\mathrm{on}}\right) \end{aligned} \tag{6}$$
where $a^k_{j,\mathrm{main}}$ and $d^k_{j,\mathrm{main}}$ are the mainline arrival and departure rates of cell $j$ at step $k$; $a^k_{j,\mathrm{on}}$ and $d^k_{j,\mathrm{on}}$ are the on-ramp arrival and departure rates of cell $j$ at step $k$; and $d^k_{j,\mathrm{off}}$ is the off-ramp departure rate of cell $j$ at step $k$ (if cell $j$ is not an off-ramp cell, $d^k_{j,\mathrm{off}} = 0$). $q^k_{j,\mathrm{main}}$ represents the number of vehicles on the mainline of cell $j$ at step $k$, and $q^{\max}_{j,\mathrm{main}}$ is the maximum value of this quantity, limited by the mainline space of cell $j$. Similarly, $q^k_{j,\mathrm{on}}$ and $q^{\max}_{j,\mathrm{on}}$ denote the current (at step $k$) and maximum numbers of vehicles on the on-ramp of cell $j$, respectively. $l_j$ is the length of cell $j$. $\Delta t$ (min) is the time interval between two consecutive time steps. $c^k_i$ is the metering rate for the on-ramp cell of the $i$th motorway section at step $k$. $\eta_j \in [0, 1]$ is the flow allocation parameter of cell $j$, and $\theta_j \in [0, 1]$ is the blending parameter for traffic flowing from the on-ramp into the mainline of cell $j$. All arrival and departure rates are expressed in veh/min in this study.
For motorway section $i$ with $J$ cells, the number of vehicles on the mainline can be calculated by $q^k_{i,\mathrm{main}} = \sum_{j=1}^{J} q^k_{j,\mathrm{main}}$, while the number of vehicles on the on-ramp of motorway section $i$ is $q^k_{i,\mathrm{on}} = q^k_{j,\mathrm{on}}$, where $j$ is the on-ramp cell of the section (each section has at most one). In the same way, the maximum numbers of vehicles on the mainline and on-ramp of motorway section $i$ are $q^{\max}_{i,\mathrm{main}} = \sum_{j=1}^{J} q^{\max}_{j,\mathrm{main}}$ and $q^{\max}_{i,\mathrm{on}} = q^{\max}_{j,\mathrm{on}}$.
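To trace (5) and (6) numerically, the sketch below advances a single cell by one step. It is a minimal transcription under stated assumptions: all quantities are passed in as plain numbers; $l_j$ is treated as the cell length in km and $v_j$, $w_{j-1}$ in km/min; the on-ramp demand is taken as the on-ramp queue plus arrivals, $(q^k_{j,\mathrm{on}} + a^k_{j,\mathrm{on}}\Delta t)/\Delta t$; and the upstream-indexed terms ($w_{j-1}$, $q_{j-1}$, $d_{j-1,\mathrm{on}}$) are supplied by the caller as they appear in (5). The class and function names and the cell values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CellInputs:
    """State and parameters of one cell j at step k (veh, veh/min, km, km/min)."""
    q_main: float           # q^k_{j,main}
    q_on: float             # q^k_{j,on}
    q_max_main: float       # q^max_{j,main}
    a_main: float           # a^k_{j,main}
    a_on: float             # a^k_{j,on}
    d_off: float            # d^k_{j,off}, 0 if j is not an off-ramp cell
    v: float                # free-flow speed v_j
    length: float           # cell length l_j
    d_max_main: float       # d^max_{j,main}
    eta: float              # flow allocation parameter eta_j
    theta: float            # flow blending parameter theta_j
    w_prev: float           # w_{j-1}, as indexed in (5)
    q_max_main_prev: float  # q^max_{j-1,main}
    q_main_prev: float      # q^k_{j-1,main}
    d_on_prev: float        # d^k_{j-1,on}


def on_ramp_departure(c, dt, metering_rate=None):
    """d^k_{j,on}: on-ramp demand, available mainline space, and the metering rate if any."""
    demand = (c.q_on + c.a_on * dt) / dt
    space = c.eta * (c.q_max_main - c.q_main) / dt
    if metering_rate is None:   # unmetered on-ramp cell
        return min(demand, space)
    return min(demand, space, metering_rate)


def actm_step(c, dt, lambdas=(1.0, 1.0, 1.0), metering_rate=None):
    """One step of (5) and (6); lambdas = (lambda_1, lambda_2, lambda_3)."""
    lam1, lam2, lam3 = lambdas
    d_on = on_ramp_departure(c, dt, metering_rate)
    free_flow = lam1 * c.v / c.length * (c.q_main + c.theta * d_on * dt - c.d_off * dt)
    congested = lam2 * c.w_prev / c.length * (
        c.q_max_main_prev - c.q_main_prev - c.theta * c.d_on_prev * dt)
    d_main = min(free_flow, congested, lam3 * c.d_max_main)
    q_main_next = c.q_main + dt * (c.a_main + d_on - d_main - c.d_off)
    q_on_next = c.q_on + dt * (c.a_on - d_on)
    return d_main, d_on, q_main_next, q_on_next


# Hypothetical cell: 0.5 km long, 90 km/h free-flow speed, dt = 0.5 min.
cell = CellInputs(q_main=20, q_on=10, q_max_main=60, a_main=30, a_on=12,
                  d_off=5, v=1.5, length=0.5, d_max_main=60, eta=0.5,
                  theta=0.5, w_prev=0.25, q_max_main_prev=60,
                  q_main_prev=20, d_on_prev=0)
print(actm_step(cell, 0.5, metering_rate=8))                            # normal
print(actm_step(cell, 0.5, lambdas=(1.0, 1.0, 0.25), metering_rate=8))  # incident
```

In this toy case the metering rate (8 veh/min) is the binding on-ramp constraint. Setting $\lambda_3 = 0.25$ caps the mainline departure at 15 veh/min instead of 20, so 29 vehicles remain on the cell mainline after the step instead of 26.5.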
4.3. Estimation of Arrival and Departure Rates. The arrival rates of the boundary cells in each motorway section (such as $j+2$, $j+5$, and $j+8$) and of all on-ramps, as well as the departure rates of off-ramps, are inputs of the ACTM for each planning step between two real control steps. Considering the short duration of the planning process (10 steps), we assume that these rates remain stable during planning and estimate them directly from the recent flow data collected by detectors. The method described by Wang [16] is used here for the estimation; it simply averages the most recently observed data to obtain the predicted flow rates. In our model, we use the flow data collected over the last $N$ time steps ($N = 5$). Therefore, these three rates can be calculated by
$$\begin{aligned} a^{k,k+1}_{i,\mathrm{main}} = a^{k,k+1}_{j,\mathrm{main}} &= \frac{\sum_{n=0}^{N-1} a^{k-n}_{j,\mathrm{main}}}{N}, && \text{if } j \text{ is the boundary cell}, \\ a^{k,k+1}_{i,\mathrm{on}} = a^{k,k+1}_{j,\mathrm{on}} &= \frac{\sum_{n=0}^{N-1} a^{k-n}_{j,\mathrm{on}}}{N}, && \text{if } j \text{ is the on-ramp cell}, \\ d^{k,k+1}_{j,\mathrm{off}} &= \frac{\sum_{n=0}^{N-1} d^{k-n}_{j,\mathrm{off}}}{N}, && \text{if } j \text{ is the off-ramp cell} \end{aligned} \tag{7}$$
where $a^{k,k+1}_{j,\mathrm{main}}$ and $a^{k,k+1}_{j,\mathrm{on}}$ are the estimated mainline and on-ramp arrival rates of cell $j$ for the planning steps between real steps $k$ and $k+1$, and $d^{k,k+1}_{j,\mathrm{off}}$ is the estimated off-ramp departure rate of cell $j$. If cell $j$ is the boundary cell of motorway section $i$, the arrival or departure rate of this cell is also the arrival or departure rate of motorway section $i$.
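The moving average in (7) needs only a fixed-length buffer per detector. The sketch below keeps the last $N$ observed rates in a bounded deque and returns their mean, which is then held constant over the planning steps. The class name is hypothetical, and averaging whatever is available before $N$ observations exist is an assumption that (7) does not cover.

```python
from collections import deque


class RateEstimator:
    """Moving-average flow estimate in the style of (7)."""

    def __init__(self, n=5):
        self.window = deque(maxlen=n)   # keeps only the last n observations

    def observe(self, rate):
        """Add the rate (veh/min) observed at the latest real step."""
        self.window.append(rate)

    def estimate(self):
        """Mean of the stored rates, held fixed over the planning steps."""
        if not self.window:
            raise ValueError("no observations yet")
        # Assumption: before n observations exist, average the ones available.
        return sum(self.window) / len(self.window)


# Hypothetical detector readings; only the last five enter the estimate.
est = RateEstimator(n=5)
for rate in [30, 32, 28, 31, 29, 40]:
    est.observe(rate)
print(est.estimate())  # (32 + 28 + 31 + 29 + 40) / 5 = 32.0
```

One estimator per boundary-cell arrival, on-ramp arrival, and off-ramp departure stream would supply the three rates of (7).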
5. Definition of RL Elements
Apart from the architecture and models defined in Sections 3 and 4, three basic elements, namely, the environment state, the control action, and the reward function, must be specified to form an RL problem. This section details these three elements and the relevant algorithm.
5.1. Environment State. The environment states of a motorway section are composed of mainline states and on-ramp states. The same method as in [27, 29] is used here to obtain the state space. For the mainline of motorway section $i$, the number of vehicles ranges from 0 to the maximum number $q^{\max}_{i,\mathrm{main}}$, and this range is uniformly divided into $n_i$ intervals. Each interval represents a state of the mainline. Therefore, each mainline section can be represented by a state set $S_{i,\mathrm{main}}$ with $n_i$ states. Similarly, on-ramp traffic is represented by a state set $S_{i,\mathrm{on}}$ with $m_i$ states according to the maximum number of vehicles $q^{\max}_{i,\mathrm{on}}$. $n_i$ and $m_i$ should be adjusted for different motorway sections according to the section length. In this way, if motorway section $i$ is the critical section, the external traffic environment is represented by
$$S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}}, \quad s^k_i \in S_i, \tag{8}$$
which contains $n_i \cdot m_i$ states. At each time step, a state $s^k_i$ is selected from $S_i$ as the environment state. If motorway section $i$ is a normal section, the state sets of its neighbouring agent should be incorporated. Thus, the traffic state is represented by
$$S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}} \times S_{i-1,\mathrm{main}} \times S_{i-1,\mathrm{on}}, \quad s^k_i \in S_i, \tag{9}$$
which contains $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ states.
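The uniform discretisation behind (8) and (9) can be sketched as below. Vehicle counts are clipped to $[0, q^{\max}]$ and mapped to equal-width intervals; placing a count exactly equal to $q^{\max}$ in the last interval is an assumed boundary convention, and the function names and numbers are hypothetical.

```python
def interval_index(q, q_max, n_intervals):
    """Map a vehicle count q in [0, q_max] to one of n_intervals equal bins."""
    if q_max <= 0 or n_intervals <= 0:
        raise ValueError("q_max and n_intervals must be positive")
    q = min(max(q, 0.0), q_max)          # clip to the valid range
    idx = int(q / q_max * n_intervals)
    return min(idx, n_intervals - 1)     # q == q_max falls in the last bin


def section_state(q_main, q_on, q_max_main, q_max_on, n_i, m_i):
    """State of a critical section, (8): a pair from S_main x S_on."""
    return (interval_index(q_main, q_max_main, n_i),
            interval_index(q_on, q_max_on, m_i))


def normal_section_state(own, neighbour):
    """State of a normal section, (9): own pair followed by agent i-1's pair."""
    return own + neighbour


# Hypothetical section: 120-vehicle mainline in 6 bins, 40-vehicle ramp in 4 bins.
own = section_state(45, 10, 120, 40, n_i=6, m_i=4)
print(own)                                # (2, 1), one of 6 * 4 = 24 states
print(normal_section_state(own, (0, 3)))  # (2, 1, 0, 3)
```

For a normal section with the same discretisation on both sides, (9) gives $24 \cdot 24 = 576$ states, which illustrates how quickly the neighbour's states enlarge the table.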
5.2. Control Action. In a ramp control problem, the aim of the control action is to regulate the number of vehicles entering the mainline in each control step. Similar to [29], we adopt flow control as the control action, represented by the action set $C = \{4, 6, 8, 10, 12, 14, 16, 18, 20\}$, which contains 9 flow rates between the minimum (4 veh/min) and maximum (20 veh/min) values.
Exploitation and exploration are two basic behaviours of an RL agent. Exploitation means that the agent takes the control action that has yielded the most reward in previous experience. Exploration, instead, means that the agent tries other actions whose current estimated rewards are lower. To balance these two behaviours, we use the $\varepsilon$-greedy policy to select control actions [30]. Specifically, at each control step this policy takes a random action with probability $\varepsilon$ and chooses the greedy action (the one with the maximum Q value) with probability $1 - \varepsilon$.
The action selection probability can be formally expressed as

$$p\left(c_i^k \mid s_i^k\right) = \begin{cases} 1-\varepsilon, & \text{if } c_i^k = \arg\max_{c_i^k} Q^{k-1}\left(s_i^k, c_i^k\right),\\ \varepsilon, & \text{otherwise.} \end{cases} \tag{10}$$
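The selection rule in (10) can be sketched as follows, assuming a tabular $Q$ stored as one list of values per state, ordered like the action set $C$. The function name and the $Q$ values are hypothetical; $\varepsilon = 0.1$ is the value used in the case study. Following the verbal description, exploration draws uniformly from all nine rates, so the greedy rate can also be drawn while exploring; (10) folds that case into its "otherwise" branch.

```python
import random

# Action set C: metering flow rates in veh/min.
ACTIONS = [4, 6, 8, 10, 12, 14, 16, 18, 20]

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return the index of the selected action.

    q_values[a] holds Q^{k-1}(s_i^k, ACTIONS[a]) for the current state.
    With probability epsilon a uniformly random action is drawn (exploration);
    otherwise the action with the largest Q value is taken (exploitation).
    Ties are broken in favour of the lowest index.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

q_row = [0.0, -0.2, 0.5, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]   # hypothetical Q values
rng = random.Random(42)
rate = ACTIONS[epsilon_greedy(q_row, epsilon=0.1, rng=rng)]
```

Passing a seeded `random.Random` instance keeps simulation runs reproducible; the module-level default is convenient for quick experiments.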
5.3 Reward Function. The reward function is used to calculate the immediate reward after executing a specific action at each time step, which guides the agent towards its objective. Considering a common objective of traffic control systems (i.e., minimising total travel time), we define our reward to guide the agent to minimise the total time spent (TTS) through the learning process.
TTS is defined as the total time spent by vehicles in the network during a period of time. For our case, TTS can be obtained from the following equation:

$$\text{TTS} = \Delta t \cdot \sum_{k=0}^{K} \left(q^k_{i,\text{main}} + q^k_{i,\text{on}}\right). \tag{11}$$
In the above equation, $\Delta t$ is a fixed value; therefore, minimising TTS is equivalent to minimising the number of vehicles on the network, $\sum_{k=0}^{K}\left(q^k_{i,\text{main}} + q^k_{i,\text{on}}\right)$. To minimise this value, the reward function defined here is composed of two negative rewards, which act as penalties for vehicles on the mainline and on the on-ramp. The formal reward function at step $k$ is defined for two situations.
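A minimal sketch of (11), assuming the mainline and on-ramp counts of section $i$ are available as equal-length lists. The 1-minute step length and the counts are hypothetical, since $\Delta t$ is not fixed in this section. Because $\Delta t$ is a constant factor, ranking control policies by TTS is the same as ranking them by the summed vehicle counts.

```python
def total_time_spent(q_main, q_on, dt_hours):
    """TTS per (11): dt times the summed vehicle counts over steps k = 0..K.

    q_main[k], q_on[k]: vehicles on the mainline and on-ramp of section i at step k.
    dt_hours: length of one control step in hours, so TTS is in veh*h.
    """
    if len(q_main) != len(q_on):
        raise ValueError("q_main and q_on must cover the same control steps")
    return dt_hours * sum(m + o for m, o in zip(q_main, q_on))

# Hypothetical counts over three 1-minute steps (375 vehicle-steps in total).
tts = total_time_spent([100, 120, 110], [10, 20, 15], dt_hours=1 / 60)
```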
(1) Motorway Section $i$ Is the Critical Section. Consider

$$R_i^k\left(s_i^k, c_i^k\right) = \begin{cases} -\dfrac{q^k_{i,\text{main}} + q^k_{i,\text{on}}}{q^{\max}_{i,\text{main}} + q^{\max}_{i,\text{on}}}, & \text{if } q^k_{i,\text{main}} < q^{\max}_{i,\text{main}},\ q^k_{i,\text{on}} < q^{\max}_{i,\text{on}},\\ -1, & \text{otherwise,} \end{cases} \tag{12}$$
where $R_i^k(s_i^k, c_i^k)$ is the immediate reward for agent $i$ in state $s_i^k$ when executing action $c_i^k$ at control step $k$, and $q^{\max}_{i,\text{main}}$ and $q^{\max}_{i,\text{on}}$ are used to normalise the number of vehicles on the mainline and on-ramp, which guarantees that $R_i^k(s_i^k, c_i^k) \in [-1, 0]$.
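The piecewise reward (12) translates directly into code. This is a sketch with a hypothetical function name and hypothetical vehicle counts; only the section 1 maxima (1860 for the mainline and 108 for the on-ramp) come from the case study.

```python
def reward_critical(q_main, q_on, q_main_max, q_on_max):
    """Immediate reward R_i^k of (12) for the agent of the critical section."""
    if q_main < q_main_max and q_on < q_on_max:
        # Normalised vehicle count, so the reward lies in (-1, 0].
        return -(q_main + q_on) / (q_main_max + q_on_max)
    # Mainline or on-ramp has reached its maximum: worst reward.
    return -1.0

# Hypothetical counts on section 1: 936 mainline and 48 on-ramp vehicles.
r = reward_critical(936, 48, 1860, 108)
```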
(2) Motorway Section $i$ Is Not the Critical Section. Here, a new negative reward is introduced to maintain system equity, that is, to make sure that the on-ramp queues and the related travel times at different on-ramps stay close to each other:

$$R_i^k\left(s_i^k, c_{i-1}^k, c_i^k\right) = \begin{cases} -\dfrac{q^k_{i,\text{main}} + q^k_{i,\text{on}}}{q^{\max}_{i,\text{main}} + q^{\max}_{i,\text{on}}} - \dfrac{\left|q^k_{i,\text{on}} - q^k_{i-1,\text{on}}\right|}{\max\left(q^{\max}_{i,\text{on}}, q^{\max}_{i-1,\text{on}}\right)}, & \text{if } q^k_{i,\text{main}} < q^{\max}_{i,\text{main}},\ q^k_{i,\text{on}} < q^{\max}_{i,\text{on}},\\ -2, & \text{otherwise.} \end{cases} \tag{13}$$
8 Mathematical Problems in Engineering
For each agent i and episode do
    L ← CEIL(IncidentDuration / Δt)
    IF i is the critical section: initialise R_i^0(s_i, c_i), Q_i^0(s_i, c_i)
    ELSE: initialise R_i^0(s_i, c_{i-1}, c_i), Q_i^0(s_i, c_{i-1}, c_i), P_i^0(s_i, c_{i-1})
    For each control step k ∈ K do (Loop I)
        (i) get detected data from each cell j: a_{j,main}^k, d_{j,main}^k, d_{j,off}^k, a_{j,on}^k, d_{j,on}^k
        (ii) get state s_i^k through (8) and (9)
        (iii) get action c_i^k by the ε-greedy policy (10)
        (iv) get q_{j,main}^k, q_{j,on}^k through (6) and do q_{i,main}^k ← Σ_{j=1}^{J} q_{j,main}^k, q_{i,on}^k ← q_{j,on}^k
        IF i is the critical section: update R_i^k(s_i^k, c_i^k), Q_i^k(s_i^k, c_i^k) through (2) and (12)
        ELSE: update R_i^k(s_i^k, c_{i-1}^k, c_i^k), Q_i^k(s_i^k, c_{i-1}^k, c_i^k), p(c_{i-1}^k | s_i^k) through (4) and (13)
        IF s_i^k = s_initial and k + 1 ≥ L: end the algorithm
        ELSE: get a_{i,main}^{k,k+1}, a_{i,on}^{k,k+1}, d_{j,off}^{k,k+1} by (7); do l ← k, s_i^l ← s_i^k, q_{j,main}^l ← q_{j,main}^k, q_{j,on}^l ← q_{j,on}^k; and start loop II
        For each planning step l ∈ L do (Loop II)
            (i) generate flow rates for each cell j: d_{j,main}^l, d_{j,on}^l through (5)
            (ii) get the state s_i^l
            (iii) get q_{j,main}^l, q_{j,on}^l and do q_{i,main}^l ← Σ_{j=1}^{J} q_{j,main}^l, q_{i,on}^l ← q_{j,on}^l
            (iv) get action c_i^l by the ε-greedy policy
            IF i is the critical section: update R_i^l(s_i^l, c_i^l), Q_i^l(s_i^l, c_i^l)
            ELSE: update R_i^l(s_i^l, c_{i-1}^l, c_i^l), Q_i^l(s_i^l, c_{i-1}^l, c_i^l), p(c_{i-1}^k | s_i^k)
            IF (l = k + 9) or (s_i^l = s_initial and l + 1 ≥ L): go back to loop I
            ELSE: repeat loop II
        EndFor
    EndFor
EndFor

Algorithm 1: Algorithm for Dyna-MARL.
Compared to (12), a new term $\left|q^k_{i,\text{on}} - q^k_{i-1,\text{on}}\right| / \max\left(q^{\max}_{i,\text{on}}, q^{\max}_{i-1,\text{on}}\right)$ is added into (13), which is a penalty for the on-ramp queue difference between motorway sections $i$ and $i-1$. As two adjacent agents cooperate in this situation, $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$ depends on two control actions, $c_{i-1}^k$ and $c_i^k$. The function $\max(\cdot, \cdot)$ returns the larger of its two arguments and is used for normalisation.
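The cooperative reward (13) adds the equity penalty to the normalised count of (12). The sketch below uses a hypothetical function name and hypothetical counts; the maxima are the case-study values for section 2 (2880 and 90) and for the on-ramp of its neighbour, section 1 (108). The two calls show how the penalty can favour a state with more vehicles but balanced queues over one with fewer vehicles but unequal queues.

```python
def reward_normal(q_main, q_on, q_on_prev, q_main_max, q_on_max, q_on_max_prev):
    """Immediate reward R_i^k of (13) for a normal (non-critical) section i.

    q_on_prev and q_on_max_prev refer to the on-ramp of neighbouring section i-1.
    """
    if q_main < q_main_max and q_on < q_on_max:
        congestion = (q_main + q_on) / (q_main_max + q_on_max)
        # Equity penalty: normalised on-ramp queue difference between i and i-1.
        imbalance = abs(q_on - q_on_prev) / max(q_on_max, q_on_max_prev)
        return -congestion - imbalance
    return -2.0

balanced = reward_normal(1485, 54, 54, 2880, 90, 108)    # equal queues
unbalanced = reward_normal(1485, 0, 54, 2880, 90, 108)   # fewer vehicles, unequal queues
```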
5.4 Description of the Algorithm. Based on the Dyna-Q architecture and the RL elements defined in the previous subsections, an algorithm, Dyna-MARL, is developed and described in this subsection. Two main loops, corresponding to direct RL and planning as shown in Figure 1, are detailed in Dyna-MARL. Between two real control steps in loop I, 10 planning steps are run in loop II. The pseudocode of Dyna-MARL is given in Algorithm 1.
An episode in Dyna-MARL represents a control cycle, which starts from the incident occurrence and terminates when the traffic state returns to the initial state $s_{\text{initial}}$, that is, the traffic state before the incident occurred. The incident duration is assumed to be known in advance.
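To make the interplay of loop I and loop II concrete, the sketch below shows generic tabular Dyna-Q as described by Sutton and Barto [30]. It is not the authors' Dyna-MARL: the planning model here is a hypothetical lookup table of observed transitions, whereas Algorithm 1 generates planning experience from its traffic model equations, and the agent coordination of (13) is omitted. The parameters $\alpha = 0.2$, $\gamma = 0.8$, $\varepsilon = 0.1$ and the 10 planning steps per real step follow the case study; the four-state environment `toy_env` is purely illustrative.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON, N_PLANNING = 0.2, 0.8, 0.1, 10

def q_backup(Q, s, a, r, s_next, n_actions):
    """One-step Q-learning backup, used for real and simulated transitions."""
    best_next = max(Q[(s_next, b)] for b in range(n_actions))
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def select_action(Q, s, n_actions, rng):
    """Epsilon-greedy selection (ties go to the lowest action index)."""
    if rng.random() < EPSILON:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda b: Q[(s, b)])

def toy_env(s, a):
    """Hypothetical 4-state cycle: action 1 costs nothing, action 0 costs -1."""
    return (0.0 if a == 1 else -1.0), (s + 1) % 4

def dyna_q(steps=200, n_actions=2, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)   # Q[(state, action)], initialised to 0
    model = {}               # model[(state, action)] = (reward, next_state)
    s = 0
    for _ in range(steps):
        # Loop I: act in the real environment and learn directly.
        a = select_action(Q, s, n_actions, rng)
        r, s_next = toy_env(s, a)
        q_backup(Q, s, a, r, s_next, n_actions)
        model[(s, a)] = (r, s_next)
        # Loop II: replay simulated transitions drawn from the model.
        visited = list(model)
        for _ in range(N_PLANNING):
            ps, pa = rng.choice(visited)
            pr, ps_next = model[(ps, pa)]
            q_backup(Q, ps, pa, pr, ps_next, n_actions)
        s = s_next
    return Q

Q = dyna_q()
greedy = [max(range(2), key=lambda b: Q[(s, b)]) for s in range(4)]
```

The planning backups let each real transition be reused many times, which is the property Dyna-MARL exploits by running 10 model-based updates between two real control steps.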
6 Case Study and Results
One of the metered motorway segments (southbound direction) of the M6 in the UK is chosen for the case study. This segment lies between junction 21A (J21A) and junction 25 (J25), with an approximate length of 12.4 km (see Figure 4). Taking the noncontrolled (NC) situation as the baseline, we designed a series of experiments to compare the proposed Dyna-MARL algorithm with Isolated RL (Q-learning without coordination). The experiments and the relevant results are described as follows.
6.1 Partitions of the Test Segment. The test motorway segment, with a three-lane mainline, three metered on-ramps, and five off-ramps, is simulated in AIMSUN [39], a microscopic traffic simulation package. According to the detector locations and the road layout, the whole segment is divided into three sections, each containing one metered on-ramp. Motorway section 3 is divided into 4 cells and motorway sections 2 and 3 are both divided into 3 cells. The partitions of each section can be seen in Figure 5. According to the section length, the maximum numbers of vehicles in each mainline section and on-ramp are as follows: $q^{\max}_{1,\text{main}} = 1860$, $q^{\max}_{2,\text{main}} = 2880$, $q^{\max}_{3,\text{main}} = 2880$, $q^{\max}_{1,\text{on}} = 108$, $q^{\max}_{2,\text{on}} = 90$, and $q^{\max}_{3,\text{on}} = 120$.

Figure 4: Test motorway segment of M6 (junctions J21A to J25; origins O, O1 (from A579), O2 (from A580), and O3 (from A49); destinations D, D1 and D2 (to M62), D3 (to A579), D4 (to A580), and D5 (to A58)).

Figure 5: Partitions of test segment (sections 1 to 3 with section and cell boundaries, road section lengths in km, flow direction, on-ramps O1 to O3, off-ramps D1 to D5, and incident locations a, b, and c).
6.2 Real Data Source. Real detector data collected from 17 loop detectors located in the motorway segment (on both the mainline and the on- and off-ramps), which can be extracted from the Traffic Information System (HATRIS) [40], are used for the case study. These traffic count data are averaged from April 2012 to March 2013 at 15-minute intervals. Only working-day data (Monday to Friday) are used, due to the dramatic reduction of traffic load at weekends. Some of the detector data collected from the mainline and the three on-ramps are presented in Figure 6, from which we can see that two peak periods exist during daily traffic operation: the AM peak period (around 07:00:00–09:00:00) and the PM peak period (around 16:00:00–18:00:00).
At the test site, ramp metering only works at peak hours. Meanwhile, it is valuable to test the performance of the proposed algorithm in a high-demand situation: if it works under a high traffic load, it can be expected to be useful in more common situations as well. Therefore, the AM peak period with heavy traffic load is considered for the case study. Specifically, we use the averaged traffic data for the AM peak period collected from TRADS to estimate the OD (origin-destination) matrix for the simulation. A model proposed in [41] is adopted by AIMSUN to do the estimation, where the number of iterations is set to 1000 to achieve convergence. Table 1 shows the OD matrix estimated from real traffic data.

Figure 6: Real averaged traffic data (15-minute traffic counts over the time of day for the mainline of sections 1 to 3 and on-ramps 1 to 3).

Table 1: OD matrix estimated.

Origins/destinations    D      D5    D4    D3    D2     D1     Totals
O                       2089   375   686   728   1169   771    5818
O3                      875    65    212   193   117    46     1507
O2                      886    0     0     61    315    216    1477
O1                      824    0     0     0     292    226    1343
Totals                  4675   440   898   981   1893   1258   10146

Table 2: Parameters for ACTM.

Parameter   $d^{\max}_{\text{main}}$   $v$        $w$        $\theta$   $\eta$   $\lambda_1$   $\lambda_2$   $\lambda_3$
Value       6300 veh/h                 107 km/h   116 km/h   0.5        0.16     0.55          0.9           0.6
6.3 Incident Scenarios. Considering the difficulty of capturing real incident data, we simulate several incident scenarios in AIMSUN. To make every ramp meter work during the incident, the incident is located near the most downstream motorway section, that is, motorway section 1. Three incident scenarios, A, B, and C, are therefore designed, corresponding to three different incident locations, a, b, and c (as illustrated in Figure 5), respectively.
The simulation experiment lasts for one and a half hours, from 07:00:00 to 08:30:00, during the AM peak period. After 30 minutes of normal operation (for warm-up), the incident is triggered at 07:30:00 and lasts for 30 minutes. In the preliminary experiments designed in this paper, an incident with one lane blocked is considered; the parameters introduced here can also be adjusted for multiple-lane blockage. The incident extent is 50 metres and is assumed to be constant during the incident.
Learning-related parameters are set to typical values [30], that is, $\alpha = 0.2$, $\gamma = 0.8$, and $\varepsilon = 0.1$. Other parameters related to the ACTM are calibrated and summarised in Table 2. All the cells have the same $\theta$ and $\eta$.
6.4 Results. Dyna-MARL, Isolated RL, and NC are compared from three aspects: density evolution, some general indicators, and system equity. The experimental results are described as follows.
(1) Density Evolution. We can see from Figure 7 that four dense areas exist during the traffic operation. Three of them, near the on-ramp entrances (at motorway lengths of around 0.5 km, 5 km, and 10 km), are caused by heavy traffic loads from the on-ramps. The dense area close to the end of the segment forms due to the incident.
In scenario A, the incident location is close to on-ramp 1 (O1). Without control, this incident leads to severe congestion, which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). Under this scenario, Isolated RL cannot alleviate the incident-induced congestion effectively (see Figure 7(b)). At the beginning of congestion formation, without coordination, only the nearest ramp controller reacts to the congestion. Because of the limited storage space of the on-ramp, one ramp controller is insufficient to dissolve this congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of the three on-ramps to deal with the incident-induced congestion. In this way, mainline congestion is restricted to a smaller area and does not propagate to motorway section 2 (see Figure 7(c)).
For scenarios B and C, the incidents are near the end of the motorway segment and far from on-ramp 1. Because they do not block on-ramp 1, these incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL work well at easing congestion in the mainline. As shown in Figures 7(d)–7(i), compared with the NC situation, both Isolated RL and Dyna-MARL restrict the congestion to a small range near the on-ramp entrances.
(2) General Indicators. In this comparison, general indicators including total travel time (which should be reduced), total throughput (which should be improved), and total CO$_2$ emission (which should be reduced) are used to show how the proposed system can benefit road users. These indicators are widely used in the transport community to test the performance of newly developed traffic control systems.
As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b): Dyna-MARL improves the total throughput by up to 2.3% (see Figure 8(d)) and outperforms Isolated RL in all three scenarios. In scenario B, Isolated RL even fails to improve the total throughput. For total CO$_2$ emission (shown in Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with reductions of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL performs much better than Isolated RL.
Through the above comparison, we can see that Dyna-MARL outperforms Isolated RL for almost all scenarios and indicators.
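The percentages quoted above are changes relative to the NC baseline. A minimal sketch of that arithmetic, assuming the sign convention that a positive value is always an improvement (a reduction for travel time and CO$_2$ emission, an increase for throughput); the function name and the input values are hypothetical and are not the results plotted in Figure 8.

```python
def improvement_over_nc(nc_value, controlled_value, higher_is_better=False):
    """Percentage improvement of a controlled run over the NC baseline.

    For total travel time and CO2 emission (lower is better) this is the
    percentage reduction; for total throughput (higher is better) it is
    the percentage increase. Negative values mean the control made things worse.
    """
    if nc_value == 0:
        raise ValueError("NC baseline must be non-zero")
    change = 100.0 * (controlled_value - nc_value) / nc_value
    return change if higher_is_better else -change

# Hypothetical indicator values.
travel_time_gain = improvement_over_nc(1500.0, 1350.0)                          # h
throughput_gain = improvement_over_nc(9000.0, 9180.0, higher_is_better=True)    # veh
```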
Figure 7: Density profiles (motorway density in veh/km/lane over motorway length in km and absolute time in min) for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.

(3) System Equity. Although the general indicators presented in comparison (2) are effective for testing the performance of different systems, they cannot measure system equity, which is also an important aspect of system performance. In this paper, we only consider spatial equity, which is defined as a measure of the equity of user delays on different on-ramps [42]. In this study, we assume that road users from all three on-ramps have the same importance. If users from different on-ramps experience similar travel times, the control system is defined as an equitable system; conversely, a large queue difference leads to a highly inequitable system. In [43], the variance of travel time on different on-ramps is used as an indicator to measure system equity. Similar to [43], for the sake of comparison, the standard deviation is considered in our case. This indicator is defined as

$$\text{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n}\left[\bar{t}^k - t_i^k\right]^2}{n}}, \tag{14}$$

where $\text{SD}(k)$ is the standard deviation of the travel time of different on-ramps at time step $k$, $t_i^k$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^k$ is the averaged total travel time of the $n$ on-ramps at step $k$.
The comparison of system equity can be seen in Figure 9. For the NC situation, good equity can be maintained in scenarios B and C because entering vehicles are not restricted (as shown in Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by the congestion in scenario A, a long queue forms and leads to imbalance and resultant inequity for users on different on-ramps (see Figure 9(a)). For the controlled cases, Isolated RL performs poorly in all scenarios. This is because the ramp controller near the congestion imposes much more restrictive measures on the controlled traffic than the other controllers do. Because of its coordination strategy, Dyna-MARL outperforms Isolated RL at maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).
Figure 8: Comparison of general measures for different scenarios: (a) total travel time (h), (b) total throughput (veh), (c) total CO$_2$ emission ($\times 10^4$ kg), and (d) reduction from NC (%) in total travel time, total throughput, and total CO$_2$ emission, for NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C.
7 Conclusions and Future Work
A Dyna-Q based multiagent reinforcement learning method, referred to as Dyna-MARL, has been developed in this paper for motorway ramp control. Dyna-MARL is compared with Isolated RL (Q-learning without coordination) and the noncontrolled situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.
Through a series of simulation-based experiments, we can conclude the following: (1) Isolated RL can improve motorway performance in terms of increasing total throughput and reducing total travel time and CO$_2$ emission, but this improvement comes at the expense of poor system equity across on-ramps; (2) with a suitable coordination strategy, much higher system equity can be achieved by Dyna-MARL; (3) in addition to system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios regarding all indicators, which suggests that Dyna-MARL can deal with network-wide problems effectively.
Although the simulation tests have shown some positive results regarding the performance of Dyna-MARL, a simplified incident scenario with a fixed duration is considered in the current work. In practice, incident duration is highly variable and affected by a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Therefore, incident duration should be treated as an uncertainty, which will be investigated in our future work.
Figure 9: Standard deviation of on-ramp travel time (h) over the time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL in different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This paper is supported by the China Scholarship Council and the University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.
References
[1] M. Papageorgiou and A. Kotsialos, "Freeway ramp metering: an overview," IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.
[2] A. Skabardonis, P. Varaiya, and K. F. Petty, "Measuring recurrent and nonrecurrent traffic congestion," Transportation Research Record, vol. 1856, pp. 118–124, 2003.
[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, "ALINEA: a local feedback control law for on-ramp metering," Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.
[4] E. Smaragdis and M. Papageorgiou, "Series of new local ramp metering strategies," Transportation Research Record, vol. 1856, pp. 74–86, 2003.
[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, "Real-time metering algorithm for centralized control," Transportation Research Record, vol. 1232, pp. 17–26, 1989.
[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, "System wide adaptive ramp metering (SWARM)," in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.
[7] R. Lau, Ramp Metering by Zone - The Minnesota Algorithm, Minnesota Department of Transportation, 1997.
[8] H. M. Zhang and W. W. Recker, "On optimal freeway ramp control policies for congested traffic corridors," Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.
[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, "Optimal coordinated ramp metering with advanced motorway optimal control," Transportation Research Record, no. 1748, pp. 55–65, 2001.
[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, "Model predictive control for optimal coordination of ramp metering and variable speed limits," Transportation Research C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.
[11] G. Gomes and R. Horowitz, "Optimal freeway ramp metering using the asymmetric cell transmission model," Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.
[12] A. H. F. Chow and Y. Li, "Robust optimization of dynamic motorway traffic via ramp metering," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.
[13] R. W. Hall, "Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?" Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.
[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, "Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models," Transportation Research Record, vol. 2047, pp. 57–65, 2008.
[15] T. L. Greenlee and H. J. Payne, "Freeway ramp metering strategies for responding to incidents," in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.
[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.
[17] J.-B. Sheu, "Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control," Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.
[18] J.-B. Sheu and M.-S. Chang, "Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.
[19] C. Jacob and B. Abdulhai, "Machine learning for multi-jurisdictional optimal traffic corridor control," Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.
[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, "Motorway ramp-metering control with queuing consideration using Q-learning," in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC '11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.
[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Application of reinforcement learning with continuous state space to ramp metering in real-world conditions," in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC '12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.
[22] C. Jacob and B. Abdulhai, "Automated adaptive traffic corridor control using reinforcement learning: approach and case studies," Transportation Research Record, vol. 1959, pp. 1–8, 2006.
[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada," Transportation Research Record, vol. 2396, pp. 10–18, 2013.
[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, "Reinforcement learning ramp metering without complete information," Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.
[25] A. Fares and W. Gomaa, "Multi-agent reinforcement learning control for ramp metering," in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.
[26] K. Veljanovska, K. M. Bombol, and T. Maher, "Reinforcement learning technique in multiple motorway access control strategy design," PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.
[27] C. Lu, H. Chen, and S. Grant-Muller, "An indirect reinforcement learning approach for ramp control under incident-induced congestion," in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC '13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.
[28] L. Busoniu, R. Babuska, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.
[29] C. Lu, H. Chen, and S. Grant-Muller, "Indirect reinforcement learning for incident-responsive ramp control," Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.
[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[31] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[32] J. R. Kok and N. Vlassis, "Collaborative multiagent reinforcement learning by payoff propagation," Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.
[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, "Coordinated reinforcement learning," in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.
[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, "Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.
[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: a survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[36] C. F. Daganzo, "The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory," Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.
[37] J. Haddad, M. Ramezani, and N. Geroliminis, "Cooperative traffic control of a mixed network with two urban regions and a freeway," Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.
[38] H. Mongeot and J.-B. Lesort, "Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory," Transportation Research Record, vol. 1710, pp. 58–68, 2000.
[39] Transport Simulation Systems, Aimsun User's Manual 6.1, TSS, Barcelona, Spain, 2010.
[40] Highways England, "Hatris Homepage," 2013, https://www.hatris.co.uk.
[41] E. Cascetta, "Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator," Transportation Research B, vol. 18, no. 4-5, pp. 289–299, 1984.
[42] L. Zhang and D. Levinson, "Balancing efficiency and equity of ramp meters," Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.
[43] A. Kotsialos and M. Papageorgiou, "Efficiency and equity properties of freeway network-wide ramp metering with AMOC," Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.
the outflow of the motorway mainline approach a predetermined target value, which is usually the road capacity. Another group of systems focuses on formulating different control scenarios as optimisation problems and using optimal control techniques (e.g., model predictive control) to solve them. The purpose of these systems is to maximise or minimise an objective function rather than to achieve some predefined target value. Examples of these systems can be found in [8–12], where macroscopic traffic flow models were combined with control systems to formulate optimal control problems.
Although the aforementioned systems have shown their effectiveness in different scenarios, recurrent congestion is still their main focus, and they do not include a component that can deal with nonrecurrent congestion. Unlike recurrent congestion, which is caused by increased traffic demand in peak hours, nonrecurrent congestion is mainly induced by incidents and is therefore usually referred to as incident-induced congestion [2, 13]. Traffic incidents are nonrecurrent events, such as road accidents, vehicle breakdowns, and unexpected obstacles, that may block one or more lanes of the motorway mainline. The temporary lane blockage interrupts the normal operation of traffic flow and leads to a rapid reduction of road capacity [14]. In this case, fixed-time and simple traffic-responsive systems, which depend on information collected from daily traffic operation or on a predefined target value, are not applicable. Therefore, more sophisticated systems that can respond to incidents are required. During the last decades, a series of such ramp control systems have been designed, most of which are based on optimisation techniques. For example, an optimal control structure using a simple macroscopic traffic flow model was proposed in [15] to deal with incident-induced congestion. A more complex system that considers dynamic incident duration was developed in [16]; it can be solved by linear programming. In the research presented in [17, 18], both lane-changing and queuing behaviour during the incident were incorporated into a modelling structure and solved by a stochastic optimal control system. Although these systems are based on different technologies, they all need a model to predict traffic conditions and use these predictions to accomplish the control process.
Model-based methods usually have poor adaptability when a mismatch between the simulation model and the real controlled environment emerges [19–21]. To overcome this limitation, another optimisation-based method, reinforcement learning (RL), was introduced to the ramp control area. This method is based on the Markov decision process (MDP) and dynamic programming (DP) and can approximately solve the optimisation problem through continuous learning without any models. The first ramp control system using RL to solve incident-induced problems was developed in [19, 22]. This system adopted the basic RL algorithm, Q-learning, to alleviate traffic congestion caused by incidents. After this work, several Q-learning systems considering both local (e.g., [23, 24]) and coordinated (e.g., [25, 26]) control problems were proposed. However, Q-learning can only learn from real interactions with the traffic operation and cannot make full use of historical data (or models). Because of this limitation, Q-learning usually has a low learning speed and needs a great number of trials to obtain the best control strategy in complex scenarios such as incident-induced congestion [27]. This problem is even worse in coordinated ramp control problems, whose state and action spaces increase exponentially and lead to the so-called "curse of dimensionality" [28]. One solution that speeds up the learning process and deals with incidents efficiently was proposed in our previous work [27, 29]. That system used the Dyna-Q architecture to combine model-free Q-learning with a model-based method and can be used to accomplish single-agent tasks.
In this paper, the previous single-agent system is extended to a multiagent case that can deal with a network-wide problem involving multiple ramp controllers. We refer to this system as Dyna-MARL; it adopts a multiagent RL (MARL) strategy based on the Dyna-Q architecture. The rest of this paper is organised as follows. Section 2 briefly introduces the basics of RL, including the single-agent and multiagent cases. The architecture of Dyna-MARL is described in Section 3. After that, Sections 4 and 5 give a detailed description of the models, elements, and related algorithm of Dyna-MARL. The simulation experiments and relevant results are discussed in Section 6. Finally, Section 7 gives some conclusions and introduces future work.
2 Reinforcement Learning
RL is a subclass of machine learning. In the following subsections, two kinds of RL problems, namely, single-agent and multiagent RL, will be briefly introduced.
2.1. Single-Agent RL. The problem of single-agent RL is usually defined as an MDP that can be represented by a tuple $(S, P, R, C)$ [30]. $S$ is the state space used to describe the external environment. $C$ is the control action set containing the executable actions of the agent. $P$ is the state transition probability: for a state pair $s, s' \in S$, $P_c(s, s')$ represents the probability of reaching state $s'$ after executing action $c$ at state $s$. $R: S \times C \rightarrow \mathbb{R}$ is the reward function, and $R(s, c)$ denotes the immediate reward after taking action $c$ at state $s$. Based on these definitions, the Q value is defined for each state-action pair $(s, c)$ as follows:
$$Q^{\pi}(s, c) = E\left\{ \sum_{n=0}^{\infty} \gamma^{n} R\left(s_{k+n+1}, c_{k+n+1}\right) \,\middle|\, s_k = s,\ c_k = c \right\} \tag{1}$$
where $k$ is the time index and $n$ is the number of time steps, which also serves as the exponent in $\gamma^n$. $s_k \in S$ and $c_k \in C$ are the environment state and the executed control action at time step $k$, respectively. $\gamma \in [0, 1]$ is the discount factor, which indicates the importance of future predicted rewards. $\pi$ is the policy corresponding to a sequence of actions. The optimal policy can be obtained by maximising the Q value.
The most widely used algorithm in the literature for estimating the maximum Q value is Q-learning [31]. Q-learning maximises the Q value of each state-action pair by using the following update equation:
$$Q_{k+1}(s_k, c_k) = Q_k(s_k, c_k) + \alpha \left[ R_k(s_k, c_k) + \gamma \max_{c_{k+1}} Q_k(s_{k+1}, c_{k+1}) - Q_k(s_k, c_k) \right] \tag{2}$$
where $Q_{k+1}(s_k, c_k)$ and $Q_k(s_k, c_k)$ are the Q values of the state-action pair $(s_k, c_k)$ at the $(k+1)$th step and the $k$th step, respectively, and $Q_k(s_{k+1}, c_{k+1})$ is the Q value of the state-action pair $(s_{k+1}, c_{k+1})$ at the $k$th step. $\alpha \in [0, 1]$ is the learning rate. Both $\gamma$ and $\alpha$ can be tuned for different problems.
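As an illustration of update rule (2), the following minimal Python sketch applies one tabular Q-learning step. The dictionary-based Q table, the function name `q_learning_update`, and the default values of `alpha` and `gamma` are our own illustrative assumptions, not settings reported in this paper.

```python
from collections import defaultdict


def q_learning_update(Q, s, c, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step, Eq. (2), and return the new Q(s, c).

    Q is a defaultdict(float) keyed by (state, action); unseen pairs read as 0.
    The defaults for alpha and gamma are illustrative only.
    """
    best_next = max(Q[(s_next, a)] for a in actions)   # max over c_{k+1}
    td_error = r + gamma * best_next - Q[(s, c)]
    Q[(s, c)] += alpha * td_error
    return Q[(s, c)]


# Tiny usage example with made-up numbers.
Q = defaultdict(float)
Q[(1, 6)] = 2.0
new_value = q_learning_update(Q, s=0, c=4, r=-0.5, s_next=1,
                              actions=[4, 6, 8], alpha=0.5, gamma=0.9)
print(new_value)  # 0.5 * (-0.5 + 0.9 * 2.0 - 0.0) = 0.65
```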
2.2. Multiagent Scenarios. In multiagent scenarios, the MDP of the single-agent case can be extended to a stochastic game (SG), or Markov game, in which a group of agents try to obtain equilibrium solutions through coordination or competition [28].
In the absence of competition, all agents involved in a game share a common goal of maximising the global Q value, which forms a coordinated MARL problem. In this case, the policy optimisation is determined by the actions executed by all agents.
To solve a coordinated MARL problem, the Q-learning update equation (2) can easily be extended to represent the global Q value update [28]:
$$\begin{aligned}
Q_{k+1}(s_k, c_1^k, \ldots, c_n^k) = {} & Q_k(s_k, c_1^k, \ldots, c_n^k) \\
& + \alpha \Big[ R_k(s_k, c_1^k, \ldots, c_n^k) + \gamma \max_{c_1^{k+1}, \ldots, c_n^{k+1}} Q_k(s_{k+1}, c_1^{k+1}, \ldots, c_n^{k+1}) - Q_k(s_k, c_1^k, \ldots, c_n^k) \Big]
\end{aligned} \tag{3}$$
The only difference from (2) is that $Q$ and $R$ in (3) relate to the $n$ actions $c_1, \ldots, c_n$ executed by $n$ agents rather than to a single action $c$.
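To make the joint-action update (3) and its cost concrete, the sketch below enumerates joint actions with `itertools.product` and applies (3) to a tabular Q keyed by (state, joint action). It borrows the 9-rate action set from Section 5.2 for every agent; the function names and default parameters are illustrative assumptions.

```python
import itertools
from collections import defaultdict

C = [4, 6, 8, 10, 12, 14, 16, 18, 20]  # per-agent flow rates (veh/min), Section 5.2


def joint_actions(action_sets):
    """All joint actions (c_1, ..., c_n), one entry per agent."""
    return list(itertools.product(*action_sets))


def joint_q_update(Q, s, joint_c, r, s_next, action_sets, alpha=0.1, gamma=0.9):
    """Eq. (3): the max is taken over every joint action of all n agents."""
    best_next = max(Q[(s_next, a)] for a in joint_actions(action_sets))
    Q[(s, joint_c)] += alpha * (r + gamma * best_next - Q[(s, joint_c)])
    return Q[(s, joint_c)]


# The joint action space grows as |C| ** n.
for n in (1, 2, 3, 4):
    print(n, len(joint_actions([C] * n)))   # 9, 81, 729, 6561

Q = defaultdict(float)
value = joint_q_update(Q, s=0, joint_c=(4, 6), r=-1.0, s_next=1,
                       action_sets=[C, C], alpha=0.5)
print(value)  # -0.5, since every Q entry starts at 0
```

The max in (3) scans all $|C|^n$ joint actions, which is why the decomposition strategies discussed next are needed.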
2.3. Solutions for Coordinated MARL. It can be seen from (3) that, as the number of agents grows, the number of action combinations and the resulting computational complexity increase exponentially, which may make the problem unsolvable within a required time limit [28]. Therefore, a commonly used method is to decompose the global Q value into several local Q values, each of which can be maximised by a few relevant agents rather than by all agents [32]. Based on this distributed method, several strategies have been proposed. In [28], these strategies are grouped into three categories: coordination-based, coordination-free, and indirect coordination strategies.
Coordination-based strategies need local Q values to be updated according to the actions executed by all relevant agents (called joint actions) at each time step [28]. The decision-making process of each agent is based on the information received from all other related agents through sufficient communication, which complicates the problem. On the other hand, coordination-free (or independent) strategies, such as the distributed Q-learning algorithm, make each agent update the corresponding local Q values based on its own actions [33]. Each agent therefore makes its decisions independently without increasing computational complexity. However, this computational efficiency comes at the expense of nonguaranteed convergence [32]. Indirect coordination strategies try to find a balance between these two approaches. With indirect strategies, each agent maintains models of its cooperative partners and updates local Q values without knowing all the information of the other agents at each step [28]. Given high-quality models, this method can reduce the problem complexity and guarantee convergence with limited coordination.
3 Dyna-Q Based Indirect Coordination Strategy
Because of the benefits described in the previous section, the indirect coordination strategy has been applied in [34] to solve urban traffic control problems. In that work, each agent maintains a model for estimating the action selection probability of its neighbours and uses this information to optimise its control strategy. In this paper, we extend this method to motorway systems by applying the Dyna-Q architecture.
Under the Dyna-Q architecture, a modified macroscopic flow model, the asymmetric cell transmission model (ACTM), and the Q-learning algorithm are combined to deal with coordinated MARL problems. This section introduces how Dyna-Q is applied.
3.1. Dyna-Q Architecture. The Dyna-Q architecture is an extension of standard Q-learning that integrates planning, acting, and learning [30]. Unlike Q-learning, which learns from real experience without a model, Dyna-Q learns a model and uses it to guide the agent [35]. After real experience is captured, two loops run in the Dyna-Q architecture to learn optimal policies that obtain the maximum Q value (see Figure 1).
In loop I, direct RL is the standard Q-learning process, which interacts with the real external environment. Loop II contains two main tasks: (1) model learning, which improves the model accuracy by extracting new knowledge from real experience, and (2) planning, which is the same process as direct RL except that it uses experience generated by the model. Acting is the action execution process.
By applying a model, the agent can predict the reactions of its external environment and of other agents before executing a specific action, which gives the agent an opportunity to update the Q value before receiving real feedback. Simultaneously, direct RL runs to update the Q value through real interaction. The optimal policy is therefore learned from both real experience and predictions. With this strategy, Dyna-Q can learn faster than Q-learning in many situations [30].

[Figure 1: Dyna-Q architecture. Value/policy, experience, and model are linked by acting, direct RL, model learning, and planning.]

[Figure 2: System architecture. Upper panel: agent architecture of agent i + 1 (experience: real traffic arrival and departure rates plus information from agent i; model: ACTM with estimated arrival and departure rates of sections i and i + 1, and action probability $p(c_i^k \mid s_{i+1}^k)$). Lower panel: agent coordination among agents i, i + 1, and i + 2 over cells j to j + 8.]
Although a model is maintained in the Dyna-Q architecture, the whole system differs from model-based control methods such as model predictive control (MPC). The model in the Dyna-Q architecture is a complementary component that is used to speed up the learning process and simplify the coordination of agents. The optimal control actions are learnt from both real and simulated experience. Without models, the Dyna-Q architecture is equivalent to Q-learning and can still work as a model-free system. MPC, on the other hand, depends on its model and cannot work without one. Therefore, Dyna-Q can be considered a combination of model-free and model-based methods [27].
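The interplay of the two loops can be sketched with the textbook tabular Dyna-Q procedure [30]. This is a generic illustration rather than Dyna-MARL itself: the learned model here is a simple deterministic lookup table of observed transitions, whereas Dyna-MARL uses the ACTM and an action-probability model. The function name, the table representation, and the parameter defaults are assumptions. Setting `n_planning = 0` removes Loop II and leaves plain Q-learning, which mirrors the point made above.

```python
import random
from collections import defaultdict


def dyna_q_step(Q, model, s, c, r, s_next, actions, n_planning=10,
                alpha=0.1, gamma=0.9, rng=random):
    """One real control step of tabular Dyna-Q.

    Loop I : direct RL update from the real transition (Eq. (2)).
    Loop II: model learning, then n_planning updates on simulated experience.
    """
    def update(s0, c0, r0, s1):
        best = max(Q[(s1, a)] for a in actions)
        Q[(s0, c0)] += alpha * (r0 + gamma * best - Q[(s0, c0)])

    update(s, c, r, s_next)                 # direct RL on real experience
    model[(s, c)] = (r, s_next)             # model learning (deterministic table)
    visited = list(model)
    for _ in range(n_planning):             # planning on simulated experience
        ps, pc = rng.choice(visited)
        pr, ps_next = model[(ps, pc)]
        update(ps, pc, pr, ps_next)
    return Q


Q, model = defaultdict(float), {}
dyna_q_step(Q, model, s=0, c=4, r=1.0, s_next=1, actions=[4],
            n_planning=10, rng=random.Random(0))
print(Q[(0, 4)])  # about 0.686 (= 1 - 0.9**11): 11 updates instead of 1
```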
3.2. System Architecture. Each agent in the motorway control system is designed on the basis of the Dyna-Q architecture and controls one prespecified motorway section.
A simplified motorway segment is shown in Figure 2 for analysis. This segment contains three motorway sections ($i$, $i+1$, $i+2$) with detectors located at their boundaries. Each motorway section is divided into a number of cells ($j, j+1, \ldots, j+8$) according to its layout and geometric features. Generally, three kinds of cells exist on the motorway: on-ramp cells linked with on-ramps ($j+2$, $j+5$, $j+8$), off-ramp cells linked with off-ramps ($j$, $j+3$, $j+6$), and normal cells ($j+1$, $j+4$, $j+7$). In this paper, each motorway section is defined to have at most one on-ramp cell.
The typical Dyna-Q architecture presented in Figure 2 is detailed here for each agent, taking agent $i+1$ as an example. Its experience consists of the traffic arrival and departure rates observed by the detectors of motorway section $i+1$, together with the information received from agent $i$, and is used to improve the models. The model component maintains two models. An asymmetric cell transmission model (ACTM) with estimated traffic arrival and departure rates is used to simulate the traffic flow dynamics in the relevant motorway sections. A probability model of the action selection of agent $i$ at the current state is updated for the subsequent planning process.

[Figure 3: Fundamental diagram during the incident. (a) Incident extent at the critical section. (b) Flow versus density for normal and incident conditions, with maximum departure flows $d^{\max}_{j,\mathrm{main}}$ and $d^{\mathrm{In},\max}_{j,\mathrm{main}}$, wave speeds $w_j$ and $w^{\mathrm{In}}_j$, critical densities $\rho_{j,C}$ and $\rho^{\mathrm{In}}_{j,C}$, and jam densities $\rho_{j,J}$ and $\rho^{\mathrm{In}}_{j,J}$.]
As in many real applications, some conventions are used to restrict the action selection of an agent and thereby reduce the complexity of MARL [28]. Specifically, in our design, each agent communicates only with its spatial neighbours. For instance, agent $i+1$ receives the control action and traffic information from agent $i$ and sends its own information to agent $i+2$. For the case shown in Figure 2, we assume that motorway section $i$ is the critical section where an incident occurs. In this situation, agent $i$ plays a more important role than the other agents in dealing with the incident. Agent $i$ can be considered the chief controller, which makes decisions according to its own knowledge of the traffic and incident situation. The other agents regulate their control policies based on the reaction of agent $i$.
Therefore, two Q values are defined for the two kinds of agents. If motorway section $i$ is the critical section, the Q value of agent $i$ is related only to its own state and action space and can be updated by (2).
If motorway section $i$ is a normal section without incidents, the Q value of agent $i$ can be calculated by
$$\begin{aligned}
Q_i^{k+1}(s_i^k, c_{i-1}^k, c_i^k) = {} & Q_i^k(s_i^k, c_{i-1}^k, c_i^k) + \alpha \Big[ R_i^k(s_i^k, c_{i-1}^k, c_i^k) \\
& + \gamma \max_{c_i^{k+1}} \sum_{c_{i-1}^{k+1}} p(c_{i-1}^{k+1} \mid s_i^{k+1}) \cdot Q_i^k(s_i^{k+1}, c_{i-1}^{k+1}, c_i^{k+1}) - Q_i^k(s_i^k, c_{i-1}^k, c_i^k) \Big], \\
p(c_{i-1}^{k+1} \mid s_i^{k+1}) = {} & \frac{\operatorname{count}(s_i^{k+1}, c_{i-1}^{k+1})}{\sum_{c_{i-1} \in C_{i-1}} \operatorname{count}(s_i^{k+1}, c_{i-1})}
\end{aligned} \tag{4}$$
where $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$ is the immediate reward obtained by agent $i$ at time step $k$ when $c_{i-1}^k$ and $c_i^k$ are the actions executed by agents $i-1$ and $i$. Similarly, $Q_i^{k+1}(s_i^k, c_{i-1}^k, c_i^k)$ and $Q_i^k(s_i^k, c_{i-1}^k, c_i^k)$ are the Q values of agent $i$ at steps $k+1$ and $k$, respectively. $C_{i-1}$ is the action set of agent $i-1$, and $\operatorname{count}(s_i^{k+1}, c_{i-1}^{k+1})$ returns the number of visits to the state-action pair $(s_i^{k+1}, c_{i-1}^{k+1})$. Thus, $p(c_{i-1}^{k+1} \mid s_i^{k+1})$ is the probability that agent $i-1$ selects action $c_{i-1}^{k+1}$ at state $s_i^{k+1}$. The models and related symbols shown in Figure 2 are specified in the following section.
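A minimal sketch of update (4) for an agent on a normal section is shown below. The class name, the dictionary tables, and the uniform fallback used before any action of agent $i-1$ has been observed (where the count ratio in (4) would be undefined) are our assumptions.

```python
from collections import defaultdict


class NeighbourAwareAgent:
    """Tabular sketch of Eq. (4) for agent i on a normal (non-critical) section.

    Q is keyed by (s_i, c_{i-1}, c_i); p(c_{i-1} | s_i) is estimated from the
    visit counts of the upstream neighbour's actions.
    """

    def __init__(self, own_actions, nbr_actions, alpha=0.1, gamma=0.9):
        self.own_actions = list(own_actions)
        self.nbr_actions = list(nbr_actions)
        self.alpha, self.gamma = alpha, gamma
        self.Q = defaultdict(float)
        self.counts = defaultdict(int)

    def observe_neighbour(self, s, c_nbr):
        """Record that agent i-1 chose c_nbr while agent i was in state s."""
        self.counts[(s, c_nbr)] += 1

    def p_nbr(self, c_nbr, s):
        """count(s, c_nbr) / sum over c of count(s, c); uniform if unseen (assumption)."""
        total = sum(self.counts[(s, c)] for c in self.nbr_actions)
        if total == 0:
            return 1.0 / len(self.nbr_actions)
        return self.counts[(s, c_nbr)] / total

    def expected_q(self, s, c_own):
        """Sum over c_{i-1} of p(c_{i-1} | s) * Q(s, c_{i-1}, c_own)."""
        return sum(self.p_nbr(cn, s) * self.Q[(s, cn, c_own)]
                   for cn in self.nbr_actions)

    def update(self, s, c_nbr, c_own, r, s_next):
        """Eq. (4): TD update using the neighbour-weighted value of s_next."""
        best = max(self.expected_q(s_next, c) for c in self.own_actions)
        key = (s, c_nbr, c_own)
        self.Q[key] += self.alpha * (r + self.gamma * best - self.Q[key])
        return self.Q[key]


agent = NeighbourAwareAgent([4, 6], [4, 6], alpha=0.5, gamma=1.0)
for c in (4, 4, 4, 6):
    agent.observe_neighbour(1, c)          # p(4 | 1) = 0.75, p(6 | 1) = 0.25
agent.Q[(1, 4, 6)], agent.Q[(1, 6, 6)] = 2.0, 4.0
print(agent.update(0, 4, 4, r=-1.0, s_next=1))  # 0.5 * (-1 + 2.5) = 0.75
```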
4 Modified Asymmetric Cell Transmission Model
A first-order macroscopic traffic flow model, the asymmetric cell transmission model (ACTM), is applied as one of the models in the Dyna-Q architecture. This model is derived from the widely used cell transmission model (CTM) [36] and has been used for ramp control problems [11, 37]. In this paper, we modify the ACTM to incorporate the traffic dynamics under incident conditions.
4.1. Traffic Dynamics during the Incident. As shown in Figure 3(a), when an incident happens in the critical section, one or more lanes of the motorway are blocked, depending on the incident extent. Because of the lane blockage, the incident may reduce the normal road capacity and spatial storage space, which produces a new relationship between traffic flow and road density, that is, the fundamental diagram presented in Figure 3(b). As suggested by [38], additional parameters can be used to adjust the fundamental diagram for incident situations. We introduce three parameters $\lambda_1, \lambda_2, \lambda_3 \in [0, 1]$ to reflect these new dynamics. They are defined as $\lambda_1 = v_j^{\mathrm{In}} / v_j$, $\lambda_2 = w_j^{\mathrm{In}} / w_j$, and $\lambda_3 = d^{\mathrm{In},\max}_{j,\mathrm{main}} / d^{\max}_{j,\mathrm{main}}$, where $v_j$ and $w_j$ are the free-flow speed and the congestion wave speed of cell $j$, $d^{\max}_{j,\mathrm{main}}$ is the maximum departure flow of cell $j$, and $v_j^{\mathrm{In}}$, $w_j^{\mathrm{In}}$, and $d^{\mathrm{In},\max}_{j,\mathrm{main}}$ are the same three variables during the incident. $\rho_{j,C}$ and $\rho_{j,C}^{\mathrm{In}}$ are the critical densities for normal and incident situations, and $\rho_{j,J}$ and $\rho_{j,J}^{\mathrm{In}}$ are the corresponding jam densities.
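The role of $\lambda_1$, $\lambda_2$, and $\lambda_3$ can be sketched in Python. We assume a trapezoidal flow-density relation $\min\{v\rho,\ d^{\max},\ w(\rho_J - \rho)\}$, which matches the shape suggested by Figure 3(b) but is not stated explicitly in the text; the class and function names and all numbers are hypothetical. Because the incident jam density is not expressed through a $\lambda$ parameter, it is passed in directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FundamentalDiagram:
    v: float        # free-flow speed v_j (km/min)
    w: float        # congestion wave speed w_j (km/min)
    d_max: float    # maximum departure flow d^max_{j,main} (veh/min)
    rho_jam: float  # jam density rho_{j,J} (veh/km)

    def flow(self, rho: float) -> float:
        """Assumed trapezoidal diagram: min(v*rho, d_max, w*(rho_jam - rho))."""
        if rho <= 0.0 or rho >= self.rho_jam:
            return 0.0
        return min(self.v * rho, self.d_max, self.w * (self.rho_jam - rho))


def incident_diagram(fd, lam1, lam2, lam3, rho_jam_in):
    """Scale v, w and d_max by lambda_1..lambda_3 as defined in Section 4.1."""
    for lam in (lam1, lam2, lam3):
        if not 0.0 <= lam <= 1.0:
            raise ValueError("lambda parameters must lie in [0, 1]")
    return FundamentalDiagram(v=lam1 * fd.v, w=lam2 * fd.w,
                              d_max=lam3 * fd.d_max, rho_jam=rho_jam_in)


normal = FundamentalDiagram(v=1.5, w=0.5, d_max=60.0, rho_jam=200.0)
incident = incident_diagram(normal, lam1=0.8, lam2=0.8, lam3=0.5, rho_jam_in=120.0)
print(normal.flow(50.0), incident.flow(50.0))  # 60.0 and about 28.0
```

At the same density, the incident diagram switches from the capacity branch to the congested branch, which is the capacity drop the $\lambda$ parameters are meant to capture.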
4.2. Modified ACTM. Given the three incident-related parameters, the traffic dynamics in each cell can be derived from the fundamental diagram illustrated in Figure 3(b) and represented by the following equations.
Departure rates of the mainline and on-ramp:

$$\begin{aligned}
d_{j,\mathrm{main}}^k &= \min\Big\{ \lambda_1 \frac{v_j}{l_j} \big( q_{j,\mathrm{main}}^k + \theta_j d_{j,\mathrm{on}}^k \Delta t - d_{j,\mathrm{off}}^k \Delta t \big),\ \lambda_2 \frac{w_{j-1}}{l_j} \big( q_{j-1,\mathrm{main}}^{\max} - q_{j-1,\mathrm{main}}^k - \theta_j d_{j-1,\mathrm{on}}^k \Delta t \big),\ \lambda_3 d_{j,\mathrm{main}}^{\max} \Big\}, \\
d_{j,\mathrm{on}}^k &= \begin{cases} \min\left\{ \dfrac{q_{j,\mathrm{on}}^k + a_{j,\mathrm{on}}^k \Delta t}{\Delta t},\ \dfrac{\eta_j \big(q_{j,\mathrm{main}}^{\max} - q_{j,\mathrm{main}}^k\big)}{\Delta t},\ c_i^k \right\} & \text{if } j \text{ is a metered on-ramp cell}, \\ \min\left\{ \dfrac{q_{j,\mathrm{on}}^k + a_{j,\mathrm{on}}^k \Delta t}{\Delta t},\ \dfrac{\eta_j \big(q_{j,\mathrm{main}}^{\max} - q_{j,\mathrm{main}}^k\big)}{\Delta t} \right\} & \text{if } j \text{ is an unmetered on-ramp cell} \end{cases}
\end{aligned} \tag{5}$$
Conservation of the mainline and on-ramp:

$$\begin{aligned}
q_{j,\mathrm{main}}^{k+1} &= q_{j,\mathrm{main}}^k + \Delta t \left( a_{j,\mathrm{main}}^k + d_{j,\mathrm{on}}^k - d_{j,\mathrm{main}}^k - d_{j,\mathrm{off}}^k \right), \\
q_{j,\mathrm{on}}^{k+1} &= q_{j,\mathrm{on}}^k + \Delta t \left( a_{j,\mathrm{on}}^k - d_{j,\mathrm{on}}^k \right)
\end{aligned} \tag{6}$$
where $a_{j,\mathrm{main}}^k$ and $d_{j,\mathrm{main}}^k$ are the mainline arrival and departure rates of cell $j$ at step $k$; $a_{j,\mathrm{on}}^k$ and $d_{j,\mathrm{on}}^k$ are the on-ramp arrival and departure rates of cell $j$ at step $k$; and $d_{j,\mathrm{off}}^k$ is the off-ramp departure rate of cell $j$ at step $k$ (if cell $j$ is not an off-ramp cell, $d_{j,\mathrm{off}}^k = 0$). $q_{j,\mathrm{main}}^k$ represents the number of vehicles on the mainline of cell $j$ at step $k$, and $q_{j,\mathrm{main}}^{\max}$ is the maximum of this number, limited by the mainline space of cell $j$. Similarly, $q_{j,\mathrm{on}}^k$ and $q_{j,\mathrm{on}}^{\max}$ denote the current (at step $k$) and maximum numbers of vehicles on the on-ramp of cell $j$, respectively. $l_j$ is the length of cell $j$. $\Delta t$ (min) is the time interval between two consecutive time steps. $c_i^k$ is the metering rate of the on-ramp cell of the $i$th motorway section at step $k$. $\eta_j \in [0, 1]$ is the flow allocation parameter of cell $j$, and $\theta_j \in [0, 1]$ is the blending parameter of traffic flow from the on-ramp to the mainline of cell $j$. All arrival and departure rates are expressed in veh/min in this study.
For motorway section $i$ with $J$ cells, the number of vehicles on the mainline is $q_{i,\mathrm{main}}^k = \sum_{j=1}^{J} q_{j,\mathrm{main}}^k$, while the number of vehicles on the on-ramp of motorway section $i$ is $q_{i,\mathrm{on}}^k = q_{j,\mathrm{on}}^k$, where $j$ is the on-ramp cell of the section. In the same way, the maximum numbers of vehicles on the mainline and on-ramp of motorway section $i$ are $q_{i,\mathrm{main}}^{\max} = \sum_{j=1}^{J} q_{j,\mathrm{main}}^{\max}$ and $q_{i,\mathrm{on}}^{\max} = q_{j,\mathrm{on}}^{\max}$.
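Equations (5) and (6) can be sketched per cell as below. The function and argument names are hypothetical; the arguments with the `_prev` suffix refer to the neighbouring cell indexed $j-1$ in (5), and the caller is responsible for supplying them (including boundary values for the first cell). The demand term of the on-ramp departure uses the on-ramp queue plus arrivals, and clipping departure rates at zero is our own safeguard rather than part of the model as stated.

```python
def on_ramp_departure(q_on, a_on, q_main, q_main_max, eta, dt, metering_rate=None):
    """d_{j,on}: min of ramp demand, available mainline space and (if metered) c_i^k."""
    demand = (q_on + a_on * dt) / dt
    space = eta * (q_main_max - q_main) / dt
    candidates = [demand, space]
    if metering_rate is not None:          # metered on-ramp cell
        candidates.append(metering_rate)
    return max(0.0, min(candidates))


def mainline_departure(q_main, d_on, d_off, v, length, theta, lam1, lam2, lam3,
                       w_prev, q_max_prev, q_prev, d_on_prev, d_max, dt):
    """d_{j,main}: min of the three terms of Eq. (5)."""
    sending = lam1 * v / length * (q_main + theta * d_on * dt - d_off * dt)
    receiving = lam2 * w_prev / length * (q_max_prev - q_prev - theta * d_on_prev * dt)
    return max(0.0, min(sending, receiving, lam3 * d_max))


def conserve(q_main, q_on, a_main, a_on, d_main, d_on, d_off, dt):
    """Eq. (6): vehicle conservation on the mainline and on the on-ramp."""
    q_main_next = q_main + dt * (a_main + d_on - d_main - d_off)
    q_on_next = q_on + dt * (a_on - d_on)
    return q_main_next, q_on_next


def section_totals(q_main_cells, q_on_ramp_cell):
    """q_{i,main} sums over the J cells; q_{i,on} is the single on-ramp cell's queue."""
    return sum(q_main_cells), q_on_ramp_cell


# Illustrative numbers only (dt in minutes, rates in veh/min).
dt = 0.5
d_on = on_ramp_departure(q_on=5, a_on=10, q_main=20, q_main_max=30,
                         eta=0.5, dt=dt, metering_rate=8)
d_main = mainline_departure(q_main=20, d_on=d_on, d_off=2, v=1.5, length=0.5,
                            theta=0.5, lam1=1.0, lam2=1.0, lam3=1.0,
                            w_prev=0.5, q_max_prev=40, q_prev=30, d_on_prev=0,
                            d_max=60, dt=dt)
print(d_on, d_main, conserve(20, 5, 40, 10, d_main, d_on, 2, dt))
```

In this example the metering rate is the binding on-ramp constraint, and the limited space in the neighbouring cell is the binding mainline constraint.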
4.3. Estimation of Arrival and Departure Rates. The arrival rates of the boundary cells of each motorway section (such as $j+2$, $j+5$, and $j+8$) and of all the on-ramps, as well as the departure rates of the off-ramps, are inputs of the ACTM for each planning step between two real control steps. Considering the short duration of the planning process (10 steps), we assume that these rates remain stable during planning and estimate them directly from the recent flow data collected by the detectors. The method described by Wang [16] is used for this estimation; it simply averages the most recently observed data to obtain the predicted flow rates. In our model, we use the flow data collected over the last $N$ time steps ($N = 5$). These three rates can therefore be calculated by
$$\begin{aligned}
a_{i,\mathrm{main}}^{k,k+1} &= a_{j,\mathrm{main}}^{k,k+1} = \frac{1}{N} \sum_{n=0}^{N-1} a_{j,\mathrm{main}}^{k-n} && \text{if } j \text{ is the boundary cell}, \\
a_{i,\mathrm{on}}^{k,k+1} &= a_{j,\mathrm{on}}^{k,k+1} = \frac{1}{N} \sum_{n=0}^{N-1} a_{j,\mathrm{on}}^{k-n} && \text{if } j \text{ is the on-ramp cell}, \\
d_{j,\mathrm{off}}^{k,k+1} &= \frac{1}{N} \sum_{n=0}^{N-1} d_{j,\mathrm{off}}^{k-n} && \text{if } j \text{ is the off-ramp cell}
\end{aligned} \tag{7}$$
where $a_{j,\mathrm{main}}^{k,k+1}$ and $a_{j,\mathrm{on}}^{k,k+1}$ are the estimated arrival rates of the mainline and on-ramp of cell $j$ for the planning steps between real steps $k$ and $k+1$, and $d_{j,\mathrm{off}}^{k,k+1}$ is the estimated off-ramp departure rate of cell $j$. If cell $j$ is the boundary cell of motorway section $i$, the arrival or departure rate of this cell is also the arrival or departure rate of motorway section $i$.
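The estimator in (7) is a moving average over the last $N$ detector readings. A minimal sketch using a bounded `collections.deque` follows; the class name is hypothetical, and averaging over fewer than $N$ readings at the start of an episode is our assumption, since (7) does not cover that case.

```python
from collections import deque


class MovingAverageRate:
    """Estimate a flow rate as the mean of the last N observed rates (Eq. (7))."""

    def __init__(self, n=5):
        self.window = deque(maxlen=n)   # older readings drop out automatically

    def observe(self, rate):
        self.window.append(rate)

    def estimate(self):
        if not self.window:
            raise ValueError("no detector readings yet")
        # Before N readings exist we average what is available (assumption).
        return sum(self.window) / len(self.window)


on_ramp_arrivals = MovingAverageRate(n=5)
for rate in [10, 12, 14, 16, 18, 20]:          # veh/min, one reading per step
    on_ramp_arrivals.observe(rate)
print(on_ramp_arrivals.estimate())             # (12 + 14 + 16 + 18 + 20) / 5 = 16.0
```

The same class would serve for the boundary-cell mainline arrivals and the off-ramp departures; only the detector feeding it changes.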
5 Definition of RL Elements
In addition to the architecture and models defined in Section 3, three basic elements (environment state, control action, and reward function) must be specified to form an RL problem. This section details these three elements and the relevant algorithm.
5.1. Environment State. The environment state of a motorway section is composed of a mainline state and an on-ramp state. The same method as in [27, 29] is used here to obtain the state space. For the mainline of motorway section $i$, the number of vehicles ranges from 0 to the maximum number $q_{i,\mathrm{main}}^{\max}$, and this range is uniformly divided into $n_i$ intervals. Each interval represents a state of the mainline, so each mainline section can be represented by a state set $S_{i,\mathrm{main}}$ with $n_i$ states. Similarly, on-ramp traffic is represented by a state set $S_{i,\mathrm{on}}$ with $m_i$ states according to the maximum number of vehicles $q_{i,\mathrm{on}}^{\max}$. The values of $n_i$ and $m_i$ should be adjusted for different motorway sections according to the section length. In this way, if motorway section $i$ is the critical section, the external traffic environment is represented by

$$S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}}, \quad s_i^k \in S_i, \tag{8}$$

which contains $n_i \cdot m_i$ states. At each time step, a state $s_i^k$ is selected from $S_i$ as the environment state. If motorway section $i$ is a normal section, the state sets of its neighbouring agent are incorporated. The traffic state is then represented by

$$S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}} \times S_{i-1,\mathrm{main}} \times S_{i-1,\mathrm{on}}, \quad s_i^k \in S_i, \tag{9}$$

which contains $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ states.
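The discretisation behind (8) and (9) can be sketched as follows: each vehicle count is mapped to one of the equal-width intervals on $[0, q^{\max}]$, and the section state is the tuple of interval indices. The helper names, the clamping of out-of-range counts, and the assignment of $q = q^{\max}$ to the last interval are our assumptions.

```python
def interval_index(q, q_max, n_intervals):
    """Map a vehicle count in [0, q_max] to one of n_intervals equal-width bins."""
    if q_max <= 0 or n_intervals <= 0:
        raise ValueError("q_max and n_intervals must be positive")
    q = min(max(q, 0.0), q_max)                  # clamp to the valid range
    return min(int(q / q_max * n_intervals), n_intervals - 1)


def section_state(q_main, q_main_max, n, q_on, q_on_max, m):
    """Eq. (8): s_i in S_{i,main} x S_{i,on}, with n * m possible states."""
    return (interval_index(q_main, q_main_max, n),
            interval_index(q_on, q_on_max, m))


def normal_state(own_state, upstream_state):
    """Eq. (9): append the upstream section's (main, on) indices."""
    return own_state + upstream_state


s_up = section_state(80, 100, 10, 15, 20, 4)      # (8, 3)
s_own = section_state(55, 100, 10, 3, 20, 4)      # (5, 0)
print(normal_state(s_own, s_up))                  # (5, 0, 8, 3)
print(10 * 4 * 10 * 4)                            # 1600 states for n = 10, m = 4
```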
5.2. Control Action. In a ramp control problem, the aim of the control action is to regulate the number of vehicles entering the mainline in each control step. Similar to [29], we adopt flow control as the control action, which is represented by the action set $C = \{4, 6, 8, 10, 12, 14, 16, 18, 20\}$ containing 9 flow rates between the minimum (4 veh/min) and maximum (20 veh/min) values.
Exploitation and exploration are two basic behaviours of an RL agent. Exploitation means that the agent takes the control action that has yielded the most reward according to previous experience. Exploration, instead, means that the agent tries other actions, which currently appear less rewarding, to gain new experience. To balance these two behaviours, we use the $\epsilon$-greedy policy to select control actions [30]. Specifically, at each control step this policy takes a random action with probability $\epsilon$ and chooses the greedy action (the one with the maximum Q value) with probability $1-\epsilon$. The action selection probability can be formally expressed as
$$p(c_i^k \mid s_i^k) = \begin{cases} 1-\epsilon & \text{if } c_i^k = \arg\max_{c_i^k} Q^{k-1}(s_i^k, c_i^k), \\ \epsilon & \text{otherwise} \end{cases} \tag{10}$$
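A minimal $\epsilon$-greedy selector matching the description above is sketched below; the function name and the dictionary Q table are assumptions. As in most implementations, the random branch samples uniformly from all actions, so the greedy action can also be drawn there, giving it a total probability of $1-\epsilon+\epsilon/|C|$.

```python
import random


def epsilon_greedy(Q, s, actions, epsilon, rng=random):
    """Random action with probability epsilon, otherwise argmax_c Q(s, c)."""
    if rng.random() < epsilon:
        return rng.choice(actions)                          # explore
    return max(actions, key=lambda c: Q.get((s, c), 0.0))   # exploit


C = [4, 6, 8, 10, 12, 14, 16, 18, 20]
Q = {(0, 10): 1.5, (0, 12): 2.0}
print(epsilon_greedy(Q, 0, C, epsilon=0.0))   # 12: always greedy when epsilon = 0

rng = random.Random(1)
draws = [epsilon_greedy(Q, 0, C, epsilon=0.2, rng=rng) for _ in range(10_000)]
print(draws.count(12) / len(draws))           # close to 0.8 + 0.2 / 9
```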
5.3. Reward Function. The reward function calculates the immediate reward after a specific action is executed at each time step and thereby guides the agent towards its objective. Considering a common objective of traffic control systems (i.e., minimising total travel time), we define our reward to guide the agent to minimise the total time spent (TTS) through the learning process.

TTS is defined as the total time spent by vehicles in the network during a period of time. In our case, TTS is obtained from the following equation:
$$\mathrm{TTS} = \Delta t \cdot \sum_{k=0}^{K} \left( q_{i,\mathrm{main}}^k + q_{i,\mathrm{on}}^k \right) \tag{11}$$
In the above equation, $\Delta t$ is a fixed value; therefore, minimising TTS is equivalent to minimising the number of vehicles in the network, $\sum_{k=0}^{K} (q_{i,\mathrm{main}}^k + q_{i,\mathrm{on}}^k)$. To minimise this value, the reward function defined here is composed of two negative rewards, which act as penalties for vehicles on the mainline and on the on-ramp. The formal reward function at step $k$ is defined for two situations.
(1) Motorway Section $i$ Is the Critical Section. Consider

$$R_i^k(s_i^k, c_i^k) = \begin{cases} -\dfrac{q_{i,\mathrm{main}}^k + q_{i,\mathrm{on}}^k}{q_{i,\mathrm{main}}^{\max} + q_{i,\mathrm{on}}^{\max}} & \text{if } q_{i,\mathrm{main}}^k < q_{i,\mathrm{main}}^{\max},\ q_{i,\mathrm{on}}^k < q_{i,\mathrm{on}}^{\max}, \\ -1 & \text{otherwise} \end{cases} \tag{12}$$
where $R_i^k(s_i^k, c_i^k)$ is the immediate reward of agent $i$ in state $s_i^k$ when executing action $c_i^k$ at control step $k$. $q_{i,\mathrm{main}}^{\max}$ and $q_{i,\mathrm{on}}^{\max}$ are used to normalise the numbers of vehicles on the mainline and on-ramp, which guarantees that $R_i^k(s_i^k, c_i^k) \in [-1, 0]$.
(2) Motorway Section $i$ Is Not the Critical Section. Here, a new negative reward is introduced to maintain system equity, that is, to ensure that the on-ramp queues and related travel times at different on-ramps stay close to each other:
$$R_i^k(s_i^k, c_{i-1}^k, c_i^k) = \begin{cases} -\dfrac{q_{i,\mathrm{main}}^k + q_{i,\mathrm{on}}^k}{q_{i,\mathrm{main}}^{\max} + q_{i,\mathrm{on}}^{\max}} - \dfrac{\left| q_{i,\mathrm{on}}^k - q_{i-1,\mathrm{on}}^k \right|}{\max\left(q_{i,\mathrm{on}}^{\max}, q_{i-1,\mathrm{on}}^{\max}\right)} & \text{if } q_{i,\mathrm{main}}^k < q_{i,\mathrm{main}}^{\max},\ q_{i,\mathrm{on}}^k < q_{i,\mathrm{on}}^{\max}, \\ -2 & \text{otherwise} \end{cases} \tag{13}$$
For each agent $i$ and episode do
    $L \leftarrow \mathrm{CEIL}(\mathrm{IncidentDuration} / \Delta t)$
    IF $i$ is the critical section
        Initialise $R_i^0(s_i, c_i)$, $Q_i^0(s_i, c_i)$
    ELSE
        Initialise $R_i^0(s_i, c_{i-1}, c_i)$, $Q_i^0(s_i, c_{i-1}, c_i)$, $P_i^0(s_i, c_{i-1})$
    For each control step $k \in K$ do (Loop I)
        (i) get detected data from each cell $j$: $a_{j,\mathrm{main}}^k$, $d_{j,\mathrm{main}}^k$, $d_{j,\mathrm{off}}^k$, $a_{j,\mathrm{on}}^k$, $d_{j,\mathrm{on}}^k$
        (ii) get state $s_i^k$ through (8) and (9)
        (iii) get action $c_i^k$ by the $\epsilon$-greedy policy (10)
        (iv) get $q_{j,\mathrm{main}}^k$, $q_{j,\mathrm{on}}^k$ through (6) and do $q_{i,\mathrm{main}}^k \leftarrow \sum_{j=1}^{J} q_{j,\mathrm{main}}^k$, $q_{i,\mathrm{on}}^k \leftarrow q_{j,\mathrm{on}}^k$
        IF $i$ is the critical section
            update $R_i^k(s_i^k, c_i^k)$, $Q_i^k(s_i^k, c_i^k)$ through (2) and (12)
        ELSE
            update $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$, $Q_i^k(s_i^k, c_{i-1}^k, c_i^k)$, $p(c_{i-1}^k \mid s_i^k)$ through (4) and (13)
        IF $s_i^k = s_{\mathrm{initial}}$ and $k + 1 \ge L$
            end the algorithm
        ELSE
            get $a_{i,\mathrm{main}}^{k,k+1}$, $a_{i,\mathrm{on}}^{k,k+1}$, $d_{j,\mathrm{off}}^{k,k+1}$ by (7); do $l \leftarrow k$, $s_i^l \leftarrow s_i^k$, $q_{j,\mathrm{main}}^l \leftarrow q_{j,\mathrm{main}}^k$, $q_{j,\mathrm{on}}^l \leftarrow q_{j,\mathrm{on}}^k$; and start Loop II
            For each planning step $l \in L$ do (Loop II)
                (i) generate flow rates for each cell $j$: $d_{j,\mathrm{main}}^l$, $d_{j,\mathrm{on}}^l$ through (5)
                (ii) get the state $s_i^l$
                (iii) get $q_{j,\mathrm{main}}^l$, $q_{j,\mathrm{on}}^l$ and do $q_{i,\mathrm{main}}^l \leftarrow \sum_{j=1}^{J} q_{j,\mathrm{main}}^l$, $q_{i,\mathrm{on}}^l \leftarrow q_{j,\mathrm{on}}^l$
                (iv) get action $c_i^l$ by the $\epsilon$-greedy policy
                IF $i$ is the critical section
                    update $R_i^l(s_i^l, c_i^l)$, $Q_i^l(s_i^l, c_i^l)$
                ELSE
                    update $R_i^l(s_i^l, c_{i-1}^l, c_i^l)$, $Q_i^l(s_i^l, c_{i-1}^l, c_i^l)$, $p(c_{i-1}^k \mid s_i^k)$
                IF ($l = k + 9$) or ($s_i^l = s_{\mathrm{initial}}$ and $l + 1 \ge L$)
                    go back to Loop I
                ELSE
                    repeat Loop II
            EndFor
    EndFor
EndFor

Algorithm 1: Algorithm for Dyna-MARL.
Compared with (12), a new term $|q^{k}_{i,\text{on}} - q^{k}_{i-1,\text{on}}| / \max(q^{\max}_{i,\text{on}}, q^{\max}_{i-1,\text{on}})$ is added to (13), which penalises the difference between the on-ramp queues of motorway sections $i$ and $i-1$. Because two adjacent agents cooperate in this situation, $R^{k}_{i}(s^{k}_{i}, c^{k}_{i-1}, c^{k}_{i})$ depends on two control actions, $c^{k}_{i-1}$ and $c^{k}_{i}$. The function $\max(\cdot,\cdot)$ returns the larger of its two arguments and is used for normalisation.
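The queue-difference term can be computed directly from the on-ramp queues and capacities. The sketch below is a minimal Python illustration of that term only; the function name is hypothetical, and how (13) weights and combines it with its other terms is not reproduced here. The example capacities are the on-ramp values of sections 1 and 2 given in Section 6.1, while the queue lengths are made up.

```python
def queue_difference_penalty(q_on_i: float, q_on_prev: float,
                             q_max_on_i: float, q_max_on_prev: float) -> float:
    """Normalised on-ramp queue difference between adjacent sections i and i-1.

    Computes |q_on_i - q_on_prev| / max(q_max_on_i, q_max_on_prev), the extra
    term of (13). It is 0 when both queues are equal and grows as they diverge;
    dividing by the larger capacity keeps it in [0, 1] as long as each queue
    stays within its own capacity.
    """
    return abs(q_on_i - q_on_prev) / max(q_max_on_i, q_max_on_prev)


# Hypothetical queues on on-ramp 2 (section i) and on-ramp 1 (section i - 1),
# with the on-ramp capacities from Section 6.1 (90 and 108 vehicles).
print(queue_difference_penalty(60, 6, 90, 108))  # 54 / 108 = 0.5
```

Normalising by the larger of the two capacities keeps the term comparable across section pairs whose on-ramps have different storage space.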
5.4. Description of the Algorithm. Based on the Dyna-$Q$ architecture and the RL elements defined in the previous subsections, an algorithm named Dyna-MARL is developed and described in this subsection. Dyna-MARL contains two main loops, corresponding to the direct RL and planning processes shown in Figure 1. Between two consecutive real control steps in Loop I, 10 planning steps are run in Loop II. The pseudocode of Dyna-MARL is given in Algorithm 1.
An episode in Dyna-MARL represents a control cycle, which starts when the incident occurs and terminates when the traffic state returns to the initial state $s_{\text{initial}}$, that is, the traffic state before the incident occurred. The incident duration is assumed to be known in advance.
6 Case Study and Results
One of the metered motorway segments of the M6 in the UK (southbound direction) is chosen for the case study. This segment lies between junction 21A (J21A) and junction 25 (J25) and is approximately 12.4 km long (see Figure 4). Taking the noncontrolled (NC) situation as the baseline, we designed a series of experiments to compare the proposed Dyna-MARL algorithm with Isolated RL ($Q$-learning without coordination). The experiments and relevant results are described as follows.
6.1. Partitions of the Test Segment. The test motorway segment, with a three-lane mainline, three metered on-ramps, and five off-ramps, is simulated in AIMSUN [39], a microscopic traffic simulation package. According to the detector locations and road layout, the whole segment is divided into three sections, each containing one metered on-ramp. Motorway section 1 is divided into 4 cells, and motorway sections 2 and 3 are each divided into 3 cells. The partitions of each section are shown in Figure 5. According to the section lengths, the maximum numbers of vehicles in each mainline section and on-ramp are $q^{\max}_{1,\text{main}} = 1860$, $q^{\max}_{2,\text{main}} = 2880$, $q^{\max}_{3,\text{main}} = 2880$, $q^{\max}_{1,\text{on}} = 108$, $q^{\max}_{2,\text{on}} = 90$, and $q^{\max}_{3,\text{on}} = 120$.

[Figure 4: Test motorway segment of M6. Junctions J21A, J22, J23, J24, and J25; mainline origin O and destination D; on-ramps O1 (from A579), O2 (from A580), and O3 (from A49); off-ramps D1 and D2 (to M62), D3 (to A579), D4 (to A580), and D5 (to A58).]

[Figure 5: Partitions of test segment. Sections 1, 2, and 3 with section and cell boundaries, on-ramps O1 to O3, off-ramps D1 to D5, incident locations a, b, and c, road section lengths (km), and the flow direction from O to D.]
6.2. Real Data Source. Real detector data collected from 17 loop detectors in the motorway segment (on both the mainline and the on- and off-ramps) are used for the case study; these data can be extracted from the Traffic Information System (HATRIS) [40]. The traffic count data are averaged from April 2012 to March 2013 at 15-minute intervals. Only working-day data (Monday to Friday) are used because of the dramatic reduction in traffic load at weekends. Some of the detector data collected from the mainline and the three on-ramps are presented in Figure 6, which shows two peak periods in daily traffic operation: the AM peak period (around 07:00:00 to 09:00:00) and the PM peak period (around 16:00:00 to 18:00:00).
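The averaging step can be sketched as follows. This is a minimal Python illustration, assuming the detector counts are available as (timestamp, count) pairs at 15-minute resolution; the record format, function name, and counts are hypothetical and do not describe the HATRIS export format.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def average_weekday_profile(records):
    """Average 15-minute detector counts by time of day over working days.

    records: iterable of (datetime, count) pairs, one per 15-minute interval.
    Saturday and Sunday records are discarded, mirroring the use of
    Monday-to-Friday data only. Returns {time_of_day: mean_count}, sorted by time.
    """
    by_slot = defaultdict(list)
    for timestamp, count in records:
        if timestamp.weekday() < 5:  # Monday == 0 ... Friday == 4
            by_slot[timestamp.time()].append(count)
    return {slot: mean(counts) for slot, counts in sorted(by_slot.items())}


# Two working days and one Saturday in April 2012 (hypothetical counts).
records = [
    (datetime(2012, 4, 2, 7, 0), 100),  # Monday
    (datetime(2012, 4, 3, 7, 0), 200),  # Tuesday
    (datetime(2012, 4, 7, 7, 0), 50),   # Saturday, ignored
]
profile = average_weekday_profile(records)  # 07:00 slot: mean of Monday and Tuesday
print(profile)
```

Grouping by `time()` rather than by full timestamp is what turns a year of records into a single daily profile like the one in Figure 6.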
At the test site, ramp metering operates only during peak hours. Moreover, it is valuable to test the performance of the proposed algorithm under high demand: if it works under a heavy traffic load, it should also be useful in common situations. Therefore, the AM peak period with heavy traffic load is considered for the case study. Specifically, we use the averaged traffic data for the AM peak period collected from TRADS to estimate the OD (origin-destination) matrix for the simulation. A model proposed in [41] is adopted by AIMSUN to perform the estimation, with the number of iterations set to 1000 to reach convergence. Table 1 shows the OD matrix estimated from real traffic data.

[Figure 6: Real averaged traffic data. 15-minute traffic counts against time of day over a full day for the mainline of sections 1, 2, and 3 and for on-ramps 1, 2, and 3.]

Table 1: OD matrix estimated.
Origins/destinations      D     D5    D4    D3     D2     D1    Totals
O                      2089    375   686   728   1169    771     5818
O3                      875     65   212   193    117     46     1507
O2                      886      0     0    61    315    216     1477
O1                      824      0     0     0    292    226     1343
Totals                 4675    440   898   981   1893   1258    10146

Table 2: Parameters for ACTM.
Parameter                   Value
$d^{\max}_{\text{main}}$    6300 veh/h
$v$                         107 km/h
$w$                         116 km/h
$\theta$                    0.5
$\eta$                      0.16
$\lambda_1$                 0.55
$\lambda_2$                 0.9
$\lambda_3$                 0.6
6.3. Incident Scenarios. Because real incident data are difficult to capture, we simulate incident scenarios in AIMSUN. To make every ramp meter work during the incident, the incident is located near the most downstream motorway section, that is, motorway section 1. Three incident scenarios, A, B, and C, are designed, corresponding to the three incident locations a, b, and c (illustrated in Figure 5), respectively.
The simulation lasts one and a half hours, from 07:00:00 to 08:30:00, during the AM peak period. After 30 minutes of normal operation (warm-up), the incident is triggered at 07:30:00 and lasts for 30 minutes. In the preliminary experiments designed in this paper, an incident blocking one lane is considered; the parameters introduced here can also be adjusted for multiple-lane blockages. The incident extent is 50 m and is assumed to be constant during the incident.
Learning-related parameters are set to typical values [30]: $\alpha = 0.2$, $\gamma = 0.8$, and $\varepsilon = 0.1$. Other parameters related to the ACTM are calibrated and summarised in Table 2. All cells share the same $\theta$ and $\eta$.
6.4. Results. Dyna-MARL, Isolated RL, and NC are compared in three respects: density evolution, general indicators, and system equity. The experimental results are described as follows.
(1) Density Evolution. Figure 7 shows that four dense areas exist during traffic operation. Three of them, near the on-ramp entrances (at motorway lengths of around 0.5 km, 5 km, and 10 km), are caused by heavy traffic loads from the on-ramps. The dense area close to the end of the segment is caused by the incident.
In scenario A, the incident location is close to on-ramp 1 (O1). Without control, this incident leads to severe congestion, which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). In this scenario, Isolated RL cannot alleviate the incident-induced congestion effectively (see Figure 7(b)). At the beginning of congestion formation, without coordination, only the nearest ramp controller reacts to the congestion. Because the storage space of a single on-ramp is limited, one ramp controller is insufficient to dissolve this congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of the three on-ramps to deal with the incident-induced congestion. In this way, mainline congestion is restricted to a smaller area and does not propagate to motorway section 2 (see Figure 7(c)).
In scenarios B and C, the incidents are near the end of the segment and far from on-ramp 1. Because they do not block on-ramp 1, these incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL work well in easing mainline congestion. As shown in Figures 7(e) to 7(i), compared with the NC situation, both Isolated RL and Dyna-MARL restrict the congestion to a small range near the on-ramp entrances.
(2) General Indicators. In this comparison, general indicators, namely total travel time (which should be reduced), total throughput (which should be increased), and total CO$_2$ emission (which should be reduced), are used to show how the proposed system can benefit road users. These indicators are widely used in the transport community to evaluate newly developed traffic control systems.
As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b). Dyna-MARL improves total throughput by up to 2.3% (see Figure 8(d)) and outperforms Isolated RL in all three scenarios; in scenario B, Isolated RL even fails to improve total throughput. For total CO$_2$ emission (Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with reductions of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL performs much better than Isolated RL.
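The percentages in Figure 8(d) are changes relative to the NC baseline. Assuming the usual relative-change definition (the exact formula is not spelled out here), they can be computed as in the minimal sketch below; the function name and the input totals are hypothetical and are not read from Figure 8.

```python
def change_from_nc(nc_value: float, controlled_value: float,
                   higher_is_better: bool = False) -> float:
    """Percentage improvement of a controlled case over the NC baseline.

    For indicators that should fall (total travel time, total CO2 emission)
    the improvement is (NC - controlled) / NC; for indicators that should rise
    (total throughput) it is (controlled - NC) / NC. Positive means better.
    """
    if higher_is_better:
        return 100.0 * (controlled_value - nc_value) / nc_value
    return 100.0 * (nc_value - controlled_value) / nc_value


# Hypothetical totals: travel time in hours, throughput in vehicles.
print(round(change_from_nc(1500.0, 1317.0), 1))                         # 12.2
print(round(change_from_nc(9000.0, 9207.0, higher_is_better=True), 1))  # 2.3
```

Flipping the sign for throughput lets all three indicators share one "positive is better" axis, which is how a single bar chart can show reductions and improvements together.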
The above comparison shows that Dyna-MARL outperforms Isolated RL in almost all scenarios and indicators.
[Figure 7 panels (a) to (i): motorway density (veh/km/lane) plotted against motorway length (km) and absolute time (min).]
Figure 7: Density profiles for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.

(3) System Equity. Although the general indicators presented in comparison (2) are effective for assessing the performance of different systems, they cannot capture system equity, which is also an important aspect of system performance. In this paper, we consider only spatial equity, defined as the equity of user delays across different on-ramps [42]. We assume that road users from all three on-ramps are equally important. If users from different on-ramps experience similar travel times, the control system is defined as equitable. The queue-difference term in (13) reflects this notion: a large queue difference leads to a highly inequitable system. In [43], the variance of travel time on different on-ramps is used as an indicator of system equity. Similarly to [43], and for the sake of comparison, the standard deviation is used in our case. This indicator is defined as

$$\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n} \left[\bar{t}^{k} - t^{k}_{i}\right]^{2}}{n}} \qquad (14)$$

where $\mathrm{SD}(k)$ is the standard deviation of travel time across on-ramps at time step $k$, $t^{k}_{i}$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^{k}$ is the average total travel time of the $n$ on-ramps at step $k$.
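Equation (14) is a population standard deviation: it divides by $n$ rather than $n-1$. A minimal Python sketch, with a hypothetical function name and made-up travel times, is:

```python
from math import sqrt


def travel_time_sd(ramp_travel_times):
    """SD(k) from (14): population standard deviation of the estimated total
    travel times t_i^k of the n on-ramps at a single time step k."""
    n = len(ramp_travel_times)
    t_bar = sum(ramp_travel_times) / n  # averaged total travel time
    return sqrt(sum((t_bar - t) ** 2 for t in ramp_travel_times) / n)


# Three on-ramps with hypothetical total travel times (hours) at one step.
print(travel_time_sd([2.0, 2.0, 2.0]))  # 0.0: perfectly equitable
print(travel_time_sd([1.0, 3.0, 2.0]))  # larger spread, less equitable
```

The result matches Python's `statistics.pstdev`; evaluating it at every step $k$ gives a time series like those plotted in Figure 9.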
The results of the system equity comparison are shown in Figure 9. In the NC situation, good equity is maintained in scenarios B and C because entering vehicles are not restricted (see Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by congestion in scenario A, a long queue forms, leading to imbalance and resulting inequity for users on different on-ramps (see Figure 9(a)). Among the controlled cases, Isolated RL performs poorly in all scenarios, because the ramp controller nearest the congestion restricts entering traffic much more heavily than the other controllers. Owing to its coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).
[Figure 8 panels: (a) total travel time (h), (b) total throughput (veh), and (c) total CO$_2$ emission (kg, ×10^4) for NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C; (d) reduction from NC (%) in total travel time, total throughput, and total CO$_2$ emission.]
Figure 8: Comparison of general measures for different scenarios.
7 Conclusions and Future Work
A Dyna-$Q$ based multiagent reinforcement learning method for motorway ramp control, referred to as Dyna-MARL, has been developed in this paper. Dyna-MARL is compared with Isolated RL ($Q$-learning without coordination) and the noncontrolled situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.
Through a series of simulation-based experiments, we can conclude the following: (1) Isolated RL can improve motorway performance by increasing total throughput and reducing total travel time and CO$_2$ emission, but this improvement comes at the expense of poor system equity across on-ramps; (2) with a suitable coordination strategy, Dyna-MARL achieves much higher system equity; (3) in addition to system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios and indicators, which indicates that Dyna-MARL can deal with network-wide problems effectively.
Although the simulation tests show positive results for Dyna-MARL, the current work considers a simplified incident scenario with a fixed duration. In practice, incident duration is highly variable and depends on a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Therefore, incident duration should be treated as an uncertainty, which will be investigated in our future work.
[Figure 9 panels (a) to (c): standard deviation of on-ramp travel time (h) against time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL.]
Figure 9: Standard deviation for different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This paper is supported by the China Scholarship Council and the University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.
References
[1] M. Papageorgiou and A. Kotsialos, "Freeway ramp metering: an overview," IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.
[2] A. Skabardonis, P. Varaiya, and K. F. Petty, "Measuring recurrent and nonrecurrent traffic congestion," Transportation Research Record, vol. 1856, pp. 118–124, 2003.
[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, "ALINEA: a local feedback control law for on-ramp metering," Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.
[4] E. Smaragdis and M. Papageorgiou, "Series of new local ramp metering strategies," Transportation Research Record, vol. 1856, pp. 74–86, 2003.
[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, "Real-time metering algorithm for centralized control," Transportation Research Record, vol. 1232, pp. 17–26, 1989.
[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, "System wide adaptive ramp metering (SWARM)," in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.
[7] R. Lau, Ramp Metering by Zone: The Minnesota Algorithm, Minnesota Department of Transportation, 1997.
[8] H. M. Zhang and W. W. Recker, "On optimal freeway ramp control policies for congested traffic corridors," Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.
[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, "Optimal coordinated ramp metering with advanced motorway optimal control," Transportation Research Record, no. 1748, pp. 55–65, 2001.
[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, "Model predictive control for optimal coordination of ramp metering and variable speed limits," Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.
[11] G. Gomes and R. Horowitz, "Optimal freeway ramp metering using the asymmetric cell transmission model," Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.
[12] A. H. F. Chow and Y. Li, "Robust optimization of dynamic motorway traffic via ramp metering," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.
[13] R. W. Hall, "Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?," Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.
[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, "Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models," Transportation Research Record, vol. 2047, pp. 57–65, 2008.
[15] T. L. Greenlee and H. J. Payne, "Freeway ramp metering strategies for responding to incidents," in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.
[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.
[17] J.-B. Sheu, "Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control," Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.
[18] J.-B. Sheu and M.-S. Chang, "Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.
[19] C. Jacob and B. Abdulhai, "Machine learning for multi-jurisdictional optimal traffic corridor control," Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.
[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, "Motorway ramp-metering control with queuing consideration using Q-learning," in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC '11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.
[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Application of reinforcement learning with continuous state space to ramp metering in real-world conditions," in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC '12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.
[22] C. Jacob and B. Abdulhai, "Automated adaptive traffic corridor control using reinforcement learning: approach and case studies," Transportation Research Record, vol. 1959, pp. 1–8, 2006.
[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada," Transportation Research Record, vol. 2396, pp. 10–18, 2013.
[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, "Reinforcement learning ramp metering without complete information," Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.
[25] A. Fares and W. Gomaa, "Multi-agent reinforcement learning control for ramp metering," in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.
[26] K. Veljanovska, K. M. Bombol, and T. Maher, "Reinforcement learning technique in multiple motorway access control strategy design," PROMET: Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.
[27] C. Lu, H. Chen, and S. Grant-Muller, "An indirect reinforcement learning approach for ramp control under incident-induced congestion," in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC '13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.
[28] L. Busoniu, R. Babuska, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.
[29] C. Lu, H. Chen, and S. Grant-Muller, "Indirect reinforcement learning for incident-responsive ramp control," Procedia: Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.
[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[31] C. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[32] J. R. Kok and N. Vlassis, "Collaborative multiagent reinforcement learning by payoff propagation," Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.
[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, "Coordinated reinforcement learning," in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.
[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, "Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.
[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: a survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[36] C. F. Daganzo, "The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory," Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.
[37] J. Haddad, M. Ramezani, and N. Geroliminis, "Cooperative traffic control of a mixed network with two urban regions and a freeway," Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.
[38] H. Mongeot and J.-B. Lesort, "Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory," Transportation Research Record, vol. 1710, pp. 58–68, 2000.
[39] Transport Simulation Systems, Aimsun User's Manual 6.1, TTS, Barcelona, Spain, 2010.
[40] Highways England, "HATRIS Homepage," 2013, https://www.hatris.co.uk.
[41] E. Cascetta, "Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator," Transportation Research B, vol. 18, no. 4-5, pp. 289–299, 1984.
[42] L. Zhang and D. Levinson, "Balancing efficiency and equity of ramp meters," Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.
[43] A. Kotsialos and M. Papageorgiou, "Efficiency and equity properties of freeway network-wide ramp metering with AMOC," Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.
the updating equation given below, $Q$-learning can maximise the $Q$ value for each state-action pair:
$$Q^{k+1}(s^{k}, c^{k}) = Q^{k}(s^{k}, c^{k}) + \alpha \left[ R^{k}(s^{k}, c^{k}) + \gamma \max_{c^{k+1}} Q^{k}(s^{k+1}, c^{k+1}) - Q^{k}(s^{k}, c^{k}) \right] \qquad (2)$$
where $Q^{k+1}(s^{k}, c^{k})$ and $Q^{k}(s^{k}, c^{k})$ are the $Q$ values for the state-action pair $(s^{k}, c^{k})$ at the $(k+1)$th and $k$th steps, respectively, and $Q^{k}(s^{k+1}, c^{k+1})$ is the $Q$ value for the state-action pair $(s^{k+1}, c^{k+1})$ at the $k$th step. $\alpha \in [0, 1]$ is the learning rate; $\gamma$ and $\alpha$ can be tuned for different problems.
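Update (2) for a tabular $Q$ function takes only a few lines of Python. This is a minimal sketch; the dictionary-based table, the function name, and the action labels are illustrative choices, and the defaults $\alpha = 0.2$ and $\gamma = 0.8$ are the values used in Section 6.3.

```python
from collections import defaultdict


def q_learning_update(Q, s, c, reward, s_next, actions, alpha=0.2, gamma=0.8):
    """Apply update (2) to the table Q in place and return the new Q(s, c).

    Q maps (state, action) pairs to values; unseen pairs default to 0.
    """
    best_next = max(Q[(s_next, c_next)] for c_next in actions)  # max over c^{k+1}
    td_error = reward + gamma * best_next - Q[(s, c)]
    Q[(s, c)] += alpha * td_error
    return Q[(s, c)]


Q = defaultdict(float)
actions = ["low_rate", "high_rate"]  # hypothetical metering actions
print(q_learning_update(Q, "s0", "low_rate", 1.0, "s1", actions))  # 0.2
```

The bracketed term in (2) is the temporal-difference error; $\alpha$ controls how far each update moves the stored value toward the new target.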
2.2. Multiagent Scenarios. In multiagent scenarios, an MDP for the single-agent case can be extended to a stochastic game (SG), or Markov game, in which a group of agents try to obtain equilibrium solutions through coordination or competition [28].
In the absence of competition, all agents involved in a game share the common goal of maximising the global $Q$ value, which forms a coordinated MARL problem. In this case, policy optimisation is determined by the actions executed by all agents.
To solve a coordinated MARL problem, the update equation (2) for $Q$-learning can easily be extended to represent the global $Q$ value update [28]:
$$\begin{aligned} Q^{k+1}(s^{k}, c^{k}_{1}, \ldots, c^{k}_{n}) = {} & Q^{k}(s^{k}, c^{k}_{1}, \ldots, c^{k}_{n}) \\ & + \alpha \Big[ R^{k}(s^{k}, c^{k}_{1}, \ldots, c^{k}_{n}) + \gamma \max_{c^{k+1}_{1}, \ldots, c^{k+1}_{n}} Q^{k}(s^{k+1}, c^{k+1}_{1}, \ldots, c^{k+1}_{n}) - Q^{k}(s^{k}, c^{k}_{1}, \ldots, c^{k}_{n}) \Big] \end{aligned} \qquad (3)$$
The only difference from (2) is that $Q$ and $R$ in (3) relate to $n$ actions $c_{1}, \ldots, c_{n}$ executed by $n$ agents rather than to a single action $c$.
2.3. Solutions for Coordinated MARL. It can be seen from (3) that, as the number of agents grows, the number of action combinations and the resulting computational complexity increase exponentially, which may make the problem unsolvable within a required time limit [28]. Therefore, a commonly used method is to decompose the global $Q$ value into several local $Q$ values, each of which can be maximised by a few relevant agents rather than by all agents [32]. Based on this distributed method, several strategies have been proposed. In [28], these strategies fall into three categories: coordination-based, coordination-free, and indirect coordination strategies.
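The exponential growth follows from the joint action space: with $|C|$ actions per agent, a global $Q$ table as in (3) must cover $|C|^{n}$ joint actions per state. The sketch below enumerates them for a hypothetical set of five discrete metering levels; the level values and the function name are assumptions, not the action set used in this paper.

```python
from itertools import product


def joint_actions(actions, n_agents):
    """All joint actions (c_1, ..., c_n) that the global Q value in (3) ranges over."""
    return list(product(actions, repeat=n_agents))


levels = [0, 1, 2, 3, 4]  # hypothetical discrete metering-rate levels
for n in (1, 2, 3, 6):
    print(n, len(joint_actions(levels, n)))
# 1 5
# 2 25
# 3 125
# 6 15625
```

By contrast, a local $Q$ value that depends on the actions of only two adjacent agents, such as $Q_i(s_i, c_{i-1}, c_i)$ in Algorithm 1, covers $5^2 = 25$ action pairs per state with these levels, however many agents the motorway has.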
Coordination-based strategies require local $Q$ values to be updated according to the actions executed by all relevant agents (named joint actions) at each time step [28]. The decision-making process of each agent is based on information received from all other related agents through sufficient communication, which complicates the problem. On the other hand, coordination-free (or independent) strategies, such as the distributed $Q$-learning algorithm, let each agent update the corresponding local $Q$ values based on its own actions [33]. Therefore, each agent makes its decisions independently without increasing computational complexity. However, this computational efficiency comes at the expense of convergence, which is not guaranteed [32]. Indirect coordination strategies try to find a balance between the above two methods. With indirect strategies, each agent can maintain models of its cooperative partners and update local $Q$ values without knowing all the information of the other agents at each step [28]. Given high-quality models, this method can reduce problem complexity and guarantee convergence with limited coordination.
3 Dyna-$Q$ Based Indirect Coordination Strategy
Because of the benefits introduced in the above section, the indirect coordination strategy has been applied in [34] to solve urban traffic control problems. In that work, each agent maintains a model for estimating the action selection probabilities of its neighbours and uses this information to optimise its control strategy. In this paper, we extend this method to motorway systems by applying the Dyna-$Q$ architecture.
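One simple way for an agent to maintain such a model is to count how often its neighbour chose each action in each local state and to use the relative frequencies as $p(c_{i-1} \mid s_i)$; the expected value of the agent's own action then averages over the neighbour's action. The sketch below assumes this frequency-count estimator with a uniform guess for unseen states; it is an illustration, not necessarily how [34] or Dyna-MARL estimates these probabilities, and all names are hypothetical.

```python
from collections import defaultdict


class NeighbourModel:
    """Frequency estimate of p(c_nb | s): the neighbour's action given own state s."""

    def __init__(self, neighbour_actions):
        self.actions = list(neighbour_actions)
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, s, c_nb):
        self.counts[s][c_nb] += 1

    def prob(self, s, c_nb):
        total = sum(self.counts[s].values())
        if total == 0:  # no observations in this state yet: uniform guess
            return 1.0 / len(self.actions)
        return self.counts[s][c_nb] / total


def expected_q(Q, model, s, c_own):
    """Sum over c_nb of p(c_nb | s) * Q(s, c_nb, c_own)."""
    return sum(model.prob(s, c_nb) * Q[(s, c_nb, c_own)] for c_nb in model.actions)


model = NeighbourModel(["low", "high"])
for c_nb in ["low", "low", "low", "high"]:  # observed neighbour actions in state "s"
    model.observe("s", c_nb)
Q = {("s", "low", "a"): 4.0, ("s", "high", "a"): 8.0}
print(model.prob("s", "low"))          # 0.75
print(expected_q(Q, model, "s", "a"))  # 0.75 * 4.0 + 0.25 * 8.0 = 5.0
```

Because the agent only needs its own state and a running count, it can weigh its actions by what the neighbour is likely to do without querying the neighbour's full information at every step.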
Under the Dyna-$Q$ architecture, a modified macroscopic flow model named the asymmetric cell transmission model (ACTM) and the $Q$-learning algorithm are combined to deal with coordinated MARL problems. This section introduces how Dyna-$Q$ is applied.
3.1. Dyna-$Q$ Architecture. The Dyna-$Q$ architecture is an extension of standard $Q$-learning that integrates planning, acting, and learning [30]. Unlike $Q$-learning, which learns from real experience without a model, Dyna-$Q$ learns a model and uses it to guide the agent [35]. After real experience is captured, two loops run in the Dyna-$Q$ architecture to learn optimal policies that obtain the maximum $Q$ value (see Figure 1).
In loop I, direct RL is the standard Q-learning process that interacts with the real external environment. Loop II contains two main tasks: (1) model learning, which improves the model accuracy by obtaining new knowledge from real experience, and (2) planning, which is the same process as direct RL except that it uses experience generated by the model. Acting is the action execution process.
Using a model, the agent can predict the reactions of its external environment and of other agents before executing a specific action, which gives the agent an opportunity to update its Q value before receiving the real feedback. Simultaneously, direct RL runs to update the Q value through the real interaction. Therefore, the optimal policy is learned through both real experience and predictions. By using this strategy, Dyna-Q can learn faster than Q-learning in many situations [30].

Figure 1: Dyna-Q architecture (value/policy, experience, and model, linked by acting, direct RL, model learning, and planning).

Figure 2: System architecture. Three agents (i, i + 1, i + 2) each control one motorway section made up of cells j to j + 8. Taking agent i + 1 as an example, its Q values are $Q^{k+1}_{i+1}(s^k_{i+1}, c^k_i, c^k_{i+1})$; its experience comprises (i) the real traffic arrival and departure rates and (ii) information from agent i ($c^k_i$, $q^k_{i,\text{on}}$, $d^k_{j,\text{main}}$, $d^k_{j,\text{off}}$); and its model comprises (i) an ACTM with estimated traffic arrival and departure rates of sections i and i + 1 and (ii) the action probability $p(c^k_i \mid s^k_{i+1})$.
Although a model is maintained in the Dyna-Q architecture, the whole system differs from model-based control methods such as model predictive control (MPC). The model in the Dyna-Q architecture is a complementary component used to speed up the learning process and simplify the coordination of agents; the optimal control actions are learnt from both real and simulated experience. Without models, the Dyna-Q architecture is equivalent to Q-learning and can still work as a model-free system. MPC, on the other hand, depends on the model, which means it cannot work without one. Therefore, Dyna-Q can be considered a combination of model-free and model-based methods [27].
3.2 System Architecture. Each agent in the motorway control system is designed on the basis of the Dyna-Q architecture and controls one prespecified motorway section.
A simplified motorway segment is shown in Figure 2 for analysis. This segment contains three motorway sections (i, i + 1, i + 2), with detectors located at the boundaries. Each motorway section is divided into a number of cells (j, j + 1, ..., j + 8) according to its layout and geometric features. Generally, three kinds of cells exist on the motorway: on-ramp cells linked with on-ramps (j + 2, j + 5, j + 8), off-ramp cells linked with off-ramps (j, j + 3, j + 6), and normal cells (j + 1, j + 4, j + 7). In this paper, each motorway section can have at most one on-ramp cell.
The typical Dyna-Q architecture presented in Figure 2 is detailed here for each agent, taking agent i + 1 as an example. Its experience consists of the traffic arrival and departure rates observed from the detectors of motorway section i + 1, as well as the information received from agent i, which is applied to improve the models. In the model component, two models are maintained. An asymmetric cell transmission model (ACTM) with estimated traffic arrival and departure rates is used to simulate the traffic flow dynamics in the relevant motorway sections, and a probability model of the action selection of agent i at the current state is updated for the subsequent planning process.

Figure 3: Fundamental diagram during the incident. (a) Incident extent within the critical section. (b) Flow-density diagram showing the normal and incident values of maximum departure flow ($d^{\max}_{j,\text{main}}$, $d^{\text{In},\max}_{j,\text{main}}$), congestion wave speed ($w_j$, $w^{\text{In}}_j$), critical density ($\rho_{j,C}$, $\rho^{\text{In}}_{j,C}$), and jam density ($\rho_{j,J}$, $\rho^{\text{In}}_{j,J}$).
To reduce the complexity of MARL, as in many real applications, some conventions are used to restrict the action selection of an agent [28]. Specifically, in our design each agent communicates only with its spatial neighbours. For instance, agent i + 1 receives the control action and traffic information from agent i and sends its own information to agent i + 2. For the case shown in Figure 2, we assume that motorway section i is the critical section where an incident occurs. In this situation, agent i plays a more important role than the other agents in dealing with the incident. Agent i can be considered the chief controller, which makes decisions according to its own knowledge of the traffic and incident situations. The other agents should regulate their control policies based on the reaction of agent i.
Therefore, two Q values are defined for the two kinds of agents. If motorway section i is the critical section, the Q value of agent i is related only to its own state and action space and is updated by (2).
If motorway section i is a normal section without incidents, the Q value of agent i is calculated by
$$
\begin{aligned}
Q^{k+1}_i\left(s^k_i, c^k_{i-1}, c^k_i\right) &= Q^k_i\left(s^k_i, c^k_{i-1}, c^k_i\right) + \alpha \Bigg[ R^k_i\left(s^k_i, c^k_{i-1}, c^k_i\right) \\
&\qquad + \gamma \max_{c^{k+1}_i} \sum_{c^{k+1}_{i-1}} p\left(c^{k+1}_{i-1} \mid s^{k+1}_i\right) Q^k_i\left(s^{k+1}_i, c^{k+1}_{i-1}, c^{k+1}_i\right) - Q^k_i\left(s^k_i, c^k_{i-1}, c^k_i\right) \Bigg],\\
p\left(c^{k+1}_{i-1} \mid s^{k+1}_i\right) &= \frac{\operatorname{count}\left(s^{k+1}_i, c^{k+1}_{i-1}\right)}{\sum_{c_{i-1} \in C_{i-1}} \operatorname{count}\left(s^{k+1}_i, c_{i-1}\right)},
\end{aligned}
\tag{4}
$$
where $R^k_i(s^k_i, c^k_{i-1}, c^k_i)$ is the immediate reward obtained by agent $i$ at time step $k$ when actions $c^k_{i-1}$ and $c^k_i$ are executed by agents $i-1$ and $i$, respectively. Similarly, $Q^{k+1}_i(s^k_i, c^k_{i-1}, c^k_i)$ and $Q^k_i(s^k_i, c^k_{i-1}, c^k_i)$ are the Q values for agent $i$ at steps $k+1$ and $k$, respectively. $C_{i-1}$ is the action set of agent $i-1$, and $\operatorname{count}(s^{k+1}_i, c^{k+1}_{i-1})$ returns the number of visits to the state-action pair $(s^{k+1}_i, c^{k+1}_{i-1})$. Thus, $p(c^{k+1}_{i-1} \mid s^{k+1}_i)$ is the probability of agent $i-1$ selecting action $c^{k+1}_{i-1}$ at state $s^{k+1}_i$. The models and related symbols shown in Figure 2 are specified in the following section.
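The update in (4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Q`, `count`, `observe_neighbour`, `neighbour_prob`, and `coordinated_q_update` are hypothetical, states are arbitrary hashable labels, the neighbour's action set $C_{i-1}$ is assumed to equal the action set $C$ of Section 5.2, and a uniform probability is assumed for a state that has not yet been visited (where the ratio in (4) is undefined). The defaults α = 0.2 and γ = 0.8 are the values used in Section 6.3.

```python
from collections import defaultdict

# Hypothetical tabular storage for a normal (non-critical) agent i:
#   Q[(s, c_prev, c)]  -> Q value for state s, neighbour action c_prev, own action c
#   count[(s, c_prev)] -> number of visits of (state, neighbour action)
Q = defaultdict(float)
count = defaultdict(int)

ACTIONS = [4, 6, 8, 10, 12, 14, 16, 18, 20]  # action set C (veh/min), assumed shared by i and i-1


def observe_neighbour(s, c_prev):
    """Record that agent i-1 chose c_prev while agent i was in state s."""
    count[(s, c_prev)] += 1


def neighbour_prob(s, c_prev, actions_prev=ACTIONS):
    """Count-based estimate of p(c_prev | s), the second line of (4)."""
    total = sum(count[(s, a)] for a in actions_prev)
    if total == 0:
        # Assumption: (4) is undefined before any visit, so fall back to uniform.
        return 1.0 / len(actions_prev)
    return count[(s, c_prev)] / total


def coordinated_q_update(s, c_prev, c, reward, s_next,
                         alpha=0.2, gamma=0.8, actions=ACTIONS):
    """Apply the first line of (4) once and return the new Q value."""
    # For each own next action, average the next-state Q over the
    # neighbour's predicted action, then take the best own action.
    best_next = max(
        sum(neighbour_prob(s_next, cp) * Q[(s_next, cp, cn)] for cp in actions)
        for cn in actions
    )
    td_error = reward + gamma * best_next - Q[(s, c_prev, c)]
    Q[(s, c_prev, c)] += alpha * td_error
    return Q[(s, c_prev, c)]
```

For example, after recording the neighbour's actions 8, 8, and 10 in state `"s1"`, `neighbour_prob("s1", 8)` is 2/3; with all Q values still zero, `coordinated_q_update("s0", 8, 12, -0.5, "s1")` returns 0.2 × (−0.5) = −0.1.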
4 Modified Asymmetric Cell Transmission Model
A first-order macroscopic traffic flow model named the asymmetric cell transmission model (ACTM) is applied as one of the models in the Dyna-Q architecture. This model is derived from the widely used cell transmission model (CTM) [36] and has been used for ramp control problems [11, 37]. In this paper, we modify the ACTM to incorporate the traffic dynamics under incident conditions.
4.1 Traffic Dynamics during the Incident. As shown in Figure 3(a), when an incident happens in the critical section, one or more lanes of the motorway are blocked, depending on the incident extent. Because of the lane blockage, an incident may reduce the normal road capacity and spatial storage space, which produces a new relationship between traffic flow and road density, that is, the fundamental diagram presented in Figure 3(b). As suggested by [38], additional parameters can be used to adjust the fundamental diagram for incident situations. We introduce three parameters, $\lambda_1, \lambda_2, \lambda_3 \in [0, 1]$, to reflect these new dynamics. They are defined as $\lambda_1 = v^{\text{In}}_j / v_j$, $\lambda_2 = w^{\text{In}}_j / w_j$, and $\lambda_3 = d^{\text{In},\max}_{j,\text{main}} / d^{\max}_{j,\text{main}}$, where $v_j$ and $w_j$ are the free-flow speed and congestion wave speed of cell $j$, $d^{\max}_{j,\text{main}}$ is the maximum departure flow of cell $j$, and $v^{\text{In}}_j$, $w^{\text{In}}_j$, and $d^{\text{In},\max}_{j,\text{main}}$ are these three variables during the incident. $\rho_{j,C}$ and $\rho^{\text{In}}_{j,C}$ are the critical densities for normal and incident situations, and $\rho_{j,J}$ and $\rho^{\text{In}}_{j,J}$ are the jam densities for normal and incident situations.
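As a worked example of these definitions, the sketch below scales the normal-condition values by $\lambda_1$, $\lambda_2$, and $\lambda_3$. It assumes the case-study values $\lambda_1 = 0.55$, $\lambda_2 = 0.9$, and $\lambda_3 = 0.6$ and the normal values $v_j = 107$ km/h and $d^{\max}_{j,\text{main}} = 6300$ veh/h listed in Table 2; the function names are hypothetical.

```python
def incident_parameters(v, w, d_max, lam1=0.55, lam2=0.9, lam3=0.6):
    """Return (v_in, w_in, d_in_max) = (lam1*v, lam2*w, lam3*d_max).

    The default lambdas are the case-study values assumed from Table 2.
    Units are whatever the caller passes in; they are simply scaled.
    """
    for lam in (lam1, lam2, lam3):
        if not 0.0 <= lam <= 1.0:
            raise ValueError("incident parameters must lie in [0, 1]")
    return lam1 * v, lam2 * w, lam3 * d_max


def veh_per_hour_to_veh_per_min(rate):
    """Convert a flow in veh/h to veh/min, the unit used for all rates in the study."""
    return rate / 60.0
```

With $v_j$ = 107 km/h and $d^{\max}_{j,\text{main}}$ = 6300 veh/h, the incident values are 0.55 × 107 = 58.85 km/h and 0.6 × 6300 = 3780 veh/h, that is, 63 veh/min in the study's units.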
4.2 Modified ACTM. Given the three incident-related parameters, the traffic dynamics in each cell can be derived from the fundamental diagram illustrated in Figure 3(b) and represented by the following equations.

Departure rates of the mainline and on-ramp:
$$
\begin{gathered}
d^k_{j,\text{main}} = \min\left\{ \lambda_1 \frac{v_j}{l_j}\left(q^k_{j,\text{main}} + \theta_j d^k_{j,\text{on}} \Delta t - d^k_{j,\text{off}} \Delta t\right),\ \lambda_2 \frac{w_{j-1}}{l_j}\left(q^{\max}_{j-1,\text{main}} - q^k_{j-1,\text{main}} - \theta_j d^k_{j-1,\text{on}} \Delta t\right),\ \lambda_3 d^{\max}_{j,\text{main}} \right\},\\[1ex]
d^k_{j,\text{on}} = \begin{cases}
\min\left\{ \dfrac{q^k_{j,\text{on}} + a^k_{j,\text{on}} \Delta t}{\Delta t},\ \dfrac{\eta_j \left(q^{\max}_{j,\text{main}} - q^k_{j,\text{main}}\right)}{\Delta t},\ \dfrac{c^k_i}{\Delta t} \right\}, & \text{if } j \text{ is a metered on-ramp cell},\\[2ex]
\min\left\{ \dfrac{q^k_{j,\text{on}} + a^k_{j,\text{on}} \Delta t}{\Delta t},\ \dfrac{\eta_j \left(q^{\max}_{j,\text{main}} - q^k_{j,\text{main}}\right)}{\Delta t} \right\}, & \text{if } j \text{ is an unmetered on-ramp cell}.
\end{cases}
\end{gathered}
\tag{5}
$$
Conservation of the mainline and on-ramp:

$$
\begin{aligned}
q^{k+1}_{j,\text{main}} &= q^k_{j,\text{main}} + \Delta t \left(a^k_{j,\text{main}} + d^k_{j,\text{on}} - d^k_{j,\text{main}} - d^k_{j,\text{off}}\right),\\
q^{k+1}_{j,\text{on}} &= q^k_{j,\text{on}} + \Delta t \left(a^k_{j,\text{on}} - d^k_{j,\text{on}}\right),
\end{aligned}
\tag{6}
$$
where $a^k_{j,\text{main}}$ and $d^k_{j,\text{main}}$ are the mainline arrival and departure rates of cell $j$ at step $k$; $a^k_{j,\text{on}}$ and $d^k_{j,\text{on}}$ are the on-ramp arrival and departure rates of cell $j$ at step $k$; and $d^k_{j,\text{off}}$ is the off-ramp departure rate of cell $j$ at step $k$ (if cell $j$ is not an off-ramp cell, $d^k_{j,\text{off}} = 0$). $q^k_{j,\text{main}}$ represents the number of vehicles on the mainline of cell $j$ at step $k$, and $q^{\max}_{j,\text{main}}$ is the maximum of this value, limited by the mainline space of cell $j$. Similarly, $q^k_{j,\text{on}}$ and $q^{\max}_{j,\text{on}}$ denote the current (at step $k$) and maximum numbers of vehicles in the on-ramp of cell $j$, respectively. $l_j$ is the length of cell $j$, and $\Delta t$ (min) is the time duration between two time steps. $c^k_i$ is the metering rate for the on-ramp cell of the $i$th motorway section at step $k$. $\eta_j \in [0, 1]$ is the flow allocation parameter of cell $j$, and $\theta_j \in [0, 1]$ is the flow blending parameter of traffic flow from the on-ramp to the mainline of cell $j$. All arrival and departure rates are expressed in veh/min in this study.
For motorway section $i$ with $J$ cells, the number of vehicles on the mainline is calculated by $q^k_{i,\text{main}} = \sum_{j=1}^{J} q^k_{j,\text{main}}$, while the number of vehicles on the on-ramp of motorway section $i$ is $q^k_{i,\text{on}} = q^k_{j,\text{on}}$, where $j$ is the on-ramp cell of the section (each section has at most one). In the same way, the maximum numbers of vehicles on the mainline and on-ramp of motorway section $i$ are $q^{\max}_{i,\text{main}} = \sum_{j=1}^{J} q^{\max}_{j,\text{main}}$ and $q^{\max}_{i,\text{on}} = q^{\max}_{j,\text{on}}$.
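A minimal Python sketch of one ACTM update for a single on-ramp cell, following (5) and (6), is given below. It is illustrative only: the function names and argument order are hypothetical; the quantities indexed $j-1$ in (5) are passed in with a `down_` prefix on the assumption that cell $j-1$ is the downstream neighbour receiving the flow; speeds are assumed to be in km/min and lengths in km so that $v_j / l_j$ is a rate per minute; and the first term of the on-ramp departure rate is assumed to be bounded by the on-ramp queue $q^k_{j,\text{on}}$ plus arrivals, which keeps $q^{k+1}_{j,\text{on}}$ in (6) non-negative.

```python
def mainline_departure(q_main, d_on, d_off, l, v, down_w, down_q_max, down_q,
                       down_d_on, d_max, theta, dt, lam=(1.0, 1.0, 1.0)):
    """d^k_{j,main} from (5): minimum of demand, downstream supply, and capacity.

    Arguments prefixed down_ belong to cell j-1 (assumed downstream).
    lam = (lambda_1, lambda_2, lambda_3); (1, 1, 1) means no incident.
    """
    lam1, lam2, lam3 = lam
    demand = lam1 * v / l * (q_main + theta * d_on * dt - d_off * dt)
    supply = lam2 * down_w / l * (down_q_max - down_q - theta * down_d_on * dt)
    return min(demand, supply, lam3 * d_max)


def onramp_departure(q_on, a_on, q_main, q_max_main, eta, dt, c=None):
    """d^k_{j,on} from (5); pass the metering rate c only for a metered on-ramp cell."""
    terms = [
        (q_on + a_on * dt) / dt,               # vehicles available on the ramp
        eta * (q_max_main - q_main) / dt,      # share of free mainline space
    ]
    if c is not None:
        terms.append(c / dt)                   # metering limit, as written in (5)
    return min(terms)


def conserve(q_main, q_on, a_main, a_on, d_main, d_on, d_off, dt):
    """Vehicle conservation (6): returns (q^{k+1}_{j,main}, q^{k+1}_{j,on})."""
    q_main_next = q_main + dt * (a_main + d_on - d_main - d_off)
    q_on_next = q_on + dt * (a_on - d_on)
    return q_main_next, q_on_next
```

For instance, assuming Δt = 1 min, a metered cell with $q^k_{j,\text{on}} = 10$, $a^k_{j,\text{on}} = 6$ veh/min, $\eta_j = 0.16$, 200 free mainline places, and $c^k_i = 8$ gives $d^k_{j,\text{on}} = \min\{16, 32, 8\} = 8$ veh/min, so the meter is the binding constraint.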
4.3 Estimation of Arrival and Departure Rates. The arrival rates of the boundary cells in each motorway section (such as j + 2, j + 5, and j + 8) and of all the on-ramps, as well as the departure rates of the off-ramps, are inputs of the ACTM for each planning step between two real control steps. Given the short duration of the planning process (10 steps), we assume these rates remain stable during planning and estimate them directly from the recent flow data collected by the detectors. The method described by Wang [16], which simply averages the most recently observed data to obtain the predicted flow rates, is used for the estimation. In our model, we use the flow data collected over the last $N$ time steps ($N = 5$). Therefore, these three rates can be calculated by
$$
\begin{aligned}
a^{k,k+1}_{i,\text{main}} &= a^{k,k+1}_{j,\text{main}} = \frac{\sum_{n=0}^{N-1} a^{k-n}_{j,\text{main}}}{N}, && \text{if } j \text{ is the boundary cell},\\
a^{k,k+1}_{i,\text{on}} &= a^{k,k+1}_{j,\text{on}} = \frac{\sum_{n=0}^{N-1} a^{k-n}_{j,\text{on}}}{N}, && \text{if } j \text{ is the on-ramp cell},\\
d^{k,k+1}_{j,\text{off}} &= \frac{\sum_{n=0}^{N-1} d^{k-n}_{j,\text{off}}}{N}, && \text{if } j \text{ is the off-ramp cell},
\end{aligned}
\tag{7}
$$
where $a^{k,k+1}_{j,\text{main}}$ and $a^{k,k+1}_{j,\text{on}}$ are the estimated mainline and on-ramp arrival rates of cell $j$ for the planning steps between real steps $k$ and $k+1$, and $d^{k,k+1}_{j,\text{off}}$ is the estimated off-ramp departure rate of cell $j$. If cell $j$ is the boundary cell of motorway section $i$, the arrival or departure rate of this cell is also the arrival or departure rate of motorway section $i$.
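The estimator in (7) is a moving average of the last $N$ observations, held constant over the planning steps between two real control steps. A minimal Python sketch follows; the class name `RateEstimator` is hypothetical, and averaging over the available observations while fewer than $N$ have been collected is an assumption, since the text does not say how the first steps are handled.

```python
from collections import deque


class RateEstimator:
    """Moving average of the last N observed rates, as in (7) (N = 5 in the text)."""

    def __init__(self, n=5):
        if n <= 0:
            raise ValueError("window length must be positive")
        self.window = deque(maxlen=n)  # older observations drop out automatically

    def observe(self, rate):
        """Add the rate measured at the latest real control step (veh/min)."""
        self.window.append(rate)

    def estimate(self):
        """Rate assumed constant for all planning steps until the next real step."""
        if not self.window:
            raise ValueError("no observations yet")
        # Assumption: divide by the number of stored values while fewer than N exist.
        return sum(self.window) / len(self.window)
```

One estimator per input (boundary-cell arrivals, on-ramp arrivals, off-ramp departures) would be kept; for example, after observing 10, 12, 14, 16, 18, and 20 veh/min with N = 5, the estimate is (12 + 14 + 16 + 18 + 20) / 5 = 16 veh/min.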
5 Definition of RL Elements
Apart from the architecture and models defined in Sections 3 and 4, three basic elements, namely, the environment state, control action, and reward function, must be specified to form an RL problem. This section details these three elements and the relevant algorithm.
5.1 Environment State. The environment states of a motorway section are composed of mainline states and on-ramp states. The same method as in [27, 29] is used here to obtain the state space. For the mainline of motorway section $i$, the number of vehicles ranges from 0 to the maximum number $q^{\max}_{i,\text{main}}$, and this range is uniformly divided into $n_i$ intervals, each representing a state of the mainline. Therefore, each mainline section can be represented by a state set $S_{i,\text{main}}$ with $n_i$ states. Similarly, on-ramp traffic is represented by a state set $S_{i,\text{on}}$ with $m_i$ states according to the maximum number of vehicles $q^{\max}_{i,\text{on}}$. $n_i$ and $m_i$ should be adjusted for different motorway sections according to the section length. In this way, if motorway section $i$ is the critical section, the external traffic environment is represented by

$$S_i = S_{i,\text{main}} \times S_{i,\text{on}}, \quad s^k_i \in S_i, \tag{8}$$

which contains $n_i \cdot m_i$ states. At each time step, a state $s^k_i$ is selected from $S_i$ as the environment state. If motorway section $i$ is a normal section, the state sets of its neighbour agent are incorporated. Thus, the traffic state is represented by

$$S_i = S_{i,\text{main}} \times S_{i,\text{on}} \times S_{i-1,\text{main}} \times S_{i-1,\text{on}}, \quad s^k_i \in S_i, \tag{9}$$

which contains $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ states.
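The discretisation behind (8) and (9) can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the helper names are hypothetical, intervals are taken as half-open with the upper bound $q^{\max}$ assigned to the last interval, and the numbers of intervals ($n_i$, $m_i$) are left to the caller because the text does not list them. The example limits $q^{\max}_{1,\text{main}} = 1860$ and $q^{\max}_{1,\text{on}} = 108$ come from Section 6.1.

```python
import math


def interval_index(q, q_max, bins):
    """Index (0..bins-1) of the uniform interval of [0, q_max] that contains q."""
    if q_max <= 0 or bins <= 0:
        raise ValueError("q_max and bins must be positive")
    q = min(max(q, 0.0), q_max)                  # clamp to the admissible range
    return min(int(q / q_max * bins), bins - 1)  # q == q_max falls in the last interval


def joint_state(*components):
    """Each component is (q, q_max, bins); returns one interval index per component.

    Two components (own mainline, own on-ramp) give a state of (8);
    four components (adding neighbour i-1's mainline and on-ramp) give a state of (9).
    """
    return tuple(interval_index(q, q_max, bins) for q, q_max, bins in components)


def state_count(*bins):
    """Size of the joint state set, e.g. n_i * m_i for (8)."""
    return math.prod(bins)
```

For example, with a hypothetical choice of $n_1 = 10$ and $m_1 = 5$, a critical section holding 930 mainline and 54 on-ramp vehicles maps to the state `(5, 2)`, and a normal section using the same granularity for itself and its neighbour has $10 \cdot 5 \cdot 10 \cdot 5 = 2500$ states, which illustrates how quickly (9) grows.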
5.2 Control Action. In a ramp control problem, the aim of the control action is to regulate the number of vehicles entering the mainline in each control step. Similar to [29], we adopt flow control as the control action, which is represented by an action set $C = \{4, 6, 8, 10, 12, 14, 16, 18, 20\}$ with 9 flow rates between the minimum (4 veh/min) and maximum (20 veh/min) values.
Exploitation and exploration are two basic behaviours of an RL agent. Exploitation means the agent takes the control action that has yielded the most reward in previous experience. Exploration, by contrast, means the agent tries other actions, which may yield less reward in the short term. To balance these two behaviours, we use the ε-greedy policy to select control actions [30]. Specifically, at each control step this policy takes a random action with probability ε and chooses the greedy action (with the maximum Q value) with probability 1 − ε.
The action selection probability can be formally expressed as

$$
p\left(c^k_i \mid s^k_i\right) = \begin{cases}
1 - \varepsilon, & \text{if } c^k_i = \arg\max_{c^k_i} Q^{k-1}\left(s^k_i, c^k_i\right),\\
\varepsilon, & \text{otherwise}.
\end{cases}
\tag{10}
$$
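The ε-greedy rule can be sketched in Python as below, using the action set $C$ of Section 5.2 and ε = 0.1 from Section 6.3. The function name is hypothetical. In this common implementation the exploratory action is drawn uniformly from all of $C$, so the greedy action can also be chosen while exploring; ties between equal Q values are broken in favour of the lowest metering rate, which is an arbitrary choice.

```python
import random

ACTIONS = (4, 6, 8, 10, 12, 14, 16, 18, 20)  # metering rates in veh/min


def epsilon_greedy(q_row, epsilon=0.1, rng=random):
    """Pick a metering rate; q_row maps action -> Q value for the current state.

    Actions missing from q_row are treated as Q = 0 (an assumption for unseen entries).
    """
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)                        # explore
    return max(ACTIONS, key=lambda c: q_row.get(c, 0.0))  # exploit; first maximum wins ties
```

Passing a seeded `random.Random` instance as `rng` makes simulation runs reproducible.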
5.3 Reward Function. The reward function calculates the immediate reward after a specific action is executed at each time step, which guides the agent towards its objective. Considering a common objective of traffic control systems (i.e., minimising total travel time), we define our reward to guide the agent to minimise the total time spent (TTS) through the learning process.
TTS is defined as the total time spent by vehicles in the network during a period of time. In our case, TTS can be obtained from

$$\text{TTS} = \Delta t \cdot \sum_{k=0}^{K} \left(q^k_{i,\text{main}} + q^k_{i,\text{on}}\right). \tag{11}$$
In the above equation, $\Delta t$ is a fixed value; therefore, minimising TTS is equivalent to minimising the number of vehicles in the network, $\sum_{k=0}^{K} (q^k_{i,\text{main}} + q^k_{i,\text{on}})$. To minimise this value, the reward function defined here is composed of two negative rewards, which act as penalties for vehicles on the mainline and on the on-ramp. The formal reward function at step $k$ is defined for two situations.
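As a quick numerical illustration of (11), the sketch below accumulates TTS from per-step vehicle counts; the function name and the sample counts are hypothetical. With $\Delta t$ in minutes, TTS comes out in vehicle-minutes.

```python
def total_time_spent(q_main, q_on, dt):
    """TTS = dt * sum_k (q_main[k] + q_on[k]), as in (11).

    q_main, q_on: vehicle counts on the mainline and on-ramp at steps k = 0..K.
    dt: step length in minutes, so the result is in veh*min.
    """
    if len(q_main) != len(q_on):
        raise ValueError("series must have the same length")
    return dt * sum(m + o for m, o in zip(q_main, q_on))
```

For example, mainline counts of 100, 110, and 120 and on-ramp counts of 10, 12, and 8 over three 1-minute steps give TTS = 360 veh·min (6 veh·h). Because $\Delta t$ only scales the sum, any reduction in TTS must come from fewer vehicles held in the section.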
(1) Motorway Section $i$ Is the Critical Section.

$$
R^k_i\left(s^k_i, c^k_i\right) = \begin{cases}
-\dfrac{q^k_{i,\text{main}} + q^k_{i,\text{on}}}{q^{\max}_{i,\text{main}} + q^{\max}_{i,\text{on}}}, & \text{if } q^k_{i,\text{main}} < q^{\max}_{i,\text{main}},\ q^k_{i,\text{on}} < q^{\max}_{i,\text{on}},\\[2ex]
-1, & \text{otherwise},
\end{cases}
\tag{12}
$$
where $R^k_i(s^k_i, c^k_i)$ is the immediate reward for agent $i$ in state $s^k_i$ when executing action $c^k_i$ at control step $k$. $q^{\max}_{i,\text{main}}$ and $q^{\max}_{i,\text{on}}$ are used to normalise the numbers of vehicles on the mainline and on-ramp, which guarantees that $R^k_i(s^k_i, c^k_i) \in [-1, 0]$.
(2) Motorway Section $i$ Is Not the Critical Section. Here, a new negative reward is introduced to maintain system equity, that is, to ensure that the on-ramp queues, and hence the related travel times, at different on-ramps stay close to each other.
$$
R^k_i\left(s^k_i, c^k_{i-1}, c^k_i\right) = \begin{cases}
-\dfrac{q^k_{i,\text{main}} + q^k_{i,\text{on}}}{q^{\max}_{i,\text{main}} + q^{\max}_{i,\text{on}}} - \dfrac{\left|q^k_{i,\text{on}} - q^k_{i-1,\text{on}}\right|}{\max\left(q^{\max}_{i,\text{on}}, q^{\max}_{i-1,\text{on}}\right)}, & \text{if } q^k_{i,\text{main}} < q^{\max}_{i,\text{main}},\ q^k_{i,\text{on}} < q^{\max}_{i,\text{on}},\\[2ex]
-2, & \text{otherwise}.
\end{cases}
\tag{13}
$$
For each agent $i$ and episode do
    $L \leftarrow \lceil \text{IncidentDuration} / \Delta t \rceil$
    IF $i$ is the critical section
        Initialise $R^0_i(s_i, c_i)$, $Q^0_i(s_i, c_i)$
    ELSE
        Initialise $R^0_i(s_i, c_{i-1}, c_i)$, $Q^0_i(s_i, c_{i-1}, c_i)$, $P^0_i(s_i, c_{i-1})$
    For each control step $k \in K$ do (Loop I)
        (i) get detected data from each cell $j$: $a^k_{j,\text{main}}$, $d^k_{j,\text{main}}$, $d^k_{j,\text{off}}$, $a^k_{j,\text{on}}$, $d^k_{j,\text{on}}$
        (ii) get state $s^k_i$ through (8) and (9)
        (iii) get action $c^k_i$ by the $\varepsilon$-greedy policy (10)
        (iv) get $q^k_{j,\text{main}}$, $q^k_{j,\text{on}}$ through (6) and set $q^k_{i,\text{main}} \leftarrow \sum_{j=1}^{J} q^k_{j,\text{main}}$, $q^k_{i,\text{on}} \leftarrow q^k_{j,\text{on}}$
        IF $i$ is the critical section
            update $R^k_i(s^k_i, c^k_i)$, $Q^k_i(s^k_i, c^k_i)$ through (2) and (12)
        ELSE
            update $R^k_i(s^k_i, c^k_{i-1}, c^k_i)$, $Q^k_i(s^k_i, c^k_{i-1}, c^k_i)$, $p(c^k_{i-1} \mid s^k_i)$ through (4) and (13)
        IF $s^k_i = s_{\text{initial}}$ and $k + 1 \ge L$
            end the algorithm
        ELSE
            get $a^{k,k+1}_{i,\text{main}}$, $a^{k,k+1}_{i,\text{on}}$, $d^{k,k+1}_{j,\text{off}}$ by (7); set $l \leftarrow k$, $s^l_i \leftarrow s^k_i$, $q^l_{j,\text{main}} \leftarrow q^k_{j,\text{main}}$, $q^l_{j,\text{on}} \leftarrow q^k_{j,\text{on}}$; and start loop II
            For each planning step $l \in L$ do (Loop II)
                (i) generate flow rates for each cell $j$: $d^l_{j,\text{main}}$, $d^l_{j,\text{on}}$ through (5)
                (ii) get the state $s^l_i$
                (iii) get $q^l_{j,\text{main}}$, $q^l_{j,\text{on}}$ and set $q^l_{i,\text{main}} \leftarrow \sum_{j=1}^{J} q^l_{j,\text{main}}$, $q^l_{i,\text{on}} \leftarrow q^l_{j,\text{on}}$
                (iv) get action $c^l_i$ by the $\varepsilon$-greedy policy
                IF $i$ is the critical section
                    update $R^l_i(s^l_i, c^l_i)$, $Q^l_i(s^l_i, c^l_i)$
                ELSE
                    update $R^l_i(s^l_i, c^l_{i-1}, c^l_i)$, $Q^l_i(s^l_i, c^l_{i-1}, c^l_i)$, $p(c^k_{i-1} \mid s^k_i)$
                IF ($l = k + 9$) or ($s^l_i = s_{\text{initial}}$ and $l + 1 \ge L$)
                    go back to loop I
                ELSE
                    repeat loop II
            EndFor
    EndFor
EndFor

Algorithm 1: Algorithm for Dyna-MARL.
Compared to (12), a new term $|q^k_{i,\text{on}} - q^k_{i-1,\text{on}}| / \max(q^{\max}_{i,\text{on}}, q^{\max}_{i-1,\text{on}})$ is added in (13), which penalises the on-ramp queue difference between motorway sections $i$ and $i-1$. As two adjacent agents cooperate in this situation, $R^k_i(s^k_i, c^k_{i-1}, c^k_i)$ depends on two control actions, $c^k_{i-1}$ and $c^k_i$. The function $\max(\cdot, \cdot)$ returns the larger of its two arguments and is used for normalisation.
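The two reward definitions (12) and (13) translate directly into code. The Python sketch below is illustrative; the function names are hypothetical, and the example capacities use the Section 6.1 values for section 2 ($q^{\max}_{2,\text{main}} = 2880$, $q^{\max}_{2,\text{on}} = 90$), with section 1 ($q^{\max}_{1,\text{on}} = 108$) assumed to be its downstream neighbour $i-1$.

```python
def reward_critical(q_main, q_on, q_max_main, q_max_on):
    """Reward (12) for the agent of the critical section; lies in [-1, 0]."""
    if q_main < q_max_main and q_on < q_max_on:
        return -(q_main + q_on) / (q_max_main + q_max_on)
    return -1.0  # mainline or on-ramp storage exhausted


def reward_normal(q_main, q_on, q_max_main, q_max_on, q_on_prev, q_max_on_prev):
    """Reward (13) for a normal section; adds the on-ramp queue-difference penalty."""
    if q_main < q_max_main and q_on < q_max_on:
        congestion = (q_main + q_on) / (q_max_main + q_max_on)
        equity = abs(q_on - q_on_prev) / max(q_max_on, q_max_on_prev)
        return -congestion - equity
    return -2.0  # lower bound of (13), since both terms are at most 1
```

For example, section 2 holding 1440 mainline and 45 on-ramp vehicles while section 1 has 54 vehicles queued on its on-ramp receives $-1485/2970 - 9/108 \approx -0.583$: half of the penalty reflects congestion and the rest the queue imbalance between the two ramps.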
5.4 Description of the Algorithm. Based on the Dyna-Q architecture and the RL elements defined in the previous subsections, an algorithm named Dyna-MARL is developed and described in this subsection. The two main loops, corresponding to direct RL and planning in Figure 1, are detailed in Dyna-MARL: between two real control steps in loop I, 10 planning steps are run in loop II. The pseudocode of Dyna-MARL is given in Algorithm 1.
An episode in Dyna-MARL represents a control cycle, which starts when the incident occurs and terminates when the traffic state returns to the initial state $s_{\text{initial}}$, that is, the traffic state before the incident. The incident duration is assumed to be known in advance.
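To make the interplay of the two loops concrete, the following Python sketch implements textbook tabular Dyna-Q [30] on a toy problem. It is a simplification, not Dyna-MARL: the learned model is simply a table of the last observed outcome of each state-action pair (rather than the ACTM and the neighbour-action probabilities), there is a single agent, and the five-state chain environment `chain_step` with a reward of −1 per step is hypothetical. As in Algorithm 1, each real step (loop I) is followed by 10 planning updates (loop II), with α = 0.2, γ = 0.8, and ε = 0.1.

```python
import random
from collections import defaultdict


def dyna_q(env_step, start_state, actions, episodes=50, max_steps=100,
           planning_steps=10, alpha=0.2, gamma=0.8, epsilon=0.1, seed=0):
    """Tabular Dyna-Q: direct RL on real steps plus planning on a learned model."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    model = {}  # (s, a) -> (reward, next_state, done), the last observed outcome

    def greedy(s):
        return max(actions, key=lambda a: Q[(s, a)])

    def choose(s):
        return rng.choice(actions) if rng.random() < epsilon else greedy(s)

    def q_update(s, a, r, s2, done):
        target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s = start_state
        for _ in range(max_steps):
            a = choose(s)                    # acting
            r, s2, done = env_step(s, a)     # real experience
            q_update(s, a, r, s2, done)      # loop I: direct RL
            model[(s, a)] = (r, s2, done)    # model learning
            keys = list(model)
            for _ in range(planning_steps):  # loop II: planning on simulated experience
                ps, pa = rng.choice(keys)
                pr, ps2, pdone = model[(ps, pa)]
                q_update(ps, pa, pr, ps2, pdone)
            s = s2
            if done:
                break
    return Q


def chain_step(s, a, goal=4):
    """Hypothetical toy environment: move left (-1) or right (+1) along states 0..goal."""
    s2 = min(max(s + a, 0), goal)
    if s2 == goal:
        return 0.0, s2, True
    return -1.0, s2, False
```

On this toy chain, the greedy policy learned by `dyna_q(chain_step, 0, [-1, 1])` moves right in every non-goal state; each real transition is reused many times by the planning loop, which is the mechanism behind the faster learning claimed for Dyna-Q.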
6 Case Study and Results
One of the metered motorway segments (southbound direction) of the M6 in the UK is chosen for the case study. This segment lies between junction 21A (J21A) and junction 25 (J25), with an approximate length of 12.4 km (see Figure 4). Taking the noncontrolled (NC) situation as the baseline, we designed a series of experiments to compare the proposed Dyna-MARL algorithm with Isolated RL (Q-learning without coordination). The experiments and relevant results are described as follows.
6.1 Partitions of the Test Segment. The test motorway segment, with a three-lane mainline, three metered on-ramps, and five off-ramps, is simulated in AIMSUN [39], a microscopic traffic simulation package. According to the detector locations and road layout, the whole segment is divided into three sections, each containing a metered on-ramp. Motorway section 3 is divided into 4 cells, and motorway sections 1 and 2 are each divided into 3 cells. The partitions of each section can be seen in Figure 5. According to the section lengths, the maximum numbers of vehicles in each mainline section and on-ramp are as follows: $q^{\max}_{1,\text{main}} = 1860$, $q^{\max}_{2,\text{main}} = 2880$, $q^{\max}_{3,\text{main}} = 2880$, $q^{\max}_{1,\text{on}} = 108$, $q^{\max}_{2,\text{on}} = 90$, and $q^{\max}_{3,\text{on}} = 120$.

Figure 4: Test motorway segment of M6 between J21A and J25 (via J22, J23, and J24), showing origin O, destination D, on-ramps O1 (from A579), O2 (from A580), and O3 (from A49), and off-ramps D1 and D2 (to M62), D3 (to A579), D4 (to A580), and D5 (to A58).

Figure 5: Partitions of the test segment into Sections 1, 2, and 3, showing section and cell boundaries, road section lengths (km), flow direction, and incident locations a, b, and c.
6.2 Real Data Source. Real detector data collected from 17 loop detectors located in the motorway segment (on both the mainline and the on-/off-ramps) are used for the case study; these data can be extracted from the Traffic Information System (HATRIS) [40]. The traffic count data are averaged from April 2012 to March 2013 at 15-minute intervals. Only working-day data (Monday to Friday) are used, owing to the dramatic reduction of traffic load at weekends. Some of the detector data collected from the mainline and the three on-ramps are presented in Figure 6, which shows two peak periods in the daily traffic operation: the AM peak period (around 07:00:00–09:00:00) and the PM peak period (around 16:00:00–18:00:00).
At the test site, ramp metering operates only during peak hours. Moreover, it is valuable to test the performance of the proposed algorithm in a high-demand situation: if it works under a high traffic load, it should also be useful in common situations. Therefore, the AM peak period with heavy traffic load is considered for the case study. Specifically, we use the averaged traffic data during the AM peak period collected from TRADS to estimate the OD (origin-destination) matrix for the simulation. A model proposed in [41] is adopted by AIMSUN for the estimation, with the number of iterations set to 1000 to reach convergence. Table 1 shows the OD matrix estimated from real traffic data.

Figure 6: Real averaged traffic data (15-minute traffic counts against time of day for the mainline of Sections 1, 2, and 3 and for On-ramps 1, 2, and 3).

Table 1: Estimated OD matrix.

Origins\destinations   D      D5    D4    D3    D2     D1     Totals
O                      2089   375   686   728   1169   771    5818
O3                     875    65    212   193   117    46     1507
O2                     886    0     0     61    315    216    1477
O1                     824    0     0     0     292    226    1343
Totals                 4675   440   898   981   1893   1258   10146

Table 2: Parameters for ACTM.

Parameter   d_max,main   v          w           θ     η      λ1     λ2    λ3
Value       6300 veh/h   107 km/h   11.6 km/h   0.5   0.16   0.55   0.9   0.6
6.3 Incident Scenarios. Because real incident data are difficult to capture, we simulate incident scenarios in AIMSUN. To make each ramp meter work during the incident, the incident is located near the most downstream motorway section, that is, motorway section 1. Accordingly, three incident scenarios, A, B, and C, are designed, corresponding to three different incident locations a, b, and c (as illustrated in Figure 5), respectively.
The simulation experiment lasts for one and a half hours, from 07:00:00 to 08:30:00 during the AM peak period. After 30 minutes of normal operation (for warm-up), the incident is triggered at 07:30:00 and lasts for 30 minutes. In the preliminary experiments designed in this paper, an incident with one lane blocked is considered; the incident-related parameters can also be adjusted for multiple-lane blockages. The incident extent is 50 meters and is assumed to be constant during the incident.
Learning-related parameters are set to typical values [30], that is, $\alpha = 0.2$, $\gamma = 0.8$, and $\varepsilon = 0.1$. Other parameters related to the ACTM are calibrated and summarised in Table 2. All the cells have the same $\theta$ and $\eta$.
6.4 Results. Dyna-MARL, Isolated RL, and NC are compared in three aspects: density evolution, general indicators, and system equity. The experimental results are described as follows.
(1) Density Evolution. Figure 7 shows that four dense areas exist during the traffic operation. Three of them, near the on-ramp entrances (at motorway lengths of around 0.5 km, 5 km, and 10 km), are caused by heavy traffic loads from the on-ramps. The dense area close to the segment end is caused by the incident.
In scenario A, the incident location is close to on-ramp 1 (O1). Without control, this incident leads to severe congestion, which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). Under this scenario, Isolated RL cannot alleviate the incident-induced congestion effectively (see Figure 7(b)). At the beginning of congestion formation, without coordination, only the nearest ramp controller reacts to the congestion. Because of the limited storage space of the on-ramp, one ramp controller is insufficient to dissolve this congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of the three on-ramps to deal with the incident-induced congestion. In this way, mainline congestion is restricted to a smaller area and does not propagate to motorway section 2 (see Figure 7(c)).
For scenarios B and C, the incidents are near the motorway end and far from on-ramp 1. Because they do not block on-ramp 1, these incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL work well in easing congestion on the mainline. As shown in Figures 7(e)–7(i), compared with the NC situation, both Isolated RL and Dyna-MARL restrict the congestion to a small range near the on-ramp entrances.
(2) General Indicators. In this comparison, general indicators, including total travel time (which should be reduced), total throughput (which should be increased), and total CO$_2$ emission (which should be reduced), are used to show how the proposed system can benefit road users. These indicators are widely used in the transport community to evaluate newly developed traffic control systems.
As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b): Dyna-MARL improves the total throughput by up to 2.3% (see Figure 8(d)) and outperforms Isolated RL in all three scenarios, whereas in scenario B Isolated RL even fails to improve the total throughput. For total CO$_2$ emission (shown in Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with reductions of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL performs much better than Isolated RL.
The above comparison shows that Dyna-MARL outperforms Isolated RL in almost all scenarios and indicators.
Figure 7: Density profiles (veh/km/lane) over motorway length (km) and absolute time (min) for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.

(3) System Equity. Although the general indicators presented in comparison (2) are effective for testing the performance of different systems, they cannot capture system equity, which is also an important aspect of system performance. In this paper, we consider only spatial equity, defined as the equity of user delays across different on-ramps [42]. We assume that road users from all three on-ramps are equally important: if users from different on-ramps experience similar travel times, the control system is regarded as equitable, whereas a large difference in queues (and hence in travel times) between on-ramps indicates a highly inequitable system. In [43], the variance of travel time on different on-ramps is used as an indicator of system equity. Similarly to [43], and for ease of comparison, we use the standard deviation in our case. This indicator is defined as
$$
\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n}\left[\bar{t}^{k} - t^{k}_{i}\right]^{2}}{n}} \tag{14}
$$
where $\mathrm{SD}(k)$ is the standard deviation of the travel time of different on-ramps at time step $k$, $t^k_i$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^k$ is the averaged total travel time of the $n$ on-ramps at step $k$.
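As a concrete check of (14), the short Python sketch below computes $\mathrm{SD}(k)$ from a list of per-ramp travel times at one time step. It assumes the estimated travel times $t^k_i$ are already available; the function name `travel_time_sd` is hypothetical. Note that (14) divides by $n$, so this is the population standard deviation rather than the sample one.

```python
import math
from typing import Sequence


def travel_time_sd(ramp_times: Sequence[float]) -> float:
    """SD(k) from (14): spread of estimated total travel time across n on-ramps.

    ramp_times[i] plays the role of t_i^k; their mean plays the role of the
    averaged travel time. Divides by n (population SD), as (14) does.
    """
    n = len(ramp_times)
    if n == 0:
        raise ValueError("need at least one on-ramp")
    mean = sum(ramp_times) / n
    return math.sqrt(sum((mean - t) ** 2 for t in ramp_times) / n)


# Hypothetical travel times (h) for three on-ramps at a single time step.
print(travel_time_sd([2.0, 4.0, 6.0]))
```

Identical travel times on all ramps give $\mathrm{SD}(k) = 0$, the perfectly equitable case; the value grows as one ramp's users are delayed more than the others'.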
The results of the system equity comparison are shown in Figure 9. For the NC situation, good equity is maintained in scenarios B and C because entering vehicles are not restricted (as shown in Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by the congestion in scenario A, a long queue forms, leading to imbalance and resultant inequity for users on different on-ramps (see Figure 9(a)). For the controlled cases, Isolated RL performs poorly in all scenarios, because the ramp controller near the congestion imposes much more restrictive metering than the other controllers. Owing to its coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).
Figure 8: Comparison of general measures for different scenarios: (a) total travel time (h), (b) total throughput (veh), (c) total CO$_2$ emission (kg), and (d) reduction from NC (%) in total travel time, total throughput, and total CO$_2$ emission, for NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C.
7 Conclusions and Future Work
A Dyna-Q based multiagent reinforcement learning method for motorway ramp control, referred to as Dyna-MARL, has been developed in this paper. Dyna-MARL is compared with Isolated RL (Q-learning without coordination) and the noncontrolled situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.
Through a series of simulation-based experiments, we conclude the following: (1) Isolated RL can improve motorway performance in terms of increasing total throughput and reducing total travel time and CO$_2$ emission, but this improvement comes at the expense of poor system equity across different on-ramps; (2) with a suitable coordination strategy, much higher system equity can be achieved by Dyna-MARL; (3) in addition to system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios for all indicators, which means Dyna-MARL can deal with network-wide problems effectively.
Although the simulation tests have shown positive results for the performance of Dyna-MARL, the current work considers a simplified incident scenario with a fixed duration. In practice, incident duration is highly variable and affected by a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Therefore, incident duration should be treated as an uncertainty, which will be investigated in our future work.
Figure 9: Standard deviation of on-ramp travel time (h) over the time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL in different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This paper is supported by the China Scholarship Council and the University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.
References
[1] M. Papageorgiou and A. Kotsialos, "Freeway ramp metering: an overview," IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.
[2] A. Skabardonis, P. Varaiya, and K. F. Petty, "Measuring recurrent and nonrecurrent traffic congestion," Transportation Research Record, vol. 1856, pp. 118–124, 2003.
[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, "ALINEA: a local feedback control law for on-ramp metering," Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.
[4] E. Smaragdis and M. Papageorgiou, "Series of new local ramp metering strategies," Transportation Research Record, vol. 1856, pp. 74–86, 2003.
[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, "Real-time metering algorithm for centralized control," Transportation Research Record, vol. 1232, pp. 17–26, 1989.
[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, "System wide adaptive ramp metering (SWARM)," in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.
[7] R. Lau, Ramp Metering by Zone – The Minnesota Algorithm, Minnesota Department of Transportation, 1997.
[8] H. M. Zhang and W. W. Recker, "On optimal freeway ramp control policies for congested traffic corridors," Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.
[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, "Optimal coordinated ramp metering with advanced motorway optimal control," Transportation Research Record, no. 1748, pp. 55–65, 2001.
[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, "Model predictive control for optimal coordination of ramp metering and variable speed limits," Transportation Research C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.
[11] G. Gomes and R. Horowitz, "Optimal freeway ramp metering using the asymmetric cell transmission model," Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.
[12] A. H. F. Chow and Y. Li, "Robust optimization of dynamic motorway traffic via ramp metering," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.
[13] R. W. Hall, "Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?" Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.
[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, "Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models," Transportation Research Record, vol. 2047, pp. 57–65, 2008.
[15] T. L. Greenlee and H. J. Payne, "Freeway ramp metering strategies for responding to incidents," in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.
[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.
[17] J.-B. Sheu, "Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control," Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.
[18] J.-B. Sheu and M.-S. Chang, "Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.
[19] C. Jacob and B. Abdulhai, "Machine learning for multi-jurisdictional optimal traffic corridor control," Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.
[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, "Motorway ramp-metering control with queuing consideration using Q-learning," in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC '11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.
[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Application of reinforcement learning with continuous state space to ramp metering in real-world conditions," in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC '12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.
[22] C. Jacob and B. Abdulhai, "Automated adaptive traffic corridor control using reinforcement learning: approach and case studies," Transportation Research Record, vol. 1959, pp. 1–8, 2006.
[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada," Transportation Research Record, vol. 2396, pp. 10–18, 2013.
[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, "Reinforcement learning ramp metering without complete information," Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.
[25] A. Fares and W. Gomaa, "Multi-agent reinforcement learning control for ramp metering," in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.
[26] K. Veljanovska, K. M. Bombol, and T. Maher, "Reinforcement learning technique in multiple motorway access control strategy design," PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.
[27] C. Lu, H. Chen, and S. Grant-Muller, "An indirect reinforcement learning approach for ramp control under incident-induced congestion," in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC '13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.
[28] L. Busoniu, R. Babuska, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.
[29] C. Lu, H. Chen, and S. Grant-Muller, "Indirect reinforcement learning for incident-responsive ramp control," Procedia – Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.
[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[31] C. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[32] J. R. Kok and N. Vlassis, "Collaborative multiagent reinforcement learning by payoff propagation," Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.
[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, "Coordinated reinforcement learning," in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.
[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, "Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.
[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: a survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[36] C. F. Daganzo, "The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory," Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.
[37] J. Haddad, M. Ramezani, and N. Geroliminis, "Cooperative traffic control of a mixed network with two urban regions and a freeway," Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.
[38] H. Mongeot and J.-B. Lesort, "Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory," Transportation Research Record, vol. 1710, pp. 58–68, 2000.
[39] Transport Simulation Systems, Aimsun User's Manual 6.1, TTS, Barcelona, Spain, 2010.
[40] Highways England, "Hatris Homepage," 2013, https://www.hatris.co.uk.
[41] E. Cascetta, "Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator," Transportation Research B, vol. 18, no. 4-5, pp. 289–299, 1984.
[42] L. Zhang and D. Levinson, "Balancing efficiency and equity of ramp meters," Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.
[43] A. Kotsialos and M. Papageorgiou, "Efficiency and equity properties of freeway network-wide ramp metering with AMOC," Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.
Figure 1: Dyna-Q architecture (acting, direct RL, model learning, and planning linking the value/policy, experience, and model components).
Figure 2: System architecture. Agents $i$, $i+1$, and $i+2$ each control one motorway section (cells $j$ to $j+8$). Agent $i+1$ maintains the Q value $Q^{k+1}_{i+1}(s^k_{i+1}, c^k_i, c^k_{i+1})$; its experience consists of (i) real traffic arrival and departure rates and (ii) information from agent $i$ ($c^k_i$, $q^k_{i,\mathrm{on}}$, $d^k_{j,\mathrm{main}}$, $d^k_{j,\mathrm{off}}$); its model comprises (i) an ACTM with estimated traffic arrival and departure rates of sections $i$ and $i+1$ and (ii) the action probability $p(c^k_i \mid s^k_{i+1})$.
Dyna-Q can learn faster than Q-learning in many situations [30].
Although a model is maintained in the Dyna-Q architecture, the whole system differs from model-based control methods such as model predictive control (MPC). The model in the Dyna-Q architecture is a complementary component used to speed up the learning process and to simplify the coordination of agents; the optimal control actions are learnt from both real and simulated experience. Without models, the Dyna-Q architecture is equivalent to Q-learning and can still work as a model-free system. MPC, on the other hand, depends on the model, which means it cannot work without one. Therefore, Dyna-Q can be considered a combination of model-free and model-based methods [27].
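To make the distinction concrete, the following is a minimal single-agent, tabular Dyna-Q sketch in the spirit of [30]; it is not the authors' multiagent implementation. Each real transition drives a direct Q-learning update and a model-learning step, followed by a fixed number of planning updates replayed from the learned model. It assumes a deterministic tabular model (the last observed reward and next state for each state-action pair) and a hypothetical toy chain task; the names `ChainEnv`, `dyna_q`, and `planning_steps` are illustrative only. Setting `planning_steps=0` removes the model's contribution and leaves plain Q-learning, mirroring the point made above.

```python
import random
from collections import defaultdict


class ChainEnv:
    """Hypothetical toy task: states 0..n-1 on a line; action 1 moves right, 0 moves left.
    Reaching the last state yields reward 1 and ends the episode."""

    def __init__(self, n: int = 6):
        self.n = n

    def step(self, state: int, action: int):
        nxt = min(state + 1, self.n - 1) if action == 1 else max(state - 1, 0)
        done = nxt == self.n - 1
        return (1.0 if done else 0.0), nxt, done


def dyna_q(env, episodes=30, planning_steps=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    """Tabular Dyna-Q; planning_steps=0 reduces it to plain Q-learning."""
    rng = random.Random(seed)
    actions = (0, 1)
    q = defaultdict(float)  # Q(s, a), initialised to 0
    model = {}              # learned deterministic model: (s, a) -> (reward, next_state, done)

    def greedy(s):
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])  # random tie-breaking

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _episode in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(actions) if rng.random() < eps else greedy(s)  # acting
            r, s2, done = env.step(s, a)                                  # real experience
            update(s, a, r, s2, done)                                     # direct RL
            model[(s, a)] = (r, s2, done)                                 # model learning
            for _plan in range(planning_steps):                           # planning
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q
```

For example, `q = dyna_q(ChainEnv())` followed by comparing `q[(0, 1)]` with `q[(0, 0)]` shows whether moving right from the start state has been learnt as the better action.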
3.2 System Architecture. Each agent in the motorway control system is designed on the basis of the Dyna-Q architecture and controls one prespecified motorway section.
A simplified motorway segment is shown in Figure 2 for analysis. This segment contains three motorway sections ($i$, $i+1$, $i+2$) with detectors located at their boundaries. Each motorway section is divided into a number of cells ($j, j+1, \ldots, j+8$) according to its layout and geometric features. Three kinds of cells exist in the motorway: on-ramp cells linked with on-ramps ($j+2$, $j+5$, $j+8$), off-ramp cells linked with off-ramps ($j$, $j+3$, $j+6$), and normal cells ($j+1$, $j+4$, $j+7$). In this paper, each motorway section is assumed to have at most one on-ramp cell.
The typical Dyna-Q architecture presented in Figure 2 is detailed here for each agent, taking agent $i+1$ as an example. Its experience consists of the traffic arrival and departure rates observed from the detectors of motorway section $i+1$, together with the information received from agent $i$, and is applied to improve the models. In the model component, two models are maintained. An asymmetric cell transmission model (ACTM) with estimated traffic arrival and departure rates is used to simulate the traffic flow dynamics in the relevant motorway sections, and a probability model of the action selection of agent $i$ in the current state is updated for the subsequent planning process.

Figure 3: Fundamental diagram during the incident: (a) incident extent in the critical section; (b) flow-density relationship under normal and incident conditions, with maximum departure flows $d^{\max}_{j,\mathrm{main}}$ and $d^{\mathrm{In},\max}_{j,\mathrm{main}}$, congestion wave speeds $w_j$ and $w^{\mathrm{In}}_j$, critical densities $\rho_{j,C}$ and $\rho^{\mathrm{In}}_{j,C}$, and jam densities $\rho_{j,J}$ and $\rho^{\mathrm{In}}_{j,J}$.
To reduce the complexity of MARL, as in many real applications, some conventions are used to restrict the action selection of an agent [28]. Specifically, in our design each agent communicates only with its spatial neighbours. For instance, agent $i+1$ receives the control action and traffic information from agent $i$ and sends its own information to agent $i+2$. For the case shown in Figure 2, we assume that motorway section $i$ is the critical section where an incident occurs. In this situation, agent $i$ plays a more important role than the other agents in dealing with the incident. Agent $i$ can be considered the chief controller, making decisions according to its own knowledge of the traffic and incident situation, while the other agents regulate their control policies based on the reaction of agent $i$.
Therefore, two kinds of Q values are defined for the two kinds of agents. If motorway section $i$ is the critical section, the Q value of agent $i$ depends only on its own state and action space and is updated by the same equation as (2).

If motorway section $i$ is a normal section without incidents, the Q value of agent $i$ is updated by
$$
\begin{aligned}
Q^{k+1}_i\left(s^k_i, c^k_{i-1}, c^k_i\right) &= Q^k_i\left(s^k_i, c^k_{i-1}, c^k_i\right) \\
&\quad + \alpha\Big[R^k_i\left(s^k_i, c^k_{i-1}, c^k_i\right) + \gamma \max_{c^{k+1}_i} \sum_{c^{k+1}_{i-1}} p\left(c^{k+1}_{i-1} \mid s^{k+1}_i\right) \cdot Q^k_i\left(s^{k+1}_i, c^{k+1}_{i-1}, c^{k+1}_i\right) - Q^k_i\left(s^k_i, c^k_{i-1}, c^k_i\right)\Big], \\
p\left(c^{k+1}_{i-1} \mid s^{k+1}_i\right) &= \frac{\operatorname{count}\left(s^{k+1}_i, c^{k+1}_{i-1}\right)}{\sum_{c_{i-1} \in C_{i-1}} \operatorname{count}\left(s^{k+1}_i, c_{i-1}\right)}
\end{aligned}
\tag{4}
$$
where $R^k_i(s^k_i, c^k_{i-1}, c^k_i)$ is the immediate reward obtained by agent $i$ at time step $k$, and $c^k_{i-1}$ and $c^k_i$ are the actions executed by agents $i-1$ and $i$. Similarly, $Q^{k+1}_i(s^k_i, c^k_{i-1}, c^k_i)$ and $Q^k_i(s^k_i, c^k_{i-1}, c^k_i)$ are the Q values of agent $i$ at steps $k+1$ and $k$, respectively. $C_{i-1}$ is the action set of agent $i-1$, and $\operatorname{count}(s^{k+1}_i, c^{k+1}_{i-1})$ returns the number of visits to the state-action pair $(s^{k+1}_i, c^{k+1}_{i-1})$. Thus, $p(c^{k+1}_{i-1} \mid s^{k+1}_i)$ is the probability of agent $i-1$ selecting action $c^{k+1}_{i-1}$ in state $s^{k+1}_i$. The models and related symbols shown in Figure 2 are specified in the following section.
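The update in (4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: states and actions are hashable labels, the Q table of a normal-section agent is a dictionary keyed by (state, neighbour action, own action), and the neighbour model is the visit-count estimate of $p(c_{i-1} \mid s_i)$ from (4). The class `CoordinatedQ` and its methods are hypothetical, the default $\alpha$ and $\gamma$ are placeholders rather than the paper's settings, and falling back to a uniform distribution for a never-visited state is an added assumption, since the count ratio in (4) is undefined there.

```python
from collections import defaultdict
from typing import Hashable, Sequence


class CoordinatedQ:
    """Q table of a normal-section agent i that conditions on its upstream neighbour's action."""

    def __init__(self, own_actions: Sequence[Hashable], nb_actions: Sequence[Hashable],
                 alpha: float = 0.1, gamma: float = 0.9):
        self.own_actions = list(own_actions)  # action set of agent i
        self.nb_actions = list(nb_actions)    # C_{i-1}: action set of agent i-1
        self.alpha, self.gamma = alpha, gamma
        self.q = defaultdict(float)           # (s, c_{i-1}, c_i) -> Q value
        self.count = defaultdict(int)         # (s, c_{i-1}) -> number of visits

    def observe_neighbour(self, state, nb_action) -> None:
        """Record that agent i-1 chose nb_action while agent i was in state."""
        self.count[(state, nb_action)] += 1

    def p_nb(self, state, nb_action) -> float:
        """Count-based estimate of p(c_{i-1} | s_i); uniform for an unseen state (assumption)."""
        total = sum(self.count[(state, c)] for c in self.nb_actions)
        if total == 0:
            return 1.0 / len(self.nb_actions)
        return self.count[(state, nb_action)] / total

    def update(self, s, c_nb, c_own, reward, s_next) -> None:
        """One application of (4) for the transition (s, c_{i-1}, c_i) -> s_next."""
        # Expected next value: max over own action of the neighbour-weighted Q values.
        expected_next = max(
            sum(self.p_nb(s_next, cn) * self.q[(s_next, cn, co)] for cn in self.nb_actions)
            for co in self.own_actions
        )
        td = reward + self.gamma * expected_next - self.q[(s, c_nb, c_own)]
        self.q[(s, c_nb, c_own)] += self.alpha * td
```

Weighting the next-state Q values by the neighbour's observed action frequencies is what lets agent $i$ anticipate agent $i-1$ without a joint maximisation over both agents' actions.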
4 Modified Asymmetric Cell Transmission Model
A first-order macroscopic traffic flow model, the asymmetric cell transmission model (ACTM), is applied as one of the models in the Dyna-Q architecture. This model is derived from the widely used cell transmission model (CTM) [36] and has been used for ramp control problems [11, 37]. In this paper, we modify the ACTM to incorporate the traffic dynamics under incident conditions.
4.1 Traffic Dynamics during the Incident. As shown in Figure 3(a), when an incident happens in the critical section, one or more lanes of the motorway are blocked, depending on the incident extent. Because of the lane blockage, the incident may reduce the normal road capacity and spatial storage space, which produces a new relationship between traffic flow and road density, that is, the fundamental diagram presented in Figure 3(b). As suggested by [38], additional parameters can be used to adjust the fundamental diagram for incident situations. We introduce three parameters ($\lambda_1, \lambda_2, \lambda_3 \in [0, 1]$) to reflect these new dynamics, defined as $\lambda_1 = v^{\mathrm{In}}_j / v_j$, $\lambda_2 = w^{\mathrm{In}}_j / w_j$, and $\lambda_3 = d^{\mathrm{In},\max}_{j,\mathrm{main}} / d^{\max}_{j,\mathrm{main}}$. Here, $v_j$ and $w_j$ are the free-flow speed and congestion wave speed of cell $j$, $d^{\max}_{j,\mathrm{main}}$ is the maximum departure flow of cell $j$, and $v^{\mathrm{In}}_j$, $w^{\mathrm{In}}_j$, and $d^{\mathrm{In},\max}_{j,\mathrm{main}}$ are these three variables during the incident. $\rho_{j,C}$ and $\rho^{\mathrm{In}}_{j,C}$ are the critical densities for normal and incident situations, and $\rho_{j,J}$ and $\rho^{\mathrm{In}}_{j,J}$ are the jam densities for normal and incident situations.
4.2 Modified ACTM. Given the three incident-related parameters, the traffic dynamics in each cell can be derived from the fundamental diagram illustrated in Figure 3(b) and represented by the following equations.

Departure rates of the mainline and on-ramp:
$$
\begin{aligned}
d^k_{j,\mathrm{main}} &= \min\Big\{\lambda_1 \cdot \frac{v_j}{l_j} \cdot \left(q^k_{j,\mathrm{main}} + \theta_j \cdot d^k_{j,\mathrm{on}} \cdot \Delta t - d^k_{j,\mathrm{off}} \cdot \Delta t\right), \\
&\qquad\quad \lambda_2 \cdot \frac{w_{j-1}}{l_j} \cdot \left(q^{\max}_{j-1,\mathrm{main}} - q^k_{j-1,\mathrm{main}} - \theta_j \cdot d^k_{j-1,\mathrm{on}} \cdot \Delta t\right),\ \lambda_3 \cdot d^{\max}_{j,\mathrm{main}}\Big\}, \\
d^k_{j,\mathrm{on}} &= \begin{cases}
\min\left\{\dfrac{q^k_{j,\mathrm{on}} + a^k_{j,\mathrm{on}} \cdot \Delta t}{\Delta t},\ \dfrac{\eta_j \cdot \left(q^{\max}_{j,\mathrm{main}} - q^k_{j,\mathrm{main}}\right)}{\Delta t},\ \dfrac{c^k_i}{\Delta t}\right\} & \text{if } j \text{ is a metered on-ramp cell}, \\[2ex]
\min\left\{\dfrac{q^k_{j,\mathrm{on}} + a^k_{j,\mathrm{on}} \cdot \Delta t}{\Delta t},\ \dfrac{\eta_j \cdot \left(q^{\max}_{j,\mathrm{main}} - q^k_{j,\mathrm{main}}\right)}{\Delta t}\right\} & \text{if } j \text{ is an unmetered on-ramp cell}
\end{cases}
\end{aligned}
\tag{5}
$$
Conservation of the mainline and on-ramp:

$$
\begin{aligned}
q^{k+1}_{j,\mathrm{main}} &= q^k_{j,\mathrm{main}} + \Delta t \cdot \left(a^k_{j,\mathrm{main}} + d^k_{j,\mathrm{on}} - d^k_{j,\mathrm{main}} - d^k_{j,\mathrm{off}}\right), \\
q^{k+1}_{j,\mathrm{on}} &= q^k_{j,\mathrm{on}} + \Delta t \cdot \left(a^k_{j,\mathrm{on}} - d^k_{j,\mathrm{on}}\right)
\end{aligned}
\tag{6}
$$
where $a^k_{j,\mathrm{main}}$ and $d^k_{j,\mathrm{main}}$ are the mainline arrival and departure rates of cell $j$ at step $k$; $a^k_{j,\mathrm{on}}$ and $d^k_{j,\mathrm{on}}$ are the on-ramp arrival and departure rates of cell $j$ at step $k$; and $d^k_{j,\mathrm{off}}$ is the off-ramp departure rate of cell $j$ at step $k$ (if cell $j$ is not an off-ramp cell, $d^k_{j,\mathrm{off}} = 0$). $q^k_{j,\mathrm{main}}$ represents the number of vehicles on the mainline of cell $j$ at step $k$, and $q^{\max}_{j,\mathrm{main}}$ is the maximum value of this number, limited by the mainline space of cell $j$. Similarly, $q^k_{j,\mathrm{on}}$ and $q^{\max}_{j,\mathrm{on}}$ denote the current (at step $k$) and maximum numbers of vehicles in the on-ramp of cell $j$, respectively. $\Delta t$ (min) is the time duration between two consecutive time steps, $c^k_i$ is the metering rate for the on-ramp cell of the $i$th motorway section at step $k$, $\eta_j \in [0, 1]$ is the flow allocation parameter of cell $j$, and $\theta_j \in [0, 1]$ is the flow blending parameter of traffic flow from the on-ramp to the mainline of cell $j$. The unit of all arrival and departure rates is converted to veh/min in this study.
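One ACTM update for a single cell, following (5) and (6), might look like the Python sketch below. It is illustrative only, under these assumptions: the dataclass and function names are hypothetical; $l_j$ is read as the length of cell $j$, which this section does not define; the upstream-cell quantities in the second term of (5) are passed in as plain numbers; the first on-ramp term uses the on-ramp queue $q^k_{j,\mathrm{on}}$, consistent with the on-ramp conservation in (6); and the metering term of (5) is supplied directly as a rate `meter_cap` (veh/min), or `None` for an unmetered on-ramp. Rates are in veh/min and $\Delta t$ in minutes, as in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CellParams:
    v: float           # free-flow speed v_j (km/min)
    w_up: float        # congestion wave speed w_{j-1} in the second term of (5) (km/min)
    length: float      # cell length l_j (km), an assumed reading of l_j
    d_max: float       # maximum mainline departure flow d^max_{j,main} (veh/min)
    q_max_main: float  # mainline storage q^max_{j,main} (veh)
    eta: float         # flow allocation parameter eta_j in [0, 1]
    theta: float       # flow blending parameter theta_j in [0, 1]


def actm_step(p: CellParams, lam: Tuple[float, float, float], dt: float,
              q_main: float, q_on: float, a_main: float, a_on: float, d_off: float,
              q_max_prev: float, q_prev: float, d_on_prev: float,
              meter_cap: Optional[float] = None):
    """Return (d_main, d_on, q_main_next, q_on_next) for cell j at step k.

    q_max_prev, q_prev, d_on_prev stand for q^max_{j-1,main}, q^k_{j-1,main}, d^k_{j-1,on}.
    lam = (lambda1, lambda2, lambda3); (1, 1, 1) is assumed outside the incident section.
    """
    l1, l2, l3 = lam
    # On-ramp departure: queue plus arrivals, limited by the mainline space allocated to it.
    d_on = min((q_on + a_on * dt) / dt, p.eta * (p.q_max_main - q_main) / dt)
    if meter_cap is not None:  # metered on-ramp cell: also limited by the metering term
        d_on = min(d_on, meter_cap)
    # Mainline departure: sending flow, receiving flow, and (reduced) capacity, as in (5).
    d_main = min(
        l1 * p.v / p.length * (q_main + p.theta * d_on * dt - d_off * dt),
        l2 * p.w_up / p.length * (q_max_prev - q_prev - p.theta * d_on_prev * dt),
        l3 * p.d_max,
    )
    # Conservation of vehicles, as in (6).
    q_main_next = q_main + dt * (a_main + d_on - d_main - d_off)
    q_on_next = q_on + dt * (a_on - d_on)
    return d_main, d_on, q_main_next, q_on_next
```

Setting $\lambda_3 < 1$ in `lam` lowers the capacity term, which is how the incident-reduced fundamental diagram of Section 4.1 enters the cell update.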
For motorway section $i$ with $J$ cells, the number of vehicles on the mainline is $q^k_{i,\mathrm{main}} = \sum_{j=1}^{J} q^k_{j,\mathrm{main}}$, while the number of vehicles in the on-ramp of motorway section $i$ is $q^k_{i,\mathrm{on}} = q^k_{j,\mathrm{on}}$, with $j$ the on-ramp cell of the section. In the same way, the maximum numbers of vehicles on the mainline and on-ramp of motorway section $i$ are $q^{\max}_{i,\mathrm{main}} = \sum_{j=1}^{J} q^{\max}_{j,\mathrm{main}}$ and $q^{\max}_{i,\mathrm{on}} = q^{\max}_{j,\mathrm{on}}$.
4.3 Estimation of Arrival and Departure Rates. The arrival rates of the boundary cells of each motorway section (such as $j+2$, $j+5$, and $j+8$) and of all the on-ramps, as well as the departure rates of the off-ramps, are inputs of the ACTM for each planning step between two real control steps. Considering the short duration of the planning process (10 steps), we assume these rates remain stable during planning and estimate them directly from the recent flow data collected from detectors. The method described by Wang [16] is used here for the estimation; it simply averages the most recently observed data to obtain the predicted flow rates. In our model, we use the flow data collected over the last $N$ time steps ($N = 5$). Therefore, these three rates can be calculated by
$$
\begin{aligned}
a^{k,k+1}_{i,\mathrm{main}} &= a^{k,k+1}_{j,\mathrm{main}} = \frac{\sum_{n=0}^{N-1} a^{k-n}_{j,\mathrm{main}}}{N} && \text{if } j \text{ is the boundary cell}, \\
a^{k,k+1}_{i,\mathrm{on}} &= a^{k,k+1}_{j,\mathrm{on}} = \frac{\sum_{n=0}^{N-1} a^{k-n}_{j,\mathrm{on}}}{N} && \text{if } j \text{ is the on-ramp cell}, \\
d^{k,k+1}_{j,\mathrm{off}} &= \frac{\sum_{n=0}^{N-1} d^{k-n}_{j,\mathrm{off}}}{N} && \text{if } j \text{ is the off-ramp cell}
\end{aligned}
\tag{7}
$$
where $a^{k,k+1}_{j,\mathrm{main}}$ and $a^{k,k+1}_{j,\mathrm{on}}$ are the estimated mainline and on-ramp arrival rates of cell $j$ for the planning steps between real steps $k$ and $k+1$, and $d^{k,k+1}_{j,\mathrm{off}}$ is the estimated off-ramp departure rate of cell $j$. If cell $j$ is the boundary cell of motorway section $i$, the arrival or departure rate of this cell is also the arrival or departure rate of motorway section $i$.
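Each line of (7) is a plain moving average over the last $N$ detector observations ($N = 5$ here), held constant across the planning steps. A minimal Python sketch, with the hypothetical class name `RateEstimator`; how the first few steps are handled, before $N$ observations exist, is not stated in the text, so averaging over whatever is available is an assumption of this sketch.

```python
from collections import deque


class RateEstimator:
    """Moving-average forecast of one detector rate, as in (7): mean of the last N observations."""

    def __init__(self, n: int = 5):
        self.window = deque(maxlen=n)  # keeps only the N most recent rates

    def observe(self, rate: float) -> None:
        self.window.append(rate)

    def estimate(self) -> float:
        """Rate assumed constant over the planning steps between real steps k and k+1."""
        if not self.window:
            raise ValueError("no observations yet")
        # Assumption: before N observations exist, average over those available.
        return sum(self.window) / len(self.window)
```

One estimator per input (boundary-cell mainline arrivals, on-ramp arrivals, off-ramp departures) would be fed with each real detector reading before planning starts.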
5 Definition of RL Elements
Besides the architecture and models defined in Section 3, three basic elements, namely the environment state, the control action, and the reward function, must be specified to form an RL problem. This section details these three elements and the relevant algorithm.
5.1 Environment State. The environment state of a motorway section is composed of mainline states and on-ramp states. The same method as in [27, 29] is used here to obtain the state space. For the mainline of motorway section $i$, the number of vehicles ranges from 0 to the maximum number $q^{\max}_{i,\mathrm{main}}$, which is uniformly divided into $n_i$ intervals, each representing a state of the mainline. Therefore, each mainline section can be represented by a state set $S_{i,\mathrm{main}}$ with $n_i$ states. Similarly, on-ramp traffic is represented by a state set $S_{i,\mathrm{on}}$ with $m_i$ states according to the maximum number of vehicles $q^{\max}_{i,\mathrm{on}}$. $n_i$ and $m_i$ should be adjusted for different motorway sections according to the section length. In this way, if motorway section $i$ is the critical section, the external traffic environment is represented by
$$
S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}}, \quad s^k_i \in S_i \tag{8}
$$
which contains $n_i \cdot m_i$ states. At each time step, a state $s^k_i$ is selected from $S_i$ as the environment state. If motorway section $i$ is a normal section, the state sets of its neighbour agent are also incorporated, so the traffic state is represented by
$$
S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}} \times S_{i-1,\mathrm{main}} \times S_{i-1,\mathrm{on}}, \quad s^k_i \in S_i \tag{9}
$$
which contains $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ states.
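The discretisation behind (8) and (9) can be sketched as follows, assuming equal-width intervals and that a count equal to the maximum falls in the last interval. The function names, the tuple layout, and the row-major index encoding of the Cartesian products are hypothetical choices made for illustration, not taken from the paper.

```python
def interval_index(count: float, max_count: float, n_intervals: int) -> int:
    """Map a vehicle count in [0, max_count] to one of n_intervals equal-width intervals."""
    if max_count <= 0 or n_intervals <= 0:
        raise ValueError("max_count and n_intervals must be positive")
    clipped = min(max(count, 0.0), max_count)  # guard against out-of-range detector counts
    return min(int(clipped / max_count * n_intervals), n_intervals - 1)  # max -> last interval


def section_state(q_main: float, q_max_main: float, n: int,
                  q_on: float, q_max_on: float, m: int) -> int:
    """Index of s_i in S_i = S_main x S_on, as in (8); takes values 0 .. n*m - 1."""
    return interval_index(q_main, q_max_main, n) * m + interval_index(q_on, q_max_on, m)


def normal_section_state(own: tuple, upstream: tuple) -> int:
    """Index in S_main x S_on x S_{i-1,main} x S_{i-1,on}, as in (9).

    Both arguments are hypothetical (q_main, q_max_main, n, q_on, q_max_on, m) tuples.
    """
    n_up, m_up = upstream[2], upstream[5]
    return section_state(*own) * (n_up * m_up) + section_state(*upstream)
```

Encoding each product state as a single integer keeps the Q table a flat array of $n_i \cdot m_i$ (or $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$) rows, matching the state counts given above.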
5.2 Control Action. In a ramp control problem, the aim of the control action is to regulate the number of vehicles entering the mainline in each control step. Similar to [29], we adopt flow control as the control action, represented by the action set $C = \{4, 6, 8, 10, 12, 14, 16, 18, 20\}$, with 9 flow rates between the minimum (4 veh/min) and maximum (20 veh/min) values.
Exploitation and exploration are two basic behaviours ofthe RL agent Exploitation means the agent takes the controlaction that can get the most rewards from the previousexperience Exploration instead means the agent tries newactions with less rewards In order to balance these twobehaviours we use the 120576-greedy policy to select controlactions [30] Specifically this policy takes a random actionwith probability 120576 and chooses the greedy action (with themaximum119876 value)with probability 1minus120576 for each control step
The action selection probability can be formally expressed as

$$p\left(c_i^k \mid s_i^k\right) = \begin{cases} 1-\varepsilon, & \text{if } c_i^k = \arg\max_{c_i^k} Q^{k-1}\left(s_i^k, c_i^k\right), \\ \varepsilon, & \text{otherwise}. \end{cases} \tag{10}$$
5.3 Reward Function. The reward function is used to calculate the immediate reward after executing a specific action at each time step, which guides the agent towards its objective. Considering a common objective of traffic control systems (i.e., minimising total travel time), we define our reward to guide the agent to minimise the total time spent (TTS) through the learning process.
TTS is defined as the total time spent by vehicles in the network during a period of time. In our case, TTS can be obtained from the following equation:

$$\mathrm{TTS} = \Delta t \cdot \sum_{k=0}^{K} \left(q_{i,\mathrm{main}}^k + q_{i,\mathrm{on}}^k\right). \tag{11}$$
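Equation (11) can be evaluated directly from per-step vehicle counts. The sketch below assumes the counts for section $i$ are given as two equal-length sequences over control steps $k = 0, \dots, K$; the 1-minute control step and the counts in the example are hypothetical, since this section does not state them.

```python
def total_time_spent(q_main, q_on, dt):
    """Equation (11): TTS = dt * sum_k (q_main[k] + q_on[k]).

    q_main, q_on: vehicles on the mainline and on the on-ramp of section i
                  at control steps k = 0..K (same length).
    dt: control step length; the result is in vehicle-(units of dt).
    """
    if len(q_main) != len(q_on):
        raise ValueError("q_main and q_on must cover the same control steps")
    return dt * sum(m + o for m, o in zip(q_main, q_on))


# Hypothetical counts over three control steps, dt = 1 minute = 1/60 h.
print(total_time_spent([100, 120, 110], [10, 20, 15], 1 / 60))   # veh*h
```

Because $\Delta t$ is a constant factor, ranking control policies by this value is the same as ranking them by the summed vehicle counts.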
In (11), $\Delta t$ is a fixed value; therefore, minimising TTS is equivalent to minimising the number of vehicles on the network, $\sum_{k=0}^{K}\left(q_{i,\mathrm{main}}^k + q_{i,\mathrm{on}}^k\right)$. To minimise this value, the reward function defined here is composed of two negative rewards, which act as penalties for vehicles on the mainline and on the on-ramp. The formal reward function at step $k$ is defined according to two situations.
(1) Motorway Section $i$ Is the Critical Section. Consider

$$R_i^k\left(s_i^k, c_i^k\right) = \begin{cases} -\dfrac{q_{i,\mathrm{main}}^k + q_{i,\mathrm{on}}^k}{q_{i,\mathrm{main}}^{\max} + q_{i,\mathrm{on}}^{\max}}, & \text{if } q_{i,\mathrm{main}}^k < q_{i,\mathrm{main}}^{\max},\ q_{i,\mathrm{on}}^k < q_{i,\mathrm{on}}^{\max}, \\ -1, & \text{otherwise}, \end{cases} \tag{12}$$

where $R_i^k(s_i^k, c_i^k)$ is the immediate reward for agent $i$ in state $s_i^k$ when executing action $c_i^k$ at control step $k$, and $q_{i,\mathrm{main}}^{\max}$ and $q_{i,\mathrm{on}}^{\max}$ are used to normalise the number of vehicles on the mainline and on the on-ramp, which guarantees that $R_i^k(s_i^k, c_i^k) \in [-1, 0]$.
(2) Motorway Section $i$ Is Not the Critical Section. Here, a new negative reward is introduced to maintain system equity, that is, to ensure that the on-ramp queues, and hence the related travel times, at different on-ramps are close to each other:

$$R_i^k\left(s_i^k, c_{i-1}^k, c_i^k\right) = \begin{cases} -\dfrac{q_{i,\mathrm{main}}^k + q_{i,\mathrm{on}}^k}{q_{i,\mathrm{main}}^{\max} + q_{i,\mathrm{on}}^{\max}} - \dfrac{\left|q_{i,\mathrm{on}}^k - q_{i-1,\mathrm{on}}^k\right|}{\max\left(q_{i,\mathrm{on}}^{\max}, q_{i-1,\mathrm{on}}^{\max}\right)}, & \text{if } q_{i,\mathrm{main}}^k < q_{i,\mathrm{main}}^{\max},\ q_{i,\mathrm{on}}^k < q_{i,\mathrm{on}}^{\max}, \\ -2, & \text{otherwise}. \end{cases} \tag{13}$$
For each agent $i$ and episode do
    $L \leftarrow \lceil \text{IncidentDuration} / \Delta t \rceil$
    IF $i$ is the critical section: initialise $R_i^0(s_i, c_i)$, $Q_i^0(s_i, c_i)$
    ELSE: initialise $R_i^0(s_i, c_{i-1}, c_i)$, $Q_i^0(s_i, c_{i-1}, c_i)$, $P_i^0(s_i, c_{i-1})$
    For each control step $k \in K$ do (Loop I)
        (i) get detected data from each cell $j$: $a_{j,\mathrm{main}}^k$, $d_{j,\mathrm{main}}^k$, $d_{j,\mathrm{off}}^k$, $a_{j,\mathrm{on}}^k$, $d_{j,\mathrm{on}}^k$
        (ii) get state $s_i^k$ through (8) and (9)
        (iii) get action $c_i^k$ by the $\varepsilon$-greedy policy (10)
        (iv) get $q_{j,\mathrm{main}}^k$, $q_{j,\mathrm{on}}^k$ through (6), and set $q_{i,\mathrm{main}}^k \leftarrow \sum_{j=1}^{J} q_{j,\mathrm{main}}^k$, $q_{i,\mathrm{on}}^k \leftarrow q_{j,\mathrm{on}}^k$
        IF $i$ is the critical section: update $R_i^k(s_i^k, c_i^k)$, $Q_i^k(s_i^k, c_i^k)$ through (2) and (12)
        ELSE: update $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$, $Q_i^k(s_i^k, c_{i-1}^k, c_i^k)$, $p(c_{i-1}^k \mid s_i^k)$ through (4) and (13)
        IF $s_i^k = s_{\text{initial}}$ and $k + 1 \ge L$: end the algorithm
        ELSE: get $a_{i,\mathrm{main}}^{k,k+1}$, $a_{i,\mathrm{on}}^{k,k+1}$, $d_{j,\mathrm{off}}^{k,k+1}$ by (7); set $l \leftarrow k$, $s_i^l \leftarrow s_i^k$, $q_{j,\mathrm{main}}^l \leftarrow q_{j,\mathrm{main}}^k$, $q_{j,\mathrm{on}}^l \leftarrow q_{j,\mathrm{on}}^k$; and start Loop II
        For each planning step $l \in L$ do (Loop II)
            (i) generate flow rates for each cell $j$: $d_{j,\mathrm{main}}^l$, $d_{j,\mathrm{on}}^l$ through (5)
            (ii) get the state $s_i^l$
            (iii) get $q_{j,\mathrm{main}}^l$, $q_{j,\mathrm{on}}^l$, and set $q_{i,\mathrm{main}}^l \leftarrow \sum_{j=1}^{J} q_{j,\mathrm{main}}^l$, $q_{i,\mathrm{on}}^l \leftarrow q_{j,\mathrm{on}}^l$
            (iv) get action $c_i^l$ by the $\varepsilon$-greedy policy
            IF $i$ is the critical section: update $R_i^l(s_i^l, c_i^l)$, $Q_i^l(s_i^l, c_i^l)$
            ELSE: update $R_i^l(s_i^l, c_{i-1}^l, c_i^l)$, $Q_i^l(s_i^l, c_{i-1}^l, c_i^l)$, $p(c_{i-1}^k \mid s_i^k)$
            IF ($l = k + 9$) or ($s_i^l = s_{\text{initial}}$ and $l + 1 \ge L$): go back to Loop I
            ELSE: repeat Loop II
        EndFor
    EndFor
EndFor

Algorithm 1: Algorithm for Dyna-MARL.
Compared to (12), a new term $\left|q_{i,\mathrm{on}}^k - q_{i-1,\mathrm{on}}^k\right| / \max\left(q_{i,\mathrm{on}}^{\max}, q_{i-1,\mathrm{on}}^{\max}\right)$ is added into (13), which penalises the on-ramp queue difference between motorway sections $i$ and $i-1$. Because two adjacent agents cooperate in this situation, $R_i^k(s_i^k, c_{i-1}^k, c_i^k)$ is related to two control actions, $c_{i-1}^k$ and $c_i^k$. The function $\max(\cdot,\cdot)$ returns the larger of its two arguments and is used for normalisation.
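The two reward cases can be written as plain functions. This is a minimal sketch of (12) and (13) as stated; the function names are hypothetical. Since both normalised terms in (13) lie in $[0, 1]$ when the first condition holds, the non-critical reward stays in $[-2, 0]$. The maxima in the example are the case-study values for sections 1 and 2; the vehicle counts are illustrative only.

```python
def reward_critical(q_main, q_on, q_main_max, q_on_max):
    """Equation (12): normalised vehicle-count penalty in [-1, 0] for the critical section."""
    if q_main < q_main_max and q_on < q_on_max:
        return -(q_main + q_on) / (q_main_max + q_on_max)
    return -1.0  # mainline or on-ramp storage exhausted


def reward_normal(q_main, q_on, q_main_max, q_on_max, q_on_prev, q_on_prev_max):
    """Equation (13): adds a queue-difference (equity) penalty w.r.t. neighbour i-1."""
    if q_main < q_main_max and q_on < q_on_max:
        congestion = (q_main + q_on) / (q_main_max + q_on_max)
        imbalance = abs(q_on - q_on_prev) / max(q_on_max, q_on_prev_max)
        return -congestion - imbalance
    return -2.0


# Section 1 as the critical section (q_max_main = 1860, q_max_on = 108).
print(reward_critical(930, 54, 1860, 108))              # -0.5
# Section 2 (q_max_main = 2880, q_max_on = 90) with neighbour section 1 (q_max_on = 108).
print(reward_normal(1440, 45, 2880, 90, 54, 108))
```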
5.4 Description of the Algorithm. Based on the Dyna-Q architecture and the RL elements defined in the previous subsections, an algorithm named Dyna-MARL is developed and described in this subsection. The two main loops of Dyna-MARL correspond to the direct RL and planning processes shown in Figure 1. Between two real control steps in Loop I, 10 planning steps are run in Loop II. The pseudocode of Dyna-MARL is given in Algorithm 1.
An episode in Dyna-MARL represents a control cycle, which starts when the incident occurs and terminates when the traffic state returns to the initial state $s_{\text{initial}}$, that is, the traffic state before the incident occurred. The incident duration is assumed to be known in advance.
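The Loop I / Loop II structure follows the textbook Dyna-Q pattern [30]: each real step performs a direct RL update and records the transition in a model, then several simulated steps replay that model. The sketch below is a minimal single-agent tabular Dyna-Q, not Dyna-MARL itself. Its environment, `chain_step`, is a hypothetical toy chain standing in for the traffic simulator, and planning replays stored transitions rather than generating traffic with the ACTM as Dyna-MARL does. The values $\alpha = 0.2$, $\gamma = 0.8$, $\varepsilon = 0.1$ and 10 planning steps mirror this paper's settings; everything else is assumed.

```python
import random
from collections import defaultdict


def chain_step(state, action, n_states=5):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.

    Every step costs -1 (like a congestion penalty) until the right-hand end,
    the 'recovered' state, is reached.
    """
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (0.0 if done else -1.0), done


def dyna_q(env_step=chain_step, start=0, actions=(0, 1), episodes=200,
           planning_steps=10, alpha=0.2, gamma=0.8, epsilon=0.1,
           max_steps=100, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)   # tabular Q[(state, action)], initialised to 0
    model = {}               # learned model: (state, action) -> (reward, next_state, done)

    def greedy(s):
        return max(actions, key=lambda a: q[(s, a)])

    def q_update(s, a, r, s2, done):
        # One-step Q-learning update; no bootstrapping from a terminal state.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            # Loop I: act in the real environment with an epsilon-greedy policy.
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            s2, r, done = env_step(s, a)
            q_update(s, a, r, s2, done)       # direct RL
            model[(s, a)] = (r, s2, done)     # model learning
            # Loop II: planning with simulated experience drawn from the model.
            observed = list(model)
            for _ in range(planning_steps):
                ps, pa = rng.choice(observed)
                q_update(ps, pa, *model[(ps, pa)])
            s = s2
            if done:
                break
    return q, greedy


q, policy = dyna_q()
print([policy(s) for s in range(4)])
```

The planning loop is what lets Dyna-style agents squeeze more learning out of each expensive real control step, which matters when an incident lasts only a limited number of steps.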
6 Case Study and Results
One of the metered motorway segments (southbound direction) of the M6 in the UK is chosen for the case study. This segment lies between junction 21A (J21A) and junction 25 (J25), with an approximate length of 12.4 km (see Figure 4). Taking the noncontrolled (NC) situation as the baseline, we designed a series of experiments to compare the proposed Dyna-MARL algorithm with Isolated RL (Q-learning without coordination). The experiments and relevant results are described as follows.
6.1 Partitions of the Test Segment. The test motorway segment, with a three-lane mainline, three metered on-ramps, and five off-ramps, is simulated by AIMSUN [39], which is a microscopic traffic simulation package. According to the detector locations and road layout, the whole segment is divided into three sections, each containing a metered on-ramp. Motorway section 3 is divided into 4 cells, and motorway sections 1 and 2 are both divided into 3 cells. The partitions of each section can be seen in Figure 5. According to the section length, the maximum number of vehicles in each mainline section and on-ramp is as follows: $q_{1,\mathrm{main}}^{\max} = 1860$, $q_{2,\mathrm{main}}^{\max} = 2880$, $q_{3,\mathrm{main}}^{\max} = 2880$, $q_{1,\mathrm{on}}^{\max} = 108$, $q_{2,\mathrm{on}}^{\max} = 90$, and $q_{3,\mathrm{on}}^{\max} = 120$.

Figure 4: Test motorway segment of M6 (junctions J21A to J25, with origins O, O1 (from A579), O2 (from A580), and O3 (from A49), and destinations D, D1 and D2 (to M62), D3 (to A579), D4 (to A580), and D5 (to A58)).

Figure 5: Partitions of test segment (sections 1 to 3 between J21A and J25, with section and cell boundaries, road section lengths in km, incident locations a, b, and c, and flow direction from O to D).
6.2 Real Data Source. Real detector data collected from 17 loop detectors located in the motorway segment (on both the mainline and the on-/off-ramps) are used for the case study; these data can be extracted from the Traffic Information System (HATRIS) [40]. The traffic count data are averaged from April 2012 to March 2013 at 15-minute intervals. Only working-day data (Monday to Friday) are used because of the dramatic reduction in traffic load at weekends. Some of the detector data collected from the mainline and the three on-ramps are presented in Figure 6, which shows two peak periods in the daily traffic operation: the AM peak period (around 07:00:00 to 09:00:00) and the PM peak period (around 16:00:00 to 18:00:00).
At the test site, ramp metering only operates during peak hours. Meanwhile, it is valuable to test the performance of the proposed algorithm in a high-demand situation: if it works under a high traffic load, it should also be useful in common situations. Therefore, the AM peak period with heavy traffic load is considered for the case study. Specifically, we use the averaged traffic data during the AM peak period collected from TRADS to estimate the OD (origins and destinations) matrix for the simulation. A model proposed in [41] is adopted by AIMSUN for the estimation, with the number of iterations set to 1000 to ensure convergence. Table 1 shows the OD matrix estimated from the real traffic data.

Figure 6: Real averaged traffic data (15-minute traffic counts by time of day for the mainline of sections 1 to 3 and for on-ramps 1 to 3).

Table 1: OD matrix estimated.

| Origins / destinations | D | D5 | D4 | D3 | D2 | D1 | Totals |
|---|---|---|---|---|---|---|---|
| O | 2089 | 375 | 686 | 728 | 1169 | 771 | 5818 |
| O3 | 875 | 65 | 212 | 193 | 117 | 46 | 1507 |
| O2 | 886 | 0 | 0 | 61 | 315 | 216 | 1477 |
| O1 | 824 | 0 | 0 | 0 | 292 | 226 | 1343 |
| Totals | 4675 | 440 | 898 | 981 | 1893 | 1258 | 10146 |

Table 2: Parameters for ACTM.

| Parameter | $d_{\mathrm{main}}^{\max}$ | $v$ | $w$ | $\theta$ | $\eta$ | $\lambda_1$ | $\lambda_2$ | $\lambda_3$ |
|---|---|---|---|---|---|---|---|---|
| Value | 6300 veh/h | 107 km/h | 11.6 km/h | 0.5 | 0.16 | 0.55 | 0.9 | 0.6 |
6.3 Incident Scenarios. Because real incident data are difficult to capture, we simulate incident scenarios in AIMSUN. To ensure that every ramp meter is active during the incident, the incident is located in the most downstream motorway section, that is, motorway section 1. Three incident scenarios, A, B, and C, are designed, corresponding to three different incident locations a, b, and c (as illustrated in Figure 5), respectively.
The simulation experiment lasts one and a half hours, from 07:00:00 to 08:30:00 during the AM peak period. After 30 minutes of normal operation (for warm-up), the incident is triggered at 07:30:00 and lasts for 30 minutes. In the preliminary experiments designed in this paper, an incident with one lane blocked is considered; the parameters introduced here can also be adjusted for multiple-lane blockages. The incident extent is 50 m, which is assumed to be constant during the incident.
Learning-related parameters are set to typical values [30], that is, $\alpha = 0.2$, $\gamma = 0.8$, and $\varepsilon = 0.1$. Other parameters related to the ACTM are calibrated and summarised in Table 2. All the cells have the same $\theta$ and $\eta$.
6.4 Results. Dyna-MARL, Isolated RL, and NC are compared from three aspects: density evolution, general indicators, and system equity. The experimental results are described as follows.
(1) Density Evolution. Figure 7 shows that four dense areas exist during the traffic operation. Three of them, near the on-ramp entrances (at motorway lengths of around 0.5 km, 5 km, and 10 km), are caused by heavy traffic loads from the on-ramps. The dense area close to the end of the segment forms because of the incident.
In scenario A, the incident is located close to on-ramp 1 (O1). Without control, this incident leads to severe congestion, which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). In this scenario, Isolated RL cannot alleviate the incident-induced congestion effectively (see Figure 7(b)). At the beginning of congestion formation, without coordination, only the nearest ramp controller reacts to the congestion. Because of the limited storage space of an on-ramp, one ramp controller is insufficient to dissolve the congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of the three on-ramps to deal with the incident-induced congestion. In this way, mainline congestion is restricted to a smaller area and does not propagate to motorway section 2 (see Figure 7(c)).
In scenarios B and C, the incidents are near the end of the motorway segment and far from on-ramp 1. Because they do not block on-ramp 1, these incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL work well in easing congestion on the mainline. As shown in Figures 7(e)–7(i), compared with the NC situation, both Isolated RL and Dyna-MARL restrict the congestion to a small range near the on-ramp entrances.
(2) General Indicators. In this comparison, general indicators, including total travel time (which should be reduced), total throughput (which should be improved), and total CO$_2$ emission (which should be reduced), are used to show how the proposed system can benefit road users. These indicators are widely used in the transport community to test the performance of newly developed traffic control systems.
As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b). Dyna-MARL improves the total throughput by up to 2.3% (see Figure 8(d)), outperforming Isolated RL in all three scenarios; in scenario B, Isolated RL even fails to improve the total throughput. For total CO$_2$ emission (shown in Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with reductions of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL performs much better than Isolated RL.
The above comparison shows that Dyna-MARL outperforms Isolated RL in almost all scenarios and indicators.
Figure 7: Density profiles (motorway density in veh/km/lane over motorway length in km and absolute time in min) for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.

(3) System Equity. Although the general indicators presented in comparison (2) are effective for testing the performance of different systems, they cannot measure system equity, which is also an important aspect of system performance. In this paper, we only consider spatial equity, which is defined as a measure of the equity of user delays at different on-ramps [42]. In this study, we assume that road users from all three on-ramps have the same importance. If users from different on-ramps experience similar travel times, the control system is defined as equitable; conversely, a large queue difference between on-ramps leads to a highly inequitable system. In [43], the variance of travel time at different on-ramps is used as an indicator to measure system equity. Similar to [43], and for the sake of comparison, the standard deviation is used in our case. This indicator is defined as

$$\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n}\left[\bar{t}^k - t_i^k\right]^2}{n}}, \tag{14}$$

where $\mathrm{SD}(k)$ is the standard deviation of the travel times at different on-ramps at time step $k$, $t_i^k$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^k$ is the averaged total travel time of the $n$ on-ramps at step $k$.
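Equation (14) is a population standard deviation: it divides by $n$, not $n-1$. A minimal sketch, assuming the per-on-ramp travel times at step $k$ are already available as a list (the values in the example are hypothetical):

```python
import math


def travel_time_sd(travel_times):
    """Equation (14): population standard deviation of on-ramp travel times at one step.

    travel_times: estimated total travel time t_i^k of each on-ramp i at step k.
    """
    n = len(travel_times)
    if n == 0:
        raise ValueError("need at least one on-ramp")
    mean = sum(travel_times) / n          # t-bar^k
    return math.sqrt(sum((mean - t) ** 2 for t in travel_times) / n)


# Hypothetical total travel times (h) of three on-ramps at one time step.
print(travel_time_sd([4.0, 5.0, 9.0]))
```

The standard library's `statistics.pstdev` computes the same quantity; `statistics.stdev` would divide by $n-1$ and therefore not match (14).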
The results of the system equity comparison are shown in Figure 9. For the NC situation, good equity is maintained in scenarios B and C because entering vehicles are not restricted (as shown in Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by the congestion in scenario A, a long queue forms and leads to imbalance and resultant inequity for users at different on-ramps (see Figure 9(a)). Among the controlled cases, Isolated RL performs poorly in all scenarios, because the ramp controller near the congestion restricts the entering traffic much more heavily than the other controllers do. Owing to its coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).
Figure 8: Comparison of general measures for different scenarios: (a) total travel time (h), (b) total throughput (veh), (c) total CO$_2$ emission (kg), and (d) reduction from NC (%) in total travel time, total throughput, and total CO$_2$ emission, for NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C.
7 Conclusions and Future Work
A Dyna-Q based multiagent reinforcement learning method for motorway ramp control, referred to as Dyna-MARL, has been developed in this paper. Dyna-MARL is compared with Isolated RL (Q-learning without coordination) and the noncontrolled situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.
Through a series of simulation-based experiments, we can conclude the following: (1) Isolated RL can improve motorway performance in terms of increasing total throughput and reducing total travel time and CO$_2$ emission, but this improvement comes at the expense of poor system equity across on-ramps; (2) with a suitable coordination strategy, much higher system equity can be achieved by Dyna-MARL; (3) in addition to system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios regarding all indicators, which means that Dyna-MARL can deal with network-wide problems effectively.
Although the simulation tests have shown positive results regarding the performance of Dyna-MARL, a simplified incident scenario with a fixed duration is considered in the current work. In practice, incident duration is highly variable and affected by a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Therefore, incident duration should be treated as an uncertainty, which will be investigated in our future work.
Figure 9: Standard deviation of on-ramp travel time (h) against time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL in different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This paper is supported by the China Scholarship Council and the University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.
References
[1] M. Papageorgiou and A. Kotsialos, "Freeway ramp metering: an overview," IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.
[2] A. Skabardonis, P. Varaiya, and K. F. Petty, "Measuring recurrent and nonrecurrent traffic congestion," Transportation Research Record, vol. 1856, pp. 118–124, 2003.
[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, "ALINEA: a local feedback control law for on-ramp metering," Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.
[4] E. Smaragdis and M. Papageorgiou, "Series of new local ramp metering strategies," Transportation Research Record, vol. 1856, pp. 74–86, 2003.
[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, "Real-time metering algorithm for centralized control," Transportation Research Record, vol. 1232, pp. 17–26, 1989.
[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, "System wide adaptive ramp metering (SWARM)," in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.
[7] R. Lau, Ramp Metering by Zone: The Minnesota Algorithm, Minnesota Department of Transportation, 1997.
[8] H. M. Zhang and W. W. Recker, "On optimal freeway ramp control policies for congested traffic corridors," Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.
[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, "Optimal coordinated ramp metering with advanced motorway optimal control," Transportation Research Record, no. 1748, pp. 55–65, 2001.
[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, "Model predictive control for optimal coordination of ramp metering and variable speed limits," Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.
[11] G. Gomes and R. Horowitz, "Optimal freeway ramp metering using the asymmetric cell transmission model," Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.
[12] A. H. F. Chow and Y. Li, "Robust optimization of dynamic motorway traffic via ramp metering," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.
[13] R. W. Hall, "Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?," Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.
[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, "Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models," Transportation Research Record, vol. 2047, pp. 57–65, 2008.
[15] T. L. Greenlee and H. J. Payne, "Freeway ramp metering strategies for responding to incidents," in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.
[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.
[17] J.-B. Sheu, "Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control," Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.
[18] J.-B. Sheu and M.-S. Chang, "Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.
[19] C. Jacob and B. Abdulhai, "Machine learning for multi-jurisdictional optimal traffic corridor control," Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.
[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, "Motorway ramp-metering control with queuing consideration using Q-learning," in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC '11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.
[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Application of reinforcement learning with continuous state space to ramp metering in real-world conditions," in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC '12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.
[22] C. Jacob and B. Abdulhai, "Automated adaptive traffic corridor control using reinforcement learning: approach and case studies," Transportation Research Record, vol. 1959, pp. 1–8, 2006.
[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada," Transportation Research Record, vol. 2396, pp. 10–18, 2013.
[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, "Reinforcement learning ramp metering without complete information," Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.
[25] A. Fares and W. Gomaa, "Multi-agent reinforcement learning control for ramp metering," in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.
[26] K. Veljanovska, K. M. Bombol, and T. Maher, "Reinforcement learning technique in multiple motorway access control strategy design," PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.
[27] C. Lu, H. Chen, and S. Grant-Muller, "An indirect reinforcement learning approach for ramp control under incident-induced congestion," in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC '13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.
[28] L. Busoniu, R. Babuska, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.
[29] C. Lu, H. Chen, and S. Grant-Muller, "Indirect reinforcement learning for incident-responsive ramp control," Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.
[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[31] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[32] J. R. Kok and N. Vlassis, "Collaborative multiagent reinforcement learning by payoff propagation," Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.
[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, "Coordinated reinforcement learning," in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.
[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, "Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.
[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: a survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[36] C. F. Daganzo, "The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory," Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.
[37] J. Haddad, M. Ramezani, and N. Geroliminis, "Cooperative traffic control of a mixed network with two urban regions and a freeway," Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.
[38] H. Mongeot and J.-B. Lesort, "Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory," Transportation Research Record, vol. 1710, pp. 58–68, 2000.
[39] Transport Simulation Systems, Aimsun User's Manual 6.1, TSS, Barcelona, Spain, 2010.
[40] Highways England, "HATRIS Homepage," 2013, https://www.hatris.co.uk.
[41] E. Cascetta, "Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator," Transportation Research B, vol. 18, no. 4-5, pp. 289–299, 1984.
[42] L. Zhang and D. Levinson, "Balancing efficiency and equity of ramp meters," Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.
[43] A. Kotsialos and M. Papageorgiou, "Efficiency and equity properties of freeway network-wide ramp metering with AMOC," Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.
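Figure 3: Fundamental diagram during the incident: (a) incident extent within the critical section; (b) flow-density diagram of cell $j$ under normal and incident conditions (capacities $d_{j,\mathrm{main}}^{\max}$ and $d_{j,\mathrm{main}}^{\mathrm{In},\max}$, wave speeds $w_j$ and $w_j^{\mathrm{In}}$, and densities $\rho_{j,C}$, $\rho_{j,J}$, $\rho_{j,C}^{\mathrm{In}}$, $\rho_{j,J}^{\mathrm{In}}$).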
experience consists of the traffic arrival and departure rates observed by the detectors of motorway section $i + 1$, as well as the information received from agent $i$, and is used to improve the models. In the model component, two models are maintained. An asymmetric cell transmission model (ACTM) with estimated traffic arrival and departure rates is used to simulate the traffic flow dynamics in the relevant motorway sections. A probability model of the action selection of agent $i$ in the current state is updated for the subsequent planning process.
To reduce the complexity of MARL, as in many real applications, some conventions are used to restrict the action selection of an agent [28]. Specifically, in our design each agent communicates only with its spatial neighbours. For instance, agent $i + 1$ receives the control action and traffic information from agent $i$ and sends its own information to agent $i + 2$. For the case shown in Figure 2, we assume that motorway section $i$ is the critical section where an incident occurs. In this situation, agent $i$ plays a more important role than the other agents in dealing with the incident. Agent $i$ can be considered the chief controller, which makes decisions according to its own knowledge of the traffic and incident situation. The other agents should adjust their control policies based on the reaction of agent $i$.
Therefore, two kinds of $Q$ values are defined for the two kinds of agents. If motorway section $i$ is the critical section, the $Q$ value of agent $i$ depends only on its own state and action space and is updated by the same equation (2).

If motorway section $i$ is a normal section without incidents, the $Q$ value of agent $i$ is calculated by
$$
Q^{k+1}_i(s^k_i, c^k_{i-1}, c^k_i) = Q^k_i(s^k_i, c^k_{i-1}, c^k_i) + \alpha\Big[R^k_i(s^k_i, c^k_{i-1}, c^k_i) + \gamma \max_{c^{k+1}_i} \sum_{c^{k+1}_{i-1}} p(c^{k+1}_{i-1} \mid s^{k+1}_i)\, Q^k_i(s^{k+1}_i, c^{k+1}_{i-1}, c^{k+1}_i) - Q^k_i(s^k_i, c^k_{i-1}, c^k_i)\Big],
$$
$$
p(c^{k+1}_{i-1} \mid s^{k+1}_i) = \frac{\mathrm{count}(s^{k+1}_i, c^{k+1}_{i-1})}{\sum_{c_{i-1} \in C_{i-1}} \mathrm{count}(s^{k+1}_i, c_{i-1})} \qquad (4)
$$
where $R^k_i(s^k_i, c^k_{i-1}, c^k_i)$ is the immediate reward obtained by agent $i$ at time step $k$ when actions $c^k_{i-1}$ and $c^k_i$ are executed by agents $i-1$ and $i$, respectively. Similarly, $Q^{k+1}_i(s^k_i, c^k_{i-1}, c^k_i)$ and $Q^k_i(s^k_i, c^k_{i-1}, c^k_i)$ are the $Q$ values of agent $i$ at steps $k+1$ and $k$, respectively. $C_{i-1}$ is the action set of agent $i-1$, and $\mathrm{count}(s^{k+1}_i, c^{k+1}_{i-1})$ returns the number of visits to the state-action pair $(s^{k+1}_i, c^{k+1}_{i-1})$. Thus, $p(c^{k+1}_{i-1} \mid s^{k+1}_i)$ is the probability of agent $i-1$ selecting action $c^{k+1}_{i-1}$ in state $s^{k+1}_i$. The models and related symbols shown in Figure 2 are specified in the following section.
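To make the coordinated update (4) concrete, the following Python sketch implements it for one normal-section agent, with the neighbour's action probability estimated from visit counts. It is a minimal illustration under assumptions, not the authors' code: the class and method names are hypothetical, the count model is refreshed with the pair observed at the current step, and a uniform probability is assumed for states in which no neighbour action has been seen yet.

```python
from collections import defaultdict


class CoordinatedQAgent:
    """Tabular Q-learner for a normal (non-critical) section, sketching Eq. (4).

    Q is indexed by (state, upstream action, own action). The upstream agent's
    behaviour is modelled by visit counts, giving p(c_{i-1} | s_i).
    """

    def __init__(self, actions, upstream_actions, alpha=0.2, gamma=0.8):
        self.actions = list(actions)
        self.upstream_actions = list(upstream_actions)
        self.alpha = alpha
        self.gamma = gamma
        self.q = defaultdict(float)    # (s, c_up, c) -> Q value
        self.count = defaultdict(int)  # (s, c_up) -> number of visits

    def p_upstream(self, state, c_up):
        """Empirical probability that agent i-1 selects c_up in `state`."""
        total = sum(self.count[(state, a)] for a in self.upstream_actions)
        if total == 0:
            # Assumption: uniform guess before any observation in this state.
            return 1.0 / len(self.upstream_actions)
        return self.count[(state, c_up)] / total

    def target_value(self, state):
        """Max over own actions of the count-weighted Q value in `state`."""
        return max(
            sum(self.p_upstream(state, a_up) * self.q[(state, a_up, a)]
                for a_up in self.upstream_actions)
            for a in self.actions
        )

    def update(self, s, c_up, c, reward, s_next):
        """One update of Eq. (4) after observing (s, c_up, c) -> reward, s_next."""
        self.count[(s, c_up)] += 1  # refresh the neighbour-action model
        td_target = reward + self.gamma * self.target_value(s_next)
        self.q[(s, c_up, c)] += self.alpha * (td_target - self.q[(s, c_up, c)])
```

The default `alpha = 0.2` and `gamma = 0.8` match the learning parameters reported for the case study; the expectation over the neighbour's action is what distinguishes this update from the single-agent rule (2).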
4 Modified Asymmetric Cell Transmission Model
A first-order macroscopic traffic flow model, the asymmetric cell transmission model (ACTM), is applied as one of the models in the Dyna-Q architecture. This model is derived from the widely used cell transmission model (CTM) [36] and has been used for ramp control problems [11, 37]. In this paper, we modify the ACTM to incorporate the traffic dynamics under incident conditions.
4.1 Traffic Dynamics during the Incident. As shown in Figure 3(a), when an incident happens in the critical section, one or more lanes of the motorway are blocked, depending on the incident extent. Because of the lane blockage, the incident may reduce the normal road capacity and spatial storage space, which produces a new relationship between traffic flow and road density, that is, the fundamental diagram presented in Figure 3(b). As suggested by [38], additional parameters can be used to adjust the fundamental diagram for incident situations. We introduce three parameters, $\lambda_1, \lambda_2, \lambda_3 \in [0, 1]$, to reflect these new dynamics. They are defined as $\lambda_1 = v^{\mathrm{In}}_j / v_j$, $\lambda_2 = w^{\mathrm{In}}_j / w_j$, and $\lambda_3 = d^{\mathrm{In}}_{\max j,\mathrm{main}} / d_{\max j,\mathrm{main}}$, where $v_j$ and $w_j$ are the free-flow speed and congestion wave speed of cell $j$, $d_{\max j,\mathrm{main}}$ is the maximum departure flow of cell $j$, and $v^{\mathrm{In}}_j$, $w^{\mathrm{In}}_j$, and $d^{\mathrm{In}}_{\max j,\mathrm{main}}$ are the corresponding values during the incident. $\rho_{j,C}$ and $\rho^{\mathrm{In}}_{j,C}$ are the critical densities for normal and incident situations, and $\rho_{j,J}$ and $\rho^{\mathrm{In}}_{j,J}$ are the jam densities for normal and incident situations.
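The three ratios can be read as multiplicative scalings of the normal diagram. The Python sketch below applies them, assuming (as Figure 3(b) suggests but the text does not state) a trapezoidal diagram whose critical density is the capacity divided by the free-flow speed; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FundamentalDiagram:
    v: float      # free-flow speed (km/h)
    w: float      # congestion wave speed (km/h)
    d_max: float  # maximum departure flow, i.e. capacity (veh/h)

    @property
    def rho_c(self) -> float:
        """Critical density (veh/km): where the free-flow branch reaches d_max."""
        return self.d_max / self.v

    def flow(self, rho: float, rho_j: float) -> float:
        """Flow (veh/h) at density rho for a given jam density rho_j (veh/km)."""
        return max(0.0, min(self.v * rho, self.d_max, self.w * (rho_j - rho)))


def incident_diagram(normal: FundamentalDiagram, lam1: float, lam2: float,
                     lam3: float) -> FundamentalDiagram:
    """Scale v, w and d_max by lambda_1, lambda_2 and lambda_3 (each in [0, 1])."""
    for lam in (lam1, lam2, lam3):
        if not 0.0 <= lam <= 1.0:
            raise ValueError("lambda parameters must lie in [0, 1]")
    return FundamentalDiagram(v=lam1 * normal.v, w=lam2 * normal.w,
                              d_max=lam3 * normal.d_max)
```

For example, with $\lambda_1 = 0.55$ and $\lambda_3 = 0.6$ as in Table 2, a capacity of 6300 veh/h drops to 3780 veh/h and a free-flow speed of 107 km/h drops to about 58.9 km/h.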
4.2 Modified ACTM. Given the three incident-related parameters, the traffic dynamics in each cell can be derived from the fundamental diagram illustrated in Figure 3(b) and represented by the following equations.
Departure rates of the mainline and on-ramp:

$$
\begin{aligned}
d^k_{j,\mathrm{main}} &= \min\Big\{\lambda_1 \cdot \frac{v_j}{l_j}\big(q^k_{j,\mathrm{main}} + \theta_j d^k_{j,\mathrm{on}}\Delta t - d^k_{j,\mathrm{off}}\Delta t\big),\ \lambda_2 \cdot \frac{w_{j-1}}{l_j}\big(q^{\max}_{j-1,\mathrm{main}} - q^k_{j-1,\mathrm{main}} - \theta_j d^k_{j-1,\mathrm{on}}\Delta t\big),\ \lambda_3 \cdot d^k_{\max j,\mathrm{main}}\Big\},\\
d^k_{j,\mathrm{on}} &= \begin{cases}
\min\Big\{\dfrac{q^k_{j,\mathrm{on}} + a^k_{j,\mathrm{on}}\Delta t}{\Delta t},\ \dfrac{\eta_j\,(q^{\max}_{j,\mathrm{main}} - q^k_{j,\mathrm{main}})}{\Delta t},\ \dfrac{c^k_i}{\Delta t}\Big\} & \text{if } j \text{ is a metered on-ramp cell},\\[2ex]
\min\Big\{\dfrac{q^k_{j,\mathrm{on}} + a^k_{j,\mathrm{on}}\Delta t}{\Delta t},\ \dfrac{\eta_j\,(q^{\max}_{j,\mathrm{main}} - q^k_{j,\mathrm{main}})}{\Delta t}\Big\} & \text{if } j \text{ is an unmetered on-ramp cell}.
\end{cases}
\end{aligned}
\qquad (5)
$$
Conservation of the mainline and on-ramp:

$$
\begin{aligned}
q^{k+1}_{j,\mathrm{main}} &= q^k_{j,\mathrm{main}} + \Delta t \cdot \big(a^k_{j,\mathrm{main}} + d^k_{j,\mathrm{on}} - d^k_{j,\mathrm{main}} - d^k_{j,\mathrm{off}}\big),\\
q^{k+1}_{j,\mathrm{on}} &= q^k_{j,\mathrm{on}} + \Delta t \cdot \big(a^k_{j,\mathrm{on}} - d^k_{j,\mathrm{on}}\big)
\end{aligned}
\qquad (6)
$$
where $a^k_{j,\mathrm{main}}$ and $d^k_{j,\mathrm{main}}$ are the mainline arrival and departure rates of cell $j$ at step $k$; $a^k_{j,\mathrm{on}}$ and $d^k_{j,\mathrm{on}}$ are the on-ramp arrival and departure rates of cell $j$ at step $k$; and $d^k_{j,\mathrm{off}}$ is the off-ramp departure rate of cell $j$ at step $k$ (if cell $j$ is not an off-ramp cell, $d^k_{j,\mathrm{off}} = 0$). $q^k_{j,\mathrm{main}}$ represents the number of vehicles on the mainline of cell $j$ at step $k$, and $q^{\max}_{j,\mathrm{main}}$ is the maximum value of this number, limited by the mainline space of cell $j$. Similarly, $q^k_{j,\mathrm{on}}$ and $q^{\max}_{j,\mathrm{on}}$ denote the current (at step $k$) and maximum numbers of vehicles on the on-ramp of cell $j$, respectively. $l_j$ is the length of cell $j$, and $\Delta t$ (min) is the time interval between two consecutive time steps. $c^k_i$ is the metering rate for the on-ramp cell of the $i$th motorway section at step $k$. $\eta_j \in [0, 1]$ is the flow allocation parameter of cell $j$, and $\theta_j \in [0, 1]$ is the blending parameter of traffic flow from the on-ramp to the mainline of cell $j$. All arrival and departure rates are expressed in veh/min in this study.
For motorway section $i$ with $J$ cells, the number of vehicles on the mainline is $q^k_{i,\mathrm{main}} = \sum_{j=1}^{J} q^k_{j,\mathrm{main}}$, while the number of vehicles on the on-ramp of motorway section $i$ is $q^k_{i,\mathrm{on}} = q^k_{j,\mathrm{on}}$, where $j$ is the on-ramp cell of that section. Likewise, the maximum numbers of vehicles on the mainline and on-ramp of motorway section $i$ are $q^{\max}_{i,\mathrm{main}} = \sum_{j=1}^{J} q^{\max}_{j,\mathrm{main}}$ and $q^{\max}_{i,\mathrm{on}} = q^{\max}_{j,\mathrm{on}}$.
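A single time step of (5) and (6) can be sketched in Python as follows. This is an illustrative sketch under several assumptions that the paper does not spell out: cells are ordered upstream to downstream; the receiving term, written with index $j-1$ in (5), is taken from the downstream neighbour following the usual CTM convention; the metering term is applied directly as a rate in veh/min (which coincides with $c^k_i/\Delta t$ only when $\Delta t = 1$ min); the last cell discharges freely; and speeds are given in km/min so that all rates are in veh/min. The `Cell` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cell:
    length: float                  # l_j (km)
    v: float                       # free-flow speed v_j (km/min)
    w: float                       # congestion wave speed w_j (km/min)
    d_max: float                   # maximum mainline departure flow (veh/min)
    q_max_main: float              # mainline storage q^max_{j,main} (veh)
    q_main: float = 0.0            # vehicles on the mainline (veh)
    q_on: float = 0.0              # vehicles queued on the on-ramp (veh)
    a_on: float = 0.0              # on-ramp arrival rate (veh/min)
    d_off: float = 0.0             # off-ramp departure rate (veh/min), 0 if none
    meter: Optional[float] = None  # metering rate (veh/min); None if unmetered
    lam: Tuple[float, float, float] = (1.0, 1.0, 1.0)  # lambda_1..3; 1 = no incident


def actm_step(cells: List[Cell], a_boundary: float, dt: float,
              theta: float = 0.5, eta: float = 0.16):
    """Advance cells (ordered upstream -> downstream) by one step of dt minutes."""
    # On-ramp departures (second part of Eq. 5).
    d_on = []
    for c in cells:
        rate = min((c.q_on + c.a_on * dt) / dt,
                   eta * (c.q_max_main - c.q_main) / dt)
        if c.meter is not None:
            rate = min(rate, c.meter)  # metering cap, read as a rate in veh/min
        d_on.append(max(0.0, rate))

    # Mainline departures (first part of Eq. 5): sending, receiving, capacity.
    d_main = []
    for j, c in enumerate(cells):
        lam1, _, lam3 = c.lam
        send = lam1 * c.v / c.length * (c.q_main + theta * d_on[j] * dt - c.d_off * dt)
        if j + 1 < len(cells):
            nxt = cells[j + 1]
            receive = nxt.lam[1] * nxt.w / nxt.length * (
                nxt.q_max_main - nxt.q_main - theta * d_on[j + 1] * dt)
        else:
            receive = float("inf")  # assumption: free outflow at the segment end
        d_main.append(max(0.0, min(send, receive, lam3 * c.d_max)))

    # Conservation (Eq. 6).
    for j, c in enumerate(cells):
        a_main = a_boundary if j == 0 else d_main[j - 1]
        c.q_main += dt * (a_main + d_on[j] - d_main[j] - c.d_off)
        c.q_on += dt * (c.a_on - d_on[j])
    return d_main, d_on


def section_totals(cells: List[Cell]) -> Tuple[float, float]:
    """Section-level (q_main, q_on): mainline summed; one on-ramp per section assumed."""
    return sum(c.q_main for c in cells), sum(c.q_on for c in cells)
```

The defaults `theta = 0.5` and `eta = 0.16` are the calibrated values in Table 2; an incident cell is modelled by setting its `lam` tuple to the three $\lambda$ values.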
4.3 Estimation of Arrival and Departure Rates. The arrival rates of the boundary cells in each motorway section (such as $j + 2$, $j + 5$, and $j + 8$) and of all on-ramps, as well as the departure rates of the off-ramps, are inputs to the ACTM for each planning step between two real control steps. Given the short planning horizon (10 steps), we assume that these rates remain stable during planning and estimate them directly from recent detector data. The method described by Wang [16] is used here; it simply averages the most recently observed data to obtain the predicted flow rates. In our model, we use the flow data collected over the last $N$ time steps ($N = 5$). Therefore, these three rates can be calculated by
$$
\begin{aligned}
a^{k,k+1}_{i,\mathrm{main}} &= a^{k,k+1}_{j,\mathrm{main}} = \frac{\sum_{n=0}^{N-1} a^{k-n}_{j,\mathrm{main}}}{N} && \text{if } j \text{ is the boundary cell},\\
a^{k,k+1}_{i,\mathrm{on}} &= a^{k,k+1}_{j,\mathrm{on}} = \frac{\sum_{n=0}^{N-1} a^{k-n}_{j,\mathrm{on}}}{N} && \text{if } j \text{ is the on-ramp cell},\\
d^{k,k+1}_{j,\mathrm{off}} &= \frac{\sum_{n=0}^{N-1} d^{k-n}_{j,\mathrm{off}}}{N} && \text{if } j \text{ is the off-ramp cell}
\end{aligned}
\qquad (7)
$$
where $a^{k,k+1}_{j,\mathrm{main}}$ and $a^{k,k+1}_{j,\mathrm{on}}$ are the estimated mainline and on-ramp arrival rates of cell $j$ for the planning steps between real steps $k$ and $k+1$, and $d^{k,k+1}_{j,\mathrm{off}}$ is the estimated off-ramp departure rate of cell $j$. If cell $j$ is the boundary cell of motorway section $i$, the arrival or departure rate of this cell is also the arrival or departure rate of motorway section $i$.
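The estimator in (7) is a moving average over a sliding window. A minimal Python sketch follows; the class name is hypothetical, and averaging over fewer than $N$ samples before the window fills is an assumption, since (7) always divides by $N$.

```python
from collections import deque


class RateEstimator:
    """Predict a flow rate as the mean of the last N observations, as in Eq. (7)."""

    def __init__(self, n=5):
        self.window = deque(maxlen=n)  # older samples drop out automatically

    def observe(self, rate):
        """Record the rate measured at the latest real control step."""
        self.window.append(rate)

    def predict(self):
        """Rate held constant over the planning steps until the next real step."""
        if not self.window:
            raise ValueError("no observations yet")
        return sum(self.window) / len(self.window)
```

One estimator would be kept per boundary-cell arrival, on-ramp arrival, and off-ramp departure stream; after observing 10, 12, 14, 16, 18, and 20 veh/min with $N = 5$, the prediction is the mean of the last five values, 16 veh/min.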
5 Definition of RL Elements
In addition to the architecture and models defined in Section 3, three basic elements, namely the environment state, the control action, and the reward function, must be specified to form an RL problem. This section details these three elements and the relevant algorithm.
5.1 Environment State. The environment state of a motorway section is composed of a mainline state and an on-ramp state. The same method as in [27, 29] is used here to obtain the state space. For the mainline of motorway section $i$, the number of vehicles ranges from 0 to the maximum number $q^{\max}_{i,\mathrm{main}}$, and this range is uniformly divided into $n_i$ intervals. Each interval represents a state of the mainline. Therefore, each mainline section can be represented by a state set $S_{i,\mathrm{main}}$ with $n_i$ states. Similarly, on-ramp traffic is represented by a state set $S_{i,\mathrm{on}}$ with $m_i$ states according to the maximum number of vehicles $q^{\max}_{i,\mathrm{on}}$. $n_i$ and $m_i$ should be adjusted for each motorway section according to the section length. In this way, if motorway section $i$ is the critical section, the external traffic environment is represented by

$$
S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}}, \quad s^k_i \in S_i \qquad (8)
$$
which contains $n_i \cdot m_i$ states. At each time step, a state $s^k_i$ is selected from $S_i$ as the environment state. If motorway section $i$ is a normal section, the state sets of its neighbouring agent are also incorporated. Thus, the traffic state is represented by

$$
S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}} \times S_{i-1,\mathrm{main}} \times S_{i-1,\mathrm{on}}, \quad s^k_i \in S_i \qquad (9)
$$

which contains $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ states.
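The discretisation behind (8) and (9) maps each vehicle count to one of a fixed number of equal-width intervals. The Python sketch below assumes that counts at or above the maximum fall into the top interval; the interval numbers $n_i$ and $m_i$ are not reported in the paper, so any values passed in are hypothetical, and the function names are illustrative.

```python
def interval_index(count, count_max, n_intervals):
    """Map a vehicle count in [0, count_max] to one of n_intervals equal bins."""
    if count_max <= 0 or n_intervals <= 0:
        raise ValueError("count_max and n_intervals must be positive")
    frac = min(max(count / count_max, 0.0), 1.0)
    return min(int(frac * n_intervals), n_intervals - 1)  # full -> top bin


def section_state(q_main, q_on, q_max_main, q_max_on, n, m):
    """Critical-section state in S_main x S_on (Eq. 8), as a pair of bin indices."""
    return (interval_index(q_main, q_max_main, n),
            interval_index(q_on, q_max_on, m))


def coordinated_state(own, upstream):
    """Normal-section state (Eq. 9): own (main, on) bins plus the neighbour's."""
    return own + upstream
```

With the section 1 capacities of the case study ($q^{\max}_{1,\mathrm{main}} = 1860$, $q^{\max}_{1,\mathrm{on}} = 108$) and, for illustration only, $n_1 = 10$ and $m_1 = 6$, a half-full mainline and a full on-ramp give the state $(5, 5)$ out of $10 \cdot 6 = 60$ states.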
5.2 Control Action. In a ramp control problem, the aim of the control action is to regulate the number of vehicles entering the mainline at each control step. Similar to [29], we adopt flow control as the control action, which is represented by the action set $C = \{4, 6, 8, 10, 12, 14, 16, 18, 20\}$, containing 9 flow rates between the minimum (4 veh/min) and maximum (20 veh/min) values.
Exploitation and exploration are two basic behaviours of an RL agent. Exploitation means that the agent takes the control action that yields the most reward according to previous experience. Exploration, in contrast, means that the agent tries new actions that currently appear less rewarding. To balance these two behaviours, we use the $\varepsilon$-greedy policy to select control actions [30]. Specifically, at each control step this policy takes a random action with probability $\varepsilon$ and chooses the greedy action (the one with the maximum $Q$ value) with probability $1 - \varepsilon$. The action selection probability can be formally expressed as
$$
p(c^k_i \mid s^k_i) =
\begin{cases}
1 - \varepsilon & \text{if } c^k_i = \arg\max_{c^k_i} Q^{k-1}(s^k_i, c^k_i),\\
\varepsilon & \text{otherwise}
\end{cases}
\qquad (10)
$$
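A short Python sketch of $\varepsilon$-greedy selection over the action set $C$ is given below; the function name and the dictionary representation of the $Q$ row are illustrative assumptions. Note that when the random branch is taken it draws uniformly from all of $C$, so it can also return the greedy action.

```python
import random

# Action set C: metering rates in veh/min.
ACTIONS = [4, 6, 8, 10, 12, 14, 16, 18, 20]


def epsilon_greedy(q_values, actions=ACTIONS, epsilon=0.1, rng=random):
    """With probability epsilon pick a uniformly random action, else the greedy one.

    q_values maps action -> Q(s, action) for the current state; missing
    actions are treated as 0.0.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values.get(a, 0.0))
```

Passing a seeded `random.Random` instance as `rng` makes experiments reproducible; `epsilon = 0.1` matches the value used in the case study.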
5.3 Reward Function. The reward function calculates the immediate reward after a specific action is executed at each time step, and it guides the agent towards its objective. Considering a common objective of traffic control systems (i.e., minimising total travel time), we define our reward to guide the agent to minimise the total time spent (TTS) through the learning process.

TTS is defined as the total time spent by vehicles in the network during a period of time. In our case, TTS is obtained from the following equation:
$$
\mathrm{TTS} = \Delta t \cdot \sum_{k=0}^{K} \big(q^k_{i,\mathrm{main}} + q^k_{i,\mathrm{on}}\big) \qquad (11)
$$
In the above equation, $\Delta t$ is fixed; therefore, minimising TTS is equivalent to minimising the number of vehicles in the network, $\sum_{k=0}^{K} (q^k_{i,\mathrm{main}} + q^k_{i,\mathrm{on}})$. To minimise this value, the reward function defined here is composed of negative rewards that act as penalties for vehicles on the mainline and on the on-ramp. The reward function at step $k$ is defined for two situations.
(1) Motorway Section $i$ Is the Critical Section.

$$
R^k_i(s^k_i, c^k_i) =
\begin{cases}
-\dfrac{q^k_{i,\mathrm{main}} + q^k_{i,\mathrm{on}}}{q^{\max}_{i,\mathrm{main}} + q^{\max}_{i,\mathrm{on}}} & \text{if } q^k_{i,\mathrm{main}} < q^{\max}_{i,\mathrm{main}} \text{ and } q^k_{i,\mathrm{on}} < q^{\max}_{i,\mathrm{on}},\\
-1 & \text{otherwise}
\end{cases}
\qquad (12)
$$
where $R^k_i(s^k_i, c^k_i)$ is the immediate reward for agent $i$ in state $s^k_i$ when executing action $c^k_i$ at control step $k$. $q^{\max}_{i,\mathrm{main}}$ and $q^{\max}_{i,\mathrm{on}}$ normalise the numbers of vehicles on the mainline and on-ramp, which guarantees that $R^k_i(s^k_i, c^k_i) \in [-1, 0]$.
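Equations (11) and (12) translate directly into code. The following Python sketch is illustrative; the function names are hypothetical, and the inputs are per-step vehicle counts of a single section, as in (11).

```python
def total_time_spent(q_main_series, q_on_series, dt):
    """TTS = dt * sum_k (q_main^k + q_on^k), Eq. (11)."""
    if len(q_main_series) != len(q_on_series):
        raise ValueError("series must have the same length")
    return dt * sum(qm + qo for qm, qo in zip(q_main_series, q_on_series))


def critical_reward(q_main, q_on, q_max_main, q_max_on):
    """Immediate reward of the critical-section agent, Eq. (12); lies in [-1, 0]."""
    if q_main < q_max_main and q_on < q_max_on:
        return -(q_main + q_on) / (q_max_main + q_max_on)
    return -1.0  # mainline or on-ramp full: maximum penalty
```

With the section 1 capacities of the case study, 930 vehicles on the mainline and 54 on the on-ramp give a reward of $-984/1968 = -0.5$, and a full on-ramp gives $-1$.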
(2) Motorway Section $i$ Is Not the Critical Section. Here, an additional negative reward is introduced to maintain system equity, that is, to ensure that the on-ramp queues, and hence the travel times, at different on-ramps stay close to each other.
$$
R^k_i(s^k_i, c^k_{i-1}, c^k_i) =
\begin{cases}
-\dfrac{q^k_{i,\mathrm{main}} + q^k_{i,\mathrm{on}}}{q^{\max}_{i,\mathrm{main}} + q^{\max}_{i,\mathrm{on}}} - \dfrac{\lvert q^k_{i,\mathrm{on}} - q^k_{i-1,\mathrm{on}} \rvert}{\max\big(q^{\max}_{i,\mathrm{on}}, q^{\max}_{i-1,\mathrm{on}}\big)} & \text{if } q^k_{i,\mathrm{main}} < q^{\max}_{i,\mathrm{main}} \text{ and } q^k_{i,\mathrm{on}} < q^{\max}_{i,\mathrm{on}},\\
-2 & \text{otherwise}
\end{cases}
\qquad (13)
$$
For each agent $i$ and episode do
    $L \leftarrow \mathrm{CEIL}(\mathrm{IncidentDuration} / \Delta t)$
    IF $i$ is the critical section
        Initialise $R^0_i(s_i, c_i)$, $Q^0_i(s_i, c_i)$
    ELSE
        Initialise $R^0_i(s_i, c_{i-1}, c_i)$, $Q^0_i(s_i, c_{i-1}, c_i)$, $P^0_i(s_i, c_{i-1})$
    For each control step $k \in K$ do (Loop I)
        (i) get detected data from each cell $j$: $a^k_{j,\mathrm{main}}$, $d^k_{j,\mathrm{main}}$, $d^k_{j,\mathrm{off}}$, $a^k_{j,\mathrm{on}}$, $d^k_{j,\mathrm{on}}$
        (ii) get state $s^k_i$ through (8) and (9)
        (iii) get action $c^k_i$ by the $\varepsilon$-greedy policy (10)
        (iv) get $q^k_{j,\mathrm{main}}$, $q^k_{j,\mathrm{on}}$ through (6) and set $q^k_{i,\mathrm{main}} \leftarrow \sum_{j=1}^{J} q^k_{j,\mathrm{main}}$, $q^k_{i,\mathrm{on}} \leftarrow q^k_{j,\mathrm{on}}$
        IF $i$ is the critical section
            update $R^k_i(s^k_i, c^k_i)$, $Q^k_i(s^k_i, c^k_i)$ through (2) and (12)
        ELSE
            update $R^k_i(s^k_i, c^k_{i-1}, c^k_i)$, $Q^k_i(s^k_i, c^k_{i-1}, c^k_i)$, $p(c^k_{i-1} \mid s^k_i)$ through (4) and (13)
        IF $s^k_i = s_{\mathrm{initial}}$ and $k + 1 \ge L$
            end the algorithm
        ELSE
            get $a^{k,k+1}_{i,\mathrm{main}}$, $a^{k,k+1}_{i,\mathrm{on}}$, $d^{k,k+1}_{j,\mathrm{off}}$ by (7)
            set $l \leftarrow k$, $s^l_i \leftarrow s^k_i$, $q^l_{j,\mathrm{main}} \leftarrow q^k_{j,\mathrm{main}}$, $q^l_{j,\mathrm{on}} \leftarrow q^k_{j,\mathrm{on}}$, and start Loop II
            For each planning step $l \in L$ do (Loop II)
                (i) generate flow rates $d^l_{j,\mathrm{main}}$, $d^l_{j,\mathrm{on}}$ for each cell $j$ through (5)
                (ii) get the state $s^l_i$
                (iii) get $q^l_{j,\mathrm{main}}$, $q^l_{j,\mathrm{on}}$ and set $q^l_{i,\mathrm{main}} \leftarrow \sum_{j=1}^{J} q^l_{j,\mathrm{main}}$, $q^l_{i,\mathrm{on}} \leftarrow q^l_{j,\mathrm{on}}$
                (iv) get action $c^l_i$ by the $\varepsilon$-greedy policy
                IF $i$ is the critical section
                    update $R^l_i(s^l_i, c^l_i)$, $Q^l_i(s^l_i, c^l_i)$
                ELSE
                    update $R^l_i(s^l_i, c^l_{i-1}, c^l_i)$, $Q^l_i(s^l_i, c^l_{i-1}, c^l_i)$, $p(c^l_{i-1} \mid s^l_i)$
                IF ($l = k + 9$) or ($s^l_i = s_{\mathrm{initial}}$ and $l + 1 \ge L$)
                    go back to Loop I
                ELSE
                    repeat Loop II
            EndFor
    EndFor
EndFor

Algorithm 1: Algorithm for Dyna-MARL.
Compared with (12), a new term $\lvert q^k_{i,\mathrm{on}} - q^k_{i-1,\mathrm{on}} \rvert / \max(q^{\max}_{i,\mathrm{on}}, q^{\max}_{i-1,\mathrm{on}})$ is added in (13), which penalises the difference between the on-ramp queues of motorway sections $i$ and $i-1$. Because two adjacent agents cooperate in this situation, $R^k_i(s^k_i, c^k_{i-1}, c^k_i)$ depends on both control actions $c^k_{i-1}$ and $c^k_i$. $\max(\cdot, \cdot)$ returns the larger of its two arguments and is used for normalisation.
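The cooperative reward (13) adds the normalised queue-imbalance penalty to the congestion term of (12). A minimal Python sketch follows; the function name and argument names are hypothetical, with the `_up` arguments referring to the neighbouring section $i-1$.

```python
def coordinated_reward(q_main, q_on, q_max_main, q_max_on, q_on_up, q_max_on_up):
    """Reward of a normal-section agent, Eq. (13).

    q_on_up and q_max_on_up describe the on-ramp of the neighbouring section i-1.
    """
    if q_main < q_max_main and q_on < q_max_on:
        congestion = (q_main + q_on) / (q_max_main + q_max_on)
        imbalance = abs(q_on - q_on_up) / max(q_max_on, q_max_on_up)
        return -congestion - imbalance
    return -2.0  # mainline or on-ramp full: maximum penalty
```

For example, with the section 2 capacities ($q^{\max}_{2,\mathrm{main}} = 2880$, $q^{\max}_{2,\mathrm{on}} = 90$) and the section 1 on-ramp capacity of 108 as the neighbour, 1440 and 45 vehicles on section 2 against a neighbouring queue of 99 give $-0.5 - 54/108 = -1$.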
5.4 Description of the Algorithm. Based on the Dyna-Q architecture and the RL elements defined in the previous subsections, an algorithm called Dyna-MARL is developed and described in this subsection. Two main loops, corresponding to the direct RL and planning components shown in Figure 1, are detailed in Dyna-MARL. Between two real control steps in Loop I, 10 planning steps are run in Loop II. The pseudocode of Dyna-MARL is given in Algorithm 1.

An episode in Dyna-MARL represents a control cycle, which starts when the incident occurs and terminates when the traffic state returns to the initial state $s_{\mathrm{initial}}$, that is, the traffic state before the incident occurred. The incident duration is assumed to be known in advance.
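The interleaving of direct RL and planning can be illustrated with the classic tabular Dyna-Q loop, sketched below in Python for a single agent. This is a generic illustration, not Dyna-MARL itself: where Dyna-MARL plans by simulating the ACTM and the neighbour-action model, this sketch replays a memorised table of the last observed transition for each state-action pair. The function names and the toy environment are hypothetical; `planning_steps = 10` mirrors the 10 planning steps run between real control steps.

```python
import random
from collections import defaultdict


def dyna_q(env_step, start_state, actions, episodes=50, planning_steps=10,
           alpha=0.2, gamma=0.8, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Dyna-Q: per real step, one direct-RL update plus
    `planning_steps` simulated updates replayed from a learned model."""
    rng = random.Random(seed)
    q = defaultdict(float)   # (state, action) -> value
    model = {}               # (state, action) -> (reward, next_state, done)

    def greedy(s):
        return max(actions, key=lambda a: q[(s, a)])

    def td_update(s, a, r, s_next, done):
        target = r if done else r + gamma * max(q[(s_next, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s = start_state
        for _ in range(max_steps):
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s_next, done = env_step(s, a)    # real experience (cf. Loop I)
            td_update(s, a, r, s_next, done)    # direct RL update
            model[(s, a)] = (r, s_next, done)   # model learning
            for _ in range(planning_steps):     # planning (cf. Loop II)
                ps, pa = rng.choice(list(model))
                td_update(ps, pa, *model[(ps, pa)])
            s = s_next
            if done:
                break
    return q


def chain_step(s, a):
    """Toy environment: states 0..4, actions -1/+1, cost 1 per step, state 4 ends."""
    s_next = min(max(s + a, 0), 4)
    return -1.0, s_next, s_next == 4
```

On the toy chain, the learned greedy policy moves towards the terminal state from every state; the planning updates let values propagate back along the chain far faster than the real steps alone would.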
6 Case Study and Results
One of the metered motorway segments (southbound direction) of the M6 in the UK is chosen for the case study. This segment lies between junction 21A (J21A) and junction 25 (J25) and is approximately 12.4 km long (see Figure 4). Taking the noncontrolled (NC) situation as the baseline, we designed a series of experiments to compare the proposed Dyna-MARL algorithm with Isolated RL ($Q$-learning without coordination). The experiments and their results are described as follows.
6.1 Partitions of the Test Segment. The test motorway segment, with a three-lane mainline, three metered on-ramps, and five off-ramps, is simulated in AIMSUN [39], a microscopic traffic simulation package. According to the detector locations and road layout, the whole segment is divided into three sections, each containing a metered on-ramp. Motorway section 3 is divided into 4 cells, and motorway sections 1 and 2 are each divided into 3 cells. The partitions of each section are shown in Figure 5. According to the section lengths, the maximum numbers of vehicles on each mainline section and on-ramp are $q^{\max}_{1,\mathrm{main}} = 1860$, $q^{\max}_{2,\mathrm{main}} = 2880$, $q^{\max}_{3,\mathrm{main}} = 2880$, $q^{\max}_{1,\mathrm{on}} = 108$, $q^{\max}_{2,\mathrm{on}} = 90$, and $q^{\max}_{3,\mathrm{on}} = 120$.

Figure 4: Test motorway segment of the M6 between J21A and J25, with origin O, destination D, on-ramps O1 (from A579), O2 (from A580), and O3 (from A49), and off-ramps D1 and D2 (to M62), D3 (to A579), D4 (to A580), and D5 (to A58).

Figure 5: Partitions of the test segment into Sections 1, 2, and 3 between J21A and J25, showing section and cell boundaries, road section lengths (km), the flow direction, incident locations a, b, and c, on-ramps O1-O3, and off-ramps D1-D5.
6.2 Real Data Source. Real detector data collected from 17 loop detectors located in the motorway segment (on both the mainline and the on- and off-ramps) are used for the case study; these data can be extracted from the Traffic Information System (HATRIS) [40]. The traffic counts are averaged over April 2012 to March 2013 at 15-minute intervals. Only working-day data (Monday to Friday) are used, because traffic load drops dramatically at weekends. Some of the detector data collected from the mainline and the three on-ramps are presented in Figure 6, which shows two peak periods in daily traffic operation: the AM peak (around 07:00:00-09:00:00) and the PM peak (around 16:00:00-18:00:00).
At the test site, ramp metering operates only during peak hours. Moreover, it is valuable to test the proposed algorithm under high demand: if it works under a high traffic load, it should also be useful in ordinary situations. Therefore, the AM peak period, with its heavy traffic load, is considered for the case study. Specifically, we use the averaged AM peak traffic data collected from TRADS to estimate the OD (origin-destination) matrix for the simulation. A model proposed in [41] is adopted by AIMSUN for this estimation, with the number of iterations set to 1000 to ensure convergence. Table 1 shows the OD matrix estimated from the real traffic data.

Figure 6: Real averaged traffic data: 15-minute traffic counts against time of day for the mainline of Sections 1-3 and for On-ramps 1-3.

Table 1: Estimated OD matrix.

Origins/destinations     D      D5     D4     D3     D2      D1      Totals
O                        2089   375    686    728    1169    771     5818
O3                       875    65     212    193    117     46      1507
O2                       886    0      0      61     315     216     1477
O1                       824    0      0      0      292     226     1343
Totals                   4675   440    898    981    1893    1258    10146

Table 2: Parameters for ACTM.

Parameter   d_max,main    v          w           θ      η      λ1     λ2     λ3
Value       6300 veh/h    107 km/h   11.6 km/h   0.5    0.16   0.55   0.9    0.6
6.3 Incident Scenarios. Because real incident data are difficult to capture, we simulate incident scenarios in AIMSUN. So that every ramp meter is active during the incident, the incident is located near the most downstream motorway section, that is, motorway section 1. Three incident scenarios, A, B, and C, are designed, corresponding to three different incident locations, a, b, and c (as illustrated in Figure 5), respectively.
Each simulation run lasts one and a half hours, from 07:00:00 to 08:30:00, during the AM peak period. After 30 minutes of normal operation (for warm-up), the incident is triggered at 07:30:00 and lasts for 30 minutes. In the preliminary experiments of this paper, an incident blocking one lane is considered; the parameters introduced here can also be adjusted for multiple-lane blockages. The incident extent is 50 metres and is assumed to be constant during the incident.
Learning-related parameters are set to typical values [30], that is, $\alpha = 0.2$, $\gamma = 0.8$, and $\varepsilon = 0.1$. The other parameters, related to the ACTM, are calibrated and summarised in Table 2. All cells have the same $\theta$ and $\eta$.
6.4 Results. Dyna-MARL, Isolated RL, and NC are compared from three aspects: density evolution, general indicators, and system equity. The experimental results are described as follows.
(1) Density Evolution. Figure 7 shows that four dense areas exist during traffic operation. Three of them, near the on-ramp entrances (at motorway lengths of around 0.5 km, 5 km, and 10 km), are caused by heavy traffic loads from the on-ramps. The dense area close to the end of the segment forms because of the incident.
In scenario A, the incident location is close to on-ramp 1 (O1). Without control, this incident leads to severe congestion, which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). In this scenario, Isolated RL cannot alleviate the incident-induced congestion effectively (see Figure 7(b)). At the beginning of congestion formation, without coordination, only the nearest ramp controller reacts to the congestion. Because of the limited storage space of its on-ramp, a single ramp controller cannot dissolve the congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of the three on-ramps to deal with the incident-induced congestion. In this way, mainline congestion is confined to a smaller area and does not propagate to motorway section 2 (see Figure 7(c)).
In scenarios B and C, the incidents are near the end of the motorway segment and far from on-ramp 1. Because they do not block on-ramp 1, these incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL ease mainline congestion effectively. As shown in Figures 7(e)-7(i), compared with the NC situation, both Isolated RL and Dyna-MARL restrict the congestion to a small range near the on-ramp entrances.
(2) General Indicators. In this comparison, general indicators, namely total travel time (to be reduced), total throughput (to be increased), and total CO2 emission (to be reduced), are used to show how the proposed system benefits road users. These indicators are widely used in the transport community to evaluate newly developed traffic control systems.
As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b). Dyna-MARL improves the total throughput by up to 2.3% (see Figure 8(d)) and outperforms Isolated RL in all three scenarios; in scenario B, Isolated RL even fails to improve the total throughput. For total CO2 emission (Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with reductions of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL performs much better than Isolated RL.
The above comparison shows that Dyna-MARL outperforms Isolated RL in almost all scenarios and for almost all indicators.
Figure 7: Density profiles (motorway density in veh/km/lane against motorway length in km and absolute time in min) for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.

(3) System Equity. Although the general indicators in comparison (2) are effective for evaluating the performance of different systems, they cannot measure system equity, which is also an important aspect of system performance. In this paper, we consider only spatial equity, defined as a measure of the equity of user delays across different on-ramps [42]. We assume that road users from all three on-ramps are equally important. If users from different on-ramps experience similar travel times, the control system is considered equitable; conversely, a large difference between on-ramp queues, and hence between travel times, indicates a highly inequitable system. In [43], the variance of travel time across on-ramps is used as an indicator of system equity. Similar to [43], and for ease of comparison, we use the standard deviation in our case. This indicator is defined as
$$
\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n} \big[\bar{t}^k - t^k_i\big]^2}{n}} \qquad (14)
$$
where $\mathrm{SD}(k)$ is the standard deviation of travel time across the on-ramps at time step $k$, $t^k_i$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^k$ is the average total travel time of the $n$ on-ramps at step $k$.
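Equation (14) is the population standard deviation (dividing by $n$, not $n - 1$) of the on-ramp travel times at one step. A minimal Python sketch, with a hypothetical function name, is shown below; it gives the same result as the standard library's `statistics.pstdev`.

```python
import math


def travel_time_sd(travel_times):
    """Population standard deviation of on-ramp travel times, Eq. (14)."""
    n = len(travel_times)
    if n == 0:
        raise ValueError("need at least one on-ramp")
    mean = sum(travel_times) / n  # average total travel time over the n on-ramps
    return math.sqrt(sum((mean - t) ** 2 for t in travel_times) / n)
```

Evaluating it at every time step for the three on-ramps yields curves of the kind shown in Figure 9; a value of zero means that all on-ramps experience identical total travel time.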
The results of the system-equity comparison are shown in Figure 9. In the NC situation, good equity is maintained in scenarios B and C because entering vehicles are not restricted (Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by congestion in scenario A, a long queue forms, leading to imbalance and hence inequity for users on different on-ramps (Figure 9(a)). Among the controlled cases, Isolated RL performs poorly in all scenarios, because the ramp controller nearest the congestion imposes much more restrictive metering than the other controllers. Owing to its coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).
Figure 8: Comparison of general measures for different scenarios: (a) total travel time (h), (b) total throughput (veh), and (c) total CO2 emission (kg) for NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C; (d) reduction from NC (%) in total travel time, total throughput, and total CO2 emission for Isolated RL and Dyna-MARL.
7 Conclusions and Future Work
A Dyna-Q-based multiagent reinforcement learning method for motorway ramp control, referred to as Dyna-MARL, has been developed in this paper. Dyna-MARL is compared with Isolated RL ($Q$-learning without coordination) and the noncontrolled situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.
From a series of simulation-based experiments, we conclude the following:
(1) Isolated RL can improve motorway performance by increasing total throughput and by reducing total travel time and CO2 emission. However, this improvement comes at the expense of poor system equity across the on-ramps.
(2) With a suitable coordination strategy, Dyna-MARL achieves much higher system equity.
(3) Beyond system equity, Dyna-MARL outperforms Isolated RL on all indicators in almost all scenarios, which indicates that it can deal with network-wide problems effectively.
Although the simulation tests have shown positive results for Dyna-MARL, the current work considers only a simplified incident scenario with a fixed duration. In practice, incident duration is highly variable. It depends on a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Incident duration should therefore be treated as an uncertainty, which will be investigated in our future work.
Figure 9: Standard deviation of on-ramp travel time (h) versus time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL in different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This paper is supported by the China Scholarship Council and the University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.
References
[1] M. Papageorgiou and A. Kotsialos, "Freeway ramp metering: an overview," IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.
[2] A. Skabardonis, P. Varaiya, and K. F. Petty, "Measuring recurrent and nonrecurrent traffic congestion," Transportation Research Record, vol. 1856, pp. 118–124, 2003.
[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, "ALINEA: a local feedback control law for on-ramp metering," Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.
[4] E. Smaragdis and M. Papageorgiou, "Series of new local ramp metering strategies," Transportation Research Record, vol. 1856, pp. 74–86, 2003.
[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, "Real-time metering algorithm for centralized control," Transportation Research Record, vol. 1232, pp. 17–26, 1989.
[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, "System wide adaptive ramp metering (SWARM)," in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.
[7] R. Lau, Ramp Metering by Zone: The Minnesota Algorithm, Minnesota Department of Transportation, 1997.
[8] H. M. Zhang and W. W. Recker, "On optimal freeway ramp control policies for congested traffic corridors," Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.
[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, "Optimal coordinated ramp metering with advanced motorway optimal control," Transportation Research Record, no. 1748, pp. 55–65, 2001.
[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, "Model predictive control for optimal coordination of ramp metering and variable speed limits," Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.
[11] G. Gomes and R. Horowitz, "Optimal freeway ramp metering using the asymmetric cell transmission model," Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.
[12] A. H. F. Chow and Y. Li, "Robust optimization of dynamic motorway traffic via ramp metering," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.
[13] R. W. Hall, "Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?" Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.
[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, "Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models," Transportation Research Record, vol. 2047, pp. 57–65, 2008.
[15] T. L. Greenlee and H. J. Payne, "Freeway ramp metering strategies for responding to incidents," in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.
[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.
[17] J.-B. Sheu, "Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control," Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.
[18] J.-B. Sheu and M.-S. Chang, "Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.
[19] C. Jacob and B. Abdulhai, "Machine learning for multi-jurisdictional optimal traffic corridor control," Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.
[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, "Motorway ramp-metering control with queuing consideration using Q-learning," in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC '11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.
[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Application of reinforcement learning with continuous state space to ramp metering in real-world conditions," in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC '12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.
[22] C. Jacob and B. Abdulhai, "Automated adaptive traffic corridor control using reinforcement learning: approach and case studies," Transportation Research Record, vol. 1959, pp. 1–8, 2006.
[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada," Transportation Research Record, vol. 2396, pp. 10–18, 2013.
[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, "Reinforcement learning ramp metering without complete information," Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.
[25] A. Fares and W. Gomaa, "Multi-agent reinforcement learning control for ramp metering," in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.
[26] K. Veljanovska, K. M. Bombol, and T. Maher, "Reinforcement learning technique in multiple motorway access control strategy design," PROMET - Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.
[27] C. Lu, H. Chen, and S. Grant-Muller, "An indirect reinforcement learning approach for ramp control under incident-induced congestion," in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC '13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.
[28] L. Busoniu, R. Babuska, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.
[29] C. Lu, H. Chen, and S. Grant-Muller, "Indirect reinforcement learning for incident-responsive ramp control," Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.
[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[31] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[32] J. R. Kok and N. Vlassis, "Collaborative multiagent reinforcement learning by payoff propagation," Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.
[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, "Coordinated reinforcement learning," in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.
[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, "Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.
[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: a survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[36] C. F. Daganzo, "The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory," Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.
[37] J. Haddad, M. Ramezani, and N. Geroliminis, "Cooperative traffic control of a mixed network with two urban regions and a freeway," Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.
[38] H. Mongeot and J.-B. Lesort, "Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory," Transportation Research Record, vol. 1710, pp. 58–68, 2000.
[39] Transport Simulation Systems, Aimsun User's Manual 6.1, TTS, Barcelona, Spain, 2010.
[40] Highways England, "Hatris homepage," 2013, https://www.hatris.co.uk.
[41] E. Cascetta, "Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator," Transportation Research Part B, vol. 18, no. 4-5, pp. 289–299, 1984.
[42] L. Zhang and D. Levinson, "Balancing efficiency and equity of ramp meters," Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.
[43] A. Kotsialos and M. Papageorgiou, "Efficiency and equity properties of freeway network-wide ramp metering with AMOC," Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.
4.2 Modified ACTM. Given the three incident-related parameters, the traffic dynamics in each cell can be derived from the fundamental diagram illustrated in Figure 3(b). They are represented by the following equations.
Departure rates of the mainline and on-ramp:

$$
\begin{aligned}
d^{k}_{j,\mathrm{main}} = \min\Big\{ & \lambda_1 \cdot \frac{v_j}{l_j} \cdot \left(q^{k}_{j,\mathrm{main}} + \theta_j \cdot d^{k}_{j,\mathrm{on}} \cdot \Delta t - d^{k}_{j,\mathrm{off}} \cdot \Delta t\right), \\
& \lambda_2 \cdot \frac{w_{j-1}}{l_j} \cdot \left(q^{\max}_{j-1,\mathrm{main}} - q^{k}_{j-1,\mathrm{main}} - \theta_j \cdot d^{k}_{j-1,\mathrm{on}} \cdot \Delta t\right), \\
& \lambda_3 \cdot d^{k,\max}_{j,\mathrm{main}} \Big\}, \\
d^{k}_{j,\mathrm{on}} = {} &
\begin{cases}
\min\left\{ \dfrac{q^{k}_{j,\mathrm{main}} + a^{k}_{j,\mathrm{on}} \cdot \Delta t}{\Delta t},\ \dfrac{\eta_j \cdot \left(q^{\max}_{j,\mathrm{main}} - q^{k}_{j,\mathrm{main}}\right)}{\Delta t},\ \dfrac{c^{k}_{i}}{\Delta t} \right\}, & \text{if } j \text{ is a metered on-ramp cell}, \\[2ex]
\min\left\{ \dfrac{q^{k}_{j,\mathrm{main}} + a^{k}_{j,\mathrm{on}} \cdot \Delta t}{\Delta t},\ \dfrac{\eta_j \cdot \left(q^{\max}_{j,\mathrm{main}} - q^{k}_{j,\mathrm{main}}\right)}{\Delta t} \right\}, & \text{if } j \text{ is an unmetered on-ramp cell}.
\end{cases}
\end{aligned}
\tag{5}
$$
Conservation of the mainline and on-ramp:

$$
\begin{aligned}
q^{k+1}_{j,\mathrm{main}} &= q^{k}_{j,\mathrm{main}} + \Delta t \cdot \left(a^{k}_{j,\mathrm{main}} + d^{k}_{j,\mathrm{on}} - d^{k}_{j,\mathrm{main}} - d^{k}_{j,\mathrm{off}}\right), \\
q^{k+1}_{j,\mathrm{on}} &= q^{k}_{j,\mathrm{on}} + \Delta t \cdot \left(a^{k}_{j,\mathrm{on}} - d^{k}_{j,\mathrm{on}}\right),
\end{aligned}
\tag{6}
$$
where the symbols are defined as follows:
- $a^k_{j,\mathrm{main}}$ and $d^k_{j,\mathrm{main}}$ are the mainline arrival and departure rates for cell $j$ at step $k$.
- $a^k_{j,\mathrm{on}}$ and $d^k_{j,\mathrm{on}}$ are the on-ramp arrival and departure rates in cell $j$ at step $k$.
- $d^k_{j,\mathrm{off}}$ is the off-ramp departure rate for cell $j$ at step $k$. If cell $j$ is not an off-ramp cell, $d^k_{j,\mathrm{off}} = 0$.
- $q^k_{j,\mathrm{main}}$ is the number of vehicles on the mainline of cell $j$ at step $k$, and $q^{\max}_{j,\mathrm{main}}$ is the maximum of this number, limited by the mainline space of cell $j$.
- $q^k_{j,\mathrm{on}}$ and $q^{\max}_{j,\mathrm{on}}$ are, respectively, the current (at step $k$) and maximum numbers of vehicles in the on-ramp of cell $j$.
- $\Delta t$ (min) is the time interval between two consecutive time steps.
- $c^k_i$ is the metering rate for the on-ramp cell of the $i$th motorway section at step $k$.
- $\eta_j \in [0, 1]$ is the flow allocation parameter of cell $j$.
- $\theta_j \in [0, 1]$ is the blending parameter for traffic flow from the on-ramp to the mainline of cell $j$.

All arrival and departure rates are expressed in veh/min in this study.
For motorway section $i$ with $J$ cells, the number of vehicles on the mainline is $q^k_{i,\mathrm{main}} = \sum_{j=1}^{J} q^k_{j,\mathrm{main}}$. The number of vehicles in the on-ramp of section $i$ is $q^k_{i,\mathrm{on}} = q^k_{j,\mathrm{on}}$, where $j$ is the on-ramp cell of the section. Likewise, the maximum numbers of vehicles on the mainline and in the on-ramp of section $i$ are $q^{\max}_{i,\mathrm{main}} = \sum_{j=1}^{J} q^{\max}_{j,\mathrm{main}}$ and $q^{\max}_{i,\mathrm{on}} = q^{\max}_{j,\mathrm{on}}$.
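The sketch below covers the conservation step (6) and the section-level aggregation just described, treating the departure rates from (5) as given inputs. The `CellRates` container, the function names, and the example numbers are hypothetical. As in the text, rates are in veh/min and $\Delta t$ is in minutes.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class CellRates:
    """Rates of one cell j at step k (veh/min); names are illustrative."""
    a_main: float        # mainline arrival rate a^k_{j,main}
    d_main: float        # mainline departure rate d^k_{j,main}
    a_on: float = 0.0    # on-ramp arrival rate a^k_{j,on}
    d_on: float = 0.0    # on-ramp departure rate d^k_{j,on}
    d_off: float = 0.0   # off-ramp departure rate d^k_{j,off} (0 if no off-ramp)


def conserve(q_main: float, q_on: float, r: CellRates, dt: float) -> Tuple[float, float]:
    """Apply the two conservation equations in (6) to one cell for one step."""
    q_main_next = q_main + dt * (r.a_main + r.d_on - r.d_main - r.d_off)
    q_on_next = q_on + dt * (r.a_on - r.d_on)
    return q_main_next, q_on_next


def section_counts(q_main_cells: Sequence[float], q_on_ramp_cell: float) -> Tuple[float, float]:
    """Section i: mainline count sums all J cells; on-ramp count is that of its on-ramp cell."""
    return sum(q_main_cells), q_on_ramp_cell


# Hypothetical cell with both an on-ramp and an off-ramp, dt = 1 min.
rates = CellRates(a_main=80.0, d_main=85.0, a_on=10.0, d_on=12.0, d_off=5.0)
print(conserve(100.0, 20.0, rates, dt=1.0))      # (102.0, 18.0)
print(section_counts([30.0, 40.0, 50.0], 12.0))  # (120.0, 12.0)
```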
4.3 Estimation of Arrival and Departure Rates. For each planning step between two real control steps, the ACTM takes three kinds of rates as inputs: the arrival rates of the boundary cells in each motorway section (such as $j+2$, $j+5$, and $j+8$), the arrival rates of all on-ramps, and the departure rates of the off-ramps. Because the planning process is short (10 steps), we assume that these rates remain stable during planning and estimate them directly from the most recent flow data collected by the detectors. The estimation uses the method described by Wang [16], which simply averages the most recently observed data to obtain the predicted flow rates. In our model, we use the flow data collected over the last $N$ time steps ($N = 5$). The three rates can therefore be calculated by
$$
\begin{aligned}
a^{k,k+1}_{i,\mathrm{main}} &= a^{k,k+1}_{j,\mathrm{main}} = \frac{\sum_{n=0}^{N-1} a^{k-n}_{j,\mathrm{main}}}{N}, && \text{if } j \text{ is the boundary cell}, \\
a^{k,k+1}_{i,\mathrm{on}} &= a^{k,k+1}_{j,\mathrm{on}} = \frac{\sum_{n=0}^{N-1} a^{k-n}_{j,\mathrm{on}}}{N}, && \text{if } j \text{ is the on-ramp cell}, \\
d^{k,k+1}_{j,\mathrm{off}} &= \frac{\sum_{n=0}^{N-1} d^{k-n}_{j,\mathrm{off}}}{N}, && \text{if } j \text{ is the off-ramp cell},
\end{aligned}
\tag{7}
$$
where $a^{k,k+1}_{j,\mathrm{main}}$ and $a^{k,k+1}_{j,\mathrm{on}}$ are the estimated mainline and on-ramp arrival rates of cell $j$ for the planning steps between real steps $k$ and $k+1$, and $d^{k,k+1}_{j,\mathrm{off}}$ is the estimated off-ramp departure rate of cell $j$. If cell $j$ is the boundary cell of motorway section $i$, its arrival or departure rate is also the arrival or departure rate of motorway section $i$.
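The estimator in (7) is a moving average over the last $N$ observations. The sketch below assumes one observation per real control step. The class name `RateEstimator` and the sample rates are hypothetical. Until $N$ observations are available, the sketch averages whatever it has, a case that (7) does not specify.

```python
from collections import deque
from typing import Deque


class RateEstimator:
    """Moving-average forecast of one boundary, on-ramp, or off-ramp rate, as in (7)."""

    def __init__(self, n_steps: int = 5) -> None:
        if n_steps < 1:
            raise ValueError("n_steps must be positive")
        # Keep only the N most recent observations; older ones drop out automatically.
        self.history: Deque[float] = deque(maxlen=n_steps)

    def observe(self, rate: float) -> None:
        """Record the rate detected at the latest real control step."""
        self.history.append(rate)

    def estimate(self) -> float:
        """Rate assumed constant for all planning steps between k and k+1."""
        if not self.history:
            raise ValueError("no observations yet")
        return sum(self.history) / len(self.history)


# Hypothetical on-ramp arrival rates (veh/min) from six consecutive steps.
est = RateEstimator(n_steps=5)
for rate in [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]:
    est.observe(rate)
print(est.estimate())  # mean of the last five observations: 16.0
```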
5 Definition of RL Elements
Besides the architecture and models defined in Section 3, three basic elements must be specified to form an RL problem: the environment state, the control action, and the reward function. This section details these three elements and the relevant algorithm.
5.1 Environment State. The environment state of a motorway section is composed of a mainline state and an on-ramp state. The state space is obtained with the same method as in [27, 29]. For the mainline of motorway section $i$, the number of vehicles ranges from 0 to the maximum $q^{\max}_{i,\mathrm{main}}$. This range is uniformly divided into $n_i$ intervals, and each interval represents one mainline state. Each mainline section can therefore be represented by a state set $S_{i,\mathrm{main}}$ with $n_i$ states. Similarly, on-ramp traffic is represented by a state set $S_{i,\mathrm{on}}$ with $m_i$ states, based on the maximum number of vehicles $q^{\max}_{i,\mathrm{on}}$. The values of $n_i$ and $m_i$ should be adjusted for each motorway section according to its length. If motorway section $i$ is the critical section, the external traffic environment is then represented by
$$
S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}}, \quad s^k_i \in S_i, \tag{8}
$$
which contains $n_i \cdot m_i$ states. At each time step, a state $s^k_i$ is selected from $S_i$ as the environment state. If motorway section $i$ is a normal section, the state sets of its neighbouring agent must also be incorporated. The traffic state is then represented by
$$
S_i = S_{i,\mathrm{main}} \times S_{i,\mathrm{on}} \times S_{i-1,\mathrm{main}} \times S_{i-1,\mathrm{on}}, \quad s^k_i \in S_i, \tag{9}
$$
which contains $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ states.
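The discretisation behind (8) and (9) works in two stages: each vehicle count is mapped to one of the uniform intervals, and the interval indices are then combined into a single state index. The function names, the flat-index encoding, and the choices $n_i = 10$ and $m_i = 6$ in the example are hypothetical. The maxima 1860 and 108 are the section 1 values given in Section 6.1.

```python
def interval_index(q: float, q_max: float, n_intervals: int) -> int:
    """Map a vehicle count in [0, q_max] to one of n uniform intervals (0..n-1)."""
    if q_max <= 0 or n_intervals <= 0:
        raise ValueError("q_max and n_intervals must be positive")
    q = min(max(q, 0.0), q_max)          # clip out-of-range detector values
    idx = int(q / q_max * n_intervals)
    return min(idx, n_intervals - 1)     # q == q_max belongs to the last interval


def section_state(q_main: float, q_on: float, q_max_main: float, q_max_on: float,
                  n: int, m: int) -> int:
    """Index of s_i in S_main x S_on, as in (8): one of n*m states."""
    return interval_index(q_main, q_max_main, n) * m + interval_index(q_on, q_max_on, m)


def joint_state(own: int, neighbour: int, neighbour_size: int) -> int:
    """Index in S_i x S_{i-1}, as in (9); neighbour_size = n_{i-1} * m_{i-1}."""
    return own * neighbour_size + neighbour


# Hypothetical discretisation with n = 10 mainline and m = 6 on-ramp intervals.
s_own = section_state(930, 54, 1860, 108, n=10, m=6)
s_nb = section_state(0, 0, 1860, 108, n=10, m=6)
print(s_own)                          # 5 * 6 + 3 = 33
print(joint_state(s_own, s_nb, 60))   # 33 * 60 + 0 = 1980
```

With this encoding, the state count is $n_i m_i$ for (8) and $n_i m_i n_{i-1} m_{i-1}$ for (9), which matches the counts stated above.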
5.2 Control Action. In a ramp control problem, the control action regulates the number of vehicles entering the mainline in each control step. Similar to [29], we adopt flow control as the control action. It is represented by an action set $C = \{4, 6, 8, 10, 12, 14, 16, 18, 20\}$ containing 9 flow rates between the minimum (4 veh/min) and maximum (20 veh/min) values.
An RL agent has two basic behaviours: exploitation and exploration. Exploitation means that the agent takes the control action expected to yield the most reward based on previous experience. Exploration means that the agent tries other actions, which may yield less reward. To balance these two behaviours, we use the $\varepsilon$-greedy policy to select control actions [30]. At each control step, this policy takes a random action with probability $\varepsilon$ and chooses the greedy action (the one with the maximum $Q$ value) with probability $1-\varepsilon$. Formally, the action selection probability is
$$
p\left(c^k_i \mid s^k_i\right) =
\begin{cases}
1-\varepsilon, & \text{if } c^k_i = \arg\max_{c^k_i} Q^{k-1}\left(s^k_i, c^k_i\right), \\
\varepsilon, & \text{otherwise}.
\end{cases}
\tag{10}
$$
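The selection rule described above can be sketched over the action set $C$ as follows. The dictionary-based row of $Q$ values, the function name, and the example values are hypothetical. In this sketch the random draw can also land on the greedy action, so the greedy action is actually chosen with probability $1-\varepsilon+\varepsilon/|C|$. This is a common way of implementing $\varepsilon$-greedy.

```python
import random
from typing import Mapping, Sequence

# Action set C: metering flow rates in veh/min, as listed in Section 5.2.
ACTIONS = (4, 6, 8, 10, 12, 14, 16, 18, 20)


def epsilon_greedy(q_row: Mapping[int, float], actions: Sequence[int],
                   epsilon: float, rng: random.Random) -> int:
    """Pick a random action with probability epsilon, otherwise an arg-max action."""
    if rng.random() < epsilon:
        return rng.choice(actions)                          # exploration
    return max(actions, key=lambda c: q_row.get(c, 0.0))  # exploitation (first max on ties)


# Hypothetical Q values for one state s_i^k, under which 12 veh/min looks best.
q_row = {c: -abs(c - 12) / 10 for c in ACTIONS}
rng = random.Random(0)
print(epsilon_greedy(q_row, ACTIONS, epsilon=0.0, rng=rng))  # 12
print(epsilon_greedy(q_row, ACTIONS, epsilon=0.1, rng=rng))  # usually 12
```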
5.3 Reward Function. The reward function calculates the immediate reward after a specific action is executed at each time step, and this reward guides the agent towards its objective. A common objective of traffic control systems is to minimise total travel time. We therefore define the reward so that, through the learning process, the agent learns to minimise the total time spent (TTS).
TTS is the total time spent by vehicles in the network during a period of time. In our case, TTS is obtained from the following equation:
$$
\mathrm{TTS} = \Delta t \cdot \sum_{k=0}^{K} \left(q^k_{i,\mathrm{main}} + q^k_{i,\mathrm{on}}\right). \tag{11}
$$
In the above equation, $\Delta t$ is a fixed value. Minimising TTS is therefore equivalent to minimising the number of vehicles on the network, $\sum_{k=0}^{K} \left(q^k_{i,\mathrm{main}} + q^k_{i,\mathrm{on}}\right)$. To minimise this value, the reward function defined here is composed of two negative rewards that act as penalties for vehicles on the mainline and on the on-ramp. The formal reward function at step $k$ is defined separately for two situations.
(1) Motorway Section $i$ Is the Critical Section. In this case, the reward is
$$
R^k_i\left(s^k_i, c^k_i\right) =
\begin{cases}
-\dfrac{q^k_{i,\mathrm{main}} + q^k_{i,\mathrm{on}}}{q^{\max}_{i,\mathrm{main}} + q^{\max}_{i,\mathrm{on}}}, & \text{if } q^k_{i,\mathrm{main}} < q^{\max}_{i,\mathrm{main}},\ q^k_{i,\mathrm{on}} < q^{\max}_{i,\mathrm{on}}, \\[2ex]
-1, & \text{otherwise},
\end{cases}
\tag{12}
$$
where $R^k_i(s^k_i, c^k_i)$ is the immediate reward for agent $i$ in state $s^k_i$ when executing action $c^k_i$ at control step $k$. The maxima $q^{\max}_{i,\mathrm{main}}$ and $q^{\max}_{i,\mathrm{on}}$ normalise the numbers of vehicles on the mainline and on-ramp, which guarantees that $R^k_i(s^k_i, c^k_i) \in [-1, 0]$.
(2) Motorway Section $i$ Is Not the Critical Section. Here, a new negative reward is introduced to maintain system equity. Its purpose is to keep the on-ramp queues, and hence the travel times, at different on-ramps close to each other:
$$
R^k_i\left(s^k_i, c^k_{i-1}, c^k_i\right) =
\begin{cases}
-\dfrac{q^k_{i,\mathrm{main}} + q^k_{i,\mathrm{on}}}{q^{\max}_{i,\mathrm{main}} + q^{\max}_{i,\mathrm{on}}} - \dfrac{\left|q^k_{i,\mathrm{on}} - q^k_{i-1,\mathrm{on}}\right|}{\max\left(q^{\max}_{i,\mathrm{on}}, q^{\max}_{i-1,\mathrm{on}}\right)}, & \text{if } q^k_{i,\mathrm{main}} < q^{\max}_{i,\mathrm{main}},\ q^k_{i,\mathrm{on}} < q^{\max}_{i,\mathrm{on}}, \\[2ex]
-2, & \text{otherwise}.
\end{cases}
\tag{13}
$$
For each agent $i$ and episode do
    $L \leftarrow \mathrm{CEIL}(\mathrm{IncidentDuration}/\Delta t)$
    IF $i$ is the critical section
        Initialise $R^0_i(s_i, c_i)$, $Q^0_i(s_i, c_i)$
    ELSE
        Initialise $R^0_i(s_i, c_{i-1}, c_i)$, $Q^0_i(s_i, c_{i-1}, c_i)$, $P^0_i(s_i, c_{i-1})$
    For each control step $k \in K$ do (Loop I)
        (i) get detected data from each cell $j$: $a^k_{j,\mathrm{main}}$, $d^k_{j,\mathrm{main}}$, $d^k_{j,\mathrm{off}}$, $a^k_{j,\mathrm{on}}$, $d^k_{j,\mathrm{on}}$
        (ii) get state $s^k_i$ through (8) and (9)
        (iii) get action $c^k_i$ by the $\varepsilon$-greedy policy (10)
        (iv) get $q^k_{j,\mathrm{main}}$, $q^k_{j,\mathrm{on}}$ through (6) and do $q^k_{i,\mathrm{main}} \leftarrow \sum_{j=1}^{J} q^k_{j,\mathrm{main}}$, $q^k_{i,\mathrm{on}} \leftarrow q^k_{j,\mathrm{on}}$
        IF $i$ is the critical section
            update $R^k_i(s^k_i, c^k_i)$, $Q^k_i(s^k_i, c^k_i)$ through (2) and (12)
        ELSE
            update $R^k_i(s^k_i, c^k_{i-1}, c^k_i)$, $Q^k_i(s^k_i, c^k_{i-1}, c^k_i)$, $p(c^k_{i-1} \mid s^k_i)$ through (4) and (13)
        IF $s^k_i = s_{\mathrm{initial}}$ and $k + 1 \ge L$
            end the algorithm
        ELSE
            get $a^{k,k+1}_{i,\mathrm{main}}$, $a^{k,k+1}_{i,\mathrm{on}}$, $d^{k,k+1}_{j,\mathrm{off}}$ by (7); do $l \leftarrow k$, $s^l_i \leftarrow s^k_i$, $q^l_{j,\mathrm{main}} \leftarrow q^k_{j,\mathrm{main}}$, $q^l_{j,\mathrm{on}} \leftarrow q^k_{j,\mathrm{on}}$; and start Loop II
            For each planning step $l \in L$ do (Loop II)
                (i) generate flow rates for each cell $j$: $d^l_{j,\mathrm{main}}$, $d^l_{j,\mathrm{on}}$ through (5)
                (ii) get the state $s^l_i$
                (iii) get $q^l_{j,\mathrm{main}}$, $q^l_{j,\mathrm{on}}$ and do $q^l_{i,\mathrm{main}} \leftarrow \sum_{j=1}^{J} q^l_{j,\mathrm{main}}$, $q^l_{i,\mathrm{on}} \leftarrow q^l_{j,\mathrm{on}}$
                (iv) get action $c^l_i$ by the $\varepsilon$-greedy policy
                IF $i$ is the critical section
                    update $R^l_i(s^l_i, c^l_i)$, $Q^l_i(s^l_i, c^l_i)$
                ELSE
                    update $R^l_i(s^l_i, c^l_{i-1}, c^l_i)$, $Q^l_i(s^l_i, c^l_{i-1}, c^l_i)$, $p(c^k_{i-1} \mid s^k_i)$
                IF ($l = k + 9$) or ($s^l_i = s_{\mathrm{initial}}$ and $l + 1 \ge L$)
                    go back to Loop I
                ELSE
                    repeat Loop II
            EndFor
    EndFor
EndFor

Algorithm 1: Algorithm for Dyna-MARL.
Compared with (12), (13) adds a new term, $\left|q^k_{i,\mathrm{on}} - q^k_{i-1,\mathrm{on}}\right| / \max\left(q^{\max}_{i,\mathrm{on}}, q^{\max}_{i-1,\mathrm{on}}\right)$, which penalises the difference between the on-ramp queues of motorway sections $i$ and $i-1$. Because two adjacent agents cooperate in this situation, $R^k_i(s^k_i, c^k_{i-1}, c^k_i)$ depends on two control actions, $c^k_{i-1}$ and $c^k_i$. The function $\max(\cdot,\cdot)$ returns the larger of its two arguments and is used here for normalisation.
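Both reward definitions translate directly into code. The sketch below implements (12) and (13). The function names and the example vehicle counts are hypothetical. The maxima 1860, 2880, 108, and 90 are the section values from Section 6.1; for illustration, section 1 is taken as the critical section and as the neighbour $i-1$ of section 2.

```python
def reward_critical(q_main: float, q_on: float, q_max_main: float, q_max_on: float) -> float:
    """Reward (12) for the critical section; lies in [-1, 0]."""
    if q_main < q_max_main and q_on < q_max_on:
        return -(q_main + q_on) / (q_max_main + q_max_on)
    return -1.0  # mainline or on-ramp is full


def reward_normal(q_main: float, q_on: float, q_on_nb: float,
                  q_max_main: float, q_max_on: float, q_max_on_nb: float) -> float:
    """Reward (13): congestion penalty of (12) plus an on-ramp queue-equity penalty.

    q_on_nb and q_max_on_nb refer to the neighbouring section i-1.
    """
    if q_main < q_max_main and q_on < q_max_on:
        congestion = (q_main + q_on) / (q_max_main + q_max_on)
        equity = abs(q_on - q_on_nb) / max(q_max_on, q_max_on_nb)
        return -congestion - equity
    return -2.0  # mainline or on-ramp is full


# Hypothetical counts: section 1 is critical, section 2 is a normal section.
print(reward_critical(930, 54, 1860, 108))        # -(984 / 1968) = -0.5
print(reward_normal(1440, 45, 0, 2880, 90, 108))  # -0.5 - 45/108
```

The equity term grows when one on-ramp stores far more vehicles than its neighbour. This is how the coordinated agents are discouraged from pushing the queue onto a single ramp.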
5.4 Description of the Algorithm. This subsection develops and describes the Dyna-MARL algorithm, which is based on the Dyna-Q architecture and the RL elements defined in the previous subsections. Dyna-MARL consists of two main loops, corresponding to the direct RL and planning processes shown in Figure 1: between two real control steps in Loop I, 10 planning steps are run in Loop II. The pseudocode of Dyna-MARL is given in Algorithm 1.
An episode in Dyna-MARL represents one control cycle. It starts when the incident occurs and terminates when the traffic state returns to the initial state $s_{\mathrm{initial}}$, that is, the traffic state before the incident occurred. The incident duration is assumed to be known in advance.
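The sketch below makes the interplay of the two loops concrete with a generic single-agent tabular Dyna-Q learner. At each real control step it performs one direct one-step Q-learning update (Loop I). It then performs a fixed number of planning updates along a simulated trajectory that starts from the current state (Loop II). This is a simplified illustration, not the Dyna-MARL implementation: the neighbour action $c_{i-1}$, the probability table $P$, and the ACTM are omitted. The model is any hypothetical callable that returns a reward and a next state. The parameter values ($\alpha = 0.2$, $\gamma = 0.8$, $\varepsilon = 0.1$, and 10 planning steps) follow Sections 5.4 and 6.3. The class and method names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, DefaultDict, Hashable, Sequence, Tuple

State = Hashable
# Hypothetical model interface: (state, action) -> (reward, next state).
Model = Callable[[State, int], Tuple[float, State]]


class DynaQAgent:
    """Tabular Dyna-Q sketch: direct Q-learning plus model-based planning."""

    def __init__(self, actions: Sequence[int], alpha: float = 0.2, gamma: float = 0.8,
                 epsilon: float = 0.1, n_planning: int = 10, seed: int = 0) -> None:
        self.actions = tuple(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.n_planning = n_planning
        self.q: DefaultDict[Tuple[State, int], float] = defaultdict(float)  # Q = 0 initially
        self.rng = random.Random(seed)

    def greedy(self, s: State) -> int:
        """Action with the largest Q value (first one on ties)."""
        return max(self.actions, key=lambda c: self.q[(s, c)])

    def act(self, s: State) -> int:
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.greedy(s)

    def _update(self, s: State, c: int, r: float, s_next: State) -> None:
        """One-step Q-learning update."""
        best_next = max(self.q[(s_next, c2)] for c2 in self.actions)
        self.q[(s, c)] += self.alpha * (r + self.gamma * best_next - self.q[(s, c)])

    def step(self, s: State, c: int, r: float, s_next: State, model: Model) -> None:
        self._update(s, c, r, s_next)      # Loop I: learn from the real transition
        s_plan = s                         # Loop II starts from s^l <- s^k
        for _ in range(self.n_planning):   # Loop II: learn from simulated transitions
            c_plan = self.act(s_plan)
            r_plan, s_plan_next = model(s_plan, c_plan)
            self._update(s_plan, c_plan, r_plan, s_plan_next)
            s_plan = s_plan_next


# Hypothetical one-state toy model in which a metering rate of 12 veh/min is best.
def toy_model(s: State, c: int) -> Tuple[float, State]:
    return -abs(c - 12) / 10.0, s


agent = DynaQAgent((4, 6, 8, 10, 12, 14, 16, 18, 20))
s = 0
for _ in range(20):
    c = agent.act(s)
    r, s_next = toy_model(s, c)   # stands in for the real traffic response
    agent.step(s, c, r, s_next, toy_model)
    s = s_next
print(agent.greedy(s))  # 12
```

Here the toy model is deterministic and also stands in for the real system. In Dyna-MARL, Loop II would instead propagate the ACTM, (5) to (7), from the current traffic state. It would stop after 10 planning steps or when the state returns to $s_{\mathrm{initial}}$.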
6 Case Study and Results
The case study uses one of the metered motorway segments (southbound direction) of the M6 in the UK. This segment lies between junction 21A (J21A) and junction 25 (J25) and is approximately 12.4 km long (see Figure 4). Taking the noncontrolled (NC) situation as the baseline, we designed a series of experiments to compare the proposed Dyna-MARL algorithm with Isolated RL (Q-learning without coordination). The experiments and results are described as follows.
6.1 Partitions of the Test Segment. The test motorway segment has a three-lane mainline, three metered on-ramps, and five off-ramps. It is simulated in AIMSUN [39], a microscopic traffic simulation package. Based on the detector locations and road layout, the whole segment is divided into three sections, each containing one metered on-ramp. Motorway section 3 is divided into 4 cells, and motorway sections 1 and 2 are each divided into 3 cells. Figure 5 shows the partition of each section. Based on the section lengths, the maximum numbers of vehicles in each mainline section and on-ramp are $q^{\max}_{1,\mathrm{main}} = 1860$, $q^{\max}_{2,\mathrm{main}} = 2880$, $q^{\max}_{3,\mathrm{main}} = 2880$, $q^{\max}_{1,\mathrm{on}} = 108$, $q^{\max}_{2,\mathrm{on}} = 90$, and $q^{\max}_{3,\mathrm{on}} = 120$.

Figure 4: Test motorway segment of M6 (junctions J21A to J25; mainline origin O and destination D; on-ramp origins O1 (from A579), O2 (from A580), and O3 (from A49); off-ramp destinations D1 and D2 (to M62), D3 (to A579), D4 (to A580), and D5 (to A58)).

Figure 5: Partitions of test segment (sections 1 to 3 between J21A and J25, with section and cell boundaries, road section lengths in km, flow direction, and incident locations a, b, and c).
6.2 Real Data Source. The case study uses real data from 17 loop detectors located in the motorway segment, on both the mainline and the on- and off-ramps. These data can be extracted from the Traffic Information System (HATRIS) [40]. The traffic counts are averaged from April 2012 to March 2013 at 15-minute intervals. Only working-day data (Monday to Friday) are used because traffic load drops dramatically at weekends. Figure 6 presents some of the detector data collected from the mainline and the three on-ramps. It shows two peak periods in daily traffic operation: an AM peak (around 07:00:00–09:00:00) and a PM peak (around 16:00:00–18:00:00).
At the test site, ramp metering operates only during peak hours. It is also valuable to test the proposed algorithm under high demand: if it works under a heavy traffic load, it should also be useful in common situations. The case study therefore considers the AM peak period with its heavy traffic load. Specifically, we use the averaged AM peak traffic data collected from TRADS to estimate the OD (origin-destination) matrix for the simulation. AIMSUN performs the estimation with a model proposed in [41], with the number of iterations set to 1000 to achieve convergence. Table 1 shows the OD matrix estimated from real traffic data.

Figure 6: Real averaged traffic data (15-minute traffic counts over a full day for sections 1 to 3 and on-ramps 1 to 3).

Table 1: OD matrix estimated.

Origins/destinations   D      D5    D4    D3    D2     D1     Totals
O                      2089   375   686   728   1169   771    5818
O3                     875    65    212   193   117    46     1507
O2                     886    0     0     61    315    216    1477
O1                     824    0     0     0     292    226    1343
Totals                 4675   440   898   981   1893   1258   10146

Table 2: Parameters for ACTM.

Parameter                     Value
$d^{\max}_{\mathrm{main}}$    6300 veh/h
$v$                           107 km/h
$w$                           11.6 km/h
$\theta$                      0.5
$\eta$                        0.16
$\lambda_1$                   0.55
$\lambda_2$                   0.9
$\lambda_3$                   0.6
6.3 Incident Scenarios. Because real incident data are difficult to capture, we simulate incident scenarios in AIMSUN. To ensure that every ramp meter is active during the incident, the incident is placed near the most downstream motorway section, that is, motorway section 1. Three incident scenarios, A, B, and C, are designed. They correspond to three different incident locations, a, b, and c (illustrated in Figure 5), respectively.
Each simulation experiment lasts one and a half hours, from 07:00:00 to 08:30:00 during the AM peak period. After 30 minutes of normal operation for warm-up, the incident is triggered at 07:30:00 and lasts for 30 minutes. The preliminary experiments in this paper consider an incident that blocks one lane; the parameters introduced here can also be adjusted for multiple-lane blockages. The incident extends over 50 m, and this extent is assumed to remain constant throughout the incident.
The learning-related parameters are set to typical values [30]: $\alpha = 0.2$, $\gamma = 0.8$, and $\varepsilon = 0.1$. The other ACTM parameters are calibrated and summarised in Table 2. All cells share the same $\theta$ and $\eta$.
6.4 Results. Dyna-MARL, Isolated RL, and NC are compared in three respects: density evolution, general indicators, and system equity. The experimental results are described as follows.
(1) Density Evolution. Figure 7 shows four dense areas during traffic operation. Three of them are near the on-ramp entrances (at motorway positions of around 0.5 km, 5 km, and 10 km) and are caused by heavy traffic loads from the on-ramps. The fourth, close to the segment end, forms because of the incident.
In scenario A, the incident location is close to on-ramp 1 (O1). Without control, this incident leads to severe congestion, which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). Under this scenario, Isolated RL cannot alleviate the incident-induced congestion effectively (see Figure 7(b)). At the beginning of congestion formation, without coordination, only the nearest ramp controller reacts to the congestion. Because the storage space of a single on-ramp is limited, one ramp controller is insufficient to dissolve this congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of the three on-ramps to deal with the incident-induced congestion. In this way, mainline congestion is restricted to a smaller area and does not propagate to motorway section 2 (see Figure 7(c)).
For scenarios B and C, the incidents are near the motorway end and far from on-ramp 1. Because they do not block on-ramp 1, these incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL work well in easing congestion on the mainline. As shown in Figures 7(e)–7(i), compared with the NC situation, both Isolated RL and Dyna-MARL restrict the congestion to a small range near the on-ramp entrances.
(2) General Indicators. In this comparison, several general indicators, namely total travel time (should be reduced), total throughput (should be increased), and total CO2 emission (should be reduced), are used to show how the proposed system can benefit road users. These indicators are widely used in the transport community to test the performance of newly developed traffic control systems.
As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b): Dyna-MARL increases the total throughput by up to 2.3% (see Figure 8(d)) and outperforms Isolated RL in all three scenarios. In scenario B, Isolated RL even fails to improve the total throughput. For total CO2 emission (shown in Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with reductions of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL performs much better than Isolated RL.
The above comparison shows that Dyna-MARL outperforms Isolated RL in almost all scenarios and for almost all indicators.
(3) System Equity. Although the general indicators in comparison (2) are effective for testing the performance of different systems, they cannot measure system equity, which is also an important aspect of system performance. In this paper, we consider only spatial equity, defined as a measure of the equity of user delays across different on-ramps [42]. We assume that road users from all three on-ramps are equally important. If users from different on-ramps experience similar travel times, the control system is considered equitable; conversely, a large difference between on-ramp queues indicates a highly inequitable system. In [43], the variance of travel time across on-ramps is used as an indicator to measure
Mathematical Problems in Engineering 11
[Figure 7, panels (a) to (i): three-dimensional density profiles; axes: motorway length (km), absolute time (min), and motorway density (veh/km/lane); colour scale: density (veh/km/lane).]
Figure 7: Density profiles for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.
system equity. Similar to [43], and for ease of comparison, we use the standard deviation in our case. This indicator is defined as
$$\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n}\left[\bar{t}^{k} - t_i^{k}\right]^{2}}{n}}, \qquad (14)$$
where $\mathrm{SD}(k)$ is the standard deviation of travel time across the on-ramps at time step $k$, $t_i^k$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^k$ is the average total travel time of the $n$ on-ramps at step $k$.
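Equation (14) is a population standard deviation (divisor $n$, not $n-1$) of the on-ramp travel times at a single time step. A minimal Python sketch follows; the function name `travel_time_sd` is hypothetical, and the input is assumed to be the list of estimated total travel times $t_i^k$ of the $n$ on-ramps at one step $k$.

```python
import math
from typing import Sequence


def travel_time_sd(on_ramp_times: Sequence[float]) -> float:
    """Standard deviation of on-ramp travel times at one step, as in Eq. (14).

    on_ramp_times[i] plays the role of t_i^k; the mean plays the role of
    bar{t}^k. The divisor is n (population form), matching Eq. (14).
    """
    n = len(on_ramp_times)
    if n == 0:
        raise ValueError("need at least one on-ramp")
    mean = sum(on_ramp_times) / n  # bar{t}^k
    squared_dev = sum((mean - t) ** 2 for t in on_ramp_times)
    return math.sqrt(squared_dev / n)
```

For example, two on-ramps with hypothetical travel times of 1 h and 3 h give SD = 1 h, whereas identical travel times give 0, the perfectly equitable case.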
The system equity comparison is shown in Figure 9. In the NC situation, good equity is maintained in scenarios B and C because entering vehicles are not restricted (see Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by the congestion in scenario A, a long queue forms, leading to imbalance and hence inequity for users on different on-ramps (see Figure 9(a)). Among the controlled cases, Isolated RL performs poorly in all scenarios, because the ramp controller near the congestion restricts its traffic much more severely than the other controllers do. Thanks to its coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).
[Figure 8 panels: (a) total travel time (h), (b) total throughput (veh), and (c) total CO2 emission (kg, ×10^4), each for NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C; (d) reduction from NC (%) in total travel time, total throughput, and total CO2 emission for each scenario.]
Figure 8: Comparison of general measures for different scenarios.
7 Conclusions and Future Work
A Dyna-Q based multiagent reinforcement learning method for motorway ramp control, referred to as Dyna-MARL, has been developed in this paper. Dyna-MARL is compared with Isolated RL (Q-learning without coordination) and the noncontrolled situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.
From a series of simulation-based experiments, we draw the following conclusions: (1) Isolated RL can improve motorway performance by increasing total throughput and reducing total travel time and CO2 emission, but this improvement comes at the expense of poor system equity across on-ramps; (2) with a suitable coordination strategy, Dyna-MARL achieves much higher system equity; (3) in addition to system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios for all indicators, which means that Dyna-MARL can deal with network-wide problems effectively.
Although the simulation tests have shown some positive results regarding the performance of Dyna-MARL, the current work considers a simplified incident scenario with a fixed duration. In practice, incident duration is highly variable and is affected by a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Incident duration should therefore be treated as an uncertainty, which will be investigated in our future work.
[Figure 9, panels (a) to (c): standard deviation of on-ramp travel time (h) versus time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL.]
Figure 9: Standard deviation for different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This paper is supported by the China Scholarship Council and the University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.
References
[1] M. Papageorgiou and A. Kotsialos, "Freeway ramp metering: an overview," IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.
[2] A. Skabardonis, P. Varaiya, and K. F. Petty, "Measuring recurrent and nonrecurrent traffic congestion," Transportation Research Record, vol. 1856, pp. 118–124, 2003.
[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, "ALINEA: a local feedback control law for on-ramp metering," Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.
[4] E. Smaragdis and M. Papageorgiou, "Series of new local ramp metering strategies," Transportation Research Record, vol. 1856, pp. 74–86, 2003.
[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, "Real-time metering algorithm for centralized control," Transportation Research Record, vol. 1232, pp. 17–26, 1989.
[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, "System wide adaptive ramp metering (SWARM)," in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.
[7] R. Lau, Ramp Metering by Zone: The Minnesota Algorithm, Minnesota Department of Transportation, 1997.
[8] H. M. Zhang and W. W. Recker, "On optimal freeway ramp control policies for congested traffic corridors," Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.
[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, "Optimal coordinated ramp metering with advanced motorway optimal control," Transportation Research Record, no. 1748, pp. 55–65, 2001.
[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, "Model predictive control for optimal coordination of ramp metering and variable speed limits," Transportation Research C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.
[11] G. Gomes and R. Horowitz, "Optimal freeway ramp metering using the asymmetric cell transmission model," Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.
[12] A. H. F. Chow and Y. Li, "Robust optimization of dynamic motorway traffic via ramp metering," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.
[13] R. W. Hall, "Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?," Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.
[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, "Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models," Transportation Research Record, vol. 2047, pp. 57–65, 2008.
[15] T. L. Greenlee and H. J. Payne, "Freeway ramp metering strategies for responding to incidents," in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.
[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.
[17] J.-B. Sheu, "Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control," Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.
[18] J.-B. Sheu and M.-S. Chang, "Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.
[19] C. Jacob and B. Abdulhai, "Machine learning for multi-jurisdictional optimal traffic corridor control," Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.
[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, "Motorway ramp-metering control with queuing consideration using Q-learning," in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC '11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.
[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Application of reinforcement learning with continuous state space to ramp metering in real-world conditions," in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC '12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.
[22] C. Jacob and B. Abdulhai, "Automated adaptive traffic corridor control using reinforcement learning: approach and case studies," Transportation Research Record, vol. 1959, pp. 1–8, 2006.
[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, "Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada," Transportation Research Record, vol. 2396, pp. 10–18, 2013.
[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, "Reinforcement learning ramp metering without complete information," Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.
[25] A. Fares and W. Gomaa, "Multi-agent reinforcement learning control for ramp metering," in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.
[26] K. Veljanovska, K. M. Bombol, and T. Maher, "Reinforcement learning technique in multiple motorway access control strategy design," PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.
[27] C. Lu, H. Chen, and S. Grant-Muller, "An indirect reinforcement learning approach for ramp control under incident-induced congestion," in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC '13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.
[28] L. Busoniu, R. Babuska, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.
[29] C. Lu, H. Chen, and S. Grant-Muller, "Indirect reinforcement learning for incident-responsive ramp control," Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.
[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[31] C. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[32] J. R. Kok and N. Vlassis, "Collaborative multiagent reinforcement learning by payoff propagation," Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.
[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, "Coordinated reinforcement learning," in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.
[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, "Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.
[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: a survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[36] C. F. Daganzo, "The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory," Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.
[37] J. Haddad, M. Ramezani, and N. Geroliminis, "Cooperative traffic control of a mixed network with two urban regions and a freeway," Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.
[38] H. Mongeot and J.-B. Lesort, "Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory," Transportation Research Record, vol. 1710, pp. 58–68, 2000.
[39] Transport Simulation Systems, Aimsun User's Manual 6.1, TTS, Barcelona, Spain, 2010.
[40] Highways England, "HATRIS homepage," 2013, https://www.hatris.co.uk.
[41] E. Cascetta, "Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator," Transportation Research B, vol. 18, no. 4-5, pp. 289–299, 1984.
[42] L. Zhang and D. Levinson, "Balancing efficiency and equity of ramp meters," Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.
[43] A. Kotsialos and M. Papageorgiou, "Efficiency and equity properties of freeway network-wide ramp metering with AMOC," Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.
This section details these three elements and the relevant algorithm.
5.1. Environment State. The environment state of a motorway section is composed of mainline states and on-ramp states. The same method as in [27, 29] is used here to obtain the state space. For the mainline of motorway section $i$, the number of vehicles ranges from 0 to the maximum number $q^{\max}_{i,\text{main}}$, and this range is uniformly divided into $n_i$ intervals, each of which represents a state of the mainline. Therefore, each mainline section can be represented by a state set $S_{i,\text{main}}$ with $n_i$ states. Similarly, on-ramp traffic is represented by a state set $S_{i,\text{on}}$ with $m_i$ states according to the maximum number of vehicles $q^{\max}_{i,\text{on}}$. The values of $n_i$ and $m_i$ should be adjusted for different motorway sections according to the section length. In this way, if motorway section $i$ is the critical section, the external traffic environment is represented by
$$S_i = S_{i,\text{main}} \times S_{i,\text{on}}, \qquad s_i^k \in S_i, \qquad (8)$$
which contains $n_i \cdot m_i$ states. At each time step, a state $s_i^k$ is selected from $S_i$ as the environment state. If motorway section $i$ is a normal section, the state sets of its neighbouring agent should be incorporated. The traffic state is then represented by
$$S_i = S_{i,\text{main}} \times S_{i,\text{on}} \times S_{i-1,\text{main}} \times S_{i-1,\text{on}}, \qquad s_i^k \in S_i, \qquad (9)$$
which contains $n_i \cdot m_i \cdot n_{i-1} \cdot m_{i-1}$ states.
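The discretisation described in Section 5.1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: it assumes that a count equal to $q^{\max}$ falls into the last interval and that the Cartesian products in (8) and (9) are flattened into a single index in row-major order. The helper names `interval_index` and `joint_state` are hypothetical.

```python
from typing import Sequence, Tuple


def interval_index(q: float, q_max: float, n_intervals: int) -> int:
    """Map a vehicle count in [0, q_max] to one of n_intervals uniform bins."""
    if q_max <= 0 or n_intervals <= 0:
        raise ValueError("q_max and n_intervals must be positive")
    q = min(max(q, 0.0), q_max)          # clip to the admissible range
    idx = int(q / q_max * n_intervals)   # uniform division of [0, q_max]
    return min(idx, n_intervals - 1)     # q == q_max goes to the last bin


def joint_state(components: Sequence[Tuple[float, float, int]]) -> int:
    """Flatten a Cartesian product of per-component bins into one index.

    components holds (count, q_max, n_intervals) triples, for example
    [(q_main_i, qmax_main_i, n_i), (q_on_i, qmax_on_i, m_i)] for Eq. (8),
    or the same two triples followed by the neighbour's two for Eq. (9).
    """
    index = 0
    for q, q_max, n in components:
        index = index * n + interval_index(q, q_max, n)  # mixed-radix digit
    return index
```

With the section 1 limits from Section 6.1 ($q^{\max}_{1,\text{main}} = 1860$, $q^{\max}_{1,\text{on}} = 108$) and hypothetical interval counts $n_1 = 10$ and $m_1 = 6$ (the actual values are not given here), `joint_state([(930, 1860, 10), (54, 108, 6)])` maps a half-full mainline and a half-full on-ramp to state 33 out of 60.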
5.2. Control Action. In a ramp control problem, the aim of the control action is to regulate the number of vehicles entering the mainline in each control step. Similar to [29], we adopt flow control as the control action, represented by the action set $C = \{4, 6, 8, 10, 12, 14, 16, 18, 20\}$, which contains 9 flow rates between the minimum (4 veh/min) and maximum (20 veh/min) values.
Exploitation and exploration are two basic behaviours of an RL agent. Exploitation means that the agent takes the control action that has yielded the most reward in its previous experience. Exploration, in contrast, means that the agent tries other actions that currently appear less rewarding. To balance these two behaviours, we use the ε-greedy policy to select control actions [30]. Specifically, at each control step this policy takes a random action with probability ε and chooses the greedy action (with the maximum Q value) with probability 1 − ε.
The action selection probability can be formally expressed as

$$p\left(c_i^k \mid s_i^k\right) = \begin{cases} 1 - \varepsilon, & \text{if } c_i^k = \arg\max_{c_i^k} Q^{k-1}\left(s_i^k, c_i^k\right), \\ \varepsilon, & \text{otherwise}. \end{cases} \qquad (10)$$
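A minimal sketch of the ε-greedy rule in (10), assuming a tabular Q-function stored as a mapping from each metering rate in $C$ to its Q value in the current state. The function name, the tie-breaking rule (lowest rate wins), and the use of Python's `random` module are assumptions for illustration. Note that when the exploratory draw is uniform over all of $C$, a common implementation choice, the greedy rate can also be picked while exploring, so its overall selection probability becomes $1 - \varepsilon + \varepsilon/|C|$.

```python
import random
from typing import Mapping, Optional, Sequence

# Action set C from Section 5.2: nine metering rates from 4 to 20 veh/min.
ACTIONS = (4, 6, 8, 10, 12, 14, 16, 18, 20)


def epsilon_greedy(q_values: Mapping[int, float], epsilon: float,
                   actions: Sequence[int] = ACTIONS,
                   rng: Optional[random.Random] = None) -> int:
    """Select a metering rate for the current state s_i^k.

    q_values must contain an entry for every action c, holding
    Q^{k-1}(s_i^k, c).
    """
    rng = rng or random.Random()
    if rng.random() < epsilon:
        # Explore: uniform draw over all of C (greedy rate included).
        return rng.choice(list(actions))
    # Exploit: greedy rate; ties go to the first (lowest) rate in `actions`.
    return max(actions, key=lambda c: q_values[c])
```

With ε = 0.1 as in Section 6.3, roughly one control step in ten is exploratory.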
5.3. Reward Function. The reward function calculates the immediate reward after a specific action is executed at each time step, and it guides the agent towards its objective. Considering a common objective of traffic control systems (i.e., minimising total travel time), we define our reward so that the agent learns to minimise the total time spent (TTS).
TTS is defined as the total time spent by vehicles in the network during a period of time. In our case, TTS can be obtained from the following equation:
$$\mathrm{TTS} = \Delta t \cdot \sum_{k=0}^{K}\left(q^k_{i,\text{main}} + q^k_{i,\text{on}}\right). \qquad (11)$$
In the above equation, $\Delta t$ is a fixed value; therefore, minimising TTS is equivalent to minimising the number of vehicles on the network, $\sum_{k=0}^{K}\left(q^k_{i,\text{main}} + q^k_{i,\text{on}}\right)$. To minimise this value, the reward function defined here is composed of two negative rewards, which act as penalties for vehicles on the mainline and on the on-ramp. The formal reward function at step $k$ is defined for two situations.
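Because $\Delta t$ is constant, (11) is simply a scaled sum of per-step vehicle counts. The sketch below assumes per-step count lists for one section and a step length expressed in hours, so that TTS comes out in veh·h; the function name `total_time_spent` is hypothetical, and the control-step length itself is not stated in this section.

```python
from typing import Sequence


def total_time_spent(dt_h: float, q_main: Sequence[float],
                     q_on: Sequence[float]) -> float:
    """Eq. (11): TTS = dt * sum over k = 0..K of (q_main[k] + q_on[k]).

    dt_h is the (assumed) step length in hours; q_main[k] and q_on[k] are
    the numbers of vehicles on the mainline and the on-ramp at step k.
    """
    if len(q_main) != len(q_on):
        raise ValueError("q_main and q_on must cover the same steps")
    return dt_h * sum(m + o for m, o in zip(q_main, q_on))
```

For instance, with a hypothetical step of 0.5 h and counts of (100, 10) and (120, 20) vehicles over two steps, `total_time_spent(0.5, [100, 120], [10, 20])` gives 125 veh·h. Since the factor `dt_h` is the same for every control policy, ranking policies by TTS and by the raw vehicle sum gives the same order.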
(1) Motorway Section $i$ Is the Critical Section. In this case, the reward is defined as

$$R_i^k\left(s_i^k, c_i^k\right) = \begin{cases} -\dfrac{q^k_{i,\text{main}} + q^k_{i,\text{on}}}{q^{\max}_{i,\text{main}} + q^{\max}_{i,\text{on}}}, & \text{if } q^k_{i,\text{main}} < q^{\max}_{i,\text{main}},\ q^k_{i,\text{on}} < q^{\max}_{i,\text{on}}, \\ -1, & \text{otherwise}, \end{cases} \qquad (12)$$
where $R_i^k\left(s_i^k, c_i^k\right)$ is the immediate reward for agent $i$ in state $s_i^k$ when executing action $c_i^k$ at control step $k$, and $q^{\max}_{i,\text{main}}$ and $q^{\max}_{i,\text{on}}$ normalise the numbers of vehicles on the mainline and the on-ramp, which guarantees that $R_i^k\left(s_i^k, c_i^k\right) \in [-1, 0]$.
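Reward (12) can be written directly as a function of the current counts. This is a sketch, assuming counts and limits are plain numbers; the name `reward_critical` is hypothetical, and using section 1 as the critical section in the example below is only for illustration.

```python
def reward_critical(q_main: float, q_on: float,
                    q_max_main: float, q_max_on: float) -> float:
    """Eq. (12): reward for an agent whose section is the critical section.

    Returns the negative normalised vehicle count while both the mainline
    and the on-ramp are below their limits, and the floor value -1 otherwise.
    """
    if q_main < q_max_main and q_on < q_max_on:
        return -(q_main + q_on) / (q_max_main + q_max_on)
    return -1.0
```

With the section 1 limits from Section 6.1, `reward_critical(930, 54, 1860, 108)` returns -0.5, and any step at which the mainline or the on-ramp reaches its limit returns -1.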
(2) Motorway Section $i$ Is Not the Critical Section. Here, a new negative reward is introduced to maintain system equity, that is, to ensure that the on-ramp queues, and hence the travel times, at different on-ramps remain close to each other:
$$R_i^k\left(s_i^k, c_{i-1}^k, c_i^k\right) = \begin{cases} -\dfrac{q^k_{i,\text{main}} + q^k_{i,\text{on}}}{q^{\max}_{i,\text{main}} + q^{\max}_{i,\text{on}}} - \dfrac{\left|q^k_{i,\text{on}} - q^k_{i-1,\text{on}}\right|}{\max\left(q^{\max}_{i,\text{on}}, q^{\max}_{i-1,\text{on}}\right)}, & \text{if } q^k_{i,\text{main}} < q^{\max}_{i,\text{main}},\ q^k_{i,\text{on}} < q^{\max}_{i,\text{on}}, \\ -2, & \text{otherwise}. \end{cases} \qquad (13)$$
For each agent i and episode do
    L ← CEIL(IncidentDuration / Δt)
    IF i is the critical section
        Initialise R_i^0(s_i, c_i), Q_i^0(s_i, c_i)
    ELSE
        Initialise R_i^0(s_i, c_{i-1}, c_i), Q_i^0(s_i, c_{i-1}, c_i), P_i^0(s_i, c_{i-1})
    For each control step k ∈ K do (Loop I)
        (i) get detected data from each cell j: a^k_{j,main}, d^k_{j,main}, d^k_{j,off}, a^k_{j,on}, d^k_{j,on}
        (ii) get state s_i^k through (8) and (9)
        (iii) get action c_i^k by the ε-greedy policy (10)
        (iv) get q^k_{j,main}, q^k_{j,on} through (6) and do q^k_{i,main} ← Σ_{j=1}^{J} q^k_{j,main}, q^k_{i,on} ← q^k_{j,on}
        IF i is the critical section
            update R_i^k(s_i^k, c_i^k), Q_i^k(s_i^k, c_i^k) through (2) and (12)
        ELSE
            update R_i^k(s_i^k, c_{i-1}^k, c_i^k), Q_i^k(s_i^k, c_{i-1}^k, c_i^k), p(c_{i-1}^k | s_i^k) through (4) and (13)
        IF s_i^k = s_initial and k + 1 ≥ L
            end the algorithm
        ELSE
            get a^{k,k+1}_{i,main}, a^{k,k+1}_{i,on}, d^{k,k+1}_{j,off} by (7)
            do l ← k, s_i^l ← s_i^k, q^l_{j,main} ← q^k_{j,main}, q^l_{j,on} ← q^k_{j,on}, and start Loop II
            For each planning step l ∈ L do (Loop II)
                (i) generate flow rates for each cell j: d^l_{j,main}, d^l_{j,on} through (5)
                (ii) get the state s_i^l
                (iii) get q^l_{j,main}, q^l_{j,on} and do q^l_{i,main} ← Σ_{j=1}^{J} q^l_{j,main}, q^l_{i,on} ← q^l_{j,on}
                (iv) get action c_i^l by the ε-greedy policy
                IF i is the critical section
                    update R_i^l(s_i^l, c_i^l), Q_i^l(s_i^l, c_i^l)
                ELSE
                    update R_i^l(s_i^l, c_{i-1}^l, c_i^l), Q_i^l(s_i^l, c_{i-1}^l, c_i^l), p(c_{i-1}^k | s_i^k)
                IF (l = k + 9) or (s_i^l = s_initial and l + 1 ≥ L)
                    go back to Loop I
                ELSE
                    repeat Loop II
            EndFor
    EndFor
EndFor

Algorithm 1: Algorithm for Dyna-MARL.
Compared to (12), a new term $\left|q^k_{i,\text{on}} - q^k_{i-1,\text{on}}\right| / \max\left(q^{\max}_{i,\text{on}}, q^{\max}_{i-1,\text{on}}\right)$ is added into (13), which penalises the on-ramp queue difference between motorway sections $i$ and $i-1$. Because two adjacent agents cooperate in this situation, $R_i^k\left(s_i^k, c_{i-1}^k, c_i^k\right)$ depends on two control actions, $c_{i-1}^k$ and $c_i^k$. The function $\max(\cdot,\cdot)$ returns the larger of its two arguments and is used for normalisation.
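Reward (13) adds the queue-difference penalty to the congestion term of (12). A sketch, assuming plain numeric inputs, where `q_on_prev` and `q_max_on_prev` stand for $q^k_{i-1,\text{on}}$ and $q^{\max}_{i-1,\text{on}}$ of the neighbouring section; the name `reward_normal` is hypothetical. Assuming all counts stay within $[0, q^{\max}]$, each term is at most 1 in magnitude, so the reward lies in $[-2, 0]$, consistent with the floor value of $-2$ in the "otherwise" case.

```python
def reward_normal(q_main: float, q_on: float, q_on_prev: float,
                  q_max_main: float, q_max_on: float,
                  q_max_on_prev: float) -> float:
    """Eq. (13): reward for an agent whose section is not the critical one.

    q_on_prev and q_max_on_prev refer to the neighbouring section i-1.
    """
    if q_main < q_max_main and q_on < q_max_on:
        # Congestion term, identical to Eq. (12).
        congestion = (q_main + q_on) / (q_max_main + q_max_on)
        # Equity term: normalised on-ramp queue difference between i and i-1.
        imbalance = abs(q_on - q_on_prev) / max(q_max_on, q_max_on_prev)
        return -congestion - imbalance
    return -2.0
```

Using the section 2 and section 1 limits from Section 6.1, `reward_normal(1440, 45, 45, 2880, 90, 108)` returns -0.5 (equal queues, no equity penalty), while raising the neighbour's queue to 99 vehicles adds a penalty of 54/108 = 0.5, for a reward of -1.0.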
5.4. Description of the Algorithm. Based on the Dyna-Q architecture and the RL elements defined in the previous subsections, an algorithm named Dyna-MARL is developed and described in this subsection. Dyna-MARL contains two main loops, corresponding to direct RL and planning as shown in Figure 1: between two real control steps in Loop I, 10 planning steps are run in Loop II. The pseudocode of Dyna-MARL is given in Algorithm 1.
An episode in Dyna-MARL represents a control cycle, which starts when the incident occurs and terminates when the traffic state returns to the initial state $s_{\text{initial}}$, that is, the traffic state before the incident occurred. The incident duration is assumed to be known in advance.
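To illustrate how a real step (Loop I) alternates with planning steps (Loop II), the following is a generic tabular Dyna-Q sketch in the style of [30], not Dyna-MARL itself. Its learned model simply memorises the last observed reward and next state for each visited state-action pair, whereas Dyna-MARL plans with the ACTM-based traffic model and includes neighbour-action terms, neither of which is reproduced here. The defaults α = 0.2, γ = 0.8, and ε = 0.1 follow Section 6.3, 10 planning steps per real step follow Section 5.4, and the standard one-step Q-learning update is assumed; the class and method names are hypothetical.

```python
import random
from collections import defaultdict


class DynaQAgent:
    """Generic tabular Dyna-Q: direct RL, model learning, and planning.

    Illustrative only: the model memorises the last (reward, next state)
    seen for each (state, action); Dyna-MARL's ACTM-based planner and its
    neighbour-action terms are not reproduced.
    """

    def __init__(self, actions, alpha=0.2, gamma=0.8, epsilon=0.1,
                 planning_steps=10, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.planning_steps = planning_steps
        self.rng = random.Random(seed)
        self.q = defaultdict(float)  # (state, action) -> Q value, default 0
        self.model = {}              # (state, action) -> (reward, next state)

    def act(self, s):
        """Epsilon-greedy selection; ties go to the first listed action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def _q_update(self, s, a, r, s_next):
        """Standard one-step Q-learning update (assumed form)."""
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        td_error = r + self.gamma * best_next - self.q[(s, a)]
        self.q[(s, a)] += self.alpha * td_error

    def learn(self, s, a, r, s_next):
        """One real step (direct RL) followed by a sweep of planning steps."""
        self._q_update(s, a, r, s_next)       # direct RL from real experience
        self.model[(s, a)] = (r, s_next)       # model learning
        visited = list(self.model)
        for _ in range(self.planning_steps):   # planning on simulated steps
            ps, pa = self.rng.choice(visited)
            pr, ps_next = self.model[(ps, pa)]
            self._q_update(ps, pa, pr, ps_next)
```

In a toy single-state problem where only the 20 veh/min action avoids a penalty of -1, a few hundred real steps are enough for the greedy action to become 20; the planning sweeps reuse each stored transition many times, which is the sample-efficiency benefit that motivates the Dyna architecture.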
6 Case Study and Results
A metered motorway segment of the M6 (southbound direction) in the UK is chosen for the case study. This segment lies between junction 21A (J21A) and junction 25 (J25) and has an approximate length of 12.4 km (see Figure 4). Taking the noncontrolled (NC) situation as the baseline, we designed a series of experiments to compare the proposed Dyna-MARL algorithm with Isolated RL (Q-learning without coordination). The experiments and results are described as follows.
6.1. Partitions of the Test Segment. The test motorway segment, with a three-lane mainline, three metered on-ramps, and five off-ramps, is simulated in AIMSUN [39], a microscopic traffic simulation package. According to the detector locations and road layout, the whole segment is divided into three sections, each containing one metered on-ramp. Motorway section 3 is divided into 4 cells, and motorway sections 1 and 2 are each divided into 3 cells. The partitions of each section are shown in Figure 5. According to the section length, the maximum number of
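[Figure 4: map of the test segment showing junctions J21A, J22, J23, J24, and J25; mainline origin O and destination D; on-ramps O1 (from A579), O2 (from A580), and O3 (from A49); off-ramps D1 (to M62), D2 (to M62), D3 (to A579), D4 (to A580), and D5 (to A58).]
Figure 4: Test motorway segment of M6.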
[Figure 5: partition of the segment into sections 1, 2, and 3 and their cells, with section and cell boundaries, road section lengths (km), flow direction from O to D, on-ramps O1 to O3, off-ramps D1 to D5, and incident locations a, b, and c.]
Figure 5: Partitions of test segment.
vehicles in each mainline section and on-ramp is as follows: $q^{\max}_{1,\text{main}} = 1860$, $q^{\max}_{2,\text{main}} = 2880$, $q^{\max}_{3,\text{main}} = 2880$, $q^{\max}_{1,\text{on}} = 108$, $q^{\max}_{2,\text{on}} = 90$, and $q^{\max}_{3,\text{on}} = 120$.
6.2. Real Data Source. Real data collected from 17 loop detectors in the motorway segment (on the mainline and on the on- and off-ramps) are used for the case study; these data were extracted from the Traffic Information System (HATRIS) [40]. The traffic counts are averaged from April 2012 to March 2013 in 15-minute intervals. Only working-day data (Monday to Friday) are used because traffic load drops dramatically at weekends. Some of the detector data collected from the mainline and the three on-ramps are presented in Figure 6, which shows two peak periods in the daily traffic operation: the AM peak period (around 07:00:00–09:00:00) and the PM peak period (around 16:00:00–18:00:00).
At the test site, ramp metering operates only during peak hours. Moreover, it is valuable to test the proposed algorithm under high demand: if it works under a heavy traffic load, it should also be useful in common situations. Therefore, the AM peak period with heavy
[Figure 6: 15-minute traffic counts versus time of day (00:00:00 to 24:00:00) for mainline sections 1 to 3 and on-ramps 1 to 3.]
Figure 6: Real averaged traffic data.
traffic load is considered for the case study. Specifically, we use the averaged AM peak traffic data collected from TRADS to estimate the OD (origin-destination) matrix for the simulation. AIMSUN adopts the model proposed in [41] to perform this estimation, where the number of iterations
10 Mathematical Problems in Engineering
Table 1 ODmatrix estimated
Originsdestinations D D5 D4 D3 D2 D1 TotalsO 2089 375 686 728 1169 771 5818O3 875 65 212 193 117 46 1507O2
886 0 0 61 315 216 1477O1
824 0 0 0 292 226 1343Totals 4675 440 898 981 1893 1258 10146
Table 2 Parameters for ACTM
Parameter 119889maxmain V 119908 120579 120578 120582
112058221205823
Value 6300 vehh 107 kmh 116 kmh 05 016 055 09 06
is set as 1000 to get convergence Table 1 shows the ODmatrix estimated from real traffic data
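Recomputing the marginal totals of Table 1 from its cell values is a quick consistency check. The sketch below transcribes Table 1. The recomputed row and column sums differ from the printed totals by at most one vehicle while the grand total matches, which is presumably due to rounding of non-integer estimated flows; that explanation is an assumption, not something the paper states.

```python
# Table 1, rows = origins, columns = destinations D, D5, D4, D3, D2, D1.
DESTS = ["D", "D5", "D4", "D3", "D2", "D1"]
OD = {
    "O":  [2089, 375, 686, 728, 1169, 771],
    "O3": [875, 65, 212, 193, 117, 46],
    "O2": [886, 0, 0, 61, 315, 216],
    "O1": [824, 0, 0, 0, 292, 226],
}
ROW_TOTALS = {"O": 5818, "O3": 1507, "O2": 1477, "O1": 1343}
COL_TOTALS = [4675, 440, 898, 981, 1893, 1258]
GRAND_TOTAL = 10146


def marginal_gaps(od, row_totals, col_totals):
    """Recomputed minus reported totals for every origin row and destination column."""
    row_gaps = {o: sum(flows) - row_totals[o] for o, flows in od.items()}
    columns = zip(*od.values())           # transpose: one tuple per destination
    col_gaps = [sum(col) - total for col, total in zip(columns, col_totals)]
    return row_gaps, col_gaps


row_gaps, col_gaps = marginal_gaps(OD, ROW_TOTALS, COL_TOTALS)
print(row_gaps)  # {'O': 0, 'O3': 1, 'O2': 1, 'O1': -1}
print(col_gaps)  # [-1, 0, 0, 1, 0, 1]
print(sum(map(sum, OD.values())) - GRAND_TOTAL)  # 0
```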
6.3. Incident Scenarios. Because real incident data are difficult to capture, we simulate incident scenarios in AIMSUN. To make every ramp meter respond to the incident, the incident is located near the most downstream motorway section, that is, motorway section 1. Three incident scenarios, A, B, and C, are therefore designed, corresponding to three incident locations a, b, and c (as illustrated in Figure 5), respectively.
The simulation experiment lasts one and a half hours, from 07:00:00 to 08:30:00 during the AM peak period. After 30 minutes of normal operation (for warm-up), the incident is triggered at 07:30:00 and lasts for 30 minutes. The preliminary experiments in this paper consider an incident that blocks one lane; the parameters introduced here can also be adjusted for multiple-lane blockages. The incident extent is 50 m and is assumed to be constant during the incident.
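The timing above can be written down directly. A small sketch, assuming a placeholder calendar date and a hypothetical 60 s control step Δt (an illustrative value, not one taken from the paper); the episode length follows the L = CEIL(IncidentDuration/Δt) rule used in Algorithm 1.

```python
import math
from datetime import datetime, timedelta

# Placeholder date: only the clock times matter here.
SIM_START = datetime(2013, 1, 7, 7, 0)       # 07:00:00
SIM_LENGTH = timedelta(minutes=90)           # simulation runs to 08:30:00
WARM_UP = timedelta(minutes=30)              # normal operation before the incident
INCIDENT_DURATION = timedelta(minutes=30)    # fixed and assumed known in advance


def incident_window(start=SIM_START):
    """Return the (onset, clearance) times of the simulated incident."""
    onset = start + WARM_UP
    return onset, onset + INCIDENT_DURATION


def episode_length(duration, control_step):
    """L = CEIL(IncidentDuration / dt); timedelta / timedelta yields a float."""
    return math.ceil(duration / control_step)


onset, clearance = incident_window()
print(onset.time(), clearance.time(), (SIM_START + SIM_LENGTH).time())
# 07:30:00 08:00:00 08:30:00
print(episode_length(INCIDENT_DURATION, timedelta(seconds=60)))  # 30 (hypothetical 60 s step)
```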
Learning-related parameters are set to typical values [30]: $\alpha = 0.2$, $\gamma = 0.8$, and $\epsilon = 0.1$. The other parameters of the ACTM are calibrated and summarised in Table 2. All cells share the same $\theta$ and $\eta$.
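The direct-RL part of Dyna-Q builds on the standard one-step Q-learning update [30, 31]. Below is a minimal tabular sketch using the parameter values above. It is not the paper's Dyna-MARL agent. The state and action encodings are hypothetical placeholders, and the reward is passed in generically rather than computed from the paper's reward functions. Neither the model-based planning loop nor the coordination between adjacent agents is included.

```python
import random
from collections import defaultdict

# Learning parameters from Section 6.3: learning rate, discount factor, exploration rate.
ALPHA, GAMMA, EPSILON = 0.2, 0.8, 0.1


class QAgent:
    """Tabular one-step Q-learning agent with an epsilon-greedy policy (sketch only)."""

    def __init__(self, actions, seed=None):
        self.actions = list(actions)      # hypothetical discrete metering actions
        self.q = defaultdict(float)       # Q[(state, action)], initialised to 0.0
        self.rng = random.Random(seed)    # seeded generator for reproducible exploration

    def greedy(self, state):
        # Action with the highest current Q-value; ties go to the first listed action.
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def act(self, state):
        # Explore with probability EPSILON, otherwise exploit the current estimates.
        if self.rng.random() < EPSILON:
            return self.rng.choice(self.actions)
        return self.greedy(state)

    def update(self, state, action, reward, next_state):
        # Q(s,a) <- Q(s,a) + ALPHA * [reward + GAMMA * max_a' Q(s',a') - Q(s,a)]
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + GAMMA * best_next - self.q[(state, action)]
        self.q[(state, action)] += ALPHA * td_error


agent = QAgent(actions=[0, 1, 2], seed=0)
agent.update("s0", 1, reward=-5.0, next_state="s1")
print(agent.q[("s0", 1)])  # -1.0, i.e. 0.2 * (-5.0 + 0.8 * 0.0 - 0.0)
```

With $\epsilon = 0.1$, roughly one decision in ten is exploratory; the remaining decisions exploit the current Q-table.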
6.4. Results. Dyna-MARL, Isolated RL, and NC are compared from three aspects: density evolution, general indicators, and system equity. The experimental results are described as follows.
(1) Density Evolution. Figure 7 shows four dense areas during traffic operation. Three of them, near the on-ramp entrances (at motorway lengths of around 0.5 km, 5 km, and 10 km), are caused by heavy traffic loads from the on-ramps. The dense area close to the end of the segment is caused by the incident.
In scenario A, the incident is located close to on-ramp 1 (O1). Without control, this incident leads to severe congestion, which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). Under this scenario, Isolated RL cannot alleviate the incident-induced congestion effectively (see Figure 7(b)). At the onset of congestion, without coordination, only the nearest ramp controller reacts. Because of the limited storage space of a single on-ramp, one ramp controller is insufficient to dissolve the congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of the three on-ramps to deal with the incident-induced congestion. In this way, mainline congestion is restricted to a smaller area and does not propagate to motorway section 2 (see Figure 7(c)).
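A rough upper bound illustrates the difference. It uses the maximum on-ramp vehicle numbers given in Section 6.1 and ignores queue dynamics and the timing of releases:

$$q^{\max}_{1,\mathrm{on}} + q^{\max}_{2,\mathrm{on}} + q^{\max}_{3,\mathrm{on}} = 108 + 90 + 120 = 318 \ \text{vehicles}, \qquad \frac{318}{q^{\max}_{1,\mathrm{on}}} = \frac{318}{108} \approx 2.9.$$

Under these assumptions, coordinating all three meters gives the system roughly 2.9 times the queue storage available to the on-ramp 1 controller alone.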
In scenarios B and C, the incidents are near the end of the motorway segment and far from on-ramp 1. Because they do not block on-ramp 1, these incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL work well in easing mainline congestion. As shown in Figures 7(e)–7(i), compared with the NC situation, both Isolated RL and Dyna-MARL restrict the congestion to a small range near the on-ramp entrances.
(2) General Indicators. In this comparison, general indicators, namely total travel time (to be reduced), total throughput (to be increased), and total CO₂ emission (to be reduced), are used to show how the proposed system benefits road users. These indicators are widely used in the transport community to evaluate newly developed traffic control systems.
As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b). Dyna-MARL improves total throughput by up to 2.3% (see Figure 8(d)) and outperforms Isolated RL in all three scenarios; in scenario B, Isolated RL even fails to improve total throughput. For total CO₂ emission (Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with reductions of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL performs much better than Isolated RL.
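The percentages in Figure 8(d) are relative changes with respect to the NC baseline. Below is a minimal sketch of that arithmetic. It assumes a positive value denotes an improvement for every indicator: a reduction for travel time and CO₂, an increase for throughput. The function name and the example numbers are hypothetical and are not read from Figure 8.

```python
def change_from_nc(nc_value, controlled_value, lower_is_better=True):
    """Relative change (%) of a controlled case with respect to the NC baseline.

    Positive means improvement: a reduction for total travel time and CO2
    emission (lower_is_better=True) or an increase for total throughput
    (lower_is_better=False).
    """
    if nc_value == 0:
        raise ValueError("the NC baseline must be non-zero")
    if lower_is_better:
        diff = nc_value - controlled_value
    else:
        diff = controlled_value - nc_value
    return 100.0 * diff / nc_value


# Hypothetical values, not read from Figure 8.
print(change_from_nc(1500.0, 1317.0))                         # 12.2 (travel time, h)
print(change_from_nc(9000.0, 9207.0, lower_is_better=False))  # 2.3 (throughput, veh)
```

A negative result, such as Isolated RL's throughput in scenario B, means the controlled case did worse than NC on that indicator.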
The above comparison shows that Dyna-MARL outperforms Isolated RL in almost all scenarios and for almost all indicators.
Figure 7: Density profiles (motorway density in veh/km/lane against motorway length in km and absolute time in min) for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.

(3) System Equity. Although the general indicators in comparison (2) are effective for assessing the performance of different systems, they cannot capture system equity, which is also an important aspect of system performance. In this paper, we consider only spatial equity, defined as a measure of the equity of user delays across different on-ramps [42]. We assume that road users from all three on-ramps are equally important: if users from different on-ramps experience similar travel times, the control system is defined as equitable, whereas a large queue difference between on-ramps leads to a highly inequitable system. In [43], the variance of travel time across on-ramps is used as an indicator of system equity. Similar to [43], and for the sake of comparison, we use the standard deviation, defined as

$$\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n}\left[\bar{t}^{k} - t_{i}^{k}\right]^{2}}{n}}, \qquad (14)$$

where $\mathrm{SD}(k)$ is the standard deviation of travel time across the on-ramps at time step $k$, $t_{i}^{k}$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^{k}$ is the average total travel time of the $n$ on-ramps at step $k$.
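Equation (14) is the population standard deviation of the on-ramp travel times at a single step $k$. A minimal Python sketch of the computation follows. The function name, the list-based input, and the example travel times are illustrative assumptions, not part of the paper.

```python
import math


def travel_time_sd(travel_times):
    """SD(k) of equation (14): population standard deviation of the
    estimated total travel times t_i^k of the n on-ramps at one step k."""
    n = len(travel_times)
    if n == 0:
        raise ValueError("at least one on-ramp travel time is required")
    mean = sum(travel_times) / n                      # \bar{t}^k
    return math.sqrt(sum((mean - t) ** 2 for t in travel_times) / n)


# Hypothetical travel times (h) for three on-ramps at two steps.
steps = {"07:45": [1.0, 1.0, 4.0], "08:15": [2.0, 2.0, 2.0]}
for k, times in steps.items():
    print(k, travel_time_sd(times))
# 07:45 1.4142135623730951
# 08:15 0.0
```

Dividing by $n$ rather than $n - 1$ matches the definition in (14); an SD of zero means every on-ramp experiences the same total travel time.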
The system equity results are shown in Figure 9. In the NC situation, good equity is maintained in scenarios B and C because entering vehicles are not restricted (see Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by congestion in scenario A, a long queue forms, leading to imbalance and resultant inequity for users on different on-ramps (see Figure 9(a)). Among the controlled cases, Isolated RL performs poorly in all scenarios, because the ramp controller nearest the congestion restricts its traffic much more heavily than the other controllers do. Owing to its coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).
Figure 8: Comparison of general measures for different scenarios: (a) total travel time (h), (b) total throughput (veh), (c) total CO₂ emission (kg), and (d) reduction from NC (%) in total travel time, total throughput, and total CO₂ emission, for NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C.
7. Conclusions and Future Work
A Dyna-Q based multiagent reinforcement learning method for motorway ramp control, referred to as Dyna-MARL, has been developed in this paper. Dyna-MARL is compared with Isolated RL (Q-learning without coordination) and the noncontrolled situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.
Through a series of simulation-based experiments, we conclude the following: (1) Isolated RL can improve motorway performance by increasing total throughput and reducing total travel time and CO₂ emission, but this improvement comes at the expense of poor system equity across on-ramps; (2) with a suitable coordination strategy, Dyna-MARL achieves much higher system equity; (3) beyond system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios for all indicators, which indicates that Dyna-MARL can deal with network-wide problems effectively.
Although the simulation tests show positive results for Dyna-MARL, the current work considers a simplified incident scenario with a fixed duration. In practice, incident duration is highly variable and is affected by a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Incident duration should therefore be treated as an uncertainty, which will be investigated in our future work.
Figure 9: Standard deviation (h) of on-ramp travel time against time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL in (a) scenario A, (b) scenario B, and (c) scenario C.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This paper is supported by the China Scholarship Council and the University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.
References
[1] M. Papageorgiou and A. Kotsialos, “Freeway ramp metering: an overview,” IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.
[2] A. Skabardonis, P. Varaiya, and K. F. Petty, “Measuring recurrent and nonrecurrent traffic congestion,” Transportation Research Record, vol. 1856, pp. 118–124, 2003.
[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, “ALINEA: a local feedback control law for on-ramp metering,” Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.
[4] E. Smaragdis and M. Papageorgiou, “Series of new local ramp metering strategies,” Transportation Research Record, vol. 1856, pp. 74–86, 2003.
[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, “Real-time metering algorithm for centralized control,” Transportation Research Record, vol. 1232, pp. 17–26, 1989.
[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, “System wide adaptive ramp metering (SWARM),” in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.
[7] R. Lau, Ramp Metering by Zone: The Minnesota Algorithm, Minnesota Department of Transportation, 1997.
[8] H. M. Zhang and W. W. Recker, “On optimal freeway ramp control policies for congested traffic corridors,” Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.
[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, “Optimal coordinated ramp metering with advanced motorway optimal control,” Transportation Research Record, no. 1748, pp. 55–65, 2001.
[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, “Model predictive control for optimal coordination of ramp metering and variable speed limits,” Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.
[11] G. Gomes and R. Horowitz, “Optimal freeway ramp metering using the asymmetric cell transmission model,” Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.
[12] A. H. F. Chow and Y. Li, “Robust optimization of dynamic motorway traffic via ramp metering,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.
[13] R. W. Hall, “Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?” Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.
[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, “Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models,” Transportation Research Record, vol. 2047, pp. 57–65, 2008.
[15] T. L. Greenlee and H. J. Payne, “Freeway ramp metering strategies for responding to incidents,” in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.
[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.
[17] J.-B. Sheu, “Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control,” Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.
[18] J.-B. Sheu and M.-S. Chang, “Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.
[19] C. Jacob and B. Abdulhai, “Machine learning for multi-jurisdictional optimal traffic corridor control,” Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.
[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, “Motorway ramp-metering control with queuing consideration using Q-learning,” in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC ’11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.
[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Application of reinforcement learning with continuous state space to ramp metering in real-world conditions,” in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC ’12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.
[22] C. Jacob and B. Abdulhai, “Automated adaptive traffic corridor control using reinforcement learning: approach and case studies,” Transportation Research Record, vol. 1959, pp. 1–8, 2006.
[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada,” Transportation Research Record, vol. 2396, pp. 10–18, 2013.
[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, “Reinforcement learning ramp metering without complete information,” Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.
[25] A. Fares and W. Gomaa, “Multi-agent reinforcement learning control for ramp metering,” in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.
[26] K. Veljanovska, K. M. Bombol, and T. Maher, “Reinforcement learning technique in multiple motorway access control strategy design,” PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.
[27] C. Lu, H. Chen, and S. Grant-Muller, “An indirect reinforcement learning approach for ramp control under incident-induced congestion,” in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC ’13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.
[28] L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.
[29] C. Lu, H. Chen, and S. Grant-Muller, “Indirect reinforcement learning for incident-responsive ramp control,” Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.
[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[31] C. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[32] J. R. Kok and N. Vlassis, “Collaborative multiagent reinforcement learning by payoff propagation,” Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.
[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, “Coordinated reinforcement learning,” in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.
[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, “Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.
[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[36] C. F. Daganzo, “The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory,” Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.
[37] J. Haddad, M. Ramezani, and N. Geroliminis, “Cooperative traffic control of a mixed network with two urban regions and a freeway,” Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.
[38] H. Mongeot and J.-B. Lesort, “Analytical expressions of incident-induced flow dynamics perturbations using macroscopic theory and extension of Lighthill-Whitham theory,” Transportation Research Record, vol. 1710, pp. 58–68, 2000.
[39] Transport Simulation Systems, Aimsun User’s Manual 6.1, TSS, Barcelona, Spain, 2010.
[40] Highways England, “Hatris Homepage,” 2013, https://www.hatris.co.uk.
[41] E. Cascetta, “Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator,” Transportation Research Part B, vol. 18, no. 4-5, pp. 289–299, 1984.
[42] L. Zhang and D. Levinson, “Balancing efficiency and equity of ramp meters,” Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.
[43] A. Kotsialos and M. Papageorgiou, “Efficiency and equity properties of freeway network-wide ramp metering with AMOC,” Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.
J21A J22 J23 J25J24
D O
Section 1 Section 2 Section 3
1 1 11 18 151515 13 1 1
13 16 07 17210743
Section boundaryCell boundary
c b a
Road section length (unit km)Flow direction
D1 D2 D3 D4 D5O1 O2 O3
Figure 5 Partitions of test segment
vehicles in each mainline section and on-ramps is as follows119902max1main = 1860 119902
max2main = 2880 119902
max3main = 2880 119902
max1on = 108
119902max2on = 90 and 119902
max3on = 120
62 Real Data Source Real detector data collected from 17loop detectors located in the motorway segment (includingboth mainline and on-off-ramps) are used for case studywhich can be extracted from Traffic Information System(HATRIS) [40] These traffic count data are averaged fromApril 2012 to March 2013 with 15-minute intervals Onlyworking day data (from Monday to Friday) are used due tothe dramatic reduction of traffic load in weekends Some ofthe detector data collected frommainline and three on-rampsare presented in Figure 6 from which we can see that twopeak periods including AM peak period (around 070000ndash090000) and PM peak period (around 160000ndash180000)exist during the daily traffic operation
In the test site ramp metering only works at peak hoursMeanwhile it is valuable to test the performance of theproposed algorithm in the high demand situation If it canwork under the high traffic load it should be also useful forcommon situations Therefore AM peak period with heavy
Section 3 Section 2 Section 1
On-ramp 3 On-ramp 2 On-ramp 1
0
200
400
600
800
1000
1200
15-m
inut
e tra
ffic c
ount
060000 120000 180000 000000000000Time of day
Figure 6 Real averaged traffic data
traffic load is considered for case study Specifically we use theaveraged traffic data during AM peak period collected fromTRADS to estimate OD (origins and destinations) matrixfor the simulation A model proposed in [41] is adopted byAIMSUN to do the estimationwhere the number of iterations
10 Mathematical Problems in Engineering
Table 1 ODmatrix estimated
Originsdestinations D D5 D4 D3 D2 D1 TotalsO 2089 375 686 728 1169 771 5818O3 875 65 212 193 117 46 1507O2
886 0 0 61 315 216 1477O1
824 0 0 0 292 226 1343Totals 4675 440 898 981 1893 1258 10146
Table 2 Parameters for ACTM
Parameter 119889maxmain V 119908 120579 120578 120582
112058221205823
Value 6300 vehh 107 kmh 116 kmh 05 016 055 09 06
is set as 1000 to get convergence Table 1 shows the ODmatrix estimated from real traffic data
63 Incident Scenarios Considering the difficulty of captur-ing real incident data we simulate some incident scenariosin AIMSUN To make each ramp meter work during theincident the incident is located near the most downstreammotorway section that is motorway section 1 Thereforethree incident scenarios A B andC are designed correspond-ing to three different incident locations in a b and c (asillustrated in Figure 5) respectively
The simulation experiment lasts for one and a half hoursfrom 070000 to 083000 during AM peak period After 30-minute normal operation (for warm-up) the incident is trig-gered at 073000 and lasts for 30 minutes In the preliminaryexperiments designed in this paper the incident with onelane blocked is considered Parameters introduced here canalso be regulated for multiple lane-blockage situations Theincident extent is 50 meters which is assumed to be constantduring the incident
Learning-related parameters are set as typical values [30]that is 120572 is 02 120574 is 08 and 120576 is 01 Other parameters relatedto ACTM are calibrated and summarised in Table 2 All thecells have the same 120579 and 120578
64 Results The comparison of Dyna-MARL Isolated RLand NC is conducted from three aspects density evolutionsome general indicators and the system equity The experi-mental results are described as follows
(1) Density Evolution We can see from Figure 7 that fourdense areas exist during the traffic operation Three of themnear on-ramp entrances (motorway length around 05 km5 km and 10 km) are caused by heavy traffic loads from on-ramps The dense area close to the segment end forms due tothe incident
In scenario A incident location is close to on-ramp 1(O1) Without control this incident leads to sever congestion
which blocks on-ramp 1 and propagates to motorway section2 (around 9 km in Figure 7(a)) Under this scenario IsolatedRL cannot alleviate incident-induced congestion effectively
(see Figure 7(b)) In the beginning of congestion formulationwithout coordination only the nearest ramp controller reactsto the congestion Because of the space limit of on-rampone ramp controller is insufficient to dissolve this congestionthat still propagates to motorway section 2 Dyna-MARL onthe other hand coordinates all three ramp controllers andmakes full use of the storage space of three on-ramps todeal with incident-induced congestion In this way mainlinecongestion can be restricted in a smaller area and will notpropagate to motorway section 2 (see Figure 7(c))
In scenarios B and C, the incidents are near the motorway end and far from on-ramp 1. Because they do not block on-ramp 1, these incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL work well in easing mainline congestion. As shown in Figures 7(e)–7(i), compared with the NC situation, both Isolated RL and Dyna-MARL restrict the congestion to a small range near the on-ramp entrances.
(2) General Indicators. In this comparison, general indicators are used to show how the proposed system can benefit road users: total travel time (which should be reduced), total throughput (which should be improved), and total CO$_2$ emission (which should be reduced). These indicators are widely used in the transport community to test the performance of newly developed traffic control systems.
As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b). Dyna-MARL improves total throughput by up to 2.3% (see Figure 8(d)) and outperforms Isolated RL in all three scenarios; in scenario B, Isolated RL even fails to improve total throughput. For total CO$_2$ emission (Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with reductions of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL performs much better than Isolated RL.
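The percentages above are changes relative to the NC run. The arithmetic is sketched below. The function name and the input totals are hypothetical: the totals were chosen only to reproduce the quoted magnitudes and are not read from Figure 8. In this sketch a reduction shows up as a negative change. Whether Figure 8(d) uses the same sign convention for throughput is an assumption.

```python
def relative_change_pct(nc_value: float, controlled_value: float) -> float:
    """Signed change of an indicator relative to the no-control (NC) run, in percent.

    Negative: the indicator decreased (desired for travel time and CO2 emission).
    Positive: the indicator increased (desired for throughput).
    """
    if nc_value == 0:
        raise ValueError("the NC baseline must be non-zero")
    return 100.0 * (controlled_value - nc_value) / nc_value


# Hypothetical totals, chosen only to reproduce the magnitudes quoted above.
travel_time_change = relative_change_pct(1500.0, 1317.0)  # about -12.2 (%)
throughput_change = relative_change_pct(8000.0, 8184.0)   # about +2.3 (%)
```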
This comparison shows that Dyna-MARL outperforms Isolated RL for almost all scenarios and indicators.
(3) System Equity. Although the general indicators in comparison (2) are effective for testing the performance of different systems, they cannot measure system equity, which is also an important aspect of system performance. In this paper, we only consider spatial equity, defined as a measure of the equity of user delays across different on-ramps [42]. We assume that road users from all three on-ramps have the same importance. If users from different on-ramps experience similar travel times, the control system is defined as equitable. Conversely, a large queue difference between on-ramps leads to a highly inequitable system. In [43], the variance of travel time on different on-ramps is used as an indicator to measure
Mathematical Problems in Engineering 11
system equity. Similar to [43], and for the sake of comparison, the standard deviation is used in our case. This indicator is defined as

$$\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n}\left[\bar{t}^{k} - t_{i}^{k}\right]^{2}}{n}}, \qquad (14)$$

where $\mathrm{SD}(k)$ is the standard deviation of travel time across the on-ramps at time step $k$, $t_{i}^{k}$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^{k}$ is the average total travel time of the $n$ on-ramps at step $k$.

Figure 7: Density profiles (motorway density in veh/km/lane over motorway length in km and absolute time in min) for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.
The system equity results are shown in Figure 9. In the NC situation, good equity is maintained in scenarios B and C because entering vehicles are not restricted (Figures 9(b) and 9(c)). However, when congestion blocks one of the on-ramp entrances in scenario A, a long queue forms. This causes an imbalance and, consequently, inequity for users on different on-ramps (see Figure 9(a)). Among the controlled cases, Isolated RL performs poorly in all scenarios. This is because the ramp controller near the congestion restricts its traffic much more heavily than the other controllers do. Because of its coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).
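Equation (14) is a population standard deviation (divisor $n$) taken across on-ramps at each time step. A minimal sketch of its computation follows. The function names and example travel times are hypothetical, and how the per-ramp travel times $t_i^k$ are estimated in the simulation is not shown here.

```python
import math
from typing import List, Sequence


def travel_time_sd(travel_times: Sequence[float]) -> float:
    """Eq. (14): SD(k) = sqrt(sum_i (t_bar^k - t_i^k)^2 / n) over the n on-ramps at one step k."""
    n = len(travel_times)
    if n == 0:
        raise ValueError("need the travel time of at least one on-ramp")
    t_bar = sum(travel_times) / n  # average total travel time over the n on-ramps
    return math.sqrt(sum((t_bar - t_i) ** 2 for t_i in travel_times) / n)


def sd_series(travel_times_by_step: Sequence[Sequence[float]]) -> List[float]:
    """Apply Eq. (14) at every time step k to obtain a curve over time of day."""
    return [travel_time_sd(step) for step in travel_times_by_step]
```

For hypothetical travel times of 2, 4, and 6 h on three on-ramps, `travel_time_sd` returns $\sqrt{8/3} \approx 1.63$ h. Equal travel times give 0, which corresponds to a perfectly equitable system. Because the divisor is $n$ rather than $n-1$, the result matches `statistics.pstdev` from Python's standard library.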
Figure 8: Comparison of general measures for different scenarios: (a) total travel time (h), (b) total throughput (veh), (c) total CO$_2$ emission (kg, axis scaled by $10^4$), and (d) reduction from NC (%) in total travel time, total throughput, and total CO$_2$ emission, for NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C.
7. Conclusions and Future Work
This paper develops a Dyna-Q based multiagent reinforcement learning method for motorway ramp control, referred to as Dyna-MARL. In a simulation environment, Dyna-MARL is compared with Isolated RL (Q-learning without coordination) and the noncontrolled (NC) situation. The simulation is built from real traffic data collected on a metered motorway segment in the UK.
The simulation-based experiments support the following conclusions. (1) Isolated RL can improve motorway performance by increasing total throughput and reducing total travel time and CO$_2$ emission, but at the expense of poor system equity across on-ramps. (2) With a suitable coordination strategy, Dyna-MARL achieves much higher system equity. (3) In addition to system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios for all indicators, which indicates that Dyna-MARL can deal with network-wide problems effectively.
Although the simulation tests show positive results for Dyna-MARL, the current work considers only a simplified incident scenario with a fixed duration. In practice, incident duration is highly variable and depends on factors such as weather conditions, road conditions, and the arrival time of the incident management team. Incident duration should therefore be treated as an uncertainty, which we will investigate in future work.
Figure 9: Standard deviation (h) of on-ramp travel times over time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL in different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This paper is supported by the China Scholarship Council and the University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.
References
[1] M. Papageorgiou and A. Kotsialos, “Freeway ramp metering: an overview,” IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.
[2] A. Skabardonis, P. Varaiya, and K. F. Petty, “Measuring recurrent and nonrecurrent traffic congestion,” Transportation Research Record, vol. 1856, pp. 118–124, 2003.
[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, “ALINEA: a local feedback control law for on-ramp metering,” Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.
[4] E. Smaragdis and M. Papageorgiou, “Series of new local ramp metering strategies,” Transportation Research Record, vol. 1856, pp. 74–86, 2003.
[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, “Real-time metering algorithm for centralized control,” Transportation Research Record, vol. 1232, pp. 17–26, 1989.
[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, “System wide adaptive ramp metering (SWARM),” in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.
[7] R. Lau, Ramp Metering by Zone: The Minnesota Algorithm, Minnesota Department of Transportation, 1997.
[8] H. M. Zhang and W. W. Recker, “On optimal freeway ramp control policies for congested traffic corridors,” Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.
[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, “Optimal coordinated ramp metering with advanced motorway optimal control,” Transportation Research Record, no. 1748, pp. 55–65, 2001.
[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, “Model predictive control for optimal coordination of ramp metering and variable speed limits,” Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.
[11] G. Gomes and R. Horowitz, “Optimal freeway ramp metering using the asymmetric cell transmission model,” Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.
[12] A. H. F. Chow and Y. Li, “Robust optimization of dynamic motorway traffic via ramp metering,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.
[13] R. W. Hall, “Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?” Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.
[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, “Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models,” Transportation Research Record, vol. 2047, pp. 57–65, 2008.
[15] T. L. Greenlee and H. J. Payne, “Freeway ramp metering strategies for responding to incidents,” in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.
[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.
[17] J.-B. Sheu, “Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control,” Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.
[18] J.-B. Sheu and M.-S. Chang, “Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.
[19] C. Jacob and B. Abdulhai, “Machine learning for multi-jurisdictional optimal traffic corridor control,” Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.
[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, “Motorway ramp-metering control with queuing consideration using Q-learning,” in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC ’11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.
[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Application of reinforcement learning with continuous state space to ramp metering in real-world conditions,” in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC ’12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.
[22] C. Jacob and B. Abdulhai, “Automated adaptive traffic corridor control using reinforcement learning: approach and case studies,” Transportation Research Record, vol. 1959, pp. 1–8, 2006.
[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada,” Transportation Research Record, vol. 2396, pp. 10–18, 2013.
[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, “Reinforcement learning ramp metering without complete information,” Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.
[25] A. Fares and W. Gomaa, “Multi-agent reinforcement learning control for ramp metering,” in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.
[26] K. Veljanovska, K. M. Bombol, and T. Maher, “Reinforcement learning technique in multiple motorway access control strategy design,” PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.
[27] C. Lu, H. Chen, and S. Grant-Muller, “An indirect reinforcement learning approach for ramp control under incident-induced congestion,” in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC ’13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.
[28] L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.
[29] C. Lu, H. Chen, and S. Grant-Muller, “Indirect reinforcement learning for incident-responsive ramp control,” Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.
[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[31] C. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[32] J. R. Kok and N. Vlassis, “Collaborative multiagent reinforcement learning by payoff propagation,” Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.
[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, “Coordinated reinforcement learning,” in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.
[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, “Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.
[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[36] C. F. Daganzo, “The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory,” Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.
[37] J. Haddad, M. Ramezani, and N. Geroliminis, “Cooperative traffic control of a mixed network with two urban regions and a freeway,” Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.
[38] H. Mongeot and J.-B. Lesort, “Analytical expressions of incident-induced flow dynamics perturbations using macroscopic theory and extension of Lighthill-Whitham theory,” Transportation Research Record, vol. 1710, pp. 58–68, 2000.
[39] Transport Simulation Systems, Aimsun User’s Manual 6.1, TTS, Barcelona, Spain, 2010.
[40] Highways England, “Hatris Homepage,” 2013, https://www.hatris.co.uk.
[41] E. Cascetta, “Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator,” Transportation Research B, vol. 18, no. 4-5, pp. 289–299, 1984.
[42] L. Zhang and D. Levinson, “Balancing efficiency and equity of ramp meters,” Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.
[43] A. Kotsialos and M. Papageorgiou, “Efficiency and equity properties of freeway network-wide ramp metering with AMOC,” Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.
Figure 4: Test motorway segment of M6 (junctions J21A to J25; mainline origin O and destination D; origins O1 (from A579), O2 (from A580), and O3 (from A49); destinations D1 and D2 (to M62), D3 (to A579), D4 (to A580), and D5 (to A58)).
Figure 5: Partitions of test segment (mainline between junctions J21A and J25 divided into sections 1, 2, and 3 with section and cell boundaries; origins O and O1 to O3; destinations D and D1 to D5; incident locations a, b, and c; flow direction; road section lengths in km).
vehicles in each mainline section and on-ramps is as follows: $q^{\max}_{1,\mathrm{main}} = 1860$, $q^{\max}_{2,\mathrm{main}} = 2880$, $q^{\max}_{3,\mathrm{main}} = 2880$, $q^{\max}_{1,\mathrm{on}} = 108$, $q^{\max}_{2,\mathrm{on}} = 90$, and $q^{\max}_{3,\mathrm{on}} = 120$.
6.2. Real Data Source. The case study uses real data from 17 loop detectors in the motorway segment, covering both the mainline and the on- and off-ramps. These data can be extracted from the Traffic Information System (HATRIS) [40]. The traffic counts are averaged from April 2012 to March 2013 at 15-minute intervals. Only working-day data (Monday to Friday) are used, because traffic load drops dramatically at weekends. Figure 6 presents some of the detector data from the mainline and the three on-ramps. It shows two daily peak periods: an AM peak (around 07:00:00–09:00:00) and a PM peak (around 16:00:00–18:00:00).
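The preprocessing described above (15-minute counts, averaged over working days only) can be sketched as follows. This is an illustration under assumptions. The (timestamp, count) record format and the function name are hypothetical and do not reflect the actual HATRIS/TRADS export format. The April 2012 to March 2013 window is assumed to have been applied when the records were selected.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def average_weekday_profile(records):
    """Average 15-minute counts by time of day over working days (Mon-Fri) only.

    records: iterable of (datetime, count) pairs (hypothetical format).
    Returns {time_of_day: mean count}, ordered by time of day.
    """
    buckets = defaultdict(list)
    for ts, count in records:
        if ts.weekday() >= 5:  # 5 = Saturday, 6 = Sunday: excluded
            continue
        # Snap the timestamp to the start of its 15-minute interval.
        slot = ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0).time()
        buckets[slot].append(count)
    return {slot: mean(counts) for slot, counts in sorted(buckets.items())}


records = [
    (datetime(2012, 4, 2, 7, 0), 100),   # Monday
    (datetime(2012, 4, 3, 7, 5), 200),   # Tuesday, falls in the 07:00 interval
    (datetime(2012, 4, 7, 7, 0), 50),    # Saturday, excluded
]
profile = average_weekday_profile(records)
```

Here the 07:00 interval averages only the Monday and Tuesday counts (150), and the Saturday record is ignored.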
At the test site, ramp metering operates only during peak hours. It is also valuable to test the proposed algorithm under high demand: if it works under a heavy traffic load, it should also be useful in common situations. Therefore, the AM peak period, with its heavy traffic load, is used for the case study. Specifically, the averaged AM peak traffic data collected from TRADS are used to estimate the OD (origin-destination) matrix for the simulation. AIMSUN performs this estimation with the model proposed in [41], using 1000 iterations to reach convergence. Table 1 shows the OD matrix estimated from real traffic data.

Figure 6: Real averaged traffic data (15-minute traffic counts over time of day for mainline sections 1 to 3 and on-ramps 1 to 3).

Table 1: OD matrix estimated.

| Origins/destinations | D | D5 | D4 | D3 | D2 | D1 | Totals |
|---|---|---|---|---|---|---|---|
| O | 2089 | 375 | 686 | 728 | 1169 | 771 | 5818 |
| O3 | 875 | 65 | 212 | 193 | 117 | 46 | 1507 |
| O2 | 886 | 0 | 0 | 61 | 315 | 216 | 1477 |
| O1 | 824 | 0 | 0 | 0 | 292 | 226 | 1343 |
| Totals | 4675 | 440 | 898 | 981 | 1893 | 1258 | 10146 |

Table 2: Parameters for ACTM.

| Parameter | $d^{\max}_{\mathrm{main}}$ | $v$ | $w$ | $\theta$ | $\eta$ | $\lambda_1$ | $\lambda_2$ | $\lambda_3$ |
|---|---|---|---|---|---|---|---|---|
| Value | 6300 veh/h | 107 km/h | 116 km/h | 0.5 | 0.16 | 0.55 | 0.9 | 0.6 |
6.3. Incident Scenarios. Because real incident data are difficult to capture, we simulate incident scenarios in AIMSUN. To make every ramp meter active during the incident, the incident is placed near the most downstream motorway section, that is, motorway section 1. Three incident scenarios, A, B, and C, are designed, corresponding to three different incident locations, a, b, and c, respectively (as illustrated in Figure 5).
The simulation experiment lasts for one and a half hoursfrom 070000 to 083000 during AM peak period After 30-minute normal operation (for warm-up) the incident is trig-gered at 073000 and lasts for 30 minutes In the preliminaryexperiments designed in this paper the incident with onelane blocked is considered Parameters introduced here canalso be regulated for multiple lane-blockage situations Theincident extent is 50 meters which is assumed to be constantduring the incident
Learning-related parameters are set as typical values [30]that is 120572 is 02 120574 is 08 and 120576 is 01 Other parameters relatedto ACTM are calibrated and summarised in Table 2 All thecells have the same 120579 and 120578
64 Results The comparison of Dyna-MARL Isolated RLand NC is conducted from three aspects density evolutionsome general indicators and the system equity The experi-mental results are described as follows
(1) Density Evolution We can see from Figure 7 that fourdense areas exist during the traffic operation Three of themnear on-ramp entrances (motorway length around 05 km5 km and 10 km) are caused by heavy traffic loads from on-ramps The dense area close to the segment end forms due tothe incident
In scenario A incident location is close to on-ramp 1(O1) Without control this incident leads to sever congestion
which blocks on-ramp 1 and propagates to motorway section2 (around 9 km in Figure 7(a)) Under this scenario IsolatedRL cannot alleviate incident-induced congestion effectively
(see Figure 7(b)) In the beginning of congestion formulationwithout coordination only the nearest ramp controller reactsto the congestion Because of the space limit of on-rampone ramp controller is insufficient to dissolve this congestionthat still propagates to motorway section 2 Dyna-MARL onthe other hand coordinates all three ramp controllers andmakes full use of the storage space of three on-ramps todeal with incident-induced congestion In this way mainlinecongestion can be restricted in a smaller area and will notpropagate to motorway section 2 (see Figure 7(c))
For scenarios B and C incidents are near the motorwayend and far from on-ramp 1 Without blocking on-ramp1 incidents do not lead to sever congestion Under suchcircumstances both Isolated RL and Dyna-MARL work wellon easing congestion in the mainline As shown in Figures7(e)ndash7(i) compared with the NC situation both Isolated RLand Dyna-MARL can restrict the congestion in a small rangenear the on-ramp entrances
(2) General Indicators In this comparison some general indi-cators including total travel time (should be reduced) totalthroughput (should be improved) and total CO
2emission
(should be reduced) are used to show how the proposedsystem can benefit road users These indicators are widelyused in the transport community to test the performance ofnewly developed traffic control systems
As shown in Figure 8(a) comparedwith theNC situationboth Isolated RL and Dyna-MARL can reduce the totaltravel time of road users in all three scenarios SpecificallyIsolated RL decreases total travel time by up to 62 whileDyna-MARL achieves a maximum reduction of 122 (seeFigure 8(d))The comparison of total throughput is presentedin Figure 8(b) Dyna-MARL can improve the total through-put by up to 23 (see Figure 8(d)) which outperforms Iso-lated RL in all three scenarios In scenario B Isolated RL evenfails to improve the total throughput For the comparison oftotal CO
2emission (shown in Figure 8(c)) both Isolated RL
and Dyna-MARL achieve their best performance in scenarioBwith a reduction of 47and46 respectively In scenariosA and C Dyna-MARL has a much better performance thanIsolated RL
Through the above comparison we can see that Dyna-MARL outperforms Isolated RL for almost all the scenariosand indicators
(3) System Equity Although the general indicators presentedin comparison (2) have shown their effectiveness on testingthe performance of different systems they cannot measurethe issue of system equity which is also an important aspectof the system performance In this paper we only considerthe spatial equity issue that is defined as a measurementof equity of user delays on different on-ramps [42] In thisstudy we assume the road users from all three on-ramps havethe same importance If all users from different on-rampscan experience the similar travel time the control system isdefined as an equitable system This term is used to measurethe system equity that is a large queue difference leads toa highly inequitable system In [43] the variance of traveltime on different on-ramps is used as an indicator to measure
Mathematical Problems in Engineering 11
020
4060
80100
03
69
1215
Motorway length (km) Absolute tim
e (min)
0
50
100
150
Mot
orw
ay d
ensit
y (v
ehk
mla
ne)
20 40 60 80 100
Density (vehkmlane)
(a)
0
50
100
150
020
4060
80100
03
69
1215
Motorway length (km) Absolute tim
e (min)
Mot
orw
ay d
ensit
y (v
ehk
mla
ne)
20 40 60 80 100
Density (vehkmlane)
(b)
0
50
100
150
020
4060
80100
03
69
1215
Motorway length (km) Absolute tim
e (min)
Mot
orw
ay d
ensit
y (v
ehk
mla
ne)
20 40 60 80 100
Density (vehkmlane)
(c)
0
50
100
150
020
4060
80100
03
69
1215
Motorway length (km) Absolute tim
e (min)
Mot
orw
ay d
ensit
y (v
ehk
mla
ne)
20 40 60 80 100
Density (vehkmlane)
(d)
0
50
100
150
020
4060
80100
03
69
1215
Motorway length (km) Absolute tim
e (min)
Mot
orw
ay d
ensit
y (v
ehk
mla
ne)
20 40 60 80 100
Density (vehkmlane)
(e)
0
50
100
150
020
4060
80100
03
69
1215
Motorway length (km) Absolute tim
e (min)
Mot
orw
ay d
ensit
y (v
ehk
mla
ne)
20 40 60 80 100
Density (vehkmlane)
(f)
Figure 7 Continued
12 Mathematical Problems in Engineering
0
50
100
150
020
4060
80100
03
69
1215
Motorway length (km) Absolute tim
e (min)
Mot
orw
ay d
ensit
y (v
ehk
mla
ne)
20 40 60 80 100
Density (vehkmlane)
(g)
0
50
100
150
020
4060
80100
03
69
1215
Motorway length (km) Absolute tim
e (min)
Mot
orw
ay d
ensit
y (v
ehk
mla
ne)
20 40 60 80 100
Density (vehkmlane)
(h)
0
50
100
150
020
4060
80100
03
69
1215
Motorway length (km) Absolute tim
e (min)
Mot
orw
ay d
ensit
y (v
ehk
mla
ne)
20 40 60 80 100
Density (vehkmlane)
(i)
Figure 7 Density profiles for (a) NC in scenario A (b) Isolated RL in scenario A (c) Dyna-MARL in scenario A (d) NC in scenario B (e)Isolated RL in scenario B (f) Dyna-MARL in scenario B (g) NC in scenario C (h) Isolated RL in scenario C and (i) Dyna-MARL in scenarioC
system equity Similar to [43] for the sake of comparison thestandard deviation is considered in our caseThis indicator isdefined as
SD (119896) = radicsum119899
119894=1[119905119896
minus 119905119896
119894]2
119899
(14)
where SD(119896) is the standard deviation of travel time ofdifferent on-ramps at time step 119896 119905119896
119894is the estimated total
travel time of on-ramp 119894 at step 119896 119905119896 is the averaged total traveltime of 119899 on-ramps at step 119896
Results about the comparison of system equity can beseen from Figure 9 For the NC situation good equity can
be maintained due to no restrictions of entering vehiclesin scenarios B and C (as shown in Figures 9(b) and 9(c))However when one of the on-ramp entrances is blocked bythe congestion in scenario A a long queue forms and leadsto imbalance and resultant inequity for users on differenton-ramps (see Figure 9(a)) For controlled cases IsolatedRL performs poorly in all scenarios This is because theramp controller near congestion takes much more restrictedmeasures than other controllers on the controlled trafficBecause of the coordination strategy Dyna-MARL out-performs Isolated RL on maintaining system equity in allscenarios especially during the incident (from 073000 to080000)
Mathematical Problems in Engineering 13
900
1200
1500
1800To
tal t
rave
l tim
e (h)
B CA
NCIsolated RLDyna-MARL
Scenario
(a)
7000
7500
8000
8500
9000
9500
10000
Tota
l thr
ough
put (
veh)
B CA
NCIsolated RLDyna-MARL
Scenario
(b)
NCIsolated RLDyna-MARL
14
15
16
17
18
19
2
Tota
l CO
2em
issio
n (k
g)
B CAScenario
times104
(c)
Totalthroughput
Totaltravel time emission
minus3
0
4
8
12
16Re
duct
ion
from
NC
()
B C A B C A B CA
Total CO2
NCIsolated RLDyna-MARL
Scenario
(d)
Figure 8 Comparison of general measures for different scenarios
10 Mathematical Problems in Engineering
Table 1: OD matrix estimated.
Origins/destinations   D      D5    D4    D3    D2     D1     Totals
O                      2089   375   686   728   1169   771    5818
O3                     875    65    212   193   117    46     1507
O2                     886    0     0     61    315    216    1477
O1                     824    0     0     0     292    226    1343
Totals                 4675   440   898   981   1893   1258   10146
Table 2: Parameters for ACTM.
Parameter   d_max^main   V          w          θ     η      λ1     λ2    λ3
Value       6300 veh/h   107 km/h   116 km/h   0.5   0.16   0.55   0.9   0.6
is set as 1000 to achieve convergence. Table 1 shows the OD matrix estimated from real traffic data.
6.3. Incident Scenarios. Because real incident data are difficult to capture, several incident scenarios are simulated in AIMSUN. To ensure that every ramp meter is engaged during the incident, the incident is located near the most downstream motorway section, that is, motorway section 1. Three incident scenarios, A, B, and C, are therefore designed, corresponding to three different incident locations, a, b, and c (as illustrated in Figure 5), respectively.
The simulation runs for one and a half hours, from 07:00:00 to 08:30:00, during the AM peak period. After 30 minutes of normal operation (a warm-up period), the incident is triggered at 07:30:00 and lasts for 30 minutes. The preliminary experiments in this paper consider an incident that blocks one lane; the parameters introduced here can also be adjusted for multiple-lane blockages. The incident extent is 50 m and is assumed to remain constant during the incident.
The learning-related parameters are set to typical values [30], that is, α = 0.2, γ = 0.8, and ε = 0.1. The other parameters, which relate to the ACTM, are calibrated and summarised in Table 2. All cells share the same θ and η.
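To show what α, γ, and ε control, the following is a minimal tabular Dyna-Q sketch in the general form described by Sutton and Barto [30]: each real transition drives a direct Q-learning update, is stored in a deterministic model, and is then replayed in several simulated planning updates. It is a single-agent illustration only and does not reproduce the Dyna-MARL coordination mechanism. The class name, the number of planning steps, and the state and action labels are hypothetical assumptions, not values from this study.

```python
import random
from collections import defaultdict


class DynaQAgent:
    """Minimal tabular Dyna-Q: direct Q-learning plus planning updates
    replayed from a learned deterministic model (single agent only)."""

    def __init__(self, actions, alpha=0.2, gamma=0.8, epsilon=0.1,
                 planning_steps=10, seed=0):
        self.actions = list(actions)
        self.alpha = alpha                    # learning rate (0.2 in the text)
        self.gamma = gamma                    # discount factor (0.8 in the text)
        self.epsilon = epsilon                # exploration rate (0.1 in the text)
        self.planning_steps = planning_steps  # hypothetical; not given here
        self.q = defaultdict(float)           # Q[(state, action)], default 0.0
        self.model = {}                       # (state, action) -> (reward, next_state)
        self.rng = random.Random(seed)

    def select_action(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def _q_update(self, s, a, r, s_next):
        """One-step Q-learning backup."""
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        td_error = r + self.gamma * best_next - self.q[(s, a)]
        self.q[(s, a)] += self.alpha * td_error

    def learn(self, s, a, r, s_next):
        """Process one real transition (s, a, r, s_next)."""
        self._q_update(s, a, r, s_next)       # (1) direct RL
        self.model[(s, a)] = (r, s_next)      # (2) model learning
        observed = list(self.model)           # (3) planning on remembered pairs
        for _ in range(self.planning_steps):
            ps, pa = self.rng.choice(observed)
            pr, ps_next = self.model[(ps, pa)]
            self._q_update(ps, pa, pr, ps_next)


# Hypothetical states and metering actions, for illustration only.
agent = DynaQAgent(actions=["low_rate", "mid_rate", "high_rate"])
agent.learn("congested", "low_rate", 1.0, "free_flow")
print(round(agent.q[("congested", "low_rate")], 3))  # 0.914
```

With a single remembered transition, the ten planning replays push Q from 0.2 (one direct update) towards the reward much faster than Q-learning alone would, which is the benefit the model-based half of Dyna-Q is meant to provide.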
6.4. Results. Dyna-MARL, Isolated RL, and NC (no control) are compared from three aspects: density evolution, general indicators, and system equity. The experimental results are described as follows.
(1) Density Evolution. Figure 7 shows that four dense areas exist during the traffic operation. Three of them, near the on-ramp entrances (at motorway lengths of around 0.5 km, 5 km, and 10 km), are caused by heavy traffic loads from the on-ramps. The dense area close to the end of the segment forms because of the incident.
In scenario A, the incident is located close to on-ramp 1 (O1). Without control, this incident leads to severe congestion, which blocks on-ramp 1 and propagates to motorway section 2 (around 9 km in Figure 7(a)). In this scenario, Isolated RL cannot alleviate the incident-induced congestion effectively (see Figure 7(b)). At the onset of congestion, without coordination, only the nearest ramp controller reacts. Because on-ramp storage space is limited, a single ramp controller cannot dissolve the congestion, which still propagates to motorway section 2. Dyna-MARL, on the other hand, coordinates all three ramp controllers and makes full use of the storage space of the three on-ramps to deal with the incident-induced congestion. In this way, mainline congestion is confined to a smaller area and does not propagate to motorway section 2 (see Figure 7(c)).
In scenarios B and C, the incidents are near the end of the motorway and far from on-ramp 1. Because they do not block on-ramp 1, these incidents do not lead to severe congestion. Under such circumstances, both Isolated RL and Dyna-MARL ease mainline congestion well. As shown in Figures 7(e)–7(i), compared with the NC situation, both Isolated RL and Dyna-MARL confine the congestion to a small range near the on-ramp entrances.
(2) General Indicators. In this comparison, three general indicators, namely total travel time (lower is better), total throughput (higher is better), and total CO₂ emission (lower is better), are used to show how the proposed system can benefit road users. These indicators are widely used in the transport community to evaluate newly developed traffic control systems.
As shown in Figure 8(a), compared with the NC situation, both Isolated RL and Dyna-MARL reduce the total travel time of road users in all three scenarios. Specifically, Isolated RL decreases total travel time by up to 6.2%, while Dyna-MARL achieves a maximum reduction of 12.2% (see Figure 8(d)). The comparison of total throughput is presented in Figure 8(b). Dyna-MARL improves the total throughput by up to 2.3% (see Figure 8(d)) and outperforms Isolated RL in all three scenarios; in scenario B, Isolated RL even fails to improve the total throughput. For total CO₂ emission (shown in Figure 8(c)), both Isolated RL and Dyna-MARL achieve their best performance in scenario B, with reductions of 4.7% and 4.6%, respectively. In scenarios A and C, Dyna-MARL performs much better than Isolated RL.
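The percentages read from Figure 8(d) are changes relative to NC. Assuming the usual definition (the relative difference from the NC value, signed so that a positive number is an improvement: a reduction for travel time and CO₂, an increase for throughput), they can be computed as in the sketch below. The function name and all input values are hypothetical and are not results from this study.

```python
def improvement_over_nc(nc_value, controlled_value, higher_is_better=False):
    """Signed percentage improvement of a controlled run over the NC baseline.

    For lower-is-better indicators (total travel time, total CO2 emission)
    this is the percentage reduction; for higher-is-better indicators
    (total throughput) it is the percentage increase. A negative result
    means the controller did worse than NC.
    """
    if nc_value == 0:
        raise ValueError("NC baseline must be non-zero")
    change_pct = (controlled_value - nc_value) / nc_value * 100.0
    return change_pct if higher_is_better else -change_pct


# Hypothetical indicator values, not results from this study.
print(improvement_over_nc(1500.0, 1320.0))                         # travel time: ~12.0
print(improvement_over_nc(9000.0, 9207.0, higher_is_better=True))  # throughput: ~2.3
print(improvement_over_nc(9000.0, 8910.0, higher_is_better=True))  # throughput: ~-1.0
```

Under this sign convention, a controller that lowers throughput relative to NC (as Isolated RL does in scenario B) shows up as a negative bar.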
The above comparison shows that Dyna-MARL outperforms Isolated RL in almost all scenarios and for almost all indicators.
Figure 7: Density profiles for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C. Each panel plots motorway density (veh/km/lane) against motorway length (km) and absolute time (min).

(3) System Equity. Although the general indicators presented in comparison (2) are effective for evaluating the performance of the different systems, they cannot capture system equity, which is also an important aspect of system performance. In this paper, we consider only spatial equity, defined as a measure of the equity of user delays across different on-ramps [42]. We assume that road users from all three on-ramps are of equal importance. If users from all on-ramps experience similar travel times, the control system is defined as equitable; conversely, a large queue difference between on-ramps leads to a highly inequitable system. In [43], the variance of travel time across on-ramps is used as an indicator of system equity. Similarly to [43], and for the sake of comparison, the standard deviation is used in our case. This indicator is defined as
$$\mathrm{SD}(k)=\sqrt{\frac{\sum_{i=1}^{n}\left[\bar{t}^{k}-t_{i}^{k}\right]^{2}}{n}} \qquad (14)$$
where $\mathrm{SD}(k)$ is the standard deviation of the travel times of the different on-ramps at time step $k$, $t_i^k$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^k$ is the average total travel time of the $n$ on-ramps at step $k$.
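Equation (14) is the population standard deviation (division by n rather than n − 1) of the on-ramp travel times at a single time step. A minimal Python sketch, assuming the estimated total travel time of each on-ramp is already available as one number per time step; the function names and the sample values are hypothetical.

```python
import math
import statistics


def equity_sd(travel_times):
    """SD(k) from eq. (14): population standard deviation of the estimated
    total travel times t_i^k of the n on-ramps at one time step k."""
    n = len(travel_times)
    if n == 0:
        raise ValueError("need at least one on-ramp")
    mean = sum(travel_times) / n                         # average travel time at step k
    squared = sum((mean - t) ** 2 for t in travel_times)
    return math.sqrt(squared / n)                        # divide by n, not n - 1


def equity_sd_series(per_step_travel_times):
    """Apply eq. (14) at every time step; input[k][i] is t_i^k."""
    return [equity_sd(step) for step in per_step_travel_times]


# Hypothetical total travel times (h) for three on-ramps at three time steps.
series = [[1.0, 1.0, 1.0], [4.0, 1.0, 1.0], [1.0, 1.0, 4.0]]
print(equity_sd_series(series))
```

Because eq. (14) divides by n, it agrees with `statistics.pstdev` rather than `statistics.stdev`; a value of 0 means every on-ramp experiences the same travel time, and larger values indicate less equitable control.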
The comparison of system equity is shown in Figure 9. In the NC situation, good equity is maintained in scenarios B and C because entering vehicles are not restricted (as shown in Figures 9(b) and 9(c)). However, when one of the on-ramp entrances is blocked by the congestion in scenario A, a long queue forms, leading to imbalance and consequent inequity for users on different on-ramps (see Figure 9(a)). Among the controlled cases, Isolated RL performs poorly in all scenarios. This is because the ramp controller nearest the congestion imposes much more restrictive measures on its controlled traffic than the other controllers do. Thanks to its coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).
Figure 8: Comparison of general measures for different scenarios: (a) total travel time (h), (b) total throughput (veh), (c) total CO₂ emission (kg), and (d) reduction from NC (%) in total travel time, total throughput, and total CO₂ emission, for NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C.
7 Conclusions and Future Work
A Dyna-Q based multiagent reinforcement learning method for motorway ramp control, referred to as Dyna-MARL, has been developed in this paper. In a simulation environment, Dyna-MARL is compared with Isolated RL (Q-learning without coordination) and the noncontrolled situation. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.
From a series of simulation-based experiments, we draw the following conclusions: (1) Isolated RL can improve motorway performance by increasing total throughput and reducing total travel time and CO₂ emission, but this improvement comes at the expense of poor system equity across on-ramps; (2) with a suitable coordination strategy, Dyna-MARL achieves much higher system equity; (3) in addition to system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios on all indicators, which indicates that Dyna-MARL can deal with network-wide problems effectively.
Although the simulation tests have shown positive results for Dyna-MARL, the current work considers a simplified incident scenario with a fixed duration. In practice, incident duration is highly variable and is affected by a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Incident duration should therefore be treated as an uncertainty, which will be investigated in our future work.
Figure 9: Standard deviation for different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C. Each panel plots the standard deviation (h) against the time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This paper is supported by the China Scholarship Council and the University of Leeds (CSC-University of Leeds scholarship) and is partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.
References
[1] M. Papageorgiou and A. Kotsialos, “Freeway ramp metering: an overview,” IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.
[2] A. Skabardonis, P. Varaiya, and K. F. Petty, “Measuring recurrent and nonrecurrent traffic congestion,” Transportation Research Record, vol. 1856, pp. 118–124, 2003.
[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, “ALINEA: a local feedback control law for on-ramp metering,” Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.
[4] E. Smaragdis and M. Papageorgiou, “Series of new local ramp metering strategies,” Transportation Research Record, vol. 1856, pp. 74–86, 2003.
[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, “Real-time metering algorithm for centralized control,” Transportation Research Record, vol. 1232, pp. 17–26, 1989.
[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, “System wide adaptive ramp metering (SWARM),” in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.
[7] R. Lau, Ramp Metering by Zone: The Minnesota Algorithm, Minnesota Department of Transportation, 1997.
[8] H. M. Zhang and W. W. Recker, “On optimal freeway ramp control policies for congested traffic corridors,” Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.
[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, “Optimal coordinated ramp metering with advanced motorway optimal control,” Transportation Research Record, no. 1748, pp. 55–65, 2001.
[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, “Model predictive control for optimal coordination of ramp metering and variable speed limits,” Transportation Research C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.
[11] G. Gomes and R. Horowitz, “Optimal freeway ramp metering using the asymmetric cell transmission model,” Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.
[12] A. H. F. Chow and Y. Li, “Robust optimization of dynamic motorway traffic via ramp metering,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.
[13] R. W. Hall, “Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?” Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.
[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, “Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models,” Transportation Research Record, vol. 2047, pp. 57–65, 2008.
[15] T. L. Greenlee and H. J. Payne, “Freeway ramp metering strategies for responding to incidents,” in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.
[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.
[17] J.-B. Sheu, “Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control,” Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.
[18] J.-B. Sheu and M.-S. Chang, “Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.
[19] C. Jacob and B. Abdulhai, “Machine learning for multi-jurisdictional optimal traffic corridor control,” Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.
[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, “Motorway ramp-metering control with queuing consideration using Q-learning,” in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC ’11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.
[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Application of reinforcement learning with continuous state space to ramp metering in real-world conditions,” in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC ’12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.
[22] C. Jacob and B. Abdulhai, “Automated adaptive traffic corridor control using reinforcement learning: approach and case studies,” Transportation Research Record, vol. 1959, pp. 1–8, 2006.
[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada,” Transportation Research Record, vol. 2396, pp. 10–18, 2013.
[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, “Reinforcement learning ramp metering without complete information,” Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.
[25] A. Fares and W. Gomaa, “Multi-agent reinforcement learning control for ramp metering,” in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.
[26] K. Veljanovska, K. M. Bombol, and T. Maher, “Reinforcement learning technique in multiple motorway access control strategy design,” PROMET - Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.
[27] C. Lu, H. Chen, and S. Grant-Muller, “An indirect reinforcement learning approach for ramp control under incident-induced congestion,” in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC ’13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.
[28] L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.
[29] C. Lu, H. Chen, and S. Grant-Muller, “Indirect reinforcement learning for incident-responsive ramp control,” Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.
[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[31] C. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[32] J. R. Kok and N. Vlassis, “Collaborative multiagent reinforcement learning by payoff propagation,” Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.
[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, “Coordinated reinforcement learning,” in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.
[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, “Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.
[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[36] C. F. Daganzo, “The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory,” Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.
[37] J. Haddad, M. Ramezani, and N. Geroliminis, “Cooperative traffic control of a mixed network with two urban regions and a freeway,” Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.
[38] H. Mongeot and J.-B. Lesort, “Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory,” Transportation Research Record, vol. 1710, pp. 58–68, 2000.
[39] Transport Simulation Systems, Aimsun User’s Manual 6.1, TSS, Barcelona, Spain, 2010.
[40] Highways England, “Hatris Homepage,” 2013, https://www.hatris.co.uk.
[41] E. Cascetta, “Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator,” Transportation Research B, vol. 18, no. 4-5, pp. 289–299, 1984.
[42] L. Zhang and D. Levinson, “Balancing efficiency and equity of ramp meters,” Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.
[43] A. Kotsialos and M. Papageorgiou, “Efficiency and equity properties of freeway network-wide ramp metering with AMOC,” Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.
Mathematical Problems in Engineering 11
[Figure 7, panels (a)–(i): surface plots of motorway density (veh/km/lane) over motorway length (km) and absolute time (min), colour scale 20–100 veh/km/lane.]
Figure 7: Density profiles for (a) NC in scenario A, (b) Isolated RL in scenario A, (c) Dyna-MARL in scenario A, (d) NC in scenario B, (e) Isolated RL in scenario B, (f) Dyna-MARL in scenario B, (g) NC in scenario C, (h) Isolated RL in scenario C, and (i) Dyna-MARL in scenario C.
system equity. Following [43], and for the sake of comparison, we adopt the standard deviation as the equity indicator. It is defined as
$$\mathrm{SD}(k) = \sqrt{\frac{\sum_{i=1}^{n}\left[\bar{t}^{k} - t_{i}^{k}\right]^{2}}{n}} \tag{14}$$
where $\mathrm{SD}(k)$ is the standard deviation of the travel times of the different on-ramps at time step $k$, $t_{i}^{k}$ is the estimated total travel time of on-ramp $i$ at step $k$, and $\bar{t}^{k}$ is the average total travel time of the $n$ on-ramps at step $k$.
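Equation (14) is the population standard deviation (dividing by n, not n − 1) of the on-ramp travel times at one time step. The following Python sketch evaluates it at each step to produce an SD(k) series like those compared in Figure 9; the function names and the sample values are hypothetical illustrations, not data or code from this study.

```python
import math
from typing import List, Sequence


def equity_sd(ramp_travel_times: Sequence[float]) -> float:
    """SD(k) from equation (14): population standard deviation of the
    estimated total travel times t_i^k of n on-ramps at one time step k."""
    n = len(ramp_travel_times)
    if n == 0:
        raise ValueError("at least one on-ramp is required")
    t_bar = sum(ramp_travel_times) / n  # average total travel time, t-bar^k
    squared_dev = sum((t_bar - t_i) ** 2 for t_i in ramp_travel_times)
    return math.sqrt(squared_dev / n)  # divide by n, as in (14)


def equity_profile(travel_times_by_step: Sequence[Sequence[float]]) -> List[float]:
    """Apply (14) at every time step k to obtain the SD(k) time series."""
    return [equity_sd(step) for step in travel_times_by_step]


if __name__ == "__main__":
    # Hypothetical per-ramp travel times (h) for three on-ramps at two steps.
    steps = [[2.0, 2.0, 2.0], [1.0, 2.0, 3.0]]
    print(equity_profile(steps))
```

A value of zero means all on-ramps experience the same travel time; larger values indicate less equitable treatment. Because (14) divides by n, the result matches Python's `statistics.pstdev`, not `statistics.stdev`.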
Figure 9 compares system equity. In the NC case, good equity is maintained in scenarios B and C because entering vehicles are not restricted (Figures 9(b) and 9(c)). However, in scenario A, one of the on-ramp entrances is blocked by the congestion, so a long queue forms, leading to imbalance and resulting inequity for users of different on-ramps (Figure 9(a)). Among the controlled cases, Isolated RL performs poorly in all scenarios, because the ramp controller nearest the congestion restricts the traffic it controls much more heavily than the other controllers do. Owing to its coordination strategy, Dyna-MARL outperforms Isolated RL in maintaining system equity in all scenarios, especially during the incident (from 07:30:00 to 08:00:00).
[Figure 8, panels: (a) total travel time (h), (b) total throughput (veh), (c) total CO2 emission (×10^4 kg), and (d) reduction from NC (%) in total throughput, total travel time, and total CO2 emission; each panel compares NC, Isolated RL, and Dyna-MARL in scenarios A, B, and C.]
Figure 8: Comparison of general measures for different scenarios.
7 Conclusions and Future Work
A Dyna-Q-based multiagent reinforcement learning method for motorway ramp control, referred to as Dyna-MARL, has been developed in this paper. Dyna-MARL is compared with Isolated RL (Q-learning without coordination) and with the noncontrolled (NC) situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.
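Dyna-Q interleaves direct Q-learning updates from real transitions with additional planning updates replayed from a learned model of those transitions (Sutton and Barto [30]). As an illustration only, the following Python sketch shows a generic single-agent, tabular Dyna-Q; it is not the Dyna-MARL implementation and omits any coordination between ramp agents. The deterministic last-seen-transition model, the toy five-state chain environment (`chain_step`), and the parameter values (`alpha`, `gamma`, `epsilon`, `n_planning`) are hypothetical assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (state, action)


class DynaQ:
    """Generic tabular Dyna-Q sketch: Q-learning plus planning from a learned model."""

    def __init__(self, n_actions: int, alpha: float = 0.1, gamma: float = 0.95,
                 epsilon: float = 0.1, n_planning: int = 10, seed: int = 0) -> None:
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.n_planning = n_planning
        self.rng = random.Random(seed)
        self.q: Dict[Key, float] = defaultdict(float)
        # Assumed deterministic model: (s, a) -> (reward, next state, terminal flag)
        self.model: Dict[Key, Tuple[float, int, bool]] = {}

    def act(self, s: int) -> int:
        """Epsilon-greedy selection with random tie-breaking."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = [self.q[(s, a)] for a in range(self.n_actions)]
        best = max(values)
        return self.rng.choice([a for a, v in enumerate(values) if v == best])

    def _backup(self, s: int, a: int, r: float, s_next: int, done: bool) -> None:
        """One-step Q-learning update; no bootstrapping from terminal states."""
        future = 0.0 if done else max(self.q[(s_next, b)] for b in range(self.n_actions))
        self.q[(s, a)] += self.alpha * (r + self.gamma * future - self.q[(s, a)])

    def learn(self, s: int, a: int, r: float, s_next: int, done: bool) -> None:
        self._backup(s, a, r, s_next, done)      # (1) direct, model-free update
        self.model[(s, a)] = (r, s_next, done)   # (2) model learning
        visited: List[Key] = list(self.model)
        for _ in range(self.n_planning):         # (3) planning on simulated experience
            ps, pa = self.rng.choice(visited)
            self._backup(ps, pa, *self.model[(ps, pa)])


def chain_step(s: int, a: int, goal: int = 4) -> Tuple[float, int, bool]:
    """Hypothetical toy environment: action 1 moves right, action 0 moves left."""
    s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
    done = s_next == goal
    return (1.0 if done else 0.0), s_next, done


def train(episodes: int = 50, max_steps: int = 200) -> DynaQ:
    """Run episodes from state 0 until the goal is reached or max_steps elapse."""
    agent = DynaQ(n_actions=2)
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = agent.act(s)
            r, s_next, done = chain_step(s, a)
            agent.learn(s, a, r, s_next, done)
            s = s_next
            if done:
                break
    return agent


if __name__ == "__main__":
    agent = train()
    print({a: round(agent.q[(0, a)], 3) for a in range(2)})
```

Setting `n_planning = 0` reduces the agent to plain Q-learning, which is the sense in which Dyna-Q extends it; larger values trade extra computation per real step for faster propagation of value information.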
Through a series of simulation-based experiments, we can conclude the following: (1) Isolated RL can improve motorway performance by increasing total throughput and reducing total travel time and CO2 emission, but this improvement comes at the expense of poor system equity across on-ramps; (2) with a suitable coordination strategy, Dyna-MARL achieves much higher system equity; (3) beyond system equity, Dyna-MARL outperforms Isolated RL in almost all scenarios on all indicators, which indicates that Dyna-MARL can deal with network-wide problems effectively.
Although the simulation tests show positive results for Dyna-MARL, the current work considers only a simplified incident scenario with a fixed duration. In practice, incident duration is highly variable and depends on factors such as weather conditions, road conditions, and the arrival time of the incident management team. Incident duration should therefore be treated as an uncertainty, which will be investigated in our future work.
[Figure 9, panels (a)–(c): standard deviation (h) of on-ramp travel time versus time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL.]
Figure 9: Standard deviation for different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This paper is supported by the China Scholarship Council and the University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.
References
[1] M. Papageorgiou and A. Kotsialos, “Freeway ramp metering: an overview,” IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.
[2] A. Skabardonis, P. Varaiya, and K. F. Petty, “Measuring recurrent and nonrecurrent traffic congestion,” Transportation Research Record, vol. 1856, pp. 118–124, 2003.
[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, “ALINEA: a local feedback control law for on-ramp metering,” Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.
[4] E. Smaragdis and M. Papageorgiou, “Series of new local ramp metering strategies,” Transportation Research Record, vol. 1856, pp. 74–86, 2003.
[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, “Real-time metering algorithm for centralized control,” Transportation Research Record, vol. 1232, pp. 17–26, 1989.
[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, “System wide adaptive ramp metering (SWARM),” in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.
[7] R. Lau, Ramp Metering by Zone: The Minnesota Algorithm, Minnesota Department of Transportation, 1997.
[8] H. M. Zhang and W. W. Recker, “On optimal freeway ramp control policies for congested traffic corridors,” Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.
[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, “Optimal coordinated ramp metering with advanced motorway optimal control,” Transportation Research Record, no. 1748, pp. 55–65, 2001.
[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, “Model predictive control for optimal coordination of ramp metering and variable speed limits,” Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.
[11] G. Gomes and R. Horowitz, “Optimal freeway ramp metering using the asymmetric cell transmission model,” Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.
[12] A. H. F. Chow and Y. Li, “Robust optimization of dynamic motorway traffic via ramp metering,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.
[13] R. W. Hall, “Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?,” Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.
[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, “Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models,” Transportation Research Record, vol. 2047, pp. 57–65, 2008.
[15] T. L. Greenlee and H. J. Payne, “Freeway ramp metering strategies for responding to incidents,” in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.
[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.
[17] J.-B. Sheu, “Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control,” Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.
[18] J.-B. Sheu and M.-S. Chang, “Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.
[19] C. Jacob and B. Abdulhai, “Machine learning for multi-jurisdictional optimal traffic corridor control,” Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.
[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, “Motorway ramp-metering control with queuing consideration using Q-learning,” in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC ’11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.
[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Application of reinforcement learning with continuous state space to ramp metering in real-world conditions,” in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC ’12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.
[22] C. Jacob and B. Abdulhai, “Automated adaptive traffic corridor control using reinforcement learning: approach and case studies,” Transportation Research Record, vol. 1959, pp. 1–8, 2006.
[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada,” Transportation Research Record, vol. 2396, pp. 10–18, 2013.
[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, “Reinforcement learning ramp metering without complete information,” Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.
[25] A. Fares and W. Gomaa, “Multi-agent reinforcement learning control for ramp metering,” in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.
[26] K. Veljanovska, K. M. Bombol, and T. Maher, “Reinforcement learning technique in multiple motorway access control strategy design,” PROMET - Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.
[27] C. Lu, H. Chen, and S. Grant-Muller, “An indirect reinforcement learning approach for ramp control under incident-induced congestion,” in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC ’13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.
[28] L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.
[29] C. Lu, H. Chen, and S. Grant-Muller, “Indirect reinforcement learning for incident-responsive ramp control,” Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.
[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[31] C. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[32] J. R. Kok and N. Vlassis, “Collaborative multiagent reinforcement learning by payoff propagation,” Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.
[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, “Coordinated reinforcement learning,” in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.
[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, “Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.
[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[36] C. F. Daganzo, “The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory,” Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.
[37] J. Haddad, M. Ramezani, and N. Geroliminis, “Cooperative traffic control of a mixed network with two urban regions and a freeway,” Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.
[38] H. Mongeot and J.-B. Lesort, “Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory,” Transportation Research Record, vol. 1710, pp. 58–68, 2000.
[39] Transport Simulation Systems, Aimsun User’s Manual 6.1, TSS, Barcelona, Spain, 2010.
[40] Highways England, “Hatris Homepage,” 2013, https://www.hatris.co.uk.
[41] E. Cascetta, “Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator,” Transportation Research Part B, vol. 18, no. 4-5, pp. 289–299, 1984.
[42] L. Zhang and D. Levinson, “Balancing efficiency and equity of ramp meters,” Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.
[43] A. Kotsialos and M. Papageorgiou, “Efficiency and equity properties of freeway network-wide ramp metering with AMOC,” Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.
12 Mathematical Problems in Engineering
0
50
100
150
020
4060
80100
03
69
1215
Motorway length (km) Absolute tim
e (min)
Mot
orw
ay d
ensit
y (v
ehk
mla
ne)
20 40 60 80 100
Density (vehkmlane)
(g)
0
50
100
150
020
4060
80100
03
69
1215
Motorway length (km) Absolute tim
e (min)
Mot
orw
ay d
ensit
y (v
ehk
mla
ne)
20 40 60 80 100
Density (vehkmlane)
(h)
0
50
100
150
020
4060
80100
03
69
1215
Motorway length (km) Absolute tim
e (min)
Mot
orw
ay d
ensit
y (v
ehk
mla
ne)
20 40 60 80 100
Density (vehkmlane)
(i)
Figure 7 Density profiles for (a) NC in scenario A (b) Isolated RL in scenario A (c) Dyna-MARL in scenario A (d) NC in scenario B (e)Isolated RL in scenario B (f) Dyna-MARL in scenario B (g) NC in scenario C (h) Isolated RL in scenario C and (i) Dyna-MARL in scenarioC
system equity Similar to [43] for the sake of comparison thestandard deviation is considered in our caseThis indicator isdefined as
SD (119896) = radicsum119899
119894=1[119905119896
minus 119905119896
119894]2
119899
(14)
where SD(119896) is the standard deviation of travel time ofdifferent on-ramps at time step 119896 119905119896
119894is the estimated total
travel time of on-ramp 119894 at step 119896 119905119896 is the averaged total traveltime of 119899 on-ramps at step 119896
Results about the comparison of system equity can beseen from Figure 9 For the NC situation good equity can
be maintained due to no restrictions of entering vehiclesin scenarios B and C (as shown in Figures 9(b) and 9(c))However when one of the on-ramp entrances is blocked bythe congestion in scenario A a long queue forms and leadsto imbalance and resultant inequity for users on differenton-ramps (see Figure 9(a)) For controlled cases IsolatedRL performs poorly in all scenarios This is because theramp controller near congestion takes much more restrictedmeasures than other controllers on the controlled trafficBecause of the coordination strategy Dyna-MARL out-performs Isolated RL on maintaining system equity in allscenarios especially during the incident (from 073000 to080000)
Mathematical Problems in Engineering 13
900
1200
1500
1800To
tal t
rave
l tim
e (h)
B CA
NCIsolated RLDyna-MARL
Scenario
(a)
7000
7500
8000
8500
9000
9500
10000
Tota
l thr
ough
put (
veh)
B CA
NCIsolated RLDyna-MARL
Scenario
(b)
NCIsolated RLDyna-MARL
14
15
16
17
18
19
2
Tota
l CO
2em
issio
n (k
g)
B CAScenario
times104
(c)
Totalthroughput
Totaltravel time emission
minus3
0
4
8
12
16Re
duct
ion
from
NC
()
B C A B C A B CA
Total CO2
NCIsolated RLDyna-MARL
Scenario
(d)
Figure 8 Comparison of general measures for different scenarios
7 Conclusions and Future Work
A Dyna-119876 based multiagent reinforcement learning methodreferred to as Dyna-MARL for motorway ramp control hasbeen developed in this paper Dyna-MARL is comparedwith Isolated RL (119876-learning without coordination) andnoncontrolled situation under the simulation environmentReal traffic data collected from a metered motorway segmentin the UK are used to form the simulation
Through a series of simulation-based experiments wecan conclude the following (1) Isolated RL can improvethe motorway performance in terms of increasing totalthroughput reducing total travel time and CO
2emission but
this improvement is at the expense of poor system equity ondifferent on-ramps (2) with a suitable coordination strategy
much higher system equity can be achieved by Dyna-MARL (3) in addition to the system equity Dyna-MARLoutperforms Isolated RL in almost all scenarios regardingall indicators which means Dyna-MARL can deal with thenetwork-wide problems effectively
Although the simulation tests have shown some positiveresults regarding the performance of Dyna-MARL a simpli-fied incident scenario with fixed duration is considered inthe current work In the practical situation incident durationis highly unstable and affected by a number of factorssuch as weather conditions road conditions and arrivingtime of the incident management team Therefore incidentduration should be considered as an uncertainty which willbe investigated in our future work
14 Mathematical Problems in Engineering
NCIsolated RLDyna-MARL
0
2
4
6
8
10
Stan
dard
dev
iatio
n (h
)
073000 080000 083000070000Time of day
(a)
0
2
4
6
8
10
Stan
dard
dev
iatio
n (h
)073000 080000 083000070000
NCIsolated RLDyna-MARL
Time of day
(b)
0
2
4
6
8
10
Stan
dard
dev
iatio
n (h
)
073000 080000 083000070000
NCIsolated RLDyna-MARL
Time of day
(c)
Figure 9 Standard deviation for different scenarios (a) scenario A (b) scenario B and (c) scenario C
Mathematical Problems in Engineering 15
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This paper is supported by China Scholarship Council andUniversity of Leeds (CSC-University of Leeds scholarship)and partially supported by the National Natural ScienceFoundation of China (Grant nos 91420203 and 61271376)The authors would like to thank the institutions that supportthis study
References
[1] M Papageorgiou and A Kotsialos ldquoFreeway ramp meteringan overviewrdquo IEEE Transactions on Intelligent TransportationSystems vol 3 no 4 pp 271ndash281 2002
[2] A Skabardonis P Varaiya andK F Petty ldquoMeasuring recurrentand nonrecurrent traffic congestionrdquo Transportation ResearchRecord vol 1856 pp 118ndash124 2003
[3] M Papageorgiou H Hadj-Salem and J-M BlossevilleldquoALINEA a local feedback control law for on-ramp meteringrdquoJournal of the Transportation Research Board vol 1320 pp58ndash64 1991
[4] E Smaragdis and M Papageorgiou ldquoSeries of new local rampmetering strategiesrdquo Transportation Research Record vol 1856pp 74ndash86 2003
[5] L N Jacobson K C Henry and O Mehyar ldquoReal-time meter-ing algorithm for centralized controlrdquo Transportation ResearchRecord vol 1232 pp 17ndash26 1989
[6] G Paesani J Kerr P Perovich and F Khosravi ldquoSystemwide adaptive ramp metering (SWARM)rdquo in Proceedings of the7th ITS America Annual Meeting and Exposition Merging theTransportation and Communications Revolutions WashingtonDC USA June 1997
[7] R Lau Ramp Metering by ZonemdashThe Minnesota AlgorithmMinnesota Department of Transportation 1997
[8] H M Zhang and W W Recker ldquoOn optimal freeway rampcontrol policies for congested traffic corridorsrdquo TransportationResearch Part BMethodological vol 33 no 6 pp 417ndash436 1999
[9] A Kotsialos M Papageorgiou and F Middelham ldquoOptimalcoordinated ramp metering with advanced motorway optimalcontrolrdquo Transportation Research Record no 1748 pp 55ndash652001
[10] A Hegyi B De Schutter and H Hellendoorn ldquoModel predic-tive control for optimal coordination of ramp metering andvariable speed limitsrdquo Transportation Research C EmergingTechnologies vol 13 no 3 pp 185ndash209 2005
[11] G Gomes and R Horowitz ldquoOptimal freeway ramp meteringusing the asymmetric cell transmission modelrdquo TransportationResearch Part C Emerging Technologies vol 14 no 4 pp 244ndash262 2006
[12] A H F Chow and Y Li ldquoRobust optimization of dynamicmotorway traffic via ramp meteringrdquo IEEE Transactions onIntelligent Transportation Systems vol 15 no 3 pp 1374ndash13802014
[13] RW Hall ldquoNon-recurrent congestion how big is the problemAre traveler information systems the solutionrdquo TransportationResearch Part C vol 1 no 1 pp 89ndash103 1993
[14] P Prevedouros B Halkias K Papandreou and P KopeliasldquoFreeway incidents in the United States United Kingdom andAttica Tollway Greece characteristics available capacity andmodelsrdquo Transportation Research Record vol 2047 pp 57ndash652008
[15] T L Greenlee and H J Payne ldquoFreeway ramp meteringstrategies for responding to incidentsrdquo in Proceedings of theIEEE Conference on Decision and Control including the 16thSymposium on Adaptive Processes and a Special Symposium onFuzzy Set Theory and Applications pp 987ndash992 New OrleansLA USA December 1977
[16] M H Wang Optimal ramp metering policies for nonrecurringcongestion with uncertain incident duration [PhD thesis] Pur-due University West Lafayette Ind USA 1994
[17] J-B Sheu ldquoStochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local rampcontrolrdquo Physica A Statistical Mechanics and its Applicationsvol 386 no 1 pp 365ndash380 2007
[18] J-B Sheu and M-S Chang ldquoStochastic optimal-controlapproach to automatic incident-responsive coordinated rampcontrolrdquo IEEE Transactions on Intelligent Transportation Sys-tems vol 8 no 2 pp 359ndash367 2007
[19] C Jacob and B Abdulhai ldquoMachine learning for multi-jurisdictional optimal traffic corridor controlrdquo TransportationResearch Part A Policy and Practice vol 44 no 2 pp 53ndash642010
[20] M Davarynejad A Hegyi J Vrancken and J van den BergldquoMotorway ramp-metering control with queuing considerationusing Q-learningrdquo in Proceedings of the 14th International IEEEConference on Intelligent Transportation Systems (ITSC rsquo11) pp1652ndash1658 IEEE Washington DC USA October 2011
[21] K Rezaee B Abdulhai and H Abdelgawad ldquoApplication ofreinforcement learning with continuous state space to rampmetering in real-world conditionsrdquo in Proceedings of the 15thInternational IEEE Conference on Intelligent TransportationSystems (ITSC rsquo12) pp 1590ndash1595 IEEE Anchorage AlaskaUSA September 2012
[22] C Jacob and B Abdulhai ldquoAutomated adaptive traffic corridorcontrol using reinforcement learning approach and case stud-iesrdquo Transportation Research Record vol 1959 pp 1ndash8 2006
[23] K Rezaee B Abdulhai and H Abdelgawad ldquoSelf-Learningadaptive rampmetering analysis of design parameters on a testcase in Toronto Canadardquo Transportation Research Record vol2396 pp 10ndash18 2013
[24] X-J Wang X-M Xi and G-F Gao ldquoReinforcement learn-ing ramp metering without complete informationrdquo Journal ofControl Science and Engineering vol 2012 Article ID 208456 8pages 2012
[25] A Fares and W Gomaa ldquoMulti-agent reinforcement learningcontrol for ramp meteringrdquo in Progress in Systems Engineeringvol 330 of Advances in Intelligent Systems and Computing pp167ndash173 Springer Basel Switzerland 2015
[26] K Veljanovska K M Bombol and T Maher ldquoReinforcementlearning technique in multiple motorway access control strat-egy designrdquo PROMET-Traffic amp Transportation vol 22 no 2pp 117ndash123 2010
[27] C LuHChen and SGrant-Muller ldquoAn indirect reinforcementlearning approach for ramp control under incident-inducedcongestionrdquo in Proceedings of the 16th International IEEEConference on Intelligent Transportation Systems (ITSC rsquo13) pp979ndash984 IEEE The Hague The Netherlands October 2013
16 Mathematical Problems in Engineering
[28] L Busoniu R Babuska and B De Schutter ldquoA comprehensivesurvey of multiagent reinforcement learningrdquo IEEE Transac-tions on Systems Man and Cybernetics Part C Applications andReviews vol 38 no 2 pp 156ndash172 2008
[29] C Lu H Chen and S Grant-Muller ldquoIndirect ReinforcementLearning for Incident-responsive ramp controlrdquo ProcediamdashSocial and Behavioral Sciences vol 111 pp 1112ndash1122 2014
[30] R S Sutton and A G Barto Reinforcement Learning AnIntroduction MIT Press 1998
[31] C C HWatkins and P Dayan ldquoQ-learningrdquoMachine Learningvol 8 no 3-4 pp 279ndash292 1992
[32] J R Kok and N Vlassis ldquoCollaborative multiagent reinforce-ment learning by payoff propagationrdquo Journal of MachineLearning Research vol 7 pp 1789ndash1828 2006
[33] C Guestrin M G Lagoudakis and R Parr ldquoCoordinatedreinforcement learningrdquo in Proceedings of the 19th InternationalConference on Machine Learning pp 227ndash234 Sydney Aus-tralia July 2002
[34] S El-Tantawy B Abdulhai and H Abdelgawad ldquoMultiagentreinforcement learning for integrated network of adaptivetraffic signal controllers (marlin-atsc) methodology and large-scale application on downtown torontordquo IEEE Transactions onIntelligent Transportation Systems vol 14 no 3 pp 1140ndash11502013
[35] L P KaelblingM L Littman andAWMoore ldquoReinforcementlearning a surveyrdquo Journal of Artificial Intelligence Research vol4 pp 237ndash285 1996
[36] C F Daganzo ldquoThe cell transmission model a dynamic repre-sentation of highway traffic consistent with the hydrodynamictheoryrdquo Transportation Research Part B Methodological vol 28no 4 pp 269ndash287 1994
[37] J Haddad M Ramezani and N Geroliminis ldquoCooperativetraffic control of a mixed network with two urban regions anda freewayrdquo Transportation Research Part B Methodological vol54 pp 17ndash36 2013
[38] H Mongeot and J-B Lesort ldquoAnalytical expressions ofincident-induced flow dynamics perturbations using macro-scopic theory and extension of Lighthill-Whitham theoryrdquoTransportation Research Record vol 1710 pp 58ndash68 2000
[39] Transport Simulation Systems Aimsun Userrsquos Manual 61 TTSBarcelona Spain 2010
[40] Highways England ldquoHatris Homepagerdquo 2013 httpswwwhatriscouk
[41] E Cascetta ldquoEstimation of trip matrices from traffic counts andsurvey data a generalized least squares estimatorrdquo Transporta-tion Research B vol 18 no 4-5 pp 289ndash299 1984
[42] L Zhang and D Levinson ldquoBalancing efficiency and equity oframpmetersrdquo Journal of Transportation Engineering vol 131 no6 pp 477ndash481 2005
[43] A Kotsialos andM Papageorgiou ldquoEfficiency and equity prop-erties of freeway network-wide ramp metering with AMOCrdquoTransportation Research Part C Emerging Technologies vol 12no 6 pp 401ndash420 2004
Figure 8: Comparison of general measures for different scenarios. Panels: (a) total travel time (h), (b) total throughput (veh), (c) total CO2 emission (kg), and (d) reduction from NC (%) in total throughput, total travel time, and total CO2 emission. Each panel compares NC, Isolated RL, and Dyna-MARL across scenarios A, B, and C.
7 Conclusions and Future Work
A Dyna-Q-based multiagent reinforcement learning method for motorway ramp control, referred to as Dyna-MARL, has been developed in this paper. Dyna-MARL is compared with Isolated RL (Q-learning without coordination) and with the noncontrolled situation in a simulation environment. Real traffic data collected from a metered motorway segment in the UK are used to build the simulation.
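Dyna-MARL builds on Dyna-Q (Sutton and Barto [30]), which interleaves model-free Q-learning updates from real transitions with extra planning updates replayed from a learned model. The following is a minimal single-agent, tabular sketch of generic Dyna-Q, assuming a deterministic learned model, ε-greedy exploration, and hypothetical parameter names (`alpha`, `gamma`, `epsilon`, `planning_steps`). It is not the authors' implementation and omits the multiagent coordination and the traffic-specific state, action, and reward definitions used in Dyna-MARL.

```python
import random
from collections import defaultdict


class DynaQAgent:
    """Generic tabular Dyna-Q: Q-learning on real transitions plus
    `planning_steps` extra updates replayed from a learned model."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1,
                 planning_steps=10, seed=0):
        self.actions = list(actions)
        self.alpha = alpha                    # learning rate
        self.gamma = gamma                    # discount factor
        self.epsilon = epsilon                # exploration probability
        self.planning_steps = planning_steps  # 0 -> plain Q-learning
        self.q = defaultdict(float)           # (state, action) -> value
        self.model = {}                       # (state, action) -> (reward, next_state)
        self.rng = random.Random(seed)

    def act(self, state):
        """Epsilon-greedy action selection over the current Q-table."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def _q_update(self, s, a, r, s_next):
        """One-step Q-learning backup toward r + gamma * max_b Q(s', b)."""
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        td_error = r + self.gamma * best_next - self.q[(s, a)]
        self.q[(s, a)] += self.alpha * td_error

    def learn(self, s, a, r, s_next):
        # (1) Direct RL: model-free update from the real transition.
        self._q_update(s, a, r, s_next)
        # (2) Model learning: store the last observed outcome (deterministic model).
        self.model[(s, a)] = (r, s_next)
        # (3) Planning: replay transitions sampled uniformly from the model.
        visited = list(self.model)
        for _ in range(self.planning_steps):
            ps, pa = self.rng.choice(visited)
            pr, ps_next = self.model[(ps, pa)]
            self._q_update(ps, pa, pr, ps_next)


# Usage with hypothetical discrete states and metering actions.
agent = DynaQAgent(actions=["red", "green"], planning_steps=5)
state = "congested"
action = agent.act(state)
agent.learn(state, action, -1.0, "congested")
```

The planning loop is what separates Dyna-Q from plain Q-learning: with `planning_steps=0` the agent performs only the one-step Q-learning update, i.e. the kind of learner the Isolated RL baseline uses without coordination. Larger values spend more computation per real step to propagate reward information faster.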
Through a series of simulation-based experiments, we draw the following conclusions: (1) Isolated RL can improve motorway performance by increasing total throughput and reducing total travel time and CO2 emission, but this improvement comes at the expense of poor system equity across the different on-ramps; (2) with a suitable coordination strategy, Dyna-MARL achieves much higher system equity; (3) in addition to system equity, Dyna-MARL outperforms Isolated RL on all indicators in almost all scenarios, which indicates that Dyna-MARL can deal with network-wide problems effectively.
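Conclusion (2) concerns system equity across on-ramps, which Figure 9 reports as a standard deviation (in hours) over the time of day. The sketch below computes such an indicator under an assumption that is not the paper's stated definition: at each time step, equity is taken as the population standard deviation of the average travel time of road users entering from each on-ramp, so lower values mean delay is shared more evenly. The function names, ramp labels, and travel-time values are hypothetical and for illustration only.

```python
from statistics import pstdev


def ramp_equity_std(travel_times_h):
    """Equity indicator for one time step (assumed definition).

    travel_times_h maps an on-ramp label to the average travel time (h) of
    vehicles entering from that ramp. Returns the population standard
    deviation across ramps; 0.0 means every ramp experiences the same delay.
    """
    if not travel_times_h:
        raise ValueError("need travel times for at least one on-ramp")
    return pstdev(list(travel_times_h.values()))


def equity_series(samples):
    """Apply ramp_equity_std to each time-of-day entry."""
    return {t: ramp_equity_std(ramps) for t, ramps in samples.items()}


# Hypothetical per-ramp average travel times (h), for illustration only.
samples = {
    "07:30:00": {"ramp_1": 0.20, "ramp_2": 0.25, "ramp_3": 0.30},
    "08:00:00": {"ramp_1": 0.20, "ramp_2": 0.60, "ramp_3": 1.00},
}
for time_of_day, sd in equity_series(samples).items():
    print(time_of_day, round(sd, 3))
```

`pstdev` treats the listed ramps as the full population of ramps on the segment. If the ramps were viewed as a sample, `statistics.stdev` would be an equally defensible choice. Comparing control methods would mean computing this series from each method's simulation output and comparing the resulting curves.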
Although the simulation tests show positive results for Dyna-MARL, the current work considers only a simplified incident scenario with a fixed duration. In practice, incident duration is highly variable and depends on a number of factors, such as weather conditions, road conditions, and the arrival time of the incident management team. Incident duration should therefore be treated as an uncertainty, which will be investigated in our future work.
Figure 9: Standard deviation for different scenarios: (a) scenario A, (b) scenario B, and (c) scenario C. Each panel plots the standard deviation (h) against time of day (07:00:00 to 08:30:00) for NC, Isolated RL, and Dyna-MARL.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This paper is supported by the China Scholarship Council and the University of Leeds (CSC-University of Leeds scholarship) and partially supported by the National Natural Science Foundation of China (Grant nos. 91420203 and 61271376). The authors would like to thank the institutions that support this study.
References
[1] M. Papageorgiou and A. Kotsialos, “Freeway ramp metering: an overview,” IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 4, pp. 271–281, 2002.
[2] A. Skabardonis, P. Varaiya, and K. F. Petty, “Measuring recurrent and nonrecurrent traffic congestion,” Transportation Research Record, vol. 1856, pp. 118–124, 2003.
[3] M. Papageorgiou, H. Hadj-Salem, and J.-M. Blosseville, “ALINEA: a local feedback control law for on-ramp metering,” Journal of the Transportation Research Board, vol. 1320, pp. 58–64, 1991.
[4] E. Smaragdis and M. Papageorgiou, “Series of new local ramp metering strategies,” Transportation Research Record, vol. 1856, pp. 74–86, 2003.
[5] L. N. Jacobson, K. C. Henry, and O. Mehyar, “Real-time metering algorithm for centralized control,” Transportation Research Record, vol. 1232, pp. 17–26, 1989.
[6] G. Paesani, J. Kerr, P. Perovich, and F. Khosravi, “System wide adaptive ramp metering (SWARM),” in Proceedings of the 7th ITS America Annual Meeting and Exposition: Merging the Transportation and Communications Revolutions, Washington, DC, USA, June 1997.
[7] R. Lau, Ramp Metering by Zone: The Minnesota Algorithm, Minnesota Department of Transportation, 1997.
[8] H. M. Zhang and W. W. Recker, “On optimal freeway ramp control policies for congested traffic corridors,” Transportation Research Part B: Methodological, vol. 33, no. 6, pp. 417–436, 1999.
[9] A. Kotsialos, M. Papageorgiou, and F. Middelham, “Optimal coordinated ramp metering with advanced motorway optimal control,” Transportation Research Record, no. 1748, pp. 55–65, 2001.
[10] A. Hegyi, B. De Schutter, and H. Hellendoorn, “Model predictive control for optimal coordination of ramp metering and variable speed limits,” Transportation Research Part C: Emerging Technologies, vol. 13, no. 3, pp. 185–209, 2005.
[11] G. Gomes and R. Horowitz, “Optimal freeway ramp metering using the asymmetric cell transmission model,” Transportation Research Part C: Emerging Technologies, vol. 14, no. 4, pp. 244–262, 2006.
[12] A. H. F. Chow and Y. Li, “Robust optimization of dynamic motorway traffic via ramp metering,” IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 3, pp. 1374–1380, 2014.
[13] R. W. Hall, “Non-recurrent congestion: how big is the problem? Are traveler information systems the solution?,” Transportation Research Part C, vol. 1, no. 1, pp. 89–103, 1993.
[14] P. Prevedouros, B. Halkias, K. Papandreou, and P. Kopelias, “Freeway incidents in the United States, United Kingdom, and Attica Tollway, Greece: characteristics, available capacity, and models,” Transportation Research Record, vol. 2047, pp. 57–65, 2008.
[15] T. L. Greenlee and H. J. Payne, “Freeway ramp metering strategies for responding to incidents,” in Proceedings of the IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and a Special Symposium on Fuzzy Set Theory and Applications, pp. 987–992, New Orleans, LA, USA, December 1977.
[16] M. H. Wang, Optimal ramp metering policies for nonrecurring congestion with uncertain incident duration [Ph.D. thesis], Purdue University, West Lafayette, Ind, USA, 1994.
[17] J.-B. Sheu, “Stochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local ramp control,” Physica A: Statistical Mechanics and its Applications, vol. 386, no. 1, pp. 365–380, 2007.
[18] J.-B. Sheu and M.-S. Chang, “Stochastic optimal-control approach to automatic incident-responsive coordinated ramp control,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 359–367, 2007.
[19] C. Jacob and B. Abdulhai, “Machine learning for multi-jurisdictional optimal traffic corridor control,” Transportation Research Part A: Policy and Practice, vol. 44, no. 2, pp. 53–64, 2010.
[20] M. Davarynejad, A. Hegyi, J. Vrancken, and J. van den Berg, “Motorway ramp-metering control with queuing consideration using Q-learning,” in Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (ITSC ’11), pp. 1652–1658, IEEE, Washington, DC, USA, October 2011.
[21] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Application of reinforcement learning with continuous state space to ramp metering in real-world conditions,” in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC ’12), pp. 1590–1595, IEEE, Anchorage, Alaska, USA, September 2012.
[22] C. Jacob and B. Abdulhai, “Automated adaptive traffic corridor control using reinforcement learning: approach and case studies,” Transportation Research Record, vol. 1959, pp. 1–8, 2006.
[23] K. Rezaee, B. Abdulhai, and H. Abdelgawad, “Self-learning adaptive ramp metering: analysis of design parameters on a test case in Toronto, Canada,” Transportation Research Record, vol. 2396, pp. 10–18, 2013.
[24] X.-J. Wang, X.-M. Xi, and G.-F. Gao, “Reinforcement learning ramp metering without complete information,” Journal of Control Science and Engineering, vol. 2012, Article ID 208456, 8 pages, 2012.
[25] A. Fares and W. Gomaa, “Multi-agent reinforcement learning control for ramp metering,” in Progress in Systems Engineering, vol. 330 of Advances in Intelligent Systems and Computing, pp. 167–173, Springer, Basel, Switzerland, 2015.
[26] K. Veljanovska, K. M. Bombol, and T. Maher, “Reinforcement learning technique in multiple motorway access control strategy design,” PROMET-Traffic & Transportation, vol. 22, no. 2, pp. 117–123, 2010.
[27] C. Lu, H. Chen, and S. Grant-Muller, “An indirect reinforcement learning approach for ramp control under incident-induced congestion,” in Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC ’13), pp. 979–984, IEEE, The Hague, The Netherlands, October 2013.
[28] L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.
[29] C. Lu, H. Chen, and S. Grant-Muller, “Indirect reinforcement learning for incident-responsive ramp control,” Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.
[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[31] C. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[32] J. R. Kok and N. Vlassis, “Collaborative multiagent reinforcement learning by payoff propagation,” Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.
[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, “Coordinated reinforcement learning,” in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.
[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, “Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.
[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[36] C. F. Daganzo, “The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory,” Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.
[37] J. Haddad, M. Ramezani, and N. Geroliminis, “Cooperative traffic control of a mixed network with two urban regions and a freeway,” Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.
[38] H. Mongeot and J.-B. Lesort, “Analytical expressions of incident-induced flow dynamics perturbations: using macroscopic theory and extension of Lighthill-Whitham theory,” Transportation Research Record, vol. 1710, pp. 58–68, 2000.
[39] Transport Simulation Systems, Aimsun User’s Manual 6.1, TSS, Barcelona, Spain, 2010.
[40] Highways England, “Hatris Homepage,” 2013, https://www.hatris.co.uk.
[41] E. Cascetta, “Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator,” Transportation Research Part B: Methodological, vol. 18, no. 4-5, pp. 289–299, 1984.
[42] L. Zhang and D. Levinson, “Balancing efficiency and equity of ramp meters,” Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.
[43] A. Kotsialos and M. Papageorgiou, “Efficiency and equity properties of freeway network-wide ramp metering with AMOC,” Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Discrete MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
14 Mathematical Problems in Engineering
NCIsolated RLDyna-MARL
0
2
4
6
8
10
Stan
dard
dev
iatio
n (h
)
073000 080000 083000070000Time of day
(a)
0
2
4
6
8
10
Stan
dard
dev
iatio
n (h
)073000 080000 083000070000
NCIsolated RLDyna-MARL
Time of day
(b)
0
2
4
6
8
10
Stan
dard
dev
iatio
n (h
)
073000 080000 083000070000
NCIsolated RLDyna-MARL
Time of day
(c)
Figure 9 Standard deviation for different scenarios (a) scenario A (b) scenario B and (c) scenario C
Mathematical Problems in Engineering 15
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This paper is supported by China Scholarship Council andUniversity of Leeds (CSC-University of Leeds scholarship)and partially supported by the National Natural ScienceFoundation of China (Grant nos 91420203 and 61271376)The authors would like to thank the institutions that supportthis study
References
[1] M Papageorgiou and A Kotsialos ldquoFreeway ramp meteringan overviewrdquo IEEE Transactions on Intelligent TransportationSystems vol 3 no 4 pp 271ndash281 2002
[2] A Skabardonis P Varaiya andK F Petty ldquoMeasuring recurrentand nonrecurrent traffic congestionrdquo Transportation ResearchRecord vol 1856 pp 118ndash124 2003
[3] M Papageorgiou H Hadj-Salem and J-M BlossevilleldquoALINEA a local feedback control law for on-ramp meteringrdquoJournal of the Transportation Research Board vol 1320 pp58ndash64 1991
[4] E Smaragdis and M Papageorgiou ldquoSeries of new local rampmetering strategiesrdquo Transportation Research Record vol 1856pp 74ndash86 2003
[5] L N Jacobson K C Henry and O Mehyar ldquoReal-time meter-ing algorithm for centralized controlrdquo Transportation ResearchRecord vol 1232 pp 17ndash26 1989
[6] G Paesani J Kerr P Perovich and F Khosravi ldquoSystemwide adaptive ramp metering (SWARM)rdquo in Proceedings of the7th ITS America Annual Meeting and Exposition Merging theTransportation and Communications Revolutions WashingtonDC USA June 1997
[7] R Lau Ramp Metering by ZonemdashThe Minnesota AlgorithmMinnesota Department of Transportation 1997
[8] H M Zhang and W W Recker ldquoOn optimal freeway rampcontrol policies for congested traffic corridorsrdquo TransportationResearch Part BMethodological vol 33 no 6 pp 417ndash436 1999
[9] A Kotsialos M Papageorgiou and F Middelham ldquoOptimalcoordinated ramp metering with advanced motorway optimalcontrolrdquo Transportation Research Record no 1748 pp 55ndash652001
[10] A Hegyi B De Schutter and H Hellendoorn ldquoModel predic-tive control for optimal coordination of ramp metering andvariable speed limitsrdquo Transportation Research C EmergingTechnologies vol 13 no 3 pp 185ndash209 2005
[11] G Gomes and R Horowitz ldquoOptimal freeway ramp meteringusing the asymmetric cell transmission modelrdquo TransportationResearch Part C Emerging Technologies vol 14 no 4 pp 244ndash262 2006
[12] A H F Chow and Y Li ldquoRobust optimization of dynamicmotorway traffic via ramp meteringrdquo IEEE Transactions onIntelligent Transportation Systems vol 15 no 3 pp 1374ndash13802014
[13] RW Hall ldquoNon-recurrent congestion how big is the problemAre traveler information systems the solutionrdquo TransportationResearch Part C vol 1 no 1 pp 89ndash103 1993
[14] P Prevedouros B Halkias K Papandreou and P KopeliasldquoFreeway incidents in the United States United Kingdom andAttica Tollway Greece characteristics available capacity andmodelsrdquo Transportation Research Record vol 2047 pp 57ndash652008
[15] T L Greenlee and H J Payne ldquoFreeway ramp meteringstrategies for responding to incidentsrdquo in Proceedings of theIEEE Conference on Decision and Control including the 16thSymposium on Adaptive Processes and a Special Symposium onFuzzy Set Theory and Applications pp 987ndash992 New OrleansLA USA December 1977
[16] M H Wang Optimal ramp metering policies for nonrecurringcongestion with uncertain incident duration [PhD thesis] Pur-due University West Lafayette Ind USA 1994
[17] J-B Sheu ldquoStochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local rampcontrolrdquo Physica A Statistical Mechanics and its Applicationsvol 386 no 1 pp 365ndash380 2007
[18] J-B Sheu and M-S Chang ldquoStochastic optimal-controlapproach to automatic incident-responsive coordinated rampcontrolrdquo IEEE Transactions on Intelligent Transportation Sys-tems vol 8 no 2 pp 359ndash367 2007
[19] C Jacob and B Abdulhai ldquoMachine learning for multi-jurisdictional optimal traffic corridor controlrdquo TransportationResearch Part A Policy and Practice vol 44 no 2 pp 53ndash642010
[20] M Davarynejad A Hegyi J Vrancken and J van den BergldquoMotorway ramp-metering control with queuing considerationusing Q-learningrdquo in Proceedings of the 14th International IEEEConference on Intelligent Transportation Systems (ITSC rsquo11) pp1652ndash1658 IEEE Washington DC USA October 2011
[21] K Rezaee B Abdulhai and H Abdelgawad ldquoApplication ofreinforcement learning with continuous state space to rampmetering in real-world conditionsrdquo in Proceedings of the 15thInternational IEEE Conference on Intelligent TransportationSystems (ITSC rsquo12) pp 1590ndash1595 IEEE Anchorage AlaskaUSA September 2012
[22] C Jacob and B Abdulhai ldquoAutomated adaptive traffic corridorcontrol using reinforcement learning approach and case stud-iesrdquo Transportation Research Record vol 1959 pp 1ndash8 2006
[23] K Rezaee B Abdulhai and H Abdelgawad ldquoSelf-Learningadaptive rampmetering analysis of design parameters on a testcase in Toronto Canadardquo Transportation Research Record vol2396 pp 10ndash18 2013
[24] X-J Wang X-M Xi and G-F Gao ldquoReinforcement learn-ing ramp metering without complete informationrdquo Journal ofControl Science and Engineering vol 2012 Article ID 208456 8pages 2012
[25] A Fares and W Gomaa ldquoMulti-agent reinforcement learningcontrol for ramp meteringrdquo in Progress in Systems Engineeringvol 330 of Advances in Intelligent Systems and Computing pp167ndash173 Springer Basel Switzerland 2015
[26] K Veljanovska K M Bombol and T Maher ldquoReinforcementlearning technique in multiple motorway access control strat-egy designrdquo PROMET-Traffic amp Transportation vol 22 no 2pp 117ndash123 2010
[27] C LuHChen and SGrant-Muller ldquoAn indirect reinforcementlearning approach for ramp control under incident-inducedcongestionrdquo in Proceedings of the 16th International IEEEConference on Intelligent Transportation Systems (ITSC rsquo13) pp979ndash984 IEEE The Hague The Netherlands October 2013
16 Mathematical Problems in Engineering
[28] L Busoniu R Babuska and B De Schutter ldquoA comprehensivesurvey of multiagent reinforcement learningrdquo IEEE Transac-tions on Systems Man and Cybernetics Part C Applications andReviews vol 38 no 2 pp 156ndash172 2008
[29] C Lu H Chen and S Grant-Muller ldquoIndirect ReinforcementLearning for Incident-responsive ramp controlrdquo ProcediamdashSocial and Behavioral Sciences vol 111 pp 1112ndash1122 2014
[30] R S Sutton and A G Barto Reinforcement Learning AnIntroduction MIT Press 1998
[31] C C HWatkins and P Dayan ldquoQ-learningrdquoMachine Learningvol 8 no 3-4 pp 279ndash292 1992
[32] J R Kok and N Vlassis ldquoCollaborative multiagent reinforce-ment learning by payoff propagationrdquo Journal of MachineLearning Research vol 7 pp 1789ndash1828 2006
[33] C Guestrin M G Lagoudakis and R Parr ldquoCoordinatedreinforcement learningrdquo in Proceedings of the 19th InternationalConference on Machine Learning pp 227ndash234 Sydney Aus-tralia July 2002
[34] S El-Tantawy B Abdulhai and H Abdelgawad ldquoMultiagentreinforcement learning for integrated network of adaptivetraffic signal controllers (marlin-atsc) methodology and large-scale application on downtown torontordquo IEEE Transactions onIntelligent Transportation Systems vol 14 no 3 pp 1140ndash11502013
[35] L P KaelblingM L Littman andAWMoore ldquoReinforcementlearning a surveyrdquo Journal of Artificial Intelligence Research vol4 pp 237ndash285 1996
[36] C F Daganzo ldquoThe cell transmission model a dynamic repre-sentation of highway traffic consistent with the hydrodynamictheoryrdquo Transportation Research Part B Methodological vol 28no 4 pp 269ndash287 1994
[37] J Haddad M Ramezani and N Geroliminis ldquoCooperativetraffic control of a mixed network with two urban regions anda freewayrdquo Transportation Research Part B Methodological vol54 pp 17ndash36 2013
[38] H Mongeot and J-B Lesort ldquoAnalytical expressions ofincident-induced flow dynamics perturbations using macro-scopic theory and extension of Lighthill-Whitham theoryrdquoTransportation Research Record vol 1710 pp 58ndash68 2000
[39] Transport Simulation Systems Aimsun Userrsquos Manual 61 TTSBarcelona Spain 2010
[40] Highways England ldquoHatris Homepagerdquo 2013 httpswwwhatriscouk
[41] E Cascetta ldquoEstimation of trip matrices from traffic counts andsurvey data a generalized least squares estimatorrdquo Transporta-tion Research B vol 18 no 4-5 pp 289ndash299 1984
[42] L Zhang and D Levinson ldquoBalancing efficiency and equity oframpmetersrdquo Journal of Transportation Engineering vol 131 no6 pp 477ndash481 2005
[43] A Kotsialos andM Papageorgiou ldquoEfficiency and equity prop-erties of freeway network-wide ramp metering with AMOCrdquoTransportation Research Part C Emerging Technologies vol 12no 6 pp 401ndash420 2004
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Discrete MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
Mathematical Problems in Engineering 15
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This paper is supported by China Scholarship Council andUniversity of Leeds (CSC-University of Leeds scholarship)and partially supported by the National Natural ScienceFoundation of China (Grant nos 91420203 and 61271376)The authors would like to thank the institutions that supportthis study
References
[1] M Papageorgiou and A Kotsialos ldquoFreeway ramp meteringan overviewrdquo IEEE Transactions on Intelligent TransportationSystems vol 3 no 4 pp 271ndash281 2002
[2] A Skabardonis P Varaiya andK F Petty ldquoMeasuring recurrentand nonrecurrent traffic congestionrdquo Transportation ResearchRecord vol 1856 pp 118ndash124 2003
[3] M Papageorgiou H Hadj-Salem and J-M BlossevilleldquoALINEA a local feedback control law for on-ramp meteringrdquoJournal of the Transportation Research Board vol 1320 pp58ndash64 1991
[4] E Smaragdis and M Papageorgiou ldquoSeries of new local rampmetering strategiesrdquo Transportation Research Record vol 1856pp 74ndash86 2003
[5] L N Jacobson K C Henry and O Mehyar ldquoReal-time meter-ing algorithm for centralized controlrdquo Transportation ResearchRecord vol 1232 pp 17ndash26 1989
[6] G Paesani J Kerr P Perovich and F Khosravi ldquoSystemwide adaptive ramp metering (SWARM)rdquo in Proceedings of the7th ITS America Annual Meeting and Exposition Merging theTransportation and Communications Revolutions WashingtonDC USA June 1997
[7] R Lau Ramp Metering by ZonemdashThe Minnesota AlgorithmMinnesota Department of Transportation 1997
[8] H M Zhang and W W Recker ldquoOn optimal freeway rampcontrol policies for congested traffic corridorsrdquo TransportationResearch Part BMethodological vol 33 no 6 pp 417ndash436 1999
[9] A Kotsialos M Papageorgiou and F Middelham ldquoOptimalcoordinated ramp metering with advanced motorway optimalcontrolrdquo Transportation Research Record no 1748 pp 55ndash652001
[10] A Hegyi B De Schutter and H Hellendoorn ldquoModel predic-tive control for optimal coordination of ramp metering andvariable speed limitsrdquo Transportation Research C EmergingTechnologies vol 13 no 3 pp 185ndash209 2005
[11] G Gomes and R Horowitz ldquoOptimal freeway ramp meteringusing the asymmetric cell transmission modelrdquo TransportationResearch Part C Emerging Technologies vol 14 no 4 pp 244ndash262 2006
[12] A H F Chow and Y Li ldquoRobust optimization of dynamicmotorway traffic via ramp meteringrdquo IEEE Transactions onIntelligent Transportation Systems vol 15 no 3 pp 1374ndash13802014
[13] RW Hall ldquoNon-recurrent congestion how big is the problemAre traveler information systems the solutionrdquo TransportationResearch Part C vol 1 no 1 pp 89ndash103 1993
[14] P Prevedouros B Halkias K Papandreou and P KopeliasldquoFreeway incidents in the United States United Kingdom andAttica Tollway Greece characteristics available capacity andmodelsrdquo Transportation Research Record vol 2047 pp 57ndash652008
[15] T L Greenlee and H J Payne ldquoFreeway ramp meteringstrategies for responding to incidentsrdquo in Proceedings of theIEEE Conference on Decision and Control including the 16thSymposium on Adaptive Processes and a Special Symposium onFuzzy Set Theory and Applications pp 987ndash992 New OrleansLA USA December 1977
[16] M H Wang Optimal ramp metering policies for nonrecurringcongestion with uncertain incident duration [PhD thesis] Pur-due University West Lafayette Ind USA 1994
[17] J-B Sheu ldquoStochastic modeling of the dynamics of incident-induced lane traffic states for incident-responsive local rampcontrolrdquo Physica A Statistical Mechanics and its Applicationsvol 386 no 1 pp 365ndash380 2007
[18] J-B Sheu and M-S Chang ldquoStochastic optimal-controlapproach to automatic incident-responsive coordinated rampcontrolrdquo IEEE Transactions on Intelligent Transportation Sys-tems vol 8 no 2 pp 359ndash367 2007
[19] C Jacob and B Abdulhai ldquoMachine learning for multi-jurisdictional optimal traffic corridor controlrdquo TransportationResearch Part A Policy and Practice vol 44 no 2 pp 53ndash642010
[20] M Davarynejad A Hegyi J Vrancken and J van den BergldquoMotorway ramp-metering control with queuing considerationusing Q-learningrdquo in Proceedings of the 14th International IEEEConference on Intelligent Transportation Systems (ITSC rsquo11) pp1652ndash1658 IEEE Washington DC USA October 2011
[21] K Rezaee B Abdulhai and H Abdelgawad ldquoApplication ofreinforcement learning with continuous state space to rampmetering in real-world conditionsrdquo in Proceedings of the 15thInternational IEEE Conference on Intelligent TransportationSystems (ITSC rsquo12) pp 1590ndash1595 IEEE Anchorage AlaskaUSA September 2012
[22] C Jacob and B Abdulhai ldquoAutomated adaptive traffic corridorcontrol using reinforcement learning approach and case stud-iesrdquo Transportation Research Record vol 1959 pp 1ndash8 2006
[23] K Rezaee B Abdulhai and H Abdelgawad ldquoSelf-Learningadaptive rampmetering analysis of design parameters on a testcase in Toronto Canadardquo Transportation Research Record vol2396 pp 10ndash18 2013
[24] X-J Wang X-M Xi and G-F Gao ldquoReinforcement learn-ing ramp metering without complete informationrdquo Journal ofControl Science and Engineering vol 2012 Article ID 208456 8pages 2012
[25] A Fares and W Gomaa ldquoMulti-agent reinforcement learningcontrol for ramp meteringrdquo in Progress in Systems Engineeringvol 330 of Advances in Intelligent Systems and Computing pp167ndash173 Springer Basel Switzerland 2015
[26] K Veljanovska K M Bombol and T Maher ldquoReinforcementlearning technique in multiple motorway access control strat-egy designrdquo PROMET-Traffic amp Transportation vol 22 no 2pp 117ndash123 2010
[27] C LuHChen and SGrant-Muller ldquoAn indirect reinforcementlearning approach for ramp control under incident-inducedcongestionrdquo in Proceedings of the 16th International IEEEConference on Intelligent Transportation Systems (ITSC rsquo13) pp979ndash984 IEEE The Hague The Netherlands October 2013
16 Mathematical Problems in Engineering
[28] L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 2, pp. 156–172, 2008.
[29] C. Lu, H. Chen, and S. Grant-Muller, “Indirect reinforcement learning for incident-responsive ramp control,” Procedia - Social and Behavioral Sciences, vol. 111, pp. 1112–1122, 2014.
[30] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[31] C. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[32] J. R. Kok and N. Vlassis, “Collaborative multiagent reinforcement learning by payoff propagation,” Journal of Machine Learning Research, vol. 7, pp. 1789–1828, 2006.
[33] C. Guestrin, M. G. Lagoudakis, and R. Parr, “Coordinated reinforcement learning,” in Proceedings of the 19th International Conference on Machine Learning, pp. 227–234, Sydney, Australia, July 2002.
[34] S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, “Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): methodology and large-scale application on downtown Toronto,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1140–1150, 2013.
[35] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[36] C. F. Daganzo, “The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory,” Transportation Research Part B: Methodological, vol. 28, no. 4, pp. 269–287, 1994.
[37] J. Haddad, M. Ramezani, and N. Geroliminis, “Cooperative traffic control of a mixed network with two urban regions and a freeway,” Transportation Research Part B: Methodological, vol. 54, pp. 17–36, 2013.
[38] H. Mongeot and J.-B. Lesort, “Analytical expressions of incident-induced flow dynamics perturbations using macroscopic theory and extension of Lighthill-Whitham theory,” Transportation Research Record, vol. 1710, pp. 58–68, 2000.
[39] Transport Simulation Systems, Aimsun User's Manual 6.1, TSS, Barcelona, Spain, 2010.
[40] Highways England, “HATRIS Homepage,” 2013, https://www.hatris.co.uk.
[41] E. Cascetta, “Estimation of trip matrices from traffic counts and survey data: a generalized least squares estimator,” Transportation Research B, vol. 18, no. 4-5, pp. 289–299, 1984.
[42] L. Zhang and D. Levinson, “Balancing efficiency and equity of ramp meters,” Journal of Transportation Engineering, vol. 131, no. 6, pp. 477–481, 2005.
[43] A. Kotsialos and M. Papageorgiou, “Efficiency and equity properties of freeway network-wide ramp metering with AMOC,” Transportation Research Part C: Emerging Technologies, vol. 12, no. 6, pp. 401–420, 2004.