Post on 21-Apr-2020
transcript
Cellular Network Traffic Scheduling using Deep Reinforcement Learning
Sandeep Chinchali, et al., Marco Pavone, Sachin Katti, Stanford University
AAAI 2018
Can we learn to optimally manage cellular networks?
Internet
Delay Sensitive
Real-time Mobile Traffic
Delay Tolerant (DT) Traffic
IoT: Map/SW updates, Pre-fetched content
Why is IoT/DT traffic scheduling hard?
csandeep@stanford.edu 3
[Diagram labels: Utilization, Acceptable Limit, IoT]
Contending goals
• Max IoT/DT data
• Loss to mobile traffic
• Network limits
Optimal Control
Why is IoT/DT traffic scheduling hard?
[Figure: Congestion C vs. local time (09:00 to 21:00), Melbourne Central Business District, rolling average = 1 min. Series: shopping center, office building, Southern Cross station, Melbourne Central station.]
Diverse city-wide cell patterns
Our contributions
1. Identify inefficiencies in real cellular networks: 4 weeks, 10 diverse cells in Downtown Melbourne, Australia
2. Data-Driven, Deep Learning Network Model: our live network experiments match MDP dynamics
3. Adaptive RL scheduler: flexibly responds to operator reward functions
[Diagram labels: IoT Scheduler, Network State, IoT rate]
Why Deep Learning?
1. Learn time-variant network dynamics
2. Adapt to high-level network operation goals
3. Generalize to diverse cells
4. Abundance of network data
[Figure: Congestion C vs. local time (09:00 to 21:00), Melbourne Central Business District, rolling average = 1 min. Series: shopping center, office building, Southern Cross station, Melbourne Central station.]
Related Work
1. Dynamic Resource Allocation
• Electricity grid (Reddy 2011), call admission (Marbach 1998), traffic control (Chu 2016)
2. Data-driven Optimal Control + Forecasting
• Deep RL (Mnih 2013, Silver 2014, Lillicrap 2015)
• LSTM networks (Hochreiter 1997, Laptev 2017, Shi 2015)
3. Machine Learning for Computer Networks
• Cluster Resource Management (Mao 2016)
• Mobile Video Streaming (Mao 2017, Yin 2015)
Data-driven problem formulation
1. Network State Space
2. IoT Scheduler Actions
3. Time-variant dynamics
4. Network operator policies
[Diagram: IoT Scheduler; inputs: network state + forecasts (Num Users, Congestion, Cell efficiency); output: IoT rate]
Primer on Cell Networks
Goal: Max safe IoT traffic V_t over day
(Link Quality)
Current Network State
Full State with Temporal Features
RL setup (1): State Space
[Diagram: Agent and Environment loop; arrows labeled Action, Network state, Reward]
Stochastic Forecast (LSTM)
Horizon: Day of T mins
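The state-space slide names three ingredients: the current network state, temporal features, and a stochastic LSTM forecast. A minimal sketch of how such a state vector could be assembled is shown here. The function name, the `lookback` and `horizon` parameters, and the padding rule are all assumptions for illustration, not the authors' implementation. When no forecaster is supplied, the sketch falls back to a naive "repeat the current state" placeholder, which stands in for the LSTM and is not an LSTM.

```python
import numpy as np

def build_full_state(history, t, lookback=3, horizon=2, forecast_fn=None):
    """Stack current state, recent past states, and a forecast into one vector.

    history: array of shape (T, d), one row of network features per minute
             (e.g. congestion, number of users, cell efficiency).
    forecast_fn: hypothetical callable (past_rows, horizon) -> array of
                 predicted rows; stands in for a trained forecaster.
    """
    history = np.asarray(history, dtype=float)
    current = history[t]
    # Temporal features: the previous `lookback` rows. Near the start of the
    # day, missing rows are padded with the first row (an assumption).
    past_idx = np.clip(np.arange(t - lookback, t), 0, None)
    past = history[past_idx].ravel()
    if forecast_fn is None:
        # Placeholder forecast: assume the current state persists.
        forecast = np.tile(current, horizon)
    else:
        forecast = np.asarray(forecast_fn(history[: t + 1], horizon), dtype=float).ravel()
    return np.concatenate([current, past, forecast])
```

With `d` features per minute, the resulting vector has `d * (1 + lookback + horizon)` entries, so the choice of lookback and forecast length directly sets the input size of the policy network.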
IoT Traffic Rate:
IoT Volume per minute:
Utilization gain:
RL setup (2): Action Space
[Diagram: Agent and Environment loop; arrows labeled Action, Network state, Reward]
RL setup (3): Transition Dynamics
[Figure: Congestion C vs. local time (20:10 to 20:20). Series: Controlled traffic, Background dynamics.]
[Diagram: Agent and Environment loop; arrows labeled Action, Network state, Reward]
RL setup (4): Operator Rewards
Overall weighted reward
1. IoT traffic volume
2. Loss to regular users
3. Traffic below network limit
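The slide describes the reward as a weighted combination of the three terms listed above but the transcript does not preserve the formula. A minimal sketch under stated assumptions: the weight names `alpha`, `beta`, and `kappa` are hypothetical (α appears later in the results as an operator priority, but its exact role is not shown here), and the signs assume that IoT volume is rewarded while the other two terms are penalized.

```python
def operator_reward(iot_volume, loss_to_users, volume_below_limit,
                    alpha=1.0, beta=1.0, kappa=1.0):
    """Weighted operator reward for one time step.

    Assumed form: reward IoT traffic volume, penalize loss to regular
    users and traffic served below the network limit. The weights let an
    operator trade these goals off against each other.
    """
    return (alpha * iot_volume
            - beta * loss_to_users
            - kappa * volume_below_limit)
```

Raising `alpha` relative to the other weights makes the scheduler more aggressive about sending IoT data; raising `beta` or `kappa` makes it more protective of regular users and of the network limit.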
[Diagram: Agent and Environment loop; arrows labeled Action, Network state, Reward]
Goal: Find Optimal Operator Policy
What-if model
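One way to read the goal slide is that a candidate policy is scored by rolling it through a what-if model of the network over the day and summing the reward. The sketch below illustrates only that loop. The function names, the scalar congestion state, and the toy dynamics and numbers are invented for illustration and are not taken from the talk.

```python
def rollout(policy, what_if_model, reward_fn, initial_state, horizon):
    """Simulate one episode and return (total reward, visited (state, action) pairs)."""
    state = initial_state
    total = 0.0
    trajectory = []
    for t in range(horizon):
        action = policy(state, t)                    # e.g. IoT traffic rate
        next_state = what_if_model(state, action, t)  # predicted network state
        total += reward_fn(state, action, next_state)
        trajectory.append((state, action))
        state = next_state
    return total, trajectory

# Toy stand-ins (hypothetical): congestion decays by half each minute
# and rises with the IoT rate that was injected.
def toy_model(congestion, iot_rate, t):
    return 0.5 * congestion + iot_rate

def toy_policy(congestion, t):
    # Send IoT traffic only while congestion is low.
    return 1.0 if congestion < 2.0 else 0.0

def toy_reward(congestion, iot_rate, next_congestion):
    return iot_rate - 0.1 * next_congestion

total, trajectory = rollout(toy_policy, toy_model, toy_reward, 0.0, 3)
```

An RL algorithm would replace `toy_policy` with a learned policy and use many such rollouts to improve it; the loop itself stays the same.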
Evaluation
Evaluation Criteria
1. Robust performance on diverse cell-day pairs
2. Ability to exploit better forecasts
3. Interpretability
[Diagram: IoT Scheduler; inputs: network state + forecasts (Num Users, Congestion, Cell efficiency); output: IoT rate]
1. RL generalizes to several cell-day pairs
[Figure: Utilization gain V_IoT/V_0 (%) on Train and Test sets, for α = 1 and α = 2.]
Respond to operator priorities
Significant gains:
• FCC Spectrum Auction (Reardon 2016): $4.5B for 10 MHz of spectrum
• 14.7% median gain for α = 2
• Significant cost savings [simulated]
2. RL effectively leverages forecasts
[Figure: RL vs. Benchmark; x-axis: Richer LSTM forecasts.]
3a. RL exploits transient dips in utilization
[Figure, left panel "Controlled Congestion": Congestion C vs. local time (9:00 to 16:00). Series: Original, Heuristic control, DDPG control. Annotation: Transient Dip.]
[Figure, right panel "Utilization gain": Utilization gain V_IoT/V_0 (%) vs. local time (9:00 to 16:00). Series: Heuristic control, DDPG control.]
3b. RL smooths network throughput
[Figure, left panel "Controlled Congestion": Congestion C vs. local time (9:00 to 16:00). Series: Original, Heuristic control, DDPG control.]
[Figure, right panel "Resulting Throughput": Throughput B (MBps) vs. local time (9:00 to 16:00). Series: Original, Heuristic control, DDPG control, Throughput limit.]
Conclusion
Modern networks are evolving
• Delay tolerant traffic (IoT updates, pre-fetched content)
Data-driven optimal control
• LSTM forecasts + RL controller
• 14.7% simulated gain -> significant savings
Future work:
• Operational network tests
• Decouple prediction and control
Questions: csandeep@stanford.edu
Extra slides
2. RL effectively leverages forecasts
[Figure, left panel "Better forecasts enhance performance": x-axis: Richer LSTM forecasts.]
[Figure, right panel "Discretized MDP for offline optimal": Reward R vs. |S| (0 to 250). Series: |A| = 5, 20, 40, 60. Annotation: Approach Cts MDP.]
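The extra slide refers to a discretized MDP used to obtain an offline optimum, with reward plotted against the number of discrete states |S| and actions |A|. The transcript does not say how that MDP was solved. One standard approach for a finite-horizon, finite-state MDP is backward induction (dynamic programming), sketched below. The array layout (`P[a, s, s']`, `R[s, a]`) and the time-invariant dynamics are assumptions made for brevity.

```python
import numpy as np

def backward_induction(P, R, horizon):
    """Solve a finite-horizon MDP exactly by dynamic programming.

    P: array (A, S, S), P[a, s, s2] = probability of moving s -> s2 under a.
    R: array (S, A), immediate reward for taking action a in state s.
    Returns V with shape (horizon + 1, S) and a greedy policy (horizon, S).
    """
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    n_states = P.shape[1]
    V = np.zeros((horizon + 1, n_states))  # V[horizon] = 0: no reward after the day ends
    policy = np.zeros((horizon, n_states), dtype=int)
    for t in range(horizon - 1, -1, -1):
        # Q[s, a] = R[s, a] + sum over s2 of P[a, s, s2] * V[t + 1, s2]
        Q = R + np.einsum("ast,t->sa", P, V[t + 1])
        policy[t] = Q.argmax(axis=1)
        V[t] = Q.max(axis=1)
    return V, policy
```

Each backward step costs on the order of |A|·|S|² operations, which is why finer discretizations become expensive even as the reward they achieve approaches that of the continuous MDP.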