Cellular Network Traffic Scheduling using Deep Reinforcement … · 2019-03-11 · Cellular Network...


Cellular Network Traffic Scheduling using Deep Reinforcement Learning

Sandeep Chinchali, et al., Marco Pavone, Sachin Katti
Stanford University

AAAI 2018

Can we learn to optimally manage cellular networks?

Internet

Delay Sensitive

Real-time Mobile Traffic

Delay Tolerant (DT) Traffic

IoT: Map/SW updates
Pre-fetched content

Why is IoT/DT traffic scheduling hard?

csandeep@stanford.edu

[Figure labels: Utilization, Acceptable Limit, IoT]

Contending goals
• Max IoT/DT data
• Loss to mobile traffic
• Network limits

Optimal Control

Why is IoT/DT traffic scheduling hard?

[Figure: Melbourne Central Business District, rolling average = 1 min. x-axis: local time (09:00 to 21:00). Series: shopping center, office building, Southern Cross station, Melbourne Central station.]

Diverse city-wide cell patterns

Our contributions

1. Identify inefficiencies in real cellular networks
   4 weeks, 10 diverse cells in Downtown Melbourne, Australia

2. Data Driven, Deep Learning Network Model
   Our live network experiments match MDP dynamics

3. Adaptive RL scheduler
   Flexibly responds to operator reward functions

IoT Scheduler

Network State

IoT rate

Why Deep Learning?

1. Learn time-variant network dynamics

2. Adapt to high-level network operation goals

3. Generalize to diverse cells

4. Abundance of network data

[Figure: Melbourne Central Business District, rolling average = 1 min. x-axis: local time (09:00 to 21:00). Series: shopping center, office building, Southern Cross station, Melbourne Central station.]

Related Work

1. Dynamic Resource Allocation
• Electricity grid (Reddy 2011), call admission (Marbach 1998), traffic control (Chu 2016)

2. Data-driven Optimal Control + Forecasting
• Deep RL (Mnih 2013, Silver 2014, Lillicrap 2015)
• LSTM networks (Hochreiter 1997, Laptev 2017, Shi 2015)

3. Machine Learning for Computer Networks
• Cluster Resource Management (Mao 2016)
• Mobile Video Streaming (Mao 2017, Yin 2015)

Data-driven problem formulation
1. Network State Space
2. IoT Scheduler Actions
3. Time-variant dynamics
4. Network operator policies

Num Users

IoT Scheduler

Network state + forecasts

Congestion

Cell efficiency

IoT rate

Primer on Cell Networks

Goal: Max safe IoT traffic $V_t$ over day

(Link Quality)

Current Network State

Full State with Temporal Features

RL setup (1): State Space

Agent   Environment   Action

Network state

Reward

Stochastic Forecast (LSTM)
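The state fed to the scheduler combines the current network state with temporal features and a forecast. Below is a minimal sketch of one way such a vector could be assembled. It assumes the current state is the triple named on these slides (congestion, number of users, cell efficiency), that temporal features are a short window of past states, and that the forecast is a list of predicted congestion values. The function name `build_full_state`, the window length, and the feature ordering are illustrative assumptions, not the authors' definition.

```python
from collections import deque


def build_full_state(history, forecast, lags=3):
    """Concatenate the last `lags` states with a forecast vector.

    history  : iterable of (congestion, num_users, cell_efficiency), oldest first
    forecast : list of predicted future values (e.g. produced by an LSTM)
    """
    recent = list(history)[-lags:]
    if not recent:
        raise ValueError("history must contain at least one state")
    if len(recent) < lags:
        # Pad with the oldest available state so the vector length stays fixed.
        recent = [recent[0]] * (lags - len(recent)) + recent
    flat = [x for state in recent for x in state]
    return flat + list(forecast)


# Hypothetical measurements, one tuple per minute.
history = deque(maxlen=10)
history.append((0.40, 120, 1.8))
history.append((0.55, 150, 1.6))
state = build_full_state(history, forecast=[0.60, 0.62], lags=3)
```

With three lagged states of three features each plus a two-step forecast, `state` has a fixed length regardless of how much history is available, which is what a neural policy input needs.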

Horizon: Day of T mins

IoT Traffic Rate:

IoT Volume per minute:

Utilization gain:
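The formulas for these three quantities did not survive extraction. As a rough sketch only, the code below assumes that the IoT rate is held constant within each minute, that volume per minute is rate times 60 seconds, and that utilization gain is the total IoT volume over the day divided by the original traffic volume $V_0$ (the ratio $V_{IoT}/V_0$ appears as an axis label in the results). Units and function names are hypothetical.

```python
def iot_volume_per_minute(rate_mbps):
    """Volume (megabits) sent in one minute at a constant rate (assumed Mbps)."""
    return rate_mbps * 60.0


def utilization_gain(iot_rates_mbps, original_volume_mb):
    """Total scheduled IoT volume over the horizon divided by the original volume V0."""
    v_iot = sum(iot_volume_per_minute(r) for r in iot_rates_mbps)
    return v_iot / original_volume_mb


# Four one-minute steps of hypothetical IoT rates against a hypothetical V0.
gain = utilization_gain([1.0, 2.0, 0.0, 1.0], original_volume_mb=2400.0)
```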

RL setup (2): Action Space

Network state

Reward

RL setup (3): Transition Dynamics

[Figure: x-axis: local time (20:10 to 20:20). Labels: controlled traffic, background dynamics.]
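One way to read this slide is that the next network state has two parts: background dynamics the scheduler does not control, and the effect of the controlled IoT traffic it adds. The toy step function below illustrates that split. The additive linear effect, the `sensitivity` parameter, and the congestion scale capped at 1.0 are all assumptions made for illustration; the slides describe a learned, data-driven model instead.

```python
def step(background_trace, t, iot_rate, sensitivity=0.05):
    """Return the congestion at minute t+1 under a toy additive model.

    background_trace : congestion the cell would see with no IoT traffic
    iot_rate         : scheduler action at minute t
    sensitivity      : hypothetical congestion added per unit of IoT rate
    """
    background = background_trace[t + 1]   # uncontrolled part, read from a trace
    controlled = sensitivity * iot_rate    # part caused by the scheduler's action
    return min(1.0, background + controlled)


trace = [0.30, 0.35, 0.50, 0.45]  # hypothetical background congestion per minute
next_congestion = step(trace, t=0, iot_rate=2.0)
```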

Network state

Reward

RL setup (4): Operator Rewards

Overall weighted reward

1. IoT traffic volume

2. Loss to regular users

3. Traffic below network limit
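The three terms above can be combined into a single scalar. A minimal sketch, assuming the first term is rewarded and the other two are penalized, with one weight per term. The weight α appears later in the slides; the names `beta` and `kappa`, the default values, and the sign convention are assumptions here.

```python
def operator_reward(iot_volume, loss_to_users, volume_below_limit,
                    alpha=2.0, beta=1.0, kappa=1.0):
    """Weighted operator reward for one time step.

    alpha weights IoT traffic volume (rewarded).
    beta  weights the loss to regular users (penalized, assumed).
    kappa weights traffic below the network limit (penalized, assumed).
    """
    return alpha * iot_volume - beta * loss_to_users - kappa * volume_below_limit


# A larger alpha makes the scheduler more willing to trade user loss for IoT volume.
conservative = operator_reward(10.0, 3.0, 1.0, alpha=1.0)
aggressive = operator_reward(10.0, 3.0, 1.0, alpha=2.0)
```

Changing the weights is how an operator would express a different priority without retraining the reward structure itself.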

Network state

Reward

Goal: Find Optimal Operator Policy
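In standard finite-horizon RL notation, and assuming an undiscounted sum of the operator reward $R$ over a day of $T$ one-minute steps, this goal can be written as follows. The symbols $\pi$, $s_t$, and $a_t$ (policy, network state, IoT rate) are conventional notation rather than taken from the slide.

```latex
\pi^{*} \;=\; \arg\max_{\pi} \;
\mathbb{E}\!\left[ \sum_{t=0}^{T-1} R(s_t, a_t) \;\middle|\; a_t = \pi(s_t) \right]
```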

What-if model

Evaluation


Evaluation Criteria

1. Robust performance on diverse cell-day pairs
2. Ability to exploit better forecasts
3. Interpretability

Num Users

IoT Scheduler

Network state + forecasts

Congestion

Cell efficiency

IoT rate

1. RL generalizes to several cell-day pairs

[Figure: train and test results. Axis labels partly lost in extraction: utilization, $V_{IoT}/V_0$.]

Respond to operator priorities

Significant gains:
• FCC Spectrum Auction (Reardon 2016): $4.5B for 10 MHz of spectrum
• 14.7% median gain for α = 2
• Significant cost savings [simulated]

2. RL effectively leverages forecasts

Richer LSTM forecasts

Benchmark

3a. RL exploits transient dips in utilization

[Figure, two panels: Controlled Congestion; Utilization gain. x-axis: local time (9:00 to 16:00). Series: original, heuristic control, DDPG control. Annotation: transient dip.]

3b. RL smooths network throughput

[Figure, two panels: Controlled Congestion; Resulting Throughput. x-axis: local time (9:00 to 16:00). Series: original, heuristic control, DDPG control. Marked level: throughput limit.]

Conclusion

Modern networks are evolving
• Delay tolerant traffic (IoT updates, pre-fetched content)

Data-driven optimal control
• LSTM forecasts + RL controller
• 14.7% simulated gain -> significant savings

Future work:
• Operational network tests
• Decouple prediction and control

Questions: csandeep@stanford.edu


Extra slides


2. RL effectively leverages forecasts

Better forecasts enhance performance

Discretized MDP for offline optimal

[Figure: x-axis: $|S|$ (0 to 250). Series: $|A|$ = 5, 20, 40, 60. Annotations: richer LSTM forecasts; approach Cts MDP.]
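An offline optimum on a discretized MDP is commonly computed with dynamic programming. The sketch below shows finite-horizon backward induction over $|S|$ discrete states and $|A|$ discrete actions. It assumes deterministic transitions and uses a toy transition and reward invented for illustration; it is not the authors' benchmark, and the names `backward_induction`, `toy_transition`, and `toy_reward` are hypothetical.

```python
def backward_induction(n_states, n_actions, horizon, transition, reward):
    """Finite-horizon dynamic programming on a discretized MDP.

    transition(t, s, a) -> next state index (deterministic, an assumption)
    reward(t, s, a)     -> immediate reward
    Returns (value, policy): value[t][s] is the optimal return-to-go and
    policy[t][s] is a maximizing action index.
    """
    value = [[0.0] * n_states for _ in range(horizon + 1)]
    policy = [[0] * n_states for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):          # sweep backward in time
        for s in range(n_states):
            best_a, best_q = 0, float("-inf")
            for a in range(n_actions):
                q = reward(t, s, a) + value[t + 1][transition(t, s, a)]
                if q > best_q:
                    best_a, best_q = a, q
            value[t][s] = best_q
            policy[t][s] = best_a
    return value, policy


# Toy problem: state = congestion level, action = IoT rate level.
N_S, N_A, T = 5, 3, 4


def toy_transition(t, s, a):
    # Sending IoT traffic pushes congestion up; idling lets it relax by one level.
    return min(N_S - 1, s + a) if a > 0 else max(0, s - 1)


def toy_reward(t, s, a):
    # Reward IoT volume; penalize landing in the top congestion level.
    penalty = 5.0 if toy_transition(t, s, a) == N_S - 1 else 0.0
    return float(a) - penalty


value, policy = backward_induction(N_S, N_A, T, toy_transition, toy_reward)
```

The cost of this sweep grows with horizon × $|S|$ × $|A|$, which is one reason a finer discretization gets closer to the continuous problem but becomes more expensive to solve exactly.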
