DIRECT: Distributed Cross-Domain Resource Orchestration in Cellular Edge
Computing
Presenter: Tao Han, Assistant Professor
Electrical and Computer Engineering Department, The University of North Carolina at Charlotte
Qiang Liu and Tao Han
Outline
❖ Introduction
❖ Motivation & Challenges
❖ System Model & Algorithm
❖ System Design & Implementation
❖ Performance Evaluation
Rapid Increase in Connected Devices
The U.S. currently has 8 networked devices per person, a number expected to reach 13.6 per person by 2022*
* https://www.bloorresearch.com/technology/5g-iot-and-edge-computing/
* http://www.nxtview.com/?p=1230
Substantial Growth of Mobile Traffic
6x mobile traffic increase in North America by 2024*
* https://www.ericsson.com/en/mobility-report/reports/november-2018/mobile-data-traffic-growth-outlook
The 5G Era
5G aims to connect “everything”, e.g., phones, vehicles, IoT, machines, health*
* https://www.independent.ie/business/technology/news/the-need-for-speed-is-ireland-ready-for-5g-the-next-big-thing-in-cellular-technology-36629260.html
Heterogeneous Services
Accommodating heterogeneous services is challenging because their performance requirements are extremely diverse, e.g., low latency, high reliability, low energy*
* https://networks.nokia.com/5g/get-ready
Network Slicing
Network slicing enables operators to create logical networks (slices) over the physical infrastructure. Slices are tailored to support various services, which improves revenue and reduces operation costs.
* Guan, W., Wen, X., Wang, L., Lu, Z. and Shen, Y., 2018. A service-oriented deployment policy of end-to-end network slicing based on complex network theory. IEEE Access, 6, pp.19691-19701.
Motivation
The main objective of network slicing is to efficiently utilize the physical infrastructure to serve the slices, which are managed individually by slice tenants (functional isolation).
❖ FlexRAN: A flexible and programmable platform for software-defined radio access networks
   Efficient Radio Access Network (RAN) sharing platform
❖ Orion: RAN slicing for a flexible and cost-effective multi-service mobile network architecture
   Efficient network slicing in the Radio Access Network (MAC layer)
❖ How Should I Slice My Network?: A Multi-Service Empirical Evaluation of Resource Sharing Efficiency
   Introduces “slicing efficiency” to evaluate the resource sharing efficiency
❖ Overbooking network slices through yield-driven end-to-end orchestration
   End-to-end slicing platform, improves revenue by overbooking slices
The resource demands of slices are assumed to be known, which is not always TRUE!
Challenges
Unknown performance model of slices due to functional isolation
❖ Tenants can customize their own operation strategies on their slices (user scheduling, prioritization, etc.)
❖ Slice performance depends on resources in multiple domains (radio, transport, computing, etc.)
❖ Slice traffic in the network is dynamic (temporal, spatial, etc.)
Resource orchestration therefore has to:
❖ Respond quickly to network traffic dynamics
❖ Determine the correlations among multiple-domain resources
❖ Adapt to different network scenarios
Our Solutions
Our Target: Cross-Domain Resource Orchestration for Network Slicing
❖ Utilize a Gaussian Process (GP) to learn the resource demands of each individual slice (GP is data-efficient and fast)
❖ Derive a “predictive gradient” from the GP model and use gradient descent to orchestrate resources to slices (“gradient” descent is effective)
❖ Adopt the ADMM method to coordinate the resource usage of slices across the network to meet SLAs (derivation-based coordination in the network)
System Model
The set of resources orchestrated to the ith slice on the jth edge node*
*Edge node: a logical network unit composed of a cellular base station and a certain amount of computing resources
The utility (unknown) of the ith slice on the jth edge node
Assumption: each slice periodically reports its utility to the orchestrator, indicating the performance of the last resource orchestration
[Figure: Radio Access Network and Edge Servers]
The objective function is:
Problem Statement
Maximize the system utility under the total payment of slices, subject to:
❖ the resource constraints of the edge nodes
❖ the total payment constraints of the slices
The problem falls into the realm of black-box optimization, e.g., pattern search, Bayesian optimization, etc.
➔ Such methods are time-consuming and lack scalability
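The slide's own notation did not survive, so the following is only one possible way to write the problem described above. All symbols are assumptions introduced for illustration: $x_{i,j,k}$ is the amount of resource $k$ orchestrated to slice $i$ on edge node $j$, $U_{i,j}$ is the unknown utility, $R_{j,k}$ is the capacity of resource $k$ at edge node $j$, $p_k$ is a unit price, and $P_i$ is the total payment of slice $i$.

```latex
\begin{aligned}
\max_{x \ge 0} \quad & \sum_{i \in I} \sum_{j \in J} U_{i,j}(x_{i,j}) \\
\text{s.t.} \quad & \sum_{i \in I} x_{i,j,k} \le R_{j,k}, && \forall j \in J,\ k \in K
  && \text{(resources in edge nodes)} \\
& \sum_{j \in J} \sum_{k \in K} p_k \, x_{i,j,k} \le P_i, && \forall i \in I
  && \text{(slice total payments)}
\end{aligned}
```

In this form, the payment constraint sums over edge nodes and so ties them together, while the resource constraint is local to each edge node.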
Algorithm Design
Consider this situation first: what would we do if we knew the utility functions of the slices?
Step 1: decompose the problem w.r.t. edge nodes with the ADMM method (constraint 1 is the only coupling constraint)
Step 2: use gradient-based descent methods to optimize the resource orchestration of slices at each edge node individually (the utility function is known)
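[Figure: the objective $\sum_{j \in J} f_j(x_j)$ and the form $\sum_{j \in J} f_j(x_j) - g(x, z)$ are marked as equivalent; the latter separates into one subproblem $f_j(x_j) - g(x, z)$ per edge node. Each subproblem is solved by gradient steps, with the labels “gradient”, “step size”, and $\nabla g(x, z)$ marking the terms of the update.]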
Algorithm Design
The question then becomes: how can we obtain the exact gradients of the utility function?
Answer: WE CANNOT! But we can estimate them with a Gaussian Process (fast, data-efficient).
Gaussian Process (GP): constructs a probabilistic model that regresses the unknown target function.
For any individual input, the GP produces a distribution over the predicted output.
Input (X1,X2,…) Output (Y)
<3,2> <13>
<1,3> <10>
<1,1> <2>
<3,0> <9>
… …
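As a minimal sketch of what "a distribution of predicted output" means, the snippet below fits a GP to the four input-output pairs in the table and queries it at a new input. The squared-exponential kernel and its hyperparameters (`length_scale`, `signal_var`, `noise_var`) are assumptions chosen for illustration; the slides do not state which kernel DIRECT uses.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.5, signal_var=25.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_predict(X, y, X_new, noise_var=1e-6):
    """Return the GP posterior mean and variance at each row of X_new."""
    y_mean = y.mean()                          # centre the targets
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    K_s = rbf_kernel(X, X_new)                 # train-vs-query covariances
    alpha = np.linalg.solve(K, y - y_mean)
    mean = y_mean + K_s.T @ alpha
    v = np.linalg.solve(K, K_s)
    var = rbf_kernel(X_new, X_new).diagonal() - (K_s * v).sum(axis=0)
    return mean, np.maximum(var, 0.0)          # clip tiny negative round-off

# The input-output pairs from the table above.
X = np.array([[3.0, 2.0], [1.0, 3.0], [1.0, 1.0], [3.0, 0.0]])
y = np.array([13.0, 10.0, 2.0, 9.0])

mean_obs, var_obs = gp_predict(X, y, X)                       # at observed inputs
mean_new, var_new = gp_predict(X, y, np.array([[2.0, 2.0]]))  # at an unseen input
print("prediction at <2,2>: mean", mean_new[0], "variance", var_new[0])
```

At the observed inputs the posterior mean reproduces the recorded outputs with almost no variance, while the unseen input gets a wider distribution. That extra uncertainty is what shrinks as more pairs are added to the database.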
The prediction accuracy of the GP increases as more input-output data pairs are added to the database.
[Figures: 1-dim function example, Gaussian Process Regression after 3, 7, and 9 steps]
Algorithm Design
Predictive gradient: a vector with one component per resource (resource 1, resource 2, …, resource K).
An example of the predictive gradient w.r.t. a single resource:
[Figure: two panels (Point 1, Point 2), each showing the predicted curve, the real curve, the observed points, the predicted point, and the predictive gradient]
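One way to read the predictive gradient is as the gradient of the GP posterior mean with respect to the resource vector, which has a closed form for a squared-exponential kernel. The sketch below makes that assumption (the slides do not give the formula) and reuses the four observations from the earlier table. `LENGTH_SCALE`, `SIGNAL_VAR`, the query point, and the step size are all hypothetical values.

```python
import numpy as np

LENGTH_SCALE = 1.5   # assumed kernel hyperparameters, not from the slides
SIGNAL_VAR = 25.0

def rbf(A, B):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return SIGNAL_VAR * np.exp(-0.5 * sq_dist / LENGTH_SCALE ** 2)

def fit(X, y, noise_var=1e-6):
    """Precompute the weights alpha used by the posterior mean."""
    y_mean = y.mean()
    K = rbf(X, X) + noise_var * np.eye(len(X))
    return y_mean, np.linalg.solve(K, y - y_mean)

def predictive_mean(x, X, y_mean, alpha):
    """Posterior mean of the utility at a single resource vector x."""
    return y_mean + (rbf(x[None, :], X) @ alpha)[0]

def predictive_gradient(x, X, alpha):
    """Gradient of the posterior mean w.r.t. each resource in x.

    For this kernel, d k(x, x_i) / dx = k(x, x_i) * (x_i - x) / length_scale^2.
    """
    k = rbf(x[None, :], X)[0]
    return ((X - x) * (k * alpha)[:, None]).sum(axis=0) / LENGTH_SCALE ** 2

X = np.array([[3.0, 2.0], [1.0, 3.0], [1.0, 1.0], [3.0, 0.0]])
y = np.array([13.0, 10.0, 2.0, 9.0])
y_mean, alpha = fit(X, y)

x = np.array([2.0, 2.0])              # current allocation (resource 1, resource 2)
grad = predictive_gradient(x, X, alpha)
x_next = x + 0.01 * grad              # one small ascent step on the predicted utility
print("predictive gradient:", grad)
```

The gradient costs one kernel evaluation against the stored observations, so each orchestration step needs no extra measurements from the slice.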
Algorithm Overview
Controller side: updates the dual variables and optimizes the auxiliary variable Z (convex problem)
Edge node side: optimizes the resource allocation X with a Gaussian Process based proximal gradient descent method (predictive gradient).
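The controller/edge split above can be sketched as scaled-form ADMM on a toy problem. Everything specific here is an assumption for illustration: one scalar resource per edge node, a known concave utility f_j(x) = a_j log(1 + x), and a single shared budget standing in for the coupling constraint. In DIRECT the edge-side gradient would come from the GP's predictive gradient rather than the closed-form derivative used here.

```python
import numpy as np

def project_budget(v, budget):
    """Euclidean projection onto {z >= 0, sum(z) <= budget}."""
    z = np.maximum(v, 0.0)
    if z.sum() <= budget:
        return z
    # Budget is violated: project onto the simplex {z >= 0, sum(z) = budget}.
    s = np.sort(v)[::-1]
    css = np.cumsum(s) - budget
    k = np.arange(1, len(v) + 1)
    idx = np.nonzero(s - css / k > 0)[0][-1]
    theta = css[idx] / (idx + 1)
    return np.maximum(v - theta, 0.0)

def edge_update(x_j, a_j, z_j, u_j, rho, steps=30):
    """Edge node: gradient descent on -f_j(x) + (rho/2) * (x - z_j + u_j)^2."""
    step = 1.0 / (rho + a_j)          # safe step: curvature is at most rho + a_j
    for _ in range(steps):
        grad = -a_j / (1.0 + x_j) + rho * (x_j - z_j + u_j)
        x_j = max(x_j - step * grad, 0.0)
    return x_j

a = np.array([1.0, 2.0, 3.0])         # hypothetical utility weights, one per edge node
budget, rho = 6.0, 1.0                # hypothetical shared budget and penalty weight
x, z, u = np.zeros(3), np.zeros(3), np.zeros(3)

for _ in range(200):
    # Edge side: each node updates its own allocation independently.
    x = np.array([edge_update(x[j], a[j], z[j], u[j], rho) for j in range(3)])
    # Controller side: auxiliary variable Z, then the (scaled) dual variable U.
    z = project_budget(x + u, budget)
    u = u + x - z

print("allocation per edge node:", x)
```

For these weights the constrained optimum can be derived by hand (equal marginal utility across nodes gives x = (0.5, 2, 3.5)), which makes the loop easy to check. Only x, z, and u cross the controller/edge boundary, matching the variables the slides say are exchanged.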
DIRECT: System Overview
Slice Orchestrator: effectively and dynamically orchestrates virtual network resources to serve slices across the whole network
❖ DIRECT controller: coordinates the resource usage of slices across edge nodes (controller-side algorithm)
❖ DIRECT agents in edge nodes: orchestrate resources to slices with predictive gradients (edge-side algorithm)
❖ The controller coordinates with the agents by exchanging certain variables, e.g., X, Z, U, etc.
Resource Hypervisor: efficiently and dynamically virtualizes physical infrastructures and maps the virtual resource allocations of slices to physical resources
❖ Radio Resource Hypervisor
❖ Computing Resource Hypervisor
Network Slices: use virtual resources to serve their users individually
Resource Hypervisor: Radio
Methodology: virtualize the radio resource by managing the MAC layer user scheduling and resource allocation (physical resource blocks (PRBs) in an LTE network).
Input: slices and their users, resource orchestration of slices, channel conditions
Output: the mapping of all users to PRBs for both uplink and downlink
Difficulties:
1) tradeoff between isolation and efficiency;
2) unknown association between slices and users in the MAC layer (only the RNTI is available);
Resource Hypervisor: Radio
Solution:
1) define the virtual radio resource (RR) as wireless bandwidth, e.g., 540kHz;
2) convert the RR of slice users into the minimum number of PRBs;
3) find the association between slices and users by capturing IMSI information in S1AP messages;
4) formulate a users-to-PRBs mapping problem;
5) solve the problem with a heuristic algorithm
Idea of the heuristic algorithm: select the user with the best channel condition for each PRB, while complying with uplink frequency-contiguous PRB allocation.
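Steps 2 and 5 can be sketched as follows. This is an illustrative simplification, not the hypervisor's actual algorithm. It uses the standard LTE PRB width of 180 kHz (12 subcarriers of 15 kHz) and covers only a downlink-style mapping, so the uplink contiguity requirement is not handled. The user names and channel-quality numbers are hypothetical.

```python
import math

PRB_BANDWIDTH_KHZ = 180  # one LTE PRB spans 12 subcarriers x 15 kHz

def bandwidth_to_prbs(bandwidth_khz):
    """Minimum number of PRBs that covers the requested bandwidth."""
    return math.ceil(bandwidth_khz / PRB_BANDWIDTH_KHZ)

def map_users_to_prbs(demand_khz, channel_quality):
    """Greedy mapping: each PRB goes to the unsatisfied user with the best channel.

    demand_khz:      {user: virtual radio resource in kHz}
    channel_quality: {user: [quality on PRB 0, PRB 1, ...]} (higher is better)
    """
    remaining = {user: bandwidth_to_prbs(bw) for user, bw in demand_khz.items()}
    n_prbs = len(next(iter(channel_quality.values())))
    mapping = {}
    for prb in range(n_prbs):
        candidates = [user for user, left in remaining.items() if left > 0]
        if not candidates:
            break                       # every user already has its PRBs
        best = max(candidates, key=lambda user: channel_quality[user][prb])
        mapping[prb] = best
        remaining[best] -= 1
    return mapping

# Hypothetical example: two users sharing six PRBs.
demand = {"ue1": 540, "ue2": 360}
quality = {"ue1": [9, 3, 8, 7, 2, 6],
           "ue2": [5, 6, 4, 9, 7, 1]}
mapping = map_users_to_prbs(demand, quality)
print(mapping)
```

Capping each user at its converted PRB count keeps slices isolated, while picking the best channel per PRB recovers some efficiency. That is the tradeoff named under "Difficulties".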
Resource Hypervisor: Computing
Difficulties: No open-source CUDA kernel management platform; no operations allowed once kernels are
dispatched to GPU side except interruption
Methodology: virtualize the GPU computing resource by managing the dispatch of kernel functions
(Token-based mechanism)
[Figure: a kernel function call in CUDA programming, annotated with its name, required threads, and parameters. On the CPU side, kernels are called asynchronously (Kernel 1 with 10k threads, Kernel 2 with 1k threads, Kernel 3 with 5k threads, ..., Kernel N); on the GPU side, the kernels execute serially.]
*Multi-Process Service (MPS) enables multiple applications/processes to share the GPU resources. Here, we consider no concurrent kernel execution/streams.
Resource Hypervisor: Computing
A user's kernel is dispatched only if its token requirement is met. The token is updated from two terms: the tokens generated, and the tokens consumed by running kernels (tracked by cudaEvent).
[Figure: the kernel dispatch diagram of the previous slide, extended with a command queue and a token; the execution status of kernels is tracked using cudaEventsync.]
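The token update rule itself is not legible in the slides, so the following is only a sketch of one plausible token-based dispatcher, simulated in plain Python rather than CUDA. The assumptions are loud ones: tokens are generated per tick at a rate that reflects each user's share of the GPU, a kernel costs as many tokens as the threads it requires, and a kernel is dispatched once its user holds enough tokens. The class and method names are hypothetical.

```python
from collections import deque

class TokenDispatcher:
    """Hypothetical token-gated kernel dispatcher (simulation only)."""

    def __init__(self, rates):
        self.rates = dict(rates)                    # tokens generated per tick, per user
        self.tokens = {user: 0.0 for user in rates}
        self.queues = {user: deque() for user in rates}

    def submit(self, user, kernel_name, threads):
        """Queue a kernel; its assumed token cost is its thread count."""
        self.queues[user].append((kernel_name, threads))

    def tick(self):
        """Generate tokens, then dispatch queued kernels whose cost is covered."""
        dispatched = []
        for user, rate in self.rates.items():
            self.tokens[user] += rate               # token generated
            queue = self.queues[user]
            while queue and self.tokens[user] >= queue[0][1]:
                name, threads = queue.popleft()
                self.tokens[user] -= threads        # token consumed by the kernel
                dispatched.append((user, name))
        return dispatched

# Hypothetical example: user A has five times the share of user B.
dispatcher = TokenDispatcher({"A": 5000, "B": 1000})
dispatcher.submit("A", "kernel1", 10_000)
dispatcher.submit("B", "kernel2", 1_000)
first = dispatcher.tick()    # A still lacks tokens for kernel1
second = dispatcher.tick()   # A has now accumulated enough
print(first, second)
```

Gating at dispatch time fits the stated difficulty that nothing can be done to a kernel once it reaches the GPU. In a real implementation the consumption term would presumably come from measured kernel execution, which the slides say is tracked with CUDA events.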
The DIRECT Protocol
➢ The DIRECT controller exchanges the control variables with all edge nodes;
➢ Each DIRECT agent in an edge node orchestrates the virtual resources to network slices;
➢ Each network slice orchestrates its virtual resources to users based on its own policy;
➢ The virtual resource allocations of all the admitted users are reported to the hypervisor;
➢ The hypervisor maps the virtual resources of users to physical resources;
➢ Slices report their utilities to the DIRECT agent, which reports resource usage to the DIRECT controller;
➢ The loop continues until convergence.
The DIRECT protocol enables effective cross-domain resource orchestration in network slicing
System Implementation
Hardware Details:
❖ OpenAirInterface LTE Platform: 2x USRP B210 SDR boards, 2x eNodeB computers, 1x Core network
❖ CUDA GPU computing platform: 2x NVIDIA GTX 1080Ti, CUDA 8.0
❖ Mobile users: 4x Huawei dongle E2273, 4x Linux computers
Evaluation Setting
Utility: the utility of a slice (unknown) is defined as the slice latency (summation of user latency).
Applications: based on the YOLO object detection framework, to emulate various resource demands of slices
❖ Mobile Augmented Reality (MAR): 1280x720 frames, 416x416 input, medium model
❖ Video Analytics and Streaming (VAS): 1280x720 frames, 608x608 input, large model
[Figure: a user sends a “request” and receives detection results, e.g., “dog, bike”. Applications of the slices: MAR, MAR, VAS]
Algorithms Comparison:
Static: shares all the resources evenly among slices across all edge nodes;
Pswarm: a global optimization solver that replaces the GP-based Alg. 1 in each edge node;
TOMLAB-glcSolve: a global optimization solver that replaces the GP-based Alg. 1 in each edge node;
Experimental Results
❖ DIRECT converges in several iterations.
❖ DIRECT reduces system latency by about 21% compared to Static.
❖ DIRECT agents learn to orchestrate resources to slices.
[*] Pswarm and TOMLAB need too many iterations, which makes them impractical in experiments
[Figure annotations: 21%, 61%]
Experimental Results
❖ DIRECT coordinates the resource utilization among edge nodes to meet the total payment.
❖ DIRECT agents learn the resource demands of slices and orchestrate resources accordingly.
[Figure annotations: learn the slice traffic on edges; learn the resource demand of each application (MAR, MAR, VAS)]
Simulation Results
❖ DIRECT converges smoothly in simulation, consistent with the experiments.
❖ DIRECT agents learn the resource orchestration in fewer than 20 interactions with the slices.
Simulation Results
❖ DIRECT scales better than the other algorithms.
❖ Optimization solvers can obtain similar system utility, but they are impractical in a real system because they need too many interactions.
Conclusion
❖ We presented DIRECT, a cross-domain resource virtualization and orchestration system, which orchestrates resources to slices under an unknown performance model of slices.
❖ DIRECT integrates an optimization method (ADMM) and a learning-assisted algorithm (GP-based).
❖ We designed cross-domain resource virtualization, i.e., Radio and Computing Resource Hypervisors, to virtualize physical infrastructures.
❖ We implemented a DIRECT prototype composed of an OpenAirInterface LTE system and a CUDA GPU computing platform.
❖ We evaluated the performance of DIRECT with both experiments and simulations. Results validate that DIRECT significantly outperforms the other baseline algorithms.