Post on 14-Apr-2017
Policy-driven, Platform-aware Nova Scheduler
Adrian Hoban, Principal Engineer, Intel
Ramki Krishnan, Distinguished Engineer, CTO NFV, Dell
Tim Hinrichs, CTO, Styra
Dell - Restricted - Confidential
Team
• Core Team (besides presenters)
– Arun Yerra, Dell
– Dilip Krishnaswamy, IBM Research
– Joseph Gasparakis, Intel
– Ruby Krishnaswamy, Orange
• Contributor Acknowledgement
– Anoop Ghanwani, Dell
– Diego Lopez, Telefonica (Operator)
– Francisco Javier, Telefonica (Operator)
– Frank Zdarsky, Red Hat
– Jim Hao Chen, Northwestern University
– Norival Figueira, Brocade
– Peter Willis, BT (Operator)
– Sridhar Ramaswamy, Brocade
– Steve Gordon, Red Hat
– Sylvain Bauza, Red Hat
– Uri Elzur, Intel
OpenStack Nova Scheduler Challenges
• Platform Features Beyond Compute
– SDS use case: High perf storage and compute isolation
– Wait for next OpenStack Release?
• Ease of Use
– General use case: Determine highly loaded or unusable hosts
– Build use case specific analysis tools?
• Initial Placement vs other Functions
– NFV use case: Dynamic monitoring and violation detection
– Design one-off monitoring framework?
Admin
User
Application Performance-aware Workload Placement (1)
Delivering “Low-latency, reliable delivery” workloads e.g. Broadcast Video, Distance Learning, Augmented Reality in the Telco Cloud
• NFV Orchestrator - End-to-end - Intra-dc, Inter-dc WAN etc.
• Exemplary VNFs - Stateful firewall, Wireless Video Proxy, Crypto
• Compute: Fine grained resource partitioning for VM
– Dedicated core(s) AND NUMA awareness AND L3 cache partitioning [1] AND SR-IOV *** ELSE ***
– Dedicated core(s) AND NUMA awareness AND L3 cache partitioning AND DPDK vSwitch *** ELSE ***
– Dedicated physical server
• Network: Overlay/Underlay QoS
– High QoS AND Minimum buffer depth in switches
• Storage: High Performance Logging
– NVMe SSD based storage *** ELSE *** SSD based storage
Ref. [1] - Intel RDT - http://www.intel.com/content/www/us/en/architecture-and-technology/resource-director-technology.html
[Figure: 3G/4G/5G access; premium-quality video versus poor-quality video caused by infrastructure issues]
Application Performance-aware Workload Placement (2)
Delivering "Classic enterprise" workloads, e.g. Email, CRM in the Telco Cloud
• Exemplary data plane VNFs - Stateful firewall, IDS/IPS, WAN Opt and IPSEC crypto
• Compute: Deterministic performance by avoiding memory contention
– NUMA awareness AND SR-IOV *** ELSE ***
– NUMA awareness
• Network: No HA requirement
• Storage: SSD for High performance logging
Delivering "Residential broadband" workloads, e.g. cost-effective Internet in the Telco Cloud
• Exemplary data plane VNFs - NAT
• Compute/Network: Max capacity limit
• Storage: HDD for Low cost
Policy-driven Scheduler Approach (1)
Minimize Vendor Lock-in and Dependency
Maximize feature velocity
• Extensibility
– Admin/User can add new compute (Nova), networking (Neutron), storage (Cinder) constraints on the fly
• Understandability
– Admin/User uses human-readable scheduling policies and builds analysis tools on an as-needed basis
• Monitoring
– Admin/User benefits from a single representation for handling variation in resource utilization and initial placement
Minimize additional code
No custom analysis tools
No delay in monitoring feature availability
Policy-driven Scheduler Approach (2)
Best of Breed
• Imperative Interface Choices
– Extensions to current JSON filter - JSON Weight
• Declarative Interface Choices
– JSON Filter extensions to current Nova Flavors
– Datalog embedded in YAML for flexible constraint specification and database manipulation
Enable user to customize specific applications
Address User understandability, Admin extensibility
Imperative Example: User Describes Desired Hardware
Policy-driven Scheduler
User Request
NUMA and SR-IOV
else
NUMA and more cores
Host Data
Host1: SRIOV
Host2: NUMA, SRIOV
Host3: NUMA, more cores
Host4: L3 partitioning
Output
Host 2 (weight 2)
Host 3 (weight 1)
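The imperative request on this slide can be sketched in Python. This is a minimal illustration, not Nova code: the host feature sets and the weights 2 and 1 come from the slide, while the feature strings and the `rank_hosts` function are hypothetical names chosen for this sketch.

```python
# Hypothetical sketch: rank_hosts() and the feature strings are illustrative,
# not part of any Nova API.
HOSTS = {
    "Host 1": {"sriov"},
    "Host 2": {"numa", "sriov"},
    "Host 3": {"numa", "more_cores"},
    "Host 4": {"l3_partitioning"},
}

# The user's alternatives in preference order, each with its weight.
REQUEST = [
    ({"numa", "sriov"}, 2),       # preferred
    ({"numa", "more_cores"}, 1),  # the "else" fallback
]


def rank_hosts(hosts, request):
    """Return (host, weight) pairs for hosts matching any alternative, best first."""
    ranked = []
    for name, features in hosts.items():
        # A host's weight is the highest weight among the alternatives it satisfies.
        weights = [w for required, w in request if required <= features]
        if weights:
            ranked.append((name, max(weights)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


result = rank_hosts(HOSTS, REQUEST)
print(result)
```

Hosts that satisfy neither alternative (Host 1 and Host 4 here) are dropped rather than given a zero weight, which is one possible reading of the slide.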
Declarative Example: User Describes Workload
Policy-driven Scheduler
User Request
affinity: ["vm123", "vm456"]
memory: 10GB
type: "low-latency, reliable-delivery"
Policy Store
Policy: This type requires local ephemeral SSD-backed storage
Host Data (Host 1 to Host 4)
Host2 data
memory: 20GB
storage: ssd
Output
Host 2
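A minimal Python sketch of the declarative flow: the user names a workload type, and an admin-maintained policy store maps that type to hardware requirements. Only the request and the Host 2 data (20GB, ssd) come from the slide. The other hosts, the VM placements, and every function and key name are assumptions made for illustration.

```python
# Hypothetical sketch: names and most data are assumptions, not a Nova API.
POLICY_STORE = {
    # "This type requires local ephemeral SSD-backed storage"
    "low-latency, reliable-delivery": {"storage": "ssd"},
}

HOST_DATA = {
    "Host 1": {"memory_gb": 8, "storage": "ssd"},   # assumed
    "Host 2": {"memory_gb": 20, "storage": "ssd"},  # from the slide
    "Host 3": {"memory_gb": 32, "storage": "hdd"},  # assumed
    "Host 4": {"memory_gb": 16, "storage": "hdd"},  # assumed
}

# Assumed current placement of the VMs named in the affinity list.
DEPLOYED = {"vm123": "Host 2", "vm456": "Host 2"}


def select_hosts(request, policy_store, host_data, deployed):
    """Return hosts that meet the memory, workload-type and affinity requirements."""
    required = policy_store[request["type"]]
    selected = []
    for name, data in host_data.items():
        if data["memory_gb"] < request["memory_gb"]:
            continue  # not enough memory
        if any(data.get(key) != value for key, value in required.items()):
            continue  # violates the admin's policy for this workload type
        if not all(deployed.get(vm) == name for vm in request["affinity"]):
            continue  # affinity: must share a host with every listed VM
        selected.append(name)
    return selected


request = {
    "affinity": ["vm123", "vm456"],
    "memory_gb": 10,
    "type": "low-latency, reliable-delivery",
}
chosen = select_hosts(request, POLICY_STORE, HOST_DATA, DEPLOYED)
print(chosen)
```

The user never mentions SSDs. Changing the storage requirement for the workload type only requires editing the policy store entry.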
OpenStack Nova Scheduler
Filters
• 30+ types of filters.
• Find the subset of suitable hosts.
Input hosts: Host 1, Host 2, Host 3, ..., Host N
Hosts after filtering: Host 1, Host 3, Host 8, Host 9
Weighting
• Order suitable hosts.
Hosts after weighting: Host 8, Host 1, Host 9, Host 3
Nova Scheduler Filter
• Administrator configures the filter list (30+ options)
• scheduler_default_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter
• Admin configures various filter input data sets such as the flavor definition with extra_specs
Host 1:
Host 3:
Host 8:
Host 9:
Each host complies with an imperative request based on user and admin input.
E.g. 4GB for VM, huge pages, AES-NI, same availability zone, PCIe accelerators, can meet
image property requirements, etc., etc.
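The idea that a host survives only if it passes every configured filter can be sketched as follows. The three functions below imitate the spirit of a RAM, availability-zone and capabilities check. They are not Nova's filter classes, and all host attributes and values are invented for this sketch.

```python
# Hypothetical sketch: simplified stand-ins for scheduler filters, with made-up data.
def ram_filter(host, request):
    return host["free_ram_mb"] >= request["ram_mb"]


def availability_zone_filter(host, request):
    return host["az"] == request["az"]


def capabilities_filter(host, request):
    return set(request["cpu_features"]) <= set(host["cpu_features"])


FILTERS = [ram_filter, availability_zone_filter, capabilities_filter]


def filter_hosts(hosts, request, filters=FILTERS):
    """Keep only the hosts that pass every filter in the configured list."""
    return [
        name for name, host in hosts.items()
        if all(check(host, request) for check in filters)
    ]


hosts = {
    "Host 1": {"free_ram_mb": 8192, "az": "az1", "cpu_features": ["aes"]},
    "Host 2": {"free_ram_mb": 2048, "az": "az1", "cpu_features": ["aes"]},   # too little RAM
    "Host 3": {"free_ram_mb": 4096, "az": "az1", "cpu_features": ["aes"]},
    "Host 4": {"free_ram_mb": 16384, "az": "az2", "cpu_features": ["aes"]},  # wrong zone
    "Host 5": {"free_ram_mb": 8192, "az": "az1", "cpu_features": []},        # no AES-NI
    "Host 8": {"free_ram_mb": 16384, "az": "az1", "cpu_features": ["aes", "avx"]},
    "Host 9": {"free_ram_mb": 6144, "az": "az1", "cpu_features": ["aes"]},
}
request = {"ram_mb": 4096, "az": "az1", "cpu_features": ["aes"]}

suitable = filter_hosts(hosts, request)
print(suitable)
```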
Nova Scheduler Weight
• Configured by the administrator.
• RAM
– Spread across hosts evenly based on RAM utilisation.
• Metrics
– Weigh hosts based on a combination of the weights associated with the specified host_state metrics.
• IO Ops
– Weigh hosts based on I/O operations.
• Affinity
– Weigh hosts based on the number of instances from a given server group.
– Affinity and Anti-Affinity options available.
Host 8: 10GB Free
Host 1: 7GB Free
Host 9: 3GB Free
Host 3: 1GB Free
RAM Centric
weighting policy
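A RAM-centric ordering like the one above can be sketched in a few lines of Python. The free-RAM figures come from the slide. Normalising weights to the range 0 to 1 and the parameter name `ram_weight_multiplier` are modelled loosely on Nova's RAM weigher and should be read as assumptions of this sketch, not as its actual implementation.

```python
# Free RAM per host (GB), as shown on the slide.
FREE_RAM_GB = {"Host 1": 7, "Host 3": 1, "Host 8": 10, "Host 9": 3}


def weigh_by_ram(free_ram, ram_weight_multiplier=1.0):
    """Order hosts by normalised free RAM; a negative multiplier reverses the order."""
    lowest, highest = min(free_ram.values()), max(free_ram.values())
    span = (highest - lowest) or 1  # avoid dividing by zero when all hosts are equal
    weights = {
        host: ram_weight_multiplier * (free - lowest) / span
        for host, free in free_ram.items()
    }
    return sorted(weights, key=weights.get, reverse=True)


ordered = weigh_by_ram(FREE_RAM_GB)
print(ordered)
```

With a positive multiplier the emptiest host comes first (spreading). A negative multiplier prefers the fullest host instead (stacking).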
Problem Statement(s) – Nova Placement
• Administrator input to the filter scheduler is largely static and Nova-centric
– E.g. flavor and extra_specs definitions, host aggregate definitions, etc.
• Not possible to deploy to a given service level with different infrastructure resource allocations (in the same request) under policy governance.
• Not possible to modify the weighting configuration/policy for different parts of the environment, such as per availability zone or per host aggregate.
Empower User: JsonFilter + JsonWeight
Filter Scheduler
Host Data
(Nova’s HostState)
User Request
NUMA and SR-IOV weighted 2
NUMA and more cores weighted 1
JsonFilter JsonWeight
Empower Admin 1: New Filter
Filter Scheduler
Policy Store (Config, File)
Host Data (Nova’s HostState)
User Request
workload: “low-latency, reliable-delivery”
tenant-id: “pepsi”
AdminJsonFilter AdminJsonWeight
Pro: Extensible by admin to external data sources like Cinder and Neutron.
Con: New filter on already long list.
Empower Admin 2: Modify Existing Filters
Flavor fields
Field        Description
vCPUs        Number of virtual CPUs
Memory_MB    VM memory in megabytes
Disk         Virtual root disk size in GB
…
Extra_specs  Key-value pairs
Policy       AND/OR/NOT of tests
Host Aggregate fields
Field             Description
ID                Aggregate identifier
Name              Aggregate name
AvailabilityZone  Availability zone of the aggregate
Hosts             List of hosts in group
Metadata          Key-value pairs
Policy            AND/OR/NOT of tests
Pro: Extensible by admin. Already part of workflow.
Con: Adds complexity to established filters.
Status
• Concept stage with early drafts of several specs
– Imperative: json-weight
– Declarative:
– New scheduler: policy-based-scheduler
– New filter+weight: admin-json-filter
– Modify existing flavor: flavor-policy
– New Host aggregate field: host-aggregate-policy
• 3 sessions at this summit
– Wednesday, 9-10:30 (Nova scheduler working session)
– Wednesday, 11-11:40 (Congress Integrations session)
– Wednesday, 11:45-12:30 (NFV Orchestration BoF)
Key Takeaways
• Contributors: 10+ companies
• Goal: Policy-driven scheduling, Service-assured resource-allocation
• Approach:
– Imperative: User describes desired hardware in policy language OR
– JSON Weight
– Declarative: User describes application; admin maps application to hardware
– Admin JSON Filter, Admin JSON Weight
– Enhance Flavor and Host Aggregates
• Weekly meeting: 8am Pacific = 1300 UTC
– Please join us!
Intel Legal Notices and Disclaimers
• Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.
• No computer system can be absolutely secure.
• Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.
• Intel, the Intel logo and others are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.
• © 2016 Intel Corporation.
Policy Language: JsonFilter and JsonWeight
For "low-latency" workloads:
• At least 8GB of free ram
• At least 8 free vCPUs
• NUMA awareness
['or', ['and', ['=', '$user.type', 'low-latency'],
               ['>=', '$host.free_ram_mb', 8*1024],
               ['>=', '$host.vcpus_total' - '$host.vcpus_used', 8],
               ['not', ['=', '$host.numa_topology', 'None']]]]
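To show how such a list-based policy might be evaluated, here is a small recursive interpreter in Python. It is a sketch under stated assumptions, not the JsonFilter implementation: the `evaluate` and `resolve` helpers are hypothetical, the vCPU subtraction is replaced by an assumed precomputed `free_vcpus` field, and a missing NUMA topology is represented by Python's `None`, not the string 'None'.

```python
# Hypothetical sketch of a policy interpreter; not Nova's JsonFilter code.
def resolve(term, context):
    """Replace '$a.b' with context['a']['b']; return any other term unchanged."""
    if isinstance(term, str) and term.startswith("$"):
        value = context
        for key in term[1:].split("."):
            value = value[key]
        return value
    return term


def evaluate(expr, context):
    """Evaluate a nested [operator, arg, ...] expression against the context."""
    op, *args = expr
    if op == "and":
        return all(evaluate(arg, context) for arg in args)
    if op == "or":
        return any(evaluate(arg, context) for arg in args)
    if op == "not":
        return not evaluate(args[0], context)
    left, right = (resolve(arg, context) for arg in args)
    if op == "=":
        return left == right
    if op == ">":
        return left > right
    if op == ">=":
        return left >= right
    raise ValueError(f"unknown operator: {op}")


POLICY = ["or", ["and",
                 ["=", "$user.type", "low-latency"],
                 [">=", "$host.free_ram_mb", 8 * 1024],
                 [">=", "$host.free_vcpus", 8],  # assumed: vcpus_total - vcpus_used
                 ["not", ["=", "$host.numa_topology", None]]]]

USER = {"type": "low-latency"}
GOOD_HOST = {"free_ram_mb": 16384, "free_vcpus": 12, "numa_topology": "2 cells"}
SMALL_HOST = {"free_ram_mb": 4096, "free_vcpus": 12, "numa_topology": "2 cells"}

print(evaluate(POLICY, {"user": USER, "host": GOOD_HOST}))
print(evaluate(POLICY, {"user": USER, "host": SMALL_HOST}))
```

Because the policy is plain data, it can be stored, sent in a request, or extended with new operators without changing scheduler code paths other than the interpreter.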
Policy Language: YAML based policy
parameters:
  availability_zone:
    type: String
    label: availability zone number
    description: Name of the availability zone server should be hosted on.
  affinity:
    type: String
    label: Affinity
    description: Affinity Group Id
  ram:
    type: integer
    label: RAM
    description: Minimum RAM size required by server instance in GB.
hard_constraints:
  ram_constraint:
    operation_type: min
    value: { get_param: ram }
  affinity_constraint:
    operation_type: equals
    value: { get_param: affinity }
  availability_zone_constraint:
    operation_type: equals
    value: { get_param: availability_zone }
soft_constraints:
  ram_factor:
    operation_type: multiplication
    value: { get_param: ram-weight }
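One way to read this policy is that hard constraints filter hosts and soft constraints score the survivors. The Python sketch below follows that reading. The constraint names and operation types come from the slide, but the host attributes, the parameter values, and the interpretation of "multiplication" as host RAM times the `ram-weight` parameter are all assumptions.

```python
# Hypothetical sketch: the policy is written as a plain dict, and all data is made up.
HARD_CONSTRAINTS = {
    "ram": "min",
    "affinity": "equals",
    "availability_zone": "equals",
}


def satisfies(host, params):
    """Check every hard constraint; a single violation rejects the host."""
    for name, operation in HARD_CONSTRAINTS.items():
        wanted, actual = params[name], host[name]
        if operation == "min" and actual < wanted:
            return False
        if operation == "equals" and actual != wanted:
            return False
    return True


def score(host, params):
    """Soft constraint ram_factor, assumed to mean host RAM times ram-weight."""
    return host["ram"] * params["ram-weight"]


hosts = {
    "host-a": {"ram": 16, "affinity": "group-1", "availability_zone": "az1"},
    "host-b": {"ram": 4, "affinity": "group-1", "availability_zone": "az1"},
    "host-c": {"ram": 32, "affinity": "group-1", "availability_zone": "az2"},
    "host-d": {"ram": 12, "affinity": "group-1", "availability_zone": "az1"},
}
params = {"ram": 8, "affinity": "group-1", "availability_zone": "az1", "ram-weight": 2}

candidates = [name for name, host in hosts.items() if satisfies(host, params)]
ranked = sorted(candidates, key=lambda name: score(hosts[name], params), reverse=True)
print(ranked)
```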
Policy Language: DataLog
main(host) :-
    nova:host(host),
    not eliminated_host(host),
    max_host_score(host, max)
eliminated_host(host) :-
    nova:host(host),
    request:same_hosts(vm),
    not nova:deployed(vm, host)
eliminated_host(host) :-
    nova:host(host),
    request:different_hosts(vm),
    nova:deployed(vm, host)
max_host_score(host, max(score)) :-
    weight(host, score)
weight(host, ram_weight) :-
    request:ram(requested_ram),
    nova:host_ram(host, actual_ram),
    ram_weight = actual_ram - requested_ram / 256
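For readers less familiar with Datalog, the rules can be paraphrased in Python roughly as follows. This is an interpretation, not the authors' implementation: the `schedule` function and its arguments are hypothetical, the maximum is taken over surviving hosts only, and the weight is computed as (host RAM - requested RAM) / 256. The slide's formula can also be read as dividing only the requested RAM, but both readings rank hosts in the same order because each grows with host RAM.

```python
# Hypothetical Python paraphrase of the Datalog rules; all data is made up.
def schedule(hosts, deployed, host_ram, same_hosts, different_hosts, requested_ram):
    """Return the non-eliminated host(s) with the highest RAM weight.

    hosts: list of host names; deployed: set of (vm, host) pairs;
    host_ram: dict mapping host name to RAM in MB.
    """
    eliminated = set()
    for host in hosts:
        # Rule 1: a VM we must share a host with is not deployed here.
        if any((vm, host) not in deployed for vm in same_hosts):
            eliminated.add(host)
        # Rule 2: a VM we must avoid is deployed here.
        if any((vm, host) in deployed for vm in different_hosts):
            eliminated.add(host)

    # weight(host, ram_weight), computed for surviving hosts only (assumption).
    weight = {
        host: (host_ram[host] - requested_ram) / 256
        for host in hosts if host not in eliminated
    }
    if not weight:
        return []
    best = max(weight.values())
    return [host for host in hosts if weight.get(host) == best]


hosts = ["h1", "h2", "h3"]
host_ram = {"h1": 4096, "h2": 8192, "h3": 16384}
deployed = {("vm1", "h1"), ("vm9", "h3")}

# Anti-affinity with vm9 removes h3; h2 then has the most spare RAM.
print(schedule(hosts, deployed, host_ram, [], ["vm9"], 2048))
# Affinity with vm1 leaves only h1.
print(schedule(hosts, deployed, host_ram, ["vm1"], [], 2048))
```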