
AIOps Practice in Network Operation and Maintenance · 2019-09-27

Page 1

AIOps Practice in Network Operation and Maintenance

He Yubao

Page 2

Challenges: Telecom Networks Are Becoming More Complex

[Figure: telecom network architecture. Labels include an Operation Center (OSS/EMS, orchestrator, BSS, AI, Cloud OS over servers, switches and storage); Network Cloud Engines hosting RAN/vBNG/core (CU), 5G CP/UP apps, IMS/MME core, video and IoT apps; the Edge Network Cloud; mobile connection (RAN DU/CU, BBU, X-Haul switches), enterprise connection and home connection (CPE, campus switches, APs, Agile Controller, OLT, MxU, ONT, premium home broadband); aggregation and metro (SW, WDM, BRAS/PE), backbone (P routers, WDM) and core network; and the solutions CloudAIR, CloudRAN (joint innovation), CloudFAN (joint innovation), CloudCampus, CloudEdge/CloudCore, NFVI CloudFabric, Orchestrator & MANO, SD-WAN, Agile Private Line, cloud-based BNG, Metro as a Fabric (joint innovation), CloudBackbone and DCI & CloudOptix.]

Vertical layers and heterogeneous components: difficult to integrate and troubleshoot.

Millions of nodes and myriad connections: difficult to deploy and to locate faults.

Billions of services: O&M must be more agile and handle massive amounts of data.

Page 3

AIOps: Using AI to Assist Human Operation and Maintenance

Operation by scripts and tools: before 2008

Operation by ITIL and Web: 2008 to 2012

Operation by DevOps: 2012 to 2015

Page 4

AIOps: Gartner's View

Gartner 2016: Utilize big data, machine learning and other analytical technologies to directly and indirectly enhance the relevant technical capabilities of IT services through preventive forecasting, personalization and dynamic analysis, to achieve higher quality, reasonable cost and efficient support for the products or services being maintained.

Gartner 2018: AIOps platforms enhance IT operations through greater insights by combining big data, machine learning and visualization. I&O leaders should initiate AIOps deployment to refine performance analysis today and augment to IT service management and automation over the next two to five years.

Page 5

Different Capability Requirements for O&M Scenarios

Product lifecycle: Planning, Deployment, ……

Survey: fast and automatic

Monitoring: real-time

Troubleshooting: fast to root cause

Service recovery: self-recovery

Upgrading & patching: less service interruption

Change & optimization: safe and error free

In addition: not much expertise required, and no need to be on site.

Pain points:

So many layers and heterogeneous components that the work can take more than 10 days.

Root causes are hard to find across layers and components (physical and VM): more than 3 hours.

Human errors in configuration cause accidents: more than 25%.

Page 6

AIOps Target at Huawei

[Figure: evolution from top-down management to an autonomous network.]

Top-down management, rule based (template, customization, configuration): devices are managed by NMS/EMS, which support service delivery and assurance.

Autonomous network, target/intent based (scenario, model, training): an AI layer develops and trains models for intelligent service operation; controllers and devices each perform AI reasoning, with data flowing up from the devices and execution flowing down to them.

Page 7

Definition of the Autonomous Driving Network

Based on scenarios, the system gradually replaces the “hand” (operation), “eye” (monitoring), “brain” (decision) and “heart” (intention), and finally realizes the “Autonomous Driving Network”.

Level definitions (system characteristics):

L0: Manual Management. Definition: manually managed network. Ability: no automation, no auxiliary system.

L1: Assisted Automation. Definition: tool-assisted support at discrete points. Ability: tool assistance; rules and expert experience are codified.

L2: Partial Autonomous Network. Definition: task-oriented automation. Ability: policy-based automation, frees the hands.

L3: Conditional Autonomous Network. Definition: closed-loop capability within a scenario; people remain responsible. Ability: perceives environmental changes, frees the hands and eyes.

L4: Highly Autonomous Network. Definition: autonomy in a single scenario; the system is responsible. Ability: predictive autonomy, frees the hands, eyes and brain.

L5: Fully Autonomous Network. Definition: closed-loop autonomy across all scenarios. Ability: intent driven, fully liberates people.

Dimensions assessed at each level: deployment and operation (hand); business distribution and monitoring (eye); maintenance and optimization decisions (brain); planning and design intention (heart).
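The level table can be read as a mapping from level to the faculties the system takes over. The sketch below encodes that reading; the enum and function names are hypothetical, and treating L5 as freeing the “heart” and L0 to L2 as people-responsible are assumptions drawn from the definition above rather than statements on the slide.

```python
from __future__ import annotations

from enum import IntEnum


class AutonomyLevel(IntEnum):
    """Autonomous Driving Network levels L0 to L5, as named on the slide."""
    L0_MANUAL = 0
    L1_ASSISTED = 1
    L2_PARTIAL = 2
    L3_CONDITIONAL = 3
    L4_HIGH = 4
    L5_FULL = 5


# Faculties in the order the system takes them over.
FACULTIES = ("hand", "eye", "brain", "heart")

# Faculties freed per level, read from the "Ability" rows: L2 hands,
# L3 hands and eyes, L4 hands, eyes and brain; L5 (intent driven) is
# assumed to free the heart too. L0 and L1 free none.
_FREED = {0: 0, 1: 0, 2: 1, 3: 2, 4: 3, 5: 4}


def freed_faculties(level: AutonomyLevel) -> tuple[str, ...]:
    """Return the faculties the system handles instead of people."""
    return FACULTIES[: _FREED[int(level)]]


def system_responsible(level: AutonomyLevel) -> bool:
    """L3 keeps responsibility with people; from L4 on it lies with the
    system. Treating L0 to L2 as people-responsible is an assumption."""
    return level >= AutonomyLevel.L4_HIGH
```

For example, `freed_faculties(AutonomyLevel.L3_CONDITIONAL)` gives `("hand", "eye")`, matching the L3 row.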

Page 8

AI-enabled Autonomous Driving Network

Three tiers of intelligence:

Edge Intelligence: smart sensors and data collection with real-time awareness.

Local Intelligence.

Cloud Intelligence: applications in the cloud.

Site AI (intelligent sites with embedded AI capabilities): automation, analytics and intelligence functions; low-latency control loops (TTI < 0.5 ms).

Network AI (control and management convergence): intent translation plus automation, analytics and intelligence functions; massive live streaming data (200 GB/day/site).

Cloud AI (intelligent service with cross-domain and global experience): data and inference, model training; full mobile network data and status, all-scenario automation.

Data inputs: MR data, RU location, SLA data, KPI data, hardware status and network load.
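A back-of-the-envelope conversion puts the per-site data volume in perspective. It assumes decimal gigabytes (10^9 bytes) and a uniform rate over 24 hours, neither of which the slide states:

```latex
\[
\frac{200 \times 10^{9}\,\text{B}}{86\,400\,\text{s}}
\approx 2.31 \times 10^{6}\,\text{B/s}
\approx 18.5\,\text{Mbit/s per site}
\]
```

Real traffic is bursty, so peak rates would be higher than this average.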

Page 9

Challenges of AIOps

Model: object hard to understand

• Complex relationships between components

• Differentiated realization methods

• Complex solution combinations

Knowledge: experience hard to use

• Knowledge lives in the “minds” of technicians

• Lack of inheritance

• Siloed knowledge management

• Unstructured

Algorithm: AI methods hard to choose

• No one algorithm for all

• Scenario specific

• Not enough live data (logs, KPIs, alarms, etc.) for training

……

Page 10

Model: PnP for Radio Station Deployment based on eModel

[Figure: the operator model is mapped through eModel to the Huawei NE model; NE configuration is applied through the NMS to the live network, with planning/audit and full-parameter (FullPara) audit.]

eModel, which consists of an engine and policies, simplifies parameter configuration:

1) Complex relationships among many parameters are predefined.

2) Parameter policies specify how features are configured in different scenarios.

3) Policies are script-based, so expert knowledge can be transferred quickly.

eModel: mapping from customer language to device language.
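The engine-plus-policy idea can be sketched as a small translator from operator intent to device parameters. This is a minimal illustration, not Huawei's eModel: the scenario names, the policy functions and the parameter names (`CELL_BANDWIDTH_MHZ`, `MIMO_MODE`, `RB_COUNT`) are all hypothetical. The only real-world fact used is the LTE mapping from channel bandwidth to resource blocks, standing in for a predefined parameter relation.

```python
from typing import Callable, Dict

DeviceConfig = Dict[str, object]


# Hypothetical script-based policies (points 2 and 3): each one states how a
# feature is configured in one scenario. Parameter names are invented.
def urban_policy(intent: dict) -> DeviceConfig:
    return {"CELL_BANDWIDTH_MHZ": intent["bandwidth_mhz"], "MIMO_MODE": "4T4R"}


def rural_policy(intent: dict) -> DeviceConfig:
    return {"CELL_BANDWIDTH_MHZ": intent["bandwidth_mhz"], "MIMO_MODE": "2T2R"}


POLICIES: Dict[str, Callable[[dict], DeviceConfig]] = {
    "urban": urban_policy,
    "rural": rural_policy,
}

# Predefined parameter relation (point 1): a derived device parameter the
# operator never sets directly. LTE resource blocks per channel bandwidth.
RELATIONS: Dict[str, Callable[[DeviceConfig], object]] = {
    "RB_COUNT": lambda cfg: {5: 25, 10: 50, 15: 75, 20: 100}[cfg["CELL_BANDWIDTH_MHZ"]],
}


def emodel_engine(intent: dict) -> DeviceConfig:
    """Translate operator-language intent into device-language parameters."""
    cfg = dict(POLICIES[intent["scenario"]](intent))
    for name, derive in RELATIONS.items():
        cfg[name] = derive(cfg)
    return cfg


def full_parameter_audit(planned: DeviceConfig, live: DeviceConfig) -> dict:
    """Report parameters whose live value differs from the planned value."""
    return {k: (v, live.get(k)) for k, v in planned.items() if live.get(k) != v}
```

Calling `emodel_engine({"scenario": "urban", "bandwidth_mhz": 20})` yields a device configuration with `RB_COUNT` filled in by the relation, and `full_parameter_audit` then compares that plan against what is actually on the live network. Keeping policies as plain functions is what makes them cheap to write and share, which is the slide's point about transferring expert knowledge.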

Page 11

Knowledge & Policy: Knowledge Generation and Management for Automation and Self-Service Support

“The greatest waste in the process of operation and maintenance is the waste of knowledge.”

[Figure: product knowledge and O&M knowledge feed an O&M knowledge graph and knowledge base, which drive automation and self-service. EMS/MANO/Controller A manages Device/App A; EMS/MANO/Controller B manages Device/App B and Device/App C; commands flow down to the devices and data flows back up.]
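One way to picture the knowledge graph serving automation is a triple store that joins product knowledge (which controller manages a device) with O&M knowledge (which command handles a symptom). The sketch below is a minimal in-memory version under that assumption; the relation names `managed_by` and `handled_by` and the symptom and command names are hypothetical, while the device-to-controller topology follows the slide.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal in-memory triple store: (subject, relation, object)."""

    def __init__(self):
        self._edges = defaultdict(list)  # (subject, relation) -> [objects]

    def add(self, subj: str, rel: str, obj: str) -> None:
        self._edges[(subj, rel)].append(obj)

    def objects(self, subj: str, rel: str) -> list:
        # .get() avoids inserting empty entries for unknown keys
        return list(self._edges.get((subj, rel), []))


def suggest_actions(kg: KnowledgeGraph, device: str, symptom: str) -> list:
    """Join product knowledge (which controller manages the device) with
    O&M knowledge (which command handles the symptom) into executable steps."""
    return [
        (controller, command)
        for controller in kg.objects(device, "managed_by")
        for command in kg.objects(symptom, "handled_by")
    ]


kg = KnowledgeGraph()
# Product knowledge, following the slide's topology
kg.add("Device/App A", "managed_by", "EMS/MANO/Controller A")
kg.add("Device/App B", "managed_by", "EMS/MANO/Controller B")
kg.add("Device/App C", "managed_by", "EMS/MANO/Controller B")
# O&M knowledge (hypothetical symptom and command names)
kg.add("link_down", "handled_by", "check_optical_power")
kg.add("link_down", "handled_by", "reset_port")
```

Here `suggest_actions(kg, "Device/App C", "link_down")` returns both commands routed through Controller B. A production system would use a graph database and typed entities, but the join is the same idea.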

Page 12

Knowledge & Policy: Real-Time Assistant

Digitalized O&M knowledge cloud: O&M knowledge graph and policy model, delivered through AR intelligent glasses and an intelligent assistant robot.

Remote expertise:

• Network-level assurance

• Network planning

• O&M policy management

• ……

Field technician:

• Hardware operation

• Cable operation

• Status check

• Troubleshooting

• ……

Page 13

Algorithm in Anomaly Finding: Pattern Finding through Log Analysis

Flow: logs → log summary → log mode (pattern) anomaly detection → log exception summary and log exception scenario.

Single-log anomaly detection: the log line itself appears as an exception. One-dimension analysis, such as a power failure.

Multiple-log anomaly detection:

• Violation of normal rules: multi-dimension analysis, such as an inconsistent number of up/down events.

• Violation of sequential rules: sequential analysis of logs with time-series characteristics, such as OSPF logs.
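The three kinds of analysis can be sketched in a few functions. This is an illustration under stated assumptions, not the method behind the slide: the masking regexes, the `interface <name> up|down` log format, the keyword list and the tolerance are hypothetical, and the OSPF check uses only a simplified subset of the neighbor state machine from RFC 2328.

```python
import re
from collections import Counter

# Hypothetical masking rules: variable fields are replaced so that lines
# with the same structure collapse into one template (the "log summary").
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]
_UPDOWN = re.compile(r"interface (\S+) (up|down)", re.IGNORECASE)

# Simplified subset of OSPF neighbor state transitions (RFC 2328); any
# state may also fall back to Down. A real checker needs the full table.
_OSPF_OK = {
    ("Down", "Init"), ("Init", "2-Way"), ("Init", "ExStart"),
    ("2-Way", "ExStart"), ("ExStart", "Exchange"),
    ("Exchange", "Loading"), ("Exchange", "Full"), ("Loading", "Full"),
}


def to_template(line: str) -> str:
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    return line


def summarize(lines):
    """Log summary: count how often each template occurs."""
    return Counter(to_template(line) for line in lines)


def single_log_anomalies(lines, keywords=("power failure",)):
    """One-dimension analysis: a single line is itself the exception.
    Keywords are expected in lower case."""
    return [line for line in lines if any(k in line.lower() for k in keywords)]


def updown_inconsistent(lines, tolerance=1):
    """Multi-dimension analysis: flag interfaces whose down and up counts
    differ by more than `tolerance` (one outstanding down is allowed)."""
    counts = Counter()
    for line in lines:
        m = _UPDOWN.search(line)
        if m:
            counts[(m.group(1), m.group(2).lower())] += 1
    ifaces = {iface for iface, _ in counts}
    return sorted(i for i in ifaces
                  if abs(counts[(i, "down")] - counts[(i, "up")]) > tolerance)


def ospf_sequence_violations(states):
    """Sequential analysis: return neighbor state transitions not allowed."""
    return [(a, b) for a, b in zip(states, states[1:])
            if not (b == a or b == "Down" or (a, b) in _OSPF_OK)]
```

Template masking is the simplest form of log summarization; tools built for this, such as clustering-based template miners, handle varied formats more robustly. The sequence check shows why OSPF logs suit sequential analysis: a jump such as `Down` → `Full` is only visible when consecutive lines are considered together.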

Page 14

Algorithm in Fault Prediction: Time Series Algorithms Used in KPI Analysis

Goal: find abnormalities in advance, in the prediction area between normal status and the moment an alarm appears.

Training: historical KPI data → rules filter (normal/abnormal) → expert annotation → training the deterioration model.

Inference: real-time KPI data → deterioration model → result.

Uncertain samples are fed back, and the model is updated.

Overall: raw data → patterns → result.
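Two pieces of this flow can be sketched simply: the rules filter that routes borderline samples to experts, and a deterioration check that predicts an alarm before it appears. The sketch swaps in least-squares trend extrapolation as the deterioration model; the deck does not say which time series algorithm it uses. The thresholds and the assumption that a higher KPI value is worse are hypothetical.

```python
from typing import Optional, Sequence


def rule_label(value: float, normal_below: float, abnormal_above: float) -> str:
    """Rules filter: clear cases are labelled directly; the band in between
    is 'uncertain' and would go to expert annotation (the feedback loop).
    Assumes a higher KPI value is worse, e.g. a drop rate."""
    if value < normal_below:
        return "normal"
    if value > abnormal_above:
        return "abnormal"
    return "uncertain"


def steps_until_alarm(history: Sequence[float], threshold: float,
                      horizon: int) -> Optional[int]:
    """Fit a least-squares line to recent KPI samples and extrapolate.
    Returns how many steps ahead the KPI is predicted to reach `threshold`
    (the prediction area before the alarm appears), or None."""
    n = len(history)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    for k in range(1, horizon + 1):
        if intercept + slope * (n - 1 + k) >= threshold:
            return k
    return None
```

For a KPI rising by one unit per interval from 10 to 14, `steps_until_alarm([10, 11, 12, 13, 14], threshold=20, horizon=10)` predicts the threshold is reached 6 intervals later. A linear trend is only a baseline; seasonal KPIs would need a model that accounts for daily patterns.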

Page 15

Algorithm in Precise Alarm: Alarm Reduction and Root Alarm Recommendation through Alarm Analysis

Two techniques are used: time-series dynamic adaptive technology, and root cause diagnosis based on frequent item mining and random forest.

The “strobe and oscillation detection” mechanism automatically identifies whether the rules need to be tuned.

Main flow of the dynamic algorithm:

Alarms → object type separation → alarm sequence segmentation → time correlation matrix and object correlation suppression matrix → time-domain merging and inflection point fitting → rule application → effect verification → strobe and oscillation detection → tune needed / no need to tune (END).

Key point: monthly aging recalculation, so the rules renew themselves.

Time and object association matrix (example values; columns 1 to 30):

         1     2     3   …    30
ALM1   435   546   578   …   643
ALM2   528   634   697   …   724
ALM3   261   325   365   …   471
ALM4   142   267   375   …   794
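Time-domain merging and oscillation detection can be illustrated on the raise/clear events of one alarm on one object. This is a minimal sketch, not the deck's algorithm: it swaps in a fixed merge gap and a sliding-window raise count, and the parameter names `merge_gap`, `window` and `max_raises` are hypothetical. Effect verification is modelled as checking whether the merged episodes still oscillate.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, "raise" | "clear")


def merge_episodes(events: List[Event], merge_gap: float):
    """Time-domain merge: turn raise/clear events into (start, end) episodes,
    joining an episode to the previous one if it restarts within merge_gap.
    An episode still active at the end gets end=None."""
    episodes: List[Tuple[float, Optional[float]]] = []
    start = None
    for t, kind in sorted(events):
        if kind == "raise" and start is None:
            start = t
        elif kind == "clear" and start is not None:
            if episodes and start - episodes[-1][1] <= merge_gap:
                episodes[-1] = (episodes[-1][0], t)
            else:
                episodes.append((start, t))
            start = None
    if start is not None:
        if episodes and start - episodes[-1][1] <= merge_gap:
            episodes[-1] = (episodes[-1][0], None)
        else:
            episodes.append((start, None))
    return episodes


def is_oscillating(events: List[Event], window: float, max_raises: int) -> bool:
    """Oscillation detection: more than max_raises raises inside any
    sliding window of `window` seconds."""
    raises = sorted(t for t, kind in events if kind == "raise")
    left = 0
    for right, t in enumerate(raises):
        while t - raises[left] > window:
            left += 1
        if right - left + 1 > max_raises:
            return True
    return False


def tune_needed(events, merge_gap, window, max_raises) -> bool:
    """Effect verification: do the merged episodes still oscillate? If so,
    the rule parameters (e.g. merge_gap) need tuning."""
    merged = [(start, "raise") for start, _ in merge_episodes(events, merge_gap)]
    return is_oscillating(merged, window, max_raises)
```

An alarm that raises every 10 seconds and clears 2 seconds later stays at four separate episodes with `merge_gap=5`, so `tune_needed` reports that tuning is required; with `merge_gap=10` the four collapse into one episode and no tuning is needed. The monthly aging step on the slide would correspond to recomputing such parameters from fresh data.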

Key algorithms: alarm correlation mining based on frequent item mining, and root cause mining based on random forest.

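Alarm sequence within a time window: A C A B C C B D. Alarms are classified as relevant or irrelevant; in the original figure, a common color marks alarms with common parameters.

Mined alarm correlations:

ALM1   ALM2   Relativity
A      B      0.7
B      C      0.98
B      D      0.82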
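Frequent item mining on alarms can be sketched by treating each time window as a transaction and counting alarm pairs that co-occur. The deck does not define its relativity score, so the sketch assumes one: co-occurrences divided by the occurrences of the rarer alarm, which equals the larger of the two association-rule confidences A→B and B→A. The window size and thresholds are hypothetical, and only 2-itemsets are mined.

```python
from collections import Counter
from itertools import combinations


def window_itemsets(events, window):
    """Group (timestamp, alarm) events into fixed time windows, giving one
    set of alarm names per window (the transactions for itemset mining)."""
    buckets = {}
    for t, alarm in events:
        buckets.setdefault(t // window, set()).add(alarm)
    return [buckets[k] for k in sorted(buckets)]


def correlated_pairs(itemsets, min_support, min_score):
    """Frequent 2-itemset mining. Score = co-occurrences / occurrences of the
    rarer alarm, i.e. the larger of the two rule confidences A->B, B->A."""
    single, pair = Counter(), Counter()
    for items in itemsets:
        single.update(items)
        pair.update(combinations(sorted(items), 2))
    result = {}
    for (a, b), co in pair.items():
        if co < min_support:
            continue  # not a frequent itemset
        score = co / min(single[a], single[b])
        if score >= min_score:
            result[(a, b)] = score
    return result
```

With a 10-second window, a stream in which B and C keep appearing together yields the pair `("B", "C")` with a high score, while pairs seen in only one window are dropped by `min_support`. Apriori or FP-Growth generalize this to larger itemsets.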

Based on the known alarm causality, the method infers the causality among managed objects (RRU, board, cell) and among systems (running system, hardware system, signaling system), and then obtains the root cause alarm with a random forest algorithm.

[Figure: random forest. Subsets of the X dataset with N1 to N4 features (alarm type; running, hardware and signaling system) train Tree #1 to Tree #4, which output classes C, D, B and C; majority voting gives the final class.]
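The random forest step (bootstrap samples, a random feature subset per tree, majority voting) can be shown with a toy implementation. To stay short and dependency-free, each tree here is a one-level categorical stump rather than a full decision tree, and the feature names and class labels are hypothetical; a real system would more likely use a library implementation such as scikit-learn's `RandomForestClassifier`. The causality inference that precedes classification is not covered.

```python
import random
from collections import Counter


def train_stump(rows, labels, features):
    """One-level categorical tree: choose the feature whose
    value -> majority-label mapping makes the fewest training errors."""
    best = None
    for f in features:
        by_value = {}
        for row, y in zip(rows, labels):
            by_value.setdefault(row[f], Counter())[y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(y != rule[row[f]] for row, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, f, rule)
    _, feature, rule = best
    fallback = Counter(labels).most_common(1)[0][0]  # for unseen values
    return lambda row: rule.get(row[feature], fallback)


def train_forest(rows, labels, n_trees=4, n_features=2, seed=0):
    """Bagging plus a random feature subset per tree, as in a random forest."""
    rng = random.Random(seed)
    names = sorted(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feats = rng.sample(names, n_features)           # random features
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority voting over the trees' classes."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]
```

As in the figure, four trees vote; if they output C, D, B and C, `predict` returns C. Using shallow stumps keeps the example readable but gives up the interaction effects that deeper trees capture.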

Page 16

Summary

• AI can help a lot as an O&M assistant, but at the current stage it cannot do everything.

• Model, knowledge/policy and algorithm are the key factors.

• AIOps needs to be scenario oriented; there is no single approach for all.

• AIOps is long-term work, with different targets and emphases at different stages.

• AIOps and zero-touch do not mean no human intervention; humans will play a different role.

Page 17

Vision and Mission

Bring digital to every person, home and organization for a fully connected, intelligent world.

