Rethinking Network Management:
Models, Data-Mining and Self-Learning
Stefan Wallin
The Thesis
2
What is Network Management?
Alarms
Service Status
Trouble-shoot
Configure Service
Configure Device
Control workflow with trouble-tickets
3
What is Network Management?
Problems?
• Alarm Monitoring
• Service Management
   - Monitor
   - Configure
4
Main Thesis
• Use domain-specific languages to specify alarm and service models
   - Explicit knowledge
   - Text-based representation
• Use data-mining and self-learning to capture “hard-to-model” things
   - Tacit knowledge
5
Research Structure
[Diagram: Service Models (Service Type, Component, Device Type, Status Calculation, Configuration Changes, Constraints); Alarm Models (Alarm Type, Causality, Constraints); Data-Mining; Self-Learning]
6
Problems and Contributions
• Defined a Domain-Specific Language BASS for specifying alarm models
   - Model Quality
   - Automatic Correlation
• Data-Mining and Self-Learning to assign alarm severity levels
• Domain-Specific Languages for Service Management
   - Defined SALmon for monitoring
   - Test of IETF YANG for Service Configuration
[Diagram: Service Models (Service Type, Component, Device Type, Status Calculation, Configuration Changes, Constraints); Alarm Models (Alarm Type, Causality, Constraints); Data-Mining; Self-Learning]
7
Attacking the Problems
[Diagram labels: me; Challenges; Solutions; Validations; Service Providers; Equipment Vendors; Journals; Conferences]
Computer Science specialists from
• LTU
• Data Ductus
• Tail-f
• YALTS
8
Publication Overview
Conferences/Workshops
• IFIP ManWeek
• IEEE IM
• IEEE NOMS
• Usenix LISA
• IEEE AINA TeNAS
• IEEE SOSE
Journals
• IEEE IT Professional
• Springer Journal of Network and Systems Management
• John Wiley & Sons International Journal of Network Management
• Inderscience International Journal of Business Intelligence and Data-Mining
• Springer Telecommunications Systems
9
Contents
The Alarm Problem
Alarm Solutions
   - BASS
   - Alarm prioritization
The Service Management Problem
Service Management Solutions
   - Monitoring with SALmon
   - Configuring with IETF YANG
Problems? – Input from Service Providers
Conclusions and Future Work
Acknowledgements
Problems?
Input from Service Providers
11
Coming Changes
20 Operators
12
Research Efforts
20 Operators
The Alarm Problem
14
Alarm Chain
Managed System: Resource States, Alarms
Management System: Estimated Alarms, Estimated Resource States ?
Alarm Notifications: Alarm Type, Resource, Severity, Raise / Clear, Text
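
To make the chain concrete, here is a minimal Python sketch of how a management system might estimate the set of active alarms from a stream of alarm notifications. The field names follow the slide; the `AlarmNotification` class, the severity strings, the sample alarm types, and the `estimate_alarms` helper are illustrative assumptions, not part of any system described in the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmNotification:
    """One notification, with the fields listed on the slide."""
    alarm_type: str
    resource: str
    severity: str   # hypothetical values, e.g. "minor", "major"
    raised: bool    # True = raise, False = clear
    text: str


def estimate_alarms(notifications):
    """Return the estimated active alarms, keyed by (resource, alarm_type).

    A raise stores (or overwrites) the alarm; a clear removes it.
    """
    active = {}
    for n in notifications:
        key = (n.resource, n.alarm_type)
        if n.raised:
            active[key] = n
        else:
            active.pop(key, None)
    return active


# Made-up notification stream, including one repeated raise.
stream = [
    AlarmNotification("linkDown", "router1/eth0", "major", True, "Link down"),
    AlarmNotification("linkDown", "router1/eth0", "major", True, "Link down"),
    AlarmNotification("highTemp", "router1", "minor", True, "Temperature high"),
    AlarmNotification("linkDown", "router1/eth0", "major", False, "Link up"),
]
active = estimate_alarms(stream)
print(sorted(active))
```

Keying on resource and alarm type means a repeated raise updates one entry instead of adding a new one, so the estimate stays a state rather than a growing event list.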
15
The Alarm Problem
”Most network elements […] does not have the notion of an alarm state. Devices emit event notifications whenever an implementor thought this is a good idea”
”[around] 40% percent of the alarms are considered to be redundant as many alarms appear at the same time for one ’fault’. Many alarms are also repeated [...]. One alarm had for example appeared 65000 times in today’s browser. Correlation is hardly used even if it supported by the systems, [current correlation level is] 1-2 % maybe.”
16
The Alarm Problem
• Too many
   - > 1 / sec
   - Which ones are relevant?
   - Several alarms for the same fault
• Wrong severity levels
• Interpreting meaning and impact
17
Interpreting an Alarm
*A0628/546 /08-07-01/10 H 38/ N=0407/TYP=ICT/CAT=SI /EVENT=DAL/NCEN=AMS1 /AM=SMTA7/AGEO=S1-TR03-B06-A085-R000 /TEXAL=IND RECEPTION/COMPL.INF: /AF=URMA7/ICTQ7 AGCA=S1-TR03-B06-A085-R117/DAT=08-07-01/HRS=10-38-14 /AMET=07-020-01 /AFLR=175-011/PLS/CRC=NACT /NSAE=186/NSGE=186/NIND=14/INDI=956/NSDT=0
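
As a rough illustration of why such text is hard to use, the Python sketch below splits an abbreviated copy of the alarm above into key/value pairs. Reading the text as slash-separated `KEY=VALUE` segments is an assumption made for this example, and `split_fields` is a hypothetical helper. The syntax can be taken apart mechanically, but nothing in the result says what `TYP=ICT` or `CRC=NACT` means or how serious the alarm is.

```python
def split_fields(raw):
    """Split a slash-separated alarm text into KEY=VALUE pairs.

    Segments without '=' (positional values such as dates) are ignored.
    """
    fields = {}
    for segment in raw.split("/"):
        if "=" not in segment:
            continue
        left, value = segment.split("=", 1)
        words = left.split()
        if not words:
            continue                      # no key before '='
        fields[words[-1]] = value.strip()  # drop stray text before the key
    return fields


# Abbreviated copy of the alarm text shown on the slide.
raw = ("*A0628/546 /08-07-01/10 H 38/ N=0407/TYP=ICT/CAT=SI "
       "/EVENT=DAL/NCEN=AMS1 /AM=SMTA7/TEXAL=IND RECEPTION"
       "/DAT=08-07-01/HRS=10-38-14 /CRC=NACT /NSDT=0")
fields = split_fields(raw)
print(fields["TYP"], fields["TEXAL"])
```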
18
Confusing Alarm Severity
Original Severity from Device
Priority set by Operator
19
Hard-to-Manage Severity Distribution
Hollifield, B., Habibi, E.: The Alarm Management Handbook
20
Alarm Type Distribution
26
90%
…3500
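
The figure itself is not reproduced here, but its labels suggest a strongly skewed distribution in which a small number of alarm types accounts for most alarms. Under that assumption, the Python sketch below shows one way to measure the skew in an alarm log. The log contents and the `types_covering` helper are made up for illustration.

```python
from collections import Counter


def types_covering(alarm_types, percent):
    """Return how many of the most frequent alarm types are needed to
    cover at least `percent` (0..100) of all alarms in the log."""
    counts = Counter(alarm_types)
    total = sum(counts.values())
    covered = 0
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered * 100 >= percent * total:
            return rank
    return len(counts)


# Made-up alarm log in which one alarm type dominates.
log = ["linkDown"] * 90 + ["highTemp"] * 6 + ["fanFailure"] * 3 + ["doorOpen"]
print(types_covering(log, 90))
```

Integer arithmetic is used for the threshold so that boundary cases such as exactly 90 % do not depend on floating-point rounding.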
21
Alarm Monitoring
Domain-Specific Models
Modeling Alarms – Enable Automation and Increase Quality
22
Research Structure
[Diagram: Service Models (Service Type, Component, Device Type, Status Calculation, Configuration Changes, Constraints); Alarm Models (Alarm Type, Causality, Constraints); Data-Mining; Self-Learning]
23
Alarms Today
We have:
• Alarm interface standards
   - Envelope, the parameters
• Alarm documentation
   - Informal documents for humans
What we do not have:
• Formal alarm definitions that can be used for automation
   - The contents of the envelope
   - “Alarm Model”
24
Alarm Model
BASS
• Alarm Types
• Predicates
• Constraints
   - Information
   - Semantic
25
BASS
26
BASS Prototype and Validation
[Diagram labels: Alarm DB from Real Operator; Alarm Doc from Real Vendor; Alarms; Uncorrelated; .alarm; BASS; Correlation Rules; Information Constraints; Semantic Constraints; Feedback; Documentation; Graphs; Correlated]
27
Semantic Constraints
173 warnings in approved and released alarm interface
28
Information Constraints to Automate Correlation
Automatic identification of root-cause candidates
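
The slides do not show BASS syntax, so the following is only a Python sketch of the general idea: if an alarm model states which alarm types can cause which, root-cause candidates among the active alarms can be found mechanically. The `caused_by` mapping, the alarm type names, and `root_cause_candidates` are hypothetical stand-ins and do not reproduce the BASS constraints.

```python
def root_cause_candidates(active, caused_by):
    """Return active alarm types that no other active alarm explains.

    `caused_by` maps an alarm type to the set of alarm types that can
    cause it (a stand-in for causality stated in an alarm model).
    """
    active = set(active)
    candidates = set()
    for alarm in active:
        causes = caused_by.get(alarm, set())
        if not (causes & active):
            candidates.add(alarm)
    return candidates


# Made-up causality information for three alarm types.
caused_by = {
    "serviceUnavailable": {"linkDown", "cardFailure"},
    "linkDown": {"cardFailure"},
}
active = ["serviceUnavailable", "linkDown", "cardFailure"]
print(root_cause_candidates(active, caused_by))
```

This simplified version looks only at alarm types; a real correlation would also have to check that the alarms concern related resources.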
29
Alarm Monitoring
Data-Mining and Self-Learning
Assigning Correct Severity Levels by Learning from Experts
30
Research Structure
[Diagram: Service Models (Service Type, Component, Device Type, Status Calculation, Configuration Changes, Constraints); Alarm Models (Alarm Type, Causality, Constraints); Data-Mining; Self-Learning]
31
Learning Alarm Priorities
[Diagram labels: Alarm System; Trouble Ticket System; Analyse; Assign Prio; Training; Neural Network; Alarm Prio; Suggest Priority; Priority; Databases from Real Service Provider]
32
Result
• Neural network correct in 53 %
• Original severity correct in 11 %
[Figure: Distribution of Errors. Series: Original severity, Neural network. X-axis: Magnitude of Error (Too high / Too low). Y-axis: Percentage of Alarms]
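
Figures of this kind can be computed by comparing each suggested priority with the priority the expert assigned. The Python sketch below is one possible way to do that, assuming priorities are encoded as integers where a larger number means more urgent. The `evaluate` helper and the sample data are made up and are not the evaluation code or data behind the percentages above.

```python
from collections import Counter


def evaluate(suggested, expert):
    """Compare suggested priorities with expert-assigned ones.

    Both arguments are equal-length lists of integer priority levels,
    where a larger number means more urgent (an assumed encoding).
    Returns (share_correct, error_histogram); the histogram maps
    suggested - expert to a count, so positive keys mean "too high".
    """
    if len(suggested) != len(expert) or not expert:
        raise ValueError("need two non-empty lists of equal length")
    errors = Counter(s - e for s, e in zip(suggested, expert))
    return errors[0] / len(expert), errors


# Made-up priorities for four alarms.
expert = [1, 2, 3, 2]
suggested = [1, 3, 3, 1]
share, errors = evaluate(suggested, expert)
print(share, dict(errors))
```

Running the same function once with the neural network's suggestions and once with the original device severities gives two shares and two histograms that can be compared side by side.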
33
The Service Management Problems
34
Service Management
”Services are not currently managed well in any suite of applications and require a tremendous amount of work to maintain”
”Service models are becoming more and more important”
”Focus on service management - bringing this up to 40% from [the] current level of 5-10%”
”Managing services must be the focus of the future development, while pushing network management into a supporting role”
35
Complex Structures
[Diagram labels: Software Implementation; Interpretations and Tedious Mappings; “Service Models”; Configuration; Monitoring]
Solutions
Service Modeling and Service Status Calculation
37
Research Structure
[Diagram: Service Models (Service Type, Component, Device Type, Status Calculation, Configuration Changes, Constraints); Alarm Models (Alarm Type, Causality, Constraints); Data-Mining; Self-Learning]
38
My Two Tracks for Service Management
[Diagram: two tracks, SALmon for Status Calculation and IETF YANG for Configuration Changes, across Service Type, Component, and Device Type]
1 Model the Services
2 Express the transformations
39
Simplified Structures
Remove room for interpretations and automate mappings
[Diagram labels: Models; Configuration; Monitoring; Models]
40
SALmon Example
Broadband Forum TR-126 Triple Play QoE Requirements
41
SALmon Test
• The TR-126 model could be executed
• Compact complete model
• Easy to change in one place
SLA and Service monitor UI
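
SALmon syntax is not shown on these slides, so the following Python sketch only illustrates the general idea of calculating service status from a model. The `Component` class, the three status levels, and the "worst status wins" rule are assumptions chosen for this example and are not SALmon semantics.

```python
from dataclasses import dataclass, field

# Hypothetical status levels, ordered from best to worst.
LEVELS = ["ok", "degraded", "down"]


@dataclass
class Component:
    """A node in a service model: a service, a component, or a device."""
    name: str
    own_status: str = "ok"
    children: list = field(default_factory=list)

    def status(self):
        """Worst status among this node and everything below it."""
        statuses = [self.own_status] + [c.status() for c in self.children]
        return max(statuses, key=LEVELS.index)


# Made-up service tree.
stb = Component("set-top box")
access = Component("access link", own_status="degraded")
iptv = Component("IPTV", children=[stb, access])
triple_play = Component("triple play", children=[iptv, Component("VoIP")])
print(triple_play.status())
```

Because the aggregation rule lives in a single method, changing how status propagates means changing one place in the model rather than every service that uses it.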
42
My Two Tracks for Service Management
[Diagram: two tracks, SALmon for Status Calculation and IETF YANG for Configuration Changes, across Service Type, Component, and Device Type]
1 Model the Services
2 Express the transformations
Released 2010
43
Service Configuration and Activation
IETF
• Defined YANG as data-modeling language for managing devices
• “Replacing SNMP MIBs”
Thesis:
• YANG can be used to model services, not only devices
• Service Configuration as a YANG – YANG transform
Work:
• Service Modeling projects at service providers
• Service Activation product, Tail-f NCS
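
The idea of service configuration as a transform from a service model to device models can be sketched in Python as below. Plain dictionaries stand in for data that would conform to YANG models; the VPN service, its keys, and `vpn_to_device_config` are invented for illustration and do not reproduce the mapping used in the thesis or in Tail-f NCS.

```python
def vpn_to_device_config(service):
    """Expand one service instance into per-device configuration.

    `service` is a plain dict standing in for service-model data; the
    result maps device name to a dict standing in for device-model
    data. All keys are made up.
    """
    config = {}
    for endpoint in service["endpoints"]:
        device = config.setdefault(endpoint["device"], {"interfaces": []})
        device["interfaces"].append({
            "name": endpoint["interface"],
            "vlan": service["vlan"],
            "description": "vpn " + service["name"],
        })
    return config


# Made-up service instance with two endpoints.
service = {
    "name": "acme",
    "vlan": 100,
    "endpoints": [
        {"device": "pe1", "interface": "eth0"},
        {"device": "pe2", "interface": "eth3"},
    ],
}
print(vpn_to_device_config(service))
```

The operator edits only the small service instance; the per-device details are derived from it, which removes the manual mapping step between the two levels.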
44
SALmon and YANG
Aspect | SALmon | IETF YANG | Comment
Model Structure | Object Oriented | Tree | Tree structures more suited for rendering
Purpose | Operational Data | Configuration Data and Operational Data |
Time-Series | | |
Calculations | Functional | - | YANG to YANG mapping in Java for imperative configuration; XPATH possible to express aggregation
Constraints | - | XPATH |
Conclusions and Future Work
46
Conclusions
For Research
• Closer cooperation with equipment and service providers
• Network management is in need of computer science
For Network Equipment Providers
• Provide models (in a form) that can be used for automation
• Interface quality
For Service Providers
• Model the offered services
• Knowledge management
   - Overcome current practice of incomplete illustrations and free-form documents
47
Future Work
• SALmon features represented in YANG
   - Language extensions or as models
   - Time-series
   - Functional calculations
   - XPATH
   - Database representation
• Imperative activation as part of the model?
• More knowledge management by using data-mining and self-learning
[Diagram: Service Models (Service Type, Component, Device Type, Status Calculation, Configuration Changes, Constraints); Alarm Models (Alarm Type, Causality, Constraints); Data-Mining; Self-Learning]
48
Errata
Paper C: Says trivial approach is correct in 17 % of the cases. Should be 11 %.
Section 2: Wrong “T”, should be:
49
Thank You !
Mikael Börjesson
Jörgen Öfjell, Johan Ehnmark, Andreas Jonsson, Ulrik Forsgren, Magnus Karlsson, Leif Landén
Christer Åhlund, Johan Nordlander, Viktor Leijon, Robert Brännström, Karl Andersson, Daniel Granlund, Dan Johansson
Klacke Wikström, Håkan Millroth, Martin Björklund, Seb Strollo, Johan Bevemyr, Joakim Grebenö, Chris Williams
Equipment Vendors and Service Providers: Test Data
Nicklas Bystedt
Sidath Handurukande, EU Funded Magneto Project
50
?