
Large Scale Anomaly Detection in Data Center Logs and Metrics

Date post: 29-May-2020
Large Scale Anomaly Detection in Data Center Logs and Metrics Rafael Martínez
Page 1

Large Scale Anomaly Detection in Data Center Logs and Metrics

Rafael Martínez

Page 2

SACBD@ECSA2018

Gradiant, ICT technology centre in Spain

Since 2008, focused on technological development and knowledge transfer to industry

• +100 professionals

• 5.2 M€ revenue in 2017

• 54% contracted companies

• 46% competitive public funding

• 12 European projects

Page 3

Focus on…

Page 4

Focus on… Connectivity · Intelligence · Security

Page 5

• Large amounts of data

• Data flowing 24×7×365

• Logs/events & metrics

• Goal: Keep systems OK

Problem statement

Page 6

ML to the rescue!

Anomaly Detection

Page 7

1. Anomaly Detection in Metrics

2. Anomaly Detection in Logs

Anomaly Detection

Page 8

1. Anomaly Detection in Metrics

2. Anomaly Detection in Logs

Anomaly Detection

Requirements

• Run on endless data streams

• Provide results in real-time

Page 9

1. Anomaly Detection in Metrics

• Discord Discovery

2. Anomaly Detection in Logs

• LogScore

Anomaly Detection

Requirements

• Run on endless data streams

• Provide results in real-time

Page 10

Anomaly Detection in Metrics

Discord Discovery

Page 11

Anomaly Detection in Metrics

Discord Discovery

Improvements on the original batch algorithm

• Heuristics to reduce O(n²) towards O(n)

  • Early abandon

  • Randomization

  • Denoise filtering
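
The heuristics listed above can be illustrated with a minimal Python sketch. It assumes the usual definition of a discord (the subsequence whose nearest non-overlapping neighbour is farthest away) and shows only early abandon and randomized search order. The slides do not give the actual implementation, so the names `find_discord` and `znorm`, the z-normalization step and the Euclidean distance are assumptions made for illustration. Denoise filtering is not shown.

```python
import math
import random


def znorm(window):
    """Z-normalize a window so that only its shape is compared (assumed step)."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std < 1e-12:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_discord(series, m, seed=0):
    """Return (start_index, distance) of the top discord of length m.

    Hypothetical helper: the worst case is still O(n^2) comparisons, but
    early abandon usually cuts the inner loop short.
    """
    rng = random.Random(seed)
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best_idx, best_dist = -1, -1.0

    outer = list(range(len(windows)))
    rng.shuffle(outer)  # randomization: visit candidates in random order
    for i in outer:
        inner = list(range(len(windows)))
        rng.shuffle(inner)  # randomization: a close match tends to show up early
        nearest = math.inf
        for j in inner:
            if abs(i - j) < m:
                continue  # skip overlapping (trivial) matches
            nearest = min(nearest, euclidean(windows[i], windows[j]))
            if nearest < best_dist:
                break  # early abandon: candidate i cannot beat the best so far
        if math.isfinite(nearest) and nearest > best_dist:
            best_idx, best_dist = i, nearest
    return best_idx, best_dist
```

Called on a periodic signal with a short flat segment spliced in, `find_discord(series, m)` should return a start index whose window overlaps that segment, because every other window has a near-identical copy one period away.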

Page 12

Anomaly Detection in Metrics

Discord Discovery

Improvements on the original batch algorithm

• Heuristics to reduce O(n²) towards O(n)

• Fully streaming operation

• Supports input chunks of any size
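
One way to accept input chunks of any size is to decouple chunk boundaries from window boundaries with a small buffer. The sketch below is an illustration only: the class name `SlidingWindowBuffer` and its `push` method are hypothetical and are not taken from the presented system.

```python
from collections import deque


class SlidingWindowBuffer:
    """Accepts chunks of any size and yields every complete window of length m.

    Hypothetical helper: only the last m values are kept in memory, so it can
    run on an endless stream.
    """

    def __init__(self, m):
        if m <= 0:
            raise ValueError("window length must be positive")
        self.m = m
        self._buf = deque(maxlen=m)  # old values fall off automatically
        self._seen = 0               # total number of values received so far

    def push(self, chunk):
        """Generator: feed one chunk, get (start_index, window) pairs back.

        The chunk may be empty, shorter than m, or much longer than m.
        The result must be iterated for the values to be consumed.
        """
        for value in chunk:
            self._buf.append(value)
            self._seen += 1
            if len(self._buf) == self.m:
                yield self._seen - self.m, list(self._buf)
```

The windows produced depend only on the order of the values, not on how the stream was cut into chunks, so a downstream detector sees the same input whether data arrives one sample at a time or in large batches.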

Page 13

Anomaly Detection in Logs

LogScore

Page 14

Anomaly Detection in Logs

LogScore

Page 15

Anomaly Detection in Logs

LogScore

No satisfactory results from single runs

• Too few representative clusters

• Too many outliers

Page 16

Anomaly Detection in Logs

LogScore

spec = freq_words / (freq_words + wildcards)

high spec => more informative patterns

low spec => fewer outliers (bigger clusters)

spec = 0.6
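
The spec formula above can be worked through in a few lines of Python. This is a sketch under stated assumptions: a cluster pattern is taken to be a whitespace-separated string in which `*` marks a wildcard position, and 0.6 is treated as a minimum acceptable value. Neither the pattern syntax nor the role of 0.6 as a threshold is stated in the slides, and the names `specificity` and `keep_informative` are hypothetical.

```python
WILDCARD = "*"  # assumed wildcard marker, not taken from the slides


def specificity(pattern):
    """spec = freq_words / (freq_words + wildcards) for one pattern string."""
    tokens = pattern.split()
    if not tokens:
        return 0.0
    wildcards = sum(1 for token in tokens if token == WILDCARD)
    freq_words = len(tokens) - wildcards
    return freq_words / (freq_words + wildcards)


def keep_informative(patterns, min_spec=0.6):
    """Keep patterns whose spec reaches min_spec (threshold use is an assumption)."""
    return [p for p in patterns if specificity(p) >= min_spec]
```

For example, `"Connection from * closed after * seconds"` has 5 frequent words and 2 wildcards, so spec = 5/7, which is above 0.6. `"* * error * *"` has spec = 1/5 and would be dropped by `keep_informative`.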

Page 17

Anomaly Detection in Logs

LogScore

Page 18

Anomaly Detection in Logs

LogScore

Page 19

1. Anomaly Detection in Metrics

• Discord Discovery

2. Anomaly Detection in Logs

• LogScore

Anomaly Detection

Both algorithms

• Learn normality from past data

• Unsupervised operation

Page 20

Functional architecture

Page 21

Deployment

Page 22

Tests and results

• 4 months of data (52 GB, 121 machines, 306M log events)

• Not only detects strange events, but also provides a summary suitable for human supervision

• Good perceived performance

Page 23

Conclusions

• High-value solution with hard requirements

• Fully scalable in large Big Data environments

• One company already pushing these technologies into the market

Page 24

Thank you!

Rafael P. Martínez-Álvarez, PhD

(+34) 986 120 430 | www.gradiant.org

