Detecting Large-Scale System Problems by Mining Console Logs

Date post: 04-Feb-2016
Page 1: Detecting Large-Scale System Problems by Mining Console Logs

Detecting Large-Scale System Problems by Mining Console Logs

Authors: Wei Xu*, Ling Huang†, Armando Fox*, David Patterson*, Michael Jordan*
Conference: ICML 2010, ACM SOSP 2009
Advisor: Yuh-Jye Lee
Reporter: Yi-Hsiang Yang
Email: [email protected]

Page 2: Detecting Large-Scale System Problems by Mining Console Logs

Outline

• Introduction
• Methodology
• Evaluation and Visualization
• Conclusion

Page 3: Detecting Large-Scale System Problems by Mining Console Logs

Introduction

• What information do console logs hold?
  - Console logs rarely help in large-scale datacenter services
  - Operational problems depend on the deployment and runtime environment
  - A typical console log is much more structured than it appears
• Anomaly detection
  - Unusual log messages often indicate the source of the problem

Page 4: Detecting Large-Scale System Problems by Mining Console Logs

Workflow
• Log parsing
  - Convert each log message from unstructured text to a data structure
• Feature creation
  - Construct the state ratio vector and the message count vector features
• Anomaly detection
  - Principal Component Analysis (PCA)-based anomaly detection method
• Visualization
  - Decision tree

Page 5: Detecting Large-Scale System Problems by Mining Console Logs

Workflow


Page 6: Detecting Large-Scale System Problems by Mining Console Logs

Log Parsing with Source Code
• Difficulty: templatizing log messages automatically
  - C language: fprintf(LOG, "starting: xact %d is %s")
  - Java: CLog.info("starting: " + txn)
• It is not easy to distinguish variables and states
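To make the templatizing problem concrete, here is a minimal Python sketch (not the authors' parser) that turns the C format string from the slide into a regular expression and extracts the variable fields from one log line. The helper name `template_to_regex`, the restriction to `%d` and `%s`, and the sample message are assumptions made for illustration.

```python
import re

def template_to_regex(fmt: str) -> re.Pattern:
    """Turn a printf-style format string into a regex with one group per specifier."""
    # Assumption: only %d and %s are handled in this sketch.
    specs = {"%d": r"(-?\d+)", "%s": r"(\S+)"}
    # Splitting on a capturing group keeps the specifiers in the result list.
    parts = re.split(r"(%[ds])", fmt)
    pattern = "".join(specs.get(part, re.escape(part)) for part in parts)
    return re.compile("^" + pattern + "$")

# Format string taken from the slide; the log line itself is made up.
regex = template_to_regex("starting: xact %d is %s")
match = regex.match("starting: xact 325 is COMMITTING")
fields = match.groups() if match else None
print(fields)
```

The Java case is harder because the message is built by string concatenation, so there is no format string to read the variable positions from.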

Page 7: Detecting Large-Scale System Problems by Mining Console Logs

Parsing Approach-Source Code
• Generate the source code's abstract syntax tree (AST)
• Use the AST to identify all method calls on objects of the logger classes (or their subclasses)
• Deduce the types of variables in message templates
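The AST steps above can be sketched with Python's standard `ast` module. This is only an analogy: the slides describe analysis of Java and C source, and the snippet in `SOURCE`, the set `LOGGER_METHODS`, and the `(.*)` placeholder are all hypothetical choices made for this example. Type deduction for the variables is not attempted here.

```python
import ast

# Hypothetical source code to analyze; it is parsed, never executed.
SOURCE = '''
log.info("starting: " + str(txn))
log.warn("block " + block_id + " received from " + src)
total = count + 1
'''

LOGGER_METHODS = {"info", "warn", "error"}  # assumed logger method names

def flatten(node):
    """Flatten a chain of '+' concatenations into literals and placeholders."""
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return flatten(node.left) + flatten(node.right)
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return [node.value]
    return ["(.*)"]  # any non-literal expression becomes a variable slot

def extract_templates(source):
    """Collect one message template per logger call found in the AST."""
    templates = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in LOGGER_METHODS
                and node.args):
            templates.append("".join(flatten(node.args[0])))
    return templates

templates = extract_templates(SOURCE)
print(templates)
```

The assignment `total = count + 1` is ignored because it is not a call on a logger method, which is the filtering role the AST plays in the second bullet.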

Page 8: Detecting Large-Scale System Problems by Mining Console Logs

Parsing Approach-Source Code


Page 9: Detecting Large-Scale System Problems by Mining Console Logs

Parsing Approach-Log
• Build a reverse index of the message templates with Apache Lucene
• Implement log parsing as a Hadoop map-reduce job
  - Replicate the index to every node and partition the log
  - The map stage performs the reverse-index search
  - The reduce stage's processing depends on the features to be constructed
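The reverse-index lookup can be illustrated in plain Python. This sketch does not use Lucene or Hadoop; it only shows the idea of ranking candidate templates by shared constant words and then confirming the match with a regular expression, which is roughly the work a single map task would do per log line. The templates, function names, and sample messages are assumptions.

```python
import re
from collections import Counter, defaultdict

# Hypothetical message templates, written as regexes with one group per variable.
TEMPLATES = {
    "starting": r"starting: xact (\d+) is (\S+)",
    "receiving": r"Receiving block (\S+) src: (\S+)",
}

def literal_words(pattern):
    """Return the constant words of a template (capture groups removed)."""
    return set(re.findall(r"[A-Za-z]+", re.sub(r"\(.*?\)", " ", pattern)))

def build_index(templates):
    """Map each constant word to the names of the templates that contain it."""
    index = defaultdict(set)
    for name, pattern in templates.items():
        for word in literal_words(pattern):
            index[word].add(name)
    return index

def parse(message, index, templates):
    """Rank candidate templates by shared words, then confirm with the regex."""
    hits = Counter()
    for word in set(re.findall(r"[A-Za-z]+", message)):
        for name in index.get(word, ()):
            hits[name] += 1
    for name, _ in hits.most_common():
        match = re.fullmatch(templates[name], message)
        if match:
            return name, match.groups()
    return None  # parse failure: no template matches

INDEX = build_index(TEMPLATES)
parsed = parse("starting: xact 325 is COMMITTING", INDEX, TEMPLATES)
print(parsed)
```

Because the index is small and read-only, copying it to every worker and splitting only the log lines keeps each lookup local.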

Page 10: Detecting Large-Scale System Problems by Mining Console Logs

Parsing Approach


Page 11: Detecting Large-Scale System Problems by Mining Console Logs

Feature Creation
• The state ratio vector
  - Each state ratio vector describes a group of state variables in a time window
• The message count vector
  - Each vector dimension corresponds to a different message type
  - The value of a dimension is the number of messages of that type that appear in the message group
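Both features can be sketched in a few lines of Python. The event tuples, the 10-second window, and the function names below are assumptions for illustration; the slides do not specify the window size or the record layout.

```python
from collections import Counter, defaultdict

def state_ratio_vectors(events, states, window_sec):
    """Count how often each state value appears in each time window."""
    windows = defaultdict(Counter)
    for timestamp, state in events:
        windows[timestamp // window_sec][state] += 1
    return {w: [c[s] for s in states] for w, c in sorted(windows.items())}

def message_count_vectors(events, message_types):
    """Count messages of each type within each identifier's message group."""
    groups = defaultdict(Counter)
    for identifier, message_type in events:
        groups[identifier][message_type] += 1
    return {g: [c[t] for t in message_types] for g, c in groups.items()}

# Made-up parsed records: (timestamp in seconds, state value).
state_events = [(1, "COMMITTING"), (4, "COMMITTING"), (7, "ABORTING"), (12, "COMMITTING")]
ratios = state_ratio_vectors(state_events, ["ABORTING", "COMMITTING"], window_sec=10)

# Made-up parsed records: (identifier, message type).
message_events = [("blk_1", "receiving"), ("blk_1", "received"),
                  ("blk_1", "receiving"), ("blk_2", "receiving")]
counts = message_count_vectors(message_events, ["receiving", "received", "deleting"])

print(ratios)
print(counts)
```

In the first window the vector `[1, 2]` encodes one ABORTING and two COMMITTING reports, so the ratio between states can be read directly from the dimensions.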

Page 12: Detecting Large-Scale System Problems by Mining Console Logs


Page 13: Detecting Large-Scale System Problems by Mining Console Logs


Feature Creation-The message count vector

Page 14: Detecting Large-Scale System Problems by Mining Console Logs


Anomaly Detection-Principal Component Analysis (PCA)
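As a rough illustration of PCA-based detection, the following pure-Python sketch keeps a single principal component (found by power iteration) and scores each feature vector by its squared distance from that component, so vectors that break the dominant pattern get a large score. Keeping only one component, the toy data, and the function names are assumptions; how a threshold on the score is chosen is not covered here.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_component(rows, iterations=100):
    """Estimate the first principal direction of mean-centered rows by power iteration."""
    dim = len(rows[0])
    v = [1.0] * dim
    for _ in range(iterations):
        # Multiply v by X^T X (unnormalized covariance) without forming the matrix.
        scores = [dot(r, v) for r in rows]
        v = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(dim)]
        norm = dot(v, v) ** 0.5
        v = [x / norm for x in v]
    return v

def squared_prediction_errors(data):
    """Return each row's squared distance from the first principal direction."""
    n, dim = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(dim)]
    centered = [[row[j] - mean[j] for j in range(dim)] for row in data]
    v = top_component(centered)
    errors = []
    for r in centered:
        proj = dot(r, v)
        residual = [r[j] - proj * v[j] for j in range(dim)]
        errors.append(dot(residual, residual))
    return errors

# Toy data: five rows follow the pattern y = 2x, the last row does not.
DATA = [[1, 2], [2, 4], [3, 6], [4, 8], [5, 10], [3, 1]]
spe = squared_prediction_errors(DATA)
most_anomalous = max(range(len(spe)), key=spe.__getitem__)
print(most_anomalous, spe)
```

The design point is that normal vectors are well explained by the leading components, so the residual captures what is unusual about a vector rather than how large it is.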

Page 15: Detecting Large-Scale System Problems by Mining Console Logs

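• Apply Term Frequency / Inverse Document Frequency (TF-IDF) weighting to the message count vectors
• Replace each entry $y_{i,j}$ with a weighted entry $w_{i,j} \equiv y_{i,j} \log(n / df_j)$, where $n$ is the total number of message groups and $df_j$ is the number of message groups that contain the $j$-th message type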

Anomaly Detection-Principal Component Analysis (PCA)
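The TF-IDF weighting on this slide is simple to write out. In the sketch below, each row of the matrix is one message group's count vector; the function name `tfidf_weight` and the sample counts are assumptions, and columns for message types that never occur are set to zero to avoid dividing by zero.

```python
import math

def tfidf_weight(count_matrix):
    """Weight entry y[i][j] by log(n / df_j), where df_j counts groups containing type j."""
    n = len(count_matrix)
    n_types = len(count_matrix[0])
    df = [sum(1 for row in count_matrix if row[j] > 0) for j in range(n_types)]
    return [
        [row[j] * math.log(n / df[j]) if df[j] else 0.0 for j in range(n_types)]
        for row in count_matrix
    ]

# Made-up message count vectors: 4 message groups, 3 message types.
Y = [
    [2, 1, 0],
    [1, 0, 0],
    [3, 0, 1],
    [1, 0, 0],
]
W = tfidf_weight(Y)
print(W)
```

The first message type appears in every group, so its weight is log(4/4) = 0 and it no longer influences detection, while the rare second and third types are scaled up by log(4).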

Page 16: Detecting Large-Scale System Problems by Mining Console Logs

Evaluation and Visualization

• Logs were collected on Amazon's Elastic Compute Cloud (EC2)
• 203 nodes running HDFS and 1 node running Darkstar

Page 17: Detecting Large-Scale System Problems by Mining Console Logs

Evaluation and Visualization

• Parsing fails when no message template can be found that matches the message and extracts its variables.

Page 18: Detecting Large-Scale System Problems by Mining Console Logs

Evaluation and Visualization

• With 50 nodes, parsing takes less than 3 minutes; with 10 nodes, it takes less than 10 minutes

Page 19: Detecting Large-Scale System Problems by Mining Console Logs

Evaluation and Visualization-Darkstar
• DarkMud
  - Provided by the Darkstar team
  - Emulated 60 user clients in the DarkMud virtual world performing random operations
  - Ran the experiment for 4800 seconds
  - Injected a performance disturbance by capping the CPU from 1400 to 1800 seconds

Page 20: Detecting Large-Scale System Problems by Mining Console Logs

Disturbance by capping the CPU


Page 21: Detecting Large-Scale System Problems by Mining Console Logs

Evaluation and Visualization-Darkstar
• The ratio of ABORTING to COMMITTING messages increases from about 1:2000 to about 1:2
• Darkstar does not adjust its transaction timeout accordingly

Page 22: Detecting Large-Scale System Problems by Mining Console Logs

Evaluation and Visualization-Darkstar

• Each feature vector was augmented with the timestamp of the last message in its group

Page 23: Detecting Large-Scale System Problems by Mining Console Logs

Evaluation and Visualization-Hadoop

Page 24: Detecting Large-Scale System Problems by Mining Console Logs

Evaluation and Visualization-Hadoop

Page 25: Detecting Large-Scale System Problems by Mining Console Logs

Evaluation and Visualization-Hadoop

Page 26: Detecting Large-Scale System Problems by Mining Console Logs

Conclusion

• Using source code as a reference to understand the structure of console logs makes it possible to parse logs accurately

• This opens new opportunities for turning built-in console logs into a powerful monitoring system for problem detection

Page 27: Detecting Large-Scale System Problems by Mining Console Logs

Thanks for your attention
Q&A

