Page 1

LogTree: A Framework for Generating System Events from Raw Textual Logs

Liang Tang and Tao Li

School of Computing and Information Sciences

Florida International University

Miami, 33199, USA

Page 2

Outline

1. Problem Statement

2. Motivation

3. Semi-structural Log Message Clustering

4. Message Segment Table

5. Evaluation

Page 3

Problem Statement (1)

1. System log analysis is widely used for anomaly detection and fault prevention.

2. Many systems generate only textual log messages, and raw textual log messages are difficult to analyze.

Page 4

Problem Statement (2)

1. Most temporal pattern mining algorithms operate on system events. Our goal is to generate such events from system log messages.

Page 5

Problem Statement (3)

1. Traditional solution: write a full log parser.

2. Weaknesses:

   1. Only well-known systems, such as the Apache Web Server and Microsoft IIS, have well-developed log parsers.

   2. Reading the documentation and understanding each type of log message in order to write our own parser is time-consuming.

   3. Much of the documentation is incomplete, or exists only in the developers' heads.

   4. Systems are constantly updated, and so are their logs.

Page 6

Outline

1. Problem Statement

2. Motivation

3. Semi-structural Log Message Clustering

4. Message Segment Table

5. Evaluation

Page 7

Motivation (1)

Similar log messages describe the same event.

Therefore, we can apply a data clustering algorithm to log messages.

However, how should we define the similarity between two log messages?

Page 9

Motivation (3)

Similarity between two sequences of terms:

1. Cosine similarity on TF-IDF vectors

2. Jaccard index

3. Word sequence matching

What if two log messages have two different sets of words (terms)?
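To make the question concrete, here is a minimal Python sketch (not taken from the paper) of the first two term-based measures. The whitespace tokenizer, the smoothed IDF formula, and the two example messages are assumptions chosen for illustration. When two messages share no terms, both scores are 0 no matter how similar their layouts are.

```python
import math
from collections import Counter


def tokenize(msg):
    # Hypothetical tokenizer: lowercase, split on whitespace.
    return msg.lower().split()


def jaccard(a, b):
    # Jaccard index: size of the term-set intersection over size of the union.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def tfidf_cosine(a, b, corpus):
    # Cosine similarity of TF-IDF vectors; IDF is estimated from `corpus`.
    docs = [set(tokenize(d)) for d in corpus]
    n = len(docs)

    def idf(term):
        df = sum(term in d for d in docs)
        return math.log((1 + n) / (1 + df)) + 1.0  # smoothed, always > 0

    def vec(msg):
        tf = Counter(tokenize(msg))
        return {t: c * idf(t) for t, c in tf.items()}

    va, vb = vec(a), vec(b)
    dot = sum(w * vb.get(t, 0.0) for t, w in va.items())
    na = math.sqrt(sum(w * w for w in va.values()))
    nb = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


m1 = "server1 reports 12 jobs"
m2 = "node7 wrote 30 bytes"
print(jaccard(m1, m2))                  # 0.0
print(tfidf_cosine(m1, m2, [m1, m2]))   # 0.0
```

Both scores are 0.0 even though the two hypothetical messages share the same shape (word, word, number, word).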

Page 10: LogTree: A Framework for Generating System Events from Raw Textual Logs Liang Tang and Tao Li School of Computing and Information Sciences Florida International.

In PVFS2 log files, the two following log messages both belong to status event.

However, none of terms are identical !

Motivation (4)Motivation (4)

10

Page 11

Motivation (4)

In PVFS2 log files, the following two log messages both belong to the status event.

However, none of their terms are identical!

But they have a similar format. Format may be more useful than terms.
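One minimal way to see why format can beat terms, assuming a crude regex-based term typing (the real typing rules are not given here): replace every term by its type and compare the resulting type sequences. The two messages below are hypothetical stand-ins, not the PVFS2 messages from the slide; they share no term yet have identical signatures.

```python
import re

# Hypothetical typing rules, tried in order; not the paper's rule set.
TYPE_RULES = [
    ("number", re.compile(r"^\d+(\.\d+)?$")),
    ("word", re.compile(r"^[A-Za-z_]+$")),
]


def term_type(term):
    for name, pattern in TYPE_RULES:
        if pattern.match(term):
            return name
    return "other"


def signature(msg):
    # Replace each term by its type, keeping the original term order.
    return [term_type(t) for t in msg.split()]


a = "read bytes 4096"
b = "open files 17"
print(set(a.split()) & set(b.split()))  # set()
print(signature(a) == signature(b))     # True
```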

Page 12

Outline

1. Problem Statement

2. Motivation

3. Semi-structural Log Message Clustering

4. Message Segment Table

5. Evaluation

Page 13

Semi-structural Log Message Clustering (1)

Step 1: Convert log messages into semi-structural log messages (log trees).

Step 2: Compute pairwise similarities between log trees.

Step 3: Apply data clustering on the similarity matrix.
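The three steps can be wired together roughly as follows. This is a sketch under assumptions: `parse` and `similarity` are passed in as functions (any similarity measure could be plugged in), and Step 3 uses a simple threshold-based single-link grouping implemented with union-find. The slides do not say which clustering algorithm LogTree itself uses here; single-link hierarchical clustering is simply one of the traditional algorithms named in the evaluation.

```python
from typing import Callable, List, Sequence


def cluster_logs(messages: Sequence[str], parse: Callable,
                 similarity: Callable, threshold: float) -> List[int]:
    # Step 1: convert each raw message into its structured form.
    trees = [parse(m) for m in messages]
    n = len(trees)
    # Step 2: pairwise similarity matrix (symmetric, so fill both halves).
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = similarity(trees[i], trees[j])
    # Step 3: single-link grouping at a fixed threshold, via union-find.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)
    # Relabel roots as consecutive cluster ids in order of first appearance.
    ids, labels = {}, []
    for i in range(n):
        labels.append(ids.setdefault(find(i), len(ids)))
    return labels


# Toy usage: term lists as "trees", similarity = 1 if same number of terms.
msgs = ["open file a", "open file b", "disk full"]
print(cluster_logs(msgs, str.split,
                   lambda x, y: 1.0 if len(x) == len(y) else 0.0, 0.5))
# [0, 0, 1]
```

Thresholded single-link is equivalent to taking connected components of the graph whose edges are the pairs with similarity at or above the threshold, which is why union-find suffices.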

Page 15

Semi-structural Log Message Clustering (2)

Step 1: Convert log messages into semi-structural log messages (log trees).

This is accomplished by a simple log parser.

It is only a context-free grammar parser: it splits a log message on separators such as commas and tabs, and it does NOT identify the meaning of terms (words). It can be generated automatically with parser-generator tools such as JLex and CUP (or JavaCC).
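For illustration only, here is a hand-written Python stand-in for such a parser; the deck generates its parser with parser-generator tools, and its grammar is not given. Assumptions: tabs split the top level and commas the next level; at each level the text before the first separator becomes the node's message segment and every remaining piece becomes a child; leaves are split on whitespace into terms. `LogNode`, `SEPARATORS`, and `parse_log` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogNode:
    segment: List[str]                                   # message segment (terms)
    children: List["LogNode"] = field(default_factory=list)


# Hypothetical separator hierarchy: tabs split the top level, commas the next.
SEPARATORS = ["\t", ","]


def _build(text: str, depth: int) -> LogNode:
    if depth >= len(SEPARATORS):
        return LogNode(text.split())                     # leaf: whitespace terms
    head, *rest = text.split(SEPARATORS[depth])
    # Text before the first separator is this node's segment;
    # each remaining piece becomes a child parsed one level deeper.
    return LogNode(head.split(),
                   [_build(p, depth + 1) for p in rest if p.strip()])


def parse_log(line: str) -> LogNode:
    return _build(line.rstrip("\n"), 0)


tree = parse_log("kernel error\tdevice sda1, sector 42, retry 3")
print(tree.segment)                                      # ['kernel', 'error']
print(tree.children[0].segment)                          # ['device', 'sda1']
print([c.segment for c in tree.children[0].children])    # [['sector', '42'], ['retry', '3']]
```

Like the parser described above, this sketch only splits on separators; it attaches no meaning to the terms it produces.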

Page 16

Semi-structural Log Message Clustering (3)

Step 2: Compute pairwise similarities between log trees.

s1 and s2 are two log messages. The similarity uses a recursive function for the weight w.

Page 20

Semi-structural Log Message Clustering (3)

Step 2: Compute pairwise similarities between log trees.

s1 and s2 are two log messages. The parts of the similarity formula are annotated as follows:

– Root node of s1 and root node of s2

– Message segment at node v1 and message segment at node v2

– Best matching between the nodes of subtree v1 and the nodes of subtree v2

– λ < 1 decreases the weight of lower layers

– Distance of message segments
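The formula itself is not reproduced in this transcript, so what follows is only a hedged Python sketch of the structure its annotations describe: the score of two nodes combines a score for their message segments with λ times the best matching between their children, so a subtree k layers down is discounted by λ^k. The additive combination, the brute-force matching, and the placeholder segment score (term-set overlap) are all assumptions; the deck defines its own type-based segment measure on the next slide.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import List


@dataclass
class Node:
    segment: List[str]
    children: List["Node"] = field(default_factory=list)


def segment_score(m1: List[str], m2: List[str]) -> float:
    # Placeholder segment score (assumption): Jaccard overlap of term sets.
    s1, s2 = set(m1), set(m2)
    return len(s1 & s2) / len(s1 | s2) if s1 | s2 else 1.0


def best_matching(c1: List[Node], c2: List[Node], lam: float) -> float:
    # Exhaustive one-to-one matching of children: exact, but only practical
    # for the handful of children a log node usually has.
    if not c1 or not c2:
        return 0.0
    small, large = (c1, c2) if len(c1) <= len(c2) else (c2, c1)
    return max(sum(w(a, b, lam) for a, b in zip(small, perm))
               for perm in permutations(large, len(small)))


def w(v1: Node, v2: Node, lam: float = 0.5) -> float:
    # Node weight = segment score + lam * best matching of the children.
    return segment_score(v1.segment, v2.segment) + lam * best_matching(
        v1.children, v2.children, lam)


def similarity(r1: Node, r2: Node, lam: float = 0.5) -> float:
    # Similarity of two log trees = weight at their root nodes.
    return w(r1, r2, lam)


a = Node(["disk", "error"], [Node(["host", "n1"]), Node(["user", "bob"])])
b = Node(["disk", "error"], [Node(["user", "amy"]), Node(["host", "n1"])])
print(similarity(a, b))  # about 1.667: 1 + 0.5 * (1 + 1/3)
```

Because children are paired by best score rather than by position, the host segments are matched with each other and the user segments with each other. The score is unnormalized (larger trees can score higher), so a real implementation would likely need a normalization before thresholding.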

Page 21

Semi-structural Log Message Clustering (3)

Distance of message segments m1 and m2:

Two message segments: $m_1 = p_1 \ldots p_{n_1}$ and $m_2 = q_1 \ldots q_{n_2}$.

$t(\cdot)$ is the type of a term, with types = {number, separator, word, hostname, …}.
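The segment formula is also missing from this transcript. As a hedged sketch of one measure consistent with the definitions above, the code compares the two segments position by position, counts aligned positions whose terms have different types under $t(\cdot)$, adds the length difference, and divides by the longer length. The regex rules behind the hypothetical `term_type`, and the positional alignment itself, are assumptions, not the paper's definition.

```python
import re
from typing import List

# Hypothetical rules for t(.), tried in order.
_RULES = [
    ("number", re.compile(r"^\d+(\.\d+)?$")),
    ("separator", re.compile(r"^[^\w\s]+$")),
    ("hostname", re.compile(r"^[A-Za-z][\w-]*(\.[A-Za-z][\w-]*)+$")),
    ("word", re.compile(r"^[A-Za-z]+$")),
]


def term_type(term: str) -> str:
    for name, pattern in _RULES:
        if pattern.match(term):
            return name
    return "other"


def segment_distance(m1: List[str], m2: List[str]) -> float:
    # Type mismatches at aligned positions plus unaligned tail terms,
    # divided by the longer length: 0 = same format, 1 = nothing in common.
    n1, n2 = len(m1), len(m2)
    if max(n1, n2) == 0:
        return 0.0
    mismatches = sum(term_type(p) != term_type(q) for p, q in zip(m1, m2))
    return (mismatches + abs(n1 - n2)) / max(n1, n2)


print(segment_distance(["read", "4096", "bytes"], ["wrote", "17", "bytes"]))  # 0.0
print(segment_distance(["host", "node1.cluster.edu"], ["user", "bob"]))      # 0.5
```

The first pair shares only one term but has the same type sequence, so its distance is 0.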

Page 22

Semi-structural Log Message Clustering (4)

Why is this similarity better?

1. It uses format information, so format similarity is taken into account.

2. Similarity is computed on the best-matched pairs of message segments. For example, if two messages s1 and s2 both contain <hostname> and <username>, it is not fair to compare s1's <hostname> with s2's <username>.

Page 23

Semi-structural Log Message Clustering (5)

Comparison with the tree kernel:

Our similarity function is similar to a tree kernel. However:

– A tree kernel does not assign importance weights to different layers of nodes.

– A tree kernel compares every pair of nodes at each layer, which is very time-consuming. For our clustering, the similarity function does not need to be a kernel function.

Page 24

Outline

1. Problem Statement

2. Motivation

3. Semi-structural Log Message Clustering

4. Message Segment Table

5. Evaluation

Page 25

Message Segment Table (1)

1. Many message segments are duplicated.

2. Why recompute the similarity of two message segments that have been seen before?

3. Therefore, we build an in-memory data structure that maintains frequently appearing message segments.
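The simplest form of this idea is plain memoization: cache the score of every segment pair already computed. Below is a minimal Python sketch using the standard `functools.lru_cache`. Segments are converted to tuples so they are hashable, and the inner `_score` (exact term agreement at aligned positions) is a hypothetical stand-in for the real segment measure. Unlike the Message Segment Table, this cache keeps every pair it sees rather than only frequent segments.

```python
from functools import lru_cache
from typing import Sequence, Tuple

CALLS = {"computed": 0}  # counts how often the underlying measure actually runs


def _score(m1: Tuple[str, ...], m2: Tuple[str, ...]) -> float:
    # Hypothetical segment measure: share of aligned positions with equal terms.
    CALLS["computed"] += 1
    n = max(len(m1), len(m2))
    return sum(p == q for p, q in zip(m1, m2)) / n if n else 1.0


@lru_cache(maxsize=None)
def _cached(m1: Tuple[str, ...], m2: Tuple[str, ...]) -> float:
    return _score(m1, m2)


def segment_score(m1: Sequence[str], m2: Sequence[str]) -> float:
    # Ordering the pair lets (a, b) and (b, a) share one cache entry.
    a, b = tuple(m1), tuple(m2)
    return _cached(a, b) if a <= b else _cached(b, a)


s, t = ["disk", "full"], ["disk", "ok"]
for _ in range(3):
    segment_score(s, t)
    segment_score(t, s)
print(CALLS["computed"])  # 1
```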

Page 26

Message Segment Table (2)

1. The Message Segment Table is composed of a hash table and a similarity matrix.

Figure: hash table entries with Occurrences (for keeping track of the frequency) and Column index; Similarity Matrix.

Page 27

Message Segment Table (3)

MST building:

1. Scan the data in one pass and pick up high-frequency message segments.

2. Put them into the Column Hash Table and the similarity matrix.

3. Compute the entries of the matrix.

Looking up the MST:

1. Search the Column Hash Table to find the column index.

2. Extract the value from the similarity matrix by column index.

Updating the MST:

1. Search the Column Hash Table to find the occurrence count.

2. Insert into or remove from the Column Hash Table according to frequencies.

3. Then modify the similarity matrix…

See the paper for details.
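A compact Python sketch of the build and lookup steps follows. It is an interpretation, not the paper's code. The hash table maps a segment (as a tuple) to its occurrence count and column index. A segment is kept when its relative frequency is at least a threshold `f_min`; the name follows the $f_{min}$ parameter reported in the evaluation, but reading it as a relative frequency is an assumption. Lookups for segments outside the table fall back to computing the score directly. The update procedure is omitted, since the slides defer its details to the paper.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Segment = Tuple[str, ...]


class MessageSegmentTable:
    """Hypothetical MST: column hash table plus similarity matrix."""

    def __init__(self, segments: Sequence[Segment],
                 score: Callable[[Segment, Segment], float], f_min: float):
        self.score = score
        counts = Counter(segments)               # one pass: occurrences
        total = max(len(segments), 1)
        frequent = [s for s, c in counts.items() if c / total >= f_min]
        # Column hash table: segment -> [occurrences, column index].
        self.columns: Dict[Segment, List[int]] = {
            s: [counts[s], i] for i, s in enumerate(frequent)}
        # Similarity matrix over the frequent segments only.
        self.matrix = [[score(a, b) for b in frequent] for a in frequent]

    def lookup(self, m1: Segment, m2: Segment) -> float:
        e1, e2 = self.columns.get(m1), self.columns.get(m2)
        if e1 is not None and e2 is not None:
            return self.matrix[e1[1]][e2[1]]     # hit: read by column index
        return self.score(m1, m2)                # miss: compute directly


def overlap(a: Segment, b: Segment) -> float:
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 1.0


segs = [("disk", "full")] * 3 + [("disk", "ok")] * 2 + [("fan", "stopped")]
mst = MessageSegmentTable(segs, overlap, f_min=0.3)
print(sorted(mst.columns))  # [('disk', 'full'), ('disk', 'ok')]
print(mst.lookup(("disk", "full"), ("disk", "ok")))  # served from the matrix
```

The matrix grows quadratically in the number of kept segments, which is presumably why only frequent segments are stored.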

Page 28

Outline

1. Problem Statement

2. Motivation

3. Semi-structural Log Message Clustering

4. Message Segment Table

5. Evaluation

Page 29

Evaluation (1)

Experiment machines and data collection:

Page 30

Evaluation (2)

Comparative methods:

– Two traditional clustering algorithms: k-means and single-link hierarchical clustering.

– We implemented all methods in Java 1.5.

Comparison metric:

– F1-score
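The slides do not spell out how the F1-score is computed for a clustering. One common definition, assumed here, pairs each ground-truth event type with the cluster that gives it the best F1 and averages these scores weighted by class size. A minimal Python sketch:

```python
from collections import Counter
from typing import Hashable, Sequence


def clustering_f1(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    n = len(truth)
    class_sizes = Counter(truth)
    cluster_sizes = Counter(pred)
    overlap = Counter(zip(truth, pred))          # (class, cluster) -> shared items
    total = 0.0
    for c, size_c in class_sizes.items():
        best = 0.0
        for (c2, k), n_ck in overlap.items():
            if c2 != c:
                continue
            precision = n_ck / cluster_sizes[k]
            recall = n_ck / size_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += size_c / n * best               # weight by class size
    return total


truth = ["a", "a", "a", "b", "b"]
pred = [0, 0, 1, 1, 1]
print(clustering_f1(truth, pred))  # approximately 0.8
```

Here both classes reach a best F1 of 0.8, so the weighted average is 0.8; a perfect clustering scores 1.0.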

Page 31

Evaluation (3)

Accuracy results:

TF-IDF and Jaccard perform poorly. The tree kernel is sometimes more accurate than LogTree, but it is much slower.

Page 32

Evaluation (4)

Efficiency results (note that the running time of LogTree includes the time for building the Message Segment Table):

TF-IDF is the fastest, but its accuracy is very poor. Tree kernel and Jaccard are slow. LogTree is the second fastest.

Page 33

Evaluation (5)

Time scalability: this experiment was run on the second machine (a 64-bit Linux server) with up to 10K log messages.

Page 34

Evaluation (6)

Memory space scalability, with $f_{min} = 0.00001$.

Plot: number of entries in the Message Segment Table.

Page 35

Evaluation (7)

A case study: detecting configuration errors in the Apache Web Server.

A configuration error causes a series of consecutive errors.

Page 36

The End

Thank you!

Authors’ contact information:

Liang Tang: [email protected]

Tao Li: [email protected]

