
DATA524 - Information Visualization

Big Data Lab 3 Using Splunk Software

2016 John Hsu

DATA524 - Big Data Information Visualization


Table of Contents

Introduction
    About the UONA DATA524 Lab 3 - Search Tutorial
Prerequisite
Part 1: Login to UONA DATA524 Lab 3 Web site
    What you need for this tutorial
Part 2: Getting started with Regular Expressions
    About Splunk regular expressions
    Regular expressions terminology and syntax
        Character types
        Groups, quantifiers, and alternation
        A simple example of groups, quantifiers, and alternation
    Capture groups in regular expressions
        Non-capturing group matching
    Modular regular expressions
Part 3: Using Splunk Search with Regular Expressions - 1
Part 4: Using Splunk Search with Regular Expressions - 2
Part 5: Using Splunk Search with Regular Expressions - 3
Part 5: Using Splunk Search with Regular Expressions - 4


Introduction

About the UONA DATA524 Lab 3 - Search Tutorial

Splunk Search is the primary interface for using Splunk Enterprise to run searches, save reports, and create dashboards. This Search Tutorial is written for the user who is new to Splunk Enterprise and the Splunk Search feature.

What's in this tutorial?

This manual guides the first-time user through searching the data and showing your location. If you're new to Splunk Search, this is the place to start.

• Part 1: Login to UONA DATA524 Lab Web site takes you through the steps to access the lab's Splunk web site.

• Part 2: Getting started with Regular Expressions walks you through basic regular expressions.

• Part 3: Using Splunk Search with Regular Expressions - 1 walks you through constructing a search with regular expressions.

• Part 4: Using Splunk Search with Regular Expressions - 2 walks you through running a search with regular expressions.

• Part 5: Using Splunk Search with Regular Expressions - 3 walks you through searching and creating a report and visualization.


UONA DATA524 Big Data Lab Environment

• Lab data is stored on the Splunk server.

• The search engine runs on the Splunk server.

• Users access the servers over the internet.

Using a PDF of the tutorial

Do not copy and paste searches or regular expressions directly from the PDF into Splunk Web. In some cases, doing so causes errors because of hidden characters that are included in the PDF formatting.


Prerequisite

You must have finished Lab 4, which uploads tutorialdata.gz to Hadoop’s HDFS.

Part 1: Login to UONA DATA524 Lab 3 Web site

What you need for this tutorial

Supported browsers:

• MS IE

• Google Chrome

• Mozilla Firefox

• Mac Safari

Open the lab web site:

https://uona.dynu.net:8803

Follow the prompt to authenticate with your credentials.

(Browser screenshots for MS IE, Chrome, and Firefox.)

Find your username in “UONA LAB Account for DATA524”

username: bd524??

password: your_password

The first page you see is Splunk Home.

This completes Part 1: Login.


Part 2: Getting started with Regular Expressions

About Splunk regular expressions

This primer helps you create valid regular expressions. For a discussion of regular expression syntax and usage, see an online resource such as www.regular-expressions.info or a manual on the subject.

Regular expressions match patterns of characters in text and are used for extracting default fields, recognizing binary file types, and automatic assignment of source types. You also use regular expressions when you define custom field extractions, filter events, route data, and correlate searches. Search commands that use regular expressions include rex and regex, and evaluation functions such as match and replace.

Splunk regular expressions are PCRE (Perl Compatible Regular Expressions) and use the PCRE C library.

Regular expressions terminology and syntax

Term: Description

literal: The exact text of characters to match using a regular expression.

regular expression: The metacharacters that define the pattern that Splunk software uses to match against the literal.

groups: Regular expressions allow groupings indicated by the type of bracket used to enclose the regular expression characters. Groups can define character classes, repetition matches, named capture groups, modular regular expressions, and more. You can apply quantifiers to and use alternation within enclosed groups.

character class: Characters enclosed in square brackets. Used to match a string. To set up a character class, define a range with a hyphen, such as [A-Z], to match any uppercase letter. Begin the character class with a caret (^) to define a negative match, such as [^A-Z] to match any character that is not an uppercase letter.

character type: Similar to a wildcard, character types represent specific literal matches. For example, a period (.) matches any character, \w matches words or alphanumeric characters including an underscore, and so on.

anchor: Character types that match text formatting positions, such as return (\r) and newline (\n).

alternation: Refers to supplying alternate match patterns in the regular expression. Use a vertical bar or pipe character (|) to separate the alternate patterns, which can include full regular expressions. For example, grey|gray matches either grey or gray.

quantifiers, or repetitions: Use (*, +, ?) to define how to match the groups to the literal pattern. For example, * matches 0 or more, + matches 1 or more, and ? matches 0 or 1.

back references: Literal groups that you can recall for later use. To indicate a back reference to the value, specify a dollar symbol ($) and a number (not zero).

lookarounds: A way to define a group to determine the position in a string. This definition matches the regular expression in the group but gives up the match to keep the result. For example, use a lookaround to match x that is followed by y without matching y.
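Three of the terms above (negated character class, alternation, and lookaround) can be tried outside Splunk. The following is a minimal sketch in Python, whose re module is not PCRE but behaves the same way for these simple patterns. The test strings are made up for illustration.

```python
import re

# [^A-Z] matches any single character that is NOT an uppercase letter,
# so digits and spaces match as well as lowercase letters.
non_upper = re.findall(r"[^A-Z]", "Ab1 C")

# Alternation: either spelling is accepted.
colours = re.findall(r"grey|gray", "grey and gray")

# Lookahead: match "x" only when "y" follows, without consuming the "y".
lookahead_spans = [m.span() for m in re.finditer(r"x(?=y)", "xy xz xy")]

print(non_upper)        # ['b', '1', ' ']
print(colours)          # ['grey', 'gray']
print(lookahead_spans)  # [(0, 1), (6, 7)]
```

Each lookahead span is one character wide: the y is checked but is not part of the match.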


Character types

Character types are short for literal matches.

| Term | Description | Example | Explanation |
|------|-------------|---------|-------------|
| \w | Match a word character (a letter, number, or underscore character). | \w\w\w | Matches any three word characters. |
| \W | Match a non-word character. | \W\W\W | Matches any three non-word characters. |
| \d | Match a digit character. | \d\d\d-\d\d-\d\d\d\d | Matches a Social Security number, or a similar 3-2-4 number string. |
| \D | Match a non-digit character. | \D\D\D | Matches any three non-digit characters. |
| \s | Match a whitespace character. | \d\s\d | Matches a sequence of a digit, a whitespace, and then another digit. |
| \S | Match a non-whitespace character. | \d\S\d | Matches a sequence of a digit, a non-whitespace character, and another digit. |
| . | Match any character. Use sparingly. | \d\d.\d\d.\d\d | Matches a date string such as 12/31/14 or 01.01.15, but can also match 99A99B99. |
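The warning about the period is easy to check. The sketch below uses Python's re module as a stand-in for PCRE and runs two patterns from the table against a few made-up strings. The stricter pattern using [/.] is an illustrative alternative, not part of the original table.

```python
import re

ssn_like = re.compile(r"\d\d\d-\d\d-\d\d\d\d")
date_like = re.compile(r"\d\d.\d\d.\d\d")           # "." accepts any separator
date_strict = re.compile(r"\d\d[/.]\d\d[/.]\d\d")   # only "/" or "." as separator

candidates = ["12/31/14", "01.01.15", "99A99B99", "1/1/15"]

ssn_ok = bool(ssn_like.fullmatch("123-45-6789"))
loose_hits = [s for s in candidates if date_like.fullmatch(s)]
strict_hits = [s for s in candidates if date_strict.fullmatch(s)]

print(ssn_ok)       # True
print(loose_hits)   # ['12/31/14', '01.01.15', '99A99B99']
print(strict_hits)  # ['12/31/14', '01.01.15']
```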


Groups, quantifiers, and alternation

Regular expressions allow groupings indicated by the type of bracket used to enclose the regular expression characters. You can apply quantifiers (*, +, ?) to the enclosed group and use alternation within the group.

| Term | Description | Example | Explanation |
|------|-------------|---------|-------------|
| * | Match zero or more times. | \w* | Matches zero or more word characters. |
| + | Match one or more times. | \d+ | Matches at least one digit. |
| ? | Match zero or one time. | \d\d\d-?\d\d-?\d\d\d\d | Matches a Social Security Number with or without dashes. |
| ( ) | Parentheses define match or capture groups, atomic groups, and lookarounds. | (H..).(o..) | When given the string Hello World, this matches Hel and o W. |
| [ ] | Square brackets define character classes. | [a-z0-9#] | Matches any character that is a through z, 0 through 9, or #. |
| { } | Curly brackets define repetitions. | \d{3,5} | Matches a string of 3 to 5 digits in length. |
| < > | Angle brackets define named capture groups. Use the syntax (?P<var> ...) to set up a named field extraction. | (?P<ssn>\d\d\d-\d\d-\d\d\d\d) | Pulls out a Social Security Number and assigns it to the ssn field. |
| [[ ]] | Double brackets define Splunk-specific modular regular expressions. | [[octet]] | A validated 0-255 range integer. |

A simple example of groups, quantifiers, and alternation

This example shows two ways to match either to or too.

The first regular expression uses the ? quantifier to match up to one more "o" after the first.

The second regular expression uses alternation to specify the pattern.

to(o)?

(to|too)
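As a quick check of the two patterns, together with the (H..).(o..) and \d{3,5} examples from the table, here is a minimal sketch using Python's re module as a stand-in for PCRE. The word list and digit string are made up for illustration.

```python
import re

words = ["to", "too", "tooo"]

# Whole-string matching: both patterns accept exactly "to" and "too".
quantifier_hits = [w for w in words if re.fullmatch(r"to(o)?", w)]
alternation_hits = [w for w in words if re.fullmatch(r"(to|too)", w)]

# Substring search: alternation tries "to" first and stops there,
# while the greedy ? quantifier takes the extra "o" when it can.
search_quantifier = re.search(r"to(o)?", "too").group(0)
search_alternation = re.search(r"(to|too)", "too").group(0)

# Two capture groups separated by one arbitrary character.
hello_groups = re.search(r"(H..).(o..)", "Hello World").groups()

# Runs of 3 to 5 digits; the 7-digit run yields only its first 5 digits.
digit_runs = re.findall(r"\d{3,5}", "12 123 1234567")

print(quantifier_hits, alternation_hits)      # ['to', 'too'] ['to', 'too']
print(search_quantifier, search_alternation)  # too to
print(hello_groups)                           # ('Hel', 'o W')
print(digit_runs)                             # ['123', '12345']
```

The search result shows that the two patterns are equivalent only when the whole string must match. Inside a longer pattern, writing the alternation as (too|to) makes it prefer the longer word.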

Capture groups in regular expressions

A named capture group is a regular expression grouping that extracts a field value when the regular expression matches an event. Capture groups include the name of the field. They are notated with angle brackets as follows:

matching text (?<field_name>capture pattern) more matching text

For example, you have this event text:

131.253.24.135 fail admin_user

Here are two regular expressions that use different syntax in their capturing groups to pull the same set of fields from that event.

Expression A: (?<ip>\d+\.\d+\.\d+\.\d+) (?<result>\w+) (?<user>.*)

Expression B: (?<ip>\S+) (?<result>\S+) (?<user>\S+)

In Expression A, the pattern-matching characters used for the first capture group (ip) are specific. \d means "digit" and + means "one or more." So \d+ means "one or more digits." \. refers to a period.

The capture group for ip matches one or more digits, followed by a period, followed by one or more digits, followed by a period, followed by one or more digits, followed by a period, followed by one or more digits. This describes the syntax for an IP address.

The second capture group in Expression A, for the result field, has the pattern \w+, which means "one or more word characters" (letters, digits, or underscore). The third capture group in Expression A, for the user field, has the pattern .*, which means "match everything that's left."

Expression B uses a common technique called negative matching. With negative matching, the regular expression does not try to define which text to match. Instead it defines what the text is not. In Expression B, the values that should be extracted from the sample event are "not space" characters (\S). It uses the + to specify "one or more" of the "not space" characters.

So Expression B says:

1. Pull out the first string of not-space characters for the ip field value.

2. Ignore the following space.

3. Then pull out the second string of not-space characters for the result field value.

4. Ignore the second space.

5. Pull out the third string of not-space characters for the user field value.
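Expressions A and B can be compared on the sample event with the following sketch. It uses Python's re module in place of Splunk's PCRE engine, which means the named groups must be written as (?P<name>...) because Python does not accept (?<name>...). The second event, with an extra trailing word, is a made-up variation added to show where the two expressions diverge.

```python
import re

# Python spelling of Expression A and Expression B from the text above.
expr_a = re.compile(r"(?P<ip>\d+\.\d+\.\d+\.\d+) (?P<result>\w+) (?P<user>.*)")
expr_b = re.compile(r"(?P<ip>\S+) (?P<result>\S+) (?P<user>\S+)")

event = "131.253.24.135 fail admin_user"
fields_a = expr_a.search(event).groupdict()
fields_b = expr_b.search(event).groupdict()

# Hypothetical event with trailing text: ".*" keeps everything that is left,
# while "\S+" stops at the next space.
longer_event = "131.253.24.135 fail admin_user from_console"
user_a = expr_a.search(longer_event).group("user")
user_b = expr_b.search(longer_event).group("user")

print(fields_a)  # {'ip': '131.253.24.135', 'result': 'fail', 'user': 'admin_user'}
print(fields_a == fields_b)  # True
print(user_a)    # admin_user from_console
print(user_b)    # admin_user
```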

Non-capturing group matching

Use the syntax (?: ... ) to create groups that are matched but which are not captured. Note that here you do not need to include a field name in angle brackets. The colon character after the ? character is what identifies it as a non-capturing group.

For example, (?:Foo|Bar) matches either Foo or Bar, but neither string is captured.
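The difference between a capturing and a non-capturing group is visible in what the match object stores. This is a small sketch with Python's re module, and the input string is made up.

```python
import re

text = "say Bar"

capturing = re.search(r"(Foo|Bar)", text)
non_capturing = re.search(r"(?:Foo|Bar)", text)

# Both patterns match the same text...
matched_text = (capturing.group(0), non_capturing.group(0))
# ...but only the capturing form stores a group.
captured_groups = (capturing.groups(), non_capturing.groups())

print(matched_text)     # ('Bar', 'Bar')
print(captured_groups)  # (('Bar',), ())
```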

Modular regular expressions

Modular regular expressions refer to small chunks of regular expressions that are defined to be used in longer regular expression definitions. Modular regular expressions are defined in transforms.conf.

For example, you can define an integer and then use that regular expression definition to define a float.

    [int]
    # matches an integer or a hex number
    REGEX = 0x[a-fA-F0-9]+|\d+

    [float]
    # matches a float (or an int)
    REGEX = \d*\.\d+|[[int]]

In the regular expression for [float], the modular regular expression for an integer or hex number match is invoked with double square brackets, [[int]].

You can also use the modular regular expression in field extractions.

    [octet]
    # this would match only numbers from 0-255 (one octet in an ip)
    REGEX = (?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)

    [ipv4]
    # matches a valid IPv4 optionally followed by :port_num the
    # octets in the ip would also be validated 0-255 range
    # Extracts: ip, port
    REGEX = (?<ip>[[octet]](?:\.[[octet]]){3})(?::[[int:port]])?

The [octet] regular expression uses two nested non-capturing groups to do its work. See the subsection in this topic on non-capturing group matching.
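To see what the double-bracket references amount to, the sketch below imitates the expansion in Python. It is not how Splunk implements modular regular expressions. It assumes that [[name]] substitutes the named pattern as a non-capturing group and that [[name:field]] substitutes it as a capture group named field, which is consistent with the "Extracts: ip, port" comment above. The expand helper and the MODULES table are hypothetical names, and the final rewrite of (?<name> to (?P<name> is needed only because Python's re module does not accept the former.

```python
import re

# Stanza bodies copied from the transforms.conf examples above.
MODULES = {
    "int": r"0x[a-fA-F0-9]+|\d+",
    "float": r"\d*\.\d+|[[int]]",
    "octet": r"(?:2(?:5[0-5]|[0-4][0-9])|[0-1][0-9][0-9]|[0-9][0-9]?)",
    "ipv4": r"(?<ip>[[octet]](?:\.[[octet]]){3})(?::[[int:port]])?",
}

# Matches [[name]] or [[name:field]].
REFERENCE = re.compile(r"\[\[(\w+)(?::(\w+))?\]\]")

def expand(pattern):
    """Recursively replace [[name]] references with their definitions."""
    def substitute(match):
        name, field = match.group(1), match.group(2)
        body = expand(MODULES[name])
        # Assumed semantics: [[name:field]] captures into a field.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    expanded = REFERENCE.sub(substitute, pattern)
    # Python needs (?P<name>...); leave lookbehinds (?<= and (?<! untouched.
    return re.sub(r"\(\?<(?![=!])", "(?P<", expanded)

float_re = re.compile(expand(MODULES["float"]))
ipv4_re = re.compile(expand(MODULES["ipv4"]))

float_hits = [s for s in ["3.14", "42", "0x1F", "abc"] if float_re.fullmatch(s)]
with_port = ipv4_re.fullmatch("192.168.1.10:8080").groupdict()
out_of_range = ipv4_re.fullmatch("300.1.1.1")

print(float_hits)    # ['3.14', '42', '0x1F']
print(with_port)     # {'ip': '192.168.1.10', 'port': '8080'}
print(out_of_range)  # None
```

The 300.1.1.1 case fails because no alternative of the octet pattern can consume 300 as a whole, which is the 0-255 validation the comment in the [ipv4] stanza refers to.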


Part 3: Using Splunk Search with Regular Expressions - 1

Before we start, please review the raw events.

The data for this lab must already be loaded into Hadoop’s HDFS (see Lab 4). Type the following search string in the Search bar and press Enter to search for the data in the Hadoop Distributed File System (HDFS):

index=uona2_68_lab source=/user/splunk/lab/bd524??/tutorialdata.gz action=purchase

Note: replace bd524?? with your account ID.


The events contain an IP address and an HTTP status. We are going to extract:

• The IP address, into a field named ip_address.

• The HTTP status, into a field named http_status.

Now we construct the regular expressions that extract the IP address and the HTTP request status from the raw data.

For the IP address, the expression is:

rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"

The pattern for the IP address is \d+\.\d+\.\d+\.\d+

\d means "digit" and + means "one or more." So \d+ means "one or more digits." \. refers to a period.

The capture group for the IP address matches one or more digits, followed by a period, followed by one or more digits, followed by a period, followed by one or more digits, followed by a period, followed by one or more digits. This describes the syntax for an IP address.
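The IP pattern can be tried offline before running it in Splunk. The sketch below uses Python's re module, so the named group is written (?P<ip_address>...) instead of (?<ip_address>...). The log line is a made-up example in the style of a web access log, not an actual event from tutorialdata.gz.

```python
import re

ip_pattern = re.compile(r"(?P<ip_address>\d+\.\d+\.\d+\.\d+)")

# Hypothetical access-log line used only for illustration.
sample_event = (
    '91.205.189.15 - - [13/Aug/2016:18:22:16] '
    '"GET /cart.do?action=purchase&itemId=EST-14 HTTP 1.1" 200 2252'
)

ip_address = ip_pattern.search(sample_event).group("ip_address")

# The pattern checks only the shape, not the 0-255 range of each part.
shape_only = bool(ip_pattern.fullmatch("999.999.999.999"))

print(ip_address)  # 91.205.189.15
print(shape_only)  # True
```

The second result is a reminder that \d+ accepts any run of digits. That is acceptable for this lab because the field always holds a real client address.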


For the HTTP status, the expression is:

rex field=_raw ".*(?:GET|POST).*\"\s(?<http_status>\d+)\s+"

The pattern for the HTTP status is \d+

The capture group for the status matches one or more digits, but we need to specify in more detail where it is:

• The pattern “.*” means any characters.

• We know each web event is either “GET” or “POST”, so the pattern is (?:GET|POST).

• After GET or POST there are some characters, so the pattern is “.*” again.

• The HTTP status comes after those characters, a double quote, and a space: \"\s

Put them together to build the search string:

index=uona2_68_lab source=/user/splunk/lab/bd524??/tutorialdata.gz action="purchase"
| rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"
| rex field=_raw ".*(?:GET|POST).*\"\s(?<http_status>\d+)\s+"
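A rough offline analogue of the two rex commands is sketched below with Python's re module, so the named groups use (?P<name>...). The extract_fields helper and both log lines are hypothetical and only imitate the access-log layout. The trailing \s+ in the status pattern matters: a quote followed by a number at the very end of the line is rejected because no whitespace follows it, so the match falls back to the status code after the request string.

```python
import re

IP = re.compile(r"(?P<ip_address>\d+\.\d+\.\d+\.\d+)")
STATUS = re.compile(r'.*(?:GET|POST).*"\s(?P<http_status>\d+)\s+')

def extract_fields(raw):
    """Apply both patterns to one raw event, like two chained rex commands."""
    fields = {}
    for pattern in (IP, STATUS):
        match = pattern.search(raw)
        if match:
            fields.update(match.groupdict())
    return fields

# Hypothetical events in access-log style.
get_event = (
    '91.205.189.15 - - [13/Aug/2016:18:22:16] '
    '"GET /cart.do?action=purchase&itemId=EST-14 HTTP 1.1" 200 2252 '
    '"http://www.example.com/product" "Mozilla/5.0" 159'
)
post_event = (
    '10.2.1.44 - - [13/Aug/2016:18:23:01] '
    '"POST /cart.do?action=purchase&itemId=EST-6 HTTP 1.1" 503 1207 '
    '"http://www.example.com/cart" "Mozilla/5.0" 912'
)

get_fields = extract_fields(get_event)
post_fields = extract_fields(post_event)

print(get_fields)   # {'ip_address': '91.205.189.15', 'http_status': '200'}
print(post_fields)  # {'ip_address': '10.2.1.44', 'http_status': '503'}
```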


Part 4: Using Splunk Search with Regular Expressions - 2

This part uses the regular expressions constructed in Part 3.

The data is in the Hadoop Distributed File System (HDFS).

Enter the following command in the search field to find the data and create two new fields, ip_address and http_status:

index=uona2_68_lab source=/user/splunk/lab/bd524??/tutorialdata.gz action="purchase"
| rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"
| rex field=_raw ".*(?:GET|POST).*\"\s(?<http_status>\d+)\s+"

Note: replace bd524?? with your account ID.


To verify that the fields are created, click the “>”.


The ip_address and http_status fields are shown below:


Part 5: Using Splunk Search with Regular Expressions - 3

This part shows a summary of web access status.

The data is in the Hadoop Distributed File System (HDFS).

Step 1:

Enter the following command in the search field:

index=uona2_68_lab source=/user/splunk/lab/bd524??/tutorialdata.gz action="*"
| rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"
| rex field=_raw ".*(?:GET|POST).*\"\s(?<http_status>\d+)\s+"
| stats count by http_status

Note: replace bd524?? with your account ID.
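What stats count by http_status computes can be imitated in a few lines of Python. This is a sketch for intuition only, not a replacement for the Splunk search. The count_by_status helper and the shortened log lines are made up, and events in which the pattern finds no status are skipped, mirroring how events without the field drop out of the grouped count.

```python
import re
from collections import Counter

STATUS = re.compile(r'.*(?:GET|POST).*"\s(?P<http_status>\d+)\s+')

def count_by_status(raw_events):
    """Count events per extracted HTTP status."""
    counts = Counter()
    for raw in raw_events:
        match = STATUS.search(raw)
        if match:  # events without an extractable status are skipped
            counts[match.group("http_status")] += 1
    return dict(sorted(counts.items()))

# Hypothetical, shortened access-log lines.
sample_events = [
    '1.1.1.1 - - [t] "GET /a HTTP 1.1" 200 10 "-" "ua" 5',
    '2.2.2.2 - - [t] "POST /b HTTP 1.1" 200 12 "-" "ua" 7',
    '3.3.3.3 - - [t] "GET /c HTTP 1.1" 404 0 "-" "ua" 9',
    'not a web event',
]

status_counts = count_by_status(sample_events)
print(status_counts)  # {'200': 2, '404': 1}
```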


Step 2:

Show the web access status as a pie chart visualization:


Part 5: Using Splunk Search with Regular Expressions - 4

This part shows the actions of the clients, which are identified by ip_address.

The data is in the Hadoop Distributed File System (HDFS).

Step 1:

Enter the following command in the search field:

index=uona2_68_lab source=/user/splunk/lab/bd524??/tutorialdata.gz action="*"
| rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"
| rex field=_raw ".*(?:GET|POST).*\"\s(?<http_status>\d+)\s+"
| table ip_address http_status action

Note: replace bd524?? with your account ID.

The results are shown below:


Step 2:

Show the actions summary of the top 10 clients.

Enter the following command in the search field:

index=uona2_68_lab source=/user/splunk/lab/bd524??/tutorialdata.gz action="*"
| rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"
| rex field=_raw ".*(?:GET|POST).*\"\s(?<http_status>\d+)\s+"
| top 10 ip_address action
| chart sum(count) as count over ip_address by action

Note: replace bd524?? with your account ID.

(Screenshot of the results.)
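The top followed by chart pipeline can be pictured as two steps: count every (ip_address, action) combination and keep the most frequent ones, then pivot the survivors into one row per client. The sketch below imitates that in Python. The top_then_chart helper and the sample pairs are hypothetical, and tie-breaking between equally frequent pairs may differ from Splunk.

```python
from collections import Counter, defaultdict

def top_then_chart(pairs, limit=10):
    """Keep the `limit` most frequent (ip, action) pairs, then pivot by ip."""
    top_pairs = Counter(pairs).most_common(limit)
    chart = defaultdict(dict)
    for (ip, action), count in top_pairs:
        chart[ip][action] = count
    return dict(chart)

# Hypothetical (ip_address, action) values, one tuple per event.
sample_pairs = (
    [("1.1.1.1", "purchase")] * 3
    + [("1.1.1.1", "view")] * 2
    + [("2.2.2.2", "purchase")] * 1
)

summary = top_then_chart(sample_pairs, limit=2)
print(summary)  # {'1.1.1.1': {'purchase': 3, 'view': 2}}
```

Because the limit applies to pairs and not to clients, a client's less frequent actions can be cut off, and fewer than 10 distinct clients may remain in the chart.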


Step 3:

Show a chart visualization of the actions summary for the top 10 clients:

index=uona2_68_lab source=/user/splunk/lab/bd524??/tutorialdata.gz action="*"
| rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"
| rex field=_raw ".*(?:GET|POST).*\"\s(?<http_status>\d+)\s+"
| chart count over ip_address by action
| sort -purchase
| head 10

Note: replace bd524?? with your account ID.
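This last search pivots first and limits afterwards: it builds one row per client with a count for every action, sorts the rows by the purchase column in descending order, and keeps the first 10. A minimal Python sketch of that order of operations follows. The chart_then_head helper and the sample pairs are hypothetical, and clients with no purchases are treated as having a purchase count of 0, which may not match how Splunk orders empty cells.

```python
from collections import Counter, defaultdict

def chart_then_head(pairs, limit=10):
    """Pivot counts by ip and action, sort by purchase count, keep `limit` rows."""
    chart = defaultdict(Counter)
    for ip, action in pairs:
        chart[ip][action] += 1
    # A Counter returns 0 for a missing action, so clients without purchases sort last.
    rows = sorted(chart.items(), key=lambda row: row[1]["purchase"], reverse=True)
    return [(ip, dict(counts)) for ip, counts in rows[:limit]]

# Hypothetical (ip_address, action) values, one tuple per event.
sample_pairs = (
    [("a", "purchase")] * 1
    + [("b", "purchase")] * 3
    + [("b", "view")] * 1
    + [("c", "view")] * 5
)

top_rows = chart_then_head(sample_pairs, limit=2)
print(top_rows)  # [('b', {'purchase': 3, 'view': 1}), ('a', {'purchase': 1})]
```

Unlike limiting with top before charting, every remaining row here keeps the complete set of action counts for its client.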

End of this lab

