Date post: | 05-Dec-2014 |
Upload: | michael-wilde |
August 15, 2011
Making Reg[Ee]x Your Buddy
(?i)(mi(chael|ke) wilde), Splunk Ninja
Thursday, August 18, 11
© Copyright Splunk 2011, Splunk Worldwide Users’ Conference
Hi, I’m Michael Wilde
• You may know me from:
What is RegEx
“Finite Automata”
• Regular expressions were invented in the 1950s by mathematician Stephen Cole Kleene
• Implemented by “ed” and “grep” creator Ken Thompson in 1973
• Pattern matching language for text processing
• Has slightly different implementations (Perl, POSIX)
• Way cryptic at first sight
Why should you care
• Field extraction is a requirement for reporting
• Index-time filtering & routing
• You’ll seem smart
• It will be useful beyond Splunk
• You might score with the (ladies|dudes) at (Maker\sFaire|ComiCon).
Thinking Regex
• Log events are a great place to start; they have structure
• Don’t overthink it. The pattern is there, waiting to be discovered
• Don’t be lazy and use wildcards too much
• Learn to love “NOT” regexes: \S+ \D+ \W+ [^,]+
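To make the "NOT" idea concrete, here is a small sketch using Python's `re` module as a stand-in for the regex engine. The log line is made up for illustration and is not from the talk.

```python
import re

# Hypothetical comma-separated log line, invented for this example.
line = "2011-08-18,web01,GET,/index.html,200"

# Lazy habit: ".*" is greedy, so it runs to the end of the line and then
# backtracks to the LAST comma, swallowing far more than one field.
greedy = re.match(r"(.*),", line).group(1)

# "NOT" habit: [^,]+ can never cross a comma, so it stops at the first one.
first_field = re.match(r"([^,]+),", line).group(1)

# \S+ is "not whitespace", \D+ is "not a digit".
non_space = re.findall(r"\S+", "error code 42")
non_digit = re.match(r"\D+", "error code 42").group(0)

print(greedy)
print(first_field)
print(non_space, repr(non_digit))
```

The negated class says exactly where the field ends, so the engine has nothing to guess and nothing to give back.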
Be nice to your RegEx engine
• MS-DOS taught us to be laaaaaaaaaaaaaaaaazy with *.*
• A regex engine matches character by character, and backtracks when an attempt fails.
• Match in as few steps as possible
Regexes in Splunk
Search Language: “rex”, “erex”, “regex”
Indexing: Filtering data (in|out), line breaking, timestamp extraction
Field Extraction
IFX
• Splunk has a built-in "interactive field extractor" (IFX)
• It can be useful. Give it samples of data, and it will attempt to learn a regex and persist a single field
• It has a limit on the number of events it displays in its viewer.
• You might not see your search results when using it. Huh?
what if we could use that "intelligent" stuff IFX was doing, but in the search language?
meet "erex"
• Allows you to give it examples, but it works on your search results
• Allows you to give it counterexamples of stuff you don't want to match on
• Builds you a proper rex command
...there's an app for that, right?
Field Extractor App
• Imagine you could use your mouse, highlight fields, name them, persist them, go home early and never write regex.
• David Carasso's Field Extractor app is like a "workbench for field extraction"
• Download it from SplunkBase
searching with regex
the | regex search command
• Did you know Splunk crushes all terms to lower case?
• If you need to look for specific patterns or even words and respect the case of the original events, use | regex
• index=splunktv | regex _raw="(MP3|M4A)" <-- notice this is a case-sensitive pattern match.
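The difference between a case-insensitive term search and a case-sensitive pattern match can be sketched in Python. The three events are hypothetical, and `re.IGNORECASE` only stands in for the lower-casing behavior described above; this is not how Splunk is implemented.

```python
import re

# Hypothetical raw events, invented for this example.
events = [
    "GET /media/episode1.MP3",
    "GET /media/episode2.mp3",
    "GET /media/trailer.M4A",
]

# Case-sensitive, like the pattern handed to "| regex" on the slide.
case_sensitive = [e for e in events if re.search(r"(MP3|M4A)", e)]

# Case-insensitive, roughly what a plain term search would give you.
case_insensitive = [e for e in events if re.search(r"(MP3|M4A)", e, re.IGNORECASE)]

print(len(case_sensitive), len(case_insensitive))
```

Only the upper-case file extensions survive the case-sensitive filter.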
What about good ole Rex?
• Search-time field extractions via your own regexes -- in the search language
• Name your fields
• Reuse everyone else's work!
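"Name your fields" means named capturing groups. A minimal Python sketch follows; the event and the field names `user` and `src_ip` are made up for illustration. Python spells a named group `(?P<name>...)`, while the rex examples later in this deck use `(?<name>...)`.

```python
import re

# Hypothetical event, invented for this example.
event = "Aug 18 10:15:02 web01 sshd[4242]: Failed password for mwilde from 10.1.2.3"

# Each named group becomes a field name, the way rex names its extractions.
pattern = re.compile(r"for (?P<user>\S+) from (?P<src_ip>\S+)")

match = pattern.search(event)
fields = match.groupdict() if match else {}

print(fields)
```

The rex equivalent would look roughly like `| rex "for (?<user>\S+) from (?<src_ip>\S+)"`.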
a few more tricks for you
host extraction irritates me
regex in host extraction
• Splunk will attempt to do the right thing. The log source will likely make it hard for Splunk -- and you'll blame Splunk
• props.conf & transforms.conf are needed to properly extract hostnames in some cases (F5 BIG-IP and HP networking gear)
• Use default settings in props.conf and use your own settings as well
priority boarding in props.conf
[source::...a...]
TRANSFORMS-ahosts = ahostextraction
priority = 1

[source::...z...]
TRANSFORMS-zhosts = zhostextraction
priority = 99

What if the source we were matching against had the word "arizona" in it? It will match both, right? Use "priority" to control matching. 99 is higher than 1, so 99 is the higher priority. Yeah, I know... weird.
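The ordering rule can be sketched in a few lines of Python. This models only what is stated above (both stanzas match "arizona", and the larger number wins); it does not try to reproduce how Splunk actually merges configuration. Treating "..." as "match anything" and the helper names `to_regex` and `winning_transform` are assumptions made for this sketch.

```python
import re
from typing import Optional

# (source pattern, transform name, priority) copied from the two stanzas.
stanzas = [
    ("...a...", "ahostextraction", 1),
    ("...z...", "zhostextraction", 99),
]

def to_regex(source_pattern: str) -> str:
    # Assumption: "..." behaves like a "match anything" wildcard.
    return ".*".join(re.escape(part) for part in source_pattern.split("..."))

def winning_transform(source: str) -> Optional[str]:
    # Keep every stanza whose pattern matches, then pick the highest priority.
    matches = [s for s in stanzas if re.fullmatch(to_regex(s[0]), source)]
    if not matches:
        return None
    return max(matches, key=lambda s: s[2])[1]

print(winning_transform("/var/log/arizona.log"))
```

A source with an "a" but no "z", such as the hypothetical "/var/log/alaska.log", only matches the first stanza, so priority never comes into play.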
Basic Training Complete!
Let's do something more difficult
Splunk is so smart
except when it's not
<policy id="3">Finjan HTTPS policy</policy>
<cp id="5" name="Active Content" display_name="Active Content"/>
<group id="5002" cp_id="5" type="0">Full profile - Binary Behavior</group>
<item id="28015">Format error in CRL lastUpdate field</item>
<item id="3265747">*.served.com/*</item>
<rule_comment id="2" name="Block certificate validation errors"><![CDATA[Block HTTPS content without a valid certificate]]></rule_comment>
AUTO-KV pulled the “id” field out of every event. Yay!!!
“id” is not the field name
look closer Agent Starling
<policy id="3">Finjan HTTPS policy</policy>
<cp id="5" name="Active Content" display_name="Active Content"/>
<group id="5002" cp_id="5" type="0">Full profile - Binary Behavior</group>
<item id="28015">Format error in CRL lastUpdate field</item>
<rule_comment id="2" name="Block certificate validation errors"><![CDATA[Block HTTPS content without a valid certificate]]></rule_comment>
We can educate Splunk on dynamically pulling the KEY and VALUE with...
Dynamic Key Value Extraction
...but tailored for our needs
REGEX for the “KEY” is \<([^\=]+)\=
Less than, followed by (anything that is not an equal sign -- greedy match), followed by an equal sign
<policy id="3"> <cp id="5" <item id="28015">
REGEX for the “VALUE” is \"([^\"]+)\"\>
A quote, followed by (anything that is not a quote -- greedy match), followed by a quote, followed by a greater-than sign
<policy id="3"> <cp id="5" <item id="28015">
keep going dude!
Persist your sweet dynamic KV patterns
props.conf & transforms.conf required
Create an entry in transforms.conf like this:
[mym86kv]
REGEX = \<([^\=]+)\=\"([^\"]+)\"\>
FORMAT = $1::$2
Create an entry in props.conf like this:
[m86_dynamic_kv]
REPORT-m86fields = mym86kv
<policy id="3">Finjan HTTPS policy</policy>
$1 = policy id, $2 = 3
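To see what $1 and $2 end up holding, the same pattern can be tried in Python against a few of the sample events. This is only a sketch of the regex behavior: the `extract` helper is hypothetical, the backslash escapes are dropped because Python does not need them, and any clean-up Splunk applies to key names is not modeled.

```python
import re

# Sample events taken from the slides.
events = [
    '<policy id="3">Finjan HTTPS policy</policy>',
    '<item id="28015">Format error in CRL lastUpdate field</item>',
    '<item id="3265747">*.served.com/*</item>',
]

# Same pattern as the REGEX setting, without the optional escapes.
KV = re.compile(r'<([^=]+)="([^"]+)">')

def extract(event):
    # Mimic FORMAT = $1::$2 : group 1 is the key, group 2 is the value.
    return {key: value for key, value in KV.findall(event)}

fields = [extract(e) for e in events]
print(fields)
```

Note that the key is "policy id" or "item id", not just "id", which is the whole point of the dynamic extraction.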
Dang it! It wasn’t perfect
Some of our events don’t finish their XML tag right after a quote
Create an entry in transforms.conf like this:
[mym86kv]
REGEX = \<([^\=]+)\=\"([^\"]+)[^\>]+\>
FORMAT = $1::$2
Create an entry in props.conf like this:
[m86_dynamic_kv]
REPORT-m86fields = mym86kv
<rule_comment id="690" name="Log everything except Image files"><![CDATA[Logs all content passing through the system except for ......
$1 = rule_comment id, $2 = 690
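A quick side-by-side in Python shows why the first pattern missed this event and the relaxed one does not. The CDATA text in the sample is shortened for the example, since the slide truncates it.

```python
import re

STRICT = re.compile(r'<([^=]+)="([^"]+)">')      # tag must close right after the quote
RELAXED = re.compile(r'<([^=]+)="([^"]+)[^>]+>')  # allow more attributes before ">"

# Event from the slide, with the CDATA body shortened for this example.
event = ('<rule_comment id="690" name="Log everything except Image files">'
         '<![CDATA[Logs all content]]></rule_comment>')

strict_hits = STRICT.findall(event)
relaxed_hits = RELAXED.findall(event)

# The relaxed pattern still handles the simple case.
simple_hits = RELAXED.findall('<policy id="3">Finjan HTTPS policy</policy>')

print(strict_hits, relaxed_hits, simple_hits)
```

The extra `[^>]+` soaks up the closing quote plus whatever else sits before the end of the tag.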
Think you’re good?
Try extracting the “service” field
2011/07/21 19:27:22.071 [(ninja-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)[ninja-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] recoverSubject(V1.21.47,OSM:1t7Dg201000:i:1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig,172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i:1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms
Your job is to create a multi-valued field, as the “service” field exists multiple times in each event
Look for the obvious patterns
Your brain will tell you to look for “anything after the first comma” after that left bracket and before the second comma
2011/07/21 19:27:22.071 [(ela4-fe96,opensocial,/makeRequest,2011/07/21 19:27:21.978)[ela4-be04,auth,Auth2Service.recoverSubject]] [] [Auth2Service] recoverSubject(V1.21.47,OSM:1t7Dg201000:i:1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,null,shindig,172.17.207.243,)=[Principal[3],[OSM:1t7Dg201000:i:1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,false,1h]] in 1ms
...and your brain was wrong.
Dang... what are we gonna do now?
This is NOT a “service”
What is common with “services”
They’re all alphanumeric or “word” characters
0-9A-Za-z_
But what about the preceding text
Left bracket followed by some stuff, followed by a comma... but it's not consistent. Sometimes a “(” left paren is in there.
This is a better match
Left bracket, followed by anything in this character list (greedy), followed by a comma. Then create a capturing group of text that matches upper- or lower-case Roman alphabet characters -- greedy (as many times as possible). End capturing group, then followed by a comma.
\[[\(\-a-zA-Z0-9]+,([a-zA-Z]+),
Say the matching pattern out loud. It will help
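Here is a sketch comparing the "obvious" idea with the tighter pattern, run against the sample event in Python. The `NAIVE` regex is my own guess at how the "first comma to second comma" idea would be spelled; only `BETTER` comes from the slide.

```python
import re

# The sample event from the slides, split across literals for readability.
event = (
    "2011/07/21 19:27:22.071 [(ela4-fe96,opensocial,/makeRequest,"
    "2011/07/21 19:27:21.978)[ela4-be04,auth,Auth2Service.recoverSubject]] "
    "[] [Auth2Service] recoverSubject(V1.21.47,"
    "OSM:1t7Dg201000:i:1311276436:1d00a2fc1f9addd936af12ed5c430a169c362af8,"
    "null,shindig,172.17.207.243,)=[Principal[3],"
    "[OSM:1t7Dg201000:i:1311276439:20d1d0b474927a301376d70f2ad5949a2241e271,"
    "false,1h]] in 1ms"
)

# Assumed spelling of the "obvious" pattern: anything between the first and
# second comma after a left bracket.
NAIVE = re.compile(r"\[[^,]+,([^,]+),")

# The tighter pattern from the slide.
BETTER = re.compile(r"\[[\(\-a-zA-Z0-9]+,([a-zA-Z]+),")

naive_hits = NAIVE.findall(event)
better_hits = BETTER.findall(event)

print(naive_hits)
print(better_hits)
```

The naive pattern also grabs the long OSM tokens further down the event, while the tighter one only accepts brackets followed by host-like text and a purely alphabetic value.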
Can’t be too hard to extend it, right?
Left bracket, followed by anything in this character list (greedy), followed by a comma. Then create a capturing group of text that matches upper- or lower-case Roman alphabet characters -- greedy (as many times as possible). End capturing group, then followed by a comma. Followed by anything that is NOT a left bracket, followed by.....
\[[\(\-a-zA-Z0-9]+,([a-zA-Z]+),[^\[]+\[[\(\-a-zA-Z0-9]+,([a-zA-Z]+),
Sad Trombone
This one has four services
2011/07/21 19:27:27.596 [(ninja4-fe29,genie,/handle,131292312,2011/07/21 19:27:27.310)[ninja4-be716,lmt,PbContentService.write<tetherAccountData;default>][ninja4-be05,tether,TetherAccountService.bindAccount][ninja4-be393,auth,Auth2Service.upgradeSubject]] [] [Auth2Service] upgradeSubject(V1.21.49,"INT",[LIM:131292312:s:1311276361:b8f677d957eb3f7b9622247b72374c791720bc17,true],{internalAppName=twitter-sync},"tether",null)=[Principal[2],[INT:131292312/twitter-sync:1311276447:df9dd0175bd2e6107c2dfae36dfd9a9dc11f0631,false,20y]] in 15ms
Remember “rex”?
He devours data
But you can make “rex” very hungry and control how much lunch he eats. By default, he only gets “one helping of meat”
Using max_match with rex
You limit or expand the number of times it runs
Instead of that last regex that matched “two” services, let's just match one, and tell rex to repeat our pattern matching
rex max_match=20 "\[[\(\-a-zA-Z0-9]+,(?<service>[a-zA-Z]+),"
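The "match one, repeat it" idea maps onto iterating over matches. Below is a rough Python analogy, not Splunk's implementation: `rex_like` is a hypothetical helper, and the event is the four-service sample cut off after `[Auth2Service]` to keep the example short.

```python
import re
from itertools import islice

# Single-service pattern from the slide, with Python's named-group spelling.
SERVICE = re.compile(r"\[[\(\-a-zA-Z0-9]+,(?P<service>[a-zA-Z]+),")

def rex_like(text, max_match=1):
    """Collect up to max_match values of the 'service' group."""
    return [m.group("service") for m in islice(SERVICE.finditer(text), max_match)]

# Four-service event from the slides, shortened after "[Auth2Service]".
event = (
    "2011/07/21 19:27:27.596 [(ninja4-fe29,genie,/handle,131292312,"
    "2011/07/21 19:27:27.310)[ninja4-be716,lmt,"
    "PbContentService.write<tetherAccountData;default>]"
    "[ninja4-be05,tether,TetherAccountService.bindAccount]"
    "[ninja4-be393,auth,Auth2Service.upgradeSubject]] [] [Auth2Service]"
)

one_helping = rex_like(event)           # default: a single match
all_services = rex_like(event, max_match=20)

print(one_helping, all_services)
```

Because the pattern describes one service only, it works no matter how many services an event carries, up to the limit you set.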
You can persist this in config files
props.conf & transforms.conf required
Create an entry in transforms.conf like this:
[myepicregex]
REGEX = \[[\(\-a-zA-Z0-9]+,(?<service>[a-zA-Z]+),
MV_ADD = TRUE
Create an entry in props.conf like this:
[ninjasocial]
REPORT-ninjafields = myepicregex
And now for something difficult
gaming logs - Team Fortress
L 08/02/2011 - 11:46:05: "The Administrator<61><BOT><Red>" killed "MoreGun<56><BOT><Blue>" with "flamethrower" (attacker_position "-2677 2177 -127") (victim_position "-2555 2323 -127")
I need the data
gaming logs - Team Fortress
Who’s who?
How do we know who did what to whom?
actor = The Administrator, actor_id = 61, actor_type = BOT, actor_team = Red
actee = MoreGun, actee_id = 56, actee_type = BOT, actee_team = Blue
Didn’t we see this slide before?
How do we know who did what to whom?
See that pattern? Remember “max_match”?
"The Administrator<61><BOT><Red>" "MoreGun<56><BOT><Blue>"
Using rex / mv_add, let's capture it into some temporary “multi-value” fields
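The slides do not show the regex for this step, so the `PLAYER` pattern below is my own assumption about one way to write it. The sketch collects every player block in the event into temporary lists, named after the multi-value fields used in this deck.

```python
import re

# The Team Fortress event from the slides.
event = ('L 08/02/2011 - 11:46:05: "The Administrator<61><BOT><Red>" killed '
         '"MoreGun<56><BOT><Blue>" with "flamethrower" '
         '(attacker_position "-2677 2177 -127") (victim_position "-2555 2323 -127")')

# Assumed pattern: "name<id><type><team>" inside double quotes.
PLAYER = re.compile(
    r'"(?P<name>[^"<]+)<(?P<id>\d+)><(?P<type>[^>]+)><(?P<team>[^>]+)>"'
)

matches = list(PLAYER.finditer(event))

# One list per field, in order of appearance (like a multi-value field).
actor_name_z = [m.group("name") for m in matches]
actor_id_z = [m.group("id") for m in matches]
actor_type_z = [m.group("type") for m in matches]
actor_team_z = [m.group("team") for m in matches]

print(actor_name_z, actor_id_z, actor_type_z, actor_team_z)
```

Quoted strings without angle brackets, such as "flamethrower" and the positions, do not match, so only the two players are captured.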
“Temporary” Multi-Value Fields
actor_name_z = The Administrator, MoreGun
actor_id_z = 61, 56
actor_type_z = BOT, BOT
actor_team_z = Red, Blue
Evaluate & Transform with “mvindex”
multi-value fields have a “position value” in the array
mvindex position: 0, 1
actor_name_z = The Administrator (0), MoreGun (1)
actor_id_z = 61 (0), 56 (1)
actor_type_z = BOT (0), BOT (1)
actor_team_z = Red (0), Blue (1)
It's time for our fields to split up!
multi-value fields have a “position value” in the array
| eval actor_name = mvindex(actor_name_z,0)
| eval actee_name = mvindex(actor_name_z,1)
actor_name = The Administrator
actee_name = MoreGun
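In Python terms, the two eval statements amount to zero-based list indexing. The `mvindex` function below is a stand-in written for this sketch, and returning None for an out-of-range position is my own choice here.

```python
def mvindex(values, index):
    """Return the value at a zero-based position, or None when out of range."""
    if 0 <= index < len(values):
        return values[index]
    return None

# Temporary multi-value field from the previous step.
actor_name_z = ["The Administrator", "MoreGun"]

actor_name = mvindex(actor_name_z, 0)  # who did it
actee_name = mvindex(actor_name_z, 1)  # who it was done to

print(actor_name, actee_name)
```

The same two lookups on the id, type and team lists would give the remaining actor and actee fields.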
Resources
• regexlib.com
• regular-expressions.info
• gskinner.com/RegExr
• Reggy / RegExhibit
• RegexBuddy (JGSoft.com)
Questions, just ask!
Michael Wilde, Splunk Ninja