DerbyCon 7.0 Legacy: Regular Expressions (Regex) Overview

Post on 21-Jan-2018


Regular Expressions (Regex) Overview

September 24, 2017

Matt Scheurer

@c3rkah

Slides: https://www.slideshare.net/cerkah

((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]))

About Me

Matt Scheurer

Systems Security Engineer

Working in the Financial Services Industry

Meeting Organizer for the CiNPA Security SIG

DerbyCon 5.0 “Unity” Speaker

Certifications: CompTIA Security+, MCP, MCPS, MCTS, MCSA, and MCITP

What Regular Expressions are Not!

● The term “Regular Expressions”, often shortened to “Regex”, should not be confused with “Old Sayings”

– Adages, Allegories, Aphorisms, Axioms, Clichés, Epigrams, Idioms, Hyperboles, Maxims, Platitudes, Proverbs, Truisms, etc.

When it comes to “Old Sayings”...

You would be hard pressed to find anyone better at recalling and retelling old sayings than my own mother...

What is Regex?

Regex is a common syntax used to match patterns when parsing text data or output. Regex capture groups are used to extract strings of specific data into reference points for retrieval or processing.

Why learn Regex?

● Regex is a great skill set to have in the back pocket of nearly any interdisciplinary role across the Information Technology landscape

● Uses include:

– Application and Software Development

– Database queries

– Linux Administration and power user commands such as grep, awk, sed, find, etc.

– Searching through any type of text data or system logs

Regex uses in InfoSec

● Content filtering

● Input validation

● NGFW / UTM Layer 7 definitions

● Parsing large volumes of data or system logs to pick out specific data points of interest

● SIEM systems

– Building or refining entire searches, or performing advanced parsing to narrow down extraneous information

– Finding specific log events or log event items and sub-data

● Understand the underpinnings of many security products and utilities

Regex Variations and Variances

Different flavors of Regex

● While all versions of Regex share common conventions, there are proprietary differences across the various Regex engines

● Popular Regex Engines include:

– Perl, PCRE, PHP, .NET, Java, JavaScript, XRegExp, VBScript, Python, Ruby, Delphi, R, Tcl, POSIX, and others

Regex Resources

● Online Learning Site - https://regexone.com/

● Regex Test Site - http://regexr.com/

● Tutorial Site - http://www.rexegg.com/

● Countless Additional Resources - https://www.google.com/search?q=regex

● Further Reading - https://en.wikipedia.org/wiki/Regular_expression

Let’s Begin...

Regex Basics – Simple Matching

● Simply type in exactly what you are trying to match

● Text string pattern matching is case-sensitive!

– NOTE: certain non-alphanumeric characters may require an escape prefix to match

– The escape prefix is the backslash: \
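A minimal sketch of simple matching, assuming Python's standard `re` module (the talk itself is engine-agnostic, and the sample string is made up for illustration):

```python
import re

# Hypothetical sample text, not taken from the slides.
text = "Login failed for user Admin (id=42)"

# Literal text matches exactly as typed, and matching is case-sensitive.
found_exact = re.search(r"Admin", text) is not None
found_lower = re.search(r"admin", text) is not None

# Parentheses are special in Regex, so they need a "\" prefix to match literally.
found_paren = re.search(r"\(id=42\)", text) is not None

print(found_exact, found_lower, found_paren)
```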

Regex Basics – Text Matching

● In addition to typing in an exact text string for an exact match, “\w” will match a single alphanumeric character

– Matches any word character (alphanumeric & underscore)

– In many engines, only matches low-ASCII characters (no accented or non-Roman characters); some engines, such as Python 3, also match Unicode letters by default
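As a rough illustration in Python (an assumption, since the slides do not name an engine), “\w” picks out word characters one at a time. The `re.ASCII` flag is used so the result stays low-ASCII only; the sample string is hypothetical.

```python
import re

# Hypothetical sample: ASCII word characters, a space, and an accented letter.
sample = "user_01 caf\u00e9"

# With re.ASCII, \w matches only A-Z, a-z, 0-9, and underscore.
ascii_word_chars = re.findall(r"\w", sample, flags=re.ASCII)

# Without the flag, Python 3 also treats the accented letter as a word character.
unicode_word_chars = re.findall(r"\w", sample)

print("".join(ascii_word_chars))
print("".join(unicode_word_chars))
```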

Regex Basics – Number Matching

● In addition to typing in an exact numeric string for an exact match, “\d” will match a single digit

– Matches any digit character (0-9)
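A short Python sketch of digit matching (the log line is invented for the example):

```python
import re

# Hypothetical log line.
log_line = "Port 443 opened at 09:15"

# \d matches one digit per match, so findall returns each digit separately.
digits = re.findall(r"\d", log_line)

print(digits)
```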

Regex Basics – Matching a Space

● In addition to typing in an exact string with a space included for an exact match, “\s” will match a space in text

– Matches any whitespace character (spaces, tabs, line breaks)
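One possible use of “\s”, sketched in Python with a made-up record that mixes a tab, a space, and a line break:

```python
import re

# Hypothetical record containing three kinds of whitespace.
raw = "src=10.0.0.1\tdst=10.0.0.2 proto=tcp\n"

# \s matches the tab, the space, and the trailing newline alike.
whitespace = re.findall(r"\s", raw)

# Splitting on \s separates the fields regardless of which whitespace was used.
fields = re.split(r"\s", raw.strip())

print(fields)
```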

Regex Basics – Matching Opposites

● We just looked at a few character classes

– All character classes are case-sensitive

– Specifying those character classes in upper-case changes the pattern match to match the opposite

● “\W”, “\D”, and “\S” respectively translate to

– Not a word character

– Not a digit

– Not whitespace
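The upper-case classes can be tried out with a small Python sketch (the sample string is an assumption for illustration):

```python
import re

# Hypothetical sample string.
sample = "ID: 42 ok"

# \W: everything that is not a word character (the colon and both spaces).
non_word = re.findall(r"\W", sample)

# \D: deleting every non-digit leaves only the digits behind.
digits_only = re.sub(r"\D", "", sample)

# \S: every character that is not whitespace.
non_space = "".join(re.findall(r"\S", sample))

print(non_word, digits_only, non_space)
```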

Regex Basics – Quantifiers

● “.” matches any single character

● “+” suffix matches one or more repetitions

● “*” suffix matches zero or more repetitions

● “?” suffix means the character is optional

● “|” is an ‘or’ separator between alternative characters or patterns

● “^” is a ‘not’ specifier to exclude a character

– Enclosed in square brackets, prefixing the pattern

– [^<pattern>]
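The symbols above can be compared side by side in a short Python sketch. The word lists are invented, and `re.fullmatch` is used so each pattern has to cover the whole word.

```python
import re

# Hypothetical test words.
words = ["color", "colour", "colouur"]

# "?" makes the preceding "u" optional.
optional = [w for w in words if re.fullmatch(r"colou?r", w)]

# "+" requires one or more "u".
one_or_more = [w for w in words if re.fullmatch(r"colou+r", w)]

# "*" allows zero or more "u".
zero_or_more = [w for w in words if re.fullmatch(r"colou*r", w)]

# "." stands in for any single character.
any_char = re.findall(r"c.t", "cat cot c9t ct")

# "|" matches either alternative.
either = re.findall(r"cat|dog", "cat, dog, bird")

# "[^...]" matches any character not listed in the brackets.
not_digits = re.findall(r"[^0-9]", "a1b2")

print(optional, one_or_more, zero_or_more, any_char, either, not_digits)
```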

Regex Basics – Escaped Characters

● What if I want to match reserved characters such as “.”, “+”, “*”, “?”, “|”, “^”, etc. in my pattern against the data?

– Prefix reserved characters with a “\”

● What if I want to match a “\” in my pattern against the data?

– \\
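A hedged Python sketch of escaping. The sample text is hypothetical, and `re.escape` is a Python convenience for escaping a whole literal string; other engines may not offer an equivalent.

```python
import re

# Hypothetical text containing several reserved characters and one backslash.
text = r"Path C:\Temp costs $5.00 (approx.)"

# "$" and "." are reserved, so each is prefixed with "\" to match literally.
price = re.search(r"\$5\.00", text)

# A literal backslash is matched with "\\".
backslashes = re.findall(r"\\", text)

# re.escape adds the "\" prefixes for us when the literal is not known in advance.
needle = "(approx.)"
found = re.search(re.escape(needle), text)

print(price.group(0), len(backslashes), found.group(0))
```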

Regex Basics – Ranges

● In addition to quantifiers (wild cards), ranges may be specified with pattern matching

– Characters are enclosed inside of square brackets “[“ “]” and separated by a hyphen “-”

● Examples:

– [a-z], [A-Z], and [0-9]
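The three example ranges could be exercised like this in Python. The ticket string is made up, and “+” is added so that runs of characters come back together.

```python
import re

# Hypothetical sample string.
sample = "Ticket AB-1234 closed by jdoe"

# Each range matches one character from the span; "+" groups consecutive hits.
lower_runs = re.findall(r"[a-z]+", sample)
upper_runs = re.findall(r"[A-Z]+", sample)
digit_runs = re.findall(r"[0-9]+", sample)

print(lower_runs, upper_runs, digit_runs)
```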

Regex Basics – Repetitions

● In addition to a range quantifier, repetitions may be specified with pattern matching

– The number of character occurrences is specified inside curly braces “{” “}”; a comma “,” separates the minimum and maximum for a range of occurrences

● A{4} matches exactly “AAAA”

● A{1,4} matches “A”, “AA”, “AAA”, or “AAAA”

● A{4,} matches four or more consecutive “A”s
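The three repetition forms can be checked with a small Python sketch. `re.fullmatch` is used so that a longer run of “A” does not count as a match for a shorter pattern; the candidate list is illustrative.

```python
import re

# Hypothetical candidate strings.
candidates = ["A", "AAA", "AAAA", "AAAAAA"]

# {4}: exactly four repetitions.
exactly_four = [s for s in candidates if re.fullmatch(r"A{4}", s)]

# {1,4}: between one and four repetitions.
one_to_four = [s for s in candidates if re.fullmatch(r"A{1,4}", s)]

# {4,}: four or more repetitions.
four_or_more = [s for s in candidates if re.fullmatch(r"A{4,}", s)]

print(exactly_four, one_to_four, four_or_more)
```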

Regex Basics – Line Matching

● The beginning of a line and/or end of a line may be specified in Regex pattern matching

– “^”, matches the beginning (starts with) of a line

– “$”, matches the end of a line

– “^<pattern>$”, matches when the line begins with and ends with the specified pattern
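A sketch of line anchors in Python, with an invented log. One engine-specific assumption: Python only treats “^” and “$” as per-line anchors when `re.MULTILINE` is set; by default they anchor to the whole string.

```python
import re

# Hypothetical three-line log.
log = "ERROR disk full\nINFO ERROR count reset\nERROR\n"

# Lines that start with ERROR.
starts_with = re.findall(r"^ERROR.*", log, flags=re.MULTILINE)

# Lines that end with "full".
ends_with = re.findall(r".*full$", log, flags=re.MULTILINE)

# Lines that consist of ERROR and nothing else.
whole_line = re.findall(r"^ERROR$", log, flags=re.MULTILINE)

print(starts_with, ends_with, whole_line)
```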

Regex Capture Groups

● The true power of Regex is fully realized with defined capture groups

● These essentially define array-like variables for pattern-matched data

– This is how we return the precise data we want, while ignoring the content we do not care about

● Capture groups are defined by patterns enclosed inside of parentheses “(” “)”
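A minimal Python sketch of capture groups, using a made-up log line. The text outside the parentheses has to match but is not returned.

```python
import re

# Hypothetical log line.
line = "2017-09-24 user=matt action=login"

# Two capture groups: one for the user name, one for the action.
match = re.search(r"user=(\w+)\saction=(\w+)", line)

# Groups are numbered from 1 in the order their "(" appears.
user = match.group(1)
action = match.group(2)

print(user, action)
```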

Regex Sub-Capture Groups

● Regex sub-capture groups can be defined by using nested parentheses “(” “)”

– Example:

● “(Pattern (match))”

– First Capture Group = Pattern match

– Second Capture Group = match
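The nested example can be tried directly in Python (the surrounding sentence is invented):

```python
import re

# The slide's pattern applied to a hypothetical sentence.
match = re.search(r"(Pattern (match))", "A Pattern match example")

# Group 1 is the outer parentheses, group 2 the nested pair.
first_group = match.group(1)
second_group = match.group(2)

print(first_group, "|", second_group)
```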

Regex Pattern Matching Problems?

Really Stuck? Just Remember...

Regex Example 1

● Threat Feed: malware-domains

– Latest Blackhole-DNS File list

– "BOOT" format

– http://malware-domains.com/files/BOOT.zip

● Objective: Capture a list of FQDNs

Example 1 – Data Format

Example 1 – Expression

PRIMARY\s(\S+)

Capture Group

amazon.co.uk.security-check.ga

autosegurancabrasil.com

christianmensfellowshipsoftball.org

dadossolicitado-antendimento.sad879.mobi

hitnrun.com.my

houmani-lb.com

maruthorvattomsrianjaneyatemple.org

paypalsecure-2016.sucurecode524154241.arita.ac.tz

tei.portal.crockerandwestridge.com

tonyyeo.com

update-apple.com.betawihosting.net
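A sketch of how the Example 1 expression might be applied in Python. The input lines are an assumption about what the “BOOT” file looks like (the data format slide did not survive in this transcript), reusing domains from the capture list above.

```python
import re

# Assumed layout: "PRIMARY <fqdn> <zone file>". Hypothetical, not quoted from the feed.
boot_data = """PRIMARY amazon.co.uk.security-check.ga blockeddomain.hosts
PRIMARY autosegurancabrasil.com blockeddomain.hosts
PRIMARY hitnrun.com.my blockeddomain.hosts
"""

# \s consumes the space after PRIMARY; (\S+) captures up to the next whitespace.
fqdns = re.findall(r"PRIMARY\s(\S+)", boot_data)

for fqdn in fqdns:
    print(fqdn)
```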

Regex Example 2

● Threat Feed: malware-domains

– Complete Zone File (bind)

– Spyware Domains

– http://malware-domains.com/files/spywaredomains.zones.zip

● Objective: Capture a list of FQDNs

Example 2 – Data Format

Example 2 – Expression

zone\s"(\S+)"

Capture Group

amazon.co.uk.security-check.ga

autosegurancabrasil.com

christianmensfellowshipsoftball.org

dadossolicitado-antendimento.sad879.mobi

hitnrun.com.my

houmani-lb.com

maruthorvattomsrianjaneyatemple.org
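The Example 2 expression could be run like this in Python. The zone lines are a hypothetical approximation of a BIND zone file, since the data format slide is not part of this transcript.

```python
import re

# Assumed BIND-style zone entries; the file path is illustrative only.
zone_data = '''zone "amazon.co.uk.security-check.ga" {type master; file "/etc/namedb/blockeddomain.hosts";};
zone "houmani-lb.com" {type master; file "/etc/namedb/blockeddomain.hosts";};
'''

# The quotes sit outside the parentheses, so only the domain name is captured.
fqdns = re.findall(r'zone\s"(\S+)"', zone_data)

for fqdn in fqdns:
    print(fqdn)
```

The quoted file path is not picked up because it follows “file”, not “zone”.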

Regex Example 3

● Threat Feed: DNS BlackHole

– IP Blacklist

– http://malc0de.com/bl/IP_Blacklist.txt

● Objective: Capture a list of IP addresses

Example 3 – Data Format

Example 3 – Expression

(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})

Capture Group

185.165.29.49

185.91.116.237

76.74.167.171

193.227.248.241

149.210.167.172

216.114.192.21

89.255.9.102

86.109.162.144

85.25.203.171

209.90.88.139
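A Python sketch comparing the Example 3 expression with the stricter octet pattern from the title slide. The input is hypothetical and includes one deliberately invalid address. The strict version adds two things the talk does not cover: non-capturing groups `(?:...)`, and line anchors so that part of an invalid address cannot match.

```python
import re

# Hypothetical blacklist with a comment line and one invalid address.
blacklist = """# sample header line
185.165.29.49
76.74.167.171
999.1.1.1
"""

# Example 3 expression: any 1 to 3 digits per octet, so 999 is accepted.
loose = re.findall(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})", blacklist)

# Octet pattern from the title slide: only 0 through 255.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
strict_pattern = rf"^{OCTET}(?:\.{OCTET}){{3}}$"
strict = re.findall(strict_pattern, blacklist, flags=re.MULTILINE)

print(loose)
print(strict)
```

For a feed that is already known to contain valid addresses, the shorter expression is easier to read and is usually enough.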

Regex Example 4

● Threat Feed: SpamCop

– Spam in progress

– Source of Mail

– wget https://www.spamcop.net/w3m?action=inprogress

● Objective: Capture a list of IP addresses

Example 4 – Data Format

Example 4 – Expression

>(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})<

Capture Group

182.139.29.84

201.37.197.39

182.151.104.105

119.5.175.57

119.5.175.57
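A rough Python sketch of Example 4. The HTML is invented to resemble a table of addresses (the real page markup is not shown in this transcript). Because the capture list above contains a repeated address, the sketch also removes duplicates, which goes beyond what the slide shows.

```python
import re

# Hypothetical HTML fragment; only addresses wrapped directly in tags should match.
html = (
    "<tr><td>182.139.29.84</td><td>seen by relay 10.0.0.1 today</td></tr>\n"
    "<tr><td>119.5.175.57</td></tr>\n"
    "<tr><td>119.5.175.57</td></tr>\n"
)

# ">" and "<" anchor the match to tag boundaries without being captured.
ips = re.findall(r">(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})<", html)

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_ips = list(dict.fromkeys(ips))

print(ips)
print(unique_ips)
```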

Regex Example 5

● Threat Feed: Malware Domain List

– Complete database in CSV format

– http://www.malwaredomainlist.com/mdlcsv.php

● export.csv

● Objective: Capture a list of FQDNs

Example 5 – Data Format

Example 5 – Expression

"\d{4}\/\d{2}\/\d{2}_\d{2}:\d{2}","(\w[\.|\-|\w]+)

Capture Group

down.mykings.pw

ssl-6582datamanager.de

privatkunden.datapipe9271.com

alegroup.info

fourthgate.org

dieutribenhkhop.com

dieutribenhkhop.com

amazon-sicherheit.kunden-ueberpruefung.xyz

sarahdaniella.com
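The Example 5 expression, sketched in Python against hypothetical CSV rows. The column layout after the second field is assumed, and the addresses are documentation-range placeholders.

```python
import re

# Assumed row layout: quoted timestamp, quoted host/path, then other quoted fields.
csv_data = (
    '"2017/09/20_10:15","down.mykings.pw/ups.rar","203.0.113.5","-","trojan"\n'
    '"2017/09/21_08:02","ssl-6582datamanager.de/","198.51.100.7","-","phishing"\n'
)

# The timestamp must match but is not captured; the group takes the host name
# and stops at the first character that is not a dot, pipe, hyphen, or word character.
pattern = r'"\d{4}\/\d{2}\/\d{2}_\d{2}:\d{2}","(\w[\.|\-|\w]+)'
fqdns = re.findall(pattern, csv_data)

for fqdn in fqdns:
    print(fqdn)
```

Inside square brackets the “|” is a literal pipe and not an ‘or’, so a tighter class such as `[.\w-]` would probably serve the same purpose on this data.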

Keeping the Regex Saw Sharpened

Upcoming Speaking Engagements

Questions?

The End

Big Thank You and shout out to my dear sweet mother! She’s a very special person in my life, and a fantastic Grandmother!

...Plus she endured the unenviable task of raising me as a child and teenager. :)

Pictured above: My mom with my son

Love you mom!