
Why use Regular Expressionsfaculty.cse.tamu.edu/.../RegularExpressionsNotes.docx · Web viewRegular...

Date post: 23-Aug-2020
Transcript
Page 1: Why use Regular Expressionsfaculty.cse.tamu.edu/.../RegularExpressionsNotes.docx · Web viewRegular Expressions Why use Regular Expressions Pull or filter data from larger file s

Regular Expressions

Why use Regular Expressions
Pull or filter data from larger files
Validation
    o HTML forms
    o GUI forms
Every language supports Reg Ex
    o C++/Java/Python/Bash/CSH

Regular expressions (REs)
Scanners are based on regular expressions that define simple patterns
    o Simpler and less expressive than BNF
Uses some of the same notation as EBNF
Basic operations are set union, concatenation, Kleene closure
    o Plus: parentheses, naming patterns
No recursion!
Why use??
    o being able to name patterns is just syntactic sugar
    o using parentheses to group things is just syntactic sugar, provided we specify the precedence and associativity of the operators (i.e., |, * and “concat”)
    o “syntactic sugar” refers to syntax within a programming language that is designed to make things easier to read or to express
A regular language is a language that can be defined by a regular expression
http://youtu.be/394NxYBDaiA (about 11 minutes)
    o does a great job of explaining much of what is below
Great training website!!
    o https://regexone.com/


Basic Regular Expression Notes

Each entry gives the Syntax and its Meaning, followed by an Example with the strings it Matched and the strings it did not match (!Match).

.       Any single non-null character
        Example: Sh.t
        Matched: Shot, Shut, etc.
        !Match:  Sht, Shoot

a       This particular character alone
        Example: a
        Matched: a
        !Match:  any character other than a

ab      These particular characters joined, alone
        Example: tha.
        Matched: that, than, thal, thay
        !Match:  any other joined characters than ab

a|b     Or
        Example: demo|example
        Matched: demo, example
        !Match:  c, ab, ba, aa

*       Zero or more times
        Example: go*gle
        Matched: ggle, gogle, google, gooooogle
        !Match:  gooogoogle

[abc]   any of these single characters
        Example: tha[nt]
        Matched: than, that
        !Match:  tha, thant

[a-d]   any of these single characters in range
        Example: so[b-f]
        Matched: sob, soc, sod, soe, sof
        !Match:  so, sobb, soy

[^abc]  none of these characters (notice ^ leads off)

[^a-d]  not a character within this range (notice ^ leads off)
        Example: so[^b-f]
        Matched: soa, sog, soh, sot, sos
        !Match:  sob, soc, sod, soe, sof

^       starts with (notice NOT within [grouping])
        Example: ^The
        Matched: These, The, Theatre, Theta
        !Match:  these, Tomas, Darn

$       string or ϵ ends with
        Example: ton$
        Matched: cotton, Clinton, ton, Scranton, Easton
        !Match:  jerk, certain

?       Zero or one (need a value in front)
        Example: (dos)?e
        Matched: dose, e
        !Match:  nose, doe
        Example: doss?e (the s in front of ? is targeted)
        Matched: dosse, dose
        !Match:  dossse, doddoss, dosss

+       one or more (need a value in front)
        Example: (dos)+e
        Example: doss+e (the s in front of + is targeted)
        is the same as below, but uses fewer resources

{n}     n times exactly (need a value in front)
        Example: w{3}
        Matched: www
        !Match:  ww, w, wwww
        Example: (nag){3} = ???

{n,m}   from n to m times (need a value in front)
        Example: (blah){3,5}
        Matched: blahblahblah, blahblahblahblah
        !Match:  blah, blahblahblah blahblahblah

{n,}    at least n times (need a value in front)

[]      group
\       Escape
\s      White Space
\S      non-White Space
\d      digit character
\D      non-digit character
\w      Word
\W      non-Word (punctuation, spaces)
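Several rows of the table can be checked mechanically. The following is a minimal Python sketch, assuming the Matched and !Match columns mean that the pattern has to cover the whole string (hence `re.fullmatch`). The `rows` list and the `check` helper are illustrative names, not part of the notes.

```python
import re

# (pattern, strings that should match, strings that should not match),
# copied from rows of the table above.
rows = [
    (r"Sh.t",     ["Shot", "Shut"],                ["Sht", "Shoot"]),
    (r"go*gle",   ["gogle", "google", "gooooogle"], ["gooogoogle"]),
    (r"tha[nt]",  ["than", "that"],                ["tha", "thant"]),
    (r"so[b-f]",  ["sob", "sof"],                  ["so", "sobb", "soy"]),
    (r"so[^b-f]", ["soa", "sog", "sot"],           ["sob", "soe"]),
    (r"doss?e",   ["dose", "dosse"],               ["dosss"]),
    (r"w{3}",     ["www"],                         ["ww", "w", "wwww"]),
]

def check(pattern, text):
    """Return True when the pattern matches the whole string."""
    return re.fullmatch(pattern, text) is not None

for pattern, good, bad in rows:
    for text in good + bad:
        print(f"{pattern:10} {text:12} {check(pattern, text)}")
```

Swapping `re.fullmatch` for `re.search` changes the question to "does the pattern occur anywhere in the string", which is closer to how the ^ and $ rows are meant to be read.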


Simple Union
thankfully nothing special, but order matters

Union Example 1
A = {grand, ε}, B = {father, mother}
What is AB? (an element of A is then followed by an element of B, i.e. the two sets are concatenated)

AB = {father, mother, grandfather, grandmother}
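The example above can be reproduced with ordinary Python sets. This is a small sketch, assuming the empty string `""` stands in for ε; the `concat` helper is an illustrative name, not something from the notes.

```python
from itertools import product

def concat(A, B):
    """Concatenate two languages: every string of A followed by every string of B."""
    return {a + b for a, b in product(A, B)}

A = {"grand", ""}            # "" plays the role of ε
B = {"father", "mother"}

AB = concat(A, B)
BA = concat(B, A)

print(sorted(AB))
print(sorted(BA))
```

Comparing `AB` with `BA` shows why order matters: the second set contains strings such as "fathergrand" that the first does not.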

RE operators “+”, “ε”, “.” and “?”
The operator “?” means ZERO or ONE!! (Optional)
    o This is different than *, which is 0 or MANY

? operator: ab?c (zero or 1)

Epsilon ε
    o Sometimes we’d like a token that represents nothing
    o This makes regular expression matching more complex, but can be useful
The + operator is commonly used to mean “one or more repetitions” of a pattern
    o We can always do without this
        letter+ == letter letter*
    o So the + operator is just syntactic sugar

+ operator: ab+c (one or more)
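The two small examples (ab?c and ab+c) and the identity letter+ == letter letter* can be tried side by side. A minimal Python sketch, assuming whole-string matching; `classify` is an illustrative helper name.

```python
import re

def classify(text):
    """Return (ab+c, abb*c, ab?c, ab*c) whole-string match results for text."""
    return (
        re.fullmatch(r"ab+c", text) is not None,    # one or more b
        re.fullmatch(r"abb*c", text) is not None,   # one b, then zero or more b
        re.fullmatch(r"ab?c", text) is not None,    # zero or one b
        re.fullmatch(r"ab*c", text) is not None,    # zero or more b
    )

for text in ["ac", "abc", "abbc", "abbbc"]:
    print(text, classify(text))
```

The first two columns always agree, which is the sense in which + is only syntactic sugar; the last two differ as soon as there is more than one b.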


The dot “.” in Reg Expression
    o matches a single character, without caring what that character is
    o dot CANNOT be epsilon

. operator: a.b*c
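A short Python sketch of the a.b*c example, assuming whole-string matching; `dot_matches` is an illustrative helper name. It shows that the dot must consume exactly one character, so it can never stand for ε.

```python
import re

PATTERN = re.compile(r"a.b*c")

def dot_matches(text):
    """Return True when a.b*c matches the whole string."""
    return PATTERN.fullmatch(text) is not None

for text in ["axc", "a c", "axbbbc", "abc", "ac"]:
    print(repr(text), dot_matches(text))
```

In "abc" the dot consumes the b and b* matches zero times. In "ac" the dot would have to consume the c, leaving nothing for the final c, so there is no match. Note that in Python the dot also refuses to match a newline unless the `re.DOTALL` flag is given.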


Regular Expression Edge Values
edge values in an FA (Finite Automaton) can be of varying values and setup
for what we are doing, each edge will contain ONE value

Simplifying RE Edges for Reg. Exp. Understanding (diagrams not reproduced)
Loop
space
string match
Wildcard ? and .
ε (epsilon)
either or

Difference in Grouping
there is a big difference in grouping in Reg. Ex.
    o grouping options
        ( ) all together, whatever is within the ( )s
        [ ] select only ONE from whatever is within the [ ]s

Grouping Differences
(ab)     [ab]
(ab)*    [ab]*
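The difference between the two kinds of grouping shows up as soon as a repetition is attached. A minimal Python sketch, assuming whole-string matching; `compare` is an illustrative helper name.

```python
import re

def compare(text):
    """Return ((ab)* result, [ab]* result) for a whole-string match of text."""
    group = re.fullmatch(r"(ab)*", text) is not None   # repeats the pair "ab"
    klass = re.fullmatch(r"[ab]*", text) is not None   # repeats one character: a or b
    return group, klass

for text in ["", "ab", "abab", "a", "ba", "aab", "abc"]:
    print(repr(text), compare(text))
```

Every string accepted by (ab)* is also accepted by [ab]*, but not the other way round: "ba" and "aab" only pass the bracket version.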

Kleene Example 1

A={grand, ε}, B={father, mother}

What is A*B ?????

A*B={father, mother, grandfather, grandmother, grandgrandfather, …}

Kleene Example 2

(a | b | c)* = {ε, "a", "b", "c", "aa", "ab", ..., "bccabb", ...}
[a-c]* = …
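A Kleene closure is infinite, but a finite slice of it can be listed and tested. The sketch below is a Python illustration under that assumption; `kleene` is a hypothetical helper that enumerates every string over an alphabet up to a chosen length, with `""` standing in for ε.

```python
import re
from itertools import product

def kleene(alphabet, max_len):
    """All strings over alphabet with at most max_len characters, shortest first."""
    out = []
    for n in range(max_len + 1):
        out.extend("".join(chars) for chars in product(alphabet, repeat=n))
    return out

strings = kleene("abc", 2)
print(strings)

# Both notations accept exactly the same strings in this slice.
same = all(
    (re.fullmatch(r"(a|b|c)*", s) is not None) == (re.fullmatch(r"[a-c]*", s) is not None)
    for s in strings
)
print(same)
```

Raising `max_len` grows the list quickly (3 to the power n new strings at each length), which is a reminder that the set itself has no end.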

Order of operations
this is important, can really switch things up

Precedence of operators (highest to lowest)
( )s
* +
Concatenation
|
All the operators are left associative

Example
(A) | ((B)* (C)) is equivalent to A | B * C
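The precedence rules can be observed by comparing the bare expression with its fully parenthesised form and with a different grouping. A minimal Python sketch using lowercase letters for A, B and C and assuming whole-string matching; `table` is an illustrative helper name.

```python
import re

IMPLICIT = r"a|b*c"           # relies on precedence: *, then concatenation, then |
EXPLICIT = r"(a)|((b)*(c))"   # the fully parenthesised form from the example
REGROUPED = r"(a|b)*c"        # a different grouping, for contrast

def table(text):
    """Return whole-string match results for the three patterns."""
    return tuple(
        re.fullmatch(p, text) is not None
        for p in (IMPLICIT, EXPLICIT, REGROUPED)
    )

for text in ["a", "c", "bbc", "ac", "abc"]:
    print(text, table(text))
```

The first two columns agree on every input, while the regrouped pattern rejects "a" and accepts "ac" and "abc".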


Complete this exercise. $ is the delimiter character showing where the regular expression begins and ends. Strings to be matched start and end with non-blank characters: there are no leading or trailing blanks.

.  matches any character. WILDCARD (for one char.)
*  means zero or more instances of
?  means optional
+  means one or more instances of

There can be more than ONE correct answer per question

1 Which of the following matches regexp $a(ab)*a$
    1) abababa
    2) aaba
    3) aabbaa
    4) aba
    5) aabababa
2 Which of the following matches regexp $ab+c?$
    1) abc
    2) ac
    3) abbb
    4) bbc
3 Which of the following matches regexp $a.[bc]+$
    1) abc
    2) abbbbbbbb
    3) azc
    4) abcbcbcbc
    5) ac
    6) asccbbbbcbcccc

Answers
Try these on your own. The positive list should be all red, the negative list should be all black.
#1. http://regex.sketchengine.co.uk/cgi/ex1.cgi
#2. http://regex.sketchengine.co.uk/cgi/ex2.cgi
#3. http://regex.sketchengine.co.uk/cgi/ex3.cgi
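Once the exercise has been attempted by hand, the answers can also be checked locally. The following Python sketch assumes the $ delimiters are dropped and that a candidate counts only if the whole string is matched; `exercises` and `matching` are illustrative names.

```python
import re

# Patterns from the three questions, with the $ delimiters removed.
exercises = {
    r"a(ab)*a": ["abababa", "aaba", "aabbaa", "aba", "aabababa"],
    r"ab+c?":   ["abc", "ac", "abbb", "bbc"],
    r"a.[bc]+": ["abc", "abbbbbbbb", "azc", "abcbcbcbc", "ac", "asccbbbbcbcccc"],
}

def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

for pattern, candidates in exercises.items():
    print(pattern, "->", matching(pattern, candidates))
```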


The use of a delimiter character ($ in the example above, / in many JavaScript versions) depends on the language
    o (none needed in C++/Java)

Filtering
remember you are given a “massive” amount of data in which you are searching for matches
the reg. ex. is going to filter out the strings that don’t match and produce matches

Where to go for support
Because I can’t remember everything
    o http://www.regular-expressions.info/anchors.html
Tutorials
    o https://regexcrossword.com/

1. In regular-expressions.info, review the documentation on the left side of the page for:
    a. Word Boundaries
    b. Repetition
    c. Dot
2. In regexcrossword.com
    a. create an account
    b. start the “Tutorial” portion
    c. (as of 3/10/17) the last one (Space) is a little tricky, and may not exactly tell you that the Tutorial portion was completed.
    d. Use the “Help” to view the various Reg. Ex. forms
3. Complete this problem
    a. http://regex.sketchengine.co.uk/cgi/ex4.cgi


Testing your Regular Expression
You can certainly buy/download a Regular Expression editor that will show results
RegEx testers on defined text
    o these will already have sample text that you will try (filter) your Reg. Ex. on
    o http://regexr.com/
        try /l{2}/g

Reg. Ex. and Words and spacing around Data
some of the other features will help with real life applications

Reg. Ex. handling words and spacing
\s  White Space
\S  non-White Space
\d  digit character
\D  non-digit character
\w  Word
\W  non-Word (punctuation, spaces)
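The shorthand classes are easiest to understand by running them over one line of text. A minimal Python sketch; the `sample` string is made up for illustration.

```python
import re

sample = "Room 42, Bldg-7\tnoon"   # hypothetical input containing a tab

digits     = re.findall(r"\d", sample)    # single digit characters
words      = re.findall(r"\w+", sample)   # runs of word characters
whitespace = re.findall(r"\s", sample)    # spaces, tabs, newlines
non_word   = re.findall(r"\W", sample)    # punctuation and whitespace
chunks     = re.findall(r"\S+", sample)   # runs of non-whitespace

print(digits)
print(words)
print(whitespace)
print(non_word)
print(chunks)
```

Each uppercase class is the complement of its lowercase partner, so `\W` picks up exactly the characters that `\w` skips.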

Using RegEx in other Languages
There are differences in some languages!!
    o Minor, but when programming they can be haunting
    o Be careful
Python
    o https://www.debuggex.com/cheatsheet/regex/python
JavaScript
    o https://www.debuggex.com/cheatsheet/regex/javascript
What were the differences between the two??


Coded examples of RegEx
In various languages

Languages and RegEx
Java - Simple String
import java.util.regex.*;

String expression = "JHGADEEZroots";
String pattern = "(DEE?)";

Pattern cool = Pattern.compile(pattern);
Matcher match = cool.matcher(expression);
if (match.find()) {
    System.out.println("Found value: " + match.group(0));   // group(0) is the whole match
    //System.out.println("Found value: " + match.group(1));
    //System.out.println("Found value: " + match.group(2));
}
else {
    System.out.println("NO MATCH");
}

Java - File IO
import java.io.*;
import java.util.regex.*;

…
System.out.println("Regex on a text file");
String allData = "";
try {
    String line2 = "";
    FileInputStream fstream = new FileInputStream("courses.txt");
    BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
    // keep the line breaks so a match cannot span two lines
    while ((line2 = br.readLine()) != null) { allData += line2 + "\n"; }
    br.close();
} catch (Exception ex) {
    System.out.println("Could not read courses.txt: " + ex.getMessage());
}

// Pattern to look for: four capital letters followed by three digits.
String pattern = "[A-Z]{4}[0-9]{3}";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// allData is the string to be scanned
Matcher m2 = r.matcher(allData);
while (m2.find()) { System.out.println(m2.group()); }


Python (both simple and File IO)
#!/usr/bin/python
import re

print("Regex on a string")
line = "This class is CMSC433"
# findall returns a list of every non-overlapping match
searchObj = re.findall(r'[A-Z]{4}[0-9]{3}', line)
for course in searchObj:
    print(course)

print("Regex on a text file")
allData = ""
with open("courses.txt", "r") as f:
    for line in f:
        allData += line

searchObj = re.findall(r'[A-Z]{4}[0-9]{3}', allData)
for course in searchObj:
    print(course)

C++ - Simple String
#include <iostream>
#include <regex>
#include <string>

using namespace std;

int main()
{
    string target = "Lupoli needs more work.";
    string replacement = "a vacation.";
    string result;
    regex vacation("m.*");   // first 'm' and everything after it

    cout << "Before regex replace: " << target << endl;
    cout << "regex is: m.*" << endl;

    result = regex_replace(target, vacation, replacement);
    cout << "After regex replace: " << result << endl;

    return 0;
}


JavaScript
Code for Regular Expressions has some items to watch for
/ and / in front of and behind the string you are looking for
    o much like http://regexr.com/
uses string commands in which the regular expression is within those functions
    o search
        returns index number of where it can be found
        notice VERY limiting!!! will only return the first instance!!
        may have to create your own search function that will return an array of starting index values
    o replace
    o test
        returns a Boolean to see if the regular expression passed in returns anything

Sample test code
var dateTime = /\d\d-\d\d-\d\d\d\d \d\d:\d\d/;
console.log(dateTime.test("30-01-2003 15:20")); // → true
console.log(dateTime.test("30-jan-2003 15:20")); // → false
console.log(/'\d+'/.test("'123'")); // → true
console.log(/'\d+'/.test("''")); // → false
console.log(/'\d*'/.test("'123'")); // → true
console.log(/'\d*'/.test("''")); // → true
(from http://eloquentjavascript.net/09_regexp.html)


modifiers
    o global flags used for the file/data read in

JavaScript Modifier Descriptions
Modifier    Description
i           Perform case-insensitive matching
g           Perform a global match (find all matches rather than stopping after the first match)
m           Perform multiline matching

Using the example above, create a new function completeSearch that will return and display a list of indices where the string was found. This should help significantly.


The exec JavaScript Command
tokenizes the data
returns the full string match or each match within data

I hate you Lupoli, now you tell me about exec
<!DOCTYPE html>
<html>
<body>

<p>Search a string for "w3Schools", and display the position of the match:</p>

<button onclick="myFunction()">Try it</button>

<p id="demo"></p>

<script>
function myFunction() {
    var str = "Visit W3Schools! W3SCHOOLS";
    completeSearch(str);
    //document.getElementById("demo").innerHTML = n;
}

function completeSearch(str) {
    var matches = [];
    var regex = /w3Schools/gi;
    var match = "";

    // with the g flag, each exec call returns the next match, then null
    while (match = regex.exec(str))
        matches.push(match.index);

    var res = "";
    for (var i = 0; i < matches.length; i++) {
        res += matches[i];
        if (i < matches.length - 1)
            res += ", ";
    }

    document.getElementById("demo").innerHTML = res;
}
</script>

</body>
</html>

This should display 6 and 17 to the screen. Why? Try another, changing the variables str and regex.
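For comparison, the same list of starting indices can be produced in Python with `re.finditer`. This is a minimal sketch; `complete_search` is an illustrative name chosen to mirror the JavaScript function above.

```python
import re

def complete_search(pattern, text, flags=0):
    """Return the starting index of every non-overlapping match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text, flags)]

text = "Visit W3Schools! W3SCHOOLS"
indices = complete_search(r"w3schools", text, re.IGNORECASE)
print(indices)   # [6, 17]
```

Indices are counted from 0, so the six characters of "Visit " occupy positions 0 to 5 and the first match starts at 6. The nine-character match, the "!" and the space bring the second match to 17.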


Game-planning for a Reg. Ex. application
Overall gameplan
1. what the application is looking for is first
2. knowing what is in the file is second
    a. and how it is set up
    b. you’ll know why XML rocks
3. How are we to read the file?
    a. place it into a text box
    b. upload the file
4. How are we to display the results?

Exercises:
The data for both exercises can be found here (as of 1/6/16)
https://earthquake.usgs.gov/earthquakes/feed/v1.0/quakeml.php
Past Day, All Earthquakes
Save the file as a .txt file
You can also copy and paste into the regexr website for #1 and 2 below
In regexr.com, remember to use Library Cheatsheet

Exercise 1                                          Exercise 2
1. Display town location of earthquake              Display magnitude
2. look for <text> tag                              look for <mag><value> tag
3. use http://regexr.com/                           use http://regexr.com/
4. use http://regexr.com/                           use http://regexr.com/

part 1 - <text>..</text> is fine within the answer
part 2 - see if you can leave out <text>..</text>

(1-4 above are from 1-4 in game planning)

Exercise 3                                          Exercise 4
1. Display town location of earthquake              Display magnitude
2. look for <text> tag                              look for <mag><value> tag
3. use HTML file uploader (use this as help)        same as
4. display within same HTML page using innerText    same as


Substituting using Regular Expressions
s/ (as in s/pattern/replacement/) is usually the substitute command for some scripting languages like BASH/CSH/KSH
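The same substitute idea is available as a function call in most languages. A minimal Python sketch using `re.sub`, reusing the string from the C++ example and the date format from the JavaScript test code; the back-reference rewrite in the second call is an added illustration, not something from the notes.

```python
import re

# Replace the first 'm' and everything after it.
target = "Lupoli needs more work."
result = re.sub(r"m.*", "a vacation.", target)
print(result)

# Captured groups can be reused in the replacement: DD-MM-YYYY becomes YYYY-MM-DD.
swapped = re.sub(r"(\d\d)-(\d\d)-(\d{4})", r"\3-\2-\1", "30-01-2003 15:20")
print(swapped)
```

`re.sub` replaces every non-overlapping match by default; passing `count=1` limits it to the first one.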

Pull the timestamp with whatever language you wish http://www.usgovxml.com/examples/public/merged_catalog.xml


Solutions
#1 - Using DFSM

#1 – BUT a(ab)+a

#3


Exercise 1 Exercise 2 Exercise 3\A(pi|sp|sl|re).* .*ap( |et|h|/|9|o|t).* .*(afgk|af.g.k|afg.k|af[a-z]gk).*r*e*s*l*a*p[ioa ]te*w*o* \A(ra|ta|ap|wr|sa|87|ap)[^r][^l].* [r,b]af.*|\Aaffgf.*(re)?s?(la)?p(a|e|o|i| )t(e|w|o)* s*w*8*7*r*t*(ap)o*(et)*[ t]*[hr9/][eta]*[mryhca]* [br]*aff*gf* *[hk][aik][nhtge]*.*p.t.* (had several times) .*(ap).?t.* *af+g.[a-z]+re|s+la|s|(p.t)+e?|wo? .*ap.?t.* (had several times) [br]?af+g.?k[ingahet]*(p|(sp)|(sl)|(re))+[aeiou]p?\s?t(e|wo)?

[rtws87]*ap[\s/9oe]?[thm]+e?[mcary]* .*af+g.*k.* (had several times)

^((pi)|[str]).*([oet]|(ot) .?af+g.?k.*.*p.{1}t.* .?af{1,2}g.*k.* .?af{1,2}g.?k.*.*(r|s|p|l)+.t.* .*(ap).?t.h*.* .?(af).?g.?k.*[a-z]*p.{1}t[a-z]* [a-z0-9]*ap.?t[a-z]* [a-z]+fg.?k[a-z]

completeSearch
by Luke Carrico S17
function globalSearch() {
    //String to search
    var str = "Visit W3Schools! W3Schools W3Schools";
    //Index of search result
    var n = str.search(/W3Schools/i);
    var index = [];
    var offset = 0;
    //search returns -1 when no match is found
    while (n != -1)
    {
        //add the found index to index
        //offset is included because part of the string is removed
        //every time
        index.push(n + offset);
        //offset += start index of the string + size of the string
        //keep track of what has already been removed
        offset += n + 9;
        //remove everything that has been searched
        str = str.substr(n + 9);
        //search for another match
        n = str.search(/W3Schools/i);
    }
    document.getElementById("demo").innerHTML = index;
}

by Dohyun Roh S17’


RegExr and EarthQuake Data
1. Part A - /<text>(\w+.)+/g
2. Part A - /<mag>\n<value>(.*)<\/value>/gm
1. Part B - (?![<text>]).*(?=<\/text>)
2. Part B -

Part 3
<input type="file" id="fileinput" />
<script type="text/javascript">
  function readSingleFile(evt) {
    //Retrieve the first (and only!) File from the FileList object
    var f = evt.target.files[0];

    if (f) {
      var r = new FileReader();
      r.onload = function(e) {
        var contents = e.target.result;
        var regex = /<text>.*<\/text>/g;
        console.log(regex.exec(contents));
      };
      r.readAsText(f);
    } else {
      alert("Failed to load file");
    }
  }

  document.getElementById('fileinput').addEventListener('change', readSingleFile, false);
</script>

// originally from
// http://www.htmlgoodies.com/beyond/javascript/read-text-files-using-the-javascript-filereader.html#fbid=lVKVjUCWdjk
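The same two extractions can be sketched in Python with capturing groups, which return only the text between the tags. The `sample` fragment below is hypothetical: it is shaped like the `<text>` and `<mag><value>` tags named in the exercises, but the place names and values are made up and the real feed will contain much more.

```python
import re

# Hypothetical fragment; real feed data will differ.
sample = """<description><type>earthquake name</type><text>10km NE of Sampletown, CA</text></description>
<mag>
<value>1.8</value>
</mag>
<description><type>earthquake name</type><text>5km S of Exampleville, AK</text></description>
<mag>
<value>2.4</value>
</mag>"""

# The parentheses capture what is inside the tags, so the tags themselves are left out.
places = re.findall(r"<text>(.*?)</text>", sample)
magnitudes = re.findall(r"<mag>\s*<value>(.*?)</value>", sample)

for place, magnitude in zip(places, magnitudes):
    print(magnitude, place)
```

The non-greedy `.*?` stops at the first closing tag, which matters when several `<text>` elements end up on one line. For real XML an XML parser is usually the safer tool, but a regular expression is enough for this exercise.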


Sources
http://www.csee.umbc.edu/~damas1/courses/cmsc433/fall2014/tools/regex-evaluator/index.php
http://www.regular-expressions.info/
http://www.funduc.com/regexp.htm

Search and Replace
Java - http://www.javamex.com/tutorials/regular_expressions/search_replace.shtml#.VtV-QfkrKUk

http://eloquentjavascript.net/09_regexp.html
