Date posted: 12-Jan-2017
Category: Software
Uploaded by: danairat-thanabodithammachari
02 - Perl Programming: Regular Expression
Danairat T.
Line ID: Danairat
FB: Danairat Thanabodithammachari
+668-1559-1446
Perl Regular Expressions
• Regular expressions are a powerful, flexible, and efficient way to process text. They work like a mini programming language.
• You can use regular expressions to check whether input matches a text pattern, to find a pattern within a larger body of text, and to replace the text matching the pattern with other text.
Regular Expressions - Topics
• Match Operator
– Match Operator Modifiers
• Substitution Operator
– Substitution Operator Modifiers
• Translation Operator
– Translation Operator Modifiers
• Regular Expression Elements
– Metacharacters
– Character Classes
– Anchors
– Pattern Quantifiers
– Pattern Match Variables
– Backreferencing
Match Operator
• The match operator is written m//.
• We can use the match operator to test whether a string matches a given pattern. The basic form of the operator is
m/PATTERN/;
• The =~ operator binds a variable to a pattern and is true when the pattern matches.
• The !~ operator is the negated form: it is true when the pattern does NOT match.
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "Hello Everyone";
if ($myString =~ m/one/) {
    print "match.";
}
exit(0);
MatchEx01.pl
Results:-
match.
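The example above only exercises =~. As a minimal sketch of the negated operator !~, the following checks two sample strings. The helper name lacks_pattern and the sample strings are illustrative assumptions, not part of the original slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when $text does NOT contain "one".
sub lacks_pattern {
    my ($text) = @_;
    return $text !~ m/one/ ? 1 : 0;
}

print "Hello Everyone: ", (lacks_pattern("Hello Everyone") ? "no match" : "match"), "\n";
print "Hello World: ",    (lacks_pattern("Hello World")    ? "no match" : "match"), "\n";
```

The first line reports a match and the second reports no match, because only "Hello Everyone" contains the text "one".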
• We can omit the m and write just //.
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "Hello Everyone";
if ($myString =~ /one/) {
    print "match.";
}
exit(0);
MatchOmitTheMEx01.pl
Results:-
match.
• Writing the m with a different delimiter can make the code clearer, because slashes in the pattern no longer need escaping.
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "/usr/local/lib";
if ($myString =~ /\/usr\/local\/lib/) {
    print "match without m\n";
}
if ($myString =~ m(/usr/local/lib)) {
    print "match with m\n";
}
exit(0);
MatchWithMEx01.pl
Results:-
match without m
match with m
Match Operator Modifiers
Modifier  Meaning
g         Match globally, i.e., find all occurrences.
i         Do case-insensitive pattern matching.
m         Treat the string as multiple lines: ^ and $ also match at the start and end of each line.
o         Compile the pattern only once, even if it interpolates variables. Use it only when those variables will not change while the program runs; later changes to them are ignored.
s         Treat the string as a single line: . also matches a newline.
x         Allow whitespace and comments in the pattern for clarity.
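The /x and /s modifiers have no example of their own in these slides, so here is a small sketch of both. The helper name parse_time and the sample strings are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# /x lets the pattern be laid out with whitespace and comments.
# Hypothetical helper: extract hours and minutes from a string.
sub parse_time {
    my ($text) = @_;
    if ($text =~ / (\d{1,2})   # hours
                   :           # literal colon
                   (\d{2})     # minutes
                 /x) {
        return ($1, $2);
    }
    return;
}

my ($hours, $minutes) = parse_time("Meeting at 9:30 today");
print "hours=$hours minutes=$minutes\n";

# /s lets "." match a newline as well.
my $twoLines = "Hello\nEveryone";
print "without s: ", ($twoLines =~ /Hello.Everyone/  ? "match" : "no match"), "\n";
print "with s:    ", ($twoLines =~ /Hello.Everyone/s ? "match" : "no match"), "\n";
```

Only the pattern with /s matches here, because the character between "Hello" and "Everyone" is a newline.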
• Normally a match finds only the first occurrence of the pattern. With the /g modifier in list context, all matches are returned as a list.
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "Hello Everyone";
foreach my $myMatch ($myString =~ /e/g) {
    print "match.\n";
}
exit(0);
GlobalMatchEx01.pl
Results:-
match.
match.
match.
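As a sketch of how /g behaves in the two contexts, the following collects the matches in list context and then walks them one at a time in scalar context. The variable names @matches and @offsets are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "Hello Everyone";

# List context: /g returns every match at once.
my @matches = $myString =~ /e/g;
print "found ", scalar(@matches), " matches\n";

# Scalar context: each test continues where the previous match ended.
# pos() reports the offset just after the latest match.
my @offsets;
while ($myString =~ /e/g) {
    push @offsets, pos($myString);
}
print "offsets after each match: @offsets\n";
```

The loop form is useful when each match needs its own processing, for example when the position of the match matters.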
• The /i modifier makes the match case-insensitive.
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "Hello Everyone";
foreach my $myMatch ($myString =~ /e/ig) {
    print "match.\n";
}
exit(0);
CaseInsensitiveGlobalMatchEx01.pl
Results:-
match.
match.
match.
match.
• With the /m modifier, ^ (start) and $ (end) also match at every internal line boundary, not only at the start and end of the whole string.
#!/usr/bin/perl
use strict;
use warnings;

my $myString = <<END_OF_LINES;
Hello
Everyone
Everyone
END_OF_LINES

foreach my $myMatch ($myString =~ /^e/igm) {
    print "match.\n";
}
exit(0);
MultilinesEx01.pl
Results:-
match.
match.
Substitution Operator
• The substitution operator is written s///.
• The substitution operator is an extension of the match operator that lets you replace the matched text with new text. It returns the number of substitutions made. The basic form of the operator is
s/PATTERN/REPLACEMENT/;
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "Hello Everyone";
my $myCount = $myString =~ s/Hello/Hi/;
print "$myString \n";
print "$myCount \n";
exit(0);
SubstituteEx01.pl
Results:-
Hi Everyone
1
• The substitution operator also works with non-ASCII text, such as Thai.
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # this source file contains UTF-8 text
binmode(STDOUT, ':encoding(UTF-8)');   # print the Thai text as UTF-8

my $myString = "Hello Everyone";
my $myCount = $myString =~ s/Hello/สวัสดี/;
print "$myString \n";
print "$myCount \n";
exit(0);
SubstituteEx02.pl
Results:-
สวัสดี Everyone
1
Substitution Operator Modifiers
Modifier  Meaning
g         Match globally, i.e., replace all occurrences.
i         Do case-insensitive pattern matching.
m         Treat the string as multiple lines: ^ and $ also match at the start and end of each line.
o         Compile the pattern only once, even if it interpolates variables. Use it only when those variables will not change while the program runs; later changes to them are ignored.
s         Treat the string as a single line: . also matches a newline.
x         Allow whitespace and comments in the pattern for clarity.
e         Evaluate the replacement as a Perl expression and use its value as the replacement text.
• In the replacement text, the escapes \l, \u, \L, and \U can be used to convert character case.
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "hELlo eveRyoNe";

# \w matches any word character (letter, digit, or underscore)
# + matches one or more of the preceding element
# \L lower-cases the captured word, then \u upper-cases its first letter
my $myCount = $myString =~ s/(\w+)/\u\L$1/ig;
print "$myString \n";
print "$myCount \n";
exit(0);
ChangeCaseEx01.pl
Results:-
Hello Everyone
2
• Using substitution with /m to match at the start of each line in multiline text
MultiLinesSubstituteEx01.pl
#!/usr/bin/perl
use strict;
use warnings;

my $myString = <<END_OF_LINES;
Hello
Everyone
Everyone
END_OF_LINES

$myString =~ s/^every/Any/igm;
print $myString . "\n";
exit(0);
Results:-
Hello
Anyone
Anyone
• The /e modifier causes Perl to evaluate the REPLACEMENT text as a Perl expression and to use its value as the replacement string. For example, the following converts a date from day/month/year slashed format into YYYYMMDD format:
$c =~ s{(\d+)/(\d+)/(\d+)}{sprintf("%04d%02d%02d",$3,$2,$1)}e;
• We have to use sprintf in this case; otherwise, a single-digit day or month would produce fewer than the eight required digits. For example, 26/3/2000 would become 2000326 instead of 20000326.
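The following sketch wraps the statement above in a small runnable script and compares it with a plain replacement that does no padding. The helper name to_yyyymmdd and the variable $naive are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert D/M/YYYY to YYYYMMDD using s///e.
sub to_yyyymmdd {
    my ($date) = @_;
    $date =~ s{(\d+)/(\d+)/(\d+)}{sprintf("%04d%02d%02d", $3, $2, $1)}e;
    return $date;
}

print to_yyyymmdd("26/3/2000"), "\n";

# Without /e and sprintf the month is not zero-padded.
(my $naive = "26/3/2000") =~ s{(\d+)/(\d+)/(\d+)}{$3$2$1};
print "$naive\n";
```

The first line printed is 20000326 and the second is 2000326, which shows why the padding done by sprintf is needed.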
Translation Operator
• The tr operator performs character-by-character translation. The following expression replaces each a with e, each b with d, and each c with f in the variable $sentence. The expression returns the number of characters translated.
$sentence =~ tr/abc/edf/;
• Most of the special regular expression codes do not apply in tr. However, the dash still denotes a range. This statement converts $sentence to upper case.
$sentence =~ tr/a-z/A-Z/;
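A short sketch of the two statements above, plus the common idiom of using tr only to count characters. The sample sentence and the variable names are assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sentence = "a cab can be back";

# With identical search and replacement lists, tr changes nothing
# and simply counts the characters it finds.
my $countA = ($sentence =~ tr/a/a/);
print "number of a's: $countA\n";

# Translate a->e, b->d, c->f; the return value is the number of
# characters translated.
my $changed = ($sentence =~ tr/abc/edf/);
print "$sentence ($changed characters translated)\n";

# Upper-case a copy of the string with a range.
(my $upper = $sentence) =~ tr/a-z/A-Z/;
print "$upper\n";
```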
Translation Operator Modifiers
Modifier Meaning
c Complement SEARCHLIST.
d Delete found but unreplaced characters.
s Squash duplicate replaced characters
• The /c modifier complements SEARCHLIST, so the translation applies to every character that is not listed in it.
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "Hello Everyone";
# Every character that is NOT a letter is replaced with "-"
my $myCount = $myString =~ tr/a-zA-Z/-/c;
print "$myString \n";
print "$myCount \n";
exit(0);
TrEx01.pl
Results:-
Hello-Everyone
1
• The /d modifier deletes every character found in the search list that has no corresponding replacement character.
#!/usr/bin/perl
use strict;
use warnings;

my $myString = 'He@l*lo E%very$one';
my $myCount = $myString =~ tr/@$%*//d;
print "$myString \n";
print "$myCount \n";
exit(0);
TrEx02.pl
Results:-
Hello Everyone
4
• The /s modifier squeezes each run of the same translated character into a single character.
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "Hello Everyone";
my $myCount = $myString =~ tr/a-zA-Z//s;
print "$myString \n";
print "$myCount \n";
exit(0);
TrEx03.pl
Results:-
Helo Everyone
13
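The modifiers can also be combined or used with a narrower search list. The following sketch shows two common uses; the sample strings are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# /c and /d together: delete every character that is NOT a digit.
(my $digits = "Tel: 02-555-0100") =~ tr/0-9//cd;
print "$digits\n";

# /s with a one-character list: squeeze runs of spaces into one space.
(my $squeezed = "Hello     Everyone") =~ tr/ //s;
print "$squeezed\n";
```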
Metacharacters
Symbol  Atomic  Meaning
\       Varies  Escapes the following character: a metacharacter is treated as a literal character, and some ordinary characters gain a special meaning (for example \t for tab).
^       No      True at beginning of string (or line, if /m is used).
$       No      True at end of string (or line, if /m is used).
|       No      Alternation: matches either the pattern on its left or the one on its right.
.       Yes     Matches any one character except the newline character.
(...)   Yes     Grouping (treats the enclosed pattern as one unit).
[...]   Yes     Character class: matches a single character from the listed set and/or range of characters.
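As a small sketch of how several metacharacters combine, the pattern below uses anchors, a character class, and grouping with alternation. The helper name is_greeting and the test strings are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# [Hh] stands for exactly one character, and the parentheses limit
# the alternation to "ello" or "i"; ^ and $ anchor the whole string.
sub is_greeting {
    my ($text) = @_;
    return $text =~ /^[Hh](ello|i) Everyone$/ ? 1 : 0;
}

for my $text ("Hello Everyone", "hi Everyone", "Hey Everyone") {
    printf "%-15s %s\n", $text, is_greeting($text) ? "match" : "no match";
}
```

The first two strings match; "Hey Everyone" does not, because neither alternative follows the H.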
• Use \ to match an escape sequence character, such as \t for a tab
#!/usr/bin/perl
use strict;
use warnings;

print "Please enter word: ";
my $myWord = <STDIN>;
chomp($myWord);
if ($myWord =~ /\t/) {
    print "matched.";
}
exit(0);
UsingBackSlashEx03.pl
Results:-
<Enter text containing a tab character to match the pattern>
• Use ^ to match at the beginning of the string
#!/usr/bin/perl
use strict;
use warnings;

print "Please enter word: ";
my $myWord = <STDIN>;
chomp($myWord);
if ($myWord =~ /^The/) {
    print "matched.";
}
exit(0);
MatchBeginningEx03.pl
Results:-
<Enter text starting with "The" (case-sensitive) to match the pattern>
• Use $ to match at the end of the string
#!/usr/bin/perl
use strict;
use warnings;

print "Please enter word: ";
my $myWord = <STDIN>;
chomp($myWord);
if ($myWord =~ /\.$/) {
    print "matched.";
}
exit(0);
MatchEndingEx03.pl
Results:-
<Enter text ending with "." to match the pattern>
• Use | to perform an alternation match
#!/usr/bin/perl
use strict;
use warnings;

print "Please enter word: ";
my $myWord = <STDIN>;
chomp($myWord);
if ($myWord =~ /apple|orange/) {
    print "matched.";
}
exit(0);
MatchSelectionEx03.pl
Results:-
<Enter "apple" or "orange" (lower case) to match the pattern>
• Use the period . to match any single character except a newline
#!/usr/bin/perl
use strict;
use warnings;

print "Please enter word: ";
my $myWord = <STDIN>;
chomp($myWord);
if ($myWord =~ /b.ll/) {
    print "matched.";
}
exit(0);
UsingDotEx03.pl
Results:-
<Enter bill, bull, or ball to match the pattern>
Character Classes
Code  Matches
\d    A digit, same as [0-9]
\D    A nondigit, same as [^0-9]
\w    A word character (alphanumeric or underscore), same as [a-zA-Z_0-9]
\W    A non-word character, same as [^a-zA-Z_0-9]
\s    A whitespace character, same as [ \t\n\r\f]
\S    A non-whitespace character, same as [^ \t\n\r\f]
\C    Match a single byte (deprecated, and removed in Perl 5.24)
\pP   Match P-named (Unicode) property
\PP   Match non-P
\X    Match extended Unicode sequence
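A minimal sketch of the shorthand classes in use. The helper name classify and the sample tokens are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: classify a token using the shorthand classes.
# The tests run in order, so an all-digit token is reported as
# "digits" even though digits are also word characters.
sub classify {
    my ($token) = @_;
    return "digits"     if $token =~ /^\d+$/;
    return "word"       if $token =~ /^\w+$/;
    return "whitespace" if $token =~ /^\s+$/;
    return "other";
}

for my $token ("2017", "Perl_5", " \t", "a-b") {
    printf "%-10s => %s\n", "[$token]", classify($token);
}
```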
These escape sequences change case or quote metacharacters in patterns and double-quoted strings:
Code  Meaning
\l    Lowercase the next character
\u    Uppercase the next character
\L    Lowercase until \E
\U    Uppercase until \E
\Q    Disable pattern metacharacters until \E
\E    End the case modification or quoting started by \L, \U, or \Q
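As a sketch of \Q...\E, the following counts literal occurrences of a string that contains metacharacters, and then shows \u and \U in a double-quoted string. The helper name count_literal and the sample strings are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count literal occurrences of $needle in $text.
# \Q...\E stops "+" and "." in $needle from acting as metacharacters.
sub count_literal {
    my ($text, $needle) = @_;
    my $count = () = $text =~ /\Q$needle\E/g;
    return $count;
}

print count_literal("1+1=2 and a.b", "1+1"), "\n";
print count_literal("a.b axb", "a.b"), "\n";

# \u upper-cases one character; \U upper-cases everything up to \E.
print "\uhello \Uworld\E!\n";
```

Without \Q, the second call would also count "axb", because an unquoted . matches any character.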
Anchors
Anchors don't match any characters; they match places within a string.
Assertion  Meaning
^          Matches at the beginning of the string (or line, if /m is used)
$          Matches at the end of the string (or line, if /m is used)
\b         Matches at a word boundary (between \w and \W)
\B         Matches anywhere except at a word boundary
\A         Matches only at the beginning of the string
\Z         Matches at the end of the string, or before a newline at the end
\z         Matches only at the end of the string
\G         Matches where the previous m//g left off (only works with the /g modifier)
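The following sketch contrasts \b with an unanchored match, and \Z with \z. The sample text and variable names are assumptions chosen to make the difference visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "cat concatenate\ncat\n";

# \b restricts the match to the whole word "cat".
my $whole_words = () = $text =~ /\bcat\b/g;
# Without \b, the "cat" inside "concatenate" also matches.
my $anywhere    = () = $text =~ /cat/g;
print "whole words: $whole_words, anywhere: $anywhere\n";

# \Z tolerates one trailing newline; \z does not.
print "cat comes just before the final newline\n" if $text =~ /cat\Z/;
print "cat is not the very last thing in the string\n" if $text !~ /cat\z/;
```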
Pattern Quantifiers
• Pattern quantifiers specify how many times the preceding element may match.
• By default quantifiers are maximal (greedy): they match as much as possible. Each quantifier also has a minimal form, written with a question mark immediately after the quantifier, which makes Perl match as little as possible.
Maximal  Minimal  Allowed range
{n,m}    {n,m}?   Must occur at least n times but no more than m times
{n,}     {n,}?    Must occur at least n times
{n}      {n}?     Must match exactly n times
*        *?       0 or more times (same as {0,})
+        +?       1 or more times (same as {1,})
?        ??       0 or 1 time (same as {0,1})
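A minimal sketch of the difference between a maximal and a minimal quantifier. The sample string and variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $html = "<b>Hello</b> <b>Everyone</b>";

# .+ is maximal: it takes as much as it can and still lets the match succeed.
my ($greedy)  = $html =~ /<b>(.+)<\/b>/;
# .+? is minimal: it stops at the first closing tag.
my ($minimal) = $html =~ /<b>(.+?)<\/b>/;

print "greedy:  $greedy\n";
print "minimal: $minimal\n";
```

The greedy capture runs through to the last closing tag, while the minimal capture contains only "Hello".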
Character Classes
• Example
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "Hello 111Every2343one";
if ($myString =~ /^(\w+)(\s+)(\d+)(\w+)(\d+)(\w+)$/) {
    print "match." . "\n";
}
exit(0);
MatchChrClassEx01.pl
Results:-
match.
• Example
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "Hello 111Every2343one";
if ($myString =~ /^(\w+)(\s+)(\d{1,3})(\w+)(\d{1,4})(\w+)$/) {
    print "match." . "\n";
}
exit(0);
MatchChrClassEx02.pl
Results:-
match.
Pattern Match Variable $1, $2, …
• Parentheses () not only group elements in a regular expression, they also remember the text they match.
• Every match from a parenthesized element is saved to a special, read-only variable indicated by a number.
• Use \1, \2, ... to recall a match within the matching pattern.
• Use $1, $2, ... to recall a match outside of the matching pattern.
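As a sketch of both forms of recall, the following finds a word that is typed twice in a row. The helper name doubled_word and the sample sentence are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first word that appears twice in a row.
sub doubled_word {
    my ($text) = @_;
    # \1 refers, inside the pattern, to whatever group 1 captured.
    if ($text =~ /\b(\w+)\s+\1\b/) {
        return $1;    # $1 recalls the same capture after the match
    }
    return;
}

print doubled_word("Hello Hello Everyone"), "\n";
```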
• Example:-
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "Everyone Hello";
my $myCount = $myString =~ s/(\w+)\s(Hello)/$2 $1/;
print "$myString \n";
print "$myCount \n";
exit(0);
PatternMatchVarEx03.pl
Results:-
Hello Everyone
1
Pattern Match Variable: Backreferencing
The backreferencing variables are:-
• $+ Returns the text matched by the last parenthesized group
• $& Returns the entire matched string
• $` Returns everything before the matched string
• $' Returns everything after the matched string
Using $&, $` or $' can slow down every pattern match in the program on older versions of Perl (most of this penalty was removed in Perl 5.20), so avoid them in performance-sensitive code that must run on older versions.
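A short sketch that prints the four variables listed above for one match. The helper name match_parts and the pattern are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text before the match, the match
# itself, the text after it, and the last parenthesized group.
sub match_parts {
    my ($text) = @_;
    return unless $text =~ /(l+)(o)/;
    return ($`, $&, $', $+);
}

my ($before, $matched, $after, $last_group) = match_parts("Hello Everyone");
print "before=[$before] matched=[$matched] after=[$after] last group=[$last_group]\n";
```

For "Hello Everyone" the match is "llo", with "He" before it, " Everyone" after it, and "o" as the last group.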
Thank you