
Perl Programming - 02 Regular Expression

Date post: 12-Jan-2017
Upload: danairat-thanabodithammachari
02 - Perl Programming: Regular Expression

Danairat T.

Line ID: Danairat

FB: Danairat Thanabodithammachari

+668-1559-1446

Perl Regular Expressions

• Regular expressions are a powerful, flexible, and efficient way to process text. They work like a mini programming language.
• You can use regular expressions to check whether input matches a text pattern within a larger body of text, and to replace text matching the pattern with other text.

Regular Expressions - Topics

• Match Operator
  – Match Operator Modifiers
• Substitution Operator
  – Substitution Operator Modifiers
• Translation Operator
  – Translation Operator Modifiers
• Regular Expression Elements
  – Metacharacters
  – Character Classes
  – Anchors
  – Pattern Quantifiers
  – Pattern Match Variables
  – Backreferencing

Match Operator

• The match operator is written m//.
• We can use the match operator to determine whether a string matches a given pattern. The basic form of the operator is

    m/PATTERN/;

• The =~ operator binds a variable to the pattern; the expression is true if the pattern matches.
• The !~ operator is the negated form; the expression is true if the pattern does NOT match.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $myString = "Hello Everyone";
    if ($myString =~ m/one/) {
        print "match.";
    }

    exit(0);

MatchEx01.pl

Results:-

    match.
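The slide only demonstrates =~. As a minimal sketch (the variable names $hasOne and $lacksTwo are illustrative, not from the slides), the two binding operators can be compared side by side:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "Hello Everyone";

# =~ is true when the pattern is found somewhere in the string.
my $hasOne = ($myString =~ m/one/) ? "yes" : "no";

# !~ is true when the pattern is NOT found.
my $lacksTwo = ($myString !~ m/two/) ? "yes" : "no";

print "contains 'one': $hasOne\n";            # yes
print "does not contain 'two': $lacksTwo\n";  # yes
```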

Match Operator

• When / is the delimiter, the m can be omitted, leaving just //.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $myString = "Hello Everyone";
    if ($myString =~ /one/) {
        print "match.";
    }

    exit(0);

MatchOmitTheMEx01.pl

Results:-

    match.

Match Operator

• An explicit m lets you choose other delimiters, which sometimes makes the code clearer (no need to escape every / in a path).

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $myString = "/usr/local/lib";
    if ($myString =~ /\/usr\/local\/lib/) {
        print "match without m\n";
    }

    if ($myString =~ m(/usr/local/lib)) {
        print "match with m\n";
    }

    exit(0);

MatchWithMEx01.pl

Results:-

    match without m
    match with m

Match Operator Modifiers

Modifier  Meaning
g         Match globally, i.e., find all occurrences.
i         Do case-insensitive pattern matching.
m         Treat string as multiple lines (^ and $ also match at line boundaries).
o         Compile the pattern only once. Use this modifier only when the
          variables interpolated into the pattern will not change during the run.
s         Treat string as single line (. also matches a newline).
x         Allows whitespace and comments in the pattern for clarity.
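The slides give examples for /g, /i and /m but not for /s or /x. The following is a small sketch of those two modifiers; the sample text and variable names are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Without /s, "." stops at the newline, so the match cannot cross lines.
my $crossesWithoutS = ($text =~ /first.*second/)  ? 1 : 0;
# With /s, "." also matches "\n".
my $crossesWithS    = ($text =~ /first.*second/s) ? 1 : 0;

# With /x, whitespace and comments inside the pattern are ignored.
my ($word, $kind) = $text =~ /
    ^ (\w+)      # a leading word
    \s+          # whitespace
    (\w+)        # the following word
/x;

print "$crossesWithoutS $crossesWithS $word $kind\n";   # 0 1 first line
```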

Match Operator Modifiers

• Normally, a match finds only the first occurrence of the pattern, but with the /g modifier in list context, all matches are returned as a list.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $myString = "Hello Everyone";
    foreach my $myMatch ($myString =~ /e/g) {
        print "match.\n";
    }
    exit(0);

GlobalMatchEx01.pl

Results:-

    match.
    match.
    match.
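The example above uses /g in list context. As a hedged sketch of the other common usage, /g in scalar context steps through the matches one at a time, and pos() reports where the last match ended. The @matches and @offsets names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "Hello Everyone";

# List context: /g returns every match at once.
my @matches = $myString =~ /e/g;
my $count   = scalar @matches;

# Scalar context: each evaluation finds the next match;
# pos() gives the offset just past that match.
my @offsets;
while ($myString =~ /e/g) {
    push @offsets, pos($myString);
}

print "$count matches ending at offsets @offsets\n";   # 3 matches ending at offsets 2 9 14
```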

Match Operator Modifiers

• The /i modifier makes the match case-insensitive.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $myString = "Hello Everyone";
    foreach my $myMatch ($myString =~ /e/ig) {
        print "match.\n";
    }
    exit(0);

CaseInsensitiveGlobalMatchEx01.pl

Results:-

    match.
    match.
    match.
    match.

Match Operator Modifiers

• When the /m modifier is used, ^ (start) and $ (end) also match at every internal line boundary, not just at the start and end of the whole string.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $myString = <<END_OF_LINES;
    Hello
    Everyone
    Everyone
    END_OF_LINES

    foreach my $myMatch ($myString =~ /^e/igm) {
        print "match.\n";
    }
    exit(0);

MultilinesEx01.pl

Results:-

    match.
    match.

Substitution Operator

• The substitution operator is written s///.
• The substitution operator is an extension of the match operator that lets you replace the matched text with new text. It returns the number of substitutions made. The basic form of the operator is

    s/PATTERN/REPLACEMENT/;

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $myString = "Hello Everyone";
    my $myCount = $myString =~ s/Hello/Hi/;
    print "$myString \n";
    print "$myCount \n";

    exit(0);

SubstituteEx01.pl

Results:-

    Hi Everyone
    1

Substitution Operator

• The substitution operator also works with non-English text (here, Thai).

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $myString = "Hello Everyone";
    my $myCount = $myString =~ s/Hello/สวัสดี/;

    print "$myString \n";
    print "$myCount \n";
    exit(0);

SubstituteEx02.pl

Results:-

    สวัสดี Everyone
    1

Substitution Operator Modifiers

Modifier  Meaning
g         Match globally, i.e., find all occurrences.
i         Do case-insensitive pattern matching.
m         Treat string as multiple lines (^ and $ also match at line boundaries).
o         Compile the pattern only once. Use this modifier only when the
          variables interpolated into the pattern will not change during the run.
s         Treat string as single line (. also matches a newline).
x         Allows whitespace and comments in the pattern for clarity.
e         Evaluates the replacement as a Perl expression and uses its
          value as the replacement text.

Substitution Operator Modifiers

• In the replacement text, the escapes \l, \u, \L, \U and \E can be used to convert character case.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $myString = "hELlo eveRyoNe";

    # \w matches any word character (letter, digit, or underscore)
    # + matches one or more of the preceding element
    my $myCount = $myString =~ s/(\w+)/\u\L$1/ig;
    print "$myString \n";
    print "$myCount \n";

    exit(0);

ChangeCaseEx01.pl

Results:-

    Hello Everyone
    2

Substitution Operator Modifiers

• Using substitution with /m to match at the start of each line of multiline text

MultiLinesSubstituteEx01.pl

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $myString = <<END_OF_LINES;
    Hello
    Everyone
    Everyone
    END_OF_LINES

    $myString =~ s/^every/Any/igm;
    print $myString . "\n";

    exit(0);

Results:-

    Hello
    Anyone
    Anyone

Substitution Operator Modifiers

• The /e modifier causes Perl to evaluate the REPLACEMENT text as a Perl expression and to use its value as the replacement string. For example, to convert a date from slashed day/month/year format into the eight-digit YYYYMMDD format:

    $c =~ s{(\d+)/(\d+)/(\d+)}{sprintf("%04d%02d%02d",$3,$2,$1)}e;

• We have to use sprintf in this case; otherwise, a single-digit day or month would produce fewer than the eight digits required. For example, 26/3/2000 would become 2000326 instead of 20000326.
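The one-line substitution above can be tried as a complete script. This is a minimal sketch that assumes the input is in day/month/year order, as in the 26/3/2000 example; the variable names $sortable and $naive are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $date = "26/3/2000";

# With /e the replacement is run as code, so sprintf can zero-pad each part.
(my $sortable = $date) =~ s{(\d+)/(\d+)/(\d+)}{sprintf("%04d%02d%02d", $3, $2, $1)}e;

# Without /e and sprintf the captured text is simply concatenated.
(my $naive = $date) =~ s{(\d+)/(\d+)/(\d+)}{$3$2$1};

print "$sortable\n";   # 20000326
print "$naive\n";      # 2000326
```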

Translation Operator

• The tr operator performs character-by-character translation. The following expression replaces each a with e, each b with d, and each c with f in the variable $sentence. The expression returns the number of characters translated.

    $sentence =~ tr/abc/edf/;

• Most of the special regular expression codes do not apply in tr. However, the dash is still used to mean "between" (a range). This statement converts the lowercase ASCII letters in the string to upper case.

    $sentence =~ tr/a-z/A-Z/;
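As a runnable sketch of the two statements above (the sample sentence and the names $changed, $vowels and $upper are assumptions for illustration), the return value of tr can be captured and inspected:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sentence = "a cab is back";

# Translate a->e, b->d, c->f; tr returns how many characters it translated.
my $changed = ($sentence =~ tr/abc/edf/);
print "$sentence ($changed)\n";     # e fed is defk (7)

# With an empty replacement list and no /d, tr only counts characters.
my $vowels = ($sentence =~ tr/aeiou//);

# The range form uppercases ASCII letters only.
(my $upper = $sentence) =~ tr/a-z/A-Z/;
print "$vowels $upper\n";           # 4 E FED IS DEFK
```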

Translation Operator Modifiers

Modifier  Meaning
c         Complement SEARCHLIST.
d         Delete found but unreplaced characters.
s         Squash duplicate replaced characters.

Translation Operator Modifiers

• The /c modifier complements SEARCHLIST: the characters translated are those NOT listed in it.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $myString = "Hello Everyone";
    # Replace every character that is not an ASCII letter with "-"
    my $myCount = $myString =~ tr/a-zA-Z/-/c;
    print "$myString \n";
    print "$myCount \n";
    exit(0);

TrEx01.pl

Results:-

    Hello-Everyone
    1

Translation Operator Modifiers

• The /d modifier deletes any character found in the search list that has no corresponding replacement.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $myString = 'He@l*lo E%very$one';
    my $myCount = $myString =~ tr/@$%*//d;
    print "$myString \n";
    print "$myCount \n";

    exit(0);

TrEx02.pl

Results:-

    Hello Everyone
    4

Translation Operator Modifiers

• The /s modifier squeezes each run of the same translated character into a single character.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $myString = "Hello Everyone";
    my $myCount = $myString =~ tr/a-zA-Z//s;
    print "$myString \n";
    print "$myCount \n";

    exit(0);

TrEx03.pl

Results:-

    Helo Everyone
    13

Metacharacters

Symbol  Atomic  Meaning
\       Varies  Escapes the following character: a metacharacter is treated as a
                literal character, and some ordinary characters gain a special
                meaning (e.g. \t for tab).
^       No      True at beginning of string (or line, if /m is used).
$       No      True at end of string (or line, if /m is used).
|       No      Alternation: matches either the pattern on its left or the one on its right.
.       Yes     Matches any one character except the newline character.
(...)   Yes     Grouping (treats the enclosed pattern as one unit).
[...]   Yes     Character class: matches one character from the listed set and/or
                range. A [...] always represents a single character.
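The slides that follow show \, ^, $, | and the period, but not grouping or bracketed classes. Here is a small sketch of those two entries in the table; the test strings and variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# (...) groups "ab" so that + repeats the whole unit.
my $grouped   = ("ababab" =~ /^(ab)+$/) ? 1 : 0;
# Without the group, + applies to "b" only.
my $ungrouped = ("ababab" =~ /^ab+$/)   ? 1 : 0;

# [...] matches exactly one character from the set.
my @spellings = grep { /^gr[ae]y$/ } qw(gray grey griy graey);

print "$grouped $ungrouped @spellings\n";   # 1 0 gray grey
```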

Metacharacters

• Use \ to write escape sequences such as \t (tab) in a pattern.

    #!/usr/bin/perl
    use strict;
    use warnings;

    print "Please enter word: ";
    my $myWord = <STDIN>;
    chomp($myWord);
    if ($myWord =~ /\t/) {
        print "matched.";
    }
    exit(0);

UsingBackSlashEx03.pl

Results:-

    <Enter text containing a tab to match the pattern>

Metacharacters

• Use ^ to match at the beginning of the string.

    #!/usr/bin/perl
    use strict;
    use warnings;

    print "Please enter word: ";
    my $myWord = <STDIN>;
    chomp($myWord);
    if ($myWord =~ /^The/) {
        print "matched.";
    }
    exit(0);

MatchBeginningEx03.pl

Results:-

    <Enter text starting with "The" (case-sensitive) to match the pattern>

Metacharacters

• Use $ to match at the end of the string.

    #!/usr/bin/perl
    use strict;
    use warnings;

    print "Please enter word: ";
    my $myWord = <STDIN>;
    chomp($myWord);
    if ($myWord =~ /\.$/) {
        print "matched.";
    }
    exit(0);

MatchEndingEx03.pl

Results:-

    <Enter text ending with "." to match the pattern>

Metacharacters

• Use | to match one alternative or another.

    #!/usr/bin/perl
    use strict;
    use warnings;

    print "Please enter word: ";
    my $myWord = <STDIN>;
    chomp($myWord);
    if ($myWord =~ /apple|orange/) {
        print "matched.";
    }
    exit(0);

MatchSelectionEx03.pl

Results:-

    <Enter "apple" or "orange" (lowercase, since /i is not used) to match the pattern>

Metacharacters

• Use the period . to match any single character.

    #!/usr/bin/perl
    use strict;
    use warnings;

    print "Please enter word: ";
    my $myWord = <STDIN>;
    chomp($myWord);
    if ($myWord =~ /b.ll/) {
        print "matched.";
    }
    exit(0);

UsingDotEx03.pl

Results:-

    <Enter bill, bull, or ball to match the pattern>

Character Classes

Code  Matches
\d    A digit, same as [0-9]
\D    A nondigit, same as [^0-9]
\w    A word character (alphanumeric or underscore), same as [a-zA-Z_0-9]
\W    A non-word character, same as [^a-zA-Z_0-9]
\s    A whitespace character, same as [ \t\n\r\f]
\S    A non-whitespace character, same as [^ \t\n\r\f]
\C    Match a character (byte); removed in Perl 5.24
\pP   Match P-named (Unicode) property
\PP   Match non-P
\X    Match extended Unicode sequence

Character Classes

Code  Meaning
\l    Lowercase the next character only
\u    Uppercase the next character only
\L    Lowercase until \E
\U    Uppercase until \E
\Q    Disable pattern metacharacters until \E
\E    End case modification (or \Q quoting)
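A brief sketch of these escapes in use follows. The sample strings and the names $fixed, $plain and $quoted are illustrative assumptions, not part of the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = "jOHN sMITH";

# \L lowercases everything up to \E; \u then uppercases the next character.
(my $fixed = $name) =~ s/(\w+)/\u\L$1\E/g;

# \Q...\E escapes regex metacharacters in an interpolated string,
# so the "." in $literal matches only a real dot.
my $literal = "a.b";
my $plain  = ("axb" =~ /\A${literal}\z/)     ? 1 : 0;   # "." matches any character
my $quoted = ("axb" =~ /\A\Q$literal\E\z/)   ? 1 : 0;   # "." must be a literal dot

print "$fixed $plain $quoted\n";   # John Smith 1 0
```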

Anchors

Anchors don't match any characters; they match places within a string.

Assertion  Meaning
^          Matches at the beginning of the string (or line, if /m is used)
$          Matches at the end of the string (or line, if /m is used)
\b         Matches at a word boundary (between \w and \W)
\B         Matches at a non-word boundary
\A         Matches at the beginning of the string
\Z         Matches at the end of the string or before a newline
\z         Matches only at the end of the string
\G         Matches where the previous m//g left off (only works with the /g modifier)
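To make a few of these anchors concrete, here is a minimal sketch using an invented sample string with a trailing newline. It contrasts \b with an unanchored match, and $ with \z.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "cat concatenate cat\n";

# Unanchored: also finds the "cat" inside "concatenate".
my @anywhere = $text =~ /cat/g;
# \b on both sides: only whole words.
my @whole    = $text =~ /\bcat\b/g;

# $ may match just before a trailing newline; \z only at the very end.
my $dollar = ($text =~ /cat$/)  ? 1 : 0;
my $strict = ($text =~ /cat\z/) ? 1 : 0;

print scalar(@anywhere), " ", scalar(@whole), " $dollar $strict\n";   # 3 2 1 0
```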

Pattern Quantifiers

• Pattern quantifiers specify how many times the preceding element may match.
• Each quantifier also has a minimal (non-greedy) form, written with a question mark immediately after the quantifier, which makes Perl match as few repetitions as possible instead of as many as possible.

Maximal  Minimal  Allowed range
{n,m}    {n,m}?   Must occur at least n times but no more than m times
{n,}     {n,}?    Must occur at least n times
{n}      {n}?     Must match exactly n times
*        *?       0 or more times (same as {0,})
+        +?       1 or more times (same as {1,})
?        ??       0 or 1 time (same as {0,1})
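The difference between the maximal and minimal columns is easiest to see on a concrete string. The following is a sketch with invented sample data; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Maximal: .+ runs to the last ">" in the string.
my ($greedy)  = $html =~ /<(.+)>/;
# Minimal: .+? stops at the first ">" it can.
my ($minimal) = $html =~ /<(.+?)>/;

# The same applies to counted quantifiers.
my ($most)   = "aaaaaa" =~ /^(a{2,4})/;
my ($fewest) = "aaaaaa" =~ /^(a{2,4}?)/;

print "$greedy\n";            # b>bold</b> and <i>italic</i
print "$minimal\n";           # b
print "$most $fewest\n";      # aaaa aa
```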

Character Classes

• Example

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $myString = "Hello 111Every2343one";

    if ($myString =~ /^(\w+)(\s+)(\d+)(\w+)(\d+)(\w+)$/) {
        print "match." . "\n";
    }
    exit(0);

MatchChrClassEx01.pl

Results:-

    match.

Character Classes

• Example

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $myString = "Hello 111Every2343one";

    if ($myString =~ /^(\w+)(\s+)(\d{1,3})(\w+)(\d{1,4})(\w+)$/) {
        print "match." . "\n";
    }
    exit(0);

MatchChrClassEx02.pl

Results:-

    match.

Pattern Match Variable $1, $2, …

• Parentheses () not only group elements in a regular expression, they also remember the text they match.
• Every match from a parenthesized element is saved to a special, read-only variable identified by a number.
• Use \1, \2, ... to recall a match within the pattern itself.
• Use $1, $2, ... to recall a match outside of the pattern.
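The following sketch uses both forms together on an invented sentence: \1 inside the pattern to find a doubled word, and $1 outside the pattern (and in the replacement) to use what was captured.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text    = "this is is a test";
my $doubled = "";

# \1 inside the pattern must match the same text that group 1 captured.
if ($text =~ /\b(\w+)\s+\1\b/) {
    # After the match, $1 holds that captured text.
    $doubled = $1;
}

# In the replacement part of s///, the capture is referred to as $1.
(my $fixed = $text) =~ s/\b(\w+)\s+\1\b/$1/;

print "repeated word: $doubled\n";   # repeated word: is
print "$fixed\n";                    # this is a test
```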

Pattern Match Variable $1, $2, …

• Example:-

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $myString = "Everyone Hello";
    my $myCount = $myString =~ s/(\w+)\s(Hello)/$2 $1/;

    print "$myString \n";
    print "$myCount \n";
    exit(0);

PatternMatchVarEx03.pl

Results:-

    Hello Everyone
    1

Pattern Match Variable

Backreferencing

The backreferencing variables are:-

• $+ Returns the last parenthesized pattern match
• $& Returns the entire matched string
• $` Returns everything before the matched string
• $' Returns everything after the matched string

Using $&, $` or $' can slow down every pattern match in the program, noticeably so on older versions of Perl.
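As a small sketch of what each variable holds after a successful match (the sample string and the names $before, $matched, $after and $last are assumptions for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "Hello Everyone here";
my ($before, $matched, $after, $last) = ("", "", "", "");

if ($myString =~ /(Every)(one)/) {
    $before  = $`;   # text before the match
    $matched = $&;   # the whole matched text
    $after   = $';   # text after the match
    $last    = $+;   # the last parenthesized group that matched
}

print "[$before][$matched][$after][$last]\n";   # [Hello ][Everyone][ here][one]
```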

Thank you