Date post: | 17-Jan-2016 |
Upload: | harriet-sherman |
Chapter 11: Perl Scripting
Off Larry’s Wall
In this chapter …
• Background
• Terminology
• Syntax
• Variables
• Control Structures
• File Manipulation
• Regular Expressions
Perl
• Practical Extraction and Report Language
• Developed by Larry Wall in 1987
• Originally created for data processing and report generation
• Elements of C, AWK, sed, scripting
• Add-on modules and third party code make it a more general programming language
Features
• C-derived syntax
• Ambiguous variables & dynamic typing
• Singular and plural variables
• Informal, easy to use
• Many paradigms – procedural, functional, object-oriented
• Extensive third party modules
Features, cont’d
• As elegant as you make it
• Do What I Mean intelligence
• Fast, easy, down and dirty coding
• Interpreted, not compiled
• perldoc – man pages for Perl modules
Terminology
• Module – one stand-alone piece of code
• Distribution – set of modules
• Package – a namespace for one or more distributions
• Package variable – declared in package, accessible between modules
• Lexical variable – local variable (scope)
Terminology, cont’d
• Scalar – variable that contains only one value (number, string, etc.)
• Composite – variable made of one or more scalars
• List – series of one or more scalars
  – e.g. (2, 4, 'Zach')
• Array – composite variable containing a list
Invoking Perl
• perl -e 'text of perl program'
• perl perl_script
• Make the Perl script executable and you can execute the script itself
  – e.g. ./my_script.pl
• The common file extension .pl is not required
• Like other scripts, start with #! to specify the interpreter
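Invoking Perl, cont’d
• Use perl -w to display warnings
  – Warns about likely mistakes, such as uninitialized values or variable names used only once
  – Instead of -w, put use warnings; in your script
    • Largely the same effect, but scoped to the enclosing file or block
• Usually you’ll find perl in /usr/bin/perl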
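A minimal sketch of a complete script that pulls these invocation points together. The file name hello.pl and the variable $greeting are made up for illustration, and the #! line assumes perl lives in /usr/bin/perl.

```perl
#!/usr/bin/perl
# hello.pl - hypothetical file name, used only for illustration
use strict;      # refuse undeclared variables
use warnings;    # in-script counterpart of perl -w

my $greeting = "Hello from Perl";
print "$greeting\n";
```

Assuming the file is saved as hello.pl, it can be run with perl hello.pl, or made executable with chmod +x hello.pl and run as ./hello.pl.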
Syntax
• Each Perl statement ends with a semicolon (;)
• Can have multiple statements per line
• Whitespace is largely ignored
  – Except within quoted strings
• Double quotes allow interpretation of variables and special characters (like \n)
• Single quotes don’t (just like the shell)
Syntax, cont’d
• Forward slashes delimit regular expressions (e.g. /.*sh?/)
• Backslash introduces escape sequences
  – E.g. \n – newline, \t – tab
• Text from # to the end of the line is ignored as a comment
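The difference between the two quoting styles is easiest to see side by side. This is a small sketch; the variable names are invented for the example.

```perl
use strict;
use warnings;

my $shell = "bash";

# Double quotes interpolate variables and escape sequences.
my $double = "shell: $shell\n";

# Single quotes keep the text literally: no interpolation, and \n stays two characters.
my $single = 'shell: $shell\n';

print $double;          # prints: shell: bash  (followed by a newline)
print $single, "\n";    # prints: shell: $shell\n
```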
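Output
• Old way
  – print what_to_print;
  – Concatenate several items
    • print item_1, item_2;
  – Want a newline?
    • print what_to_print, "\n";
• New way
  – say what_to_print;
    • Automatically adds newline
    • Requires Perl 5.10 or later, enabled with use feature 'say'; (or use v5.10;)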
Output, cont’d
• what_to_print can be many things
  – Quoted string – "Here's some text"
  – Variable – $myvar
  – Result of a function – uc($myvar)
  – A combination
    • print "Sub Tot: $total\n", "Tax: ", $total * $tax, "\n";
• Want to display an error and exit?
  – die "Uh-oh!\n";
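A short sketch of print, say, and die working together. The values of $total and $tax are made up for illustration. Note that arithmetic is not evaluated inside a double-quoted string, so it has to be passed to print as a separate item.

```perl
use strict;
use warnings;
use feature 'say';   # enables say (Perl 5.10 or later)

# $total and $tax are made-up values for illustration.
my $total = 200;
my $tax   = 0.05;

die "Uh-oh!\n" if $total < 0;           # print the message to STDERR and exit

print "Sub Tot: $total\n";              # interpolation inside double quotes
print "Tax: ", $total * $tax, "\n";     # arithmetic sits outside the quotes
say "Total: ", $total + $total * $tax;  # say appends the newline itself

# uc() returns an upper-cased copy of its argument.
my $shout = uc("done");
say $shout;                             # prints DONE
```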
Variables
• Perl variables can be singular or plural
• Data typing done dynamically at runtime
• Three types
  – Scalar (singular)
  – Array (plural)
  – Hash, a.k.a. associative array (plural)
• Variable names are case sensitive
• Names can contain letters, digits, and underscores, and must start with a letter or underscore
Variables, cont’d
• Each type of variable starts with a different special character to mark its type
• By default all variables are package variables
• To make a variable lexical, preface its declaration with the my keyword
• A lexical variable hides a package variable of the same name
• Include use strict; to disallow undeclared variables
Variables, cont’d
• We’ve already covered use warnings;
• Variables that have not been assigned a value hold undef
  – Evaluates to 0 in numeric context or the empty string in string context
  – Can check by using the defined() function
• $. is the line number of the most recently read input handle
• $_ is the default operand – ‘it’
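The scoping rules can be sketched in a few lines. All names here are invented for the example, and the our keyword is an addition not covered on the slides: it declares a package variable in a way that use strict accepts.

```perl
use strict;
use warnings;

our $name = "package";      # package variable (our is an assumption; see lead-in)

sub show_scope {
    my $name = "lexical";   # lexical variable; hides the package $name in this block
    return $name;
}

my $inside  = show_scope(); # "lexical"
my $outside = $name;        # "package"

my $unset;                  # declared but never assigned, so it holds undef
my $state = defined($unset) ? "defined" : "undef";

# $_ is the default operand: foreach sets it, and length reads it when given no argument.
my @lengths;
foreach ("ab", "cde") {
    push @lengths, length;
}

print "$inside $outside $state @lengths\n";   # prints: lexical package undef 2 3
```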
Scalars
• Singular, holds one value, either string or number
• Must be preceded with $, e.g. $myvar
• Perl automatically converts between strings and numbers
• A value is treated as a number or a string, whichever is appropriate in context
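A minimal sketch of context-driven conversion: the same scalar acts as a number under + and as a string under . (Perl's concatenation operator). Variable names are illustrative.

```perl
use strict;
use warnings;

my $count = "42";          # a string that looks like a number
my $sum   = $count + 8;    # + forces numeric context
my $label = $count . 8;    # . (concatenation) forces string context

print "$sum\n";            # prints 50
print "$label\n";          # prints 428
```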
Arrays
• Plural, containing an ordered list of scalars
• Zero-based indexing
• Dynamic size and allocation
• Begin with @, e.g. @myarray
• @variable references the entire array
• To reference a single element (which is a scalar), use $variable[index]
Arrays, cont’d
• $#array returns the index of the last element
  – Zero based – this means it’s one less than the size of the array
• @array[x..y] returns a ‘slice’ or sublist
• Printing arrays
  – An array enclosed in double quotes prints as a space-delimited list
  – Not in quotes, all entries are concatenated
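Arrays, cont’d
• Arrays can be treated like FIFO queues
  – shift(@array) – remove and return the first element
  – push(@array, scalar) – add an element at the end
• Use splice to combine arrays
  – splice(@array, offset, length, @otherarray)
  – Replaces length elements of @array, starting at offset, with the contents of @otherarray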
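The array operations above can be sketched as follows. The array contents are made up for illustration.

```perl
use strict;
use warnings;

my @fish = ("tuna", "cod", "eel", "carp");

my $first = $fish[0];        # single element, so the sigil is $
my $last  = $#fish;          # index of the last element: 3
my @mid   = @fish[1..2];     # slice: ("cod", "eel")

print "@fish\n";             # in quotes: tuna cod eel carp

my @queue = (1, 2, 3);
push(@queue, 4);             # @queue is now (1, 2, 3, 4)
my $head = shift(@queue);    # $head is 1, @queue is (2, 3, 4)

# Replace 1 element at offset 1 with the contents of @extra.
my @extra = ("a", "b");
splice(@queue, 1, 1, @extra);   # @queue is now (2, "a", "b", 4)
```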
Hashes
• Plural, contain a collection of key-value pairs
• Prefix with %, e.g. %myhash
• Keys are strings, acting as indexes into the hash
• Each key must be unique and returns one value
• Unordered
• Optimized for random access
• Keys don’t need quotes unless they contain spaces or other special characters
Hashes, cont’d
• Element access
  – $hashvar{index} = value;
    • e.g. $myvar{boat} = "tuna"; print $myvar{boat};
  – %hashvar = ( key => value, …);
    • e.g. %myvar = ( boat => "tuna", 4 => "fish");
  – Get a list of keys or values
    • keys(%hashvar)
    • values(%hashvar)
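A brief sketch of building and walking a hash, reusing the slide's boat/tuna pairs. The hash name %catch and the extra pier key are invented here. Because hashes are unordered, the keys are sorted before printing so the listing is repeatable.

```perl
use strict;
use warnings;

my %catch = ( boat => "tuna", 4 => "fish" );
$catch{pier} = "cod";                 # add one more pair

print $catch{boat}, "\n";             # prints tuna

# Hashes are unordered, so sort the keys for a repeatable listing.
foreach my $key (sort keys %catch) {
    print "$key => $catch{$key}\n";
}

my $pairs = scalar(keys %catch);      # number of keys: 3
```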
Evaluating Expressions
• Most control structures use an expression to evaluate whether they are run
• Perl uses different comparison operators for strings and numbers
• Also offers many of the same file test operators (existence, access, etc.) that bash uses, such as -e and -r
Expressions
• Numeric operators
  – ==, !=, <, >, <=, >=
  – <=> returns 0 if equal, 1 if >, -1 if <
• String operators
  – eq, ne, lt, gt, le, ge
  – cmp works like <=>, but compares strings
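Choosing the wrong operator family changes the answer, as this sketch shows. The use of sort with a comparison block is an added illustration, not something the slides cover.

```perl
use strict;
use warnings;

my $num_equal = (10 == 10.0)     ? "yes" : "no";  # numeric comparison: yes
my $str_equal = ("10" eq "10.0") ? "yes" : "no";  # string comparison: no

my $num_order = 9 <=> 10;        # -1, because 9 is less than 10
my $str_order = "9" cmp "10";    # 1, because the character 9 sorts after 1

# sort accepts either operator in its comparison block.
my @numeric = sort { $a <=> $b } (10, 9, 100);   # (9, 10, 100)
my @string  = sort { $a cmp $b } (10, 9, 100);   # (10, 100, 9)
```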
Control Structures
• if (expr) {…}
• unless (expr) {…}
• if (expr) {…} else {…}
• if (expr) {…} elsif (expr) {…} … else {…}
• while (expr) {…}
• until (expr) {…}
Control Structures, cont’d
• for and foreach are interchangeable
• Syntax 1
  – Similar to bash for…in structure
  – foreach [var] (list) {…}
  – If var is omitted, $_ is assumed
  – On each loop iteration, var is set to the next value from list
Control Structures, cont’d
• for/foreach Syntax 2
  – Similar to C’s for loop
  – foreach (expr1; expr2; expr3) {…}
  – expr1 sets the initial condition
  – expr2 is the loop condition; the loop runs while it is true
  – expr3 is the incrementor
Control Structures, cont’d
• Short-circuiting loops
  – Use last to break out of the loop altogether
    • Same as bash’s break
  – Use next to skip to the next iteration of the loop
    • Same as bash’s continue
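Both loop syntaxes, plus next, last, and unless, can be sketched together. The numbers and variable names are arbitrary.

```perl
use strict;
use warnings;

my @found;

# Syntax 1: iterate over a list; $n takes each value in turn.
foreach my $n (1 .. 10) {
    next if $n % 2;        # odd number: skip to the next iteration
    last if $n > 6;        # past 6: leave the loop altogether
    push @found, $n;
}
# @found is now (2, 4, 6)

# Syntax 2: C-style loop; runs while $i < 3 is true.
my $sum = 0;
for (my $i = 0; $i < 3; $i++) {
    $sum += $i;
}
# $sum is 0 + 1 + 2 = 3

# unless runs its block when the expression is false.
unless ($sum == 0) {
    print "found: @found, sum: $sum\n";
}
```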
Handles
• A handle is essentially a variable linked to a file or process
• Perl automatically opens handles for the default streams
  – STDIN, STDOUT, STDERR
• You can open additional handles
  – To a file for input/output/appending
  – To a process for input/output
Handles, cont’d
• Basic syntax
  – open(handle, ['mode'], "ref");
  – handle is a variable to reference the handle
  – mode can be many things
    • Simple cases: <, >, >>, |
    • Input (<) implied if omitted
  – ref is what to open – file or process
  – mode and ref can be combined as one string
Handles, cont’d
• Once open, access via the handle variable
• Output
  – print handle "what to print";
• Input
  – $var = <handle>; gets one line of input
  – Use <handle> as a while loop condition to read input one line at a time, populating $_
Handles, cont’d
• <> – magic handle, reads from the files named as command line arguments, or from STDIN if there are none
• Line of input contains EOL character
  – Use chomp($var) to remove it
  – Use chop($var) to remove the last character, whatever it is
• When done, close(handle);
  – Housekeeping, good coding practice
  – Perl closes any handles still open when the program exits
Handles, cont’d
• Examples
  – open(my $INPUT, "/path/to/file");
  – open(my $ERRLOG, ">>/var/log/errors");
  – open(my $SORT, "| sort -n");
  – open(my $ALIST, "grep '^[Aa]' /usr/share/dict/words|");
  – while (<$INPUT>) { print $ERRLOG $_; }
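A self-contained sketch of the write, append, and read cycle. It assumes the core File::Temp module to obtain a scratch file, so $path is a throwaway name rather than any path from the slides. It also uses the three-argument form of open, with the mode passed separately, and checks each open with or die.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);   # core module; gives us a scratch file to work with

# $path is a throwaway temporary file, removed when the program exits.
my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

# Write two lines, then append a third.
open(my $out, '>', $path) or die "Cannot write $path: $!\n";
print $out "alpha\n", "beta\n";
close($out);

open(my $log, '>>', $path) or die "Cannot append to $path: $!\n";
print $log "gamma\n";
close($log);

# Read the file back one line at a time; each line arrives in $_.
my @lines;
open(my $in, '<', $path) or die "Cannot read $path: $!\n";
while (<$in>) {
    chomp;                                  # strip the trailing newline from $_
    push @lines, sprintf("%d:%s", $., $_);  # $. is the current line number of $in
}
close($in);

print "@lines\n";             # prints 1:alpha 2:beta 3:gamma
```

Keeping the mode in its own argument means a file name that happens to begin with > or | cannot be mistaken for a mode.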
Regular Expressions
• Recall Appendix A
• Perl has a few unique features and caveats
• Regular expressions (RE) are delimited by forward slashes
• Perl uses the =~ operator for RE matching
  – Ex. if ($myvar =~ /^T/) { … } # if myvar starts w/ T
• To negate RE matching use the !~ operator
RE, cont’d
• =~ operator can also be used to do replacement
  – Ex. $result =~ s/old/new/;
  – The first match of ‘old’ in $result is replaced with ‘new’
• Remember, RE (esp. in Perl) are greedy
  – Quantifiers match as much text as possible
• Bracketed expressions use plain parentheses; the parentheses don’t need to be escaped
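Matching, negation, substitution, greediness, and unescaped parentheses can all be sketched in one place. The sample strings are invented, and the use of $1 to read back a parenthesized group is an added illustration.

```perl
use strict;
use warnings;

my $myvar = "Tuna and trout";

my $starts_t = ($myvar =~ /^T/)    ? 1 : 0;   # 1: begins with T
my $no_digit = ($myvar !~ /[0-9]/) ? 1 : 0;   # 1: contains no digit

# Substitution changes the variable in place; only the first match is replaced.
my $result = "old gold, old coin";
$result =~ s/old/new/;             # "new gold, old coin"

# Greedy: .* grabs as much as it can, so the match runs to the LAST t.
my ($greedy) = "a tuna tart" =~ /(t.*t)/;      # "tuna tart"

# Plain parentheses group and capture; the captured text lands in $1.
my $prefix = "";
if ("phone: 555-1234" =~ /([0-9]+)-([0-9]+)/) {
    $prefix = $1;                  # "555"
}
```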