Perl Programming
A short course with emphasis on Internet and Database programming
Dragomir R. Radev
© University of Michigan
About the course
The course presents an introduction to Perl 5 with a focus on building Internet-based applications.
People taking it should be familiar with programming in C and UNIX. They should also have some grasp of advanced programming concepts, such as TCP/IP sockets, Internet protocols, and relational databases.
Contents of the course - 1
Part I. Introduction to Perl
- Why Perl?
- Characteristics of Perl
- String Matching
- File Processing
- Modules
Part II. CGI Programming
- Introduction
- Perl modules
- Applications
Contents of the course - 2
Part III. Advanced Topics
- Accessing relational databases
- Embedding C in Perl (xs)
- Embedding Perl in C (embed)
- TCP/IP Programming
- Web robots
- Text search
- Generating graphics
Part IV. A Complete Example
Part I. Introduction to Perl
Perl was first designed by Larry Wall (of rn fame) in 1987. He was a system administrator at the time and had to create large numbers of reports from different operating system logs. He was frustrated by Awk's limitations and decided to create his own programming language to extract such reports. Hence the acronym "PERL", which stands for "Practical Extraction and Report Language".
As the years went by, Wall added new features to the language, culminating with the inclusion of reusable modules in release 5. Currently, Perl is used not only for system administration tasks, but also for Internet programming, database access, rapid prototyping, and just about anything else imaginable.
Why Perl?
You will see throughout this course!
(I don’t want to spoil anything ahead of time).
Characteristics of Perl
- Semi-interpreted language (source is compiled to an intermediate code that is interpreted on the fly).
- Freely accessible.
- Works under Unix, Windows 95/NT, OS/2.
- Automatically handles memory allocation and internals of data types, unlike C.
- Object-oriented.
- Features a large library of well-documented and tested classes (modules), distributed through CPAN (Comprehensive Perl Archive Network).
- Really "eclectic": includes ideas from many other languages and utilities, such as C/C++, Lisp, Awk, and Sed.
A Simple Program (hello_world.pl)
Leading spaces are ignored. Statements are terminated by semicolons.
    printf "Hello, world\n";
The program can be executed either from the command line (perl hello_world.pl) or through the "shebang" convention.
    #!/bin/perl
    printf "Hello, world\n";
Text after the "#" symbol on a particular line is ignored.
String Matching
One of the most typical characteristics of Perl is that it is optimized for fast manipulation of large, heterogeneous data sets.
More specifically, Perl is ideal for text processing.
Basic components: regular expressions.
Sample string matching program: grep.pl
    #!/bin/perl
    $search = shift(@ARGV);    # first argument is the pattern
    while (<>) {               # read the remaining files (or STDIN) line by line
        print if /$search/;
    }
Data Types: constants and variables
CONSTANTS:
- Numbers: 993 (decimal), 022 (octal), 0xee (hexadecimal)
- Strings: "feline", "line1\nline2"
- Arrays: (1,2,3), (1..10), (("cat", "kitten"),("dog", "puppy"))
- Hashes: ("cat", "kitten", "dog", "puppy")
VARIABLES:
- Numbers: $n
- Strings: $title
- Arrays: @inventory
- Hashes: %isa
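The constants listed above can be checked with a few lines of Perl. This is a minimal sketch; the variable names ($octal, @pairs, and so on) are chosen here for illustration and do not come from the course material. It also shows that a nested list literal flattens into a single list, and that the same flat list is read as key/value pairs when assigned to a hash.

```perl
use strict;
use warnings;

# Numeric literals in three bases all denote ordinary numbers.
my $decimal = 993;
my $octal   = 022;     # a leading 0 means octal: 2*8 + 2 = 18
my $hex     = 0xee;    # 14*16 + 14 = 238

# Lists flatten: the nested parentheses yield one four-element array.
my @pairs = (("cat", "kitten"), ("dog", "puppy"));

# The same flat list, assigned to a hash, is read as key/value pairs.
my %offspring = ("cat", "kitten", "dog", "puppy");

print "octal 022 = $octal, hex 0xee = $hex\n";
print "pairs has ", scalar(@pairs), " elements\n";
print "a cat's offspring is a $offspring{cat}\n";
```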
Perl idiosyncrasies
Negative indices:
    @month = ("january", "february", … , "december");
    print $month[-1];
Hashes:
    %offspring = ("cat", "kitten", "dog", "puppy");
    print $offspring{"cat"};
Variable interpolation:
    $age = 33;
    $father{'John'} = 'Bill';
    $string = "John's father, $father{'John'}, is $age years old";
    print $string;
Operators
Binary operators: (“+”, “-”, “/”, “**”, “%”)
Unary operators: (“++”, “--”)
Logical operators: (“&&”, “||”, “!”)
Numerical relational operators: ("<", "==", "<=>")
String relational operators: ("eq", "ne", "gt", "ge", "cmp")
Other operators: (".", "..", "x")
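A short sketch of some of the less familiar operators listed above may help. The variable names are illustrative only. Note how the numeric and string comparison operators can disagree on the same operands.

```perl
use strict;
use warnings;

my $cmp_num = (10 <=> 9);       # numeric comparison: 1, because 10 > 9
my $cmp_str = ("10" cmp "9");   # string comparison: -1, because "1" sorts before "9"
my $power   = 2 ** 10;          # exponentiation: 1024
my $joined  = "ab" . "cd";      # concatenation: "abcd"
my $rule    = "-" x 5;          # repetition: "-----"
my @range   = (1 .. 5);         # range: (1, 2, 3, 4, 5)

print "10 <=> 9 is $cmp_num, '10' cmp '9' is $cmp_str\n";
print "$power $joined $rule @range\n";
```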
Functions
Functions in Perl can be passed as parameters to other functions. They can also be nested.
Parameters are passed through the @_ array. Example:
    #!/bin/perl
    print &average(1,2,3) . "\n";
    print &average(1,2,3,4,5,6) . "\n";
    sub average {
        $sum = 0;
        $ct = 0;
        for (@_) {
            $sum += $_;
            $ct++;
        }
        return $sum/$ct;
    }
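The statement that functions can be passed as parameters can be illustrated with subroutine references. This is a sketch under the assumption that a reference (\&name or an anonymous sub) is what gets passed; apply_to_each and square are hypothetical names introduced here, not part of the course code.

```perl
use strict;
use warnings;

# apply_to_each (hypothetical helper): call the given function on every value.
sub apply_to_each {
    my ($func, @values) = @_;          # first parameter is a code reference
    return map { $func->($_) } @values;
}

sub square { return $_[0] * $_[0]; }

# Pass a named subroutine by reference, then an anonymous one.
my @squares = apply_to_each(\&square, 1, 2, 3);
my @doubled = apply_to_each(sub { $_[0] * 2 }, 1, 2, 3);

print "@squares\n";   # 1 4 9
print "@doubled\n";   # 2 4 6
```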
Some interesting functions
exists, defined:
    print "Exists\n" if exists $hash{$key};
    print "Defined\n" if defined $hash{$key};
join:
    $_ = join ':', $login, $passwd, $uid, $gid;
map, split:
    @words = map { split ' ' } @lines;
pop, shift, push, unshift:
    push(@list, shift @list);    # rotate: move the first element to the end
reverse:
    @rlist = reverse @list;
Statements
if, unless, while, until:
    print "wrong!" unless $guess == $hidden;
    $price++ until $customer_satisfaction > $threshold;
for, foreach:
    foreach $day (1..30) {
        print "$day\n";
    }
last, next, redo:
    for ($index = 0; $index < 10; $index++) {
        if ($index == 5) {
            last;
        }
        print("loop: index = $index\n");
    }
    print("index = $index\n");
Other Perl features
- Regular expressions
- File and directory functions
- Modules (classes)
Modules - Intro (1)
Perl provides a mechanism for alternative namespaces to protect packages from stomping on each other's variables. In fact, apart from certain magical variables, there's really no such thing as a global variable in Perl. The package statement declares the compilation unit as being in the given namespace. The scope of the package declaration is from the declaration itself through the end of the enclosing block (the same scope as the local operator). All further unqualified dynamic identifiers will be in this namespace. A package statement affects only dynamic variables--including those you've used local on--but not lexical variables created with my. Typically it would be the first declaration in a file to be included by the require or use operator.
Modules - Intro (2)
You can switch into a package in more than one place; it influences merely which symbol table is used by the compiler for the rest of that block. You can refer to variables and filehandles in other packages by prefixing the identifier with the package name and a double colon: $Package::Variable. If the package name is null, the main package is assumed. That is, $::sail is equivalent to $main::sail.
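The qualified-name rules can be tried out in a few lines. This is a minimal sketch; the package name Boat is hypothetical, and our is used to declare the package variables so that the example also runs under use strict.

```perl
use strict;
use warnings;

{
    package Boat;              # hypothetical package, for illustration only
    our $sail = "jib";         # this is $Boat::sail
}

our $sail = "mainsail";        # we are back in package main: $main::sail

print "$Boat::sail\n";         # jib
print "$main::sail\n";         # mainsail
print "$::sail\n";             # mainsail (null package name means main)
```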
Modules - Constructors and Destructors
There are two special subroutine definitions that function as package constructors and destructors. These are the BEGIN and END routines. The sub is optional for these routines. A BEGIN subroutine is executed as soon as possible, that is, the moment it is completely defined, even before the rest of the containing file is parsed. You may have multiple BEGIN blocks within a file--they will execute in order of definition. Because a BEGIN block executes immediately, it can pull in definitions of subroutines and such from other files in time to be visible to the rest of the file. An END subroutine is executed as late as possible, that is, when the interpreter is being exited, even if it is exiting as a result of a die function. (But not if it's being blown out of the water by a signal--you have to trap that yourself (if you can).) You may have multiple END blocks within a file--they will execute in reverse order of definition; that is: last in, first out (LIFO).
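The ordering rules for BEGIN and END blocks can be observed directly. In this sketch the @log array is a hypothetical trace introduced for the example; the ordinary statement appears first in the file yet runs after both BEGIN blocks.

```perl
use strict;
use warnings;

our @log;   # hypothetical trace of the execution order

push @log, "main code";              # ordinary statement: runs at run time

BEGIN { push @log, "first BEGIN" }   # runs as soon as it has been compiled
BEGIN { push @log, "second BEGIN" }  # BEGIN blocks run in order of definition

END { print "first END (runs last)\n" }    # END blocks run in reverse order
END { print "second END (runs first)\n" }

print join(", ", @log), "\n";   # first BEGIN, second BEGIN, main code
```

When the interpreter exits, the second END block should report before the first, matching the LIFO rule described above.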
Creating Modules
A module is just a package that is defined in a library file of the same name, and is designed to be reusable. It may do this by providing a mechanism for exporting some of its symbols into the symbol table of any package using it. Or it may function as a class definition and make its semantics available implicitly through method calls on the class and its objects, without explicit exportation of any symbols. Or it can do a little of both.
For example, to start a normal module called Some::Module, create a file called Some/Module.pm and start with this template:
Sample Module (1)
    package Some::Module;  # assumes Some/Module.pm

    use strict;

    BEGIN {
        use Exporter ();
        use vars qw($VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS);

        # set the version for version checking
        $VERSION = 1.00;
        # if using RCS/CVS, this may be preferred
        $VERSION = do { my @r = (q$Revision: 2.21 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r }; # must be all one line, for MakeMaker

        @ISA = qw(Exporter);
        @EXPORT = qw(&func1 &func2);
        %EXPORT_TAGS = ( );  # eg: TAG => [ qw!name1 name2! ],

        # your exported package globals go here,
        # as well as any optionally exported functions
        @EXPORT_OK = qw($Var1 %Hashit &func3);
    }
    use vars @EXPORT_OK;
Sample Module (2)
    # non-exported package globals go here
    use vars qw(@more $stuff);

    # initialize package globals, first exported ones
    $Var1 = '';
    %Hashit = ();

    # then the others (which are still accessible as $Some::Module::stuff)
    $stuff = '';
    @more = ();

    # all file-scoped lexicals must be created before
    # the functions below that use them.

    # file-private lexicals go here
    my $priv_var = '';
    my %secret_hash = ();

    # here's a file-private function as a closure,
    # callable as &$priv_func; it cannot be prototyped.
    my $priv_func = sub {
        # stuff goes here.
    };
Sample Module (3)
    # make all your functions, whether exported or not;
    # remember to put something interesting in the {} stubs
    sub func1 {}        # no prototype
    sub func2() {}      # proto'd void
    sub func3($$) {}    # proto'd to 2 scalars

    # this one isn't exported, but could be called!
    sub func4(\%) {}    # proto'd to 1 hash ref

    END { }             # module clean-up code here (global destructor)

    1;                  # a module file must return a true value
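The sample module above follows the exporting style. The other style mentioned earlier, a module that acts as a class definition and is used only through method calls, might look roughly like the sketch below. The Counter class and its methods are hypothetical and kept in a single file for brevity; a real module would live in its own Counter.pm.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class; nothing is exported

# Constructor: returns a hash reference blessed into the class.
sub new {
    my ($class, %args) = @_;
    my $self = { count => $args{start} || 0 };
    return bless $self, $class;
}

# Methods receive the object as their first parameter.
sub increment {
    my ($self) = @_;
    return ++$self->{count};
}

sub value {
    my ($self) = @_;
    return $self->{count};
}

package main;

my $c = Counter->new(start => 5);
$c->increment;
$c->increment;
print "counter is now ", $c->value, "\n";   # counter is now 7
```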
Module Use (1)
Perl modules are included into your program by saying
    use Module;
or
    use Module LIST;
This is exactly equivalent to
    BEGIN { require "Module.pm"; import Module; }
or
    BEGIN { require "Module.pm"; import Module LIST; }
As a special case
    use Module ();
is exactly equivalent to
    BEGIN { require "Module.pm"; }
Module Use (2)
All Perl module files have the extension .pm. use assumes this so that you don't have to spell out "Module.pm" in quotes. This also helps to differentiate new modules from old .pl and .ph files. Module names are also capitalized unless they're functioning as pragmas. "Pragmas" are in effect compiler directives, and are sometimes called "pragmatic modules" (or even "pragmata" if you're a classicist). Because the use statement implies a BEGIN block, the importation of semantics happens at the moment the use statement is compiled, before the rest of the file is compiled. This is how it is able to function as a pragma mechanism, and also how modules are able to declare subroutines that are then visible as list operators for the rest of the current file. This will not work if you use require instead of use. With require you can get into this problem:
Module Use (3)
    require Cwd;          # make Cwd:: accessible
    $here = Cwd::getcwd();

    use Cwd;              # import names from Cwd::
    $here = getcwd();

    require Cwd;          # make Cwd:: accessible
    $here = getcwd();     # oops! no main::getcwd()

In general, use Module; is recommended over require Module;.
Module Use (4)
Perl packages may be nested inside other package names, so we can have package names containing ::. But if we used that package name directly as a filename it would make for unwieldy or impossible filenames on some systems. Therefore, if a module's name is, say, Text::Soundex, then its definition is actually found in the library file Text/Soundex.pm. Perl modules always have a .pm file, but there may also be dynamically linked executables or autoloaded subroutine definitions associated with the module. If so, these will be entirely transparent to the user of the module. It is the responsibility of the .pm file to load (or arrange to autoload) any additional functionality. The POSIX module happens to do both dynamic loading and autoloading, but the user can say just use POSIX to get it all. For more information on writing extension modules, see the perlxs manpage and the perlguts manpage.
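The mapping from a module name to its library file can be written down in a couple of lines. This is only a sketch of the naming rule described above; module_to_path is a hypothetical helper, not something Perl provides under that name. The %INC hash, which Perl fills in as files are loaded, uses the same relative path as its key.

```perl
use strict;
use warnings;

# module_to_path (hypothetical helper): Text::Soundex -> Text/Soundex.pm
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;   # each :: becomes a directory separator
    return "$path.pm";
}

print module_to_path("Text::Soundex"), "\n";   # Text/Soundex.pm

# After a module is loaded, %INC maps that relative path to the file found.
require Cwd;
my $loaded_from = $INC{ module_to_path("Cwd") };
print "Cwd was loaded from $loaded_from\n";
```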
Examples 1-2
All examples are taken from the "Learning Perl" book by Randal Schwartz and Tom Christiansen.
    #ex_02-1
    #Learning Perl Appendix A, Exercise 2.1
    $pi = 3.141592654;
    $result = 2 * $pi * 12.5;
    print "radius 12.5 is circumference $result\n";

    #ex_02-2
    #Learning Perl Appendix A, Exercise 2.2
    print "What is the radius: ";
    chomp($radius = <STDIN>);
    $pi = 3.141592654;
    $result = 2 * $pi * $radius;
    print "radius $radius is circumference $result\n";
Examples 3-4
    #ex_02-4
    #Learning Perl Appendix A, Exercise 2.4
    print "String: "; $a = <STDIN>;
    print "Number of times: "; chomp($b = <STDIN>);
    $c = $a x $b; print "The result is:\n$c";

    #ex_02-3
    #Learning Perl Appendix A, Exercise 2.3
    print "First number: "; chomp($a = <STDIN>);
    print "Second number: "; chomp($b = <STDIN>);
    $c = $a * $b; print "Answer is $c.\n";
Examples 5-6
    #ex_03-1b
    #Learning Perl Appendix A, Exercise 3.1 alternate
    print "Enter the list of strings:\n";
    print reverse <STDIN>;
Examples 7-8
    #ex_03-2
    #Learning Perl Appendix A, Exercise 3.2
    print "Enter the line number: "; chomp($a = <STDIN>);
    print "Enter the lines, end with ^D:\n"; @b = <STDIN>;
    print "Answer: $b[$a-1]";

    #ex_03-3
    #Learning Perl Appendix A, Exercise 3.3
    srand;
    print "List of strings: "; @b = <STDIN>;
    print "Answer: $b[rand(@b)]";
Examples 9-10
    #ex_04-1
    #Learning Perl Appendix A, Exercise 4.1
    print "What temperature is it? ";
    chomp($temperature = <STDIN>);
    if ($temperature > 72) {
        print "Too hot!\n";
    } else {
        print "Too cold!\n";
    }

    #ex_04-2
    #Learning Perl Appendix A, Exercise 4.2
    print "What temperature is it? ";
    chomp($temperature = <STDIN>);
    if ($temperature > 75) {
        print "Too hot!\n";
    } elsif ($temperature < 68) {
        print "Too cold!\n";
    } else {
        print "Just right!\n";
    }
Examples 11-12
    #ex_04-3
    #Learning Perl Appendix A, Exercise 4.3
    print "Enter a number (999 to quit): ";
    chomp($n = <STDIN>);
    while ($n != 999) {
        $sum += $n;
        print "Enter another number (999 to quit): ";
        chomp($n = <STDIN>);
    }
    print "the sum is $sum\n";

    #ex_04-4
    #Learning Perl Appendix A, Exercise 4.4
    print "Enter some strings, end with ^D:\n";
    @strings = <STDIN>;
    while (@strings) {
        print pop @strings;
    }
Examples 13-14
    #ex_04-5a
    #Learning Perl Appendix A, Exercise 4.5 (without list)
    for ($number = 0; $number <= 32; $number++) {
        $square = $number * $number;
        printf "%5g %8g\n", $number, $square;
    }

    #ex_04-5b
    #Learning Perl Appendix A, Exercise 4.5 alternate (with list)
    foreach $number (0..32) {
        $square = $number * $number;
        printf "%5g %8g\n", $number, $square;
    }
Examples 15-16
    #ex_05-1a
    #Learning Perl Appendix A, Exercise 5.1
    %map = qw(red apple green leaves blue ocean);
    print "A string please: "; chomp($some_string = <STDIN>);
    print "The value for $some_string is $map{$some_string}\n";
Examples 17-18
    #ex_05-2
    #Learning Perl Appendix A, Exercise 5.2
    chomp(@words = <STDIN>);
    foreach $word (@words) {
        $count{$word} = $count{$word} + 1; # or $count{$word}++
    }
    foreach $word (keys %count) {
        print "$word was seen $count{$word} times\n";
    }

    #ex_06-1
    #Learning Perl Appendix A, Exercise 6.1
    print reverse <>;
Examples 19-20
    #ex_06-2
    #Learning Perl Appendix A, Exercise 6.2
    @ARGV = reverse @ARGV;
    print reverse <>;

    #ex_06-3
    #Learning Perl Appendix A, Exercise 6.3
    print "List of strings:\n";
    chomp(@strings = <STDIN>);
    foreach (@strings) {
        printf "%20s\n", $_;
    }
Examples 21-22
    #ex_06-4
    #Learning Perl Appendix A, Exercise 6.4
    print "Field width: ";
    chomp($width = <STDIN>);
    print "List of strings:\n";
    chomp(@strings = <STDIN>);
    foreach (@strings) {
        printf "%${width}s\n", $_;
    }

    #ex_07-1a
    #Learning Perl Appendix A, Exercise 7.1a
    /a+b*/
Examples 23-26
    #ex_07-1b
    #Learning Perl Appendix A, Exercise 7.1b
    /\\*\**/

    #ex_07-1c
    #Learning Perl Appendix A, Exercise 7.1c
    /($whatever){3}/

    #ex_07-1d
    #Learning Perl Appendix A, Exercise 7.1d
    /[\000-\377]{5}/

    #ex_07-1e
    #Learning Perl Appendix A, Exercise 7.1e
    /(^|\s)(\S+)(\s+\2)+(\s|$)/
Examples 27-28
    #ex_07-2a
    #Learning Perl Appendix A, Exercise 7.2a
    while (<STDIN>) {
        if (/a/i && /e/i && /i/i && /o/i && /u/i) {
            print;
        }
    }

    #ex_07-2b
    #Learning Perl Appendix A, Exercise 7.2b
    while (<STDIN>) {
        if (/a.*e.*i.*o.*u/i) {
            print;
        }
    }
Examples 29-30
    #ex_07-2c
    #Learning Perl Appendix A, Exercise 7.2c
    while (<STDIN>) {
        if (/^[^eiou]*a[^iou]*e[^aou]*i[^aeu]*o[^aei]*u[^aeio]*$/i) {
            print;
        }
    }

    #ex_07-3
    #Learning Perl Appendix A, Exercise 7.3
    while (<STDIN>) {
        chomp;
        ($user, $gcos) = (split /:/)[0,4];
        ($real) = split(/,/, $gcos);
        print "$user is $real\n";
    }
Examples 31-32
    #ex_07-4
    while (<STDIN>) {
        chomp;
        ($gcos) = (split /:/)[4];
        ($real) = split(/,/, $gcos);
        ($first) = split(/\s+/, $real);
        $seen{$first}++;
    }
    foreach (keys %seen) {
        if ($seen{$_} > 1) {
            print "$_ was seen $seen{$_} times\n";
        }
    }

    #ex_07-5
    while (<STDIN>) {
        chomp;
        ($user, $gcos) = (split /:/)[0,4];
        ($real) = split /,/, $gcos;
        ($first) = split /\s+/, $real;
        $names{$first} .= " $user";
    }
    foreach (keys %names) {
        $this = $names{$_};
        if ($this =~ /. /) {
            print "$_ is used by:$this\n";
        }
    }
Example 33
    #ex_08-1
    #Learning Perl Appendix A, Exercise 8.1
    sub card {
        my %card_map;
        @card_map{1..9} = qw(
            one two three four five six seven eight nine
        );
        my($num) = @_;
        if ($card_map{$num}) {
            return $card_map{$num};
        } else {
            return $num;
        }
    }
    # driver routine:
    while (<>) {
        chomp;
        print "card of $_ is ", &card($_), "\n";
    }
Example 34
    #ex_08-2
    #Learning Perl Appendix A, Exercise 8.2
    sub card { ...; } # from previous problem
    print "Enter first number: ";
    chomp($first = <STDIN>);
    print "Enter second number: ";
    chomp($second = <STDIN>);
    $message = card($first) . " plus " .
        card($second) . " equals " .
        card($first+$second) . ".\n";
    print "\u$message";
Example 35
    #ex_08-3
    #Learning Perl Appendix A, Exercise 8.3
    sub card {
        my %card_map;
        @card_map{0..9} = qw(
            zero one two three four five six seven eight nine
        );
        my($num) = @_;
        my($negative);
        if ($num < 0) {
            $negative = "negative ";
            $num = - $num;
        }
        if ($card_map{$num}) {
            return $negative . $card_map{$num};
        } else {
            return $negative . $num;
        }
    }
Example 36
    #ex_09-1
    #Learning Perl Appendix A, Exercise 9.1
    sub card {} # from previous exercise
    while () { ## NEW ##
        print "Enter first number: ";
        chomp($first = <STDIN>);
        last if $first eq "end"; ## NEW ##
        print "Enter second number: ";
        chomp($second = <STDIN>);
        last if $second eq "end"; ## NEW ##
        $message = &card($first) . " plus " .
            card($second) . " equals " .
            card($first+$second) . ".\n";
        print "\u$message";
    } ## NEW ##
Examples 37-38
    #ex_09-2
    #Learning Perl Appendix A, Exercise 9.2
    {
        print "Enter a number (999 to quit): ";
        chomp($n = <STDIN>);
        last if $n == 999;
        $sum += $n;
        redo;
    }
    print "the sum is $sum\n";

    #ex_10-1
    #Learning Perl Appendix A, Exercise 10.1
    print "What file? ";
    chomp($filename = <STDIN>);
    open(THATFILE, "$filename") ||
        die "cannot open $filename: $!";
    while (<THATFILE>) {
        print "$filename: $_"; # presume $_ ends in \n
    }
Example 39
    #ex_10-2
    #Learning Perl Appendix A, Exercise 10.2
    print "Input file name: ";
    chomp($infilename = <STDIN>);
    print "Output file name: ";
    chomp($outfilename = <STDIN>);
    print "Search string: ";
    chomp($search = <STDIN>);
    print "Replacement string: ";
    chomp($replace = <STDIN>);
    open(IN,$infilename) ||
        die "cannot open $infilename for reading: $!";
    ## optional test for overwrite...
    die "will not overwrite $outfilename" if -e $outfilename;
    open(OUT,">$outfilename") ||
        die "cannot create $outfilename: $!";
    while (<IN>) {              # read a line from file IN into $_
        s/$search/$replace/g;   # change the lines
        print OUT $_;           # print that line to file OUT
    }
    close(IN);
    close(OUT);
Examples 40-41
    #ex_10-3
    #Learning Perl Appendix A, Exercise 10.3
    while (<>) {
        chomp; # eliminate the newline
        print "$_ is readable\n" if -r;
        print "$_ is writable\n" if -w;
        print "$_ is executable\n" if -x;
        print "$_ does not exist\n" unless -e;
    }

    #ex_10-4
    #Learning Perl Appendix A, Exercise 10.4
    while (<>) {
        chomp;
        $age = -M;
        if ($oldest_age < $age) {
            $oldest_name = $_;
            $oldest_age = $age;
        }
    }
    print "The oldest file is $oldest_name ",
        "and is $oldest_age days old.\n";
Examples 42-43
    #ex_11-1
    #Learning Perl Appendix A, Exercise 11.1
    open(PW,"/etc/passwd") || die "How did you get logged in?";
    while (<PW>) {
        ($user,$uid,$gcos) = (split /:/)[0,2,4];
        ($real) = split /,/,$gcos;
        write;
    }
    format STDOUT =
    @<<<<<<< @>>>>>> @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $user, $uid, $real
    .

    #ex_11-2
    #Learning Perl Appendix A, Exercise 11.2
    # append to program from the first problem...
    format STDOUT_TOP =
    Username User ID Real Name
    ======== ======= =========
    .
Examples 44-45
    #ex_11-3
    #Learning Perl Appendix A, Exercise 11.3
    # append to program from the first problem...
    format STDOUT_TOP =
    Page @<<<
    $%
    Username User ID Real Name
    ======== ======= =========
    .

    #ex_12-1
    #Learning Perl Appendix A, Exercise 12.1
    print "Where to? ";
    chomp($newdir = <STDIN>);
    chdir($newdir) || die "Cannot chdir to $newdir: $!";
    foreach (<*>) {
        print "$_\n";
    }
Examples 46-47
    #ex_12-2a
    #Learning Perl Appendix A, Exercise 12.2a (using directory handle)
    print "Where to? ";
    chomp($newdir = <STDIN>);
    chdir($newdir) ||
        die "Cannot chdir to $newdir: $!";
    opendir(DOT,".") ||
        die "Cannot opendir . (serious dainbramage): $!";
    foreach (sort readdir(DOT)) {
        print "$_\n";
    }
    closedir(DOT);

    #ex_12-2b
    #Learning Perl Appendix A, Exercise 12.2b (using globbing)
    print "Where to? ";
    chomp($newdir = <STDIN>);
    chdir($newdir) || die "Cannot chdir to $newdir: $!";
    foreach (sort <* .*>) {
        print "$_\n";
    }
Examples 48-49
    #ex_13-1
    #Learning Perl Appendix A, Exercise 13.1
    unlink @ARGV;

    #ex_13-3
    #Learning Perl Appendix A, Exercise 13.3
    ($old, $new) = @ARGV; # name them
    if (-d $new) { # new name is a directory, need to patch it up
        ($basename = $old) =~ s#.*/##s; # get basename of $old
        $new .= "/$basename"; # and append it to new name
    }
    link($old,$new) || die "Cannot link $old to $new: $!";
Examples 50-51
    #ex_13-4
    #Learning Perl Appendix A, Exercise 13.4
    if ($ARGV[0] eq "-s") { # wants a symlink
        $symlink++; # remember that
        shift(@ARGV); # and toss the -s flag
    }
    ($old, $new) = @ARGV; # name them
    if (-d $new) { # new name is a directory, need to patch it up
        ($basename = $old) =~ s#.*/##s; # get basename of $old
        $new .= "/$basename"; # and append it to new name
    }
    if ($symlink) { # wants a symlink
        symlink($old,$new);
    } else { # wants a hard link
        link($old,$new);
    }

    #ex_13-5
    #Learning Perl Appendix A, Exercise 13.5
    foreach $f (<*>) {
        print "$f -> $where\n" if defined($where = readlink($f));
    }
Examples 52-53
    #ex_14-1
    #Learning Perl Appendix A, Exercise 14.1
    if (`date` =~ /^S/) {
        print "Go play!\n";
    } else {
        print "Get to work!\n";
    }

    #ex_14-2
    #Learning Perl Appendix A, Exercise 14.2
    open(PW,"/etc/passwd");
    while (<PW>) {
        chomp;
        ($user,$gcos) = (split /:/)[0,4];
        ($real) = split(/,/, $gcos);
        $real{$user} = $real;
    }
    close(PW);
    open(WHO,"who|") || die "cannot open who pipe";
    while (<WHO>) {
        ($login, $rest) = /^(\S+)\s+(.*)/;
        $login = $real{$login} if $real{$login};
        printf "%-30s %s\n",$login,$rest;
    }
Examples 54-55
    #ex_14-3
    #Learning Perl Appendix A, Exercise 14.3
    open(PW,"/etc/passwd");
    while (<PW>) {
        chomp;
        ($user,$gcos) = (split /:/)[0,4];
        ($real) = split(/,/, $gcos);
        $real{$user} = $real;
    }
    close(PW);
    open(LPR,"|lpr") || die "cannot open LPR pipe";
    open(WHO,"who|") || die "cannot open who pipe";
    while (<WHO>) {
    # or replace previous two lines with: foreach $_ (`who`) {
        ($login, $rest) = /^(\S+)\s+(.*)/;
        $login = $real{$login} if $real{$login};
        printf LPR "%-30s %s\n",$login,$rest;
    }

    #ex_14-4
    #Learning Perl Appendix A, Exercise 14.4
    sub mkdir {
        !system "/bin/mkdir", @_;
    }
Examples 56-57
    #ex_14-5
    #Learning Perl Appendix A, Exercise 14.5
    sub mkdir {
        my($dir, $mode) = @_;
        (!system "/bin/mkdir", $dir) && chmod($mode, $dir);
    }

    #ex_15-1
    #Learning Perl Appendix A, Exercise 15.1
    while (<>) {
        chomp;
        $slash = rindex($_,"/");
        if ($slash > -1) {
            $head = substr($_,0,$slash);
            $tail = substr($_,$slash+1);
        } else {
            ($head,$tail) = ("", $_);
        }
        print "head = '$head', tail = '$tail'\n";
    }
Examples 58-59
    #ex_15-2
    #Learning Perl Appendix A, Exercise 15.2
    chomp(@nums = <STDIN>); # note special use of chomp
    @nums = sort { $a <=> $b } @nums;
    foreach (@nums) {
        printf "%30g\n", $_;
    }

    #ex_15-3
    #Learning Perl Appendix A, Exercise 15.3
    open(PW,"/etc/passwd") || die "How did you get logged in?";
    while (<PW>) {
        chomp;
        ($user, $gcos) = (split /:/)[0,4];
        ($real) = split(/,/, $gcos);
        $real{$user} = $real;
        ($last) = (split /\s+/, $real)[-1];
        $last{$user} = "\L$last";
    }
    close(PW);
    for (sort by_last keys %last) {
        printf "%30s %8s\n", $real{$_}, $_;
    }
    sub by_last { ($last{$a} cmp $last{$b}) || ($a cmp $b) }
Example 60 (top)
    #ex_16-1
    #Learning Perl Appendix A, Exercise 16.1
    $: = " ";
    while (@pw = getpwent) {
        ($user, $gid, $gcos) = @pw[0,3,6];
        ($real) = split /,/, $gcos;
        $real{$user} = $real;
        $members{$gid} .= " $user";
        ($last) = (split /\s+/, $real)[-1];
        $last{$user} = "\L$last";
    }
    while (@gr = getgrent) {
        ($gname,$gid,$members) = @gr[0,2,3];
        $members{$gid} .= " $members";
        $gname{$gid} = $gname;
    }
Example 60 (bottom)
    for $gid (sort by_gname keys %gname) {
        %all = ();
        for (split(/\s+/, $members{$gid})) {
            $all{$_}++ if length $_;
        }
        @members = ();
        foreach (sort by_last keys %all) {
            push(@members, "$real{$_} ($_)");
        }
        $memberlist = join(", ", @members);
        write;
    }
    sub by_gname { $gname{$a} cmp $gname{$b}; }
    sub by_last { ($last{$a} cmp $last{$b}) || ($a cmp $b); }
    format STDOUT =
    @<<<<<<<< @<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $gname{$gid}, "($gid)", $memberlist
    ~~                  ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $memberlist
    .
Examples 61-62
    #ex_17-1
    #Learning Perl Appendix A, Exercise 17.1
    dbmopen(%ALIAS, "/etc/aliases", undef) ||
        die "No aliases!: $!";
    while (($key,$value) = each(%ALIAS)) {
        chop($key,$value);
        print "$key $value\n";
    }

    #ex_17-2
    #Learning Perl Appendix A, Exercise 17.2
    # program 1:
    dbmopen(%WORDS,"words",0644);
    while (<>) {
        foreach $word (split(/\W+/)) {
            $WORDS{$word}++;
        }
    }
    dbmclose(%WORDS);

    # program 2:
    dbmopen(%WORDS,"words",undef);
    foreach $word (sort { $WORDS{$b} <=> $WORDS{$a} } keys %WORDS) {
        print "$word $WORDS{$word}\n";
    }
    dbmclose(%WORDS);
Example 63
    #ex_18-1
    #Learning Perl Appendix A, Exercise 18.1
    for (;;) {
        ($user,$home) = (getpwent)[0,7];
        last unless $user;
        next unless open(N,"$home/.newsrc");
        next unless -M N < 30; ## added value :-)
        while (<N>) {
            if (/^comp\.lang\.perl\.announce:/) {
                print "$user is a good person, ",
                    "and reads comp.lang.perl.announce!\n";
                last;
            }
        }
    }
Example 64
    #ex_19-1
    #Learning Perl Appendix A, Exercise 19.1
    use strict;
    use CGI qw(:standard);
    print header(), start_html("Add Me");
    print h1("Add Me");
    if (param()) {
        my $n1 = param('field1');
        my $n2 = param('field2');
        my $n3 = $n2 + $n1;
        print p("$n1 + $n2 = <strong>$n3</strong>\n");
    } else {
        print hr(), start_form();
        print p("First Number:", textfield("field1"));
        print p("Second Number:", textfield("field2"));
        print p(submit("add"), reset("clear"));
        print end_form(), hr();
    }
    print end_html();
Example 65 (top)
    #ex_19-2
    #Learning Perl Appendix A, Exercise 19.2
    use strict;
    use CGI qw(:standard);
    print header(), start_html("Browser Detective");
    print h1("Browser Detective"), hr();
    my $browser = $ENV{'HTTP_USER_AGENT'};
    $_ = $browser;
    BROWSER: {
        if (/msie/i) {
            msie($_);
        } elsif (/mozilla/i) {
            netscape($_);
        } elsif (/lynx/i) {
            lynx($_);
        } else {
            default($_);
        }
    }
    print end_html();
Example 65 (bottom)
    sub msie {
        print p("Internet Explorer: @_. Good Choice\n");
    }
    sub netscape {
        print p("Netscape: @_. Good Choice\n");
    }
    sub lynx {
        print p("Lynx: @_. Shudder...");
    }
    sub default {
        print p("What the heck is a @_?");
    }
Part II. CGI Programming
- Programming Web-based interfaces is one of the most common uses of Perl.
- One typical application is related to CGI-based forms.
- CGI = "Common Gateway Interface".
- CGI is typically used in different cases than Java.
- The CGI protocol implements Web forms using the lower-level MIME media types.
- More than 85% of existing CGI scripts are written in Perl.
- Two significant reasons: the ease of programming and the availability of the CGI.pm module.
Example
Sample CGI Code
    print "Content-type: text/html\n\n",
    "<HTML><HEAD>
    <TITLE>Search the Universe</TITLE></HEAD><BODY>
    <CENTER><H1>Search the Universe</H1></CENTER> <HR>
    <FORM METHOD=\"GET\" ACTION=cgi.pl
     ENCTYPE=application/x-www-form-urlencoded>
    <P>Keywords: <INPUT TYPE=text NAME=keywords
     VALUE=\"\" SIZE=80 MAXLENGTH=200>
    <P>Max. number of results to output:
    <INPUT TYPE=text NAME=max_output VALUE=20 SIZE=6 MAXLENGTH=6>
    <P><HR>
    <INPUT TYPE=submit NAME=Search VALUE=Search><INPUT TYPE=reset>
    </FORM></BODY></HTML>";
Source Code Using CGI.pm
    #!/bin/perl
    use lib '/u/mask/radev/perl/lib/site_perl';
    use CGI;
    $query = new CGI;
    print $query->header,
        $query->start_html(-title=>'Search the Universe'),
        "<CENTER><H1>Search the Universe</H1></CENTER>",
        "<HR>",
        $query->startform('GET'),
        "<P>Keywords:",
        $query->textfield(-name=>'keywords', -default=>'',
                          -size=>80, -maxlength=>200),
        "<P>Max. number of results to output:",
        $query->textfield(-name=>'max_output', -default=>'20',
                          -size=>6, -maxlength=>6),
        "<P>", "<HR>",
        $query->submit(-name=>'Search'),
        $query->reset,
        $query->endform,
        $query->end_html;
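When the search form is submitted with METHOD=GET, the field values arrive URL-encoded in the query string. The sketch below shows roughly what decoding that string involves. It is an illustration only: parse_query is a hypothetical helper written for this example and the sample query string is made up. In a real script, the param method of CGI.pm does this work.

```perl
use strict;
use warnings;

# parse_query (hypothetical helper): decode "a=1&b=two+words" into a hash.
sub parse_query {
    my ($query_string) = @_;
    my %param;
    foreach my $pair (split /[&;]/, $query_string) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                               # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX encodes one byte
        }
        $param{$name} = $value;
    }
    return %param;
}

# Made-up query string, as a browser might send it for the form above.
my %form = parse_query("keywords=perl+%26+cgi&max_output=20&Search=Search");
print "keywords:   $form{keywords}\n";     # perl & cgi
print "max_output: $form{max_output}\n";   # 20
```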
Part III. Advanced Topics
This section will cover a sampling of uses of Perl that go beyond the simple cases covered so far:
- Accessing relational databases through Msql
- Embedding C in Perl (xs)
- Embedding Perl in C (embed)
- TCP/IP programming
- Web robots
- Generating graphics on the fly
Accessing relational databases
use Msql;
$dbh = Msql->connect($hostname,$databasename);
$sth = $dbh->query("select foo from bar");
$another_sth = $dbh->query("select bar from foo");
@row = $sth->fetchrow;
%hash = $sth->fetchhash;
$numrows = $sth->numrows;
$numfields = $sth->numfields;
$length = $sth->length->[0];
@list = $sth->name;
Embedding Perl in C (1)
    #include <EXTERN.h>   /* from the Perl distribution */
    #include <perl.h>     /* from the Perl distribution */

    static PerlInterpreter *my_perl;  /*** The Perl interpreter ***/

    int main(int argc, char **argv, char **env)
    {
        my_perl = perl_alloc();
        perl_construct(my_perl);
        perl_parse(my_perl, NULL, argc, argv, (char **)NULL);
        perl_run(my_perl);
        perl_destruct(my_perl);
        perl_free(my_perl);
    }
Embedding Perl in C (2)
    % cc -o interp interp.c `perl -MExtUtils::Embed -e ccopts -e ldopts`

    % interp
    print "Pretty Good Perl \n";
    print "10890 - 9801 is ", 10890 - 9801;
    <CTRL-D>
    Pretty Good Perl
    10890 - 9801 is 1089

    % interp -e 'printf("%x", 3735928559)'
    deadbeef
Embedding Perl in C (3)
    #include <EXTERN.h>
    #include <perl.h>

    static PerlInterpreter *my_perl;

    int main(int argc, char **argv, char **env)
    {
        char *args[] = { NULL };
        my_perl = perl_alloc();
        perl_construct(my_perl);

        perl_parse(my_perl, NULL, argc, argv, NULL);

        /*** skipping perl_run() ***/

        perl_call_argv("showtime", G_DISCARD | G_NOARGS, args);

        perl_destruct(my_perl);
        perl_free(my_perl);
    }
Embedding Perl in C (4)
showtime.pl:
    print "I shan't be printed.";

    sub showtime {
        print time;
    }

    % cc -o showtime showtime.c `perl -MExtUtils::Embed -e ccopts -e ldopts`

    % showtime showtime.pl
    818284590
Embedding Perl in C (5)
- Adding a Perl interpreter to your C program
- Calling a Perl subroutine from your C program
- Evaluating a Perl statement from your C program
- Performing Perl pattern matches and substitutions from your C program
- Fiddling with the Perl stack from your C program
- Maintaining a persistent interpreter
- Maintaining multiple interpreter instances
- Using Perl modules, which themselves use C libraries, from your C program
Embedding C in Perl (RPC.xs -1)
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

#include <rpc/rpc.h>

typedef struct netconfig Netconfig;

MODULE = RPC  PACKAGE = RPC

SV *
rpcb_gettime(host="localhost")
     char *host
     PREINIT:
     time_t timep;
     CODE:
     ST(0) = sv_newmortal();
     if( rpcb_gettime( host, &timep ) )
          sv_setnv( ST(0), (double)timep );

Netconfig *
getnetconfigent(netid="udp")
     char *netid

MODULE = RPC  PACKAGE = NetconfigPtr  PREFIX = rpcb_
Embedding C in Perl (RPC.xs -2)
void
rpcb_DESTROY(netconf)
     Netconfig *netconf
     CODE:
     printf("NetconfigPtr::DESTROY\n");
     free( netconf );
Embedding C in Perl (typemap)
TYPEMAP
Netconfig *     T_PTROBJ
Embedding C in Perl (RPC.pm)
package RPC;
require Exporter;
require DynaLoader;
@ISA = qw(Exporter DynaLoader);
@EXPORT = qw(rpcb_gettime getnetconfigent);

bootstrap RPC;
1;
Embedding C in Perl (rpctest.pl)
use RPC;
$netconf = getnetconfigent();
$a = rpcb_gettime();
print "time = $a\n";
print "netconf = $netconf\n";

$netconf = getnetconfigent("tcp");
$a = rpcb_gettime("poplar");
print "time = $a\n";
print "netconf = $netconf\n";
TCP/IP Programming (client)
#!/usr/bin/perl -w
require 5.002;
use strict;
use Socket;
my ($remote, $port, $iaddr, $paddr, $proto, $line);

$remote = shift || 'localhost';
$port   = shift || 2345;  # random port
if ($port =~ /\D/) { $port = getservbyname($port, 'tcp') }
die "No port" unless $port;
$iaddr = inet_aton($remote) || die "no host: $remote";
$paddr = sockaddr_in($port, $iaddr);

$proto = getprotobyname('tcp');
socket(SOCK, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
connect(SOCK, $paddr) || die "connect: $!";
while ($line = <SOCK>) {
    print $line;
}

close (SOCK) || die "close: $!";
exit;
TCP/IP Programming (server - 1)
#!/usr/bin/perl -Tw
require 5.002;
use strict;
BEGIN { $ENV{PATH} = '/usr/ucb:/bin' }
use Socket;
use Carp;

sub spawn;  # forward declaration
sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" }

my $port = shift || 2345;
my $proto = getprotobyname('tcp');
$port = $1 if $port =~ /(\d+)/;  # untaint port number

socket(Server, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
setsockopt(Server, SOL_SOCKET, SO_REUSEADDR, pack("l", 1)) || die "setsockopt: $!";
bind(Server, sockaddr_in($port, INADDR_ANY)) || die "bind: $!";
listen(Server, SOMAXCONN) || die "listen: $!";

logmsg "server started on port $port";
TCP/IP Programming (server - 2)
my $waitedpid = 0;
my $paddr;

sub REAPER {
    $waitedpid = wait;
    $SIG{CHLD} = \&REAPER;  # loathe sysV
    logmsg "reaped $waitedpid" . ($? ? " with exit $?" : '');
}

$SIG{CHLD} = \&REAPER;

for ( $waitedpid = 0;
      ($paddr = accept(Client,Server)) || $waitedpid;
      $waitedpid = 0, close Client)
{
    next if $waitedpid and not $paddr;
    my($port,$iaddr) = sockaddr_in($paddr);
    my $name = gethostbyaddr($iaddr,AF_INET);

    logmsg "connection from $name [", inet_ntoa($iaddr), "] at port $port";
TCP/IP Programming (server - 3)
    spawn sub {
        print "Hello there, $name, it's now ", scalar localtime, "\n";
        exec '/usr/games/fortune' or confess "can't exec fortune: $!";
    };

}

sub spawn {
    my $coderef = shift;

    unless (@_ == 0 && $coderef && ref($coderef) eq 'CODE') {
        confess "usage: spawn CODEREF";
    }

    my $pid;
    if (!defined($pid = fork)) {
        logmsg "cannot fork: $!";
        return;
    } elsif ($pid) {
        logmsg "begat $pid";
        return;  # I'm the parent
    }
TCP/IP Programming (server - 4)
    # else I'm the child -- go spawn

    open(STDIN,  "<&Client") || die "can't dup client to stdin";
    open(STDOUT, ">&Client") || die "can't dup client to stdout";
    ## open(STDERR, ">&STDOUT") || die "can't dup stdout to stderr";
    exit &$coderef();
}
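The client and server slides use the low-level Socket calls directly. As a rough illustration of the same accept/fork/connect pattern, here is a minimal sketch that uses the core IO::Socket::INET module instead (a swapped-in interface, not the one on the slides). The helper names start_listener, serve_once and read_greeting, and the greeting text, are hypothetical. The server listens on an ephemeral loopback port so the sketch does not depend on port 2345 being free.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Listen on the loopback interface; LocalPort => 0 lets the OS pick a free port.
sub start_listener {
    my $server = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => 0,
        Proto     => 'tcp',
        Listen    => 5,
        ReuseAddr => 1,
    ) or die "listen: $!";
    return $server;
}

# Fork a child that accepts one connection, sends a message, and exits.
# Returns the child's process id to the parent.
sub serve_once {
    my ($server, $message) = @_;
    defined(my $pid = fork) or die "cannot fork: $!";
    if ($pid == 0) {
        my $client = $server->accept or exit 1;
        print $client $message;
        close($client);
        exit 0;
    }
    return $pid;
}

# Connect to host:port and return every line the server sends before closing.
sub read_greeting {
    my ($host, $port) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "connect: $!";
    my @lines = <$sock>;
    close($sock);
    return @lines;
}

my $server = start_listener();
my $pid    = serve_once($server, "Hello there\n");
my @lines  = read_greeting('127.0.0.1', $server->sockport);
waitpid($pid, 0);    # reap the child, as REAPER does on the slides
print @lines;
```

Unlike the slide version, this sketch handles a single connection and waits for the child explicitly rather than installing a SIGCHLD handler.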
Web Robot (poacher - 1)
#!/bin/perl -w
use strict;
require 5.002;

use WWW::Robot;
use Getopt::Long;
use IO::Pipe;
use English;

use vars qw($VERSION);
$VERSION = '0.005';

my $EXTERNAL     = 0;
my $SHOW_HELP    = 0;
my $SHOW_VERSION = 0;
my $TRAVERSAL    = 'depth';
my $VERBOSE      = 0;
my $BOTNAME      = 'Poacher';
my $COMMAND;
my $EMAIL;

my $siteRoot;
my $robot;
Web Robot (poacher - 2)
&ParseCommandLine();
&Initialise();
$robot->run($siteRoot);
exit 0;

#=======================================================================
# Initialise() - initialise global variables, contents, tables, etc
#
# This function sets up various global variables such as the version number
# for WebAssay, the program name identifier, usage statement, etc.
#=======================================================================
sub Initialise
{
    my $oldfh;

    #-------------------------------------------------------------------
    # Unbuffer standard output. Just in case we're running a command
    # on each URL - should ensure we don't get strangely interleaved
    # output from robot, us, and the command.
    #-------------------------------------------------------------------
    $oldfh = select STDOUT; $OUTPUT_AUTOFLUSH = 1; select $oldfh;
Web Robot (poacher - 3)
    #-------------------------------------------------------------------
    # $robot: the WWW::Robot object we use to traverse web pages
    #-------------------------------------------------------------------
    $robot = new WWW::Robot(
        'NAME'      => $BOTNAME,
        'VERSION'   => $VERSION,
        'EMAIL'     => $EMAIL,
        'TRAVERSAL' => $TRAVERSAL,
        'VERBOSE'   => $VERBOSE,
    );

    if (!defined $robot) {
        die "Failed to create robot, unable to continue.\n";
    }

    $robot->addHook('follow-url-test',     \&follow_url_test);
    $robot->addHook('invoke-on-contents',  \&process_contents);
    $robot->addHook('invoke-on-get-error', \&process_get_error);

    #$robot->proxy(['ftp', 'http', 'wais', 'gopher'],
    #              'http://horatio:8080/');
    #$robot->no_proxy('canon2');

    $robot->setAttribute('REQUEST_DELAY', 0);
}
Web Robot (poacher - 4)
#=======================================================================
# follow_url_test() - tell the robot module whether it should follow link
#=======================================================================
sub follow_url_test
{
    my $robot     = shift;
    my $hook_name = shift;
    my $url       = shift;

    my $extension;

    return 0 if $url->scheme ne 'http';

    #-------------------------------------------------------------------
    # need to decide whether we should follow it. Rather than check
    # for .gif, .zip, etc, we look for extensions which we'll follow
    #-------------------------------------------------------------------
    if ($url =~ m![^/.]+\.([^/]+)$!) {
        $extension = $1;
        return 0 if ($extension !~ /^s?html?$/);
    }

    #-------------------------------------------------------------------
    # Check whether URL is on the site
Web Robot (poacher - 5)
    if ($url->as_string() =~ m!^$siteRoot!) {
        return 1;
    } else {
        #---------------------------------------------------------------
        # An off-site link, save for later checking.
        # Hack this in for the next version :-)
        #---------------------------------------------------------------
        return 0;
    }
}
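The link filter above can be tried without running a robot. The following is a minimal sketch that applies the same three checks (scheme, file extension, site root) to plain URL strings instead of URI::URL objects. The function name should_follow and the sample URLs are hypothetical. One deliberate difference from the slide: the site root is wrapped in \Q...\E so that the dots in it match literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the URL (a plain string) should be followed, 0 otherwise.
sub should_follow {
    my ($url, $site_root) = @_;

    # Only http links are followed.
    return 0 unless $url =~ m!^http://!;

    # If the last path segment has an extension, accept only
    # .htm, .html, .shtm and .shtml.
    if ($url =~ m![^/.]+\.([^/]+)$!) {
        my $extension = $1;
        return 0 if $extension !~ /^s?html?$/;
    }

    # Stay on the site: the URL must start with the site root.
    return $url =~ m!^\Q$site_root\E! ? 1 : 0;
}

my $root = 'http://www.foobar.com/';
foreach my $url ('http://www.foobar.com/index.html',
                 'http://www.foobar.com/logo.gif',
                 'http://elsewhere.example/index.html') {
    print should_follow($url, $root) ? "follow " : "skip   ", $url, "\n";
}
```

Listing the extensions to follow, rather than those to skip, means that unknown file types are ignored by default.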
#=======================================================================
# process_get_error() - hook function invoked whenever a GET fails
#=======================================================================
sub process_get_error
{
    my $robot     = shift;
    my $hook_name = shift;
    my $url       = shift;
    my $response  = shift;
Web Robot (poacher - 6)
    print "$url\n";
    print " error code ", $response->code, "\n";
}

#=======================================================================
# process_contents() - process the contents of a URL we've retrieved
#=======================================================================
sub process_contents
{
    my $robot     = shift;
    my $hook_name = shift;
    my $url       = shift;
    my $response  = shift;
    my $structure = shift;
    my $filename  = shift;

    return 1 if $response->content_type ne 'text/html';
    print "$url\n";

    if ($response->base eq $url.'/') {
        print " you should really have a trailing / on this URL.\n";
    }

    run_command($COMMAND, $filename) if defined $COMMAND;
}
Web Robot (poacher - 7)
#------------------------------------------------------------------------
# ParseCommandLine() - read configuration file for WebAssay
#
# This function reads the configuration file for WebAssay. There are
# three possible sources, which are tried in the following order:
#     1. -f switch on the command-line
#     2. $HOME/.webassay
#     3. local site configuration file
#------------------------------------------------------------------------
sub ParseCommandLine
{
    my @switches = (
        'command=s',   \$COMMAND,
        'email=s',     \$EMAIL,
        'external',    \$EXTERNAL,
        'help',        \$SHOW_HELP,
        'verbose',     \$VERBOSE,
        'version',     \$SHOW_VERSION,
        'traversal=s', \$TRAVERSAL,
    );

    &GetOptions(@switches) || die "use -help switch to display brief help\n";
Web Robot (poacher - 8)
    if ($SHOW_VERSION) {
        print "This is $BOTNAME, version $VERSION\n\n";
        exit 0;
    }

    if ($SHOW_HELP) {
        print <<EofHelp;
$BOTNAME, v$VERSION - check a web site for broken links and other problems

Usage: poacher [ -external ] [ -command program ] url

    -command program : run the specified program on every URL
    -email address   : your contact email address
    -external        : check external URLs referenced in pages
    -help            : display this message
    -verbose         : display verbose information as running
    -version         : display the version of $BOTNAME
EofHelp
        exit 0;
    }
Web Robot (poacher - 9)
    if (@ARGV != 1) {
        die "$BOTNAME: you must give exactly one URL (the site root)\n";
    }

    #-------------------------------------------------------------------
    # We're gonna require the user to provide an email address.
    # We can probably be a lot smarter about working out a default.
    #-------------------------------------------------------------------
    if (!defined $EMAIL) {
        $EMAIL = $ENV{'USER'} || die "Please set your email address\n";
    }

    #--------------------------------------------------------------------
    # A single URL on the command-line at this point: the URL for the
    # root of the site we are to check.
    #--------------------------------------------------------------------
    $siteRoot = shift @ARGV;
}
Web Robot (poacher - 10)
#=======================================================================
# run_command() - invoke a user-specified command on the page contents
#     $command  - the command to invoke
#     $filename - path to file which we want to invoke the command on
#
#=======================================================================
sub run_command
{
    my $command  = shift;
    my $filename = shift;

    my $pipe;

    $pipe = new IO::Pipe();
    $pipe->reader("$command $filename") || do {
        warn "Failed to open a pipe from \"$command $filename\": $!\n";
        return;
    };
Web Robot (poacher - 11)
    while (<$pipe>) {
        s/^/ /;
        print;
    }
    $pipe->close();
}

EXAMPLE

% poacher -command 'weblint -s' http://www.foobar.com/
Generating graphics
#!/bin/perl
use GD;
# create a new image
$im = new GD::Image(100,100);
# allocate some colors
$white = $im->colorAllocate(255,255,255);
$black = $im->colorAllocate(0,0,0);
$red = $im->colorAllocate(255,0,0);
$blue = $im->colorAllocate(0,0,255);
# make the background transparent and interlaced
$im->transparent($white);
$im->interlaced('true');
# Put a black frame around the picture
$im->rectangle(0,0,99,99,$black);
# Draw a blue oval
$im->arc(50,50,95,75,0,360,$blue);
# And fill it with red
$im->fill(50,50,$red);
# Convert the image to GIF and print it on standard output
print $im->gif;
Part IV. A Multi-part Example
We have covered a lot of material so far. This section shows a simple yet fully operational Perl-based system that uses techniques from most parts of the course.
Universe is a system that organizes URLs into a hierarchy. It supports downloading, indexing, searching, adding, and deleting Web pages. Information is stored in a relational database and can be retrieved through user queries formulated in a Web-based interface.
More specifically, Universe uses modules in the CGI, HTML, LWP, and Msql hierarchies.
Input Text
link_id L000001442
url http://www.uni-sofia.bg/
title Sofia University
author
cat1 ORGANIZATION
cat2 UNIVERSITY
cat3
cat4
email [email protected]
annotation The largest university
date_added 01 Aug 1994
date_indexed 27 Oct 1997
Demo - Browsing
Demo - Search
Dmp2html.pl (top)
#!/bin/perl
$/="";
while (<>) {
    s/\n/\t/g;
    @record = split (/\t/);
    $path = "html";
    while ($field = shift @record) {
        $value = shift @record;
        if ($value) {
            $path .= "/$value" if $field =~ /^cat/;
            if ($field =~ /^cat[1234]/) {
                mkdir $path, 0777 unless -d $path;
            }
            $url = "$value" if $field =~ /^url/;
            $title = "$value" if $field =~ /^title/;
            $annotation = "$value" if $field =~ /^annotation/;
        }
    }
    push @{ $HOL{$path} }, $title . "\t" . $url;
}
Dmp2html.pl (bottom)
foreach $path (keys %HOL) {
    open (HTML, ">$path/.index.html") or die "$!";
    $path1 = $path;
    $path1 =~ s/html\/(.*)/$1/;
    $path1 =~ s/\//:/g;
    if ($path1 eq "html") {
        $path1 = "TOP";
    }
    print HTML "<H1>$path1</H1>";
    print HTML "<UL>";
    @list = ();
    foreach $i (0 .. $#{ $HOL{$path} }) {
        push (@list,$HOL{$path}[$i]);
    }
    foreach $i (sort @list) {
        ($title,$url) = split (/\t/,$i);
        if ($url =~ /^[A-Z\/]*$/) {
            $url =~ s/^([A-Z]*\/)*//;
            $title .= " [DIR]";
        }
        print HTML "<LI><A HREF=\"$url\">$title</A>\n";
    }
    print HTML "</UL>";
    print HTML "<A HREF=\"..\">UP</A>\n" unless $path1 eq "TOP";
    close HTML;
}
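Dmp2html.pl keeps its entries in %HOL, a hash of lists keyed by category path. The sketch below isolates that data structure: it groups in-memory records by path and prints each group, without touching the file system. The helper names category_path and group_by_path are hypothetical, and the second record is invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build "html/CAT1/CAT2/..." from a record's non-empty cat fields.
sub category_path {
    my ($record) = @_;
    my $path = "html";
    foreach my $field (qw(cat1 cat2 cat3 cat4)) {
        $path .= "/$record->{$field}" if $record->{$field};
    }
    return $path;
}

# Group "title\turl" entries under their category path (a hash of lists).
sub group_by_path {
    my (@records) = @_;
    my %hol;
    foreach my $record (@records) {
        push @{ $hol{ category_path($record) } },
             "$record->{title}\t$record->{url}";
    }
    return %hol;
}

my @records = (
    { title => 'Sofia University', url => 'http://www.uni-sofia.bg/',
      cat1 => 'ORGANIZATION', cat2 => 'UNIVERSITY', cat3 => '', cat4 => '' },
    # Hypothetical second record, for illustration only.
    { title => 'Example College', url => 'http://college.example/',
      cat1 => 'ORGANIZATION', cat2 => 'UNIVERSITY', cat3 => '', cat4 => '' },
);

my %hol = group_by_path(@records);
foreach my $path (sort keys %hol) {
    print "$path\n";
    foreach my $entry (sort @{ $hol{$path} }) {
        my ($title, $url) = split /\t/, $entry;
        print "    $title -> $url\n";
    }
}
```

The `push @{ $hol{...} }, ...` idiom relies on autovivification: the list for a path is created the first time an entry is pushed onto it.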
Webcopy.pl (top)
#!/bin/perl
use lib '/u/mask/radev/perl/lib/site_perl';
my $load_dir = "loaddir";
use LWP::UserAgent;
use HTML::Parse;

while (<>) {
    chop;
    ($url,$file) = split (/\t/);
    my $ua = new LWP::UserAgent;
    $ua->agent("webcopy/0.1 " . $ua->agent);
    $ua->timeout(20);
    my $req = new HTTP::Request GET => $url;
    my $response = $ua->request($req);
    my $file_type = $response->content_type;
    print "$file_type\n";
Webcopy.pl (bottom)
    if (($file_type eq 'text/html') || ($file_type eq 'text/plain')) {
        print "$file\n";
        open (FILE,"> $load_dir/$file") || die "File creation problem: $!";
        if ($response->is_success) {
            $content = $response->content;
            if ($file_type eq 'text/html') {
                print FILE parse_html($content)->format;
            } else {
                print FILE $content;
            }
        } else {
            print "Error\n";
        }
    } else {
        print "Not an HTML file\n";
    }
}
Msql commands
INSERT into link VALUES (
    'L000001442',
    'http://www.uni-sofia.bg/',
    'Sofia University',
    '',
    'ORGANIZATION',
    'UNIVERSITY',
    '',
    '',
    '[email protected]',
    'The largest university',
    '01-Aug-1994',
    '27-Oct-1997'
) \g
Dmp2msql.pl (top)
#!/bin/perl

$/="";

&msqlcreate ();
&msqlinsert ();

sub msqlcreate {
    print <<EOH;
DROP TABLE link \\g

CREATE TABLE link (
\tlink_id CHAR(10),
\turl CHAR(190),
\ttitle CHAR(110),
\tauthor CHAR(60),
\tcat1 CHAR(30),
\tcat2 CHAR(20),
\tcat3 CHAR(20),
\tcat4 CHAR(20),
\temail CHAR(40),
\tannotation TEXT(50),
\tdate_added TEXT(30),
\tdate_indexed TEXT(30)
) \\g
EOH
    ;
};
Dmp2msql.pl (bottom)
sub msqlinsert {
    while (<>) {
        print "INSERT into link VALUES (\n";
        s/\n/\t/g;
        @record = split (/\t/);
        while ($field = shift @record) {
            $value = shift @record;
            print "\t'$value'";
            print "," if @record;
            print "\n";
        }
        print ") \\g\n\n";
    }
}
Bot format
http://www.mgu.bg/	http:&slash;&slash;www.mgu.bg&slash;
http://www.uni-sofia.bg/	http:&slash;&slash;www.uni-sofia.bg&slash;
http://www.aubg.edu/	http:&slash;&slash;www.aubg.edu&slash;
http://www.tu-varna.acad.bg/	http:&slash;&slash;www.tu-varna.acad.bg&slash;
http://www.unwe.acad.bg/	http:&slash;&slash;www.unwe.acad.bg&slash;
http://www.ru.acad.bg/	http:&slash;&slash;www.ru.acad.bg&slash;
http://www.medun.acad.bg/	http:&slash;&slash;www.medun.acad.bg&slash;
http://www.uacg.acad.bg/	http:&slash;&slash;www.uacg.acad.bg&slash;
http://www.uctm.acad.bg/	http:&slash;&slash;www.uctm.acad.bg&slash;
http://www.vmei.acad.bg/	http:&slash;&slash;www.vmei.acad.bg&slash;
Dmp2bot.pl
#!/bin/perl
##
## Converts Universe .dmp file to .bot format
##
use URI::URL;
while (<>) {
    if (s/url\t(http.*)/$eurl = $1/e) {
        my $url = url $eurl;
        $epath = $url->as_string;
        $epath =~ s/\//&slash\;/g;
        $epath =~ s/~/\&tilde\;/g;
        print "$url\t$epath\n";
    }
}
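The .bot format pairs each URL with a version that is safe to use as a flat file name: every "/" becomes "&slash;" and every "~" becomes "&tilde;". The search script later reverses this mapping. A minimal sketch of both directions, using plain strings rather than URI::URL objects, could look like this; the function names url_to_filename and filename_to_url and the second sample URL are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Encode a URL as a flat file name: '/' -> &slash; and '~' -> &tilde;
sub url_to_filename {
    my ($url) = @_;
    (my $name = $url) =~ s/\//&slash;/g;
    $name =~ s/~/&tilde;/g;
    return $name;
}

# Reverse the encoding to recover the original URL.
sub filename_to_url {
    my ($name) = @_;
    (my $url = $name) =~ s/&tilde;/~/g;
    $url =~ s/&slash;/\//g;
    return $url;
}

foreach my $url ('http://www.mgu.bg/', 'http://host.example/~user/') {
    my $name = url_to_filename($url);
    print "$url\t$name\t", filename_to_url($name), "\n";
}
```

The round trip is only exact for URLs that do not already contain the literal text "&slash;" or "&tilde;", which this sketch assumes.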
Msql2dmp.pl
#!/bin/perl

while (<>) {
    next unless /INSERT INTO/;
    s/^.*\(\'(.*)\'\)\\g/$1/;
    @line = split (/\',\'/);
    ($link_id,$url,$title,$author,$cat1,$cat2,$cat3,$cat4,$email,$annotation,$date_added,$date_indexed) = @line;
    print <<EOH;
link_id\t$link_id
url\t$url
title\t$title
author\t$author
cat1\t$cat1
cat2\t$cat2
cat3\t$cat3
cat4\t$cat4
email\t$email
annotation\t$annotation
date_added\t$date_added
date_indexed\t$date_indexed
EOH
    ;
}
Create-index.pl (top)
#!/bin/perl
@docs = @ARGV;
dbmopen %INDEX, "index", 0666 or die "Can't create index: $!\n";
dbmopen %DOCS, "docs", 0666 or die "Can't create docs: $!\n";
undef %INDEX;
undef %DOCS;
$doc_no = 0;
Create-index.pl (bottom)
foreach $doc (@docs) {
    $doc_no++;
    $DOCS{$doc_no} = $doc;
    open (DOC,$doc);
    %LIST = ();
    while (<DOC>) {
        @words = split (/([\]\[\)\(\'\s])+/,$_);
        foreach $wd (@words) {
            $lwd = $wd;
            $lwd =~ tr/A-Z/a-z/;
            unless ($lwd =~ /^\s*$/) {
                $LIST{$lwd} = 1;
            }
        }
    }
    foreach $wd (keys %LIST) {
        $INDEX{$wd} .= $doc_no . "\n";
    }
}

dbmclose %INDEX;
dbmclose %DOCS;
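Create-index.pl builds an inverted index: for every word, the list of documents that contain it. The following sketch shows the same idea in memory, with plain hashes instead of DBM files and strings instead of files on disk. The names unique_words and build_index and the sample texts are hypothetical. It splits on a non-capturing character class, so the delimiters themselves are not returned as words, and it stores document numbers in array references rather than newline-joined strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split text on whitespace, brackets, parentheses and apostrophes;
# return the distinct lowercased words.
sub unique_words {
    my ($text) = @_;
    my %seen;
    foreach my $word (split /[\]\[\)\(\'\s]+/, $text) {
        next if $word =~ /^\s*$/;    # skip empty leading fields
        $seen{ lc $word } = 1;
    }
    return sort keys %seen;
}

# Build word => [document numbers]; documents are numbered from 1.
sub build_index {
    my (@texts) = @_;
    my %index;
    my $doc_no = 0;
    foreach my $text (@texts) {
        $doc_no++;
        foreach my $word (unique_words($text)) {
            push @{ $index{$word} }, $doc_no;
        }
    }
    return %index;
}

# Hypothetical document texts, for illustration only.
my %index = build_index(
    "Sofia University (the largest)",
    "Technical University of Varna",
);
foreach my $word (sort keys %index) {
    print "$word: @{ $index{$word} }\n";
}
```

Because each word is counted once per document, a document number never appears twice in the same list.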
Search-index.cgi (part 1)
#!/bin/perl

use lib '/u/mask/radev/perl/lib/site_perl';
use CGI;
use Msql;

$msql_hostname = "bluewhale";
$msql_databasename = "universe";

$query = new CGI;

dbmopen %INDEX, "index", 0666 or die "Can't open index: $!\n";
dbmopen %DOCS, "docs", 0666 or die "Can't open docs: $!\n";

print &start_cgi;
if ($query->param) {
    $query_string = $query->param('keywords');
    $max_output = $query->param('max_output');
    @keywords = split (/ /, $query_string);
    &do_msql_init;
    &do_search;
};
print &do_cgi;
print &end_cgi;
Search-index.cgi (part 2)
sub start_cgi {
    $query->header,
    $query->start_html(-title=>'Search the Universe',
                       -author=>'[email protected]',
                       -BGCOLOR=>'white'),
}
sub end_cgi {
    $query->end_html;
}
sub do_cgi {
    "<CENTER><H1>Search the Universe</H1></CENTER>",
    "<HR>",
    $query->startform('GET'),
    "<P>Keywords:",
    $query->textfield(-name=>'keywords',
                      -default=>'',
                      -size=>80,
                      -maxlength=>200),
    "<P>Max. number of results to output:",
    $query->textfield(-name=>'max_output',
                      -default=>'20',
                      -size=>6,
                      -maxlength=>6),
    "<P>",
    "<HR>",
    $query->submit(-name=>'Search'),
    $query->reset,
Search-index.cgi (part 3)
    $query->endform,
    "<P><A HREF=html>[BROWSE]</A>";
}
sub do_msql_init {
    unless ($dbh = Msql->connect($msql_hostname, $msql_databasename)) {
        print $Msql::db_errstr;
    }
}
sub do_search {
    print "<CENTER><H1>Search Results</H1></CENTER>",
          "<HR>";
    foreach $kwd (@keywords) {
        $doc_ct++;
        $kwd =~ tr/A-Z/a-z/;
        foreach $doc_no (split /\n/, $INDEX{$kwd}) {
            $match{$doc_no}++;
        }
    }
    print "<P>Total number of words in query: $doc_ct",
          "<P>Top $max_output URLs that match the query: <B>@keywords</B><P>",
          "<CENTER>",
          "<TABLE BORDER>",
Search-index.cgi (part 4)
          "<TR><TH>Count</TH>",
          "<TH>URL</TH>",
          "<TH>Category</TH></TR>";

    foreach $no (sort revbymatch keys %match) {
        $url = $DOCS{$no};
        $url =~ s/.*\/(.*)/$1/;
        $url =~ s/&tilde;/~/g;
        $url =~ s/&slash;/\//g;

        $sth = $dbh->query ("select title,cat1,cat2,cat3,cat4 from link where url = '$url'");

        ($title,$cat1,$cat2,$cat3,$cat4) = $sth->fetchrow();

        $category = "html" . "/" . $cat1;
        $category .= "/" . $cat2 if $cat2;
        $category .= "/" . $cat3 if $cat3;
        $category .= "/" . $cat4 if $cat4;
Search-index.cgi (part 5)
        print "<TR>",
              "<TD>$match{$no}</TD>",
              "<TD><A HREF=$url>$title</A></TD>",
              "<TD><A HREF=$category>$category</A></TD>",
              "</TR>\n" unless $show_ct++ >= $max_output;
    }
    print "</TABLE>",
          "</CENTER>",
          "<HR>";
}

sub revbymatch {
    $match{$b} <=> $match{$a};
}
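The search script ranks documents by how many query keywords they contain: %match counts hits per document number, and revbymatch sorts by descending count. A minimal in-memory sketch of that ranking step follows. The function name rank_documents and the sample index are hypothetical, the index stores array references rather than newline-joined strings, and ties are broken by document number, which the slide version does not do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how many query keywords each document contains, then return
# the document numbers sorted by descending count.
sub rank_documents {
    my ($index, @keywords) = @_;
    my %match;
    foreach my $keyword (@keywords) {
        foreach my $doc_no (@{ $index->{ lc $keyword } || [] }) {
            $match{$doc_no}++;
        }
    }
    return sort { $match{$b} <=> $match{$a} || $a <=> $b } keys %match;
}

# Hypothetical index: word => list of document numbers.
my %index = (
    sofia      => [1],
    university => [1, 2, 3],
    varna      => [2],
);

my @ranked = rank_documents(\%index, 'Sofia', 'University');
print "@ranked\n";
```

Keywords that are missing from the index contribute no matches, so a query made up only of unknown words returns an empty list.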
Acknowledgments
Larry Wall (the father of Perl)
Tom Christiansen, Randal Schwartz (“Programming Perl”, “Learning Perl”)
Most examples in Sections 1 and 2 are from “Learning Perl”
Tom Christiansen, Nathan Torkington (“Perl Cookbook”)
David Medinets (“Perl 5 By Example”)
Sriram Srinivasan (“Advanced Perl Programming”)
Module maintainers (CGI: Lincoln Stein, LWP: Gisle Aas, MSQL: David Hughes and Andreas Koenig, Robot: Neil Bowers, etc.)
More Acknowledgments
Jon Orwant and Doug MacEachern (perlembed)
Dean Roehrich (perlxs)
Thomas Boutell and Lincoln Stein (GD)
Even More Acknowledgments
Michael Elhadad, who was my host at Ben-Gurion University in 1997, where this mini-course was taught for the first time.
Book References
Programming Perl, O’Reilly and Associates, Inc. 1997.
Learning Perl, O’Reilly and Associates, Inc. 1997.
Advanced Perl Programming, O’Reilly and Associates, Inc. 1996.
Perl 5 by Example, QUE, 1996.
Perl Cookbook, O’Reilly and Associates, Inc. 1998.
Programming the Perl DBI, O’Reilly, 2000.
Web Sites
The Perl Language home page (www.perl.com)
The Perl Institute (www.perl.org)
The Perl Journal (http://orwant.www.media.mit.edu/the_perl_journal/)
The Comprehensive Perl Archive Network (http://www.perl.com/CPAN-local/CPAN.html)
The Perl FAQ (http://language.perl.com/info/documentation.html)