History and Overview
- 1980s Larry Wall, system hacker and guru (Perl 1.0 was released in 1987)
- Perl - Practical Extraction and Report Language
- hybrid compilation and interpretation
- C-like "shell" or scripting language
- type system - dynamic type binding, strong typing, lots of run-time type coercion
- string manipulation - many string operators, including regular expression matching and substitution
- scope and storage binding - implicit declarations and explicit declarations with static scoping, lots of heap bindings (for strings, arrays, etc.), run-time memory manager does garbage collection, implicit allocation and deallocation
- built-in data structures - lists/arrays, associative arrays/hash tables, strings
- at least five ways to do everything - a garbage can language! Perl Motto: There's more than one way to do it!
- love it or hate it, Perl is "useful"
Hello World
In a file, such as Helloworld.pl
# Helloworld.pl
# run using `perl -w Helloworld.pl'
print "Hello World!\n";
As a shell script.
#!/usr/local/bin/perl -w
# Helloworld.pl
# run using `Helloworld.pl'
print "Hello World!\n";
Comments
A # starts a comment.
# I am a one-line comment
print "Hello"; # now I can put a comment on this line
Scalars
A scalar is an integer, real, octal number, hexadecimal number, string,
or interpolated string. Only the latter two differ significantly from
C.
Scalar variables start with $. Arithmetic, comparison and
assignment operators are similar to those in the C programming language.
Variables have global scope unless declared with my.
Since global scope is usually a bad thing, you can include the line
use strict;
at the beginning of each program to make the compiler reject variables
that have not been declared.
In boolean operations, 0, "0", "", and the undefined value are false, while
every other scalar value (nonzero numbers and all other non-empty strings) is true.
$a = 1 + 2; # Add 1 and 2 and store in $a
$a = 3 - 4; # Subtract 4 from 3 and store in $a
$a = 5 * 6; # Multiply 5 and 6
$a = 7 / 8; # Divide 7 by 8 to give 0.875
$a = 9 ** 10; # Nine to the power of 10
$a = 5 % 2; # Remainder of 5 divided by 2
++$a; # Increment $a and then return it
$a++; # Return $a and then increment it
--$a; # Decrement $a and then return it
$a--; # Return $a and then decrement it
$a = $b; # Assign $b to $a
$a += $b; # Add $b to $a
$a -= $b; # Subtract $b from $a
$a = $b . $c; # Concatenate $b and $c
$a .= $b; # Append $b onto $a
$a == $b # Is $a numerically equal to $b?
$a != $b # Is $a numerically unequal to $b?
$a eq $b # Is $a string-equal to $b?
$a ne $b # Is $a string-unequal to $b?
($a && $b) # Are $a and $b both true?
($a || $b) # Is either $a or $b true?
!($a) # Is $a false?
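The truth rules and the two families of comparison operators are easy to mix up, so here is a minimal sketch that classifies a few values and compares one pair both ways. The variable names (@values, @verdicts, $text) are made up for this illustration.

```perl
use strict;
use warnings;

# Classify a handful of scalars as true or false.
my @values = (0, "0", "", "0.0", "00", " ", 42);
my @verdicts;
foreach my $value (@values) {
    my $verdict = $value ? 'true' : 'false';
    push @verdicts, $verdict;
    print "'$value' is $verdict\n";
}

# == compares numerically, eq compares as strings.
my $text = "1.0";
my $numeric_equal = ($text == 1)   ? 'yes' : 'no';
my $string_equal  = ($text eq "1") ? 'yes' : 'no';
print "numerically equal to 1: $numeric_equal\n";
print "string-equal to \"1\": $string_equal\n";
```

Only the strings "" and "0" are false, so "0.0" and "00" count as true even though they are numerically zero.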
An uninterpreted string (one in single quote characters) evaluates verbatim.
print 'hello'; # prints hello
print 'hello\n'; # prints hello\n
print '$xhello\n'; # prints $xhello\n
In an interpolated string (one in double quote characters)
all variables and escape sequences (such as \n) are
substituted.
print "hello"; # prints hello
print "hello\n"; # prints hello followed by a newline
$x = 2;
print "$xhello\n"; # prints only a newline ($xhello is
# undefined, warning if -w is set)
print "${x}hello\n"; #prints 2hello followed by a newline
@x = (1, 2, 3);
print "@x"; # prints 1 2 3
List of scalars
A list of scalars can hold any number of scalar elements (Perl 5 also
supports lists of lists, by way of references).
A list literal is delimited with parentheses. The scalar elements
are separated with commas.
() # the empty list
(1, 2) # a list with two numbers
(1, 'hello', 2.5) # a list with three elements
Elements in a list can be referred to by array position, counting from 0.
A single element is a scalar, so it is written with $, as in $x[2]. There is
no `bounds' checking. List variables start with `@'.
@x = (0, 1, 2); # @x is a list with three elements
$x[2] = 47; # @x now is (0, 1, 47)
$x[0] = 4; # @x now is (4, 1, 47)
print $x[5]; # the list has only three elements, so the
# sixth array location is uninitialised and
# a warning message will be printed (with -w)
A list can dynamically grow.
@x = (0, 1, 2); # @x is a list with three elements
$x[3] = 47; # @x now is (0, 1, 2, 47)
$x[500] = 47; # @x now has a 47 in the 501st array location
A list can be manipulated like a queue.
@x = (0, 1, 2); # @x is a list with three elements
$first = shift @x; # equivalent to a queue get, @x is now (1, 2)
unshift @x, 3; # puts 3 at the front of the queue, @x is now (3, 1, 2)
A list can be manipulated like a stack.
@x = (0, 1, 2); # @x is a list with three elements
$top = pop @x; # equivalent to a stack pop, @x is now (0, 1)
push @x, 3; # equivalent to a stack push, @x is now (0, 1, 3)
Some other useful list manipulations.
@x = (2, 0, 1); # @x is a list with three elements
@sorted = sort @x; # sort the list (as strings by default), yields the list (0, 1, 2)
# iterate through a list using foreach
foreach $element (@x) {
print $element; # $element will be the value of
# successive elements
}
scalar(@x); # the size of the list
if (defined $x[23]) ... # test to determine if a location
# is initialised
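The queue, stack, and sorting operations above can be combined in one short sketch. It assumes nothing beyond the built-ins already shown, plus the numeric comparison operator <=> inside a sort block. The variable names are illustrative only.

```perl
use strict;
use warnings;

my @queue = (0, 1, 2);
push @queue, 3;                 # add at the back: (0, 1, 2, 3)
my $first = shift @queue;       # remove from the front: $first is 0

my @stack = (0, 1, 2);
push @stack, 3;                 # push on top: (0, 1, 2, 3)
my $top = pop @stack;           # pop from the top: $top is 3

# sort compares as strings by default; use <=> for numeric order
my @numbers        = (10, 9, 100, 1);
my @string_sorted  = sort @numbers;
my @numeric_sorted = sort { $a <=> $b } @numbers;
my $size           = scalar(@numbers);

print "queue: @queue (removed $first)\n";
print "stack: @stack (removed $top)\n";
print "string order:  @string_sorted\n";
print "numeric order: @numeric_sorted\n";
print "size: $size\n";
```

The string ordering puts 10 and 100 before 9, which is why numeric data usually needs the explicit comparison block.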
Associative Arrays of Scalars
An associative array (or hash table or just hash for short) is a data
structure that provides
a mapping from a key scalar to a value scalar.
An associative array literal is delimited with parentheses.
An arrow (=>) indicates a mapping from a key to a value.
The key, value pairs are separated with commas.
() # the empty associative array
(1 => 2) # map the key 1 to the value 2
('hello' => 2, 3 => 'good') # map hello to 2 and 3 to good
Elements in an associative array are referred to by key. There is
no `bounds' checking. Associative array variables start with %.
%x = (); # %x is empty
$x{'hello'} = 2; # key hello maps to value 2
$key = 'joe';
$x{$key} = 'jim'; # key joe maps to value jim
print $x{'sam'}; # there is no key sam in the array, that array
# location is uninitialised and a warning message
# will be printed (with -w turned on)
delete $x{'hello'}; # remove key hello from table x
if (defined $x{'joe'}) ... # test whether the value stored under a key is defined
# (use exists to test for the key itself)
@keys = keys %x; # creates a list of keys, e.g., ('hello', 'joe')
@values = values %x; # creates a list of values, e.g., (2, 'jim')
All associative arrays can grow dynamically.
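As a small worked example of an associative array growing on demand, the sketch below counts word occurrences. The word list and the variable names are invented for illustration, and qw() is simply shorthand for a list of quoted words.

```perl
use strict;
use warnings;

my @words = qw(the rain in spain stays mainly in the plain);
my %count;                      # starts out empty
foreach my $word (@words) {
    $count{$word}++;            # a missing key is created on first use
}

# keys returns the keys in no particular order, so sort them for stable output
foreach my $word (sort keys %count) {
    print "$word: $count{$word}\n";
}

# exists tests for the key itself; defined tests the value stored under it
$count{'empty'} = undef;
my $has_key   = exists($count{'empty'})  ? 'yes' : 'no';
my $has_value = defined($count{'empty'}) ? 'yes' : 'no';
print "key present: $has_key, value defined: $has_value\n";
```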
Flow of Control
Perl has the same basic control statements as C and Pascal, plus
a few more. Unlike C, however, the opening and closing braces
are not optional, even around a single statement. Below are some useful examples.
Interpret the ... as one or more Perl statements.
# if
if (...) {...}
# if-then-else
if (...) {...} else {...}
# nested if-then-elses
if (...) {...} elsif (...) {...} ... else {...}
# a for loop
for ($i = 0; $i < $max; $i++) {...}
# a while loop
while (...) {...}
Below are some un-C-like examples of control statements.
print $x unless $x > 20; # S unless C means execute S unless C is true
print $x if $x > 20; # S if C means execute S if C is true
$x > 20 || print $x; # C || S means if C is false then execute S
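The control statements above can be seen working together in a short sketch. The loop bound and the labels are arbitrary choices for this illustration.

```perl
use strict;
use warnings;

my $max = 6;
my @report;
for (my $i = 0; $i < $max; $i++) {
    my $kind;
    if ($i == 0) {
        $kind = 'zero';
    } elsif ($i % 2 == 0) {
        $kind = 'even';
    } else {
        $kind = 'odd';
    }
    # statement modifier form: skip the entry once $i exceeds 4
    push @report, "$i is $kind" unless $i > 4;
}
print "$_\n" foreach @report;
```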
File Input/Output
Opening files is easy in Perl, but remember to check whether the open
failed. Each of the simple modes below opens a file for either input or output, not both.
# Open for output
if (!open(OUT, ">$filename")) { die "Could not open $filename"; }
# An alternative form
open(OUT, ">$filename") || die "Could not open $filename";
# Open for appending
open(APPEND, ">>$filename") || die "Could not open $filename";
# Open for input
open(IN, "<$filename") || die "Could not open $filename";
# Pipe standard output of a Unix command to input in our program
open(LS, "/usr/bin/ls |") || die "Could not open ls command";
# Pipe our output to the standard input of a Unix command
open(GREP, "| /usr/bin/grep") || die "Could not open grep command";
# Be sure to close a file when you are done!
# Here are two equivalent forms.
close(OUT);
close IN;
The names OUT, APPEND, IN, LS, GREP in the above examples are called file handles and exist
in a namespace separate from the variables, etc. File handles should be written in ALL CAPS --
otherwise you'll get a warning.
To print to a file handle that has been opened for output, just put the
file handle between the print and the string.
# Write one line to a file
$filename = "outfile";
open(OUT, ">$filename") || die "Could not open $filename";
print OUT "hello\n"; # output hello to outfile
close OUT;
Input can be read one line at a time (a scalar context) or every line
at a single go (a list context)! Just put the file handle inside
angle brackets. In the scalar context, the read returns the undefined
value at end of file, which tests as false.
open(IN, "<inputfile") || die "Could not open inputfile";
@lines = <IN>; # read the entire input file as a list of lines
close IN;
open(IN, "<inputfile") || die "Could not open inputfile";
while ($line = <IN>) { # read one line from the input file at a time
...
}
close IN;
# the $_ variable is sometimes useful
open(IN, "<inputfile") || die "Could not open inputfile";
while (<IN>) { # read one line from the input file at a time
$line = $_; # the line is input into the $_ variable
}
close IN;
STDIN is an already opened file corresponding to standard input.
So you can do the following to read from standard input.
@lines = <STDIN>; # read all of standard input as a list of lines
Alternatively, use
while (<STDIN>) { # read one line from standard input at a time
$line = $_;
}
STDOUT and STDERR are already opened for output.
Lines read from files contain trailing newline characters. To get rid of them use chomp.
chomp $line;
Note that chomp updates the variable in place.
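The same write-then-read cycle can also be written with the three-argument form of open and file handles held in ordinary scalar variables, a style commonly used in newer Perl code. The following is a sketch under that assumption, not a replacement for the forms above. It uses the core File::Temp module only to obtain a scratch file name, and all variable names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() creates a scratch file and returns an open handle plus its name
my ($tmp, $filename) = tempfile(UNLINK => 1);
close $tmp;

# Three-argument open keeps the mode separate from the file name;
# $! holds the system error message if the open fails.
open(my $out, '>', $filename) or die "Could not open $filename: $!";
print $out "first line\n";
print $out "second line\n";
close $out;

open(my $in, '<', $filename) or die "Could not open $filename: $!";
my @lines;
while (my $line = <$in>) {
    chomp $line;                # strip the trailing newline in place
    push @lines, $line;
}
close $in;

print scalar(@lines), " lines read: ", join(' | ', @lines), "\n";
```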
Regular Expressions
A regular expression is a pattern of characters. Regular expressions are
used to find patterns in strings.
In Perl, a regular expression is specified by giving the pattern directly inside
of / symbols.
For instance, the pattern /xxx/ is interpreted as the pattern of three
consecutive x's. However, the following symbols have special meaning.
^ Match the beginning of the line
. Match any character (except newline)
$ Match the end of the line (or before newline at the end)
() Group characters
[] Character class, match any character in the class
\w Match a "word" character (alphanumeric plus "_")
\W Match a non-word character
\s Match a whitespace character
\S Match a non-whitespace character
\d Match a digit character
\D Match a non-digit character
\b Match a word boundary
\B Match a non-(word boundary)
\A Match only at beginning of string
\Z Match only at end of string (or before newline at the end)
\G Match only where previous m//g left off
\t tab
\n newline
\f form feed
\l lowercase next char (think vi)
\u uppercase next char (think vi)
\L lowercase till \E (think vi)
\U uppercase till \E (think vi)
\E end case modification (think vi)
\Q quote regexp metacharacters till \E
Here are some example patterns.
/The/ # the sequence 'The'
/The / # the sequence 'The' followed by a blank
/The\s/ # the sequence 'The' followed by a whitespace character
/\sThe\s/ # the sequence 'The' preceded and followed by whitespace
/^The\s/ # the sequence 'The' that starts a line
# followed by whitespace
/[The]/ # any character in the set {T, h, e}
/Th.s/ # the sequence 'Th' followed by any character
# followed by 's'
In addition, each character, character class, or group in ()
can be followed by one of the following quantifiers.
* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
Here are some example patterns.
/(The)?/ # either '' or 'The'
/(The)*/ # either '' or 'The' or 'TheThe' or 'TheTheThe' etc.
/(The)+/ # either 'The' or 'TheThe' or 'TheTheThe' etc.
/(The){3}/ # only 'TheTheThe'
/\d+\s/ # one or more digits followed by whitespace
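To see how a few of these patterns behave, the sketch below tries them against some sample strings using the =~ match operator. The sample strings and the labels are invented for illustration.

```perl
use strict;
use warnings;

my @samples = ('The', 'TheThe', 'Then 42 ', 'This', 'this');
my %matches;
foreach my $text (@samples) {
    my @hits;
    push @hits, 'starts with The' if $text =~ /^The/;
    push @hits, 'exactly two The' if $text =~ /^(The){2}$/;
    push @hits, 'Th.s'            if $text =~ /Th.s/;
    push @hits, 'digits+space'    if $text =~ /\d+\s/;
    $matches{$text} = join(', ', @hits);
    print "'$text': ", ($matches{$text} || 'no match'), "\n";
}
```

The last sample matches nothing because patterns are case-sensitive by default.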
String Matching
The string =~ re operation performs string matching, where
string is the string to be matched, and re is the
pattern or regular expression to match. The operation will be false
if the regular expression does not match, but true otherwise.
$string = "The rain in Spain stays\n mainly on the plain.\n";
# spain will not be found since the pattern is case-sensitive
if ($string =~ /spain/) { print 'spain found'; }
# the i modifier makes the pattern case insensitive
print 'spain found' if $string =~ /spain/i;
# a pattern is searched for in the whole string, across embedded newlines,
# so this succeeds
if ($string =~ /plain/) { print 'plain found'; }
# the m modifier makes ^ and $ match at the start and end of every line
if ($string =~ /^\s*mainly/m) { print 'mainly starts a line'; }
When a string is matched, several variables are automatically updated.
$` is the part of the string before the matched pattern
$& is the part of the string that matched the pattern
$' is the part of the string after the matched pattern
$1 is the part of the string that matched the first group in ()
$2 is the part of the string that matched the second group in ()
$3 is the part of the string that matched the third group in ()
etc.
Here are some examples.
$string = "The rain in Spain stays\nmainly on the plain.\n";
$string =~ /rain/;
print "$`\n"; # prints 'The '
print "$&\n"; # prints 'rain'
print "$'\n"; # prints ' in Spain...';
$string =~ /(\w+)\s+(\w+)\s+(\w+)/;
print "$1\n"; # prints 'The'
print "$2\n"; # prints 'rain'
print "$3\n"; # prints 'in'
print "$`\n"; # prints ''
print "$&\n"; # prints 'The rain in'
print "$'\n"; # prints ' Spain...';
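The group variables keep their old values when a match fails, so it is safer to test the match before using them. The sketch below shows that, and also the list-context form in which a match returns its groups directly. The date string and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $string = "The rain in Spain stays\nmainly on the plain.\n";

# In list context a successful match returns the captured groups
my ($first, $second, $third) = $string =~ /(\w+)\s+(\w+)\s+(\w+)/;
print "$first, $second, $third\n";

# Check that the match succeeded before trusting $1, $2, ...
my $date = "2024-03-15";        # hypothetical input
my ($year, $month, $day);
if ($date =~ /^(\d{4})-(\d{2})-(\d{2})$/) {
    ($year, $month, $day) = ($1, $2, $3);
    print "year $year, month $month, day $day\n";
} else {
    print "no match\n";
}
```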
In many cases, a user may wish to substitute some text directly into
a string, or change some characters in a string into others.
$string = "The rain in Spain stays\nmainly on the plain.\n";
# translate a to f
$string =~ tr/a/f/; # The rfin in Spfin stfys\nmfinly on the plfin.\n
# translate all upper-case to lower-case
$string =~ tr/A-Z/a-z/;
# the rfin in spfin stfys\nmfinly on the plfin.\n
# translate blanks and \n to z
$string =~ tr/ \n/zz/;
# thezrfinzinzspfinzstfyszmfinlyzonzthezplfin.z
$string = "The rain in Spain stays\nmainly on the plain.\n";
# substitute first occurrence of rain with rhine
$string =~ s/rain/rhine/;
# The rhine in Spain stays\nmainly on the plain.\n
# substitute first occurrence of whitespace with the empty string
$string =~ s/\s+//;
# Therhine in Spain stays\nmainly on the plain.\n
# substitute all occurrences of whitespace with the empty string
$string =~ s/\s+//g;
# TherhineinSpainstaysmainlyontheplain.
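Both tr and s return a count as well as changing the string, which is handy for tallying. The sketch below works on copies so the original string survives. The variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "The rain in Spain stays\nmainly on the plain.\n";

# tr returns the number of characters it translated, so translating a
# character to itself counts occurrences without changing the string
my $a_count = ($string =~ tr/a/a/);

# Copy first, then modify the copy, to keep the original intact
my $upper = $string;
$upper =~ tr/a-z/A-Z/;

# s///g returns the number of substitutions made; the group matched
# by () is available as $1 in the replacement text
my $marked  = $string;
my $changes = ($marked =~ s/(\w*ain\w*)/<$1>/g);

print "number of a's: $a_count\n";
print $upper;
print "$changes words marked:\n$marked";
```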
There are two useful string matching functions. split splits a
string into a list by breaking it up into pieces separated by a
given pattern. join joins the list, inserting text into the
joined places.
# split the sentence up into a list of words
@words = split(/\s/,
"The rain in Spain stays\nmainly on the plain.\n");
print join("\n", @words);
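A common use of split and join is to pull apart a delimited record and reassemble it with a different separator. The record below is a made-up example, as are the variable names.

```perl
use strict;
use warnings;

my $record = "alice:30:London";         # hypothetical colon-separated record
my ($name, $age, $city) = split(/:/, $record);
my $csv = join(',', $name, $age, $city);
print "$csv\n";

# Splitting on /\s+/ treats any run of whitespace as one separator
my @words = split(/\s+/, "The rain  in\tSpain\n");
print scalar(@words), " words: ", join('+', @words), "\n";
```

Trailing empty fields are dropped by split, which is why the final newline does not produce an extra empty word.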
The $_ variable is the default string for string matching and for
file input and output, so if a variable is omitted, $_ is used.
This can often save a lot of typing.
# this program will read from standard input and print all lines that
# do not start with some whitespace followed by '#'
while (<>) { print $_ unless /^\s+#/; }
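The same default applies when looping over a list: foreach without a loop variable places each element in $_, and a bare pattern is matched against it. A minimal sketch, using an invented list of lines in place of standard input:

```perl
use strict;
use warnings;

my @lines = ("# a comment\n", "  # indented comment\n", "print 1;\n");
my @kept;
foreach (@lines) {                      # each element is placed in $_
    push @kept, $_ unless /^\s*#/;      # the pattern is matched against $_
}
print @kept;
```

Here the pattern uses \s* rather than \s+, so comments that start in the first column are dropped as well.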
Subroutines
Subroutines start with the word sub, and the body of the subroutine is
contained within braces. No parameters are declared.
sub factorial {
...
}
Subroutines cannot be nested. A subroutine is called by typing its name and
a parameter list, possibly the empty list.
Within the body of the subroutine, the
@_ variable contains the parameter list. Use my
to declare local variables and initialise them from the passed parameter
list.
factorial(4); # call the factorial function passing the list (4)
power(4, 3); # call the power function passing the list (4, 3)
sub factorial {
my ($x) = @_; # $x is local to the factorial subroutine
... # and is initialised with the first element in the
# passed parameter list
}
sub power {
my ($x, $y) = @_;
# $x and $y are local to the power subroutine
... # and are initialised with the first and second
# elements respectively in the passed parameter list
}
Return a value from a subroutine using the return statement.
$x = factorial(4); # call the factorial function passing the list (4)
sub factorial {
my ($x) = @_;
...
return $result; # return the result computed
}
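For completeness, here is one possible way to fill in the elided bodies of factorial and power. This is a sketch: the iterative loop and the use of the ** operator are choices made for this illustration, not the only way to write them.

```perl
use strict;
use warnings;

sub factorial {
    my ($x) = @_;               # first element of the parameter list
    my $result = 1;             # 0! and 1! are both 1
    for (my $i = 2; $i <= $x; $i++) {
        $result *= $i;
    }
    return $result;
}

sub power {
    my ($x, $y) = @_;           # first and second elements
    return $x ** $y;
}

my $f = factorial(4);
my $p = power(4, 3);
print "factorial(4) = $f, power(4, 3) = $p\n";
```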
POD - Plain Old Documentation
Literate programming is the writing of a program primarily for
human understanding rather than computer understanding (effectively
programming as literature). POD is documentation embedded in your program.
Translators convert it into some pretty format, e.g., HTML, RTF, or
LaTeX. The POD format itself is described in the perlpod manual page.
POD commands start with an = at the beginning of a line.
#-----------------------------------------------------------

=head2 factorial(int x)

=over 4

=item *

x - Compute the factorial of x, that is x!

=back

Will compute the factorial of x, defined as

    x! = x * (x - 1) * ... * 1

or 1 if x is 0.

Returns the value computed.

=cut

#-----------------------------------------------------------
sub factorial {
...
}
To create the documentation, use pod2html or pod2man.