Introduction to Perl
CptS 355 - Programming Language Design
Washington State University

History and Overview

  • 1980s Larry Wall, system hacker and guru
  • Perl - Practical Extraction and Report Language
  • hybrid compilation and interpretation
  • C-like "shell" or scripting language
  • type system - dynamic type binding, strong typing, lots of run-time type coercion
  • string manipulation - many string operators including regular expression matching and substitution
  • scope and storage binding - implicit declarations and explicit declarations with static scoping, lots of heap bindings (for strings, arrays, etc.), run-time memory manager does garbage collection, implicit allocation and deallocation
  • built-in data structures - lists/arrays, associative arrays/hash tables, strings
  • at least five ways to do everything - a garbage can language! Perl Motto: There's more than one way to do it!
  • love it or hate it, Perl is "useful"

Hello World

In a file, such as Helloworld.pl
# Helloworld.pl
#   run using `perl -w Helloworld.pl' 
  print "Hello World!\n";
As a shell script.
#!/usr/local/bin/perl -w
# Helloworld.pl
#   run using `Helloworld.pl'
  print "Hello World!\n";

Comments

A # starts a comment.
# I am one line comment
  print "Hello";  # now I can put a comment on this line

Scalars

A scalar is an integer, real, octal number, hexadecimal number, string, or interpolated string. Only the latter two differ significantly from C. Scalar variables start with $. Arithmetic, comparison and assignment operators are similar to those in the C programming language. Variables have global scope unless declared with my. Since global scope is usually a bad thing, you can include the line
use strict;
at the beginning of each program to have the compiler reject undeclared variables at compile time. In boolean operations, 0, "0", "", and the undefined value are false, while all other strings and all nonzero numbers are true.
   $a = 1 + 2;        # Add 1 and 2 and store in $a
   $a = 3 - 4;        # Subtract 4 from 3 and store in $a
   $a = 5 * 6;        # Multiply 5 and 6
   $a = 7 / 8;        # Divide 7 by 8 to give 0.875
   $a = 9 ** 10;      # Nine to the power of 10
   $a = 5 % 2;        # Remainder of 5 divided by 2
   ++$a;              # Increment $a and then return it
   $a++;              # Return $a and then increment it
   --$a;              # Decrement $a and then return it
   $a--;              # Return $a and then decrement it
   $a = $b;           # Assign $b to $a
   $a += $b;          # Add $b to $a
   $a -= $b;          # Subtract $b from $a
   $a = $b . $c;      # Concatenate $b and $c
   $a .= $b;          # Append $b onto $a
   $a == $b           # Is $a numerically equal to $b?
   $a != $b           # Is $a numerically unequal to $b?
   $a eq $b           # Is $a string-equal to $b?
   $a ne $b           # Is $a string-unequal to $b?
   ($a && $b)         # Are both $a and $b true?
   ($a || $b)         # Is either $a or $b true?
   !($a)              # is $a false?
An uninterpreted string (one in single quote characters) evaluates verbatim.
   print 'hello';     # prints hello
   print 'hello\n';   # prints hello\n
   print '$xhello\n'; # prints $xhello\n
In an interpolated string (one in double quote characters) all variables and control characters are substituted.
   print "hello";     # prints hello
   print "hello\n";   # prints hello followed by a newline
   $x = 2;
   print "$xhello\n"; # prints only a newline ($xhello is 
                      # undefined, warning if -w is set)
   print "${x}hello\n"; #prints 2hello followed by a newline
   @x = (1, 2, 3);
   print "@x";        # prints 1 2 3
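The truth rules and interpolation rules above can be sketched in a short program. This is only an illustration: the helper truth and the variable names are made up here, and the sample values were chosen to show that strings such as '0.0' and '00' count as true even though they look like zero.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a value counts as true.
sub truth { my ($v) = @_; return $v ? 'true' : 'false'; }

my $count = 2;                      # 'my' gives $count lexical scope
my $label = "${count}hello";        # braces delimit the variable name
my @truths;
foreach my $value (0, '0', '', '0.0', '00', 'a', 1) {
    push @truths, truth($value);
}

print "$label\n";                   # prints 2hello
print "@truths\n";                  # prints false false false true true true true
```

Only the exact strings "0" and "" are false; any other string is true regardless of its numeric value.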

List of scalars

A list of scalars can hold any number of scalar elements (Perl 5 also supports lists of lists, through references). A list literal is delimited with parentheses. The scalar elements are separated with commas.
   ()                 # the empty list
   (1, 2)             # a list with two numbers
   (1, 'hello', 2.5)  # a list with three elements
Elements in a list can be referred to by array position, counting from 0. There is no `bounds' checking. List variables start with `@', but a single element is a scalar and so is written with `$', as in $x[2].
   @x = (0, 1, 2);    # @x is a list with three elements
   $x[2] = 47;        # @x now is (0, 1, 47)
   $x[0] = 4;         # @x now is (4, 1, 47)
   print $x[5];       # the list has only three elements, so the 
                      # sixth array location is uninitialised and 
                      # a warning message will be printed (with -w)
A list can dynamically grow.
   @x = (0, 1, 2);    # @x is a list with three elements
   $x[3] = 47;        # @x now is (0, 1, 2, 47)
   $x[500] = 47;      # @x now has a 47 in the 501st array location
A list can be manipulated like a queue.
   @x = (0, 1, 2);    # @x is a list with three elements
   $first = shift @x; # equivalent to a queue get, @x is now (1, 2)
   unshift @x, 3;     # put an element at the front, @x is now (3, 1, 2)
A list can be manipulated like a stack.
   @x = (0, 1, 2);    # @x is a list with three elements
   $top = pop @x;     # equivalent to a stack pop, @x is now (0, 1)
   push @x, 3;        # equivalent to a stack push, @x is now (0, 1, 3)
Some other useful list manipulations.
   @x = (2, 0, 1);    # @x is a list with three elements
   @sorted = sort @x; # sort the list, yields the list (0, 1, 2)
                      # iterate through a list using foreach
   foreach $element (@x) {
      print $element; # $element will be the value of 
                      #       successive elements
     }               
   scalar(@x);        # the size of the list
   if (defined $x[23]) ...  # test to determine if a location 
                            #    is initialised
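The queue and stack operations above can be combined into one runnable sketch. The variable names are illustrative only. The sketch also passes a numeric comparison block to sort, since the default sort compares elements as strings and would not order multi-digit numbers numerically.

```perl
use strict;
use warnings;

my @queue = (0, 1, 2);
push @queue, 3;                 # enqueue at the back:    (0, 1, 2, 3)
my $first = shift @queue;       # dequeue from the front: $first is 0

my @stack = (0, 1, 2);
push @stack, 3;                 # push on top:  (0, 1, 2, 3)
my $top = pop @stack;           # pop from top: $top is 3

# Numeric sort: the block compares two elements with <=>.
my @sorted = sort { $a <=> $b } (10, 9, 100);

print "@queue | $first | @stack | $top | @sorted\n";
# prints 1 2 3 | 0 | 0 1 2 | 3 | 9 10 100
print scalar(@sorted), "\n";    # prints 3
```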

Associative Arrays of Scalars

An associative array (or hash table, or just hash for short) is a data structure that provides a mapping from a key scalar to a value scalar. An associative array literal is delimited with parentheses. An arrow indicates a mapping from a key to a value. The key, value pairs are separated with commas.
   ()                 # the empty array
   {}                 # NOT an empty array: braces create a reference to one
   (1 => 2)           # map the key 1 to the value 2
   ('hello' => 2, 3 => 'good')  # map hello to 2 and 3 to good
Elements in an associative array are referred to by key. There is no `bounds' checking. Associative array variables start with %.
   %x = ();           # %x is empty
   $x{'hello'} = 2;   # key hello maps to value 2
   $key = 'joe';
   $x{$key} = 'jim';  # key joe maps to value jim
   print $x{'sam'};   # there is no key sam in the array, that array
                      # location is uninitialised and a warning message 
                      # will be printed (with -w turned on)
   delete $x{'hello'};  # remove key hello from table x
   if (defined $x{'joe'}) ...  # test to determine if a key has a defined value
   @keys = keys %x;     # creates a list of keys, e.g., ('hello', 'joe')
   @values = values %x; # creates a list of values, e.g., (2, 'jim')
All associative arrays can grow dynamically.
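As a small sketch of the operations above, the following program builds a hash, deletes a key, and prints what remains. The sample data and names are invented for illustration. It also uses exists, which is not covered above: exists asks whether a key is present, while defined asks whether the stored value is defined.

```perl
use strict;
use warnings;

my %age = ('joe' => 31, 'sam' => undef);   # hypothetical sample data
$age{'ann'} = 27;                          # the hash grows on assignment

# 'sam' is present as a key, but its value is undefined.
my $sam_exists  = exists  $age{'sam'} ? 1 : 0;   # 1
my $sam_defined = defined $age{'sam'} ? 1 : 0;   # 0

delete $age{'sam'};

# keys returns the keys in no particular order, so sort them
# to get predictable output.
foreach my $name (sort keys %age) {
    print "$name => $age{$name}\n";        # prints ann => 27, then joe => 31
}
```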

Flow of Control

Perl has the same basic control statements as C and Pascal, plus a few more. Unlike C, however, the opening and closing braces are not optional. Below are some useful examples. Interpret the ... as one or more Perl statements.
  # if 
  if (...) {...}
  # if-then-else
  if (...) {...} else {...}
  # nested if-then-elses
  if (...) {...} elsif (...) {...} ... else {...}
  # a for loop 
  for ($i = 0; $i < $max; $i++) {...}
  # a while loop 
  while (...) {...}
Below are some un-C-like examples of control statements.
  print $x unless $x > 20; # S unless C means execute S unless C is true
  print $x if $x > 20;     # S if C means execute S if C is true
  $x > 20 || print $x;     # C || S means if C is false then execute S
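The statement modifiers can be seen at work in a short sketch. The list of values and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my @small;                            # values of at most 20
my @large;                            # values above 20
foreach my $x (5, 20, 21, 40) {
    push @small, $x unless $x > 20;   # runs when the condition is false
    push @large, $x if $x > 20;       # runs when the condition is true
}

# A C-style for loop; @large in a numeric comparison gives its size.
my $sum = 0;
for (my $i = 0; $i < @large; $i++) { $sum += $large[$i]; }

print "@small | @large | $sum\n";     # prints 5 20 | 21 40 | 61
```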

File Input/Output

Opening files is easy in Perl, but remember to check whether the open failed. Each of the simple modes shown below opens a file for input or for output, not both (Perl also has read-write modes such as +<, which are not covered here).
  # Open for output
  if (!open(OUT, ">$filename")) { die "Could not open $filename"; }     
  # An alternative form
  open(OUT, ">$filename") || die "Could not open $filename"; 
  # Open for appending
  open(APPEND, ">>$filename") || die "Could not open $filename"; 
  # Open for input
  open(IN, "<$filename") || die "Could not open $filename"; 
  # Pipe standard output of a Unix command to input in our program
  open(LS, "/usr/bin/ls |") || die "Could not open ls command"; 
  # Pipe our output to the standard input of a Unix command 
  open(GREP, "| /usr/bin/grep") || die "Could not open grep command"; 

  # Be sure to close a file when you are done!  
  # Here are two equivalent forms.
  close(OUT);
  close IN;
The names OUT, APPEND, IN, LS, GREP in the above examples are called file handles and exist in a namespace separate from the variables, etc. By convention, file handles are written in ALL CAPS; an all-lowercase handle name may draw a warning under -w because it could clash with a future reserved word.

To print to a file handle that has been opened for output, just put the file handle between the print and the string.

  # Open a file, write one line to it, and close it
  $filename = "outfile";
  open(OUT, ">$filename") || die "Could not open $filename"; 
  print OUT "hello\n";    # output hello to outfile
  close OUT;
Input can be read one line at a time (a scalar context) or every line at a single go (a list context)! Just put the file handle inside angle brackets. In the scalar context, the read returns the undefined value at end of file, so a while loop that assigns each line to a variable (or to $_) stops there.
  open(IN, "<inputfile") || die "Could not open inputfile"; 
  @lines = <IN>;       # read the entire input file as a list of lines
  close IN;

  open(IN, "<inputfile") || die "Could not open inputfile"; 
  while ($line = <IN>) {  # read one line from the input file at a time
    ...
    }
  close IN;

  # the $_ variable is sometimes useful
  open(IN, "<inputfile") || die "Could not open inputfile"; 
  while (<IN>) {          # read one line from the input file at a time
    $line = $_;           # the line is input into the $_ variable 
    }
  close IN;
STDIN is an already opened file corresponding to standard input. So you can do the following to read from standard input.
  @lines = <STDIN>;    # read all of standard input as a list of lines
Alternatively, use
  while (<STDIN>) {    # read one line from standard input at a time
    $line = $_;
    }
STDOUT and STDERR are already opened for output.

Lines read from files contain trailing newline characters. To get rid of them use chomp.

   chomp $line;
Note that chomp updates the variable in place.
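The write, read, and chomp steps above can be put together as follows. This sketch is not the form used in the examples above: it assumes the three-argument open with lexical file handles (my $out, my $in), which is a common alternative to bareword handles, and it uses the core File::Temp module to obtain a scratch file name. The file contents are invented for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() returns an open handle and the name of a fresh scratch
# file; UNLINK => 1 removes the file when the program exits.
my ($tmp, $filename) = tempfile(UNLINK => 1);
close $tmp;

open(my $out, '>', $filename) or die "Could not open $filename: $!";
print $out "hello\n", "world\n";
close $out;

open(my $in, '<', $filename) or die "Could not open $filename: $!";
my @lines;
while (my $line = <$in>) {      # the read returns undef at end of file
    chomp $line;                # strip the trailing newline in place
    push @lines, $line;
}
close $in;

print join(',', @lines), "\n";  # prints hello,world
```

Including $! in the die message reports why the open failed.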

Regular Expressions

A regular expression is a pattern of characters. Regular expressions are used to find patterns in strings. In Perl, a regular expression is specified by giving the pattern directly inside of / symbols. For instance, the pattern /xxx/ is interpreted as the pattern of three consecutive x's. However, the following symbols have special meaning.
   ^     Match the beginning of the line
   .     Match any character (except newline)
   $     Match the end of the line (or before newline at the end)
   ()    Group characters
   []    Character class, match any character in the class
   \w    Match a "word" character (alphanumeric plus "_")
   \W    Match a non-word character
   \s    Match a whitespace character
   \S    Match a non-whitespace character
   \d    Match a digit character
   \D    Match a non-digit character
   \b    Match a word boundary
   \B    Match a non-(word boundary)
   \A    Match only at beginning of string
   \Z    Match only at end of string (or before newline at the end)
   \G    Match only where previous m//g left off
   \t    tab
   \n    newline
   \f    form feed
   \l    lowercase next char (think vi)
   \u    uppercase next char (think vi)
   \L    lowercase till \E (think vi)
   \U    uppercase till \E (think vi)
   \E    end case modification (think vi)
   \Q    quote regexp metacharacters till \E
Here are some example patterns.
   /The/       # the sequence 'The'
   /The /      # the sequence 'The' followed by a blank
   /The\s/     # the sequence 'The' followed by a whitespace character
   /\sThe\s/   # the sequence 'The' preceded and followed by whitespace 
   /^The\s/ # the sequence 'The' that starts a line 
            # followed by whitespace 
   /[The]/  # any character in the set {T, h, e} 
   /Th.s/   # the sequence 'Th' followed by any character 
            # followed by 's'
In addition, each character or pattern in () can be followed by one of the following quantifiers, which say how many times it may repeat.
   *      Match 0 or more times
   +      Match 1 or more times
   ?      Match 1 or 0 times
   {n}    Match exactly n times
   {n,}   Match at least n times
   {n,m}  Match at least n but not more than m times
Here are some example patterns.
   /(The)?/    # either '' or 'The'
   /(The)*/    # either '' or 'The' or 'TheThe' or 'TheTheThe' etc.
   /(The)+/    # either 'The' or 'TheThe' or 'TheTheThe' etc.
   /(The){3}/  # only 'TheTheThe'
   /\d+\s/     # one or more digits followed by whitespace
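A few of the patterns above can be tried against sample strings. The test strings are invented for illustration, and the =~ operator used here is the matching operation described in the next section. The anchors ^ and $ are added to the repetition pattern so that it must cover the whole string.

```perl
use strict;
use warnings;

my @results;
push @results, ('The cat'     =~ /^The\s/     ? 'yes' : 'no');  # 'The' starts the line
push @results, ('See The cat' =~ /^The\s/     ? 'yes' : 'no');  # 'The' is not at the start
push @results, ('This'        =~ /Th.s/       ? 'yes' : 'no');  # '.' matches the 'i'
push @results, ('TheTheThe'   =~ /^(The){3}$/ ? 'yes' : 'no');  # exactly three repetitions
push @results, ('TheThe'      =~ /^(The){3}$/ ? 'yes' : 'no');  # only two repetitions
push @results, ('42 apples'   =~ /\d+\s/      ? 'yes' : 'no');  # digits, then whitespace

print "@results\n";               # prints yes no yes yes no yes
```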

String Matching

The string =~ re operation performs string matching, where string is the string to be matched, and re is the pattern or regular expression to match. The operation will be false if the regular expression does not match, but true otherwise.
  $string = "The rain in Spain stays\n mainly on the plain.\n";
  # spain will not be found since the pattern is case-sensitive
  if ($string =~ /spain/) { print 'spain found'; }
  # the i modifier makes the pattern case insensitive
  print 'spain found' if $string =~ /spain/i;
  # a match searches the whole string, including the text after an
  # embedded newline, so this succeeds
  if ($string =~ /plain/) { print 'plain found'; }
  # the m modifier makes ^ and $ match at the start and end of each line
  if ($string =~ /^\s*mainly/m) { print 'mainly found'; }
When a string is matched, several variables are automatically updated.
  $` is the part of the string before the matched pattern
  $& is the part of the string that matched the pattern
  $' is the part of the string after the matched pattern
  $1 is the part of the string that matched the first group in ()
  $2 is the part of the string that matched the second group in ()
  $3 is the part of the string that matched the third group in ()
  etc.
Here are some examples.
  $string = "The rain in Spain stays\nmainly on the plain.\n";
  $string =~ /rain/;
  print "$`\n";      # prints 'The '
  print "$&\n";      # prints 'rain'
  print "$'\n";      # prints ' in Spain...';
  $string =~ /(\w+)\s+(\w+)\s+(\w+)/;
  print "$1\n";      # prints 'The'
  print "$2\n";      # prints 'rain'
  print "$3\n";      # prints 'in'
  print "$`\n";      # prints ''
  print "$&\n";      # prints 'The rain in'
  print "$'\n";      # prints ' Spain...';
In many cases, a user may wish to substitute some text directly into a string, or change some characters in a string into others.
  $string = "The rain in Spain stays\nmainly on the plain.\n";
  # translate a to f
  $string =~ tr/a/f/; # The rfin in Spfin stfys\nmfinly on the plfin.\n
  # translate all upper-case to lower-case
  $string =~ tr/A-Z/a-z/; 
                      # the rfin in spfin stfys\nmfinly on the plfin.\n
  # translate blanks and \n to z
  $string =~ tr/ \n/zz/;  
                      # thezrfinzinzspfinzstfyszmfinlyzonzthezplfin.z

  $string = "The rain in Spain stays\nmainly on the plain.\n";
  # substitute first occurrence of rain with rhine
  $string =~ s/rain/rhine/; 
                    # The rhine in Spain stays\nmainly on the plain.\n
  # substitute first occurrence of whitespace with the empty string
  $string =~ s/\s+//;        
                    # Therhine in Spain stays\nmainly on the plain.\n
  # substitute all occurrences of whitespace with the empty string
  $string =~ s/\s+//g;       
                    # TherhineinSpainstaysmainlyontheplain.
There are two useful string matching functions. split splits a string into a list by breaking it up into pieces separated by a given pattern. join joins the list, inserting text into the joined places.
  # split the sentence up into a list of words
  @words = split(/\s/, 
                 "The rain in Spain stays\nmainly on the plain.\n");
  print join("\n", @words);
The $_ variable is the default string for string matching and for file input and output, so if a variable is omitted, $_ is used. This can often save a lot of typing.
  # this program will read from standard input and print all lines that
  # do not start with some whitespace followed by '#'
  while (<>) { print $_ unless /^\s+#/; }
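The capture variables, substitution, and split described above can be checked in one short program. This is a sketch with illustrative variable names. It tests the match before reading $1 and $2, and it relies on s///g returning the number of substitutions made.

```perl
use strict;
use warnings;

my $string = "The rain in Spain stays\nmainly on the plain.\n";

# Capture groups: only read $1 and $2 after a successful match.
my ($first, $second) = ('', '');
if ($string =~ /(\w+)\s+(\w+)/) {
    ($first, $second) = ($1, $2);      # 'The' and 'rain'
}

# Work on a copy so the original string is left unchanged.
my $copy = $string;
my $n = ($copy =~ s/ain/AIN/g);        # number of replacements made

# Splitting on runs of whitespace avoids empty fields between words.
my @words = split(/\s+/, $string);

print "$first $second $n ", scalar(@words), "\n";   # prints The rain 4 9
```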

Subroutines

Subroutines start with the word sub, and the body of the subroutine is contained within braces. No parameters are declared.
  sub factorial {
    ...
    }
Subroutines cannot be nested. A subroutine is called by typing its name and a parameter list, possibly the empty list. Within the body of the subroutine, the @_ variable contains the parameter list. Use my to declare local variables and initialise them from the passed parameter list.
  factorial(4);   # call the factorial function passing the list (4)
  power(4, 3);    # call the power function passing the list (4, 3)

  sub factorial {
    my ($x) = @_; # $x is local to the factorial subroutine
    ...           # and is initialised with the first element in the 
                  # passed parameter list
    }

  sub power {
    my ($x, $y) = @_;   
                  # $x and $y are local to the power subroutine
    ...           # and are initialised with the first and second
                  # elements respectively in the passed parameter list
    }
Return a value from a subroutine using the return statement.
  $x = factorial(4); # call the factorial function passing the list (4)

  sub factorial {
    my ($x) = @_;       
    ...                 
    return $result;  # return the result computed
    }
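The bodies of factorial and power are left out above. One possible way to fill them in, offered only as a sketch and not as the intended solution, is an iterative loop for factorial and the ** operator for power.

```perl
use strict;
use warnings;

# Assumed body: iterative factorial, returning 1 when $x is 0 or 1.
sub factorial {
    my ($x) = @_;
    my $result = 1;
    for (my $i = 2; $i <= $x; $i++) { $result *= $i; }
    return $result;
}

# Assumed body: raise $x to the power $y with the ** operator.
sub power {
    my ($x, $y) = @_;
    return $x ** $y;
}

print factorial(4), "\n";   # prints 24
print factorial(0), "\n";   # prints 1
print power(4, 3), "\n";    # prints 64
```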

POD - Plain Old Documentation

Literate programming is the writing of a program primarily for human understanding rather than computer understanding (effectively programming as literature). POD is documentation embedded in your program. Translators such as pod2html and pod2man convert it into some pretty format, e.g., html, rtf, LaTeX; the format itself is described in the perlpod manual page. pod commands start with an =.
#-----------------------------------------------------------

=head2 factorial(int x)

=over 4

=item * 

x - Compute the factorial of x, that is x!

=back

Will compute the factorial of x, defined as
  x! = x * (x - 1) * ... * 1
or 1 if x is 0.
Returns the value computed.

=cut

#-----------------------------------------------------------
sub factorial {
  ...
  }
To create the documentation, use pod2html or pod2man.
                                                                                                                                                                                                                                                                                                                                             
(c) 2003 Curtis Dyreson, (c) 2004 Carl H. Hauser. E-mail questions or comments to Prof. Carl Hauser