
Perl


Perl
        Data types
        Scalar Data
        Conversion between strings and numbers
        Numerical operators
        String operators
        Chop and chomp
        Print, printf, and sprintf
        Lists
        Arrays
        Control Structures
        Associative Arrays (Hashes)
        Basic I/O
        Regular Expressions
        Functions
        Misc. Control Structures
        Files


Data types

  • Perl has essentially three data types
    • Scalars
    • Arrays
    • Associative Arrays
  • (There are others, but we won't worry about them here.)

Scalar Data

  • A scalar is essentially anything that is a single item
  • As in awk, whether it's a number or a string is determined by context
  • Numbers
    • Can be an integer or a floating-point number
    • You don't need to worry about how Perl stores the numbers, unless you base a decision on whether 10/3 is the same as 10*(1/3), because it isn't.
      1. Code to show this: ../perl/p0. Note: Perl will not truncate the result to an integer. Caution: roundoff errors can confuse you.
    • Numbers can be specified as literals (the number itself appears in the program) in any of the following formats:
 
        150
        -40        # the temperature where Fahrenheit and Celsius match
        2.13
        3.8e22     # 3.8 times 10 to the 22nd power
        5.8735e-43 # a very tiny number!
    • One warning: Don't start a number with a zero, like "041", because, like C, Perl interprets that to mean an octal number, which in this case would be 33.
    • "0x" indicates hexadecimal numbers. (0x21 is 33)
  • Strings
    • Any number of ASCII characters (up to the limits of your computer's memory).
    • Unlike in C, "NUL" is not special; a string can contain NUL characters.
    • Unlike in C, you don't need to allocate memory for your string.
    • Literal strings can be "single-quoted strings", "double-quoted strings," or "here documents" (which we'll cover later).
    • Single-quoted and double-quoted strings work just like we've seen before
    • "Here documents" are a form of double-quoted string
    • Special characters in double-quoted strings
 
        \n      # Newline
        \t      # Tab
        \\      # Backslash
        \"      # Double quote
 
        \r      # Carriage Return
        \f      # Formfeed
        \b      # Backspace
        \a      # Bell
        \e      # Escape
        \0nn    # An octal value
        \xnn    # A hexadecimal value
        \cC     # A control character (control-C here)
        \l      # Make next letter lower-case
        \L      # Make everything lower-case until \E
        \u      # Make next letter upper-case
        \U      # Make everything upper-case until \E
        \Q      # Backslash-quote all nonalphanumeric characters until \E
        \E      # Terminate \L, \U, or \Q
  • Scalar variables
    • A scalar variable is specified by a dollar sign ($), followed by a letter, followed (possibly) by more letters, digits, or underscores.
    • Limit is 256 characters.
    • Case is significant. ($linelength is a different variable from $lineLength)
    • Before a variable has anything assigned to it (and in a few other instances) it has the value undef.
      • If used as a number undef is a 0; if used as a string, it is "". But it really is a distinct value.
  • Assignment
    • Like in C, assignment is indicated with an equal sign (=).
    • Ex: $total = $score1 + $score2 + $score3
    • The value of an assignment is the value assigned, so you can use an assignment in an expression
    • Examples: $b = 4 + ($a = 3) or (possibly more useful) $a = $b = $c = 3
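
A short sketch of the points in this section: octal and hexadecimal literals, floating-point roundoff, the value of an assignment, and undef. The variable names are made up for illustration, and "use strict", "use warnings", "my", and "defined" are not introduced until later (or not at all) in these notes.

```perl
use strict;
use warnings;

# Literals: a leading 0 means octal, a leading 0x means hexadecimal.
my $octal = 041;      # 33 in decimal
my $hex   = 0x21;     # also 33 in decimal
print "041 is $octal and 0x21 is $hex\n";

# Division is floating point; the result is not truncated to an integer.
my $direct   = 10 / 3;
my $indirect = 10 * (1 / 3);
print "10/3 is $direct\n";

# Roundoff can make mathematically equal expressions compare unequal.
print "10/3 and 10*(1/3) are ",
      ($direct == $indirect ? "equal" : "not equal"), " here\n";

# An assignment has a value, so it can sit inside a larger expression.
my ($x, $y);
$y = 4 + ($x = 3);    # $x is 3, $y is 7
print "x is $x, y is $y\n";

# A variable with nothing assigned holds undef, a distinct value.
my $nothing;
print "\$nothing is undefined\n" unless defined $nothing;
```

Whether the two thirds compare equal can depend on how your Perl was built, which is exactly why it is unwise to base a decision on it.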

Conversion between strings and numbers

  • Perl automatically converts between strings and numbers, as needed.
  • In string-to-number conversion, the leading numeric part of the string is used (for example, "12abc" converts to 12).
  • If the string does not start with a number, its numeric value is 0.
  • Numbers are converted to strings just as they would be in a "print" statement.
  • If you use the "-w" flag with Perl, (i.e. put "#!/usr/local/bin/perl -w" at the start of your script), Perl will warn you about "weird" conversions
  • By the way, it's a really good idea to use the "-w" flag; it can catch a lot of problems.
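
The conversions described above can be sketched as follows. The variable names are illustrative only. "use warnings" has much the same effect as the "-w" flag, and "no warnings 'numeric'" is used here (an addition not covered in these notes) to silence the warning for the two deliberately "weird" conversions.

```perl
use strict;
use warnings;

my $sum   = "12" + "30";           # both strings look like numbers: 42
my $count = 7;
my $label = $count . " dwarfs";    # the number becomes the string "7"

my ($partial, $none);
{
    # With warnings on, Perl would complain about these two conversions,
    # so they are silenced inside this block only.
    no warnings 'numeric';
    $partial = "12abc" + 1;        # the leading "12" is used: 13
    $none    = "abc" + 1;          # no leading number, so 0 + 1: 1
}
print "$sum, $label, $partial, $none\n";
```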

Numerical operators

  • Basic
    • + (Addition)
    • - (Subtraction)
    • * (Multiplication)
    • / (Division)
    • ** (Exponentiation. e.g. 2**3 == 8)
    • % (Modulus. e.g. 10%3 == 1)
  • Numerical comparison operators
    • == (equal)
    • != (Not equal)
    • <= (Less than or equal)
    • < (Less than)
    • >= (Greater than or equal)
    • > (Greater than)

String operators

  • Basic
    • . (Concatenation, e.g. "hello" . " " . "world" eq "hello world")
    • x (Repetition, e.g. "-" x 70 gives 70 dashes)
  • String comparison operators
    • eq (equal)
    • ne (Not equal)
    • le (Less than or equal)
    • lt (Less than)
    • ge (Greater than or equal)
    • gt (Greater than)
  • Note that the string vs. numerical comparison operators are the opposite of what the shell's test command uses (there, -eq, -lt, etc. compare numbers and = compares strings)
    • The way Perl does it makes more sense: String comparisons use "string" operators
  • Binary assignment operators
    • All of the ones like in C ( +=, -=, *=, /=, etc.) are there
    • .= adds to a string
  • Autoincrement and autodecrement
    • Work like in C

Chop and chomp

  • chop removes the last character of a string, and returns the chopped character
  • Example:
 
 $string = "testing 1 2 3";
 $character = chop $string;   # $string is now "testing 1 2 " and $character is "3"
  • chomp removes the last character, if and only if it's a newline
  • Removing a trailing newline is a common need, hence the existence of the chomp function.
  • Example: ../perl/p1
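
To see the difference between the two functions side by side, here is a minimal sketch (the variable names are illustrative):

```perl
use strict;
use warnings;

my $line = "testing 1 2 3\n";
chomp $line;                 # removes the trailing newline
chomp $line;                 # no newline left, so nothing changes
my $after_chomp = $line;     # "testing 1 2 3"

my $removed = chop $line;    # removes and returns the last character, "3"
print "after chomp: '$after_chomp'\n";
print "after chop: '$line' (removed '$removed')\n";
```

Calling chomp twice is harmless, whereas calling chop twice would eat into the data.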

Print, printf, and sprintf

  • In short, the print function prints its argument(s)
  • Examples:
 
        print("The answer is $answer\n");
        print "The question was $question\n";
  • Note: in general, the parentheses are optional when using Perl's builtin functions.
  • printf works like in C
  • sprintf is like printf
    • Returns the formatted string, rather than printing it
    • Useful for assigning formatted strings to variables
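
A brief sketch of sprintf building strings for later use, next to printf printing directly. The format strings and values are illustrative.

```perl
use strict;
use warnings;

my $item  = "total";
my $price = 3.14159;

# sprintf returns the formatted string instead of printing it.
my $row = sprintf("%-10s %8.2f", $item, $price);
my $id  = sprintf("%05d", 42);      # zero-padded to five digits

print "$row\n";
printf("%s has %d characters\n", $id, length($id));
```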

Lists

  • A list is an ordered collection of scalar data (used to assign to an array, which is covered next)
  • Represented in a program by several values, separated by commas and enclosed by parentheses
  • Strings and numbers can be mixed in a list
  • Example: (1, 2, "three", "four", 5)
  • The empty list is represented by a pair of empty parentheses ()
  • The list constructor operator (..) can represent a sequence of numbers: (1 .. 5) is the same as (1, 2, 3, 4, 5)
    • If the right value is less than the left value, the resulting list is the empty list
    • If the values are not whole numbers, Perl 5 truncates them to integers before building the list
    • The list (1.2 .. 5.1) is therefore (1, 2, 3, 4, 5)
  • If the list consists solely of simple words, the "qw" (quote words) operator can be used to simplify the representation
  • Example: ("eenie", "meenie", "minie", "moe") can be qw(eenie meenie minie moe); the words are separated by whitespace, not commas
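
The list forms above, sketched in runnable form. The lists are stored in arrays (covered in the next section) only so they can be printed; the names are illustrative.

```perl
use strict;
use warnings;

my @count  = (1 .. 5);                     # (1, 2, 3, 4, 5)
my @empty  = (5 .. 1);                     # right value is smaller: ()
my @mixed  = (1, 2, "three", "four", 5);   # strings and numbers together
my @rhymes = qw(eenie meenie minie moe);   # no quotes, no commas

print "count: @count\n";
print "empty has ", scalar(@empty), " elements\n";
print "mixed has ", scalar(@mixed), " elements\n";
print "rhymes: @rhymes\n";
```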

Arrays

  • An array is a variable that holds a list
  • Named like a scalar variable, but using @ instead of $
  • @something and $something are completely different variables
  • An array can be assigned a list, or another array
  • Example: @array = (1, 2, 3); @array2 = @array;
  • A list can contain an array. The array members are simply inserted into the list
    • Example: @array = (3, 4, 5); @array2 = (1, 2, @array, 6); results in @array2 being the list (1, 2, 3, 4, 5, 6)
    • If an array is used in a scalar context, the scalar value is the number of elements in the list
    • Example: @array = (3, 4, 5); $array = @array; results in $array being 3 (scalar(@array) is another way to get the count)
  • A list of variables can appear on the left-hand side of an assignment
    • Example: ($one,$two,$three) = (1, 2, 3);
  • Array elements
    • Array elements are accessed by a subscript in square brackets []
    • The index of the first element is 0
      • Example: If an array is @array = qw(one two three) then $array[0] is "one", $array[1] is "two" and $array[2] is "three"
      • This is the same as C, different from awk
    • Note that the array element starts with a $, since it's a scalar value
    • An array slice is more than one value from the same array, and since it's a list, the @ is used (Example: @array[0,1] is the first two elements of @array)
    • The index in an array can be a variable (useful in loops, for example)
    • If you access an element outside the bounds of the array, you get the undef value
      • Much nicer than C, where doing that is undefined behavior and may give you a core dump
    • Assigning to an element outside the bounds of the array automatically extends the array (and assigns any intervening values to undef)
      • Again, much nicer than the core dump C might give you
    • The last index in an array is represented by $#arrayname
      • Example: after @array = (1,2,3), $#array is 2
      • You can assign to $#arrayname to grow or shrink an array, but usually don't need to, since the array grows and shrinks automatically, as needed
    • A negative subscript counts from the end, so $array[-2] is the second to last element of @array
    • Example: ../perl/p2a
  • Push and pop
    • Arrays can be treated like stacks with the push and pop functions
    • push appends a scalar (or a list) to the end of an array
    • pop takes an element off the end of an array
    • Example
 
        @array = (1);
        push (@array, 2);
        push (@array, 3, 4, 5);
        $var = pop (@array);
    • push also happens to be a handy way to add values to an array, even if you aren't using the array as a stack
  • Shift and unshift
    • Like pop and push, but at the beginning rather than the end of the list
  • Reverse
    • Returns a list that contains the elements of its argument, in reverse order
    • Example
 
        @array = (1, 2, 3, 4);
        @revarray = reverse (@array);
  • Sort
    • Returns a list containing the elements of its argument, in sorted order
    • Default order is ASCII order, but you can specify your own order (we won't worry about this right now)
  • Chomp on an array
    • When used on an array, chomp chomps each element of the array
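
The array operations from this section, gathered into one sketch. All names and values are illustrative; the comments show the state after each step.

```perl
use strict;
use warnings;

my @array  = (3, 4, 5);
my @array2 = (1, 2, @array, 6);      # (1, 2, 3, 4, 5, 6)
my $count  = @array2;                # scalar context: 6 elements
my $last   = $array2[-1];            # 6
my @firsts = @array2[0, 1];          # slice: (1, 2)
my $top    = $#array2;               # last index: 5

# Stack operations at the end, and their counterparts at the front.
push(@array, 6, 7);                  # (3, 4, 5, 6, 7)
my $popped  = pop(@array);           # 7
my $shifted = shift(@array);         # 3, leaving (4, 5, 6)
unshift(@array, 0);                  # (0, 4, 5, 6)

# Assigning past the end extends the array with undef in between.
my @sparse = (1);
$sparse[3] = 4;                      # now 4 elements; [1] and [2] are undef

# The default sort compares as strings, so 100 sorts before 9.
my @sorted   = sort(10, 9, 100);     # (10, 100, 9)
my @reversed = reverse(1, 2, 3, 4);  # (4, 3, 2, 1)

print "array: @array\nsorted: @sorted\nreversed: @reversed\n";
```

The sort result is the one that surprises people most often: without a comparison routine, numbers are ordered as if they were words.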

Control Structures

  • Statement blocks
    • A collection of statements, grouped by curly braces ({})
    • Can be used anywhere a single statement would be used
    • Semicolon on last statement is optional
  • If
    • Syntax: if ( expression ) block
    • If-else form: if ( expression ) block else block
      • Note that curly braces are always required on block, unlike C and Java, which make them optional if the block is a single statement
    • If the expression is true, evaluate block
    • If the expression is false, and if there is an else statement, evaluate the else block
    • What is truth?
      • In essence, a value is true if, when evaluated as a string, it is neither the empty string nor "0"
      • The number 0 is false
      • The string "0" is false
      • The empty string ("") is false
      • The value undef is false (because it becomes the empty string)
      • Everything else is true
  • Unless
    • Sometimes, you really only want to do something if the test is false
    • You can negate the test
    • You can use unless in place of if
    • Unless can also have an else clause
  • Elsif
    • If you have multiple choices, you can use elsif
    • Syntax: if (expression) block elsif (expression) block elsif (expression) block else block
    • Note the spelling: Perl uses elsif, where the shell uses elif and C uses else if
  • while/until
    • Process a loop as long as a condition is true (or until a condition is true)
    • Syntax: while (expression) block, until (expression) block
    • The block is not evaluated at all if the condition is already false (for while) or true (for until) the first time through
  • do {} while/until
    • Like while/until except the test is at the end of the loop, rather than the beginning
    • The block will always be executed at least once
  • for
    • Like the for statement in C and Java
    • Example:
 
        for ($i = 1; $i <= 61; $i++ ) {
                print "McGwire has $i home runs, no record yet\n";
        }
  • foreach
    • Like the shell's "for" statement
    • Iterates over the values of a list
    • Assigns to the named variable on each iteration
    • Example:
 
        @players = qw(McGwire Sosa Bonds);
        foreach $player (@players) {
                print "$player is a good home run hitter.\n";
        }
    • One note: Unless the list came from a function that returns a list, changing the value of the variable changes the value in the list/array
    • Could be both useful and dangerous.
    • Example:
 
        @a = (2, 3, 4, 5);
        foreach $num (@a) {
                $num **= 2;
        }
  • The $_ variable
    • A special variable that you'll see in many places in Perl.
    • Some things assign to $_ if you don't specify anything else
    • Many functions operate on the $_ variable if you don't specify anything else
    • Foreach uses $_ if you don't specify the variable
    • Example:
 
        foreach (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) {
                print;
        }
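
A sketch that exercises the truth rules, if/elsif/else, and the two foreach behaviors described above. The test values and variable names are illustrative.

```perl
use strict;
use warnings;

# Which values count as true? Only "", "0", the number 0, and undef are false.
my @true_ones;
foreach my $value (0, "0", "", "0.0", "00", " ", "a") {
    if ($value) {
        push(@true_ones, "'$value'");
    }
}
print "true: @true_ones\n";

# if / elsif / else, with the braces that Perl always requires.
my $runs = 62;
my $verdict;
if ($runs < 61) {
    $verdict = "no record yet";
} elsif ($runs == 61) {
    $verdict = "tied";
} else {
    $verdict = "new record";
}
print "$verdict\n";

# foreach aliases each element, so changing $num changes the array.
my @a = (2, 3, 4, 5);
foreach my $num (@a) {
    $num **= 2;
}
print "squares: @a\n";

# With no loop variable named, foreach uses $_.
my $digits = "";
foreach (0 .. 9) {
    $digits .= $_;
}
print "$digits\n";
```

Note that "0.0" and "00" are true: as strings they are neither empty nor exactly "0".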

Associative Arrays (Hashes)

  • Hashes
    • Also called "associative arrays," though hash has become the more popular term (no doubt because it's shorter).
    • Unlike arrays, hashes have no particular order.
    • Represented as variables with a %, as in %some_hash.
    • The elements of a hash are key/value pairs; a value is referenced by putting its key in curly braces
    • A key that is a simple word is automatically quoted
    • Example:
 
        $salary{Joe} = 40000;
        $salary{Sherry} = 60000;
        $salary{Sam} = 20000;
    • A hash has no literal representation. It is represented as a list, with the first value as a key, then a value, then a key, then a value, etc.
    • Example:
 
        %salary = ("Joe", 40000, "Sherry", 60000, "Sam", 20000);
        if ( $salary{Sam} <= $salary{Joe} ) {print "Sam is underpaid!\n";}
    • The token "=>" is just a synonym for ",", but makes hash declarations look much better
    • Example:
 
        %salary = (
                Joe    => 40000,
                Sherry => 60000,
                Sam    => 20000,
        );
    • Note that "=>" causes the item to its left to be quoted, so you don't need the quotes
  • keys function
    • Returns the keys of a hash, as a list
    • Returns an empty list if the hash is empty
    • Useful for iterating through the hash
    • Example:
 
        foreach $employee (keys (%salary)) {
                print "$employee makes $salary{$employee}.\n";
        }
  • values function
    • Like keys, but returns the values of the hash
  • each function
    • Returns a two-element list, containing a key/value pair from a hash
    • On each successive call, returns another key/value pair
    • Returns an empty list (hence false) when there are no more key/value pairs
    • Example:
 
        while ( ($employee, $salary) = each (%salary) ) {
                print "$employee makes $salary.\n";
        }
  • delete function
    • Removes a key/value pair from a hash
    • Example: delete $salary{Joe}; #Joe quit
  • Hash slices
    • A shorthand way to specify part of a hash
    • Example:
 
        @salary{"Joe", "Sherry", "Sam"} = (40000, 60000, 20000);
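
The hash functions above, combined into one sketch. "Pat" is a made-up extra employee, and the keys are sorted before printing only because a hash has no particular order.

```perl
use strict;
use warnings;

my %salary = (
    Joe    => 40000,
    Sherry => 60000,
    Sam    => 20000,
);

$salary{Pat} = 50000;                # add one pair; the key is auto-quoted
delete $salary{Joe};                 # Joe quit

# keys returns the keys in no particular order, so sort for stable output.
foreach my $employee (sort keys %salary) {
    print "$employee makes $salary{$employee}.\n";
}

# A hash slice reads or writes several values at once.
my @pair  = @salary{"Sherry", "Sam"};   # (60000, 20000)
my $staff = keys %salary;               # scalar context: number of keys
my $total = 0;
$total += $_ foreach values %salary;
print "$staff employees, total payroll $total\n";
```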

Basic I/O

  • A very simple example of processing I/O:
 
  while (<>) {
    print;
  }
    • The empty angle brackets are called the diamond operator.
      • It will process STDIN if there were no command line arguments.
      • If there were command line arguments, it will work on each specified file, in succession.
      • The command line arguments are in array @ARGV, and can be processed and/or added to before using the diamond operator.
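
A self-contained sketch of the diamond operator reading a file named in @ARGV. So that it runs anywhere, it first writes a scratch file using the standard File::Temp module and a lexical filehandle ($fh); both are assumptions of this example and are not covered in these notes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file so the example has something to read.
my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh "first line\nsecond line\nthird line\n";
close($fh);

# Pretend the file was named on the command line.
@ARGV = ($filename);

my $count = 0;
while (<>) {          # each line lands in $_
    $count++;
    print "$count: $_";
}
```

In a real script you would normally leave @ARGV alone and let the user supply file names, falling back to STDIN when none are given.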

Regular Expressions

  • Perl has support for regular expressions, with some extensions to what we've seen before
  • In Perl, regular expressions are usually enclosed in slashes
  • Default match is against $_
  • Example:
 
  while (<>){
    if (/foo/) {
      print "\$_ contains the string 'foo'.\n";
    }
  }
  • Another common usage is to replace one thing with another
    • Looks like sed
  • Example:
 
  while (<>) {
    s/foo/bar/;
    print;
  }
  • Parentheses
    • Parentheses can be put around part of a regular expression, and what matched within the parentheses is remembered for later use
    • The parentheses don't change how the regular expression matches
    • The matches can be used later in the regular expression as \1 for the first, \2 for the second, etc.
    • Afterwards they can also be used as $1 for first, $2 for second, etc.
    • Example: /f(.*)o(.*)o\2d\1/ matches "fxoyoydx" or "f--o__o__d--"
  • Greediness
    • Like other examples we've seen, Perl's regular expressions are greedy.
    • Examples:
 
  $_ = "fooooooooooooooood";
  s/o+/oo/;
Example: perl/p3a
 
  $_ = "f xx o xxx o xxxx d";
  /f(.*)o(.*)d/;
Example: perl/p3b
 
  $_ = "f xx or xxx ol xxxx d";
  /f(.*)or(.*)d/;
Example: perl/p3c
 
  $_ = "f xx or d xxx ol xxxx d";
  /f(.*)or(.*)d/;
Example: perl/p3d
    • A quantifier can be made non-greedy by following it with a question mark
 
  $_ = "f xx o xxx o xxxx d";
  /f.*?o.*d/;
Example: left for the student
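
One way to see greediness at work is to capture what each .* matched. This sketch uses the =~ operator and a match in list context (which returns the parenthesized matches); neither has been introduced at this point in the notes, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "f xx o xxx o xxxx d";

# Greedy: the first .* grabs as much as it can, up to the last "o".
my ($greedy1, $greedy2) = $text =~ /f(.*)o(.*)d/;

# Non-greedy: .*? stops at the first "o" that lets the match succeed.
my ($lazy1, $lazy2) = $text =~ /f(.*?)o(.*)d/;

print "greedy: '$greedy1' and '$greedy2'\n";
print "lazy:   '$lazy1' and '$lazy2'\n";

# Greedy o+ swallows the whole run of o's, so one substitution is enough.
my $word = "fooooooooooooooood";
$word =~ s/o+/oo/;
print "$word\n";
```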
  • Alternation
    • Provide several alternatives to match, separated by the vertical bar (|)
    • Example: /ford|chevy/ matches either "ford" or "chevy"
  • Anchoring
    • \b matches at a word boundary (the space between a word character [a-zA-Z0-9_] and a non-word character)
    • Example: /\bford/ matches "ford" but not "afford"
    • \B matches where there isn't a word boundary
    • Example: /\Bford\b/ matches "afford" but not "ford"
    • ^ (caret) and $ (dollar sign) work like we've seen before
      • $ could be confused with indicating a variable, but if it's at the end of a regular expression string, it will be interpreted as "end of line"
  • Precedence
    • Regular expressions have rules of precedence as well
    • Does /a|b*/ mean a|(b*) or (a|b)*?
    • From highest to lowest, the precedence is:
 
  ( ) (?: )
  ? + * {m,n} ?? +? *? {m,n}?
  abc ^ $ \A \Z (?= ) (?! )
  |
    • So, by these rules, /a|b*/ means /a|(b*)/
    • You can use parentheses to enforce your desired interpretation
    • But, parentheses "count" in the memory for \1, \2, etc.
    • You can use (?: ) to mean "group this stuff, but don't count it as a pattern"
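
A small sketch of how precedence and (?: ) affect a match. It uses the =~ operator to test literal strings; the test strings are made up for illustration.

```perl
use strict;
use warnings;

# /^a|b*$/ reads as "^a" or "b*$", because alternation binds last.
my $loose = ("axyz" =~ /^a|b*$/)     ? 1 : 0;   # 1: "^a" matches
# Grouping changes the meaning: the whole string must be a's and b's.
my $tight = ("axyz" =~ /^(?:a|b)*$/) ? 1 : 0;   # 0
my $abab  = ("abab" =~ /^(?:a|b)*$/) ? 1 : 0;   # 1

# (?: ) groups without remembering, so the word after it is still $1.
my $model = "";
if ("chevy truck" =~ /(?:ford|chevy) (\w+)/) {
    $model = $1;
}
print "$loose $tight $abab $model\n";
```

Had plain parentheses been used around ford|chevy, the word "truck" would have ended up in $2 instead of $1.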
  • The =~ operator
    • Match on a string other than $_
    • Example: $truck =~ /ford/; checks the variable $truck for the string "ford"
  • Ignoring case
    • Suppose you want to match "ford", "Ford", "FORD", or any other combination of case
    • You could, based on what's been covered, use /[Ff][Oo][Rr][Dd]/
      • What if you want to match a longer string, like "These are the times that try men's souls"?
    • There is a convenient shorthand /Ford/i
  • Different delimiters
    • If you're trying to match strings with "/" in them, the necessary backslash escapes can make things ugly really fast
    • To check for the "#!" line in a perl script, you might use /^#!\/usr\/local\/bin\/perl/
    • Or, you can use a different delimiter, by starting it with an "m" and using a pair of punctuation characters, like m@^#!/usr/local/bin/perl@
    • You can also use matching delimiters, like m[^#!/usr/local/bin/perl]
  • Variable interpolation
    • Variables are expanded inside of a regular expression match
    • Example: $match = "[Rr]ead"; $action =~ /$match/;
  • Special variables
    • After a match, the parenthesized matches are set to $1, $2, $3, etc.
    • $` contains what was before the match
    • $& contains what was matched
    • $' contains what was after the match
    • Example:
 
  $trucks = "ford chevy dodge toyota";
  $trucks =~ /(\w+)\W+(\w+)/;
  $first_truck = $1;  # "ford"
  $second_truck = $2; # "chevy"
  $trucks =~ /ch.*?y/;
  $before = $`; # "ford "
  $match = $&;  # "chevy"
  $after = $';  # " dodge toyota"
  • Substitutions
    • Form is s/regex/string/ (replaces regex with string)
    • The delimiter can be any character, as with m//
    • Add "g" to the end to replace all occurrences (rather than just the first)
    • Use $1, $2, etc. to use matches in the string
    • Example:
 
  $_ = "I have to go now";
  s/(\w+)/\U$1\E/g;
  • Split and join
    • Split breaks up a string, based on a regular expression. Returns a list.
    • Join joins the items in a list, with a string separating each item. Returns a scalar.
    • Example:
 
  $passwd = "geoff:*:101:5:Geoff Allen:/users/geoff:/usr/local/bin/bash";
  @fields = split(/:/,$passwd);
  $new_line = join(":", @fields);

Functions

  • Defining a function
    • Functions (and subroutines -- there's no difference in Perl) are defined with the "sub" command.
    • Format is:
 
  sub funcname {
   do_something;
   do_something_else;
   etc;
  }
  • Calling a function
    • You call your functions by using the function name, followed by parentheses
    • For example, if you had defined a function called "hit_homerun", you'd call it with the statement hit_homerun();
  • Return values
    • A function returns the value of the last expression evaluated, or the value given to an explicit return
    • A function's return value is used as its value in the expression in which it is called
    • Example:
 
  sub three {
   3;
  }
  print 4 + three();
  • Arguments
    • Arguments are passed to functions in the @_ array
    • They can be accessed one-by-one ($_[0], $_[1], etc.)
    • They can also be assigned all at once ( ($arg1, $arg2) = @_; )
    • Basically, you can do anything you want with the values
    • Warning! If you use a variable as an argument, and you modify things in @_ directly, you will modify the variable
  • Private variables
    • Perl provides the capability to make variables local to a function
    • This is done with the "my" operator (which takes a list of variables)
    • Example: my ($some_var, @some_array, %some_hash);
    • Another, slightly less private, variable operator is "local"
    • The difference is that "my" variables are seen only by the function/block; "local" variables are seen by the function/block and all functions called within that function/block
  • use strict;
    • There is a commonly-used "pragma" (compiler directive) for Perl that is quite useful
    • If you place the statement use strict; in your script (usually first or very early in the script), Perl will be a lot pickier about things
    • All variables in the script must be declared, typically by giving them a scope with my
    • use strict; will help prevent a lot of problems. Use it.
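
A sketch of the function features described in this section: arguments in @_, private variables with my, implicit and explicit return values, and the warning about modifying @_ directly. The function names largest and double_in_place are made up for illustration.

```perl
use strict;
use warnings;

# Arguments arrive in @_; copying them into "my" variables keeps them private.
sub largest {
    my ($max, @rest) = @_;
    foreach my $candidate (@rest) {
        $max = $candidate if $candidate > $max;
    }
    return $max;
}

# Without a return, the value of the last expression is returned.
sub three {
    3;
}

# Modifying @_ directly changes the caller's variable.
sub double_in_place {
    $_[0] *= 2;
}

my $biggest = largest(3, 17, 4);     # 17
my $seven   = 4 + three();           # 7
my $score   = 21;
double_in_place($score);             # $score is now 42

print "$biggest $seven $score\n";
```

Copying @_ into my variables on the first line of a function, as largest does, is the usual way to avoid changing the caller's data by accident.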

Misc. Control Structures

  • Last
    • Sometimes you want to be done with a loop before the loop is scheduled to be done
    • In C, you can use the "break" statement
    • In Perl, it's called last
    • Breaks out of a for, foreach, while, or until loop, not other blocks
    • Program continues after the end of the loop block
  • Next
    • Sometimes, you don't want to quit the loop, you just want to quit this iteration
    • next is how you do this
    • Example:
 
  while (<>) {
    if (/^$/) {
      next;
    }
    if (/foo/) {
      chomp;
      print "I found 'foo' on the line $_!\n";
    }
  }
  • Redo
    • If redo appears in the loop block, it will cause that iteration of the loop to start over
  • Labeled Blocks
    • You can put a label at the start of a block, and use that to explicitly identify which loop you mean with the last, next, or redo statement
    • Example:
 
  OUTER: for ($i = 1; $i <=10; $i++) {
    INNER: for ($j = 1; $j <= 10; $j++) {
      if ($i * $j == 63) {
        print "$i times $j is 63!\n";
        last OUTER;
      }
      if ($j >= $i) {
        next OUTER;
      }
    }
  }
  • Expression Modifiers
    • A nice, short way to write simple conditionals
    • Examples:
 
  next if (/^$/);
  last if ( ($i * $j) == 63);
  $i = 0; $i++ while ($i <= 10);
  • && and ||
    • && (and) and || (or) can function as control structures as well
    • This works because Perl stops evaluating as soon as it knows the "answer" to an and or an or expression
    • Example: 0 && $i++ -- $i will never get incremented, because Perl knows the and is false as soon as 0 is evaluated
    • The following are all equivalent:
 
  if (condition) { statement; }
  statement if (condition);
  condition && statement;
    • Likewise, the following are equivalent:
 
  unless (condition) { statement; }
  condition || statement;
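
The loop controls and short-circuit operators from this section, sketched with made-up data so the result of each piece can be checked.

```perl
use strict;
use warnings;

my @lines = ("alpha", "", "foo bar", "stop", "never reached");
my @kept;
foreach my $line (@lines) {
    next if $line eq "";          # skip blank entries
    last if $line eq "stop";      # leave the loop early
    push(@kept, $line);
}
print "kept: @kept\n";

# Labels let last and next name an outer loop.
my $answer = "";
OUTER: for (my $i = 1; $i <= 10; $i++) {
    for (my $j = 1; $j <= 10; $j++) {
        next OUTER if $j > $i;    # only look at pairs with $j <= $i
        if ($i * $j == 63) {
            $answer = "$i times $j";
            last OUTER;
        }
    }
}
print "$answer is 63\n";

# && and || stop as soon as the result is known.
my $calls = 0;
my $ready = 0;
$ready && $calls++;               # $ready is false, so $calls stays 0
$ready || ($calls += 10);         # $ready is false, so this runs
print "calls: $calls\n";
```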

Files

  • Filehandles
    • A filehandle is Perl's way of specifying a file to read from or write to
    • STDIN, STDOUT, and STDERR are filehandles that are provided for you
    • Filehandles have their own namespace
    • Traditional perl coding conventions say to use all UPPERCASE letters for the name of your filehandle
  • Open
    • The open call has several forms
 
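  open(FILEHANDLE, "/tmp/somefile");      # open for reading
  open(FILEHANDLE, ">/tmp/somefile");     # open for writing (creates or truncates)
  open(FILEHANDLE, ">>/tmp/somefile");    # open for appending
  open(FILEHANDLE, "| somecommand");      # write to the input of a command
  open(FILEHANDLE, "somecommand |");      # read from the output of a command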
  • Close
    • When you close a file, you flush any output still pending for a write
    • Files are automatically closed when the program exits, but it doesn't hurt to close them yourself
    • Syntax is simply close (FILEHANDLE);
  • Die
    • die will quit your program, with an error message
    • Useful (and often seen) with open statement
    • Example:
 
  open (PASSWD, "/etc/passwd") ||
    die "Couldn't open the passwd file!\n";
    • If your message ends in "\n", die prints just the message
    • If your message doesn't end in "\n", die prints your message followed by the filename and line number
    • The variable $! contains the error from the operating system
  • Warn
    • die's little brother
    • prints the message, but doesn't abort the program
  • Using Filehandles
    • So, you've got your file open, what do you do with it?
    • If reading, you can do something like:
 
 open (PASSWD, "/etc/passwd" ) ||
   die "Couldn't open passwd: $!";
 while (<PASSWD>) {
   print;
 }
    • If writing or appending, you can just add the filehandle to the print statement:
 
  print SOMEFILE "This goes in the file!\n";
  • File tests
    • There are a whole bunch of file tests available in Perl, many of which are copied from the test command used in shell programming
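
A sketch that writes, appends to, tests, and reads back a scratch file, using the open forms and filehandle style shown in these notes. The File::Temp module and the file name demo.txt are assumptions of this example; newer Perl code often prefers the three-argument form of open with a lexical filehandle, which is not covered here.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A scratch directory that is removed when the program exits.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.txt";

# ">" creates or truncates the file; ">>" appends to it.
open(OUT, ">$path")  || die "Couldn't open $path for writing: $!";
print OUT "first line\n";
close(OUT);

open(OUT, ">>$path") || die "Couldn't open $path for appending: $!";
print OUT "second line\n";
close(OUT);

# File tests: -e is true if the file exists, -s gives its size in bytes.
my $exists = (-e $path) ? 1 : 0;
my $size   = -s $path;

# No prefix opens the file for reading.
open(IN, $path) || die "Couldn't open $path for reading: $!";
my @lines;
while (<IN>) {
    chomp;
    push(@lines, $_);
}
close(IN);

print "exists: $exists, size: $size, lines: @lines\n";
```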
  • Advanced sorting
    • We've already looked at the sort function, and learned that it sorts in ASCII order
    • Now you get "the rest of the story."
    • You can provide a subroutine defining how to compare two of the things being sorted
    • The two things being compared are given as $a and $b
    • The routine should return a negative value if $a comes first, 0 if they're equal, or a positive number if $b comes first
    • Example:
 
  sub by_record {
    return -1 if ( $record{$a} <  $record{$b} );
    return  0 if ( $record{$a} == $record{$b} );
    return  1 if ( $record{$a} >  $record{$b} );
  }
  @al_west = sort by_record ("Seattle", "California", "Texas", "Oakland");
    • This type of comparison is common enough that it has a special operator, <=>:
 
  sub by_record {
    $record{$a} <=> $record{$b};
  }
    • The equivalent operator for strings is cmp
    • Finally, the comparison routine can be put right inline:
 
  sort { $a <=> $b } (3, 1, 7, 2.813, 17.5, 4, 4.22);
  • Transliteration
    • The tr operator replaces characters from the first string with characters from the second string
      • i.e., just like the tr command
    • By default, works on $_, can work on something else with =~
    • Example:
 
  tr/a/A/;
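
To round off the last two topics, a sketch of sorting with a named routine, with inline blocks, and with cmp, followed by tr applied to a named variable. The win totals in %record are hypothetical numbers invented for the example.

```perl
use strict;
use warnings;

# Hypothetical win totals, used only to have something to sort by.
my %record = (Seattle => 76, California => 85, Texas => 88, Oakland => 74);

sub by_record {
    $record{$a} <=> $record{$b};
}

my @al_west = sort by_record keys %record;                       # fewest wins first
my @by_name = sort { $a cmp $b } keys %record;                   # alphabetical
my @best    = sort { $record{$b} <=> $record{$a} } keys %record; # most wins first
my @numbers = sort { $a <=> $b } (3, 1, 7, 2.813, 17.5, 4, 4.22);

print "@al_west\n@by_name\n@best\n@numbers\n";

# tr works on $_ by default, or on another variable with =~.
my $word  = "banana";
my $count = ($word =~ tr/a/A/);     # returns how many characters it replaced
print "$word ($count replaced)\n";
```

Swapping $a and $b in the comparison, as in @best, is the usual way to reverse the sort order.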