
awk


awk
        awk background
        awk basics
        Fields
        Printing
        Patterns
        Arithmetic and variables
        Control flow
        Arrays and Associative Arrays
        Putting it all together


Reading: The Unix Programming Environment, Chapter 4

awk background

  • awk was written by Al Aho, Peter Weinberger and Brian Kernighan
  • It's something of a mixture of sed and C
    • Better than them at some things, worse at others

awk basics

  • The basic awk program is of the form:
    • pattern { action }
      pattern { action }
      ...
  • The input is processed one line at a time
  • For each pattern/action pair, the action is taken if the current line matches the pattern
  • The pattern is optional, in which case the action is done for every line of input
  • The action is optional, in which case the default action is to print each matching line
  • The basic command line is:
    • awk program [filenames]
      • The program can be in a file rather than on the command line
      • awk -f commandfile [filenames]
  • Note: on many Linux systems, awk is actually GNU awk (‘gawk’), but ‘man awk’ still finds you the info you need
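A minimal sketch of the three program shapes described above (pattern only, action only, and both together). The input lines and the shell variable names are invented for illustration; printf stands in for a file.

```shell
# Sample input, made up for this sketch: three short lines.
input='apple 3
banana 12
cherry 7'

# Pattern only: the default action prints each matching line.
pattern_only=$(printf '%s\n' "$input" | awk '/an/')

# Action only: the action runs for every input line.
action_only=$(printf '%s\n' "$input" | awk '{ print $1 }')

# Pattern and action: the action runs only for lines where field 2 exceeds 5.
both=$(printf '%s\n' "$input" | awk '$2 > 5 { print $1 }')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```

The same programs could be saved in a file and run with awk -f, as noted above.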

Fields

  • awk automatically splits every line into fields
    • (A feature useful enough that awk is sometimes used for this reason alone)
    • The first field is called $1
    • The second field is called $2
    • etc.
    • NF is the number of fields on the current line
      • $NF is the last field on the current line
    • $0 is the entire line
    • On a BSD system, ps aux | awk '{print $1,$10}' prints who is running what commands
    • By default, the field separator is white space
      • the "-F" flag defines a different field separator
      • awk -F: '{ print $1 }' /etc/passwd prints the user names in the passwd file
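      • A separator defined this way isn't "special" like white space is: every occurrence separates two fields, so empty fields are possible, whereas a run of white space counts as a single separator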
      • But with gawk, the field separator can be a regular expression, not just a single character!
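To make the difference between default splitting and an explicit separator concrete, here is a small sketch. The input lines and variable names are made up for illustration.

```shell
# One invented line with irregular spacing.
line='  alice   staff    /bin/sh  '

# Default splitting: leading blanks are ignored and runs of blanks count once.
ws_fields=$(printf '%s\n' "$line" | awk '{ print NF ": " $1 " ... " $NF }')

# Explicit separator: every ":" separates, so "a::c" has an empty 2nd field.
colon_fields=$(printf 'a::c\n' | awk -F: '{ print NF ": [" $2 "]" }')

printf '%s\n' "$ws_fields" "$colon_fields"
```

Both lines are reported as having three fields, but only the colon-separated one contains an empty field.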

Printing

  • The "print" instruction prints things
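    • A comma "," between items puts the output field separator (a single space by default) between them; without the comma, the items are run together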
  • If special formatting is needed, "printf" is available
    • Works like the C version
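A short sketch comparing print with and without a comma, and printf. The one-line input and the variable names are assumptions made for the example.

```shell
# Comma: items are joined by the output field separator (a space by default).
with_comma=$(printf 'x 1.5\n' | awk '{ print $1, $2 }')

# No comma: the items are concatenated with nothing between them.
no_comma=$(printf 'x 1.5\n' | awk '{ print $1 $2 }')

# printf: C-style format string; the newline must be written explicitly.
formatted=$(printf 'x 1.5\n' | awk '{ printf "%-5s|%6.2f\n", $1, $2 }')

printf '%s\n' "$with_comma" "$no_comma" "$formatted"
```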

Patterns

  • Patterns can come in several forms
    • /pattern/
      • True if pattern matches anywhere in the current line
    • $2 ~ /pattern/
      • True if field 2 matches pattern
    • $2 !~ /pattern/
      • True if field 2 does not match pattern
    • $2 == "string"
      • True if field 2 is "string"
    • NF % 2 != 0
      • Just about any awk expression, in this case, true if the number of fields is odd
  • The book gives several examples
  • awk has two special patterns
    • BEGIN
      • The action associated with BEGIN is done before any lines are processed
      • Useful for initializing variables and/or printing header lines
        • Note that variables don't need to be initialized
    • END
      • The action associated with END is done after the last input line is processed
      • Useful for printing the results of the program, etc.
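The pattern forms and the two special patterns can be combined in one program. The following is only a sketch: the records (name, department, hours) and the variable names are invented.

```shell
# Made-up records: name, department, hours.
report=$(printf '%s\n' 'ann ops 40' 'bob dev 35' 'cy dev 42' |
awk '
BEGIN          { print "name hours" }          # runs before any input
$2 == "dev"    { print $1, $3; total += $3 }   # string comparison on field 2
$1 ~ /^a/      { starts_with_a++ }             # regular expression on field 1
END            { print "dev total", total; print "a-names", starts_with_a }
')
printf '%s\n' "$report"
```

Neither total nor starts_with_a is initialized in BEGIN; both start out empty and behave as 0 when used in arithmetic.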

Arithmetic and variables

  • Arithmetic operations are much like in C
    • s = s + 1
    • s += 1
    • b = abc/13
  • An example from the book, which rounds each file's line count up to a whole number of 56-line pages and prints the total
    • wc $* |
      awk '!/total$/ { n += int(($1+55) / 56) }
              END { print n }'
  • Variables can also be strings
    • There are no declared data types
    • Type is determined by context
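A brief sketch of the points above: an uninitialized accumulator, a value used as both number and string, and the rounding idiom from the book's example. The input numbers and variable names are illustrative only.

```shell
# Uninitialized variables act as 0 in arithmetic, so s needs no setup.
sum=$(printf '%s\n' 10 20 12 | awk '{ s += $1 } END { print s }')

# The same value is treated as a number or a string depending on the operator.
context=$(awk 'BEGIN { x = "3"; print x + 4, x "4" }')

# int((n+55)/56) rounds n/56 up: 56 lines take 1 page, 57 lines take 2.
pages=$(awk 'BEGIN { print int((56+55)/56), int((57+55)/56) }')

printf '%s\n' "$sum" "$context" "$pages"
```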

Control flow

  • awk has many standard control flow statements
    • if-else
    • for loops
    • while loops
  • An example from the book illustrates these things
    • awk '
      FILENAME != prevfile { # new file
              NR = 1
              prevfile = FILENAME
      }
      NF > 0 {
              if ($1 == lastword)
                      printf "double %s, file %s, line %d\n",$1,FILENAME,NR
              for (i = 2; i <= NF; i++)
                      if ($i == $(i-1))
                              printf "double %s, file %s, line %d\n",$i,FILENAME,NR
              if (NF > 0)
                      lastword = $NF
      }' $*
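For a smaller view of the same statements, here is a sketch that uses a for loop, an if-else, and a while loop on two invented input lines. The variable names are made up, and the while loop is deliberately redundant (it just counts the fields one at a time).

```shell
result=$(printf '%s\n' 'the the cat' 'sat on on the mat' | awk '
{
    # for loop with an if: report a word repeated within one line
    for (i = 2; i <= NF; i++)
        if ($i == $(i-1))
            print "line " NR ": repeated " $i
    # if-else: classify the line by its number of fields
    if (NF > 3) { long_lines++ } else { short_lines++ }
    # while loop: count the fields of this line one at a time
    n = NF
    while (n > 0) { words++; n-- }
}
END { print words " words, " long_lines " long, " short_lines " short" }')
printf '%s\n' "$result"
```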

Arrays and Associative Arrays

  • awk provides arrays that work just like you'd expect them to, with numeric indices
    • awk ' { line[NR] = $0 }
      END { for ( i = NR; i > 0; i--) print line[i] } ' $*
  • The split() function can split a string into an array
    • For example, to split a line of the password file (which has fields separated by colons) and put the result into an array called "a":
      • split($0,a,":")
  • Associative arrays allow you to use a string as the index for the array, rather than a number
    • To revisit the word count example, in awk, it's
      • awk ' { for (i = 1; i <= NF; i++) num[$i]++ }
        END { for (word in num) print word, num[word] }'
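The two ideas above can be tried on small inputs. This is a sketch with an invented passwd-style line and invented words. Because "for (w in num)" visits the indices in no particular order, the output is piped through sort to make it predictable.

```shell
# split() returns the number of pieces and fills a[1], a[2], ...
passwd_parts=$(printf 'root:x:0:0:root:/root:/bin/sh\n' |
    awk '{ n = split($0, a, ":"); print n, a[1], a[n] }')

# Word frequencies with an associative array indexed by the word itself.
counts=$(printf '%s\n' 'a b a' 'c a b' |
    awk '{ for (i = 1; i <= NF; i++) num[$i]++ }
         END { for (w in num) print w, num[w] }' | sort)

printf '%s\n' "$passwd_parts" "$counts"
```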

Putting it all together

  • The text also briefly mentions string handling and shell interaction
    • Strings are concatenated simply by writing them next to each other, with no operator in between
  • The main thing to note about interaction with the shell is that you may need to get funky with your quoting
  • Finally, the text offers a "calendar" program that will remind you of upcoming events
    • The program expects a file in your home directory called "calendar" of the format
      • Feb 14 Valentine's Day
        Feb 15 The Day after Valentine's Day
        etc...
    • The program mails its output, but we'll do it on stdout
      • We also won't go through the "development of the program" stuff. We'll just look at the "final" version
 
     awk <$HOME/calendar '
     BEGIN {
         x = "Jan 31 Feb 28 Mar 31 Apr 30 May 31 Jun 30 " \
             "Jul 31 Aug 31 Sep 30 Oct 31 Nov 30 Dec 31 Jan 31"
         split(x,data)
         for (i = 1; i < 24; i += 2) {
              days[data[i]] = data[i+1]
              nextmon[data[i]] = data[i+2]
         }
         split("'"`date`"'",date)
         mon1 = date[2]; day1 = date[3]
         mon2 = mon1; day2 = day1 + 1
         if (day1 >= days[mon1]) {
             day2 = 1
             mon2 = nextmon[mon1]
         }
     }
     $1 == mon1 && $2 == day1 || $1 == mon2 && $2 == day2 '
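To try the month-rollover logic without waiting for the end of a month, one option is a variant that takes the date string as an argument. This is a sketch, not the book's version: the function name upcoming and the awk variable today are hypothetical, the date is passed in with awk's -v option instead of being spliced into the program text with quotes, and the calendar is read from standard input rather than $HOME/calendar.

```shell
# Hypothetical helper: same logic as the program above, with the date string
# supplied as $1 so that a fixed date can be tested.
upcoming() {
    awk -v today="$1" '
    BEGIN {
        x = "Jan 31 Feb 28 Mar 31 Apr 30 May 31 Jun 30 " \
            "Jul 31 Aug 31 Sep 30 Oct 31 Nov 30 Dec 31 Jan 31"
        split(x, data)
        for (i = 1; i < 24; i += 2) {
            days[data[i]] = data[i+1]       # days in each month
            nextmon[data[i]] = data[i+2]    # name of the following month
        }
        split(today, date)                  # e.g. "Tue Feb 28 10:00:00 EST 2023"
        mon1 = date[2]; day1 = date[3]
        mon2 = mon1; day2 = day1 + 1
        if (day1 >= days[mon1]) {           # last day of the month
            day2 = 1
            mon2 = nextmon[mon1]
        }
    }
    $1 == mon1 && $2 == day1 || $1 == mon2 && $2 == day2'
}

# Invented calendar entries; only today and tomorrow should be printed.
printf '%s\n' 'Feb 28 Rent due' 'Mar 1 Payday' 'Mar 2 Dentist' |
    upcoming 'Tue Feb 28 10:00:00 EST 2023'
```

Calling upcoming "$(date)" should behave like the original for date output in the same layout (weekday, month, day, ...).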