 
Names in Programming Languages
CptS 355 - Programming Language Design
Washington State University

Introduction

There are many names in a typical program: names for variables, types, procedures and functions, classes, constants, libraries, monitors, etc. Names are for human readers of programs; good names help understanding, and bad names hinder it. Can you guess what the following function does?
  int factorial(int x, int sum) {
    return x + sum * previous;
    }
How about if we use different names?
  int next_height(int height, int velocity) {
    return height + velocity * DT;
    }
Some languages impose restrictions on names.
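  • C - A name must start with a letter or an underscore
  • Fortran 77 - Names must be six characters or fewer
  • C89 - Only the first 31 characters of a name are guaranteed to be significant (only the first 6 for names with external linkage)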
  • C - A name cannot have an embedded space. The following is not a name.
      foo bar 
    
    In Fortran the space would be removed yielding just foobar. In C it used to be common to add an underscore,
      foo_bar 
    
    but now it is more conventional to use capitalization.
      fooBar 
    
  • C++ - Case matters, so fooBar and FooBar would be different names.
  • C - Only alphanumeric characters (and underscore) may appear in a name, e.g., foo,Bar is not one name. One reason for this restriction is that, in languages where whitespace is not required between tokens, the lexical analyzer would have a difficult time determining where a name ends.
  • C - A name cannot be a reserved word, so we couldn't have a variable named 'if'. A related idea is a keyword: a word that has a special meaning only in certain contexts, so that in other contexts it can still be used as a name. For example, Fortran has keywords rather than reserved words, so we could declare the following variable, named Integer, to be of type Real.
      Real Integer  
    

Variables

In imperative programming languages, each variable has several characteristics.
  • Name - Name of the variable.
  • Address - Location in memory of the value of the variable. Sometimes referred to as the l-value (the value used in the left-hand side of an assignment).
  • Aliases - Use of pointers could create several names associated with the same address.
  • Type - The type specifies the interpretation of bits at the storage location and a set of possible operations permitted for that type. Types are described in more detail below.
  • Value - The sequence of bits at the location, also called the r-value.

Binding

A binding
  • Associates a name with a property (e.g., the Type).
  • Can occur at different times (i.e., what is the binding time?)
    • compile-time
    • load-time
    • link-time
    • run-time
For example, consider the bindings in the following fragment of a C program.
   int fooBar;
   fooBar++;
The Type, int, is bound to the name, fooBar, at compile-time. The compiler generates code that interprets fooBar's value as an integer (stored in two's complement, where the high-order bit acts as the sign bit). The compiler also checks that only operations on integers are applied to fooBar. The Value of fooBar is bound at run-time. The Address of fooBar might be bound at run-time (if fooBar is a local variable), or at load-time or link-time (if it is a global variable).
Static binding
  • binding occurs before run-time
  • remains unchanged during execution
The Type binding in C is an example of a static binding.
Dynamic binding
  • first happens at run-time
  • can change during execution
The following PostScript code dynamically binds a name.
  /x exch def
The name is bound at run-time to a Type, Address, and Value.

Type Bindings

In static type binding, a declaration is elaborated to produce a type binding. The declaration could be explicit or implicit.
  • Explicit - The following C example is an explicit type binding.
           int fooBar;
       
  • Implicit - In Fortran, variables whose names start with I through N are implicitly declared to be integers, and all other variables to be reals (which is why, even in C code, loop variables are by convention usually named i, j, or k).
           x = 3      ! Implicit declaration of x as a real
           i = x + 3  ! Implicit declaration of i as an integer
       
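    Perl has a different convention for implicit declarations: the first character of the name (the sigil) determines the kind of variable, so $x, @x, and %x below are three different variables.
           $x = 3;           # Declare a scalar variable, x 
           @x = (1, 3);      # Declare an array (list) variable, x 
           %x = (1 => 3);    # Declare an associative array (hash) variable, x 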
       
In dynamic type binding, the type binding can change during run-time. Another way to think of this is that the values have types, rather than the variables.
  • Languages that have dynamic type binding are sometimes referred to as "typeless" languages, but this is not very descriptive since they still do type checking. Perl has dynamic typing.
           $x = 3;           # Declare a scalar variable, x, 
                             # bind to an integer
           $x = "hello";     # Bind x to a string type!
       
  • With dynamic type binding, some type errors cannot be detected until run-time.
  • Convenient for user (don't have to specify types, can be lazy)
  • Potentially slower, usually must be interpreted rather than compiled, additional run-time checks for type errors

Type Inference

With type inference, the compiler deduces the type of a name from how the name is used
  • Type need not be declared
  • Type is inferred at compile-time
  • ML example: the type of the square function is inferred from how x is used in its body. Because * works on both integers and reals, Standard ML falls back to int when nothing else determines the type. ML also has type declarations, so we could specify a type instead.
           
            fun square(x) = x * x        (* inferred type: 
                                            int -> int *)
            ...
            fun square2(x):real = x * x  (* type returned is real *)
      
    Imagine if we did a similar thing in a C-like language that inferred types from the calls to a function.
       
            square (x) {
              return x * x;
            }
       /* At this point, we don't know the type for x or 
          for the square function */
            int y = 4;
            y = square(y);
       /* At this point, we can infer that x is of type int, 
          and square too */
            char c = 'c';
            c = square(c);   /* type error!! */
       

Meaning of a Type

A value is anything that may be evaluated, stored, incorporated in a data structure, passed as an argument, returned, etc. For example, what kinds of values exist in, say C?
  • primitive values (int, char, float etc.)
  • composite values (int array[30], structs, unions)
  • pointers/addresses (int *, &x, int *function())
So what is a type?
  • a set of values, e.g., the type integer might correspond to the set {..., -1, 0, 1, ...}. NOTATION: v is a value of type T means that v is in the set of values described by T.
  • whose values exhibit uniform behavior under some set of operations like addition, multiplication, and comparison, e.g., {13, orange, 'z'} would be an unusual type since it would be difficult to define operations such as a comparison of its values (is orange less than 'z'?)
  • some examples
       boolean = {false, true}
       integer = {... -1, 0, 1, ...}
       real = {... -1.0, ..., 0.0, ..., 1.0, ...}
       character = {'\0', ..., 'a', ..., 'z', ..., '\377'}   (with 8-bit characters)
       enumerated = {jan, feb, ..., dec}
         such as from a Pascal type definition like 
         `type month = (jan, feb, ..., dec);'
       subrange = {28, 29, 30, 31}
         such as from a Pascal definition like 
         `var day : 28..31;'
  • NOTATION: We will indicate the cardinality of a set S as #S. The cardinality is just the number of elements in a set.
       #boolean = 2
       #integer = MAXINT - MININT + 1   (in Pascal)
       #character = 256   (with 8-bit characters)
       #{jan, feb, ..., dec} = 12
       #{28, 29, 30, 31} = 4
Some types are supported at the `hardware' level. In C, for example, a char is one byte (8 bits on virtually all current machines, most of which are byte-addressable), a long is at least 32 bits, and a long long is at least 64 bits. The hardware provides operations, such as bitwise OR, that treat these as `uninterpreted' strings of bits. Some groups of bits, however, have `interpretations'. For example, a 32-bit signed integer in two's complement uses its high-order bit as the sign bit: if the sign bit is 'on' the number is negative, and if it is 'off' the number is nonnegative. Such an integer can represent a number in the range -(2**31) to 2**31 - 1. Other encodings include floating point, packed decimal (4-bit digits), and character codes such as
  • EBCDIC - 8-bit characters, created by IBM and dying a slow death
  • ASCII - 7-bit characters (usually stored in an 8-bit byte), the long-standing American standard
  • Unicode - supports characters from most of the world's writing systems; its code points go up to U+10FFFF, so 16 bits are not enough for all of them, and text is usually encoded as UTF-8, UTF-16, or UTF-32
A composite type captures relationships between various primitive types. There are several common composite types available in a variety of languages.

Cartesian product

Let S = {a,b} and T = {c, d, e}. The Cartesian product of S and T, written S X T, is all possible combinations of pairs of elements from the sets. So, for instance
   S X T = {(a,c), (b,c), (a,d), (b,d), (a,e), (b,e)}.
Observe that
   #(S X T) = #S * #T
The Cartesian product type corresponds to the record type in Pascal or the struct type in C.
  /* C example */
  typedef struct {
    int x;
    int y;
    } point_type;      /* point_type = integer X integer */

  typedef struct {
    point_type origin;
    float radius;
    } circle_type;     /* circle_type = 
                          (integer X integer) X real */
 
There is a useful special case of a Cartesian product: the type whose set of values contains only the empty tuple, {()}, called the unit type. In ML, the unit type is named unit (its only value is written ()); in C and Algol 68, it is void.

Disjoint union

Let S = {a,b} and T = {c, d, e}. The disjoint union of S and T, written S + T, is the union of the two sets, but each element is ``tagged'' with the set from which it came. So, for instance
   S + T = {S:a, S:b, T:c, T:d, T:e} where the tags are S: and T:
Observe that
   #(S + T) = #S + #T
Also note that S + T is not the same as T + S. The disjoint union type corresponds to the variant record type in Pascal or the union type in C.
  /* C example */
  typedef union {
    int exact;
    float approx;
    } number_type;      /* number_type = integer + real */

  typedef struct {
    number_type x;
    number_type y;
    } point_type;       /* point_type = (integer + real) 
                           X (integer + real) */
Note that some languages, like Pascal, have special syntax for querying the tag to distinguish which type of value is available, but other languages, such as C, do not.

Mappings (or functions)

Let S = {a,b} and T = {c, d, e}. The mapping from S to T, written S --> T, is all possible functions from S into T. In other words
   S --> T = { m | if x in S then m(x) in T }
So, for instance
   S --> T = { {a->c,b->c}, {a->c,b->d}, {a->c,b->e},
               {a->d,b->c}, {a->d,b->d}, {a->d,b->e},
               {a->e,b->c}, {a->e,b->d}, {a->e,b->e} }
Observe that
   #(S --> T) = #T ** #S  (#T raised to the #S power)
The mapping type corresponds to arrays. Let's look at a Pascal example.
  { Pascal example }
  type color_type = (red, green, blue);
       pixel_type = array[color_type] of 0..1;
The set of values for pixel_type are
   pixel_type = 
     color_type --> 0..1 = 
     { {red->0,green->0,blue->0},
       {red->0,green->0,blue->1},
       {red->0,green->1,blue->0},
       {red->0,green->1,blue->1},
       {red->1,green->0,blue->1},
       {red->1,green->0,blue->0},
       {red->1,green->1,blue->1},
       {red->1,green->1,blue->0} }
There is a close correspondence between arrays and programming language functions. Consider the following C function.
      int plus(int x, int y) { return x + y; }
We can think of it as a table
             INPUTS | OUTPUT
            x   y   |
       --------------------------
            0   0   |    0
            1   0   |    1
            0   1   |    1
            1   1   |    2
            1   2   |    3
               ....
so we can rewrite the function using an array as follows (the bounds MAX_X and MAX_Y are arbitrary choices).
         #define MAX_X 100   /* table covers 0 <= x < MAX_X */
         #define MAX_Y 100   /*          and 0 <= y < MAX_Y */

         int plus(int x, int y) { 
           static int plus_array[MAX_X][MAX_Y];
           static int initialized = 0;

           /* Initialize array on the first call */
           if (!initialized) {
             int i, j;
             for (i = 0; i < MAX_X; i++)
               for (j = 0; j < MAX_Y; j++) plus_array[i][j] = i+j;
             initialized = 1;  /* make sure we 
                                  initialize it only once! */
             }
           
           /* x and y must be within the table bounds */
           return plus_array[x][y]; 
           }
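Note, however, that the array and function implementations differ in time and space costs: the table needs MAX_X * MAX_Y entries and covers only a bounded range of arguments, while the function computes each result on demand. Also, a function may have side effects, which an array lookup does not, so the two are not the same.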

Power set

Let S = {a,b}. The power set of S, written p(S), is the set of all the sets that can be formed with elements of S. So, for instance
   p(S) = { {}, {a}, {b}, {a,b} }
Observe that
   #p(S) = 2 ** #S
The power set type corresponds to the set type (set of) in Pascal.
  { Pascal example }
  type rgb_type = set of (red, green, blue);
                                   (* rgb_type = p({red,green,blue}) *)
Sets are very useful, but difficult to implement efficiently in general. Pascal implementations typically represent a set as a bit vector with one bit per possible element, so for implementation reasons many of them limit the base type of a set to a small number of elements (e.g., the number of bits in a machine word).

Type Checking

Why must types be checked?
  • To prevent nonsensical operations, such as adding a string to a record, the types of the operands are checked.
  • Type errors are a common kind of programmer error.
      
       /* C example */
     
       void swap(int *x, int *y) {
          ...
         }
     
       int a, b;
       swap(a, b);   /* Type mismatch error! Will C 
                        compiler complain? */
    
When are types checked (static vs. dynamic)?
  • compile-time - static typing. The type of each variable is known from its declaration (or is inferred). A variable can only take on values of the given type. Conversions between types may require an explicit cast. No run-time overhead for type checking.
  • run-time - dynamic typing (only values have types). Variables can hold values of any type. Very flexible, but type checks and conversions must be done at run-time, which is slower.
      
       # Perl example
     
       $x = "Hello World\n";    # x has a value of string type
       $x = 3;                  # x has a value of integer type
       $z = "5";                # z has a value of string type
    
       $x = $z + $x;            # convert string type for z 
                                # into integer
     
How are types checked, that is, when are two types compatible?
  • name equivalence - check to make sure types are of exactly the same name
       typedef int new_int;
     
       int z;
       new_int y;
       
       z = y;   /* not the same type name, so fails a strict name-equivalence check!
                   (C itself treats a typedef name as a synonym and accepts this) */
  • structural equivalence - T = T' if and only if T and T' have the same set of values. Intuitively, check to make sure the types have the same structure only
       typedef int new_int;
    
       typedef struct {
         int value;
       } simple_type;
    
       typedef struct {
         int value;
       } another_type;
    
       typedef struct {
         int value;
         char kind;
       } a_third_type;
     
       int z;
       new_int y;
       simple_type silly;
       another_type kinda_silly;
       a_third_type really_silly;
       
       z = y;  /* OK by structure, not by name */
       silly = kinda_silly;  /* OK by structure, not by name (C rejects it) */
       silly = really_silly;  /* not the same structure */
  • coercion - finally, a value of one type may be converted (coerced) to another type so that the types match. The coercion could be implicit or explicit. C example.
       char c;
       int i = 23;
       char intToChar(int);
      
       c = i;             /* Implicit coercion: C converts the int to 
                             char automatically (possibly losing 
                             information if i does not fit) */
       c = (char) i;      /* Explicit coercion using a cast */
       c = intToChar(i);  /* Explicit coercion using a function */
    
Another issue is that not all type errors can be detected statically, even in languages that do static type checking! Consider the following example from C.
   typedef union {
      int i;
      char c;
   } a_type;

   a_type x, y;
   x.c = 'a';    /* x now holds a char */
   y.i = 2;
   y.i += x.i;   /* type error! reads x's char as an int, 
                    but the compiler cannot detect it */
Strong typing - a language is strongly typed if all type errors are detected (at compile-time or at run-time); otherwise the language is said to have weak typing.
  • C++ and C - weak typing because of union data type
  • C#, Java - strong typing (but allows explicit casts)
  • Scheme, PostScript - strong typing, but binding is dynamic, no explicit declarations

Type Completeness

  • first-class values - a value that can be evaluated, assigned, passed as an argument, returned as a result, and used as a component of composite values (e.g., integers and truth values in Pascal; Pascal records and arrays fall short because a function cannot return them).
  • second-class values - a value whose use is restricted (e.g., arrays in C, function abstractions in Pascal).
Type Completeness Principle - No operations should be arbitrarily restricted in the types of values involved (i.e., all values should be first-class values). For example, in Pascal a function cannot return a string, set, or record, which violates the type completeness principle.

Storage Binding

  • Allocation - process of binding storage to a name
  • Deallocation - process of breaking the binding between storage and a name
  • Lifetime - allocation to deallocation time, bind to unbind time

Storage Model

There is a storage model that supports all of the storage bindings.
  • Calls to malloc (C) or new (Java and C++) allocate space in the heap.
  • Static storage is for statically allocated variables
  • The stack is for local variables and parameters.
  • function call - push stack frame
  • function exit - pop stack frame

Kinds of Storage Bindings

  • static
    • lifetime is entire run-time
    • allocated, bound when program starts
    • deallocated, unbound when program finishes
  • stack-dynamic
    • Elaboration or evaluation of declaration produces a binding
    • Allocation - when block is "entered"
    • Deallocation - when block is "exited"
    • lifetime - During block
    • local variables and parameters are stack-dynamic variables
  • explicit heap-dynamic
    • In C++, when we create a new object with new (which calls a constructor), an explicit heap-dynamic binding is done
             Person *p = new Person(...);  
                           /* p points to storage explicitly
                              allocated in the heap */
         
    • Allocation - must be explicit, e.g., constructor call or malloc() in C
    • Deallocation - can be implicit or explicit, e.g., free() in C or delete in C++ (explicit), garbage collection in Java (implicit)
    • Lifetime - from allocation to deallocation
  • implicit heap-dynamic
    • In some language implementations, the run-time system manages the heap and implicitly allocates heap memory when needed. For example in Scheme
           (list 1 2 3 4) 
         
      the list '(1 2 3 4) is allocated implicitly in the heap, and deallocated implicitly as well!

Name Spaces and Scopes

The scope of a name is the part of a program in which the name is visible.
  • scope - range of a name (usually a block)
  • a name can be visible or hidden, a name is visible if it can be referenced (inner name spaces can hide names in outer name spaces)
  • non-local names - the name is visible in the name space, but not declared in the name space
Name spaces
  • flat name space - in early computer languages such as COBOL, there is only one name space in a program so all names are global
  • modules, blocks, functions, or procedures create a "hierarchical" name space in which name spaces can be nested.
  • scope rules govern which name space a name belongs to
  • the lifetime and scope are not always the same (consider a local static variable in C).
  • C has a block-structured namespace
       +---------------------+----global namespace
       |int x;               |
       |                     |
       |main (...) {         |
       |  +----------------+------main namespace
       |  |int x;          | |
       |  |  {             | |
       |  |  +-----------+--------block inside main namespace
       |  |  |int x;     | | |
       |  |  |           | | |
       |  |  +-----------+ | |
       |  |  }             | |
       |  |  ...           | |
       |  +----------------+ |
       |  }                  |
       |                     |
       |foo  (...) {         |
       |  +----------------+------foo namespace
       |  |int x;          | |
       |  |  ...           | |
       |  +----------------+ |
       |  }                  |
       +---------------------+
  • referencing environment - Which names are visible at a given point in the code. The referencing environment in the body of foo consists of
       x in foo
       main
       foo
    
    The x in the global namespace is hidden by the x in foo.

Static vs. dynamic scoping

In static scoping, the scope of a name can be determined at compile-time; C has static scoping. Does Scheme have static scoping?
   ; The following program determines if Scheme has static scoping
   ; for parameters

   (define y 0)                ; a global y

   (define (foo) (set! y 1))   ; when foo is called, it assigns 1 to
                               ; whichever y is visible to it

   (define (bar y)             ; does y have static scope?
     (begin
       (foo)
       y))                     ; bar will return the value of y

   ; if names are statically scoped, foo changes only the global y,
   ; so 3 will be displayed; if they are dynamically scoped, foo
   ; changes bar's y, so 1 will be displayed
   (display (bar 3))
Scheme has static (lexical) scoping, so 3 is displayed. (A top-level name is looked up when the code runs, so redefining a global name affects every function that uses it, but that is late binding of the global environment, not dynamic scoping.) In dynamic scoping, the scope is determined at run-time. PostScript has dynamic scoping. Let's do in PostScript the same thing that we did above in Scheme.
   /foo {/y 1 def} def

   /bar {
        1 dict begin
           /y exch def
           foo
           y
        end
        } def

   3 bar
If 3 is left on the stack, PostScript is statically scoped, but if 1 is left on the stack, then PostScript is dynamically scoped (the definition of /y in foo replaces the /y in bar's dictionary). PostScript has dynamic scoping, so 1 will be left on the stack.

Named constants

A named constant is a name that is bound to a value only once; the binding cannot change during its lifetime.
  • C - traditionally done by the macro pre-processor
      #define SIZE 10  /* only compile-time expressions allowed */
      int x[SIZE]; 
      
  • C++ - reserved word const
      const int size = 10;   /* no assignments are allowed to size */
                             /* the initializer may be a run-time expression,
                                but then size cannot be an array bound */
      int x[size];
      
  • Java - reserved word final
      static final int size = z;  /* cannot reassign size, value can
                                      be run-time expression */
      int[] x = new int[size];
      
  • Scheme - no constants, we can rebind (redefine) any name

Source of Information

These lecture notes are based on Chapter 5 in "Concepts of Programming Languages, 6th ed." by Robert Sebesta and Chapter 2 in "Programming Language Concepts and Paradigms" by David Watt.
                                                                                                                                                                                                                                                                                                                                             
  (c) 2003 Curtis Dyreson, (c) 2004 Carl H. Hauser           E-mail questions or comments to Prof. Carl Hauser