Introduction
There are many names in a typical program.
There are names for variables, types, procedures and functions, classes,
constants, libraries, monitors, etc.
Names are for human readers of programs. Good names help understanding;
bad names hinder it.
Can you guess what the following function does?
int factorial(int x, int sum) {
return x + sum * previous;
}
How about if we use different names?
int next_height(int height, int velocity) {
return height + velocity * DT;
}
Some languages impose restrictions on names.
-
C - A name must start with a letter or an underscore
-
Fortran 77 - Names must be six characters or fewer
-
C89 - Only the first 31 characters of an internal name are guaranteed to be significant
-
C - A name cannot have an embedded space. The following
is not a name.
foo bar
In Fortran the space would be removed yielding just foobar.
In C it used to be common to add an underscore,
foo_bar
but now it is more conventional to use capitalization.
fooBar
-
C++ - Case matters, so fooBar and FooBar would be different names.
-
C - Alphanumeric characters (and underscore) only in a
name, e.g., foo,Bar is not one name. The reason for this restriction
is that, in languages that are whitespace-free, the lexical analyzer
would have a difficult time determining where a name ends.
-
C - A name cannot be a reserved word, so we couldn't have a
variable named 'if'. Closely related to a reserved word is a keyword.
Keywords have context-sensitive
meanings, so a keyword can also be used as a name in a different
context. For example, in Fortran we could declare the following variable,
named Integer, to be of type Real.
Real Integer
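The C restrictions listed above can be sketched as a small checker. This is only an illustration: the function name is_valid_c_name and the table SAMPLE_RESERVED are hypothetical, the table holds just a few of C's reserved words, and length limits are not checked.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A small sample of C reserved words; the full list is longer. */
static const char *const SAMPLE_RESERVED[] = {
    "if", "else", "for", "while", "int", "return"
};

/* Returns 1 if s could be used as a name in C, 0 otherwise. */
int is_valid_c_name(const char *s) {
    size_t i;
    /* A name must start with a letter or an underscore. */
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_')) return 0;
    /* The rest must be alphanumeric characters or underscores. */
    for (i = 1; s[i] != '\0'; i++) {
        if (!(isalnum((unsigned char)s[i]) || s[i] == '_')) return 0;
    }
    /* A name cannot be a reserved word. */
    for (i = 0; i < sizeof SAMPLE_RESERVED / sizeof SAMPLE_RESERVED[0]; i++) {
        if (strcmp(s, SAMPLE_RESERVED[i]) == 0) return 0;
    }
    return 1;
}
```

With this sketch, foo_bar and fooBar are accepted, while foo bar, foo,Bar and if are rejected.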
Variables
In imperative programming languages, each variable has several
characteristics.
-
Name - Name of the variable.
-
Address - Location in memory of the value of the variable. Sometimes
referred to as the l-value (the value used in the left-hand side
of an assignment).
-
Aliases - Use of pointers could create several names associated with
the same address.
-
Type - The type specifies the interpretation of bits at the storage
location and a set of possible operations permitted for that type.
Types are described in more detail below.
-
Value - The sequence of bits at the location, also called the r-value.
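The characteristics above can be seen together in a few lines of C. The sketch below is illustrative only; the function name write_through_alias is hypothetical.

```c
/* Returns the final value of a local variable after it has been
   modified through a second name (an alias) for the same storage. */
int write_through_alias(int initial, int delta) {
    int value = initial;  /* Name: value, Type: int, Value (r-value): initial */
    int *alias = &value;  /* &value is the Address (l-value) of value */
    *alias += delta;      /* *alias and value now name the same location */
    return value;         /* the change made through the alias is visible here */
}
```

For example, write_through_alias(3, 4) returns 7, because the assignment through alias changes the storage that value names.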
Binding
Binding
-
Associates a name and a property (e.g., the Type).
-
Can occur at different times (i.e., what is the binding time?)
-
compile-time
-
link-time
-
load-time
-
run-time
For example, consider the bindings in the following fragment of a
C program.
int fooBar;
fooBar++;
The Type, int,
is bound to the name, fooBar, at compile-time.
The compiler generates intermediate code to interpret fooBar's value
as an integer (stored in two's complement, with a sign bit).
The compiler also checks that only operations on integers are applied
to fooBar.
The Value of fooBar is bound at run-time.
The Address of fooBar might be bound at run-time,
load-time, or link-time.
Static binding
-
binding occurs before run-time
-
remains unchanged during execution
The Type binding in C is an example of a static binding.
Dynamic binding
-
first happens at run-time
-
can change during execution
The following PostScript code dynamically binds a name.
/x exch def
The name is bound at run-time to a Type, Address, and Value.
Type Bindings
In static type binding,
a declaration is elaborated to produce a
type binding. The declaration could be explicit or implicit.
-
Explicit - The following C example is an explicit type binding.
int fooBar;
-
Implicit - Fortran implicitly declares variables whose names start with I through N to
be integers rather than reals (hence, in C code, loop variables are
by convention usually i, j, or k).
x = 3 ! Implicit declaration of x as a real
i = x + 3 ! Implicit declaration of i as an integer
Perl has a different convention for implicit declarations.
$x = 3; # Declare a scalar variable, x
@x = (1, 3); # Declare a list variable, x
%x = (1 => 3); # Declare an associative array variable, x
In dynamic type binding, the type binding can change during
run-time. Another way to think of this is that the values have types,
rather than the variables.
Type Inference
In type inference, the type is inferred rather than declared.
-
Type need not be declared
-
Infer type at compile-time
-
ML example: the type of the square function is inferred from its
definition rather than declared. In Standard ML the overloaded operator *
defaults to int when nothing says otherwise, so square is inferred to have
type int -> int. ML also has type
declarations, so we could specify a type.
fun square(x) = x * x (* type int -> int is inferred
from the definition *)
...
fun square2(x):real = x * x (* type returned is real *)
Imagine if C inferred types in a similar way, here from the first call.
square (x) {
return x * x;
}
/* At this point, we don't know the type for x or
for the square function */
int y = 4;
y = square(y);
/* At this point, we can infer that x is of type int,
and square too */
char c = 'c';
c = square(c); /* type error!! */
Meaning of a Type
A value is anything that may be evaluated, stored, incorporated
in a data structure, passed as an argument, returned, etc.
For example, what kinds of values exist in, say C?
- primitive values (int, char, float etc.)
- composite values (int array[30], structs, unions)
- pointers/addresses (int *, &x, int *function())
So what is a type?
- a set of values, e.g., the type integer might correspond to
the set {..., -1, 0, 1, ...}
NOTATION: v is a value of type T means
that v is in the set of values described by T.
- exhibit uniform behavior under some set of operations like
addition, multiplication, comparison, e.g.,
the Type {13, orange, 'z'} would be an unusual type since it might
be difficult to define operations such as a comparison of values
(is orange less than 'z'?)
- some examples
boolean = {false, true}
integer = {... -1, 0, 1, ...}
real = {... -1.0, ..., 0.0, ..., 1.0, ...}
character = {'\0', ..., 'a', ..., 'z', ... '\255'}
enumerated = {jan, feb, ..., dec}
such as from a Pascal type definition like
`type month = (jan,...dec);'
subrange = {28, 29, 30, 31}
such as from a Pascal definition like
`var day : 28..31;'
- NOTATION: We will indicate the cardinality of a set S as #S. The
cardinality is just the number of elements in a set.
#boolean = 2
#integer = MAXINT - MININT + 1 (in Pascal)
#character = 256
#{jan, feb, ..., dec} = 12
#{28, 29, 30, 31} = 4
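When counting the values of an integer range, both endpoints are included, so the count is the difference plus one. A minimal sketch in C, where the function name count_subrange_values is hypothetical and lo <= hi is assumed:

```c
/* Number of values in the inclusive range lo..hi.
   Assumes lo <= hi and that hi - lo + 1 does not overflow long. */
long count_subrange_values(long lo, long hi) {
    return hi - lo + 1;  /* the + 1 counts both endpoints */
}
```

The subrange 28..31 has count_subrange_values(28, 31) = 4 values, and character codes 0..255 give 256 values.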
There are some types supported at the `hardware' level, such as in C,
char - one byte, typically 8 bits (most computers are byte-addressable),
long - at least 32 bits,
and long long - at least 64 bits.
Some operations on these entities are
hardware operations on `uninterpreted' bits, like logical-or.
Some groups of bits, however, have `interpretations'. For example,
a signed integer is typically 32 bits long, but 1 bit is the sign bit.
If the sign bit is 'on' then the number is negative; if it
is 'off', the number is zero or positive. In two's complement, 32 bits can
represent a number in the
range -(2**31) to 2**31 - 1.
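The difference between uninterpreted bits and a signed interpretation can be sketched in C. The function names below are hypothetical. The sketch relies on int32_t, which the C standard defines as a two's complement type without padding bits where it exists.

```c
#include <stdint.h>
#include <string.h>

/* Returns the top bit of a 32-bit pattern: 1 means 'on', 0 means 'off'. */
int sign_bit(uint32_t bits) {
    return (int)(bits >> 31);
}

/* Reinterprets the same 32 bits as a two's complement signed integer.
   memcpy copies the bits unchanged; only the interpretation differs. */
int32_t as_signed(uint32_t bits) {
    int32_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

The pattern 0x80000000 has its sign bit on and is read as -(2**31), while 0x7FFFFFFF has it off and is read as 2**31 - 1.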
Other encodings include
floating point, packed decimal, 4 bit digits, and character codes such as
- EBCDIC - 8 bit characters, IBM generated and dying a slow death
- ASCII - 7 bit characters, a widely used standard (8 bit extensions are common)
- Unicode - originally 16 bit characters, supports non-English characters
A composite type captures relationships between various primitive types.
There are several common composite types available in a variety of
languages.
Cartesian product
Let S = {a,b} and T = {c, d, e}. The Cartesian product of
S and T, written S X T, is all possible combinations of pairs of
elements from the sets.
So, for instance
S X T = {(a,c), (b,c), (a,d), (b,d), (a,e), (b,e)}.
Observe that
#(S X T) = #S * #T
The Cartesian product type corresponds to the record type in Pascal
or the struct type in C.
/* C example */
typedef struct {
int x;
int y;
} point_type; /* point_type = integer X integer */
typedef struct {
point_type origin;
float radius;
} circle_type; /* circle_type =
(integer X integer) X real */
There exists a useful special case of a Cartesian product, which is a type
that is the set containing only an empty tuple, {()}, called the unit
type.
In the language ML, the unit type is indicated
by the keyword unit, in C and Algol, it is void.
Disjoint union
Let S = {a,b} and T = {c, d, e}. The disjoint union of
S and T, written S + T, is the union of the two sets, but each element is
``tagged'' with the set from which it came.
So, for instance
S + T = {S:a, S:b, T:c, T:d, T:e} where the tags are S: and T:
Observe that
#(S + T) = #S + #T
Also note that S + T is not the same as T + S.
The disjoint union type corresponds to the variant
record type in Pascal or the union type in C.
/* C example */
typedef union {
int exact;
float approx;
} number_type; /* number_type = integer + real */
typedef struct {
number_type x;
number_type y;
} point_type; /* point_type = (integer + real)
X (integer + real) */
Note that some languages, like Pascal, have special syntax for querying the
tag to distinguish which type of value is available, but other languages,
such as C, do not.
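Because a C union carries no tag, a program that needs one has to store it explicitly. The following is a minimal sketch of that idiom; the names number_tag, tagged_number, make_exact, make_approx and number_as_double are hypothetical.

```c
typedef enum { EXACT, APPROX } number_tag;

typedef struct {
    number_tag tag;   /* records which member of the union is valid */
    union {
        int exact;
        float approx;
    } value;
} tagged_number;      /* tagged_number = integer + real, with the tag stored */

tagged_number make_exact(int n) {
    tagged_number t;
    t.tag = EXACT;
    t.value.exact = n;
    return t;
}

tagged_number make_approx(float f) {
    tagged_number t;
    t.tag = APPROX;
    t.value.approx = f;
    return t;
}

/* Consults the tag before reading, so the bits are never misinterpreted. */
double number_as_double(tagged_number n) {
    return n.tag == EXACT ? (double)n.value.exact : (double)n.value.approx;
}
```

Nothing in the language forces a reader of the union to check the tag first; that discipline is left to the programmer.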
Mappings (or functions)
Let S = {a,b} and T = {c, d, e}. The mapping from
S to T, written S --> T, is all possible functions from S into T.
In other words
S --> T = { m | if x in S then m(x) in T }
So, for instance
S --> T = { {a->c,b->c}, {a->c,b->d}, {a->c,b->e},
{a->d,b->c}, {a->d,b->d}, {a->d,b->e},
{a->e,b->c}, {a->e,b->d}, {a->e,b->e} }
Observe that
#(S --> T) = #T ** #S (#T raised to the #S power)
The mapping type corresponds to arrays. Let's look at a Pascal example.
{ Pascal example }
type color_type = (red, green, blue);
pixel_type = array[color_type] of 0..1;
The set of values for pixel_type are
pixel_type =
color_type --> 0..1 =
{ {red->0,green->0,blue->0},
{red->0,green->0,blue->1},
{red->0,green->1,blue->0},
{red->0,green->1,blue->1},
{red->1,green->0,blue->1},
{red->1,green->0,blue->0},
{red->1,green->1,blue->1},
{red->1,green->1,blue->0} }
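The cardinality rule #(S --> T) = #T ** #S can be checked with a few lines of C. The function name count_mappings is hypothetical, and the sketch assumes the result fits in an unsigned long.

```c
/* Number of distinct mappings from a set with s_size elements into a
   set with t_size elements: each of the s_size inputs independently
   picks one of t_size outputs, giving t_size ** s_size. */
unsigned long count_mappings(unsigned s_size, unsigned t_size) {
    unsigned long result = 1;
    unsigned i;
    for (i = 0; i < s_size; i++) {
        result *= t_size;
    }
    return result;
}
```

For S = {a,b} and T = {c,d,e}, count_mappings(2, 3) gives the 9 mappings listed earlier, and for pixel_type, count_mappings(3, 2) gives the 8 values listed above.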
There is a close correspondence between arrays and programming language
functions. Consider the following C function.
int plus(int x, int y) { return x + y; }
We can think of it as a table
INPUTS | OUTPUT
x y |
--------------------------
0 0 | 0
1 0 | 1
0 1 | 1
1 1 | 2
1 2 | 3
....
so we can rewrite the function using an array as follows.
#define MAX_X 100 /* assumed bounds: 0 <= x < MAX_X, 0 <= y < MAX_Y */
#define MAX_Y 100
int plus(int x, int y) {
static int plus_array[MAX_X][MAX_Y];
static int initialized = 0;
/* Initialize array */
if (!initialized) {
int i, j;
for (i = 0; i < MAX_X; i++)
for (j = 0; j < MAX_Y; j++) plus_array[i][j] = i+j;
initialized = 1; /* make sure we
initialize it only once! */
}
return plus_array[x][y];
}
Note however there is a difference in the time and space costs of the
array vs. the function implementation. Also, functions may have other
side effects, so the two are not the same.
Power set
Let S = {a,b}. The power set of
S, written p(S), is the set of all the sets that can be formed with elements
of S. So, for instance
p(S) = { {}, {a}, {b}, {a,b} }
Observe that
#p(S) = 2 ** #S
The power set type corresponds to the set of
type in Pascal.
{ Pascal example }
type rgb_type = set of (red, green, blue);
{ rgb_type = p({red,green,blue}) }
Sets are very useful, but difficult to implement. For implementation
reasons, the Pascal set type is limited to sets with fewer than
word-size (e.g., 32) elements.
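One common way to implement small sets, and one reason for a word-size limit, is to use one bit of a machine word per possible element. The sketch below shows that technique in C; it is not a description of any particular Pascal compiler, and the names color, rgb_set, set_add, set_contains and set_union are hypothetical.

```c
typedef enum { RED, GREEN, BLUE } color;  /* RED = 0, GREEN = 1, BLUE = 2 */
typedef unsigned int rgb_set;             /* bit k is on if element k is in the set */

/* Returns s with element c added. */
rgb_set set_add(rgb_set s, color c) {
    return s | (1u << c);
}

/* Returns 1 if c is a member of s, 0 otherwise. */
int set_contains(rgb_set s, color c) {
    return (int)((s >> c) & 1u);
}

/* Set union is a single bitwise or. */
rgb_set set_union(rgb_set a, rgb_set b) {
    return a | b;
}
```

With three possible elements there are 2 ** 3 = 8 distinct bit patterns, matching #p(S) = 2 ** #S. The representation cannot hold more elements than the word has bits.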
Type Checking
Why must types be checked?
- To ensure that nonsensical operations are prevented, type checks
on operands may be performed.
- Type errors are a common kind of programmer error.
/* C example */
void swap(int *x, int *y) {
...
}
int a, b;
swap(a, b); /* Type mismatch error! Will C
compiler complain? */
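For comparison, here is a correctly typed version of the call. The body of swap is elided in the example above, so the body below is only an assumed, conventional implementation. The wrapper name swapped_first is hypothetical.

```c
/* Exchanges the two integers that x and y point to. */
void swap(int *x, int *y) {
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

/* Calls swap with arguments of the expected type, int *. */
int swapped_first(int a, int b) {
    swap(&a, &b);  /* &a and &b have type int *, matching the parameters */
    return a;      /* a now holds the original value of b */
}
```

Passing a and b themselves would hand int values to parameters of type int *, which is the mismatch shown above.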
When are types checked (static vs. dynamic)?
How are types checked, that is what is type compatibility?
- name equivalence - check to make sure types are of exactly the same
name
typedef int new_int;
int z;
new_int y;
z = y; /* not the same type name, so fails type check! */
- structural equivalence - T = T' if and only if T
and T' have the same set of values.
Intuitively, check only that the variables have the
same structure
typedef int new_int;
typedef struct {
int value;
} simple_type;
typedef struct {
int value;
} another_type;
typedef struct {
int value;
char kind;
} a_third_type;
int z;
new_int y;
simple_type silly;
another_type kinda_silly;
a_third_type really_silly;
z = y; /* OK by structure, not by name */
silly = kinda_silly; /* OK by structure, not by name */
silly = really_silly; /* not the same structure */
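The fragments above illustrate the two rules in general. C itself mixes them: a typedef only introduces another name for an existing type, while two separately declared structs are distinct types even when their members match. A minimal sketch of the consequence, with hypothetical function names:

```c
typedef int new_int;
typedef struct { int value; } simple_type;
typedef struct { int value; } another_type;

/* new_int is just another name for int, so C accepts this assignment. */
int copy_new_int(new_int y) {
    int z = y;
    return z;
}

/* simple_type and another_type are different types in C, so a direct
   assignment between them is rejected by the compiler. The member has
   to be copied explicitly instead. */
simple_type simple_from_another(another_type a) {
    simple_type s;
    s.value = a.value;
    return s;
}
```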
- coercion - finally, some types are coerced
to ensure equivalence. The coercion could be implicit or explicit.
C example.
char c;
int i = 23;
char intToChar(int);
c = i; /* int assigned to char: a type error unless the
type is coerced (C coerces implicitly here) */
c = intToChar(i); /* Coerce type int to type char
using a function */
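The function intToChar is only declared above. One way an explicit coercion function might look is sketched below, with a range check so that values that do not fit in a char are reported rather than silently changed. The name int_to_char_checked and its interface are hypothetical.

```c
#include <limits.h>

/* Coerces i to char if it fits. Returns 1 and stores the result in *out
   on success; returns 0 and leaves *out unchanged otherwise. */
int int_to_char_checked(int i, char *out) {
    if (i < CHAR_MIN || i > CHAR_MAX) {
        return 0;       /* value cannot be represented as a char */
    }
    *out = (char)i;     /* explicit coercion, written as a cast */
    return 1;
}
```

The value 23 from the example fits in a char, so the call succeeds, while a value such as 100000 is rejected.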
Another issue is that not all type errors can be detected statically,
even in languages that do static type checking! Consider the
following example from C.
typedef union {
int i;
char c;
} a_type;
a_type x, y;
x.c = 'a';
y.i = 2;
y.i += x.c; /* type error! */
Strong typing - all type errors are caught! Otherwise a language
is said to have weak typing.
-
C++ and C - weak typing because of union data type
-
C#, Java - strong typing (but allows explicit casts)
-
Scheme, PostScript - strong typing, but binding is dynamic,
no explicit declarations
Type Completeness
- first-class values - a value that can be evaluated, assigned,
passed as an argument, used as a component of composite values
(integers, records?, arrays?, truth values in Pascal).
- second-class values - a value that is restricted (e.g.,
arrays in C, function abstractions in Pascal).
Type Completeness Principle - No operations should be arbitrarily
restricted in the types of values involved (i.e., all values should
be first-class values).
Pascal: A function cannot return a string, set, or record.
This is a violation of type completeness principle.
Storage Binding
-
Allocation - process of binding storage to a name
-
Deallocation - process of breaking the binding between storage and a name
-
Lifetime - allocation to deallocation time, bind to unbind time
Storage Model
There is a storage model that supports all of the storage bindings.
- visual depiction of run-time storage
- Calls to malloc (C) or new (Java and C++) allocate space in the heap.
- Static storage is for statically allocated variables
- The stack is for local variables and parameters.
- function call - push stack frame
- function exit - pop stack frame
- visual depiction of stack calls
Kinds of Storage Bindings
-
static
-
lifetime is entire run-time
-
allocated, bound when program starts
-
deallocated, unbound when program finishes
-
stack-dynamic
-
Elaboration or evaluation of declaration produces a binding
-
Allocation - when block is "entered"
-
Deallocation - when block is "exited"
-
lifetime - During block
-
local variables and parameters are stack-dynamic variables
-
explicit heap-dynamic
-
implicit heap-dynamic
-
In some language implementations, the run-time system manages the heap
and implicitly allocates heap memory when needed.
For example in Scheme
(list 1 2 3 4)
the list '(1 2 3 4) is allocated implicitly in the heap,
and deallocated implicitly as well!
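Three of these kinds of storage binding can be shown side by side in C, which has no implicit heap-dynamic storage. This is a minimal sketch; the function names are hypothetical and make_counting_array assumes n > 0.

```c
#include <stdlib.h>

/* static: counter is allocated once, before the first call, and keeps
   its value for the entire run of the program. */
int next_id(void) {
    static int counter = 0;
    counter++;
    return counter;
}

/* stack-dynamic: total and i are allocated when sum_to is entered and
   deallocated when it returns. */
int sum_to(int n) {
    int total = 0;
    int i;
    for (i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* explicit heap-dynamic: the array is allocated by malloc and lives
   until the caller passes it to free. Returns NULL if allocation fails. */
int *make_counting_array(int n) {
    int *a = malloc((size_t)n * sizeof *a);
    int i;
    if (a == NULL) {
        return NULL;
    }
    for (i = 0; i < n; i++) {
        a[i] = i;
    }
    return a;
}
```

Successive calls to next_id return 1, 2, 3, and so on, because the binding of counter to its storage outlives each call. The array from make_counting_array outlives the call that created it, until free is called.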
Name Spaces and Scopes
Scope refers to where a name is visible in a program.
- scope - range of a name (usually a block)
- a name can be visible or hidden, a name is visible if it can be referenced
(inner name spaces can hide names in outer name spaces)
- non-local names - the name is visible in the name space, but not
declared in the name space
Name spaces
Static vs. dynamic scoping
In static scoping, the scope can be determined at compile-time.
C has static scoping. Does Scheme have static scoping?
; The following program determines if Scheme has static scoping
; for parameters
(define (foo) (define y 1)); when foo is called, it will bind y
(define (bar y) (begin ; does y have static scope?
(foo)
y ; bar will return the value of y
))
; if local names are statically scoped, then 3 will be displayed,
; otherwise y is dynamically scoped so 1 will be displayed
(display (bar 3))
Scheme has static scoping for function parameters, but dynamic
scoping in the global name space!
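For comparison, the same experiment can be written in C, where every name is statically scoped. In this sketch the helper file_scope_y is hypothetical and is there only to make the outer y observable.

```c
static int y = 0;        /* y in the outer (file) name space */

/* The y used here is resolved at compile time to the file-scope y,
   no matter which function calls foo. */
void foo(void) {
    y = 1;
}

/* The parameter y hides the file-scope y inside bar. */
int bar(int y) {
    foo();               /* changes the file-scope y, not the parameter */
    return y;            /* still the value that was passed in */
}

/* Reports the current value of the file-scope y. */
int file_scope_y(void) {
    return y;
}
```

Calling bar(3) returns 3, and afterwards the file-scope y is 1, which is the statically scoped outcome described above.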
In dynamic scoping, the scope is determined at run-time.
PostScript has dynamic scoping. Let's do in PostScript
the same thing that we did above in Scheme.
/foo {/y 1 def} def
/bar {
1 dict begin
/y exch def
foo
y
end
} def
3 bar
If 3 is left on the stack, then PostScript is statically scoped, but
if 1 is left on the stack, then PostScript is dynamically scoped
(the binding of /y in foo binds the
/y in bar).
PostScript has dynamic scoping, so 1 will be left on the stack.
Named constants
A constant is a name that is bound once, and can be bound only
once, during its lifetime.
-
C - done by the macro pre-processor
#define SIZE 10 /* only compile-time expressions allowed */
int x[SIZE];
-
C++ - reserved word const
const int size = 10; /* no assignments are allowed to size */
/* value can be run-time expression */
int x[size];
-
Java - reserved word final
static final int size = z; /* cannot reassign size, value can
be run-time expression */
int[] x = new int[size];
-
Scheme - no constants, we can rebind (redefine) any name
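In C, besides the pre-processor, an enumeration constant is another way to get a named compile-time constant. The sketch below uses hypothetical names (SIZE as an enum constant, table, table_length). It relies on the fact that an enumeration constant is a constant expression in C, so it can give the size of an array at file scope.

```c
enum { SIZE = 10 };      /* bound once, at compile time; cannot be assigned to */

static int table[SIZE];  /* allowed: SIZE is a constant expression */

/* Number of elements in table, computed from the array itself. */
int table_length(void) {
    return (int)(sizeof table / sizeof table[0]);
}
```

Unlike a macro, SIZE here is a name the compiler knows about, so it obeys the normal scope rules.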
Source of Information
These lecture notes are based on Chapter 5 in "Programming Languages, 6ed"
by Robert Sebesta and Chapter 2 in
"Programming Language Concepts and Paradigms" by David Watt.