Thought question
Introduction
Languages are described by a

Formal Descriptions of Syntax
Formally, a language, represented as L, is a set of strings, usually called sentences, over some alphabet, represented as S. A grammar, G, is a set of rules that describes the legal sentences of a language. From a grammar we can automatically construct the following machines:
To simplify the specification of a grammar, the syntax of a programming language is usually specified with respect to an alphabet of token classes. (Contrast this with CptS 317, where alphabets are typically collections of single characters.) A token is a sequence of characters that is treated as a unit in subsequent processing. Basing the language description on tokens rather than individual characters simplifies both the description of a language and its implementation. Example tokens (literally, what appears in the program) are:
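As a rough sketch of how characters get grouped into tokens and token classes, here is a tiny tokenizer built on Python's re module. The class names follow the ones used in these notes; the regular expressions and the function name tokenize are assumptions made for illustration, not part of any particular compiler.

```python
import re

# Hypothetical token classes (names taken from the notes, patterns assumed).
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("assignment op", r"="),
    ("plus op", r"\+"),
    ("semicolon", r";"),
    ("whitespace", r"\s+"),
]

def tokenize(text):
    """Return a list of (token, class) pairs; raise ValueError on unknown input."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in TOKEN_SPEC:
            match = re.compile(pattern).match(text, pos)
            if match:
                if name != "whitespace":      # whitespace separates tokens, nothing more
                    tokens.append((match.group(), name))
                pos = match.end()
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
    return tokens

for token, token_class in tokenize("sum = x + y;"):
    print(f"{token!r}: {token_class}")
```

The grammar of the language can then be written over the six class names rather than over individual characters.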
sum = x + y;
The tokens are (from left to right): sum, =, x, +, y, ; and the corresponding classes are: identifier, assignment op, identifier, plus op, identifier, semicolon.

BNF
Work by Turing and Chomsky in the 1940s and 1950s identified four categories of languages of increasing power and complexity: regular, context-free, context-sensitive, and recursively enumerable. The syntax of a programming language is usually described by a context-free grammar. The first programming language to have a formally specified grammar was ALGOL 60. The formal description was in a "metalanguage" called Backus-Naur Form (BNF). A metalanguage is a language used to describe other languages. The components of BNF include the following.

Derivations
Let's look at how a grammar can be used to generate a sentence in the language. The process is called derivation. The idea is that if we can somehow derive the sentence from the start symbol, then the sentence is part of the language described by the grammar. Derivation proceeds by replacing a nonterminal with its body. Consider the following simple grammar.
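One way to write a small grammar down is as plain Python data. The rule set below is an assumption, inferred from the derivation trees later in this section (which use stmt_list, stmt, var and expr), and is not necessarily the exact grammar used in lecture. In particular, it assumes every variable is reached through var. The helper names terminals and to_bnf are hypothetical.

```python
# Assumed grammar: each nonterminal maps to a list of alternative bodies.
GRAMMAR = {
    "stmt_list": [["stmt"]],
    "stmt": [["var", "=", "expr"]],
    "expr": [["expr", "-", "expr"], ["expr", "*", "expr"], ["var"]],
    "var": [["X"], ["Y"]],
}
START = "stmt_list"

def terminals(grammar):
    """Symbols that appear in rule bodies but never on a left-hand side."""
    symbols = {s for bodies in grammar.values() for body in bodies for s in body}
    return symbols - set(grammar)

def to_bnf(grammar):
    """Render the grammar in BNF-style notation, one rule per line."""
    def show(symbol):
        return f"<{symbol}>" if symbol in grammar else symbol
    lines = []
    for head, bodies in grammar.items():
        alternatives = " | ".join(" ".join(show(s) for s in body) for body in bodies)
        lines.append(f"<{head}> ::= {alternatives}")
    return "\n".join(lines)

print(to_bnf(GRAMMAR))
print("terminals:", sorted(terminals(GRAMMAR)))
```

The keys of the dictionary are the nonterminals, START names the start symbol, and everything else that shows up in a body is a terminal.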
X = Y - Y * X.
Let's try to derive it.
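A leftmost derivation of this sentence can be replayed mechanically. The sketch below assumes the rules stmt_list -> stmt, stmt -> var = expr, expr -> expr - expr | expr * expr | var, and var -> X | Y (inferred from the derivation trees in this section, so treat them as an assumption). At each step it replaces the leftmost nonterminal with the body of the chosen rule; the function name derive is hypothetical.

```python
NONTERMINALS = {"stmt_list", "stmt", "expr", "var"}

def derive(start, steps):
    """Apply each (head, body) rule to the leftmost nonterminal; return all forms."""
    form = [start]
    forms = [form]
    for head, body in steps:
        # Locate the leftmost nonterminal in the current sentential form.
        i = next(k for k, symbol in enumerate(form) if symbol in NONTERMINALS)
        if form[i] != head:
            raise ValueError(f"leftmost nonterminal is {form[i]}, not {head}")
        form = form[:i] + body + form[i + 1:]
        forms.append(form)
    return forms

steps = [
    ("stmt_list", ["stmt"]),
    ("stmt", ["var", "=", "expr"]),
    ("var", ["X"]),
    ("expr", ["expr", "*", "expr"]),
    ("expr", ["expr", "-", "expr"]),
    ("expr", ["var"]),
    ("var", ["Y"]),
    ("expr", ["var"]),
    ("var", ["Y"]),
    ("expr", ["var"]),
    ("var", ["X"]),
]

for form in derive("stmt_list", steps):
    print(" ".join(form))
```

The last line printed contains only terminals, so the sentence belongs to the language of the assumed grammar.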

Derivation Tree
Each derivation creates a derivation tree.
       stmt_list
           |
         stmt
       /   |   \
    var    =    expr
     |         /  |  \
     X     expr   *   expr
          /  |  \      |
      expr   -   expr  X
       |          |
      var        var
       |          |
       Y          Y
A grammar is ambiguous if there are two or more derivation trees for some sentence. This grammar is ambiguous since there is more than one possible derivation tree for the sentence above. Here is a second derivation tree for the same sentence.
       stmt_list
           |
         stmt
       /   |   \
    var    =    expr
     |         /  |  \
     X     expr   -   expr
            |        /  |  \
            Y    expr   *   expr
                  |          |
                 var        var
                  |          |
                  Y          X
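The ambiguity can also be checked by brute force. The sketch below counts the distinct derivation trees for an expression, assuming the rules expr -> expr - expr | expr * expr | var and var -> X | Y (an assumption inferred from the trees above). The function name count_expr_trees is hypothetical. It tries every operator as the root of the tree and multiplies the counts for the two sides.

```python
from functools import lru_cache

def count_expr_trees(tokens):
    """Count derivation trees for tokens under expr -> expr op expr | var."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from expr.
        if hi - lo == 1:
            return 1 if tokens[lo] in ("X", "Y") else 0
        total = 0
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("-", "*"):     # try this operator as the root
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_expr_trees("Y - Y * X".split()))  # prints 2
```

Any count greater than one for some sentence is enough to show that the grammar is ambiguous.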

Parsing
Derivation starts with the start symbol and proceeds by replacing nonterminals. Parsing is the inverse process: starting with a string purportedly in the language, it attempts to find a derivation tree, which is now called a parse tree. For our purposes, informal approaches to parsing will be sufficient. Parsing is examined more rigorously in the Compilers course, CptS 452.

Relationship between Grammar, Associativity and Precedence
Specifying the right grammar for a language can help to control associativity and precedence.
Associativity refers to the "direction" in which (binary) operators associate. In mathematical notation, subtraction is left-associative, meaning that 7 - 3 - 4 is interpreted as (7 - 3) - 4 = 0 rather than 7 - (3 - 4) = 8, which has a very different meaning! Precedence refers to which operations are executed prior to others. Multiplication typically has higher precedence than addition and subtraction, meaning it should be done first, so 7 + 3 * 4 evaluates to 19 and not to 40. Most (but not all) programming languages respect these mathematical conventions.
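The effect of grouping is easy to check by writing each interpretation as an explicit tree and evaluating it from the leaves up. This is a minimal sketch; the nested-tuple encoding (left, op, right) and the function name evaluate are choices made here for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree):
    """Evaluate a tree that is either a number or a tuple (left, op, right)."""
    if isinstance(tree, tuple):
        left, op, right = tree
        return OPS[op](evaluate(left), evaluate(right))
    return tree

left_assoc = ((7, "-", 3), "-", 4)     # (7 - 3) - 4
right_assoc = (7, "-", (3, "-", 4))    # 7 - (3 - 4)
times_first = (7, "+", (3, "*", 4))    # 7 + (3 * 4)
plus_first = ((7, "+", 3), "*", 4)     # (7 + 3) * 4

print(evaluate(left_assoc), evaluate(right_assoc))   # prints 0 8
print(evaluate(times_first), evaluate(plus_first))   # prints 19 40
```

The same tokens give different answers depending only on the shape of the tree, which is why the grammar has to pin that shape down.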
A parse tree implicitly says which operations' results are input to other operations. For example, in the second parse tree given above, the multiplication feeds the subtraction, so if we assume X = 3 and Y = 4, then the result is 4 - (4 * 3) = -8.
Question: what is the result if we use the first parse tree instead?

WARNING: the following material is not in the book but you are responsible for it nevertheless!

Associativity and precedence can be specified in a grammar by choosing whether recursion is done on the left or right side of a rule, and by changing the order in which the grammar rules are derived. To specify precedence, the trick is to split the production where the precedence is ambiguous into two (or more) productions. Notice that this moves multiplication down the parse tree, so that a multiplication can never be the ancestor of a subtraction.

Question: what might we do to the grammar so that the result of a subtraction might be an operand of a multiplication? How would you do it in mathematical notation?

We can specify associativity in the grammar by giving a direction to the parse, that is, by recursing on only the left side (or right side, but not both) of an operation. Let's make subtraction left-associative and multiplication right-associative (just as an illustration).
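Here is a sketch of a hand-written parser for one grammar of this kind. The rules expr -> expr - term | term, term -> var * term | var, and var -> X | Y are an assumption about what such a grammar could look like, not necessarily the one from lecture. The names parse_expr, expr, term and var are hypothetical. The left recursion in expr is realized as a loop and the right recursion in term as a recursive call.

```python
def parse_expr(tokens):
    """Parse a token list into a nested tuple (left, op, right) or a variable name."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def var():
        nonlocal pos
        tok = peek()
        if tok not in ("X", "Y"):
            raise SyntaxError(f"expected a variable, found {tok!r}")
        pos += 1
        return tok

    def term():
        # term -> var * term | var  (right recursion: * associates to the right)
        nonlocal pos
        left = var()
        if peek() == "*":
            pos += 1
            return (left, "*", term())
        return left

    def expr():
        # expr -> expr - term | term  (left recursion: - associates to the left)
        nonlocal pos
        tree = term()
        while peek() == "-":
            pos += 1
            tree = (tree, "-", term())
        return tree

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse_expr("Y - Y * X".split()))
print(parse_expr("X - Y - X".split()))
print(parse_expr("X * Y * X".split()))
```

Because a term can only contain variables and multiplications, a subtraction can never end up underneath a multiplication in the tree this parser builds.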

Exercise: convince yourself that the above grammar gives multiplication precedence over subtraction, that subtraction associates to the left, and that multiplication associates to the right, by creating parse trees for several expressions.

End of warning

So now for a nasty little secret about programming languages and context-free grammars. The grammar for a PL typically does *not* specify the acceptable programs of the language. Consider
int *c; c = 17;
This fragment is syntactically well formed, yet it assigns an integer to a pointer variable.
Some aspects of the language, such as type checking, are difficult
(in the sense that they make the grammar blow up in size) or
impossible to express using CFGs. These aspects usually go by
the name of static semantics. We will take up the issue of
types and type checking later in the semester.
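To give a flavor of what a static-semantics check looks like once parsing is done, here is a deliberately tiny sketch. The symbol table, the type strings, and the function name check_assignment are all hypothetical, and real C type rules (conversions, the null pointer constant, and so on) are much richer than a plain equality test.

```python
def check_assignment(symbols, name, value_type):
    """Return None if the assignment is well typed, otherwise an error message."""
    if name not in symbols:
        return f"{name} is not declared"
    declared = symbols[name]
    if declared != value_type:
        return f"cannot assign {value_type} to {name} of type {declared}"
    return None

symbols = {"c": "int *"}                        # from the declaration: int *c;
print(check_assignment(symbols, "c", "int"))    # the statement: c = 17;
```

The check needs the declaration of c to be remembered when the assignment is examined, and that kind of context is exactly what a context-free grammar cannot carry around conveniently.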
(c) 2003 Curtis Dyreson, (c) 2004 Carl H. Hauser. E-mail questions or comments to Prof. Carl Hauser.