Next: Semantics Up: Towards a Mathematical Science Previous: The Description of Programming

# Abstract Syntax of Programming Languages

The Backus normal form that is used in the ALGOL report, describes the morphology of ALGOL programs in a synthetic manner. Namely, it describes how the various kinds of program are built up from their parts. This would be better for translating into ALGOL than it is for the more usual problem of translating from ALGOL. The form of syntax we shall now describe differs from the Backus normal form in two ways. First, it is analytic rather than synthetic; it tells how to take a program apart, rather than how to put it together. Second, it is abstract in that it is independent of the notation used to represent, say sums, but only affirms that they can be recognized and taken apart.

Consider a fragment of the ALGOL arithmetic terms, including constants and variables and sums and products. As a further simplification we shall consider sums and products as having exactly two arguments, although the general case is not much more difficult. Suppose we have a term t. From the point of view of translating it we must first determine whether it represents a constant, a variable, a sum, or a product. Therefore we postulate the existence of four predicates: , , and . We shall suppose that each term satisfies exactly one of these predicates.

Consider the terms that are variables. A translator will have to be able to tell whether two symbols are occurrences of the same variable, so we need a predicate of equality on the space of variables. If the variables will have to be sorted an ordering relation is also required.

Consider the sums. Our translator must be able to pick out the summands, and therefore we postulate the existence of functions and defined on those terms that satisfy . Similarly, we have the functions and defined on the products. Since the analysis of a term must terminate, the recursively defined predicate

must converge and have the value true for all terms.

The predicates and functions whose existence and relations define the syntax, are precisely those needed to translate from the language, or to define the semantics. That is why we need not care whether sums are represented by a+b, or + ab, or (PLUS A B), or even by Gödel numbers .

It is useful to consider languages which have both an analytic and a synthetic syntax satisfying certain relations. Continuing our example of terms, the synthetic syntax is defined by two functions and which, when given two terms t and u, yield their sum and product respectively. We shall call the syntax regular if the analytic and the synthetic syntax are related by the plausible relations:

1. and

Once the abstract syntax of a language has been decided, then one can choose the domain of symbolic expressions to be used. Then one can define the syntactic functions explicitly and satisfy one self, preferably by proving it, that they satisfy the proper relations. If both the analytic and synthetic functions are given, we do not have the difficult and sometimes unsolvable analysis problems that arise when languages are described synthetically only.

In ALGOL the relations between the analytic and the synthetic functions are not quite regular, namely the relations hold only up to an equivalence. Thus, redundant parentheses are allowed, etc.

Next: Semantics Up: Towards a Mathematical Science Previous: The Description of Programming

John McCarthy
Tue May 14 13:32:03 PDT 1996