[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7. Equations and Expression Evaluation

"Computational processes are abstract beings that inhabit computers." [Abelson/Sussman 1985, p.1]

At its core, any programming language faces the programmer with three distinct abstract notions [Abelson/Sussman 1985]: the entities which are to be manipulated, usually referred to as "data", the computational "processes" which carry out those manipulations, and the description of these processes by finite collections of computational rules, commonly called "programs".

As in other functional programming languages, all computations performed by Q scripts are expression evaluations: The user specifies an expression to be evaluated, the interpreter computes the value of that expression and prints it as the result of the computation.

Having described the "data" manipulated by Q scripts, expressions, we now turn to the computational processes carried out on those expressions, expression evaluations, and their description by collections of computational rules, equations.

7.1 Equations  
7.2 Non-Linear Equations  
7.3 Free Variables  
7.4 Local Variables  
7.5 Type Guards  
7.6 Normal Forms and Reduction Strategy  
7.7 Conditional Rules  
7.8 Rule Priorities  
7.9 Performing Reductions on a Stack  
7.10 Tail Recursion  
7.11 Error Handling  

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.1 Equations

In the Q language, there is in fact no such thing like a "function definition" in conventional languages. Rather, definitions take the form of equations which describe how a given expression, the left-hand side of the equation, can be transformed into another one, the corresponding right-hand side. Note the difference between this interpretation and common mathematical usage in which an equation is usually something to be "solved". Here we are actually dealing with "rewriting rules", i.e., the equations are oriented from left to right. The syntax of equations is captured by the following grammar rules:

definition              : expression '=' expression qualifiers ';'
                          {'=' expression qualifiers ';'}
qualifiers              : {qualifier}
qualifier               : condition
                        | where-clause
condition               : 'if' expression
                        | 'otherwise'
where-clause            : 'where' expression '=' expression
                          {',' expression '=' expression}

Variables occurring on the left-hand side of an equation are taken to be universally quantified. They play the role of formal parameters in conventional programming languages, which are "bound" to their corresponding values when the equation is applied to an actual expression value. For instance, suppose we wish to define a function sqr which squares its argument. We know that to square something we simply multiply it with itself. The corresponding equation is:

sqr X                   = X*X;

Here the left-hand side is simply the application of the function symbol sqr to an argument value X which stands for any actual value we might apply the sqr function to.

An equation may also contain a collection of conditions and "local" variable definitions (these two kinds of supplementary elements are also called qualifiers in the Q jargon), and multiple right-hand sides (which are interpreted as a collection of equations for the same left-hand side). An empty condition may be denoted with the keyword otherwise, which is a convenient means to indicate the "default case" of a definition. For instance, the following equations define the factorial function:

fac N                   = N*fac (N-1) if N>0;
                        = 1 otherwise;

The otherwise keyword is nothing but syntactic sugar; it can always be omitted. However, it tends to improve the readability of definitions like the one above.

In difference to function definitions in conventional programming languages, the left-hand side of an equation may contain constant and structured argument patterns which are to be matched against the actual parameters of a function. For instance, the following equations, which involve constant values on the left-hand side, define the Fibonacci function:

fib 0                   = 0;
fib 1                   = 1;
fib N                   = fib (N-1) + fib (N-2) otherwise;

As an example for structured argument patterns, operations to extract the "head" (first element, the "car" in Lisp terminology) and "tail" (list of remaining elements, the "cdr") of a list can easily be defined as follows:

hd [X|Xs]               = X;
tl [X|Xs]               = Xs;

Note that in the above example we actually only need one of the involved component values. In such a case it is possible to employ the anonymous variable `_' as a placeholder for any component value we do not care for. For instance:

hd [X|_]                = X;
tl [_|Xs]               = Xs;

The anonymous variable can only be used on the left-hand side of an equation, i.e., it is illegal to refer to the "value" of the anonymous variable on the right-hand side because in fact this variable has no value. (In the interpreter this is different; there the anonymous variable can be used to denote the value of the most recent evaluation result, see B.2 Command Language.) We also remark that multiple instances of the anonymous variable are matched independently from each other, therefore an equation like

foo _ _                 = 0;

will be matched for any combination of arguments.

As another example, here is the definition of a function sum which computes the sum of a list of numbers:

sum [X|Xs]              = X+sum Xs;
sum []                  = 0;

(The above definition is the Q equivalent for what Lisp programmers call "cdr'ing down a list".)

In the same fashion, we can use tuples and applications to form expression patterns on the left-hand side of an equation. For instance, we can define a binary tree insertion operation as follows:

insert X nil            = bin X nil nil;
insert X (bin Y T1 T2)  = bin Y (insert X T1) T2  if X<Y;
                        = bin Y T1 (insert X T2)  otherwise;

Here the nil and bin symbols act as constructors which are used to build a tree-like data structure. Applications involving such symbols are usually normal form expressions without any "defining" equations. In such a case they can also be declared const (cf. 5. Declarations), which means that the compiler will enforce that these symbols cannot be "redefined" by means of an equation:

const nil, bin X T1 T2;

After this declaration, equations like the following will be invalid:

nil                 = ...; // WRONG!
bin X T1 T2         = ...; // WRONG!

The same applies to the built-in constant symbols true and false which are predeclared as const, and also to built-in constants, i.e., numbers, strings, files, lists and tuples. The general rule is that no constant value is allowed as the left-hand side, or the leftmost ("function") part of an application which forms the left-hand side of an equation.

Note that since expressions involving operators are nothing but syntactic sugar for applications, they can occur on the left-hand side of an equation as well (with the exception of the ' operator which is a constructor symbol, see 9. Special Forms). For instance, here is a collection of algebraic simplification rules which cause any expression involving only + and * to be converted into a "sum of products" form:

// distributivity laws:
(X+Y)*Z                 = X*Y+X*Z;
X*(Y+Z)                 = X*Y+X*Z;
// associativity laws:
X+(Y+Z)                 = (X+Y)+Z;
X*(Y*Z)                 = (X*Y)*Z;

A word of caution: if the = operator occurs at the toplevel of the left- or right-hand side of an equation, it must be parenthesized. This is necessary to prevent the = operator to be confused with the = symbol used to separate both sides of an equation. (The same also applies to variable definitions.) E.g., you should write something like

(X=X)                   = true;

instead of

X=X                     = true; // WRONG!

Although almost any expression can form the left-hand side of an equation, you should be aware that an equation will only be applicable if the pattern you specify can actually occur in the course of an expression evaluation. In particular, since the Q interpreter evaluates expressions in a leftmost-innermost fashion, you should make sure that argument patterns match normal form expressions (cf. 7.6 Normal Forms and Reduction Strategy), unless the function to be defined is a special form (cf. 9. Special Forms). Otherwise the equation may be useless, albeit syntactically correct. The compiler cannot verify these conditions, so they are at your own responsibility.

Using equations we can also easily define "higher order" functions which take functions as arguments or return them as results. This is made possible by the fact that function application is an explicit operation. For instance, as a generalization of the sum function above, we can implement an operation which applies any "binary" function F to a list, starting from a given initial value A:

foldr F A []            = A;
foldr F A [X|Xs]        = F X (foldr F A Xs);

The foldr ("fold-right") function takes as its arguments a function F, an initial value A, and a list [X1,...,Xn], and computes the value

F X1 (F X2 (... (F Xn A)...)).

(The name of this function comes from the fact that the recursive applications of F are parenthesized to the right.) Now we can easily define the sum function in terms of foldr as follows:

sum                     = foldr (+) 0;

There is another version of the fold operation, which accumulates results from left to right rather than from right to left:

foldl F A []            = A;
foldl F A [X|Xs]        = foldl F (F A X) Xs;

We remark that although both functions work in linear time (with respect to the size of the input list), the foldl function actually is more efficient than foldr in terms of stack space requirements since its definition is "tail-recursive" (cf. 7.10 Tail Recursion).

As another example, here is the definition of the function map which maps a given function F to each member of a list Xs:

map F []                = [];
map F [X|Xs]            = [F X|map F Xs];

Finally, we also give an example for a function which returns another function. It is a toy implementation of the lambda calculus, based on the combinatorial calculus. (This definition will only work with simple kinds of expressions. Please refer to the standard library script lambda.q described in 11.7 Lambda Calculus, for a much more comprehensive implementation which also properly handles the obscure cases. Also note that the lambda function is declared as a special form which prevents its arguments from being evaluated. This will be explained in 9. Special Forms.)

special lambda X Y;

lambda X X              = i;
lambda X (Y Z)          = s (lambda X Y) (lambda X Z);
lambda X Y              = k Y otherwise;

/* combinator rules */

i X                     = X;
k X Y                   = X;
s X Y Z                 = X Z (Y Z);

For instance, you might wish to try the following:

==> lambda X (2*X)
s (s (k (*)) (k 2)) i

==> _ 4

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.2 Non-Linear Equations

If a variable has multiple occurrences on the left-hand side of a rule, each occurrence must be matched to the same value. In term rewriting theory, such rules are called "non-linear" ("non-left-linear", to be more precise). The following definition implements an operation uniq which removes adjacent duplicates from a list:

uniq []                 = [];
uniq [X,X|Xs]           = uniq [X|Xs];
uniq [X|Xs]             = [X|uniq Xs] otherwise;

It is important to notice the difference between the syntactical notion of equality which plays a role in definitions like the one above, and the "semantic" equality relation defined by the = operator (cf. 6.4.3 Relational Operators). Non-linear equations and equations involving constants, such as

uniq [X,X|Xs]           = uniq [X|Xs];
fib 0                   = 0;

call for the verification of syntactic identities. While two expressions are considered syntactically equal only if they print out the same in the interpreter, the meaning of the = operation depends on built-in rules and equations specified by the programmer. For instance, two different instances of the expression 0 always denote the same object. However, since 0 is an integer and 0.0 a floating point number, these two expressions are syntactically different. Nevertheless, the = operator allows to compare integers and floating point numbers. Indeed it asserts that 0 and 0.0 are "equal".

If you would like to have an explicit syntactic equality operation, you can easily define it yourself as follows:

eq X X                  = true;
eq _ _                  = false otherwise;

In fact, this definition is provided in the stdlib.q script, see 11. The Standard Library.

You should note that the strict separation of syntactic and semantic equality is actually quite useful. First of all, it allows the = operation to be treated in a manner consistent with other comparison operators such as < and >. Secondly, it allows you to tailor the definition of = to your application. For instance, an application might call for a set data structure, and you would like to be able to test whether two expressions of a certain kind represent the same set. Now there are fairly sophisticated data structures for representing sets efficiently, but almost none of them allow you to decide whether two objects represent the same set simply on the basis of syntactic identity. Instead, you will have to provide an explicit = operation which tests whether two set objects contain the same elements.

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.3 Free Variables

On the right-hand side of an equation (as well as in the condition part and the right-hand sides of local definitions), we distinguish between bound variables, i.e., unqualified variable symbols which are introduced in the left-hand side of an equation (or the left-hand side of a local definition, see 7.4 Local Variables), and free variables which are not bound by any of the left-hand side patterns. Free variables can also be declared explicitly using a var declaration (cf. 5. Declarations). They are effectively treated as (parameterless) function symbols, except that they may be assigned a value by means of a variable definition. Variable definitions have the following syntax:

definition              : 'def' expression '=' expression
                          {',' expression '=' expression} ';'
                        | 'undef' identifier {',' identifier} ';'

For instance, in the following equation,

foo X                   = C*X;

the variable C occurs free on the right-hand side and we have:

==> foo 23

We can assign a value to the variable as follows:

def C = 2;

The same effect can also be achieved by using the def command in the interpreter:

==> def C = 2

After this definition we have:

==> foo 23

Free variables are useful if you have a script which repeatedly refers to the same value whose calculation may be costly or involve functions with side-effects (such as the built-in fopen function, see 10. Built-In Functions). By assigning such a value to a variable, you can refer to it without having to recompute the value each time it is required, as would be the case with a corresponding function definition. Note that free variable definitions cannot be changed inside an equation, and therefore the use of free variables does not compromise referential transparency. This is in contrast to other functional programming languages such as Lisp which allow you to modify the value assigned to a variable in a function definition. In Q, you can count on that the value of a free variable does not change during a single expression evaluation, although it may be altered between different evaluations of the same expression.

The right-hand side of a variable definition may be any Q expression, as described in 6. Expressions. The expression is evaluated once, at the time the script is loaded, and is then matched against the left-hand side of the definition, binding variables to the corresponding component values of the right-hand side value. In the special case that the left-hand side is a single variable, it is simply bound to the right-hand side value. Whenever a defined variable occurs "free" on the right-hand side of an equation or variable definition, it will then be replaced by the value of the assigned expression. For instance, the following definition assigns the value exp 1.0 = 2.71828... to the (explicitly declared) free variable e:

var e;
def e = exp 1.0;

Variable definitions are carried out in the order in which they occur in a script, s.t. a definition may refer to the values of variables assigned in previous definitions. Unless a variable has been declared const, it is also possible to override assignments made in previous definitions, and to undefine a variable. Multiple definitions can be performed with a single def statement. For instance:

def N = 99, K = N, N = 2*K+1, M = N;
undef N, K;

As already mentioned, the value assigned to a variable may also be changed with def and undef statements in the interpreter, see B.2 Command Language.

Free variable definitions can also involve a pattern on the left-hand side which is to be matched against the supplied value. E.g.,

def (X,Y) = foo Z;

causes foo Z to be evaluated and to be matched against (X,Y). The interpreter then defines the free variables X and Y accordingly. Pattern matching is performed in the same manner as for the left-hand side of an equation. However, an error is reported if the match fails.

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.4 Local Variables

As already mentioned, an equation may also contain one or more auxiliary variable definitions a.k.a. where clauses, which bind additional variables to their corresponding values. In difference to the global definitions of "free" variables discussed in 7.3 Free Variables, the definitions in where clauses are "local" to the rule in which they occur, meaning that the values of these variables are only available in the rule they are defined in. As in the case of free variable definitions, the left-hand side of each where clause can be an arbitrary expression pattern which is to be matched against the value computed from the corresponding right-hand side. The variables occuring on the left-hand side of the definition will then be bound to their corresponding values, thereby becoming "bound" variables in the context of the rule, in addition to the variables occuring on the left-hand side of the equation. Local definitions also act as additional conditions in that the rule will only be applicable if all left-hand sides in where clauses match their corresponding right-hand side values (this is guaranteed, of course, if each left-hand side is a single variable).

Local definitions are useful if the right-hand side of a rule repeatedly refers to the same (maybe complicated) subexpression. You can avoid the recalculation of such a value be assigning it to a variable, and referring to the variable in the right-hand side. For instance:

foo X                   = bar Y Z
                          where Y = baz X, Z = qux Y;

As already mentioned, pattern-matching definitions are permitted as well (and the anonymous variable also works as usual). A failing match causes the entire rule to fail. For instance,

foo Z                   = bar X Z where [X|_] = Z;

works like

foo [X|Xs]              = bar X [X|Xs];

but avoids repeating the list value on the right.

Note that if a variable is bound by the left-hand side or a local definition of a rule, then a global variable of the same name will be "shadowed" within the rule. However, you can still access the value of the shadowed variable with a qualified identifier. For instance, with the following definitions, foo 2 reduces to bar 2*99:

/* this is the script foo.q */
def FOO                 = 99;
foo X                   = FOO*foo::FOO where FOO = bar X;

The variable bindings in a single where clause are performed in the given order, and each definition may refer to all left-hand side variables of the rule, as well as all variables introduced in earlier definitions. Each use of a variable refers to its most recent definition. It is also possible to have multiple where clauses, as in:

foo X = bar Y where Y = baz Z where Z = qux U where U = quux X;

Note that here the definitions will be processed from right to left. Such constructs are convenient if there is a series of values to be computed in which one value depends on the next one. Using multiple where clauses, you start off with the definitions immediately required by the right-hand side of the rule, then add another where clause for the variables introduced in the first clause, etc. The above definition is actually equivalent to the following:

foo X = bar Y where U = quux X, Z = qux U, Y = baz Z;

It is also possible to mix where clauses with conditions, as in:

foo X = bar Y if qux Y where Y = baz Z if quux Z where Z = bazz X;

Again, the different qualifiers are actually considered from right to left; the rule will be aborted as soon as the first qualifier fails (i.e., a condition evaluates to false or the left-hand side of a definition does not match the corresponding right-hand side), so that no time is wasted with definitions which are not needed anyway.

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.5 Type Guards

The Q language allows you to restrict the set of patterns which can be matched by a left-hand side variable. This is done by augmenting any left-hand side occurrence of the variable with a type guard, using the notation:


For instance, to declare a type BinTree which comprises the constructor patterns nil and bin X T1 T2, the following declaration might be used (cf. 5. Declarations):

public type BinTree = const nil, bin X T1 T2;

This declaration introduces two function symbols nil and bin, just as if they were declared by:

public const nil, bin X T1 T2;

However, it also assigns these symbols to the type symbol BinTree. The type symbol can then be used as a guard on variables to ensure that a variable is only matched to objects of the given type. E.g., the following equation will only apply if the argument to the foo function has the form nil or bin X T1 T2 for some subexpressions X, T1 and T2:

foo T:BinTree           = ...;

This makes it possible to avoid explicit argument patterns like

foo nil                 = ...;
foo (bin X T1 T2)       = ...;

in cases in which the component values are not actually required. Also, it allows you to define operations on a given type without actually referring to the constructor symbols of the type, and thereby makes it possible to avoid dependencies on implementation details. You can even "hide" the constructor symbols of a type by declaring them private, which makes the symbols accessible only to the script which declares the type they belong to. For instance:

public type BinTree = private const nil, bin X T1 T2;

The Q language allows you to construct a hierarchy of types by making one type a subtype of another. This is done by specifying a supertype following the type identifier in a type declaration. For instance, to make BinTree a subtype of the type SearchTree, BinTree would be declared as follows:

type BinTree : SearchTree = const nil, bin X T1 T2;

Now a variable of type SearchTree will not only match SearchTree's, but also any object of its subtype BinTree. In fact, we might have declared SearchTree as an "abstract" type which does not provide any constructors at all:

type SearchTree;

Abstract supertypes like SearchTree are useful for factoring out generic operations which are shared by different subtypes; see 8. Types, for examples, and for a more detailed discussion of the type guard concept.

The Q language has nine predefined type symbols which can be used to distinguish the corresponding types of objects built into the language: Int, Float, Num (which is the supertype for both Int and Float), String, Char (the subtype of String denoting the single-character strings), File, List, Tuple and Bool. (Besides these, there are also two types Exception and SysException for dealing with exception values, cf. 10.6 Exception Handling.) For instance, an operation isstr which determines whether its argument is a string can be defined as:

isstr _:String          = true;
isstr _                 = false otherwise;

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.6 Normal Forms and Reduction Strategy

The process in which equations are applied in the Q interpreter to evaluate a given expression is fairly straightforward. In fact, it corresponds to the way in which we manipulate algebraic formulae, and we learned how to do that in high school.

We say that an equation L=R is applicable to a given expression X, if we can substitute actual values for the variables occurring in L s.t. we obtain X. This is also expressed as "L matches X" or "X is an instance of L". In this case, we can replace X by Y, where Y is the expression obtained from R by replacing the variables occurring in L by their corresponding values. We shall call such a replacement step a reduction, and write it down as X => Y.

For instance, given the rule

sqr X                   = X*X;

we have that

sqr 2                   => 2*2

since the expression sqr 2 matches the left-hand side sqr X, with 2 being substituted for the variable X. This works with compound expressions as well, even if they involve variables themselves. For instance:

sqr (X+1)               => (X+1)*(X+1).

Here, the compound expression (X+1) has been substituted for the variable X.

Equations are usually applied in the context of a larger expression to be evaluated. For instance, we have that

sqr 2 + 2               => 2*2+2

by the application of the above definition of sqr to the subexpression sqr 2 of sqr 2 + 2.

Before we proceed, a remark on the role of the built-in operations of the Q language is in order. Primitive operations, such as the built-in + and * operators described in 6.4 Built-In Operators, do not require any equations specified by the programmer. Instead, they may be thought of as being predefined by a large set of built-in equations. Each application of a built-in rule also counts as a single reduction. For instance, we have that

2*2                     => 4

by the built-in rule which handles the multiplication of integers.

Built-in rules take priority over any equations specified by the programmer. However, it is possible to extend the definition of a primitive operation with equations supplied by the programmer. A corresponding example has already been given in 7.1 Equations. By refining built-in definitions accordingly, you can also treat "exceptions" in built-in operations; see 2.5 Runtime Errors, for an example.

In order to complete the evaluation of a given expression, we have to repeatedly perform single reductions until no further equations (built-in or user-defined) apply. We then say that the resulting expression is in normal form, and interpret it as the value of the original expression. For instance, the following reduction sequence leads to the normal form (i.e., value) 6 for the target expression sqr 2 + 2 (we still assume the definition of sqr introduced above):

sqr 2 + 2  =>  2*2+2  =>  4+2  =>  6.

Normal forms do not necessarily exist. Just as in conventional programming languages, the evaluation may go into an endless loop and never return an answer. Even if a normal form exists, it need not be unique -- it may depend on which equations are used in the reduction process and in what order. The problem is that at any point in the reduction process there may be more than one "reducible" subexpression, and even if there is only one such expression (termed a redex), there may be more than one applicable equation.

To illustrate this kind of problem, consider the definition of the sqr function from above, and the target expression sqr (1+1). This expression admits three distinct reduction sequences:

sqr (1+1)  =>  sqr 2  =>  2*2  =>  4
sqr (1+1)  =>  (1+1)*(1+1)  =>  2*(1+1)  =>  2*2  => 4
sqr (1+1)  =>  (1+1)*(1+1)  =>  (1+1)*2  =>  2*2  => 4.

The second and third reduction sequence appear to be almost the same, but the first reduction sequence makes clear that it matters whether we reduce the subexpression (1+1) before applying the definition of sqr or not. In the present example, the order in which reductions are performed only affects the number of required reduction steps, but it is easy to construct other systems of equations in which both the termination of a normal form evaluation and the computed normal forms actually depend on the evaluation order.

Non-uniqueness of normal forms does not necessarily mean that there must be two or more rules being applicable to a given expression. Ambiguities can also arise when an equation "overlaps" with other equations or with itself, as in the following example:

foo (foo X)             = bar X;

This system consisting of a single equation is terminating, but it has two distinct normal forms for the expression foo (foo (foo X)), namely foo (bar X) and bar (foo X).

Indeed, the termination of rewrite systems and the uniqueness of normal forms are important issues in the theory of term rewrite systems [Dershowitz/Jouannaud 1990]. The Q language simply avoids these problems by imposing a reduction strategy which makes the reduction process deterministic. The reduction strategy must specify unambiguously for each reduction step the redex that is to be reduced and the equation which is to be applied. For this purpose the Q interpreter applies the following rules:

The leftmost-innermost strategy is common in many programming languages, and it also corresponds to the "natural" way in which people tend to carry out calculations manually. It means that at any point in the reduction process it is always the leftmost-innermost redex which is to be reduced. In particular, this implies that in an application of the form F X first F is evaluated, then X, before the value of F is applied to the value of X. Thus the first reduction sequence from above,

sqr (1+1)  => sqr 2  =>  2*2  =>  4

would be the one actually carried out in the interpreter in the course of the evaluation of sqr (1+1).

The leftmost-innermost strategy resolves all ambiguities in the choice of redices during a normal form evaluation. The second disambiguating rule allows to decide which equation to apply if more than one equation is applicable to a given redex. As indicated, the default is to apply equations in the order in which they appear in the source script. This allows you to have equations with overlapping left-hand sides which naturally arise, e.g., in definitions involving "special case rules" such as the definition of the fib function in 7.1 Equations:

fib 0                   = 0;
fib 1                   = 1;
fib N                   = fib (N-1) + fib (N-2) otherwise;

We remark that the default textual order of equations can also be changed by using an appropriate "rule priority" directive. This is described in 7.8 Rule Priorities.

The rule for applying equations is extended to the built-in equations of primitive operations by assuming -- as already mentioned -- that the built-in rules come "before" any equations specified by the programmer, and hence always take priority. We also remark that external functions, which are declared with the extern modifier and implemented in a C module, are treated in an analogous manner. In general, the interpreter will first try a built-in rule, then an extern rule, then the equations supplied in the script. (The treatment of extern functions is described in more detail in C. C Language Interface.)

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.7 Conditional Rules

Conditional rules are applied in the same manner as simple equations, only we also have to check the conditions of the rule. When the left-hand side of a conditional rule matches the target expression, we first evaluate the condition (with left-hand side variables being replaced as usual). This expression must evaluate to a truth value, otherwise we cannot decide whether the rule is applicable or not. (The Q interpreter generates a runtime error in such cases.) The rule is applicable only when the condition evaluates to true.

For instance, recall the definition of the factorial function:

fac N                   = N*fac (N-1) if N>0;
                        = 1 otherwise;

The first rule is applicable only if N>0 evaluates to true; if it evaluates to false, the second rule applies. Furthermore, if N happens to be incomparable with 0, then a runtime error will occur.

Here's how this definition is applied to evaluate the expression fac 3:

fac 3                   => 3*fac (3-1)         => 3*fac 2
                        => 3*(2*fac (2-1))     => 3*(2*fac 1)
                        => 3*(2*(1*fac (1-1))) => 3*(2*(1*fac 0))
                        => 3*(2*(1*1))         => 3*(2*1)
                        => 3*2                 => 6.

Local variable definitions (cf. 7.4 Local Variables) are treated in a similar fashion. We first evaluate the right-hand side of the definition and match it against the corresponding left-hand side. If the match fails, the rule is skipped. Otherwise, we bind variables accordingly and proceed with the next qualifier or the right-hand side of the rule.

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.8 Rule Priorities

We already discussed that the Q interpreter normally considers equations in the order in which they appear in a script. Hence, if all your equations go into a single script, you simply write down overlapping equations in the order in which they should be tried by the interpreter. However, if equations for the same expression pattern are scattered out over different modules, then the rule order also depends on the order in which the different modules are processed by the compiler. In the current implementation, the rules of each module are processed when the first import or include declaration for the module is encountered during a preorder traversal of the module graph starting at the main script, but you should not rely on this.

Hence it is usually a bad idea to have overlapping equations scattered out over different source files. However, in some situations this is hard to avoid. For instance, you might decide to put all "default rules" for certain expression patterns in a separate module. But if you simply import this module at the beginning of your main script then the default rules might override all the other equations.

To work around this, the Q language allows you to explicitly attach priorities to equations by means of a priority declaration, which has the following format:

declaration             : '@' ['+'|'-'] unsigned-number

This specifies the given number (which must be an integer, optionally signed with `+' or `-') as the priority level of the following equations. In the current implementation, priority levels must be 32 bit integers, i.e., in the range -0x80000000..0x7fffffff. Rules are ordered according to their priorities, highest priority first. Rules with the same priority are considered in the order in which they appear in the script.

The priority level set with a priority declaration applies only to the script in which the declaration is located. The default priority level, which is in effect up to the first @ declaration, is 0.

Note that even with priority declarations the built-in operations still take priority over all user-defined rules, i.e., those operations effectively have infinite priority. (The same applies to external functions, see C. C Language Interface.)

As a simple (and contrived) example for the use of priority declarations, consider the following script:

foo X                   = -1; // "default" rule

foo X:Num               =  0; // "standard" rule

foo X:Int               =  1; // "special case" rule

The second equation is at the default priority level, 0. The first equation, although it comes before the second one, becomes a "fallback" rule by assigning it a low priority, -1. The last equation is put on a high priority level, +1, and hence overrides both preceding equations. The net effect is that the equations are actually ordered in reverse textual order. You can verify this in the interpreter:

==> foo 77

==> foo 77.0

==> foo ()

Of course, this example is somewhat silly, because in a single script we can easily arrange equations in the correct textual order. However, if for some reason we had to put the three rules into three different modules, then using the given priority declaration ensures that the equations will always be applied in the correct order.

Priority levels should be used sparingly, if at all. Using low priorities to factor out a module with default rules can occasionally be quite useful, but overriding rules with high priorities is considered harmful and should be avoided whenever possible.

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.9 Performing Reductions on a Stack

The present and the following section contain material for the technically interested reader which, while not being strictly necessary to work successfully with the Q programming language, helps to understand the operation of the Q interpreter and to write better Q scripts. You might wish to skim the following at first reading, and later return to it when you have become more proficient with the Q language and require some further background knowledge.

While the "rewriting model" of expression evaluation introduced in 7.6 Normal Forms and Reduction Strategy, provides a nice conceptual model for the Q programmer, it is too inefficient to be a useful implementation model for the Q interpreter. In particular, it would be rather inefficient to actually construct the full expressions which occur as intermediate results in the course of a reduction sequence, and repeatedly scan these expressions for the leftmost-innermost redex. Instead, we can evaluate expressions by a simple recursive procedure as follows. The input to the algorithm is an expression X to be evaluated.

  1. If X is an application, list or tuple, evaluate the parts of X recursively, from left to right. Proceed with step 2.

  2. If a built-in (or extern) rule is applicable to X, invoke it and return the corresponding value.

  3. Otherwise, determine the first equation L=R which is applicable to X. If there is no such rule, X is in normal form and is returned as the value of the expression. Otherwise we recursively evaluate R (with variables instantiated to their corresponding values) and return it as the value of the original expression.

(The above description of course leaves out many important details. For instance, the interpreter will also have to take care of def'ed symbols, and we will have to organize the search for an applicable equation. However, here we only want to give the general idea, and not a complete implementation of the interpreter.)

The above procedure can actually be implemented non-recursively if we keep track of the rules which are currently "active", together with the corresponding variable bindings. This information can easily be maintained on a stack. We illustrate this with a simple example. Recall the definition of the sum function:

sum [X|Xs]              = X+sum Xs;
sum []                  = 0;

The evaluation of sum [1,2,3] then proceeds as follows:

sum [1,2,3] => 1+sum [2,3]:
        sum [2,3] => 2+sum [3]:
                sum [3] => 3+sum []:
                        sum [] => 0
                        3+0 => 3
                sum [3] => 3
                2+3 => 5
        sum [2,3] => 5
        1+5 => 6
sum [1,2,3] => 6

Each new level of indentation indicates that we "suspend" the current rule (push it on the stack) and activate a new rule in order to evaluate some part of the right-hand side of the suspended rule. When all parts of the right-hand side have been evaluated, we return to the suspended rule (pop it from the stack) and actually perform the reduction. More precisely, we replace the left-hand side of the suspended rule by the result obtained by evaluating the right-hand side, which is already in normal form. (We will of course have to keep track of the results of intermediate reductions, which can be done on a second "evaluation" stack. When the evaluation of the right-hand side is finished, the corresponding result will be on top of the evaluation stack.)

If desired, we can easily reconstruct the "context" of a rule by following the chain of stacked rules starting from the current rule, and proceeding towards the bottom of the stack. For instance, when we perform the reduction

sum [] => 0

in the above example, the context of this rule is described by the following stacked rules:

sum [3] => 3+sum []
sum [2,3] => 2+sum [3]
sum [1,2,3] => 1+sum [2,3]

Thus, when sum [] gets reduced to 0, we have actually completed the following initial part of the reduction sequence:

sum [1,2,3] => 1+sum [2,3] => 1+(2+sum [3]) => 1+(2+(3+sum []))
            => 1+(2+(3+0))

(Note that we never actually constructed intermediate results like the expression 1+(2+(3+sum [])). Rather, these expressions are represented implicitly by the contents of the stack.)

We can also apply the above procedure to qualified rules accordingly. When the interpreter comes to consider a conditional rule, it pushes it onto the stack as usual. However, it then first starts to evaluate the qualifying conditions and where clauses of the rule. When the value of a condition has been computed, we can check whether it holds. If the computed value is neither true nor false, the interpreter aborts the evaluation with a runtime error. If it is true, it proceeds by evaluating other qualifiers or the right-hand side. If it is false, however, it gives up on the current rule (pops it from the stack), and considers the next applicable equation. A similar approach is used to handle local variable definitions.

For instance, consider once more the definition of the factorial function:

fac N                   = N*fac (N-1) if N>0;
                        = 1 otherwise;

The computation of fac 3 then proceeds as follows (the -!- symbol indicates that the condition N>0 has evaluated to false and hence the corresponding rule is abandoned):

fac 3 => 3*fac (3-1):
        3>0 => true
        3-1 => 2
        fac 2 => 2*fac (2-1):
                2>0 => true
                2-1 => 1
                fac 1 => 1*fac (1-1):
                        1>0 => true
                        1-1 => 0
                        fac 0 => 0*fac (0-1):
                                0>0 => false   -!-
                        fac 0 => 1
                        1*1 => 1
                fac 1 => 1
                2*1 => 2
        fac 2 => 2
        3*2 => 6
fac 3 => 6

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.10 Tail Recursion

The above equations for the sum and fac functions are examples for recursive definitions. The computational processes generated from such definitions are characterized by chains of suspended rules in which the same rules are activated over and over again. For instance, an evaluation of fac N with N>0 requires N recursive activations of the rule:

fac N                   = N*fac (N-1) if N>0;

Hence, if N becomes very large, the recursive definition of fac is in danger of running out of stack space. In the following, we show how to avoid this defect by employing the technique of "tail-recursive programming".

It is a well-known fact that the factorial function also has an "iterative" implementation which can be executed in constant memory space. The idea is to maintain a "running product" P and a "counter" I which counts down from N to 1. The iterative algorithm can be written down in a conventional programming language like Pascal as follows:

P := 1; I := N;
while I>0 do begin
  P := P*I;
  I := I-1;

While the Q language does not provide any special looping constructs (and it also misses an assignment operation), there is an alternative definition of fac which takes up the idea of running products and counters and implements the factorial function in an iterative fashion:

fac N                   = fac2 1 N;
fac2 P I                = fac2 (P*I) (I-1) if I>0;
                        = P otherwise;

Here, the "state variables" P and I are implemented as arguments of an "auxiliary" function fac2 which is invoked from fac. Again, this is a recursive definition; it requires N recursive applications of the rule

fac2 P I                = fac2 (P*I) (I-1) if I>0;

when fac N is computed. However, in difference to our previous definition of fac, the recursive rule is always applied "on top" of the target expression. Such rules are called tail-recursive. (The name "tail recursion" comes from the fact that the recursive application of fac2 is the last operation considered during the leftmost-innermost evaluation of the right-hand side.) For instance, the evaluation of fac 3 now proceeds as follows (abbreviating reductions by built-in rules for `-' and `*'):

fac 3 => fac2 1 3 => fac2 (1*3) (3-1)
      => fac2 3 2 => fac2 (3*2) (2-1)
      => fac2 6 1 => fac2 (6*1) (1-1)
      => fac2 6 0 => 6

Tail-recursive definitions can be employed for implementing all kinds of functions which can be computed on a machine with a fixed set of "registers" and no auxiliary memory [Abelson/Susssman 1985]. For instance, here is a tail-recursive implementation of the Fibonacci function:

fib N                   = fib2 1 0 N;
fib2 A B N              = fib2 (A+B) A (N-1) if N>0;
                        = B otherwise;

(This definition also has the desirable side effect that it cuts down the exponential running time of the "naive" definition given in 7.1 Equations, to linear.)

The Q interpreter employs a clever optimization trick, commonly known as tail recursion optimization (see e.g., [Steele/Sussman 1975]), to actually execute tail-recursive definitions within constant stack space. Hence no special looping constructs are required for implementing iterative algorithms efficiently in the Q language.

Assume that in our stack model of expression evaluation we are working on a reduction X => Y, and that we already recursively evaluated all parts of the right-hand side Y, but not Y itself. Furthermore, suppose that in order to evaluate Y we will have to apply the rule Y => Z next. Then, instead of keeping the rule X => Y suspended and evaluating Y using rule Y => Z recursively, we can also immediately perform the reduction X => Y and replace this rule with Y => Z on the stack. Thus, the new rule Y => Z will not require any additional stack space at all, but simply reuses the existing "activation record" for the X => Y rule. In other words, instead of invoking the Y => Z rule as a "subroutine", we effectively perform a kind of "goto" to the new rule. We also refer to this process as performing the tail reduction X => Y. The evaluation now proceeds as if we had been working on the rule Y => Z in the first place.

For instance, with the new definition of fac the evaluation of fac 3 would be carried out using only a single level of suspended rules (again, the -!- symbol signals abortion of a rule with failing qualifier):

fac 3 => fac2 1 3:
fac2 1 3 => fac2 (1*3) (3-1):
        3>0 => true
        1*3 => 3
        3-1 => 2
fac2 3 2 => fac2 (3*2) (2-1):
        2>0 => true
        3*2 => 6
        2-1 => 1
fac2 6 1 => fac2 (6*1) (1-1):
        1>0 => true
        6*1 => 6
        1-1 => 0
fac2 6 0 => fac2 (6*0) (0-1):
        0>0 => false                  -!-
fac2 6 0 => 6

Besides the tail recursion optimization technique discussed above, the Q interpreter also automatically optimizes toplevel sequences (i.e., applications of the || operator which are not inside a nested subexpression) on the right-hand side of equations, s.t. basic imperative-style looping constructs can be executed in a tail-recursive fashion as well. Suppose, for instance, that we want to implement a do function which applies a function to each member of a list, like the map function discussed in 7.1 Equations, but instead of collecting the results in an output list simply throws away the intermediate values and returns (). This is useful, of course, only if the function is evaluated solely for its side-effects, e.g., if we want to print out all elements of a list. A straightforward definition of do looks as follows:

do F []                 = ();
do F [X|Xs]             = F X || do F Xs;

Now this definition is not tail-recursive in the strict sense alluded to above, because the last application on the right-hand side of the second rule actually involves (||) and not the do function. (Recall that an infix expression like X||Y is nothing but syntact sugar for the function application (||) X Y.)

However, as a remedy, the Q interpreter actually implements toplevel sequences on the right-hand side of a rule by direct stack manipulation. That is, the first argument of a sequence is thrown away as soon as it has been computed. By these means, the interpreter, after having computed F X, effectively carries out the reduction do F [X|Xs] => do F Xs on which it can perform tail recursion optimization as usual. Thus the above definition actually works in constant stack space, as one might reasonably expect.

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

7.11 Error Handling

There are a few conditions which may force the Q interpreter to give up on a reduction sequence and abort the evaluation with a runtime error message. These conditions are listed below.

All of the above error conditions can also be caught and handled by the running script in any desired manner, see 10.6 Exception Handling.

[ << ] [ >> ]           [Top] [Contents] [Index] [ ? ]

This document was generated by Albert Gräf on October, 14 2003 using texi2html