[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
"Computational processes are abstract beings that inhabit computers." [Abelson/Sussman 1985, p.1]
At its core, any programming language faces the programmer with three distinct abstract notions [Abelson/Sussman 1985]: the entities which are to be manipulated, usually referred to as "data", the computational "processes" which carry out those manipulations, and the description of these processes by finite collections of computational rules, commonly called "programs".
As in other functional programming languages, all computations performed by Q scripts are expression evaluations: The user specifies an expression to be evaluated, the interpreter computes the value of that expression and prints it as the result of the computation.
Having described the "data" manipulated by Q scripts, expressions, we now turn to the computational processes carried out on those expressions, expression evaluations, and their description by collections of computational rules, equations.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
In the Q language, there is in fact no such thing like a "function definition" in conventional languages. Rather, definitions take the form of equations which describe how a given expression, the left-hand side of the equation, can be transformed into another one, the corresponding right-hand side. Note the difference between this interpretation and common mathematical usage in which an equation is usually something to be "solved". Here we are actually dealing with "rewriting rules", i.e., the equations are oriented from left to right. The syntax of equations is captured by the following grammar rules:
definition : expression '=' expression qualifiers ';' {'=' expression qualifiers ';'} qualifiers : {qualifier} qualifier : condition | where-clause condition : 'if' expression | 'otherwise' where-clause : 'where' expression '=' expression {',' expression '=' expression} |
Variables occurring on the left-hand side of an equation are taken to be
universally quantified. They play the role of formal parameters in
conventional programming languages, which are "bound" to their
corresponding values when the equation is applied to an actual
expression value. For instance, suppose we wish to define a function
sqr
which squares its argument. We know that to square something
we simply multiply it with itself. The corresponding equation is:
sqr X = X*X; |
Here the left-hand side is simply the application of the function symbol
sqr
to an argument value X
which stands for any actual
value we might apply the sqr
function to.
An equation may also contain a collection of conditions and "local"
variable definitions (these two kinds of supplementary elements are also
called qualifiers in the Q jargon), and multiple right-hand sides
(which are interpreted as a collection of equations for the same
left-hand side). An empty condition may be denoted with the keyword
otherwise
, which is a convenient means to indicate the "default
case" of a definition. For instance, the following equations define the
factorial function:
fac N = N*fac (N-1) if N>0; = 1 otherwise; |
The otherwise
keyword is nothing but syntactic sugar; it can always
be omitted. However, it tends to improve the readability of definitions like
the one above.
In difference to function definitions in conventional programming languages, the left-hand side of an equation may contain constant and structured argument patterns which are to be matched against the actual parameters of a function. For instance, the following equations, which involve constant values on the left-hand side, define the Fibonacci function:
fib 0 = 0; fib 1 = 1; fib N = fib (N-1) + fib (N-2) otherwise; |
As an example for structured argument patterns, operations to extract the "head" (first element, the "car" in Lisp terminology) and "tail" (list of remaining elements, the "cdr") of a list can easily be defined as follows:
hd [X|Xs] = X; tl [X|Xs] = Xs; |
Note that in the above example we actually only need one of the involved component values. In such a case it is possible to employ the anonymous variable `_' as a placeholder for any component value we do not care for. For instance:
hd [X|_] = X; tl [_|Xs] = Xs; |
The anonymous variable can only be used on the left-hand side of an equation, i.e., it is illegal to refer to the "value" of the anonymous variable on the right-hand side because in fact this variable has no value. (In the interpreter this is different; there the anonymous variable can be used to denote the value of the most recent evaluation result, see B.2 Command Language.) We also remark that multiple instances of the anonymous variable are matched independently from each other, therefore an equation like
foo _ _ = 0; |
will be matched for any combination of arguments.
As another example, here is the definition of a function sum
which
computes the sum of a list of numbers:
sum [X|Xs] = X+sum Xs; sum [] = 0; |
(The above definition is the Q equivalent for what Lisp programmers call "cdr'ing down a list".)
In the same fashion, we can use tuples and applications to form expression patterns on the left-hand side of an equation. For instance, we can define a binary tree insertion operation as follows:
insert X nil = bin X nil nil; insert X (bin Y T1 T2) = bin Y (insert X T1) T2 if X<Y; = bin Y T1 (insert X T2) otherwise; |
Here the nil
and bin
symbols act as constructors
which are used to build a tree-like data structure. Applications
involving such symbols are usually normal form expressions without any
"defining" equations. In such a case they can also be declared
const
(cf. 5. Declarations), which means that the compiler
will enforce that these symbols cannot be "redefined" by means of an
equation:
const nil, bin X T1 T2; |
After this declaration, equations like the following will be invalid:
nil = ...; // WRONG! bin X T1 T2 = ...; // WRONG! |
The same applies to the built-in constant symbols true
and
false
which are predeclared as const
, and also to built-in
constants, i.e., numbers, strings, files, lists and tuples. The general
rule is that no constant value is allowed as the left-hand side, or the
leftmost ("function") part of an application which forms the left-hand
side of an equation.
Note that since expressions involving operators are nothing but
syntactic sugar for applications, they can occur on the left-hand side
of an equation as well (with the exception of the '
operator
which is a constructor symbol, see 9. Special Forms). For instance,
here is a collection of algebraic simplification rules which cause any
expression involving only +
and *
to be converted into a
"sum of products" form:
// distributivity laws: (X+Y)*Z = X*Y+X*Z; X*(Y+Z) = X*Y+X*Z; // associativity laws: X+(Y+Z) = (X+Y)+Z; X*(Y*Z) = (X*Y)*Z; |
A word of caution: if the =
operator occurs at the toplevel of
the left- or right-hand side of an equation, it must be
parenthesized. This is necessary to prevent the =
operator to be
confused with the =
symbol used to separate both sides of an
equation. (The same also applies to variable definitions.) E.g., you
should write something like
(X=X) = true; |
instead of
X=X = true; // WRONG! |
Although almost any expression can form the left-hand side of an equation, you should be aware that an equation will only be applicable if the pattern you specify can actually occur in the course of an expression evaluation. In particular, since the Q interpreter evaluates expressions in a leftmost-innermost fashion, you should make sure that argument patterns match normal form expressions (cf. 7.6 Normal Forms and Reduction Strategy), unless the function to be defined is a special form (cf. 9. Special Forms). Otherwise the equation may be useless, albeit syntactically correct. The compiler cannot verify these conditions, so they are at your own responsibility.
Using equations we can also easily define "higher order" functions
which take functions as arguments or return them as results. This is
made possible by the fact that function application is an explicit
operation. For instance, as a generalization of the sum
function
above, we can implement an operation which applies any "binary"
function F
to a list, starting from a given initial value
A
:
foldr F A [] = A; foldr F A [X|Xs] = F X (foldr F A Xs); |
The foldr
("fold-right") function takes as its arguments a function
F
, an initial value A
, and a list [X1,...,Xn]
, and
computes the value
F X1 (F X2 (... (F Xn A)...)). |
(The name of this function comes from the fact that the recursive
applications of F
are parenthesized to the right.) Now we can easily
define the sum
function in terms of foldr
as follows:
sum = foldr (+) 0; |
There is another version of the fold operation, which accumulates results from left to right rather than from right to left:
foldl F A [] = A; foldl F A [X|Xs] = foldl F (F A X) Xs; |
We remark that although both functions work in linear time (with respect
to the size of the input list), the foldl
function actually is
more efficient than foldr
in terms of stack space requirements
since its definition is "tail-recursive" (cf. 7.10 Tail Recursion).
As another example, here is the definition of the function map
which
maps a given function F
to each member of a list Xs
:
map F [] = []; map F [X|Xs] = [F X|map F Xs]; |
Finally, we also give an example for a function which returns
another function. It is a toy implementation of the lambda calculus,
based on the combinatorial calculus. (This definition will only work
with simple kinds of expressions. Please refer to the standard library
script lambda.q
described in 11.7 Lambda Calculus, for a much
more comprehensive implementation which also properly handles the
obscure cases. Also note that the lambda
function is declared as
a special form which prevents its arguments from being
evaluated. This will be explained in 9. Special Forms.)
special lambda X Y; lambda X X = i; lambda X (Y Z) = s (lambda X Y) (lambda X Z); lambda X Y = k Y otherwise; /* combinator rules */ i X = X; k X Y = X; s X Y Z = X Z (Y Z); |
For instance, you might wish to try the following:
==> lambda X (2*X) s (s (k (*)) (k 2)) i ==> _ 4 8 |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
If a variable has multiple occurrences on the left-hand side of a rule,
each occurrence must be matched to the same value. In term rewriting
theory, such rules are called "non-linear" ("non-left-linear", to be
more precise). The following definition implements an operation
uniq
which removes adjacent duplicates from a list:
uniq [] = []; uniq [X,X|Xs] = uniq [X|Xs]; uniq [X|Xs] = [X|uniq Xs] otherwise; |
It is important to notice the difference between the syntactical notion
of equality which plays a role in definitions like the one above, and
the "semantic" equality relation defined by the =
operator
(cf. 6.4.3 Relational Operators). Non-linear equations and equations
involving constants, such as
uniq [X,X|Xs] = uniq [X|Xs]; fib 0 = 0; |
call for the verification of syntactic identities. While two expressions are
considered syntactically equal only if they print out the same in the
interpreter, the meaning of the =
operation depends on built-in rules
and equations specified by the programmer. For instance, two different
instances of the expression 0
always denote the same object. However,
since 0
is an integer and 0.0
a floating point number, these
two expressions are syntactically different. Nevertheless, the =
operator allows to compare integers and floating point numbers. Indeed it
asserts that 0
and 0.0
are "equal".
If you would like to have an explicit syntactic equality operation, you can easily define it yourself as follows:
eq X X = true; eq _ _ = false otherwise; |
In fact, this definition is provided in the stdlib.q
script, see
11. The Standard Library.
You should note that the strict separation of syntactic and semantic
equality is actually quite useful. First of all, it allows the =
operation to be treated in a manner consistent with other comparison
operators such as <
and >
. Secondly, it allows you to tailor
the definition of =
to your application. For instance, an application
might call for a set data structure, and you would like to be able to test
whether two expressions of a certain kind represent the same set. Now there
are fairly sophisticated data structures for representing sets efficiently,
but almost none of them allow you to decide whether two objects represent
the same set simply on the basis of syntactic identity. Instead, you will
have to provide an explicit =
operation which tests whether two set
objects contain the same elements.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
On the right-hand side of an equation (as well as in the condition part
and the right-hand sides of local definitions), we distinguish between
bound variables, i.e., unqualified variable symbols which are
introduced in the left-hand side of an equation (or the left-hand side
of a local definition, see 7.4 Local Variables), and free
variables which are not bound by any of the left-hand side
patterns. Free variables can also be declared explicitly using a
var
declaration (cf. 5. Declarations). They are effectively
treated as (parameterless) function symbols, except that they may be
assigned a value by means of a variable definition. Variable
definitions have the following syntax:
definition : 'def' expression '=' expression {',' expression '=' expression} ';' | 'undef' identifier {',' identifier} ';' |
For instance, in the following equation,
foo X = C*X; |
the variable C
occurs free on the right-hand side and we have:
==> foo 23 C*23 |
We can assign a value to the variable as follows:
def C = 2; |
The same effect can also be achieved by using the def
command in
the interpreter:
==> def C = 2 |
After this definition we have:
==> foo 23 46 |
Free variables are useful if you have a script which repeatedly refers
to the same value whose calculation may be costly or involve functions
with side-effects (such as the built-in fopen
function, see
10. Built-In Functions). By assigning such a value to a variable, you
can refer to it without having to recompute the value each time it is
required, as would be the case with a corresponding function
definition. Note that free variable definitions cannot be changed inside
an equation, and therefore the use of free variables does not compromise
referential transparency. This is in contrast to other functional
programming languages such as Lisp which allow you to modify the value
assigned to a variable in a function definition. In Q, you can count on
that the value of a free variable does not change during a single
expression evaluation, although it may be altered between different
evaluations of the same expression.
The right-hand side of a variable definition may be any Q expression, as
described in 6. Expressions. The expression is evaluated once,
at the time the script is loaded, and is then matched against the
left-hand side of the definition, binding variables to the corresponding
component values of the right-hand side value. In the special case that
the left-hand side is a single variable, it is simply bound to the
right-hand side value. Whenever a defined variable occurs "free" on
the right-hand side of an equation or variable definition, it will then
be replaced by the value of the assigned expression. For instance, the
following definition assigns the value exp 1.0 = 2.71828...
to the (explicitly declared) free variable e
:
var e; def e = exp 1.0; |
Variable definitions are carried out in the order in which they occur in
a script, s.t. a definition may refer to the values of variables
assigned in previous definitions. Unless a variable has been declared
const
, it is also possible to override assignments made in
previous definitions, and to undefine a variable. Multiple definitions
can be performed with a single def
statement. For instance:
def N = 99, K = N, N = 2*K+1, M = N; undef N, K; |
As already mentioned, the value assigned to a variable may also be
changed with def
and undef
statements in the interpreter,
see B.2 Command Language.
Free variable definitions can also involve a pattern on the left-hand side which is to be matched against the supplied value. E.g.,
def (X,Y) = foo Z; |
causes foo Z
to be evaluated and to be matched against
(X,Y)
. The interpreter then defines the free variables X
and Y
accordingly. Pattern matching is performed in the same
manner as for the left-hand side of an equation. However, an error is
reported if the match fails.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
As already mentioned, an equation may also contain one or more auxiliary
variable definitions a.k.a. where
clauses, which bind
additional variables to their corresponding values. In difference to the
global definitions of "free" variables discussed in 7.3 Free Variables, the definitions in where
clauses are "local" to the
rule in which they occur, meaning that the values of these variables are
only available in the rule they are defined in. As in the case of free
variable definitions, the left-hand side of each where
clause can
be an arbitrary expression pattern which is to be matched against the
value computed from the corresponding right-hand side. The variables
occuring on the left-hand side of the definition will then be bound to
their corresponding values, thereby becoming "bound" variables in the
context of the rule, in addition to the variables occuring on the
left-hand side of the equation. Local definitions also act as additional
conditions in that the rule will only be applicable if all left-hand
sides in where
clauses match their corresponding right-hand side
values (this is guaranteed, of course, if each left-hand side is a
single variable).
Local definitions are useful if the right-hand side of a rule repeatedly refers to the same (maybe complicated) subexpression. You can avoid the recalculation of such a value be assigning it to a variable, and referring to the variable in the right-hand side. For instance:
foo X = bar Y Z where Y = baz X, Z = qux Y; |
As already mentioned, pattern-matching definitions are permitted as well (and the anonymous variable also works as usual). A failing match causes the entire rule to fail. For instance,
foo Z = bar X Z where [X|_] = Z; |
works like
foo [X|Xs] = bar X [X|Xs]; |
but avoids repeating the list value on the right.
Note that if a variable is bound by the left-hand side or a local
definition of a rule, then a global variable of the same name will be
"shadowed" within the rule. However, you can still access the value of
the shadowed variable with a qualified identifier. For instance, with
the following definitions, foo 2
reduces to bar 2*99
:
/* this is the script foo.q */ def FOO = 99; foo X = FOO*foo::FOO where FOO = bar X; |
The variable bindings in a single where
clause are performed in
the given order, and each definition may refer to all left-hand side
variables of the rule, as well as all variables introduced in earlier
definitions. Each use of a variable refers to its most recent
definition. It is also possible to have multiple where
clauses,
as in:
foo X = bar Y where Y = baz Z where Z = qux U where U = quux X; |
Note that here the definitions will be processed from right to
left. Such constructs are convenient if there is a series of values to
be computed in which one value depends on the next one. Using multiple
where
clauses, you start off with the definitions immediately
required by the right-hand side of the rule, then add another
where
clause for the variables introduced in the first clause,
etc. The above definition is actually equivalent to the following:
foo X = bar Y where U = quux X, Z = qux U, Y = baz Z; |
It is also possible to mix where
clauses with conditions, as in:
foo X = bar Y if qux Y where Y = baz Z if quux Z where Z = bazz X; |
Again, the different qualifiers are actually considered from right to
left; the rule will be aborted as soon as the first qualifier fails
(i.e., a condition evaluates to false
or the left-hand side of a
definition does not match the corresponding right-hand side), so that
no time is wasted with definitions which are not needed anyway.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The Q language allows you to restrict the set of patterns which can be matched by a left-hand side variable. This is done by augmenting any left-hand side occurrence of the variable with a type guard, using the notation:
variable:type |
For instance, to declare a type BinTree
which comprises the
constructor patterns nil
and bin X T1 T2
, the following
declaration might be used (cf. 5. Declarations):
public type BinTree = const nil, bin X T1 T2; |
This declaration introduces two function symbols nil
and
bin
, just as if they were declared by:
public const nil, bin X T1 T2; |
However, it also assigns these symbols to the type symbol BinTree
.
The type symbol can then be used as a guard on variables to ensure
that a variable is only matched to objects of the given type. E.g., the
following equation will only apply if the argument to the foo
function has the form nil
or bin X T1 T2
for some
subexpressions X
, T1
and T2
:
foo T:BinTree = ...; |
This makes it possible to avoid explicit argument patterns like
foo nil = ...; foo (bin X T1 T2) = ...; |
in cases in which the component values are not actually required. Also, it
allows you to define operations on a given type without actually referring
to the constructor symbols of the type, and thereby makes it possible to
avoid dependencies on implementation details. You can even "hide" the
constructor symbols of a type by declaring them private
, which makes
the symbols accessible only to the script which declares the type they
belong to. For instance:
public type BinTree = private const nil, bin X T1 T2; |
The Q language allows you to construct a hierarchy of types by making one
type a subtype of another. This is done by specifying a supertype following
the type identifier in a type declaration. For instance, to make
BinTree
a subtype of the type SearchTree
, BinTree
would
be declared as follows:
type BinTree : SearchTree = const nil, bin X T1 T2; |
Now a variable of type SearchTree
will not only match
SearchTree
's, but also any object of its subtype BinTree
. In
fact, we might have declared SearchTree
as an "abstract" type which
does not provide any constructors at all:
type SearchTree; |
Abstract supertypes like SearchTree
are useful for factoring out
generic operations which are shared by different subtypes; see 8. Types,
for examples, and for a more detailed discussion of the type guard concept.
The Q language has nine predefined type symbols which can be used to
distinguish the corresponding types of objects built into the language:
Int
, Float
, Num
(which is the supertype for both
Int
and Float
), String
, Char
(the subtype of
String
denoting the single-character strings), File
,
List
, Tuple
and Bool
. (Besides these, there are
also two types Exception
and SysException
for dealing with
exception values, cf. 10.6 Exception Handling.) For instance, an
operation isstr
which determines whether its argument is a string
can be defined as:
isstr _:String = true; isstr _ = false otherwise; |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The process in which equations are applied in the Q interpreter to evaluate a given expression is fairly straightforward. In fact, it corresponds to the way in which we manipulate algebraic formulae, and we learned how to do that in high school.
We say that an equation L=R
is applicable to a given expression
X
, if we can substitute actual values for the variables occurring in
L
s.t. we obtain X
. This is also expressed as "L
matches X
" or "X
is an instance of L
". In this case,
we can replace X
by Y
, where Y
is the expression
obtained from R
by replacing the variables occurring in L
by
their corresponding values. We shall call such a replacement step a
reduction, and write it down as X => Y
.
For instance, given the rule
sqr X = X*X; |
we have that
sqr 2 => 2*2 |
since the expression sqr 2
matches the left-hand side sqr X
,
with 2
being substituted for the variable X
. This works with
compound expressions as well, even if they involve variables themselves.
For instance:
sqr (X+1) => (X+1)*(X+1). |
Here, the compound expression (X+1)
has been substituted for the
variable X
.
Equations are usually applied in the context of a larger expression to be evaluated. For instance, we have that
sqr 2 + 2 => 2*2+2 |
by the application of the above definition of sqr
to the subexpression
sqr 2
of sqr 2 + 2
.
Before we proceed, a remark on the role of the built-in operations of the
Q language is in order. Primitive operations, such as the built-in +
and *
operators described in 6.4 Built-In Operators, do not require
any equations specified by the programmer. Instead, they may be thought of
as being predefined by a large set of built-in equations. Each application
of a built-in rule also counts as a single reduction. For instance, we have
that
2*2 => 4 |
by the built-in rule which handles the multiplication of integers.
Built-in rules take priority over any equations specified by the programmer. However, it is possible to extend the definition of a primitive operation with equations supplied by the programmer. A corresponding example has already been given in 7.1 Equations. By refining built-in definitions accordingly, you can also treat "exceptions" in built-in operations; see 2.5 Runtime Errors, for an example.
In order to complete the evaluation of a given expression, we have to
repeatedly perform single reductions until no further equations (built-in
or user-defined) apply. We then say that the resulting expression is in
normal form, and interpret it as the value of the original
expression. For instance, the following reduction sequence leads to the
normal form (i.e., value) 6
for the target expression
sqr 2 + 2
(we still assume the definition of sqr
introduced
above):
sqr 2 + 2 => 2*2+2 => 4+2 => 6. |
Normal forms do not necessarily exist. Just as in conventional programming languages, the evaluation may go into an endless loop and never return an answer. Even if a normal form exists, it need not be unique -- it may depend on which equations are used in the reduction process and in what order. The problem is that at any point in the reduction process there may be more than one "reducible" subexpression, and even if there is only one such expression (termed a redex), there may be more than one applicable equation.
To illustrate this kind of problem, consider the definition of the sqr
function from above, and the target expression sqr (1+1)
. This
expression admits three distinct reduction sequences:
sqr (1+1) => sqr 2 => 2*2 => 4 sqr (1+1) => (1+1)*(1+1) => 2*(1+1) => 2*2 => 4 sqr (1+1) => (1+1)*(1+1) => (1+1)*2 => 2*2 => 4. |
The second and third reduction sequence appear to be almost the same, but
the first reduction sequence makes clear that it matters whether we reduce
the subexpression (1+1)
before applying the definition of sqr
or not. In the present example, the order in which reductions are performed
only affects the number of required reduction steps, but it is easy to
construct other systems of equations in which both the termination of a
normal form evaluation and the computed normal forms actually depend on the
evaluation order.
Non-uniqueness of normal forms does not necessarily mean that there must be two or more rules being applicable to a given expression. Ambiguities can also arise when an equation "overlaps" with other equations or with itself, as in the following example:
foo (foo X) = bar X; |
This system consisting of a single equation is terminating, but it has
two distinct normal forms for the expression foo (foo (foo X))
,
namely foo (bar X)
and bar (foo X)
.
Indeed, the termination of rewrite systems and the uniqueness of normal forms are important issues in the theory of term rewrite systems [Dershowitz/Jouannaud 1990]. The Q language simply avoids these problems by imposing a reduction strategy which makes the reduction process deterministic. The reduction strategy must specify unambiguously for each reduction step the redex that is to be reduced and the equation which is to be applied. For this purpose the Q interpreter applies the following rules:
The leftmost-innermost strategy is common in many programming languages, and
it also corresponds to the "natural" way in which people tend to carry
out calculations manually. It means that at any point in the reduction
process it is always the leftmost-innermost redex which is to be reduced.
In particular, this implies that in an application of the form
F X
first F
is evaluated, then X
, before the
value of F
is applied to the value of X
. Thus the first
reduction sequence from above,
sqr (1+1) => sqr 2 => 2*2 => 4 |
would be the one actually carried out in the interpreter in the course
of the evaluation of sqr (1+1)
.
The leftmost-innermost strategy resolves all ambiguities in the choice
of redices during a normal form evaluation. The second disambiguating
rule allows to decide which equation to apply if more than one equation
is applicable to a given redex. As indicated, the default is to apply
equations in the order in which they appear in the source script. This
allows you to have equations with overlapping left-hand sides which
naturally arise, e.g., in definitions involving "special case rules"
such as the definition of the fib
function in 7.1 Equations:
fib 0 = 0; fib 1 = 1; fib N = fib (N-1) + fib (N-2) otherwise; |
We remark that the default textual order of equations can also be changed by using an appropriate "rule priority" directive. This is described in 7.8 Rule Priorities.
The rule for applying equations is extended to the built-in equations of
primitive operations by assuming -- as already mentioned -- that the
built-in rules come "before" any equations specified by the
programmer, and hence always take priority. We also remark that
external functions, which are declared with the extern
modifier and implemented in a C module, are treated in an analogous
manner. In general, the interpreter will first try a built-in rule, then
an extern rule, then the equations supplied in the script. (The
treatment of extern functions is described in more detail in C. C Language Interface.)
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Conditional rules are applied in the same manner as simple equations,
only we also have to check the conditions of the rule. When the
left-hand side of a conditional rule matches the target expression, we
first evaluate the condition (with left-hand side variables being
replaced as usual). This expression must evaluate to a truth
value, otherwise we cannot decide whether the rule is applicable or
not. (The Q interpreter generates a runtime error in such cases.) The
rule is applicable only when the condition evaluates to true
.
For instance, recall the definition of the factorial function:
fac N = N*fac (N-1) if N>0; = 1 otherwise; |
The first rule is applicable only if N>0
evaluates to
true
; if it evaluates to false
, the second rule
applies. Furthermore, if N
happens to be incomparable with
0
, then a runtime error will occur.
Here's how this definition is applied to evaluate the expression
fac 3
:
fac 3 => 3*fac (3-1) => 3*fac 2 => 3*(2*fac (2-1)) => 3*(2*fac 1) => 3*(2*(1*fac (1-1))) => 3*(2*(1*fac 0)) => 3*(2*(1*1)) => 3*(2*1) => 3*2 => 6. |
Local variable definitions (cf. 7.4 Local Variables) are treated in a similar fashion. We first evaluate the right-hand side of the definition and match it against the corresponding left-hand side. If the match fails, the rule is skipped. Otherwise, we bind variables accordingly and proceed with the next qualifier or the right-hand side of the rule.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
We already discussed that the Q interpreter normally considers equations in the order in which they appear in a script. Hence, if all your equations go into a single script, you simply write down overlapping equations in the order in which they should be tried by the interpreter. However, if equations for the same expression pattern are scattered out over different modules, then the rule order also depends on the order in which the different modules are processed by the compiler. In the current implementation, the rules of each module are processed when the first import or include declaration for the module is encountered during a preorder traversal of the module graph starting at the main script, but you should not rely on this.
Hence it is usually a bad idea to have overlapping equations scattered out over different source files. However, in some situations this is hard to avoid. For instance, you might decide to put all "default rules" for certain expression patterns in a separate module. But if you simply import this module at the beginning of your main script then the default rules might override all the other equations.
To work around this, the Q language allows you to explicitly attach priorities to equations by means of a priority declaration, which has the following format:
declaration : '@' ['+'|'-'] unsigned-number |
This specifies the given number (which must be an integer, optionally
signed with `+' or `-') as the priority level of the
following equations. In the current implementation, priority levels must
be 32 bit integers, i.e., in the range
-0x80000000..0x7fffffff
. Rules are ordered according to their
priorities, highest priority first. Rules with the same priority are
considered in the order in which they appear in the script.
The priority level set with a priority declaration applies only to the
script in which the declaration is located. The default priority level,
which is in effect up to the first @
declaration, is 0.
Note that even with priority declarations the built-in operations still take priority over all user-defined rules, i.e., those operations effectively have infinite priority. (The same applies to external functions, see C. C Language Interface.)
As a simple (and contrived) example for the use of priority declarations, consider the following script:
@-1 foo X = -1; // "default" rule @0 foo X:Num = 0; // "standard" rule @+1 foo X:Int = 1; // "special case" rule |
The second equation is at the default priority level, 0. The first equation, although it comes before the second one, becomes a "fallback" rule by assigning it a low priority, -1. The last equation is put on a high priority level, +1, and hence overrides both preceding equations. The net effect is that the equations are actually ordered in reverse textual order. You can verify this in the interpreter:
==> foo 77 1 ==> foo 77.0 0 ==> foo () -1 |
Of course, this example is somewhat silly, because in a single script we can easily arrange equations in the correct textual order. However, if for some reason we had to put the three rules into three different modules, then using the given priority declaration ensures that the equations will always be applied in the correct order.
Priority levels should be used sparingly, if at all. Using low priorities to factor out a module with default rules can occasionally be quite useful, but overriding rules with high priorities is considered harmful and should be avoided whenever possible.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The present and the following section contain material for the technically interested reader which, while not being strictly necessary to work successfully with the Q programming language, helps to understand the operation of the Q interpreter and to write better Q scripts. You might wish to skim the following at first reading, and later return to it when you have become more proficient with the Q language and require some further background knowledge.
While the "rewriting model" of expression evaluation introduced in
7.6 Normal Forms and Reduction Strategy, provides a nice conceptual model
for the Q programmer, it is too inefficient to be a useful implementation
model for the Q interpreter. In particular, it would be rather inefficient to
actually construct the full expressions which occur as intermediate results
in the course of a reduction sequence, and repeatedly scan these expressions
for the leftmost-innermost redex. Instead, we can evaluate expressions by a
simple recursive procedure as follows. The input to the algorithm is an
expression X
to be evaluated.
X
is an application, list or tuple, evaluate the parts of X
recursively, from left to right. Proceed with step 2.
X
, invoke it and
return the corresponding value.
L=R
which is
applicable to X
. If there is no such rule, X
is in normal form
and is returned as the value of the expression. Otherwise we recursively
evaluate R
(with variables instantiated to their corresponding values)
and return it as the value of the original expression.
(The above description of course leaves out many important details. For
instance, the interpreter will also have to take care of def
'ed
symbols, and we will have to organize the search for an applicable equation.
However, here we only want to give the general idea, and not a complete
implementation of the interpreter.)
The above procedure can actually be implemented non-recursively if we keep track of the rules which are currently "active", together with the corresponding variable bindings. This information can easily be maintained on a stack. We illustrate this with a simple example. Recall the definition of the sum function:
sum [X|Xs] = X+sum Xs; sum [] = 0; |
The evaluation of sum [1,2,3]
then proceeds as follows:
sum [1,2,3] => 1+sum [2,3]: sum [2,3] => 2+sum [3]: sum [3] => 3+sum []: sum [] => 0 3+0 => 3 sum [3] => 3 2+3 => 5 sum [2,3] => 5 1+5 => 6 sum [1,2,3] => 6 |
Each new level of indentation indicates that we "suspend" the current rule (push it on the stack) and activate a new rule in order to evaluate some part of the right-hand side of the suspended rule. When all parts of the right-hand side have been evaluated, we return to the suspended rule (pop it from the stack) and actually perform the reduction. More precisely, we replace the left-hand side of the suspended rule by the result obtained by evaluating the right-hand side, which is already in normal form. (We will of course have to keep track of the results of intermediate reductions, which can be done on a second "evaluation" stack. When the evaluation of the right-hand side is finished, the corresponding result will be on top of the evaluation stack.)
If desired, we can easily reconstruct the "context" of a rule by following the chain of stacked rules starting from the current rule, and proceeding towards the bottom of the stack. For instance, when we perform the reduction
sum [] => 0 |
in the above example, the context of this rule is described by the following stacked rules:
sum [3] => 3+sum [] sum [2,3] => 2+sum [3] sum [1,2,3] => 1+sum [2,3] |
Thus, when sum []
gets reduced to 0
, we have actually completed
the following initial part of the reduction sequence:
sum [1,2,3] => 1+sum [2,3] => 1+(2+sum [3]) => 1+(2+(3+sum [])) => 1+(2+(3+0)) |
(Note that we never actually constructed intermediate results like the
expression 1+(2+(3+sum []))
. Rather, these expressions are represented
implicitly by the contents of the stack.)
We can also apply the above procedure to qualified rules accordingly.
When the interpreter comes to consider a conditional rule, it pushes it
onto the stack as usual. However, it then first starts to evaluate the
qualifying conditions and where
clauses of the rule. When the
value of a condition has been computed, we can check whether it
holds. If the computed value is neither true
nor false
,
the interpreter aborts the evaluation with a runtime error. If it is
true
, it proceeds by evaluating other qualifiers or the
right-hand side. If it is false
, however, it gives up on the
current rule (pops it from the stack), and considers the next applicable
equation. A similar approach is used to handle local variable
definitions.
For instance, consider once more the definition of the factorial function:
fac N = N*fac (N-1) if N>0; = 1 otherwise; |
The computation of fac 3
then proceeds as follows (the -!-
symbol indicates that the condition N>0
has evaluated to false
and hence the corresponding rule is abandoned):
fac 3 => 3*fac (3-1): 3>0 => true 3-1 => 2 fac 2 => 2*fac (2-1): 2>0 => true 2-1 => 1 fac 1 => 1*fac (1-1): 1>0 => true 1-1 => 0 fac 0 => 0*fac (0-1): 0>0 => false -!- fac 0 => 1 1*1 => 1 fac 1 => 1 2*1 => 2 fac 2 => 2 3*2 => 6 fac 3 => 6 |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The above equations for the sum
and fac
functions are examples
for recursive definitions. The computational processes generated from
such definitions are characterized by chains of suspended rules in which
the same rules are activated over and over again. For instance, an
evaluation of fac N
with N>0
requires N
recursive
activations of the rule:
fac N = N*fac (N-1) if N>0; |
Hence, if N
becomes very large, the recursive definition of fac
is in danger of running out of stack space. In the following, we show how to
avoid this defect by employing the technique of "tail-recursive
programming".
It is a well-known fact that the factorial function also has an "iterative"
implementation which can be executed in constant memory space. The idea is
to maintain a "running product" P
and a "counter" I
which
counts down from N
to 1
. The iterative algorithm can be written
down in a conventional programming language like Pascal as follows:
P := 1; I := N; while I>0 do begin P := P*I; I := I-1; end; |
While the Q language does not provide any special looping constructs (and
it also misses an assignment operation), there is an alternative definition
of fac
which takes up the idea of running products and counters and
implements the factorial function in an iterative fashion:
fac N = fac2 1 N; fac2 P I = fac2 (P*I) (I-1) if I>0; = P otherwise; |
Here, the "state variables" P
and I
are implemented as
arguments of an "auxiliary" function fac2
which is invoked from
fac
. Again, this is a recursive definition; it requires N
recursive applications of the rule
fac2 P I = fac2 (P*I) (I-1) if I>0; |
when fac N
is computed. However, in difference to our previous
definition of fac
, the recursive rule is always applied "on top"
of the target expression. Such rules are called tail-recursive.
(The name "tail recursion" comes from the fact that the recursive
application of fac2
is the last operation considered during the
leftmost-innermost evaluation of the right-hand side.)
For instance, the evaluation of fac 3
now proceeds as follows
(abbreviating reductions by built-in rules for `-' and `*'):
fac 3 => fac2 1 3 => fac2 (1*3) (3-1) => fac2 3 2 => fac2 (3*2) (2-1) => fac2 6 1 => fac2 (6*1) (1-1) => fac2 6 0 => 6 |
Tail-recursive definitions can be employed for implementing all kinds of functions which can be computed on a machine with a fixed set of "registers" and no auxiliary memory [Abelson/Susssman 1985]. For instance, here is a tail-recursive implementation of the Fibonacci function:
fib N = fib2 1 0 N; fib2 A B N = fib2 (A+B) A (N-1) if N>0; = B otherwise; |
(This definition also has the desirable side effect that it cuts down the exponential running time of the "naive" definition given in 7.1 Equations, to linear.)
The Q interpreter employs a clever optimization trick, commonly known as tail recursion optimization (see e.g., [Steele/Sussman 1975]), to actually execute tail-recursive definitions within constant stack space. Hence no special looping constructs are required for implementing iterative algorithms efficiently in the Q language.
Assume that in our stack model of expression evaluation we are working
on a reduction X => Y
, and that we already recursively
evaluated all parts of the right-hand side Y
, but not Y
itself. Furthermore, suppose that in order to evaluate Y
we will
have to apply the rule Y => Z
next. Then, instead of
keeping the rule X => Y
suspended and evaluating Y
using rule Y => Z
recursively, we can also immediately
perform the reduction X => Y
and replace this rule with
Y => Z
on the stack. Thus, the new rule Y =>
Z
will not require any additional stack space at all, but simply reuses
the existing "activation record" for the X => Y
rule. In
other words, instead of invoking the Y => Z
rule as a
"subroutine", we effectively perform a kind of "goto" to the new
rule. We also refer to this process as performing the tail
reduction X => Y
. The evaluation now proceeds as if we
had been working on the rule Y => Z
in the first place.
For instance, with the new definition of fac
the evaluation of
fac 3
would be carried out using only a single level of suspended
rules (again, the -!- symbol signals abortion of a rule with failing
qualifier):
fac 3 => fac2 1 3: fac2 1 3 => fac2 (1*3) (3-1): 3>0 => true 1*3 => 3 3-1 => 2 fac2 3 2 => fac2 (3*2) (2-1): 2>0 => true 3*2 => 6 2-1 => 1 fac2 6 1 => fac2 (6*1) (1-1): 1>0 => true 6*1 => 6 1-1 => 0 fac2 6 0 => fac2 (6*0) (0-1): 0>0 => false -!- fac2 6 0 => 6 |
Besides the tail recursion optimization technique discussed above, the Q
interpreter also automatically optimizes toplevel sequences
(i.e., applications of the ||
operator which are not inside a
nested subexpression) on the right-hand side of equations, s.t. basic
imperative-style looping constructs can be executed in a tail-recursive
fashion as well. Suppose, for instance, that we want to implement a
do
function which applies a function to each member of a list,
like the map
function discussed in 7.1 Equations, but instead
of collecting the results in an output list simply throws away the
intermediate values and returns ()
. This is useful, of course,
only if the function is evaluated solely for its side-effects, e.g., if
we want to print out all elements of a list. A straightforward
definition of do
looks as follows:
do F [] = (); do F [X|Xs] = F X || do F Xs; |
Now this definition is not tail-recursive in the strict sense
alluded to above, because the last application on the right-hand side of
the second rule actually involves (||)
and not the do
function. (Recall that an infix expression like X||Y
is nothing
but syntact sugar for the function application (||) X Y
.)
However, as a remedy, the Q interpreter actually implements toplevel
sequences on the right-hand side of a rule by direct stack manipulation.
That is, the first argument of a sequence is thrown away as soon as it
has been computed. By these means, the interpreter, after having
computed F X
, effectively carries out the reduction do F
[X|Xs] => do F Xs
on which it can perform tail recursion
optimization as usual. Thus the above definition actually works in
constant stack space, as one might reasonably expect.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
There are a few conditions which may force the Q interpreter to give up on a reduction sequence and abort the evaluation with a runtime error message. These conditions are listed below.
Ctl-C
). In this case it is possible to resume the evaluation in
the debugger (cf. D. Debugging), provided that debugging has been
enabled with the break on
command (see B.2 Command Language).
break
is on
,
the debugger is invoked to inform the user of the equation which gave
rise to the error condition.
All of the above error conditions can also be caught and handled by the running script in any desired manner, see 10.6 Exception Handling.
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |