| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The notion of expressions is an intrinsic part of most, if not all, programming languages. However, in most programming languages expressions are merely treated as descriptions of computations to be performed in order to obtain the value of an expression, which usually belongs to some basic type provided by the language. In contrast, in the Q language expressions are objects in their own right which, as we shall see, are manipulated according to computational rules specified by the programmer.
As usual, expressions are built from certain kinds of basic objects. In the Q language, these are integers, floating point numbers, strings, function and variable symbols. These objects are combined into larger, compound expressions by means of applications, lists and tuples. For convenience, the Q language also provides prefix and infix operator symbols for the most common operations; these are actually treated as "syntactic sugar" for applications. The syntax of all these entities is described by the following grammatical rules:
expression : identifier
| variable-identifier ':' identifier
| number
| string
| expression expression
| unary-op expression
| expression binary-op expression
| '(' [element-list] ')'
| '[' [element-list] ']'
| '(' op ')'
| '(' expression binary-op ')'
| '(' binary-op expression ')'
element-list : expression-list ['|' expression]
expression-list : expression {',' expression}
op : unary-op|binary-op
unary-op : '-'|'#'|'not'|'''|'`'|'~'
binary-op : '^'|'!'|'++'|'+'|'-'|'\'|'*'|'/'|'div'|'mod'
| 'and'|'or'|'and' 'then'|'or' 'else'
|'<'|'>'|'='|'<='|'>='|'<>'|'in'|'||'
|
6.1 Constants and Variables 6.2 Applications 6.3 Lists and Tuples 6.4 Built-In Operators
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The nonterminals identifier, number and string
occuring in the syntax rules above refer to the lexical entities
introduced in 3. Lexical Matters. Note that the Q language actually
distinguishes two different types of identifiers (function and variable
identifiers) and two different types of numeric quantities (integers and
floating point numbers). Integers are implemented as "bignums" using
the GNU multiprecision package (GMP); thus the size of an integer value
is only limited by available memory. Floating point values are
implemented using 64 bit (i.e., double precision) floating point
numbers; on most machines, these should provide nonzero absolute values
ranging from 1.7E-308 to 1.7E308 and a precision of 15 decimal digits.
String constants are character sequences internally represented as character
arrays permitting constant-time access to the individual characters of a
string. There is no a priori length restriction for string constants. In the
current implementation, the null character \0 is reserved as a string
terminator and must not be used as an ordinary string character.
Furthermore, the Q language provides four other built-in constants which
denote the truth values (true and false), and the empty list
and tuple ([] and (), to be discussed in 6.3 Lists and Tuples).
A variable symbol can be "bound" or "free", depending on the context
in which it occurs. We say that a variable is bound in an equation
if it occurs on the left-hand side of the equation. Otherwise, the
variable is a free variable. In this case, the variable may denote
a value (introduced with def), or simply stands for itself. In
any case, a variable is a legal atomic expression which may stand
wherever a constant is allowed.
On the left-hand side of equations, it is possible to declare that a variable is of a given type by using the notation:
variable:type |
This requires that you have previously declared type as a type identifier, see 5. Declarations. When a type identifier is specified with a variable, the variable will only match values belonging to the given type, cf. 7.5 Type Guards.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Application is probably the most important single construct in the Q
language. It allows you to apply one object (the "function") to another
one (the "argument"). This construct is used to denote "function calls"
such as sqrt 2 as well as "constructor terms" such as
bin X T1 T2 which encode tree-like data structures. Also, as
we will see in 6.4 Built-In Operators, the built-in operators +,
-, etc. are merely syntactic sugar for applications. Indeed,
applications could also be used to represent sequences of objects. However,
as this latter kind of objects tends to be used so frequently in Q scripts,
the Q language provides special support for sequences by means of the
built-in list and tuple constructs discussed in 6.3 Lists and Tuples.
As in other contemporary functional languages, application is a binary
operation written simply as juxtaposition: X Y denotes the
application of X to Y, both of which may be arbitrary
expressions themselves. Application associates to the left; the
construct X Y1 ... Yn is equivalent to (...((X Y1)
Y2) ...) Yn, and denotes the application of X to n arguments
Y1, ..., Yn. This style of writing function
applications is commonly referred to as currying, after the
American logician H.B. Curry. We will have to say more about this
shortly.
Here are some valid examples of applications:
sqrt 2 sqrt (Y+1) foo X (bar (Y-1)) Z |
Note that since application is left-associative, nested applications in
arguments must be set off with parentheses. For instance, foo (bar X)
applies foo to bar X, whereas foo bar X applies
foo to two arguments bar and X.
Since currying is so ubiquitous in functional programming, you should be well familiar with it, so let us briefly explain what it is, and what it is good for.
Functions of multiple arguments can generally be defined in two different ways:
max, which takes the maximum of two values, as follows:
max (X,Y) = X if X>Y;
= Y otherwise;
|
max function:
max X Y = X if X>Y;
= Y otherwise;
|
Here max X Y is to be read as (max X) Y, meaning that by
applying max to the first argument X, we obtain another
function max X, which, when applied to the second argument
Y, yields the maximum of X and Y.
The Q language supports both kinds of notations. Choosing between the
two is more than just a matter of taste. Besides saving parentheses,
curried functions have the chief advantage that they allow us to make
use of initial "parts" of a multi-argument function. For instance,
given the curried definition of max from above, max 0 can
be used to denote the function computing the "nonnegative part" of its
argument (which is the argument itself if it is nonnegative, and zero
otherwise). This does not work with the uncurried definition since that
definition requires us to specify both arguments of max in one
chunk; instead, we would have to start defining the derived function
from scratch.
Uncurried definitions also have their merits. In particular, they allow
us to define variadic functions, i.e., functions which can handle
a varying number of arguments. For instance, the following definition
enables us to apply the max function to both pairs and triples:
max (X,Y) = X if X>=Y;
= Y otherwise;
max (X,Y,Z) = max (X,max (Y,Z))
|
In fact, in the Q language it is possible to define generic functions
which apply to any number of arguments specified as a tuple. For
instance, the following version of max handles tuples of any size
at least 2:
max (X,Y) = X if X>=Y;
= Y otherwise;
max (X,Y|Zs) = max (X,max (Y|Zs));
|
(In the above example, the notation (...|Zs) is used to
denote a tuple whose "remaining elements" form the tuple
Zs. This is explained in the following section.)
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Besides strings, the Q language provides two closely related general constructs for representing sequences of objects: lists and tuples.
The constant [] denotes the empty list. In general, a list consisting
of elements X1, X2, ..., Xn is denoted
[X1,X2,...,Xn]. For instance, [a,b,c] consists of three
elements (symbols) a, b and c. It is possible to have
nested lists, as in [a,[b,c]] which consists of two elements, the
symbol a and the list [b,c].
As in Prolog, lists are represented in a right-recursive fashion using a
binary constructor [|] which takes as its arguments the head element
of the list and the list of the remaining elements. Thus,
[a,b,c] is simply a convenient shorthand notation for
[a|[b|[c|[]]]]. You can also mix both styles of notation; for
instance, [a,b|[c,d]] is another way to represent the 4-element list
[a,b,c,d].
Note that [a|[b,c]] is different from [a,[b,c]]: the
former denotes a three-element list, while the latter is a two-element
list whose second member happens to be a list itself. Also note that the
[|] constructor can in fact be applied to any pair of values (the
second value does not have to be a list); e.g., [a|b] is a
perfectly well-formed expression (although the built-in length, indexing
and concatenation operations described in 6.4.5 String/List/Tuple Operators, will fail on such values).
Tuples work in much the same fashion as lists. The constant ()
denotes the empty tuple, and a tuple consisting of elements X1,
X2, ..., Xn is written as (X1,X2,...,Xn),
which is equivalent to (X1|(X2|...|(Xn|()))), where the
notation (X|Xs) denotes a tuple consisting of a first element
X and the tuple Xs of the remaining elements.
The Q language also has the notion of a 1-tuple
(X)=(X|()). It is important to note that in Q a 1-tuple is really
different from its single member. Otherwise, there could be no nested
1-tuples -- in fact, due to the right-recursive nature of tuples in Q,
there would be no nested tuples at the end of a tuple at all. Therefore
Q distinguishes between 1-tuples and their members, and if you want to
define a function operating on both 1-tuples and ordinary expressions,
you will have to provide equations for both.
Unfortunately, since parentheses are used for two different purposes,
namely for expression grouping and for tuples, 1-tuples also give rise
to an ugly syntactic ambiguity: should an expression of the form
(X) denote a 1-tuple or a simple parenthesized expression? For
notational convenience, Q adopts the following convention: any
"primary" expression (i.e., anything which binds stronger than an
application) can be turned into a 1-tuple simply by enclosing it in
parentheses. So, for instance, (99), ((-99)), ('X),
(foo), ((foo (bar X))) all denote 1-tuples. Note the extra
level of parentheses around compound expressions required to distinguish
1-tuples from ordinary parenthesized expressions. (This also applies to
negative numbers like -99, which syntactically are applications
of unary minus, i.e., a compound expression.) If the above sounds
confusing to you, here is a simple rule of thumb: an extra level of
parentheses is required whenever the target expression must be
parenthesized when occuring as an argument of a function
application. Nested 1-tuples can be obtained by adding extra levels of
parentheses accordingly. E.g., ((99)) and (((-99))) both
denote a 1-tuple of a 1-tuple of a number.
(NB: A common pitfall with Q's tuple notation is that one easily gets
unwanted 1-tuples if one has the habit of decorating expressions with
superflous parentheses. This is undoubtedly one of Q's worst
"features", but the author just could not put up with alternative
notations like curly braces, or the unwieldy (X|()) syntax.)
The big difference between lists and tuples in the Q language is that,
while lists are always represented as recursive data objects using a
binary constructor symbol (just the way that they are written), tuples
are actually implemented as "vectors" which are stored as contiguous
sequences in memory. (Of course, this only works for "well-formed"
tuples; if the "remainder" Xs of a tuple (X|Xs) is not a
tuple, then this tuple can only be represented using an explicit
application of the tuple constructor.) Therefore tuples normally use up
much less memory than lists of the same size, and they also allow
constant-time access to their members. The size of a tuple can be
determined in constant time as well. In contrast, the same operations,
when applied to a list, require time proportional to the size of the
list. On the other hand, lists are more efficient when accessing the
remainder part of a list using pattern matching, and when a new element
is prepended to a list using the list constructor, which can both be
done in constant time. Here a tuple needs time proportional to its size,
since the member sequence of the original tuple must be copied when
accessing its remainder part or when constructing a new tuple. (This
also implies that converting a list to a tuple using the tuple constructor
actually takes quadratic time and hence is quite slow for larger
sequences; as a remedy, a built-in tuple function is provided
which does the conversion in linear time, see 10.4 Conversion Functions.)
These tradeoffs should be carefully considered when deciding whether to implement a given sequence as a list or a tuple. Tuples are usually the best choice for implementing fixed sequences requiring fast random access to its individual members, whereas lists provide an efficient means to represent sequences which have to be traversed and manipulated very frequently.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Besides the forms of expressions already discussed above, the Q language also provides a collection of infix and prefix operator symbols for common arithmetic, logical, relational and other operations. A complete list of these symbols is given below, in order of decreasing precedence:
' ` ~
^ !
- # not
* / div mod and and-then
++ + - or or-else
< > = <= >= <> in
||
Most of these symbols have their usual meaning; a closer description
follows below. All binary operators are left-associative, with the
exception of ^ and ! which associate to the right, and the
relational operators which are nonassociative. Application takes
precedence over all these operations except the quotation operators;
hence sqrt X^3 denotes (sqrt X)^3 and not
sqrt (X^3). The quotation operators have the highest possible
precedence, and hence bind stronger than even applications. Parentheses
are used to group expressions and overwrite default precedences and
associativity as usual. But note that extra parentheses around a
primary expression (identifier, operator symbol, number, string,
list, tuple, quoted or parenthesized expression, i.e., anything which
binds stronger than function application) turns the expression into a
1-tuple, see 6.3 Lists and Tuples. C programmers will also note that
the logical operators have the same "wrong" precedence as in
Pascal. Thus you should make sure that you always parenthesize
relational expressions when combining them with logical connectives.
You should also note that unary minus must be parenthesized when
appearing in an argument of a function application. For instance,
although foo X -Y is a perfectly well-formed expression, it
denotes (foo X) - Y and not foo X (-Y) as you might
have intended by the spacing which is however ignored by the Q
compiler. As already noted in 3. Lexical Matters, unary minus in
front of a number is special; it causes values such as -2 to be
interpreted as negative numbers rather than denoting an explicit
application of the unary minus operator (an explicit application of
unary minus can be denoted using the built-in minus symbol; see
below). The rules for parenthesizing negative numbers are the same as
those for unary minus; you have to write foo X (-2) instead of
foo X -2 (which denotes (foo X) - 2).
In the Q language, expressions involving operators are merely syntactic sugar
for applications. By enclosing an operator in parentheses, you can turn it
into an ordinary prefix function. For instance, X+Y is exactly the same
as (+) X Y, and not X actually denotes the application
(not) X. The built-in operators simply provide a convenient way for
applying these operations to their arguments. Moreover, you can also turn a
binary operator into a unary function by specifying either the left or the
right operand. E.g., (1/) denotes the reciprocal and (*2) the
doubling function. Such constructs are commonly called operator
sections. Inside a section, the usual precedence and associativity rules
apply. For instance, the X+3 subterm in (*(X+3)) must be
parenthesized because * has a higher precedence than +, and hence
the partial expression (*X+3) is not well-formed.
The - operator plays a somewhat awkward role in the syntax, since it is
used to denote both unary and binary minus. Q adopts the convention that the
notation (-) always denotes binary minus; unary minus may be
denoted by the built-in minus function. Thus the expression minus
X applies unary minus to X. Note that this construct always denotes an
explicit application of the unary minus operation. For instance, minus
5 denotes the application of unary minus to the integer 5, while
-5 is a negative integer.
We also remark that the construct (-X) is not an operator
section, but a parenthesized expression involving unary minus. The easiest way
to construct an operator section which subtracts a given value from its
argument is to formulate the function using the addition operator as in
(+(-X)).
6.4.1 Quotation Operators 6.4.2 Arithmetic Operators 6.4.3 Relational Operators 6.4.4 Logical and Bit Operators 6.4.5 String/List/Tuple Operators 6.4.6 The Sequence Operator
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The ' (quote), ` (backquote) and ~ (tilde)
operators are used to deal with so-called special forms. The quote
operator quotes an expression as a literal; it is a constructor symbol
and hence becomes part of the quoted expression. The backquote and tilde
operators are used to "splice" and "force" subterms in an
expression. We postpone a discussion of these operations until
9. Special Forms.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The operators +, -, *, / denote the sum, the
difference, the product and the quotient of two numeric values,
respectively. As already noted, - is also used as unary
minus. The operators div and mod denote integer quotient
and modulo, respectively; they only apply to integers. The ^
operator denotes exponentiation, as defined by X^Y = exp (ln
X*Y); it always returns a floating point value. (If X<0 then
X^Y is defined only if Y is an integer. 0^0 is left
undefined as well, so if you would like to have that 0^0 = 1 then
you must add corresponding equations yourself. Also note that the
complex.q standard library module extends the built-in definition
of the exponentiation operator to handle the case that X<0 with
general exponent; see 11.12 Complex Numbers.)
The argument and corresponding result types of these operations are
summarized in the following table (Int denotes integer, Float
floating point, and Num numeric (integer or floating point) values):
+ - *
Int Int -> Int Int Float -> Float Float Int -> Float Float Float -> Float
/ ^
Num Num -> Float
div mod
Int Int -> Int
- (unary)
Int -> Int Float -> Float
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The operators <, >, =, <=, >=, <>
are binary predicates meaning "less", "greater", "equal", "less or
equal", "greater or equal" and "not equal", respectively. The built-in
definition of these operations only applies to numbers, strings and truth
values. All relational operators return truth values (true,
false) as results. Strings are compared lexicographically, on the basis
of the local character encoding. Truth values are ordered by false <
true.
If you would like to compare other types of values than the basic
objects mentioned above, normally you will have to provide suitable
definitions yourself. For instance, you might wish to extend the
equality operation to other built-in and user-defined data structures
such as lists, trees, etc., by overloading the = operator
accordingly. The following equations implement an equality predicate on
lists (the parentheses on the left-hand side are necessary to prevent
the equality operation to be interpreted as the equal sign separating
left-hand and right-hand side of an equation):
([] = []) = true; ([] = [Y|Ys]) = false; ([X|Xs] = []) = false; ([X|Xs] = [Y|Ys]) = (X=Y) and then (Xs=Ys); |
Rules for other comparison operators (<>, <=, etc.) could
be added in the same fashion. Actually, the standard library provides
corresponding definitions; see 11. The Standard Library.
A special case arises with types consisting of only nullary constructor
symbols declared with const, so-called enumeration types,
see 8. Types. The values of such a type can always be compared with
the relational operators, using the order in which they are listed in
the type declaration. (A special case of this is the order of the
built-in truth values.) For instance, assume that the Day type is
declared as follows:
type Day = const sun, mon, tue, wed, thu, fri, sat; |
Then the listed constants will be ordered as sun < mon < ... <
sat.
Besides the equality predicate, the Q language also provides a notion of "syntactic" equality which applies to all kinds of expressions; see 7.2 Non-Linear Equations, for details.
The Q language provides one other relational operator, the in
operator, which does not have a predefined meaning and hence can be
employed by the programmer for his own purposes. In the standard
library, the in operator is used to form list and stream
comprehensions, see 11.8 List Comprehensions, and 11.9 Streams.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The logical operations not, and, or denote logical
negation, conjunction and disjunction, respectively. These operators
take truth values as their arguments. They are defined in a
straightforward manner:
not true = false; not false = true; true and true = true; true and false = false; false and true = false; false and false = false; true or true = true; true or false = true; false or true = true; false or false = false; |
Like most other programming languages, Q also has logical connectives for
the short-circuit evaluation of logical expressions, which are
denoted by the operators and then and or else. These
operations are actually implemented as "special forms" which evaluate
their arguments from left to right only as far as required to determine
the value of the expression (cf. 9. Special Forms). They are defined
by the following built-in equations:
true and then X = X; false and then X = false; false or else X = X; true or else X = true; |
The first operand is always evaluated. Depending on its value, the
second operand may not be evaluated at all. For instance, if X
evaluates to false, then X and then Y immediately
reduces to false, without ever having to evaluate the second
argument Y. On the other hand, if X evaluates to
true, then Y is evaluated and returned as the value of the
expression. The or else operation works analogously.
One reason for using short-circuit evaluation is efficiency: prevent unnecessary computations when an initial part of a logical expression already determines the value of the entire expression. Furthermore, short-circuit evaluation is sometimes essential to check a condition before evaluating an expression depending on the validity of this condition. For instance:
(X <> 0) and then (foo (1/X) > 0) |
You should also note that, according to the above definitions, X
and then Y and X or else Y are always reduced if X
is a truth value, even if the second operand Y does not
evaluate to a truth value. This may look a bit weird at first, but it is
in fact the most reasonable way to implement short-circuit evaluation in
the Q language, since Q uses dynamic typing, and hence the type of an
expression is only known after it has been evaluated. In fact,
this "feature" can be quite useful at times. For instance, you can
also use and then to write down a simple kind of conditional
expression like
check X and then bar X |
where bar X is the value you wish to compute, but only if the
condition check X holds.
The Q interpreter also uses the operators not, and and
or to denote bitwise logical operations on integers. Thus,
not X denotes the one's complement of an integer X, and
X and Y and X or Y the bitwise logical conjunction and
disjunction of integers X and Y, respectively. These
operations behave as if negative integers were represented in two's
complement (although GMP actually uses a sign-magnitude
representation). This means that for each integer X, -X =
not X + 1, or, equivalently, not X = -X-1. For instance:
==> 17 and not 13 16 ==> 17 or not 13 -13 ==> not _ 12 |
These results become clear when considering that 17 has bits 0
(=1) and 4 (=16) on (and all other bits off), while not 13 has
bits 0 (=1), 2 (=4) and 3 (=8) off (and all other bits on). Thus,
writing these numbers as unsigned bit sequences with most significant
bits first, where ...1 denotes an infinite sequence of leading
1's, we have:
17 and not 13 = 10001 and not 1101 = 10001 and ...10010 = 10000 = 16 17 or not 13 = 10001 or not 1101 = 10001 or ...10010 = ...10011 (= not 1100 = not 12) = -13 |
Together with the built-in bit shift operations, the bitwise logical operations can also be used to implement "bitsets", i.e., sets of nonnegative integers represented by bit patterns. See 10.1 Arithmetic Functions, for details.
In case you are wondering about "exclusive or:" While this operation is not provided as a builtin, you can easily define it yourself as follows:
xor X Y = (X or Y) and not (X and Y); |
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The ++ operator denotes concatenation, # is the length or
size operator, and the subscript operator ! is used for
indexing. These operations work consistently on strings, lists and
tuples. For instance:
"abc"++"xy" => "abcxy" [a,b,c]++[x,y] => [a,b,c,x,y] (a,b,c)++(x,y) => (a,b,c,x,y) #"abc" => 3 #[a,b,c] => 3 #(a,b,c) => 3 "abc"!1 => "b" [a,b,c]!1 => b (a,b,c)!1 => b |
Note that indexing with the subscript operator starts at zero,
s.t. X!0 and X!(#X-1) denote the first and last member of
a string, list or tuple, respectively. We also remark that since
! is right-associative, double subscripts may have to be
parenthesized. For instance, compare (X!I)!J and
X!I!J=X!(I!J).
It should also be noted that all list and tuple operators check that
their first argument is a well-formed list or tuple value. However, the
second argument of the ++ operator may in fact be any value. List
concatenation just replaces the empty sublist marking the end of its
first list argument with the second argument, as if it was defined by
the following equations:
[]++Ys = Ys; [X|Xs]++Ys = [X|Xs++Ys]; |
Hence we have that, e.g., []++1 => 1 and [1,2]++3
=> [1,2|3], which may look odd, but is consistent with the above
equations. Tuple concatenation works in an analogous manner.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The sequence operator lets you evaluate sequences of expressions in a given order. The value of the sequence is given by the rightmost expression. That is,
X || Y => Y. |
The sole purpose of this construct is to allow operations with side-effects (such as I/O functions) to be carried out sequentially. A typical example is
writes "Input: " || reads |
which prints a prompt on the terminal and then reads in a string. (The
built-in functions writes and reads are described in
10. Built-In Functions.)
| [ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |