[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6. Expressions

The notion of expressions is an intrinsic part of most, if not all, programming languages. However, in most programming languages expressions are merely treated as descriptions of computations to be performed in order to obtain the value of an expression, which usually belongs to some basic type provided by the language. In contrast, in the Q language expressions are objects in their own right which, as we shall see, are manipulated according to computational rules specified by the programmer.

As usual, expressions are built from certain kinds of basic objects. In the Q language, these are integers, floating point numbers, strings, function and variable symbols. These objects are combined into larger, compound expressions by means of applications, lists and tuples. For convenience, the Q language also provides prefix and infix operator symbols for the most common operations; these are actually treated as "syntactic sugar" for applications. The syntax of all these entities is described by the following grammatical rules:

 
expression              : identifier
                        | variable-identifier ':' identifier
                        | number
                        | string
                        | expression expression
                        | unary-op expression
                        | expression binary-op expression
                        | '(' [element-list] ')'
                        | '[' [element-list] ']'
                        | '(' op ')'
                        | '(' expression binary-op ')'
                        | '(' binary-op expression ')'

element-list            : expression-list ['|' expression]

expression-list         : expression {',' expression}

op                      : unary-op|binary-op

unary-op                : '-'|'#'|'not'|'''|'`'|'~'

binary-op               : '^'|'!'|'++'|'+'|'-'|'\'|'*'|'/'|'div'|'mod'
                        | 'and'|'or'|'and' 'then'|'or' 'else'
                        |'<'|'>'|'='|'<='|'>='|'<>'|'in'|'||'

6.1 Constants and Variables  
6.2 Applications  
6.3 Lists and Tuples  
6.4 Built-In Operators  


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.1 Constants and Variables

The nonterminals identifier, number and string occuring in the syntax rules above refer to the lexical entities introduced in 3. Lexical Matters. Note that the Q language actually distinguishes two different types of identifiers (function and variable identifiers) and two different types of numeric quantities (integers and floating point numbers). Integers are implemented as "bignums" using the GNU multiprecision package (GMP); thus the size of an integer value is only limited by available memory. Floating point values are implemented using 64 bit (i.e., double precision) floating point numbers; on most machines, these should provide nonzero absolute values ranging from 1.7E-308 to 1.7E308 and a precision of 15 decimal digits.

String constants are character sequences internally represented as character arrays permitting constant-time access to the individual characters of a string. There is no a priori length restriction for string constants. In the current implementation, the null character \0 is reserved as a string terminator and must not be used as an ordinary string character.

Furthermore, the Q language provides four other built-in constants which denote the truth values (true and false), and the empty list and tuple ([] and (), to be discussed in 6.3 Lists and Tuples).

A variable symbol can be "bound" or "free", depending on the context in which it occurs. We say that a variable is bound in an equation if it occurs on the left-hand side of the equation. Otherwise, the variable is a free variable. In this case, the variable may denote a value (introduced with def), or simply stands for itself. In any case, a variable is a legal atomic expression which may stand wherever a constant is allowed.

On the left-hand side of equations, it is possible to declare that a variable is of a given type by using the notation:

 
variable:type

This requires that you have previously declared type as a type identifier, see 5. Declarations. When a type identifier is specified with a variable, the variable will only match values belonging to the given type, cf. 7.5 Type Guards.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.2 Applications

Application is probably the most important single construct in the Q language. It allows you to apply one object (the "function") to another one (the "argument"). This construct is used to denote "function calls" such as sqrt 2 as well as "constructor terms" such as bin X T1 T2 which encode tree-like data structures. Also, as we will see in 6.4 Built-In Operators, the built-in operators +, -, etc. are merely syntactic sugar for applications. Indeed, applications could also be used to represent sequences of objects. However, as this latter kind of objects tends to be used so frequently in Q scripts, the Q language provides special support for sequences by means of the built-in list and tuple constructs discussed in 6.3 Lists and Tuples.

As in other contemporary functional languages, application is a binary operation written simply as juxtaposition: X Y denotes the application of X to Y, both of which may be arbitrary expressions themselves. Application associates to the left; the construct X Y1 ... Yn is equivalent to (...((X Y1) Y2) ...) Yn, and denotes the application of X to n arguments Y1, ..., Yn. This style of writing function applications is commonly referred to as currying, after the American logician H.B. Curry. We will have to say more about this shortly.

Here are some valid examples of applications:

 
sqrt 2
sqrt (Y+1)
foo X (bar (Y-1)) Z

Note that since application is left-associative, nested applications in arguments must be set off with parentheses. For instance, foo (bar X) applies foo to bar X, whereas foo bar X applies foo to two arguments bar and X.

Since currying is so ubiquitous in functional programming, you should be well familiar with it, so let us briefly explain what it is, and what it is good for.

Functions of multiple arguments can generally be defined in two different ways:

The Q language supports both kinds of notations. Choosing between the two is more than just a matter of taste. Besides saving parentheses, curried functions have the chief advantage that they allow us to make use of initial "parts" of a multi-argument function. For instance, given the curried definition of max from above, max 0 can be used to denote the function computing the "nonnegative part" of its argument (which is the argument itself if it is nonnegative, and zero otherwise). This does not work with the uncurried definition since that definition requires us to specify both arguments of max in one chunk; instead, we would have to start defining the derived function from scratch.

Uncurried definitions also have their merits. In particular, they allow us to define variadic functions, i.e., functions which can handle a varying number of arguments. For instance, the following definition enables us to apply the max function to both pairs and triples:

 
max (X,Y)               = X if X>=Y;
                        = Y otherwise;
max (X,Y,Z)             = max (X,max (Y,Z))

In fact, in the Q language it is possible to define generic functions which apply to any number of arguments specified as a tuple. For instance, the following version of max handles tuples of any size at least 2:

 
max (X,Y)               = X if X>=Y;
                        = Y otherwise;
max (X,Y|Zs)            = max (X,max (Y|Zs));

(In the above example, the notation (...|Zs) is used to denote a tuple whose "remaining elements" form the tuple Zs. This is explained in the following section.)


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.3 Lists and Tuples

Besides strings, the Q language provides two closely related general constructs for representing sequences of objects: lists and tuples.

The constant [] denotes the empty list. In general, a list consisting of elements X1, X2, ..., Xn is denoted [X1,X2,...,Xn]. For instance, [a,b,c] consists of three elements (symbols) a, b and c. It is possible to have nested lists, as in [a,[b,c]] which consists of two elements, the symbol a and the list [b,c].

As in Prolog, lists are represented in a right-recursive fashion using a binary constructor [|] which takes as its arguments the head element of the list and the list of the remaining elements. Thus, [a,b,c] is simply a convenient shorthand notation for [a|[b|[c|[]]]]. You can also mix both styles of notation; for instance, [a,b|[c,d]] is another way to represent the 4-element list [a,b,c,d].

Note that [a|[b,c]] is different from [a,[b,c]]: the former denotes a three-element list, while the latter is a two-element list whose second member happens to be a list itself. Also note that the [|] constructor can in fact be applied to any pair of values (the second value does not have to be a list); e.g., [a|b] is a perfectly well-formed expression (although the built-in length, indexing and concatenation operations described in 6.4.5 String/List/Tuple Operators, will fail on such values).

Tuples work in much the same fashion as lists. The constant () denotes the empty tuple, and a tuple consisting of elements X1, X2, ..., Xn is written as (X1,X2,...,Xn), which is equivalent to (X1|(X2|...|(Xn|()))), where the notation (X|Xs) denotes a tuple consisting of a first element X and the tuple Xs of the remaining elements.

The Q language also has the notion of a 1-tuple (X)=(X|()). It is important to note that in Q a 1-tuple is really different from its single member. Otherwise, there could be no nested 1-tuples -- in fact, due to the right-recursive nature of tuples in Q, there would be no nested tuples at the end of a tuple at all. Therefore Q distinguishes between 1-tuples and their members, and if you want to define a function operating on both 1-tuples and ordinary expressions, you will have to provide equations for both.

Unfortunately, since parentheses are used for two different purposes, namely for expression grouping and for tuples, 1-tuples also give rise to an ugly syntactic ambiguity: should an expression of the form (X) denote a 1-tuple or a simple parenthesized expression? For notational convenience, Q adopts the following convention: any "primary" expression (i.e., anything which binds stronger than an application) can be turned into a 1-tuple simply by enclosing it in parentheses. So, for instance, (99), ((-99)), ('X), (foo), ((foo (bar X))) all denote 1-tuples. Note the extra level of parentheses around compound expressions required to distinguish 1-tuples from ordinary parenthesized expressions. (This also applies to negative numbers like -99, which syntactically are applications of unary minus, i.e., a compound expression.) If the above sounds confusing to you, here is a simple rule of thumb: an extra level of parentheses is required whenever the target expression must be parenthesized when occuring as an argument of a function application. Nested 1-tuples can be obtained by adding extra levels of parentheses accordingly. E.g., ((99)) and (((-99))) both denote a 1-tuple of a 1-tuple of a number.

(NB: A common pitfall with Q's tuple notation is that one easily gets unwanted 1-tuples if one has the habit of decorating expressions with superflous parentheses. This is undoubtedly one of Q's worst "features", but the author just could not put up with alternative notations like curly braces, or the unwieldy (X|()) syntax.)

The big difference between lists and tuples in the Q language is that, while lists are always represented as recursive data objects using a binary constructor symbol (just the way that they are written), tuples are actually implemented as "vectors" which are stored as contiguous sequences in memory. (Of course, this only works for "well-formed" tuples; if the "remainder" Xs of a tuple (X|Xs) is not a tuple, then this tuple can only be represented using an explicit application of the tuple constructor.) Therefore tuples normally use up much less memory than lists of the same size, and they also allow constant-time access to their members. The size of a tuple can be determined in constant time as well. In contrast, the same operations, when applied to a list, require time proportional to the size of the list. On the other hand, lists are more efficient when accessing the remainder part of a list using pattern matching, and when a new element is prepended to a list using the list constructor, which can both be done in constant time. Here a tuple needs time proportional to its size, since the member sequence of the original tuple must be copied when accessing its remainder part or when constructing a new tuple. (This also implies that converting a list to a tuple using the tuple constructor actually takes quadratic time and hence is quite slow for larger sequences; as a remedy, a built-in tuple function is provided which does the conversion in linear time, see 10.4 Conversion Functions.)

These tradeoffs should be carefully considered when deciding whether to implement a given sequence as a list or a tuple. Tuples are usually the best choice for implementing fixed sequences requiring fast random access to its individual members, whereas lists provide an efficient means to represent sequences which have to be traversed and manipulated very frequently.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.4 Built-In Operators

Besides the forms of expressions already discussed above, the Q language also provides a collection of infix and prefix operator symbols for common arithmetic, logical, relational and other operations. A complete list of these symbols is given below, in order of decreasing precedence:

' ` ~
quotation operators (unary)
^ !
exponentiation and subscript
- # not
prefix operators (unary)
* / div mod and and-then
multiplication operators
++ + - or or-else
addition operators
< > = <= >= <> in
relational operators
||
sequence operator

Most of these symbols have their usual meaning; a closer description follows below. All binary operators are left-associative, with the exception of ^ and ! which associate to the right, and the relational operators which are nonassociative. Application takes precedence over all these operations except the quotation operators; hence sqrt X^3 denotes (sqrt X)^3 and not sqrt (X^3). The quotation operators have the highest possible precedence, and hence bind stronger than even applications. Parentheses are used to group expressions and overwrite default precedences and associativity as usual. But note that extra parentheses around a primary expression (identifier, operator symbol, number, string, list, tuple, quoted or parenthesized expression, i.e., anything which binds stronger than function application) turns the expression into a 1-tuple, see 6.3 Lists and Tuples. C programmers will also note that the logical operators have the same "wrong" precedence as in Pascal. Thus you should make sure that you always parenthesize relational expressions when combining them with logical connectives.

You should also note that unary minus must be parenthesized when appearing in an argument of a function application. For instance, although foo X -Y is a perfectly well-formed expression, it denotes (foo X) - Y and not foo X (-Y) as you might have intended by the spacing which is however ignored by the Q compiler. As already noted in 3. Lexical Matters, unary minus in front of a number is special; it causes values such as -2 to be interpreted as negative numbers rather than denoting an explicit application of the unary minus operator (an explicit application of unary minus can be denoted using the built-in minus symbol; see below). The rules for parenthesizing negative numbers are the same as those for unary minus; you have to write foo X (-2) instead of foo X -2 (which denotes (foo X) - 2).

In the Q language, expressions involving operators are merely syntactic sugar for applications. By enclosing an operator in parentheses, you can turn it into an ordinary prefix function. For instance, X+Y is exactly the same as (+) X Y, and not X actually denotes the application (not) X. The built-in operators simply provide a convenient way for applying these operations to their arguments. Moreover, you can also turn a binary operator into a unary function by specifying either the left or the right operand. E.g., (1/) denotes the reciprocal and (*2) the doubling function. Such constructs are commonly called operator sections. Inside a section, the usual precedence and associativity rules apply. For instance, the X+3 subterm in (*(X+3)) must be parenthesized because * has a higher precedence than +, and hence the partial expression (*X+3) is not well-formed.

The - operator plays a somewhat awkward role in the syntax, since it is used to denote both unary and binary minus. Q adopts the convention that the notation (-) always denotes binary minus; unary minus may be denoted by the built-in minus function. Thus the expression minus X applies unary minus to X. Note that this construct always denotes an explicit application of the unary minus operation. For instance, minus 5 denotes the application of unary minus to the integer 5, while -5 is a negative integer.

We also remark that the construct (-X) is not an operator section, but a parenthesized expression involving unary minus. The easiest way to construct an operator section which subtracts a given value from its argument is to formulate the function using the addition operator as in (+(-X)).

6.4.1 Quotation Operators  
6.4.2 Arithmetic Operators  
6.4.3 Relational Operators  
6.4.4 Logical and Bit Operators  
6.4.5 String/List/Tuple Operators  
6.4.6 The Sequence Operator  


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.4.1 Quotation Operators

The ' (quote), ` (backquote) and ~ (tilde) operators are used to deal with so-called special forms. The quote operator quotes an expression as a literal; it is a constructor symbol and hence becomes part of the quoted expression. The backquote and tilde operators are used to "splice" and "force" subterms in an expression. We postpone a discussion of these operations until 9. Special Forms.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.4.2 Arithmetic Operators

The operators +, -, *, / denote the sum, the difference, the product and the quotient of two numeric values, respectively. As already noted, - is also used as unary minus. The operators div and mod denote integer quotient and modulo, respectively; they only apply to integers. The ^ operator denotes exponentiation, as defined by X^Y = exp (ln X*Y); it always returns a floating point value. (If X<0 then X^Y is defined only if Y is an integer. 0^0 is left undefined as well, so if you would like to have that 0^0 = 1 then you must add corresponding equations yourself. Also note that the complex.q standard library module extends the built-in definition of the exponentiation operator to handle the case that X<0 with general exponent; see 11.12 Complex Numbers.)

The argument and corresponding result types of these operations are summarized in the following table (Int denotes integer, Float floating point, and Num numeric (integer or floating point) values):

+ - *
Int Int -> Int
Int Float -> Float
Float Int -> Float
Float Float -> Float
/ ^
Num Num -> Float
div mod
Int Int -> Int
- (unary)
Int -> Int
Float -> Float


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.4.3 Relational Operators

The operators <, >, =, <=, >=, <> are binary predicates meaning "less", "greater", "equal", "less or equal", "greater or equal" and "not equal", respectively. The built-in definition of these operations only applies to numbers, strings and truth values. All relational operators return truth values (true, false) as results. Strings are compared lexicographically, on the basis of the local character encoding. Truth values are ordered by false < true.

If you would like to compare other types of values than the basic objects mentioned above, normally you will have to provide suitable definitions yourself. For instance, you might wish to extend the equality operation to other built-in and user-defined data structures such as lists, trees, etc., by overloading the = operator accordingly. The following equations implement an equality predicate on lists (the parentheses on the left-hand side are necessary to prevent the equality operation to be interpreted as the equal sign separating left-hand and right-hand side of an equation):

 
([] = [])               = true;
([] = [Y|Ys])           = false;
([X|Xs] = [])           = false;
([X|Xs] = [Y|Ys])       = (X=Y) and then (Xs=Ys);

Rules for other comparison operators (<>, <=, etc.) could be added in the same fashion. Actually, the standard library provides corresponding definitions; see 11. The Standard Library.

A special case arises with types consisting of only nullary constructor symbols declared with const, so-called enumeration types, see 8. Types. The values of such a type can always be compared with the relational operators, using the order in which they are listed in the type declaration. (A special case of this is the order of the built-in truth values.) For instance, assume that the Day type is declared as follows:

 
type Day = const sun, mon, tue, wed, thu, fri, sat;

Then the listed constants will be ordered as sun < mon < ... < sat.

Besides the equality predicate, the Q language also provides a notion of "syntactic" equality which applies to all kinds of expressions; see 7.2 Non-Linear Equations, for details.

The Q language provides one other relational operator, the in operator, which does not have a predefined meaning and hence can be employed by the programmer for his own purposes. In the standard library, the in operator is used to form list and stream comprehensions, see 11.8 List Comprehensions, and 11.9 Streams.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.4.4 Logical and Bit Operators

The logical operations not, and, or denote logical negation, conjunction and disjunction, respectively. These operators take truth values as their arguments. They are defined in a straightforward manner:

 
not true                = false;
not false               = true;

true and true           = true;
true and false          = false;
false and true          = false;
false and false         = false;

true or true            = true;
true or false           = true;
false or true           = true;
false or false          = false;

Like most other programming languages, Q also has logical connectives for the short-circuit evaluation of logical expressions, which are denoted by the operators and then and or else. These operations are actually implemented as "special forms" which evaluate their arguments from left to right only as far as required to determine the value of the expression (cf. 9. Special Forms). They are defined by the following built-in equations:

 
true and then X         = X;
false and then X        = false;

false or else X         = X;
true or else X          = true;

The first operand is always evaluated. Depending on its value, the second operand may not be evaluated at all. For instance, if X evaluates to false, then X and then Y immediately reduces to false, without ever having to evaluate the second argument Y. On the other hand, if X evaluates to true, then Y is evaluated and returned as the value of the expression. The or else operation works analogously.

One reason for using short-circuit evaluation is efficiency: prevent unnecessary computations when an initial part of a logical expression already determines the value of the entire expression. Furthermore, short-circuit evaluation is sometimes essential to check a condition before evaluating an expression depending on the validity of this condition. For instance:

 
(X <> 0) and then (foo (1/X) > 0)

You should also note that, according to the above definitions, X and then Y and X or else Y are always reduced if X is a truth value, even if the second operand Y does not evaluate to a truth value. This may look a bit weird at first, but it is in fact the most reasonable way to implement short-circuit evaluation in the Q language, since Q uses dynamic typing, and hence the type of an expression is only known after it has been evaluated. In fact, this "feature" can be quite useful at times. For instance, you can also use and then to write down a simple kind of conditional expression like

 
check X and then bar X

where bar X is the value you wish to compute, but only if the condition check X holds.

The Q interpreter also uses the operators not, and and or to denote bitwise logical operations on integers. Thus, not X denotes the one's complement of an integer X, and X and Y and X or Y the bitwise logical conjunction and disjunction of integers X and Y, respectively. These operations behave as if negative integers were represented in two's complement (although GMP actually uses a sign-magnitude representation). This means that for each integer X, -X = not X + 1, or, equivalently, not X = -X-1. For instance:

 
==> 17 and not 13
16

==> 17 or not 13
-13

==> not _
12

These results become clear when considering that 17 has bits 0 (=1) and 4 (=16) on (and all other bits off), while not 13 has bits 0 (=1), 2 (=4) and 3 (=8) off (and all other bits on). Thus, writing these numbers as unsigned bit sequences with most significant bits first, where ...1 denotes an infinite sequence of leading 1's, we have:

 
17 and not 13 =
10001 and not 1101 = 10001 and ...10010 = 10000
= 16

17 or not 13 =
10001 or not 1101 = 10001 or ...10010 = ...10011 (= not 1100 = not 12)
= -13

Together with the built-in bit shift operations, the bitwise logical operations can also be used to implement "bitsets", i.e., sets of nonnegative integers represented by bit patterns. See 10.1 Arithmetic Functions, for details.

In case you are wondering about "exclusive or:" While this operation is not provided as a builtin, you can easily define it yourself as follows:

 
xor X Y                 = (X or Y) and not (X and Y);


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.4.5 String/List/Tuple Operators

The ++ operator denotes concatenation, # is the length or size operator, and the subscript operator ! is used for indexing. These operations work consistently on strings, lists and tuples. For instance:

 
"abc"++"xy"             => "abcxy"
[a,b,c]++[x,y]          => [a,b,c,x,y]
(a,b,c)++(x,y)          => (a,b,c,x,y)

#"abc"                  => 3
#[a,b,c]                => 3
#(a,b,c)                => 3

"abc"!1                 => "b"
[a,b,c]!1               => b
(a,b,c)!1               => b

Note that indexing with the subscript operator starts at zero, s.t. X!0 and X!(#X-1) denote the first and last member of a string, list or tuple, respectively. We also remark that since ! is right-associative, double subscripts may have to be parenthesized. For instance, compare (X!I)!J and X!I!J=X!(I!J).

It should also be noted that all list and tuple operators check that their first argument is a well-formed list or tuple value. However, the second argument of the ++ operator may in fact be any value. List concatenation just replaces the empty sublist marking the end of its first list argument with the second argument, as if it was defined by the following equations:

 
[]++Ys                  = Ys;
[X|Xs]++Ys              = [X|Xs++Ys];

Hence we have that, e.g., []++1 => 1 and [1,2]++3 => [1,2|3], which may look odd, but is consistent with the above equations. Tuple concatenation works in an analogous manner.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6.4.6 The Sequence Operator

The sequence operator lets you evaluate sequences of expressions in a given order. The value of the sequence is given by the rightmost expression. That is,

 
X || Y                  => Y.

The sole purpose of this construct is to allow operations with side-effects (such as I/O functions) to be carried out sequentially. A typical example is

 
writes "Input: " || reads

which prints a prompt on the terminal and then reads in a string. (The built-in functions writes and reads are described in 10. Built-In Functions.)


[ << ] [ >> ]           [Top] [Contents] [Index] [ ? ]

This document was generated by Albert Gräf on October, 14 2003 using texi2html