This section deals with the micro-structure of a Yazoo script: how the various words and symbols fit together, without bothering just yet about what they do or how they work. Let's ignore for now the grossest level of organization, which is the grouping of code by curly braces {}, since that is explored in later sections. The various flow-control commands, such as if and while, in some sense constitute a second organizational level, which we'll also describe separately. Within and between these blocks are chunks of code that execute straightforwardly from beginning to end, whose anatomy we are going to study here.
There is an obvious third layer of organization even within a simple code chunk: that code can be broken into lines, separated of course by line breaks (either carriage returns or end-of-line characters). These are the Yazoo equivalent of sentences: they represent complete thoughts (instructions). There are two `sentences' in the following code fragment:
a := 2*7 + 9
print(a)
Sentences can also be marked off by commas, so there are also two sentences here:
a := 2*7 + 9, print(a)
This latter form is useful when several sentences should be entered together on one line. For example, to execute a for-loop from the command prompt we have to tap in the for, end for and everything in between all on the same line, which we can do using commas.
The other option is to break sentences over multiple lines using the line-continuation symbol `&', as in:
a := 2* &
7 + 9
For one thing, this lets us enter multi-line commands at the command prompt. (At present the command prompt is very fussy, however: the & must be the very last character entered on the line being continued, with no trailing spaces.) It can also improve the readability of long sentences in scripts. Generally a sentence can be broken between any two names or symbols, but not within a name, symbol or string, and not between the name of an array and its opening bracket, or between a function and its opening parenthesis.
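The sentence-splitting rules above can be sketched in a few lines of Python. This is only an illustration, not how Yazoo itself reads a script: the function name split_sentences is made up for the example, string literals are ignored, and treating commas inside parentheses, brackets or braces as not ending a sentence is an assumption of the sketch.

```python
def split_sentences(source):
    """Split script text into sentences on line breaks and outermost commas."""
    # Step 1: join continued lines. An '&' as the very last character of a
    # line (no trailing spaces) means the sentence carries on to the next line.
    joined = []
    pending = ""
    for line in source.splitlines():
        if line.endswith("&"):
            pending += line[:-1]
        else:
            joined.append(pending + line)
            pending = ""
    if pending:
        joined.append(pending)

    # Step 2: split each joined line on commas that are not nested inside
    # (), [] or {} (an assumption of this sketch, e.g. for function arguments).
    sentences = []
    for line in joined:
        depth = 0
        current = ""
        for ch in line:
            if ch in "([{":
                depth += 1
            elif ch in ")]}":
                depth -= 1
            if ch == "," and depth == 0:
                sentences.append(current.strip())
                current = ""
            else:
                current += ch
        sentences.append(current.strip())
    return [s for s in sentences if s]


print(split_sentences("a := 2*7 + 9, print(a)"))  # ['a := 2*7 + 9', 'print(a)']
print(split_sentences("a := 2* &\n7 + 9"))        # ['a := 2* 7 + 9']
```

Both the comma form and the continued form come out as the same kind of list: one entry per sentence, whatever the line layout was.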
Most pairs of sentences can legitimately be written in either order (even if they won't work in the wrong order). But within the sentence there is some internal structure -- the various symbols inside can't just be moved around willy-nilly. The constituents of a sentence are variables, constants and operators. Operators allow large phrases to be built up, since they glue variables and constants to one another and to other operators. To start with, let's dissect the following sentence:
a = b + 2
There are two operators: `+' which sums its two arguments (the variable b and the constant 2), and `=' which sets the value of the left-hand argument a to the value of the right-hand argument. The right-hand argument to `=' cannot be b alone, because the rules of syntax dictate that the two operators cannot share this same argument; therefore it is actually the entire sum b+2. The net effect is to store the sum b+2 in the variable a.
There actually could have been two valid interpretations of our example, which we can make explicit with parentheses.
a = ( b + 2 )
( a = b ) + 2
The first line indicates that b+2 is the right-hand argument to `='; in the second line a = b is the left-hand argument to `+'. These are both valid Yazoo sentences; parentheses can always be placed around any group of operators in the same sentence in order to force a certain grouping of terms.
But as we indicated, sans parentheses Yazoo always interprets a = b + 2 in the former sense. How do we know that the convention is for the equate operator to absorb the summation rather than the other way around? The answer is that there is a rule that says that, in the absence of parentheses, sums are always grouped within equates. Sums have higher precedence than equates: they are evaluated first so that their output can be handed off to equate. Likewise multiplications and divisions are always contained within additions and subtractions, so 3 + 5 * 6 groups as 3 + ( 5 * 6 ). Multiplication has higher precedence than addition. The precedence of all Yazoo operators is given in Table 2, in order from highest (1st priority) to lowest (13th priority) (so high precedence equals low priority number).
The final column in Table 2 specifies how operators of the same precedence level are grouped when they are found in series. Does the phrase 8 / 4 / 2 mean (8 / 4) / 2, which gives 1, or 8 / (4 / 2), which gives 4? Again, we could force either interpretation by writing the parentheses; but from the table we see that, in their absence, operators in the multiplication/division level are grouped from left to right, so (8 / 4) / 2 is the correct interpretation. On the other hand, equate operators as in a = b = c = 2 are grouped right-to-left, so the grouping of this expression would be a = (b = (c = 2)).
priority | commands | symbols | grouping |
1 | function calls | () | left to right |
1 | step to member | . | left to right |
1 | step to index/indices | [] +[] [+] | left to right |
2 | raise to power | ^ | left to right |
3 | negate | - | N/A |
4 | multiply, divide | * / | left to right |
5 | add, subtract | + - | left to right |
6 | byte block | block | N/A |
7 | append code | : | left to right |
8 | substitute code | << | left to right |
9 | define/equate | :: ::@ = := :=@ @:: | right to left |
9 | forced equate | =! | right to left |
10 | conditionals | == /= > >= < <= | N/A |
11 | logical not | not | right to left |
12 | logical and, or, eor | and or eor | left to right |
13 | commands | return remove delete | N/A |
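To make the grouping rules concrete, here is a minimal Python sketch that inserts the implied parentheses into an expression. It is not the Yazoo compiler's method: it covers only a handful of binary operators whose priorities and grouping directions are copied from Table 2, it ignores negation and function calls, and the names OPERATORS, tokenize and group are invented for the example.

```python
import re

# (priority, grouping) for a few binary operators, copied from Table 2.
OPERATORS = {
    "^": (2, "left"),
    "*": (4, "left"),
    "/": (4, "left"),
    "+": (5, "left"),
    "-": (5, "left"),
    "=": (9, "right"),
}
LOWEST_PRIORITY = 13  # the largest priority number in Table 2


def tokenize(text):
    """Break text into names, numbers, operators and parentheses."""
    return re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|[-+*/^=()]", text)


def group(text):
    """Return text with the grouping of every operator made explicit."""
    tokens = tokenize(text)
    pos = 0

    def parse_operand():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            inner = parse_expression(LOWEST_PRIORITY)
            pos += 1  # skip the closing ")"
            return inner
        return token

    def parse_expression(max_priority):
        nonlocal pos
        left = parse_operand()
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            op = tokens[pos]
            priority, grouping = OPERATORS[op]
            if priority > max_priority:
                break  # this operator binds too loosely; leave it to the caller
            pos += 1
            # Left-to-right: the right operand may hold only tighter operators.
            # Right-to-left: it may also hold operators of the same priority.
            limit = priority - 1 if grouping == "left" else priority
            right = parse_expression(limit)
            left = f"({left} {op} {right})"
        return left

    return parse_expression(LOWEST_PRIORITY)


print(group("a = b + 2"))      # (a = (b + 2))
print(group("3 + 5 * 6"))      # (3 + (5 * 6))
print(group("8 / 4 / 2"))      # ((8 / 4) / 2)
print(group("a = b = c = 2"))  # (a = (b = (c = 2)))
```

The only place the grouping column matters is the choice of limit: a left-to-right operator refuses to swallow another operator of its own level on its right, while a right-to-left operator accepts it.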
The two characteristics of an operator are the effect it has on its arguments, and the value it returns. An operator must return something in order to be used as an argument to an encompassing operator. A handful of commands, such as the remove and delete commands, are operators that indeed do something but do not return any value, so these cannot be embedded in larger expressions. By contrast numeric operators and comparison operators do not do anything to their arguments, so to be useful they must themselves be arguments of an encompassing operator to which they can deliver their result. The various define and equate operators are hybrids: they first alter their left-hand arguments, by rewriting or redefining them, then they return the altered variable if they are embedded in a larger phrase.
Each argument of a Yazoo operator expects a certain category of expression, and using the wrong category will cause an error. For example, because all of the arithmetic operators expect two numeric arguments, trying to add, say, a number to a string will cause an error. Conditional or logical expressions such as 1 > 2 return true or false values that, unlike in C, are non-numeric and can only be used in if, while and do statements; a = b and c is completely illegal. The `braces operator', as in { a :: ulong; print(a) }, returns code, which is a separate substance from numbers, strings and conditionals. The rule for a define or equate statement is that its left-hand argument must be a variable, and although that variable may be of any type -- numeric, string, or `code' (function) -- both left and right arguments must have types that match or can be made to match (e.g. by integer-to-float conversion).
In any sentence there is always one top-level operator whose arguments encompass the entire sentence. Consider the two sentences
(a :: ulong) = B^2 + f(2)
(x^2 + y^2)^0.5 - x - y
After a bit of thought we can see that in sentence 1, the top-level operator is the equate `=', and in sentence 2 it is the second subtraction `-'. The top-level operator in sentence 1 is useful, in that when all else is said and done that equate still has some useful task to perform, which is to change the value of variable a. The top-level operator in sentence 2 seems less useful, since the subtraction operator only returns a value without affecting any variable, and at the top level there is nowhere to return the result to.
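The `bit of thought' can be mechanized. The Python sketch below, which is an illustration rather than anything Yazoo actually does, looks only at operators outside all parentheses and picks the one that binds most loosely; among operators of equal priority it takes the last one for left-to-right levels and the first one for right-to-left levels. The priorities come from Table 2; the function name top_level_operator and the restriction to these few binary operators are assumptions of the example.

```python
import re

# Priority numbers from Table 2 for the operators used in the two sentences.
PRIORITY = {"^": 2, "*": 4, "/": 4, "+": 5, "-": 5, "::": 9, "=": 9}
RIGHT_TO_LEFT = {"::", "="}  # the define/equate level groups right to left


def top_level_operator(sentence):
    """Return (token index, symbol) of the operator spanning the whole sentence."""
    tokens = re.findall(r"::|[A-Za-z_]\w*|\d+(?:\.\d+)?|[-+*/^=()]", sentence)
    depth = 0
    best = None
    for index, token in enumerate(tokens):
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
        elif depth == 0 and token in PRIORITY:
            if best is None:
                best = (index, token)
                continue
            current = PRIORITY[best[1]]
            candidate = PRIORITY[token]
            looser = candidate > current
            later_in_left_group = candidate == current and token not in RIGHT_TO_LEFT
            if looser or later_in_left_group:
                best = (index, token)
    return best


print(top_level_operator("(a :: ulong) = B^2 + f(2)"))  # (5, '=')
print(top_level_operator("(x^2 + y^2)^0.5 - x - y"))    # (13, '-')
```

In the second sentence the two subtractions sit at token indices 11 and 13, so the result (13, '-') is indeed the second subtraction.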
So, presumably whatever number sentence 2 computes will disappear into outer space? Well, actually no. When the Yazoo compiler hits some useless sentence like this (or at least one whose top-level operation is useless), it will fashion it into something called a `token'. A token is a sort of placeholder for some value that was computed so that it can be recovered later if needed. We will come across tokens several times later on; they are especially useful in defining sets, and (as we have already seen, if you think about it) in passing arguments to functions.
Last update: July 28, 2013