Tokens

A token is a bare reference to an object, with no operation associated with it. In the examples below, there is a token associated with every instance of `a', `b' or `5'.


    set :: { a, b, 5 }
    print( a, b, 5 )
    5
   

For the moment, we'll concentrate on that third `sentence'.

A token is to Yazoo what an object is to an English-speaker. If we turned back the clock and tried to communicate with a cave man, we might get a lot of sentences like:


    rock
   

which our brains would probably embellish with subject and verb to mean:


    [That which I want to draw your attention to] [is] the rock.
   

Yazoo thinks along exactly the same lines. When it comes across some terse command like


    5
   

it rolls its eyes and compiles something similar to:


    var1 := 5
   

A couple of caveats have to be made. First, the member that `5' is assigned to isn't actually named `var1', as that would cause confusion if we had happened to be using a var1 of our own in our script. Secondly, the operator that follows it is somewhat modified from a standard define-equate.

Yazoo avoids conflicts between token `names' and user-defined names by compiling tokens into members with negative ID numbers, while user-defined members are assigned positive IDs beginning from 1. (All bytecode words are read as signed long integers.) Token IDs start from -1 and count downwards, so the compiler needs to keep track of how many tokens it has assigned so far. Thus if N is the byte-size of a long word, then the first N bytes of a namespace (a string passed between the user and compiler that keeps track of member names) contain the number of tokens that have already been declared; the textual names of the user's members start at byte N+1.
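To make the bookkeeping concrete, here is a minimal Python sketch of a namespace header that counts tokens and hands out negative IDs. It is only a model, not Yazoo's own code: the function names are invented, and the 4-byte little-endian long is an assumption (the real size and byte order depend on the platform). How the user's member names are stored after the header is not modelled.

```python
import struct

# Assumption: a "long word" is a 4-byte little-endian signed integer.
LONG_FMT = "<l"
N = struct.calcsize(LONG_FMT)  # byte-size of a long word in this model


def new_namespace() -> bytearray:
    """Hypothetical helper: a namespace holding only a zero token count."""
    return bytearray(struct.pack(LONG_FMT, 0))


def tokens_declared(namespace: bytearray) -> int:
    """Read the token count stored in the first N bytes."""
    return struct.unpack_from(LONG_FMT, namespace, 0)[0]


def allocate_token(namespace: bytearray) -> int:
    """Bump the stored count and return the new token's negative ID."""
    count = tokens_declared(namespace) + 1
    struct.pack_into(LONG_FMT, namespace, 0, count)
    return -count


namespace = new_namespace()
token_ids = [allocate_token(namespace) for _ in range(3)]
print(token_ids, tokens_declared(namespace))
```

Each call hands out the next ID below zero, so token IDs can never collide with the positive IDs given to user-defined members.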

The most accurate portrait of a token is its disassembly, which for a "5" script is


    deq* ( sm -1 , csl 5 )
   

or


    deq* ( sm $1 , csl 5 )
   

if we use the compiled namespace in the disassembly. The (first and only) token variable has the ID -1 (the $ signifies a negative ID). Note that the disassembler writes deq* to indicate that it differs from the user's deq: it has the unjammable flag set, and the update-members flag cleared. The unjammable flag prevents tokens from jamming arrays left and right. The update-members flag is cleared to prevent an error being thrown in case the same token is re-assigned to a variable of a different type (which can happen if, for example, a particular argument to a function changes its type between function calls).
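The flag changes can be pictured as simple bit arithmetic. The Python sketch below is illustrative only: the bit values are made up (this page does not give the real ones), and it assumes that the user's ordinary deq has update-members set and unjammable clear.

```python
# Hypothetical bit positions, chosen only for this illustration.
UPDATE_MEMBERS = 0x1
UNJAMMABLE = 0x2

# Assumption: flags carried by the user's ordinary deq.
USER_DEQ_FLAGS = UPDATE_MEMBERS


def token_flags(flags: int) -> int:
    """Turn an operator's flags into the starred (token) variant:
    set unjammable, clear update-members."""
    return (flags | UNJAMMABLE) & ~UPDATE_MEMBERS


starred = token_flags(USER_DEQ_FLAGS)
print(bool(starred & UNJAMMABLE), bool(starred & UPDATE_MEMBERS))
```

Applying the same function twice changes nothing further, which matches the idea that the starred operator is just the plain one with two flags adjusted.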

Different objects translate into different kinds of tokens. A reference to a variable or a function, such as


    my_var
   

translates into


    dqa* ( sm $1 , sm $my_var )
   

Again, it is a modified dqa operator: the update-members flag has been cleared, and the unjammable flag set. If Yazoo senses that an array is being passed (for example, if the command is "my_array[*]") then it modifies the expression somewhat:


    dqa* ( s* ( sm $1 ) , s* ( sm $my_array ) )
   

which is analogous to arr1[*] := @my_array[*]. Finally, if the compiler encounters a code of any sort -- function, set, whatever -- sitting naked on its own line, it will wrap it in a token having a modified define operator. So


    {2, 5}
   

turns into


    def* ( sm $3 , scr { deq* ( sm $1 , csl 2 ) , deq* ( sm $2 , csl 5 ) } )
   

The flag modifications are the same: def* equals def minus update-members, plus unjammable.

This last example deserves a second look, because we can see not only that the set itself is represented by a token, but also that the set's members are tokens as well. Think of the set as a function, or a function constructor (since it lacks a code marker/semicolon). And remember that line breaks can be marked by commas as well as by end-of-lines. Essentially the constructor makes two tokens to hold the two numbers that the set represents. Had the numbers been variables instead, the tokens would have been aliases, which explains how sets can hold objects that are also somewhere else. Finally, subsets are simply composite variables within the set -- functions within the function whose constructors in turn endow them with the appropriate members.
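The nesting is easier to see if we rebuild that disassembly mechanically. The Python sketch below is a toy, not the Yazoo compiler: the function names are invented, it handles only constant members, and it assumes (from the single example above) that the member tokens are numbered first and the enclosing set token last.

```python
def literal_token(token_no: int, value: int) -> str:
    """Disassembly text for a constant wrapped in a token."""
    return "deq* ( sm $%d , csl %d )" % (token_no, value)


def set_token(values: list, first_token: int = 1) -> str:
    """Disassembly text for a bare set of constants.

    Assumption: members take token numbers first_token, first_token+1, ...
    and the set itself takes the next number after them.
    """
    members = [
        literal_token(first_token + offset, value)
        for offset, value in enumerate(values)
    ]
    outer_no = first_token + len(values)
    body = " , ".join(members)
    return "def* ( sm $%d , scr { %s } )" % (outer_no, body)


print(set_token([2, 5]))
# def* ( sm $3 , scr { deq* ( sm $1 , csl 2 ) , deq* ( sm $2 , csl 5 ) } )
```

Had the members been variables rather than constants, each inner deq* would be replaced by a dqa* alias, as described above.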

Tokens also help to explain function arguments. The code inside the parentheses of a function call is essentially a script; the fact that it is surrounded by parentheses rather than braces is due only to a common syntactic convention that Yazoo adopted. This argument-code, or at least the constructor component of it, fills the args variable with a number of tokens (or, if the user wishes, explicitly-defined members) that the function will later see as args[1], args[2], etc.

One conspicuous mystery remains: what, and where, is the args variable? Identifying args and its cousins is the final project of the chapter.




Last update: July 28, 2013
