Yazoo provides a number of built-in C-coded functions, which can be invoked directly by name (in contrast to user-defined C functions which require a call() invocation). Some of these built-in routines perform various common and handy chores, like calculating logarithms. Others allow a script to read from and even alter some of Yazoo's private records, enabling scripts to do things like detect hidden members or obtain the compiled bytecodes of any function.
The built-in functions work hand-in-hand with the registers (described in the previous section). Any built-in function that returns a value uses some register for storing that value. For example, abs() returns a type-double variable. It is a built-in C-coded routine, but it behaves as if it had been written:
abs :: {
code
R_double = ...
return R_double
}
That is, if we write
y = abs(x)
then the result is copied to y, from the register R_double where abs() first stored it. Other routines use different registers, and not all registers are so-called return registers: call() writes to six different registers, only one of which can be the return variable. It is generally best to copy data out of the registers as quickly as possible, since they are global variables that can be quickly overwritten with another function call.
The compiler translates each built-in function call into: the built-in-function operator, followed by the ID number of the given function, and at the end the argument variable. The ID numbers of the built-in functions are listed in Table 4. Some of the functions will create a token if they are not explicitly assigned to a variable; these are denoted by checkmarks in the `@' column. Each function is discussed in turn below.
ID | name         | @ | ID | name       | @ | ID | name                 | @
0  | call         |   | 14 | abs        | X | 27 | variable_type        | X
1  | compile      |   | 15 | round_down | X | 28 | variable_member_top  | X
2  | transform    |   | 16 | round_up   | X | 29 | variable_code_top    | X
3  | load         |   | 17 | log        | X | 30 | member_type          | X
4  | save         |   | 18 | cos        | X | 31 | member_code_top      | X
5  | input        |   | 19 | sin        | X | 32 | member_indices       | X
6  |              |   | 20 | tan        | X | 33 | member_ID            | X
7  | read_string  |   | 21 | acos       | X | 34 | if_member_targeted   | X
8  | print_string |   | 22 | asin       | X | 35 | if_hidden_member     | X
9  | clip         |   | 23 | atan       | X | 36 | variable_code        | X
10 | trap         |   | 24 | random     | X | 37 | member_code          | X
11 | throw        |   | 25 | extract    | X | 38 | variable_code_ID     | X
12 | top          | X | 26 | find       | X | 39 | member_code_ID       | X
13 | size         | X |    |            |   | 40 | variable_code_offset | X
   |              |   |    |            |   | 41 | member_code_offset   | X
   |              |   |    |            |   | 42 | GetParentScriptInfo  |
   |              |   |    |            |   | 43 | SpringCleaning       |
abs()
syntax: (numeric) y = abs((numeric) x)
Returns the absolute value of its argument (which must be numeric). R_double is the return variable for this function.
acos()
syntax: (numeric) y = acos((numeric) x)
Returns the inverse cosine of its argument. The argument must be a number on the interval [-1, 1] (a number outside this range will generate the `not a number' value on many machines). The result is on the interval [0, pi]. R_double is the return variable for this function.
asin()
syntax: (numeric) y = asin((numeric) x)
Returns the inverse sine of its argument. The argument must be a number on the interval [-1, 1] (a number outside this range will generate the `not a number' value on many platforms). The result is on the interval [-pi/2, pi/2]. R_double is the return variable for this function.
atan()
syntax: (numeric) y = atan((numeric) x)
Returns the inverse tangent of the argument, which must be numeric. The result is an angle in radians on the interval [-pi/2, pi/2]. R_double is the return variable for this function.
call()
syntax: (numeric) return_code = call((string/numeric) C_routine, [arguments])
Runs a C or C++ routine from within a Yazoo script. The routine is specified by name or by number in the first argument. The subsequent arguments form the argv array that the C routine receives. The return value of the routine is stored in R_double which is the return register of the call() function.
User-written C or C++ functions make their introductions to Yazoo through the UserFunctionSet[] array, which is in the userfn.c or userfn.cpp source file. This array consists of a list of Yazoo names (strings) paired with the addresses of C/C++ functions that they call. (The only functions that a script will be able to call are those that had an entry in the UserFunctionSet[] array when Yazoo was compiled, so adding new C/C++ routines always necessitates a rebuild.) To call a C/C++ function from a script, the user needs to specify either its Yazoo name (the string in the UserFunctionSet[] entry) or its index within the UserFunctionSet[] array. The index method follows the Yazoo convention that the first element begins at 1, even though it is an element of a C array. The only advantage of calling by number rather than name is that it can be slightly faster, since it saves Yazoo the trouble of searching its names list.
As described in Chapter 2, an embedded C or C++ function must be written in the following style:
int my_function(int argc, char **argv)
just as if it were a complete program. The number of arguments is passed through argc, and the address of the array of pointers to the actual arguments is located at argv. So *argv is the pointer to the first argument, and **argv is the first byte of the first argument; *(argv+1) is the pointer to the second argument; etc. Only primitive variables can compose an argument to the C routine; a composite argument to call() generally contains, and is passed as, multiple primitive arguments, one for each primitive component. Strings are passed as linked lists (see the reference section), and call() defragments these lists before running the embedded C code. Void arguments or members are skipped.
call() also creates one argument of its own that can be used for type-checking: an `argument-info' array, whose pointer it tucks into *(argv+argc). This is a list of elements of type arg_info, which is defined in userfn.h. There is one entry for each argument (not including itself), and each entry looks like:
typedef struct {
Ulong arg_type;
Ulong arg_size;
Ulong arg_indices;
} arg_info;
The arg_type codes are defined in Table 1. The arg_size field gives the size in bytes of each element (in case the type is a block), and arg_indices gives the number of indices that were passed (1 if it was just a variable, N if an array).
All arguments are passed by reference: they are pointers to Yazoo's own storage for that data, so data can be exported as well as imported. It is easy to crash Yazoo by overwriting the wrong regions of memory. One of the very first things to check if there is a crash is whether a call() routine was run, since that is very often the culprit. Beware that this crash often happens far downstream, long after the call() had finished.
The return value of the C/C++ function is stored in R_double (even though the C routine returns an integer), which is the return variable of call(). So if we write:
y = call("Factorial", x)
then whatever value the C routine registered under the name "Factorial" returns will be copied into the script variable y.
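The argument-passing rules above can be made concrete with a small C routine. The following is only a sketch under stated assumptions: the Ulong typedef and the arg_info struct are re-declared here as stand-ins for what userfn.h provides, the type code ASSUMED_DOUBLE_TYPE is a made-up placeholder (the real codes are in Table 1), and the routine name sum_doubles is hypothetical. It expects an array of doubles followed by a single double, checks both against the argument-info array found at argv[argc], and writes the sum through the second pointer.

```c
#include <stddef.h>

/* Stand-ins for declarations that userfn.h would supply (assumed here). */
typedef unsigned long Ulong;

typedef struct {
    Ulong arg_type;
    Ulong arg_size;
    Ulong arg_indices;
} arg_info;

/* HYPOTHETICAL type code for a double; look up the real value in Table 1. */
#define ASSUMED_DOUBLE_TYPE 7UL

/* Hypothetical user routine: sums argument 1 (an array of doubles) into
   argument 2 (a single double). Returns 0 on success, nonzero on bad input. */
int sum_doubles(int argc, char **argv)
{
    if (argc != 2) return 1;

    /* call() places the argument-info array just past the last argument. */
    const arg_info *info = (const arg_info *) argv[argc];

    for (int i = 0; i < argc; i++) {
        if (info[i].arg_type != ASSUMED_DOUBLE_TYPE) return 2;
        if (info[i].arg_size != sizeof(double)) return 2;
    }
    if (info[1].arg_indices != 1) return 3;   /* the output must be a scalar */

    /* Arguments are pointers into the caller's storage, so writing through
       them exports data back to the script. */
    const double *values = (const double *) argv[0];
    double *total = (double *) argv[1];

    *total = 0.0;
    for (Ulong j = 0; j < info[0].arg_indices; j++)
        *total += values[j];

    return 0;
}
```

Checking the types and index counts before touching the data is one way to guard against the memory-overwriting crashes described above; the return code lets the calling script see that the arguments were rejected.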
clip()
syntax: clip((functions) f1, f2, ...)
When a variable is requested by name, Yazoo first looks for a member of that name in the current function. Failing that, it works backwards up the current search path and looks for the member in each parent function. The clip() routine cuts the search paths that cross the arguments to clip (which are usually functions or function-containing classes), preventing any code inside these functions from accessing members that lie outside. (A void argument causes the currently-running function to be clipped). This is the only way to encapsulate code in Yazoo. clip() does not return any value.
The following example illustrates the use of clip.
var1 := 1
clip_demo :: {
var2 := 2
code
print("Before clipping: ", var2, ", ", var1, "\n")
| clip(this) (doesn't work)
clip(clip_demo) | this is one way to do it
clip(*) | this is another
print("After clipping: ", var2, ", ") | so far so good...
print(var1, "\n") | this line will cause an error
}
clip_demo()
When clip_demo() is run, it stops with an error on the last line since, after clipping, it is no longer able to see the global variable var1. As demonstrated, clip(*) simply clips the current function. Surprisingly, clip(this), which might be expected to do the same, does not work, because this is the argument space of clip(), not the function space of clip_demo().
Note that clip(f1) clips all fibrils (search paths) that pass from the function f1 through the program counter's fibril, at the location of f1. On the other hand, clip(*) clips only the program counter's fibril. A clipped fibril is one that has no stem; see the section on search paths and fibrils.
compile()
syntax: (numeric) error_code = compile((string) source [, (string) var_names [, (string) line_positions]])
Before Yazoo can interpret a script, the script must be pseudo-compiled into a binary form that is much easier to execute than the raw text. The built-in compile() function accepts a Yazoo text script as a string and returns its bytecode into the R_string register. compile() emphatically does not generate executable machine code; the bytecode is something only Yazoo will understand. (Likewise the "disassemble()" routine in start.zoo parses bytecode, not real assembly code.)
To generate a stand-alone script, it is sufficient to invoke compile() with a single argument, as in the following example.
source_code := "a := 1, b := 2, print(a+b)"
err := compile(source_code)
if err == 0
translated_code := R_string
save("add.hob", translated_code)
else
print("Error ", err, " at character ", R_error_index)
endif
Notice that compile() does not return the bytecode, but rather the error code returned by the compiler, which is zero upon a successful compilation. The user must fetch the bytecode manually from the R_string register, and it is usually a good idea to do this immediately after the compiler finishes since other subsequent commands may overwrite R_string.
The built-in compiler only returns one error at a time. The error code is stored in R_error_code, and as we see this register is also the return variable. The register R_error_index stores the character number (not the line number), starting from 1, at which the compiler flagged the error.
Two scripts that are compiled independently as above will generally not be compatible with each other, because the names of all variables and functions get replaced with numbers during compilation; since the namespaces are separate there isn't likely to be much correspondence between the member IDs of the two scripts. If the two scripts are to be run in the same space they must be compiled using the same namespace. The user can control the namespace by providing it as a second string (argument number 2) to compile(); the compiler will then update the namespace string in addition to writing bytecode in R_string.
The first N bytes of the namespace string (where N equals size(ulong)), when viewed as an unsigned long integer, give the number of hidden members that Yazoo has defined so far. These hidden members are for tokens and do not correspond to names that the user generates; they have negative ID numbers. Starting from byte N+1 in the names-table string, the user-defined variables are stored as null-terminated strings in the order that they were discovered by the compiler. So if the variables x, y, result were encountered (in that order) along with two tokens, the names string will read
\00\00\00\02x\00y\00result\00
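To illustrate this layout, here is a rough C sketch that reads a names table held in a byte buffer. It assumes the header is an unsigned long in the machine's native byte order (the byte order is not stated above), and the function names are hypothetical; nothing here is part of Yazoo itself.

```c
#include <stddef.h>
#include <string.h>

/* Reads the hidden-member count stored in the first sizeof(unsigned long)
   bytes of the table. Native byte order is an assumption. */
unsigned long names_table_hidden_count(const char *table, size_t table_len)
{
    unsigned long count = 0;
    if (table_len >= sizeof count)
        memcpy(&count, table, sizeof count);
    return count;
}

/* Collects pointers to the null-terminated names that follow the header,
   in the order the compiler stored them. Returns how many were found. */
size_t names_table_list(const char *table, size_t table_len,
                        const char **names, size_t max_names)
{
    size_t found = 0;
    size_t pos = sizeof(unsigned long);

    while (pos < table_len && found < max_names) {
        const char *end = memchr(table + pos, '\0', table_len - pos);
        if (end == NULL) break;          /* unterminated trailing bytes */
        names[found++] = table + pos;
        pos = (size_t) (end - table) + 1;
    }
    return found;
}
```

For the example table above, the count comes back as 2 and the list holds x, y and result in that order.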
The following procedure properly initializes the names table:
(InitialNullNamesNum::ulong) = 0
(NamesTable :: string) =! InitialNullNamesNum
...
err := compile(source_code, NamesTable)
Any scripts subsequently compiled with this name table will be compatible with each other. They will most likely not be compatible with the script that compiled them, unless that script has access to its own name table. In fact, the command-prompt user does have access to his own name table, in the variable AllNames. Be warned that if the AllNames variable gets improperly overwritten the user should expect immediate and irreversible dysfunction for the rest of his Yazoo session.
The optional third argument to compile() gives the location in the compiled bytecode of each line break in the original text script. (The user must then provide a names table as well; otherwise the line-breaks argument will be mistaken for the names table.) Each line in Yazoo script translates into roughly one line of bytecode, so by using the line-break table a runtime error, whose location is necessarily given as an index in the bytecode, can be traced back to a specific line in the original script. By storing both the original human-readable text and the table of line breaks that compile() returns, it is possible to flag runtime errors next to the uncompiled text where the error occurred.
The line-breaks string is composed of a number of place-markers, and each place-marker consists of two unsigned long integers. The first integer in each place-marker is the position of some character in the uncompiled text, beginning at 1 for the first character in the script. The second integer is the corresponding index in the compiled bytecode, where index 1 is the first word in the bytecode. Whereas indices in the text enumerate single-byte characters, each word in the bytecode is the length of a long integer: traditionally 4 or 8 bytes.
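As a sketch of how such a table might be used to trace a runtime error, the C code below looks up the text position for a bytecode index and converts it to a line number. Several things are assumptions: the place_marker struct (a convenient in-memory form of the two unsigned longs), the idea that markers are sorted by bytecode index, and the rule of taking the last marker at or before the error index. The function names are hypothetical.

```c
#include <stddef.h>

/* In-memory form of one place-marker (assumed layout). */
typedef struct {
    unsigned long text_pos;    /* 1-based character position in the script */
    unsigned long code_index;  /* 1-based word index in the bytecode */
} place_marker;

/* Returns the text position of the last marker whose bytecode index is at
   or before code_index, or 0 if there is none. Assumes sorted markers. */
unsigned long text_pos_for_code_index(const place_marker *markers,
                                      size_t count, unsigned long code_index)
{
    unsigned long best = 0;
    for (size_t i = 0; i < count; i++) {
        if (markers[i].code_index > code_index) break;
        best = markers[i].text_pos;
    }
    return best;
}

/* Converts a 1-based character position into a 1-based line number by
   counting the newlines that precede it. */
unsigned long line_of_text_pos(const char *text, unsigned long text_pos)
{
    unsigned long line = 1;
    for (unsigned long i = 0; i + 1 < text_pos && text[i] != '\0'; i++)
        if (text[i] == '\n') line++;
    return line;
}
```

Keeping the original text alongside the marker table is what makes the second step possible, since the markers themselves hold character positions rather than line numbers.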
While compile() is certainly the easiest way to generate bytecode, it is not the only way, and in fact some bytecode cannot be generated at all using compile(). In this case, certainly the user can do the compilation himself, but this is laborious. Unless you are doing something wild, the easiest way to generate non-scriptable code is to compile the next-closest thing that can be scripted using compile() and then tweak the resulting output string as needed (see do_in()).
In a sense compile() is 'just' a string operation: it takes in one string and generates a second string that is based in some complicated way upon the first. As far as Yazoo is concerned the second string is treated just like any other string: it is not in any way executable. In order to actually run the compiled script one must subsequently incorporate the output string into Yazoo's internal code bank, using the built-in routine transform().
cos()
syntax: (numeric) y = cos((numeric) x)
Returns the cosine of its argument. The argument must be numeric. The return variable is R_double.
extract()
syntax: (string) sub_str = extract((string) str, (numeric) first_char, last_char)
Returns a piece of the string given in the first argument, starting from the character position specified by the second argument and ending with the character position specified by the third argument. The following two scripts are equivalent:
part_str := extract(full_str, 5, 10)
and
all_chars[*] :: ubyte
part_str :: string
all_chars[*] =! full_str
part_str =! all_chars[5, 10]
Both the second and the third arguments, which give character positions (starting from 1), must be positive and less than or equal to the number of characters in the original string, or else an error will be generated. The range includes the two endpoints, so if the third argument equals the second then one character will be returned, etc. The return variable is R_string.
find()
syntax: (numeric) result = find((strings) search_in, search_for [, (numeric) mode [, (numeric) starting_position]])
Finds an instance of, or counts the number of instances of, a substring (argument 2) within another string (argument 1). If find() is used in search mode, it returns the character position (where 1 denotes the first character) where the substring was first found, and 0 if it was not found anywhere. If find() is run in count mode, it returns the number of instances of the substring found within the larger string. The return register is R_slong.
The third argument controls the mode that find() is run in: it needs to be -1, 0 or 1. If a mode is not specified then it defaults to mode 1, which denotes a forward search; i.e. it will return the first instance of the substring that it finds. Mode -1 corresponds to a reverse search, which will find the last instance of the substring. Mode 0 is the count mode.
By default, a forward search begins from the first character, and a reverse search begins with the last character. A count proceeds forward from the first character. The starting character can be changed by specifying a starting position in the fourth argument. A mode has to be given in order for a starting position to be specified.
If find() is used to count the number of instances of the substring, it counts only non-overlapping instances. Thus find("ooo", "oo", 0) returns 1 even though the substring occurs starting at both positions 1 and 2, and find("oooo", "oo", 0) returns 2. In both cases the routine finds the substring at position 1, then jumps to the end of that instance---i.e. to position 3---before it resumes searching.
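The counting rule can be sketched in C as follows. This is not Yazoo's own implementation, only an illustration of the same non-overlapping scan built on the standard strstr(); the function name is hypothetical, and treating an empty substring as zero matches is an assumption.

```c
#include <string.h>

/* Counts non-overlapping occurrences of search_for within search_in by
   jumping past each match before resuming the search. */
long count_nonoverlapping(const char *search_in, const char *search_for)
{
    size_t step = strlen(search_for);
    long count = 0;
    const char *p = search_in;

    if (step == 0) return 0;   /* assumption: an empty pattern never matches */

    while ((p = strstr(p, search_for)) != NULL) {
        count++;
        p += step;             /* skip to the end of this instance */
    }
    return count;
}
```

With these definitions count_nonoverlapping("ooo", "oo") gives 1 and count_nonoverlapping("oooo", "oo") gives 2, matching the two cases described above.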
GetParentScriptInfo()
syntax: GetParentScriptInfo((function) f, (numeric) code_index, (numeric) code_ID [, (numeric) code_offset])
Yazoo maintains a global internal registry of all active scripts that have been loaded (see transform()). For example, start.zoo, user.zoo, the scripts that have been run using the run() function, and entries from the command prompt are all stored as separate scripts. For some purposes (notably error flagging) it is important to know the internal ID number of a script, which is what GetParentScriptInfo() provides. The first two arguments specify a code of some function to query (the code index is usually just 1 unless the function was defined with the inheritance operator). GetParentScriptInfo() stores the global ID of that code's enclosing script in the third argument, and the location of the function's bytecode relative to the start of the compiled script (after compilation, measured in long words) into the optional fourth argument.
The following illustrates the distinction between a function, or a function's code, and the script that encapsulates it.
f1 :: {
code
a = b+c | f1 has only one code index
}
f2 :: { ... } : { ... } | f2 has two code indices
If the above text was loaded from a single file using run(), it will have been loaded into memory with a single invocation of transform() and hence the codes of both f1 and f2 will be contained within the same script. However, the two codes of f2 will have a larger offset from the beginning of the script than the single code of f1, since f1 was defined before f2.
if_hidden_member()
syntax: (numeric) if_hidden = if_hidden_member((composite variable) var, (numeric) member_number)
Returns `true' if the given member is hidden, and `false' if it is not. Unless the user is writing his own compiled bytecode, the members he defines are not hidden. Hidden members correspond to compiler-generated tokens. The member is specified by its enclosing variable in argument 1, and its member number in argument 2 (not an index!). The return variable is R_slong.
if_member_targeted()
syntax: (numeric) if_not_void = if_member_targeted((composite variable) var, (numeric) member_number)
Returns `true' if the given member has a target, and `false' if it has no target (i.e. it points at the void). The member is specified by its enclosing variable (argument 1), and its member number (argument 2) -- which is not the same as an index. The return variable is R_slong.
input()
syntax: (string) str = input()
Reads in a single line from the C standard input (which is usually the keyboard). input() causes Yazoo's execution to halt until an end-of-line character is read (i.e. the user hits return or enter). The string of characters before, but not including, the end-of-line, is loaded into R_string and returned to the enclosing expression, and script execution resumes. A null character causes the error "I/O error" to be thrown.
load()
syntax: (string) file_string = load((string) file_name)
Reads a file into a string. R_string is used as the return variable because the storage space is a-priori unknown; however, there is no requirement that the data be ASCII-encoded. The file name must include a path if the file is not in the default directory, as in "/Users/bob/Desktop/MyFile.txt". If there is an error in opening or reading the file (e.g. the file was not found or there was a permissions problem), then load() fails with "I/O error", signifying that the error comes from the operating system, not Yazoo. The counterpart to load() is save().
load() searches only in the default directory. The user.zoo routine Load() extends the built-in load() by searching all paths specified in the DirectoryPaths[] array. (The run() function in user.zoo also searches all DirectoryPaths[].)
log()
syntax: (numeric) y = log((numeric) x)
Returns the natural logarithm (base e) of its argument. The argument must be numeric. The logarithm is only defined for arguments greater than zero. The return variable is R_double.
member_code()
syntax: (string) code_str = member_code((variable) var, (numeric) member_number, code_index)
Returns the compiled bytecode of a given member into R_string. This, along with variable_code(), is an inverse operation to transform().
In a sense, member_code() simply reconstructs the original output of the compile() function that generated its script. The only potential difference is that member_code() only returns the part of the script that was used to define the member in question. For example, we may have compiled and transformed a script that, among other things, defined a function.
compile( "var1 :: ulong, f1 :: { code, ... }" )
transform(R_string)
big_function :: R_composite
little_function :: big_function.f1
(Realize that although transform() suppresses the constructor when it is first introduced into R_composite, the constructor does run when variables are defined in the ordinary way using R_composite as a template.) Now, we can inspect our two members' codes:
big_code := member_code(big_function, 1, 1)
little_code := member_code(little_function, 1, 1)
The code defining the big_function member (and also the variable) is the full script: everything between compile()'s quotation marks. The code defining both the little_function member and its variable is just the code within (and excluding) the braces defining f1.
variable_code() is the complement to member_code; it returns the codes of the functions themselves, as opposed to the member's code. Since a member's code is never actually executed, it is better understood as the type restriction on the variables that member is allowed to target.
member_code_ID()
syntax: (numeric) the_code_ID = member_code_ID((variable) var, (numeric) member_number, code_index)
Returns the global code ID of a given code of a given member, via the return variable R_slong. This ID was assigned by transform() when the code was transformed. Note that code ID numbers can be recycled if the original code is no longer used and was cleaned out of memory.
For reasons that were explained in the last subsection on member_code(), the code ID of a compiled script is identical to that of the codes of all members that the script defined (as their codes are subsumed within the larger script). To differentiate between the script and all members, one has to look at both the code ID and the code offset (see member_code_offset()).
member_code_offset()
syntax: (numeric) the_code_offset = member_code_offset((variable) var, (numeric) member_number, code_index)
Returns the code offset of a given code of a given member, using the return variable R_slong. The code offset is measured relative to the start of the script in which the code was loaded. See the explanation for member_code() for an example of this. The offset is the number of long-integer words the member's bytecode is inset from the beginning of the full script that it was defined in; if the code is this entire script then the offset is zero.
member_code_top()
syntax: (numeric) member_codes_num = member_code_top((composite variable) var, (numeric) member_number)
Returns the number of codes in a member definition. The member is given by the member number (2nd argument) of the given variable (1st argument). Note that the member number is different from an index. Primitive members or untyped members have no codes, so they return a zero. The member may be void. The return variable is R_slong.
member_ID()
syntax: (numeric) mbr_ID = member_ID((composite variable) var, (numeric) member_number)
Returns the ID number of a given member of a composite variable. The ID is essentially the member's name; when a script is compiled all names get converted to numbers. Under normal conditions user-defined names are assigned positive ID numbers, whereas hidden members are given unique negative ID numbers. The variable enclosing the member is the first argument, and the member number (not index!) is the second argument. The return variable is R_slong.
member_indices()
syntax: (numeric) num_indices = member_indices((composite variable) var, (numeric) member_number)
Returns the number of indices spanned by a given member. The member is specified by its host composite variable (the first argument) and a member number (the second argument); hidden members are allowed. The return variable is R_slong.
member_type()
syntax: (numeric) mbr_type = member_type((composite variable) var, (numeric) member_number)
Returns the type restriction of a given member of a composite variable. The variable is the first argument, and the member number is the second argument. The member number is not an index; for example an array may have one member with many indices. Hidden members are included. The type IDs are listed in Table 1. A composite-typed member only returns a `10' (composite) even though its full type is properly determined by its code list. The return variable is R_slong.
print()
syntax: print((vars) v1, v2, ...)
Writes data to the standard output (which is normally the command prompt window). The arguments are printed sequentially and without spaces in between. Numeric arguments are converted to ASCII and printed as legible integers or floating-point numbers. Both string- and block-typed arguments are written verbatim (byte-for-byte) to the screen, except that for string types only, unprintable characters are replaced by their hexadecimal equivalents "\AA" (which is also the format in which these characters may be written into a string). Also, carriage returns in strings are written as end-of-line characters, so a PC-style line ending marked by "\0D\n" outputs as a double line-break.
print_string()
syntax: print_string([(numeric) precision,] (string) to_write, (vars) v1, v2, ...)
Writes data to a text string. print_string() is the counterpart to read_string(). Roughly speaking, print_string() is to print() as C's more elaborate sprintf() is to printf(). The string to write is followed by any number of variables whose data Yazoo writes to the string (with no spaces in between). Importantly, numeric variables are written as text, so print_string is different from a forced equate. For example:
print_string(str, 5, 2.7)
sets str to "52.7", whereas
str =! { 5, 2.7 }
gives something illegible (the raw bytes encoding the two numbers in binary format).
Strings from the source variables get copied into the destination string verbatim. Block-typed variables (and only block variables) are copied in binary form, as in a forced equate.
If the first argument is numeric, then it is taken as the output precision for floating-point single and double variables; then the output string must follow as the second argument. Otherwise the output precision is determined by the C constants FLT_DIG and DBL_DIG for single and double variables, respectively. Thus, when no precision is specified, print_string prints considerably more digits than does print().
random()
syntax: (numeric) y = random()
Returns a pseudo-random number uniformly drawn on the interval [0, 1]. To obtain the random number to double-precision, Yazoo uses C's rand() function in the following formula:
random() = rand()/RAND_MAX + rand()/(RAND_MAX)^2
The random number generator is initialized by Yazoo to the current clock time each time the program is run, so the generated sequence should not be repeatable. The return variable is R_double.
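Written out in C, the formula above might look like the sketch below. It only transcribes the formula as stated; the function names are hypothetical, and seeding is left out. The two draws are passed in as parameters so that the arithmetic can be checked separately from rand().

```c
#include <stdlib.h>

/* Combines two rand() draws as in the formula: the first draw sets the
   coarse value and the second fills in the finer digits. */
double combine_rand_draws(int r1, int r2)
{
    double m = (double) RAND_MAX;
    return r1 / m + r2 / (m * m);
}

/* One pseudo-random number built from two successive rand() calls. */
double random_sketch(void)
{
    return combine_rand_draws(rand(), rand());
}
```

Taken literally, the second term lets the result exceed 1 by as much as 1/RAND_MAX in the rare case that the first draw equals RAND_MAX, so code that relies on a strict upper bound may want to check for it.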
read_string()
syntax: read_string((string) to_read, (vars) v1, v2, ...)
Reads data from an ASCII string into variables. The first argument is the string to read from; following arguments give the variables that will store the data. read_string() is the humble cousin to C's sscanf() routine (it does not take a format string). The various fields within the string must be separated by white space or end-of-line characters.
read_string() converts ASCII data in the source string into the binary format of Yazoo's memory. Thus numeric fields in the source string need to be written out as text, as in "3.14" rather than its unintelligible floating-point representation. String fields must be one word long (a word of text, not a machine word), so "the quick brown" will be read into three string variables, not one. Block variables are simply copied byte-for-byte from the string. Composite variables are decomposed into their primitive components, which are read sequentially from the source string. Void members are skipped.
Here is an example of the use of read_string():
date :: { month :: string, day :: year :: ushort }
activity :: string
read_string("Jan 5 2007 meeting", date, activity)
If the string cannot be read into the given variables (i.e. there are too many or too few variables to read), Yazoo throws a type-mismatch warning. Warnings can also be thrown if Yazoo cannot read a field that should be numeric, or if there is an overflow in a numeric field.
read_string() is a counterpart to print_string(). However, print_string() does not write spaces in between the fields, so unless spaces are put in explicitly its output cannot be read directly by read_string().
round_down()
syntax: (numeric) y = round_down((numeric) x)
Returns the nearest integer that is as low as or lower than the (numeric) argument. This is equivalent to the floor() function in C. For example, round_down(2.3) returns 2, round_down(-2.3) returns -3, and round_down(-4) returns -4. The return variable is R_double.
round_up()
syntax: (numeric) y = round_up((numeric) x)
Returns the nearest integer that is as high as or higher than the argument, which must be numeric. round_up() is equivalent to the ceil() function in C. For example, round_up(5.6) returns 6, round_up(-5.6) returns -5, and round_up(2) returns 2. The return variable is R_double.
save()
syntax: save((strings) file_name, data_to_write)
Saves the data from the second argument into the file specified in the first argument. There is no return value, although the error "I/O error" will be thrown if the save is unsuccessful. (An error would likely indicate a bad pathname, a full disk, or a lack of write permissions for that file or directory.) If no directory is written explicitly before the file name (as one is in "/Library/my_file"), then the file is saved in the default directory, which is probably the directory where Yazoo resides.
There is no need for the data to be encoded in ASCII format, even though it gets passed to save() as a string. Inline conversion to the proper string type can be done in the following way:
save("my_data", (temp_str :: string) =! the_data)
where the_data may be a variable or array or any other object. save() writes the data verbatim; if the data is ASCII text, then a text file will be produced; otherwise the output should be considered a binary file. The saved data can be read back into a string by the load() function.
sin()
syntax: (numeric) y = sin((numeric) x)
Returns the sine of its argument. The argument must be numeric. The return variable is R_double.
size()
syntax: (numeric) var_size = size((var) my_var)
Returns the size, in bytes, of the argument variable. For composite variables, this is the sum of the sizes of all its members. If two members of a composite variable point to the same data (i.e. one is an alias of the other), then that data will indeed be double-counted. Therefore the number that is returned gives the number of bytes that will participate in, for example, a forced-equate or save(), which may be more than the number of bytes of actual storage. If a member points back to the composite variable, as in
a :: {
self := @this
data :: ulong }
size(a) | will cause an error
then the size of a, including its members and its members' members, etc., is effectively infinite, and Yazoo throws a self-reference error.
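The two rules above (aliased data is counted once per member that reaches it, and a variable that contains itself is an error) can be modelled in a few lines. This is only a Python sketch of the described behaviour, not Yazoo's implementation: composite variables are represented by dicts, primitives by their byte counts, and the 4-byte and 8-byte sizes are assumptions made for illustration.

```python
class SelfReferenceError(Exception):
    """Stand-in for Yazoo's self-reference error."""

def size(var, _active=None):
    """Sum of member sizes; var is an int (primitive byte count) or a dict."""
    if _active is None:
        _active = set()
    if isinstance(var, int):
        return var
    if id(var) in _active:
        # We reached a variable that is still being measured: infinite size.
        raise SelfReferenceError("variable contains itself")
    _active.add(id(var))
    total = sum(size(member, _active) for member in var.values())
    _active.remove(id(var))
    return total

# Two members aliasing the same 16 bytes are counted twice.
shared = {"x": 8, "y": 8}
pair = {"first": shared, "alias": shared}
aliased_size = size(pair)

# A member pointing back to its own container triggers the error.
a = {"data": 4}
a["self"] = a
try:
    size(a)
    self_ref_detected = False
except SelfReferenceError:
    self_ref_detected = True
```

Only variables on the current path are tracked, so an alias that is visited twice in sequence is legal and double-counted, whereas a loop back to an enclosing variable is rejected.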
SpringCleaning()
syntax: SpringCleaning()
This function removes all unused objects from Yazoo's memory. An object is termed `unused' if it cannot be accessed by the program counter (PC) or any location on the PC stack (the list of currently running functions). Removing these frees up system memory that might otherwise slowly fill with disused debris.
Internally, Yazoo keeps track of which data structures are being used by assigning them `references': each reference marks a use of that structure by somewhere else in memory. For example, a running function might be referenced twice: once by a member that points to it, and once by the PC acknowledging that the function's code is currently being used; in turn that function references the internally-stored code that is being run. When an object's references reach zero, there is no way to ever access it again---no members or fibrils point to it, since otherwise the object would be referenced---so Yazoo frees its memory.
Although much memory is freed up automatically, unfortunately there are some things that the referencing apparatus will miss. The reason is that two objects that still reference each other can simultaneously become unhooked from the main memory tree: both are now garbage, but each still has at least one reference. For example, variable a might have a member pointing to variable b, while a member of b points back to a. This particular case might sound contrived, but for various reasons all sorts of self-referencing, self-sustaining symbioses form during normal execution.
There is no way to identify and exterminate once and for all these living dead without combing through the whole memory tree, which is what SpringCleaning() does. It first explores every item in memory that could ever be accessed by the program counter; then it scours the entire memory tree and removes anything that was not reached in the first pass. This not only frees up memory but probably speeds up program execution for processes like memory allocation where Yazoo has to search for free slots within its reference lists.
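The difference between reference counting and the two-pass sweep described above can be sketched with a toy heap. The following Python model is an illustration under stated assumptions, not Yazoo's actual data structures: Obj, link(), unlink() and spring_cleaning() are hypothetical names, and a single root object stands in for everything reachable from the program counter.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.members = []   # outgoing references to other objects
        self.refs = 0       # number of incoming references

def link(src, dst):
    src.members.append(dst)
    dst.refs += 1

def unlink(src, dst):
    src.members.remove(dst)
    dst.refs -= 1

def spring_cleaning(roots, heap):
    """Pass 1: mark everything reachable from roots. Pass 2: keep only that."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in marked:
            continue
        marked.add(id(obj))
        stack.extend(obj.members)
    return [obj for obj in heap if id(obj) in marked]

root, a, b = Obj("root"), Obj("a"), Obj("b")
heap = [root, a, b]
link(root, a)
link(a, b)
link(b, a)          # a and b now reference each other
unlink(root, a)     # both become unreachable from the root

# Reference counts never drop to zero, so counting alone frees nothing.
counts_after_unlink = (a.refs, b.refs)
# The sweep finds that only the root is reachable.
survivors = [obj.name for obj in spring_cleaning([root], heap)]
```

The cycle keeps both counts at one even though neither object can be reached, which is exactly the case the full sweep exists to catch.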
When Yazoo is run in interactive mode, start.zoo disinfects with a SpringCleaning() after every command from the user. Lengthy, memory-intensive scripts may also benefit from a periodic scrubbing, especially if they allocate and subsequently remove, or un-alias, large variables frequently. For example, the following script:
a :: b :: { this[1000][1000] :: double }
for c1 in [1, 10000]
(b = @nothing) :: a
endf
would benefit greatly from a SpringCleaning() within the loop.
tan()
syntax: (numeric) y = tan((numeric) x)
Returns the tangent of its (numeric) argument. The return variable is R_double.
throw()
syntax: throw((numeric) error_code [, (composite) error_script [, (numeric) error_index [, (Boolean) if_warning]]])
Causes an error to occur. This of course stops execution and throws Yazoo back to the last enclosing trap() function; if there is none (even in start.zoo) then Yazoo bails out completely. The first argument gives the error code; the optional second and third arguments allow one to specify which script and where in that script the error appears to come from. If one sets the optional fourth argument to true, then the error will be thrown as a warning instead. All arguments may be skipped with a `*'.
Although all real errors have error codes in the range 1-46, throw() works perfectly well for larger error codes that Yazoo has never heard of. It can be hard to tell when throw() is working. For starters, if the error code is zero then it will appear that throw() is not doing its job, just because 0 is code for `no error'. throw() does require that the error code be zero or positive, so it gives a number-out-of-range error if the argument is negative. However, the following also gives a range error:
throw(6)
In this case throw() actually worked: we got an out-of-range error because that is error #6. (That caused the author some confusion at first.)
top()
syntax: (numeric) vartop = top((composite variable) my_var)
Returns the number of indices of the argument variable. The argument must be a composite variable or equivalent (e.g. set, function, class, etc.). top() does not count hidden members. Therefore the value it returns corresponds to the highest index of the variable that can be accessed, so
my_var[top(my_var)]
is legal (unless the top member is void) whereas
my_var[top(my_var) + 1]
is always illegal (unless we are in the process of defining it).
transform()
syntax: (numeric) error_code = transform((string) compiled_code [, (numeric) code_ID [, (composite array) search_path]])
Copies compiled bytecode, stored as a string, into the internal code of register R_composite without running the constructor. The transformed code may be given the search path of the transforming code, or encapsulated in the sense of having no search path leading out of the register, or given a custom search path. After invoking transform(), R_composite can either be used to run the code directly, or else serve as a template for defining objects that will have the same code. The internal ID number that Yazoo assigns the new code is copied into the optional second argument, if that is provided (it may also be skipped with a `*' void placeholder).
Every composite object in Yazoo has a code or sequence of codes (if the inheritance operator is used) that defines that object's type, its constructor and, if it is a function, its action upon execution. Code is something manifestly different from data: whereas data can be altered during runtime, a bytecode definition lies in Yazoo's internal memory and is generally inaccessible to the user. transform() bridges the gap between the separate worlds of code and data, by generating a new internal code definition from a string (data). Its counterparts, variable_code() and member_code(), copy internal code definitions back into strings.
We can demonstrate a transformation by using it to define a variable. Ordinarily, we would do this by writing something like:
cmplx :: { real :: imag :: double }
However, we could also have done the same thing manually, by writing
code_def := "real :: imag :: double"
compile(code_def, AllNames) | bytecode stored in R_string
transform(R_string) | embed bytecode into R_composite
(Ignore the AllNames list --- this just ensures that the command prompt will remember our members' names.) After these three lines it is almost as if we had written
R_composite :: { real :: imag :: double } | won't work though
except that this latter definition won't actually work because there will be a type mismatch with R_composite's existing code definition. transform() removes any existing codes before incorporating the new code definition, so it doesn't have that problem.
One significant difference from the usual way of defining things is that transform() doesn't run a constructor, so R_composite won't have members real and imag. These members are constructed only when we define one of our own variables from R_composite:
cmplx :: R_composite | the constructor will run inside cmplx
There is one minor difference between a cmplx that was defined directly and one whose code we manually transformed. In the former case, the code of cmplx will be part of a larger script, so if cmplx causes an error then the start.zoo script will be able to flag it correctly. If we have transformed the code ourselves, any error will have to be flagged in bytecode by the disassemble() routine unless we add the code and relevant info to the ScriptStrings and YH_Lines variables that start.zoo provides.
We did not set it up this way, but there is also a potentially bigger difference between native and user-transformed code, having to do with the search paths inherited upon transformation. The default is to use the search path of the calling function: the transformer was privileged to explore back through a certain set of variables looking for members, so its transformed progeny can be entrusted with the same. But this can be changed: if the transformation had a null third argument as in
transform(R_string, *, *) | * in the third argument
then R_composite and by extension any variables defined from it would not inherit any search path: their code would only see members directly inside their respective variables. The practical consequence is that the following:
my_data :: double
...
print_data :: { code, print(my_data) }
print_data() | fine
works, but the manually-transformed version
my_data :: double
...
compile("code, print(my_data)", AllNames)
transform(R_string, *, *)
print_data :: R_composite
print_data() | member-not-found
will not work, since it will not be able to see my_data.
In fact, transform() gives us great flexibility over the search path beyond the default/null options. Yazoo reasons: the calling function is entitled to explore whatever it can name, so its transformed code should have the same privileges; the order of things it can name is irrelevant; so why not allow a transformer to make any search path where it can name each step along the path? The way to do this is to put all stops on the search path, from last to first, inside a set which we put in the third argument. Thus transform(my_str, *, { root, a, r.s }) causes code that runs in R_composite to look for members in R_composite first, then r.s, then a, then finally root. After that it will explore the transformer's search path (including the space of the transforming function). To prevent the transformer's path from grafting onto the manually-specified path, we put a void `*' token in the first position of the path.
Here are some examples of transform()-generated search paths.
f :: {
my_code :: string
code
...
transform(my_code, *, *) | R_composite
transform(my_code, *, { * }) | R_composite
transform(my_code, *) | R_composite --> f --> root
transform(my_code, *, { }) | R_composite --> f --> root
transform(my_code, *, { a, b }) | R_composite --> b --> a --> f --> root
transform(my_code, *, { *, a, b }) | R_composite --> b --> a
}
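The rules for the third argument can be summarized as a small function. The Python sketch below follows the last-to-first ordering stated in the text; it is a model for reasoning about the rules, not Yazoo's implementation. The names search_path, OMITTED and VOID are hypothetical: VOID stands in for the `*' token, and the transformer's own path is passed in as a list running from nearest to farthest.

```python
OMITTED = object()   # third argument not supplied at all
VOID = None          # stands in for Yazoo's `*' void token

def search_path(transformer_path, stops=OMITTED):
    """Return the order in which transformed code would look for members."""
    path = ["R_composite"]
    if stops is VOID:
        return path                          # no search path out of the register
    if stops is OMITTED:
        return path + list(transformer_path) # default: inherit the caller's path
    stops = list(stops)
    graft = True
    if stops and stops[0] is VOID:
        graft = False                        # leading `*': do not graft caller's path
        stops = stops[1:]
    path.extend(reversed(stops))             # stops are listed from last to first
    if graft:
        path.extend(transformer_path)
    return path

caller = ["f", "root"]   # the transformer f, then root
default_path = search_path(caller)
isolated_path = search_path(caller, VOID)
custom_path = search_path(caller, ["a", "b"])
closed_path = search_path(caller, [VOID, "a", "b"])
```

An empty set behaves like an omitted argument, and a set holding only the void token behaves like a void third argument.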
Why ever use transform(), if it is obviously more roundabout than defining things directly? One good reason is that all scripts barring the startup script must be transformed into memory from some other script, and transform() is the only tool that can do this. start.zoo transforms user.zoo in order to run it, and user.zoo's run() routine invokes a transformation each time it runs a script. When Yazoo is run interactively, each entry from the command line is essentially a new script that must also be transformed. Some bytecode actually cannot be scripted, because the compiler doesn't have symbols for certain operations (`goto', def-general operators with unorthodox flags, etc.), and in these cases the bytecode has to be written and transformed manually. Both start.zoo and user.zoo at times find it necessary to transform hand-written snippets of bytecode that the compiler cannot produce.
Yazoo bytecode generated by compile() should always be legitimate; if it is not then there is a bug in the compiler. However, it is rather easier to obtain bogus bytecode than the legitimate sort when you're writing out the binary yourself. Therefore transform() makes no assumptions about the quality of bytecode it's given, and it runs a series of checks on it before loading into R_composite. Any error will of course prevent the code from being loaded. Error information is stored in the error registers, notably R_error_code which stores the error code (0 means no error).
trap()
syntax: (numeric) error_code = trap([code to run])
Runs its argument as code in the calling function, and returns any error value. No code marker is needed within a trap() call. Upon error, the argument ceases execution and the error code, index and script are placed into the registers R_error_code, R_error_index and R_error_script respectively; if the argument runs to completion with no error, then R_error_code and R_error_index are both set to 0. R_error_code is the return variable.
Likewise, a warning will cause R_warning_code, R_warning_index and R_warning_script to be set; the first two are cleared if no warning occurred. Since warnings do not stop execution, it is possible for several warnings to have been generated, in which case only the last one will be stored in the registers.
The trap() function possesses the special ability of running its arguments in the function that called trap(), rather than in a private argument variable, as is the case for all other built-in and user-defined functions. Thus variables which are defined within the trap() argument list will be accessible to the rest of the function.
Importantly, the trap() function clears the Yazoo error flag that stops execution. So if trap()'s argument causes an error, Yazoo immediately bails out to the original script, but then execution resumes after the trap() as if the error had never occurred. The only record of the problem is in the error registers, which are for the user's reference only and do not affect execution.
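The register behaviour of trap() resembles a try/except block that records the failure and then carries on. The following Python sketch models only that aspect; YazooError, the registers dictionary and the trap() helper are hypothetical stand-ins, and the model does not reproduce trap()'s ability to define variables in the calling function.

```python
class YazooError(Exception):
    """Stand-in for a Yazoo error carrying a code, an index and a script."""
    def __init__(self, code, index=0, script=None):
        super().__init__(code)
        self.code, self.index, self.script = code, index, script

registers = {"R_error_code": 0, "R_error_index": 0, "R_error_script": None}

def trap(code_to_run):
    """Run code_to_run, record any error in the registers, return the code."""
    try:
        code_to_run()
    except YazooError as err:
        registers["R_error_code"] = err.code
        registers["R_error_index"] = err.index
        registers["R_error_script"] = err.script
    else:
        # Per the text, only the code and the index are reset on success.
        registers["R_error_code"] = 0
        registers["R_error_index"] = 0
    return registers["R_error_code"]

def faulty():
    raise YazooError(6, index=3, script="demo")

log = []
first = trap(faulty)              # error is recorded, not propagated
log.append("still running")       # execution resumes after the trap
second = trap(lambda: None)       # a clean run resets code and index
```

Because the registers are overwritten by the next trap(), a script that needs the error details should copy them out immediately after the call.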
variable_code()
syntax: (string) code_str = variable_code((variable) var, (numeric) code_index)
Returns the specified (compiled) bytecode of a given variable. The way that multiple codes may be present is through the inheritance operator, which concatenates codes. variable_code() copies the bytecode from Yazoo's internal registry into a string (R_string), so it is an inverse operation to transform().
In many respects the output of variable_code() is identical to the original output of the compile() function when that script was compiled, except that variable_code() also returns partial scripts. For example, we can compile and transform the following script:
code
var_a :: { x :: y :: double }
var_b :: string
Now the above code is stored in R_composite, and by invoking
c_code := variable_code(R_composite, 1)
we can obtain the original bytecode (i.e. the direct output of the compile() function). But suppose we run R_composite() to generate the variables var_a and var_b. Then we can also write
c_code := variable_code(R_composite.var_a, 1)
and obtain the compiled version of "x :: y :: double". Of course, we cannot get any code back from var_b since that is a primitive variable.
Both functions/variables and members have code definitions. The member-side counterpart to variable_code() is member_code(), which works in an analogous way.
variable_code_ID()
syntax: (numeric) the_code_ID = variable_code_ID((variable) var, (numeric) code_index)
Returns the global code ID of a given code of a given variable (there may be more than one per variable). transform() assigns a unique code ID to each chunk of bytecode that it loads, although disused codes that are cleaned from memory will generally have their ID numbers recycled. The return variable is R_slong.
As explained above in the section on variable_code(), a given script may have a number of sub-codes, which are essentially the codes of the various members of whatever variable the script defined. The entire script gets a unique ID number, which transform() returns via its optional argument. All the sub-codes share the same global ID number. Thus one must use variable_code_offset() to discriminate between the codes of objects defined from the same script.
variable_code_offset()
syntax: (numeric) the_code_offset = variable_code_offset((variable) var, (numeric) code_index)
Returns the code offset of a given code of a given variable, relative to the start of the script in which it was loaded. This serves to distinguish the codes of different composite members that were defined in the same script and thus have the same code ID. The return variable is R_slong.
When a given script is transformed, the recipient variable of the transformation inherits its entire code, with an offset of zero. However, if that code defines some composite member, as in
...
f1 :: { ... }
...
then that function's code is just the part of the larger script's code that is contained within the braces. It will thus have the same ID number. However, it will have a nonzero offset, since the code within the braces necessarily begins some way into the script. (The offset begins with the first statement inside the braces, not the braces command itself.) The offset is given in units of bytecode words, starting from 1.
variable_code_top()
syntax: (numeric) var_codes_num = variable_code_top((variable) var)
Returns the number of codes contained within the argument variable. A variable may have multiple codes if it was defined using the inheritance operator, or if its template variable had multiple codes. Primitive variables do not have codes so they return a zero. The return variable is R_slong.
variable_member_top()
syntax: (numeric) var_members_num = variable_member_top((variable) var)
Returns the number of members of the argument variable. The number of members is emphatically different from the number of indices, since a given member may have zero, one or multiple indices, or its indices may be `hidden'. Primitive variables return zero as they do not contain members. The return variable is R_slong.
variable_type()
syntax: (numeric) var_type = variable_type((variable) var)
Returns the type of the argument variable. The type IDs are listed in Table 1. A composite variable simply registers as a `10' (composite) even though its type is properly determined by its code list. The return variable is R_slong.
Last update: July 28, 2013