This is the user manual for version 3.0.0 of the Torque meta-assembler.
For a gentle introduction to the language, see Introduction to Torque.
There are three types of value available in a Torque program: blocks, integers, and lists. Every element of the language has been filed underneath the data type that it assembles to.
Comments and macro definitions are treated differently, and have their own sections.
Comments are notes left by the programmer, and are ignored by the assembler.
A comment is a sequence of characters wrapped in () parenthesis characters.
If the content of a comment begins with a : character, the rest of the comment will be interpreted as a file path, and the lines following the comment will be treated as the contents of the file at that path for the purpose of attributing error messages to their source. Comments of this form are generated internally by the automatic library inclusion mechanism and can be seen when using the --format=source command-line option.
( This is a comment. )
(: /this/is/a/filepath )
Blocks are the fundamental data type of Torque, assembling down to the words of the final program. The outermost level of a Torque program can contain only macro definitions and block values.
Block-type macro arguments are defined using a name wrapped in braces, like {name}.
Word templates are the fundamental block value, assembling down to a single word in the final program. Every other block-type value is either an aggregate of word templates or assembles to nothing.
A word template is a # character followed by zero or more 0 or 1 characters representing bits in the assembled word. The _ character can be used as a visual separator.
A field is a named range of bits within a word template, where each bit in the range has been replaced by a copy of a single letter (the letter being the name of that field). When assembled, the bits of the field will be packed with the integer value returned by the zero-argument macro with the same name as that field (often passed as a macro argument). Negative integers will be encoded in two’s complement form.
#0100_0001 ( 8-bit word with no fields )
#01_00bb_bfff_ffff ( 14-bit word with two fields )
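The field packing described above can be modelled in a few lines of Python. This is only a sketch of the documented behaviour, not Torque's implementation. The function name `assemble_word` is hypothetical, and two details are assumptions the manual does not state: field bits are filled most-significant bit first, and values too wide for a field are silently truncated.

```python
def assemble_word(template, fields):
    """Return (value, width) for a template such as '#01_00bb_bfff_ffff'.

    `fields` maps each field letter to the integer packed into that field.
    """
    if not template.startswith("#"):
        raise ValueError("a word template starts with '#'")
    # '_' is only a visual separator and contributes no bits.
    cells = [c for c in template[1:] if c != "_"]

    # Count how many bits each field letter occupies.
    sizes = {}
    for c in cells:
        if c not in "01":
            sizes[c] = sizes.get(c, 0) + 1

    remaining = dict(sizes)  # bits still to be emitted for each field
    value = 0
    for c in cells:
        if c in "01":
            bit = int(c)
        else:
            remaining[c] -= 1
            # Masking to the field width yields the two's complement
            # encoding for negative integers (assumption: overflow truncates).
            masked = fields[c] & ((1 << sizes[c]) - 1)
            bit = (masked >> remaining[c]) & 1
        value = (value << 1) | bit
    return value, len(cells)
```

Under these assumptions, `assemble_word("#nnnnnnnn", {"n": -1})` gives an 8-bit word with every bit set, which is the two's complement encoding of -1.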
Block literals are used to group together multiple block values so they can be passed to a macro invocation as a single argument.
A block literal is a sequence of zero or more block values grouped together into a single block value by wrapping it with {} brace characters.
{ } ( an empty block literal )
{ #001 #010 } ( a two-word block literal )
Conditional blocks are used to conditionally assemble a block value. They can be used in combination with macro recursion to create macros that assemble to variable-width data.
A conditional block is a ? character, followed by an integer value, followed by a block value. It assembles to the block value if the integer is not zero, else it assembles to an empty block. The integer value is called the predicate, and the block value is called the body.
?1 #00000000 ( always assembles to a zero byte )
?[n 0 >] PAD:[n 1 -] ( only assembles if n is greater than 0 )
Label definitions are used to associate a program address with an identifier, which can then be used as an integer value in the program (a label definition can be thought of as a zero-argument macro definition that assembles down to an integer). The value associated with a label definition is the number of words assembled prior to that definition in the program.
Label definitions come in two varieties: global and local. Local labels are automatically namespaced to allow multiple labels of the same name to exist in different parts of the program.
A global label definition is an @ character followed by an identifier, and can be used anywhere other than inside a macro definition or conditional block. The name of the label is given by the identifier.
A local label definition is an & character followed by an identifier, and can be used anywhere other than inside a conditional block or before any global label definitions. If the local label is defined in the main body of the program, the name will be prefixed with the preceding global label name, in the form global/local. If the local label is defined inside a macro definition, the name will be prefixed with the macro name and suffixed with an incrementing invocation identifier, in the form macro:local:id (this name can only be referenced using the special ~local invocation syntax inside the same macro definition).
%MACRO &here; ( a local label, MACRO:here:# )
@main ( a global label, main )
&loop ( a local label, main/loop )
&end ( a local label, main/end )
@other ( a global label, other )
&loop ( a local label, other/loop )
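The namespacing of labels in the main body of a program can be sketched as follows. This is an illustrative model in Python, not the assembler's code. The function name `qualify_labels` and the (sigil, identifier) input format are hypothetical, and labels defined inside macro definitions are not modelled.

```python
def qualify_labels(definitions):
    """Return the full name of each label defined in the main body.

    `definitions` is a list of (sigil, identifier) pairs, where the sigil
    is '@' for a global label and '&' for a local label.
    """
    current_global = None
    names = []
    for sigil, identifier in definitions:
        if sigil == "@":
            current_global = identifier
            names.append(identifier)
        elif sigil == "&":
            if current_global is None:
                raise ValueError("local label defined before any global label")
            # Local labels take the preceding global label as a prefix.
            names.append(f"{current_global}/{identifier}")
        else:
            raise ValueError(f"unknown label sigil: {sigil!r}")
    return names
```

Because the prefix changes at every global label, the two `loop` labels in the example above resolve to the distinct names main/loop and other/loop.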
Pinned addresses define the start address of each segment of assembled code. A segment is a sequence of assembled words starting from the pinned address.
When a pinned address is reached by the assembler, the internal counter that tracks the current address will be overwritten by the value of the pinned address, and the next words to be assembled will start from this address. A pinned address cannot be used to backtrack to a previous address.
Internally, the sequence of assembled words forming the program is grouped into segments. An initial segment is created at address 0, and a new segment is created every time a pinned address is encountered. This allows the empty space between segments to be elided in some output formats.
A pinned address is a | character followed by an integer value. The integer value is used as the address.
|0x6000 ( start the next program segment from address 0x6000 )
|DATA-SEGMENT ( start the next program segment from a named integer )
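The grouping of words into segments can be sketched in Python as below. This is a simplified model of the behaviour described above, not the assembler's internals. The name `build_segments` and the ("pin" / "word") item format are hypothetical. It assumes that pinning to the current address is allowed and that only a strictly lower address counts as backtracking, which the manual does not spell out.

```python
def build_segments(items):
    """Group assembled words into (start_address, words) segments.

    `items` is a list of ("word", value) and ("pin", address) pairs in
    program order.
    """
    segments = [(0, [])]  # an initial segment is created at address 0
    address = 0           # internal counter tracking the current address
    for kind, value in items:
        if kind == "pin":
            if value < address:
                raise ValueError("a pinned address cannot backtrack")
            address = value
            segments.append((address, []))  # every pin starts a new segment
        elif kind == "word":
            segments[-1][1].append(value)
            address += 1
        else:
            raise ValueError(f"unknown item kind: {kind!r}")
    return segments
```

An output format that elides empty space only needs the start address and the words of each segment, which is why the model stores nothing for the gap between them.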
Integers are passed around the program as signed 64-bit integers. They can be packed into words via word templates, can toggle blocks of code via conditional blocks, and can start new program segments via pinned addresses.
Integer-type macro arguments are defined using a plain name, like name.
Integer literals are the fundamental integer value. For all integer literals other than the character literal, the - character can be added as a prefix to make the value negative, and the _ character can be used as a visual separator.
A binary literal is 0b, followed by one or more digits in the range 0-1.
An octal literal is 0o, followed by one or more digits in the range 0-7.
A decimal literal is one or more digits in the range 0-9.
A hexadecimal literal is 0x, followed by one or more digits in the ranges 0-9, A-F, or a-f.
A character literal is a character wrapped in ' single-quote characters. It assembles to the Unicode code point of that character.
0b1110111011 ( binary literal )
0o1673 ( octal literal )
955 ( decimal literal )
0x3BB ( hexadecimal literal )
'λ' ( character literal )
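The literal forms above can be checked against one another with a small Python parser. This is a sketch written from the description only. The name `parse_integer_literal` is hypothetical. It assumes that the _ separator may appear anywhere after the optional - sign and that a character literal holds exactly one character with no escape sequences, neither of which the manual confirms.

```python
def parse_integer_literal(text):
    """Parse a Torque-style integer literal into a Python int."""
    # Character literal: a single character wrapped in single quotes.
    if len(text) >= 3 and text[0] == "'" and text[-1] == "'":
        body = text[1:-1]
        if len(body) != 1:
            raise ValueError("a character literal holds exactly one character")
        return ord(body)

    negative = text.startswith("-")
    digits = text[1:] if negative else text
    digits = digits.replace("_", "")  # '_' is only a visual separator

    prefixes = {
        "0b": (2, "01"),
        "0o": (8, "01234567"),
        "0x": (16, "0123456789abcdefABCDEF"),
    }
    base, allowed = 10, "0123456789"
    if digits[:2] in prefixes:
        base, allowed = prefixes[digits[:2]]
        digits = digits[2:]

    if not digits or any(d not in allowed for d in digits):
        raise ValueError(f"invalid integer literal: {text!r}")
    value = int(digits, base)
    return -value if negative else value
```

All five example literals above denote the same value, 955, which is the code point of the character λ.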
Expressions are sequences of values and operators that evaluate down to a single integer. They’re used to perform calculations while the program is being assembled.
An expression is a sequence of list values, integer values, and at least one operator, wrapped with [] square bracket characters. An expression containing only integer values and no operators is a list literal.
[1 2 +] ( adds 1 and 2 )
["ABC" 0 <nth>] ( extracts the 0th list element )
The terms of an expression are evaluated left-to-right using a stack. When an integer or list value is evaluated, that value will be pushed onto the stack. When an operator is evaluated, a number of values will be popped from the stack, the values will be operated on, and the result will be pushed onto the stack as a value. Exactly one integer value must be left on the stack after the expression has been evaluated; this will be used as the value of the expression.
The following table lists every available operator. b represents the value at the top of the stack, and a represents the next value down. Every operator has a canonical name, and some operators have an alternate name.
| Canon. | Alt. | Result |
|---|---|---|
| <eq> | = | 1 if a equals b, else 0 |
| <neq> | != | 1 if a does not equal b, else 0 |
| <lth> | < | 1 if a is less than b, else 0 |
| <gth> | > | 1 if a is greater than b, else 0 |
| <leq> | <= | 1 if a is less than or equal to b, else 0 |
| <geq> | >= | 1 if a is greater than or equal to b, else 0 |
| <add> | + | a plus b |
| <sub> | - | a minus b |
| <mul> | * | a multiplied by b |
| <div> | / | a divided by b |
| <mod> | | a modulo b |
| <exp> | ** | a to the power of b |
| <shl> | << | a left-shifted b bits |
| <shr> | >> | a right-shifted b bits |
| <and> | | a bitwise-and b |
| <or> | | a bitwise-or b |
| <xor> | | a bitwise-xor b |
| <not> | | bitwise-not of b |
| <abs> | | absolute value of b |
| <sum> | | number of set bits in b |
| <len> | | length of b in bits |
| <nth> | | extract element b of list a |
| <fnd> | | find index of element b in list a |
| <dbg> | | log the contents of the expression stack |
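The stack evaluation of expressions can be illustrated with a short Python model. It is a sketch that covers only a subset of the operators, not the assembler's evaluator. The name `evaluate` and the representation of terms (Python ints, Python lists, and operator strings) are hypothetical. It assumes that <nth> uses zero-based indexing, as the 0th-element example suggests, and it does not model the signed 64-bit range of Torque integers.

```python
# Operators that pop b (top of stack) and a (next value down).
BINARY = {
    "<eq>": lambda a, b: int(a == b),
    "<neq>": lambda a, b: int(a != b),
    "<lth>": lambda a, b: int(a < b),
    "<gth>": lambda a, b: int(a > b),
    "<leq>": lambda a, b: int(a <= b),
    "<geq>": lambda a, b: int(a >= b),
    "<add>": lambda a, b: a + b,
    "<sub>": lambda a, b: a - b,
    "<mul>": lambda a, b: a * b,
    "<and>": lambda a, b: a & b,
    "<or>": lambda a, b: a | b,
    "<xor>": lambda a, b: a ^ b,
    "<nth>": lambda a, b: a[b],  # assumption: zero-based index into list a
}
# Operators that pop only b.
UNARY = {"<not>": lambda b: ~b, "<abs>": lambda b: abs(b)}
# Alternate names mapped to canonical names.
ALIASES = {"=": "<eq>", "!=": "<neq>", "<": "<lth>", ">": "<gth>",
           "<=": "<leq>", ">=": "<geq>", "+": "<add>", "-": "<sub>",
           "*": "<mul>"}


def evaluate(terms):
    """Evaluate expression terms left-to-right using a stack."""
    stack = []
    for term in terms:
        if not isinstance(term, str):
            stack.append(term)  # integer and list values are pushed
            continue
        name = ALIASES.get(term, term)
        if name in BINARY:
            b = stack.pop()
            a = stack.pop()
            stack.append(BINARY[name](a, b))
        elif name in UNARY:
            stack.append(UNARY[name](stack.pop()))
        else:
            raise ValueError(f"operator not modelled: {term!r}")
    # Exactly one integer must remain on the stack.
    if len(stack) != 1 or not isinstance(stack[0], int):
        raise ValueError("expression must leave exactly one integer")
    return stack[0]
```

In this model the expression [1 2 +] is written as `evaluate([1, 2, "+"])`, and a string literal such as "ABC" is passed as the list of its code points, `[65, 66, 67]`.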
Lists are sequences of integers.
List-type macro arguments are defined using a name wrapped in square brackets, like [name]. Lists can also be passed to a macro invocation via an integer-type argument, which will cause the macro to be invoked once for each integer in the list.
List literals are used to gather multiple integer values into a single list value.
A list literal is a sequence of integer values wrapped in [] square brackets.
[1 2 3] ( list literal containing only integer literals )
['A' ADD:1:2] ( list literal containing a character and a macro invocation )
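Passing a list to an integer-type argument, which invokes the macro once per element, can be pictured with the Python sketch below. It is an illustrative model only. The helper names `invoke` and `string_literal` are hypothetical, and `byte` stands in for a one-argument macro such as the %BYTE:n #nnnnnnnn; example in this manual, returning each assembled word as a string of bits.

```python
def string_literal(text):
    """Model a string literal: one code point per character."""
    return [ord(ch) for ch in text]


def byte(n):
    """Stand-in for a one-argument block macro that packs n into 8 bits."""
    return format(n & 0xFF, "08b")


def invoke(macro, argument):
    """Invoke a one-argument macro with an integer or a list.

    A list passed to an integer-type argument causes one invocation per
    integer in the list, in order.
    """
    if isinstance(argument, list):
        return [macro(value) for value in argument]
    return [macro(argument)]
```

Under this model, `invoke(byte, string_literal("AB"))` produces two words, one for each character of the string.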
String literals are lists of characters. They assemble to a list that contains one integer for each character in the string, with the value of each integer being the Unicode code point of the corresponding character.
A string literal is a sequence of characters wrapped in " double-quotes.
"This is a string literal."
Macros are declarative templates for fragments of code, and can evaluate to any type of value. Arguments can be passed to a macro when invoked, modifying the value generated.
A macro definition is a % character followed by an identifier, followed by zero or more argument definitions each preceded by a : character, followed by a value that will be the body of the macro, followed by a ; character. The name of the macro is given by the identifier. Multiple different macros can share the same name, as long as the argument count is different for each. Macro definitions have no value, and must be placed at the outermost level of the program.
Macro arguments are implemented as local macro definitions that shadow any global macros of the same name. Plain argument names (like name) accept integer values, argument names wrapped in square brackets (like [name]) accept list values, and argument names wrapped in braces (like {name}) accept block values.
A macro is invoked by writing the name of the macro, followed by zero or more argument values each preceded by a : character. The number of arguments passed has to match the argument count of the macro definition, and the type of the value passed into each slot has to match the type of the corresponding argument definition. When a macro is invoked, the invocation is replaced with the body of that macro.
%ONE 1; ( integer macro with no arguments )
%BYTE:n #nnnnnnnn; ( block macro with one argument )
%ADD:a:b [a b +]; ( integer macro with two arguments )
%NAME:int:[list]:{block}; ( block macro with three arguments )
BYTE:ONE ( macro invocation with one argument )
ADD:1:2 ( macro invocation with two arguments )
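The rule that macros may share a name as long as their argument counts differ amounts to looking definitions up by the pair (name, argument count). The Python sketch below models only that lookup. The class name `MacroTable` is hypothetical, macro bodies are represented as Python callables, argument types are not checked, and rejecting a duplicate definition is an assumption rather than documented behaviour.

```python
class MacroTable:
    """Macro definitions keyed by (name, argument count)."""

    def __init__(self):
        self._macros = {}

    def define(self, name, parameters, body):
        key = (name, len(parameters))
        if key in self._macros:
            # Assumption: same name with the same argument count is an error.
            raise ValueError(f"macro {name} already takes {len(parameters)} arguments")
        self._macros[key] = body

    def invoke(self, name, *arguments):
        key = (name, len(arguments))
        if key not in self._macros:
            raise LookupError(f"no macro {name} taking {len(arguments)} arguments")
        # The invocation is replaced with the value of the macro body.
        return self._macros[key](*arguments)
```

With this table, a two-argument ADD and a one-argument ADD can coexist, and the number of arguments at the invocation site selects between them.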
This section describes the operation of the assembler itself.
Torque includes an automatic symbol resolution mechanism that can search for missing label and macro definitions inside library files and then include those files into the assembled program.
If an undefined symbol is encountered during assembly, Torque will search for a project library or an environment library that defines that symbol. If that library file contains further undefined symbols, Torque will continue to search for matching definitions in library files recursively.
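The recursive search can be thought of as a worklist over undefined symbols. The Python sketch below models that idea and is not the assembler's algorithm. The function name `resolve` and the shape of the `libraries` mapping are hypothetical. It assumes that each symbol is defined by at most one library and says nothing about the order in which Torque actually searches.

```python
def resolve(undefined_symbols, libraries):
    """Return the names of the libraries needed to define every symbol.

    `libraries` maps a library name to a pair (defines, requires): the set
    of symbols it defines and the set of symbols it leaves undefined.
    """
    defined_by = {}
    for name, (defines, _requires) in libraries.items():
        for symbol in defines:
            defined_by[symbol] = name

    included = []
    seen = set()
    pending = sorted(undefined_symbols)
    while pending:
        symbol = pending.pop()
        if symbol in seen:
            continue
        seen.add(symbol)
        if symbol not in defined_by:
            raise LookupError(f"no library defines {symbol}")
        library = defined_by[symbol]
        if library not in included:
            included.append(library)
            # Undefined symbols inside the library are searched for in turn.
            pending.extend(sorted(libraries[library][1]))
    return included
```

A library that defines nothing the program needs, directly or indirectly, is never included.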
A project library is a Torque source file (with extension .tq) that is held in the same folder as the program being assembled (or a subfolder thereof). The --no-project-libs command-line option prevents Torque from searching for any project libraries.
An environment library is a Torque source file (with extension .tq) that is held in a folder listed in the TORQUE_LIBS environment variable (or a subfolder thereof). The TORQUE_LIBS environment variable must contain a colon-separated list of filesystem paths. The --no-env-libs command-line option prevents Torque from searching for any environment libraries.
Torque source files can be accompanied by up to two sidecar files in the same directory, called the head file and the tail file. If the main file is called library.tq, the head file will be called library.head.tq and the tail file will be called library.tail.tq. The main source file and optional head and tail files form a source unit, and are treated as one file by Torque when searching for definitions.
If a required definition is found in a source unit, that source unit will be included in the program being assembled. Once all required source units have been found, all of the constituent files are concatenated to form a single file to be assembled. All head files come first, followed by all main files (with the main section of the original source file being placed at the top of this section), followed by all tail files.
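The sidecar naming scheme and the concatenation order can be sketched in Python as follows. This is a model of the description above, not the assembler's code. The names `SourceUnit`, `sidecar_paths`, and `concatenate` are hypothetical. It assumes that head files and tail files keep the same relative order as their main files, and it does not insert the path comments that the assembler adds.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class SourceUnit:
    main: str
    head: Optional[str] = None  # contents of the optional head file
    tail: Optional[str] = None  # contents of the optional tail file


def sidecar_paths(main_path):
    """Return the head and tail file paths for a main source file."""
    stem = main_path.stem  # "library" for "library.tq"
    return (main_path.with_name(stem + ".head.tq"),
            main_path.with_name(stem + ".tail.tq"))


def concatenate(program, libraries):
    """Join source units: all heads, then all mains, then all tails."""
    units = [program, *libraries]  # the program's main section comes first
    heads = [u.head for u in units if u.head is not None]
    mains = [u.main for u in units]
    tails = [u.tail for u in units if u.tail is not None]
    return "\n".join(heads + mains + tails)
```

Placing every head file before any main file lets a library's head content take effect before the program's own main section is assembled.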
If the --format=source command-line option was passed, the assembler will stop after symbol resolution and concatenation, returning the single resolved source file. Path comments are inserted at the top of each included file, which allows the assembler to attribute errors to the original file path and line numbers of each library file.
If the --tree command-line option was passed, a symbol resolution tree will be displayed. Every source unit that was included into the program is shown, with decorations showing whether a unit had a head or tail file, and with each unit shown descending from the unit that pulled it in.
These formats are used with the --format=<fmt> argument.
cmd
The program is assembled as the CMD executable file format used by the CP/M operating system. The width of each word must not exceed 8 bits.
The address of the first program segment is used as the entry address of the assembled program.
debug
Each word of the program is printed as a human-readable binary string, with a separator placed every 4 bits.
This is the default output format.
inhx
The program is assembled as the original 8-bit Intel hex format. The width of each word must not exceed 8 bits.
inhx32
The program is assembled as the modified 16-bit Intel hex format used by Microchip. The width of each word must not exceed 16 bits.
raw
The program is assembled as raw bytes. Each word is zero-padded to the nearest byte and then assembled as a sequence of bytes in big-endian order. All words must be the same width.
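The raw encoding of a single segment can be sketched in Python as below. This is a model of the description above rather than the assembler's output routine. The function name `raw_bytes` is hypothetical. It assumes that the zero padding is added as leading (most-significant) bits, and it does not cover how gaps between segments are written.

```python
def raw_bytes(words, width):
    """Encode words of `width` bits as big-endian bytes."""
    byte_count = (width + 7) // 8  # round the width up to whole bytes
    output = bytearray()
    for word in words:
        if word < 0 or word >= (1 << width):
            raise ValueError(f"word {word} does not fit in {width} bits")
        # Assumption: padding bits are leading zeros.
        output += word.to_bytes(byte_count, "big")
    return bytes(output)
```

Under this assumption a 14-bit word occupies two bytes, with the top two bits of the first byte always zero.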
source
The source code of the program is printed out after symbols have been resolved. This is used to combine a program with all included library files so that it can be shared as a single file.