Manual v2.0.0

This is the user manual for version 2.0.0 of the Torque assembler.

torque is an architecture-independent assembler. Like other assemblers, torque assembles programs down to a sequence of bitstrings called words, with each word representing a processor instruction or a unit of data in the target architecture. Unlike other assemblers, the words that represent each instruction aren’t hardcoded inside the assembler — instead, they’re defined by each program, using the language itself. To write a program for any given architecture, all you need is a datasheet and a few minutes.

Language overview

The base language element is the word template, used to define a new word of any width. An entire program can be written using only word templates. The following program will assemble to six eight-bit words:

#0111_0100
#0110_1111
#0111_0010
#0111_0001
#0111_0101
#0110_0101

Word templates are the only language element that will assemble to words. The rest of the syntax is used to build abstractions on top of word templates, to make writing programs more ergonomic.
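As a rough illustration of what assembling to words means, the following Python sketch reads each word template of the program above as a bitstring and reports its width and value. It is a model for explanation only and not part of torque; the helper name parse_word is hypothetical.

```python
# Illustrative model only: torque itself is not driven by Python.
PROGRAM = """
#0111_0100
#0110_1111
#0111_0010
#0111_0001
#0111_0101
#0110_0101
"""

def parse_word(template):
    """Return (width, value) for a field-free word template such as '#0111_0100'."""
    bits = template.lstrip("#").replace("_", "")  # underscores are ignored
    return len(bits), int(bits, 2)

words = [parse_word(line) for line in PROGRAM.split()]
widths = {width for width, _ in words}
text = bytes(value for _, value in words).decode("ascii")
print(len(words), widths, text)
```

Read as ASCII, the six eight-bit words spell "torque".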

torque implements two types of value: blocks and integers.

Blocks

A block is a ‘block of code’, and will assemble to zero or more words. Blocks can be passed into macros, and multiple blocks can be grouped together using a block literal. A word template is a block value.

Integers

An integer is a numeric value, implemented as a signed 64-bit integer, and will assemble to nothing. Integers are used in calculations while the program is being assembled, and must be packed into words in order to be included in the assembled output.

Strings

Strings are a special type of integer. Strings can only be used as arguments to macro invocations, and will behave as if you had invoked the macro once for every character in the string, passing one character of the string as an integer argument to each invocation of the macro.

Macro overview

Macros are used to assign a name to an integer or block value. This value is called the body of the macro.

Macros are invoked by using their names later in the program. When a macro is invoked, the invocation will be replaced with a copy of the macro body. If the macro body also contains invocations, these are resolved first. The following program will assemble to five ten-bit words:

%SEND_A   #01_0110_0001 ;
%NEWLINE  #01_0000_1010 ;
%PRINT   SEND_A NEWLINE ;
%HALT     #00_0000_0000 ;

PRINT PRINT HALT
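The expansion of the program above can be modelled with a short Python sketch. This is an assumption-laden illustration of the substitution rule for macros without arguments, not torque's actual implementation; the names MACROS and expand are hypothetical.

```python
# Hypothetical model of argument-free macro expansion; names mirror the example above.
MACROS = {
    "SEND_A":  ["#01_0110_0001"],
    "NEWLINE": ["#01_0000_1010"],
    "PRINT":   ["SEND_A", "NEWLINE"],
    "HALT":    ["#00_0000_0000"],
}

def expand(tokens):
    """Replace each invocation with a copy of its macro body until only word templates remain."""
    words = []
    for token in tokens:
        if token.startswith("#"):
            words.append(token)                  # a word template assembles to one word
        else:
            words.extend(expand(MACROS[token]))  # invocations inside a body are expanded too
    return words

words = expand("PRINT PRINT HALT".split())
print(len(words), words)
```

Each PRINT contributes two words and HALT contributes one, giving the five words stated above.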

Macros with arguments

Macros can also be passed a list of argument values when invoked.

In a macro definition, a : separator character is used to specify a list of argument names. When a macro is invoked, the same : separator character is used to pass a corresponding list of argument values to the invocation.

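Two more language elements are needed in order to demonstrate macro arguments: fields, which mark a range of bits in a word template into which an integer value will be packed, and labels, which assign a name to an address.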

The following program will assemble to three twelve-bit words. The label main will resolve to the program address 0, and the program in this hypothetical architecture will print a value, add 1 to that value, and then jump back to the start.

%PRINT   #0001_0011_0011 ;
%ADD:v   #0010_vvvv_vvvv ;
%GOTO:a  #0101_aaaa_aaaa ;

@main
  PRINT
  ADD:1
  GOTO:main

Language specification

The following table summarises the types of values created by each language element.

Element              Type
Comment              --
Macro definition     --
Label definition     --
Pinned address       Block
Word template        Block
Block literal        Block
Conditional block    Block
Invocation           Block/Integer
Integer literal      Integer
Character literal    Integer
String literal       Integer
Expression           Integer

Comment

A comment is an inline note. Comments are discarded by the assembler.

A comment is a sequence of characters starting with a ( character and ending with a ) character.

( This is a comment. )

Macro definition

A macro definition assigns a name to a fragment of code. This fragment of code is called the body of the macro.

A macro definition can define zero or more arguments. An argument allows a block or integer value to be passed into a macro when it is invoked. Argument names will shadow existing values of the same name inside the macro definition.

The type of value accepted by each argument is determined when the argument is defined. A plain argument name denotes an integer argument, and a {} brace-wrapped argument name denotes a block argument.

A macro definition is a % character followed by a name, followed by zero or more argument definitions, followed by the macro body, followed by a ; character.

An argument definition is a : character followed by an argument name, with the argument name optionally wrapped with the { and } characters.

A macro body is a sequence of zero or more language elements. The macro will evaluate to an integer value if the macro body is a single integer element, else the macro will evaluate to either an integer or a block value if the macro body is a single invocation, else the macro will evaluate to a block value.

%FOR:count:{inner}
  DO:count
  &loop
    inner
    NEXT:~loop ;

Label definition

A label definition assigns a name to an address.

A label definition is a special case of a macro definition, and is equivalent to a macro definition with an integer body and no arguments. However, labels can be referenced before their definition.

Label definitions are either global or local. A global label definition assigns a name to an address. A local label definition assigns a name to an address inside a particular context.

A local label definition inside a macro body defines a label that can only be referenced inside that same macro body. This allows a macro to be invoked multiple times without generating conflicting label definitions. Global labels cannot be defined inside macro bodies.

A local label definition outside a macro body defines a label that is linked to, and can be concisely referenced within, the scope of the most recently defined global label. If the name of the most recently defined global label is ‘global’ and the name of the local label is ‘local’, the actual name of the local label will be ‘global/local’.

A global label definition is a @ character followed by a name. A local label definition is a & character followed by a name.

@main
  &loop
@draw
  &loop
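The naming rule for local labels outside a macro body can be sketched in Python as follows. This is an illustrative model only: the function name qualify is hypothetical, and raising an error for a local label with no preceding global label is an assumption, as that case is not described here.

```python
def qualify(tokens):
    """Return the full names of labels defined outside any macro body."""
    names = []
    current_global = None
    for token in tokens:
        if token.startswith("@"):            # global label definition
            current_global = token[1:]
            names.append(current_global)
        elif token.startswith("&"):          # local label definition
            if current_global is None:
                # Assumption: the manual does not say what happens in this case.
                raise ValueError("local label defined before any global label")
            names.append(f"{current_global}/{token[1:]}")
    return names

names = qualify("@main &loop @draw &loop".split())
print(names)
```

The two &loop definitions in the example above receive the distinct names main/loop and draw/loop, so they do not conflict.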

Pinned address

A pinned address defines the start address of a segment of assembled bytecode.

When a pinned address is assembled, the internal counter that tracks the address during assembly will be set to the value of the pinned address. The next word to be assembled will be placed at this address. An error will be reported if the pinned address is less than the current address.

Internally, the list of assembled words is grouped into segments. An initial segment is created at address 0, and a new segment is created every time a pinned address is encountered. This allows the empty space between segments to be elided when using the inhx and inhx32 output formats.

A pinned address is a | character followed by an integer element.

|0x10
|DATA-SEGMENT
|[3 8 >>]
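The segment behaviour described above can be modelled with the following Python sketch. It is not torque's implementation: the function name assemble and the tuple encoding of the input are hypothetical, and the sketch assumes that every word occupies exactly one address.

```python
def assemble(items):
    """Group words into segments.

    Each item is ('pin', address) for a pinned address or ('word', value) for a word.
    """
    segments = [{"start": 0, "words": []}]   # an initial segment is created at address 0
    address = 0
    for kind, value in items:
        if kind == "pin":
            if value < address:
                raise ValueError(
                    f"pinned address {value:#x} is less than current address {address:#x}"
                )
            address = value
            segments.append({"start": value, "words": []})  # each pin starts a new segment
        else:
            segments[-1]["words"].append(value)
            address += 1                     # assumption: one address per word
    return segments

segments = assemble([("word", "A"), ("word", "B"), ("pin", 0x10), ("word", "C")])
print(segments)
```

Only the start address and contents of each segment are kept, so the unused addresses between the first segment and address 0x10 never need to be stored.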

Word template

A word template defines a word as a sequence of bits overlaid with zero or more fields. A word template is a block value.

Each field is a range of bits in the word into which an integer value will be packed. A field is represented as a consecutive string of the same letter, with the name of the field being that same letter.

When a word template is assembled, each field is treated as an invocation of a macro with the same name as the field, with each resolved integer value being packed into the range of bits belonging to that field.

A word template is a # character followed by one or more underscores, letters, and the characters 0 and 1. Each letter and 0 character represents an unset bit, and each 1 character represents a set bit. Underscores are ignored.

The following example shows a fourteen-bit word template with a three-bit field named b and a seven-bit field named f.

#01_01bb_bfff_ffff
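To make the packing rule concrete, the following Python sketch packs integer values into the fields of a word template. It is an illustration, not torque's implementation: the function name pack is hypothetical, field values are passed in a dictionary rather than resolved as macro invocations, and discarding value bits that do not fit in the field is an assumption.

```python
def pack(template, fields):
    """Pack integer values into the lettered fields of a word template, returning a bitstring."""
    bits = template.lstrip("#").replace("_", "")  # underscores are ignored
    remaining = dict(fields)                      # value bits not yet consumed, per field
    out = []
    for ch in reversed(bits):                     # walk from the least significant bit
        if ch in "01":
            out.append(ch)                        # fixed bit, copied as written
        else:
            out.append(str(remaining[ch] & 1))    # lowest unconsumed bit of the field value
            remaining[ch] >>= 1                   # assumption: surplus high bits are dropped
    return "".join(reversed(out))

word = pack("#01_01bb_bfff_ffff", {"b": 0b101, "f": 0b010_1010})
print(word)
```

With b set to 0b101 and f set to 0b0101010, the template above assembles to the fourteen-bit word 01_0110_1010_1010.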

Block literal

A block literal is a group of block elements.

A block literal is a { character, followed by zero or more block elements, followed by a } character.

{ @main #0010_0110 GOTO:main }

Conditional block

A conditional block is a block element that will only be assembled if an associated integer element evaluates to a non-zero value.

A conditional block is a ? character, followed by an integer element called the predicate, followed by a block element called the body.

? 1 HALT
?[state debug =] { PRINT_STACK }

Invocation

An invocation is the application of a macro or label, resolving to either an integer value or a block value.

When an invocation is assembled, it is replaced by the value of the associated label, or the body of the associated macro with all argument names replaced with the passed argument values.

An invocation prefixed with a ~ character is a local invocation. A local invocation inside a macro body will resolve to a local label defined inside that same macro. A local invocation outside a macro body will resolve to a local label defined within the scope of the most recently defined global label.

An invocation with arguments can be passed as an invocation argument by wrapping it in {} braces if it will evaluate to a block value, or [] brackets if it will evaluate to an integer value.

An invocation is an optional ~ character, followed by a name, followed by zero or more invocation arguments. An invocation argument is a : separator character followed by either an integer element or a block element.

@main
  &loop
    FOR:8:{ PULSE:DATA-PIN }
    DELAY:1000
    GOTO:~loop

Integer literal

An integer literal is a decimal, hexadecimal, or binary number. An integer literal is an integer value.

Values can range from 0 to 4294967295.

2514
0x9d2
0b1001_1011_0010

Decimal literal

A decimal literal is one or more underscores or characters in the range 0-9. Underscore characters are ignored.

Hexadecimal literal

A hexadecimal literal is a 0x character pair followed by one or more underscores or characters in the ranges 0-9, a-f, and A-F. Underscore characters are ignored.

Binary literal

A binary literal is a 0b character pair followed by one or more underscores or 0 and 1 characters. Underscore characters are ignored.

Character literal

A character literal is a single character interpreted as an integer with value equal to the Unicode code point of that character.

A character literal is a ' character, followed by a single other character, followed by a ' character.

'a'
'λ'

String literal

A string literal is a sequence of characters. String literals can only be used as arguments to macro invocations.

When a string is passed as an argument to a macro invocation, the macro is invoked once for each character in the string, with each invocation being passed a single character of the string as a character literal.
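The per-character behaviour can be modelled in Python as shown below. This is a sketch under stated assumptions: invoke_with_string and data are hypothetical stand-ins, with data simply recording the integer it receives instead of assembling a word.

```python
def invoke_with_string(macro, text):
    """Model a string argument as one invocation per character, passing its code point."""
    return [macro(ord(ch)) for ch in text]

def data(k):
    # Stand-in for a hypothetical one-argument macro; it only records its argument.
    return f"DATA:{k:#04x}"

calls = invoke_with_string(data, "Hi!")
print(calls)
```

The three characters produce three separate invocations, receiving the Unicode code points 0x48, 0x69, and 0x21 respectively.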

A string is a " character, followed by zero or more other characters, followed by a " character.

"This is a string literal."

Expression

An expression is a sequence of terms in a stack-based sublanguage that will evaluate to an integer.

Each term in an expression is either an integer or an operator. An integer term can be any valid integer element of the base language. Operator terms are special invocations that are only valid inside an expression.

The terms of an expression are evaluated left-to-right using a stack. When an integer term is evaluated, that integer will be pushed onto the stack. When an operator term is evaluated, one or two values will be popped from the stack, the values will be operated on, and then a result value will be pushed onto the stack.

The stack must contain exactly one value after all terms have been evaluated. This value will be returned as the value of the expression.

The following table lists all available operators. b denotes the value at the top of the stack, and a denotes the next value down. Every operator has a canonical name, and some operators have an alternate name.

Canon.  Alt.    Result
<eq>    =       1 if a equals b, else 0
<neq>   !=      1 if a does not equal b, else 0
<lth>   <       1 if a is less than b, else 0
<gth>   >       1 if a is greater than b, else 0
<leq>   <=      1 if a is less than or equal to b, else 0
<geq>   >=      1 if a is greater than or equal to b, else 0
<add>   +       a plus b
<sub>   -       a minus b
<mul>   *       a multiplied by b
<div>   /       a divided by b
<exp>   **      a to the power of b
<shl>   <<      a left-shifted b bits
<shr>   >>      a right-shifted b bits
<and>   (none)  a bitwise-and b
<or>    (none)  a bitwise-or b
<xor>   (none)  a bitwise-xor b
<not>   (none)  bitwise-not of b

An expression is a [ character, followed by one or more terms, followed by a ] character.

[ 1 ]
[ string/end string - ]
[ 0x80 [ 3 0b1000 * ] <add> ]
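The evaluation rule can be sketched as a small stack machine in Python. This is a model for explanation, not torque's implementation: the names evaluate, BINARY, and ALTERNATE are hypothetical, nested expressions are represented as nested lists, Python integers do not model 64-bit overflow, and <div> and <exp> are left out because their rounding and error behaviour is not described here.

```python
import operator

def _flag(fn):
    return lambda a, b: int(fn(a, b))        # comparisons push 1 or 0

BINARY = {
    "<eq>": _flag(operator.eq),  "<neq>": _flag(operator.ne),
    "<lth>": _flag(operator.lt), "<gth>": _flag(operator.gt),
    "<leq>": _flag(operator.le), "<geq>": _flag(operator.ge),
    "<add>": operator.add, "<sub>": operator.sub, "<mul>": operator.mul,
    "<shl>": operator.lshift, "<shr>": operator.rshift,
    "<and>": operator.and_, "<or>": operator.or_, "<xor>": operator.xor,
}
ALTERNATE = {"=": "<eq>", "!=": "<neq>", "<": "<lth>", ">": "<gth>", "<=": "<leq>",
             ">=": "<geq>", "+": "<add>", "-": "<sub>", "*": "<mul>",
             "<<": "<shl>", ">>": "<shr>"}

def evaluate(terms):
    """Evaluate a list of integer terms, operator names, and nested expressions."""
    stack = []
    for term in terms:
        if isinstance(term, list):
            stack.append(evaluate(term))     # a nested expression is an integer term
        elif isinstance(term, int):
            stack.append(term)
        else:
            name = ALTERNATE.get(term, term)
            if name == "<not>":
                stack.append(~stack.pop())   # the only single-operand operator
            else:
                b = stack.pop()              # b is the value at the top of the stack
                a = stack.pop()              # a is the next value down
                stack.append(BINARY[name](a, b))
    if len(stack) != 1:
        raise ValueError("expression must leave exactly one value on the stack")
    return stack[0]

print(evaluate([0x80, [3, 0b1000, "*"], "<add>"]))
```

The third example above evaluates the inner expression to 24 and then adds 0x80, giving 152.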

Examples

Defining instructions

The following is a macro definition for an instruction called TEST, taking two integer arguments called r and b.

%TEST:r:b  #01_1bbb_rrrr ;

Invocations of this macro will assemble to a 10-bit word. The lowest four bits will be replaced with the integer value passed as the first argument r, and the next highest three bits will be replaced with the integer value passed as the second argument b.

The following invocation of the macro receives the value 5 as r and the value 6 as b, and assembles to the word 01_1110_0101.

TEST:5:6
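The resulting word can be checked by hand with shifts and bitwise-or. The following Python lines are only a worked calculation of the bit layout described above; the variable names are chosen for illustration.

```python
r, b = 5, 6
# Fixed bits 01_1 sit in bits 7-9, b occupies bits 4-6, and r occupies bits 0-3.
word = (0b011 << 7) | (b << 4) | r
bits = format(word, "010b")                  # ten binary digits, zero-padded
grouped = f"{bits[:2]}_{bits[2:6]}_{bits[6:]}"
print(grouped)  # 01_1110_0101
```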

Defining advanced macros

The following is a macro definition for a for loop, taking an integer argument called count and a block argument called block.

%FOR:count:{block}
  SET:count
  &loop
    block
    NEXT:~loop ;

In this example, the macro SET will write the value count to the loop counter, and the macro NEXT will decrement the loop counter and then jump to the address ~loop if not zero.

The ~ character prefixing the invocation ~loop denotes a local invocation, and will resolve to the &loop label defined in this macro definition.

The following demonstrates how to invoke a macro that takes a block argument.

FOR:8:{
    PULSE:GPIO:3
}

The above invocation expands to the following code.

SET:8
@loop
PULSE:GPIO:3
NEXT:loop

Using expressions

Expressions operate on integer values when the program is assembled, resolving to a single integer value. Expressions use a small stack-based postfix language.

The following demonstrates the use of a constant expression to take a label address, right-shift it by eight bits, and then truncate it to the range 0-255. In this example, the macro SET will write the computed value to the register $PAGE.

@main
  SET:$PAGE:[main 8 >> 0xff <and>]

In the C programming language, the equivalent expression would be (main >> 8) & 0xff.

Using strings

The following demonstrates how to encode a string into a sequence of DATA instructions.

%DATA:k #11_0100_kkkk_kkkk;

DATA:"String"

This is equivalent to the following code:

%DATA:k #11_0100_kkkk_kkkk;

DATA:0x53
DATA:0x74
DATA:0x72
DATA:0x69
DATA:0x6e
DATA:0x67

Using conditional blocks

The following is an example of how conditional blocks can be used to implement the variable-width UTF-8 character encoding.

%UTF8-B1:c #0ccccccc ;
%UTF8-B2:c #10cccccc ;
%UTF8-B3:c #110ccccc ;
%UTF8-B4:c #1110cccc ;
%UTF8-B5:c #11110ccc ;

%UTF8:c
  ?[c 0x7f <=]
    { UTF8-B1:c }
  ?[c 0x80 >= c 0x07ff <= <and>]
    { UTF8-B3:[c 6 >>] UTF8-B2:[c 0x3f <and>] }
  ?[c 0x0800 >= c 0xffff <= <and>]
    { UTF8-B4:[c 12 >>] UTF8-B2:[c 6 >> 0x3f <and>] UTF8-B2:[c 0x3f <and>] }
  ?[c 0x010000 >= c 0x10ffff <= <and>]
    { UTF8-B5:[c 18 >>] UTF8-B2:[c 12 >> 0x3f <and>] UTF8-B2:[c 6 >> 0x3f <and>] UTF8-B2:[c 0x3f <and>] }
;

UTF8:"tohutō"
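As a cross-check, the following Python sketch mirrors the four conditional blocks of the UTF8 macro and compares the result with Python's built-in UTF-8 encoder. The function name utf8 is hypothetical, and the sketch models only the arithmetic of the macro, not how torque assembles it.

```python
def utf8(c):
    """Mirror the four conditional blocks of the UTF8 macro for one code point."""
    out = []
    if c <= 0x7F:
        out += [c]
    if 0x80 <= c <= 0x07FF:
        out += [0b1100_0000 | (c >> 6),
                0b1000_0000 | (c & 0x3F)]
    if 0x0800 <= c <= 0xFFFF:
        out += [0b1110_0000 | (c >> 12),
                0b1000_0000 | ((c >> 6) & 0x3F),
                0b1000_0000 | (c & 0x3F)]
    if 0x010000 <= c <= 0x10FFFF:
        out += [0b1111_0000 | (c >> 18),
                0b1000_0000 | ((c >> 12) & 0x3F),
                0b1000_0000 | ((c >> 6) & 0x3F),
                0b1000_0000 | (c & 0x3F)]
    return out

text = "tohut\u014d"                         # the string from the example above
encoded = bytes(byte for ch in text for byte in utf8(ord(ch)))
print(encoded.hex(" "))
```

The first five characters each take the single-byte branch, and the final character (code point 0x14d) takes the two-byte branch, so the string encodes to seven bytes.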