Roadmap

Confirmed

Deeper recursion

Recursion is currently implemented in the straightforward way, by nesting function calls inside the assembler. This allows a few hundred levels of recursion on my machine before the native stack overflows and the assembler crashes, and only around a hundred levels on debug builds.

To fix this, we can implement the call stack as a data structure inside the assembler instead of relying on the native call stack. This should allow much deeper recursion (tens of thousands of levels), and will allow us to print an error message if recursion goes too deep instead of just crashing.

(Use an explicit stack structure that holds the stack of environments.)
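
As a rough illustration of the idea, the following Python sketch expands nested macro invocations using an explicit stack of frames rather than nested function calls. It is not how the assembler is written: the macro table format, the expand function, RecursionLimitError and the max_depth parameter are all assumptions made for the example.

```python
class RecursionLimitError(Exception):
    """Raised when macro expansion nests deeper than the configured limit."""


def expand(macros, entry, max_depth=50_000):
    """Expand `entry` into a flat list of integers without native recursion.

    `macros` maps a macro name to its body: a list whose items are either
    integers (emitted as-is) or names of other macros (invoked in place).
    """
    output = []
    # Each frame is (macro name, iterator over the rest of its body). This
    # list stands in for the stack of environments.
    stack = [(entry, iter(macros[entry]))]
    while stack:
        name, body = stack[-1]
        item = next(body, None)
        if item is None:
            stack.pop()  # body exhausted: resume the caller's frame
        elif isinstance(item, int):
            output.append(item)
        else:
            if len(stack) >= max_depth:
                raise RecursionLimitError(
                    f"recursion deeper than {max_depth} levels "
                    f"while invoking {item!r} from {name!r}"
                )
            stack.append((item, iter(macros[item])))
    return output
```

Because the depth is just the length of a list, a chain of twenty thousand nested invocations expands without trouble here, and a macro that invokes itself forever produces an error message instead of a crash.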

Error token

Add a built-in invocation that causes an error, useful for enforcing constraints in macro definitions. Consider the name ! or <ERROR>.

Proposals

Allow label references in predicates

Currently, the predicate value of a conditional block cannot contain a label reference. This is because conditional blocks are evaluated during the ‘intermediate’ stage of assembly, but label references are evaluated afterwards in the ‘bytecode’ stage, so label addresses are not available while conditional blocks are being evaluated.

There is good reason for this limitation. Conditional blocks can change the length of the program depending on whether their predicate evaluates to zero or non-zero, and this can change the address of every subsequent label definition. If label references were allowed in predicates, a predicate could evaluate to true, which would insert additional words into the program, which could shift the definition of the label it refers to, which could in turn cause the same predicate to evaluate to false. Such a program has no consistent layout and cannot be assembled.

However, there would be benefits to investing the effort to at least partially lift this restriction. The Z80 instruction set, amongst others, has both a relative jump instruction (JR) and an absolute jump instruction (JP). JR can and should be used when the jump distance can be represented as an 8-bit signed integer, and JP should be used in all other situations. If label references could be used in predicates, we could create a general optimised jump macro, as follows:

%JR:addr  ( --elided-- ) ;
%JP:addr  ( --elided-- ) ;
%IN:n:min:max  [n min <geq> n max <leq> <and>] ;

%GOTO:addr
  ?[IN:[addr ~end -]:-128:127 1 ==]  JR:addr
  ?[IN:[addr ~end -]:-128:127 0 ==]  JP:addr
  &end
;

GOTO:later
( --elided-- )
@later

My current thinking is to implement this with some sort of brute-force algorithm, in which label addresses and conditional blocks are evaluated together and re-evaluated as conditions and addresses shift around, until they hopefully settle into a stable configuration. If a stable configuration cannot be found, an error would be raised.

The foundations for this have been implemented in Torque 2.4.0, which adds functionality where the final bytecode stage of the assembler can be re-run multiple times until all label addresses have stabilised.
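
A minimal sketch of such a brute-force loop, written in Python and not taken from Torque, might look like the following. The program representation, the assemble and goto functions, UnstableLayoutError and the max_passes limit are all hypothetical. The only outside facts relied on are that a Z80 JR instruction occupies 2 bytes, a JP instruction occupies 3 bytes, and an 8-bit signed integer covers -128 to 127.

```python
class UnstableLayoutError(Exception):
    """Raised when label addresses keep changing from one pass to the next."""


def assemble(program, max_passes=16):
    """Lay out `program` repeatedly until every label address stops moving.

    Items are ("label", name), ("data", size) or ("cond", predicate, size).
    A predicate receives the label addresses found by the previous pass.
    """
    labels = {}  # nothing is known before the first pass
    for _ in range(max_passes):
        address = 0
        new_labels = {}
        for item in program:
            if item[0] == "label":
                new_labels[item[1]] = address
            elif item[0] == "data":
                address += item[1]
            else:  # "cond": emitted only if its predicate holds
                _, predicate, size = item
                if predicate(labels):
                    address += size
        if new_labels == labels:
            return labels  # the predicates agree with the layout they produced
        labels = new_labels
    raise UnstableLayoutError(f"labels still moving after {max_passes} passes")


def goto(target, end):
    """Complementary JR/JP blocks followed by an end label, like GOTO above."""
    def near(labels):
        # Labels that are not yet known read as 0 during the first pass.
        distance = labels.get(target, 0) - labels.get(end, 0)
        return -128 <= distance <= 127  # range of an 8-bit signed integer

    def far(labels):
        return not near(labels)

    return [("cond", near, 2), ("cond", far, 3), ("label", end)]
```

With 100 bytes between the jump and its target this settles on the 2-byte form, and with 200 bytes it switches to the 3-byte form on the second pass and then stabilises. A block that is only emitted while the label after it sits at address 0 flips on every pass, so the loop gives up and raises the error.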

The creator of the ‘flat assembler g’ meta-assembler calls this the oscillator problem, and has written about it.

Generalised lists

It would only be a small step to support generalised lists of integers, handled in the same way that strings currently are.

The difficulty with this feature is that all of the different brackets have already been used for other language elements, so there isn’t a convenient syntax left over for declaring a list. The best solution I can think of is to overload the expression syntax: allow an expression to leave zero or more values on the stack, with the final contents of the stack becoming the contents of the list. This would give us a very readable list syntax, at the expense of error-checking for expressions, because an expression that accidentally leaves extra values on the stack could no longer be reported as an error:

DATA:[1 2 3]

One way to keep that error-checking would be to add some kind of syntactic tag to an expression that marks it as a list, with the current behaviour of calculating a single integer remaining the default when the tag is missing.
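
The tagged behaviour could look something like the following Python sketch of a postfix evaluator. The evaluate function, its as_list flag (standing in for the syntactic tag) and the three-operator table are hypothetical and are not part of Torque.

```python
import operator

# A deliberately tiny operator table, just enough for the example.
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tokens, as_list=False):
    """Evaluate a postfix expression given as a list of ints and operator names.

    By default exactly one value must remain on the stack, and it is returned
    as an integer. With `as_list=True` the whole final stack is returned as a
    list instead, so zero or more values are acceptable.
    """
    stack = []
    for token in tokens:
        if isinstance(token, int):
            stack.append(token)
            continue
        if len(stack) < 2:
            raise ValueError(f"operator {token!r} needs two operands")
        right = stack.pop()
        left = stack.pop()
        stack.append(OPERATORS[token](left, right))
    if as_list:
        return stack
    if len(stack) != 1:
        raise ValueError(
            f"expression left {len(stack)} values on the stack, expected 1"
        )
    return stack[0]
```

Untagged expressions keep the strict check, so evaluating 1 2 3 as an integer is still an error, while the same tokens evaluated as a list simply yield the three values.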

Lookup tables

Mapping Unicode character codes to a different character set is a pain, requiring a large table of conditional blocks:

%BYTE:n  #nnnn_nnnn ;
%ISO-8859-1:c
  ?[c 0x7F <=]  BYTE:c
  ?[c '¡'  ==]  BYTE:0xA1
  ?[c '¢'  ==]  BYTE:0xA2
  ?[c '£'  ==]  BYTE:0xA3 ;

ISO-8859-1:"£190.00"

Could there be a better syntax for this, applicable to more than just this one use case?

One potential solution would be to combine string values with an indexing operator <idx> for expressions, as follows:

%ISO-8859-1/CHARS
  " ¡¢£¤¥¦§¨©ª«¬ ®¯°±²³´µ¶·¸¹º»¼½¾¿" ;

%ISO-8859-1:c
  ?[c 0xA0 <lth>] BYTE:c
  ?[c 0xA0 <geq>] BYTE:[ISO-8859-1/CHARS [c 0xA0 -] <idx>] ;

ISO-8859-1:"£190.00"
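
To show the general shape of a table-driven mapping, here is a minimal Python sketch. It assumes the table holds the characters for codes 0xA0 to 0xBF in order. It searches the table for the character and adds the position to a base code, which is only one possible reading of how a lookup could work and is not a definition of <idx>. The names CHARS, BASE, encode_char and encode are made up for the example.

```python
# Characters for codes 0xA0 to 0xBF, in order. The two escapes are the
# no-break space (0xA0) and the soft hyphen (0xAD), which are hard to see.
CHARS = "\u00a0¡¢£¤¥¦§¨©ª«¬\u00ad®¯°±²³´µ¶·¸¹º»¼½¾¿"
BASE = 0xA0  # assumed code of the first table entry


def encode_char(char):
    """Return the single-byte code for `char`, or raise if it has none."""
    code = ord(char)
    if code < 0x80:
        return code  # ASCII passes through unchanged
    position = CHARS.find(char)
    if position < 0:
        raise ValueError(f"no mapping for {char!r}")
    return BASE + position


def encode(text):
    """Encode a whole string, one byte per character."""
    return bytes(encode_char(char) for char in text)
```

The same two functions would work for any single-byte character set whose upper range can be written out as a string, since only CHARS and BASE would need to change.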

Debug token

Add a built-in invocation that prints the values of all arguments, useful for debugging macro definitions. Consider the name <DEBUG>.

This would have to resolve at the ‘bytecode’ stage of the assembler.

Multiple output formats

Think about how to allow multiple formats to be output in the same run, to save having to re-assemble a project once for each format.

Heterogeneous stacks

Allow placing strings and blocks onto a stack in order to calculate integers from them using various specific operators.