Torque: Advanced loops on the Z80

This article is a hands-on demonstration of advanced macro programming techniques for the Torque assembler, showing how to create macros that can generate optimised for-loops for the Zilog Z80 microprocessor.

This article builds on the concepts introduced in Torque: Programming the TRS-80, so it’s recommended to read that article first if you’re unfamiliar with the Torque language.

The language

Torque is a lightweight meta-assembler that gives you the tools to take an instruction set specification from a datasheet and turn it into an expressive and ergonomic programming language. The Torque language is bare-bones, providing only integers, bit sequences, labels, and macro expansion, but it’s general enough to be able to lever the right bits into the right order to write a program for any processor.

The main project page has downloads, source code, and the full user manual.

The processor

The Zilog Z80 microprocessor is the processor used in the TRS-80 home computer, amongst many others. It uses a register-based architecture with a byte-oriented variable-length instruction format. The instruction set contains 158 instructions, ranging from 1 to 4 bytes in length.

The datasheet for the Z80 processor can be found here.

The program

The program we’ll be writing will be a high-level macro that generates an optimised for-loop. The user will pass in a loop body, an iteration count, and the register to use for the loop counter, and the macro will generate the loop code, using an optimised decrement-jump instruction if a particular register is chosen.

This will be enough to demonstrate how to construct high-level abstractions with Torque. If you want to jump to the completed program listing, see the Bringing it all together section at the end of this article.

Getting started

The first step will be to introduce a few new instructions from the Z80 instruction set that we’ll use to construct our for-loops.

Jumping to a relative address

In Torque: Programming the TRS-80, we introduced the JP absolute jump instruction. A similar instruction is the JR relative jump instruction, which allows us to jump more efficiently when jumping to a nearby address. This is useful with for-loops, because the jump distance is rarely greater than a few dozen bytes.

The datasheet lists the JR e instruction on page 265. The instruction is only two bytes long, with the first byte containing the instruction op-code and the second byte containing the displacement value e. This value is an 8-bit signed integer that will be added to the program counter when jumping, and is calculated by subtracting the address following the JR instruction from the target address.

We can create a macro for this instruction, building on the macros introduced in the previous article. The following program will jump two bytes backwards from the end of the instruction, looping endlessly:

%BYTE:n  #nnnn_nnnn       ;
%JR:e    BYTE:0x18 BYTE:e ;

JR:-2
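The displacement arithmetic is easy to check by hand. The sketch below is a small Python model rather than Torque code, and the function names displacement and encode_jr are hypothetical names chosen for this illustration. It computes e for a two-byte JR at a given address and encodes it as an 8-bit two’s-complement byte; the address 0x4000 is an arbitrary example.

```python
def displacement(jr_address: int, target: int) -> int:
    """Return the signed displacement e for a two-byte JR at jr_address."""
    next_address = jr_address + 2  # address following the JR instruction
    e = target - next_address
    if not -128 <= e <= 127:
        raise ValueError(f"target out of range for JR: e = {e}")
    return e


def encode_jr(e: int) -> bytes:
    """Encode JR e: op-code 0x18, then e as a two's-complement byte."""
    return bytes([0x18, e & 0xFF])


# A JR that targets its own first byte loops forever, wherever it is placed.
e = displacement(jr_address=0x4000, target=0x4000)
print(e, encode_jr(e).hex())  # prints: -2 18fe
```

The result is the same for any value of jr_address, which is why JR:-2 loops endlessly regardless of where the program is loaded.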

Instead of manually calculating the displacement, it would be better if we could pass in a label and have Torque automatically calculate the displacement for us. Before we can do this, we need to introduce local labels.

Labels inside macro definitions

The type of label introduced in the previous article is a global label, named for the fact that it can be referenced from anywhere in the program. The drawback to global labels is that they can’t be defined inside macro definitions, because a duplicate label definition would be created each time the macro is invoked.

To avoid this problem, we need to use a local label whenever we want to declare a label inside a macro. Local labels can be declared inside a macro definition, but they will only be accessible from inside that same definition. They otherwise work the same as global labels, acting as macros that expand to the address of the next word in the program.

Local labels are defined using the syntax &name, and are referenced using the syntax ~name.

%LOOP
  &start
  JP:~start ;

LOOP
LOOP
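To see what the two LOOP invocations resolve to, here is a Python sketch of the expansion (a model, not Torque code). It assumes the program is assembled from address 0 and that JP is encoded as the op-code 0xC3 followed by a 16-bit address with the low byte first; expand_loop is a hypothetical helper name.

```python
def expand_loop(address: int) -> bytes:
    """Model one LOOP invocation placed at the given address."""
    start = address  # &start resolves to the address of the next word
    # JP nn: op-code 0xC3, then the target address, low byte first.
    return bytes([0xC3, start & 0xFF, (start >> 8) & 0xFF])


program = b""
for _ in range(2):  # two LOOP invocations, back to back
    program += expand_loop(len(program))

print(program.hex(" "))  # prints: c3 00 00 c3 03 00
```

Each invocation gets its own start address (0 and 3 here), so the two definitions never collide.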

Calculating a displacement

Now that we can reference an address inside a macro, we can create a JR macro that automatically calculates the jump displacement, given a target address:

%JR:addr
  BYTE:0x18
  BYTE:[addr ~end -]
  &end ;

@start
JR:start

This new JR macro uses an expression to calculate the displacement, subtracting the address following the JR instruction from the target address.
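The expression [addr ~end -] is written in postfix order: both operands come first, then the operator. As a rough illustration, the Python sketch below evaluates that one expression for the example above, assuming the program is assembled from address 0. The evaluator is a hypothetical model that handles only subtraction, not a description of how Torque is implemented.

```python
def eval_postfix(tokens, names):
    """Evaluate a postfix expression containing names and the '-' operator."""
    stack = []
    for token in tokens:
        if token == "-":
            right = stack.pop()
            left = stack.pop()
            stack.append(left - right)
        else:
            stack.append(names[token])
    return stack.pop()


# The op-code sits at address 0, the displacement byte at address 1,
# and the &end label lands on address 2. The target (start) is address 0.
names = {"addr": 0, "~end": 2}
e = eval_postfix(["addr", "~end", "-"], names)
print(e, hex(e & 0xFF))  # prints: -2 0xfe
```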

Tracking the loop

We currently have an infinite loop, but we want a loop that can run for a given number of iterations before breaking and continuing onwards. To accomplish this, we’ll use a register as a loop counter, setting it to an initial value and then decrementing it in each iteration of the loop.

The LD instruction is used to write a value to a register. There are 35 different versions of this instruction, with each one moving a value from one place to another. We want the version that takes an immediate 8-bit value and writes it to a register, which is listed as LD r,n on page 72 of the datasheet.

This instruction is two bytes long, with the first byte containing the op-code and the register to write the value to, and the second byte containing the value itself. The possible values of the r bit field are listed in the table further down the page, with each value mapping to one of the seven general-purpose registers:

%SET:r:n  #00rr_r110 BYTE:n ;

%A  0b111 ;
%B  0b000 ;
%C  0b001 ;
%D  0b010 ;
%E  0b011 ;
%H  0b100 ;
%L  0b101 ;

SET:A:29
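The bit packing performed by the #00rr_r110 template can be reproduced with a few shifts. This Python sketch is a model of the encoding only; encode_set is a hypothetical name, and the register codes are copied from the listing above.

```python
REGISTERS = {"A": 0b111, "B": 0b000, "C": 0b001, "D": 0b010,
             "E": 0b011, "H": 0b100, "L": 0b101}


def encode_set(register: str, value: int) -> bytes:
    """Encode LD r,n: the byte 00rrr110 followed by the immediate value."""
    r = REGISTERS[register]
    opcode = (0b00 << 6) | (r << 3) | 0b110
    return bytes([opcode, value & 0xFF])


print(encode_set("A", 29).hex(" "))  # prints: 3e 1d
```

SET:A:29 therefore assembles to the two bytes 0x3E and 0x1D.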

Decrementing the counter

We’ll use the DEC r instruction, listed under DEC m on page 170 of the datasheet, to decrement the loop counter. This instruction takes a similar form to LD r,n, using the r bit field to select a register to decrement:

%DEC:r  #00rr_r101 ;

DEC:A

We can combine this with the SET macro, writing an initial loop value to a register and then decrementing it in each iteration of the loop. We’ll use the B register to store our loop counter:

SET:B:8
@loop
  DEC:B
  JR:loop

Breaking the loop

The final step is to break out of the loop when the loop counter reaches zero. We can achieve this with a conditional jump.

Conditional logic is implemented on the Z80 by testing a bit in the flags register and then jumping if that bit has a given state. The flags register is a special-purpose register whose bits record properties of the result of the most recent arithmetic operation. The bit that we’re interested in is called the zero flag, or Z, which is set when the result of an operation is zero and cleared otherwise.

Conditional jumps are implemented as variations of the JP and JR instructions. The JR NZ, e instruction, listed on page 273 of the datasheet, performs a relative jump only if the zero flag is not set. We can combine this instruction with the previous example to build a terminating loop:

%JR-NZ:addr
  BYTE:0x20
  BYTE:[addr ~end -]
  &end ;

SET:B:8
@loop
  DEC:B
  JR-NZ:loop

In this example, the value 8 is written to the B register, then the B register is decremented, setting the Z flag only if the result was zero, and then we jump back to the top of the loop if the Z flag is not set. Our loop will run for exactly eight iterations before breaking and continuing onwards.
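The iteration count can be confirmed with a small simulation. The Python sketch below models only the B register and the Z flag, not the Z80 itself, and run_loop is a hypothetical name.

```python
def run_loop(count: int) -> int:
    """Simulate SET:B:count followed by a DEC:B / JR-NZ loop."""
    b = count & 0xFF              # SET:B:n writes an 8-bit value
    iterations = 0
    while True:
        iterations += 1           # one pass through the loop
        b = (b - 1) & 0xFF        # DEC:B wraps around within 8 bits
        zero_flag = (b == 0)      # Z is set only when the result is zero
        if zero_flag:             # JR-NZ falls through once Z is set
            break
    return iterations


print(run_loop(8), run_loop(1), run_loop(0))  # prints: 8 1 256
```

Because the counter is decremented before it is tested, an initial value of 0 wraps around to 255 and the loop runs 256 times.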

This is the foundation from which we’ll build our loop macros in the next section.

Building up macros

We now have all of the instructions necessary to start building up to a macro that can create optimised for-loops.

Our first loop macro

We’ll begin by packaging up our progress so far into a macro that can loop n times, with no loop body.

%FOR:n
  SET:B:n
  &loop
    DEC:B
    JR-NZ:~loop ;

FOR:8

This macro isn’t very useful yet. We also need to be able to pass in a block of code that will run inside the loop.

Passing a loop body

In the previous article we introduced the concept of a macro argument, allowing us to pass integer values into a macro invocation and then use those values inside the generated code.

Integer values aren’t the only type of value that we can pass into a macro. A block value is any element that can be assembled into words, such as a word template or a macro that contains a word template. A block argument is declared by surrounding the argument name with {} braces:

%NOP  #0000_0000 ;

%FOR:n:{body}
  SET:B:n
  &loop
    body
    DEC:B
    JR-NZ:~loop ;

FOR:8:{
  NOP
  NOP
}

In this example, we’ve added a block-type argument called body to the macro. This argument can be used the same as any other macro or argument, and here we’ve invoked it directly following the &loop label so that the block of code passed to the macro will run in each iteration of the loop.

When we invoke the macro, we can bundle together multiple macro invocations or words as one argument by wrapping them in {} braces. The NOP no-operation instruction, listed on page 180 of the datasheet, has no effect on the program.
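To make the expansion concrete, this Python sketch models the bytes that FOR:8:{ NOP NOP } would be expected to produce, assuming the program is assembled from address 0. The helper names jr_nz and for_b are hypothetical, and the body is passed in as already-assembled bytes.

```python
def jr_nz(target: int, address: int) -> bytes:
    """Encode JR NZ,e for an instruction placed at the given address."""
    end = address + 2                        # &end
    return bytes([0x20, (target - end) & 0xFF])


def for_b(count: int, body: bytes, origin: int = 0) -> bytes:
    """Model the FOR:n:{body} macro, with B as the hardcoded counter."""
    out = bytes([0x06, count & 0xFF])        # SET:B:n
    loop = origin + len(out)                 # &loop
    out += body                              # body
    out += bytes([0x05])                     # DEC:B
    out += jr_nz(loop, origin + len(out))    # JR-NZ:~loop
    return out


NOP = bytes([0x00])
print(for_b(8, NOP + NOP).hex(" "))  # prints: 06 08 00 00 05 20 fb
```

The final byte 0xFB is a displacement of -5, which lands on the first NOP of the loop body.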

Nesting loops

We can now pass a loop value and body into our macro, but we aren’t able to create nested loops. This is because register B is hardcoded as our loop counter, so when the inner loop is initialised, the outer loop’s counter will be overwritten.

This can be fixed by passing the register that we want to use for the loop counter as an argument:

%FOR:reg:n:{body}
  SET:reg:n
  &loop
    body
    DEC:reg
    JR-NZ:~loop ;

FOR:C:8:{
  FOR:B:8:{
    NOP
  }
}

Optimising the loop

For-loops are used often in programs, so the Z80 instruction set includes an instruction that allows them to be implemented more efficiently.

The DJNZ instruction, listed on page 278 of the datasheet, is a combination of the DEC and JR-NZ instructions. It requires one fewer byte and takes 19% less time to execute, but it only operates on the B register. We can create a specialised version of the FOR macro to take advantage of these efficiency improvements:

%DJ-NZ:addr
  BYTE:0x10
  BYTE:[addr ~end -]
  &end ;

%FOR-B:n:{body}
  SET:B:n
  &loop
    body
    DJ-NZ:~loop ;

FOR:C:8:{
  FOR-B:8:{
    NOP
  }
}

Every time we want to use the B register for the loop counter, we should use the new FOR-B macro instead.
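The size and speed figures can be checked with some quick arithmetic. The Python sketch below assumes the commonly listed T-state counts for these instructions (DEC r: 4, JR NZ: 12 when the jump is taken and 7 when it is not, DJNZ: 13 taken and 8 not taken); treat these constants as assumptions and confirm them against the datasheet. The function loop_overhead is a hypothetical name, and it counts only the loop-control instructions, not the body.

```python
DEC_R = 4                                # DEC r (assumed T-states)
JR_NZ_TAKEN, JR_NZ_NOT_TAKEN = 12, 7     # JR NZ,e (assumed T-states)
DJNZ_TAKEN, DJNZ_NOT_TAKEN = 13, 8       # DJNZ e (assumed T-states)


def loop_overhead(iterations: int, taken: int, not_taken: int, extra: int = 0) -> int:
    """T-states spent on loop control for a loop of one or more iterations."""
    # Every iteration except the last takes the jump back to the top.
    return (iterations - 1) * (taken + extra) + (not_taken + extra)


general = loop_overhead(8, JR_NZ_TAKEN, JR_NZ_NOT_TAKEN, extra=DEC_R)
optimised = loop_overhead(8, DJNZ_TAKEN, DJNZ_NOT_TAKEN)
saving_percent = round((1 - DJNZ_TAKEN / (DEC_R + JR_NZ_TAKEN)) * 100, 2)

print(general, optimised, saving_percent)  # prints: 123 99 18.75
```

Under these assumptions each taken iteration costs 13 T-states instead of 16, a saving of 18.75%, which rounds to the 19% quoted above.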

Merging the macros

It’s a nuisance to remember to use the FOR-B macro when using the B register, instead of the more general FOR macro. We can fix this by combining the two macros, using conditional blocks to determine which form to generate based on which register was passed.

A conditional block is a syntax element that consists of a ? character, followed by an integer value, followed by a block value. The block value will only be included in the program if the integer is non-zero, otherwise it will be discarded.

%FOR:reg:n:{body}
  SET:reg:n
  &loop
    body
    ?[reg B ==] {         DJ-NZ:~loop }
    ?[reg B !=] { DEC:reg JR-NZ:~loop } ;

FOR:C:8:{
  FOR:B:8:{
    NOP
  }
}

We use an expression as our integer value, comparing the passed register to register B. The == ‘equals’ operation returns 1 when both operands are equal, and 0 otherwise. The != ‘not-equals’ operation returns the opposite, so exactly one of the two possible blocks will be included in the generated loop.
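The claim that exactly one block survives can be checked for every register. This Python sketch is a hypothetical model of the two conditional blocks; the strings merely stand in for the Torque code inside each block.

```python
REGISTERS = {"A": 0b111, "B": 0b000, "C": 0b001, "D": 0b010,
             "E": 0b011, "H": 0b100, "L": 0b101}


def loop_tail(reg: str) -> list:
    """Return the blocks that survive conditional expansion for reg."""
    is_b = int(REGISTERS[reg] == REGISTERS["B"])    # [reg B ==]
    not_b = int(REGISTERS[reg] != REGISTERS["B"])   # [reg B !=]
    blocks = []
    if is_b:                                        # ?[reg B ==] { ... }
        blocks.append("DJ-NZ:~loop")
    if not_b:                                       # ?[reg B !=] { ... }
        blocks.append("DEC:reg JR-NZ:~loop")
    return blocks


for name in REGISTERS:
    print(name, loop_tail(name))
```

Only B selects the DJ-NZ form; every other register falls back to the DEC and JR-NZ pair.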

This new FOR macro is used exactly the same as the macro we built in the Nesting loops section, but it also generates optimised code for us when using the B register.

Bringing it all together

The completed Torque program is below, including the final implementation of the optimised FOR macro.

%BYTE:n   #nnnn_nnnn        ;  ( 8-bit value                   )
%NOP      #0000_0000        ;  ( Do nothing                    )
%SET:r:n  #00rr_r110 BYTE:n ;  ( Write n to register r         )
%DEC:r    #00rr_r101        ;  ( Decrement register r          )

%JR-NZ:addr
  BYTE:0x20
  BYTE:[addr ~end -] &end   ;  ( Jump to address if not zero   )
%DJ-NZ:addr
  BYTE:0x10
  BYTE:[addr ~end -] &end   ;  ( Decrement B, jump if not zero )

%A  0b111 ;
%B  0b000 ;
%C  0b001 ;
%D  0b010 ;
%E  0b011 ;
%H  0b100 ;
%L  0b101 ;

%FOR:reg:n:{body}
  SET:reg:n
  &loop
    body
    ?[reg B ==] {         DJ-NZ:~loop }
    ?[reg B !=] { DEC:reg JR-NZ:~loop } ;


FOR:C:8:{
  FOR:B:8:{
    NOP
  }
}
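As a cross-check on the listing, the Python sketch below models the bytes that the nested invocation would be expected to assemble to. It is an illustrative model, not the Torque assembler: the names rel_jump and for_loop are hypothetical, and loop bodies are passed in as already-assembled bytes, which works here because relative jumps do not depend on where the code is placed.

```python
REG = {"A": 0b111, "B": 0b000, "C": 0b001, "D": 0b010,
       "E": 0b011, "H": 0b100, "L": 0b101}


def rel_jump(opcode: int, target: int, address: int) -> bytes:
    """Encode a two-byte relative jump placed at the given address."""
    end = address + 2                               # &end
    return bytes([opcode, (target - end) & 0xFF])


def for_loop(reg: str, count: int, body: bytes) -> bytes:
    """Model the final FOR:reg:n:{body} macro."""
    r = REG[reg]
    out = bytes([(r << 3) | 0b110, count & 0xFF])   # SET:reg:n
    loop = len(out)                                 # &loop, relative to the start of this expansion
    out += body                                     # body
    if reg == "B":
        out += rel_jump(0x10, loop, len(out))       # DJ-NZ:~loop
    else:
        out += bytes([(r << 3) | 0b101])            # DEC:reg
        out += rel_jump(0x20, loop, len(out))       # JR-NZ:~loop
    return out


inner = for_loop("B", 8, bytes([0x00]))             # FOR:B:8:{ NOP }
outer = for_loop("C", 8, inner)                     # FOR:C:8:{ ... }
print(outer.hex(" "))  # prints: 0e 08 06 08 00 10 fd 0d 20 f8
```

The inner loop uses the two-byte DJNZ form (10 fd), while the outer loop on register C uses the three-byte DEC and JR NZ pair (0d 20 f8).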

This final macro can be improved further still by optimising the cases of one and zero iterations, and by allowing 256 iterations to be requested explicitly, but these would add unnecessary length to the article and are instead left as an exercise for the reader.