Assembler specification

This specification describes the syntax and semantics of the standard Bedrock assembler, which converts human-readable source code into assembled bytecode.

Definitions

Identifier

An identifier is a sequence of one or more characters in the ranges 0-9, A-Z, and a-z, as well as the characters !, *, +, ,, -, ., /, <, =, >, ?, ^, and _. The character : can be used only as the final character of an identifier.
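The identifier rule can be checked mechanically. The following is a minimal Python sketch, not part of the specification; the names `is_identifier`, `_BODY`, and `_IDENTIFIER` are hypothetical. It assumes, following the grammar in Appendix B, that a lone : also counts as an identifier.

```python
import re

# Characters allowed anywhere in an identifier, taken from the definition above.
_BODY = r"[0-9A-Za-z!*+,\-./<=>?^_]"

# One or more body characters with an optional trailing ':'.
# Assumption: a lone ':' is also accepted, as in the Appendix B grammar.
_IDENTIFIER = re.compile(rf"(?:{_BODY}+:?|:)")


def is_identifier(text: str) -> bool:
    """Return True if `text` is a valid identifier under the definition above."""
    return _IDENTIFIER.fullmatch(text) is not None
```

With this sketch, `is_identifier("JMPr*:")` is accepted, while `is_identifier("a:b")` is rejected because the : is not the final character.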

Hexadecimal digit

A hexadecimal digit is any character in the ranges 0-9, A-F, or a-f.

Tokens

Byte literal

A byte literal is a sequence of two hexadecimal digits.

It assembles to one byte, with value equal to the decoded value of the hexadecimal digits.

Double literal

A double literal is a sequence of four hexadecimal digits.

It assembles to two bytes, with value equal to the decoded value of the hexadecimal digits.
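As an illustration of how byte and double literals assemble, here is a small Python sketch. The function name `assemble_literal` is hypothetical, and the sketch assumes that the two bytes of a double literal are emitted in the order the digits are written (high byte first), which the text above does not state explicitly.

```python
HEX_DIGITS = set("0123456789ABCDEFabcdef")


def assemble_literal(token: str) -> bytes:
    """Assemble a byte literal (2 hex digits) or a double literal (4 hex digits)."""
    if len(token) not in (2, 4) or any(c not in HEX_DIGITS for c in token):
        raise ValueError(f"not a byte or double literal: {token!r}")
    # Assumption: bytes are emitted in written order, high byte first.
    return bytes.fromhex(token)
```

For example, `assemble_literal("1F")` yields one byte and `assemble_literal("12ab")` yields two.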

Comment

A comment is a character sequence beginning with a ( character, followed by zero or more other characters, and ending with a ) character.

It assembles to nothing.

Raw string

A raw string is a character sequence beginning with a ' character, followed by zero or more other characters called the string content, and ending with a ' character.

It assembles to the string content as a UTF-8 encoded byte sequence.

Null-terminated string

A null-terminated string is a character sequence beginning with a " character, followed by zero or more other characters called the string content, and ending with a " character.

It assembles to the string content as a UTF-8 encoded byte sequence, followed by a null byte.
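The difference between the two string tokens can be sketched in a few lines of Python. The function name `assemble_string` is hypothetical and this is only an illustration of the two rules above, not the standard assembler's code.

```python
def assemble_string(token: str) -> bytes:
    """Assemble a raw string ('...') or a null-terminated string ("...")."""
    if len(token) >= 2 and token[0] == token[-1] == "'":
        # Raw string: the content only, UTF-8 encoded.
        return token[1:-1].encode("utf-8")
    if len(token) >= 2 and token[0] == token[-1] == '"':
        # Null-terminated string: the content followed by one null byte.
        return token[1:-1].encode("utf-8") + b"\x00"
    raise ValueError(f"not a string token: {token!r}")
```

An empty raw string assembles to nothing, while an empty null-terminated string still assembles to a single null byte.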

Pad

A pad is a $ character followed by either two or four hexadecimal digits.

It assembles to a sequence of null bytes, with length equal to the decoded value of the hexadecimal digits.
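A pad can be sketched the same way. In this Python illustration the function name `assemble_pad` is hypothetical; the behaviour follows the two sentences above.

```python
HEX_DIGITS = set("0123456789ABCDEFabcdef")


def assemble_pad(token: str) -> bytes:
    """Assemble a pad token such as $10 or $0100 into a run of null bytes."""
    digits = token[1:]
    if (not token.startswith("$")
            or len(digits) not in (2, 4)
            or any(c not in HEX_DIGITS for c in digits)):
        raise ValueError(f"not a pad token: {token!r}")
    # bytes(n) creates n zero-valued bytes.
    return bytes(int(digits, 16))
```

Here `assemble_pad("$10")` produces 16 null bytes and `assemble_pad("$0100")` produces 256.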

Block

A block begins with a { character called the block start, followed by zero or more tokens, followed by a } character called the block end that is at the same nesting level as the block start. Blocks can be nested. Every block start and block end must be matched.

The block start assembles to the bytecode address of the block end. The block end assembles to nothing.
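One way to assemble blocks is to emit a placeholder at each block start and patch it when the matching block end is reached. The Python sketch below illustrates that approach under several assumptions that the text above does not confirm: the address is emitted as two bytes with the high byte first, the program is assembled from address 0, and the address of the block end is the address of the next byte that would be emitted after it. The function name `assemble_blocks` is hypothetical, and only byte literals, double literals, and block tokens are handled.

```python
from __future__ import annotations


def assemble_blocks(tokens: list[str]) -> bytes:
    """Assemble a token list containing hex literals, '{' and '}'."""
    out = bytearray()
    open_blocks: list[int] = []  # offsets of placeholders awaiting a block end
    for token in tokens:
        if token == "{":
            open_blocks.append(len(out))
            out += b"\x00\x00"  # placeholder, patched at the block end
        elif token == "}":
            if not open_blocks:
                raise ValueError("unmatched block end")
            start = open_blocks.pop()
            # Assumption: two-byte address, high byte first.
            out[start:start + 2] = len(out).to_bytes(2, "big")
        else:
            out += bytes.fromhex(token)
    if open_blocks:
        raise ValueError("unmatched block start")
    return bytes(out)
```

The stack makes nesting work naturally: each block end patches the most recently opened block start, and any leftover entry signals an unmatched block start.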

Label definition

A label definition is a @ character followed by an identifier. The name of the label is given by the identifier. The name must not be shared by another label, sublabel, or macro.

It assembles to nothing. The bytecode address is recorded and associated with the label name.

Sublabel definition

A sublabel definition is a & character followed by an identifier. The name of the sublabel is given by the name of the most recently defined label if any, followed by a / character, and followed by the identifier. The name must not be shared by another label, sublabel, or macro.

It assembles to nothing. The bytecode address is recorded and associated with the sublabel name.
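The naming rules for labels and sublabels can be illustrated with a small symbol table. This Python sketch is not the standard assembler's implementation; the class `SymbolTable` and its members are hypothetical. It assumes that a sublabel defined before any label gets an empty prefix, so its name starts with the / character, and it leaves out macro names, which share the same namespace.

```python
class SymbolTable:
    """Records label and sublabel addresses under their full names."""

    def __init__(self):
        self.addresses = {}        # full name -> bytecode address
        self.current_label = None  # name of the most recently defined label

    def define(self, token, address):
        """Record a definition token such as '@main' or '&loop'."""
        sigil, identifier = token[0], token[1:]
        if sigil == "@":
            name = identifier
        elif sigil == "&":
            # Assumption: with no label defined yet, the prefix is empty.
            name = f"{self.current_label or ''}/{identifier}"
        else:
            raise ValueError(f"not a label or sublabel definition: {token!r}")
        if name in self.addresses:
            raise ValueError(f"name already defined: {name}")
        self.addresses[name] = address
        if sigil == "@":
            self.current_label = name
        return name
```

After defining `@main`, the token `&loop` is recorded under the name `main/loop`.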

Macro definition

A macro definition is a % character, followed by an identifier, followed by whitespace, followed by zero or more macro body tokens called the macro body, followed by a ; character. The name of the macro is given by the identifier. The name must not be shared by another label, sublabel, or macro.

It assembles to nothing. The macro body is recorded and associated with the macro name.

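A macro body token can be any of the following, as enumerated by the grammar in Appendix B: a symbol, a pad, a byte literal, a double literal, a raw string, a null-terminated string, a block start, or a block end.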

Symbol

A symbol is an optional ~ character followed by an identifier. If a symbol begins with a ~ character, the name of the symbol is given by the name of the most recently defined label if any, followed by a / character, followed by the identifier. Otherwise the name of the symbol is given by the identifier. The name must be shared by a label, a sublabel, or a previously defined macro.

If the name is shared by a label or sublabel, it assembles to two bytes with value equal to the address of that label or sublabel.

If the name is shared by a previously defined macro, it assembles as if the symbol were replaced by the associated macro body.
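Symbol resolution can be sketched as a two-step lookup. In this Python illustration the functions `symbol_name` and `expand_symbol` are hypothetical, as are the `labels` and `macros` dictionaries passed to them. The sketch assumes addresses are emitted high byte first and that `macros` contains only the macros defined before the symbol is encountered.

```python
def symbol_name(token, current_label):
    """Return the full name a symbol token refers to."""
    if token.startswith("~"):
        # Assumption: with no label defined yet, the prefix is empty.
        return f"{current_label or ''}/{token[1:]}"
    return token


def expand_symbol(token, current_label, labels, macros):
    """Return address bytes for a label or sublabel, or the body tokens of a macro."""
    name = symbol_name(token, current_label)
    if name in labels:
        # Assumption: two-byte address, high byte first.
        return labels[name].to_bytes(2, "big")
    if name in macros:
        # The caller assembles these tokens in place of the symbol.
        return list(macros[name])
    raise KeyError(f"undefined symbol: {name}")
```

For instance, with `labels = {"main/loop": 0x0104}` and the most recent label `main`, the symbol `~loop` resolves to the two address bytes of `main/loop`.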

Appendix A: Instructions

Instructions are implemented as macro definitions where the macro body is a lone byte literal. Each instruction mnemonic in the following table should be given a value equal to the sum of the column value and the row value. For example, ADD*: is in row 0x10 and column 0x60, so its value is 0x70.

      0x00    0x20    0x40    0x60    0x80    0xA0    0xC0    0xE0
0x00  HLT     NOP     DB1     DB2     DB3     DB4     DB5     DB6
0x01  JMP     JMS     JMP:    JMS:    JMPr    JMSr    JMPr:   JMSr:
0x02  JCN     JCS     JCN:    JCS:    JCNr    JCSr    JCNr:   JCSr:
0x03  JCK     JCK*    JCK:    JCK*:   JCKr    JCKr*   JCKr:   JCKr*:
0x04  LDA     LDA*    LDA:    LDA*:   LDAr    LDAr*   LDAr:   LDAr*:
0x05  STA     STA*    STA:    STA*:   STAr    STAr*   STAr:   STAr*:
0x06  LDD     LDD*    LDD:    LDD*:   LDDr    LDDr*   LDDr:   LDDr*:
0x07  STD     STD*    STD:    STD*:   STDr    STDr*   STDr:   STDr*:
0x08  PSH     PSH*    PSH:    PSH*:   PSHr    PSHr*   PSHr:   PSHr*:
0x09  POP     POP*    POP:    POP*:   POPr    POPr*   POPr:   POPr*:
0x0A  CPY     CPY*    CPY:    CPY*:   CPYr    CPYr*   CPYr:   CPYr*:
0x0B  SPL     SPL*    SPL:    SPL*:   SPLr    SPLr*   SPLr:   SPLr*:
0x0C  DUP     DUP*    DUP:    DUP*:   DUPr    DUPr*   DUPr:   DUPr*:
0x0D  OVR     OVR*    OVR:    OVR*:   OVRr    OVRr*   OVRr:   OVRr*:
0x0E  SWP     SWP*    SWP:    SWP*:   SWPr    SWPr*   SWPr:   SWPr*:
0x0F  ROT     ROT*    ROT:    ROT*:   ROTr    ROTr*   ROTr:   ROTr*:
0x10  ADD     ADD*    ADD:    ADD*:   ADDr    ADDr*   ADDr:   ADDr*:
0x11  SUB     SUB*    SUB:    SUB*:   SUBr    SUBr*   SUBr:   SUBr*:
0x12  INC     INC*    INC:    INC*:   INCr    INCr*   INCr:   INCr*:
0x13  DEC     DEC*    DEC:    DEC*:   DECr    DECr*   DECr:   DECr*:
0x14  LTH     LTH*    LTH:    LTH*:   LTHr    LTHr*   LTHr:   LTHr*:
0x15  GTH     GTH*    GTH:    GTH*:   GTHr    GTHr*   GTHr:   GTHr*:
0x16  EQU     EQU*    EQU:    EQU*:   EQUr    EQUr*   EQUr:   EQUr*:
0x17  NQK     NQK*    NQK:    NQK*:   NQKr    NQKr*   NQKr:   NQKr*:
0x18  IOR     IOR*    IOR:    IOR*:   IORr    IORr*   IORr:   IORr*:
0x19  XOR     XOR*    XOR:    XOR*:   XORr    XORr*   XORr:   XORr*:
0x1A  AND     AND*    AND:    AND*:   ANDr    ANDr*   ANDr:   ANDr*:
0x1B  NOT     NOT*    NOT:    NOT*:   NOTr    NOTr*   NOTr:   NOTr*:
0x1C  SHF     SHF*    SHF:    SHF*:   SHFr    SHFr*   SHFr:   SHFr*:
0x1D  SHC     SHC*    SHC:    SHC*:   SHCr    SHCr*   SHCr:   SHCr*:
0x1E  TAL     TAL*    TAL:    TAL*:   TALr    TALr*   TALr:   TALr*:
0x1F  REV     REV*    REV:    REV*:   REVr    REVr*   REVr:   REVr*:
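The macro definitions for one table row can be generated from the row value and the eight column values. The Python sketch below is only an illustration; the names `COLUMNS` and `instruction_macros` are hypothetical, and the output uses the macro definition syntax described earlier in this specification.

```python
# Column values from the header of the instruction table.
COLUMNS = [0x00, 0x20, 0x40, 0x60, 0x80, 0xA0, 0xC0, 0xE0]


def instruction_macros(row_value, mnemonics):
    """Return one macro definition per mnemonic in a table row."""
    if len(mnemonics) != len(COLUMNS):
        raise ValueError("a row must contain exactly eight mnemonics")
    # Each value is the sum of the row value and the column value.
    return [
        f"%{mnemonic} {row_value + column:02X};"
        for mnemonic, column in zip(mnemonics, COLUMNS)
    ]
```

Calling `instruction_macros(0x10, "ADD ADD* ADD: ADD*: ADDr ADDr* ADDr: ADDr*:".split())` gives eight definitions, starting with `%ADD 10;` and ending with `%ADDr*: F0;`.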

Appendix B: Grammar

The following is the language grammar, expressed as a set of regular expressions. If an expression conflicts with the written specification, the specification takes priority.

<whitespace>       := \s|\n|\[|\]

<comment>          := \(.*\)
<raw-string>       := '[^']*'
<null-string>      := "[^"]*"
<block-open>       := \{
<block-close>      := \}

<identifier>       := [!*-9<-?A-Za-z/^_]+:?|:
<label>            := @<identifier>
<sublabel>         := &<identifier>

<hex-digit>        := [0-9A-Fa-f]
<byte-literal>     := <hex-digit>{2}
<double-literal>   := <hex-digit>{4}
<pad>              := \$(<byte-literal>|<double-literal>)

<symbol>           := ~?<identifier>
<macro-body-token> := <symbol>|<pad>|<byte-literal>|<double-literal>|
                      <raw-string>|<null-string>|<block-open>|<block-close>
<macro-definition> := %<identifier><whitespace>+(<macro-body-token><whitespace>*)*;

<token>            := <macro-body-token>|<comment>|<label>|<sublabel>|<macro-definition>
<program>          := <whitespace>*(<token><whitespace>*)*
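The grammar can be turned into a small tokenizer. The following Python sketch is one possible reading, not the standard assembler's tokenizer; the names `TOKEN`, `HEX`, `classify`, and `tokenize` are hypothetical. It makes several assumptions: a word of exactly two or four hexadecimal digits is treated as a literal rather than a symbol, comments end at the first ) character and are discarded, and a macro definition is reported as a macro name token followed by its body tokens and a closing ; token.

```python
import re

TOKEN = re.compile(r"""
    (?P<ws>[\s\[\]]+)
  | (?P<comment>\([^)]*\))
  | (?P<raw>'[^']*')
  | (?P<null>"[^"]*")
  | (?P<open>\{)
  | (?P<close>\})
  | (?P<end>;)
  | (?P<word>[@&%$~]?[!*-9<-?A-Za-z/^_]+:?|[@&%$~]?:)
""", re.VERBOSE)

HEX = re.compile(r"[0-9A-Fa-f]+")
SIGILS = {"@": "label", "&": "sublabel", "%": "macro", "~": "symbol"}


def classify(word):
    """Decide which kind of token a whitespace-delimited word is."""
    if word[0] == "$":
        if len(word) in (3, 5) and HEX.fullmatch(word[1:]):
            return "pad"
        raise ValueError(f"malformed pad: {word!r}")
    if word[0] in SIGILS:
        return SIGILS[word[0]]
    # Assumption: hex literals take precedence over symbols.
    if len(word) in (2, 4) and HEX.fullmatch(word):
        return "byte" if len(word) == 2 else "double"
    return "symbol"


def tokenize(source):
    """Split source text into (kind, text) pairs, skipping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind in ("ws", "comment"):
            continue
        if kind == "word":
            kind = classify(match.group())
        tokens.append((kind, match.group()))
    return tokens
```

Because the grammar lists [ and ] as whitespace, a source line such as `@main [ 10 ADD ]` tokenizes to a label, a byte literal, and a symbol.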