This specification describes the syntax and semantics of the standard Bedrock assembler, which converts human-readable source code into assembled bytecode.
Definitions
Identifier
An identifier is a sequence of one or more characters in the ranges `0-9`, `A-Z`, and `a-z`, as well as the characters `!`, `*`, `+`, `,`, `-`, `.`, `/`, `<`, `=`, `>`, `?`, `^`, and `_`. The character `:` can be used only as the final character of an identifier.
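The identifier rule can be sketched as a small Python check. The function name `is_identifier` is hypothetical, and accepting a lone `:` is an assumption taken from the grammar in Appendix B rather than from the prose definition.

```python
import string

# Characters allowed anywhere in an identifier, per the definition above.
IDENTIFIER_CHARS = set(string.digits + string.ascii_letters + "!*+,-./<=>?^_")


def is_identifier(text: str) -> bool:
    """Return True if `text` satisfies the identifier definition."""
    if not text:
        return False
    # A colon is permitted only as the final character, so strip one
    # trailing colon and require every remaining character to be allowed.
    # Assumption: a lone ":" is accepted, following the Appendix B grammar.
    body = text[:-1] if text.endswith(":") else text
    return all(ch in IDENTIFIER_CHARS for ch in body)
```

For example, `is_identifier("JMP:")` holds, while `is_identifier("a:b")` does not because the colon is not in final position.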
Hexadecimal digit
A hexadecimal digit is any character in the ranges `0-9`, `A-F`, or `a-f`.
Tokens
Byte literal
A byte literal is a sequence of two hexadecimal digits.
It assembles to one byte, with value equal to the decoded value of the hexadecimal digits.
Double literal
A double literal is a sequence of four hexadecimal digits.
It assembles to two bytes, with value equal to the decoded value of the hexadecimal digits.
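As a rough illustration, both literal kinds can be assembled by one helper. The name `assemble_literal` is hypothetical, and emitting the high byte of a double literal first is an assumption, since the text above does not state a byte order.

```python
HEX_DIGITS = set("0123456789ABCDEFabcdef")


def assemble_literal(token: str) -> bytes:
    """Assemble a byte literal (2 hex digits) or double literal (4 hex digits)."""
    if len(token) not in (2, 4) or any(ch not in HEX_DIGITS for ch in token):
        raise ValueError(f"not a byte or double literal: {token!r}")
    # bytes.fromhex decodes each pair of digits into one byte, in order.
    # Assumption: a double literal is emitted high byte first.
    return bytes.fromhex(token)
```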
Comment
A comment is a character sequence beginning with a `(` character, followed by zero or more other characters, and ending with a `)` character.
It assembles to nothing.
Raw string
A raw string is a character sequence beginning with a `'` character, followed by zero or more other characters called the string content, and ending with a `'` character.
It assembles to the string content as a UTF-8 encoded byte sequence.
Null-terminated string
A null-terminated string is a character sequence beginning with a `"` character, followed by zero or more other characters called the string content, and ending with a `"` character.
It assembles to the string content as a UTF-8 encoded byte sequence, followed by a null byte.
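The difference between the two string tokens is only the trailing null byte, which a minimal sketch makes concrete. The helper name `assemble_string` is hypothetical and the token is assumed to arrive with its delimiters still attached.

```python
def assemble_string(token: str) -> bytes:
    """Assemble a raw ('...') or null-terminated ("...") string token."""
    if len(token) >= 2 and token[0] == token[-1] == "'":
        # Raw string: the content as UTF-8, nothing appended.
        return token[1:-1].encode("utf-8")
    if len(token) >= 2 and token[0] == token[-1] == '"':
        # Null-terminated string: the content as UTF-8 plus one null byte.
        return token[1:-1].encode("utf-8") + b"\x00"
    raise ValueError(f"not a string token: {token!r}")
```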
Pad
A pad is a `$` character followed by either two or four hexadecimal digits.
It assembles to a sequence of null bytes, with length equal to the decoded value of the hexadecimal digits.
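A pad can be sketched in the same style. The helper name `assemble_pad` is hypothetical; the behaviour follows the definition above directly.

```python
PAD_HEX_DIGITS = set("0123456789ABCDEFabcdef")


def assemble_pad(token: str) -> bytes:
    """Assemble a pad token such as "$04" or "$0100" into null bytes."""
    digits = token[1:]
    if (
        not token.startswith("$")
        or len(digits) not in (2, 4)
        or any(ch not in PAD_HEX_DIGITS for ch in digits)
    ):
        raise ValueError(f"not a pad token: {token!r}")
    # bytes(n) produces n null bytes.
    return bytes(int(digits, 16))
```

Under this sketch `$04` yields four null bytes and `$0100` yields 256.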
Block
A block begins with a `{` character called the block start, followed by zero or more tokens, followed by a `}` character called the block end that is at the same nesting level as the block start. Blocks can be nested. Every block start and block end must be matched.
The block start assembles to the bytecode address of the block end. The block end assembles to nothing.
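One way to resolve block addresses is to emit a placeholder at each block start and patch it when the matching block end is reached. The sketch below is illustrative only: the function name `assemble_blocks` is hypothetical, the input is simplified to `{`, `}`, and hex literal tokens, and it assumes that an address occupies two bytes (high byte first) and that the program is assembled starting at address zero. None of those three assumptions is stated in the text above.

```python
def assemble_blocks(tokens: list[str]) -> bytes:
    """Assemble a token list containing only '{', '}', and hex literals."""
    out = bytearray()
    open_starts: list[int] = []  # offsets of block starts awaiting their end
    for token in tokens:
        if token == "{":
            open_starts.append(len(out))
            out += b"\x00\x00"  # placeholder, patched at the matching '}'
        elif token == "}":
            if not open_starts:
                raise ValueError("unmatched block end")
            start = open_starts.pop()
            # The block end assembles to nothing, so its address is the
            # current output length (assuming assembly begins at address 0).
            out[start:start + 2] = len(out).to_bytes(2, "big")
        else:
            out += bytes.fromhex(token)
    if open_starts:
        raise ValueError("unmatched block start")
    return bytes(out)
```

The stack keeps nesting correct: the most recently opened block is always the first to be closed.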
Label definition
A label definition is a `@` character followed by an identifier. The name of the label is given by the identifier. The name must not be shared by another label, sublabel, or macro.
It assembles to nothing. The bytecode address is recorded and associated with the label name.
Sublabel definition
A sublabel definition is a `&` character followed by an identifier. The name of the sublabel is given by the name of the most recently defined label if any, followed by a `/` character, and followed by the identifier. The name must not be shared by another label, sublabel, or macro.
It assembles to nothing. The bytecode address is recorded and associated with the sublabel name.
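The naming rules for labels and sublabels can be sketched with a small table. The class `SymbolTable` and its methods are hypothetical, macro names are left out for brevity even though they share the same namespace, and a sublabel defined before any label is assumed to be named `/` followed by the identifier.

```python
class SymbolTable:
    """Records label and sublabel addresses under their full names."""

    def __init__(self) -> None:
        self.addresses: dict[str, int] = {}
        self.current_label = ""  # name of the most recently defined label

    def _record(self, name: str, address: int) -> None:
        if name in self.addresses:
            raise ValueError(f"name already defined: {name}")
        self.addresses[name] = address

    def define_label(self, identifier: str, address: int) -> str:
        self._record(identifier, address)
        self.current_label = identifier
        return identifier

    def define_sublabel(self, identifier: str, address: int) -> str:
        # Full name is "<most recent label>/<identifier>".
        # Assumption: with no label defined yet, the prefix is empty.
        name = f"{self.current_label}/{identifier}"
        self._record(name, address)
        return name
```

With this sketch, `@main` followed by `&loop` records the names `main` and `main/loop`.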
Macro definition
A macro definition is a `%` character, followed by an identifier, followed by whitespace, followed by zero or more macro body tokens called the macro body, followed by a `;` character. The name of the macro is given by the identifier. The name must not be shared by another label, sublabel, or macro.
It assembles to nothing. The macro body is recorded and associated with the macro name.
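A macro body token can be any of the following, as listed by the grammar in Appendix B: a symbol, a pad, a byte literal, a double literal, a raw string, a null-terminated string, a block start, or a block end.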
Symbol
A symbol is an optional `~` character followed by an identifier. If a symbol begins with a `~` character, the name of the symbol is given by the name of the most recently defined label if any, followed by a `/` character, followed by the identifier. Otherwise the name of the symbol is given by the identifier. The name must be shared by a label, a sublabel, or a previously defined macro.
If the name is shared by a label or sublabel, it assembles to two bytes with value equal to the address of that label or sublabel.
If the name is shared by a previously defined macro, it assembles as if the symbol were replaced by the associated macro body.
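Symbol resolution can be sketched as a lookup against the recorded labels and macros. The function `resolve_symbol` and its parameters are hypothetical. Two simplifications are assumed: addresses are emitted high byte first, and each macro body is stored already assembled as bytes, which is only adequate for simple bodies such as the lone byte literals of Appendix A.

```python
def resolve_symbol(
    token: str,
    current_label: str,
    labels: dict[str, int],
    macros: dict[str, bytes],
) -> bytes:
    """Assemble a symbol token against known labels and macros."""
    if token.startswith("~"):
        # A leading "~" scopes the name under the most recent label.
        name = f"{current_label}/{token[1:]}"
    else:
        name = token
    if name in labels:
        # Labels and sublabels assemble to their two-byte address.
        # Assumption: high byte first.
        return labels[name].to_bytes(2, "big")
    if name in macros:
        # Simplification: the macro body is stored pre-assembled.
        return macros[name]
    raise KeyError(f"undefined symbol: {name}")
```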
Appendix A: Instructions
Instructions are implemented as macro definitions where the macro body is a lone byte literal. Each instruction mnemonic in the following table should be given a value equal to the sum of the column value and the row value.
| | 0x00 | 0x20 | 0x40 | 0x60 | 0x80 | 0xA0 | 0xC0 | 0xE0 |
|---|---|---|---|---|---|---|---|---|
| 0x00 | HLT | NOP | DB1 | DB2 | DB3 | DB4 | DB5 | DB6 |
| 0x01 | JMP | JMS | JMP: | JMS: | JMPr | JMSr | JMPr: | JMSr: |
| 0x02 | JCN | JCS | JCN: | JCS: | JCNr | JCSr | JCNr: | JCSr: |
| 0x03 | JCK | JCK* | JCK: | JCK*: | JCKr | JCKr* | JCKr: | JCKr*: |
| 0x04 | LDA | LDA* | LDA: | LDA*: | LDAr | LDAr* | LDAr: | LDAr*: |
| 0x05 | STA | STA* | STA: | STA*: | STAr | STAr* | STAr: | STAr*: |
| 0x06 | LDD | LDD* | LDD: | LDD*: | LDDr | LDDr* | LDDr: | LDDr*: |
| 0x07 | STD | STD* | STD: | STD*: | STDr | STDr* | STDr: | STDr*: |
| 0x08 | PSH | PSH* | PSH: | PSH*: | PSHr | PSHr* | PSHr: | PSHr*: |
| 0x09 | POP | POP* | POP: | POP*: | POPr | POPr* | POPr: | POPr*: |
| 0x0A | CPY | CPY* | CPY: | CPY*: | CPYr | CPYr* | CPYr: | CPYr*: |
| 0x0B | SPL | SPL* | SPL: | SPL*: | SPLr | SPLr* | SPLr: | SPLr*: |
| 0x0C | DUP | DUP* | DUP: | DUP*: | DUPr | DUPr* | DUPr: | DUPr*: |
| 0x0D | OVR | OVR* | OVR: | OVR*: | OVRr | OVRr* | OVRr: | OVRr*: |
| 0x0E | SWP | SWP* | SWP: | SWP*: | SWPr | SWPr* | SWPr: | SWPr*: |
| 0x0F | ROT | ROT* | ROT: | ROT*: | ROTr | ROTr* | ROTr: | ROTr*: |
| 0x10 | ADD | ADD* | ADD: | ADD*: | ADDr | ADDr* | ADDr: | ADDr*: |
| 0x11 | SUB | SUB* | SUB: | SUB*: | SUBr | SUBr* | SUBr: | SUBr*: |
| 0x12 | INC | INC* | INC: | INC*: | INCr | INCr* | INCr: | INCr*: |
| 0x13 | DEC | DEC* | DEC: | DEC*: | DECr | DECr* | DECr: | DECr*: |
| 0x14 | LTH | LTH* | LTH: | LTH*: | LTHr | LTHr* | LTHr: | LTHr*: |
| 0x15 | GTH | GTH* | GTH: | GTH*: | GTHr | GTHr* | GTHr: | GTHr*: |
| 0x16 | EQU | EQU* | EQU: | EQU*: | EQUr | EQUr* | EQUr: | EQUr*: |
| 0x17 | NQK | NQK* | NQK: | NQK*: | NQKr | NQKr* | NQKr: | NQKr*: |
| 0x18 | IOR | IOR* | IOR: | IOR*: | IORr | IORr* | IORr: | IORr*: |
| 0x19 | XOR | XOR* | XOR: | XOR*: | XORr | XORr* | XORr: | XORr*: |
| 0x1A | AND | AND* | AND: | AND*: | ANDr | ANDr* | ANDr: | ANDr*: |
| 0x1B | NOT | NOT* | NOT: | NOT*: | NOTr | NOTr* | NOTr: | NOTr*: |
| 0x1C | SHF | SHF* | SHF: | SHF*: | SHFr | SHFr* | SHFr: | SHFr*: |
| 0x1D | SHC | SHC* | SHC: | SHC*: | SHCr | SHCr* | SHCr: | SHCr*: |
| 0x1E | TAL | TAL* | TAL: | TAL*: | TALr | TALr* | TALr: | TALr*: |
| 0x1F | REV | REV* | REV: | REV*: | REVr | REVr* | REVr: | REVr*: |
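Because each value is the sum of a column value and a row value, the table can be generated rather than typed out. The following sketch is one possible way to do that; the names `build_opcodes` and `macro_definitions` are hypothetical, and the suffix pattern for rows 0x03 to 0x1F is read off the table rather than taken from a stated rule.

```python
ROW_ZERO = ["HLT", "NOP", "DB1", "DB2", "DB3", "DB4", "DB5", "DB6"]
JUMP_ROWS = {0x01: ("JMP", "JMS"), 0x02: ("JCN", "JCS")}
# Base mnemonics for rows 0x03 through 0x1F, in table order.
REGULAR = [
    "JCK", "LDA", "STA", "LDD", "STD", "PSH", "POP", "CPY", "SPL", "DUP",
    "OVR", "SWP", "ROT", "ADD", "SUB", "INC", "DEC", "LTH", "GTH", "EQU",
    "NQK", "IOR", "XOR", "AND", "NOT", "SHF", "SHC", "TAL", "REV",
]


def build_opcodes() -> dict[str, int]:
    """Map every mnemonic to its column value plus its row value."""
    opcodes: dict[str, int] = {}
    for col, name in enumerate(ROW_ZERO):
        opcodes[name] = col * 0x20
    for row, (plain, alternate) in JUMP_ROWS.items():
        for col in range(8):
            # In the jump rows the 0x20 bit selects the alternate mnemonic.
            base = alternate if col & 1 else plain
            suffix = ("r" if col & 4 else "") + (":" if col & 2 else "")
            opcodes[base + suffix] = col * 0x20 + row
    for offset, base in enumerate(REGULAR):
        row = 0x03 + offset
        for col in range(8):
            suffix = (
                ("r" if col & 4 else "")
                + ("*" if col & 1 else "")
                + (":" if col & 2 else "")
            )
            opcodes[base + suffix] = col * 0x20 + row
    return opcodes


def macro_definitions() -> list[str]:
    """Render each instruction as a macro with a lone byte literal body."""
    return [f"%{name} {value:02X} ;" for name, value in build_opcodes().items()]
```

For instance, `ADDr*:` sits in column 0xE0 and row 0x10, so the sketch assigns it the value 0xF0.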
Appendix B: Grammar
The language grammar is expressed below as a set of regular expressions. If an expression conflicts with the written specification, the specification takes priority.
    <whitespace> := \s|\n|\[|\]
    <comment> := \(.*\)
    <raw-string> := '[^']*'
    <null-string> := "[^"]*"
    <block-open> := \{
    <block-close> := \}
    <identifier> := [!*-9<-?A-Za-z/^_]+:?|:
    <label> := @<identifier>
    <sublabel> := &<identifier>
    <hex-digit> := [0-9A-Fa-f]
    <byte-literal> := <hex-digit>{2}
    <double-literal> := <hex-digit>{4}
    <pad> := $(<byte-literal>|<double-literal>)
    <symbol> := ~?<identifier>
    <macro-body-token> := <symbol>|<pad>|<byte-literal>|<double-literal>|
                          <raw-string>|<null-string>|<block-open>|<block-close>
    <macro-definition> := %<identifier><whitespace>*(<macro-body-token><whitespace>*)*;
    <token> := <macro-body-token>|<label>|<sublabel>
    <program> := <whitespace>*(<token><whitespace>*)*