This specification describes the syntax and semantics of the standard assembler language for the Bedrock computer system.
Overview
The Bedrock assembler is the default assembler for the Bedrock computer system. The language is fast to learn and straightforward to implement, while still providing the tools needed to build complex programs and high-level abstractions.
The assembler converts a single program source file into an assembled program, which is a sequence of bytes that can be run on a Bedrock system.
Definitions
Byte
A byte is an 8-bit value, and is the smallest unit of data that can be generated by a language element.
Double
A double (or ‘double-width value’) is a 16-bit value, represented by a pair of bytes in big-endian order.
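As a small illustration of big-endian order, the following Python sketch splits a 16-bit value into its two bytes, high byte first. The function name encode_double is an assumption made for this example and is not part of the specification.

```python
def encode_double(value: int) -> bytes:
    """Encode a 16-bit value as two bytes in big-endian order."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("a double must fit in 16 bits")
    # The high byte comes first, followed by the low byte.
    return bytes([value >> 8, value & 0xFF])
```

For example, encode_double(0x1234) yields the byte 0x12 followed by the byte 0x34.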
Character
A character is a single Unicode scalar value.
Address
The address of a language element is equal to the number of bytes assembled up until that element.
Components
Hexadecimal digit
A hexadecimal digit is a character in the range 0-9, A-F, or a-f.
Identifier
An identifier is a sequence of zero or more characters, and is used to identify a label or a macro. There is no restriction on the characters that can be used in an identifier. The maximum length of an identifier is 63 characters.
An identifier longer than 63 characters is an error.
Token
A token is an instance of a language element. Each token is separated from the next by a whitespace character (one of LF, CR, TAB, or SPACE), by the next token beginning with a delimiter character (one of ( ) [ ] { } or ;), or by the previous token ending with a delimiter character or the terminator character :.
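The separation rules above can be sketched as a small Python tokenizer. This is a minimal sketch rather than a reference implementation: the function name tokenize is hypothetical, strings and comments are kept as single tokens, and it is an assumption of this sketch that a token ends immediately after the closing quote of a string.

```python
WHITESPACE = "\n\r\t "
DELIMITERS = "()[]{};"
TERMINATOR = ":"

def tokenize(source: str) -> list[str]:
    """Split program source text into a list of token strings."""
    tokens = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch in WHITESPACE:
            i += 1
        elif ch == "(":
            # A comment runs up to the first ) character.
            end = source.find(")", i + 1)
            if end < 0:
                raise ValueError("unmatched comment start")
            tokens.append(source[i : end + 1])
            i = end + 1
        elif ch == ")":
            raise ValueError("unmatched comment end")
        elif ch in DELIMITERS:
            # Every other delimiter is a token by itself.
            tokens.append(ch)
            i += 1
        elif ch in "'\"":
            # A string runs up to the next unescaped matching quote.
            j = i + 1
            while j < n and source[j] != ch:
                escaped = source[j] == "\\" and source[j + 1 : j + 2] == ch
                j += 2 if escaped else 1
            if j >= n:
                raise ValueError("unclosed string")
            tokens.append(source[i : j + 1])
            i = j + 1
        else:
            # Any other token ends before whitespace or a delimiter,
            # or directly after a terminator character.
            j = i
            while j < n and source[j] not in WHITESPACE + DELIMITERS:
                j += 1
                if source[j - 1] == TERMINATOR:
                    break
            tokens.append(source[i:j])
            i = j
    return tokens
```

Under these assumptions, the source text PSH:41 is split into the two tokens PSH: and 41, because the first token ends with the terminator character.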
Language elements
Literal
A byte literal is a pair of hexadecimal digits. It assembles to a byte.
A double literal is a group of four hexadecimal digits. It assembles to a double.
Pad
A pad is a # character followed by a literal. It assembles to a sequence of zero bytes, with length equal to the literal value.
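As an illustration, a pad token could be assembled along the following lines in Python. The name assemble_pad is an assumption of this sketch.

```python
HEX_DIGITS = set("0123456789ABCDEFabcdef")

def assemble_pad(token: str) -> bytes:
    """Assemble a pad token such as #10 into a run of zero bytes."""
    digits = token[1:]
    is_literal = len(digits) in (2, 4) and all(c in HEX_DIGITS for c in digits)
    if not (token.startswith("#") and is_literal):
        raise ValueError("not a pad token")
    # bytes(n) creates n zero bytes.
    return bytes(int(digits, 16))
```

With this sketch, #10 produces sixteen zero bytes, because the literal 10 is read as a hexadecimal value.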
Mark
A mark is a single [ or ] character. It assembles to nothing.
String
A raw string is a ' character, followed by zero or more other characters called the string content, followed by a ' character. It assembles to the string content as a UTF-8 encoded byte sequence. The string content can contain ' characters by prefixing each one with a backslash character.
A terminated string is a " character, followed by zero or more other characters called the string content, followed by a " character. It assembles to the string content as a UTF-8 encoded byte sequence, followed by a zero byte. The string content can contain " characters by prefixing each one with a backslash character.
An unclosed string is an error.
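The two string forms can be sketched in Python as shown here. The name assemble_string is hypothetical. The specification only describes a backslash that precedes the quote character, so this sketch assumes that any other backslash is kept as ordinary string content.

```python
def assemble_string(token: str) -> bytes:
    """Assemble a raw ('...') or terminated ("...") string token."""
    quote = token[:1]
    if quote not in ("'", '"'):
        raise ValueError("not a string token")
    content = []
    i = 1
    while i < len(token):
        ch = token[i]
        if ch == "\\" and token[i + 1 : i + 2] == quote:
            # An escaped quote becomes part of the string content.
            content.append(quote)
            i += 2
        elif ch == quote:
            if i != len(token) - 1:
                raise ValueError("characters after the closing quote")
            data = "".join(content).encode("utf-8")
            # Only a terminated string gets the trailing zero byte.
            return data + b"\x00" if quote == '"' else data
        else:
            content.append(ch)
            i += 1
    raise ValueError("unclosed string")
```

The only difference between the two forms in this sketch is the final zero byte appended for a terminated string.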
Comment
A comment is a ( character called the comment start, followed by zero or more characters other than ), followed by a single ) character called the comment end. It assembles to nothing.
An unmatched comment start or end is an error.
Block
A block is a single { character called the block start, matched with a single } character called the block end. Each block end matches the most recent unmatched block start. The block start assembles to a double, the value being the address of the matching block end. The block end assembles to nothing.
An unmatched block start or end is an error. A block start or block end in a macro definition that is not matched in the same definition is an error. A block end with an address greater than 0xFFFF is an error.
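One way to resolve block addresses is to keep a stack of unmatched block starts and to patch each start once its matching end is reached. The Python sketch below assumes a simplified input in which every item is either the string "{", the string "}", or a bytes object that has already been assembled. The name assemble_blocks is hypothetical.

```python
def assemble_blocks(items: list) -> bytes:
    """Assemble a list of "{", "}", and pre-assembled bytes items."""
    out = bytearray()
    open_starts = []  # output offsets of unmatched block starts
    for item in items:
        if item == "{":
            open_starts.append(len(out))
            out += b"\x00\x00"  # placeholder double, patched at the end
        elif item == "}":
            if not open_starts:
                raise ValueError("unmatched block end")
            start = open_starts.pop()
            address = len(out)  # bytes assembled up until the block end
            if address > 0xFFFF:
                raise ValueError("block end address exceeds 0xFFFF")
            out[start : start + 2] = address.to_bytes(2, "big")
        else:
            out += item
    if open_starts:
        raise ValueError("unmatched block start")
    return bytes(out)
```

Because the stack always holds the most recent unmatched start on top, nested blocks are matched from the inside out.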
Label
A global label is a @ character followed by an identifier. The address of the label is associated with the identifier. It assembles to nothing.
A local label is a & character followed by an identifier, called the local identifier. The full identifier of the label is assembled from the identifier of the most recent global label if any, followed by a / character, followed by the local identifier. The address of the label is associated with the full identifier. It assembles to nothing.
A label with a name that is shared by another label or macro is an error. A label with an address greater than 0xFFFF is an error.
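The naming rules for labels can be sketched in Python as follows. The function collect_labels and its input format, a list of (token, address) pairs in source order, are assumptions of this example. It also assumes that a local label with no preceding global label gets a full identifier that starts with the / character.

```python
def collect_labels(labels: list, macro_names: frozenset = frozenset()) -> dict:
    """Map the full identifier of each label token to its address."""
    table = {}
    current_global = ""  # assumption: empty until a global label is seen
    for token, address in labels:
        sigil, name = token[:1], token[1:]
        if sigil == "@":
            current_global = name
            full = name
        elif sigil == "&":
            full = current_global + "/" + name
        else:
            raise ValueError("not a label token: " + token)
        if full in table or full in macro_names:
            raise ValueError("name already in use: " + full)
        if address > 0xFFFF:
            raise ValueError("label address exceeds 0xFFFF: " + full)
        table[full] = address
    return table
```

Two local labels may share a local identifier as long as they follow different global labels, because their full identifiers differ.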
Macro definition
A macro definition is a % character, followed by an identifier, followed by zero or more macro body tokens called the macro body, followed by a ; character. The macro body is associated with the identifier. It assembles to nothing.
A macro body token can be a literal, pad, mark, string, comment, block, or symbol.
A macro with a name that is shared by another label or macro is an error. A macro with a body that contains a label or a macro definition is an error.
Symbol
A symbol is a plain identifier. Any token that does not match another language element is a symbol. If the identifier begins with a ~ character, that character is replaced with the identifier of the most recent global label, followed by a / character.
If the symbol references a label, the symbol assembles to a double, with the value being the address of that label. If the symbol references a macro definition, the symbol is replaced with the body of that macro and is then assembled.
A symbol that references a macro defined later in the program is an error. A symbol that does not reference a label or a macro definition is an error.
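Symbol resolution can be sketched in Python as shown below. The function resolve_symbol, its return format, and the two lookup tables are assumptions made for illustration. The macro table is assumed to hold only the macros defined before the symbol, which is how this sketch rejects a reference to a macro defined later in the program.

```python
def resolve_symbol(symbol: str, current_global: str,
                   labels: dict, macros_so_far: dict) -> tuple:
    """Resolve a symbol to a label address or to a macro body.

    Returns ("double", bytes) for a label reference, or
    ("expand", tokens) for a macro whose body must be assembled in place.
    """
    if symbol.startswith("~"):
        # Replace ~ with the most recent global label and a / character.
        symbol = current_global + "/" + symbol[1:]
    if symbol in labels:
        return ("double", labels[symbol].to_bytes(2, "big"))
    if symbol in macros_so_far:
        return ("expand", list(macros_so_far[symbol]))
    raise ValueError("symbol does not reference a label or macro: " + symbol)
```

Labels are looked up in a complete table, so a symbol may reference a label defined later in the program, while macros must already be present in macros_so_far.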
Instruction mnemonics
The following set of macro definitions must be predefined by the assembler:
%HLT 00; %NOP 20; %DB1 40; %DB2 60; %DB3 80; %DB4 A0; %DB5 C0; %DB6 E0;
%PSH 01; %PSH* 21; %PSH: 41; %PSH*: 61; %PSHr 81; %PSHr* A1; %PSHr: C1; %PSHr*: E1;
%: 41; %*: 61; %r: C1; %r*: E1;
%POP 02; %POP* 22; %POP: 42; %POP*: 62; %POPr 82; %POPr* A2; %POPr: C2; %POPr*: E2;
%CPY 03; %CPY* 23; %CPY: 43; %CPY*: 63; %CPYr 83; %CPYr* A3; %CPYr: C3; %CPYr*: E3;
%DUP 04; %DUP* 24; %DUP: 44; %DUP*: 64; %DUPr 84; %DUPr* A4; %DUPr: C4; %DUPr*: E4;
%OVR 05; %OVR* 25; %OVR: 45; %OVR*: 65; %OVRr 85; %OVRr* A5; %OVRr: C5; %OVRr*: E5;
%SWP 06; %SWP* 26; %SWP: 46; %SWP*: 66; %SWPr 86; %SWPr* A6; %SWPr: C6; %SWPr*: E6;
%ROT 07; %ROT* 27; %ROT: 47; %ROT*: 67; %ROTr 87; %ROTr* A7; %ROTr: C7; %ROTr*: E7;
%JMP 08; %JMP* 28; %JMP: 48; %JMP*: 68; %JMPr 88; %JMPr* A8; %JMPr: C8; %JMPr*: E8;
%JMS 09; %JMS* 29; %JMS: 49; %JMS*: 69; %JMSr 89; %JMSr* A9; %JMSr: C9; %JMSr*: E9;
%JCN 0A; %JCN* 2A; %JCN: 4A; %JCN*: 6A; %JCNr 8A; %JCNr* AA; %JCNr: CA; %JCNr*: EA;
%JCS 0B; %JCS* 2B; %JCS: 4B; %JCS*: 6B; %JCSr 8B; %JCSr* AB; %JCSr: CB; %JCSr*: EB;
%LDA 0C; %LDA* 2C; %LDA: 4C; %LDA*: 6C; %LDAr 8C; %LDAr* AC; %LDAr: CC; %LDAr*: EC;
%STA 0D; %STA* 2D; %STA: 4D; %STA*: 6D; %STAr 8D; %STAr* AD; %STAr: CD; %STAr*: ED;
%LDD 0E; %LDD* 2E; %LDD: 4E; %LDD*: 6E; %LDDr 8E; %LDDr* AE; %LDDr: CE; %LDDr*: EE;
%STD 0F; %STD* 2F; %STD: 4F; %STD*: 6F; %STDr 8F; %STDr* AF; %STDr: CF; %STDr*: EF;
%ADD 10; %ADD* 30; %ADD: 50; %ADD*: 70; %ADDr 90; %ADDr* B0; %ADDr: D0; %ADDr*: F0;
%SUB 11; %SUB* 31; %SUB: 51; %SUB*: 71; %SUBr 91; %SUBr* B1; %SUBr: D1; %SUBr*: F1;
%INC 12; %INC* 32; %INC: 52; %INC*: 72; %INCr 92; %INCr* B2; %INCr: D2; %INCr*: F2;
%DEC 13; %DEC* 33; %DEC: 53; %DEC*: 73; %DECr 93; %DECr* B3; %DECr: D3; %DECr*: F3;
%LTH 14; %LTH* 34; %LTH: 54; %LTH*: 74; %LTHr 94; %LTHr* B4; %LTHr: D4; %LTHr*: F4;
%GTH 15; %GTH* 35; %GTH: 55; %GTH*: 75; %GTHr 95; %GTHr* B5; %GTHr: D5; %GTHr*: F5;
%EQU 16; %EQU* 36; %EQU: 56; %EQU*: 76; %EQUr 96; %EQUr* B6; %EQUr: D6; %EQUr*: F6;
%NQK 17; %NQK* 37; %NQK: 57; %NQK*: 77; %NQKr 97; %NQKr* B7; %NQKr: D7; %NQKr*: F7;
%SHL 18; %SHL* 38; %SHL: 58; %SHL*: 78; %SHLr 98; %SHLr* B8; %SHLr: D8; %SHLr*: F8;
%SHR 19; %SHR* 39; %SHR: 59; %SHR*: 79; %SHRr 99; %SHRr* B9; %SHRr: D9; %SHRr*: F9;
%ROL 1A; %ROL* 3A; %ROL: 5A; %ROL*: 7A; %ROLr 9A; %ROLr* BA; %ROLr: DA; %ROLr*: FA;
%ROR 1B; %ROR* 3B; %ROR: 5B; %ROR*: 7B; %RORr 9B; %RORr* BB; %RORr: DB; %RORr*: FB;
%IOR 1C; %IOR* 3C; %IOR: 5C; %IOR*: 7C; %IORr 9C; %IORr* BC; %IORr: DC; %IORr*: FC;
%XOR 1D; %XOR* 3D; %XOR: 5D; %XOR*: 7D; %XORr 9D; %XORr* BD; %XORr: DD; %XORr*: FD;
%AND 1E; %AND* 3E; %AND: 5E; %AND*: 7E; %ANDr 9E; %ANDr* BE; %ANDr: DE; %ANDr*: FE;
%NOT 1F; %NOT* 3F; %NOT: 5F; %NOT*: 7F; %NOTr 9F; %NOTr* BF; %NOTr: DF; %NOTr*: FF;
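The values in the list above follow a regular pattern: the low five bits select the instruction, and the *, :, and r suffixes add 0x20, 0x40, and 0x80 respectively. The Python sketch below rebuilds the list from that observed pattern. The names predefined_macros, NAMES, and MODES are hypothetical, and the pattern is read off the list itself rather than stated by this specification.

```python
# Instruction names for the values 0x01 to 0x1F, in order.
NAMES = [
    "PSH", "POP", "CPY", "DUP", "OVR", "SWP", "ROT", "JMP",
    "JMS", "JCN", "JCS", "LDA", "STA", "LDD", "STD", "ADD",
    "SUB", "INC", "DEC", "LTH", "GTH", "EQU", "NQK", "SHL",
    "SHR", "ROL", "ROR", "IOR", "XOR", "AND", "NOT",
]

# Suffix and the bits it contributes, in the order used by the list.
MODES = [
    ("", 0x00), ("*", 0x20), (":", 0x40), ("*:", 0x60),
    ("r", 0x80), ("r*", 0xA0), ("r:", 0xC0), ("r*:", 0xE0),
]

def predefined_macros() -> dict:
    """Build a table from macro identifier to its one-byte value."""
    table = {}
    # The eight variants of value 0x00 have their own names.
    for index, name in enumerate(["HLT", "NOP", "DB1", "DB2",
                                  "DB3", "DB4", "DB5", "DB6"]):
        table[name] = index << 5
    for value, name in enumerate(NAMES, start=1):
        for suffix, bits in MODES:
            table[name + suffix] = value | bits
    # The bare suffixes :, *:, r:, and r*: share values with PSH.
    for suffix in (":", "*:", "r:", "r*:"):
        table[suffix] = table["PSH" + suffix]
    return table
```

A table built this way holds 260 entries: 8 for the value 0x00 group, 248 for the 31 named instructions, and 4 for the bare suffixes.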
Appendix A: EBNF grammar
The following is a grammar for the assembler language as a set of EBNF (Extended Backus-Naur Form) rules:
program ::= whitespace { token whitespace }
token ::= literal | pad | mark | string | comment | block | symbol | macro-definition | label
block ::= '{' program '}'
literal ::= digit digit [ digit digit ]
pad ::= '#' literal
mark ::= '[' | ']'
string ::= "'" { any-character } "'" | '"' { any-character } '"'
comment ::= '(' { any-character } ')'
symbol ::= [ '~' ] identifier
label ::= '@' identifier | '&' identifier
macro-definition ::= '%' identifier macro-body ';'
macro-body ::= whitespace { body-token whitespace }
body-token ::= literal | pad | mark | string | comment | macro-block | symbol
macro-block ::= '{' macro-body '}'
identifier ::= { character } terminator | { character } character
terminator ::= ':'
whitespace ::= { ' ' | '\t' | '\n' | '\r' }
digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
Symbols wrapped in [ ] brackets are optional. Symbols wrapped in { } brackets repeat zero or more times. The character rule matches any character that isn’t a terminator or whitespace. The any-character rule matches any character that isn’t the terminating character for that token.