retroforth/image/retro.muri
crc ae55f1cd45 image: sigil:: is now a hook point
FossilOrigin-Name: 3f5f0a6728041ad5465a46269d358b64a98d1b4424b5a8376c9ff8e592e419dd
2021-06-04 13:37:42 +00:00

1869 lines
37 KiB
Text

# RETRO Core 2021.7
RETRO Core is a minimal Forth implementation for the Nga
virtual machine. Like Nga this is intended to be used within
a larger supporting framework adding I/O and other desired
functionality.
## General Notes on the Source
Rx is developed using a literate tool called *unu*. This allows
easy extraction of fenced code blocks into a separate file for
later compilation. I've found the use of a literate style to be
very beneficial as it makes it easier for me to keep the code
and commentary in sync, and helps me to approach development in
a more structured manner.
This source is written in Muri, an assembler for Nga.
Before going on, I should explain a bit about Nga and Muri.
Nga provides a MISC inspired virtual machine for a dual stack
architecture. There are 30 instructions, with up to four packed
into each memory location (*cell*). The instructions are:
0 nop 5 push 10 ret 15 fetch 20 div 25 zret
1 lit 6 pop 11 eq 16 store 21 and 26 halt
2 dup 7 jump 12 neq 17 add 22 or 27 ienum
3 drop 8 call 13 lt 18 sub 23 xor 28 iquery
4 swap 9 ccall 14 gt 19 mul 24 shift 29 iinvoke
I won't explain them here, but if you're familiar with Forth,
it should be pretty easy to figure out.
Packing of instructions lets me save space, but does require a
little care. Instructions that modify the instruction pointer
should be followed by NOP. These are: JUMP, CALL, CCALL,
RETURN, and ZRET. Additionally, if the instruction bundle
contains a LIT, a value must be in the following cell. (One for
each LIT in the bundle)
The reason for this relates to how Nga processes the opcodes.
To illustrate, assume a stack with a couple of values:
#1 #2
And a function that consumes two values before returning a
some new ones:
: function
i mulire..
d 3
(In the `i` line, the instructions are mul, lit, return, and
nop).
If we were to use this along with an instruction bundle (the
line starting with `i`) like:
: test
i licaadre
r function
(In the `i` line, the instructions are lit, call, add, and
return).
On running the code in `test` Nga will:
(1) push a pointer to `function` to the stack
(2) setup a call to the function
(3) add the top values on the stack (#1 #2),
leaving a single value (#3)
(4) setup a return to the caller
At this point the bundle is done, so control goes to the
called function. But we now have only one value on the stack,
so the stack underflows and Nga will crash.
It's not forbidden by my VM implementation, but John's Impexus
implementation differs here in that it does not process the
instructions after jump, call, ccall, ret, or zret. I really
recommend making sure your bundles don't contain anything after
these instructions to keep things portable and predictable in
behavior.
Muri uses the first two characters of each instruction name
when composing the bundles, with NOP being named as two dots.
So:
lit lit add nop
Is a bundle named:
liliad..
And with two `li` instructions, must be followed by two values.
Muri uses a directive in the first line to tell it what to
expect. Valid directives are:
i instruction bundle
d decimal value
r reference to label
: label
s zero terminated string
## In the Beginning...
Nga expects code to start with a jump to the main entry point.
Rx doesn't really have a main entry point (the top level loop
is assumed to be part of the interface layer), but I allocate
the space for a jump here anyway. This makes it possible to
patch the entry point later, if using an interface that adds
the appropriate I/O functionality.
~~~
i liju....
d -1
~~~
With this, it's time to allocate some data elements. These are
always kept in known locations after the initial jump to ensure
that they can be easily identified and interfaced with external
tools. This is important as Nga allows for a variety of I/O
models to be implemented and I don't want to tie this into any
one specific model.
Here's the initial memory map:
| Offset | Contains |
| ------ | --------------------------- |
| 0 | lit call nop nop |
| 1 | Pointer to main entry point |
| 2 | Dictionary |
| 3 | Heap |
| 4 | RETRO version |
| 5 | Pointer to `interpret` |
| 6 | Pointer to `err:notfound` |
| 7 | Pointer to TIB |
| 8 | Pointer to End of TIB |
~~~
: Dictionary
r 9999
: Heap
d 1536
: Version
d 202107
: references
r interpret
r err:notfound
r TIB
d 1535
~~~
Both of these are pointers. `Dictionary` points to the most
recent dictionary entry. (See the *Dictionary* section at the
end of this file.) `Heap` points to the next free address.
This is hard coded to an address beyond the end of the RETRO
kernel. I adjust this as needed if the kernel grows or shinks
significantly. See the *Interpreter & Compiler* section for
more on this.
## Nga Instruction Set
As mentioned earlier, Nga provides 30 instructions. RETRO begins
the actual coding by assigning most of these to a separate
function. These are not intended for direct use; the compiler
will fetch the opcode values to use from these functions when
compiling. Many will also be exposed in the initial dictionary.
~~~
: _nop
i ........
i re......
: _lit
i li......
i re......
: _dup
i du......
i re......
: _drop
i dr......
i re......
: _swap
i sw......
i re......
: _push
i pu......
i re......
: _pop
i po......
i re......
: _jump
i ju......
i re......
: _call
i ca......
: _ret
i re......
: _eq
i eq......
i re......
: _neq
i ne......
i re......
: _lt
i lt......
i re......
: _gt
i gt......
i re......
: _fetch
i fe......
i re......
: _store
i st......
i re......
: _add
i ad......
i re......
: _sub
i su......
i re......
: _mul
i mu......
i re......
: _divmod
i di......
i re......
: _and
i an......
i re......
: _or
i or......
i re......
: _xor
i xo......
i re......
: _shift
i sh......
i re......
: _zret
i zr......
~~~
Though Nga allows for multiple instructions to be packed into a
single memory location (called a *cell*), RETRO only packs a few
specific combinations.
Since calls and jumps take a value from the stack, a typical
call (in Muri assembly) would look like:
i lica....
r bye
Without packing this takes three cells: one for the lit, one
for the address, and one for the call. Packing drops it to two
since the lit/call combination can be fit into a single cell.
Likewise, I use a packed jump for use with quotations. These
saves several hundred cells (and thus fetch/decode cycles) when
loading the standard library.
The raw values for these are:
2049 lica....
1793 liju....
These are hardcoded in a few places later. I had previously
used a lookup, but this proved costly in processing time, so
hard coding proved better. (These places are clearly marked)
## Memory
Memory is a big, flat, linear array. The addressing starts at
zero and counts upwards towards a fixed upper limit (set by the
VM).
The basic memory accesses are handled via `fetch` and `store`.
The next two functions provide easier access to sequences of
data by fetching or storing a value and returning the next
address.
`fetch-next` takes an address and fetches the stored value. It
returns the next address and the stored value.
:fetch-next dup #1 + swap fetch ;
~~~
: fetch-next
i duliadsw
d 1
i fere....
~~~
`store-next` takes a value and an address. It stores the value
to the address and returns the next address.
:store-next dup #1 + push store pop ;
~~~
: store-next
i duliadpu
d 1
i stpore..
~~~
## Conditionals
The RETRO kernel provides three conditional forms:
flag true-pointer false-pointer choose
flag true-pointer if
flag false-pointer -if
`choose` is a conditional combinator which will execute one of
two functions, depending on the state of a flag.
This will treat 0 as false, and non-zero as true.
~~~
: choice:true
d 0
: choice:false
d 0
: choose
i listlist
r choice:false
r choice:true
i lineliad
d 0
r choice:false
i feca....
i re......
~~~
Next the two *if* forms. Note that `-if` falls into `if`. This
saves two cells of memory.
:-if push #0 eq? pop \cc...... ;
:if \cc...... ;
~~~
: -if
i pulieqpo
d 0
: if
i cc......
i re......
~~~
## Strings
The kernel needs two basic string operations for dictionary
searches: obtaining the length and comparing for equality.
Strings in RETRO are zero terminated. This is a bit less elegant
than counted strings, but the implementation is quick and easy.
First up, string length. The process here is trivial:
* Make a copy of the starting point
* Fetch each character, comparing to zero
* If zero, break the loop
* Otherwise discard and repeat
* When done subtract the original address from the current one
* Then subtract one (to account for the zero terminator)
:count repeat fetch-next 0; drop again ;
:s:length dup count #1 - swap - ;
~~~
: count
i lica....
r fetch-next
i zr......
i drliju..
r count
: s:length
i dulica..
r count
i lisuswsu
d 1
i re......
~~~
String comparisons are harder. In high level code this is:
dup fetch push n:inc swap
dup fetch push n:inc pop dup pop
-eq? [ drop-pair drop #0 pop pop drop drop ]
[ 0; drop s:eq? pop pop drop drop ] choose drop-pair #-1 ;
I've rewritten this a few times. The current implementation is
fast enough, and not overly long. It may be worth looking into
a hash based comparsion in the future.
~~~
: mismatch
i drdrdrli
d 0
i popodrdr
i re......
: matched
i zr......
i drlica..
r s:eq
i popodrdr
i re......
: s:eq
i dufepuli
d 1
i adswdufe
i puliadpo
d 1
i duponeli
r mismatch
i lilica..
r matched
r choose
i drdrlire
d -1
~~~
## Interpreter & Compiler
### Compiler Core
The heart of the compiler is `comma` which stores a value into
memory and increments a variable (`Heap`) pointing to the next
free address.
:, &Heap fetch store-next &Heap store ;
~~~
: comma
i lifelica
r Heap
r store-next
i listre..
r Heap
~~~
I also add a couple of additional forms. `comma:opcode` is used
to compile VM instructions into the current defintion. This is
where those functions starting with an underscore come into
play. Each wraps a single instruction. Using this I can avoid
hard coding the opcodes.
This performs a jump to the `comma` word instead of using a
`call/ret` to save a cell and slightly improve performance. I
will use this technique frequently.
:comma:opcode fetch , ;
~~~
: comma:opcode
i feliju..
r comma
~~~
`comma:string` is used to compile a string into the current
definition. As with `comma:opcode`, this uses a `jump` to
eliminate the final tail call.
:($) fetch-next 0; , ($) ;
:s, ($) drop 0 , ;
~~~
: ($)
i lica....
r fetch-next
i zr......
i lica....
r comma
i liju....
r ($)
: comma:string
i lica....
r ($)
i drliliju
d 0
r comma
~~~
With the core functions above it's now possible to setup a few
more things that make compilation at runtime more practical.
First, a variable indicating whether we should compile or run a
function. In traditional Forth this would be **STATE**; I call
it `Compiler`.
This will be used by the *word classes*.
~~~
: Compiler
d 0
~~~
Next is *semicolon*; which compiles the code to terminate a
function and sets the `Compiler` to an off state (0). This
just needs to compile in a RET.
:compiler:off #0 &Compiler store ;
:; &_ret comma:opcode compiler:off ; immediate
~~~
: ;
i lilica..
r _ret
r comma:opcode
: compiler:off
i lilistre
d 0
r Compiler
~~~
### Word Classes
RETRO is built over the concept of *word classes*. Word classes
are a way to group related words, based on their compilation
and execution behaviors. A *class handler* function is defined
to handle an execution token passed to it on the stack.
RETRO provides several classes with differing behaviors:
`class:data` provides for dealing with data structures.
| interpret | compile |
| -------------------- | ----------------------------- |
| leave value on stack | compile value into definition |
:class:data &Compiler fetch 0; drop &_lit comma:opcode comma ;
~~~
: class:data
i lifezr..
r Compiler
i drlilica
r _lit
r comma:opcode
i liju....
r comma
~~~
`class:word` handles most functions.
| interpret | compile |
| -------------------- | ----------------------------- |
| call a function | compile a call to a function |
~~~
: class:word:compile
i lilica..
d 2049
r comma
i liju....
r comma
: class:word
i lifelili
r Compiler
r class:word:compile
r class:macro
i liju....
r choose
~~~
`class:primitive` is a special class handler for functions that
correspond to Nga instructions.
| interpret | compile |
| -------------------- | ------------------------------------ |
| call the function | compile an instruction |
~~~
: class:primitive
i lifelili
r Compiler
r comma:opcode
r class:macro
i liju....
r choose
~~~
`class:macro` is the class handler for *compiler macros*. These
are functions that always get called. They can be used to
extend the language in interesting ways.
| interpret | compile |
| -------------------- | ----------------------------- |
| call the function | call the function |
~~~
: class:macro
i ju......
~~~
The class mechanism is not limited to these classes. You can
write custom classes at any time. On entry the custom handler
should take the XT passed on the stack and do something with
it. Generally the handler should also check the `Compiler`
state to determine what to do in either interpretation or
compilation.
### Dictionary
RETRO has a single dictionary consisting of a linked list of
headers. The current form of a header is shown in the chart
below.
| field | holds | accessor |
| ----- | ---------------------------------- | -------- |
| link | link to the previous entry | d:link |
| xt | link to start of the function | d:xt |
| class | link to the class handler function | d:class |
| name | zero terminated string | d:name |
The initial dictionary is constructed at the end of this file.
It'll take a form like this:
: 0000
d 0
r _dup
r class:primitive
s dup
: 0001
r 0000
r _drop
r class:primitive
s drop
: 0002
r 0001
r _swap
r class:primitive
s swap
Each entry starts with a pointer to the prior entry (with a
pointer to zero marking the first entry in the dictionary), a
pointer to the start of the function, a pointer to the class
handler, and a null terminated string indicating the name
exposed to the RETRO interpreter.
RETRO stores the pointer to the most recent entry in a variable
called `Dictionary`. For simplicity, I just assign the last
entry an arbitrary label of 9999. This is set at the start of
the source. (See *In the Beginning...*)
RETRO provides accessor functions for each field. Since the number
of fields (or their ordering) may change over time, using these
reduces the number of places where field offsets are hard coded.
~~~
: d:link
i re......
: d:xt
i liadre..
d 1
: d:class
i liadre..
d 2
: d:name
i liadre..
d 3
~~~
A traditional Forth has `create` to make a new dictionary entry
pointing to the next free location in `Heap`. RETRO has
`d:add-header` which serves as a slightly more flexible base.
You provide a string for the name, a pointer to the class
handler, and a pointer to the start of the function. RETRO does
the rest.
In actual practice, I never use this outside of RETRO. New words
are made using the `:` sigil, or `d:create` (once defined in
the standard library). At some point I may simplify this by
moving `d:create` into Rx and using it in place of `d:add-header`.
~~~
: d:add-header
i liju....
r _add-header
: _add-header
i lifepuli
r Heap
r Dictionary
i felica..
r comma
i lica....
r comma
i lica....
r comma
i lica....
r comma:string
i polistre
r Dictionary
~~~
RETRO doesn't provide a traditional create as it's designed to
avoid assuming a normal input stream and prefers to take its
data from the stack.
### Dictionary Search
~~~
: Which
d 0
: Needle
d 0
: found
i listlire
r Which
r _nop
: find
i lilistli
d 0
r Which
r Dictionary
i fe......
: find_next
i zr......
i dulica..
r d:name
i lifelica
r Needle
r s:eq
i licc....
r found
i feliju..
r find_next
: d:lookup
i listlica
r Needle
r find
i lifere..
r Which
~~~
### Number Conversion
This code converts a zero terminated string into a number. The
approach is very simple:
* Store an internal multiplier value (-1 for negative, 1 for
positive)
* Clear an internal accumulator value
* Loop:
* Fetch the accumulator value
* Multiply by 10
* For each character, convert to a numeric value and add to
the accumulator
* Store the updated accumulator
* When done, take the accumulator value and the modifier and
multiply them to get the final result
RETRO only supports decimal numbers. If you want more bases, it's
pretty easy to add them later, but it's not needed in the base
kernel.
~~~
: Base
d 10
: adjust
i lisure..
d 7
: next
i lica....
r fetch-next
i zr......
i duligtli
d 58
r adjust
i cc......
i lisuswpu
d 48
i swlifemu
r Base
i adpoliju
r next
: check
i dufelieq
d 45
i zr......
i drswdrli
d -1
i swliadre
d 1
: s:to-number
i liswlica
d 1
r check
i liswlica
d 0
r next
i drmure..
~~~
### Token Processing
An input token has a form like:
<sigil>string
RETRO will check the first character to see if it matches a
known sigil. If it does, it will pass the string (sans sigil)
to the sigil handler. If not, it will attempt to find the token
in the dictionary.
Sigils are handled by functions with specific naming
conventions. A sigil name should be:
sigil:<sigil>
Where <sigil> is the character for the sigil. These are
compiler macros (using the `class:macro` class) and watch the
`Compiler` to decide how to deal with the token. To find a
sigil, RETRO stores the sigil character into a string named
`sigil`. It then searches for this string in the dictionary.
If found, it sets an internal variable (`sigil:handler`) to
the dictionary entry for the handler function. If not found,
`sigil:handler` is set to zero. The check, done by `sigil?`,
also returns a flag.
'_ 'sigil:no s:const
'sigil:_ 'sigil s:const
'sigil:handler var
~~~
: sigil:no
d 32
d 0
: sigil:handler
d 0
: sigil
s sigil:_
~~~
:sigil:prepare
fetch &sigil #6 + store ;
~~~
: sigil:prepare
i feliliad
r sigil
d 6
i stre....
~~~
:sigil:has-token?
dup s:length #1 0; drop-pair &sigil:no ;
~~~
: sigil:has-token?
i dulica..
r s:length
i lieqzr..
d 1
i drdrlire
r sigil:no
~~~
:sigil?
sigil:has-token? sigil:prepare &sigil d:lookup
dup &sigil:handler store #0 -eq? ;
~~~
: sigil?
i lica....
r sigil:has-token?
i lica....
r sigil:prepare
i lilica..
r sigil
r d:lookup
i dulistli
r sigil:handler
d 0
i nere....
~~~
RETRO makes extensive use of sigils for implementing major
parts of the language, including parsing numbers (`#` sigil),
obtaining pointers (`&`), and defining functions (using the
`:` sigil).
| sigil | used for | example |
| ------ | ----------------- | ------- |
| # | numbers | #100 |
| $ | ASCII characters | $e |
| & | pointers | &swap |
| : | definitions | :foo |
| ( | Comments | (n-) |
:sigil:( (s-) drop ;
~~~
: sigil:(
i drre....
~~~
:sigil:# (s-n) s:to-number class:data ;
~~~
: sigil:#
i lica....
r s:to-number
i liju....
r class:data
~~~
:sigil:$ (s-c) fetch class:data ;
~~~
: sigil:$
i feliju..
r class:data
~~~
:sigil:: (s-)
&class:word &Heap fetch d:add-header
&Heap fetch &Dictionary d:xt store
#-1 &Compiler store ;
~~~
: sigil::
i liju....
r (sigil::)
: (sigil::)
i lilifeli
r class:word
r Heap
r d:add-header
i ca......
i lifelife
r Heap
r Dictionary
i lica....
r d:xt
i st......
: compiler:on
i lilistre
d -1
r Compiler
~~~
The `&` sigil is used to return the address of a named item.
This will correspond to the `d:xt` field of the word header.
In higher level Retro this would be:
:sigil:& (s-a) d:lookup d:xt fetch class:data ;
As an example:
#1 #2 &+ call (call_`+`_via_pointer)
&Heap fetch (fetch_from_`Heap`)
In the latter case, the use of `class:data` means that I
don't *need* to use the `&` sigil, but I do this anyway as
I find it helps to provide a visual clue as to the intent of
the code.
This is also useful with combinators. If you are only using
a single word, using `&word` instead of `[ word ]` will be
smaller and faster.
For those familiar with traditional Forth, the `&` sigil
replaces both `'` and `[']`.
:sigil:& d:lookup d:xt fetch class:data ;
~~~
: sigil:&:get-xt
i lica....
r d:xt
i fere....
: sigil:&
i lica....
r d:lookup
i dulineli
d 0
r sigil:&:get-xt
i cc......
i liju....
r class:data
~~~
### Quotations
Quotations are anonymous, nestable blocks of code. RETRO uses
them for control structures and some aspects of data flow. A
quote takes a form like:
[ #1 #2 ]
#12 [ square #144 eq? [ #123 ] [ #456 ] choose ] call
Begin a quotation with `[` and end it with `]`. The code here
is slightly complicated by the fact that these have to be
nestable, and so must compile the appropriate jumps around
the nested blocks, in addition to properly setting and
restoring the `Compiler` state.
~~~
: [
i lifeliad
r Heap
d 2
i lifelica
r Compiler
r compiler:on
i lilica..
d 1793
r comma
i lifelili
r Heap
d 0
r comma
i ca......
i lifere..
r Heap
: ]
i lilica..
r _ret
r comma:opcode
i lifeswli
r Heap
r _lit
i lica....
r comma:opcode
i lica....
r comma
i swstlist
r Compiler
i lifezr..
r Compiler
i drdrre..
~~~
## Lightweight Control Structures
RETRO provides a couple of functions for simple flow control
apart from using quotations. These are `repeat`, `again`, and
`0;`. An example of using them:
:s:length
dup [ repeat fetch-next 0; drop again ] call
swap - #1 - ;
These can only be used within a definition or quotation. If you
need to use them interactively, wrap them in a quote and `call`
it.
~~~
: repeat
i lifere..
r Heap
: again
i lilica..
r _lit
r comma:opcode
i lica....
r comma
i liliju..
r _jump
r comma:opcode
: 0;
i liliju..
r _zret
r comma:opcode
~~~
I take a brief aside here to implement `push` and `pop`, which
move a value to/from the address stack. These are compiler
macros.
~~~
: push
i liliju..
r _push
r comma:opcode
: pop
i liliju..
r _pop
r comma:opcode
~~~
## Interpreter
The *interpreter* is what processes input. What it does is:
* Take a string
* See if the first character has a sigil handler
* Yes: pass the rest of the string to the sigil handler
* No: lookup in the dictionary
* Found: pass xt of word to the class handler
* Not found: report error via `err:notfound`
First, the handler for dealing with words that are not found.
This is defined here as a jump to the handler for the Nga *NOP*
instruction. It is intended that this be hooked into and changed.
As an example, in RETRO code, assuming an I/O interface with some
support for strings and output:
[ $? c:put sp 'word_not_found s:put ]
&err:notfound #1 + store
An interface should either patch the jump, or catch it and do
something to report the error.
~~~
: err:notfound
i liju....
r _nop
i re......
~~~
`call:dt` takes a dictionary token and pushes the contents of
the `d:xt` field to the stack. It then calls the class handler
stored in `d:class`.
~~~
: call:dt
i dulica..
r d:xt
i feswlica
r d:class
i feju....
~~~
~~~
: input:source
d 0
: interpret:sigil
i lifezr..
r sigil:handler
i lifeliad
r input:source
d 1
i swliju..
r call:dt
: interpret:word
i lifeliju
r Which
r call:dt
: interpret:nosigil
i lifelica
r input:source
r d:lookup
i linelili
d 0
r interpret:word
r err:notfound
i liju....
r choose
: interpret
i liju....
r _interpret
: _interpret
i dulistli
r input:source
r sigil?
i ca......
i lililiju
r interpret:sigil
r interpret:nosigil
r choose
~~~
## Muri
Muri is my minimalist assembler for Nga. This is an attempt to
implement something similar in Retro.
This requires some knowledge of the Nga architecture to be
useful. The major elements are:
**Instruction Set**
Nga has 30 instructions. These are:
0 nop 5 push 10 ret 15 fetch 20 div 25 zret
1 lit 6 pop 11 eq 16 store 21 and 26 halt
2 dup 7 jump 12 neq 17 add 22 or 27 ienum
3 drop 8 call 13 lt 18 sub 23 xor 28 iquery
4 swap 9 ccall 14 gt 19 mul 24 shift 29 iinvoke
The mnemonics allow for each name to be reduced to just two
characters. In the same order as above:
0 .. 5 pu 10 re 15 fe 20 di 25 zr
1 li 6 po 11 eq 16 st 21 an 26 ha
2 du 7 ju 12 ne 17 ad 22 or 27 ie
3 dr 8 ca 13 lt 18 su 23 xo 28 iq
4 sw 9 cc 14 gt 19 mu 24 sh 29 ii
Up to four instructions can be packed into a single memory
location. (You can only use *no*p after a *ju*mp, *ca*ll,
*cc*all, *re*t, or *zr*et as these alter the instruction
pointer.)
So a bundled sequence like:
lit 100
lit 200
add
ret
Would look like:
'liliadre i
100 d
200 d
And:
lit s:eq?
call
Would become:
'lica.... i
's:eq? r
Note the use of `..` instead of `no` for the nop's; this is
done to improve readability a little.
Instruction bundles are specified as strings, and are converted
to actual instructions by the `i` word. As in the standard Muri
assembler, the RETRO version uses `d` for decimal values and `r`
for references to named functions.
The instructions table holds a hash value for each instruction
name. Lookups return the index of the instruction, which will
match the actual opcode. A value of zero marks the end of the
table.
Hashes are the two ASCII values multiplied together. E.g., for
`ha`, the hash is `#104 #97 *`.
This table can be regenerated using RETRO:
{ '.. 'li 'du 'dr 'sw 'pu 'po 'ju 'ca 'cc 're 'eq 'ne 'lt
'gt 'fe 'st 'ad 'su 'mu 'di 'an 'or 'xo 'sh 'zr 'ha 'ie
'iq 'ii }
[ fetch-next swap fetch * 'd_ s:put n:put nl ] a:for-each
~~~
: Instructions
d 2116
d 11340
d 11700
d 11400
d 13685
d 13104
d 12432
d 12402
d 9603
d 9801
d 11514
d 11413
d 11110
d 12528
d 11948
d 10302
d 13340
d 9700
d 13455
d 12753
d 10500
d 10670
d 12654
d 13320
d 11960
d 13908
d 10088
d 10605
d 11865
d 11025
d 0
~~~
The `r` word resolves a reference and commas it into memory.
:r d:lookup d:xt fetch , ;
~~~
: muri:r
i lica....
r d:lookup
i liadfe..
d 1
i liju....
r comma
~~~
The `i` word is used to assemble an instruction bundle and comma
it into memory. This is basically:
:i
dup get-opcode
swap #2 + dup get-opcode
swap #2 + dup get-opcode
swap #2 + get-opcode
#-24 shift swap #-16 shift +
swap #-8 shift + + , ;
The actual implementation here is a bit different due to memory
constraints, but the above should make it fairly clear how this
works.
~~~
: muri:i
i dulica..
r get-opcode
i lica....
r (muri:i)
i lica....
r (muri:i)
i swliadli
d 2
r get-opcode
i ca......
i lishswli
d -24
d -16
i shadswli
d -8
i shadad..
i liju....
r comma
~~~
I use `name-to-id` to do a very crude hash of the instruction
names.
:name-to-id dup fetch swap #1 + fetch * ;
:name-to-id fetch-next swap fetch * ;
~~~
: name-to-id
i dufeswli
d 1
i adfemure
~~~
The `opcode-lookup` searches the `Instructions` table for a
hashed opcode name.
:opcode-lookup
repeat fetch-next 0; swap push over -eq? pop swap 0; drop again ;
~~~
: opcode-lookup
i lica....
r fetch-next
i zr......
i swpupudu
i poswnepo
i swzr....
i drliju..
r opcode-lookup
~~~
Inner bits of the `i` word, factored out to save space. The
interesting bit here is `get-opcode` which turns the returned
address in the `Instructions` table to an opcode number.
:get-opcode
name-to-id id-to-opcode &Instructions opcode-lookup
nip &Instructions - #1 - ;
:muri:i
swap #2 + dup get-opcode ;
~~~
: (muri:i)
i swliaddu
d 2
: get-opcode
i lica....
r name-to-id
: id-to-opcode
i lilica....
r Instructions
r opcode-lookup
i swdrlisu
r Instructions
i lisure..
d 1
~~~
## The Initial Dictionary
This sets up the initial dictionary. Maintenance of this bit is
annoying, but it isn't necessary to change this unless you add
or remove new functions in the kernel.
~~~
: 0000
d 0
r _dup
r class:primitive
s dup
: 0001
r 0000
r _drop
r class:primitive
s drop
: 0002
r 0001
r _swap
r class:primitive
s swap
: 0003
r 0002
r _call
r class:primitive
s call
: 0004
r 0003
r _eq
r class:primitive
s eq?
: 0005
r 0004
r _neq
r class:primitive
s -eq?
: 0006
r 0005
r _lt
r class:primitive
s lt?
: 0007
r 0006
r _gt
r class:primitive
s gt?
: 0008
r 0007
r _fetch
r class:primitive
s fetch
: 0009
r 0008
r _store
r class:primitive
s store
: 0010
r 0009
r _add
r class:primitive
s +
: 0011
r 0010
r _sub
r class:primitive
s -
: 0012
r 0011
r _mul
r class:primitive
s *
: 0013
r 0012
r _divmod
r class:primitive
s /mod
: 0014
r 0013
r _and
r class:primitive
s and
: 0015
r 0014
r _or
r class:primitive
s or
: 0016
r 0015
r _xor
r class:primitive
s xor
: 0017
r 0016
r _shift
r class:primitive
s shift
: 0018
r 0017
r push
r class:macro
s push
: 0019
r 0018
r pop
r class:macro
s pop
: 0020
r 0019
r 0;
r class:macro
s 0;
: 0021
r 0020
r fetch-next
r class:word
s fetch-next
: 0022
r 0021
r store-next
r class:word
s store-next
: 0023
r 0022
r s:to-number
r class:word
s s:to-number
: 0024
r 0023
r s:eq
r class:word
s s:eq?
: 0025
r 0024
r s:length
r class:word
s s:length
: 0026
r 0025
r choose
r class:word
s choose
: 0027
r 0026
r if
r class:primitive
s if
: 0028
r 0027
r -if
r class:word
s -if
: 0029
r 0028
r sigil:(
r class:macro
s sigil:(
: 0030
r 0029
r Compiler
r class:data
s Compiler
: 0031
r 0030
r Heap
r class:data
s Heap
: 0032
r 0031
r comma
r class:word
s ,
: 0033
r 0032
r comma:string
r class:word
s s,
: 0034
r 0033
r ;
r class:macro
s ;
: 0035
r 0034
r [
r class:macro
s [
: 0036
r 0035
r ]
r class:macro
s ]
: 0037
r 0036
r Dictionary
r class:data
s Dictionary
: 0038
r 0037
r d:link
r class:word
s d:link
: 0039
r 0038
r d:xt
r class:word
s d:xt
: 0040
r 0039
r d:class
r class:word
s d:class
: 0041
r 0040
r d:name
r class:word
s d:name
: 0042
r 0041
r class:word
r class:word
s class:word
: 0043
r 0042
r class:macro
r class:word
s class:macro
: 0044
r 0043
r class:data
r class:word
s class:data
: 0045
r 0044
r d:add-header
r class:word
s d:add-header
: 0046
r 0045
r sigil:#
r class:macro
s sigil:#
: 0047
r 0046
r sigil::
r class:macro
s sigil::
: 0048
r 0047
r sigil:&
r class:macro
s sigil:&
: 0049
r 0048
r sigil:$
r class:macro
s sigil:$
: 0050
r 0049
r repeat
r class:macro
s repeat
: 0051
r 0050
r again
r class:macro
s again
: 0052
r 0051
r interpret
r class:word
s interpret
: 0053
r 0052
r d:lookup
r class:word
s d:lookup
: 0054
r 0053
r class:primitive
r class:word
s class:primitive
: 0055
r 0054
r Version
r class:data
s Version
: 0056
r 0055
r muri:i
r class:word
s i
: 0057
r 0056
r comma
r class:word
s d
: 0058
r 0057
r muri:r
r class:word
s r
: 0059
r 0058
r Base
r class:data
s Base
: 9999
r 0059
r err:notfound
r class:word
s err:notfound
~~~
~~~
: TIB
~~~
## Appendix: Words, Stack Effects, and Usage
| Word | Stack | Notes |
| --------------- | --------- | ------------------------------------------------- |
| dup | n-nn | Duplicate the top item on the stack |
| drop | nx-n | Discard the top item on the stack |
| swap | nx-xn | Switch the top two items on the stack |
| call | p- | Call a function (via pointer) |
| eq? | nn-f | Compare two values for equality |
| -eq? | nn-f | Compare two values for inequality |
| lt? | nn-f | Compare two values for less than |
| gt? | nn-f | Compare two values for greater than |
| fetch | p-n | Fetch a value stored at the pointer |
| store | np- | Store a value into the address at pointer |
| + | nn-n | Add two numbers |
| - | nn-n | Subtract two numbers |
| * | nn-n | Multiply two numbers |
| /mod | nn-mq | Divide two numbers, return quotient and remainder |
| and | nn-n | Perform bitwise AND operation |
| or | nn-n | Perform bitwise OR operation |
| xor | nn-n | Perform bitwise XOR operation |
| shift | nn-n | Perform bitwise shift |
| fetch-next | a-an | Fetch a value and return next address |
| store-next | na-a | Store a value to address and return next address |
| push | n- | Move value from data stack to address stack |
| pop | -n | Move value from address stack to data stack |
| 0; | n-n OR n- | Exit word (and `drop`) if TOS is zero |
| s:to-number | s-n | Convert a string to a number |
| s:eq? | ss-f | Compare two strings for equality |
| s:length | s-n | Return length of string |
| choose | fpp-? | Execute *p1* if *f* is -1, or *p2* if *f* is 0 |
| if | fp-? | Execute *p* if flag *f* is true (-1) |
| -if | fp-? | Execute *p* if flag *f* is false (0) |
| Compiler | -p | Variable; holds compiler state |
| Heap | -p | Variable; points to next free memory address |
| , | n- | Compile a value into memory at `here` |
| s, | s- | Compile a string into memory at `here` |
| ; | - | End compilation and compile a *return* instruction|
| [ | - | Begin a quotation |
| ] | - | End a quotation |
| Dictionary | -p | Variable; points to most recent header |
| d:link | p-p | Given a DT, return the address of the link field |
| d:xt | p-p | Given a DT, return the address of the xt field |
| d:class | p-p | Given a DT, return the address of the class field |
| d:name | p-p | Given a DT, return the address of the name field |
| class:word | p- | Class handler for standard functions |
| class:primitive | p- | Class handler for Nga primitives |
| class:macro | p- | Class handler for immediate functions |
| class:data | p- | Class handler for data |
| d:add-header | saa- | Add an item to the dictionary |
| sigil:# | s- | # sigil for numbers |
| sigil:: | s- | : sigil for definitions |
| sigil:& | s- | & sigil for pointers |
| sigil:$ | s- | $ sigil for ASCII characters |
| sigil:( | s- | ( sigil for comments |
| repeat | -a | Start an unconditional loop |
| again | a- | End an unconditional loop |
| interpret | s-? | Evaluate a token |
| d:lookup | s-p | Given a string, return the DT (or 0 if undefined) |
| Version | -a | Variable; holds a version identifier |
| err:notfound | - | Handler for token not found errors |
| Base | -a | Variable; holds current numeric base |
## Legalities
RETRO Core is Copyright (c) 2016-2021, Charles Childers
Permission to use, copy, modify, and/or distribute this software
for any purpose with or without fee is hereby granted, provided
that the above copyright notice and this permission notice
appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR
CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
My thanks go out to Michal J Wallace, Luke Parrish, JGL, Marc
Simpson, Oleksandr Kozachuk, Jay Skeer, Greg Copeland, Aleksej
Saushev, Foucist, Erturk Kocalar, Kenneth Keating, Ashley
Feniello, Peter Salvi, Christian Kellermann, Jorge Acereda,
Remy Moueza, John M Harrison, and Todd Thomas.
All of these great people helped in the development of RETRO 10
and 11, without which Rx wouldn't have been possible.