waforth/README.md
Remko Tronçon 24ed2a3013 Tweak README
2019-11-08 22:50:31 +01:00

5.2 KiB

WAForth: Forth Interpreter+Compiler for WebAssembly

WAForth is a bootstrapping Forth interpreter and dynamic compiler for WebAssembly. You can see it in a demo here.

It is entirely written in (raw) WebAssembly, and the compiler generates WebAssembly code on the fly. The only parts for which it relies on external (JavaScript) code is to dynamically load modules (since WebAssembly doesn't support JIT yet), and the I/O primitives to read and write a character to a screen.

Parts of the implementation were influenced by jonesforth.

WAForth is still in an experimental stage. It implements most of the ANS Core Words, and passes most of the Forth 200x Test Suite core word tests.

Install Dependencies

The build uses the WebAssembly Binary Toolkit for converting raw WebAssembly text format into the binary format, and Yarn for managing the dependencies of the shell.

brew install wabt yarn
yarn

Building & Running

To build everything:

make

To run the development server:

make dev-server

Testing

The tests are served from /tests by the development server.

You can also run the tests in Node.JS by running

make check

Design

The Preprocessor

The WAForth core is written as a single module in WebAssembly's text format. The text format isn't really meant for writing code in, so it has no facilities like a real assembler (e.g. constant definitions, macro expansion, ...). To help with maintenance, the WebAssembly file is piped through a simple string preprocessing script that replaces constants with defined values.

The Interpreter

The interpreter runs a loop that processes commands, and switches to and from compiler mode.

Contrary to some other Forth systems, this system doesn't use direct threading for executing code. WebAssembly doesn't allow unstructured jumps, let alone dynamic jumps. Instead, WAForth uses subroutine threading, where each word is implemented as a single WebAssembly function, and the system uses calls and indirect calls (see below) to execute words.

The Compiler

While in compile mode for a word, the compiler generates WebAssembly instructions in binary format (since there is no assembler infrastructure in the browser). Since WebAssembly doesn't support JIT compilation yet, a finished word is bundled into a separate binary WebAssembly module, and sent to the loader, which dynamically loads it and registers it with a shared function table at the next offset, which in turn is recorded in the word dictionary.

Because words reside in different modules, all calls to and from the words need to happen as indirect call_indirect calls through the shared function table. This of course introduces some overhead, although it appears limited.

As WebAssembly doesn't support unstructured jumps, control flow words (IF/ELSE/THEN, LOOP, REPEAT, ...) can't be implemented in terms of more basic words, unlike in jonesforth. However, since Forth only requires structured jumps, the compiler can easily be implemented using the loop and branch instructions available in WebAssembly.

Finally, the compiler adds minimal debug information about the compiled word in the name section, making it easier for doing some debugging in the browser.

Debugger view of a compiled
word

The Loader

The loader is a small bit of JavaScript that uses the WebAssembly JavaScript API to dynamically load a compiled word (in the form of a WebAssembly module), and ensuring that the shared function table is large enough for the module to register itself.

The Shell

The shell is a JavaScript class that wraps the WebAssembly module, and loads it in the browser. It provides the I/O primitives to the WebAssembly module to read and write characters to a terminal, and externally provides a run() function to execute a fragment of Forth code.

To tie everything together into an interactive system, there's a small console-based interface around this shell to type Forth code, which you can see in action here.

WAForth Console

Misc notes

  • The exposed return stack isn't used. Control flow is kept implicitly in the code (e.g. through branches, indirect calls, ...). This also means that control flow can't be influenced by code.