waforth/src/waforthc
2022-12-18 18:49:23 +01:00
..
.gitignore waforthc,standalone: Add sieve example 2022-12-18 18:49:23 +01:00
Makefile waforthc,standalone: Add sieve example 2022-12-18 18:49:23 +01:00
README.md waforthc: Add link to download page 2022-11-25 19:59:30 +01:00
rt.c waforthc/rt: Fix Linux warning 2022-11-12 17:22:51 +01:00
waforthc.cpp waforthc: Fix compiler flags 2022-11-16 19:53:33 +01:00

waforthc: (WebAssembly-based) Forth native compiler

waforthc uses WAForth, WABT, and the host's C compiler to compile a Forth program into a native executable. WebAssembly is used as the host runtime platform and intermediate representation during compilation, and then compiled into an executable that no longer contains any WebAssembly infrastructure. See How it works for more details.

Download

You can download a pre-built binary from the Releases page.

Usage example

Let's compile a program hello.fs:

$ cat hello.fs↩
.( Compiling word... ) 
: SAY_HELLO  
  ." Hello, Forth" CR 
; 
.( Compiled! Running compiled word from within compiler: )
SAY_HELLO

$ waforthc --output=hello hello.fs↩
Compiling word... Compiled! Running compiled word from within compiler: Hello, Forth

Since the Forth compiler is dynamic, you can execute previously compiled functions from within the compiler, as you can see from the console output above.

By default, the resulting executable is a standalone interactive Forth interpreter, with the compiled functions available:

$ ./hello↩
4 2 * .↩
8 ok
SAY_HELLO↩
Hello, Forth
ok
BYE↩

Instead of an interactive shell that accepts user input, you can also tell waforthc to make the executable run a fixed script when it is started:

$ waforthc --output=hello --init=SAY_HELLO hello.fs↩
Compiling word... Compiled! Running compiled word from within compiler: Hello, Forth

$ ./hello↩
Hello, Forth

If you want an optimized binary, you can pass extra C compiler flags to the build:

$ waforthc --ccflag=-O2 --output=hello --init=SAY_HELLO hello.fs↩

Contrary to the standalone native WAForth, the resulting binary does not contain a WebAssembly engine, and therefore the compiler infrastructure is no longer available:

$ ls -l hello↩
-rwxr-xr-x 1 remko remko 116k Nov 11 13:18 hello

$ ./hello↩
: SAY_BYE ." Bye" CR ;
Compilation is not available in native compiled mode

If you have a cross-compiling C compiler, you can also cross-compile your Forth program to a different architecture:

$ waforthc --cc=arm-linux-gnueabi-gcc --ccflag=-static --ccflag=-O2 --output=hello --init=SAY_HELLO hello.fs↩

How it works

The waforthc compiler (waforthc.cpp) works as follows:

  • waforthc runs an embedded WAForth WebAssembly module using the WABT reference WebAssembly interpreter, with the given Forth input program as input.
  • While interpreting/compiling, WAForth generates new WebAssembly modules for each compiled word. When a new generated WebAssembly module is loaded into the WebAssembly runtime, waforthc keeps track of the raw binary form of the word.
  • When WAForth is finished running the input program, some state is extracted from the runtime, so it can be restored later:
    • The current pointer to the dictionary entry of the compiled word (aka latest).
    • All the data between the initial data stack pointer and the current data stack pointer (which includes new dictionary entries, new strings, and any data stored in the data area)
  • A new WAForth WebAssembly module is constructed, which replicates the state of the runtime after running the input program:
    • The IR representation of the embedded WAForth module is loaded using WABT's module reader
    • The IR representation of every raw binary WebAssembly module generated during compilation is loaded using WABT's module reader. Every module contains 1 function (with 1 corresponding entry into the shared table).
    • The function and table entry of each of these modules are appended to the WAForth module. The index of the function is updated accordingly in the new module
    • A data segment is appended to the new WAForth module, containing the entire data stack portion recorded after compilation
    • The initializer expression of the global variables that contain the end-of-datastack pointer (here) and the pointer to the latest dictionary entry (latest) are updated to reflect the new values
  • The resulting WebAssembly module (containing the entire WAForth system, including all the newly compiled words and data) is converted to C using WABT's WebAssembly-to-C convertor. A C runtime file is also generated to provide implementations of the I/O methods, and to drive the core's run loop. If an initial program is given to waforthc, this is also included statically into the source code, and used as the input for the new WAForth system (instead of the default, standard input).
  • The resulting C program is compiled into a native executable using the platform's C compiler (gcc).

Future work

Currently, all the compiler does is combine the modules generated by WAForth into a single module. It should be easy to do some post-processing on the resulting module to optimize the result:

  • Compiled words use indirect calls to call other words. This causes significant execution overhead. Since in the resulting binary, all words are in the same WebAssembly module, all indirect calls in the generated modules can be replaced by direct calls.

  • A dead-code elimination pass could remove unnecessary words from the WAForth core.

Instead of compiling the resulting module to native, it can also be used in e.g. the web environment.