Mirror of https://github.com/antirez/aocla, synced 2025-01-13 20:01:40 +01:00
README improved.
parent 788c6fd7b3
commit cc2c34dddd
1 changed files with 107 additions and 106 deletions
213
README.md
|
@ -1,20 +1,22 @@
|
|||
## Aocla: the Advent of Code toy language
|
||||
|
||||
Aocla (Advent of Code inspired Language) is a toy stack-based programming
|
||||
language written as an extension of [day 13 Advent of Code 2022 puzzle](https://adventofcode.com/2022/day/13).
|
||||
|
||||
It all started with me doing Advent of Code for the first time in my life. I hadn't written a line of code for two years, busy, as I was, writing my [sci-fi novel](https://www.amazon.com/Wohpe-English-Rimmel-Salvatore-Sanfilippo-ebook/dp/B0BQ3HRDPF/). I felt I needed to start coding again, but I was without a project in my hands. The AoC puzzles helped quite a lot, at first, but they tend to become repetitive and a bit futile after some time. Then something interesting happened. After completing day 13, a puzzle about comparing nested lists, I saw many other solutions resorting to `eval`. They are missing the point, I thought. To me, the puzzle seemed an hint at writing parsers for nested objects.
|
||||
This story starts with me doing Advent of Code for the first time in my life. I hadn't written a single line of code for two years, busy, as I was, writing my [sci-fi novel](https://www.amazon.com/Wohpe-English-Rimmel-Salvatore-Sanfilippo-ebook/dp/B0BQ3HRDPF/). I wanted to start coding again, but without a project in my hands, what to do? The AoC puzzles helped quite a lot, at first, but they become repetitive and a bit futile quite soon. After completing day 13, a puzzle about comparing nested lists, I saw many other solutions resorting to `eval`. They are missing the point, I thought. To me, the puzzle seemed a hint at writing parsers for nested objects.
|
||||
|
||||
Now, a nice fact about parsers of lists with integers and nested
|
||||
lists is that they are dangerously near, if written in the proper way, to become interpreters of Lisp-alike or FORTH-alike toy programming languages.
|
||||
lists is that they are dangerously close to becoming interpreters of Lisp-alike or FORTH-alike programming languages.
|
||||
|
||||
The gentle reader should be aware that I've a soft spot for [little languages](http://oldblog.antirez.com/page/picol.html). However, Picol was too much of a toy, while [Jim](http://jim.tcl.tk/index.html/doc/www/www/index.html) was too big as a coding example. I also like writing small programs that serve as [examples](https://github.com/antirez/kilo) of how you could design bigger programs, while retaining a manageable size. Don't took me wrong: it's not like I believe my code should be taken as an example, it's just that I learned a lot from such small programs, so, from time to time, I like writing new ones and sharing them. This time I wanted to obtain something of roughly the size of the Kilo editor, that is around ~1000 lines of code, showing the real world challenges arising when writing an actual interpreter for a programming language more complex than Picol. That's the result, and it worked for me: after Aocla I started writing more and more code, and now [I've a project, too](https://github.com/antirez/protoview).
|
||||
The gentle reader should be aware that I've a soft spot for [little languages](http://oldblog.antirez.com/page/picol.html). However, Picol was too much of a toy, while [Jim](http://jim.tcl.tk/index.html/doc/www/www/index.html) was too big as a coding example. Other than interpreters, I like writing small programs that serve as [examples](https://github.com/antirez/kilo) of how you could design bigger programs, while retaining a manageable size. Don't get me wrong: it's not like I believe my code should be taken as an example, it's just that I learned a lot from such small programs, so, from time to time, I like writing new ones, and while I'm at it I share them in the hope somebody could be interested. This time I wanted to obtain something of roughly the size of the Kilo editor, that is around 1000 lines of code, showing the real world challenges arising when writing an actual interpreter for a programming language more complex than Picol. That's the result, and as a side effect I really started coding again: after Aocla I started writing more and more code, and now [I've a new project, too](https://github.com/antirez/protoview).
|
||||
|
||||
## Let's start
|
||||
|
||||
This README will first explain the language briefly. Later we will talk extensively about the implementation and its design. Without counting comments, the Aocla implementation is less than 1000 lines of code, and the core itself is around 500 lines (the rest of the code is the library implementation, the REPL, and other accessory parts): I hope you will find the code easy to follow even if you are not used to C and to writing interpreters. I tried to keep all simple, as I always do when I write code, for myself and the others having the misfortune of modifying it in the future.
|
||||
This README will first explain the language briefly. Later we will talk extensively about the implementation and its design. Without counting comments, the Aocla implementation is shorter than 1000 lines of code, and the core itself is around 500 lines (the rest of the code is the library implementation, the REPL, and other accessory parts). I hope you will find the code easy to follow, even if you are not used to C and to writing interpreters. I tried to keep everything simple, as I always do when I write code, for myself and the others having the misfortune of reading or modifying it in the future.
|
||||
|
||||
Not every feature I desired to have is implemented, and certain data types, like the string type, lack any useful procedure to work with them. This choice was made in order to avoid making the source code more complex than needed, and also, on my side, to avoid writing too much useless code, given that this language will never be used in the real world. Besides, implementing some of the missing parts is a good exercise for the willing reader, assuming she or he are new to this kind of stuff. Even with all this limitations, it is possible to write small working programs with Aocla, and that's all we need.
|
||||
Not every feature I desired to have is implemented, and certain data types, like the string type, lack any useful procedure to work with them. This choice was made in order to avoid making the source code more complex than needed, and also, on my side, to avoid writing too much useless code, given that this language will never be used in the real world. Besides, implementing some of the missing parts is a good exercise for the willing reader, assuming she or he is new to this kind of stuff. Even with all these limitations, it is possible to write small working programs with Aocla, and that's all we need for our goals.
|
||||
|
||||
# Aocla
|
||||
## Aocla overview
|
||||
|
||||
Aocla is a very simple language, more similar to Joy than to FORTH (higher level). It has a total of six datatypes:
|
||||
|
||||
|
@ -25,7 +27,7 @@ Aocla is a very simple language, more similar to Joy than to FORTH (higher level
|
|||
* Tuples: `(x y z)`
|
||||
* Strings: `"Hello World!\n"`
|
||||
|
||||
Floating point numbers are not provided for simplicity (writing an implementation should not be too hard, and is a good exercise). Aocla programs are valid Aocla lists, so the language is [homoiconic](https://en.wikipedia.org/wiki/Homoiconicity). While Aocla is a stack-based language, like FORTH, Joy and Factor, it introduces the idea of *local variables capturing*. Because of this construct, Aocla programs look a bit different (and simpler to write and understand in my opinion) compared to other stack-based languages. However locals capturing is optional: any program using locals can be rewritten to avoid using them.
|
||||
Floating point numbers are not provided for simplicity (writing an implementation should not be too hard, and is a good exercise). Aocla programs are valid Aocla lists, so the language is [homoiconic](https://en.wikipedia.org/wiki/Homoiconicity). While Aocla is a stack-based language, like FORTH, Joy and Factor, it introduces the idea of *local variables capturing*. Because of this construct, Aocla programs look a bit different (and simpler to write and understand in my opinion) compared to other stack-based languages. Locals capturing is optional: any program using locals can be rewritten to avoid using them, yet the existence of this feature deeply affects the language in many ways.
|
||||
|
||||
## Our first program
|
||||
|
||||
|
@ -33,7 +35,7 @@ The following is a valid Aocla program, taking 5 and squaring it, to obtain 25.
|
|||
|
||||
[5 dup *]
|
||||
|
||||
Since all the programs must be lists, and thus are enclosed between `[` and `]`, both the Aocla CLI (Command Line Interface) and the execution of programs from files are designed to avoid needing the brackets. Aocla will put the program inside `[]` for you, so the above program should be written like that:
|
||||
Since all the programs must be valid lists, and thus are enclosed between `[` and `]`, both the Aocla CLI (Command Line Interface) and the execution of programs from files are designed to avoid needing the brackets. Aocla will put the program inside `[]` for you, so the above program should be written like this:
|
||||
|
||||
5 dup *
|
||||
|
||||
|
@ -53,15 +55,15 @@ Finally, if an Aocla word is a symbol starting with the `$` character and a sing
|
|||
|
||||
5 (x) $x $x *
|
||||
|
||||
The ability to capture stack values into locals allow to make complex stack manipulation in a simple way, and make programs more explicit to read and easier to write. Still they have the remarkably quality of not making the language semantically more complex (if not for a small thing we will cover later -- search `upeval` inside this document if you want to know ASAP, but if you know the Tcl programming language, you already understood from the name). In general, while locals help the handling of the stack in the local context of the procedure, words communicate via the stack, so the main advantages of stack-based languages are untouched.
|
||||
The ability to capture stack values into locals makes complex stack manipulations simple, and makes programs more explicit to read and easier to write. Still, locals have the remarkable quality of not making the language semantically more complex (except for a small thing we will cover later -- search `upeval` inside this document if you want to know ASAP, but if you know the Tcl programming language, you already understood from the name, which is similar to Tcl's `uplevel`). In general, while locals help the handling of the stack in the local context of the procedure, words communicate via the stack, so the main advantages of stack-based languages are untouched.
|
||||
|
||||
*Note: why allowing locals with just single letter names? The only reason is to make the implementation of the Aocla interpreter simpler to understand. This way, we don't need to make use of any dictionary data structure. If I would design Aocla to be a real language, I would remove this limitation.*
|
||||
*Note: why must locals have single-letter names? The only reason is to make the implementation of the Aocla interpreter simpler to understand. This way, we don't need to make use of any dictionary data structure. If I were designing Aocla to be a real language, I would remove this limitation.*
|
||||
|
||||
We said that symbols normally trigger a procedure call. But symbols can also be pushed on the stack like any other value. To do so, symbols must be quoted, with the `'` character at the start.
|
||||
|
||||
'Hello printnl
|
||||
|
||||
The `printnl` procedure prints the last element in the stack and also prints a newline character, so the above program will just print `Hello` on the screen. For now you may wonder what's the point of quoting symbols: you could just use strings, but later we'll see this is important in order to write Aocla programs that write Aocla programs.
|
||||
The `printnl` procedure prints the last element in the stack and also prints a newline character, so the above program will just print `Hello` on the screen. You may wonder what's the point of quoting symbols. After all, you could just use strings, but later we'll see how this is important in order to write Aocla programs that write Aocla programs.
|
||||
|
||||
Quoting also works with tuples, so if you want to push the tuple `(a b c)` on the stack, instead of capturing the variables a, b and c, you can write:
|
||||
|
||||
|
@ -76,7 +78,7 @@ in REPL mode (Read Eval Print Loop). You write a code fragment, press enter, the
|
|||
1
|
||||
aocla> 2
|
||||
1 2
|
||||
aocla> ['a 'b "foo"]
|
||||
aocla> [a b "foo"]
|
||||
1 2 [a b "foo"]
|
||||
|
||||
This way you always know the stack content.
|
||||
|
@ -86,18 +88,17 @@ When you execute programs from files, in order to debug their executions you can
|
|||
|
||||
Aocla programs are just lists, and Aocla functions are lists bound to a
|
||||
name. The name is given as a symbol, and the way to bind a list with a
|
||||
symbol is an Aocla procedure itself, and not special syntax:
|
||||
symbol is an Aocla procedure itself. No special syntax is required.
|
||||
|
||||
[dup *] 'square def
|
||||
|
||||
The `def` procedure will bind the list `[dup *] to the `square` symbol,
|
||||
The `def` procedure will bind the list `[dup *]` to the `square` symbol,
|
||||
so later we can use the `square` symbol and it will call our procedure:
|
||||
|
||||
aocla> 5 square
|
||||
25
|
||||
|
||||
Calling a symbol (not quoted symbols are called by default) that is not
|
||||
bound to any program will produce an error:
|
||||
Calling a symbol that is not bound to any list will produce an error:
|
||||
|
||||
aocla> foobar
|
||||
Symbol not bound to procedure: 'foobar' in unknown:0
|
||||
|
@ -106,12 +107,14 @@ bound to any program will produce an error:
|
|||
|
||||
Lists are the central data structure of the language: they are used to represent programs and are useful as a general purpose data structure to represent data. So most of the very few built-in procedures that Aocla offers are lists manipulation procedures.
|
||||
|
||||
Showing by examples, via the REPL, is probably the simplest way to show how to write Aocla programs. This pushes an empty list on the stack:
|
||||
The most direct way to show how to write Aocla programs is probably through examples in its REPL, so I'll proceed this way.
|
||||
|
||||
To push an empty list on the stack, you can use:
|
||||
|
||||
aocla> []
|
||||
[]
|
||||
|
||||
We can add elements to the tail or head of the list, using the `<-` and `->` procedures:
|
||||
Then it is possible to add elements to the tail or the head of the list using the `<-` and `->` procedures:
|
||||
|
||||
aocla> 1 swap ->
|
||||
[1]
|
||||
|
@ -147,8 +150,7 @@ Then, to know how many elements there are in the list, we can use the
|
|||
aocla> len
|
||||
4 3
|
||||
|
||||
Other useful list operations are the following, that you may find quite
|
||||
obvious if you have any Lisp background:
|
||||
Other useful list operations are the following:
|
||||
|
||||
aocla> [1 2 3] [4 5 6] cat
|
||||
[1 2 3 4 5 6]
|
||||
|
@ -171,8 +173,7 @@ If you want to do something with list elements, in an imperative way, you can us
|
|||
2
|
||||
3
|
||||
|
||||
There are a few more list procedures. `get@` to get a specific element in
|
||||
a given position, `sort`, to sort a list, and if I remember correctly nothing
|
||||
There are a few more list procedures. There is `get@` to get a specific element in a given position, `sort`, to sort a list, and if I remember correctly nothing
|
||||
more about lists. Many of the above procedures are implemented inside the
|
||||
C source code of Aocla, in Aocla language itself. Others are implemented
|
||||
in C because of performance concerns or because it was simpler to do so.
|
||||
|
@ -188,14 +189,14 @@ For instance, this is the implementation of `foreach`:
|
|||
] while
|
||||
] 'foreach def
|
||||
|
||||
As you can see from the above code, Aocla syntax also supports comments:
|
||||
anything from `//` to the end of the line is ignored.
|
||||
As you can see from the above code, Aocla syntax also supports comments.
|
||||
Anything starting from `//` to the end of the line is ignored.
|
||||
|
||||
## Conditionals
|
||||
|
||||
Aocla conditionals are just `if` and `ifelse`. There is also a
|
||||
quite imperative looping construct, that is `while`. You could loop
|
||||
in the Scheme way, using recursion, but I like to give the language
|
||||
in the Scheme way, using recursion, but I wanted to give the language
|
||||
a Common Lisp vibe, where you can write imperative code, too.
|
||||
|
||||
The words `if` and `ifelse` do what you could imagine:
|
||||
|
@ -208,7 +209,7 @@ The words `if` and `ifelse` do what you could imagine:
|
|||
So `if` takes two programs (two lists), one is evaluated to see if it is
|
||||
true or false. The other is executed only if the first program is true.
|
||||
|
||||
The same is true for ifelse, but it takes three programs: condition, true-program, false-program:
|
||||
The `ifelse` procedure works similarly, but it takes three programs: condition, true-program, false-program:
|
||||
|
||||
aocla> 9 (a)
|
||||
aocla> [$a 11 ==] ["11 reached" printnl] [$a 1 + (a)] ifelse
|
||||
|
@ -230,7 +231,7 @@ And finally, an example of while:
|
|||
2
|
||||
1
|
||||
|
||||
Or, for a longer but more usual program making use of Aocla locals:
|
||||
Or, for a longer but more recognizable program making use of Aocla locals:
|
||||
|
||||
aocla> 10 (x) [$x 0 >] [$x printnl $x 1 - (x)] while
|
||||
10
|
||||
|
@ -244,12 +245,12 @@ Or, for a longer but more usual program making use of Aocla locals:
|
|||
2
|
||||
1
|
||||
|
||||
Basically two programming styles are possible: one that uses the stack
|
||||
In some way, two programming styles are possible: one that uses the stack
|
||||
mainly in order to pass state from different procedures, and otherwise
|
||||
uses locals a lot for local state, and another one where almost everything
|
||||
will use the stack, like in FORTH, and locals will be used only from time
|
||||
to time when stack manipulation is less clear. For instance Imagine
|
||||
I've three values on the stack:
|
||||
will use the stack, like in FORTH. Even in the second case, locals can be
|
||||
used from time to time when stack manipulation is clearer with them.
|
||||
For instance, imagine I have three values on the stack:
|
||||
|
||||
aocla> 1 2 3
|
||||
1 2 3
|
||||
|
@ -268,13 +269,13 @@ implemented in C, even if they could and probably should for performance
|
|||
reasons (and this is why `while` is implemented in C).
|
||||
|
||||
In order to implement procedures that execute code, Aocla provides the
|
||||
`eval` built-in word. It just consumes the list on the top of the
|
||||
`eval` built-in word. It just consumes the list at the top of the
|
||||
stack and evaluates it.
|
||||
|
||||
aocla> 5 [dup dup dup] eval
|
||||
5 5 5 5
|
||||
|
||||
In the above example we executed the list containing the program that calls
|
||||
In the above example, we executed the list containing the program that calls
|
||||
`dup` three times. Let's write a better example, a procedure that executes
|
||||
the same code a specified number of times:
|
||||
|
||||
|
@ -294,19 +295,22 @@ Example usage:
|
|||
## Eval and local variables
|
||||
|
||||
There is a problem with the above implementation of `repeat`, it does
|
||||
not mix well with local variables:
|
||||
not mix well with local variables. The following program will not have the expected behavior:
|
||||
|
||||
aocla> 10 (x) 3 [$x printnl] repeat
|
||||
Unbound local var: '$x' in eval:0 in unknown:0
|
||||
|
||||
Here the problem is that once we call a new procedure, that is `repeat`,
|
||||
the local variable `x` no longer exist in the context of the called
|
||||
procedure. So when `repeat` evaluates our program we get an error.
|
||||
the local variable `x` no longer exists in the context of the called
|
||||
procedure. It belongs to the previous procedure, that is, in this specific
|
||||
case, the *top level* execution stack frame. So when `repeat` evaluates our
|
||||
program we get an error.
|
||||
|
||||
This is the only case where Aocla local variables make the semantics of
|
||||
Aocla more complex than other stack based languages without this feature.
|
||||
In order to solve the problem above, Aocla has a specialized form of
|
||||
`eval` that is called `upeval`: it executes a program in the context
|
||||
(stack frame, in low level terms) of the caller. Let's rewrite
|
||||
In order to solve the problem above, Aocla has a special form of
|
||||
`eval` called `upeval`: it executes a program in the context
|
||||
(again, stack frame, in low level terms) of the caller. Let's rewrite
|
||||
the `repeat` procedure using `upeval`:
|
||||
|
||||
[(n l)
|
||||
|
@ -356,12 +360,15 @@ Because of that, we can write programs writing programs. For instance let's
|
|||
create a program that creates a procedure incrementing a variable of
|
||||
the specified name.
|
||||
|
||||
The procedure exects two elements on the stack: the name of the procedure
|
||||
we want to create, and the variable name that the procedure will increment:
|
||||
The procedure expects two objects on the stack: the name of the procedure
|
||||
we want to create, and the variable name that the procedure will increment. Two symbols, basically:
|
||||
|
||||
proc-name, var-name
|
||||
|
||||
And here is the program to do this:
|
||||
This is the listing of the procedure. Each line is commented, but since it is
|
||||
written in a language that you didn't know until ten minutes ago, and a rather
|
||||
strange one at that, you may want to carefully read each word it is
|
||||
composed of.
|
||||
|
||||
[ (p v) // Procedure, var.
|
||||
[] // Accumulate our program into an empty list
|
||||
|
@ -382,14 +389,15 @@ a list like that (you can check the intermediate results by adding
|
|||
|
||||
And finally the list is bound to the specified symbol using `def`.
|
||||
|
||||
Certain times programs that write programs can be quite useful. They are a
|
||||
*Note: programs like the above show that, after all, maybe the `->` and `<-` operators should expect the arguments in reverse order. Maybe I'll change my mind.*
|
||||
|
||||
Certain times, programs that write programs can be quite useful. They are a
|
||||
central feature in many Lisp dialects. However in the specific case of
|
||||
Aocla different procedures can be composed via the stack, and we also
|
||||
Aocla, different procedures can be composed via the stack, and we also
|
||||
have `upeval`, so I feel their usefulness is greatly reduced. Also note
|
||||
that if Aocla was a serious language, it would have a lot more constructs
|
||||
to making writing programs that write programs a lot simpler than the above. Anyway, as you saw earlier, when we implemented the `repeat` procedure, in Aocla
|
||||
you can already do interesting stuff without using this programming
|
||||
paradigm.
|
||||
to make writing programs that write programs a lot simpler than the above. Anyway, as you saw earlier, when we implemented the `repeat` procedure, in Aocla
|
||||
it is possible to do interesting stuff without using this programming paradigm.
|
||||
|
||||
Ok, I think that's enough. We saw the basics of stack languages, the specific
|
||||
stuff Aocla adds and what the language feels like. This isn't a course
|
||||
|
@ -397,19 +405,19 @@ on stack languages, nor I would be the best person to talk about the
|
|||
argument. This is a course on how to write a small interpreter in C, so
|
||||
let's dive into the Aocla interpreter internals.
|
||||
|
||||
# Aocla internals
|
||||
# From puzzle 13 to Aocla
|
||||
|
||||
At the start of this README I told you Aocla started from an Advent of
|
||||
Code puzzles. The Puzzle could be solved by parsing representations
|
||||
of lists like that, and then writing a comparison function for
|
||||
the representations of the lists (well, actually this is how I solved it,
|
||||
Code puzzle. The puzzle could be solved by parsing the literal representation
|
||||
of lists like the one below, and then writing a comparison function for
|
||||
the internal representation of the lists (well, actually this is how I solved it,
|
||||
but one could even take the approach of comparing *while* parsing,
|
||||
probably). This is an example of such lists:
|
||||
|
||||
[1,[2,[3,[4,[5,6,7]]]],8,9]
|
||||
|
||||
Parsing such lists representations was not too hard, however this is
|
||||
not single-level object, as it has elements that are sub lists. So
|
||||
Parsing flat lists is not particularly hard, however this is
|
||||
not a single-level object. It has elements that are sub-lists. So
|
||||
a recursive parser was the most obvious solution. This is what I wrote
|
||||
back then, the 13th of December:
|
||||
|
||||
|
@ -432,7 +440,7 @@ Why `elfobj`? Well, because it was Christmas and AoC is about elves.
|
|||
The structure above is quite trivial, just two types and a union in order
|
||||
to represent both types.
|
||||
|
||||
Let's see the parser:
|
||||
Let's see the parser, which is surely more interesting.
|
||||
|
||||
/* Given the string 's' return the elfobj representing the list or
|
||||
* NULL on syntax error. '*next' is set to the next byte to parse, after
|
||||
|
@ -505,22 +513,22 @@ as I already said, recursive. To parse each element of the list we call
|
|||
the same function again and again. This will make the magic of handling
|
||||
any complex nested list without having to do anything special. I know, I know.
|
||||
This is quite obvious for experienced enough programmers, but I claim it
|
||||
is still kinda of magic, like a Mandelbrot set, like standing with a mirror
|
||||
in front of another mirror admiring the infinite repeating images one
|
||||
inside the other. Recursion remains magic even when it was understood.
|
||||
is still kind of a revelation, like a Mandelbrot set, like standing with a
|
||||
mirror in front of another mirror admiring the infinite repeating images one
|
||||
inside the other. Recursion remains magic even after it is understood.
|
||||
|
||||
Second point to note: the function gets a pointer to a string, and returns
|
||||
the object parsed and the pointer to the start of the next object to parse,
|
||||
that is just at some offset inside the same list. This is a very comfortable
|
||||
way to write such a parser: we can call the same function again to get
|
||||
the next object in a loop to parse all the tokens and sub-tokens. And I'm
|
||||
saying tokens for a reason, because the same exact structure can be used
|
||||
also when writing tokenizers that just return tokens one after the other,
|
||||
without any conversion to object.
|
||||
The second point to note: the function gets a pointer to a string, and returns
|
||||
the object parsed and also, by reference, the pointer to the start of the *next*
|
||||
object to parse, that is just at some offset inside the same string.
|
||||
This is a very comfortable way to write such a parser: we can call the same
|
||||
function again to get the next object in a loop to parse all the tokens and
|
||||
sub-tokens. And I'm saying tokens for a reason, because the same exact
|
||||
structure can be used also when writing tokenizers that just return tokens
|
||||
one after the other, without any conversion to object.
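
The "return the parsed thing, and set `*next` to where parsing stopped" pattern described above can be sketched in isolation. This is not the puzzle 13 parser: `sk_parse_int()` and `sk_sum_ints()` are hypothetical names used only for illustration, and the sketch handles just whitespace-separated integers.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Aocla: parse one integer starting
 * at 's', skipping leading spaces. On success return 1, store the value
 * in '*val' and set '*next' to the first byte after the number.
 * On failure return 0 and leave '*val' and '*next' untouched. */
int sk_parse_int(const char *s, long *val, const char **next) {
    assert(s != NULL && val != NULL);
    while (isspace((unsigned char)*s)) s++;
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s) return 0; /* Nothing that looks like a number here. */
    *val = v;
    if (next) *next = end;
    return 1;
}

/* Call the parser in a loop, each time restarting from the position
 * the previous call returned by reference. Stops at the first byte
 * that is not part of an integer. */
long sk_sum_ints(const char *s) {
    long sum = 0, v;
    const char *p = s;
    while (sk_parse_int(p, &v, &p)) sum += v;
    return sum;
}
```

The caller never computes offsets by hand: each call says where the next one should resume, which is what makes the recursive version for nested lists so compact.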
|
||||
|
||||
Now, what I did was to take this program and make it the programming language
|
||||
you just learned about in the first part of this README. How? Well, to
|
||||
start I redefined a much more complex object type:
|
||||
start I upgraded the object structure for more complex object types:
|
||||
|
||||
/* Type are defined so that each type ID is a different set bit, this way
|
||||
* in checkStackType() we may ask the function to check if some argument
|
||||
|
@ -555,12 +563,11 @@ start I redefined a much more complex object type:
|
|||
};
|
||||
} obj;
|
||||
|
||||
Well, important things to note, since this may look like just an extension
|
||||
of the original puzzle 13 code, but look at these differences:
|
||||
A few important things to note, since this may look like just a trivial extension of the original puzzle structure, but it's not:
|
||||
|
||||
1. We now use reference counting. When the object is allocated, it gets a *refcount* of 1. Then the functions retain() and release() are used in order to increment the reference count when we store the same object elsewhere, or when we want to remove a reference. Finally the references drop to zero and the object gets freed.
|
||||
2. The object types now are all power of two. This means we can store or pass to functions multiple types at once in a single integer, just performing the bitwise ore. It's useful. No need for functions with a variable number of arguments just to pass many times.
|
||||
3. There is some information about the line number where a given object was defined in the source code. Aocla can be a toy, but a toy that will try to give you some stack trace if there is a runtime error.
|
||||
1. We now use reference counting. When the object is allocated, it gets a *refcount* of 1. Then the functions `retain()` and `release()` are used in order to increment the reference count when we store the same object elsewhere, or when we want to remove a reference. Finally, when the references drop to zero, the object gets freed.
|
||||
2. The object types now are all powers of two: single bits, in binary representation. This means we can store or pass multiple types at once in a single integer, just performing the *bitwise or*. It is useful in practice. No need for functions with a variable number of arguments just to pass many types at once.
|
||||
3. There is information about the line number where a given object was defined in the source code. Aocla can be a toy, but a toy that will try to give you some stack trace if there is a runtime error.
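
To make the second point concrete, here is a minimal sketch of a type check driven by a bitmask. The `SK_TYPE_*` constants, the `sk_obj` structure and `sk_type_matches()` are hypothetical stand-ins, not the actual Aocla definitions; they only assume that each type ID is a distinct bit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type IDs: each one is a different set bit. */
#define SK_TYPE_INT   (1<<0)
#define SK_TYPE_LIST  (1<<1)
#define SK_TYPE_STR   (1<<2)

/* Hypothetical object with just the field we need here. */
typedef struct sk_obj {
    int type; /* Exactly one of the SK_TYPE_* bits. */
} sk_obj;

/* Return 1 if the type of 'o' is one of the types or-ed together
 * in 'mask', otherwise return 0. */
int sk_type_matches(const sk_obj *o, int mask) {
    assert(o != NULL);
    return (o->type & mask) != 0;
}
```

A caller accepting either a list or a string just passes `SK_TYPE_LIST|SK_TYPE_STR` as a single integer argument.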
|
||||
|
||||
This is the release() function.
|
||||
|
||||
|
@ -588,13 +595,13 @@ This is the release() function.
|
|||
}
|
||||
}
|
||||
|
||||
Note that in this implementation deeply nested data structures will produce many recursive calls. This can be avoided using lazy freeing, but not needed for something like Aocla.
|
||||
Note that in this implementation, deeply nested data structures will produce many recursive calls. This can be avoided using *lazy freeing*, but that's not needed for something like Aocla. Curious readers may want to search for *lazy freeing* on the web.
|
||||
|
||||
So, thanks to our parser, we can take an Aocla program, in the form of a string, parse it and get an Aocla object (`obj*` type) back. Now, in order to run an Aocla program, we have to *execute* this object. Stack based languages are particularly simple to execute: we just go form left to right, and depending on the object type, we do a different action:
|
||||
Thanks to our parser, we can take an Aocla program, in the form of a string, parse it and get an Aocla object (`obj*` type) back. Now, in order to run an Aocla program, we have to *execute* this object. Stack based languages are particularly simple to execute: we just go from left to right, and depending on the object type, we do different actions:
|
||||
|
||||
* If the object is a symbol (and is not quoted, see the `quoted` field in the object structure), we try to lookup a procedure with that name, and if it exists we execute the procedure. How? By recursively execute the list bound to the symbol.
|
||||
* If the object is a symbol (and is not quoted, see the `quoted` field in the object structure), we try to lookup a procedure with that name, and if it exists we execute the procedure. How? By recursively executing the list bound to the symbol.
|
||||
* If the object is a tuple with single-character elements, we capture the variables on the stack.
|
||||
* If it's a symbol starting with `$` we push the variable on the stack, or if the variable is not bound we raise an error.
|
||||
* If it's a symbol starting with `$` we push the variable on the stack, and if the variable is not bound, we raise an error.
|
||||
* For any other type of object, we just push it on the stack.
|
||||
|
||||
The function responsible to execute the program is called `eval()`, and is so short we can put it fully here, but I'll present the function split in different parts, to explain each one carefully. I will start showing just the first three lines, as they already tell us something.
|
||||
|
@ -604,7 +611,7 @@ The function responsible to execute the program is called `eval()`, and is so sh
|
|||
|
||||
for (size_t j = 0; j < l->l.len; j++) {
|
||||
|
||||
Here there are three things going on. Eval() takes a context and a list. The list is our program, and it is scanned left-to-right, as Aocla programs are executed left to right, word by word. So all is obvious but the context, what is an execution context for our program?
|
||||
Here there are three things going on. `eval()` takes a context and a list. The list is our program, and it is scanned left-to-right, as Aocla programs are executed left to right, word by word. All should be clear but the context. What is an execution context for our program?
|
||||
|
||||
/* Interpreter state. */
|
||||
#define ERRSTR_LEN 256
|
||||
|
@ -635,7 +642,7 @@ It contains the following elements:
|
|||
|
||||
The stack frame has a pointer to the previous stack frame. This is useful both in order to implement `upeval` and to show a stack trace when an exception happens and the program is halted.
|
||||
|
||||
We can continue looking at eval() now. We stopped at the `for` loop, so now we are inside the iteration doing something with each element of the list:
|
||||
We can continue looking at the remaining parts of eval() now. We stopped at the `for` loop, so now we are inside the iteration doing something with each element of the list:
|
||||
|
||||
obj *o = l->l.ele[j];
|
||||
aproc *proc;
|
||||
|
@ -669,7 +676,7 @@ We can continue looking at eval() now. We stopped at the `for` loop, so now we a
|
|||
}
|
||||
break;
|
||||
|
||||
The essence of the loop is a bit `switch` statement doing something different depending on the object type. The object is just the current element of the list. The first case, is the tuple. Tuples capture local variables, unless they are quoted like this:
|
||||
The essence of the loop is a `switch` statement doing something different depending on the object type. The object is just the current element of the list. The first case is the tuple. Tuples capture local variables, unless they are quoted like this:
|
||||
|
||||
(a b c) // Normal tuple -- This will capture variables
|
||||
`(a b c) // Quoted tuple -- This will be pushed on the stack
|
||||
|
@ -700,8 +707,7 @@ from the Aocla stack to the stack frame, into the array representing the locals.
|
|||
stackPush(ctx,ctx->frame->locals[idx]);
|
||||
retain(ctx->frame->locals[idx]);
|
||||
|
||||
For symbols, as usually we check if the symbol is quoted, an in such case we just push it on the stack. Otherwise, we handle two different cases. The above is the one where symbol names start with a `$`. It is, basically, the reverse of
|
||||
what we saw earlier in tuples capturing local vars. This time the local variable is transferred to the stack. However *we still take the reference* in the local variable array, as the program may want to push the same variable again and again, so, after pushing the object on the stack, we have to call `retain()` to increment the reference count of the object.
|
||||
For symbols, as we did for tuples, we check if the symbol is quoted, and in such a case we just push it on the stack. Otherwise, we handle two different cases. The above is the one where symbol names start with a `$`. It is, basically, the reverse operation of what we saw earlier in tuples capturing local vars. This time the local variable is transferred to the stack. However **we still keep the reference** in the local variable array, as the program may want to push the same variable again and again, so, after pushing the object on the stack, we have to call `retain()` to increment the reference count of the object.
|
||||
|
||||
If the symbol does not start with `$`, then it's a procedure call:
|
||||
|
||||
|
@ -736,21 +742,18 @@ and returns a list object or, if there is no such procedure defined, NULL.
|
|||
Now what happens immediately after is much more interesting. Aocla procedures
|
||||
are just list objects, but it is possible to implement Aocla procedures
|
||||
directly in C. If the `cproc` is not NULL, then it is a C function pointer
|
||||
implementing a procedure, otherwise the procedure is *used defined*, written
|
||||
in Aocla, and we need to evaluate it, with a nested `eval()` call.
|
||||
As you can see, recursion is crucial in writing interpreters.
|
||||
implementing a procedure, otherwise the procedure is *user defined*, that means it is written in Aocla, and we need to evaluate it. We do this with a nested `eval()` call. As you can see, recursion is crucial in writing interpreters.
|
||||
|
||||
Another important thing is that each new Aocla procedure has its own set
|
||||
of local variables. The scope of local variables, in Aocla, is the
|
||||
lifetime of the procedure call, like in many other languages. So before
|
||||
calling al Aocla procedure we allocate a new stack frame with `newStackFrame()`, then we call `eval()`, free the stack frame and store the old one. Procedures implemented in C don't need a stack frame, as they will not make any use of Aocla local variables.
|
||||
*A little digression: if we would like to speed up procedure calls, we could cache the procedure lookup directly inside the symbol object. However, in Aocla procedures can be redefined, so the next time the same procedure name may be bound to a different procedure. To still cache looked-up procedures, a simple way is to use the concept of "epoch". The context has a 64 bit integer called epoch, that is incremented every time a procedure is redefined. So, when we cache the procedure lookup into the object, we also store the current value of the epoch. Then, before using the cached value, we check if the epochs match. If there is no match, we perform the lookup again, and update the cached procedure and the epoch.*
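To make the epoch idea less abstract, here is a minimal sketch of it. This is not code from Aocla: the types `miniproc`, `minictx` and `minisym`, and the functions `slowLookup()`, `defineProc()` and `cachedLookup()`, are all hypothetical names invented for this example, and the procedure table is just a linked list where the most recent definition shadows the older ones.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types: NOT the real Aocla structures. */
typedef struct miniproc {
    const char *name;
    int id;                   /* Stand-in for the procedure body. */
    struct miniproc *next;
} miniproc;

typedef struct {
    miniproc *procs;          /* Defined procedures, newest first. */
    uint64_t epoch;           /* Incremented at every (re)definition. */
} minictx;

typedef struct {
    const char *name;         /* Symbol name. */
    miniproc *cached;         /* Result of the last lookup, or NULL. */
    uint64_t cached_epoch;    /* Context epoch when 'cached' was stored. */
} minisym;

/* Slow path: linear scan of the procedures list. */
miniproc *slowLookup(minictx *c, const char *name) {
    for (miniproc *p = c->procs; p; p = p->next)
        if (!strcmp(p->name, name)) return p;
    return NULL;
}

/* Define or redefine a procedure. The new definition is put at the head
 * of the list, so it shadows older ones. Bumping the epoch invalidates
 * every lookup cached inside symbols. */
void defineProc(minictx *c, miniproc *p) {
    p->next = c->procs;
    c->procs = p;
    c->epoch++;
}

/* Fast path: reuse the cached procedure only if nothing was (re)defined
 * since we cached it. Otherwise look it up again and refresh the cache. */
miniproc *cachedLookup(minictx *c, minisym *s) {
    assert(s->name != NULL);
    if (s->cached == NULL || s->cached_epoch != c->epoch) {
        s->cached = slowLookup(c, s->name);
        s->cached_epoch = c->epoch;
    }
    return s->cached;
}
```

The trade-off of a single global epoch is that redefining any procedure invalidates all the cached lookups, not just the ones for that name. For a toy language where redefinitions are rare, that is probably good enough.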
|
||||
|
||||
Sorry, let's go back to our `eval` function. Another important thing that's worth noting is that each new Aocla procedure call has its own set of local variables. The scope of local variables, in Aocla, is the lifetime of the procedure call, like in many other languages. So, in the code above, before calling an Aocla procedure we allocate a new stack frame using `newStackFrame()`, then we can finally call `eval()`, free the stack frame and store the old stack frame back in the context structure. Procedures implemented in C don't need a stack frame, as they will not make any use of Aocla local variables. The following is the last part of the `eval()` function implementation:
|
||||
|
||||
default:
|
||||
stackPush(ctx,o);
|
||||
retain(o);
|
||||
break;
|
||||
|
||||
This is the final, default behavior for all the other objects. They get pushed on the stack, and that's it.
|
||||
This is the default behavior for all the other object types. They get pushed on the stack, and that's it.
|
||||
|
||||
Let's see how Aocla C-coded procedures are implemented, by observing the
|
||||
C function implementing basic mathematical operations such as +, -, ...
|
||||
|
@ -773,7 +776,7 @@ C function implementing basic mathematical operations such as +, -, ...
|
|||
return 0;
|
||||
}
|
||||
|
||||
Here we cheat: the code to implement each procedure would be almost the same so we check the name of the procedure called, and bind all the operators to the same function:
|
||||
Here I cheated: the code required to implement each math procedure separately would be almost the same. So we bind all the operators to the same C function, and check the name of the procedure called inside a single implementation (see the above function). Here is where we register many procedures to the same C function:
|
||||
|
||||
void loadLibrary(aoclactx *ctx) {
|
||||
addProc(ctx,"+",procBasicMath,NULL);
|
||||
|
@ -783,14 +786,13 @@ Here we cheat: the code to implement each procedure would be almost the same so
|
|||
...
|
||||
|
||||
The `procBasicMath()` is quite self-documenting, I guess. The proof of that
|
||||
is that I didn't add any comment inside the function. It checks the type
|
||||
of the top objects on the stack, as they must be integers. Get them
|
||||
with `stackPop()`, perform the math, push a new integer object, release the
|
||||
old ones. That's it.
|
||||
is that I didn't add any comment inside the function. When comments are needed, I add them automatically, I can't help myself. Anyway, what it does is
|
||||
the following: it checks the type of the top objects on the stack, as they
|
||||
must be integers. Get them with `stackPop()`, perform the math, push a new integer object, release the two old ones. That's it.
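The "one C function, many operators" trick can be sketched in isolation as follows. This is a simplified stand-in, not the real `procBasicMath()`: the stack holds plain `long` values instead of reference counted objects, and the names `ministack`, `miniBasicMath()`, `callProc()`, the set of four operators and the "non-zero means error" return convention are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MINI_STACK_MAX 64

/* Hypothetical stack of plain integers (no objects, no refcounts). */
typedef struct {
    long items[MINI_STACK_MAX];
    size_t len;
} ministack;

/* A single implementation for all the operators: the name used to call
 * the procedure selects the operation. Returns 0 on success, 1 on error.
 * On error the stack is left untouched. */
int miniBasicMath(ministack *s, const char *name) {
    assert(s->len <= MINI_STACK_MAX);
    if (s->len < 2) return 1;           /* Not enough operands. */
    long b = s->items[s->len - 1];
    long a = s->items[s->len - 2];
    long res;
    switch (name[0]) {
    case '+': res = a + b; break;
    case '-': res = a - b; break;
    case '*': res = a * b; break;
    case '/':
        if (b == 0) return 1;           /* Division by zero. */
        res = a / b;
        break;
    default: return 1;                  /* Unknown operator. */
    }
    s->len -= 2;                        /* Pop the two operands... */
    s->items[s->len++] = res;           /* ...and push the result. */
    return 0;
}

/* Many names registered to the same C function. */
typedef int (*minicproc)(ministack *s, const char *name);
typedef struct { const char *name; minicproc fn; } procEntry;

static const procEntry library[] = {
    {"+", miniBasicMath}, {"-", miniBasicMath},
    {"*", miniBasicMath}, {"/", miniBasicMath},
};

/* Look up the procedure by name and call it, passing the name along. */
int callProc(ministack *s, const char *name) {
    for (size_t i = 0; i < sizeof(library) / sizeof(library[0]); i++)
        if (!strcmp(library[i].name, name)) return library[i].fn(s, name);
    return 1;                           /* No such procedure. */
}
```

With `7` and `3` on the stack, calling `callProc(&s, "-")` leaves the single value `4` on the stack. The important detail is that the procedure receives the name it was called with, otherwise a shared implementation could not tell the operators apart.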
|
||||
|
||||
## Deep copy of objects
|
||||
|
||||
Well, believe it or not, that's it: you already saw all the most important
|
||||
Believe it or not, that's it: you already saw all the most important
|
||||
parts of the Aocla interpreter. But there are a few corner cases that
|
||||
are worth a few more paragraphs of this README.
|
||||
|
||||
|
@ -800,7 +802,7 @@ Imagine the execution of the following Aocla program:
|
|||
4 $x -> // Now the stack contains the list [1 2 3 4]
|
||||
$x // What will be x now? [1 2 3] or [1 2 3 4]?
|
||||
|
||||
Well, Aocla is designed to be kinda a *pure* language: words manipulate
|
||||
Aocla is designed to be kind of a *pure* language: words manipulate
|
||||
objects by taking them from the stack and pushing new objects to the
|
||||
stack, that result from certain operations. We don't want to expose the
|
||||
idea of references in such a language, I feel like that would be a mess,
|
||||
|
@ -811,13 +813,12 @@ at `x`.
|
|||
|
||||
At the same time, we don't want to write an inefficient crap where each
|
||||
value is copied again and again. When we push our variable content on
|
||||
the stack we just push the pointer and increment the reference count.
|
||||
In order to have the best of both world, we want to implement something
|
||||
the stack, we just push the pointer to the object and increment the reference
|
||||
count. In order to have the best of both worlds, we want to implement something
|
||||
called *copy on write*. So normally our objects can be shared, and thanks
|
||||
to the count of references we know if it is shared or not, if losing a
|
||||
reference is going to free the object or not. However as soon as some
|
||||
operation is going to alter an object whose reference count is greater
|
||||
than one, it gets copied first, only later modified.
|
||||
to the count of references we know if an object is shared or not.
|
||||
However, as soon as some operation is going to alter an object whose
|
||||
reference count is greater than one, it gets copied first, only later modified.
|
||||
|
||||
In the above program, the list reference count is 2, because the same list
|
||||
is stored in the array of local variables and in the stack. Let's
|
||||
|
@ -845,7 +846,7 @@ give a look at the implementation of the `->` operator:
|
|||
return 0;
|
||||
}
|
||||
|
||||
The interesting like here is the following one:
|
||||
The interesting line here is the following one:
|
||||
|
||||
obj *l = getUnsharedObject(stackPop(ctx));
|
||||
|
||||
|
@ -861,14 +862,14 @@ the work for us. Let's check, in turn, its implementation:
|
|||
}
|
||||
}
|
||||
|
||||
So if the object is already unshared (its *refcount* is one), just return it as it is. Otherwise create a copy and remove a reference from the original object. This may look odd, but think at it: the invariant here should be that the caller of this function is the only owner of this object. If we want the caller to be able to abstract totally what happened inside the function, if the object was shared and we returned the caller a copy, the reference the caller had for the old object should be gone. Let's look at the following example:
|
||||
So if the object is already not shared (its *refcount* is one), just return it as it is. Otherwise create a copy and remove a reference from the original object. Why, on copy, do we need to remove a reference from the passed object? This may look odd at first glance, but think about it: the invariant here should be that the caller of this function is the only owner of the object. We want the caller to be able to abstract totally what happens inside the `getUnsharedObject()` function. If the object was shared and we returned the caller a copy, the reference the caller had for the old object should be gone. Let's look at the following example:
|
||||
|
||||
obj *o = stackPop(ctx);
|
||||
o = getUnsharedObject(o);
|
||||
doSomethingThatChanges(o);
|
||||
stackPush(ctx,o);
|
||||
|
||||
Stack pop and push functions don't change the reference counting of the object,
|
||||
Stack pop and push functions don't change the reference count of the object,
|
||||
so if the object is not shared we get it with a single reference, change it,
|
||||
push it on the stack and the object still has a single reference.
|
||||
|
||||
|
@ -881,8 +882,8 @@ fine. What about the old object stored in the local variable? It should
|
|||
have a reference count of one as well, but if we don't `release()` it
|
||||
in `getUnsharedObject()` it would have two, causing a memory leak.
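The reference count bookkeeping described above can be checked with a small self-contained model. This is only a sketch under simplifying assumptions: the object is a flat list of `int` rather than a real Aocla object, so the copy is not recursive, and the names `intlist`, `newList()`, `retainList()`, `releaseList()`, `unshared()` and `appendInt()` are made up for this example. The function `unshared()` plays the role of `getUnsharedObject()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified object: a flat list of ints with a refcount. */
typedef struct {
    int refcount;
    size_t len;
    int *ele;
} intlist;

/* Create a list with a single reference, copying 'len' ints from 'src'. */
intlist *newList(const int *src, size_t len) {
    intlist *l = malloc(sizeof(*l));
    if (!l) abort();
    l->refcount = 1;
    l->len = len;
    l->ele = malloc(sizeof(int) * (len ? len : 1));
    if (!l->ele) abort();
    if (len) memcpy(l->ele, src, sizeof(int) * len);
    return l;
}

void retainList(intlist *l) { l->refcount++; }

/* Drop one reference, freeing the list when the last one is gone. */
void releaseList(intlist *l) {
    assert(l->refcount > 0);
    if (--l->refcount == 0) {
        free(l->ele);
        free(l);
    }
}

/* Copy on write: if the caller is the only owner, return the list as it
 * is. Otherwise return a private copy and drop the caller's reference to
 * the shared original, so that no reference is leaked. */
intlist *unshared(intlist *l) {
    if (l->refcount == 1) return l;
    intlist *copy = newList(l->ele, l->len);
    releaseList(l);
    return copy;
}

/* Mutating operation: only legal on a list that is not shared. */
void appendInt(intlist *l, int v) {
    assert(l->refcount == 1);
    int *grown = realloc(l->ele, sizeof(int) * (l->len + 1));
    if (!grown) abort();
    l->ele = grown;
    l->ele[l->len++] = v;
}
```

To replay the program discussed in this section: create the list `[1 2 3]` (one reference, owned by the local variable), call `retainList()` to model `$x` pushing it on the stack (two references), then call `unshared()` on the popped pointer before `appendInt()`. The result is a new list with refcount one, while the list still stored in the local variable is unchanged and back to refcount one.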
|
||||
|
||||
I'll not show the `deepCopy()` function, it just allocates a new object of the specified type and copy the content. But guess what? It's a recursive function.
|
||||
I'll not show the `deepCopy()` function, it just allocates a new object of the specified type and copies the content. But guess what? It's a recursive function, too. That's why it is a *deep* copy.
|
||||
|
||||
# The end
|
||||
|
||||
That's it, and thanks for reading that far. To know more about interpreters you have only one thing to do: write your own, or radically modify Aocla in some crazy ways. Get your hands dirty, it's super fun and rewarding. I can only promise that what you will learn will be worthwhile, even if you'll never write an interpreter again.
|
||||
That's it, and thanks for reading this far. To know more about interpreters you have only one thing to do: write your own, or radically modify Aocla in some crazy way. Get your hands dirty, it's super fun and rewarding. I can only promise that what you will learn will be worthwhile, even if you'll never write an interpreter again.
|
||||
|
|