They are defined in the math library, but the library is optional.
So, it is necessary to include it by `#include <math.h>` and also link the library with the linker.
- 6: Gets gtk4 library.
- 8: Gets gnome module. See [Meson build system website -- GNOME module](https://mesonbuild.com/Gnome-module.html#gnome-module) for further information.
- 9: Compiles ui file to C source file according to the XML file `turtle.gresource.xml`.
- 11: Gets flex.
- 12: Gets bison.
- 13: Compiles `turtle.y` to `turtle_parser.c` and `turtle_parser.h` by bison.
The function `custom_target` creates a custom top level target.
See [Meson build system website -- custom target](https://mesonbuild.com/Reference-manual.html#custom_target) for further information.
- 14: Compiles `turtle.lex` to `turtle_lex.c` by flex.
- 16: Specifies C source files.
- 18: Compiles C source files including generated files by glib-compile-resources, bison and flex.
The argument `turtleparser[1]` refers to `turtle_parser.h`, which is the second output in line 13.
## Turtle.lex
### What does flex do?
Flex creates a lexical analyzer from a flex source file.
A flex source file is a text file.
Its syntactic rules will be explained later.
The generated lexical analyzer is a C source file.
It is also called a scanner.
It reads a text file, which is a source file of a programming language, and extracts variable names, numbers and symbols.
Suppose here is a turtle source file.
~~~
fc (1,0,0) # Foreground color is red, rgb = (1,0,0).
pd # Pen down.
distance = 100
angle = 90
fd distance # Go forward by distance (100) pixels.
tr angle # Turn right by angle (90) degrees.
~~~
The content of the text file is separated into `fc`, `(`, `1` and so on.
The words `fc`, `pd`, `distance`, `angle`, `tr`, `1`, `0`, `100` and `90` are called tokens.
The characters '`(`' (left parenthesis), '`,`' (comma), '`)`' (right parenthesis) and '`=`' (equal sign) are called symbols.
(Sometimes those symbols are called tokens, too.)
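To make the separation concrete, here is a minimal hand-written sketch that splits one line of turtle source into words and one-character symbols. It is not what flex generates; the function name `split_line` and the size limits are assumptions made only for this illustration.

```c
#include <ctype.h>
#include <string.h>

#define MAX_TOKENS 32
#define MAX_TOKEN_LEN 32

/* Returns 1 if c can be part of a word such as "fc", "distance" or "1.5". */
static int is_word_char(char c) {
  return isalnum((unsigned char) c) || c == '_' || c == '.';
}

/* Splits one line into words and one-character symbols.
   Text after '#' is a comment and is ignored.
   Returns the number of pieces stored in out. */
int split_line(const char *line, char out[][MAX_TOKEN_LEN], int max) {
  int n = 0;
  const char *p = line;

  while (*p != '\0' && *p != '#' && n < max) {
    if (isspace((unsigned char) *p)) {
      ++p;                                  /* skip white space */
    } else if (is_word_char(*p)) {
      int len = 0;
      while (is_word_char(*p)) {
        if (len < MAX_TOKEN_LEN - 1)        /* overlong words are truncated */
          out[n][len++] = *p;
        ++p;
      }
      out[n][len] = '\0';
      ++n;
    } else {                                /* a symbol such as '(' or '=' */
      out[n][0] = *p++;
      out[n][1] = '\0';
      ++n;
    }
  }
  return n;
}
```

Calling `split_line` on the first line of the example yields eight pieces: `fc`, `(`, `1`, `,`, `0`, `,`, `0` and `)`.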
Flex reads `turtle.lex` and generates the C source file of a scanner.
The file `turtle.lex` specifies tokens, symbols and the behavior which corresponds to each token or symbol.
Turtle.lex isn't a big program.
~~~lex
1 %top{
2 #include<string.h>
3 #include<stdlib.h>
4 #include<glib.h>
5 #include "turtle_parser.h"
6
7 static int nline = 1;
8 static int ncolumn = 1;
9 static void get_location (char *text);
10
11 /* Dynamically allocated memories are added to the single list. They will be freed in the finalize function. */
~~~
The file consists of three sections, which are separated by "%%" (lines 18 and 56).
They are the definitions, rules and user code sections.
### Definitions section
- 1-12: Lines between "%top{" and "}" are C source code.
They are copied to the top of the generated C source file.
- 2-3: The function `strlen`, in line 65, is declared in `string.h`.
The function `atof`, in line 40, is declared in `stdlib.h`.
- 7-9: The current input position is tracked by `nline` and `ncolumn`.
The function `get_location` (lines 61-66) sets `yylloc` to point to the start and end of `yytext` in the buffer.
It is declared here so that it can be called before it is defined.
- 12: A GSList is used to keep track of allocated memory.
- 15: This option (`%option noyywrap`) must be specified when the scanner reads only a single source file. Refer to "9 The Generated Scanner" in the flex documentation in your distribution for further information.
(The documentation is not on the internet.)
- 17-18: `REAL_NUMBER` and `IDENTIFIER` are names.
A name begins with a letter or an underscore followed by zero or more letters, digits, underscores (`_`) or dashes (`-`).
They are followed by regular expressions which are their definitions.
They will be used in rules section and will expand to the definition.
You can leave out such definitions here and use regular expressions in rules section directly.
### Rules section
This section is the most important part.
Rules consist of patterns and actions.
The patterns are regular expressions or names surrounded by braces.
The names must be defined in the definitions section.
The syntax of regular expressions is described in the flex documentation.
For example, line 40 is a rule.
- `{REAL_NUMBER}` is a pattern.
- `get_location (yytext); yylval.NUM = atof (yytext); return NUM;` is an action.
`{REAL_NUMBER}` is defined in the line 17, so it expands to `(0|[1-9][0-9]*)(\.[0-9]+)?`.
This regular expression matches numbers like `0`, `12` and `1.5`.
If an input is a number, it matches the pattern in line 40.
Then the matched text is assigned to `yytext` and the corresponding action is executed.
The function `get_location` updates the location variables to the position of the matched text.
The action assigns `atof (yytext)`, which is `yytext` converted to a double, to `yylval.NUM` and returns `NUM`.
`NUM` is a token kind and it represents a number.
It is defined in `turtle.y`.
The scanner generated by flex has a function `yylex`.
If `yylex` is called and the input is "123.4", then it works as follows.
1. The string "123.4" matches `{REAL_NUMBER}`.
2. `get_location` updates the location variables `ncolumn` and `yylloc`.
3. The function `atof` converts the string "123.4" to the double 123.4.
4. The value is assigned to `yylval.NUM`.
5. `yylex` returns `NUM` to the caller.
Then the caller knows the input is a number (`NUM`), and its value is 123.4.
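The steps above can be imitated outside flex. The following is a minimal sketch, assuming a POSIX system with `<regex.h>`; the function `scan_number` and the token kinds `SKETCH_NUM` and `SKETCH_UNDEF` are hypothetical names for this illustration. Unlike a flex scanner, which matches the longest prefix of the input, this sketch tests whether the whole string is a real number.

```c
#include <regex.h>
#include <stdlib.h>

enum { SKETCH_UNDEF = 0, SKETCH_NUM = 1 };   /* hypothetical token kinds */

/* Returns SKETCH_NUM and stores the converted value in *value if the whole
   text matches the REAL_NUMBER definition; otherwise returns SKETCH_UNDEF. */
int scan_number(const char *text, double *value) {
  regex_t re;
  int kind = SKETCH_UNDEF;

  /* ^ and $ are added so that the pattern must cover the whole string. */
  if (regcomp(&re, "^(0|[1-9][0-9]*)(\\.[0-9]+)?$",
              REG_EXTENDED | REG_NOSUB) != 0)
    return SKETCH_UNDEF;
  if (regexec(&re, text, 0, NULL, 0) == 0) {
    *value = atof(text);                     /* same conversion as the action */
    kind = SKETCH_NUM;
  }
  regfree(&re);
  return kind;
}
```

With this sketch, `scan_number ("123.4", &d)` returns `SKETCH_NUM` and stores 123.4 in `d`, while `"012"` and `"1."` are rejected.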
- 20-58: Rules section.
- 21: The symbol `.` (dot) matches any character except newline.
Therefore, a comment begins with `#`, followed by any characters except newline.
No action happens.
- 22: White space just increases the variable `ncolumn` by one.
- 23: Tab is assumed to be equal to eight spaces.
- 24: New line increases a variable `nline` by one and resets `ncolumn`.
- 26-38: Keywords just update the location variables `ncolumn` and `yylloc`, and return the token kinds of the keywords.
- 40: Real number constant.
- 42: `IDENTIFIER` is defined in line 18.
The location variables are updated and the name of the identifier is assigned to `yylval.ID`.
The memory of the name is allocated by the function `g_strdup`.
The memory is registered in the list (a GSList).
The memory will be freed after the runtime routine finishes.
A token kind `ID` is returned.
- 46-56: Symbols just update the location variable and return the token kinds.
The token kind is the same as the symbol itself.
- 58: If the input doesn't match the patterns, then it is an error.
A special token kind `YYUNDEF` is returned.
### User code section
This section is just copied to the C source file.
- 61-66: A function `get_location`.
The location of the input is recorded in `nline` and `ncolumn`.
The variable `yylloc` is referred to by the parser.
It is a C structure and has four members, `first_line`, `first_column`, `last_line` and `last_column`.
They point to the start and end of the current input text.
- 68: `YY_BUFFER_STATE` is a pointer to the input buffer.
- 70-73: A function `init_flex` is called by `run_cb` which is a "clicked" signal handler on the `Run` button.
It has one string type parameter.
The caller passes the content of the GtkTextBuffer instance to it.
A function `yy_scan_string` sets the input buffer for the scanner.
- 75-78: A function `finalize_flex` is called after runtime routine finishes.
It deletes the input buffer.
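The location bookkeeping described above (white space, tab and newline in the rules section, and `get_location` in the user code section) can be sketched in plain C as follows. This is an illustration under assumptions, not the listing itself: `location_t` and `loc` stand in for the type and variable that bison provides as `yylloc`, and `skip_char` is a hypothetical helper.

```c
#include <string.h>

typedef struct {
  int first_line, first_column, last_line, last_column;
} location_t;                   /* stand-in for the type of yylloc */

static int nline = 1;
static int ncolumn = 1;
static location_t loc;          /* stand-in for yylloc */

/* Records where the matched text starts and ends, then advances ncolumn. */
void get_location_sketch(const char *text) {
  loc.first_line = loc.last_line = nline;
  loc.first_column = ncolumn;
  ncolumn += (int) strlen(text);
  loc.last_column = ncolumn - 1;
}

/* Handles the characters that produce no token. */
void skip_char(char c) {
  if (c == ' ') {
    ncolumn += 1;
  } else if (c == '\t') {
    ncolumn += 8;               /* a tab is assumed to be eight spaces */
  } else if (c == '\n') {
    nline += 1;                 /* go to the next line ... */
    ncolumn = 1;                /* ... and back to its first column */
  }
}
```

For the input `fd distance`, the token `fd` occupies columns 1 to 2 and `distance` occupies columns 4 to 11.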
## Turtle.y
Turtle.y has more than 800 lines so it is difficult to explain all the source code.
So I will explain the key points and leave out other less important parts.
### What does bison do?
Bison creates a C source file of a parser from a bison source file.
The bison source file is a text file.
A parser analyzes a program source code according to its grammar.
Suppose here is a turtle source file.
~~~
fc (1,0,0) # Foreground color is red, rgb = (1,0,0).
pd # Pen down.
distance = 100
angle = 90
fd distance # Go forward by distance (100) pixels.
tr angle # Turn right by angle (90) degrees.
~~~
The parser calls `yylex` to get a token.
The token consists of its type (token kind) and value (semantic value).
So, the parser gets items in the following table whenever it calls `yylex`.
| |token kind|yylval.ID|yylval.NUM|
|:-:|:--------:|:-------:|:--------:|
| 1 | FC | | |
| 2 | ( | | |
| 3 | NUM | | 1.0 |
| 4 | , | | |
| 5 | NUM | | 0.0 |
| 6 | , | | |
| 7 | NUM | | 0.0 |
| 8 | ) | | |
| 9 | PD | | |
|10 | ID |distance | |
|11 | = | | |
|12 | NUM | | 100.0 |
|13 | ID | angle | |
|14 | = | | |
|15 | NUM | | 90.0 |
|16 | FD | | |
|17 | ID |distance | |
|18 | TR | | |
|19 | ID | angle | |
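The pairing of a token kind with a semantic value can be sketched as follows. This is a hedged illustration: `yylex_sketch`, `yylval_sketch`, `token_t` and the numeric token kinds are assumptions for this example (bison generates the real ones from `turtle.y`), and the tokens are canned rather than scanned from text.

```c
#include <stddef.h>

/* Hypothetical token kinds. A symbol uses its own character code. */
enum { T_END = 0, T_FC = 258, T_NUM, T_ID };

/* A union like the one behind yylval: a token uses at most one member. */
typedef union {
  const char *ID;
  double NUM;
} semantic_value_t;

typedef struct {
  int kind;
  semantic_value_t value;
} token_t;

semantic_value_t yylval_sketch;   /* stand-in for yylval */

/* The tokens of "fc (1,0,0)", rows 1 to 8 of the table. */
static const token_t input[] = {
  {T_FC, {NULL}}, {'(', {NULL}},
  {T_NUM, {.NUM = 1.0}}, {',', {NULL}},
  {T_NUM, {.NUM = 0.0}}, {',', {NULL}},
  {T_NUM, {.NUM = 0.0}}, {')', {NULL}},
  {T_END, {NULL}}
};
static size_t next_token = 0;

/* Returns the kind of the next token and stores its value in yylval_sketch.
   Once the input is exhausted it keeps returning T_END. */
int yylex_sketch(void) {
  const token_t *t = &input[next_token];

  if (t->kind != T_END)
    ++next_token;
  yylval_sketch = t->value;
  return t->kind;
}
```

Each call hands one row of the table to the caller: the return value is the token kind, and the value, if any, is left in the shared variable.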
Bison source code specifies the grammar rules of turtle language.
For example, `fc (1,0,0)` is called a primary procedure.
A procedure is like a void type C function.
It doesn't return any values.
Programmers can define their own procedures.
On the other hand, `fc` is a built-in procedure.
Such procedures are called primary procedures.
It is described in bison source code like:
~~~
primary_procedure: FC '(' expression ',' expression ',' expression ')';
expression: ID | NUM;
~~~
This means:
- Primary procedure is FC followed by '(', expression, ',', expression, ',', expression and ')'.
- expression is ID or NUM.
The description above is called BNF (Backus-Naur form).
Precisely speaking, it is not exactly the same as BNF.
But the difference is small.
The first line of the turtle source file above, `fc (1,0,0)`, is the following sequence of tokens:
~~~
FC '(' NUM ',' NUM ',' NUM ')';
~~~
The parser analyzes the turtle source code and if the input matches the definition above, the parser recognizes it as a primary procedure.
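What "matches the definition" means can be shown with a direct check of a token sequence against the two simplified rules above. Note that this is not how bison works: bison generates a table-driven LALR(1) parser by default, whereas the sketch below simply compares the tokens one by one. The token kinds and the function names are hypothetical.

```c
/* Hypothetical token kinds. A symbol uses its own character code. */
enum { K_FC = 258, K_NUM, K_ID };

/* expression: ID | NUM; */
static int is_expression(int kind) {
  return kind == K_ID || kind == K_NUM;
}

/* Returns 1 if the n token kinds have the form
   FC '(' expression ',' expression ',' expression ')', otherwise 0. */
int is_fc_procedure(const int *kinds, int n) {
  /* 0 marks a position where an expression is expected. */
  static const int shape[] = { K_FC, '(', 0, ',', 0, ',', 0, ')' };
  int i;

  if (n != 8)
    return 0;
  for (i = 0; i < 8; ++i) {
    if (shape[i] == 0) {
      if (!is_expression(kinds[i]))
        return 0;
    } else if (kinds[i] != shape[i]) {
      return 0;
    }
  }
  return 1;
}
```

The sequence `FC '(' NUM ',' NUM ',' NUM ')'` is accepted, and so is the same sequence with `ID` in place of any `NUM`.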
The grammar of turtle is described in the [Turtle manual](https://toshiocp.github.io/Gtk4-tutorial/turtle_doc.html).
The following is an extract from the document.
~~~
program:
statement
| program statement
;
statement:
primary_procedure
| procedure_definition
;
primary_procedure:
PU
| PD
| PW expression
| FD expression
| TR expression
| TL expression
| BC '(' expression ',' expression ',' expression ')'
| FC '(' expression ',' expression ',' expression ')'
| ID '=' expression
| IF '(' expression ')' '{' primary_procedure_list '}'
| RT
| RS
| RP '(' expression ')' '{' primary_procedure_list '}'
| ID '(' ')'
| ID '(' argument_list ')'
;
procedure_definition:
DP ID '(' ')' '{' primary_procedure_list '}'
| DP ID '(' parameter_list ')' '{' primary_procedure_list '}'
;
parameter_list:
ID
| parameter_list ',' ID
;
argument_list:
expression
| argument_list ',' expression
;
primary_procedure_list:
primary_procedure
| primary_procedure_list primary_procedure
;
expression:
expression '=' expression
| expression '>' expression
| expression '<' expression
| expression '+' expression
| expression '-' expression
| expression '*' expression
| expression '/' expression
| '-' expression %prec UMINUS
| '(' expression ')'
| ID
| NUM
;
~~~
The grammar rule defines `program` first.
- program is a statement or a program followed by a statement.
The definition is recursive.
- `statement` is a program.
- `statement statement` is `program statement`.
Therefore, it is a program.
- `statement statement statement` is `program statement`.
Therefore, it is a program.
In this way, any sequence of statements is a program.
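The repeated replacement can be traced with a tiny sketch. It uses the character `s` for a statement and `P` for a program, which are notations invented for this example, and it counts how many times one of the two alternatives of `program` is applied. It only illustrates the recursion and is not the algorithm of the generated parser.

```c
/* Reduces a string of statements ('s') to a single program ('P').
   Returns the number of reductions, or 0 if the input is empty or
   contains something other than statements. */
int reduce_program(const char *input) {
  char stack[2];
  int top = 0;          /* number of symbols on the stack */
  int reductions = 0;

  for (; *input != '\0'; ++input) {
    if (*input != 's')
      return 0;
    stack[top++] = 's';               /* read one statement */
    if (top == 1) {
      stack[0] = 'P';                 /* program: statement */
    } else {
      stack[0] = 'P';                 /* program: program statement */
      top = 1;
    }
    ++reductions;
  }
  return top == 1 ? reductions : 0;
}
```

Three statements need three reductions: the first alternative once, then the second alternative twice.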
`program` and `statement` aren't tokens.
They don't appear in the input.
They are called nonterminal symbols.
On the other hand, tokens are called terminal symbols.
The word "token" is used here in a wide sense: it includes both the tokens and the symbols which appear in the input.
Nonterminal symbols are often shortened to nterm.
- `proc_lookup` looks up a procedure. If the procedure is found, it returns a pointer to the node. Otherwise it returns NULL.
- `var_lookup` looks up a variable. If the variable is found, it returns TRUE and sets the pointer (argument) to point to the value. Otherwise it returns FALSE.
- `var_replace` replaces the value of a variable. If the variable hasn't been registered yet, it installs the variable.
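The behavior of the two variable functions can be sketched independently of the tutorial's table. The following is only an illustration under assumptions: the names carry a `_sketch` suffix, the table holds variables only, names are copied into fixed-size buffers, and the value is returned through a `double` pointer. The real functions may differ in all of these respects.

```c
#include <string.h>

#define TABLE_SIZE 100
#define NAME_SIZE 32

typedef struct {
  char name[NAME_SIZE];
  double value;
} variable_t;

static variable_t vars[TABLE_SIZE];
static int nvars = 0;             /* number of registered variables */

/* Returns 1 and stores the value in *value_p if name is registered.
   Otherwise returns 0 and leaves *value_p untouched. */
int var_lookup_sketch(const char *name, double *value_p) {
  for (int i = 0; i < nvars; ++i)
    if (strcmp(vars[i].name, name) == 0) {
      *value_p = vars[i].value;
      return 1;
    }
  return 0;
}

/* Replaces the value of a variable, installing it first if necessary.
   Returns 0 if the name is too long or the table is full, otherwise 1. */
int var_replace_sketch(const char *name, double value) {
  for (int i = 0; i < nvars; ++i)
    if (strcmp(vars[i].name, name) == 0) {
      vars[i].value = value;
      return 1;
    }
  if (strlen(name) >= NAME_SIZE || nvars >= TABLE_SIZE)
    return 0;
  strcpy(vars[nvars].name, name);   /* fits: the length was checked above */
  vars[nvars++].value = value;
  return 1;
}
```

Assigning to the same name twice leaves a single entry in the table; only its value changes.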
~~~C
int
tbl_lookup (int type, char *name) {
int i;
if (tp == 0)
return -1;
for (i=0; i<tp;++i)
if (type == table[i].type && strcmp(name, table[i].name) == 0)
if (! stack_replace (name, d)) /* First, tries to replace the value in the stack (parameter).*/
var_replace (name, d); /* If the above fails, tries to replace the value in the table. If the variable isn't in the table, installs it, */
break;
case N_IF:
if (eval (child1(node)))
execute (child2(node));
break;
case N_RT:
ret_level--;
break;
case N_RS:
pen = TRUE;
angle = 90.0;
cur_x = 0.0;
cur_y = 0.0;
line_width = 2.0;
fc.red = 0.0; fc.green = 0.0; fc.blue = 0.0;
/* To change background color, use bc. */
break;
case N_procedure_call:
name = name(child1(node));
node_t *proc = proc_lookup (name);
if (! proc)
runtime_error ("Procedure %s not defined.\n", name);
if (strcmp (name, name(child1(proc))) != 0)
runtime_error ("Unexpected error. Procedure %s is called, but invoked procedure is %s.\n", name, name(child1(proc)));
/* make tuples (parameter (name), argument (value)) and push them to the stack */
node_t *param_list;
node_t *arg_list;
param_list = child2(proc);
arg_list = child2(node);
if (param_list == NULL) {
if (arg_list == NULL) {
stack_push (NULL, 0.0); /* number of argument == 0 */
} else
runtime_error ("Procedure %s has different number of argument and parameter.\n", name);
}else {
/* Don't change the stack until finish evaluating the arguments. */
#define TEMP_STACK_SIZE 20
char *temp_param[TEMP_STACK_SIZE];
double temp_arg[TEMP_STACK_SIZE];
n = 0;
for (; param_list->type == N_parameter_list; param_list = child1(param_list)) {
if (arg_list->type != N_argument_list)
runtime_error ("Procedure %s has different number of argument and parameter.\n", name);
if (n >= TEMP_STACK_SIZE)
runtime_error ("Too many parameters. the number must be %d or less.\n", TEMP_STACK_SIZE);
temp_param[n] = name(child2(param_list));
temp_arg[n] = eval (child2(arg_list));
arg_list = child1(arg_list);
++n;
}
if (param_list->type == N_ID && arg_list -> type != N_argument_list) {
temp_param[n] = name(param_list);
temp_arg[n] = eval (arg_list);
if (++n >= TEMP_STACK_SIZE)
runtime_error ("Too many parameters. the number must be %d or less.\n", TEMP_STACK_SIZE);
temp_param[n] = NULL;
temp_arg[n] = (double) n;
++n;
} else
runtime_error ("Unexpected error.\n");
for (i = 0; i <n;++i)
stack_push (temp_param[i], temp_arg[i]);
}
ret_level = ++proc_level;
execute (child3(proc));
ret_level = --proc_level;
stack_return ();
break;
case N_procedure_definition:
name = name(child1(node));
proc_install (name, node);
break;
case N_primary_procedure_list:
execute (child1(node));
execute (child2(node));
break;
default:
runtime_error ("Unknown statement.\n");
}
}
~~~
A node `N_procedure_call` is created by the parser when it has found a user defined procedure call.
The procedure has been defined in the prior statement.
Suppose the parser reads the following example code.
~~~
dp drawline (angle, distance) {
tr angle
fd distance
}
drawline (90, 100)
drawline (90, 100)
drawline (90, 100)
drawline (90, 100)
~~~
This example draws a square.
When the parser reads lines one to four, it creates nodes like this:
![Nodes of drawline](../image/tree2.png)
The runtime routine just stores the procedure in the symbol table with its name and node.
![Symbol table](../image/table.png)
When the parser reads the fifth line in the example, it creates nodes like this:
![Nodes of procedure call](../image/proc_call.png)
When the runtime routine meets `N_procedure_call` node, it behaves like this:
1. Searches the symbol table for the procedure with the name.
2. Gets pointers to the parameter node and the body node.
3. Creates a temporary stack.
Makes a tuple of each parameter name and argument value.
Pushes the tuples onto the stack, and finally pushes (NULL, number of parameters).
If no error occurs, copies them from the temporary stack to the parameter stack.
4. Increases `proc_level` by one.
Sets `ret_level` to the same value as `proc_level`.
`proc_level` is zero when the runtime routine runs in the main routine.
Each time it goes into a procedure, `proc_level` increases by one.
Therefore, `proc_level` is the depth of the procedure call.
`ret_level` is the level to return to.
If it is the same as `proc_level`, the runtime routine executes the commands in the procedure in order.
If it is smaller than `proc_level`, the runtime routine doesn't execute commands until `proc_level` falls back to the same level.
`ret_level` is used to return from the procedure.
5. Executes the node of the body of the procedure.
6. Decreases `proc_level` by one.
Sets `ret_level` to the same value as `proc_level`.
Calls `stack_return`.
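The parameter stack used in steps 3 and 6 can be sketched like this. It is a simplified illustration, not the tutorial's implementation: the `_sketch` names, the stack size and the lookup function are assumptions. What it keeps from the description is the layout of a frame, namely the (name, value) tuples followed by a final (NULL, number of parameters) marker.

```c
#include <string.h>

#define STACK_SIZE 64

typedef struct {
  const char *name;   /* parameter name, or NULL for the end-of-frame marker */
  double value;       /* argument value, or the parameter count in a marker */
} tuple_t;

static tuple_t stack[STACK_SIZE];
static int sp = 0;    /* number of tuples on the stack */

/* Pushes one tuple. Returns 0 if the stack is full, otherwise 1. */
int stack_push_sketch(const char *name, double value) {
  if (sp >= STACK_SIZE)
    return 0;
  stack[sp].name = name;
  stack[sp++].value = value;
  return 1;
}

/* Looks for a parameter in the newest frame only. */
int stack_lookup_sketch(const char *name, double *value_p) {
  if (sp == 0)
    return 0;
  int n = (int) stack[sp - 1].value;        /* the marker holds the count */
  for (int i = sp - 2; i >= sp - 1 - n; --i)
    if (strcmp(stack[i].name, name) == 0) {
      *value_p = stack[i].value;
      return 1;
    }
  return 0;
}

/* Discards the newest frame: its marker and its n tuples. */
void stack_return_sketch(void) {
  if (sp == 0)
    return;
  sp -= (int) stack[sp - 1].value + 1;
}
```

For `drawline (90, 100)` the frame is `("angle", 90)`, `("distance", 100)`, `(NULL, 2)`. A call without arguments pushes only `(NULL, 0)`, so returning always removes exactly one frame.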
When the runtime routine meets `N_RT` node, it decreases `ret_level` by one so that the following commands in the procedure are ignored by the runtime routine.
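The interplay of `proc_level` and `ret_level` can be tried out with a flat list of commands instead of a tree of nodes. The command kinds `CMD_STEP` and `CMD_RT` and the function `call_body_sketch` are hypothetical; the sketch only shows how decreasing `ret_level` makes the rest of a procedure body be skipped.

```c
/* Hypothetical commands: an ordinary command and the return command rt. */
enum { CMD_STEP, CMD_RT };

static int proc_level = 0;      /* depth of the procedure call */
static int ret_level = 0;       /* level to return to */

/* Runs a procedure body of n commands.
   Returns how many ordinary commands were actually executed. */
int call_body_sketch(const int *body, int n) {
  int executed = 0;

  ret_level = ++proc_level;                 /* enter the procedure */
  for (int i = 0; i < n; ++i) {
    if (ret_level < proc_level)             /* rt has been met: skip */
      continue;
    if (body[i] == CMD_RT)
      ret_level--;
    else
      ++executed;
  }
  ret_level = --proc_level;                 /* leave the procedure */
  return executed;
}
```

With the body `step, rt, step, step` only the first command runs, and both levels are back to zero after the call.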
#### Runtime entry and error functions
A function `run` is the entry of the runtime routine.
A function `runtime_error` reports an error that occurs while the runtime routine is running.
(Errors which occur during parsing are called syntax errors and are reported by `yyerror`.)
After `runtime_error` reports an error, it stops the command execution and goes back to `run` to exit.
The functions `setjmp` and `longjmp` are used for this.
They are declared in `<setjmp.h>`.
`setjmp (buf)` saves state information in `buf` and returns zero.
`longjmp(buf, 1)` restores the state information from `buf`; it doesn't return to its caller, but makes the earlier `setjmp` call return `1` (the second argument) instead.
Because the saved information is the state at the time `setjmp` was called, execution resumes at the point where `setjmp` returns.
In the following program, longjmp resumes at the assignment to the variable `i`.
When setjmp is called, 0 is assigned to `i` and `execute(node_top)` is called.
On the other hand, when longjmp is called, 1 is assigned to `i` and `execute(node_top)` is not called.
`g_slist_free_full` frees all the allocated memory.
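The control flow can be reproduced in a few lines. This is a minimal sketch, not the tutorial's `run` function: all names end in `_sketch`, the "commands" are just loop iterations, and the result of `setjmp` is compared directly instead of being stored in `i`, which is one of the forms the C standard explicitly allows.

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf buf;
static int steps_done;          /* static, so its value survives longjmp */

/* Reports the error and jumps back to the setjmp call in run_sketch. */
static void runtime_error_sketch(const char *message) {
  fprintf(stderr, "%s\n", message);
  longjmp(buf, 1);              /* never returns to its caller */
}

/* Executes nsteps commands; the command with index fail_at fails. */
static void execute_sketch(int fail_at, int nsteps) {
  for (int i = 0; i < nsteps; ++i) {
    if (i == fail_at)
      runtime_error_sketch("Runtime error.");
    ++steps_done;
  }
}

/* Returns 0 if all commands ran, or 1 if a runtime error stopped them. */
int run_sketch(int fail_at, int nsteps) {
  steps_done = 0;
  if (setjmp(buf) == 0) {       /* direct call: setjmp returns 0 */
    execute_sketch(fail_at, nsteps);
    return 0;
  }
  return 1;                     /* reached through longjmp */
}
```

`run_sketch (-1, 3)` executes all three commands and returns 0. `run_sketch (1, 3)` executes only the first one, then comes back through `longjmp` and returns 1.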
A function `runtime_error` has a variable-length argument list.
~~~C
void runtime_error (char *format, ...)
~~~
This is implemented with `<stdarg.h>` header file.
The `va_list` type variable `args` will refer to each argument in turn.
A function `va_start` initializes `args`.
A function `va_arg` returns an argument and moves the reference of `args` to the next.
A function `va_end` cleans up everything necessary at the end.
The function `runtime_error` takes a format string similar to that of the standard `printf` function.
But its format supports only `%s`, `%f` and `%d`.
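A restricted formatter of this kind can be sketched with `<stdarg.h>` as follows. It is an illustration under assumptions, not the tutorial's `runtime_error`: the function `format_message` is hypothetical and writes into a caller-supplied buffer instead of reporting an error, so that the result is easy to inspect.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Appends piece to out (capacity size, current length len), truncating if
   necessary. Returns the new length. */
static size_t append(char *out, size_t size, size_t len, const char *piece) {
  size_t n = strlen(piece);

  if (len + n >= size)
    n = size - 1 - len;
  memcpy(out + len, piece, n);
  out[len + n] = '\0';
  return len + n;
}

/* Formats into out, understanding only %s, %f and %d.
   Returns the length of the resulting string. */
size_t format_message(char *out, size_t size, const char *format, ...) {
  va_list args;
  char piece[64];
  size_t len = 0;

  if (size == 0)
    return 0;
  out[0] = '\0';
  va_start(args, format);                   /* args now refers to the first argument */
  for (const char *p = format; *p != '\0'; ++p) {
    if (*p != '%' || p[1] == '\0') {        /* ordinary character */
      piece[0] = *p;
      piece[1] = '\0';
    } else {
      switch (*++p) {
      case 's':
        len = append(out, size, len, va_arg(args, const char *));
        continue;
      case 'f':
        snprintf(piece, sizeof piece, "%f", va_arg(args, double));
        break;
      case 'd':
        snprintf(piece, sizeof piece, "%d", va_arg(args, int));
        break;
      default:                              /* unsupported: copy it as is */
        piece[0] = '%';
        piece[1] = *p;
        piece[2] = '\0';
        break;
      }
    }
    len = append(out, size, len, piece);
  }
  va_end(args);
  return len;
}
```

Each `va_arg` call must name the type that was actually passed, which is why the format string has to be inspected character by character.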
The functions declared in `<setjmp.h>` and `<stdarg.h>` are explained in the very famous book "The C programming language" written by Brian Kernighan and Dennis Ritchie.
I referred to the book to write the program above.
The program `turtle` is unsophisticated and unpolished.
If you want to make your own language, you need to learn much more.
I don't know any good textbook about compilers and interpreters.
If you know a good book, please let me know.
However, the following information is very useful (but old).
- Bison documentation
- Flex documentation
- Software Tools, written by Brian W. Kernighan and P. J. Plauger (1976)
- The Unix Programming Environment, written by Brian W. Kernighan and Rob Pike (1984)