cppannotations/yo/concrete/scanner.yo
2009-07-07 15:27:29 +00:00

The tt(class Scanner), derived as usual from the class ti(yyFlexLexer), is
generated by bf(flex)(1)hi(flex). The derived class has access to data
controlled by the lexical scanner. In particular, it has access to the
following data members:
hi(flex: protected data members)
itemization(
itht(flex: yytext)(char *yytext), containing the hi(matched text) text
matched by a i(regular expression). Clients may access this information using
the scanner's ti(YYText()) member;
itht(flex: yyleng)(int yyleng), the hi(matched text length) length of the
text in tt(yytext). Clients may access this value using the scanner's
ti(YYLeng()) member;
itht(flex: yylineno)(int yylineno), the current i(line number). This
variable is only maintained if
tt(%option yylineno) hi(flex: %option yylineno) is specified. Clients
may access this value using the scanner's ti(lineno()) member.
)
Other members are available as well, but are used less often. Details can
be found in ti(FlexLexer.h).
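The relation between the protected data and the public accessors can be
illustrated without running bf(flex) itself. The following is a minimal
sketch, assuming a stand-in base class tt(DemoLexer) and a derived class
tt(DemoScanner) (both hypothetical, neither is part of flex). They only mimic
the three data members and accessors named above; a real scanner would derive
from tt(yyFlexLexer) instead:

```cpp
#include <string>

// Hypothetical stand-in for the flex base class: it only mimics the
// members discussed above and is NOT flex's own implementation.
class DemoLexer
{
    protected:
        char *yytext = nullptr;     // matched text
        int yyleng = 0;             // length of the matched text
        int yylineno = 1;           // current line number

    public:
        virtual ~DemoLexer() = default;

        // Public accessors, available to clients of the scanner
        char const *YYText() const { return yytext; }
        int YYLeng() const         { return yyleng; }
        int lineno() const         { return yylineno; }
};

// The derived class may use the protected data directly.
class DemoScanner: public DemoLexer
{
    std::string d_buffer;           // storage backing yytext

    public:
        // Pretend that `text' was just matched on line `line'.
        void match(std::string const &text, int line)
        {
            d_buffer = text;
            yytext = &d_buffer[0];
            yyleng = static_cast<int>(d_buffer.length());
            yylineno = line;
        }

        // Direct access to the protected members from the derived class
        std::string matched() const
        {
            return std::string(yytext, yyleng);
        }
};
```

Clients of a tt(DemoScanner) object can only reach the matched text through
tt(YYText()), tt(YYLeng()) and tt(lineno()), whereas tt(DemoScanner)'s own
members may read tt(yytext), tt(yyleng) and tt(yylineno) directly.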
Objects of the tt(class Scanner) perform two tasks:
itemization(
it() They push file information about the current file to a i(file stack);
it() They pop the last-pushed information from the stack once the end of
the current file is detected.
)
Several member functions are used to accomplish these tasks. As they are
auxiliary to the scanner, they are i(private) members. In practice, such
private members are usually developed once the need for them arises. Apart
from the private member functions, several private data members
are defined as well. Let's have a closer look at the implementation of the
class tt(Scanner):
itemization(
it() First, we have a look at the class's initial section, showing the
conditional inclusion of tt(FlexLexer.h), its tt(class) opening, and its
private data. At the top of the class interface the private struct
tt(FileInfo) is defined. tt(FileInfo) is used to store the names and pointers
to open files. The struct has two constructors: one merely accepting a
filename, the other also expecting a tt(bool) argument indicating that the
file is already open and should not be handled by tt(FileInfo). The latter
constructor is used only once: as the initial stream is an already open file
there is no need to open it again, so tt(Scanner)'s constructor uses
this constructor to store the name of the initial file only. tt(Scanner)'s
public section starts by defining the tt(enum Error), which provides
symbolic constants for the errors that may be detected:
verbinsert(HEAD)(concrete/lexer/scanner/scanner.h)
it() As they are objects, the class's data members are initialized
automatically by tt(Scanner)'s i(constructor). It activates the initial input
(and output) file and pushes the name of the initial input file, using the
second tt(FileInfo) constructor. Here is its implementation:
verbinclude(concrete/lexer/scanner/scanner.cc)
it() The scanning process proceeds as follows:
once the scanner extracts a filename from an tt(#include) directive, a
switch to another file is performed by tt(pushSource()). If the filename
could not be extracted, the scanner throws an tt(invalidInclude) i(exception)
value. The tt(pushSource()) member and the matching function tt(popSource())
handle file switching. Switching to another file proceeds as follows:
itemization(
it() First, the current depth of the tt(include)-nesting is inspected.
If tt(s_maxDepth) is reached, the stack is considered full, and the scanner
throws a tt(nestingTooDeep) exception.
it() Next, tt(throwOnCircularInclusion()) is called to avoid circular
inclusions when switching to new files. This function performs a simple
literal name check and throws an exception if a filename that is already on
the stack is included again. Here is its implementation:
verbinclude(concrete/lexer/scanner/throwoncircular.cc)
it() Then the new filename is added to the tt(FileInfo) vector, at the
same time creating a new tt(ifstream) object. If this fails, the scanner
throws a tt(cantRead) exception.
it() Finally, a new ti(yy_buffer_state) is created for the newly
opened stream, and the lexical scanner is instructed to switch to that stream
using tt(yyFlexLexer)'s member function ti(yy_switch_to_buffer()).
)
Here is tt(pushSource())'s implementation:
verbinclude(concrete/lexer/scanner/pushsource.cc)
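The first three steps can also be studied in isolation, without the flex
buffer handling. The following is a minimal sketch under stated assumptions:
the class name tt(IncludeStack), the member names tt(push()) and
tt(depth()), the error name tt(circularInclusion) and the limit of 10 are
hypothetical, and the fourth step (creating and switching to a new buffer)
is left out:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Simplified sketch of the checks described above. The flex buffer
// handling is omitted; all names are illustrative assumptions.
class IncludeStack
{
    public:
        enum Error { nestingTooDeep, circularInclusion, cantRead };

    private:
        struct FileInfo
        {
            std::string d_name;
            std::ifstream *d_in;    // 0 if the stream is managed elsewhere
        };

        static constexpr std::size_t s_maxDepth = 10;   // assumed limit
        std::vector<FileInfo> d_fileInfo;

    public:
        // The initial file is already open: only its name is stored.
        explicit IncludeStack(std::string const &initial)
        {
            d_fileInfo.push_back(FileInfo{initial, nullptr});
        }
        IncludeStack(IncludeStack const &) = delete;
        IncludeStack &operator=(IncludeStack const &) = delete;

        ~IncludeStack()
        {
            for (auto &info: d_fileInfo)
                delete info.d_in;           // closes and frees the stream
        }

        void push(std::string const &name)
        {
            if (d_fileInfo.size() == s_maxDepth)    // step 1: depth check
                throw nestingTooDeep;

            for (auto const &info: d_fileInfo)      // step 2: literal name
                if (info.d_name == name)            //         check
                    throw circularInclusion;

            std::ifstream *in = new std::ifstream(name);    // step 3: open
            if (!*in)
            {
                delete in;
                throw cantRead;
            }
            d_fileInfo.push_back(FileInfo{name, in});
            // step 4 (switching to a new flex buffer) would follow here
        }

        std::size_t depth() const { return d_fileInfo.size(); }
};
```

Because the name check is purely literal, two different spellings of the same
path (e.g., tt(a.h) and tt(./a.h)) are not recognized as the same file by
this sketch.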
it() The class tt(yyFlexLexer) provides a series of member functions that
can be used to switch files. The file-switching capability of a
tt(yyFlexLexer) object is founded on the tt(struct yy_buffer_state),
containing the state of the emi(scan-buffer) of the currently read file. This
buffer is pushed on the tt(d_state) stack when an tt(#include) is
encountered. Then tt(yy_buffer_state)'s contents are replaced by the buffer
created for the file to be processed next. Note that in the tt(flex)
specification file the function tt(pushSource()) is called as
centt(pushSource(YY_CURRENT_BUFFER, YY_BUF_SIZE);)
ti(YY_CURRENT_BUFFER) and ti(YY_BUF_SIZE) are macros that are em(only)
available in the rules section of the lexer specification file, so they must
be passed as arguments to tt(pushSource()). Currently it is em(not) possible
to use these macros in the tt(Scanner) class's member functions directly.
it() Note that ti(yylineno) is not updated when a i(file switch) is
performed. If line numbers are to be monitored, then the current value of
tt(yylineno) should be pushed on a stack, and tt(yylineno) should be reset by
tt(pushSource()), whereas tt(popSource()) should reinstate a former value of
tt(yylineno) by popping a previously pushed value from the
stack. tt(Scanner)'s current implementation maintains a simple i(stack) of
ti(yy_buffer_state) pointers. Changing that into a stack of
tt(pair<yy_buffer_state *, size_t>) elements would allow us to save (and
restore) line numbers as well. This modification is left as an i(exercise) to
the reader.
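As a starting point for that exercise, here is a minimal sketch of such a
stack of pairs. It is an assumption-laden illustration, not tt(Scanner)'s
actual code: tt(BufferState) is a hypothetical stand-in for
tt(yy_buffer_state), and the class tt(LineStack) with its members
tt(push()) and tt(pop()) does not exist in the scanner's sources:

```cpp
#include <cstddef>
#include <stack>
#include <utility>

struct BufferState {};  // hypothetical stand-in for flex's yy_buffer_state

class LineStack
{
    std::stack<std::pair<BufferState *, std::size_t>> d_state;

    public:
        // Save the current buffer together with its line number and
        // return the line number at which the included file starts.
        std::size_t push(BufferState *current, std::size_t lineno)
        {
            d_state.push(std::make_pair(current, lineno));
            return 1;
        }

        // Reinstate the most recently saved buffer and line number.
        // Returns false (leaving the arguments untouched) if nothing
        // was pushed.
        bool pop(BufferState *&buffer, std::size_t &lineno)
        {
            if (d_state.empty())
                return false;

            buffer = d_state.top().first;
            lineno = d_state.top().second;
            d_state.pop();
            return true;
        }
};
```

In a real scanner the value returned by tt(push()) would be assigned to
tt(yylineno), and the line number delivered by tt(pop()) would be assigned
to tt(yylineno) after switching back to the saved buffer.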
it() The member function tt(popSource()) is called to pop the previously
pushed buffer from the stack, allowing the scanner to continue its scan just
beyond the just processed tt(#include) directive. The member tt(popSource())
first inspects the size of the tt(d_state) stack: if empty, tt(false) is
returned and the function terminates. If not empty, then the current buffer is
deleted, to be replaced by the state waiting on top of the stack. The file
switch is performed by the tt(yyFlexLexer) members ti(yy_delete_buffer()) and
tt(yy_switch_to_buffer()). Note that tt(yy_delete_buffer()) does em(not) close
the tt(ifstream) and does em(not) delete the memory allocated for this stream by
tt(pushSource()). Therefore tt(delete) is called for the tt(ifstream)
pointer stored at the back of tt(d_fileInfo) to take care of both. Following
this, the last tt(FileInfo) entry is removed from tt(d_fileInfo). Finally, the
function returns tt(true):
verbinclude(concrete/lexer/scanner/popsource.cc)
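The explicit tt(delete) is required because the stream is stored as a plain
pointer. As an alternative design (not the one used by tt(Scanner)), the
ownership could be expressed by a tt(std::unique_ptr), in which case removing
the last element closes the stream and frees its memory in one step. The
following sketch uses hypothetical names (tt(OwnedFile), tt(OwnedFiles)) and
omits the flex buffer switching; unlike tt(popSource()), it decides whether
anything can be popped by inspecting the number of stored files:

```cpp
#include <fstream>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct OwnedFile                            // hypothetical FileInfo variant
{
    std::string name;
    std::unique_ptr<std::ifstream> in;      // empty for the initial file,
};                                          // whose stream lives elsewhere

class OwnedFiles
{
    std::vector<OwnedFile> d_files;

    public:
        // Store a name; if `open' is true a stream owned by the entry
        // is created as well.
        void push(std::string const &name, bool open)
        {
            OwnedFile file;
            file.name = name;
            if (open)
                file.in.reset(new std::ifstream(name));
            d_files.push_back(std::move(file));
        }

        // Remove the last-pushed file. Returns false if only the
        // initial file (or nothing at all) is left.
        bool pop()
        {
            if (d_files.size() <= 1)
                return false;

            d_files.pop_back();     // unique_ptr closes and frees the stream
            return true;
        }

        // Name of the currently processed file; at least one file
        // must have been pushed before calling this member.
        std::string const &last() const { return d_files.back().name; }
};
```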
it() Two service members are offered: tt(stackTrace()) dumps the names of
the currently pushed files to the standard error stream. It may be called by
exception catchers. Here is its implementation:
verbinclude(concrete/lexer/scanner/stacktrace.cc)
it() tt(lastFile()) returns the name of the currently processed file. It
may be implemented inline:
verbinsert(LAST)(concrete/lexer/scanner/scanner.h)
it() The lexical scanner itself is defined in tt(Scanner::yylex()).
Therefore, tt(int yylex()) must be declared by the class tt(Scanner), as it
overrides tt(FlexLexer)'s virtual member tt(yylex()).
)