some more doc on dictionary and expected regexp syntax

This commit is contained in:
Antoine Fraboulet 2005-04-19 16:25:06 +00:00
parent 1077565ace
commit b1e67d947b
2 changed files with 22 additions and 8 deletions

View file

@ -73,7 +73,7 @@ The header structure is the following:
#define _COMPIL_KEYWORD_ "_COMPILED_DICTIONARY_" #define _COMPIL_KEYWORD_ "_COMPILED_DICTIONARY_"
typedef struct _Dict_header { typedef struct _Dict_header { // offset
char ident[sizeof(_COMPIL_KEYWORD_)]; // 0x00 char ident[sizeof(_COMPIL_KEYWORD_)]; // 0x00
char unused_1; // 0x16 char unused_1; // 0x16
char unused_2; // 0x17 char unused_2; // 0x17
@ -82,7 +82,7 @@ typedef struct _Dict_header {
unsigned int edgesused; // 0x20 unsigned int edgesused; // 0x20
unsigned int nodesused; // 0x24 unsigned int nodesused; // 0x24
unsigned int nodessaved; // 0x2c unsigned int nodessaved; // 0x2c
unsigned int edgessaved; // 0x28 unsigned int edgessaved; // 0x30
} Dict_header; } Dict_header;
binary output of the header: binary output of the header:
@ -98,11 +98,11 @@ binary output of the header:
0x2c edges saved : 1 00000001 0x2c edges saved : 1 00000001
=================================================================== ===================================================================
The real array of data begins at offset 0x30. Integer are stored in a The real array of data begins at offset 0x34. Integer are stored in a
machine dependent way. This dictionary was compiled on a i386 and is machine dependent way. This dictionary was compiled on a i386 and is
not readable on a machine with a different endianess. The array is not readable on a machine with a different endianess (unless swapping
stored 'as is' right after the header. Each array cell is a all necessary information). The array is stored 'as is' right after
bit-structure: the header. Each array cell is a bit-structure on 4 bytes :
typedef struct _Dawg_edge { typedef struct _Dawg_edge {
unsigned int ptr : 24; unsigned int ptr : 24;
@ -115,8 +115,8 @@ typedef struct _Dawg_edge {
Characters are not stored in ASCII. The order is preserved but Characters are not stored in ASCII. The order is preserved but
we changed the values: A=1, B=2, ... This is very easy to do we changed the values: A=1, B=2, ... This is very easy to do
with the ASCII table as ('A' & 0x1f) == ('a' & 0x1f) == 1. with the ASCII table as ('A' & 0x1f) == ('a' & 0x1f) == 1.
This may not work on machines that are not using ASCII (like This may not work on machines that are not using ASCII. The dictionary
Macintosh.) can thus handle up to 32 different letters but not more.
offs binary structure offs binary structure
---- -------- | ------------------ ---- -------- | ------------------

View file

@ -8,6 +8,10 @@ expressions rationnelles habituelles.
\section{utilisation} \section{utilisation}
Les mots recherchés sont complets : la recherche d'une expression
\verb=e= correspond à l'expression \verb=^e$=. Pour rechercher un
motif \verb=m= dans un mot il faut donc utiliser l'expression \verb=.*m.*=
\subsection{caractères} \subsection{caractères}
\begin{itemize} \begin{itemize}
@ -15,6 +19,8 @@ expressions rationnelles habituelles.
\item \texttt{.} : n'importe quel caractère \item \texttt{.} : n'importe quel caractère
\item \texttt{:v:} : n'importe quelle voyelle \item \texttt{:v:} : n'importe quelle voyelle
\item \texttt{:c:} : n'importe quelle consonne \item \texttt{:c:} : n'importe quelle consonne
\item \texttt{:1:} : liste 1 définie par l'utilisateur
\item \texttt{:2:} : liste 2 définie par l'utilisateur
\end{itemize} \end{itemize}
\subsection{répétitions} \subsection{répétitions}
@ -38,4 +44,12 @@ expressions rationnelles habituelles.
\subsection{exemples} \subsection{exemples}
\begin{itemize}
\item \verb=a.*= : liste des mots débutant par la lettre \verb=a=
\item \verb=.*a= : liste des mots se terminant par la lettre \verb=a=
\item \verb=.*oula.*= : liste des mots contenant le motif \verb=oula=
\item \verb=a.*b= : liste des mots débutant par \verb=a= et se terminant par \verb=b=
\item \verb=.*a.*e.*i.*o.*u.*= : liste des mots contenant les les lettres \verb=aeiou= dans l'ordre
\end{itemize}
\end{document} \end{document}