Before version 2 of TPG, lexers were context sensitive: the parser tells the lexer which tokens to
match, so the same input string can be split into different tokens depending on the grammar rules
being applied. These lexers were very flexible but slower than context free lexers, because
TPG backtracking caused tokens to be matched several times.
In TPG 2, the lexer is called before the parser and produces a list of tokens from the input string.
This list is then given to the parser. When TPG backtracks, the token list remains unchanged, so
tokens are not matched again.
Context sensitive lexers were reintroduced in TPG 2.1.2. By default lexers are context free,
but the CSL option (see 5.3.2) makes TPG use a context sensitive lexer.
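The difference between the two strategies can be sketched in plain Python with the re module. This is only an illustration, not TPG's actual implementation: the token table and the names tokenize and match_at are hypothetical.

```python
import re

# Hypothetical token table, for illustration only (not TPG's internals).
TOKENS = [("number", r"\d+"), ("ident", r"[A-Za-z_]\w*"), ("space", r"\s+")]

def tokenize(text):
    """Context free lexing: scan the whole input once, before any parsing."""
    master = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKENS))
    pos, tokens = 0, []
    while pos < len(text):
        m = master.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        if m.lastgroup != "space":  # separators are dropped
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def match_at(pattern, text, pos):
    """Context sensitive lexing: the parser asks for one pattern at one position."""
    m = re.compile(pattern).match(text, pos)
    return (m.group(), m.end()) if m else None

# The token list is fixed once and for all: "42" is always a number.
print(tokenize("abc 42"))            # [('ident', 'abc'), ('number', '42')]

# On demand, the same text can be read through whichever pattern the rule needs.
print(match_at(r"\d+", "abc 42", 4))  # ('42', 6)
print(match_at(r"\w+", "abc 42", 4))  # ('42', 6)
```

In the second style, a backtracking parser that retries several rules at the same position calls match_at again each time, which is where the extra cost comes from.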
8.2 Grammar structure
CSL grammars have the same structure as non CSL grammars (see 5.1), except for the CSL option
(see 5.3.2).
8.3 CSL lexers
8.3.1 Regular expression syntax
The CSL lexer is based on the re module. Unlike non CSL lexers, it compiles the given regular
expression as is, without any encapsulation. Groups in the expression are therefore available
and usable.
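To see why encapsulation matters, here is a small sketch using only the re module. The wrapping shown in the second half is an assumption about what a combined lexer might do, not a description of TPG's code. It shows that wrapping shifts the user's group numbers.

```python
import re

pattern = r"(\d+)\.(\d+)"

# Compiled as is: the groups written by the user are groups 1 and 2.
plain_groups = re.compile(pattern).match("3.14").groups()
print(plain_groups)    # ('3', '14')

# Wrapped in an extra named group (assumed encapsulation): the wrapper
# becomes group 1 and the user's groups are shifted to 2 and 3.
wrapped_groups = re.compile(f"(?P<token>{pattern})").match("3.14").groups()
print(wrapped_groups)  # ('3.14', '3', '14')
```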
8.3.2 Token definition
In CSL lexers there are no predefined tokens. Tokens are always inlined, and there is no precedence
issue since tokens are matched while parsing, when they are encountered in a grammar rule.
A token definition can be simulated by defining a rule that matches a particular token (see
figure 8.1).
Figure 8.1: Token definition in CSL parsers example
number/int<n> -> '\d+'/n ;
In non CSL parsers there are two kinds of tokens: true tokens and token separators. To declare
separators in CSL parsers you must use the special separator rule. This rule is implicitly called before
a token is matched, so lexical rules must be distinguished from grammar rules. Lexical rule
declarations start with the lex keyword. In such rules the separator rule is not called, which avoids
infinite recursion (separator calling separator calling separator ...). Figure 8.2 shows a separator
declaration with nested C++ like comments.
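What a separator rule has to do can be sketched in plain Python. This is not the grammar of figure 8.2 and not TPG code: the function names are hypothetical, and the choice of whitespace, // line comments and nested /* ... */ comments as separators is an assumption made for the illustration.

```python
import re

_SPACE = re.compile(r"\s+")
_LINE_COMMENT = re.compile(r"//[^\n]*")

def skip_block_comment(text, pos):
    """Skip one /* ... */ comment starting at pos, allowing nested comments.

    Returns the position just after the comment, or pos if none starts here.
    """
    if not text.startswith("/*", pos):
        return pos
    depth, i = 1, pos + 2
    while depth > 0:
        if i >= len(text):
            raise SyntaxError("unterminated comment")
        if text.startswith("/*", i):    # a nested comment opens
            depth += 1
            i += 2
        elif text.startswith("*/", i):  # the innermost comment closes
            depth -= 1
            i += 2
        else:
            i += 1
    return i

def skip_separators(text, pos=0):
    """Skip every separator found at pos and return the new position."""
    while True:
        start = pos
        m = _SPACE.match(text, pos)
        if m:
            pos = m.end()
        m = _LINE_COMMENT.match(text, pos)
        if m:
            pos = m.end()
        pos = skip_block_comment(text, pos)
        if pos == start:  # nothing was skipped on this pass
            return pos

def match_token(pattern, text, pos):
    """Skip separators first, then match the token, as a grammar rule would."""
    pos = skip_separators(text, pos)
    m = re.compile(pattern).match(text, pos)
    return (m.group(), m.end()) if m else None
```

Note that skip_separators itself never calls match_token. This mirrors the reason lexical rules do not invoke the separator: a separator that skipped separators before matching would never terminate.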
In CSL parsers, tokens are matched as in non CSL parsers (see 6.3), with one additional feature:
the user can take advantage of regular expression groups. The text of the token can be saved
with the infix / operator. The groups of the token can be saved with the infix // operator.
This operator (available only in CSL parsers) returns all the groups in a tuple. For example,
figure 8.3 shows how to read entire tokens and how to split tokens.
Figure 8.3: Token usage in CSL parsers examples
lex identifier/i -> '\w+'/i ; # a single identifier
lex string/s -> "'([^\']*)'"//<s> ; # a string without the quotes
lex item/<key,val> -> "(\w+)=(.*)"//<key,val> ; # a tuple (key, value)
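In terms of the re module, the two operators correspond roughly to the whole match and to the tuple of groups. The helper names below are hypothetical and the mapping is an analogy, not TPG's generated code.

```python
import re

def token_text(pattern, text):
    """Rough analogue of '/': the whole text matched by the token."""
    return re.match(pattern, text).group(0)

def token_groups(pattern, text):
    """Rough analogue of '//': all the groups of the token, as a tuple."""
    return re.match(pattern, text).groups()

print(token_text(r"\w+", "foo bar"))               # foo
print(token_groups(r"'([^\']*)'", "'hello'"))      # ('hello',)
print(token_groups(r"(\w+)=(.*)", "key=value"))    # ('key', 'value')
```

A pattern with a single group still yields a one-element tuple, which is why the string rule in figure 8.3 unpacks it with <s>.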
8.4 CSL parsers
There is no difference between CSL and non CSL parsers, except for lexical rules, which look like
grammar rules.