This fixes more differences compared to "gcc -E" so removes several
cases of erroneously failing test cases. The implementation isn't very
elegant, but it is functional.
The specification of the preprocessor in C99 says that when we see a
macro name that we are already expanding that we refuse to expand it
now, (which we've done for a while), but also that we refuse to ever
expand it later if seen in other contexts at which it would be
legitimate to expand.
We add a test case for that here, and fix it to work. The fix takes
advantage of a new token_t value for tokens and argument words along
with the recently added IDENTIFIER_FINALIZED token type which
instructs the parser to not even look for another expansion.
There's not yet any change in functionality here, (at least according
to the test suite). But we now have the option of specifying a type
for each string in the token list. This will allow us to finalize an
unexpanded macro name so that it won't be subjected to excess
expansion later.
Previously, we would pass original strings back to the original lexer
whenever we needed to re-lex something, (such as an expanded macro or
a macro argument). Now, we instead parse the macro or argument
originally to a string list, and then re-lex by simply returning each
string from this list in turn.
We do this in the recently added glcpp_parser_lex function that sits
on top of the lower-level glcpp_lex that only deals with text.
This doesn't change any behavior (at least according to the existing
test suite which all still passes) but it brings us much closer to
being able to "finalize" an unexpanded macro as required by the
specification.
We rename the generated lexer from yylex to glcpp_lex. Then we
implement our own yylex function in glcpp-parse.y that calls
glcpp_lex. This doesn't change the behavior at all yet, but gives us a
place where we can do implement alternate lexing in the future.
(We want this because instead of re-lexing from strings for macro
expansion, we want to lex from pre-parsed token lists. We need this so
that when we terminate recursion due to an already active macro
expansion, we can ensure that that symbol never gets expanded again
later.)
The rewrite her discards the functions that did direct, recursive
expansion of macro values. Instead, the parser now pushes the macro
definition string over to a stack of buffers for the lexer. This way,
macro expansion gets access to all parsing machinery.
This isn't a small change, but the result is simpler than before (I
think). It passes the entire test suite, including the four tests
added with the previous commit that were failing before.
We provide for this by changing the value of the argument-list
production from a list of strings (string_list_t) to a new
data-structure that holds a list of lists of strings
(argument_list_t).
We accept the structure of arguments in both macro definition and
macro invocation, but we don't yet expand those arguments. This is
just enough code to pass the recently-added tests, but does not yet
provide any sort of useful function-like macro.
This is just a minor style improvement for now. But the same
mechanism, (having the lexer peek into the table of defined macros),
will be essential when we add function-like macros in addition to the
current object-like macros.
The lexer was previously using strdup (expecting the parser to free),
but is now more consistent, easier to use, and slightly more efficent
by using talloc along with the parser.
Also, we add xtalloc and xtalloc_strdup wrappers around talloc and
talloc_strdup to put all of the out-of-memory-checking code in one
place.
We now store a list of tokens in our hash-table rather than a single
string. This lets us replace each macro in the value as necessary.
This code adds a link dependency on talloc which does exactly what we
want in terms of memory management for a parser.
The 3 tests added in the previous commit now pass.
Most of the current problems were (mostly) harmless things like
missing declarations, but there was at least one real error, (reversed
argument order for yyerrror).