This is what the C99 specification demands. And the GLSL specification
says that we should follow the "standard C++" rules for #if condition
expressions rather than the GLSL rules, (which only support a 32-bit
integer).
The operator coverage here is quite complete. The one big thing
missing is that we are not yet doing macro expansion in #if
lines. This makes the whole support fairly useless, so we plan to fix
that shortcoming right away.
So far the only expression implemented is a single integer literal,
but obviously that's easy to extend. Various things including nesting
are tested here.
Previously, we were using the same lexing stack as we use for macro
expansion to also expand macro arguments. Instead, we now do this
earlier by simply recursing over the macro-invocations replacement
list and constructing a new expanded list, (and pushing only *that*
onto the stack).
This is simpler, and also allows us to more easily implement token
pasting in the future.
The last remaining thing here was that when a line ended with a macro,
and the parser looked ahead to the newline token, the lexer was
printing that newline before the parser printed the expansion of the
macro.
The fix is simple, just make the lexer tell the parser that a newline
is needed, and the parser can wait until reducing a production to
print that newline.
With this, we now pass the entire test suite with simply "diff -u", so
we no longer have any diff options hiding whitespace bugs from
us. Hurrah!
This fixes more differences compared to "gcc -E" so removes several
cases of erroneously failing test cases. The implementation isn't very
elegant, but it is functional.
We fix this by moving printing up to the top-level "input" action and
tracking whether a space is needed between one token and the next.
This fixes all actual bugs in test-suite output, but does leave some
tests failing due to differences in the amount of whitespace produced,
(which aren't actual bugs per se).
The fix here is quite simple (and actually only deletes code). When
expanding a macro, we don't return a ',' as a unique token type, but
simply let it fall through to the generic case.
The specification says that commas within a parenthesized group,
(that's not a function-like macro invocation), are passed through
literally and not considered argument separators in any outer macro
invocation.
Add support and a test for this case. This support makes a third
occurrence of the same "FUNC_MACRO (" shift/reduce conflict appear, so
expect that.
This change does introduce a fairly large copy/paste block in the
grammar which is unfortunate. Perhaps if I were more clever I'd find a
way to share the common pieces between argument and argument_or_comma.
The specification of the preprocessor in C99 says that when we see a
macro name that we are already expanding that we refuse to expand it
now, (which we've done for a while), but also that we refuse to ever
expand it later if seen in other contexts at which it would be
legitimate to expand.
We add a test case for that here, and fix it to work. The fix takes
advantage of a new token_t value for tokens and argument words along
with the recently added IDENTIFIER_FINALIZED token type which
instructs the parser to not even look for another expansion.
There's not yet any change in functionality here, (at least according
to the test suite). But we now have the option of specifying a type
for each string in the token list. This will allow us to finalize an
unexpanded macro name so that it won't be subjected to excess
expansion later.
Previously, we would pass original strings back to the original lexer
whenever we needed to re-lex something, (such as an expanded macro or
a macro argument). Now, we instead parse the macro or argument
originally to a string list, and then re-lex by simply returning each
string from this list in turn.
We do this in the recently added glcpp_parser_lex function that sits
on top of the lower-level glcpp_lex that only deals with text.
This doesn't change any behavior (at least according to the existing
test suite which all still passes) but it brings us much closer to
being able to "finalize" an unexpanded macro as required by the
specification.
We rename the generated lexer from yylex to glcpp_lex. Then we
implement our own yylex function in glcpp-parse.y that calls
glcpp_lex. This doesn't change the behavior at all yet, but gives us a
place where we can do implement alternate lexing in the future.
(We want this because instead of re-lexing from strings for macro
expansion, we want to lex from pre-parsed token lists. We need this so
that when we terminate recursion due to an already active macro
expansion, we can ensure that that symbol never gets expanded again
later.)
The support for an object-like amcro within a macro-invocation
argument was also implemented at one level too high in the
grammar. Fortunately, this is a very simple fix.
The previous fix added FUNC_MACRO to a production one higher in teh
grammar than it should have. So it prevented a FUNC_MACRO from
appearing as part of a mutli-token argument rather than just alone as
an argument. Fix this (and add a test).
This adds a second shift/reduce conflict to our grammar. It's basically the
same conflict we had previously, (deciding to shift a '(' after a FUNC_MACRO)
but this time in the "argument" context rather than the "content" context.
It would be nice to not have these, but I think they are unavoidable
(withotu a lot of pain at least) given the preprocessor specification.
This case worked previously, but broke in the recent rewrite of
function- like macro expansion. The recursion was still terminated
correctly, but any parenthesized expression after the macro name was
still being swallowed even though the identifier was not being
expanded as a macro.
The fix is to notice earlier that the identifier is an
already-expanding macro. We let the lexer know this through the
classify_token function so that an already-expanding macro is lexed as
an identifier, not a FUNC_MACRO.
The rewrite her discards the functions that did direct, recursive
expansion of macro values. Instead, the parser now pushes the macro
definition string over to a stack of buffers for the lexer. This way,
macro expansion gets access to all parsing machinery.
This isn't a small change, but the result is simpler than before (I
think). It passes the entire test suite, including the four tests
added with the previous commit that were failing before.
The test has a newline before the left parenthesis, and newlines to
separate the parentheses from the argument.
The fix involves more state in the lexer to only return a NEWLINE
token when termniating a directive. This is very similar to our
previous fix with extra lexer state to only return the SPACE token
when it would be significant for the parser.
With this change, the exact number and positioning of newlines in the
output is now different compared to "gcc -E" so we add a -B option to
diff when testing to ignore that.
The most recent fix to the parser introduced a shift/reduce
conflict. We document this conflict here, and tell bison that it need
not report it (since I verified that it's being resolved in the
direction desired).
For the record, I did write additional lexer code to eliminate this
conflict, but it was quite fragile, (would not accept a newline
between a function-like macro name and the left parenthesis, for
example).
That is, when a function-like macro appears in the content without
parentheses it should be accepted and passed on through, (previously
the parser was regarding this as a syntax error).
The test case here is simply "#define foo foo" and "#define bar foo"
and then attempting to expand "bar".
Previously, our termination condition for the recursion was overly
simple---just looking for the single identifier that began the
expansion. We now fix this to maintain a stack of identifiers and
terminate when any one of them occurs in the replacement list.
This reverts the unconditional return of SPACE tokens from the lexer
from commit 48b94da099 .
That commit seemed useful because it kept the lexer simpler, but the
presence of SPACE tokens is causing lots of extra complication for the
parser itself, (redundant productions other than whitespace
differences, several productions buggy in the case of extra
whitespace, etc.)
Of course, we'd prefer to never have any whitespace token, but that's
not possible with the need to distinguish between "#define foo()" and
"#define foo ()". So we'll accept a little bit of pain in the lexer,
(enough state to support this special-case token), in exchange for
keeping most of the parser blissffully ignorant of whether tokens are
separated by whitespace or not.
This change does mean that our output now differs from that of "gcc -E",
but only in whitespace. So we test with "diff -w now to ignore those
differences.
We were correctly parsing this already, but simply not returning any
value (for no good reason). Fortunately the fix is quite simple.
This makes the test added in the previous commit now pass.
We provide for this by changing the value of the argument-list
production from a list of strings (string_list_t) to a new
data-structure that holds a list of lists of strings
(argument_list_t).
Then we print the final string list up at the top-level content
production along with all other printing.
Additionally, having macro-expansion productions that create values
will make it easier to solve problems like composed function-like
macro invocations in the future.
Previously, printing was occurring all over the place. Here we
document that it should all be happening at the top-level content
production, and we move the printing of directive newlines.
The printing of expanded macros is still happening in lower-level
productions, but we plan to fix that soon.
Instead of "parameter_list" and "replacement_list" just use
"parameters" and "replacements". This is consistent with the existing
"arguments" and keeps the line length down in the face of the
now-longer "string_list_t" rather than "list_t".
It seems strange to always be returning SPACE tokens, but since we
were already needing to return a SPACE token in some cases, this
actually simplifies our lexer.
This also allows us to fix two whitespace-handling differences
compared to "gcc -E" so that now the recent modification to the test
suite passes once again.
Previously our parser was incorrectly treating this case as a
function-like macro. We fix this by conditionally passing a SPACE
token from the lexer, (but only immediately after the identifier
immediately after #define).
Previously, an empty argument could be parsed as either an "argument_list"
directly or first as an "argument" and then an "argument_list".
We fix this by removing the possibility of an empty "argument_list"
directly.
We accept the structure of arguments in both macro definition and
macro invocation, but we don't yet expand those arguments. This is
just enough code to pass the recently-added tests, but does not yet
provide any sort of useful function-like macro.
This is just a minor style improvement for now. But the same
mechanism, (having the lexer peek into the table of defined macros),
will be essential when we add function-like macros in addition to the
current object-like macros.
Previously we had two copies of all top-level actions, (once in a list
context and once in a non-list context). Much simpler to instead have
a single list-context production with no action and then only have the
actions in their own non-list contexts.
We are able to remove all state by simply passing NEWLINE through
as a token unconditionally (as opposed to only passing newline when
on a driective line as we did previously).
This isn't ideal for two reasons:
1. There's a bunch of stateful redundancy in the lexer that should be
cleaned up.
2. The hash table does not provide a mechanism to delete an entry, so
we waste memory to add a new NULL entry in front of the existing
entry with the same key.
But this does at least work, (it passes the recently added undef test
case).
The lexer was previously using strdup (expecting the parser to free),
but is now more consistent, easier to use, and slightly more efficent
by using talloc along with the parser.
Also, we add xtalloc and xtalloc_strdup wrappers around talloc and
talloc_strdup to put all of the out-of-memory-checking code in one
place.
We now store a list of tokens in our hash-table rather than a single
string. This lets us replace each macro in the value as necessary.
This code adds a link dependency on talloc which does exactly what we
want in terms of memory management for a parser.
The 3 tests added in the previous commit now pass.
The fix is as simple as adding a loop to continue to lookup values
in the hash table until one of the following termination conditions:
1. The token we look up has no definition
2. We get back the original symbol we started with
This second termination condition prevents infinite iteration.