The test has a newline before the left parenthesis, and newlines to
separate the parentheses from the argument.
The fix involves more state in the lexer to only return a NEWLINE
token when terminating a directive. This is very similar to our
previous fix with extra lexer state to only return the SPACE token
when it would be significant for the parser.
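
As a rough illustration of that extra state (a standalone C sketch, not
the actual flex rules; count_newline_tokens and its flags are
hypothetical names), a scanner can remember whether the current line
began with '#' and only produce a NEWLINE token in that case:

```c
#include <assert.h>
#include <stddef.h>

/* Count the NEWLINE tokens a lexer would return for 'text' if it only
 * returns NEWLINE when the newline terminates a directive line.
 * Hypothetical helper for illustration only. */
static int count_newline_tokens(const char *text)
{
    int in_directive = 0;   /* the extra lexer state */
    int at_line_start = 1;  /* only whitespace seen so far on this line */
    int tokens = 0;
    size_t i;

    assert(text != NULL);
    for (i = 0; text[i] != '\0'; i++) {
        char c = text[i];

        if (c == '\n') {
            if (in_directive)
                tokens++;           /* NEWLINE terminates the directive */
            in_directive = 0;       /* otherwise the newline is dropped */
            at_line_start = 1;
        } else if (c == ' ' || c == '\t') {
            /* Whitespace does not change either flag. */
        } else {
            if (at_line_start && c == '#')
                in_directive = 1;
            at_line_start = 0;
        }
    }
    return tokens;
}
```

With this, the newlines that separate a macro name, its parentheses,
and its argument in ordinary content never reach the parser.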
With this change, the exact number and positioning of newlines in the
output now differ from those of "gcc -E", so we add the -B option to
diff when testing to ignore that.
The most recent fix to the parser introduced a shift/reduce
conflict. We document this conflict here, and tell bison that it need
not report it (since I verified that it's being resolved in the
direction desired).
For the record, I did write additional lexer code to eliminate this
conflict, but it was quite fragile, (would not accept a newline
between a function-like macro name and the left parenthesis, for
example).
This has the added advantage that it will stop traversing the tree as
soon as the first call is found.
The output of all test cases was verified to be the same using diff.
That is, when a function-like macro appears in the content without
parentheses it should be accepted and passed on through, (previously
the parser was regarding this as a syntax error).
The test case here is simply "#define foo foo" and "#define bar foo"
and then attempting to expand "bar".
Previously, our termination condition for the recursion was overly
simple---just looking for the single identifier that began the
expansion. We now fix this to maintain a stack of identifiers and
terminate when any one of them occurs in the replacement list.
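
A minimal sketch of the idea, assuming a toy table of object-like
macros (macro_t, active_stack_t, expand and expand_one are hypothetical
names, not the parser's real data structures):

```c
#include <assert.h>
#include <string.h>

#define MAX_TOKENS 4    /* replacement-list capacity in this sketch */
#define MAX_DEPTH 16    /* maximum nesting of active expansions */

typedef struct {
    const char *name;
    const char *replacements[MAX_TOKENS];   /* NULL-terminated */
} macro_t;

static const macro_t macros[] = {
    { "foo", { "foo", NULL } },     /* #define foo foo */
    { "bar", { "foo", NULL } },     /* #define bar foo */
};

/* Stack of identifiers whose expansion is currently in progress. */
typedef struct {
    const char *names[MAX_DEPTH];
    int depth;
} active_stack_t;

static const macro_t *lookup(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof macros / sizeof macros[0]; i++)
        if (strcmp(macros[i].name, name) == 0)
            return &macros[i];
    return NULL;
}

static int is_active(const active_stack_t *stack, const char *name)
{
    int i;

    for (i = 0; i < stack->depth; i++)
        if (strcmp(stack->names[i], name) == 0)
            return 1;
    return 0;
}

/* Append the expansion of 'token' to 'out', which holds 'size' bytes. */
static void expand(const char *token, active_stack_t *stack,
                   char *out, size_t size)
{
    const macro_t *macro = lookup(token);
    int i;

    /* Emit the token unchanged if it is not a macro, if it is anywhere
     * on the stack of active expansions, or if the stack is full. */
    if (macro == NULL || is_active(stack, token) ||
        stack->depth == MAX_DEPTH) {
        if (out[0] != '\0')
            strncat(out, " ", size - strlen(out) - 1);
        strncat(out, token, size - strlen(out) - 1);
        return;
    }

    stack->names[stack->depth++] = token;
    for (i = 0; i < MAX_TOKENS && macro->replacements[i] != NULL; i++)
        expand(macro->replacements[i], stack, out, size);
    stack->depth--;
}

/* Expand a single token into 'out', starting with an empty stack. */
static const char *expand_one(const char *token, char *out, size_t size)
{
    active_stack_t stack = { { NULL }, 0 };

    assert(size > 0);
    out[0] = '\0';
    expand(token, &stack, out, size);
    return out;
}
```

Expanding "bar" pushes "bar", then "foo", and stops when "foo" shows up
again in its own replacement list. Checking only against "bar" would
never terminate here.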
The first bug was not allowing whitespace between '#' and the
directive name.
The second bug was swallowing a terminating newline along with any
trailing whitespace on a line.
With these two fixes, and the previous commit to stop emitting SPACE
tokens, the recently added extra-whitespace test now passes.
This reverts the unconditional return of SPACE tokens from the lexer
from commit 48b94da099.
That commit seemed useful because it kept the lexer simpler, but the
presence of SPACE tokens is causing lots of extra complication for the
parser itself, (productions that are redundant apart from whitespace
differences, several productions that are buggy in the case of extra
whitespace, etc.)
Of course, we'd prefer to never have any whitespace token, but that's
not possible with the need to distinguish between "#define foo()" and
"#define foo ()". So we'll accept a little bit of pain in the lexer,
(enough state to support this special-case token), in exchange for
keeping most of the parser blissfully ignorant of whether tokens are
separated by whitespace or not.
This change does mean that our output now differs from that of "gcc -E",
but only in whitespace. So we test with "diff -w" now to ignore those
differences.
We were correctly parsing this already, but simply not returning any
value (for no good reason). Fortunately the fix is quite simple.
This makes the test added in the previous commit now pass.
The macro invocation is defined to consume all text between a set of
matched parentheses. We previously tested for inner parentheses from a
nested function-like macro invocation. Here we test for inner
parentheses occurring on their own, (not part of another macro
invocation).
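
The matching rule can be sketched with a simple depth counter. This is
an illustrative C helper only (find_matching_paren is a hypothetical
name; the parser expresses this through its grammar productions rather
than a function like this):

```c
#include <assert.h>
#include <stddef.h>

/* Given text whose first character is '(', return the index of the
 * matching ')'. Return -1 if the text does not start with '(' or if
 * the parentheses are unbalanced. */
static long find_matching_paren(const char *text)
{
    long depth = 0;
    long i;

    assert(text != NULL);
    if (text[0] != '(')
        return -1;

    for (i = 0; text[i] != '\0'; i++) {
        if (text[i] == '(') {
            depth++;                /* entering a nested group */
        } else if (text[i] == ')') {
            depth--;
            if (depth == 0)
                return i;           /* this one closes the invocation */
        }
    }
    return -1;                      /* ran out of text while nested */
}
```

The counter does not care whether an inner pair belongs to another
macro invocation or stands on its own, which is the case this test
exercises.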
We provide for this by changing the value of the argument-list
production from a list of strings (string_list_t) to a new
data structure that holds a list of lists of strings
(argument_list_t).
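
For illustration, the two types might be shaped roughly as follows.
Only the type names string_list_t and argument_list_t come from this
change; the node types, fields, and helper functions below are
assumptions made for this sketch, and cleanup is omitted for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct string_node {
    const char *str;
    struct string_node *next;
} string_node_t;

/* A list of strings: the tokens of a single argument. */
typedef struct {
    string_node_t *head;
    string_node_t *tail;
} string_list_t;

typedef struct argument_node {
    string_list_t *argument;
    struct argument_node *next;
} argument_node_t;

/* A list of lists of strings: every argument of one invocation. */
typedef struct {
    argument_node_t *head;
    argument_node_t *tail;
} argument_list_t;

/* Allocate memory or abort on failure. */
static void *xmalloc(size_t size)
{
    void *p = malloc(size);

    if (p == NULL)
        abort();
    return p;
}

static string_list_t *string_list_create(void)
{
    string_list_t *list = xmalloc(sizeof *list);

    list->head = list->tail = NULL;
    return list;
}

static void string_list_append(string_list_t *list, const char *str)
{
    string_node_t *node = xmalloc(sizeof *node);

    assert(list != NULL);
    node->str = str;
    node->next = NULL;
    if (list->tail != NULL)
        list->tail->next = node;
    else
        list->head = node;
    list->tail = node;
}

static argument_list_t *argument_list_create(void)
{
    argument_list_t *list = xmalloc(sizeof *list);

    list->head = list->tail = NULL;
    return list;
}

static void argument_list_append(argument_list_t *list,
                                 string_list_t *argument)
{
    argument_node_t *node = xmalloc(sizeof *node);

    assert(list != NULL);
    node->argument = argument;
    node->next = NULL;
    if (list->tail != NULL)
        list->tail->next = node;
    else
        list->head = node;
    list->tail = node;
}

static int argument_list_length(const argument_list_t *list)
{
    const argument_node_t *node;
    int length = 0;

    for (node = list->head; node != NULL; node = node->next)
        length++;
    return length;
}

/* Return the argument at 0-based 'index', or NULL if out of range. */
static string_list_t *argument_list_member_at(const argument_list_t *list,
                                              int index)
{
    const argument_node_t *node = list->head;

    if (index < 0)
        return NULL;
    while (node != NULL && index > 0) {
        node = node->next;
        index--;
    }
    return node != NULL ? node->argument : NULL;
}
```

An invocation such as "foo(a b, c)" would then be held as two
arguments, the first with the strings "a" and "b" and the second with
just "c".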
Then we print the final string list up at the top-level content
production along with all other printing.
Additionally, having macro-expansion productions that create values
will make it easier to solve problems like composed function-like
macro invocations in the future.
Previously, printing was occurring all over the place. Here we
document that it should all be happening at the top-level content
production, and we move the printing of directive newlines.
The printing of expanded macros is still happening in lower-level
productions, but we plan to fix that soon.
Instead of "parameter_list" and "replacement_list" just use
"parameters" and "replacements". This is consistent with the existing
"arguments" and keeps the line length down now that the type name is
the longer "string_list_t" rather than "list_t".