The take-2 branch started over with a new grammar based directly on
the grammar from the C99 specification. It doesn't try to capture
things like balanced sets of parentheses for macro arguments in the
grammar. Instead, it merely captures things as token lists and then
performs operations like parsing arguments and expanding macros on
those lists.
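For concreteness, here is a minimal sketch of the kind of token list
such an approach revolves around, (the names are illustrative, not
necessarily those used on the take-2 branch; later sketches below
build on these two types):

/* Illustrative token-list types: a singly-linked list with a
 * tail pointer so that appending is O(1). */
typedef struct token {
    int type;                /* e.g. IDENTIFIER, COMMA, NEWLINE */
    char *value;             /* the token's textual value */
    struct token *next;
} token_t;

typedef struct token_list {
    token_t *head;
    token_t *tail;
} token_list_t;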
We merge it here since it's currently behaving better, (passing the
entire test suite). But the code base has proven quite fragile:
several of the recently added test cases required additional special
cases in the take-2 branch while working trivially on master.
So this merge point may be useful in the future, since we might have a
cleaner code base by coming back to the state before this merge and
fixing it, rather than accepting all the fragile
imperative/list-munging code from the take-2 branch.
The 071-punctuator test is failing only trivially, (a whitespace-only
difference).
And the 072-token-pasting-same-line.c test passes just fine here, (more
evidence perhaps that the approach in take-2 is more trouble than it's
worth?).
The 099-c99-example test case is the inspiration for much of the rest
of the test suite. It amazingly passes on the take-2 branch, but
doesn't pass here yet.
The list replacement when token pasting was broken, (failing to
properly update the list's tail pointer). Also, memory management
when pasting was broken, (modifying the original token's string,
which would cause problems with multiple calls to a macro that pasted
a literal string). We didn't catch this with previous tests because
they only pasted argument values.
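A sketch of the safer approach, (building on the token types sketched
earlier; _token_paste is a hypothetical name):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Paste two tokens by allocating a fresh string, rather than
 * modifying either original token in place, (so a macro that
 * pastes a literal string can be invoked repeatedly). */
token_t *
_token_paste (token_t *a, token_t *b)
{
    token_t *pasted = malloc (sizeof *pasted);
    size_t len = strlen (a->value) + strlen (b->value) + 1;

    pasted->type = a->type;
    pasted->value = malloc (len);
    snprintf (pasted->value, len, "%s%s", a->value, b->value);
    pasted->next = NULL;

    return pasted;
}

The separate tail-pointer bug is then a matter of remembering, when
the pasted token replaces the final tokens of a list, to point the
list's tail at the replacement.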
Previously '=' was not included in our PUNCTUATION regular expression,
but it *was* excluded from our OTHER regular expression, so we were
getting the default (and harmful) lex action of just printing it.
The test we add here is named "punctuator" with the idea that we can
extend it as needed for other punctuator testing.
This isn't a problem here, but on the take-2 branch, it was trickier
at one point to make a non-macro work when it appeared as the last
token of the file.
So we use the simpler test case here and defer the other case until
later.
We take the results of macro expansion and splice them into the
original token list over which we are iterating. This makes it easy
for function-like macro invocations to find their arguments since they
are simply subsequent tokens on the list.
This fixes the recently-introduced regressions (tests 55 and 56) and
also passes new tests 60 and 61 introduced to stress this feature,
(with macro-argument parentheses split between a macro value and the
textual input).
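A sketch of that splice, (names illustrative; `prev' is the node
before `node', or NULL when `node' is the head):

/* Replace `node' in `list' with the tokens of `expansion',
 * keeping the head and tail pointers consistent. */
void
_token_list_splice (token_list_t *list, token_t *prev,
                    token_t *node, token_list_t *expansion)
{
    token_t *first = expansion->head;
    token_t *last = expansion->tail;

    if (first == NULL) {
        /* Empty expansion: simply unlink `node'. */
        if (prev)
            prev->next = node->next;
        else
            list->head = node->next;
        if (list->tail == node)
            list->tail = prev;
        return;
    }

    last->next = node->next;
    if (prev)
        prev->next = first;
    else
        list->head = first;
    if (list->tail == node)
        list->tail = last;
}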
We previously had a confusing convention where _expand_token_onto would
return a non-zero value to indicate that the caller should then call
_expand_function_onto. It's much cleaner for _expand_token_onto to
just do what's needed and call the necessary function.
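In outline, the cleaner shape is, (a sketch; parser_t, macro_t,
_macro_lookup, _token_list_append, and _expand_object_onto are
hypothetical stand-ins):

/* _expand_token_onto now simply does whatever the token
 * requires, including invoking _expand_function_onto itself. */
static void
_expand_token_onto (parser_t *parser, token_t *token,
                    token_list_t *result)
{
    macro_t *macro = _macro_lookup (parser, token);

    if (macro == NULL) {
        /* Not a macro: pass the token through unchanged. */
        _token_list_append (result, token);
        return;
    }

    if (macro->is_function)
        _expand_function_onto (parser, token, result);
    else
        _expand_object_onto (parser, macro, result);
}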
This behavior was useful when starting the implementation over
("take-2") where the whole test suite was failing. This made it easy
to focus on one test at a time and get each working.
More recently, we got the whole suite working, so we don't need this
feature anymore. And in the previous commit, we regressed a couple of
tests, so it's nice to be able to see all the failures with a single
run of the suite.
Recently I'm seeing cases where "gcc -E" mysteriously omits blank
lines, (even though it prints the blank lines in other very similar
cases). Rather than trying to decipher and imitate this, just get rid
of the blank lines.
This approach with sed to kill the lines before the diff is better
than "diff -B" since when there is an actual difference, the presence
of blank lines won't make the diff harder to read.
This test was tricky to make pass in the take-2 branch. It ends up
passing already here with no additional effort, (since we are lexing
integers as string-valued tokens except when in the ST_IF state in the
lexer anyway).
To do this correctly, we change the lexer to lex integers as string values,
(new token type of INTEGER_STRING), and only convert to integer values when
evaluating an expression value.
Add a new test case for this, (which does pass now).
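A sketch of the late conversion, (the helper name is hypothetical;
base 0 lets strtol accept decimal, octal, and hexadecimal spellings
alike):

#include <stdlib.h>

/* An INTEGER_STRING token keeps the literal's original spelling;
 * only expression evaluation converts it to a value. */
static long
_integer_string_value (const char *str)
{
    return strtol (str, NULL, 0);
}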
For this we always add a new argument to the argument list as soon as
possible, without waiting until we see some argument token. This does
mean we need to take some extra care when comparing the number of
arguments with the number of expected arguments. In addition to
matching numbers, we also support one (empty) argument when zero
arguments are expected.
Add a test case here for this, which does pass.
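The resulting arity check looks roughly like this, (a sketch with
illustrative names):

/* With an argument created eagerly, an invocation like foo()
 * parses as one empty argument, which we accept when the macro
 * expects zero arguments. */
static int
_arguments_match (int num_args, int num_expected,
                  int single_argument_is_empty)
{
    if (num_args == num_expected)
        return 1;
    if (num_expected == 0 && num_args == 1 && single_argument_is_empty)
        return 1;
    return 0;
}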
This just makes these functions easier to understand all around. In
the case of _token_list_append_list this is an actual bug fix, (where
appending an empty list onto a non-empty list would previously
scramble the tail pointer of the original list).
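The fix amounts to an early return for the empty case, roughly:

/* Sketch of _token_list_append_list: appending an empty list
 * must leave the original list, including its tail pointer,
 * untouched. */
void
_token_list_append_list (token_list_t *list, token_list_t *tail)
{
    if (tail == NULL || tail->head == NULL)
        return;

    if (list->head == NULL)
        list->head = tail->head;
    else
        list->tail->next = tail->head;

    list->tail = tail->tail;
}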
That is, a function-like invocation foo(x) is valid as a
single-argument invocation even if 'x' is a macro that expands into a
value with a comma. Add a new COMMA_FINAL token type to handle this,
and add a test for this case, (which passes).
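A sketch of the idea, (applied to a macro's expansion before it is
spliced back into the token stream; the function name is
hypothetical):

/* Commas that come from macro expansion must not separate
 * arguments of any enclosing invocation, so demote them to
 * COMMA_FINAL; they still print as "," but argument parsing
 * ignores them. */
static void
_token_list_demote_commas (token_list_t *expansion)
{
    token_t *token;

    for (token = expansion->head; token; token = token->next) {
        if (token->type == COMMA)
            token->type = COMMA_FINAL;
    }
}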
That is, the following case:
#define foo(x) (x)
#define bar foo
bar(baz)
which now works with this (ugly) commit.
I definitely want to come up with something cleaner than this.
This adds three new pieces of state to the parser, (is_control_line,
newline_as_space, and paren_count), and a large amount of messy
code. I'd definitely like to see a cleaner solution for this.
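Roughly, the token filtering between the lexer and parser works like
this, (a sketch; LPAREN, RPAREN, and the filter function are
illustrative, while the three state fields are the ones named above):

/* While gathering a function-like macro invocation that is not
 * on a control line, a NEWLINE is only whitespace, so present
 * it to the parser as a SPACE token. */
static void
_filter_token (parser_t *parser, token_t *token)
{
    if (parser->newline_as_space && token->type == NEWLINE) {
        token->type = SPACE;
        return;
    }

    if (token->type == LPAREN) {
        parser->paren_count++;
    } else if (token->type == RPAREN) {
        parser->paren_count--;
        if (parser->paren_count == 0)
            parser->newline_as_space = 0;
    }
}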
With this fix, the "define-func-extra-newlines" now passes so we put
it back to test #26 where it was originally (lately it has been known
as test #55).
Also, we tweak test 25 slightly. Previously this test ended the file
with a function-like macro name that was not actually a macro
invocation, (not followed by a left parenthesis). As is, this fix was
making that test fail because the text_line production expects to see
a terminating NEWLINE, but that NEWLINE is now getting turned into a
SPACE here.
This seems unlikely to be a problem in the wild, (function-like
macros being used in a non-macro sense seems rare enough, and rarer
still at the very end of a file). Still, we document this shortcoming
in the README.
Previously, the syntax was (array_ref <variable name> <index>), but the
subject is now a general rvalue (not a name). In particular, it might
be a (var_ref ...).
Also, remove "expected ... or (swiz)" from error messages; swiz is not
allowed inside a var_ref.
Array dereferences now point to variable dereferences instead of
pointing directly to variables. This necessitated some changes to the
way the variable is accessed when setting the maximum array element
index.
Move the accept method for hierarchical visitors from ir_dereference
to the derived classes. This was mostly straightforward, but I
suspect that ir_dead_code_local may be broken now.
Create separate subclasses of ir_dereference for variable, array, and
record dereferences. As a side effect, array and record dereferences
no longer point to ir_variable objects directly. Instead they each
point to an ir_dereference_variable object.
This is the first of several steps in the refactoring process. The
intention is that ir_dereference will eventually become an abstract
base class.
To do this we have split the existing "HASH_IF expression" into two
productions:
First is "HASH_IF pp_tokens", which simply constructs a list of tokens.
Then, with that resulting token list, we first evaluate all DEFINED
operator tokens, then expand all macros, and finally start lexing from
the resulting token list. This brings us to the second production:
IF_EXPANDED expression
This final production works just like our previous "HASH_IF
expression", evaluating a constant integer expression.
The new test (54) added for this case now passes.