Commit graph

141 commits

Author SHA1 Message Date
Carl Worth
2571415d1a Implement comment handling in the lexer (with test).
We support both single-line (//) and multi-line (/* ... */) comments
and add a test for this, (trying to stress the rules just a bit by
embedding one comment delimiter into a comment delimited with the
other style, etc.).

To keep the test suite passing we do now discard any output lines from
glcpp that consist only of spacing, (in addition to blank lines as
previously). We also discard any initial whitespace from gcc output.
In neither case should the absence or presence of this whitespace
affect correctness.
2010-06-01 12:18:43 -07:00
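The comment rules described above can be illustrated with a small state-machine sketch (Python here for brevity; the actual glcpp code uses flex rules, and the function name is hypothetical). The key property being tested: while inside one comment style, delimiters of the other style are inert.

```python
def strip_comments(src):
    """Strip // and /* */ comments, treating a block comment as a space.

    A scanner with explicit states, mimicking what a lexer does: while
    inside one comment style, delimiters of the other style are ignored.
    """
    out = []
    i, n = 0, len(src)
    while i < n:
        if src.startswith('//', i):          # line comment: skip to newline
            while i < n and src[i] != '\n':
                i += 1
        elif src.startswith('/*', i):        # block comment: skip to */
            i += 2
            while i < n and not src.startswith('*/', i):
                i += 1
            i = min(i + 2, n)
            out.append(' ')                  # the comment separates tokens
        else:
            out.append(src[i])
            i += 1
    return ''.join(out)
```

Note that a `//` inside a `/* ... */` comment does not restart a line comment, and a `/*` inside a `//` comment does not open a block comment, matching the cross-embedding cases the test stresses.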
Carl Worth
a771a40e22 Fix #if-skipping to *really* skip the skipped group.
Previously we were avoiding printing within a skipped group, but we
were still evaluating directives such as #define and #undef and still
emitting diagnostics for things such as macro calls with the wrong
number of arguments.

Add a test for this and fix it with a high-priority rule in the lexer
that consumes the skipped content.
2010-06-01 11:23:08 -07:00
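The skipping behavior can be sketched as follows (a Python illustration of the idea, not the actual high-priority lexer rule; the helper name is hypothetical). The point is that nothing inside the skipped range is evaluated, while nesting is still tracked so the right #else/#elif/#endif ends the group:

```python
def skip_group(lines, start):
    """Consume a false #if group without evaluating its contents.

    `lines` is a list of source lines and `start` indexes the line just
    after the failed conditional.  Returns the index of the #else/#elif/
    #endif at the same nesting depth.  Directives such as #define inside
    the skipped range are never evaluated.
    """
    depth = 0
    i = start
    while i < len(lines):
        d = lines[i].strip()
        if d.startswith('#if'):          # also matches #ifdef/#ifndef
            depth += 1                   # nested conditionals stay skipped
        elif d.startswith('#endif'):
            if depth == 0:
                return i
            depth -= 1
        elif depth == 0 and (d.startswith('#else') or d.startswith('#elif')):
            return i
        i += 1
    return i
```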
Carl Worth
96d3994881 Merge branch 'take-2'
The take-2 branch started over with a new grammar based directly on
the grammar from the C99 specification. It doesn't try to capture
things like balanced sets of parentheses for macro arguments in the
grammar. Instead, it merely captures things as token lists and then
performs operations like parsing arguments and expanding macros on
those lists.

We merge it here since it's currently behaving better, (passing the
entire test suite). But the code base has proven quite fragile
really. Several of the recently added test cases required additional
special cases in the take-2 branch while working trivially on master.

So this merge point may be useful in the future, since we might have a
cleaner code base by coming back to the state before this merge and
fixing it, rather than accepting all the fragile
imperative/list-munging code from the take-2 branch.
2010-05-29 06:03:40 -07:00
Carl Worth
ae3fb09cd2 Add three more test cases recently added to the take-2 branch.
The 071-punctuator test is failing only trivially, (a whitespace change).

And the 072-token-pasting-same-line.c test passes just fine here, (more
evidence perhaps that the approach in take-2 is more trouble than it's
worth?).

The 099-c99-example test case is the inspiration for much of the rest
of the test suite. It amazingly passes on the take-2 branch, but
doesn't pass here yet.
2010-05-29 06:01:32 -07:00
Carl Worth
75ef1c75dd Add killer test case from the C99 specification.
Happily, this passes now, (since many of the previously added test
cases were extracted from this one).
2010-05-29 05:57:22 -07:00
Carl Worth
b06096e86e Add test and fix bugs with multiple token-pasting on the same line.
The list replacement when token pasting was broken, (failing to
properly update the list's tail pointer). Also, memory management when
pasting was broken, (modifying the original token's string which would
cause problems with multiple calls to a macro which pasted a literal
string). We didn't catch this with previous tests because they only
pasted argument values.
2010-05-29 05:54:19 -07:00
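The tail-pointer bug described here can be shown with a minimal linked-list sketch (Python stand-ins with hypothetical names; the real glcpp lists are C structs). If the replaced node was the tail, the tail pointer must follow the replacement, or later appends land in the wrong place. The second bug, modifying the original token's string, does not arise in this sketch since Python strings are immutable.

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.next = None

class TokenList:
    """Singly linked list with a tail pointer."""
    def __init__(self, vals=()):
        self.head = self.tail = None
        for v in vals:
            self.append(v)

    def append(self, val):
        node = Node(val)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def values(self):
        out, n = [], self.head
        while n:
            out.append(n.val)
            n = n.next
        return out

    def replace_node(self, prev, node, repl):
        """Replace `node` (predecessor `prev`, or None at the head) with
        the nodes of list `repl`.  The subtle part: when `node` was the
        tail, the tail must be re-pointed, or later appends corrupt
        the list."""
        if repl.head is None:
            nxt = node.next
            if prev is None:
                self.head = nxt
            else:
                prev.next = nxt
            if node is self.tail:
                self.tail = prev
            return
        if prev is None:
            self.head = repl.head
        else:
            prev.next = repl.head
        repl.tail.next = node.next
        if node is self.tail:
            self.tail = repl.tail
```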
Carl Worth
631016946c Fix pass-through of '=' and add a test for it.
Previously '=' was not included in our PUNCTUATION regular expression,
but it *was* excluded from our OTHER regular expression, so we were
getting the default (and harmful) lex action of just printing it.

The test we add here is named "punctuator" with the idea that we can
extend it as needed for other punctuator testing.
2010-05-29 05:07:24 -07:00
Carl Worth
614a9aece0 Add two more (failing) tests from the take-2 branch.
These tests were recently fixed on the take-2 branch, but will require
additional work before they will pass here.
2010-05-28 15:15:59 -07:00
Carl Worth
b1249f69fd Add two (passing) tests from the take-2 branch.
These two tests were tricky to make work on take-2, but happen to
already be working here.
2010-05-28 15:15:00 -07:00
Carl Worth
792bdcbeee Tweak test 25 slightly, (so the non-macro doesn't end the file).
This isn't a problem here, but on the take-2 branch, it was trickier
at one point to make a non-macro work when it was the last token of the file.

So we use the simpler test case here and defer the other case until
later.
2010-05-28 15:13:11 -07:00
Carl Worth
c7144dc2e0 Remove some blank lines from the end of some test cases.
To match what we have done on the take-2 branch to these test cases.
2010-05-28 15:12:36 -07:00
Carl Worth
681afbc855 Perform macro expansion by replacing tokens in the original list.
We take the results of macro expansion and splice them into the
original token list over which we are iterating. This makes it easy
for function-like macro invocations to find their arguments since they
are simply subsequent tokens on the list.

This fixes the recently-introduced regressions (tests 55 and 56) and
also passes new tests 60 and 61 introduced to stress this feature,
(with macro-argument parentheses split between a macro value and the
textual input).
2010-05-28 15:10:27 -07:00
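The splicing approach can be sketched in Python (glcpp itself works on a C linked list; this sketch assumes well-formed input and omits recursion guards). Because replacements are spliced into the list being scanned and rescanned in place, a function-like invocation whose parentheses are split between a macro body and the original text is still recognized:

```python
def expand(toks, objs, funcs):
    """objs: name -> replacement token list (object-like macros)
    funcs: name -> (parameter list, body token list) (function-like)
    Expand macros by splicing replacements into the token list under
    iteration, then rescanning from the splice point."""
    toks = list(toks)
    i = 0
    while i < len(toks):
        t = toks[i]
        if t in objs:
            toks[i:i + 1] = objs[t]          # splice, then rescan from i
        elif t in funcs and i + 1 < len(toks) and toks[i + 1] == '(':
            params, body = funcs[t]
            j, depth, args, cur = i + 2, 1, [], []
            while depth:                     # collect arguments up to ')'
                tok = toks[j]
                if tok == '(':
                    depth += 1
                elif tok == ')':
                    depth -= 1
                    if depth == 0:
                        break
                if tok == ',' and depth == 1:
                    args.append(cur)
                    cur = []
                else:
                    cur.append(tok)
                j += 1
            args.append(cur)
            sub = dict(zip(params, args))
            repl = []
            for b in body:
                if b in sub:
                    repl.extend(sub[b])      # parameter substitution
                else:
                    repl.append(b)
            toks[i:j + 1] = repl             # splice expansion, rescan
        else:
            i += 1
    return toks
```

The second test below mirrors the split-parenthesis case: the '(' comes from a macro value while the ')' comes from the textual input.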
Carl Worth
3c93d39705 Simplify calling conventions of functions under expand_token_list_onto.
We previously had a confusing thing where _expand_token_onto would
return a non-zero value to indicate that the caller should then call
_expand_function_onto. It's much cleaner for _expand_token_onto to
just do what's needed and call the necessary function.
2010-05-28 08:17:46 -07:00
Carl Worth
9b519f9c79 Stop interrupting the test suite at the first failure.
This behavior was useful when starting the implementation over
("take-2") where the whole test suite was failing. This made it easy
to focus on one test at a time and get each working.

More recently, we got the whole suite working, so we don't need this
feature anymore. And in the previous commit, we regressed a couple of
tests, so it's nice to be able to see all the failures with a single
run of the suite.
2010-05-28 08:04:13 -07:00
Carl Worth
95ec433d59 Revert "Add support for an object-to-function chain with the parens in the content."
This reverts commit 7db2402a80

It doesn't revert the new test case from that commit, just the
extremely ugly second-pass implementation.
2010-05-28 08:02:07 -07:00
Carl Worth
baa17c8748 Remove blank lines from output files before comparing.
Recently I'm seeing cases where "gcc -E" mysteriously omits blank
lines, (even though it prints the blank lines in other very similar
cases). Rather than trying to decipher and imitate this, just get rid
of the blank lines.

This approach with sed to kill the lines before the diff is better
than "diff -B" since when there is an actual difference, the presence
of blank lines won't make the diff harder to read.
2010-05-27 14:53:51 -07:00
Carl Worth
886e05a35a Add test for token-pasting of integers.
This test was tricky to make pass in the take-2 branch. It ends up
passing already here with no additional effort, (since we are lexing
integers as string-valued tokens except when in the ST_IF state in the
lexer anyway).
2010-05-27 14:45:20 -07:00
Carl Worth
050e3ded1e Implement token pasting of integers.
To do this correctly, we change the lexer to lex integers as string values,
(new token type of INTEGER_STRING), and only convert to integer values when
evaluating an expression value.

Add a new test case for this, (which does pass now).
2010-05-27 14:38:20 -07:00
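The INTEGER_STRING idea reduces to this (a trivial Python sketch; function names are hypothetical): pasting operates on the spelling, and conversion to a numeric value is deferred until expression evaluation.

```python
def paste(a, b):
    # Token pasting of string-valued integer tokens: purely textual.
    return a + b

def evaluate(token):
    # Deferred conversion: only here does the spelling become a number.
    # Base 0 also accepts 0x... hexadecimal forms.
    return int(token, 0)
```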
Carl Worth
85b50e840d Add placeholder tokens to support pasting with empty arguments.
Along with a passing test to verify that this works.
2010-05-27 14:01:18 -07:00
Carl Worth
fb48fcdf9b Add test for macro invocations with empty arguments.
This case was recently solved on the take-2 branch.
2010-05-27 13:44:13 -07:00
Carl Worth
a19297b26e Provide support for empty arguments in macro invocations.
For this we always add a new argument to the argument list as soon as
possible, without waiting until we see some argument token. This does
mean we need to take some extra care when comparing the number of
arguments with the number of expected arguments. In addition to
matching numbers, we also support one (empty) argument when zero
arguments are expected.

Add a test case here for this, which does pass.
2010-05-27 13:29:19 -07:00
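The approach can be sketched as follows (Python illustration with hypothetical names): an argument slot is created as soon as the invocation opens, and again right after every top-level comma, so empty arguments survive; the arity check then accepts one empty argument where zero are expected.

```python
def parse_args(toks):
    """Collect function-like macro arguments from the tokens following
    the opening '('.  An argument exists immediately, even if no token
    ever lands in it."""
    args = [[]]
    depth = 1
    for t in toks:
        if t == '(':
            depth += 1
        elif t == ')':
            depth -= 1
            if depth == 0:
                return args
        if t == ',' and depth == 1:
            args.append([])      # next (possibly empty) argument starts now
        else:
            args[-1].append(t)
    raise SyntaxError("unterminated macro invocation")

def arity_ok(args, nparams):
    # A zero-parameter macro invoked as foo() sees one empty argument.
    if nparams == 0:
        return args == [[]]
    return len(args) == nparams
```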
Carl Worth
a65cf7b1d2 Make two list-processing functions do nothing with an empty list.
This just makes these functions easier to understand all around.  In
the case of _token_list_append_list this is an actual bug fix, (where
appending an empty list onto a non-empty list would previously scramble
the tail pointer of the original list).
2010-05-27 11:55:36 -07:00
Carl Worth
602a34769a Add test 56 for a comma within the expansion of an argument.
This case was tricky on the take-2 branch. It happens to be passing already
here.
2010-05-27 10:14:38 -07:00
Carl Worth
dd7490093d Avoid treating an expanded comma as an argument separator.
That is, a function-like invocation foo(x) is valid as a
single-argument invocation even if 'x' is a macro that expands into a
value with a comma. Add a new COMMA_FINAL token type to handle this,
and add a test for this case, (which passes).
2010-05-27 10:12:33 -07:00
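The COMMA_FINAL idea can be sketched like this (a Python illustration, not glcpp's parser): commas that arrive via macro expansion carry a distinct token type and stay inside the current argument, while only source-level commas separate arguments.

```python
from collections import namedtuple

Token = namedtuple('Token', 'type value')

def split_args(tokens):
    """Split a flat argument token stream on separators.  COMMA tokens
    separate arguments; COMMA_FINAL tokens (commas produced by
    expansion) are ordinary content."""
    args = [[]]
    for t in tokens:
        if t.type == 'COMMA':          # a real separator from the source
            args.append([])
        else:                          # COMMA_FINAL and everything else
            args[-1].append(t.value)
    return args
```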
Carl Worth
7db2402a80 Add support (and test) for an object-to-function chain with the parens in the content.
That is, the following case:

	#define foo(x) (x)
	#define bar
	bar(baz)

which now works with this (ugly) commit.

I definitely want to come up with something cleaner than this.
2010-05-26 17:01:57 -07:00
Carl Worth
a8ea26d7c9 Add two tests developed on the take-2 branch.
The define-chain-obj-to-func-parens-in-text test passes here while the
if-with-macros test fails.
2010-05-26 16:18:05 -07:00
Carl Worth
95951ea7bb Treat newlines as space within a function-like macro invocation.
This adds three new pieces of state to the parser, (is_control_line,
newline_as_space, and paren_count), and a large amount of messy
code. I'd definitely like to see a cleaner solution for this.

With this fix, the "define-func-extra-newlines" now passes so we put
it back to test #26 where it was originally (lately it has been known
as test #55).

Also, we tweak test 25 slightly. Previously this test was ending the
file with a function-like macro name that was not actually a macro, (not
followed by a left parenthesis). As is, this fix was making that test
fail because the text_line production expects to see a terminating
NEWLINE, but that NEWLINE is now getting turned into a SPACE here.

This seems unlikely to be a problem in the wild, (function macros
being used in a non-macro sense seems rare enough---but more than
likely they won't happen at the end of a file). Still, we document
this shortcoming in the README.
2010-05-26 16:04:31 -07:00
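The newline_as_space/paren_count state can be sketched on a plain token stream (a Python illustration only; the real code is parser state and also tracks is_control_line to leave directives alone):

```python
def newlines_to_spaces(toks, func_macros):
    """Once a function-like macro name is seen, emit newlines as spaces
    until the invocation's parentheses close.  A name followed by
    something other than '(' cancels the mode, except that nothing can
    cancel it at end of input (the README shortcoming noted above)."""
    out = []
    newline_as_space = False
    paren_count = 0
    for t in toks:
        if newline_as_space:
            if t == '\n':
                out.append(' ')          # newline becomes a space
                continue
            if t == '(':
                paren_count += 1
            elif paren_count == 0:
                newline_as_space = False  # name wasn't an invocation
            elif t == ')':
                paren_count -= 1
                if paren_count == 0:
                    newline_as_space = False
        if not newline_as_space and t in func_macros:
            newline_as_space = True
            paren_count = 0
        out.append(t)
    return out
```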
Carl Worth
0324cad796 All macro lookups should be of type macro_t, not string_list_t.
This is what I get for using a non-type-safe hash-table implementation.
2010-05-26 15:53:05 -07:00
Carl Worth
8e82fcb070 Implement (and test) support for macro expansion within conditional expressions.
To do this we have split the existing "HASH_IF expression" into two
productions:

First is HASH_IF pp_tokens which simply constructs a list of tokens.

Then, with that resulting token list, we first evaluate all DEFINED
operator tokens, then expand all macros, and finally start lexing from
the resulting token list. This brings us to the second production,
IF_EXPANDED expression

This final production works just like our previous "HASH_IF
expression", evaluating a constant integer expression.

The new test (54) added for this case now passes.
2010-05-26 11:15:21 -07:00
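The two-stage scheme can be sketched as below (a Python illustration with a deliberately tiny final evaluator standing in for the IF_EXPANDED grammar; a real evaluator handles &&, ||, and the rest of the C operators, and macros here are object-like only):

```python
def eval_if(expr_tokens, macros):
    """Evaluate a conditional in the order this commit describes:
    resolve `defined` on the raw token list, then expand macros, and
    only then evaluate the constant integer expression."""
    # 1. defined NAME / defined(NAME) -> '1' or '0', before any expansion
    out = []
    i = 0
    while i < len(expr_tokens):
        t = expr_tokens[i]
        if t == 'defined':
            j = i + 1
            parens = expr_tokens[j] == '('
            name = expr_tokens[j + 1] if parens else expr_tokens[j]
            out.append('1' if name in macros else '0')
            i = j + (3 if parens else 1)
        else:
            out.append(t)
            i += 1
    # 2. expand remaining (object-like) macros
    out = [w for t in out for w in macros.get(t, [t])]
    # 3. evaluate the resulting constant expression
    return int(eval(' '.join(out)))
```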
Carl Worth
16c1e980e2 Fix lexing of "defined" as an operator, not an identifier.
Simply need to move the rule for IDENTIFIER to be after "defined" and
everything is happy.

With this change, tests 50 through 53 all pass now.
2010-05-26 09:37:14 -07:00
Carl Worth
f6914fd37b Implement #if and friends.
With this change, tests 41 through 49 all pass. (The defined operator
appears to be somehow broken so that test 50 doesn't pass yet.)
2010-05-26 09:33:23 -07:00
Carl Worth
8fed1cddae stash 2010-05-26 09:32:12 -07:00
Carl Worth
ad0dee6bb0 Implement token pasting.
Which makes test 40 now pass.
2010-05-26 09:04:50 -07:00
Carl Worth
ce540f2571 Rename identifier from 'i' to 'node'.
Now that we no longer have nested for loops with 'i' and 'j' we can
use the 'node' that we already have.
2010-05-26 08:30:36 -07:00
Carl Worth
63909fc196 Remove some stale token types.
All the code referencing these was removed some time ago.
2010-05-26 08:16:56 -07:00
Carl Worth
ec4ada01c0 Prevent unexpanded macros from being expanded again in the future.
With this fix, tests 37 - 39 now pass.
2010-05-26 08:15:49 -07:00
Carl Worth
c9dcc08d45 README: Document some known limitations.
None of these are fundamental---just a few things that haven't been
implemented yet.
2010-05-26 08:11:08 -07:00
Carl Worth
b1ae61a2ee Fix a typo in a comment.
Always better to use proper grammar in our grammar.
2010-05-26 08:10:38 -07:00
Carl Worth
d5cd40343f Expand macro arguments before performing argument substitution.
As required by the C99 specification of the preprocessor.

With this fix, tests 33 through 36 now pass.
2010-05-26 08:09:29 -07:00
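The argument-prescan order can be sketched as follows (Python, object-like macros only, and with no guard against self-reference; names are hypothetical): each argument is fully expanded before it replaces its parameter in the body.

```python
def substitute(body, params, args, macros):
    """Substitute arguments into a macro body, expanding each argument
    first, as C99 requires."""
    def expand(toks):
        out = []
        for t in toks:
            out.extend(expand(macros[t]) if t in macros else [t])
        return out
    # expand every argument once, up front
    expanded = {p: expand(a) for p, a in zip(params, args)}
    result = []
    for t in body:
        result.extend(expanded.get(t, [t]))
    return result
```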
Carl Worth
0197e9b64f Change macro expansion to append onto token lists rather than printing directly.
This doesn't change any functionality here, but will allow us to make
future changes that were not possible with direct printing.
Specifically, we need to expand macros within macro arguments before
performing argument substitution. And *that* expansion cannot result
in immediate printing.
2010-05-26 08:05:55 -07:00
Carl Worth
c0607d573e Check active expansions before expanding a function-like macro invocation.
With this fix, test 32 no longer recurses infinitely, but now passes.
2010-05-26 08:01:42 -07:00
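The active-expansion check can be sketched like this (a Python illustration of the idea behind this fix and the earlier "prevent unexpanded macros from being expanded again" commit; object-like macros only): a macro already on the active-expansion set is left unexpanded, so self-referential macros terminate instead of recursing forever.

```python
def expand(tokens, macros, active=frozenset()):
    """Expand object-like macros while refusing to re-expand any macro
    already being expanded."""
    out = []
    for t in tokens:
        if t in macros and t not in active:
            out.extend(expand(macros[t], macros, active | {t}))
        else:
            out.append(t)    # macro name survives unexpanded
    return out
```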
Carl Worth
039739b2da Defer test 26 until much later (to test 55).
Supporting embedded newlines in a macro invocation is going to be
tricky with our current approach to lexing and parsing. Since this
isn't really an important feature for us, we can defer this until more
important things are resolved.

With this test out of the way, tests 27 through 31 are passing.
2010-05-26 08:00:43 -07:00
Carl Worth
10ae438399 Avoid getting extra trailing whitespace from macros.
This trailing whitespace was coming from macro definitions and from
macro arguments. We fix this with a little extra state in the
token_list. It now remembers the last non-space token added, so that
these can be trimmed off just before printing the list.

With this fix test 23 now passes. Tests 24 and 25 are also passing,
but they probably would have before this fix---just that they weren't
being run earlier.
2010-05-25 20:39:33 -07:00
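The extra token_list state can be sketched as (Python, hypothetical names): the list remembers where the last non-space token landed, so everything after it can be dropped just before printing.

```python
class TokenList:
    """A token list that remembers the last non-space token appended so
    trailing whitespace can be trimmed before printing."""
    def __init__(self):
        self.tokens = []
        self.non_space_tail = -1      # index of last non-space token

    def append(self, tok):
        self.tokens.append(tok)
        if tok != ' ':
            self.non_space_tail = len(self.tokens) - 1

    def trimmed(self):
        # Everything past the last non-space token is trailing whitespace.
        return self.tokens[:self.non_space_tail + 1]
```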
Carl Worth
5aa7ea0809 Remove a bunch of old code and give the static treatment to what's left.
We're no longer using the expansion stack, so its functions can go
along with most of the body of glcpp_parser_lex that was using it.
2010-05-25 18:39:43 -07:00
Carl Worth
652fa272ea Avoid swallowing initial left parenthesis from nested macro invocation.
We weren't including this left parenthesis in the argument's token
list so the nested function invocation was not being recognized.

With this fix, tests 21 and 22 now pass.
2010-05-25 17:45:22 -07:00
Carl Worth
c7581c2e6e Ignore separating whitespace at the beginning of a macro argument.
This causes test 16 to pass. Tests 17-20 are also passing now, (though
they would probably have passed before this change and simply weren't
being run yet).
2010-05-25 17:41:07 -07:00
Carl Worth
9ce18cf983 Implement substitution of function parameters in macro calls.
This makes tests 16 - 19 pass.
2010-05-25 17:32:21 -07:00
Carl Worth
e9397867dd Collapse multiple spaces in input down to a single space.
This is what gcc does, and it's actually less work to do
this. Previously we were having to save the contents of space tokens
as a string, but we don't need to do that now.

We extend test #0 to exercise this feature here.
2010-05-25 17:08:07 -07:00
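The collapsing itself is a one-liner (sketched here in Python; the actual change is in the lexer's space rule):

```python
import re

def collapse_spaces(line):
    # Collapse any run of spaces/tabs down to a single space.
    return re.sub(r'[ \t]+', ' ', line)
```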
Carl Worth
f8ec4e0be8 Add a test #0 to ensure that we don't do any inadvertent token pasting.
This simply ensures that spaces in the input line are preserved.
2010-05-25 17:06:17 -07:00
Carl Worth
f34a0009dd Pass through literal space values from replacement lists.
This makes test 15 pass and also dramatically simplifies the lexer.

We were previously using a CONTROL state in the lexer to only emit
SPACE tokens when on text lines. But that's not actually what we
want. We need SPACE tokens in the replacement lists as well. Instead
of a lexer state for this, we now simply set a "space_tokens" flag
whenever we start constructing a pp_tokens list and clear the flag
whenever we see a '#' introducing a directive.

Much cleaner this way.
2010-05-25 17:06:08 -07:00