Commit graph

35 commits

Author SHA1 Message Date
Carl Worth
a771a40e22 Fix #if-skipping to *really* skip the skipped group.
Previously we were avoiding printing within a skipped group, but we
were still evluating directives such as #define and #undef and still
emitting diagnostics for things such as macro calls with the wrong
number of arguments.

Add a test for this and fix it with a high-priority rule in the lexer
that consumes the skipped content.
2010-06-01 11:23:08 -07:00
Carl Worth
631016946c Fix pass-through of '=' and add a test for it.
Previously '=' was not included in our PUNCTUATION regeular expression,
but it *was* excldued from our OTHER regular expression, so we were
getting the default (and hamful) lex action of just printing it.

The test we add here is named "punctuator" with the idea that we can
extend it as needed for other punctuator testing.
2010-05-29 05:07:24 -07:00
Carl Worth
050e3ded1e Implement token pasting of integers.
To do this correctly, we change the lexer to lex integers as string values,
(new token type of INTEGER_STRING), and only convert to integer values when
evaluating an expression value.

Add a new test case for this, (which does pass now).
2010-05-27 14:38:20 -07:00
Carl Worth
16c1e980e2 Fix lexing of "defined" as an operator, not an identifier.
Simply need to move the rule for IDENTIFIER to be after "defined" and
everything is happy.

With this change, tests 50 through 53 all pass now.
2010-05-26 09:37:14 -07:00
Carl Worth
8fed1cddae stash 2010-05-26 09:32:12 -07:00
Carl Worth
e9397867dd Collapse multiple spaces in input down to a single space.
This is what gcc does, and it's actually less work to do
this. Previously we were having to save the contents of space tokens
as a string, but we don't need to do that now.

We extend test #0 to exercise this feature here.
2010-05-25 17:08:07 -07:00
Carl Worth
f34a0009dd Pass through literal space values from replacement lists.
This makes test 15 pass and also dramatically simplifies the lexer.

We were previously using a CONTROL state in the lexer to only emit
SPACE tokens when on text lines. But that's not actually what we
want. We need SPACE tokens in the replacement lists as well. Instead
of a lexer state for this, we now simply set a "space_tokens" flag
whenever we start constructing a pp_tokens list and clear the flag
whenever we see a '#' introducing a directive.

Much cleaner this way.
2010-05-25 17:06:08 -07:00
Carl Worth
b1854fdfb6 Implement simplified substitution for function-like macro invocation.
This supports function-like macro invocation but without any argument
substitution. This now makes test 11 through 14 pass.
2010-05-25 16:28:26 -07:00
Carl Worth
9fb8b7a495 Make the lexer pass whitespace through (as OTHER tokens) for text lines.
With this change, we can recreate the original text-line input
exactly. Previously we were inserting a space between every pair of
tokens so our output had a lot more whitespace than our input.

With this change, we can drop the "-b" option to diff and match the
input exactly.
2010-05-25 15:04:32 -07:00
Carl Worth
3ff8167084 Starting over with the C99 grammar for the preprocessor.
This is a fresh start with a much simpler approach for the flex/bison
portions of the preprocessor. This isn't functional yet, (produces no
output), but can at least read all of our test cases without any parse
errors.

The grammar here is based on the grammar provided for the preprocessor
in the C99 specification.
2010-05-25 14:38:15 -07:00
Carl Worth
03f6d5d2d4 Add support for octal and hexadecimal integer literals.
In addition to the decimal literals which we already support. Note
that we use strtoll here to get the large-width integers demanded by
the specification.
2010-05-24 11:29:02 -07:00
Carl Worth
89b933a243 Add the '~' operator to the lexer.
This was simply missing before, (and unnoticed since we had no test of
the '~' operator).
2010-05-24 11:26:42 -07:00
Carl Worth
bcbd587b0f Implement all operators specified for GLSL #if expressions (with tests).
The operator coverage here is quite complete. The one big thing
missing is that we are not yet doing macro expansion in #if
lines. This makes the whole support fairly useless, so we plan to fix
that shortcoming right away.
2010-05-24 10:37:38 -07:00
Carl Worth
b20d33c5c6 Implement #if, #else, #elif, and #endif with tests.
So far the only expression implemented is a single integer literal,
but obviously that's easy to extend. Various things including nesting
are tested here.
2010-05-20 22:27:07 -07:00
Carl Worth
c10a51ba13 Pre-expand macro arguments at time of invocation.
Previously, we were using the same lexing stack as we use for macro
expansion to also expand macro arguments. Instead, we now do this
earlier by simply recursing over the macro-invocations replacement
list and constructing a new expanded list, (and pushing only *that*
onto the stack).

This is simpler, and also allows us to more easily implement token
pasting in the future.
2010-05-20 15:15:26 -07:00
Carl Worth
876e510bda Finish cleaning up whitespace differences.
The last remaining thing here was that when a line ended with a macro,
and the parser looked ahead to the newline token, the lexer was
printing that newline before the parser printed the expansion of the
macro.

The fix is simple, just make the lexer tell the parser that a newline
is needed, and the parser can wait until reducing a production to
print that newline.

With this, we now pass the entire test suite with simply "diff -u", so
we no longer have any diff options hiding whitespace bugs from
us. Hurrah!
2010-05-20 14:38:06 -07:00
Carl Worth
5a6b9a27fd Avoid printing a space at the beginning of lines in the output.
This fixes more differences compared to "gcc -E" so removes several
cases of erroneously failing test cases. The implementation isn't very
elegant, but it is functional.
2010-05-20 14:29:43 -07:00
Carl Worth
b569383bbd Avoid re-expanding a macro name that has once been rejected from expansion.
The specification of the preprocessor in C99 says that when we see a
macro name that we are already expanding that we refuse to expand it
now, (which we've done for a while), but also that we refuse to ever
expand it later if seen in other contexts at which it would be
legitimate to expand.

We add a test case for that here, and fix it to work. The fix takes
advantage of a new token_t value for tokens and argument words along
with the recently added IDENTIFIER_FINALIZED token type which
instructs the parser to not even look for another expansion.
2010-05-20 08:01:44 -07:00
Carl Worth
aaa9acbf10 Perform "re lexing" on string list values rathern than on text.
Previously, we would pass original strings back to the original lexer
whenever we needed to re-lex something, (such as an expanded macro or
a macro argument). Now, we instead parse the macro or argument
originally to a string list, and then re-lex by simply returning each
string from this list in turn.

We do this in the recently added glcpp_parser_lex function that sits
on top of the lower-level glcpp_lex that only deals with text.

This doesn't change any behavior (at least according to the existing
test suite which all still passes) but it brings us much closer to
being able to "finalize" an unexpanded macro as required by the
specification.
2010-05-19 13:28:24 -07:00
Carl Worth
a807fb72c4 Rewrite macro handling to support function-like macro invocation in macro values
The rewrite her discards the functions that did direct, recursive
expansion of macro values. Instead, the parser now pushes the macro
definition string over to a stack of buffers for the lexer. This way,
macro expansion gets access to all parsing machinery.

This isn't a small change, but the result is simpler than before (I
think). It passes the entire test suite, including the four tests
added with the previous commit that were failing before.
2010-05-18 22:10:04 -07:00
Carl Worth
1a29500e72 Fix (and add test for) function-like macro invocation with newlines.
The test has a newline before the left parenthesis, and newlines to
separate the parentheses from the argument.

The fix involves more state in the lexer to only return a NEWLINE
token when termniating a directive. This is very similar to our
previous fix with extra lexer state to only return the SPACE token
when it would be significant for the parser.

With this change, the exact number and positioning of newlines in the
output is now different compared to "gcc -E" so we add a -B option to
diff when testing to ignore that.
2010-05-17 13:21:13 -07:00
Carl Worth
e36a4d5be9 Fix two whitespace bugs in the lexer.
The first bug was not allowing whitespace between '#' and the
directive name.

The second bug was swallowing a terminating newline along with any
trailing whitespace on a line.

With these two fixes, and the previous commit to stop emitting SPACE
tokens, the recently added extra-whitespace test now passes.
2010-05-14 17:29:24 -07:00
Carl Worth
81f01432bd Don't return SPACE tokens unless strictly needed.
This reverts the unconditional return of SPACE tokens from the lexer
from commit 48b94da099 .

That commit seemed useful because it kept the lexer simpler, but the
presence of SPACE tokens is causing lots of extra complication for the
parser itself, (redundant productions other than whitespace
differences, several productions buggy in the case of extra
whitespace, etc.)

Of course, we'd prefer to never have any whitespace token, but that's
not possible with the need to distinguish between "#define foo()" and
"#define foo ()". So we'll accept a little bit of pain in the lexer,
(enough state to support this special-case token), in exchange for
keeping most of the parser blissffully ignorant of whether tokens are
separated by whitespace or not.

This change does mean that our output now differs from that of "gcc -E",
but only in whitespace. So we test with "diff -w now to ignore those
differences.
2010-05-14 17:13:00 -07:00
Carl Worth
48b94da099 Make the lexer return SPACE tokens unconditionally.
It seems strange to always be returning SPACE tokens, but since we
were already needing to return a SPACE token in some cases, this
actually simplifies our lexer.

This also allows us to fix two whitespace-handling differences
compared to "gcc -E" so that now the recent modification to the test
suite passes once again.
2010-05-14 09:48:14 -07:00
Carl Worth
0a93cbbe4f Fix parsing of object-like macro with a definition that begins with '('.
Previously our parser was incorrectly treating this case as a
function-like macro. We fix this by conditionally passing a SPACE
token from the lexer, (but only immediately after the identifier
immediately after #define).
2010-05-14 09:20:13 -07:00
Carl Worth
fcbbb46886 Add support for the structure of function-like macros.
We accept the structure of arguments in both macro definition and
macro invocation, but we don't yet expand those arguments. This is
just enough code to pass the recently-added tests, but does not yet
provide any sort of useful function-like macro.
2010-05-13 09:36:23 -07:00
Carl Worth
9f62a7e9e2 Make the lexer distinguish between identifiers and defined macros.
This is just a minor style improvement for now. But the same
mechanism, (having the lexer peek into the table of defined macros),
will be essential when we add function-like macros in addition to the
current object-like macros.
2010-05-13 07:38:29 -07:00
Carl Worth
012295f94c Simplify lexer significantly (remove all stateful lexing).
We are able to remove all state by simply passing NEWLINE through
as a token unconditionally (as opposed to only passing newline when
on a driective line as we did previously).
2010-05-12 13:20:31 -07:00
Carl Worth
cd27e6413a Add support for the #undef macro.
This isn't ideal for two reasons:

1. There's a bunch of stateful redundancy in the lexer that should be
   cleaned up.

2. The hash table does not provide a mechanism to delete an entry, so
   we waste memory to add a new NULL entry in front of the existing
   entry with the same key.

But this does at least work, (it passes the recently added undef test
case).
2010-05-12 13:11:50 -07:00
Carl Worth
5070a20cd1 Convert lexer to talloc and add xtalloc wrappers.
The lexer was previously using strdup (expecting the parser to free),
but is now more consistent, easier to use, and slightly more efficent
by using talloc along with the parser.

Also, we add xtalloc and xtalloc_strdup wrappers around talloc and
talloc_strdup to put all of the out-of-memory-checking code in one
place.
2010-05-12 12:47:29 -07:00
Carl Worth
33cc400714 Fix defines involving both literals and other defined macros.
We now store a list of tokens in our hash-table rather than a single
string. This lets us replace each macro in the value as necessary.

This code adds a link dependency on talloc which does exactly what we
want in terms of memory management for a parser.

The 3 tests added in the previous commit now pass.
2010-05-12 12:25:34 -07:00
Carl Worth
0b27b5f051 Implment #define
By using the recently-imported hash_table implementation.
2010-05-10 16:16:06 -07:00
Carl Worth
a1e32bcff0 Add some compiler warnings and corresponding fixes.
Most of the current problems were (mostly) harmless things like
missing declarations, but there was at least one real error, (reversed
argument order for yyerrror).
2010-05-10 13:32:29 -07:00
Carl Worth
38aa83560b Make the lexer reentrant (to avoid "still reachable" memory).
This allows the final program to be 100% "valgrind clean", (freeing
all memory that it allocates). This will make it much easier to ensure
that any allocation that parser actions perform are also cleaned up.
2010-05-10 11:52:29 -07:00
Carl Worth
3a37b8701c Add the tiniest shell of a flex/bison-based parser.
It doesn't really *do* anything yet---merlely parsing a stream of
whitespace-separated tokens, (and not interpreting them at all).
2010-05-10 11:46:34 -07:00