
Not quite a Yegge long.

C macros: You can’t pre-lex this.

Wednesday 17 March 2010 - Filed under Uncategorized

For those of us who try to build our own C compilers, preprocessor macros are a significant stumbling block, especially in standard C, where macros are token-oriented rather than text-oriented.

The (old) naive approach is to do a dirty text substitution and then lex the result normally. This appears(!) to be what several production compilers actually do (MSVC, etc.). Unfortunately, it gets really messy if you want to be fully compliant with the standard, since you’re forcing a text-centric substitution to pretend to be token-centric. The edge cases suck.
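One such edge case, sketched under assumptions: the macro name `NEG` and the function `double_negate` are made up for this illustration and do not come from any real codebase. With token-oriented expansion, the two minus signs stay separate tokens. A purely textual paste that then re-lexes the result would glue them into `--`, which means something else entirely.

```c
/* NEG is a hypothetical macro, chosen only to illustrate the edge case. */
#define NEG -

/* Token-oriented expansion produces the tokens '-' '-' 'x', i.e. -(-x).
   A textual substitution that pastes "-" right after the '-' in the
   source would spell "--x" instead, which is a pre-decrement. */
int double_negate(int x)
{
    return -NEG x;
}
```

On a conforming preprocessor, `double_negate(5)` should give back 5. A text-pasting implementation that does not insert a separating space would be expected to decrement instead.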

The (new) naive approach is to pre-lex the RHS of each macro (as far as preprocessing tokens). Unfortunately, it turns out *that* doesn’t work either, since the token sequence depends on where the macro is expanded. Here’s a quick example that’s sure to get you fired if you do it in production code, but the standard allows it:

#define FAIL_MACRO <stdio.h>

// here, it’s a header-name
#include FAIL_MACRO

// here, it’s a token sequence making up part of an expression
int main( void )
{
        struct { int h; } stdio = { 5 };
        return (2 FAIL_MACRO 3);
}

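Contrived, but it’s valid: in the return statement the macro expands to (2 < stdio.h > 3), which parses as ((2 < stdio.h) > 3) and, with stdio.h equal to 5, evaluates to 0. You have to know the context in order to correctly lex the RHS of a macro. Why am I messing around with C compilation again?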
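To make the context dependence concrete, here is a toy sketch of a lexer that takes the context as a parameter. The function `count_pp_tokens` and its `header_name_ok` flag are hypothetical names invented for this example. It only understands identifiers, runs of digits, single-character punctuators and `<...>` header-names, so it is nowhere near a real preprocessor lexer.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the preprocessing tokens in src (toy version).
   header_name_ok is nonzero when we are lexing the operand of #include,
   where <...> may be taken as a single header-name token. */
size_t count_pp_tokens(const char *src, int header_name_ok)
{
    size_t count = 0;
    const char *p = src;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;
            continue;
        }
        if (header_name_ok && *p == '<') {
            /* Look for the closing '>' on the same line. */
            const char *q = p + 1;
            while (*q != '\0' && *q != '>' && *q != '\n')
                q++;
            if (*q == '>') {
                p = q + 1;      /* consume the whole header-name */
                count++;
                continue;
            }
            /* No closing '>': fall through and lex '<' as a punctuator. */
        }
        if (isalpha((unsigned char)*p) || *p == '_') {
            /* identifier */
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
        } else if (isdigit((unsigned char)*p)) {
            /* run of digits (a very reduced pp-number) */
            while (isdigit((unsigned char)*p))
                p++;
        } else {
            /* anything else counts as a one-character punctuator */
            p++;
        }
        count++;
    }
    return count;
}
```

Calling `count_pp_tokens("<stdio.h>", 1)` treats the text as one header-name, while `count_pp_tokens("<stdio.h>", 0)` splits it into `<`, `stdio`, `.`, `h` and `>`. Same characters, different token sequence, and the only thing that changed is the context flag.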

2010-03-17  »  admin
