C compiler

This page describes the design of the C compiler. If you want to use the compiler, please see ppci-cc or ppci.api.cc().

Overview

digraph x {
codegen [label="code generator" ]
source -> lexer [label="text"]
lexer -> preprocessor [label="tokens"]
preprocessor -> parser [label="tokens"]
preprocessor -> source [label="include"]
parser -> semantics [label="callback functions"]
semantics -> scope
semantics -> codegen [label="parsed program (AST)"]
}

The C compilers task is to take C code, and produce intermediate code for the backend. The first step is pre processing, during which a sequence of tokens is transformed into another sequence of tokens. Next comes parsing and code generation. There are two options here: split the parser and code generation passes, such that there is a clean seperation, or do it all at once. Maybe the best choice is to do it all in one pass, and transform a sequence of tokens into IR-code in a single go.

Limitations

The C compiler has some limitations. This is the list of current known limitations:

  • type qualifiers (const, volatile, etc) are parsed but ignored
  • K&R function declarations are not supported
  • wide characters are not supported
  • VLA is not implemented
  • function declarations in function definitions are not allowed

Pre-processor

The preprocessor is a strange thing. It must handle trigraphs, backslash newlines and macros.

The top level design of the preprocessor is the following:

  • Context: Contains state that would be global otherwise.
  • CLexer: processes a raw file into a sequence of tokens
  • Preprocessor: takes the token sequence a does macro expansion, resulting in another stream of tokens.
  • Output: The token stream maybe outputted to file.
  • Feed to compiler: The token stream might be fed into the rest of the compiler.

C compiler

The c compiler consists of the classical stages: parsing and codegeneration. Code generation is done to ir-code.

Parsing

The C parsing is done by two classes CParser and CSemantics. CParser is a recursive descent parser. It dances a tight dance with the CSemantics class. This idea is taken from the Clang project. The CParser takes a token sequence from the preprocessor and matches the C syntax. Whenever a valid C construct is found, it calls the corresponding function on the CSemantics class. The semantics class keeps track of the current scope and records the global declarations. It also check types and lvalues of expressions. At the end of parsing and this type checking, an abstract syntax tree (AST) is build up. This AST is type checked and variables are resolved. This AST can be used for several purposes, for example static code analysis or style checking. It is also possible to generate C code again from this AST.

L-values

The semantics class determines for each expression node whether it is an lvalue or not. The lvalue (short for location value) indicates whether the expression has a memory address, or is a value in a register. Basically it boils down to: can we take the address of this expression with the ‘&’ operator. Numeric literals and the results of addition are not lvalues. Variables are lvalues. The lvalue property is used during code generation to check whether the value must be loaded from memory or not.

Types

Hosted vs freestanding

A C compiler can be hosted or freestanding. The difference between those two is that a hosted C compiler also provides the standard C library. A freestanding compiler only contains a few really required header files. As a result a hosted compiler is really a combination of a C compiler and a C standard library implementation. Also, the standard library depends on the operating system which is used, where as a freestanding C compiler can be used independent of operating system. Writing an application using a hosted compiler is easier since the standard library is available.

digraph x {
  hosted [label="Hosted C application"]
  freestanding [label="Freestanding C application"]
  os [label="Operating system"]
  libc [label="C standard library"]
  compiler [label="C compiler"]
  hosted -> libc
  freestanding -> compiler
  libc -> os
  libc -> compiler
}

Code generation

Code generation takes the AST as a whole and loops over all its elements and generates the corresponding IR-code snippets from it. At the end of code generation, there is an IR module which can be feed into the optimizers or code generators.

C classes

The C frontend can be used to generate an AST from C code. You can use it to parse C code and analyze its structure like this:

>>> from ppci.api import get_arch
>>> from ppci.lang.c import create_ast
>>> src = "int a, *b;"
>>> ast = create_ast(src, get_arch('msp430').info)
>>> ast
Compilation unit with 2 declarations
>>> from ppci.lang.c import CAstPrinter
>>> printer = CAstPrinter()
>>> printer.print(ast)
Compilation unit with 2 declarations
    Variable [storage=None typ=Basic type int name=a]
        Basic type int
    Variable [storage=None typ=Pointer-type name=b]
        Pointer-type
            Basic type int
>>> ast.declarations[0].name
'a'

Module reference

C front end.

ppci.lang.c.create_ast(src, arch_info, filename='<snippet>', coptions=None)

Create a C ast from the given source

ppci.lang.c.preprocess(f, output_file, coptions=None)

Pre-process a file into the other file.

ppci.lang.c.c_to_ir(source: io.TextIOBase, march, coptions=None, reporter=None)

C to ir translation.

Parameters:
  • source (file-like object) – The C source to compile.
  • march (str) – The targetted architecture.
  • coptions – C specific compilation options.
Returns:

An ppci.ir.Module.

ppci.lang.c.print_ast(ast, file=None)

Display an abstract syntax tree.

ppci.lang.c.parse_text(text, arch='x86_64')

Parse given C sourcecode into an AST

ppci.lang.c.render_ast(ast)

Render a C program as text

For example:

>>> from ppci.lang.c import parse_text, print_ast, render_ast
>>> ast = parse_text('int a;')
>>> print_ast(ast)
Compilation unit with 1 declarations
    Variable [storage=None typ=Basic type int name=a]
        Basic type int
>>> render_ast(ast)
int a;
ppci.lang.c.parse_type(text, context, filename='foo.c')

Parse given C-type AST.

For example:

>>> from ppci.api import get_arch
>>> from ppci.lang.c import parse_type, CContext, COptions
>>> msp430_arch = get_arch('msp430')
>>> coptions = COptions()
>>> context = CContext(coptions, msp430_arch.info)
>>> ast = parse_type('int[2]', context)
>>> context.eval_expr(ast.size)
2
>>> context.sizeof(ast)
4
class ppci.lang.c.CBuilder(arch_info, coptions)

C builder that converts C code into ir-code

class ppci.lang.c.CContext(coptions, arch_info)

A context as a substitute for global data

alignment(typ: ppci.lang.c.nodes.types.CType)

Given a type, determine its alignment in bytes

static error(message, location, hints=None)

Trigger an error at the given location

eval_expr(expr)

Evaluate an expression right now! (=at compile time)

get_field(typ, field_name)

Get the given field.

get_field_offsets(typ)

Get a dictionary with offset of fields

has_field(typ, field_name)

Check if the given type has the given field.

layout_struct(typ)

Layout the fields in the struct.

Things to take in account: - alignment - bit packing - anonynous types

limit_max(typ: ppci.lang.c.nodes.types.CType) → int

Retrieve the maximum value for the given integer type.

offsetof(typ, field)

Returns the offset of a field in a struct/union in bytes

pack(typ, value)

Pack a type into proper memory format

sizeof(typ: ppci.lang.c.nodes.types.CType)

Given a type, determine its size in whole bytes

warning(message, location, hints=None)

Trigger a warning at the given location

class ppci.lang.c.CLexer(coptions)

Lexer used for the preprocessor

lex(src, source_file)

Read a source and generate a series of tokens

lex_c()

Root parsing function

lex_char()

Scan for a complete character constant

lex_float()

Lex floating point number from decimal dot onwards.

lex_number()

Lex a single numeric literal.

lex_string()

Scan for a complete string

lex_text(txt)

Create tokens from the given text

tokenize(characters)

Generate tokens from characters

class ppci.lang.c.COptions

A collection of settings regarding the C language

add_include_path(path)

Add a path to the list of include paths

add_include_paths(paths)

Add all the given include paths

classmethod from_args(args)

Create a new options object from parsed arguments.

process_args(args)

Given a set of parsed arguments, apply those

class ppci.lang.c.CPreProcessor(coptions)

A pre-processor for C source code

concat(lhs, rhs)

Concatenate two tokens

concatenate(tokens)

Handle the ‘##’ token concatenation operator

consume(typ=None, expand=True)

Consume a token of a certain type

copy_tokens(tokens, first_space)

Copy a series of tokens.

define(macro)

Register a macro

define_object_macro(name, text, protected=False)

Define an object like macro.

define_special_macro(name, handler)

Define a spcial macro which has a callback function.

do_if(condition, location)

Handle #if/#ifdef/#ifndef.

eat_line()

Eat up all tokens until the end of the line.

This does not expand macros.

error(msg, hints=None, loc=None)

We hit an error condition.

eval_expr()

Evaluate an expression

expand(macro_token)

Expand a single token into possibly more tokens.

expand_macro(macro, macro_token)

Expand a single macro.

expand_token_sequence(tokens)

Macro expand a sequence of tokens.

gatherargs(macro)

Collect expanded arguments for macro

get_define(name: str)

Retrieve the given define!

handle_define_directive(directive_token)

Process #define directive.

handle_directive(loc)

Handle a single preprocessing directive

handle_elif_directive(directive_token)

Process #elif directive.

handle_else_directive(directive_token)

Process the #else directive.

handle_endif_directive(directive_token)

Process the #endif directive.

handle_error_directive(directive_token)

Process #error directive.

handle_if_directive(directive_token)

Process an #if directive.

handle_ifdef_directive(directive_token)

Handle an #ifdef directive.

handle_ifndef_directive(directive_token)

Handle an #ifndef directive.

handle_include_directive(directive_token)

Process the #include directive.

handle_include_next_directive(directive_token)

Process the #include_next directive.

handle_line_directive(directive_token)

Process #line directive.

handle_pragma_directive(directive_token)

Process #pragma directive.

handle_undef_directive(directive_token)

Process #undef directive.

handle_warning_directive(directive_token)

Process #warning directive.

in_hideset(name)

Test if the given macro is contained in the current hideset.

include(filename, loc, use_current_dir=False, include_next=False)

Turn the given filename into a series of tokens.

is_defined(name: str) → bool

Check if the given define is defined.

locate_include(filename, loc, use_current_dir: bool, include_next)

Determine which file to use given the include filename.

Parameters:
  • loc (-) – the location where this include is included.
  • use_current_dir (-) – If true, look in the directory of the current file.
static make_token(from_token, typ, value)

Create a new token from another token.

next_token(expand=True)

Get next token

normalize_space(args)

Normalize spaces in macro expansions.

If we have space, it will be a single space.

parse_arguments()

Parse arguments for a function like macro.

  • Keep track of parenthesis level.
parse_expression(priority=0)

Parse an expression in an #if

Idea taken from: https://github.com/shevek/jcpp/blob/master/
src/main/java/org/anarres/cpp/Preprocessor.java
parse_included_filename()

Parse filename after #include/#include_next

predefine_builtin_macros()

Define predefined macros

process_file(f, filename=None)

Process the given open file into tokens.

process_tokens()

Process a sequence of tokens into an expanded token sequence.

This function returns an tokens that must be looped over.

push_expansion(expansion)

Push a macro expansion on the stack.

skip_excluded_block()

Skip the block excluded by if/ifdef.

Skip tokens until we hit #endif or alike.

special_macro_counter(macro_token)

Implement __COUNTER__ macro

special_macro_date(macro_token)

Invoked when the __DATE__ macro is expanded

special_macro_file(macro_token)

Invoked when the __FILE__ macro is expanded

special_macro_include_level(macro_token)

Implement __INCLUDE_LEVEL__ macro

special_macro_line(macro_token)

Invoked when the __LINE__ macro is expanded

special_macro_time(macro_token)

Implement __TIME__ macro

stringify(hash_token, snippet, loc)

Handle the ‘#’ stringify operator.

Take care of: - single space between the tokens being stringified - no spaces before first and after last token - escape double quotes of strings and backslash inside strings.

substitute_arguments(macro, args)

Return macro contents with substituted arguments.

Pay special care to # and ## operators, When an argument is used in # or ##, it is not macro expanded.

token

Peek one token ahead without taking it.

tokens_to_string(tokens)

Create a text from the given tokens

undefine(name: str)

Kill a define!

unget_token(token)

Undo token consumption.

class ppci.lang.c.CParser(coptions, semantics)

C Parser.

Implemented in a recursive descent way, like the CLANG[1] frontend for llvm, also without the lexer hack[2]. See for the gcc c parser code [3]. Also interesting is the libfim cparser frontend [4].

The clang parser is very versatile, it can handle C, C++, objective C and also do code completion. This one is very simple, it can only parse C.

[1] http://clang.org/ [2] https://en.wikipedia.org/wiki/The_lexer_hack [3] https://raw.githubusercontent.com/gcc-mirror/gcc/master/gcc/c/ c-parser.c

[4] https://github.com/libfirm/cparser/blob/ 0cc43ed4bb4d475728583eadcf9e9726682a838b/src/parser/parser.c

at_type_id()

Check if the upcoming token is a typedef identifier

is_declaration_statement()

Determine whether we are facing a declaration or not

next_token()

Advance to the next token

parse(tokens)

Here the parsing of C is begun …

Parse the given tokens.

parse_array_designator(init_cursor)

Parse array designator like ‘{2, [10]=4}’

parse_array_string_initializer(typ)

Handle the special case where an array is initialized with a string.

parse_asm_operand()

Parse a single asm operand.

parse_asm_operands()

Parse a series of assembly operands. Empty list is allowed.

parse_asm_statement()

Parse an inline assembly statement.

See also: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

parse_attributes()

Parse some attributes.

Examples are:

__attribute__((noreturn))

parse_break_statement()

Parse a break

parse_call(callee)

Parse a function call

parse_case_statement()

Parse a case.

For example: ‘case 5:’

Or, a gnu extension:

‘case 5 … 10:’

parse_compound_statement()

Parse a series of statements surrounded by ‘{‘ and ‘}’

parse_condition()

Parse an expression between parenthesis

parse_constant_expression()

Parse a constant expression

parse_continue_statement()

Parse a continue statement

parse_decl_group(decl_spec)

Parse the rest after the first declaration spec.

For example we have parsed ‘static int’ and we will now parse the rest. This can be either a function, or a sequence of initializable variables. At least it will probably contain some sort of identifier and an optional pointer stuff.

parse_decl_specifiers(allow_storage_class=True)

Parse declaration specifiers.

At the end we know type, storage class and qualifiers.

Gathers storage classes: - typedef - extern - static

One of the following type specifiers: - void, char, int, unsigned, long, … - typedef-ed type - struct, union or enum

Type qualifiers: - const - volatile

parse_declarations()

Parse normal declarations

parse_declarator(abstract=False)

Given a declaration specifier, parse the rest.

This involves parsing optionally pointers and qualifiers and next the declaration itself.

parse_default_statement()

Parse the default case

parse_do_statement()

Parse a do-while statement

parse_empty_statement()

Parse a statement that does nothing!

parse_enum()

Parse an enum definition

parse_enum_fields(ctyp, location)

Parse enum declarations

parse_expression()

Parse an expression.

See also: http://en.cppreference.com/w/c/language/operator_precedence

parse_for_statement()

Parse a for statement

parse_function_arguments()

Parse function postfix.

We have type and name, now parse function arguments.

parse_function_declaration(decl_spec, declarator)

Parse a function declaration with implementation

parse_gnu_attribute()

Parse a gnu attribute like __attribute__((noreturn))

parse_goto_statement()

Parse a goto

parse_if_statement()

Parse an if statement

parse_initializer(typ)

Parse the C-style array or struct initializer stuff.

Heavily copied from: https://github.com/rui314/8cc/blob/master/parse.c

Argument is the type to initialize.

An initialization can be one of: = 1; = {1, 2, 3}; = {[0] = 3}; // C99 = {.foobar = {23, 3}}; // C99 = {[2..5] = 2}; // C99

parse_initializer_list(typ)

Parse braced initializer list.

Parse opening brace, elements and closing brace.

parse_initializer_list_element(init_cursor)

Parse an initializer list element with optional designators.

parse_initializer_list_sub(init_cursor, typ)

Parse braced initializer list.

Parse opening brace, elements and closing brace.

parse_label()

Parse a label statement

parse_primary_expression()

Parse a primary expression

parse_return_statement()

Parse a return statement

parse_statement()

Parse a statement

parse_statement_or_declaration()

Parse either a statement or a declaration

Returns: a list of statements

parse_struct_designator(init_cursor)

Parse a struct designator in an initializer list.

parse_struct_fields()

Parse struct or union fields

parse_struct_or_union()

Parse a struct or union

parse_switch_statement()

Parse an switch statement

parse_translation_unit()

Top level start of parsing

parse_type_modifiers(abstract=False)

Parse the pointer, name and array or function suffixes.

Can be abstract type, and if so, the type may be nameless.

The result of all this action is a type modifier list with the proper type modifications required by the given code.

parse_typedef(decl_spec, declarator)

Process typedefs

parse_typename()

Parse a type specifier used in sizeof(int) for example.

parse_variable_declaration(decl_spec, declarator)

Parse variable declaration optionally followed by initializer.

parse_while_statement()

Parse a while statement

skip_initializer_lists()

Skip superfluous initial values.

class ppci.lang.c.CAstPrinter(file=None)

Print AST of a C program

visit(node)

Recursively visit node’s child nodes.

class ppci.lang.c.CSemantics(context)

This class handles the C semantics

add_statement(statement)

Helper to emit a statement into the current block.

apply_type_modifiers(type_modifiers, typ)

Apply the set of type modifiers to the given type

begin()

Enter a new file / compilation unit.

check_redeclaration_storage_class(sym, declaration)

Test if we specified an invalid combo of storage class.

coerce(expr: ppci.lang.c.nodes.expressions.CExpression, typ: ppci.lang.c.nodes.types.CType)

Try to fit the given expression into the given type.

define_tag_type(tag: str, klass: ppci.lang.c.nodes.types.TaggedType, location)

Get or create a tagged type with the given tag kind.

end_function(body)

Called at the end of a function

ensure_integer(expr: ppci.lang.c.nodes.expressions.CExpression)

Ensure typ is of any integer type.

ensure_no_void_ptr(expr)

Test if expr has type void*, and if so, generate a warning and an implicit cast statement.

enter_scope()

Enter a new symbol scope.

equal_types(typ1, typ2)

Compare two types for equality.

error(message, location, hints=None)

Trigger an error at the given location

finish_compilation_unit()

Called at the end of a file / compilation unit.

get_common_type(typ1, typ2, location)

Given two types, determine the common type.

The common type is a type they can both be cast to.

get_type(type_specifiers)

Retrieve a type by type specifiers

init_store(init_cursor, value)

Store an initial value at position pointed by cursor.

invalid_redeclaration(sym, declaration, message='Invalid redefinition')

Raise an invalid redeclaration error.

leave_scope()

Leave a symbol scope.

not_impl(message, location)

Call this function to mark unimplemented code

on_array_designator(init_cursor, index, location)

Handle array designator.

on_array_index(base, index, location)

Check array indexing

on_basic_type(type_specifiers, location)

Handle basic type

on_binop(lhs, op, rhs, location)

Check binary operator

on_builtin_offsetof(typ, member_name, location)

Check offsetof builtin function

on_builtin_va_arg(arg_pointer, typ, location)

Check va_arg builtin function

on_builtin_va_copy(dest, src, location)

Check va_copy builtin function

on_builtin_va_start(arg_pointer, location)

Check va_start builtin function

on_call(callee, arguments, location)

Check function call for validity

on_case(value, statement, location)

Handle a case statement

on_cast(to_typ, casted_expr, location)

Check explicit casting

on_char(value, location)

Process a character literal

on_compound_literal(typ, init, location)

Check the consistency of compound literals.

on_default(statement, location)

Handle a default label

on_do(body, condition, location)

The almost extinct dodo!

on_enum(tag, is_definition, location)

Handle enum declaration

on_enum_value(ctyp, name, value, location)

Handle a single enum value definition

on_field_def(storage_class, ctyp, name, modifiers, bitsize, location)

Handle a struct/union field declaration

on_field_designator(init_cursor, field_name, location)

Check field designator.

on_field_select(base, field_name, location)

Check field select expression

on_float(value, location)

Process floating point literal.

on_for(initial, condition, post, body, location)

Check for loop construction

on_function_argument(typ, name, modifiers, location)

Process a function argument into the proper class

on_function_declaration(storage_class, typ, name, modifiers, location)

Handle function declaration

on_if(condition, then_statement, no, location)

Check if statement

on_number(value, location)

React on integer numeric literal

on_return(value, location)

Check return statement

on_sizeof(typ, location)

Handle sizeof contraption

on_string(value, location)

React on string literal

on_struct_or_union(kind, tag, is_definition, fields, location)

Handle struct or union definition.

A definition of a struct occurs when we have: struct S; struct S { int f; }; struct { int f; };

on_switch_exit(expression, statement, location)

Handle switch statement

on_ternop(lhs, op, mid, rhs, location)

Handle ternary operator ‘a ? b : c’

on_type(typ, modifiers, location)

Called when a type itself is described

static on_type_qualifiers(type_qualifiers, ctyp)

Handle type qualifiers

on_typedef(typ, name, modifiers, location)

Handle typedef declaration

on_typename(name, location)

Handle the case when a typedef is refered

on_unop(op, a, location)

Check unary operator semantics

on_variable_access(name, location)

Handle variable access

on_variable_declaration(storage_class, typ, name, modifiers, location)

Given a declaration, and a declarator, create the proper object

on_variable_initialization(variable, expression)

Handle a variable initialized to some value

on_while(condition, body, location)

Handle the while statement

patch_size_from_initializer(typ, initializer)

Fill array size from elements!

pointer(expr)

Handle several conversions.

Things handled: - Array to pointer decay. - Function to function pointer

promote(expr)

Perform integer promotion on expression.

Integer promotion happens when using char and short types in expressions. Those values are promoted to int type before performing the operation.

register_declaration(declaration)

Register declaration into the scope.

warning(message, location, hints=None)

Trigger a warning at the given location

class ppci.lang.c.CSynthesizer

Take an IR-module and convert it into a C-AST.

This does essentially the opposite of the codegenerator.

syn_block(block)

Synthesize an ir block into C

syn_instruction(instruction)

Convert ir instruction to its corresponding C counterpart

class ppci.lang.c.CPrinter(f=None)

Render a C program as text

gen_declaration(declaration)

Spit out a declaration

gen_expr(expr)

Format an expression as text

gen_statement(statement)

Render a single statement as text

print(compile_unit)

Render compilation unit as C

render_type(typ, name=None)

Generate a proper C-string for the given type

class ppci.lang.c.CTokenPrinter

Printer that can turn a stream of token-lines into text

Uml

digraph "classes_foo" {
charset="utf-8"
rankdir=BT
"0" [label="{ArrayType|size\l|decay()\l}", shape="record"];
"1" [label="{BasicType|CHAR : str\lDOUBLE : str\lFLOAT : str\lFLOAT_TYPES : set\lINT : str\lINTEGER_TYPES\lLONG : str\lLONGDOUBLE : str\lLONGLONG : str\lNUMERIC_TYPES\lPROMOTABLE_INTEGER_TYPES : set\lSHORT : str\lSIGNED_INTEGER_TYPES : set\lUCHAR : str\lUINT : str\lULONG : str\lULONGLONG : str\lUNSIGNED_INTEGER_TYPES : set\lUSHORT : str\lVOID : str\ltype_id\l|}", shape="record"];
"2" [label="{CType|is_array\lis_char_array\lis_compound\lis_double\lis_float\lis_function\lis_integer\lis_integer_or_enum\lis_pointer\lis_promotable\lis_scalar\lis_signed\lis_struct\lis_struct_or_union\lis_union\lis_void\lqualifiers : NoneType\l|pointer_to()\l}", shape="record"];
"3" [label="{EnumType|complete\lconstants : NoneType\l|}", shape="record"];
"4" [label="{Field|bitsize\lis_anonymous\lis_bitfield\lname\ltyp\l|}", shape="record"];
"5" [label="{FunctionType|argument_types\larguments\lis_vararg : bool\lreturn_type\l|}", shape="record"];
"6" [label="{IndexableType|element_type\l|}", shape="record"];
"7" [label="{PointerType|\l|}", shape="record"];
"8" [label="{StructOrUnionType|complete\lfields\lfields : NoneType\lincomplete\ltag : NoneType\l|get_field(name)\lget_field_names()\lget_field_path(name)\lget_named_field_paths()\lhas_field(name)\l}", shape="record"];
"9" [label="{StructType|\l|}", shape="record"];
"10" [label="{TaggedType|\l|}", shape="record"];
"11" [label="{UnionType|\l|}", shape="record"];
"0" -> "6" [arrowhead="empty", arrowtail="none"];
"1" -> "2" [arrowhead="empty", arrowtail="none"];
"3" -> "10" [arrowhead="empty", arrowtail="none"];
"5" -> "2" [arrowhead="empty", arrowtail="none"];
"6" -> "2" [arrowhead="empty", arrowtail="none"];
"7" -> "6" [arrowhead="empty", arrowtail="none"];
"8" -> "10" [arrowhead="empty", arrowtail="none"];
"9" -> "8" [arrowhead="empty", arrowtail="none"];
"10" -> "2" [arrowhead="empty", arrowtail="none"];
"11" -> "8" [arrowhead="empty", arrowtail="none"];
}
digraph "classes_foo" {
charset="utf-8"
rankdir=BT
"0" [label="{ArrayIndex|base\lindex\l|}", shape="record"];
"1" [label="{ArrayInitializer|values\l|}", shape="record"];
"2" [label="{BinaryOperator|a\lb\lop\l|}", shape="record"];
"3" [label="{BuiltIn|\l|}", shape="record"];
"4" [label="{BuiltInOffsetOf|member\lquery_typ\l|}", shape="record"];
"5" [label="{BuiltInVaArg|arg_pointer\l|}", shape="record"];
"6" [label="{BuiltInVaCopy|dest\lsrc\l|}", shape="record"];
"7" [label="{BuiltInVaStart|arg_pointer\l|}", shape="record"];
"8" [label="{CExpression|location\llvalue\ltyp\l|}", shape="record"];
"9" [label="{Cast|expr\lto_typ\l|is_array_decay()\l}", shape="record"];
"10" [label="{CharLiteral|\l|}", shape="record"];
"11" [label="{CompoundLiteral|init\l|}", shape="record"];
"12" [label="{FieldSelect|base\lfield\l|}", shape="record"];
"13" [label="{FunctionCall|args\lcallee\l|}", shape="record"];
"14" [label="{ImplicitCast|\l|}", shape="record"];
"15" [label="{ImplicitInitialValue|\l|}", shape="record"];
"16" [label="{Initializer|\l|}", shape="record"];
"17" [label="{InitializerList|\l|}", shape="record"];
"18" [label="{Literal|value\l|}", shape="record"];
"19" [label="{NumericLiteral|\l|}", shape="record"];
"20" [label="{Sizeof|sizeof_typ\l|}", shape="record"];
"21" [label="{StringLiteral|\l|to_bytes()\l}", shape="record"];
"22" [label="{StructInitializer|values : dict\l|}", shape="record"];
"23" [label="{TernaryOperator|a\lb\lc\lop\l|}", shape="record"];
"24" [label="{UnaryOperator|a\lop\l|}", shape="record"];
"25" [label="{UnionInitializer|field : NoneType\lvalue : NoneType\l|}", shape="record"];
"26" [label="{VariableAccess|name\lvariable\l|}", shape="record"];
"0" -> "8" [arrowhead="empty", arrowtail="none"];
"1" -> "17" [arrowhead="empty", arrowtail="none"];
"2" -> "8" [arrowhead="empty", arrowtail="none"];
"3" -> "8" [arrowhead="empty", arrowtail="none"];
"4" -> "3" [arrowhead="empty", arrowtail="none"];
"5" -> "3" [arrowhead="empty", arrowtail="none"];
"6" -> "3" [arrowhead="empty", arrowtail="none"];
"7" -> "3" [arrowhead="empty", arrowtail="none"];
"9" -> "8" [arrowhead="empty", arrowtail="none"];
"10" -> "18" [arrowhead="empty", arrowtail="none"];
"11" -> "8" [arrowhead="empty", arrowtail="none"];
"12" -> "8" [arrowhead="empty", arrowtail="none"];
"13" -> "8" [arrowhead="empty", arrowtail="none"];
"14" -> "9" [arrowhead="empty", arrowtail="none"];
"15" -> "16" [arrowhead="empty", arrowtail="none"];
"16" -> "8" [arrowhead="empty", arrowtail="none"];
"17" -> "16" [arrowhead="empty", arrowtail="none"];
"18" -> "8" [arrowhead="empty", arrowtail="none"];
"19" -> "18" [arrowhead="empty", arrowtail="none"];
"20" -> "8" [arrowhead="empty", arrowtail="none"];
"21" -> "18" [arrowhead="empty", arrowtail="none"];
"22" -> "17" [arrowhead="empty", arrowtail="none"];
"23" -> "8" [arrowhead="empty", arrowtail="none"];
"24" -> "8" [arrowhead="empty", arrowtail="none"];
"25" -> "17" [arrowhead="empty", arrowtail="none"];
"26" -> "8" [arrowhead="empty", arrowtail="none"];
}
digraph "classes_foo" {
charset="utf-8"
rankdir=BT
"0" [label="{Break|\l|}", shape="record"];
"1" [label="{CStatement|location\l|}", shape="record"];
"2" [label="{Case|statement\lvalue\l|}", shape="record"];
"3" [label="{Compound|statements\l|}", shape="record"];
"4" [label="{Continue|\l|}", shape="record"];
"5" [label="{DeclarationStatement|declaration\l|}", shape="record"];
"6" [label="{Default|statement\l|}", shape="record"];
"7" [label="{DoWhile|body\lcondition\l|}", shape="record"];
"8" [label="{Empty|\l|}", shape="record"];
"9" [label="{ExpressionStatement|expression\l|}", shape="record"];
"10" [label="{For|body\lcondition\linit\lpost\l|}", shape="record"];
"11" [label="{Goto|label\l|}", shape="record"];
"12" [label="{If|condition\lno\lyes\l|}", shape="record"];
"13" [label="{InlineAssemblyCode|clobbers\linput_operands\loutput_operands\ltemplate\l|}", shape="record"];
"14" [label="{Label|name\lstatement\l|}", shape="record"];
"15" [label="{RangeCase|statement\lvalue1\lvalue2\l|}", shape="record"];
"16" [label="{Return|value\l|}", shape="record"];
"17" [label="{Switch|expression\lstatement\l|}", shape="record"];
"18" [label="{While|body\lcondition\l|}", shape="record"];
"0" -> "1" [arrowhead="empty", arrowtail="none"];
"2" -> "1" [arrowhead="empty", arrowtail="none"];
"3" -> "1" [arrowhead="empty", arrowtail="none"];
"4" -> "1" [arrowhead="empty", arrowtail="none"];
"5" -> "1" [arrowhead="empty", arrowtail="none"];
"6" -> "1" [arrowhead="empty", arrowtail="none"];
"7" -> "1" [arrowhead="empty", arrowtail="none"];
"8" -> "1" [arrowhead="empty", arrowtail="none"];
"9" -> "1" [arrowhead="empty", arrowtail="none"];
"10" -> "1" [arrowhead="empty", arrowtail="none"];
"11" -> "1" [arrowhead="empty", arrowtail="none"];
"12" -> "1" [arrowhead="empty", arrowtail="none"];
"13" -> "1" [arrowhead="empty", arrowtail="none"];
"14" -> "1" [arrowhead="empty", arrowtail="none"];
"15" -> "1" [arrowhead="empty", arrowtail="none"];
"16" -> "1" [arrowhead="empty", arrowtail="none"];
"17" -> "1" [arrowhead="empty", arrowtail="none"];
"18" -> "1" [arrowhead="empty", arrowtail="none"];
}
digraph "classes_foo" {
charset="utf-8"
rankdir=BT
"0" [label="{CDeclaration|is_function\llocation\lname\lstorage_class\ltyp\l|is_definition()\l}", shape="record"];
"1" [label="{ConstantDeclaration|value\l|}", shape="record"];
"2" [label="{EnumConstantDeclaration|value\l|}", shape="record"];
"3" [label="{EnumDeclaration|constants\l|}", shape="record"];
"4" [label="{FunctionDeclaration|body : NoneType\l|is_definition()\l}", shape="record"];
"5" [label="{ParameterDeclaration|\l|}", shape="record"];
"6" [label="{StorageClass|AUTO : str\lEXTERN : str\lREGISTER : str\lSTATIC : str\l|}", shape="record"];
"7" [label="{Typedef|\l|}", shape="record"];
"8" [label="{VariableDeclaration|initial_value\l|is_definition()\l}", shape="record"];
"1" -> "0" [arrowhead="empty", arrowtail="none"];
"2" -> "0" [arrowhead="empty", arrowtail="none"];
"3" -> "0" [arrowhead="empty", arrowtail="none"];
"4" -> "0" [arrowhead="empty", arrowtail="none"];
"5" -> "0" [arrowhead="empty", arrowtail="none"];
"7" -> "0" [arrowhead="empty", arrowtail="none"];
"8" -> "0" [arrowhead="empty", arrowtail="none"];
}

References

This section contains some links to other compilers. These were used to draw inspiration from. In the spirit of: it is better to steal ideas then invent something bad yourself :).

Other C compilers

A c99 frontend for libfirm: https://github.com/libfirm/cparser

tcc

tcc: tiny c compiler

This compiler is tiny and uses a single pass to parse and generate code.

lcc

https://github.com/drh/lcc

This compiler does parsing and type checking in one go.

cdt

CDT is an eclipse c/c++ ide.

http://wiki.eclipse.org/CDT/designs/Overview_of_Parsing