C3 language


The C3 language was created as an example of how to design and implement a custom language within the PPCI framework. As pointed out by c2lang, the C language is widely used, but has some strange constructs. These include the following:

  • The include system. This results in lots of code duplication and file creation. Why would you need filenames in source code?
  • The comma operator: x = (a(), 2); assigns 2 to x, after calling function a.
  • C is difficult to parse with a simple parser. The parser has to know whether a symbol is a type or a variable at the moment it is parsed. The usual workaround for this is referred to as the lexer hack.

In part for these reasons (and of course, for fun), C3 was created.
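
The parsing problem mentioned above can be made concrete with a small Python sketch. It is only an illustration: the function `classify_statement` and the set `known_types` are hypothetical names, not part of PPCI. The same C token sequence is classified differently depending on what the parser already knows about the first symbol.

```python
def classify_statement(tokens, known_types):
    """Classify the C token sequence `name * name ;`.

    The same tokens form a pointer declaration when the first name is a
    type, and a multiplication expression otherwise.
    """
    first, star, second, semi = tokens
    if star != "*" or semi != ";":
        raise ValueError("expected a statement of the form: name * name ;")
    if first in known_types:
        return "declaration"
    return "expression"


tokens = ["a", "*", "b", ";"]
print(classify_statement(tokens, known_types={"a"}))  # a was typedef'ed
print(classify_statement(tokens, known_types=set()))  # a is a variable
```

C3 sidesteps this ambiguity by requiring the var keyword for variable declarations and a dedicated cast keyword for type casts.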

The hello world example in C3 is:

module hello;
import io;

function void main()
{
    io.println("Hello world");
}

Language reference


Modules in C3 live in files, and a single module can be defined across multiple files. Modules can import each other by using the import statement.

For example:


module pkg1;
import pkg2;


module pkg2;
import pkg1;


Functions can be defined by using the function keyword, followed by the return type and the function name.

module example;

function void compute()
{
}

function void main()
{
}


Variables require the var keyword, and can be either global or function-local.

module example;

var int global_var;

function void compute()
{
    var int x = global_var + 13;
    global_var = 200 - x;
}


Types can be specified when a variable is declared, and also typedef’ed using the type keyword.

module example;
var int number;
var int* ptr_num;
type int* ptr_num_t;
var ptr_num_t number2;

If statement

The following code example demonstrates the if statement. The else part is optional.

module example;

function void compute(int a)
{
    var int b = 10;
    if (a > 100)
    {
        b += a;
    }

    if (b > 50)
    {
        b += 1000;
    }
    else
    {
        b = 2;
    }
}

While statement

The while statement can be used as follows:

module example;

function void compute(int a)
{
    var int b = 10;
    while (b > a)
    {
        b -= 1;
    }
}

For statement

The for statement works as in C. The first part is executed once, before the loop starts. The second part is the loop condition, which is checked before every iteration. The third part is executed each time one iteration of the loop is done.

module example;

function void compute(int a)
{
    var int b = 0;
    for (b = 100; b > a; b -= 1)
    {
        // Do something here!
    }
}


C3 does not contain a preprocessor. For this kind of thing it might be better to use a templating engine such as Jinja2.

Module reference

This is the C3 language front-end.

The front-end is built around a recursive descent parser.

digraph c3 {
1 [label="source text"]
10 [label="lexer" ]
20 [label="parser" ]
40 [label="code generation"]
99 [label="IR-code object"]
1 -> 10
10 -> 20
20 -> 40
40 -> 99
}

class ppci.lang.c3.AstPrinter

Prints an AST as text

class ppci.lang.c3.C3Builder(diag, arch_info)

Generates IR-code from c3 source.

Reports errors to the diagnostics system.

build(sources, imps=())

Create IR-code from sources.

Returns: A context in which the modules live, and an ir-module.

Raises compiler error when something goes wrong.

do_parse(src, context)

Lexing and parsing stage (phase 1)

class ppci.lang.c3.CodeGenerator(diag)

Generates intermediate (IR) code from a package.

The entry function is ‘genModule’. The main task of this part is to rewrite complex control structures, such as while and for loops, into simple conditional jump statements. Complex conditional expressions are simplified as well: ‘and’ and ‘or’ operators are rewritten into conditional jumps. Structured datatypes are also rewritten.

Type checking is done in one run with code generation.
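
To illustrate what rewriting a loop into conditional jumps means, here is a minimal Python sketch. It does not use the PPCI IR: the tuple-based instructions, the label names and the function `lower_while` are all hypothetical and only show the general shape of such a lowering.

```python
import itertools

_label_ids = itertools.count()


def new_label(prefix):
    """Return a fresh, unique label name."""
    return f"{prefix}_{next(_label_ids)}"


def lower_while(condition, body):
    """Lower `while (condition) body` into a flat list of jump instructions.

    `condition` is a string such as "b > a" and `body` is a list of
    already lowered instructions. Both representations are hypothetical.
    """
    test = new_label("while_test")
    loop = new_label("while_body")
    final = new_label("while_final")
    code = [
        ("jump", test),
        ("label", test),
        ("cjump", condition, loop, final),  # true: loop, false: final
        ("label", loop),
    ]
    code.extend(body)
    code.append(("jump", test))  # evaluate the condition again
    code.append(("label", final))
    return code


for instruction in lower_while("b > a", [("sub", "b", 1)]):
    print(instruction)
```

A for loop can be lowered the same way, with the initialization emitted before the first jump and the update step emitted just before the jump back to the test.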

emit(instruction, loc=None)

Emits the given instruction to the builder.

error(msg, loc=None)

Emit error to diagnostic system and mark package as invalid


Generate code for a whole context


Generate code for assignment statement

gen_binop(expr: ppci.lang.c3.astnodes.Binop)

Generate code for binary operation


Generate code for cases where a boolean value is assigned

gen_cond_code(expr, bbtrue, bbfalse)

Generate conditional logic. Implement sequential logical operators.
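
The idea of implementing sequential (short-circuit) logical operators with jumps can be sketched as follows. This is an illustration under assumptions, not the PPCI implementation: expressions are nested tuples, instructions are tuples, and `lower_condition` is a hypothetical name. The two target arguments play the role of bbtrue and bbfalse.

```python
import itertools


def lower_condition(expr, on_true, on_false, code, labels):
    """Emit jumps so control reaches on_true or on_false depending on expr.

    expr is ("and", lhs, rhs), ("or", lhs, rhs), ("not", operand) or a
    string holding a simple comparison (hypothetical representation).
    """
    kind = expr[0] if isinstance(expr, tuple) else "leaf"
    if kind == "and":
        # The right hand side is only evaluated when the left one is true.
        middle = f"and_rhs_{next(labels)}"
        lower_condition(expr[1], middle, on_false, code, labels)
        code.append(("label", middle))
        lower_condition(expr[2], on_true, on_false, code, labels)
    elif kind == "or":
        # The right hand side is only evaluated when the left one is false.
        middle = f"or_rhs_{next(labels)}"
        lower_condition(expr[1], on_true, middle, code, labels)
        code.append(("label", middle))
        lower_condition(expr[2], on_true, on_false, code, labels)
    elif kind == "not":
        # Negation swaps the two targets and emits no code of its own.
        lower_condition(expr[1], on_false, on_true, code, labels)
    else:
        code.append(("cjump", expr, on_true, on_false))
    return code


condition = ("and", "a > 0", ("or", "b", "c"))
for instruction in lower_condition(condition, "yes", "no", [], itertools.count()):
    print(instruction)
```

No boolean value is ever materialized in this scheme; the outcome of the condition is encoded entirely in which block is reached.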

gen_dereference(expr: ppci.lang.c3.astnodes.Deref)

Dereference a pointer type, which means *(expr)

gen_expr_at(ptr, expr)

Generate code at a pointer in memory

gen_expr_code(expr: ppci.lang.c3.astnodes.Expression, rvalue=False) → ppci.ir.Value

Generate code for an expression. Return the generated ir-value


Generate external function


Generate for-loop code


Generate code for a function. This involves creating room for parameters on the stack, and generating code for the function body.


Generate code for a function call

gen_global_ival(ival, typ)

Create memory image for initial value


Generate global variables and modules


Generate code for when an identifier was referenced


Generate code for if statement


Array indexing


Generate code for literal


Initialize a local variable


Generate code for a member expression, such as struc.mem = 2. This could also be a module deref!

gen_module(mod: ppci.lang.c3.astnodes.Module)

Generate code for a single module


Generate code for return statement

gen_stmt(code: ppci.lang.c3.astnodes.Statement)

Generate code for a statement


Generate code for a switch statement


Generate code for type casting


Generate code for unary operator


Generate code for while statement


Get or create debug type info in the debug information


Get the proper IR function for the given function.

A new function will be created if required.


Given a certain type, get the corresponding ir-type


Determine whether a module is referenced


Create a new basic block in the current function

class ppci.lang.c3.Context(arch_info)

A context is the space in which all modules live.

It is the container of modules and the top level scope.

equal_types(a, b, byname=False)

Compare types a and b for structural equivalence.

If byname is True, stop on defined types.
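
One possible reading of this comparison is sketched below in Python. The classes `Base`, `Pointer` and `Defined` and the function `types_equal` are hypothetical stand-ins for illustration, and the interpretation of byname (defined types are compared by their name instead of being resolved) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:
    name: str


@dataclass(frozen=True)
class Pointer:
    target: object


@dataclass(frozen=True)
class Defined:
    """A named type, as introduced with the type keyword."""
    name: str
    backing: object


def types_equal(a, b, byname=False):
    """Compare two types structurally (hypothetical sketch)."""
    if byname and (isinstance(a, Defined) or isinstance(b, Defined)):
        # Stop at defined types: they only match a type with the same name.
        return (isinstance(a, Defined) and isinstance(b, Defined)
                and a.name == b.name)
    # Otherwise look through defined types to their backing types.
    while isinstance(a, Defined):
        a = a.backing
    while isinstance(b, Defined):
        b = b.backing
    if isinstance(a, Base) and isinstance(b, Base):
        return a.name == b.name
    if isinstance(a, Pointer) and isinstance(b, Pointer):
        return types_equal(a.target, b.target, byname)
    return False


int_type = Base("int")
ptr_num_t = Defined("ptr_num_t", Pointer(int_type))
print(types_equal(ptr_num_t, Pointer(int_type)))
print(types_equal(ptr_num_t, Pointer(int_type), byname=True))
```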


Evaluates a constant expression.

get_common_type(a, b, loc)

Determine the greatest common type.

This is used for coercing binary operators.

For example:

  • int + float -> float
  • byte + int -> int
  • byte + byte -> byte
  • pointer to x + int -> pointer to x
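
The four rules listed above can be reproduced with a small Python sketch. It is an illustration only: the `RANK` table, the tuple representation of pointer types and the function `common_type` are hypothetical, and only the pointer-on-the-left case from the list is handled.

```python
# Hypothetical ranking of the basic types: the higher rank wins.
RANK = {"byte": 0, "int": 1, "float": 2}


def common_type(a, b):
    """Return the type both operands of a binary operator are coerced to.

    Basic types are strings; a pointer type is ("pointer", target).
    """
    if isinstance(a, tuple) and b == "int":
        return a  # pointer to x + int -> pointer to x
    if isinstance(a, tuple) or isinstance(b, tuple):
        raise TypeError(f"no common type for {a} and {b}")
    return a if RANK[a] >= RANK[b] else b


print(common_type("int", "float"))
print(common_type("byte", "int"))
print(common_type("byte", "byte"))
print(common_type(("pointer", "x"), "int"))
```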

Get the constant value, calculate if required

get_module(name, create=True)

Gets or creates the module with the given name

get_type(typ, reveil_defined=True)

Get type given by str, identifier or type.

When reveil_defined is True, defined types are resolved to their backing types.


Check if a module with the given name exists


Determines if the given type is a simple type

Resolve all modules referenced by other modules


Get all the modules in this context


Pack a string as an int holding the length, followed by the text data
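
A length-prefixed string layout of this kind can be sketched with the Python struct module. The function `pack_string` is a hypothetical helper, and the 4-byte little-endian length and the ASCII encoding are assumptions; the real size and byte order of the int would depend on the target architecture.

```python
import struct


def pack_string(text, int_format="<i"):
    """Return the length of text as an int, followed by the text bytes.

    int_format is a struct format string; "<i" (4 bytes, little-endian)
    is an assumed default for this sketch.
    """
    data = text.encode("ascii")
    return struct.pack(int_format, len(data)) + data


print(pack_string("Hello"))
```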


Find out what is designated with x


Determine the byte size of a type

class ppci.lang.c3.Lexer(diag)

Generates a sequence of tokens from an input stream
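
As an illustration of what such a token sequence looks like, here is a minimal regex-based tokenizer in Python. It is not the PPCI lexer: the token kinds, the keyword set and the `/* ... */` syntax assumed for long comments are choices made for this sketch, and string literals are left out for brevity.

```python
import re

TOKEN_SPEC = [
    ("LONG_COMMENT", r"/\*.*?\*/"),  # assumed long comment syntax
    ("LINE_COMMENT", r"//[^\n]*"),
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[-+*/=<>!]=?|[(){};,.]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)
KEYWORDS = {"module", "import", "function", "var", "type",
            "if", "else", "while", "for", "return"}


def tokenize(source):
    """Yield (kind, text) pairs, dropping whitespace and comments."""
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind in ("SKIP", "LONG_COMMENT", "LINE_COMMENT"):
            continue
        text = match.group()
        if kind == "ID" and text in KEYWORDS:
            kind = "KEYWORD"
        yield kind, text


for token in tokenize("var int x = 13; /* long\ncomment */ x += 1; // done"):
    print(token)
```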


Keeps track of the long comments

class ppci.lang.c3.Parser(diag)

Parses source code into an abstract syntax tree (AST)


Add a symbol to the current scope

parse_cast_expression() → ppci.lang.c3.astnodes.Expression

Parse a cast expression.

The C-style type cast conflicts with ‘(‘ expr ‘)’ so introduce extra keyword ‘cast’.


Parse a compound statement, which is bounded by ‘{‘ and ‘}’


Parse a constant definition


Parse array initializers and other constant values


A designator designates an object with a name.

parse_expression(rbp=0) → ppci.lang.c3.astnodes.Expression

Process expressions with precedence climbing.

See also:

http://eli.thegreenplace.net/2012/08/02/parsing-expressions-by-precedence-climbing
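
For readers unfamiliar with the technique, a minimal precedence climbing parser can be sketched in Python as shown below. This is not the PPCI parser: the class `ExpressionParser`, the `BINDING_POWER` table and the tuple-based result are hypothetical, and only left-associative binary operators and parentheses are supported. The rbp argument is the binding power an operator has to exceed in order to be consumed at the current level.

```python
# Hypothetical binding powers: a higher number binds tighter.
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}


class ExpressionParser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def parse_primary(self):
        token = self.next()
        if token == "(":
            expr = self.parse_expression()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return expr
        if token in BINDING_POWER or token == ")":
            raise SyntaxError(f"unexpected token {token!r}")
        return token

    def parse_expression(self, rbp=0):
        lhs = self.parse_primary()
        # Keep consuming operators that bind tighter than rbp.
        while self.peek() in BINDING_POWER and BINDING_POWER[self.peek()] > rbp:
            op = self.next()
            rhs = self.parse_expression(BINDING_POWER[op])
            lhs = (op, lhs, rhs)
        return lhs


print(ExpressionParser(["1", "+", "2", "*", "3"]).parse_expression())
print(ExpressionParser(["1", "-", "2", "-", "3"]).parse_expression())
```

Passing the operator's own binding power to the recursive call is what makes the operators left-associative: an operator of equal strength is not consumed by the inner call but by the enclosing loop.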

parse_for() → ppci.lang.c3.astnodes.For

Parse a for statement


Parse function definition


Parse a sequence of id’s


Parse if statement


Parse import construct


Parse a module definition

parse_postfix_expression() → ppci.lang.c3.astnodes.Expression

Parse postfix expression

parse_primary_expression() → ppci.lang.c3.astnodes.Expression

Literal and parenthesis expression parsing

parse_return() → ppci.lang.c3.astnodes.Return

Parse a return statement

parse_source(tokens, context)

Parse a module from tokens

parse_statement() → ppci.lang.c3.astnodes.Statement

Determine statement type based on the pending token

parse_switch() → ppci.lang.c3.astnodes.Switch

Parse switch statement


Parse toplevel declaration


Parse a type definition


Parse type specification. Type specs are read from right to left.

A variable spec is given by: var [typeSpec] [modifiers] [pointer/array suffix] variable_name

For example: var int volatile * ptr; creates a pointer to a volatile integer.


Handle unary plus, minus and pointer magic


Parse variable declaration, optionally with initialization.

parse_while() → ppci.lang.c3.astnodes.While

Parses a while statement

class ppci.lang.c3.Visitor(pre=None, post=None)

Visitor that can visit all nodes in the AST and run pre and post functions.


Visit a single node


Visit a node and all its descendants
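
The pre and post hooks can be illustrated with a small Python sketch. The classes `Node` and `TreeVisitor` are hypothetical and do not use the PPCI AST node classes; the sketch only shows the order in which the two functions are called during a traversal.

```python
class Node:
    """A minimal tree node with a name and child nodes (hypothetical)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


class TreeVisitor:
    def __init__(self, pre=None, post=None):
        self.pre = pre
        self.post = post

    def visit(self, node):
        """Visit a node and all its descendants."""
        if self.pre:
            self.pre(node)  # called before the children are visited
        for child in node.children:
            self.visit(child)
        if self.post:
            self.post(node)  # called after the children are visited


tree = Node("module", [Node("function", [Node("return")]), Node("variable")])
order = []
visitor = TreeVisitor(
    pre=lambda node: order.append("enter " + node.name),
    post=lambda node: order.append("leave " + node.name),
)
visitor.visit(tree)
print(order)
```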

ppci.lang.c3.c3_to_ir(sources, includes, march, reporter=None)

Compile c3 sources to ir-code for the given architecture.