C3 language

Introduction

The C3 language was created as an example of designing and implementing a custom language within the PPCI framework. As pointed out by the c2lang project, the C language is widely used but has some awkward constructs. These include the following:

  • The include system. This results in lots of code duplication and file creation. Why would you need filenames in source code?
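  • The comma operator: x = (a(), 2); assigns 2 to x after calling function a, whereas x = a(), 2; assigns the result of a() to x and discards the 2.
  • C is difficult to parse with a simple parser. The parser has to know whether an identifier names a type or a variable at the moment it is parsed. The usual workaround is referred to as the lexer hack.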

In part for these reasons (and of course, for fun), C3 was created.

The hello world example in C3 is:

module hello;
import io;

function void main()
{
    io.println("Hello world");
}

Language reference

Modules

Modules in C3 live in files, and a single module can be spread over multiple files. Modules can import each other by using the import statement.

For example:

pkg1.c3:

module pkg1;
import pkg2;

pkg2.c3:

module pkg2;
import pkg1;

Functions

Functions are defined by using the function keyword, followed by the return type and the function name.

module example;

function void compute()
{
}

function void main()
{
    compute();
}

Variables

Variables require the var keyword, and can be either global or function-local.

module example;

var int global_var;

function void compute()
{
    var int x = global_var + 13;
    global_var = 200 - x;
}

Types

A type is specified when a variable is declared. A type can also be given a new name, like typedef in C, by using the type keyword.

module example;
var int number;
var int* ptr_num;
type int* ptr_num_t;
var ptr_num_t number2;

If statement

The following code example demonstrates the if statement. The else part is optional.

module example;

function void compute(int a)
{
    var int b = 10;
    if (a > 100)
    {
        b += a;
    }

    if (b > 50)
    {
        b += 1000;
    }
    else
    {
        b = 2;
    }
}

While statement

The while statement can be used as follows:

module example;

function void compute(int a)
{
    var int b = 10;
    while (b > a)
    {
        b -= 1;
    }
}

For statement

The for statement works like in C. The first part is executed once before the loop starts. The second part is the loop condition, which is checked before every iteration. The third part is executed each time one iteration of the loop is done.

module example;

function void compute(int a)
{
    var int b = 0;
    for (b = 100; b > a; b -= 1)
    {
        // Do something here!
    }
}

Other

C3 does not contain a preprocessor. For this kind of thing it might be better to use a templating engine such as Jinja2.

Module reference

This is the C3 language front end.

The front end uses a recursive descent parser. The stages of the front end are:

digraph c3 {
rankdir="LR"
1 [label="source text"]
10 [label="lexer" ]
20 [label="parser" ]
40 [label="code generation"]
99 [label="IR-code object"]
1 -> 10
10 -> 20
20 -> 40
40 -> 99
}

class ppci.lang.c3.AstPrinter

Prints an AST as text

class ppci.lang.c3.C3Builder(diag, arch_info)

Generates IR-code from c3 source.

Reports errors to the diagnostics system.

build(sources, imps=())

Create IR-code from sources.

Returns: A context in which the modules live, and an ir-module.

Raises compiler error when something goes wrong.

do_parse(src, context)

Lexing and parsing stage (phase 1)

class ppci.lang.c3.CodeGenerator(diag)

Generates intermediate (IR) code from a package.

The entry function is ‘genModule’. The main task of this part is to rewrite complex control structures, such as while and for loops, into simple conditional jump statements. Complex conditional expressions are simplified as well: for example, ‘and’ and ‘or’ operators are rewritten into conditional jumps. Structured datatypes are also rewritten.

Type checking is done in one run with code generation.
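The rewriting of loops into conditional jumps can be illustrated with a small, self-contained Python sketch. This is not the PPCI implementation: the Block class, the lower_while function and the tuple-based instruction format are hypothetical names made up for this example.

```python
# Toy illustration only: these names are hypothetical, not the PPCI API.
class Block:
    """A basic block: a name plus a flat list of instructions."""

    def __init__(self, name):
        self.name = name
        self.instructions = []

    def emit(self, *instruction):
        self.instructions.append(instruction)


def lower_while(condition, body):
    """Lower `while (condition) { body }` into three basic blocks.

    `condition` is a (left, op, right) tuple and `body` is a list of
    instruction tuples that are copied into the body block.
    """
    test = Block("while_test")
    loop = Block("while_body")
    final = Block("while_final")

    # Test block: jump into the body when the condition holds, else leave.
    left, op, right = condition
    test.emit("cjmp", left, op, right, loop.name, final.name)

    # Body block: run the statements, then jump back to the test.
    for statement in body:
        loop.emit(*statement)
    loop.emit("jmp", test.name)

    return [test, loop, final]


# while (b > a) { b -= 1; }
blocks = lower_while(("b", ">", "a"), [("sub", "b", 1)])
for block in blocks:
    print(block.name, block.instructions)
```

Code following the loop would be emitted into the final block, which is why it starts out empty here.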

emit(instruction, loc=None)

Emits the given instruction to the builder.

error(msg, loc=None)

Emit error to diagnostic system and mark package as invalid

gen(context)

Generate code for a whole context

gen_assignment_stmt(code)

Generate code for assignment statement

gen_binop(expr: ppci.lang.c3.astnodes.Binop)

Generate code for binary operation

gen_bool_expr(expr)

Generate code for cases where a boolean value is assigned

gen_cond_code(expr, bbtrue, bbfalse)

Generate conditional logic. Implement sequential logical operators.
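As a rough illustration of how sequential (short-circuit) logical operators can be turned into jumps to a true block and a false block, consider the following Python sketch. It is an assumption-laden toy and not PPCI code: gen_cond, new_label and the tuple encoding of expressions and instructions are all hypothetical.

```python
import itertools

# Toy illustration only: these names are hypothetical, not the PPCI API.
_label_numbers = itertools.count()


def new_label():
    return "L{}".format(next(_label_numbers))


def gen_cond(expr, bbtrue, bbfalse, code):
    """Append jumps to `code` that reach bbtrue or bbfalse.

    `expr` is ("and", a, b), ("or", a, b) or ("cmp", left, op, right).
    """
    kind = expr[0]
    if kind == "and":
        # The right operand is only evaluated when the left one is true.
        middle = new_label()
        gen_cond(expr[1], middle, bbfalse, code)
        code.append(("label", middle))
        gen_cond(expr[2], bbtrue, bbfalse, code)
    elif kind == "or":
        # The right operand is only evaluated when the left one is false.
        middle = new_label()
        gen_cond(expr[1], bbtrue, middle, code)
        code.append(("label", middle))
        gen_cond(expr[2], bbtrue, bbfalse, code)
    elif kind == "cmp":
        _, left, op, right = expr
        code.append(("cjmp", left, op, right, bbtrue, bbfalse))
    else:
        raise ValueError("unknown condition kind: {}".format(kind))


# if (a > 1 and b < 2) goto then; else goto else;
code = []
gen_cond(("and", ("cmp", "a", ">", 1), ("cmp", "b", "<", 2)), "then", "else", code)
for instruction in code:
    print(instruction)
```

No boolean value is ever materialised in this scheme; the outcome of the condition is encoded entirely in which block control flow ends up in.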

gen_dereference(expr: ppci.lang.c3.astnodes.Deref)

Dereference a pointer type, which means *(expr).

gen_expr_at(ptr, expr)

Generate code at a pointer in memory

gen_expr_code(expr: ppci.lang.c3.astnodes.Expression, rvalue=False) → ppci.ir.Value

Generate code for an expression. Return the generated ir-value

gen_external_function(function)

Generate external function

gen_for_stmt(code)

Generate for-loop code

gen_function(function)

Generate code for a function. This involves creating room for parameters on the stack, and generating code for the function body.

gen_function_call(expr)

Generate code for a function call

gen_global_ival(ival, typ)

Create memory image for initial value

gen_globals(module)

Generate global variables and modules

gen_identifier(expr)

Generate code for when an identifier was referenced

gen_if_stmt(code)

Generate code for if statement

gen_index_expr(expr)

Array indexing

gen_literal_expr(expr)

Generate code for literal

gen_local_var_init(var)

Initialize a local variable

gen_member_expr(expr)

Generate code for a member expression, such as struc.mem = 2. This could also be a module dereference.

gen_module(mod: ppci.lang.c3.astnodes.Module)

Generate code for a single module

gen_return_stmt(code)

Generate code for return statement

gen_stmt(code: ppci.lang.c3.astnodes.Statement)

Generate code for a statement

gen_switch_stmt(switch)

Generate code for a switch statement

gen_type_cast(expr)

Generate code for type casting

gen_unop(expr)

Generate code for unary operator

gen_while(code)

Generate code for while statement

get_debug_type(typ)

Get or create debug type info in the debug information

get_ir_function(function)

Get the proper IR function for the given function.

A new function will be created if required.

get_ir_type(cty)

Given a certain type, get the corresponding ir-type

is_module_ref(expr)

Determine whether a module is referenced

new_block()

Create a new basic block in the current function

class ppci.lang.c3.Context(arch_info)

A context is the space in which all modules live.

It is actually the container of modules and the top level scope.

equal_types(a, b, byname=False)

Compare types a and b for structural equivalence.

If byname is True, stop on defined types.
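The difference between structural comparison and stopping on defined types can be sketched in Python as follows. This is one possible reading of the description above and not the PPCI implementation; the Base, Pointer and Defined classes are hypothetical stand-ins for the real type nodes.

```python
from collections import namedtuple

# Toy illustration only: hypothetical stand-ins for the real type nodes.
Base = namedtuple("Base", "name")
Pointer = namedtuple("Pointer", "target")
Defined = namedtuple("Defined", "name backing")


def equal_types(a, b, byname=False):
    """Compare two toy types for structural equivalence."""
    if byname and (isinstance(a, Defined) or isinstance(b, Defined)):
        # Assumed meaning of "stop on defined types": compare the names
        # instead of looking through to the backing types.
        return (
            isinstance(a, Defined)
            and isinstance(b, Defined)
            and a.name == b.name
        )

    # Structural comparison: look through defined types first.
    while isinstance(a, Defined):
        a = a.backing
    while isinstance(b, Defined):
        b = b.backing

    if type(a) is not type(b):
        return False
    if isinstance(a, Base):
        return a.name == b.name
    return equal_types(a.target, b.target, byname)


int_type = Base("int")
int_ptr = Pointer(int_type)
ptr_num_t = Defined("ptr_num_t", int_ptr)  # type int* ptr_num_t;

print(equal_types(ptr_num_t, int_ptr))
print(equal_types(ptr_num_t, int_ptr, byname=True))
```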

eval_const(expr)

Evaluates a constant expression.
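Evaluating a constant expression amounts to folding an expression tree into a single value at compile time. A minimal Python sketch of that idea is shown here; the tuple-based expression encoding, the constants table and the supported operator set are assumptions made for this example and do not reflect the actual AST classes.

```python
import operator

# Toy illustration only: the expression encoding is hypothetical.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


def eval_const(expr, constants):
    """Fold an expression into an integer.

    `expr` is an int literal, the name of another constant, or an
    (op, left, right) tuple. `constants` maps names to expressions.
    """
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        # A reference to another constant: evaluate its definition.
        return eval_const(constants[expr], constants)
    op, left, right = expr
    return OPERATORS[op](
        eval_const(left, constants), eval_const(right, constants)
    )


constants = {"N": 4, "SIZE": ("*", "N", ("+", 1, 2))}
print(eval_const("SIZE", constants))
```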

get_common_type(a, b, loc)

Determine the greatest common type.

This is used for coercing binary operators.

For example:

  • int + float -> float
  • byte + int -> int
  • byte + byte -> byte
  • pointer to x + int -> pointer to x
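The four example rules above can be reproduced with a small ranking table. The Python sketch below is only an illustration of the idea, not the PPCI implementation: the string and tuple encoding of types, the RANK table and the common_type function are hypothetical, and rules not listed above are deliberately left out.

```python
# Toy illustration only: hypothetical type encoding, not the PPCI API.
# Simple types are strings; a pointer is a ("ptr", target) tuple.
RANK = {"byte": 0, "int": 1, "float": 2}


def common_type(a, b):
    """Return the type to which both operands of a binary op are coerced."""
    a_is_pointer = isinstance(a, tuple)
    b_is_pointer = isinstance(b, tuple)

    if a_is_pointer and not b_is_pointer and b == "int":
        # pointer to x + int -> pointer to x
        return a
    if a_is_pointer or b_is_pointer:
        raise TypeError("no common type for {} and {}".format(a, b))

    # For simple types, the operand with the higher rank wins.
    return a if RANK[a] >= RANK[b] else b


print(common_type("int", "float"))
print(common_type("byte", "int"))
print(common_type("byte", "byte"))
print(common_type(("ptr", "x"), "int"))
```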

get_constant_value(const)

Get the constant value, calculate if required

get_module(name, create=True)

Gets or creates the module with the given name

get_type(typ, reveil_defined=True)

Get type given by str, identifier or type.

When reveil_defined is True, defined types are resolved to their backing types.

has_module(name)

Check if a module with the given name exists

is_simple_type(typ)

Determines if the given type is a simple type

Resolve all modules referenced by other modules

modules

Get all the modules in this context

pack_string(txt)

Pack a string as an int holding the length, followed by the text data
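A length-prefixed string layout of this kind can be sketched with the standard struct module. The example below assumes a 32-bit little-endian length and ASCII text; these are assumptions for illustration only, since the actual integer size and byte order are not stated here and presumably depend on the target architecture.

```python
import struct


def pack_string(txt, int_format="<I"):
    """Return an int length followed by the encoded text data.

    Assumption: a 32-bit little-endian length ("<I") and ASCII text.
    """
    data = txt.encode("ascii")
    return struct.pack(int_format, len(data)) + data


packed = pack_string("Hello")
print(packed)  # b'\x05\x00\x00\x00Hello'
```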

resolve_symbol(ref)

Find out what is designated by ref

size_of(typ)

Determine the byte size of a type

class ppci.lang.c3.Lexer(diag)

Generates a sequence of tokens from an input stream

tokenize(text)

Keeps track of the long comments

class ppci.lang.c3.Parser(diag)

Parses source code into an abstract syntax tree (AST)

add_symbol(sym)

Add a symbol to the current scope

parse_cast_expression() → ppci.lang.c3.astnodes.Expression

Parse a cast expression.

The C-style type cast conflicts with ‘(‘ expr ‘)’, so an extra keyword ‘cast’ is introduced.

parse_compound()

Parse a compound statement, which is bounded by ‘{‘ and ‘}’

parse_const_def()

Parse a constant definition

parse_const_expression()

Parse array initializers and other constant values

parse_designator()

A designator designates an object with a name.

parse_expression(rbp=0) → ppci.lang.c3.astnodes.Expression

Process expressions with precedence climbing.

See also:

http://eli.thegreenplace.net/2012/08/02/parsing-expressions-by-precedence-climbing
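For readers unfamiliar with precedence climbing, here is a compact Python sketch of the technique with an rbp (right binding power) argument. It is a toy parser for integer arithmetic and not the C3 parser: the ToyParser class, the BINDING_POWER table and the tuple-based tree are hypothetical.

```python
import re

# Toy illustration only: hypothetical binding powers, higher binds tighter.
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}


def tokenize(text):
    return re.findall(r"\d+|[-+*/()]", text)


class ToyParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_primary(self):
        token = self.next()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            expr = self.parse_expression()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return expr
        return int(token)

    def parse_expression(self, rbp=0):
        lhs = self.parse_primary()
        # Keep consuming operators that bind tighter than the caller's.
        while self.peek() in BINDING_POWER and BINDING_POWER[self.peek()] > rbp:
            op = self.next()
            # Parsing the right side with the operator's own binding
            # power makes operators of equal power left-associative.
            rhs = self.parse_expression(BINDING_POWER[op])
            lhs = (op, lhs, rhs)
        return lhs


def parse(text):
    return ToyParser(tokenize(text)).parse_expression()


print(parse("1 + 2 * 3"))
print(parse("1 - 2 - 3"))
print(parse("(1 + 2) * 3"))
```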

parse_for() → ppci.lang.c3.astnodes.For

Parse a for statement

parse_function_def(public=True)

Parse function definition

parse_id_sequence()

Parse a sequence of identifiers

parse_if()

Parse if statement

parse_import()

Parse import construct

parse_module(context)

Parse a module definition

parse_postfix_expression() → ppci.lang.c3.astnodes.Expression

Parse postfix expression

parse_primary_expression() → ppci.lang.c3.astnodes.Expression

Literal and parenthesis expression parsing

parse_return() → ppci.lang.c3.astnodes.Return

Parse a return statement

parse_source(tokens, context)

Parse a module from tokens

parse_statement() → ppci.lang.c3.astnodes.Statement

Determine statement type based on the pending token

parse_switch() → ppci.lang.c3.astnodes.Switch

Parse switch statement

parse_top_level()

Parse toplevel declaration

parse_type_def(public=True)

Parse a type definition

parse_type_spec()

Parse type specification. Type specs are read from right to left.

A variable spec is given by: var [typeSpec] [modifiers] [pointer/array suffix] variable_name

For example: var int volatile * ptr; creates a pointer to a volatile integer.

parse_unary_expression()

Handle unary plus, minus and pointer magic

parse_variable_def(public=True)

Parse variable declaration, optionally with initialization.

parse_while() → ppci.lang.c3.astnodes.While

Parses a while statement

class ppci.lang.c3.Visitor(pre=None, post=None)

Visitor that can visit all nodes in the AST and run pre and post functions.

do(node)

Visit a single node

visit(node)

Visit a node and all its descendants
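The idea of running a pre function before descending into a node and a post function after all its descendants have been visited can be sketched as follows. This Python example uses a hypothetical Node class with a children list and a hypothetical ToyVisitor; it does not use the real AST node classes.

```python
# Toy illustration only: Node and ToyVisitor are hypothetical.
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


class ToyVisitor:
    def __init__(self, pre=None, post=None):
        self.pre = pre
        self.post = post

    def visit(self, node):
        """Visit a node and all its descendants, depth first."""
        if self.pre:
            self.pre(node)
        for child in node.children:
            self.visit(child)
        if self.post:
            self.post(node)


tree = Node("module", [Node("function", [Node("while"), Node("return")])])

events = []
visitor = ToyVisitor(
    pre=lambda node: events.append("enter " + node.name),
    post=lambda node: events.append("leave " + node.name),
)
visitor.visit(tree)
print(events)
```

The pre callback is a natural place for work that must happen top-down, such as opening a scope, while the post callback suits bottom-up work that needs the children to be processed first.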

ppci.lang.c3.c3_to_ir(sources, includes, march, reporter=None)

Compile c3 sources to ir-code for the given architecture.