Language tools

The package contains tools for programming language construction.


To provide useful feedback about programming errors, there are classes for source code locations and diagnostics.

class ppci.lang.common.SourceLocation(filename, row, col, ln, source=None)

A location that refers to a position in a source file


Return the source line indicated by this location

print_message(message, lines=None, filename=None, file=None)

Print a message at this location in the given source lines
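The idea can be illustrated with a minimal, self-contained sketch (not the actual ppci implementation; the class and names below are hypothetical): a location knows its file, row and column, and print_message prints the offending line with a caret under the column.

```python
class Loc:
    """Minimal stand-in for a source location (hypothetical, for illustration)."""
    def __init__(self, filename, row, col):
        self.filename = filename
        self.row = row
        self.col = col

    def print_message(self, message, lines, file=None):
        # Print the message, the offending source line, and a caret marker.
        line = lines[self.row - 1]
        print('{}:{}:{}: {}'.format(self.filename, self.row, self.col, message), file=file)
        print(line, file=file)
        print(' ' * (self.col - 1) + '^', file=file)

source_lines = ['x = 1', 'y = x +']
Loc('demo.c', 2, 7).print_message('expected expression', source_lines)
```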

class ppci.lang.common.SourceRange(p1, p2)

p1

Alias for field number 0


p2

Alias for field number 1

class ppci.lang.common.Token(typ, val, loc)

Token is used in the lexical analyzer. The lexical analyzer takes a text and splits it into tokens.

ppci.lang.common.print_line(row, lines, file=None)

Print a single source line



Base class for a lexer.

This class can be subclassed to create a lexer. It handles the regular expression generation and source position accounting.


Feeds the lexer with extra input


Enters a new line

tokenize(txt, eof=False)

Generator that generates lexical tokens from text.

Optionally yield the EOF token.


Metaclass that inspects the functions decorated with ‘on’


Simple class for lexing.

Use this class by subclassing it and decorating handler methods with the ‘on’ function.
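The mechanism can be sketched as follows (a simplified illustration with hypothetical names, not the ppci API): the ‘on’ decorator tags a handler method with a regular expression, and tokenize combines all tagged handlers into one pattern and dispatches each match to its handler.

```python
import re

def on(pattern, order=0):
    """Tag a handler method with a regex pattern (hypothetical sketch)."""
    def wrapper(func):
        func.lex_pattern = (pattern, order)
        return func
    return wrapper

class SimpleLexer:
    def tokenize(self, txt):
        # Collect decorated handlers in priority order (lower order first):
        handlers = sorted(
            (f for f in type(self).__dict__.values() if hasattr(f, 'lex_pattern')),
            key=lambda f: f.lex_pattern[1])
        parts, table = [], {}
        for i, func in enumerate(handlers):
            name = 'g{}'.format(i)
            parts.append('(?P<{}>{})'.format(name, func.lex_pattern[0]))
            table[name] = func
        regex = re.compile('|'.join(parts))
        pos = 0
        while pos < len(txt):
            mo = regex.match(txt, pos)
            if not mo:
                raise ValueError('Lex error at position {}'.format(pos))
            token = table[mo.lastgroup](self, mo.group())
            if token is not None:
                yield token
            pos = mo.end()

class CalcLexer(SimpleLexer):
    @on(r'\d+')
    def number(self, text):
        return ('NUMBER', int(text))

    @on(r'[+\-*/]')
    def operator(self, text):
        return ('OP', text)

    @on(r'\s+')
    def whitespace(self, text):
        return None  # returning None skips the text

tokens = list(CalcLexer().tokenize('1 + 22'))
```

A real implementation would also track row and column positions for each token.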


Find a match at the given position

tokenize(txt, eof=False)

Generator that generates lexical tokens from text.

Optionally yield the EOF token.

on(pattern, flags=0, order=0)

Register method to the given pattern.

Parameters: order – a sorting priority; a lower number comes first.


To help define a language, the grammar classes can be used to specify its grammar. From this grammar, parsers can be generated.

A grammar can be defined like this:

>>> from ppci.lang.tools.grammar import Grammar
>>> g = Grammar()
>>> g.add_terminal('term')
>>> g.add_production('expr', ['expr', '+', 'term'])
>>> g.add_production('expr', ['expr', '-', 'term'])
>>> g.dump()
Grammar with 2 rules and 1 terminals
expr -> ('expr', '+', 'term') P_0
expr -> ('expr', '-', 'term') P_0
>>> g.is_terminal('expr')
False
>>> g.is_terminal('term')
True

Defines a grammar of a language

add_one_or_more(element_nonterm, list_nonterm)

Helper to add the rules:

lst: elem
lst: lst elem

add_production(name, symbols, semantics=None, priority=0)

Add a production rule to the grammar


Add a terminal name


Add the given terminal names to this grammar


Checks that no symbols are undefined

create_combinations(rule, non_terminal)

Create n copies of rule where nt is removed. For example, if B -> epsilon, replace:

A -> B C B

by:

A -> B C B
A -> B C
A -> C B
A -> C
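This expansion can be sketched with a small helper (hypothetical, not ppci's code): for each occurrence of the nullable non-terminal, either keep it or drop it, and emit every distinct combination.

```python
from itertools import product

def create_combinations(rhs, nullable):
    """Yield each variant of rhs with any subset of 'nullable' occurrences dropped."""
    # Non-nullable symbols are always kept; nullable ones get a keep/drop choice.
    choices = [[(sym,), ()] if sym == nullable else [(sym,)] for sym in rhs]
    seen = set()
    for combo in product(*choices):
        variant = tuple(s for part in combo for s in part)
        if variant not in seen:
            seen.add(variant)
            yield variant

# With B -> epsilon, the rule A -> B C B expands to four variants:
variants = sorted(create_combinations(('B', 'C', 'B'), 'B'), key=len, reverse=True)
```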


Print this grammar


Check if a name is a non-terminal


Check if this grammar is normal, which means it has no empty (epsilon) productions


Check if a name is a terminal


Retrieve all productions for a non-terminal


Make the grammar free of empty productions. Do this by permuting all combinations of rules that would otherwise contain an empty place.


Get all the symbols defined by this grammar

class Production(name, symbols, semantics, priority=0)

Production rule for a grammar. It consists of a left hand side non-terminal and a list of symbols as the right hand side. It also contains a function that must be called when this rule is applied. The right hand side may contain terminals and non-terminals.


Checks if this rule is an epsilon rule

Pretty print a grammar

Earley parser

Implementation of the Earley parsing strategy.

See also the book: Parsing Techniques: A Practical Guide (2nd edition).

class Column(i, token)

A set of partially parsed items for a given token position


Earley parser.

As opposed to an LR parser, the Earley parser does not construct tables from a grammar. It uses the grammar when parsing.

The Earley parser has 3 key functions:

  • predict: determine which productions can be made, given what has been parsed so far.
  • scan: parse the next input symbol, and create a new set of possible parsings.
  • complete: when something has been scanned according to a rule, this rule can be applied.

When an Earley parse is complete, the parse can be back-tracked to yield the resulting parse tree or the syntax tree.
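The three key functions can be sketched as a small recognizer (a simplified illustration, not the ppci implementation; the grammar is assumed to be a dict mapping each non-terminal to a list of right hand side tuples):

```python
def earley_recognize(grammar, start, tokens):
    """Return True if 'tokens' can be derived from 'start' (sketch, no epsilon rules)."""
    # Each item is (head, rhs, dot position, origin column).
    columns = [set() for _ in range(len(tokens) + 1)]
    columns[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i, column in enumerate(columns):
        worklist = list(column)
        while worklist:
            head, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:
                    # predict: add all rules for this non-terminal.
                    for prod in grammar[nxt]:
                        new = (nxt, prod, 0, i)
                        if new not in column:
                            column.add(new)
                            worklist.append(new)
                elif i < len(tokens) and tokens[i] == nxt:
                    # scan: shift the terminal into the next column.
                    columns[i + 1].add((head, rhs, dot + 1, origin))
            else:
                # complete: advance items waiting on 'head' in the origin column.
                for h2, r2, d2, o2 in list(columns[origin]):
                    if d2 < len(r2) and r2[d2] == head:
                        new = (h2, r2, d2 + 1, o2)
                        if new not in column:
                            column.add(new)
                            worklist.append(new)
    return any(head == start and dot == len(rhs) and origin == 0
               for head, rhs, dot, origin in columns[-1])

g = {'expr': [('expr', '+', 'term'), ('term',)], 'term': [('num',)]}
ok = earley_recognize(g, 'expr', ['num', '+', 'num'])
```

Back-tracking through the columns, as make_tree and walk below do, then turns the recognized items into a parse tree.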

complete(completed_item, start_col, current_column)

Complete a rule, check if any other rules can be shifted!

make_tree(columns, nt)

Make a parse tree

parse(tokens, debug_dump=False)

Parse the given token string

predict(item, col)

Add all rules for a certain non-terminal

scan(item, col)

Check if the item can be shifted into the next column

walk(columns, end, nt)

Process the parsed columns back to a parse tree

class Item(rule, dot, origin)

Partially parsed grammar rule

Recursive descent

Recursive descent parsing (also known as handwritten parsing) is an easy way to parse source code.


Base class for recursive descent parsers


Assert that the next token is typ, and if so, return it.

If typ is a list or tuple, consume one of the given types. If typ is not given, consume the next token.

error(msg, loc=None)

Raise an error at the current location


Checks if the look-ahead token is of type typ; if so, eats the token and returns true


Initialize the parser with the given tokens (an iterator)


Take a look at x tokens ahead


Advance to the next token


Call this function when parsing reaches unimplemented parts


Look at the next token to parse without popping it

LR parsing

class Item(production, dotpos, look_ahead)

Represents a partially parsed item. It has a production it is looking for, a position in this production called the ‘dot’, and a look-ahead symbol that must follow this item.


Gets the symbol after the next symbol, or EPS if at the end


Determines if this item can shift over the given symbol


Check if this item has the dot at the end


Creates a new item that is shifted one position

class LrParser(grammar, action_table, goto_table)

LR parser automaton. This class takes a goto table and an action table and can then process a sequence of tokens.


Parse an iterable with tokens
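The table-driven loop can be sketched as follows, using hand-built SLR(1) tables for the toy grammar E -> E + n | n (the tables below are written out by hand for illustration; ppci constructs them from a grammar):

```python
# Hand-built tables for: rule 1: E -> E + n, rule 2: E -> n.
ACTION = {
    (0, 'n'): ('shift', 1),
    (1, '+'): ('reduce', 2), (1, '$'): ('reduce', 2),
    (2, '+'): ('shift', 3), (2, '$'): ('accept',),
    (3, 'n'): ('shift', 4),
    (4, '+'): ('reduce', 1), (4, '$'): ('reduce', 1),
}
GOTO = {(0, 'E'): 2}
PRODUCTIONS = {1: ('E', 3), 2: ('E', 1)}  # rule number -> (head, rhs length)

def lr_parse(tokens):
    """Shift-reduce driver: return True when the input is accepted."""
    stack = [0]  # stack of state numbers
    tokens = list(tokens) + ['$']  # '$' marks end of input
    pos = 0
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:
            raise SyntaxError('unexpected {!r}'.format(tokens[pos]))
        if act[0] == 'shift':
            stack.append(act[1])
            pos += 1
        elif act[0] == 'reduce':
            head, length = PRODUCTIONS[act[1]]
            del stack[len(stack) - length:]      # pop the handle
            stack.append(GOTO[(stack[-1], head)])  # goto on the head symbol
        else:  # accept
            return True

accepted = lr_parse(['n', '+', 'n'])
```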


Construct goto and action tables according to the LALR algorithm


Expand itemset by using epsilon moves


The first set is a mapping from a grammar symbol to the set of all terminal symbols that can appear first when looking for that grammar symbol


Create all LR1 states


Generates a parser from the grammar


Generate parsing tables


Calculates the initial item set

next_item_set(itemset, symbol)

Determines the next item set for the current set and a symbol. This is the goto procedure.


Reduce according to the given rule


Shift over the next token and go to the given state

class State(items, number)

A state in the parsing machine. A state contains a set of items and a state number

Calculate first sets for each grammar symbol. This is a dictionary that maps each grammar symbol to the set of terminals that can be encountered first when looking for the symbol.
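The computation is a fixed-point iteration, which can be sketched like this (a simplified illustration, not ppci's code; 'EPS' marks an empty production):

```python
def first_sets(grammar, terminals):
    """grammar: dict mapping non-terminal -> list of right hand side tuples."""
    first = {t: {t} for t in terminals}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:  # iterate until no set grows any further
        changed = False
        for nt, rules in grammar.items():
            for rhs in rules:
                for sym in rhs:
                    new = first[sym] - {'EPS'}
                    if not new <= first[nt]:
                        first[nt] |= new
                        changed = True
                    if 'EPS' not in first[sym]:
                        break  # sym cannot vanish, so later symbols do not matter
                else:
                    # All symbols were nullable (or rhs is empty): nt derives epsilon.
                    if 'EPS' not in first[nt]:
                        first[nt].add('EPS')
                        changed = True
    return first

g = {'expr': [('opt', 'num')], 'opt': [('+',), ()]}
first = first_sets(g, {'+', 'num'})
```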

Parser generator script


Generator that writes the generated parser to a file


Generate a Python script with the parser tables


Helper function that prints to the output file


Generator that generates lexical tokens from text.

Optionally yield the EOF token.


Implements a recursive descent parser to parse grammar rules.

We could have made a generated parser, but that would create a chicken-and-egg problem.


Entry point of the recursive descent parser


Parse the right hand side of a rule definition


Parse a rule definition

Load a parser spec file, generate LR tables and create a module