Language tools

The ppci.lang.tools package contains tools for programming language construction.

Diagnostics

To provide useful feedback about programming errors, there are classes for sourcecode location and diagnostics.

class ppci.lang.common.SourceLocation(filename, row, col, ln, source=None)

A location that refers to a position in a source file

get_source_line()

Return the source line indicated by this location

print_message(message, lines=None, filename=None, file=None)

Print a message at this location in the given source lines

class ppci.lang.common.SourceRange(p1, p2)
p1

Alias for field number 0

p2

Alias for field number 1

class ppci.lang.common.Token(typ, val, loc)

Token is used in the lexical analyzer. The lexical analyzer takes a text and splits it into tokens.

ppci.lang.common.print_line(row, lines, file=None)

Print a single source line

Lexing

class ppci.lang.tools.baselex.BaseLexer(tok_spec)

Base class for a lexer.

This class can be overridden to create a lexer. This class handles the regular expression generation and source position accounting.

feed(txt)

Feeds the lexer with extra input

newline()

Enters a new line

tokenize(txt, eof=False)

Generator that generates lexical tokens from text.

Optionally yield the EOF token.

class ppci.lang.tools.baselex.LexMeta

Meta class which inspects the functions decorated with ‘on’

class ppci.lang.tools.baselex.SimpleLexer

Simple class for lexing.

Use this class by subclassing it and decorating handler methods with the ‘on’ function.

gettok()

Find a match at the given position

tokenize(txt, eof=False)

Generator that generates lexical tokens from text.

Optionally yield the EOF token.

ppci.lang.tools.baselex.on(pattern, flags=0, order=0)

Register method to the given pattern.

Parameters:order – a sorting priority. Lower number comes first.

Grammar

To help to define a grammar for a language the grammar classes can be used to define a language grammar. From this grammar parsers can be generated.

A grammar can be defined like this:

>>> from ppci.lang.tools.grammar import Grammar
>>> g = Grammar()
>>> g.add_terminal('term')
>>> g.add_production('expr', ['expr', '+', 'term'])
>>> g.add_production('expr', ['expr', '-', 'term'])
>>> g.dump()
Grammar with 2 rules and 1 terminals
expr -> ('expr', '+', 'term') P_0
expr -> ('expr', '-', 'term') P_0
>>> g.is_terminal('expr')
False
>>> g.is_terminal('term')
True
class ppci.lang.tools.grammar.Grammar

Defines a grammar of a language

add_one_or_more(element_nonterm, list_nonterm)

Helper to add the rule lst: elem lst: lst elem

add_production(name, symbols, semantics=None, priority=0)

Add a production rule to the grammar

add_terminal(name)

Add a terminal name

add_terminals(terminals)

Add all terminals to terminals for this grammar

check_symbols()

Checks no symbols are undefined

create_combinations(rule, non_terminal)

Create n copies of rule where nt is removed. For example replace: A -> B C B by: A -> B C B A -> B C A -> C B A -> C if: B -> epsilon

dump()

Print this grammar

is_nonterminal(name)

Check if a name is a non-terminal

is_normal

Check if this grammar is normal. Which means: - No empty productions (epsilon productions)

is_terminal(name)

Check if a name is a terminal

productions_for_name(name)

Retrieve all productions for a non terminal

rewrite_eps_productions()

Make the grammar free of empty productions. Do this by permutating all combinations of rules that would otherwise contain an empty place.

symbols

Get all the symbols defined by this grammar

class ppci.lang.tools.grammar.Production(name, symbols, semantics, priority=0)

Production rule for a grammar. It consists of a left hand side non-terminal and a list of symbols as right hand side. Also it contains a function that must be called when this rule is applied. The right hand side may contain terminals and non-terminals.

is_epsilon

Checks if this rule is an epsilon rule

ppci.lang.tools.grammar.print_grammar(g)

Pretty print a grammar

Earley parser

Implementation of the earley parser strategy.

See also: - https://en.wikipedia.org/wiki/Earley_parser

And the book: Parsing Techniques: A Practical Guide (2nd edition)

class ppci.lang.tools.earley.Column(i, token)

A set of partially parsed items for a given token position

class ppci.lang.tools.earley.EarleyParser(grammar)

Earley parser.

As opposed to an LR parser, the Earley parser does not construct tables from a grammar. It uses the grammar when parsing.

The Earley parser has 3 key functions:

  • predict: what have we parsed so far, and what productions can be made with this.
  • scan: parse the next input symbol, en create a new set of possible parsings
  • complete: when we have scanned something according to a rule, this rule can be applied.

When an earley parse is complete, the parse can be back-tracked to yield the resulting parse tree or the syntax tree.

complete(completed_item, start_col, current_column)

Complete a rule, check if any other rules can be shifted!

make_tree(columns, nt)

Make a parse tree

parse(tokens, debug_dump=False)

Parse the given token string

predict(item, col)

Add all rules for a certain non-terminal

scan(item, col)

Check if the item can be shifted into the next column

walk(columns, end, nt)

Process the parsed columns back to a parse tree

class ppci.lang.tools.earley.Item(rule, dot, origin)

Partially parsed grammar rule

Recursive descent

Recursive descent parsing (also known as handwritten parsing) is an easy way to parse sourcecode.

class ppci.lang.tools.recursivedescent.RecursiveDescentParser

Base class for recursive descent parsers

consume(typ=None)

Assert that the next token is typ, and if so, return it.

If typ is a list or tuple, consume one of the given types. If typ is not given, consume the next token.

error(msg, loc=None)

Raise an error at the current location

has_consumed(typ)

Checks if the look-ahead token is of type typ, and if so eats the token and returns true

init_lexer(tokens)

Initialize the parser with the given tokens (an iterator)

look_ahead(amount)

Take a look at x tokens ahead

next_token()

Advance to the next token

not_impl(msg='')

Call this function when parsing reaches unimplemented parts

peek

Look at the next token to parse without popping it

LR parsing

class ppci.lang.tools.lr.Item(production, dotpos, look_ahead)

Represents a partially parsed item It has a production it is looking for, a position in this production called the ‘dot’ and a look ahead symbol that must follow this item.

NextNext

Gets the symbol after the next symbol, or EPS if at the end

can_shift_over(symbol)

Determines if this item can shift over the given symbol

is_reduce

Check if this item has the dot at the end

shifted()

Creates a new item that is shifted one position

class ppci.lang.tools.lr.LrParser(grammar, action_table, goto_table)

LR parser automata. This class takes goto and action table and can then process a sequence of tokens.

parse(lexer)

Parse an iterable with tokens

class ppci.lang.tools.lr.LrParserBuilder(grammar)

Construct goto and action tables according to LALR algorithm

closure(itemset)

Expand itemset by using epsilon moves

first

The first set is a mapping from a grammar symbol to a set of set of all terminal symbols that can be the first terminal when looking for the grammar symbol

gen_canonical_set(iis)

Create all LR1 states

generate_parser()

Generates a parser from the grammar

generate_tables()

Generate parsing tables

initial_item_set()

Calculates the initial item set

next_item_set(itemset, symbol)

Determines the next itemset for the current set and a symbol This is the goto procedure

class ppci.lang.tools.lr.Reduce(rule)

Reduce according to the given rule

class ppci.lang.tools.lr.Shift(to_state)

Shift over the next token and go to the given state

class ppci.lang.tools.lr.State(items, number)

A state in the parsing machine. A state contains a set of items and a state number

ppci.lang.tools.lr.calculate_first_sets(grammar)

Calculate first sets for each grammar symbol This is a dictionary which maps each grammar symbol to a set of terminals that can be encountered first when looking for the symbol.

Parser generator script

class ppci.lang.tools.yacc.XaccGenerator

Generator that writes generated parser to file

generate_python_script()

Generate python script with the parser table

print(*args)

Print helper function that prints to output file

class ppci.lang.tools.yacc.XaccLexer
tokenize(txt)

Generator that generates lexical tokens from text.

Optionally yield the EOF token.

class ppci.lang.tools.yacc.XaccParser

Implements a recursive descent parser to parse grammar rules.

We could have made an generated parser, but that would yield a chicken egg issue.

parse_grammar(tokens)

Entry parse function into recursive descent parser

parse_rhs()

Parse the right hand side of a rule definition

parse_rule()

Parse a rule definition

ppci.lang.tools.yacc.load_as_module(filename)

Load a parser spec file, generate LR tables and create module