Language tools¶
The ppci.lang.tools package contains tools for programming language construction.

Diagnostics¶
To provide useful feedback about programming errors, there are classes for source code locations and diagnostics.
class ppci.lang.common.SourceLocation(filename, row, col, ln, source=None)¶
    A location that refers to a position in a source file.

    get_source_line()¶
        Return the source line indicated by this location.

    print_message(message, lines=None, filename=None, file=None)¶
        Print a message at this location in the given source lines.
class ppci.lang.common.SourceRange(p1, p2)¶

    p1¶
        Alias for field number 0

    p2¶
        Alias for field number 1
class ppci.lang.common.Token(typ, val, loc)¶
    Token is used in the lexical analyzer. The lexical analyzer takes a text and splits it into tokens.

ppci.lang.common.print_line(row, lines, file=None)¶
    Print a single source line.
Lexing¶
class ppci.lang.tools.baselex.BaseLexer(tok_spec)¶
    Base class for a lexer.

    Subclass this class to create a lexer. It handles the regular expression generation and source position accounting.

    feed(txt)¶
        Feed the lexer with extra input.

    newline()¶
        Enter a new line.

    tokenize(txt, eof=False)¶
        Generator that generates lexical tokens from text. Optionally yields the EOF token.
class ppci.lang.tools.baselex.LexMeta¶
    Meta class which inspects the functions decorated with 'on'.

class ppci.lang.tools.baselex.SimpleLexer¶
    Simple class for lexing.

    Use this class by subclassing it and decorating handler methods with the 'on' function.

    gettok()¶
        Find a match at the given position.

    tokenize(txt, eof=False)¶
        Generator that generates lexical tokens from text. Optionally yields the EOF token.

ppci.lang.tools.baselex.on(pattern, flags=0, order=0)¶
    Register a method for the given pattern.

    Parameters: order – a sorting priority. Lower numbers come first.
Grammar¶
The grammar classes can be used to define the grammar of a language. From such a grammar, parsers can be generated.
A grammar can be defined like this:
>>> from ppci.lang.tools.grammar import Grammar
>>> g = Grammar()
>>> g.add_terminal('term')
>>> g.add_production('expr', ['expr', '+', 'term'])
>>> g.add_production('expr', ['expr', '-', 'term'])
>>> g.dump()
Grammar with 2 rules and 1 terminals
expr -> ('expr', '+', 'term') P_0
expr -> ('expr', '-', 'term') P_0
>>> g.is_terminal('expr')
False
>>> g.is_terminal('term')
True
class ppci.lang.tools.grammar.Grammar¶
    Defines the grammar of a language.

    add_one_or_more(element_nonterm, list_nonterm)¶
        Helper to add the rules lst: elem and lst: lst elem.

    add_production(name, symbols, semantics=None, priority=0)¶
        Add a production rule to the grammar.

    add_terminal(name)¶
        Add a terminal name.

    add_terminals(terminals)¶
        Add all given terminals to this grammar.

    check_symbols()¶
        Check that no symbols are undefined.

    create_combinations(rule, non_terminal)¶
        Create copies of the rule with the given non-terminal removed. For example, if B -> epsilon, replace A -> B C B by: A -> B C B, A -> B C, A -> C B, A -> C.

    dump()¶
        Print this grammar.

    is_nonterminal(name)¶
        Check if a name is a non-terminal.

    is_normal¶
        Check if this grammar is normal, meaning it has no empty (epsilon) productions.

    is_terminal(name)¶
        Check if a name is a terminal.

    productions_for_name(name)¶
        Retrieve all productions for a non-terminal.

    rewrite_eps_productions()¶
        Make the grammar free of empty productions. Do this by permuting all combinations of rules that would otherwise contain an empty place.

    symbols¶
        Get all the symbols defined by this grammar.
class ppci.lang.tools.grammar.Production(name, symbols, semantics, priority=0)¶
    Production rule for a grammar. It consists of a left-hand-side non-terminal and a list of symbols as the right-hand side, which may contain terminals and non-terminals. It also contains a function that is called when this rule is applied.

    is_epsilon¶
        Check if this rule is an epsilon rule.

ppci.lang.tools.grammar.print_grammar(g)¶
    Pretty print a grammar.
Earley parser¶
Implementation of the Earley parsing strategy.
See also https://en.wikipedia.org/wiki/Earley_parser and the book Parsing Techniques: A Practical Guide (2nd edition).
class ppci.lang.tools.earley.Column(i, token)¶
    A set of partially parsed items for a given token position.

class ppci.lang.tools.earley.EarleyParser(grammar)¶
    Earley parser.

    As opposed to an LR parser, the Earley parser does not construct tables from a grammar. It uses the grammar directly when parsing.

    The Earley parser has three key functions:
    - predict: determine which productions can be made from what has been parsed so far.
    - scan: parse the next input symbol and create a new set of possible parsings.
    - complete: when something has been scanned according to a rule, that rule can be applied.

    When an Earley parse is complete, the parse can be back-tracked to yield the resulting parse tree or the syntax tree.

    complete(completed_item, start_col, current_column)¶
        Complete a rule and check if any other rules can be shifted.

    make_tree(columns, nt)¶
        Make a parse tree.

    parse(tokens, debug_dump=False)¶
        Parse the given token string.

    predict(item, col)¶
        Add all rules for a certain non-terminal.

    scan(item, col)¶
        Check if the item can be shifted into the next column.

    walk(columns, end, nt)¶
        Process the parsed columns back to a parse tree.

class ppci.lang.tools.earley.Item(rule, dot, origin)¶
    Partially parsed grammar rule.
Recursive descent¶
Recursive descent parsing (also known as handwritten parsing) is an easy way to parse source code.

class ppci.lang.tools.recursivedescent.RecursiveDescentParser¶
    Base class for recursive descent parsers.

    consume(typ=None)¶
        Assert that the next token is typ, and if so, return it. If typ is a list or tuple, consume one of the given types. If typ is not given, consume the next token.

    error(msg, loc=None)¶
        Raise an error at the current location.

    has_consumed(typ)¶
        Check if the look-ahead token is of type typ; if so, eat the token and return True.

    init_lexer(tokens)¶
        Initialize the parser with the given tokens (an iterator).

    look_ahead(amount)¶
        Take a look at the token the given amount of tokens ahead.

    next_token()¶
        Advance to the next token.

    not_impl(msg='')¶
        Call this function when parsing reaches unimplemented parts.

    peek¶
        Look at the next token to parse without popping it.
LR parsing¶
class ppci.lang.tools.lr.Item(production, dotpos, look_ahead)¶
    Represents a partially parsed item. It has a production it is looking for, a position in this production called the 'dot', and a look-ahead symbol that must follow this item.

    NextNext¶
        Get the symbol after the next symbol, or EPS if at the end.

    can_shift_over(symbol)¶
        Determine if this item can shift over the given symbol.

    is_reduce¶
        Check if this item has the dot at the end.

    shifted()¶
        Create a new item that is shifted one position.
class ppci.lang.tools.lr.LrParser(grammar, action_table, goto_table)¶
    LR parser automaton. This class takes a goto and action table and can then process a sequence of tokens.

    parse(lexer)¶
        Parse an iterable of tokens.
class ppci.lang.tools.lr.LrParserBuilder(grammar)¶
    Construct goto and action tables according to the LALR algorithm.

    closure(itemset)¶
        Expand the itemset by using epsilon moves.

    first¶
        The first set: a mapping from a grammar symbol to the set of all terminal symbols that can appear first when looking for that symbol.

    gen_canonical_set(iis)¶
        Create all LR1 states.

    generate_parser()¶
        Generate a parser from the grammar.

    generate_tables()¶
        Generate parsing tables.

    initial_item_set()¶
        Calculate the initial item set.

    next_item_set(itemset, symbol)¶
        Determine the next itemset for the current set and a symbol. This is the goto procedure.
class ppci.lang.tools.lr.Reduce(rule)¶
    Reduce according to the given rule.

class ppci.lang.tools.lr.Shift(to_state)¶
    Shift over the next token and go to the given state.

class ppci.lang.tools.lr.State(items, number)¶
    A state in the parsing machine. A state contains a set of items and a state number.

ppci.lang.tools.lr.calculate_first_sets(grammar)¶
    Calculate first sets for each grammar symbol. The result is a dictionary that maps each grammar symbol to the set of terminals that can be encountered first when looking for that symbol.
Parser generator script¶
class ppci.lang.tools.yacc.XaccGenerator¶
    Generator that writes the generated parser to file.

    generate_python_script()¶
        Generate a Python script with the parser table.

    print(*args)¶
        Print helper function that prints to the output file.

class ppci.lang.tools.yacc.XaccLexer¶

    tokenize(txt)¶
        Generator that generates lexical tokens from text. Optionally yields the EOF token.

class ppci.lang.tools.yacc.XaccParser¶
    Implements a recursive descent parser to parse grammar rules.

    We could have used a generated parser, but that would create a chicken-and-egg problem.

    parse_grammar(tokens)¶
        Entry parse function into the recursive descent parser.

    parse_rhs()¶
        Parse the right hand side of a rule definition.

    parse_rule()¶
        Parse a rule definition.

ppci.lang.tools.yacc.load_as_module(filename)¶
    Load a parser specification file, generate LR tables, and create a module.