Back-end

The back-end is more complicated. There are several steps to be taken here.

  1. Tree creation
  2. Instruction selection
  3. Register allocation
  4. Peep hole optimization
digraph codegen {
1 [label="IR-code"]

10 [label="irdag"]
20 [label="dagsplitter"]
30 [label="instruction selector"]
40 [label="register allocator"]

49 [label="assembly parser"]
50 [label="outstream"]
60 [label="object file"]
61 [label="text output"]
1 -> 10
10 -> 20 [label="Selection DAG"]
20 -> 30 [label="Selection Trees"]
30 -> 40 [label="frame"]
40 -> 50 [label="frame"]

49 -> 50
50 -> 60
50 -> 61
}

Code generator

Machine code generator.

The architecture is provided when the generator is created.

class ppci.codegen.codegen.CodeGenerator(arch, optimize_for='size')

Machine code generator

emit_frame_to_stream(frame, output_stream, debug=False)

Add code for the prologue and the epilogue. Add a label, the return instruction and the stack pointer adjustment for the frame. At this point we know how much stack space must be reserved for locals and what registers should be saved.

generate(ircode: ppci.ir.Module, output_stream, reporter, debug=False)

Generate machine code from ir-code into output stream

generate_function(ir_function, output_stream, reporter, debug=False)

Generate code for one function into a frame

select_and_schedule(ir_function, frame, reporter)

Perform instruction selection and scheduling

digraph "classes_foo" {
charset="utf-8"
rankdir=BT
"0" [label="{CodeGenerator|arch\ldebug_db : DebugDb\linstruction_scheduler : InstructionScheduler\linstruction_selector : InstructionSelector1\llogger : Logger, RootLogger, NoneType\lregister_allocator : GraphColoringRegisterAllocator\lsgraph_builder : SelectionGraphBuilder\lverifier : Verifier\l|emit_frame_to_stream()\lgenerate()\lgenerate_function()\lselect_and_schedule()\l}", shape="record"];
}

Canonicalize

During this phase, the IR-code is made simpler. Also unsupported operations are rewritten into function calls. For example soft floating point is introduced here.

Tree building

From IR-code a tree is generated which can be used to select instructions.

IR to DAG

The process of instruction selection is preceeded by the creation of a selection DAG (directed acyclic graph). The dagger take ir-code as input and produces such a dag for instruction selection.

A DAG represents the logic (computation) of a single basic block.

To do selection with tree matching, the DAG is then splitted into a series of tree patterns. This is often referred to as a forest of trees.

class ppci.codegen.irdag.FunctionInfo(frame)

Keeps track of global function data when generating code for part of a functions.

class ppci.codegen.irdag.Operation(op, ty)

A single operation with a type

class ppci.codegen.irdag.SelectionGraphBuilder(arch)

Create a selectiongraph from a function for instruction selection

block_to_sgraph(ir_block: ppci.ir.Block, function_info)

Create dag (directed acyclic graph) from a basic block.

The resulting dag can be used for instruction selection.

build(ir_function: ppci.ir.SubRoutine, function_info, debug_db)

Create a selection graph for the given function.

Selection graph is divided into groups for each basic block.

copy_phis_of_successors(ir_block)

When a terminator instruction is encountered, handle the copy of phi values into the expected virtual register

do_address_of(node)

Process ir.AddressOf instruction

do_alloc(node)

Process the alloc instruction

do_binop(node)

Visit a binary operator and create a DAG node

do_c_jump(node)

Process conditional jump into dag

do_cast(node)

Create a cast of type

do_const(node)

Process constant instruction

do_function_call(node)

Transform a function call

do_literal_data(node)

Literal data is stored after a label

do_load(node)

Create dag node for load operation

do_phi(node)

Refer to the correct copy of the phi node

do_procedure_call(node)

Transform a procedure call

do_return(node)

Move result into result register and jump to epilog

do_store(node)

Create a DAG node for the store operation

do_unop(node)

Visit an unary operator and create a DAG node

get_address(ir_address)

Determine address for load or store.

new_node(name, ty, *args, value=None)

Create a new selection graph node, and add it to the graph

new_vreg(ty)

Generate a new temporary fitting for the given type

ppci.codegen.irdag.depth_first_order(function)

Return blocks in depth first search order

ppci.codegen.irdag.make_map(cls)

Add an attribute to the class that is a map of ir types to handler functions. For example if a function is called do_phi it will be registered into f_map under key ir.Phi.

ppci.codegen.irdag.prepare_function_info(arch, function_info, ir_function)

Fill function info with labels for all basic blocks

DAG splitting into trees.

The selection dag is splitted into trees by this module.

class ppci.codegen.dagsplit.DagSplitter(arch)

Class that splits a DAG into a forest of trees.

This series is sorted such that data dependencies are met. The trees can henceforth be used to match trees.

assign_vregs(sgraph, function_info)

Give vreg values to values that cross block boundaries

check_vreg(node, frame)

Determine whether node outputs need a virtual register

get_reg_class(data_flow)

Determine the register class suited for this data flow line

make_op(op, typ)

Construct a string opcode from an operation and a type

make_trees(nodes, tail_node)

Create a tree from a list of sorted nodes.

split_into_trees(sgraph, ir_function, function_info, debug_db)

Split a forest of trees into a sorted series of trees for each block.

ppci.codegen.dagsplit.topological_sort_modified(nodes, start)

Modified topological sort, start at the end and work back

Instruction selection

The instruction selection phase takes care of scheduling and instruction selection. The output of this phase is a one frame per function with a flat list of abstract machine instructions.

To select instruction, a tree rewrite system is used. This is also called bottom up rewrite generator (BURG). See pyburg.

Instruction selector.

This part of the compiler takes in a DAG (directed acyclic graph) of instructions and selects the proper target instructions.

Selecting instructions from a DAG is a NP-complete problem. The simplest strategy is to split the DAG into a forest of trees and match these trees.

Another solution may be: PBQP (Partitioned Boolean Quadratic Programming)

The selection process creates operations in a selection DAG. The possible operations are listed in the below table:

operation types description
ADD(c0,c1) I,U Add its operands
SUB(c0,c1) I,U Substracts c1 from c0
MUL(c0,c1) I,U Multiplies c0 by c1
DIV(c0,c1) I,U Divides c0 by c1
OR(c0,c1) I,U Bitwise or
AND(c0,c1) I,U Bitwise and
XOR(c0,c1) I,U Bitwise exclusive or
LDR(c0) I,U Load from memory
STR(c0,c1) I,U Store value c1 at memory address c0
FPREL U Frame pointer relative location
CONST I,U Constant value
REG I,U Value in a specific register
JMP I,U Jump to a label
CJMP I,U Conditional jump to a label

Memory move operations:

  • STRI64(REGI32[rax], CONSTI32[1])
  • MOVB()
class ppci.codegen.instructionselector.InstructionContext(frame, arch)

Usable to patterns when emitting code

emit(instruction)

Abstract instruction emitter proxy

move(dst, src)

Generate move

new_label()

Generate a new unique label

new_reg(cls)

Generate a new temporary of a given class

class ppci.codegen.instructionselector.InstructionSelector1(arch, sgraph_builder, weights=(1, 1, 1))

Instruction selector which takes in a DAG and puts instructions into a frame.

This one does selection and scheduling combined.

gen_tree(context, tree)

Generate code from a tree

memcp()

Invoke memcpy arch function

munch_trees(context, trees)

Consume a dag and match it using the matcher to the frame. DAG matching is NP-complete.

The simplest strategy is to split the dag into a forest of trees. Then, the DAG is reduced to only trees, which can be matched.

A different approach is use 0-1 programming, like the NOLTIS algo.

TODO: implement different strategies.

select(ir_function: ppci.ir.SubRoutine, frame, reporter)

Select instructions of function into a frame

class ppci.codegen.instructionselector.TreeSelector(sys)

Tree matcher that can match a tree and generate instructions

apply_rules(context, tree, goal)

Apply all selected instructions to the tree

burm_label(tree)

Label all nodes in the tree bottom up

gen(context, tree)

Generate code for a given tree. The tree will be tiled with patterns and the corresponding code will be emitted

kids(tree, rule)

Determine the kid trees for a rule

nts(rule)

Get the open ends of this rules pattern

Register allocation

Selected instructions use virtual registers and physical registers. Register allocation is the process of assigning a register to the virtual registers.

Some key concepts in the domain of register allocation are:

  • virtual register: A value which must be mapped to a physical register.
  • physical register: A real register
  • interference graph: A graph in which each node represents a register. An edge indicates that the two registers cannot have the same register assigned.
  • pre-colored register: A register that is already assigned a specific physical register.
  • coalescing: The process of merging two nodes in an interference graph which do not interfere and are connected by a move instruction.
  • spilling: Spilling is the process when no physical register can be found for a certain virtual register. Then this value will be placed in memory.
  • register class: Most CPU’s contain registers grouped into classes. For example, there may be registers for floating point, and registers for integer operations.
  • register alias: Some registers may alias to registers in another class. A good example are the x86 registers rax, eax, ax, al and ah.

Interference graph

Each instruction in the instruction list may use or define certain registers. A register is live between a use and define of a register. Registers that are live at the same point in the program interfere with each other. An interference graph is a graph in which each node represents a register and each edge represents interference between those two registers.

Graph coloring

In 1981 Chaitin presented the idea to use graph coloring for register allocation.

In a graph a node can be colored if it has less neighbours than possible colors. This is true because when each neighbour has a different color, there is still a valid color left for the node itself.

Given a graph, if a node can be colored, remove this node from the graph and put it on a stack. When added back to the graph, it can be given a color. Now repeat this process recursively with the remaining graph. When the graph is empty, place all nodes back from the stack one by one and assign each node a color when placed. Remember that when adding back, a color can be found, because this was the criteria during removal.

See: https://en.wikipedia.org/wiki/Chaitin%27s_algorithm

[Chaitin1982]

Coalescing

Coalescing is the process of merging two nodes in an interference graph. This means that two temporaries will be assigned to the same register. This is especially useful if the temporaries are used in move instructions, since when the source and the destination of a move instruction are the same register, the move can be deleted.

Coalescing a move instruction is easy when an interference graph is present. Two nodes that are used by a move instruction can be coalesced when they do not interfere.

However, if we coalesc all moves, the graph can become incolorable, and spilling has to be done. To prevent spilling, coalescing must be done conservatively.

A conservative approach is the following: if the merged node has fewer than K nodes of significant degree, then the nodes can be coalesced. The prove for this is that when all nodes that can be colored are removed and the merged node and its non-colorable neighbours remain, the merged node can be colored. This ensures that the coalescing of the node does not have a negative effect on the colorability of the graph.

[Briggs1994]

Spilling

Iterated register coalescing

Iterated register coalescing (IRC) is a combination of graph coloring, coalescing and spilling. The process consists of the following steps:

  • build an interference graph from the instruction list
  • remove trivial colorable nodes.
  • (optional) coalesc registers to remove redundant moves
  • (optional) spill registers
  • select registers

See: https://en.wikipedia.org/wiki/Register_allocation

[George1996]

Graph coloring with more register classes

Most instruction sets are not ideal, and hence simple graph coloring cannot be used. The algorithm can be modified to work with several register classes that possibly interfere.

[Runeson2003] [Smith2004]

Implementations

The following class can be used to perform register allocation.

class ppci.codegen.registerallocator.GraphColoringRegisterAllocator(arch: ppci.arch.arch.Architecture, instruction_selector)

Target independent register allocator.

Algorithm is iterated register coalescing by Appel and George. Also the pq-test algorithm for more register classes is added.

alloc_frame(frame: ppci.arch.stack.Frame)

Do iterated register allocation for a single frame.

This is the entry function for the register allocator and drives through all stages of the algorithm.

Parameters:frame – The frame to perform register allocation on.
coalesc()

Coalesc moves conservative.

This means, merge the variables of a move into one variable, and delete the move. But do this only when no spill will occur.

freeze()

Give up coalescing on some node, move it to the simplify list and freeze all moves associated with it.

is_colorable(node)

Helper function to determine whether a node is trivially colorable. This means: no matter the colors of the nodes neighbours, the node can be given a color.

In case of one register class, this is: n.degree < self.K

In case of more than one register class, somehow the worst case damage by all neighbours must be determined.

We do this now with the pq-test.

simplify()

Remove nodes from the graph

class ppci.codegen.registerallocator.GraphColoringRegisterAllocator(arch: ppci.arch.arch.Architecture, instruction_selector)

Target independent register allocator.

Algorithm is iterated register coalescing by Appel and George. Also the pq-test algorithm for more register classes is added.

alloc_frame(frame: ppci.arch.stack.Frame)

Do iterated register allocation for a single frame.

This is the entry function for the register allocator and drives through all stages of the algorithm.

Parameters:frame – The frame to perform register allocation on.
apply_colors()

Assign colors to registers

assign_colors()

Add nodes back to the graph to color it.

check_invariants()

Test invariants

coalesc()

Coalesc moves conservative.

This means, merge the variables of a move into one variable, and delete the move. But do this only when no spill will occur.

combine(u, v)

Combine u and v into one node, updating work lists

common_reg_class

Determine the smallest common register class of two nodes

conservative(u, v)

Briggs conservative criteria for coalescing.

If the result of the merge has fewer than K nodes that are not trivially colorable, then coalescing is safe, because when coloring, all other nodes that can be colored will be popped from the graph, leaving the merged node that then can be colored.

In the case of multiple register classes, first determine the new neighbour nodes. Then assume that all nodes that can be colored will be colored, and are taken out of the graph. Then calculate how many registers can be blocked by the remaining nodes. If this is less than the number of available registers, the coalesc is safe!

decrement_degree(m)

If a node was lowered in degree, check if there are nodes that can be moved from the spill list to the freeze of simplify list

freeze()

Give up coalescing on some node, move it to the simplify list and freeze all moves associated with it.

has_edge(t, r)

Helper function to check for an interfering edge

init_data(frame)

Initialize data structures

is_colorable(node)

Helper function to determine whether a node is trivially colorable. This means: no matter the colors of the nodes neighbours, the node can be given a color.

In case of one register class, this is: n.degree < self.K

In case of more than one register class, somehow the worst case damage by all neighbours must be determined.

We do this now with the pq-test.

Check if a node is used by move instructions

ok(t, r)

Implement coalescing testing with pre-colored register

q

The number of class B registers that can be blocked by class C.

remove_redundant_moves()

Remove coalesced moves

rewrite_program(node)

Rewrite program by creating a load and a store for each use

simplify()

Remove nodes from the graph

spill()

Do spilling

class ppci.codegen.registerallocator.MiniGen(arch, selector)

Spill code generator

gen(frame, tree)

Generate code for a given tree

gen_load(frame, vreg, slot)

Generate instructions to load vreg from a stack slot

gen_store(frame, vreg, slot)

Generate instructions to store vreg at a stack slot

code emission

Code is emitted using the outputstream class. The assembler and compiler use this class to emit instructions to. The stream can output to object file or to a logger.

class ppci.binutils.outstream.OutputStream

Interface to generate code with.

Contains the emit function to output instruction to the stream.

do_emit(item)

Actual emit implementation

emit(item)

Encode instruction and add symbol and relocation information

emit_all(items)

Emit all items from an iterable

select_section(name)

Switch output to certain section