Back-end

The back-end is more complicated. There are several steps to be taken here.

  1. Tree creation
  2. Instruction selection
  3. Register allocation
  4. Peep hole optimization
digraph codegen {
1 [label="IR-code"]

10 [label="irdag"]
20 [label="dagsplitter"]
30 [label="instruction selector"]
40 [label="register allocator"]

49 [label="assembly parser"]
50 [label="outstream"]
60 [label="object file"]
61 [label="text output"]
1 -> 10
10 -> 20 [label="Selection DAG"]
20 -> 30 [label="Selection Trees"]
30 -> 40 [label="frame"]
40 -> 50 [label="frame"]

49 -> 50
50 -> 60
50 -> 61
}

Code generator

Machine code generator.

The architecture is provided when the generator is created.

digraph "classes_foo" {
charset="utf-8"
rankdir=BT
"0" [label="{CodeGenerator|arch\ldebug_db : DebugDb\linstruction_scheduler : InstructionScheduler\linstruction_selector : InstructionSelector1\llogger : RootLogger, NoneType\lregister_allocator : GraphColoringRegisterAllocator\lsgraph_builder : SelectionGraphBuilder\lverifier : Verifier\l|emit_frame_to_stream()\lgenerate()\lgenerate_function()\lselect_and_schedule()\l}", shape="record"];
}

Canonicalize

During this phase, the IR-code is made simpler. Also unsupported operations are rewritten into function calls. For example soft floating point is introduced here.

Tree building

From IR-code a tree is generated which can be used to select instructions.

IR to DAG

The process of instruction selection is preceeded by the creation of a selection DAG (directed acyclic graph). The dagger take ir-code as input and produces such a dag for instruction selection.

A DAG represents the logic (computation) of a single basic block.

To do selection with tree matching, the DAG is then splitted into a series of tree patterns. This is often referred to as a forest of trees.

class ppci.codegen.irdag.SelectionGraphBuilder(arch)

Create a selectiongraph from a function for instruction selection

build(ir_function: ppci.ir.SubRoutine, function_info, debug_db)

Create a selection graph for the given function.

Selection graph is divided into groups for each basic block.

DAG splitting into trees.

The selection dag is splitted into trees by this module.

class ppci.codegen.dagsplit.DagSplitter(arch)

Class that splits a DAG into a forest of trees.

This series is sorted such that data dependencies are met. The trees can henceforth be used to match trees.

split_into_trees(sgraph, ir_function, function_info, debug_db)

Split a forest of trees into a sorted series of trees for each block.

Instruction selection

The instruction selection phase takes care of scheduling and instruction selection. The output of this phase is a one frame per function with a flat list of abstract machine instructions.

To select instruction, a tree rewrite system is used. This is also called bottom up rewrite generator (BURG). See pyburg.

Instruction selector.

This part of the compiler takes in a DAG (directed acyclic graph) of instructions and selects the proper target instructions.

Selecting instructions from a DAG is a NP-complete problem. The simplest strategy is to split the DAG into a forest of trees and match these trees.

Another solution may be: PBQP (Partitioned Boolean Quadratic Programming)

The selection process creates operations in a selection DAG. The possible operations are listed in the below table:

operation types description
ADD(c0,c1) I,U Add its operands
SUB(c0,c1) I,U Substracts c1 from c0
MUL(c0,c1) I,U Multiplies c0 by c1
DIV(c0,c1) I,U Divides c0 by c1
OR(c0,c1) I,U Bitwise or
AND(c0,c1) I,U Bitwise and
XOR(c0,c1) I,U Bitwise exclusive or
LDR(c0) I,U Load from memory
STR(c0,c1) I,U Store value c1 at memory address c0
FPREL U Frame pointer relative location
CONST I,U Constant value
REG I,U Value in a specific register
JMP I,U Jump to a label
CJMP I,U Conditional jump to a label

Memory move operations:

  • STRI64(REGI32[rax], CONSTI32[1])
  • MOVB()

Register allocation

Selected instructions use virtual registers and physical registers. Register allocation is the process of assigning a register to the virtual registers.

Some key concepts in the domain of register allocation are:

  • virtual register: A value which must be mapped to a physical register.
  • physical register: A real register
  • interference graph: A graph in which each node represents a register. An edge indicates that the two registers cannot have the same register assigned.
  • pre-colored register: A register that is already assigned a specific physical register.
  • coalescing: The process of merging two nodes in an interference graph which do not interfere and are connected by a move instruction.
  • spilling: Spilling is the process when no physical register can be found for a certain virtual register. Then this value will be placed in memory.
  • register class: Most CPU’s contain registers grouped into classes. For example, there may be registers for floating point, and registers for integer operations.
  • register alias: Some registers may alias to registers in another class. A good example are the x86 registers rax, eax, ax, al and ah.

Interference graph

Each instruction in the instruction list may use or define certain registers. A register is live between a use and define of a register. Registers that are live at the same point in the program interfere with each other. An interference graph is a graph in which each node represents a register and each edge represents interference between those two registers.

Graph coloring

In 1981 Chaitin presented the idea to use graph coloring for register allocation.

In a graph a node can be colored if it has less neighbours than possible colors. This is true because when each neighbour has a different color, there is still a valid color left for the node itself.

Given a graph, if a node can be colored, remove this node from the graph and put it on a stack. When added back to the graph, it can be given a color. Now repeat this process recursively with the remaining graph. When the graph is empty, place all nodes back from the stack one by one and assign each node a color when placed. Remember that when adding back, a color can be found, because this was the criteria during removal.

See: https://en.wikipedia.org/wiki/Chaitin%27s_algorithm

[Chaitin1982]

Coalescing

Coalescing is the process of merging two nodes in an interference graph. This means that two temporaries will be assigned to the same register. This is especially useful if the temporaries are used in move instructions, since when the source and the destination of a move instruction are the same register, the move can be deleted.

Coalescing a move instruction is easy when an interference graph is present. Two nodes that are used by a move instruction can be coalesced when they do not interfere.

However, if we coalesc all moves, the graph can become incolorable, and spilling has to be done. To prevent spilling, coalescing must be done conservatively.

A conservative approach is the following: if the merged node has fewer than K nodes of significant degree, then the nodes can be coalesced. The prove for this is that when all nodes that can be colored are removed and the merged node and its non-colorable neighbours remain, the merged node can be colored. This ensures that the coalescing of the node does not have a negative effect on the colorability of the graph.

[Briggs1994]

Spilling

Iterated register coalescing

Iterated register coalescing (IRC) is a combination of graph coloring, coalescing and spilling. The process consists of the following steps:

  • build an interference graph from the instruction list
  • remove trivial colorable nodes.
  • (optional) coalesc registers to remove redundant moves
  • (optional) spill registers
  • select registers

See: https://en.wikipedia.org/wiki/Register_allocation

[George1996]

Graph coloring with more register classes

Most instruction sets are not ideal, and hence simple graph coloring cannot be used. The algorithm can be modified to work with several register classes that possibly interfere.

[Runeson2003] [Smith2004]

Implementations

The following class can be used to perform register allocation.

class ppci.codegen.registerallocator.GraphColoringRegisterAllocator(arch: ppci.arch.arch.Architecture, instruction_selector)

Target independent register allocator.

Algorithm is iterated register coalescing by Appel and George. Also the pq-test algorithm for more register classes is added.

alloc_frame(frame: ppci.arch.stack.Frame)

Do iterated register allocation for a single frame.

This is the entry function for the register allocator and drives through all stages of the algorithm.

Parameters:frame – The frame to perform register allocation on.
coalesc()

Coalesc moves conservative.

This means, merge the variables of a move into one variable, and delete the move. But do this only when no spill will occur.

freeze()

Give up coalescing on some node, move it to the simplify list and freeze all moves associated with it.

is_colorable(node)

Helper function to determine whether a node is trivially colorable. This means: no matter the colors of the nodes neighbours, the node can be given a color.

In case of one register class, this is: n.degree < self.K

In case of more than one register class, somehow the worst case damage by all neighbours must be determined.

We do this now with the pq-test.

simplify()

Remove nodes from the graph

code emission

Code is emitted using the outputstream class. The assembler and compiler use this class to emit instructions to. The stream can output to object file or to a logger.

class ppci.binutils.outstream.OutputStream

Interface to generate code with.

Contains the emit function to output instruction to the stream.