Back-end¶
The back-end is more complicated. There are several steps to be taken here.
- Tree creation
- Instruction selection
- Register allocation
- Peep hole optimization
![digraph codegen {
1 [label="IR-code"]
10 [label="irdag"]
20 [label="dagsplitter"]
30 [label="instruction selector"]
40 [label="register allocator"]
49 [label="assembly parser"]
50 [label="outstream"]
60 [label="object file"]
61 [label="text output"]
1 -> 10
10 -> 20 [label="Selection DAG"]
20 -> 30 [label="Selection Trees"]
30 -> 40 [label="frame"]
40 -> 50 [label="frame"]
49 -> 50
50 -> 60
50 -> 61
}](../_images/graphviz-00fabbefa2df4d9e17aa0b5c347e36ab5a93691c.png)
Code generator¶
Machine code generator.
The architecture is provided when the generator is created.
-
class
ppci.codegen.codegen.
CodeGenerator
(arch, optimize_for='size')¶ Machine code generator
-
emit_frame_to_stream
(frame, output_stream, debug=False)¶ Add code for the prologue and the epilogue. Add a label, the return instruction and the stack pointer adjustment for the frame. At this point we know how much stack space must be reserved for locals and what registers should be saved.
-
generate
(ircode: ppci.ir.Module, output_stream, reporter, debug=False)¶ Generate machine code from ir-code into output stream
-
generate_function
(ir_function, output_stream, reporter, debug=False)¶ Generate code for one function into a frame
-
select_and_schedule
(ir_function, frame, reporter)¶ Perform instruction selection and scheduling
-
![digraph "classes_foo" {
charset="utf-8"
rankdir=BT
"0" [label="{CodeGenerator|arch\ldebug_db : DebugDb\linstruction_scheduler : InstructionScheduler\linstruction_selector : InstructionSelector1\llogger : Logger, RootLogger, NoneType\lregister_allocator : GraphColoringRegisterAllocator\lsgraph_builder : SelectionGraphBuilder\lverifier : Verifier\l|emit_frame_to_stream()\lgenerate()\lgenerate_function()\lselect_and_schedule()\l}", shape="record"];
}](../_images/graphviz-231d20b8a3aab591336ef5a505d64df44138cfcc.png)
Canonicalize¶
During this phase, the IR-code is made simpler. Also unsupported operations are rewritten into function calls. For example soft floating point is introduced here.
Tree building¶
From IR-code a tree is generated which can be used to select instructions.
IR to DAG
The process of instruction selection is preceeded by the creation of a selection DAG (directed acyclic graph). The dagger take ir-code as input and produces such a dag for instruction selection.
A DAG represents the logic (computation) of a single basic block.
To do selection with tree matching, the DAG is then splitted into a series of tree patterns. This is often referred to as a forest of trees.
-
class
ppci.codegen.irdag.
FunctionInfo
(frame)¶ Keeps track of global function data when generating code for part of a functions.
-
class
ppci.codegen.irdag.
Operation
(op, ty)¶ A single operation with a type
-
class
ppci.codegen.irdag.
SelectionGraphBuilder
(arch)¶ Create a selectiongraph from a function for instruction selection
-
block_to_sgraph
(ir_block: ppci.ir.Block, function_info)¶ Create dag (directed acyclic graph) from a basic block.
The resulting dag can be used for instruction selection.
-
build
(ir_function: ppci.ir.SubRoutine, function_info, debug_db)¶ Create a selection graph for the given function.
Selection graph is divided into groups for each basic block.
-
copy_phis_of_successors
(ir_block)¶ When a terminator instruction is encountered, handle the copy of phi values into the expected virtual register
-
do_address_of
(node)¶ Process ir.AddressOf instruction
-
do_alloc
(node)¶ Process the alloc instruction
-
do_binop
(node)¶ Visit a binary operator and create a DAG node
-
do_c_jump
(node)¶ Process conditional jump into dag
-
do_cast
(node)¶ Create a cast of type
-
do_const
(node)¶ Process constant instruction
-
do_function_call
(node)¶ Transform a function call
-
do_literal_data
(node)¶ Literal data is stored after a label
-
do_load
(node)¶ Create dag node for load operation
-
do_phi
(node)¶ Refer to the correct copy of the phi node
-
do_procedure_call
(node)¶ Transform a procedure call
-
do_return
(node)¶ Move result into result register and jump to epilog
-
do_store
(node)¶ Create a DAG node for the store operation
-
do_unop
(node)¶ Visit an unary operator and create a DAG node
-
get_address
(ir_address)¶ Determine address for load or store.
-
new_node
(name, ty, *args, value=None)¶ Create a new selection graph node, and add it to the graph
-
new_vreg
(ty)¶ Generate a new temporary fitting for the given type
-
-
ppci.codegen.irdag.
depth_first_order
(function)¶ Return blocks in depth first search order
-
ppci.codegen.irdag.
make_map
(cls)¶ Add an attribute to the class that is a map of ir types to handler functions. For example if a function is called do_phi it will be registered into f_map under key ir.Phi.
-
ppci.codegen.irdag.
prepare_function_info
(arch, function_info, ir_function)¶ Fill function info with labels for all basic blocks
DAG splitting into trees.
The selection dag is splitted into trees by this module.
-
class
ppci.codegen.dagsplit.
DagSplitter
(arch)¶ Class that splits a DAG into a forest of trees.
This series is sorted such that data dependencies are met. The trees can henceforth be used to match trees.
-
assign_vregs
(sgraph, function_info)¶ Give vreg values to values that cross block boundaries
-
check_vreg
(node, frame)¶ Determine whether node outputs need a virtual register
-
get_reg_class
(data_flow)¶ Determine the register class suited for this data flow line
-
make_op
(op, typ)¶ Construct a string opcode from an operation and a type
-
make_trees
(nodes, tail_node)¶ Create a tree from a list of sorted nodes.
-
split_into_trees
(sgraph, ir_function, function_info, debug_db)¶ Split a forest of trees into a sorted series of trees for each block.
-
-
ppci.codegen.dagsplit.
topological_sort_modified
(nodes, start)¶ Modified topological sort, start at the end and work back
Instruction selection¶
The instruction selection phase takes care of scheduling and instruction selection. The output of this phase is a one frame per function with a flat list of abstract machine instructions.
To select instruction, a tree rewrite system is used. This is also called bottom up rewrite generator (BURG). See pyburg.
Instruction selector.
This part of the compiler takes in a DAG (directed acyclic graph) of instructions and selects the proper target instructions.
Selecting instructions from a DAG is a NP-complete problem. The simplest strategy is to split the DAG into a forest of trees and match these trees.
Another solution may be: PBQP (Partitioned Boolean Quadratic Programming)
The selection process creates operations in a selection DAG. The possible operations are listed in the below table:
operation | types | description |
---|---|---|
ADD(c0,c1) | I,U | Add its operands |
SUB(c0,c1) | I,U | Substracts c1 from c0 |
MUL(c0,c1) | I,U | Multiplies c0 by c1 |
DIV(c0,c1) | I,U | Divides c0 by c1 |
OR(c0,c1) | I,U | Bitwise or |
AND(c0,c1) | I,U | Bitwise and |
XOR(c0,c1) | I,U | Bitwise exclusive or |
LDR(c0) | I,U | Load from memory |
STR(c0,c1) | I,U | Store value c1 at memory address c0 |
FPREL | U | Frame pointer relative location |
CONST | I,U | Constant value |
REG | I,U | Value in a specific register |
JMP | I,U | Jump to a label |
CJMP | I,U | Conditional jump to a label |
…
Memory move operations:
- STRI64(REGI32[rax], CONSTI32[1])
- MOVB()
-
class
ppci.codegen.instructionselector.
InstructionContext
(frame, arch)¶ Usable to patterns when emitting code
-
emit
(instruction)¶ Abstract instruction emitter proxy
-
move
(dst, src)¶ Generate move
-
new_label
()¶ Generate a new unique label
-
new_reg
(cls)¶ Generate a new temporary of a given class
-
-
class
ppci.codegen.instructionselector.
InstructionSelector1
(arch, sgraph_builder, weights=(1, 1, 1))¶ Instruction selector which takes in a DAG and puts instructions into a frame.
This one does selection and scheduling combined.
-
gen_tree
(context, tree)¶ Generate code from a tree
-
memcp
()¶ Invoke memcpy arch function
-
munch_trees
(context, trees)¶ Consume a dag and match it using the matcher to the frame. DAG matching is NP-complete.
The simplest strategy is to split the dag into a forest of trees. Then, the DAG is reduced to only trees, which can be matched.
A different approach is use 0-1 programming, like the NOLTIS algo.
TODO: implement different strategies.
-
select
(ir_function: ppci.ir.SubRoutine, frame, reporter)¶ Select instructions of function into a frame
-
-
class
ppci.codegen.instructionselector.
TreeSelector
(sys)¶ Tree matcher that can match a tree and generate instructions
-
apply_rules
(context, tree, goal)¶ Apply all selected instructions to the tree
-
burm_label
(tree)¶ Label all nodes in the tree bottom up
-
gen
(context, tree)¶ Generate code for a given tree. The tree will be tiled with patterns and the corresponding code will be emitted
-
kids
(tree, rule)¶ Determine the kid trees for a rule
-
nts
(rule)¶ Get the open ends of this rules pattern
-
Register allocation¶
Selected instructions use virtual registers and physical registers. Register allocation is the process of assigning a register to the virtual registers.
Some key concepts in the domain of register allocation are:
- virtual register: A value which must be mapped to a physical register.
- physical register: A real register
- interference graph: A graph in which each node represents a register. An edge indicates that the two registers cannot have the same register assigned.
- pre-colored register: A register that is already assigned a specific physical register.
- coalescing: The process of merging two nodes in an interference graph which do not interfere and are connected by a move instruction.
- spilling: Spilling is the process when no physical register can be found for a certain virtual register. Then this value will be placed in memory.
- register class: Most CPU’s contain registers grouped into classes. For example, there may be registers for floating point, and registers for integer operations.
- register alias: Some registers may alias to registers in another class. A good example are the x86 registers rax, eax, ax, al and ah.
Interference graph
Each instruction in the instruction list may use or define certain registers. A register is live between a use and define of a register. Registers that are live at the same point in the program interfere with each other. An interference graph is a graph in which each node represents a register and each edge represents interference between those two registers.
Graph coloring
In 1981 Chaitin presented the idea to use graph coloring for register allocation.
In a graph a node can be colored if it has less neighbours than possible colors. This is true because when each neighbour has a different color, there is still a valid color left for the node itself.
Given a graph, if a node can be colored, remove this node from the graph and put it on a stack. When added back to the graph, it can be given a color. Now repeat this process recursively with the remaining graph. When the graph is empty, place all nodes back from the stack one by one and assign each node a color when placed. Remember that when adding back, a color can be found, because this was the criteria during removal.
See: https://en.wikipedia.org/wiki/Chaitin%27s_algorithm
Coalescing
Coalescing is the process of merging two nodes in an interference graph. This means that two temporaries will be assigned to the same register. This is especially useful if the temporaries are used in move instructions, since when the source and the destination of a move instruction are the same register, the move can be deleted.
Coalescing a move instruction is easy when an interference graph is present. Two nodes that are used by a move instruction can be coalesced when they do not interfere.
However, if we coalesc all moves, the graph can become incolorable, and spilling has to be done. To prevent spilling, coalescing must be done conservatively.
A conservative approach is the following: if the merged node has fewer than K nodes of significant degree, then the nodes can be coalesced. The prove for this is that when all nodes that can be colored are removed and the merged node and its non-colorable neighbours remain, the merged node can be colored. This ensures that the coalescing of the node does not have a negative effect on the colorability of the graph.
Spilling
Iterated register coalescing
Iterated register coalescing (IRC) is a combination of graph coloring, coalescing and spilling. The process consists of the following steps:
- build an interference graph from the instruction list
- remove trivial colorable nodes.
- (optional) coalesc registers to remove redundant moves
- (optional) spill registers
- select registers
See: https://en.wikipedia.org/wiki/Register_allocation
Graph coloring with more register classes
Most instruction sets are not ideal, and hence simple graph coloring cannot be used. The algorithm can be modified to work with several register classes that possibly interfere.
Implementations
The following class can be used to perform register allocation.
-
class
ppci.codegen.registerallocator.
GraphColoringRegisterAllocator
(arch: ppci.arch.arch.Architecture, instruction_selector)¶ Target independent register allocator.
Algorithm is iterated register coalescing by Appel and George. Also the pq-test algorithm for more register classes is added.
-
alloc_frame
(frame: ppci.arch.stack.Frame)¶ Do iterated register allocation for a single frame.
This is the entry function for the register allocator and drives through all stages of the algorithm.
Parameters: frame – The frame to perform register allocation on.
-
coalesc
()¶ Coalesc moves conservative.
This means, merge the variables of a move into one variable, and delete the move. But do this only when no spill will occur.
-
freeze
()¶ Give up coalescing on some node, move it to the simplify list and freeze all moves associated with it.
-
is_colorable
(node)¶ Helper function to determine whether a node is trivially colorable. This means: no matter the colors of the nodes neighbours, the node can be given a color.
In case of one register class, this is: n.degree < self.K
In case of more than one register class, somehow the worst case damage by all neighbours must be determined.
We do this now with the pq-test.
-
simplify
()¶ Remove nodes from the graph
-
-
class
ppci.codegen.registerallocator.
GraphColoringRegisterAllocator
(arch: ppci.arch.arch.Architecture, instruction_selector) Target independent register allocator.
Algorithm is iterated register coalescing by Appel and George. Also the pq-test algorithm for more register classes is added.
-
alloc_frame
(frame: ppci.arch.stack.Frame) Do iterated register allocation for a single frame.
This is the entry function for the register allocator and drives through all stages of the algorithm.
Parameters: frame – The frame to perform register allocation on.
-
apply_colors
()¶ Assign colors to registers
-
assign_colors
()¶ Add nodes back to the graph to color it.
-
check_invariants
()¶ Test invariants
-
coalesc
() Coalesc moves conservative.
This means, merge the variables of a move into one variable, and delete the move. But do this only when no spill will occur.
-
combine
(u, v)¶ Combine u and v into one node, updating work lists
-
common_reg_class
¶ Determine the smallest common register class of two nodes
-
conservative
(u, v)¶ Briggs conservative criteria for coalescing.
If the result of the merge has fewer than K nodes that are not trivially colorable, then coalescing is safe, because when coloring, all other nodes that can be colored will be popped from the graph, leaving the merged node that then can be colored.
In the case of multiple register classes, first determine the new neighbour nodes. Then assume that all nodes that can be colored will be colored, and are taken out of the graph. Then calculate how many registers can be blocked by the remaining nodes. If this is less than the number of available registers, the coalesc is safe!
-
decrement_degree
(m)¶ If a node was lowered in degree, check if there are nodes that can be moved from the spill list to the freeze of simplify list
-
freeze
() Give up coalescing on some node, move it to the simplify list and freeze all moves associated with it.
-
has_edge
(t, r)¶ Helper function to check for an interfering edge
-
init_data
(frame)¶ Initialize data structures
-
is_colorable
(node) Helper function to determine whether a node is trivially colorable. This means: no matter the colors of the nodes neighbours, the node can be given a color.
In case of one register class, this is: n.degree < self.K
In case of more than one register class, somehow the worst case damage by all neighbours must be determined.
We do this now with the pq-test.
Check if a node is used by move instructions
-
ok
(t, r)¶ Implement coalescing testing with pre-colored register
-
q
¶ The number of class B registers that can be blocked by class C.
-
remove_redundant_moves
()¶ Remove coalesced moves
-
rewrite_program
(node)¶ Rewrite program by creating a load and a store for each use
-
simplify
() Remove nodes from the graph
-
spill
()¶ Do spilling
-
code emission¶
Code is emitted using the outputstream class. The assembler and compiler use this class to emit instructions to. The stream can output to object file or to a logger.
-
class
ppci.binutils.outstream.
OutputStream
¶ Interface to generate code with.
Contains the emit function to output instruction to the stream.
-
do_emit
(item)¶ Actual emit implementation
-
emit
(item)¶ Encode instruction and add symbol and relocation information
-
emit_all
(items)¶ Emit all items from an iterable
-
select_section
(name)¶ Switch output to certain section
-