Compiler design
===============

This chapter describes the design of the compiler.
The compiler consists of a front-end, a mid-end and a back-end.
The front-end deals with source file parsing and semantic checking.
The mid-end, which is optional, performs optimizations.
The back-end generates machine code.
The front-end produces intermediate code, a simple representation of the
source program, which is the form of input the back-end accepts.

The compiler is greatly influenced by the `LLVM`_ design.

.. _llvm: http://www.llvm.org

.. graphviz::

   digraph x {
       rankdir="LR"
       1 [label="c3 source file"]
       10 [label="c3 front end" ]
       11 [label="language X front end" ]
       20 [label="mid end" ]
       30 [label="back end for X86" ]
       31 [label="back end for ARM" ]
       40 [label="object file"]
       1 -> 10
       10 -> 20 [label="IR-code"]
       11 -> 20 [label="IR-code"]
       20 -> 30 [label="IR-code"]
       20 -> 31 [label="IR-code"]
       30 -> 40
   }

C3 Front-end
------------

.. automodule:: ppci.lang.c3

Brainfuck frontend
------------------

The compiler has a front-end for the brainfuck language.

.. autoclass:: ppci.lang.bf.BrainFuckGenerator

IR-code
-------

The intermediate representation (IR) of a program decouples the front-end
from the back-end of the compiler.

See :doc:`ir` for details about all the available instructions.

Optimization
------------

The IR-code generated by the front-end can be optimized in many ways.
Rather than having a single best way to optimize code, the compiler has a
bag of tricks it can apply.

.. autoclass:: ppci.opt.Mem2RegPromotor

.. autoclass:: ppci.opt.LoadAfterStorePass

.. autoclass:: ppci.opt.DeleteUnusedInstructionsPass

.. autoclass:: ppci.opt.RemoveAddZeroPass

.. autoclass:: ppci.opt.CommonSubexpressionEliminationPass

UML
~~~

.. uml:: ppci.opt

Back-end
--------

The back-end is more complicated and is made up of several steps:

#. Tree creation
#. Instruction selection
#. Register allocation
#. Peephole optimization

.. graphviz::

   digraph codegen {
       1 [label="IR-code"]
       10 [label="irdag"]
       20 [label="dagsplitter"]
       30 [label="instruction selector"]
       40 [label="register allocator"]
       49 [label="assembly parser"]
       50 [label="outstream"]
       60 [label="object file"]
       61 [label="text output"]
       1 -> 10
       10 -> 20 [label="Selection DAG"]
       20 -> 30 [label="Selection Trees"]
       30 -> 40 [label="frame"]
       40 -> 50 [label="frame"]
       49 -> 50
       50 -> 60
       50 -> 61
   }

Code generator
~~~~~~~~~~~~~~

.. automodule:: ppci.codegen.codegen

.. uml:: ppci.codegen.codegen

Canonicalize
~~~~~~~~~~~~

During this phase, the IR-code is simplified. In addition, unsupported
operations are rewritten into function calls; for example, soft floating
point is introduced here.

Tree building
~~~~~~~~~~~~~

A tree is generated from the IR-code, which is then used to select
instructions.

.. automodule:: ppci.codegen.irdag

Instruction selection
~~~~~~~~~~~~~~~~~~~~~

The instruction selection phase takes care of both scheduling and
instruction selection. The output of this phase is one frame per function,
containing a flat list of abstract machine instructions.

To select instructions, a tree rewrite system is used. This is also known
as a bottom-up rewrite generator (BURG). See pyburg.

Register allocation
~~~~~~~~~~~~~~~~~~~

.. automodule:: ppci.codegen.registerallocator

Code emission
~~~~~~~~~~~~~

Code is emitted using the output stream class. Both the assembler and the
compiler emit their instructions to this class. The stream can output to an
object file or to a logger.

.. autoclass:: ppci.binutils.outstream.OutputStream
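The idea of a single stream with interchangeable destinations can be
illustrated with a small sketch. This is *not* the ``OutputStream`` API
documented above: ``HypotheticalOutputStream``, ``TextOutputStream``,
``BinaryOutputStream`` and ``Instruction`` are hypothetical names chosen
for illustration, and the two byte encodings are ordinary x86 ``nop`` and
``ret`` opcodes used only as sample data.

.. code-block:: python

   import io
   from abc import ABC, abstractmethod
   from dataclasses import dataclass


   @dataclass
   class Instruction:
       """Hypothetical final instruction: assembly text plus its encoding."""
       text: str
       encoding: bytes


   class HypotheticalOutputStream(ABC):
       """Hypothetical sink for emitted instructions (not the ppci API)."""

       @abstractmethod
       def emit(self, instruction):
           """Accept one instruction that is ready to be emitted."""

       def emit_all(self, instructions):
           # Convenience helper: forward every instruction to emit().
           for instruction in instructions:
               self.emit(instruction)


   class TextOutputStream(HypotheticalOutputStream):
       """Writes a textual listing to any object with a write() method."""

       def __init__(self, writer):
           self.writer = writer

       def emit(self, instruction):
           self.writer.write(f"{instruction.text}\n")


   class BinaryOutputStream(HypotheticalOutputStream):
       """Collects encoded bytes, standing in for an object file section."""

       def __init__(self):
           self.data = bytearray()

       def emit(self, instruction):
           self.data += instruction.encoding


   # Usage: the same instruction list is sent to both kinds of stream.
   program = [Instruction("nop", b"\x90"), Instruction("ret", b"\xc3")]

   text = io.StringIO()
   TextOutputStream(text).emit_all(program)

   obj = BinaryOutputStream()
   obj.emit_all(program)

Here ``text.getvalue()`` is ``"nop\nret\n"`` and ``bytes(obj.data)`` is
``b"\x90\xc3"``. The producer of the instructions, whether the code
generator or the assembly parser, only calls ``emit`` and does not need to
know which destination it is writing to. The ``codegen`` diagram shows the
same arrangement: both feed the output stream, which leads to either an
object file or text output.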