====== Code Optimization and Transformation ====== In modern computer science and engineering, code transformation techniques are critical to achieve the combined goals of combining programmer productivity and program execution efficiency in terms of time and energy. Yet, it is a skill mastered by few – there are less than 1.5 compiler construction expert for every //1000// software engineers, but almost 2 jobs in compilers for every 100 in software engineering! The course is designed to provide an overview of code transformation, analysis and optimization techniques needed to effectively produce optimized code. To //compiler and EDA tool engineers//, the course provides the basic tools to design and implement compilers and other code transformation and analysis tools, as well as an introduction to the most popular modern compiler framework, LLVM. To //software engineers//, the course provides a grounding in system software design and development, as well as insights on the benefits and limitations of automation in software engineering. Moreover, as a compiler is a paramount of complex software systems, it provides a hands-on introduction to the design and implementation process for such systems. Finally, many advanced software engineering techniques such as program slicing are implemented on top of algorithms used in compiler construction. To //computer architects and embedded software engineers//, the course provides crucial insights on the power and limits of compiler optimization, as well as to the need any processor architecture has of appropriate compilers. To all //computer science students//, the course provides a “behind the scenes” view of the operation of software, and its automated manipulation – understanding compilers means being able to write better, more efficient code. ==== Learning Goals ==== * Understand the internal structure of a real-world compiler * Understand the effectiveness and limitations of code analysis and optimization techniques * Be able to construct a full compiler for a toy language, generating assembly code for a RISC architecture {{teaching:cot_slides_linker.pdf|Slide seminario su Linker}} ==== Schedule ==== ^ Room ^ Date ^ Topic ^ | D26 | 08/03/2016 | Introduction to compiler construction | | D01 | 10/03/2016 | Intermediate Representations | | D26 | 15/03/2016 | Code Generation, Data Layout in Memory | | D01 | 17/03/2016 | Lab: Front-end ({{teaching:sample.tgz|here is the lab material}}) | | D26 | 22/03/2016 | Dataflow Analysis | | D01 | 24/03/2016 | Lab: Intermediate Representation | | D26 | 29/03/2016 | Register Allocation | | D01 | 31/03/2016 | Lab: Intermediate Representation | | D26 | 05/04/2016 | Instruction Scheduling | | D01 | 07/04/2016 | Lab: Lowering | | D26 | 12/04/2016 | {{teaching:cot_slides_llvm_1.pdf|Introduction to LLVM, part 1}} | | D26 | 19/04/2016 | {{teaching:cot_slides_llvm_2.pdf|Introduction to LLVM, part 2}} | | Sala Seminari DEIB | 21/04/2016 | [[http://www.deib.polimi.it/ita/eventi/dettagli/1272|Italian LLVM Meeting]] | | D26 | 03/05/2016 | The Linker | | D01 | 05/05/2016 | Lab: Lowering | | D26 | 10/05/2016 | {{teaching:cot_slides_llvm_3.pdf|Introduction to LLVM, part 3}} | | D01 | 12/05/2016 | Lab: Lowering | | D26 | 17/05/2016 | Lab: Lowering | | D01 | 19/05/2016 | Lab: Data Layout | | D26 | 24/05/2016 | {{teaching:cot_slides_security.pdf|Compiler Techniques for Security}} | | D01 | 26/05/2016 | Lab: Data Layout | | D26 | 31/05/2016 | Lab: Register Allocation | | D26 | 07/06/2016 | Lab: Register Allocation | | D01 | 09/06/2016 | Lab: Code Generation | | Aula Alario (building 21 - 4th floor | 14/06/2016 | [[http://www.deib.polimi.it/ita/eventi/dettagli/1324|Seminar]] | | D01 | 16/06/2016 | Lab: Wrapping it up | | D26 | 21/06/2016 | Lab: Wrapping it up | ==== Syllabus ==== === Introduction to Compiler Construction === * Why compiling? Compilers vs interpreters * When to compile? JIT, AOT and static compilers * What to compile? Compilation units * Where to compile? Cross-compilation and split compilation * Overview of a compiler framework * Lexical analysis & parsing (review) * Statement and Data Structure Lowering * Optimization: machine independent and machine-dependent * Code Generation * //Reading//: Compiler Construction === Intermediate Representations === * The Abstract Syntax Tree * Basic Blocks and branches * The Control Flow Graph * The Static Single Assignment Form * //Reading//: Program Dependence Graph * //Reading//: The SUIF Compiler Framework === Semantic Analysis & Type Checking === * Symbol Tables * Type Checking * Runtime Organization * Data Memory layout * Activation Records * Dynamic Memory allocation * //Reading//: Garbage Collection === Code Generation === * Code generation techniques: CISC and RISC processors * Low-level optimization techniques * //Reading//: Low-level Optimization === Dataflow Optimization === * Principles and Fixed Point Computation * Applications * Reaching Definitions * Liveness Analysis * Constant Propagation * //Reading//: Dataflow Optimizations === Register Allocation === * When to allocate registers * Graph Coloring * Linear scanning * //Reading//: Linear Scan Register Allocation === Parallelization and other optimization techniques === * Instruction Scheduling * Loop Optimization (Software Pipelining, Loop Unrolling) * Limits to optimization: the aliasing problem * //Reading//: Program Optimization * //Reading//: Alias Analysis * //Reading//: Cache Optimization === Advanced Topics === * Advanced Optimization Techniques: Polyhedral Transformations * The LLVM Compiler Framework ==== Evaluation ==== Evaluation is performed through a combination of oral exam and project work. The oral exam consists of a discussion of the topics covered in the course. The project work can be taken in groups of one to three students, and in the following two modes: An independent project activity, consisting in the implementation of an optimization or analysis pass in the LLVM compiler framework (suggested only for students with previous experience in C/C++); {{teaching:cot-projects-2018.pdf|Projects for 2018}} A supervised project activity (21-28 hours), consisting in the implementation of a compiler for a toy language targeting the ARM assembly language. The sessions will be organized as follows: * Recursive descent language parsing * Transforming the Abstract Syntax Tree into a Control Flow Graph * Design and implementation of the symbol table * Function call translation * Liveness analysis * Register allocation (linear scan) * ARM code generation (table-based) ==== Readings ==== === Compiler Construction === LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation, by C. Lattner and V. Adve. === Intermediate Representations === The Program Dependence Graph and Its Use in Optimization, by J. Ferrante et al. The Basic SUIF Programming Guide, by G. Aigner et al. Alias Analyis Interprocedural pointer alias analysis, by M. Hind et. al. === Register Allocation === Linear Scan Register Allocation, by M. Poletto and V. Sarkar. === Optimization === Compiler Transformations for High-Performance Computing, by D. Bacon et al. === Cache Optimization === A Retrospective: A Data Locality Optimizing Algorithm, by M. S. Lam. === Garbage Collection === Uniprocessor Garbage Collection Techniques: a complete survey of Garbage Collection by P. Wilson.