Week 1.1
Week 1.1
π·οΈ Replace this section with some keywords that can be used for searching, tagging and the indexing of this page.
Sections
π± Links to all Heading1 Blocks in this page.
- Phases of a Compiler
- Compilers vs Interpreters
- Syntax of PL0
1.0 - Phases of a Compiler
1_Phases_of_a_Compiler-Ian_Hayes_2019.pdf
- The figure to the right shows the phases of a compiler as indicated in the
blue boxes(theyellow boxesare the inputs and outputs to each phase of the compiler). - The first input to the compiler is the source code which is treated as a text file or sequence of characters.

Hayes, I - Phases of a Compiler
A Note (From the Textbook)
π± Iβm adding this in here as it helps me conceptualise the steps performed by a compiler.
The steps of a compiler goes through can be broken down into
analysisandsynthesisIn the
analysisphase, the compiler breaks up the source program into constituent pieces (tokens) and imposes a grammatical structure on them.Uses the tokens to create an intermediate representation of the source program (in the form of a syntax tree)
If a syntactic error is detected in the source program, the compiler must provide informative messages to the user
In the
synthesisphase, the compiler generates the program for the target machine using the intermediate representation (syntax tree) and the information contained in the symbol table.
1.1 - Lexical Analysis
π± Lexical Analysis is the step in the compiler where the source code is converted into lexical tokens
InputA sequence of characters representing a programOutputSequence of lexical tokens- The lexical analyser ignores whitespace, blanks, tabs, newlines, carriage returns, form feeds
- Whitespace is still important, as it is used to delimit lexical tokens.
- Comments are treated as whitespace
- More Information
1.1.1 - Lexical Analysis Example
3_PL0-CompilerExample.pdf
Suppose we have the following input (sequence of characters):
if x < 0 then z := -x else z := xThe output of lexical analysis would be
KW_IF, IDENTIFIER("x"), LESS, NUMBER(0), KW_THEN, IDENTIFIER("z"), ASSIGN, MINUS, IDENTIFIER("x"), KW_ELSE, IDENTIFIER("z"), ASSIGN, IDENTIFIER("x")The grammar of the language can be used to extract the structure of the program statement (from its sequence of lexical tokens).
The structure can be described using a concrete syntax tree
1.2 - Syntax Analysis
π± Given a series of lexical tokens, generate an abstract syntax tree and a symbol table.
InputSeries of lexical tokens generated by the lexical analysis phase.OutputAn Abstract Syntax Tree (AST) and symbol tableSymbol Tablecontains important information about all of the identifiers that are defined in the program (+ a few predefined ones)- May be organised into scopes (e.g. identifiers within a procedure; may be organised hierarchically)
- more information
Syntax TreeAn intermediate representation of the target program that depicts the grammatical structure of the text input (source code)- In this phase, we also identify any syntactical errors in the source code.
1.3 - Type Checking (Static Semantic Analysis / Static Analysis)
π± Check for type and other static semantic errors
InputSymbol TableandAbstract Syntax Tree (AST)OutputUpdatedSymbol TableandAbstract Syntax Tree (AST)- Resolves all references to identifiers
Updates the symbol table entries with type information for each identifier
Checks abstract syntax tree for type correctness; type checking is performed
- e.g. the compiler may check if the value for an array index is an integer, and will report an error if a floating-point number is used as an index into an array.
Updates abstract syntax tree with type coercions
π± A binary arithmetic operator may be applied to either a pair of integers or to a pair of floating-point numbers. If the operator is applied to a floating-point number and an integer, the compiler may convert / coerce the integer into a floating-point number.
1.4 - Code Generation
π± Given the updated Symbol Table and Abstract Syntax Tree, generate code for the target machine.
InputSymbol Tableand UpdatedAbstract Syntax Tree (AST)OutputCode for the target machine- The code generation phase may include:
- Machine-independent optimisation
- Machine-dependent optimisations
- Instruction selection
- Register allocation
- At this point, we know that the syntax and static semantics of our program (source code) is correct.
2.0 - Compilers vs Interpreters
π± Interpreters are essentially the same as compilers, except they donβt perform the last step of Code Generation.
- Compilers output programs compiled for a target machine
2.1 - About Interpreters
π± Interprets the abstract syntax tree directly to execute the program
InputSymbol Tableand UpdatedAbstract Syntax Tree (AST)- Less time spent on the compilation step (as compared to a Compiler) as no code generated
- Slower to execute the program (as code generation is done statement by statement as required)
- More commonly used for high-level dynamically typed languages
- Can typically give better error diagnostics than a compiler because it executes the statements in the source program sequentially.
3.0 - Syntax of PL0
π± PL0 is an educational language, used for teaching Compilers and Interpreters. It is a subset of the Pascal language.
There are two default types in Pascal -
booleanandintegervar b: boolean; // Boolean r: int; // IntegerWe can also define our own types:
const C = 42; // A constant integer type S = [0..C]; var x: S; // Subrange typeln PL0, semicolons (;) are used as statement separators not statement terminators.
begin i := 0; res := 0; // Other statements here i := i + 1 // This statement doesn't need a semicolon as there no statements after endWhen defining a while loop, if there are multiple statements in its body, it gets wrapped in a
beginandendclause.while i != x do res += res + i; end// For multiple statements; while i != x do begin res := res + i; i := i + 1 end end
const C = 42; // in PL0 we can define constants
type // We can also define our own variable types
S = [0..C]; // Subrange 0 to 42
var
b: boolean; // Boolean variable
r: int; // Integer variable
x: S; // Subrange type (defined above)
res: int; // Another integer
procedure sum() =
// Procedures start out with declarations
var
i: S; // Local variable to sum
res: int; // Masks the scope of other `res` (if we choose to define res again here)
begin
i := 0; res := 0;
while i != x do
begin
res := res + i;
i := i + 1
end // while loop contents
end // while loop
begin // main
read r;
b := ( r <= C); // b is a boolean
if b then
x := r
else
x := 0;
call sum();
write res
end
| ASSIGN β β:=β | GEQUALS β β>=β | KW_BEGIN β βbeginβ | KW_TYPE β βtypeβ |
|---|---|---|---|
| COLON β β:β | GREATER β β>β | KW_CALL β βcallβ | KW_VAR β βvarβ |
| RANGE β ββ¦β | LEQUALS β β<=β | KW_CONST β βconstβ | KW_WHILE β βwhileβ |
| LPAREN β β(β | LESS β β<β | KW_DO β βdoβ | KW_WRITE β βwriteβ |
| RPAREN β β)β | PLUS β β+β | W_END β βendβ | NUMBER β Digit Digitβ [1] |
| LBRACKET β β[β | PLUS β β+β | KW_IF β βifβ | IDENTIFIER β Alpha (Alpha |
| RBRACKET β β]β | MINUS β βββ | KW_PROCEDURE β βprocedureβ | |
| EQUALS β β=β | TIMES β β*β | KW_READ β βreadβ | |
| NEQUALS β β!=β | DIVIDE β β/β | KW_THEN β βthenβ |
Concrete Syntax Tree Example