Week 8

CourseCompilers & Interpreters
SemesterS1 2022

Week 8

🏷️ Replace this section with some keywords that can be used for searching, tagging and the indexing of this page.

Sections

🌱 Links to all Heading1 Blocks in this page.

  1. LALR(1) Parsing in Java-CUP
  2. Symbol Table for LALR(1) Compilers

1.0 - LALR(1) Parsing in Java-CUP

  • The Java-CUP Parser Generator generates a LALR(1) Parser and its complementary parsing automaton.

1.1 - Running the Java-CUP Parser Generator

  • We defined this simple grammar before, in the Bottom-Up Parsing Lecture, in which we have the non-terminal symbols A,(,)A, (, ).

  • The grammar is defined by two productions

    Aa  (A)A\rightarrow a \ |\ (A)

  • We can define this in Java-CUP using the following syntax

    import java_cup.runtime.*;
    terminal a;
    terminal LPAREN, RPAREN;
    nonterminal A;
    A ::= a | LPAREN A RPAREN;
    
  • When running the Java-CUP Parser Generator, we can use the -dump option to get it to display the LALR(1) Parsing Automaton it generates.

1.1.1 - Grammar Summary

  • The first thing in Java-CUP’s output (with the -dump option) is a summary of the grammar

    ===== Terminals =====
    [0]EOF [1]error [2]a [3]LPAREN [4]RPAREN
    
    ===== Non terminals =====
    [0] A
    
    ===== Productions =====
    [0] A ::= a
    [1] $$
    

START ::= A EOF [2] A ::= LPAREN A RPAREN

```
  1. Terminals
    • JavaCUP inserts two new terminal symbols - EOF symbol and error symbol.
  2. Non-Terminals
  3. Productions
    • Our two productions from our original grammar are [0] and [2].
    • JavaCUP inserts the production `

START :== A EOF` just as we did in the theory before - it introduces a new start symbol, `START`, and a new production where the new start symbol produces the original start symbol. - This production has a small technical difference to our lecture example - here JavaCUP inserts the EOF symbol at the end of the production. ### 1.1.2 - Automaton Summary - Viable Prefix Recogniser - For each state, we define the LR(1) parsing items - which are pairs of LR(0) parsing items, and lookahead sets. ```java ===== Viable Prefix Recogniser ===== START lalr_state [0]: { [A ::= (*) LPAREN A RPAREN, {EOF }} [

START ::= () A EOF, {EOF }] [A ::= () a, {EOF }] } transition on LPAREN to state [3] transition on a to state [2] transition on A to state [1]


### 1.1.3 - Action Table

- The Action Table is a summary of parsing actions
    
    ```java
    From state #0
     [term 2:SHIFT(to state 2)] [term 3:SHIFT(to state 3]
    ```
    
- In this case term x is given by the terminal symbol summary above:
    
    ```java
    ===== Terminals =====
    [0]EOF [1]error [2]a [3]LPAREN [4]RPAREN
    ```
    
    - Where term 2 = a, term 3 = LPAREN

### 1.1.4 - Reduce Table

- The Reduce Table is a summary of transitions out of each state on non-terminal symbols
- Once the parser generates the LALR(1) parsing automaton, it really only needs to execute the Action Table and Reduce Table.
    
    ```java
    -----------------------------
    -------- REDUCE_TABLE--------
    From state #0
     [non term 0->state 1] 
    From state #1
    ```
    

## 1.2 - Giving JavaCUP a Non-LALR(1) Grammar

- What happens when we give JavaCUP a non-LALR(1) grammar?
- This grammar is not LALR(1) as it is ambiguous
    
    
$$L\rightarrow \epsilon\ |\ E\ |\ L\ E\\
    E\rightarrow n\ |\ LPAREN\ L\ RPAREN$
    
- It’s defined in JavaCUP syntax as follows:
    
    ```java
    import jav_cup.runtime.*;
    terminal n;
    terminal LPAREN, RPAREN;
    nonterminal L, E;
    
    L ::= /* empty */ | E | L E;
    E ::= n | LPAREN L RPAREN;
    ```
    
- When we try to run this however, we get a series of error messages saying that JavaCUP has run into a whole host of action conflicts.
- Removing the ambiguity (i.e. removing the $L\rightarrow E$ production) gives us a LALR(1) grammar, and therefore, JavaCUP is able to generate a parsing automaton.

## 1.3 - Giving JavaCUP an Ambiguous Grammar

- Suppose we introduce the dangling-else problem
    
    ```java
    Statement ::= KW_WHILE Condition:c KW_DO Statement:s 
    			{:
    					RESULT = new StatementNode.WhileNode(cxleft, c, s);
    			:}
    	| ...
    
    	| KW_IF Condition:c KW_THEN Statement:s1 ElsePart:s2
    			{:
    					RESULT = new StatementNode.IfNode(cxleft, c, s1, s2);
    			:}
    
    ElsePart ::=  KW_ELSE Statement:s2
    			{:
    					RESULT = s2;
    			:}
    		| /* empty */
    			{:
    					RESULT = new StatementNode.SkipNodec(ErrorHandler.NO_LOCATION);
    			:}
    			%prec KW_IF
    		;
    
    ```
    
- When we attempt to create the parsing automaton for this ambiguous grammar, we get conflicts:
    
    ```java
    Error : *** More conflicts encountered than expected - parser generation aborted
    ------ Cup v0.11b 20160615 (GIT 4ac7450) Parser Generation Summary -------
    	1 error and 3 warnings
    	43 terminals, 33 non-terminals and 79 productions declared, 
    	producing 130 unique parsing states
    	...
    ```
    
- To solve this, we can either:
    - Come up with a better grammar that resolves the abiguity
    - Specify precedence in PL0.cup which tells JavaCUP how to resolve ambiguities
        - We can tell JavaCUP to match the else to the closest if using precedence operators.
- To specify precedence, we can write the following, which specifies that matching KW_ELSE has a higher precedence than matching KW_IF
    
    ```java
    precedence nonassoc KW_IF;
    precedence nonassoc KW_ELSE;
    ```
    
- Note that in the code block above, we use the syntax `%prec` KW_IF which essentially means that we give the null alternative the same precedence as KW_IF (meaning that matching KW_ELSE has a higher precedence).
    - Therefore, JavaCUP will always try to match the KW_ELSE if possible, and default to matching the empty alternative otherwise.
- Re-running the JavaCUP Parser Generator, we receive no errors, and the parsing automaton is generated properly.

## 1.4 - JavaCUP Syntax Error Recovery

- JavaCUP attempts to perform Syntax Error Recovery.
- When JavaCUP detects a syntax error, it tries to replace some of the input with an error terminal symbol and continues.
    - We can use this symbol to try to recover.

### 1.4.1 - JavaCUP Syntax Error Recovery in Practice - LValue

- Here, if we can’t match an LValue, we return an ExpNode.ErrorNode to try to handle errors that arise..
    
    ```java
    LValue ::= IDENTIFIER:id
    		{:
    				/* At this stage the identifier could be either a constant identifier or
    				 * a variable identifier but which cannot be determined until static 
             * checking when the IdentifierNode will be transformed into either
             * ConstNode or a VariableNode or detected as invalid.
    				 */
    				 RESULT = new ExpNode.IdentifierNode(idxleft, id);
    		:}
    	| error:e
    		{:
    				 RESULT = new ExpNode.ErrorNode(exleft);
    		:}
    ```
    
- We use LValues in Factors:
    
    ```java
    Factor ::= PLUS Factor:e
            {:
                RESULT = e; 
            :}
        |  UnaryOperator:op Factor:e
            {:
                RESULT = new ExpNode.UnaryNode(opxleft, op, e);
            :}
        |  LPAREN Condition:c RPAREN
            {:
                RESULT = c;
            :}
        |  NUMBER:n
            {:
                RESULT = new ExpNode.ConstNode(nxleft, 
                        Predefined.INTEGER_TYPE, n);
            :}
        |  LValue:lval
            {:
                RESULT = lval;
            :}
        ;
    ```
    
- Suppose we wanted to introduce an error alternative into Factor, if we couldn’t match a Factor:
    
    ```java
    		| error:e
    				{:
    						RESULT = new ExpNode.ErrorNode(exleft);
    				:}
    ```
    
    - However, this is the wrong thing to do, as an LValue already has an error alternative; this would introduce an ambiguity as the compiler doesn’t know whether to match the error alternative of LValue or Factor.
- Notice that since $\text{Term}\rightarrow\text{LValue}$, $\text{Factor}\rightarrow\text{LValue}$ and $\text{Exp}\rightarrow\text{LValue}$ none of these need an error alternative.
- In general, our approach is to put these error alternatives as deeply nested as possible in our syntax tree.
- In addition to $\text{LValue}$, a $\text{Statement}$ also has an error alternative.

# 2.0 - Symbol Table for LALR(1) Compilers.


🌱 This example is talking specifically about the Assignment 2 LALR(1) Compiler.



- The parser generates both an Abstract Syntax Tree and a Symbol Table.
- The type checker type checks both the AST and Symbol Table.

- The symbol tracks information about all of the identifiers defined in our program
- Generate symbol table for PL0, which is a procedural programming language
    - This means that our program is hierarchically structured into procedures
    - Therefore, our symbol table is hierarchically structured into scopes.

- Consider the following PL0 program.
- The main method has a scope, which keeps track of the following:
    - Declaration of variables `x`, `y` & `res`
    - Declaration of `procedure C`
    - The main method has a parent scope - this special predefined scope is the scope that contains all pre-defined constants and types in our program.
- The scope of `procedure C` keeps track of the following:
    - Declaration of variables `n` & `f`
    - Declaration of `procedure fact`
    - This scope’s parent scope is the main procedure’s scope.
- The scope of procedure fact has:
    - (no declarations)
    - Parent scope is the scope of procedure C
- The static level describes how deep a scope is
    
    
    | Static Lvl | Scope |
    | --- | --- |
    | 0 | Predefined Scope |
    | 1 | Main Method |
    | 2 | procedure C (& any procedure defined in main) |
    | 3 | procedure fact |
    | ... | ... |

```pascal
var x: int;  // first argument to C
    y: int;  // second argument to C
    res: int;  // result from C
procedure C() =
  var n: int;  // argument to fact
      f: int;  // result from fact
  procedure fact() =
    begin
      // assume 0 <= n
      if n = 0 then
        f := 1
        // f = n!
      else // 0 < n
       begin
        n := n - 1;
        call fact();
        // f = n!
        n:= n + 1;
        f := f * n
        // f = n!
       end
    end; // fact
  begin // C
    n := x;
    call fact();
    // f = x!
    res := f;
    n := y;
    call fact();
    // f = y!
    res := res / f;
    // res = x! / y!
    n := x - y;
    call fact();
    // f = (x-y)!
    res := res / f
    // res = x! / (y! * (x-y!))
  end; // C
begin // main program
  x := 6;
  y := 2;
  call C();
  write res
end

  • We can observe that the Scope method implements these features:

    public class Scope {
        /**
         * Offset of start of local variables from frame pointer
         */
        final static int LOCALS_BASE = 3;
        /**
         * Parent Scope
         */
        private final Scope parent;
        /**
         * Static level of this scope
         */
        private final int level;
        /**
         * Symbol table entry for the procedure (main program) that owns this scope
         */
        private final SymEntry.ProcedureEntry ownerEntry;
    
  • Note that the ownerEntry is an interesting feature:

    • The main procedure “owns” the variable declarations of x, y & res
    • The procedure C “owns” the variable declarations of n & f
    • The predefined procedure is a special case, as it doesn’t have a procedure that owns it.
    var x: int;  // first argument to C
        y: int;  // second argument to C
        res: int;  // result from C
    procedure C() =
      var n: int;  // argument to fact
          f: int;  // result from fact
    
  • Our Scope class also defines a SortedMap which contains all of our symbol table entries

    • These symbol table entries are defined in src/syms/SymEntry.java
    /**
     * Symbol table entries. A SortedMap is used to avoid issues with
     * hashing functions working differently on different implementations.
     * This only affects minor things like the order of allocating variables
     * and dumping symbol tables in trace backs. A Map/HashMap would still give a
     * valid implementation.
     */
    private final SortedMap<String, SymEntry> entries;
    
  • Additionally our Scope class has an integer variable that keeps track of the amount of space allocated for variables in this current scope.

    /**
     * space allocated for local variables within this scope
     */
    private int variableSpace;
    
  • Additionally, scope extensions have been defined (for implemeting “for” statements)

    /**
     * true if this is an extension of its parent scope
     */
    private boolean extension;
    

2.1 - Identifier Lookup in Scope

  • To look up an identifier in a symbol table, we first check whether it’s within the current scope.
  • If it is not, we try looking it up in the parent scope (which recursively checks through parent scopes until we either find it, or reach the predefined scope).
/**
 * Lookup id starting in the current scope and
 * thence in the parent scope and so on.
 *
 * @param id identifier to search for.
 * @return symbol table entry for the id, or null if not found.
 */
public SymEntry lookup(String id) {
    /* Lookup the entry in the current scope */
    SymEntry entry = entries.get(id);
    if (entry == null && parent != null) {
        /* If the entry is not in the current scope
         * look it up in the parent scope, if there is one.
         */
        return parent.lookup(id);
    }
    return entry;
}

2.1 - Setup of Predefined Scope

public class SymbolTable {
    private final Scope predefinedScope;

    /**
     * Construct a symbol table and build the predefined scope
     * as its initial scope.
     */
    public SymbolTable() {
        super();
        SymEntry.ProcedureEntry predefined =
                new SymEntry.ProcedureEntry("",
                        ErrorHandler.NO_LOCATION, null);
        predefinedScope = new Scope(null, 0, predefined);
        predefined.setLocalScope(predefinedScope);
        Predefined.addPredefinedEntries(predefinedScope);
    }
  • The Symbol Table class sets up the predefined scope for us - that is, when we construct a new SymbolTable class instance, it configures the predefinedScope.
    • We create a new SymTable.ProcedureEntry instance as a dummy procedure (predefined), and pass it to the Scope constructor,
  • Since a procedure needs to know its scope, and a scope its procedure, we use the SymTable.ProcedureEntry.setLocalScope method to set the local scope of the procedure entry.
  • We additionally want to populate the predefined scope with everything that should be in it - that’s what Predefined.addPredefinedEntries(Scope predefined) does.
    • This method adds in a series of types, operators
predefined.addType("int", ErrorHandler.NO_LOCATION, INTEGER_TYPE);
predefined.addType("boolean", ErrorHandler.NO_LOCATION, BOOLEAN_TYPE);
predefined.addOperator(Operator.NEG_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_UNARY);
predefined.addOperator(Operator.ADD_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.SUB_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.MUL_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.DIV_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.EQUALS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.NEQUALS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.GREATER_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.LESS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.GEQUALS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.LEQUALS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);

2.2 - PL0 Parser - Setup of Predefined Scope

  • In PL0.cup, we need to set up both the predefined scope and the scope for the main method

2.2.1 - Parsing a Program

  • In setting up a new scope for the main method, we need to do a few things:
    1. Create a new procedure entry for the main method

      SymEntry.ProcedureEntry procMain = 
      		            currentScope.addProcedure("", ErrorHandler.NO_LOCATION);
      
    2. We then test that everything has gone well, and that nothing weird has happened.

      if(procMain  == null) {
          errors.fatal("Could not add main program to symbol table",
                  ErrorHandler.NO_LOCATION);
      }
      
    3. We finally take our current scope, and add a new scope to it, for the main method

      • In doing this, we also update the current scope to be the scope of the main method.
      • It’s important that we do that before moving on and parsing other things (the next thing is a Block) as we want to parse these other things with respect to the main method’s scope.
      currentScope = currentScope.newScope(procMain);
      
Program ::=  /* empty */ 
		    {:
		        /* This action occurs before the whole program is recognised.
		         * It constructs the initial symbol table with current scope the
		         * predefined scope. */
		        currentScope = new SymbolTable().getPredefinedScope();
		        /* Set up a dummy symbol table entry for the main program */
		        SymEntry.ProcedureEntry procMain = 
		            currentScope.addProcedure("", ErrorHandler.NO_LOCATION);
		        if(procMain  == null) {
		            errors.fatal("Could not add main program to symbol table",
		                    ErrorHandler.NO_LOCATION);
		        }
		        /* Enter the scope for the main program and save the new local
		         * scope in main's symbol table entry */
		        currentScope = currentScope.newScope(procMain);
		    :}

2.2.2 - Parsing a Block

Block ::= DeclarationList:dl CompoundStatement:b
        {:
            RESULT = new StatementNode.BlockNode(bxleft, dl, b, currentScope);
        :}
    ;
  • A block is a DeclarationList followed by a CompoundStatement.

2.2.3 - Parsing a DeclarationList

DeclarationList ::= DeclarationList:dl ProcedureDef:p 
        {:
            /* Add a procedure declaration to the list of declarations */
            dl.addDeclaration(p);
            RESULT = dl;
        :}
    | DeclarationList:dl Declaration
        {:
            /* A non-procedure declaration is not added to the list
             * but added to the symbol table during its parsing. */
            RESULT = dl;
        :}
    | /* empty */
        {:
            RESULT = new DeclNode.DeclListNode();
        :}
    ;
  • As we parse each declaration, we append it to the declaration list.

    Declaration ::= KW_CONST ConstDefSeq
        |  KW_TYPE TypeDefSeq
        |  KW_VAR  VarDeclSeq
        ;
    
  • These declarations can either be:

    1. ProcedureDef (Procedure Definitions / Procedure Declarations) - these are produced by the first alternative in the DeclarationList production.
    2. Constant Declarations
    3. Type Declarations
    4. Variable Declarations

2.2.4 - Adding Declarations to the Symbol Table

  • Consider the following declarations

    const N = 10; M = -N;    // Constant Declaration
    var x : T;               // Variable Declaration
    type T = [1..N];         // Type Declaration
    
  • And their corresponding symbol table entries

\def\rq{\text{\textquoteright}} \begin{aligned} \lq N \rq &\mapsto \text{ConstEntry}(_, \text{NumberNode(INT_TYPE, 10))}\ \lq M \rq&\mapsto \text{ConstEntry}(_, \text{NegateConst(IdentConst(\lq N\rq)))}\ \lq x \rq &\mapsto \text{VarEntry(ReferenceType(IdRefType(\lq T\rq)))}\ \lq T\rq &\mapsto\text{TypeEntry(SubrangeType(_,NumberNode(INT_TYPE, 1),}\ &\text{ \ \ \ \ \ \ \ \ \ \ \ \ ConstIdNode(\lq N\rq, currentScope)))} \end{aligned}

l - Where $\text{INT\_TYPE}$ abbreviates $\text{INTEGER\_TYPE}$ in the compiler. - During the parsing phase, these declarations will be added to our symbol table - When we add these declarations to the symbol table, there is type information that hasn’t been resolved yet - this needs to get done during Type Checking. **Parsing a Constant**

\lq N \rq \mapsto \text{ConstEntry}(_, \text{NumberNode(INT_TYPE, 10))}\

- When parsing a constant, we add a new `ConstEntry(type, value)` to the symbol table. - However, at this stage, we don’t know what type it is (that’s what the first parameter is, and is currently left blank as it is still unknown). - Additionally, we haven’t yet resolved its value (here, denoted as a $\text{NumberNode}$). - We represent the constant as a small Abstract Syntax Tree (with the root being $\text{ConstEntry}$) - This AST will be later evaluated by the type checker - It is the role of the type checker to go through all of these declarations to evaluate the value. - A benefit of doing it this way is that some of our definitions depend on declarations after it - in this example, look at the declaration of the variable $x$, which depends on the type $T$. - Therefore, we collect all of our information in the parsing phase, and then resolve the values later. **Performing Type Checking** - After performing type checking, we know both the type and value of our constant declarations.

\def\rq{\text{\textquoteright}}
\begin{aligned}
\lq N \rq &\mapsto 
\text{ConstEntry({\color{lightblue}INT\_TYPE, 10})}\\
\lq M \rq&\mapsto \text{ConstEntry({\color{lightblue}INT\_TYPE, -10})}\\
\lq x \rq &\mapsto \text{VarEntry(ReferenceType(IdRefType(\lq T\rq)))}\\
\lq T\rq &\mapsto\text{TypeEntry(SubrangeType(\_,NumberNode(INT\_TYPE, 1),}\\
&\text{ 
\ \ \ \ \ \ \ \ \ \ \ \ ConstIdNode(\lq N\rq, currentScope)))}
\end{aligned}
$$

2.2.5 - Parsing Constant Definitions in JavaCUP

  • A sequence of constant definitions is just a series of Constant definitions.

    ConstDefSeq ::= ConstDef
        |  ConstDefSeq ConstDef
        ;
    
  • A constant definition is of the form IDENTIFIER = {constant};

    • When we parse a constant definition, we attempt to add it to the current scope (i.e. add to the symbol table.
    • the addConstant method returns null if there is already an identifier with that name - in that case we throw an error.
    ConstDef ::= IDENTIFIER:id EQUALS Constant:c SEMICOLON
            {:
                /* The attribute idxleft represents the location of the first
                 * character of the IDENTIFIER token in the input stream.
                 * addConstant returns null if id is already defined 
                 * in the current scope */
                if(currentScope.addConstant(id, idxleft, c) == null) {
                    errors.error(id + " already declared in this scope", idxleft);
                }
            :}
        |  error 
        ;
    
  • When we parse these constants, we return a small abstract syntax tree (as described before) that will be resolved in the type checking phase.

    /* The rules for Constant construct a (mini) abstract syntax tree
     * for constant expressions (not to be confused with ExpNodes).
     */
    Constant ::= NUMBER:n
            {: 
                RESULT = new ConstExp.NumberNode(nxleft, Predefined.INTEGER_TYPE, n);
            :}
        |  MINUS:op Constant:c
            {: 
                RESULT = new ConstExp.NegateNode(opxleft, c);
            :}
        |  IDENTIFIER:id
            {: 
                RESULT = new ConstExp.ConstIdNode(idxleft, id, currentScope);
            :}
        |  error:err
            {:
                RESULT = new ConstExp.ErrorNode(errxleft);
            :}
        ;
    

2.2.6 - Parsing Variables in JavaCUP

xVarEntry(ReferenceType(IdRefType(‘T’)))\def\rq{\text{\textquoteright}} \lq x \rq\mapsto \text{VarEntry(ReferenceType(IdRefType(\lq T\rq)))}

  • When we declare a variable, we know what the name of the variable is, and its type.

    var x : T
    
  • At this point, we represent the type TT as IdRefType(‘T’)\def\rq{\text{\textquoteright}}\text{IdRefType(\lq T\rq)}

    • This essentially just denotes that we know that TT is some identifier, but we don’t know what type it is yet
    • As it is just a placeholder until the static checking phase, it will be removed (and replaced) by the appropriate type during static checking.
  • After the static checking phase, the type is resolved, and the symbol table entry for the type becomes:

    TTypeEntry(SubrangeType(INT_TYPE, 1, 10))\def\rq{\text{\textquoteright}} \lq T\rq\mapsto\text{\color{lightblue}TypeEntry(SubrangeType(INT\_TYPE, 1, 10))}

Type ::= TypeIdentifier:type
        {: 
            RESULT = type;
        :}
    | LBRACKET:subr Constant:lo RANGE Constant:hi RBRACKET
        {:
            RESULT = new Type.SubrangeType(subrxleft, lo, hi);
        :}
    | LCURLY:e EnumerationList:elist RBRACKET
        {:
            RESULT = new Type.EnumerationType(exleft, elist, currentScope);
        :}
    ;

TypeIdentifier ::= IDENTIFIER:id
        {: /* As the type identifier may not be defined at this point.
            * IdRefType records the id, as well as the symbol table scope
            * to look it up during type resolution in the static checker.
            */
            RESULT = new Type.IdRefType(idxleft, id, currentScope);
        :}
    | error:err
        {:
            RESULT = Type.ERROR_TYPE;
        :}
    ;
  • Note that in the code above that when we parse a TypeIdentifier, we return a new Type.IdRefType just as we have discussed above.

TTypeEntry(SubrangeType(_,NumberNode(INT_TYPE,1), ConstIdNode(‘N’, currentScope)))\def\rq{\text{\textquoteright}} \lq T\rq \mapsto\text{TypeEntry(SubrangeType(\_,NumberNode(INT\_TYPE,1), }\\\text{ConstIdNode(\lq N\rq, currentScope)))}

  • In resolving the type above, we don’t yet know what the base type of the subrange is.
    • However, at this point we do know its low value NumberNode(INT_TYPE)\color{lightblue}\text{NumberNode(INT\_TYPE)} and high value $$ \def\rq{\text{\textquoteright}} \color{lightblue}\text{ConstIdNode(\lq N\rq, currentScope)}

```java LBRACKET:subr Constant:lo RANGE Constant:hi RBRACKET {: RESULT = new Type.SubrangeType(subrxleft, lo, hi); :} ``` - This is once again reflected in our code - When we parse a subrange type, we pass in the location, low value and high value of our subrange type - The type of the subrange is dependent on what types the low and high values resolve to - Since they both resolve to integers, the subrange has base type of integer. - Once we’ve resolved the type entry $T$, we can then resolve the AST entry $\text{IdRefType(\lq T\rq)}$ and therefore resolve the type of the variable $x$. # 3.0 - Static Checking ## 3.1 - Static Checking a Program - To perform static checking on a program, we perform static checking on the main procedure: ```java public void visitProgramNode(DeclNode.ProcedureNode node) { beginCheck("Program"); // The main program is a special case of a procedure visitProcedureNode(node); endCheck("Program"); ``` ## 3.2 - Static Checking a Procedure - The main procedure node is static checked as follows: ```java /** * Procedure, function or main program node */ public void visitProcedureNode(DeclNode.ProcedureNode node) { beginCheck("Procedure"); SymEntry.ProcedureEntry procEntry = node.getProcEntry(); // Save the block's abstract syntax tree in the procedure entry procEntry.setBlock(node.getBlock()); // The local scope is that for the procedure. Scope localScope = procEntry.getLocalScope(); /* Resolve all references to identifiers within the declarations. */ localScope.resolveScope(); // Enter the local scope of the procedure currentScope = localScope; // Check the block of the procedure. visitBlockNode(node.getBlock()); // Restore the symbol table to the parent scope currentScope = currentScope.getParent(); endCheck("Procedure"); } ``` - We first resolve the (local) scope of the procedure. - The resolveScope() method resolves the type of each entry inside the symbol table (where beforehand, it was a mini AST). - We also statically check the block of code inside that procedure - However, we need to statically check it with respect to the scope of the procedure, so we update the currentScope before static checking the block. ```java currentScope = localScope; visitBlockNode(node.getBlock()); ``` - After we finish checking the block, we return to the parent scope ```java currentScope = currentScope.getParent(); ``` ### 3.2.1 - Detecting Cyclic Definitions - Consider the following declaration in which we cyclic definitions of both constants and types. ```pascal const N = M; M = -N; type S = T; T = S; ``` - If we detect a cyclic definition in the type resolution, the compiler should stop and warn the programmer. - In the case where we don’t have cyclic definitions, but definitions that depend on each other, we need to be careful about the order in which we resolve the types. - Consider the following case in which the variable x depends on the type T ```pascal const N = 10; M = -N; var x : T; type T = [1..N]; ``` # 4.0 - Resolving Symbol Table Entries - In our symbol table entry class, SymEntry we have the method resolve() intended to resolve the type of any symbol table entry. - The default implementation is listed below, in which it tries to resolve the entry if it hasn’t yet been resolved. ```java /** * Resolve any references to type identifiers in supplied scope */ public void resolve() { if (!resolved) { type = type.resolveType(); resolved = true; } } ``` - For the ConstEntry and VarEntry, this resolve method is re-defined. ## 4.1 - Resolving a Variable Entry - In resolving a variable entry, we: - Call the base resolve method - Determine the type of the variable - Allocate space for the variable, based on its type. ```java public static class VarEntry extends SymEntry { /** * resolving a variable requires allocating space for it */ @Override public void resolve() { if (!resolved) { // resolve the type of the variable super.resolve(); /* Space is allocated for the variable and the address of that * location placed in the entry for the variable. * The space allocated depends on the size of its type. */ Type baseType = ((Type.ReferenceType) type).getBaseType(); offset = scope.allocVariableSpace(baseType.getSpace()); } } } ``` - To resolve a variable entry, we need to resolve the type - how do we do this? ### 4.1.1 - Parsing a Variable ```java VarDecl ::= IDENTIFIER:id COLON TypeIdentifier:type SEMICOLON {: // Variables are always of ReferenceType. if(currentScope.addVariable(id, idxleft, type) == null) { errors.error(id + " already declared in this scope", idxleft); } :} | error ; ``` - When we parse a variable, we use the addVariable method of the current scope. ```java /** * Add a VARIABLE entry to the current scope. * * @return a reference to the new entry unless an entry with the same name * already exists in the current scope, in which case return null. */ public SymEntry.VarEntry addVariable(String name, Location loc, Type type) { SymEntry.VarEntry entry = new SymEntry.VarEntry(name, loc, type); return (SymEntry.VarEntry) addEntry(entry); } ``` - The addVariable method creates a new symbol table entry instance, with the provided location and type. - When we construct a new instance of SymEntry.VarEntry, it sets the type as ReferenceType(t) ```java /** * Symbol table entry for a variable identifier */ public static class VarEntry extends SymEntry { /** * offset of variable starting from 0 */ int offset; public VarEntry(String id, Location loc, Type t) { super(id, loc, new ReferenceType(t), false); readOnly = false; } } ``` - Therefore, when we specify our Java CUP syntax, we don’t want to pass it a `ReferenceType(t)`, but rather t itself (as the construction of a reference type is handled by the `SymEntry.VarEntry`. - If we were parsing the following variable definition, the type at this point would be set to $\text{IdRefType(\lq T\rq)}$

\def\rq{\text{\textquoteright}}
\lq x\rq\mapsto\text{VarEntry(ReferenceType(IdRefType(\lq T\rq)))}
$$
  • Additionally, observe that the Java CUP definition for a TypeIdentifier returns a new Type.IdRefType.

    TypeIdentifier ::= IDENTIFIER:id
            {: /* As the type identifier may not be defined at this point.
                * IdRefType records the id, as well as the symbol table scope
                * to look it up during type resolution in the static checker.
                */
                RESULT = new Type.IdRefType(idxleft, id, currentScope);
            :}
        | error:err
            {:
                RESULT = Type.ERROR_TYPE;
            :}
        ;
    

4.1.2 - Resolving a Variable Type.

  • In resolving a SymEntry that is a variable type, SymEntry.resolve() is called.

    /**
     * Resolve any references to type identifiers in supplied scope
     */
    public void resolve() {
        if (!resolved) {
            type = type.resolveType();
            resolved = true;
        }
    }
    
  • If the symbol table entry is a variable, and it hasn’t been resolved then the type will be an IdRefType\text{IdRefType}.

  • Therefore, type.resolveType() calls Type.IdRefType.resolveType().

  • This is another case where we must be careful of cyclic definitions.

    • If our type is unresolved, we try to look up the type in the symbol table
      • If the type exists in the symbol table, we resolve the symbol table entry.
      • Otherwise, we throw an error, as the type has not yet been defined.
    /**
     * Resolve the type identifier and return the real type.
     */
    @Override
    public Type resolveType() {
        // System.out.println("Resolving type id " + id);
        switch (status) {
            case Unresolved:
                status = Status.Resolving;
                realType = ERROR_TYPE;
                SymEntry entry = scope.lookupType(id);
                if (entry != null) {
                    /* resolve identifiers in the referenced type */
                    entry.resolve();
                    /* if status of this entry has resolved then there was a
                     * circular reference and we leave the realType as
                     * ERROR_TYPE to avoid other parts of the compiler getting
                     * into infinite loops chasing types.
                     */
                    if (status == Status.Resolving) {
                        realType = entry.getType();
                    }
                    assert realType != null;
                } else {
                    errors.error("undefined type: " + id, loc);
                }
                status = Status.Resolved;
                break;
            case Resolving:
                errors.error(id + " is circularly defined", loc);
                /* Will resolve to ERROR_TYPE */
                status = Status.Resolved;
                break;
            case Resolved:
                /* Already resolved */
                break;
        }
        // System.out.println("Resolved type id " + name + " to " + realType);
        return realType;
    }
    

  • We now need to resolve a SymEntry.TypeEntry (that is the object returned by the scope.lookupType(id) method called in IdRefType.resolve() above.

4.1.3 - Resolving a Subrange Type

  • Suppose we have the following type that we want to resolve - how do we do this?

4.2 - Resolving a Constant Entry

  • The resolution of a constant entry can have three possible states:

    • Unresolved The constant entry hasn’t yet been resolved, but we are now resolving it.
    • Resolving We are in the process of resolving the constant entry - we get into the “Resolving” state when we are currently in the process of resolving another constant entry - which requires the current constant entry’s type to be resolved.
      • This only occurs when there is a circular reference, so we report the error.
    • Resolved There is nothing to do, as the entry is resolved.
    public static class ConstantEntry extends SymEntry {
    		/**
         * Resolve references to constant identifiers and evaluate expression
         */
        @Override
        public void resolve() {
            switch (status) {
                case Unresolved:
                    status = Status.Resolving;
                    value = tree.getValue();
                    type = tree.getType();
                    status = Status.Resolved;
                    resolved = true;
                    break;
                case Resolving:
                    error("circular reference in constant expression", loc);
                    status = Status.Resolved;
                    resolved = true;
                    break;
                case Resolved:
                    break;
            }
        }
    }
    
  • If the current status is “Unresolved”, how do we resolve a constant entry

    • We evaluate the value of the variable “tree” which is of type ConstExp

    • The ConstExp type is essentially a small abstract syntax tree representing the value and type of the constant node.

    • The ConstExp.getValue method is defined as follows:

      public int getValue() {
          if (status == Status.Unresolved) {
              evaluate();
          }
          return value;
      }
      
    • The evaluate() method doesn’t have a default implementation; the specific method that gets called depends on what type of ConstExp has been defined

4.2.1 - Resolving a ConstEntry(NumberNode(...))

  • Before, we looked at the example where a constant was simply a NumberNode, defined as follows:

    NConstEntry(_, NumberNode(INT_TYPE, 10)\def\rq{\text{\textquoteright}} \lq N\rq \mapsto \text{ConstEntry(\_, NumberNode(INT\_TYPE, 10)}

  • NumberNode types are trivial to resolve, as we already know their type and value.

  • Therefore, the NumberNode type doesn’t have an implementation of the resolve method - it is already known.

  • In resolving it, we just set the type and value of the ConstEntry to the value of that NumberNode

  • This lookup is done by the ConstantEntry.resolve() method listed above:

    value = tree.getValue();
    type = tree.getType();
    

4.2.2 - Resolving a ConstEntry(NegateConst(...))

  • Before, we looked at an example where a constant was a negation node, defined as follows:

    NConstEntry(_, NegateConst(IdentConst(‘N’))\def\rq{\text{\textquoteright}} \lq N\rq\mapsto \text{ConstEntry(\_, NegateConst(IdentConst(\lq N\rq))}

  • To resolve the type of a negation node, we first need to resolve the type of it’s subexpression

    /**
     * A constant expression consisting of a negated constant expression
     */
    public static class NegateNode extends ConstExp {
        private final ConstExp subExp;
    
        public NegateNode(Location loc, ConstExp subExp) {
            super(loc, Status.Unresolved);
            this.subExp = subExp;
        }
    
        @Override
        protected void evaluate() {
            type = subExp.getType();
            if (type != Predefined.INTEGER_TYPE) {
                errors.error("can only negate an integer", loc);
            } else {
                value = -subExp.getValue();
            }
            status = Status.Resolved;
        }
    }
    
  • In this particular instance, our sub-expression is an instance of IdentConst(‘N’)\def\rq{\text{\textquoteright}}\text{IdentConst(\lq N\rq)}.

  • Note that in our evaluation method for a NegateNode we call subExp.getType() - this method essentially calls

  • Following our constant identifier entry instance $$ \def\rq{\text{\textquoteright}} \text{IdentConst(\lq N\rq)}

which we need to resolve to get the type. - In this specific case, our identifier entry type has already been resolved - We then set the type of the $\text{NegateNode}$ to that of the $\text{IdentConst}$ - If the type isn’t an integer we throw an error. - Otherwise, we set it to the negative of the subexpression and set the status of the node to resolved. ### 4.2.3 - Resolving a ConstIdNode 🌱 A `ConstIdNode` is a constant expression consisting of a reference to an identifier. - In resolving a Constant identifier node, we first attempt to look up the identifier in the symbol table. - We then check if it’s a constant entry. Else, throw an error. - We then evaluate (and set) its type and value - We finish by setting its status to resolved. ```java protected void evaluate() { // System.out.println( " ConstExp Resolving " + id + " " + status ); switch (status) { case Unresolved: status = Status.Resolving; SymEntry entry = scope.lookup(id); if (entry instanceof SymEntry.ConstantEntry) { SymEntry.ConstantEntry constEntry = (SymEntry.ConstantEntry) entry; type = constEntry.getType(); value = constEntry.getValue(); status = Status.Resolved; } else { errors.error("Constant identifier expected", loc); } break; case Resolving: errors.error(id + " is circularly defined", loc); /* Will resolve to error type and silly value. * Set to Resolved to avoid repeated attempts to * resolve the unresolvable, and hence avoid * unnecessary repeated error messages. */ status = Status.Resolved; break; case Resolved: /* Already resolved */ break; } } ``` ## 4.2 - Example Code and Scope. ```pascal var x: int; // first argument to C y: int; // second argument to C res: int; // result from C procedure C() = var n: int; // argument to fact f: int; // result from fact procedure fact() = begin // assume 0 <= n if n = 0 then f := 1 // f = n! else // 0 < n begin n := n - 1; call fact(); // f = n! n:= n + 1; f := f * n // f = n! end end; // fact begin // C n := x; call fact(); // f = x! res := f; n := y; call fact(); // f = y! res := res / f; // res = x! / y! n := x - y; call fact(); // f = (x-y)! res := res / f // res = x! / (y! * (x-y!)) end; // C begin // main program x := 6; y := 2; call C(); write res end ``` - Given the code shown above, we have four scopes ```java 0 : Predefined "int" ↦ TypeEntry(INT_TYPE) "boolean" ↦ TypeEntry(BOOL_TYPE) "false" ↦ ConstEntry(BOOL_TYPE, 0) "true" ↦ OperatorEntry(PLUS_OP, ARITHMETIC_BINARY) "_+_" ↦ OperatorEntry(PLIS_OP, ARITHMETIC_BINARY) 1: Main "x" ↦ VarEntry(ReferenceType(INT_TYPE)) "y" ↦ VarEntry(ReferenceType(INT_TYPE)) "res" ↦ VarEntry(ReferenceType(INT_TYPE)) "C" ↦ ProcedureEntry() 2: C "n" ↦ VarEntry(ReferenceType(INT_TYPE)) "f" ↦ VarEntry(ReferenceType(INT_TYPE)) "fact" ↦ ProcedureEntry() 3: fact ```