Week 8

🏷️ Replace this section with some keywords that can be used for searching, tagging and the indexing of this page.

Sections

🌱 Links to all Heading1 Blocks in this page.

LALR(1) Parsing in Java-CUP
Symbol Table for LALR(1) Compilers

1.0 - LALR(1) Parsing in Java-CUP

The Java-CUP Parser Generator generates a LALR(1) Parser and its complementary parsing automaton.

1.1 - Running the Java-CUP Parser Generator

We defined this simple grammar before, in the Bottom-Up Parsing Lecture, in which we have the non-terminal symbols $A, (, )$ .
The grammar is defined by two productions
$A\rightarrow a \ |\ (A)$

We can define this in Java-CUP using the following syntax

import java_cup.runtime.*;
terminal a;
terminal LPAREN, RPAREN;
nonterminal A;
A ::= a | LPAREN A RPAREN;

When running the Java-CUP Parser Generator, we can use the -dump option to get it to display the LALR(1) Parsing Automaton it generates.

1.1.1 - Grammar Summary

The first thing in Java-CUP’s output (with the -dump option) is a summary of the grammar

===== Terminals =====
[0]EOF [1]error [2]a [3]LPAREN [4]RPAREN

===== Non terminals =====
[0] A

===== Productions =====
[0] A ::= a
[1] $$

START ::= A EOF [2] A ::= LPAREN A RPAREN

```

Terminals
- JavaCUP inserts two new terminal symbols - EOF symbol and error symbol.
Non-Terminals
Productions
- Our two productions from our original grammar are [0] and [2].
- JavaCUP inserts the production `

START :== A EOF` just as we did in the theory before - it introduces a new start symbol, `START`, and a new production where the new start symbol produces the original start symbol. - This production has a small technical difference to our lecture example - here JavaCUP inserts the EOF symbol at the end of the production. ### 1.1.2 - Automaton Summary - Viable Prefix Recogniser - For each state, we define the LR(1) parsing items - which are pairs of LR(0) parsing items, and lookahead sets. ```java ===== Viable Prefix Recogniser ===== START lalr_state [0]: { [A ::= (*) LPAREN A RPAREN, {EOF }} [

START ::= () A EOF, {EOF }] [A ::= () a, {EOF }] } transition on LPAREN to state [3] transition on a to state [2] transition on A to state [1]


### 1.1.3 - Action Table

- The Action Table is a summary of parsing actions
    
    ```java
    From state #0
     [term 2:SHIFT(to state 2)] [term 3:SHIFT(to state 3]
    ```
    
- In this case term x is given by the terminal symbol summary above:
    
    ```java
    ===== Terminals =====
    [0]EOF [1]error [2]a [3]LPAREN [4]RPAREN
    ```
    
    - Where term 2 = a, term 3 = LPAREN

### 1.1.4 - Reduce Table

- The Reduce Table is a summary of transitions out of each state on non-terminal symbols
- Once the parser generates the LALR(1) parsing automaton, it really only needs to execute the Action Table and Reduce Table.
    
    ```java
    -----------------------------
    -------- REDUCE_TABLE--------
    From state #0
     [non term 0->state 1] 
    From state #1
    ```
    

## 1.2 - Giving JavaCUP a Non-LALR(1) Grammar

- What happens when we give JavaCUP a non-LALR(1) grammar?
- This grammar is not LALR(1) as it is ambiguous
    
    
$$L\rightarrow \epsilon\ |\ E\ |\ L\ E\\
    E\rightarrow n\ |\ LPAREN\ L\ RPAREN$
    
- It’s defined in JavaCUP syntax as follows:
    
    ```java
    import jav_cup.runtime.*;
    terminal n;
    terminal LPAREN, RPAREN;
    nonterminal L, E;
    
    L ::= /* empty */ | E | L E;
    E ::= n | LPAREN L RPAREN;
    ```
    
- When we try to run this however, we get a series of error messages saying that JavaCUP has run into a whole host of action conflicts.
- Removing the ambiguity (i.e. removing the $L\rightarrow E$ production) gives us a LALR(1) grammar, and therefore, JavaCUP is able to generate a parsing automaton.

## 1.3 - Giving JavaCUP an Ambiguous Grammar

- Suppose we introduce the dangling-else problem
    
    ```java
    Statement ::= KW_WHILE Condition:c KW_DO Statement:s 
    			{:
    					RESULT = new StatementNode.WhileNode(cxleft, c, s);
    			:}
    	| ...
    
    	| KW_IF Condition:c KW_THEN Statement:s1 ElsePart:s2
    			{:
    					RESULT = new StatementNode.IfNode(cxleft, c, s1, s2);
    			:}
    
    ElsePart ::=  KW_ELSE Statement:s2
    			{:
    					RESULT = s2;
    			:}
    		| /* empty */
    			{:
    					RESULT = new StatementNode.SkipNodec(ErrorHandler.NO_LOCATION);
    			:}
    			%prec KW_IF
    		;
    
    ```
    
- When we attempt to create the parsing automaton for this ambiguous grammar, we get conflicts:
    
    ```java
    Error : *** More conflicts encountered than expected - parser generation aborted
    ------ Cup v0.11b 20160615 (GIT 4ac7450) Parser Generation Summary -------
    	1 error and 3 warnings
    	43 terminals, 33 non-terminals and 79 productions declared, 
    	producing 130 unique parsing states
    	...
    ```
    
- To solve this, we can either:
    - Come up with a better grammar that resolves the abiguity
    - Specify precedence in PL0.cup which tells JavaCUP how to resolve ambiguities
        - We can tell JavaCUP to match the else to the closest if using precedence operators.
- To specify precedence, we can write the following, which specifies that matching KW_ELSE has a higher precedence than matching KW_IF
    
    ```java
    precedence nonassoc KW_IF;
    precedence nonassoc KW_ELSE;
    ```
    
- Note that in the code block above, we use the syntax `%prec` KW_IF which essentially means that we give the null alternative the same precedence as KW_IF (meaning that matching KW_ELSE has a higher precedence).
    - Therefore, JavaCUP will always try to match the KW_ELSE if possible, and default to matching the empty alternative otherwise.
- Re-running the JavaCUP Parser Generator, we receive no errors, and the parsing automaton is generated properly.

## 1.4 - JavaCUP Syntax Error Recovery

- JavaCUP attempts to perform Syntax Error Recovery.
- When JavaCUP detects a syntax error, it tries to replace some of the input with an error terminal symbol and continues.
    - We can use this symbol to try to recover.

### 1.4.1 - JavaCUP Syntax Error Recovery in Practice - LValue

- Here, if we can’t match an LValue, we return an ExpNode.ErrorNode to try to handle errors that arise..
    
    ```java
    LValue ::= IDENTIFIER:id
    		{:
    				/* At this stage the identifier could be either a constant identifier or
    				 * a variable identifier but which cannot be determined until static 
             * checking when the IdentifierNode will be transformed into either
             * ConstNode or a VariableNode or detected as invalid.
    				 */
    				 RESULT = new ExpNode.IdentifierNode(idxleft, id);
    		:}
    	| error:e
    		{:
    				 RESULT = new ExpNode.ErrorNode(exleft);
    		:}
    ```
    
- We use LValues in Factors:
    
    ```java
    Factor ::= PLUS Factor:e
            {:
                RESULT = e; 
            :}
        |  UnaryOperator:op Factor:e
            {:
                RESULT = new ExpNode.UnaryNode(opxleft, op, e);
            :}
        |  LPAREN Condition:c RPAREN
            {:
                RESULT = c;
            :}
        |  NUMBER:n
            {:
                RESULT = new ExpNode.ConstNode(nxleft, 
                        Predefined.INTEGER_TYPE, n);
            :}
        |  LValue:lval
            {:
                RESULT = lval;
            :}
        ;
    ```
    
- Suppose we wanted to introduce an error alternative into Factor, if we couldn’t match a Factor:
    
    ```java
    		| error:e
    				{:
    						RESULT = new ExpNode.ErrorNode(exleft);
    				:}
    ```
    
    - However, this is the wrong thing to do, as an LValue already has an error alternative; this would introduce an ambiguity as the compiler doesn’t know whether to match the error alternative of LValue or Factor.
- Notice that since $\text{Term}\rightarrow\text{LValue}$, $\text{Factor}\rightarrow\text{LValue}$ and $\text{Exp}\rightarrow\text{LValue}$ none of these need an error alternative.
- In general, our approach is to put these error alternatives as deeply nested as possible in our syntax tree.
- In addition to $\text{LValue}$, a $\text{Statement}$ also has an error alternative.

# 2.0 - Symbol Table for LALR(1) Compilers.


🌱 This example is talking specifically about the Assignment 2 LALR(1) Compiler.



- The parser generates both an Abstract Syntax Tree and a Symbol Table.
- The type checker type checks both the AST and Symbol Table.

- The symbol tracks information about all of the identifiers defined in our program
- Generate symbol table for PL0, which is a procedural programming language
    - This means that our program is hierarchically structured into procedures
    - Therefore, our symbol table is hierarchically structured into scopes.

- Consider the following PL0 program.
- The main method has a scope, which keeps track of the following:
    - Declaration of variables `x`, `y` & `res`
    - Declaration of `procedure C`
    - The main method has a parent scope - this special predefined scope is the scope that contains all pre-defined constants and types in our program.
- The scope of `procedure C` keeps track of the following:
    - Declaration of variables `n` & `f`
    - Declaration of `procedure fact`
    - This scope’s parent scope is the main procedure’s scope.
- The scope of procedure fact has:
    - (no declarations)
    - Parent scope is the scope of procedure C
- The static level describes how deep a scope is
    
    
    | Static Lvl | Scope |
    | --- | --- |
    | 0 | Predefined Scope |
    | 1 | Main Method |
    | 2 | procedure C (& any procedure defined in main) |
    | 3 | procedure fact |
    | ... | ... |

```pascal
var x: int;  // first argument to C
    y: int;  // second argument to C
    res: int;  // result from C
procedure C() =
  var n: int;  // argument to fact
      f: int;  // result from fact
  procedure fact() =
    begin
      // assume 0 <= n
      if n = 0 then
        f := 1
        // f = n!
      else // 0 < n
       begin
        n := n - 1;
        call fact();
        // f = n!
        n:= n + 1;
        f := f * n
        // f = n!
       end
    end; // fact
  begin // C
    n := x;
    call fact();
    // f = x!
    res := f;
    n := y;
    call fact();
    // f = y!
    res := res / f;
    // res = x! / y!
    n := x - y;
    call fact();
    // f = (x-y)!
    res := res / f
    // res = x! / (y! * (x-y!))
  end; // C
begin // main program
  x := 6;
  y := 2;
  call C();
  write res
end

We can observe that the Scope method implements these features:

public class Scope {
    /**
     * Offset of start of local variables from frame pointer
     */
    final static int LOCALS_BASE = 3;
    /**
     * Parent Scope
     */
    private final Scope parent;
    /**
     * Static level of this scope
     */
    private final int level;
    /**
     * Symbol table entry for the procedure (main program) that owns this scope
     */
    private final SymEntry.ProcedureEntry ownerEntry;

Note that the ownerEntry is an interesting feature:
- The main procedure “owns” the variable declarations of x, y & res
- The procedure C “owns” the variable declarations of n & f
- The predefined procedure is a special case, as it doesn’t have a procedure that owns it.
```
var x: int;  // first argument to C
    y: int;  // second argument to C
    res: int;  // result from C
procedure C() =
  var n: int;  // argument to fact
      f: int;  // result from fact
```

Our Scope class also defines a SortedMap which contains all of our symbol table entries

These symbol table entries are defined in src/syms/SymEntry.java

/**
 * Symbol table entries. A SortedMap is used to avoid issues with
 * hashing functions working differently on different implementations.
 * This only affects minor things like the order of allocating variables
 * and dumping symbol tables in trace backs. A Map/HashMap would still give a
 * valid implementation.
 */
private final SortedMap<String, SymEntry> entries;

Additionally our Scope class has an integer variable that keeps track of the amount of space allocated for variables in this current scope.
```
/**
 * space allocated for local variables within this scope
 */
private int variableSpace;
```

Additionally, scope extensions have been defined (for implemeting “for” statements)

/**
 * true if this is an extension of its parent scope
 */
private boolean extension;

2.1 - Identifier Lookup in Scope

To look up an identifier in a symbol table, we first check whether it’s within the current scope.
If it is not, we try looking it up in the parent scope (which recursively checks through parent scopes until we either find it, or reach the predefined scope).

/**
 * Lookup id starting in the current scope and
 * thence in the parent scope and so on.
 *
 * @param id identifier to search for.
 * @return symbol table entry for the id, or null if not found.
 */
public SymEntry lookup(String id) {
    /* Lookup the entry in the current scope */
    SymEntry entry = entries.get(id);
    if (entry == null && parent != null) {
        /* If the entry is not in the current scope
         * look it up in the parent scope, if there is one.
         */
        return parent.lookup(id);
    }
    return entry;
}

2.1 - Setup of Predefined Scope

public class SymbolTable {
    private final Scope predefinedScope;

    /**
     * Construct a symbol table and build the predefined scope
     * as its initial scope.
     */
    public SymbolTable() {
        super();
        SymEntry.ProcedureEntry predefined =
                new SymEntry.ProcedureEntry("<predefined>",
                        ErrorHandler.NO_LOCATION, null);
        predefinedScope = new Scope(null, 0, predefined);
        predefined.setLocalScope(predefinedScope);
        Predefined.addPredefinedEntries(predefinedScope);
    }

The Symbol Table class sets up the predefined scope for us - that is, when we construct a new SymbolTable class instance, it configures the predefinedScope.
- We create a new SymTable.ProcedureEntry instance as a dummy procedure (predefined), and pass it to the Scope constructor,
Since a procedure needs to know its scope, and a scope its procedure, we use the SymTable.ProcedureEntry.setLocalScope method to set the local scope of the procedure entry.
We additionally want to populate the predefined scope with everything that should be in it - that’s what Predefined.addPredefinedEntries(Scope predefined) does.
- This method adds in a series of types, operators

predefined.addType("int", ErrorHandler.NO_LOCATION, INTEGER_TYPE);
predefined.addType("boolean", ErrorHandler.NO_LOCATION, BOOLEAN_TYPE);
predefined.addOperator(Operator.NEG_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_UNARY);
predefined.addOperator(Operator.ADD_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.SUB_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.MUL_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.DIV_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.EQUALS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.NEQUALS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.GREATER_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.LESS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.GEQUALS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.LEQUALS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);

2.2 - PL0 Parser - Setup of Predefined Scope

In PL0.cup, we need to set up both the predefined scope and the scope for the main method

2.2.1 - Parsing a Program

In setting up a new scope for the main method, we need to do a few things:
1. Create a new procedure entry for the main method
```
SymEntry.ProcedureEntry procMain = 
		            currentScope.addProcedure("<main>", ErrorHandler.NO_LOCATION);
```
2. We then test that everything has gone well, and that nothing weird has happened.
```
if(procMain  == null) {
    errors.fatal("Could not add main program to symbol table",
            ErrorHandler.NO_LOCATION);
}
```
3. We finally take our current scope, and add a new scope to it, for the main method
  - In doing this, we also update the current scope to be the scope of the main method.
  - It’s important that we do that before moving on and parsing other things (the next thing is a Block) as we want to parse these other things with respect to the main method’s scope.
```
currentScope = currentScope.newScope(procMain);
```

Program ::=  /* empty */ 
		    {:
		        /* This action occurs before the whole program is recognised.
		         * It constructs the initial symbol table with current scope the
		         * predefined scope. */
		        currentScope = new SymbolTable().getPredefinedScope();
		        /* Set up a dummy symbol table entry for the main program */
		        SymEntry.ProcedureEntry procMain = 
		            currentScope.addProcedure("<main>", ErrorHandler.NO_LOCATION);
		        if(procMain  == null) {
		            errors.fatal("Could not add main program to symbol table",
		                    ErrorHandler.NO_LOCATION);
		        }
		        /* Enter the scope for the main program and save the new local
		         * scope in main's symbol table entry */
		        currentScope = currentScope.newScope(procMain);
		    :}

2.2.2 - Parsing a Block

Block ::= DeclarationList:dl CompoundStatement:b
        {:
            RESULT = new StatementNode.BlockNode(bxleft, dl, b, currentScope);
        :}
    ;

A block is a DeclarationList followed by a CompoundStatement.

2.2.3 - Parsing a DeclarationList

DeclarationList ::= DeclarationList:dl ProcedureDef:p 
        {:
            /* Add a procedure declaration to the list of declarations */
            dl.addDeclaration(p);
            RESULT = dl;
        :}
    | DeclarationList:dl Declaration
        {:
            /* A non-procedure declaration is not added to the list
             * but added to the symbol table during its parsing. */
            RESULT = dl;
        :}
    | /* empty */
        {:
            RESULT = new DeclNode.DeclListNode();
        :}
    ;

As we parse each declaration, we append it to the declaration list.

Declaration ::= KW_CONST ConstDefSeq
    |  KW_TYPE TypeDefSeq
    |  KW_VAR  VarDeclSeq
    ;

These declarations can either be:
1. ProcedureDef (Procedure Definitions / Procedure Declarations) - these are produced by the first alternative in the DeclarationList production.
2. Constant Declarations
3. Type Declarations
4. Variable Declarations

2.2.4 - Adding Declarations to the Symbol Table

Consider the following declarations

const N = 10; M = -N;    // Constant Declaration
var x : T;               // Variable Declaration
type T = [1..N];         // Type Declaration

And their corresponding symbol table entries

\def\rq{\text{\textquoteright}} \begin{aligned} \lq N \rq &\mapsto \text{ConstEntry}(_, \text{NumberNode(INT_TYPE, 10))}\ \lq M \rq&\mapsto \text{ConstEntry}(_, \text{NegateConst(IdentConst(\lq N\rq)))}\ \lq x \rq &\mapsto \text{VarEntry(ReferenceType(IdRefType(\lq T\rq)))}\ \lq T\rq &\mapsto\text{TypeEntry(SubrangeType(_,NumberNode(INT_TYPE, 1),}\ &\text{ \ \ \ \ \ \ \ \ \ \ \ \ ConstIdNode(\lq N\rq, currentScope)))} \end{aligned}

l - Where $\text{INT\_TYPE}$ abbreviates $\text{INTEGER\_TYPE}$ in the compiler. - During the parsing phase, these declarations will be added to our symbol table - When we add these declarations to the symbol table, there is type information that hasn’t been resolved yet - this needs to get done during Type Checking. **Parsing a Constant**

\lq N \rq \mapsto \text{ConstEntry}(_, \text{NumberNode(INT_TYPE, 10))}\

- When parsing a constant, we add a new `ConstEntry(type, value)` to the symbol table. - However, at this stage, we don’t know what type it is (that’s what the first parameter is, and is currently left blank as it is still unknown). - Additionally, we haven’t yet resolved its value (here, denoted as a $\text{NumberNode}$). - We represent the constant as a small Abstract Syntax Tree (with the root being $\text{ConstEntry}$) - This AST will be later evaluated by the type checker - It is the role of the type checker to go through all of these declarations to evaluate the value. - A benefit of doing it this way is that some of our definitions depend on declarations after it - in this example, look at the declaration of the variable $x$, which depends on the type $T$. - Therefore, we collect all of our information in the parsing phase, and then resolve the values later. **Performing Type Checking** - After performing type checking, we know both the type and value of our constant declarations.

\def\rq{\text{\textquoteright}}
\begin{aligned}
\lq N \rq &\mapsto 
\text{ConstEntry({\color{lightblue}INT\_TYPE, 10})}\\
\lq M \rq&\mapsto \text{ConstEntry({\color{lightblue}INT\_TYPE, -10})}\\
\lq x \rq &\mapsto \text{VarEntry(ReferenceType(IdRefType(\lq T\rq)))}\\
\lq T\rq &\mapsto\text{TypeEntry(SubrangeType(\_,NumberNode(INT\_TYPE, 1),}\\
&\text{ 
\ \ \ \ \ \ \ \ \ \ \ \ ConstIdNode(\lq N\rq, currentScope)))}
\end{aligned}
$$

2.2.5 - Parsing Constant Definitions in JavaCUP

A sequence of constant definitions is just a series of Constant definitions.
```
ConstDefSeq ::= ConstDef
    |  ConstDefSeq ConstDef
    ;
```

A constant definition is of the form IDENTIFIER = {constant};

When we parse a constant definition, we attempt to add it to the current scope (i.e. add to the symbol table.
the addConstant method returns null if there is already an identifier with that name - in that case we throw an error.

ConstDef ::= IDENTIFIER:id EQUALS Constant:c SEMICOLON
        {:
            /* The attribute idxleft represents the location of the first
             * character of the IDENTIFIER token in the input stream.
             * addConstant returns null if id is already defined 
             * in the current scope */
            if(currentScope.addConstant(id, idxleft, c) == null) {
                errors.error(id + " already declared in this scope", idxleft);
            }
        :}
    |  error 
    ;

When we parse these constants, we return a small abstract syntax tree (as described before) that will be resolved in the type checking phase.

/* The rules for Constant construct a (mini) abstract syntax tree
 * for constant expressions (not to be confused with ExpNodes).
 */
Constant ::= NUMBER:n
        {: 
            RESULT = new ConstExp.NumberNode(nxleft, Predefined.INTEGER_TYPE, n);
        :}
    |  MINUS:op Constant:c
        {: 
            RESULT = new ConstExp.NegateNode(opxleft, c);
        :}
    |  IDENTIFIER:id
        {: 
            RESULT = new ConstExp.ConstIdNode(idxleft, id, currentScope);
        :}
    |  error:err
        {:
            RESULT = new ConstExp.ErrorNode(errxleft);
        :}
    ;

2.2.6 - Parsing Variables in JavaCUP

\def\rq{\text{\textquoteright}} \lq x \rq\mapsto \text{VarEntry(ReferenceType(IdRefType(\lq T\rq)))}

When we declare a variable, we know what the name of the variable is, and its type.
```
var x : T
```
At this point, we represent the type $T$ as $\def\rq{\text{\textquoteright}}\text{IdRefType(\lq T\rq)}$
- This essentially just denotes that we know that $T$ is some identifier, but we don’t know what type it is yet
- As it is just a placeholder until the static checking phase, it will be removed (and replaced) by the appropriate type during static checking.
After the static checking phase, the type is resolved, and the symbol table entry for the type becomes:
$\def\rq{\text{\textquoteright}} \lq T\rq\mapsto\text{\color{lightblue}TypeEntry(SubrangeType(INT\_TYPE, 1, 10))}$

Type ::= TypeIdentifier:type
        {: 
            RESULT = type;
        :}
    | LBRACKET:subr Constant:lo RANGE Constant:hi RBRACKET
        {:
            RESULT = new Type.SubrangeType(subrxleft, lo, hi);
        :}
    | LCURLY:e EnumerationList:elist RBRACKET
        {:
            RESULT = new Type.EnumerationType(exleft, elist, currentScope);
        :}
    ;

TypeIdentifier ::= IDENTIFIER:id
        {: /* As the type identifier may not be defined at this point.
            * IdRefType records the id, as well as the symbol table scope
            * to look it up during type resolution in the static checker.
            */
            RESULT = new Type.IdRefType(idxleft, id, currentScope);
        :}
    | error:err
        {:
            RESULT = Type.ERROR_TYPE;
        :}
    ;

Note that in the code above that when we parse a TypeIdentifier, we return a new Type.IdRefType just as we have discussed above.

\def\rq{\text{\textquoteright}} \lq T\rq \mapsto\text{TypeEntry(SubrangeType(\_,NumberNode(INT\_TYPE,1), }\\\text{ConstIdNode(\lq N\rq, currentScope)))}

In resolving the type above, we don’t yet know what the base type of the subrange is.
- However, at this point we do know its low value $\color{lightblue}\text{NumberNode(INT\_TYPE)}$ and high value $$ \def\rq{\text{\textquoteright}} \color{lightblue}\text{ConstIdNode(\lq N\rq, currentScope)}

```java LBRACKET:subr Constant:lo RANGE Constant:hi RBRACKET {: RESULT = new Type.SubrangeType(subrxleft, lo, hi); :} ``` - This is once again reflected in our code - When we parse a subrange type, we pass in the location, low value and high value of our subrange type - The type of the subrange is dependent on what types the low and high values resolve to - Since they both resolve to integers, the subrange has base type of integer. - Once we’ve resolved the type entry $T$, we can then resolve the AST entry $\text{IdRefType(\lq T\rq)}$ and therefore resolve the type of the variable $x$. # 3.0 - Static Checking ## 3.1 - Static Checking a Program - To perform static checking on a program, we perform static checking on the main procedure: ```java public void visitProgramNode(DeclNode.ProcedureNode node) { beginCheck("Program"); // The main program is a special case of a procedure visitProcedureNode(node); endCheck("Program"); ``` ## 3.2 - Static Checking a Procedure - The main procedure node is static checked as follows: ```java /** * Procedure, function or main program node */ public void visitProcedureNode(DeclNode.ProcedureNode node) { beginCheck("Procedure"); SymEntry.ProcedureEntry procEntry = node.getProcEntry(); // Save the block's abstract syntax tree in the procedure entry procEntry.setBlock(node.getBlock()); // The local scope is that for the procedure. Scope localScope = procEntry.getLocalScope(); /* Resolve all references to identifiers within the declarations. */ localScope.resolveScope(); // Enter the local scope of the procedure currentScope = localScope; // Check the block of the procedure. visitBlockNode(node.getBlock()); // Restore the symbol table to the parent scope currentScope = currentScope.getParent(); endCheck("Procedure"); } ``` - We first resolve the (local) scope of the procedure. - The resolveScope() method resolves the type of each entry inside the symbol table (where beforehand, it was a mini AST). - We also statically check the block of code inside that procedure - However, we need to statically check it with respect to the scope of the procedure, so we update the currentScope before static checking the block. ```java currentScope = localScope; visitBlockNode(node.getBlock()); ``` - After we finish checking the block, we return to the parent scope ```java currentScope = currentScope.getParent(); ``` ### 3.2.1 - Detecting Cyclic Definitions - Consider the following declaration in which we cyclic definitions of both constants and types. ```pascal const N = M; M = -N; type S = T; T = S; ``` - If we detect a cyclic definition in the type resolution, the compiler should stop and warn the programmer. - In the case where we don’t have cyclic definitions, but definitions that depend on each other, we need to be careful about the order in which we resolve the types. - Consider the following case in which the variable x depends on the type T ```pascal const N = 10; M = -N; var x : T; type T = [1..N]; ``` # 4.0 - Resolving Symbol Table Entries - In our symbol table entry class, SymEntry we have the method resolve() intended to resolve the type of any symbol table entry. - The default implementation is listed below, in which it tries to resolve the entry if it hasn’t yet been resolved. ```java /** * Resolve any references to type identifiers in supplied scope */ public void resolve() { if (!resolved) { type = type.resolveType(); resolved = true; } } ``` - For the ConstEntry and VarEntry, this resolve method is re-defined. ## 4.1 - Resolving a Variable Entry - In resolving a variable entry, we: - Call the base resolve method - Determine the type of the variable - Allocate space for the variable, based on its type. ```java public static class VarEntry extends SymEntry { /** * resolving a variable requires allocating space for it */ @Override public void resolve() { if (!resolved) { // resolve the type of the variable super.resolve(); /* Space is allocated for the variable and the address of that * location placed in the entry for the variable. * The space allocated depends on the size of its type. */ Type baseType = ((Type.ReferenceType) type).getBaseType(); offset = scope.allocVariableSpace(baseType.getSpace()); } } } ``` - To resolve a variable entry, we need to resolve the type - how do we do this? ### 4.1.1 - Parsing a Variable ```java VarDecl ::= IDENTIFIER:id COLON TypeIdentifier:type SEMICOLON {: // Variables are always of ReferenceType. if(currentScope.addVariable(id, idxleft, type) == null) { errors.error(id + " already declared in this scope", idxleft); } :} | error ; ``` - When we parse a variable, we use the addVariable method of the current scope. ```java /** * Add a VARIABLE entry to the current scope. * * @return a reference to the new entry unless an entry with the same name * already exists in the current scope, in which case return null. */ public SymEntry.VarEntry addVariable(String name, Location loc, Type type) { SymEntry.VarEntry entry = new SymEntry.VarEntry(name, loc, type); return (SymEntry.VarEntry) addEntry(entry); } ``` - The addVariable method creates a new symbol table entry instance, with the provided location and type. - When we construct a new instance of SymEntry.VarEntry, it sets the type as ReferenceType(t) ```java /** * Symbol table entry for a variable identifier */ public static class VarEntry extends SymEntry { /** * offset of variable starting from 0 */ int offset; public VarEntry(String id, Location loc, Type t) { super(id, loc, new ReferenceType(t), false); readOnly = false; } } ``` - Therefore, when we specify our Java CUP syntax, we don’t want to pass it a `ReferenceType(t)`, but rather t itself (as the construction of a reference type is handled by the `SymEntry.VarEntry`. - If we were parsing the following variable definition, the type at this point would be set to $\text{IdRefType(\lq T\rq)}$

\def\rq{\text{\textquoteright}}
\lq x\rq\mapsto\text{VarEntry(ReferenceType(IdRefType(\lq T\rq)))}
$$

Additionally, observe that the Java CUP definition for a TypeIdentifier returns a new Type.IdRefType.

TypeIdentifier ::= IDENTIFIER:id
        {: /* As the type identifier may not be defined at this point.
            * IdRefType records the id, as well as the symbol table scope
            * to look it up during type resolution in the static checker.
            */
            RESULT = new Type.IdRefType(idxleft, id, currentScope);
        :}
    | error:err
        {:
            RESULT = Type.ERROR_TYPE;
        :}
    ;

4.1.2 - Resolving a Variable Type.

In resolving a SymEntry that is a variable type, SymEntry.resolve() is called.

/**
 * Resolve any references to type identifiers in supplied scope
 */
public void resolve() {
    if (!resolved) {
        type = type.resolveType();
        resolved = true;
    }
}

If the symbol table entry is a variable, and it hasn’t been resolved then the type will be an $\text{IdRefType}$ .
Therefore, type.resolveType() calls Type.IdRefType.resolveType().

This is another case where we must be careful of cyclic definitions.

If our type is unresolved, we try to look up the type in the symbol table
- If the type exists in the symbol table, we resolve the symbol table entry.
- Otherwise, we throw an error, as the type has not yet been defined.

/**
 * Resolve the type identifier and return the real type.
 */
@Override
public Type resolveType() {
    // System.out.println("Resolving type id " + id);
    switch (status) {
        case Unresolved:
            status = Status.Resolving;
            realType = ERROR_TYPE;
            SymEntry entry = scope.lookupType(id);
            if (entry != null) {
                /* resolve identifiers in the referenced type */
                entry.resolve();
                /* if status of this entry has resolved then there was a
                 * circular reference and we leave the realType as
                 * ERROR_TYPE to avoid other parts of the compiler getting
                 * into infinite loops chasing types.
                 */
                if (status == Status.Resolving) {
                    realType = entry.getType();
                }
                assert realType != null;
            } else {
                errors.error("undefined type: " + id, loc);
            }
            status = Status.Resolved;
            break;
        case Resolving:
            errors.error(id + " is circularly defined", loc);
            /* Will resolve to ERROR_TYPE */
            status = Status.Resolved;
            break;
        case Resolved:
            /* Already resolved */
            break;
    }
    // System.out.println("Resolved type id " + name + " to " + realType);
    return realType;
}

We now need to resolve a SymEntry.TypeEntry (that is the object returned by the scope.lookupType(id) method called in IdRefType.resolve() above.

4.1.3 - Resolving a Subrange Type

Suppose we have the following type that we want to resolve - how do we do this?

4.2 - Resolving a Constant Entry

The resolution of a constant entry can have three possible states:

Unresolved The constant entry hasn’t yet been resolved, but we are now resolving it.
Resolving We are in the process of resolving the constant entry - we get into the “Resolving” state when we are currently in the process of resolving another constant entry - which requires the current constant entry’s type to be resolved.
- This only occurs when there is a circular reference, so we report the error.
Resolved There is nothing to do, as the entry is resolved.

public static class ConstantEntry extends SymEntry {
		/**
     * Resolve references to constant identifiers and evaluate expression
     */
    @Override
    public void resolve() {
        switch (status) {
            case Unresolved:
                status = Status.Resolving;
                value = tree.getValue();
                type = tree.getType();
                status = Status.Resolved;
                resolved = true;
                break;
            case Resolving:
                error("circular reference in constant expression", loc);
                status = Status.Resolved;
                resolved = true;
                break;
            case Resolved:
                break;
        }
    }
}

If the current status is “Unresolved”, how do we resolve a constant entry
- We evaluate the value of the variable “tree” which is of type ConstExp
- The ConstExp type is essentially a small abstract syntax tree representing the value and type of the constant node.
- The ConstExp.getValue method is defined as follows:
```
public int getValue() {
    if (status == Status.Unresolved) {
        evaluate();
    }
    return value;
}
```
- The evaluate() method doesn’t have a default implementation; the specific method that gets called depends on what type of ConstExp has been defined

4.2.1 - Resolving a `ConstEntry(NumberNode(...))`

Before, we looked at the example where a constant was simply a NumberNode, defined as follows:
$\def\rq{\text{\textquoteright}} \lq N\rq \mapsto \text{ConstEntry(\_, NumberNode(INT\_TYPE, 10)}$
NumberNode types are trivial to resolve, as we already know their type and value.
Therefore, the NumberNode type doesn’t have an implementation of the resolve method - it is already known.
In resolving it, we just set the type and value of the ConstEntry to the value of that NumberNode
This lookup is done by the ConstantEntry.resolve() method listed above:
```
value = tree.getValue();
type = tree.getType();
```

4.2.2 - Resolving a `ConstEntry(NegateConst(...))`

Before, we looked at an example where a constant was a negation node, defined as follows:
$\def\rq{\text{\textquoteright}} \lq N\rq\mapsto \text{ConstEntry(\_, NegateConst(IdentConst(\lq N\rq))}$

To resolve the type of a negation node, we first need to resolve the type of it’s subexpression

/**
 * A constant expression consisting of a negated constant expression
 */
public static class NegateNode extends ConstExp {
    private final ConstExp subExp;

    public NegateNode(Location loc, ConstExp subExp) {
        super(loc, Status.Unresolved);
        this.subExp = subExp;
    }

    @Override
    protected void evaluate() {
        type = subExp.getType();
        if (type != Predefined.INTEGER_TYPE) {
            errors.error("can only negate an integer", loc);
        } else {
            value = -subExp.getValue();
        }
        status = Status.Resolved;
    }
}

In this particular instance, our sub-expression is an instance of $\def\rq{\text{\textquoteright}}\text{IdentConst(\lq N\rq)}$ .
Note that in our evaluation method for a NegateNode we call subExp.getType() - this method essentially calls
Following our constant identifier entry instance $$ \def\rq{\text{\textquoteright}} \text{IdentConst(\lq N\rq)}

which we need to resolve to get the type. - In this specific case, our identifier entry type has already been resolved - We then set the type of the $\text{NegateNode}$ to that of the $\text{IdentConst}$ - If the type isn’t an integer we throw an error. - Otherwise, we set it to the negative of the subexpression and set the status of the node to resolved. ### 4.2.3 - Resolving a ConstIdNode 🌱 A `ConstIdNode` is a constant expression consisting of a reference to an identifier. - In resolving a Constant identifier node, we first attempt to look up the identifier in the symbol table. - We then check if it’s a constant entry. Else, throw an error. - We then evaluate (and set) its type and value - We finish by setting its status to resolved. ```java protected void evaluate() { // System.out.println( " ConstExp Resolving " + id + " " + status ); switch (status) { case Unresolved: status = Status.Resolving; SymEntry entry = scope.lookup(id); if (entry instanceof SymEntry.ConstantEntry) { SymEntry.ConstantEntry constEntry = (SymEntry.ConstantEntry) entry; type = constEntry.getType(); value = constEntry.getValue(); status = Status.Resolved; } else { errors.error("Constant identifier expected", loc); } break; case Resolving: errors.error(id + " is circularly defined", loc); /* Will resolve to error type and silly value. * Set to Resolved to avoid repeated attempts to * resolve the unresolvable, and hence avoid * unnecessary repeated error messages. */ status = Status.Resolved; break; case Resolved: /* Already resolved */ break; } } ``` ## 4.2 - Example Code and Scope. ```pascal var x: int; // first argument to C y: int; // second argument to C res: int; // result from C procedure C() = var n: int; // argument to fact f: int; // result from fact procedure fact() = begin // assume 0 <= n if n = 0 then f := 1 // f = n! else // 0 < n begin n := n - 1; call fact(); // f = n! n:= n + 1; f := f * n // f = n! end end; // fact begin // C n := x; call fact(); // f = x! res := f; n := y; call fact(); // f = y! res := res / f; // res = x! / y! n := x - y; call fact(); // f = (x-y)! res := res / f // res = x! / (y! * (x-y!)) end; // C begin // main program x := 6; y := 2; call C(); write res end ``` - Given the code shown above, we have four scopes ```java 0 : Predefined "int" ↦ TypeEntry(INT_TYPE) "boolean" ↦ TypeEntry(BOOL_TYPE) "false" ↦ ConstEntry(BOOL_TYPE, 0) "true" ↦ OperatorEntry(PLUS_OP, ARITHMETIC_BINARY) "_+_" ↦ OperatorEntry(PLIS_OP, ARITHMETIC_BINARY) 1: Main "x" ↦ VarEntry(ReferenceType(INT_TYPE)) "y" ↦ VarEntry(ReferenceType(INT_TYPE)) "res" ↦ VarEntry(ReferenceType(INT_TYPE)) "C" ↦ ProcedureEntry() 2: C "n" ↦ VarEntry(ReferenceType(INT_TYPE)) "f" ↦ VarEntry(ReferenceType(INT_TYPE)) "fact" ↦ ProcedureEntry() 3: fact ```