Week 3.2

🏷️ Replace this section with some keywords that can be used for searching, tagging and the indexing of this page.

Sections

🌱 Links to all Heading1 Blocks in this page.

Recursive Descent Parsing
Static Semantics of PL0
Symbol Table
1. Well-Typed Expressions
2. Static Semantic Rules
3. Well-Formed Rules
Implementation of Static Semantic Checking

1.0 - Recursive Descent Parsing

1.1 - Tutorial 4 Recursive Descent Parser Code

🌱 So we’ve got some Java code that can parse PL0 code. How do we get it to actually parse the code?

The src/parse/Parser.java file contains the definition for the Parser class.
When opened in IntelliJ, we have a run configuration that is configured to run on the file that is currently open.
We are essentially running the PL0 Recursive Descent Parser, pl0.PL0_RD on the file that’s currently open (denoted by $FilePath$ ) in the current directory $MODULE_WORKING_DIRS$
Suppose we open the test-base0-abs.pl0 file in the test-pgm directory of the project root and compile it:

Untitled

Compiling test-base0-abs.pl0
Parsing complete
Static semantic analysis complete
Running ...
100

var x: int;
    y: int;
begin
  x := -100;
  if x < 0 then y := -x else y := x;
  write y
end

Parsing complete indicates that the program has no syntactical errors.
Static semantic analysis complete indicates that there are no type checking errors

🌱 The PL0 Recursive Descent Parser implemented here is using the error recovery strategy described in the previous lecture. What would the output be if we intentionally mess up the code?

Suppose in the source code we replace the assignment operator := with =

var x: int;
    y: int;
begin
  x = -100; // Replaced :=  with =
  if x < 0 then y := -x else y := x;
  write y
end

Compiling test-base0-abs.pl0
    4   x = -100;
*****     ^  Error: Parse error, expecting ':=' in Assignment
Parsing complete
Static semantic analysis complete
1 error detected.

What if we had two errors in our source code:
1. The replacement from before
2. The extra then keyword

var x: int;
    y: int;
begin
  x = -100; // Replaced :=  with =
  if x < 0 then then y := -x else y := x; // Extra then keyword.
  write y
end

Compiling test-base0-abs.pl0
    4   x = -100;
*****     ^  Error: Parse error, expecting ':=' in Assignment
    5   if x < 0 then then y := -x else y := x;
*****                 ^ Error: 'then' cannot start a statement.
Parsing complete
Static semantic analysis complete
2 errors detected.

1.2 - Tutorial 4 PL0 Parser Implementation

🌱 Specifically looking at the implementation of the PL0 parser for Expressions and Statements.

This parser is based on the error recovery scheme that was discussed in the previous lecture.
To implement this, we have a series of parse methods that perform the synchronisation for us at the start and end of each parse method for Expressions, Statements etc.

private final ParseMethod<ExpNode> exp = new ParseMethod<>(
    (Location loc) -> new ExpNode.ErrorNode(loc));

Configure exp.parse to return an ExpNode
If it encounters an error during the synchronisation phase, return an ExpNode.ErrorNode to indicate that there was a problem.

1.2.1 - Constant Declarations

🌱 Create constants that represent the start sets for our non-terminal symbols.

/**
 * Set of tokens that may start an LValue.
 */
private final static TokenSet LVALUE_START_SET =
        new TokenSet(Token.IDENTIFIER);
private final static TokenSet FACTOR_START_SET =
        LVALUE_START_SET.union(Token.NUMBER, Token.LPAREN);
private final static TokenSet TERM_START_SET =
        FACTOR_START_SET;
private final static TokenSet EXP_START_SET =
        TERM_START_SET.union(Token.PLUS, Token.MINUS);
private final static TokenSet REL_CONDITION_START_SET =
        EXP_START_SET;
private final static TokenSet CONDITION_START_SET =
        REL_CONDITION_START_SET;

1.2.2 - ParseRelCondition

private ExpNode parseRelCondition(TokenSet recoverSet) {
    return exp.parse("RelCondition", REL_CONDITION_START_SET, recoverSet,
            () -> {
                /* The current token is in REL_CONDITION_START_SET */
                ExpNode cond = parseExp(recoverSet.union(REL_OPS_SET));
                if (tokens.isIn(REL_OPS_SET)) {
                    Location loc = tokens.getLocation();
                    Operator operatorCode =
                            parseRelOp(recoverSet.union(EXP_START_SET));
                    ExpNode right = parseExp(recoverSet);
                    cond = new ExpNode.BinaryNode(loc, operatorCode, cond, right);
                }
                return cond;
            });
}

Note that TokenSet recoverSet is passed as a parameter - this is required as we’re doing error recovery.
Use exp.parse(...) to perform the synchronisation at the start and end of the parsing for us. The parameters to the method are:
1. Name This is used in the debugger to print out more meaningful messages of what failed.
2. StartSet The set of tokens from which a RelCondition can start with
3. RecoverSet The set of tokens from which a RelCondition can start with
4. Anonymous Function A function that does parsing for the RelCondition
5. The Anonymous Function is used to define the behaviour of the parser for a RelCondition (in this case)
Since we want to build up the AST representation of the code that we’re parsing, we want to return an ExpNode
- So rather than using parse(...) we use return exp.parse(...)
- A RelCondition is defined by the following production:
  $\text{RelCondition}\rightarrow\text{Exp [RelOp Exp]}$
- If the RelCondition is just an Expression, we can just use the parse method and return it
```
ExpNode cond = parseExp(recoverSet.unino(REL_OPS_SET));
...
return cond;
```
- Otherwise, we have to parse the $\text{[RelOp Exp]}$ component of the production
```
ExpNode cond = parseExp(recoverSet.union(REL_OPS_SET));
if (tokens.isIn(REL_OPS_SET)) { // I.e. are we at the start of a RelOp
    Location loc = tokens.getLocation(); // Required for BinaryNode constructor
    Operator operatorCode =
            parseRelOp(recoverSet.union(EXP_START_SET));
    ExpNode right = parseExp(recoverSet);
    cond = new ExpNode.BinaryNode(loc, operatorCode, cond, right);
}
return cond;
```
  - The parseRelOp operation is defined below here
  - Note that the recoverSet is anything that a RelCondition can start with, and the set of tokens that can follow what we’re parsing
    - So when we’re parsing the first Expression in the production, the recoverSet added with the RelOp start set is what we use as the recover set in this instance.
    - When we’re parsing the RelOp, the recoverSet added with the Expression start set is what we use as the recover set in this instance.
    - When we’re parsing the final expression, the recoverSet is just the set of tokens that the production can start with (as there is nothing that follows)
  - After we finish parsing the tokens in the production, we can create the BinaryNode and return it.

1.2.3 - ParseRelOp

🌱 Note that the parseRelOp method doesn’t return an ExpNode - it returns an instance of Operator.

The parseRelOp method must parse the following production
$\small\text{RelOp}\rightarrow\text{EQUALS | NEQUALS | LESS | GREATER | LEQUALS | GEQUALS}$
To do this, we initialise a parse method that returns an Operator:
```
private final ParseMethod<Operator> op = new ParseMethod<>(
            (Location loc) -> Operator.INVALID_OP);
```
- Note here that if there’s an error, we return an Operator.INVALID_OP instead of an ExpNode.ErrorNode
- Also note that the tree package contains all implementations of the nodes used in an Abstract Syntax Tree (AST), such as ExpNode and StatementNode

We then define the parseRelOp method itself

private Operator parseRelOp(TokenSet recoverSet) {
    return op.parse("RelOp", REL_OPS_SET, recoverSet,
            () -> {
                Operator operatorCode = Operator.INVALID_OP;
                switch (tokens.getKind()) {
                    case EQUALS:
                        operatorCode = Operator.EQUALS_OP;
                        tokens.match(Token.EQUALS); /* cannot fail */
                        break;
                    case NEQUALS:
                        operatorCode = Operator.NEQUALS_OP;
                        tokens.match(Token.NEQUALS); /* cannot fail */
                        break;
                    case LESS:
                        operatorCode = Operator.LESS_OP;
                        tokens.match(Token.LESS); /* cannot fail */
                        break;
                    case GREATER:
                        operatorCode = Operator.GREATER_OP;
                        tokens.match(Token.GREATER); /* cannot fail */
                        break;
                    case LEQUALS:
                        operatorCode = Operator.LEQUALS_OP;
                        tokens.match(Token.LEQUALS); /* cannot fail */
                        break;
                    case GEQUALS:
                        operatorCode = Operator.GEQUALS_OP;
                        tokens.match(Token.GEQUALS); /* cannot fail */
                        break;
                    default:
                        // If we get to here, there has been a fatal implementation 
                        // error in the cmocompiler.
                        fatal("parseRelOp");
                }
                return operatorCode;
            });
}

Note here that we could choose conditional statements to implement the choice in this production
- However, here we use a switch statement as all of the branch have a start set that contains just one token.
- If we get to the default: case, there is a fatal error, as these are all of the possible terminal symbols for the RelOp non-terminal symbol.

1.3 - StatementNodes

🌱 Parsing Statements

As before, we define a ParseMethod that returns a StatementNode

private final ParseMethod<StatementNode> stmt = new ParseMethod<>(
        (Location loc) -> return new StatementNode.ErrorNode(loc));

Returns a StatementNode.ErrorNode if an error is encountered when parsing

1.3.1 - Parsing a Conditional Statement / If-Statement

private StatementNode parseIfStatement(TokenSet recoverSet) {
  return stmt.parse("If Statement", Token.KW_IF, recoverSet,
      () -> {
          /* The current token is KW_IF */
          tokens.match(Token.KW_IF); /* cannot fail */
          Location loc = tokens.getLocation();
          ExpNode cond = parseCondition(recoverSet.union(Token.KW_THEN));
          tokens.match(Token.KW_THEN, STATEMENT_START_SET);
          StatementNode thenClause =
              parseStatement(recoverSet.union(Token.KW_ELSE));
          tokens.match(Token.KW_ELSE, STATEMENT_START_SET);
          StatementNode elseClause = parseStatement(recoverSet);
          return new StatementNode.IfNode(loc, cond, thenClause, elseClause);
      });
}

We use the stmt.parse(...) method to perform synchronisation at both the start and end of the parsing
Supply as arguments
1. Name of rule,
2. Start set
3. Recover set (provided as parameter)
4. Anonymous parsing function
In the anonymous parsing function, we want to parse the production
$\scriptsize\text{IfStatement}\rightarrow\text{KW\_IF Condition KW\_THEN Statement KW\_ELSE Statement}$
- This line cannot fail as it should only be called if the current token is Token.KW_IF
  - Therefore, don’t need to use the error recovery version.
```
tokens.match(Token.KW_IF); // Match and consume
```
- We then store the current location for later.
  - This is done, as we essentially want to treat the condition of the if statement as the location of the Condition itself (instead of the location of Token.KW_IF)
```
Location loc = tokens.getLocation();
```
- Following this, we parse the condition itself.
  - The recoverSet in this case is anything that the if statement can start with and the next token in the production (KW_THEN since we’re parsing Condition)
```
ExpNode cond = parseCondition(recoverSet.union(Token.KW_THEN));
```
- We then parse the KW_THEN token - we need to use the version with error recovery, as we can’t guarantee that the next token is KW_THEN.
  - The set of tokens that can immediately follow KW_THEN are contained in the TokenSet, STATEMENT_START_SET
```
tokens.match(Token.KW_THEN, STATEMENT_START_SET);
```
- After we parse the KW_THEN token, we parse the statement immediately following it (the then clause).
  - We store this so that we can use it to build up our AST
```
StatementNode thenClause = parseStatement(recoverSet.union(Token.KW_ELSE));
```
- Next, we’re expecting the KW_ELSE token but we’re not sure so we use the error recovery version of tokens.match
```
tokens.match(Token.KW_ELSE, STATEMENT_START_SET);
```
- After we parse the KW_ELSE token we parse the statement immediately following it (the then clause).
  - Here, the recoverSet is just the recoverSet parameter passed in as we are at the end of the production
```
StatementNode elseClause = parseStatement(recoverSet);
```
- We then construct the StatementNode.IfNode object and return that as part of our AST
```
return new StatementNode.IfNode(loc, cond, thenClause, elseClause);
});
```

1.3.2 - Parsing a While Statement

From the parseStatement function, we have:

private StatementNode parseStatement(TokenSet recoverSet) {
    return stmt.parse("Statement", STATEMENT_START_SET, recoverSet,
        () -> {
            /* The current token is in STATEMENT_START_SET.
             * Instead of using a cascaded if-the-else, as indicated in
             * the recursive descent parsing notes, a simpler approach
             * of using a switch statement can be used because the
             * start set of every alternative contains just one token. */
            switch (tokens.getKind()) {
                case IDENTIFIER:
                    return parseAssignment(recoverSet);
                case KW_WHILE:
                    return parseWhileStatement(recoverSet);
                ...
                default:
                    fatal("parseStatement");
                    // To keep the Java compiler happy - can't reach here
                    return new StatementNode.ErrorNode(tokens.getLocation());
            }
        });
}

So when we call parseWhileStatement we know that the token is Token.KW_WHILE.
We still perform synchronisation so that (a) the format is consistent (b) synchronisation is performed at the end.

The production for a While Statement is given as:

\scriptsize\text{WhileStatement}\rightarrow\text{KW\_WHILE Condition KW\_DO Statement}

private StatementNode parseWhileStatement(TokenSet recoverSet) {
    return stmt.parse("While Statement", Token.KW_WHILE, recoverSet,
            () -> {
                /* The current token is KW_WHILE */
                tokens.match(Token.KW_WHILE); /* cannot fail */
                Location loc = tokens.getLocation();
                ExpNode cond = parseCondition(recoverSet.union(Token.KW_DO));
                tokens.match(Token.KW_DO, STATEMENT_START_SET);
                StatementNode statement = parseStatement(recoverSet);
                return new StatementNode.WhileNode(loc, cond, statement);
            });
}

1.4 - Types of AST Nodes

1.4.1 - Expression Nodes

The ExpNode class in the tree package implements subclasses for each of the subtypes of ExpNodes
- ErrorNode, ConstNode, IdentifierNode, VariableNode, BinaryNode, UnaryNode, DereferenceNode, NarrowSubrangeNode, WidenSubrangeNode
- Each of these nodes have a location and type as well as their own individual parameters
  - For example, a BinaryNode has the constructor:
```
public BinaryNode(Location loc, Operator op, ExpNode left, ExpNode right);
```
- We define more constant declarations that will help us with parsing statements:
```
/**
 * Set of tokens that may start a Statement.
 */
private final static TokenSet STATEMENT_START_SET =
        LVALUE_START_SET.union(Token.KW_WHILE, Token.KW_IF,
                Token.KW_READ, Token.KW_WRITE,
                Token.KW_CALL, Token.KW_BEGIN);
```

2.0 - Static Semantics of PL0

🌱 If implemented correctly, the static semantic checker of our compiler should pick up on all of the type errors.

Consider the following code - it is syntactically correct but it is not semantically / type correct.

const C = 42;
type  S = [-C..C];
var   b : boolean;
      y : S;
begin // main
  y := b + 42;   // Addition between boolean and integer
  C := 27;       // Assigmment to a constant
  if y then y := 0 else y := 1 // y as condition (subrange type) is not boolean
end

2.1 - PL0 Concrete vs Abstract Syntax

🌱 Our static semantics rules determine what we allow (and conversely, what we don’t allow) in our programming language grammar

The parsing phase of the compiler parses the concrete syntax of the program and generates an abstract syntax tree
The (concrete) syntax of a programming language describes the form of the language when it is treated as a sequence of terminal symbols (lexical tokens)
The abstract syntax of a language represents the same constructs as a data structure, an abstract syntax tree
- All static semantics rules are expressed using the abstract syntax of the language

2.1.1 - Abstract Syntax of PL0

\text{program\ \ \ \ ::==\ \ \ \ block}\\ \text{block\ \ \ \ ::== blk(ds,s)}\\ \text{ds\ \ \ \ ::== id}\mapsto \text{d}\\ \text{d\ \ \ \ ::==\ \ \ \ const(c)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ | type(t)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ | var(t)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | proc(block)}\\ \text{c\ \ \ \ ::== n | id | op(-\_,c)}\\ \text{t\ \ \ \ ::==\ \ \ \ id | [c..c]}\\ \text{s\ \ \ \ ::== assign(lv,e)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | write(e)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | read(lv)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | call(lv)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | if(e,s,s)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | while(e,s)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | list(seq s)}\\ \text{lv\ \ \ \ ::==\ \ \ \ id}\\ \text{e\ \ \ \ :==\ \ \ \ n | lv | op(unary, e) | op(binary, (e,e))}\\ \text{unary\ \ \ \ ::==\ \ \ \ -\_}\\ \text{binary\ \ \ \ ::==\ \ \ \ \_+\_ | \_-\_ | \_*\_ | \_/\_ | \_=\_ | \_}\ne\text{\_ | \_<\_ | \_}\le\text{\_ | \_>\_ | \_}\ge\text{\_}

$\color{lightblue}\text{program\ \ \ \ ::==\ \ \ \ block}$ A program is a block
$\color{lightblue}\text{block\ \ \ \ ::==\ \ \ \ blk(ds,s)}$ A block has both declarations (ds) and statements (s)
$\color{lightblue}\text{ds\ \ \ \ ::==\ \ \ \ id}\mapsto\text{d}$ A declaration is a mapping from identifiers to individual declarations
$\color{lightblue}\text{d\ \ \ \ ::==\ \ \ \ const(c) | type(t) | var(t) | proc(block)}$ $d ::== const(c) | type(t) | var(t) | proc(block)$ Each individual declaration is either:
- A constant
- A type
- A variable
- A procedure
$\color{lightblue}\text{c\ \ \ \ ::== n | id | op(-\_,c)}$ A constant has a parameter c, which is either a number, identifier, or unary operator applied to a number
$\color{lightblue}\text{t\ \ \ \ ::==\ \ \ \ id | [c..c]}$ A type $\text{t}$ is either an identifier or subrange from one constant to another

\color{lightblue}{\text{s\ \ \ \ ::== assign(lv,e) | write(e)}\ \text{ | read(lv)}\ \text{ | call(lv)}\ \text{ | if(e,s,s)}\ \text{ | while(e,s)}\ \text{ | list(seq s)}\}

A statement is either an assignment, a write operation, read operation, call operation, conditional statement, while loop or a list of sequences - $\color{lightblue}lv\ \ \ \ ::==\ \ \ \ id$ A left value is an identifier - $\color{lightblue} \text{e\ \ \ \ :==\ \ \ \ n | lv | op(unary, e) | op(binary, (e,e))}$ Expressions are either numbers, left values, unary operators or binary operators. - $\color{lightblue}\text{unary\ \ \ \ ::==\ \ \ \ -\_}$ Our starting unary operator is the $\color{lightblue}-$ symbol (the underscore indicates where arguments shall go.) - $\color{lightblue}\text{binary\ \ \ \ ::==\ \ \ \ \_+\_ | \_-\_ | \_*\_ | \_/\_ | \_=\_ | \_}\ne\text{\_ | \_<\_ | \_}\le\text{\_ | \_>\_ | \_}\ge\text{\_}$ The binary operators in PL0 are indicated above, where the underscore indicates where arguments shall go ### 2.1.2 - Abstract Syntax Example 🌱 Consider the following code, and its’ abstract syntax. ```pascal const C = 42; type S = [-C..C]; var b : boolean; y : S; ``` - From this, let’s construct it’s abstract syntax - Let’s first map out the declarations, $\color{lightblue}\text{ds}$

\color{lightblue}\text{ds}=\{C\mapsto\text{const(42),}\\
\text{S}\mapsto\text{type([op(-\_,C)..C]}\\
\text{b}\mapsto\text{var(boolean),}\\
\text{y}\mapsto\text{var(S)}

\}
$$

Next, let’s map out the statements:
$\color{lightblue}\text{s}=\text{assign(y, op(\_+\_,(op(-\_,C),27)))}$
- What does this actually mean?
1. We assign the value of $\color{lightblue}\text{op(\_+\_,(op(-\_,C),27))}$ to the variable $\color{lightblue}\text{y}$
2. In the statement $\color{lightblue}\text{op(\_+\_,(op(-\_,C),27))}$ , we are applying the binary operator $\color{lightblue}\text{+}$ to two arguments - $\color{lightblue}\text{op(-\_,C)}$ and $\color{lightblue}\text{27}$ .
3. The statement $\color{lightblue}\text{op(-\_,C)}$ applies the unary operator $\color{lightblue}-$ to the constant $\color{lightblue}27$

2.2 - PL0 Types - Scalar and Reference Types

The scalar types are:
- Integers (type int)
- Booleans (type boolean)
- Subrange, where subrange(T, lower, upper) is the subrange of values in base type T from lower to upper where $\text{lower}\le\text{upper}$
- These are our primitive, predefined types.

There are also reference (or location) types, ref(T) that describe variables

To fully describe variables, we not only need to know its type, but also where it’s stored
Hence, variables are reference types.

const  C = 42;          // C has type int
type   S = [-C..C];     // S is the type subrange(int, -42, 42)
var    b : boolean;     // b has type ref(boolean)
       y : S;           // y has type ref(subrange(int, -42, 42)

There are product types, where $\color{lightblue}T_1\times T_2$ is the type of pairs $\color{lightblue}(x,y)$ for $\color{lightblue}x\in T_1,y\in T_2$
There are function types, where $\color{lightblue}T_1\rightarrow T_2$ is the type of functions with arguments of type $\color{lightblue}T_1$ and results of type $\color{lightblue}T_2$

We can combine product types and function types to create more sophisticated operators:

$-\_$	$\text{int}\rightarrow\text{int}$
$\_+\_$	$\text{int}\times\text{int}\rightarrow\text{int}$
$\_=\_$	$\text{int}\times\text{int}\rightarrow\text{boolean}$
$\_=\_$	$\text{boolean}\times\text{boolean}\rightarrow\text{boolean}$

Note here that the definition for equality is overloaded - for both $\color{lightblue}\text{int}\times\text{int}$ and $\color{lightblue}\text{boolean}\times\text{boolean}$

The following summarises the semantic types shown above:
$\color{lightblue}T\ \ ::==\ \ \text{int | boolean | subrange(T,}\Z,\Z\text{) | ref(T) | T}\times\text{T | T}\rightarrow\text{T}$
- Where T for a subrange must be $\color{lightblue}\text{int}$ or $\color{lightblue}\text{boolean}$
- Where $\color{lightblue}\Z$ stands for the integers

3.0 - Symbol Table

🌱 In the Parsing phase of a compiler, an AST and Symbol Table are created.

Entries in the symbol table are mappings from identifiers ( $\color{lightblue}\text{id}$ ) to symbol table entries $\color{lightblue}\text{SymEntry}$ .
$\color{lightblue}\text{syms}\in\text{id}\mapsto\text{SymEntry}$
We have different types for Constants, Types, Variables and Procedures in our symbol table
$\color{lightblue}\small\text{SymEntry\ \ ::== ContEntry(T,}\Z\text{) | TypeEntry(T) | VarEntry(T) | ProcEntry(block)}$

From before, we have the following code:

const  C = 42;          // C has type int
type   S = [-C..C];     // S is the type subrange(int, -42, 42)
var    b : boolean;     // b has type ref(boolean)
       y : S;           // y has type ref(subrange(int, -42, 42)

And we’ve generated its declarations
$\color{lightblue}\text{ds}=\{C\mapsto\text{const(42),}\\ \text{S}\mapsto\text{type([op(-\_,C)..C]}\\ \text{b}\mapsto\text{var(boolean),}\\ \text{y}\mapsto\text{var(S)} \}$
We then generate the symbol table entries for the declarations
$\color{lightblue}\text{syms = \{C}\mapsto \text{ConstEntry(int, 42),}\\ \text{S}\mapsto\text{TypeEntry(subrange(int,-42,42)),}\\ \text{b}\mapsto\text{VarEntry(ref(boolean)),}\\ \text{y}\mapsto\text{VarEntry(ref(subrange(int,-42,42))} \}$

3.1 - Well-Typed Expressions

Expressions need to be well-typed and we need to be able to determine the type of an expression
We have type inferences which can be denoted as:
$\color{lightblue}\text{syms}\vdash e:T$
- Which essentially states that in the context of symbol table $\color{lightblue}\text{syms}$ , the expression $\color{lightblue}\text{e}$ is well-typed and has type $\color{lightblue}\text{T}$

3.2 - Rules of Static Semantics

3.2.1 - Integer Value

\color{lightblue}\text{syms}\vdash n : \text{int}

This rule has a conclusion without any premises.
Any expression $\color{lightblue}\text{n}$ , where $\color{lightblue}\text{n}$ is an integer is well typed, and has type $\color{lightblue}\text{int}$

Example: Type Inference for 27

$\color{lightblue}\text{syms}\vdash 27 : \text{int}$

3.2.2 - Symbolic Constant

\color{lightblue}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id) = ConstEntry(T,v)}}\\ \text{syms}\vdash \text{id : T}

This is essentially saying that an identifier $\color{lightblue}\text{id}$ $id$ of type $\color{lightblue}\text{T}$ $T$ is well typed iff:
1. If the identifier $\color{lightblue}\text{id}$ is in the domain of the symbol table (i.e. $\color{lightblue}\text{id}\in\text{dom(syms)}$ ) and;
2. If the entry associated with that identifier $\color{lightblue}\text{syms(id)}$ is of type $\color{lightblue}\text{ConstEntry(T,v)}$

Example: Type Inference for Constant C

From before, we have the following entry in our symbol table
$\color{lightblue}\text{syms}=\{\cdots\text{,C}\mapsto\text{ConstEntry(int,42),}\cdots\}$
So we know that $\color{lightblue}\text{C}\in\text{dom(syms)}$ and additionally, from the symbol table entry we know that $\color{lightblue}\text{syms(C)=ConstEntry(int,42)}$ .
Therefore, $\color{lightblue}\text{syms}\vdash\text{C : int}$

3.2.3 - Variable Identifier

\color{lightblue}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id)=VarEntry(T)}}\\ \text{syms}\vdash\text{id : T}

This is essentially saying that an identifier $\color{lightblue}\text{id}$ $id$ of type $\color{lightblue}\text{T}$ $T$ is well typed iff:
1. If the identifier $\color{lightblue}\text{id}$ is in the domain of the symbol table (i.e. $\color{lightblue}\text{id}\in\text{dom(syms)}$ ) and;
2. If the entry associated with that identifier $\color{lightblue}\text{syms(id)}$ is of type $\color{lightblue}\text{VarEntry(T,v)}$

Example: Type Inference for b

From before, we have the following entry in our symbol table
$\color{lightblue}\text{syms}=\{\cdots\text{,b}\mapsto\text{VarEntry(ref(boolean)),}\cdots\}$

3.2.4 - Unary Negation

\color{lightblue}\underline{\text{syms}\vdash {\text{e : int}}}\\ \text{syms}\vdash \text{op(-\_,e) : int}

This is essentially saying that for the unary operator $\color{lightblue}-$ $-$ applied to an expression $\color{lightblue}\text{e}$ $e$ is well typed in the context of the symbol table $\color{lightblue}\text{syms}$ $syms$ and is of type $\color{lightblue}\text{int}$ $int$ iff:
- The expression $\color{lightblue}\text{e}$ is well typed in the context of the same symbol table and is of type integer

Example: Type Inference for -C

\color{lightblue}\underline{\text{syms}\vdash {\text{C : int}}}\\ \text{syms}\vdash \text{op(-\_,C) : int}

Therefore, we only need to show that $\color{lightblue}\text{syms}\vdash {\text{C : int}}$ to prove that $\color{lightblue}\text{syms}\vdash\text{op(-\_,C) : int}$ is well typed.
From before, we proved that the symbol with the declaration $\color{lightblue}C\mapsto\text{ConstEntry(int,42)}$ is type correct and therefore $\color{lightblue}\text{op(-\_,C)}$ is type correct.

3.2.5 - Binary Operator

\color{lightblue} \text{syms}\vdash\text{e1 : T1}\\ \text{syms}\vdash\text{e2 : T2}\\ \underline{\text{syms}\vdash\_\odot\_\ :\ T1\times T2\rightarrow T3}\\ \text{syms}\rightarrow \text{op}(\_\odot\_,\text{e1,e2)) : T3}

Note that the symbol $\color{lightblue}\odot$ denotes any binary operator.
We have the expression $\color{lightblue}\text{op(}\_\odot\_\text{,(e1,e2))}$ $op(_⊙_,(e1,e2))$ which essentially denotes the application of an arbitrary binary operator $\color{lightblue}\odot$ $⊙$ being applied to expressions $\color{lightblue}\text{e1, e2}$ $e1, e2$ is type correct, and has type $\color{lightblue}\text{T3}$ $T3$ iff
1. $\color{lightblue}e_1$ is well typed in the context of the symbol table $\color{lightblue}\text{syms}$ and has type $\color{lightblue}\text{T1}$
2. $\color{lightblue}e_2$ is well typed in the context of the symbol table $\color{lightblue}\text{syms}$ and has type $\color{lightblue}\text{T2}$
3. There exists a binary operator $\color{lightblue}\odot$ that takes in type $\color{lightblue}\text{T1,T2}$ and produces type $\color{lightblue}\text{T3}$

Example: Type Inference for -C + 27

We want to know whether $\color{lightblue}-C+27$ is well typed.
In this interpretation, $\color{lightblue}\odot$ is replaced with $\color{lightblue}+$
To know whether $\color{lightblue}-C+27$ is type correct, we essentially want to know if the following holds true
$\color{lightblue}\text{syms}\vdash \text{op(\_+\_,op(-\_,C),27)) : int}$
To do this, we can use the rule described above, and prove the following three things:
1. $\color{lightblue}\text{syms}\vdash\text{op(-\_,C) : int}$ $syms ⊢ op(-_,C) : int$ Is the first argument $\color{lightblue}\text{op(-\_,C)}$ $op(-_,C)$ or $\color{lightblue}\text{-C}$ $-C$ type correct, and is it an integer?
  - In the context of a symbol table with the following declaration:
    $\color{lightblue}\text{syms}=\{\cdots,\text{C}\mapsto\text{Constentry(int,42)},\cdots\}$
  - We can conclude that the statement is type correct by Unary Negation, and is in fact an integer
2. $\color{lightblue}\text{syms} \vdash27:\text{int}$ $syms ⊢ 27 : int$ Is the argument $\color{lightblue}\text{27}$ $27$ type correct, and is it an integer?
  - By Integer Value we can conclude that this statement is type correct and is in fact an integer
3. $\color{lightblue}\text{syms}\vdash\_+\_ : \text{int} \times \text{int}\rightarrow\text{int}$ $syms ⊢_+_: int \times int \to int$ In the context of our symbol table $\color{lightblue}\text{syms}$ $syms$ , is addition well typed, takes input $\color{lightblue}\text{int}\times\text{int}$ $int \times int$ and produce a value of type $\color{lightblue}\text{int}$ $int$
  - This holds true by definition of the addition binary operator.

3.2.6 - Dereference

🌱 The dereference rule allows us to treat an expression of type $\color{lightblue}\text{ref(T)}$ as an expression of type $\color{lightblue}\text{T}$ itself.

\color{lightblue}\underline{\text{syms}\vdash\text{e : ref(T)}}\\ \text{syms}\vdash\text{e : T}

In the context of a symbol table $\color{lightblue}\text{syms}$ , an expression $\color{lightblue}\text{e}$ is well typed (& can be treated as if it was of type $\color{lightblue}\text{T}$ ) iff the expression $\color{lightblue}\text{e}$ is well typed, and of type $\color{lightblue}\text{ref(T)}$ in the context of the symbol table $\color{lightblue}\text{syms}$

Example: Type Inference for y

\color{lightblue}\underline{\text{syms}\vdash\text{y : ref(T)}}\\ \text{syms}\vdash\text{y : T}

Recall that we have an entry in our symbol table:
$\color{lightblue}\text{syms}=\{\cdots\text{, y}\mapsto\text{VarEntry(ref(subrange(int,-42,42))),}\cdots\}$
And therefore, since $\color{lightblue}\text{y}$ is well typed and of type $\color{lightblue}\text{ref(subrange(...))}$ so we can reason that $\color{lightblue}\text{y}$ is wel typed, and can be treated as type $\color{lightblue}\text{subrange(...)}$ as a result of the premise $\color{lightblue}\text{syms}\vdash\text{ref(subrange(int,-42,42))}$ holding.
We know that this premise holds in the context of the symbol table $\color{lightblue}\text{syms}$ by the Variable Identifier rule.

3.2.7 - Widen Subrange

\color{lightblue}\underline{\text{syms} \vdash e :\text{subrange(T,i,j)}}\\ \text{syms}\vdash e : T

Suppose we had a subrange of type $T$ from $i$ to $j$ . We can treat this as a variable of type $T$

Proof for Type Inference for y - Widen Subrange

y\in\text{dom(syms)}\\ \underline{\text{syms(id)}=\text{VarEntry(ref(subrange(int, -42, 42)))}}\\ \underline{\text{syms}\vdash \text{y : ref(subrange(int, -42, 42))}}\\ \underline{\text{syms}\vdash\text{y : subrange(int, -42, 42)}}\\ \text{syms}\vdash\text{y : int}

Step 1: Use Variable Identifier Rule

Since $\color{lightblue}y\in\text{dom(syms)}$ , we can modify the expression from $\color{lightblue}\text{syms(id)}=\text{VarEntry(ref(subrange(int,-42,42))}$ to $\color{lightblue}\text{syms}\vdash \text{y:ref(subrange(int,-42,42))}$

\color{lightblue}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id)=VarEntry(T)}}\\ \text{syms}\vdash\text{id : T}

Step 2: Use Dereference Rule

Since we have a statement of the form $\color{lightblue}\text{syms}\vdash \text{e:ref(T)}$ we can use the Dereference rule to modify the expression from $\color{lightblue}\text{syms}\vdash \text{y:ref(subrange(int,-42,42))}$ to $\color{lightblue}\text{syms}\vdash \text{y:subrange(int,-42,42)}$

\color{lightblue}\underline{\text{syms}\vdash\text{e : ref(T)}}\\ \text{syms}\vdash\text{e : T}

Step 3: Use Widen Subrange Rule

Since we have a statement of the form $\color{lightblue}\text{syms}\vdash \text{e : subrange(T,i,j)}$ we can modify the expression from $\color{lightblue}\text{syms}\vdash \text{y:subrange(int,-42,42)}$ to $\color{lightblue}\text{syms}\vdash \text{y : int}$

\color{lightblue}\underline{\text{syms} \vdash e :\text{subrange(T,i,j)}}\\ \text{syms}\vdash e : T

3.2.8 - Narrow Subrange

\color{lightblue}\text{syms}\vdash\text{e : T}\\ i \le j\\ \underline{T\in\{\text{int, boolean}\}}\\ \text{syms}\vdash\text{e : subrange(T,i,j)}

Given that $\color{lightblue}\text{e}$ is well typed and of type $\color{lightblue}\text{T}$ , where $\color{lightblue}\text{T}\in\{\text{int, boolean}\}$ and $\color{lightblue}i \le j$ , then we can say that $\color{lightblue}\text{syms}\vdash\text{e : subrange(T,i,j)}$

Example: Type Inference for 27

\color{lightblue}\text{syms}\vdash\text{27 : int}\\ -42 \le42\\ \underline{\text{int}\in\{\text{int, boolean}\}}\\ \text{syms}\vdash\text{27 : subrange(int, -42, 42)}

Suppose we were trying to argue that the integer 50 was in the subrange of -42 to 42
- This wouldn’t cause a static semantics error - we treat it as a runtime error.
- We treat it as a runtime error as the integer that we’re trying to coerce into the subrange may not be a constant value and we therefore cannot guarantee the value.

3.3 - Well-Formed Statements

Statements need to be well-formed
Our type inferences will be of the form:
$\color{lightblue}\text{syms}\vdash\text{WFStatement(s)}$
- This inference states that in the context of the symbol table $\color{lightblue}\text{syms}$ , the statement $\color{lightblue}\text{s}$ is well-formed.

3.3.1 - Assignment

🌱 How do we know whether an assignment is well formed?

That is, how do we prove that $\color{lightblue}\text{assign(lv,e)}$ is well formed?
In the context of the symbol table $\color{lightblue}\text{syms}$ , we require:
- The left value $\color{lightblue}\text{lv}$ to be well typed, and has to be a reference to some type $\color{lightblue}\text{T}$ , $\color{lightblue}\text{ref(T)}$
- The expression $\color{lightblue}\text{e}$ to be well typed in the context of the symbol table, and have type $\color{lightblue}\text{T}$
$\color{lightblue}\text{syms}\vdash\text{lv : ref(T)}\\ \underline{\text{syms}\vdash\text{e : T}}\\ \text{syms}\vdash\text{WFStatement(assign(lv,e))}$

Example - Type Inference for y := 4

\color{lightblue}\text{syms}\vdash\text{y : ref(subrange(int, -42,42))}\\ \underline{\text{syms}\vdash\text{4 : subrange(int,-42,-42)}}\\ \text{syms}\vdash\text{WFStatement(assign(y,4))}

In this example, we instantiate $\color{lightblue}T$ to be the subrange from $\color{lightblue}(-42,42)$
For the statement to be well formed, we also require that $\color{lightblue}4$ is also of the same type

We can reason that the inference $\color{lightblue}\text{syms}\vdash\text{4 : subrange(int,-42,42)}$ is correct based on the following:

We can use the Narrowing Subrange rule to justify the inference:
- Since $\color{lightblue}-42\le42$ and $\color{lightblue}y$ is an $\color{lightblue}\text{int}$ we can reason that $\color{lightblue}4$ is well typed and is contained within the subrange type $\color{lightblue}\text{subrange(int, -42, 42)}$

\footnotesize\color{lightblue}\text{syms}\vdash\text{e : T}\\ i \le j\\ \underline{T\in\{\text{int, boolean}\}}\\ \text{syms}\vdash\text{e : subrange(T,i,j)}

We can reason that the inference $\color{lightblue}\text{syms}\vdash\text{y : ref(subrange(int, -42,42))}$ based on the following:

We can use the Variable Identifier rule to justify the inference:
- Since $\color{lightblue}y\in\text{dom(syms)}$ and $\color{lightblue}\text{y}$ has a VarEntry of the subrange type:

\scriptsize\color{lightblue}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id)=VarEntry(T)}}\\ \text{syms}\vdash\text{id : T}

\color{lightblue}\text{y}\in\text{dom(syms)}\\ \underline{\text{syms(y)}=\text{VarEntry(ref(subrange(int,-42,42)))}}\\ \text{syms}\vdash\text{y : ref(subrange(int,-42,42))}

3.3.2 - Procedure Call

\color{lightblue}\text{id}\in\text{dom(syms)}\\ \underline{\text{syms(id)=ProcEntry(block)}}\\ \text{syms}\vdash\text{WFStatement(call(id))}

A procedure call will be a well formed statement in the context of the symbol table as long as:
1. $\color{lightblue}\text{id}\in\text{dom(syms)}$ We call an identifier that is in the domain of the symbol table
2. $\color{lightblue}\text{syms(id)=ProcEntry(block)}$ The identifier is a procedure entry.

3.3.3 - Read

🌱 In other programming languages, read and write wouldn’t be primitives, but we do this in PL0 for simplicity.

\color{lightblue}\text{syms}\vdash\text{lv : ref(T)}\\ \underline{\text{T=int}\vee\text{T=subrange(int,i,j)}}\\ \text{syms}\vdash\text{WFStatement(read(lv))}

$\color{lightblue}\text{read(lv)}$ $read(lv)$ is well formed in the context of the symbol table iff:
- $\color{lightblue}\text{lv : ref(T)}$ The left value is a reference type (i.e. something that we can assign to)
- $\color{lightblue}\text{T=int}\vee\text{T=subrange(int,i,j)}$ The value read has a type of either an integer or subrange of integers.

3.3.4 - Write

\color{lightblue}\text{syms}\vdash\text{e : int}\\ \overline{\text{syms}\vdash\text{WFStatement(write(e))}}

$\color{lightblue}\text{write(e)}$ $write(e)$ will be well formed in the context of the symbol table iff:
1. $\color{lightblue}\text{e}$ is of type $\color{lightblue}\text{integer}$
2. $\color{lightblue}\text{e}$ is well typed

3.3.5 - Conditional Rule

\footnotesize\color{lightblue}\text{syms}\vdash\text{e : boolean}\\ \text{syms}\vdash\text{WFStatement(s1)}\\ \text{syms}\vdash\text{WFStatement(s2)}\\ \overline{\text{syms}\vdash\text{WFStatement(if(e,s1,s2))}}

For a conditional (denoted $\color{lightblue}\text{if(e,s1,s2)}$ $if(e,s1,s2)$ ) to be well formed, we require:
- The condition $\color{lightblue}\text{e}$ to be well-typed, and return a boolean type in the context of the symbol table
- The statements $\color{lightblue}\text{s1,s2}$ to be well formed statements in the context of the symbol table.

Example - Well Formed Conditional

🌱 Show that if x < 0 then y := -x else y := x is a well formed conditional statement.

From this statement, we can generate the following:
$\color{lightblue}\text{syms}\vdash\text{op(\_<\_,(x,0)) : boolean}\\ \text{syms}\vdash\text{WFStatement(assign(y,op(-\_,x)))}\\ \text{syms}\vdash\text{WFStatement(assign(y,x))}\\ \overline{\text{syms}\vdash\text{WFStatement(if(op(\_<\_,(x,0)),assign(y,op(-\_,x)),assign(y,x)))}}$
From our symbol table, we know that:
$\color{lightblue}\text{syms}=\{\cdots,\\\text{x}\mapsto\text{VarEntry(ref(int))}\\ \text{y}\mapsto\text{VarEntry(ref(int))},\\\cdots\}$

Firstly, we prove that the statement $\color{lightblue}\text{op(\_<\_,(x,0))}$ is well formed:

That is: $\color{lightblue}\text{syms}\vdash\text{op(\_<\_,(x,0)) : boolean}$

From the Variable Identifier rule, we can infer that:

\footnotesize\color{lightblue}\text{x}\in\text{dom(syms)}\\ \text{syms(x)=VarEntry(int)}\\ \overline{\text{syms}\vdash\text{ x : ref(int)}}

\scriptsize\color{gray}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id)=VarEntry(T)}}\\ \text{syms}\vdash\text{id : T}

We know that $\color{lightblue}\text{x}$ is in the domain of our symbol table as it is inherently in the symbol table:
$\color{lightblue}\text{syms}=\{\cdots,\text{x}\mapsto\text{VarEntry(ref(int)), }\cdots\}$

From the Dereference rule, we can infer that:

$\footnotesize\color{lightblue}\underline{\text{syms}\vdash\text{x : ref(int)}}\\\text{syms}\vdash\text{ x : int}$

\footnotesize\color{gray}\underline{\text{syms}\vdash\text{e : ref(T)}}\\ \text{syms}\vdash\text{e : T}

From the Integer Value rule, we can inherently infer that:

$\footnotesize\color{lightblue}\text{syms}\vdash\text{0 : int}$

$\color{gray}\text{syms}\vdash n : \text{int}$

From the Binary Operator rule, we can inherently infer that:

$\footnotesize\color{lightblue}\text{syms}\vdash\text{(op(\_<\_,(x,0)) : boolean)}$

Secondly, we prove that the statement $\color{lightblue}\text{assign(y,-x)}$ is well formed:

That is: $\color{lightblue}\text{syms}\vdash\text{WFStatement(assign(y,-x))}$

From the Variable Identifier rule, we can infer that:

\footnotesize\color{lightblue}\text{y}\in\text{dom(syms)}\\ \text{syms(y)=VarEntry(int)}\\ \overline{\text{syms}\vdash\text{ y : ref(int)}}

\scriptsize\color{gray}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id)=VarEntry(T)}}\\ \text{syms}\vdash\text{id : T}

This is inherently true as y is in the domain of the symbol table BECAUSE it is in the symbol table.

We can also do the same for the variable $\color{lightblue}\text{x}$

\footnotesize\color{lightblue}\text{x}\in\text{dom(syms)}\\ \text{syms(x)=VarEntry(int)}\\ \overline{\text{syms}\vdash\text{ x : ref(int)}}

\scriptsize\color{gray}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id)=VarEntry(T)}}\\ \text{syms}\vdash\text{id : T}

From the Dereference rule, we can infer that:

$\footnotesize\color{lightblue}\underline{\text{syms}\vdash\text{x : ref(int)}}\\\text{syms}\vdash\text{ x : int}$

\footnotesize\color{gray}\underline{\text{syms}\vdash\text{e : ref(T)}}\\ \text{syms}\vdash\text{e : T}

From the Unary Negation rule, we can infer that:

\footnotesize\color{lightblue}\text{syms}\vdash\text{x : int}\\ \overline{\text{syms}\vdash\text{op(-\_,x) : int}}

\footnotesize\color{gray}\text{syms}\vdash\text{e : int}\\ \overline{\text{syms}\vdash\text{op(-\_,e) : int}}

And then combining Step 1 and Step 4 using the Statement Assignment rule.

\footnotesize\color{lightblue} \text{syms}\vdash\text{y : ref(int)}\\ \text{syms}\vdash\text{op(-\_,x) : int}\\ \overline{\text{syms}\vdash\text{WFStatement(assign(y,op(-\_, x)))}}

Thirdly, we prove the statement $\color{lightblue}\footnotesize\text{assign(y,x)}$ is well formed:

That is: $\color{lightblue}\text{syms}\vdash\text{WFStatement(assign(y,x))}$

From the Variable Identifier rule, we can infer that:

\footnotesize\color{lightblue}\text{y}\in\text{dom(syms)}\\ \text{syms(y)=VarEntry(int)}\\ \overline{\text{syms}\vdash\text{ y : ref(int)}}

\scriptsize\color{gray}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id)=VarEntry(T)}}\\ \text{syms}\vdash\text{id : T}

This is inherently true as y is in the domain of the symbol table BECAUSE it is in the symbol table.

We can also do the same for the variable $\color{lightblue}\text{x}$

\footnotesize\color{lightblue}\text{x}\in\text{dom(syms)}\\ \text{syms(x)=VarEntry(int)}\\ \overline{\text{syms}\vdash\text{ x : ref(int)}}

\scriptsize\color{gray}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id)=VarEntry(T)}}\\ \text{syms}\vdash\text{id : T}

From the Dereference rule, we can infer that:

$\footnotesize\color{lightblue}\underline{\text{syms}\vdash\text{x : ref(int)}}\\\text{syms}\vdash\text{ x : int}$

\footnotesize\color{gray}\underline{\text{syms}\vdash\text{e : ref(T)}}\\ \text{syms}\vdash\text{e : T}

And then combining Step 1 and Step 3 using the Statement Assignment rule.

\footnotesize\color{lightblue} \text{syms}\vdash\text{y : ref(int)}\\ \text{syms}\vdash\text{x : int}\\ \overline{\text{syms}\vdash\text{WFStatement(assign(y,x))}}

Finally, putting it all together:

\footnotesize\color{lightblue} \text{syms}\vdash\text{y : ref(int)}\\ \text{syms}\vdash\text{op(-\_,x) : int}\\ \overline{\text{syms}\vdash\text{WFStatement(assign(y,op(-\_, x)))}}

\footnotesize\color{lightblue} \text{syms}\vdash\text{y : ref(int)}\\ \text{syms}\vdash\text{x : int}\\ \overline{\text{syms}\vdash\text{WFStatement(assign(y,x))}}

$\footnotesize\color{lightblue}\text{syms}\vdash\text{(op(\_<\_,(x,0)) : boolean)}$

Combining all of these conclusions using the Conditional Statement rule, we get:

\footnotesize\color{lightblue}\text{syms}\vdash\text{(op(\_<\_,(x,0)) : boolean)}\\ \text{syms}\vdash\text{WFStatement(assign(y,op(-\_, x)))}\\ \text{syms}\vdash\text{WFStatement(assign(y,x))}\\ \overline{\text{syms}\vdash\text{WFStatement(if(op(\_<\_,(x,0), assign(y,op(-\_,x)),assign(y,x)))}}

3.3.6 - Iteration (While Rule)

🌱 For a while loop, if the expression is well formed in the context of the symbol table and is a boolean AND if the statement is well formed, then the while statement will be well formed.

\color{lightblue}\footnotesize \text{syms}\vdash\text{e : boolean}\\ \text{syms}\vdash\text{WFStatement(s)}\\ \overline{\text{syms}\vdash\text{WFStatement(while(e,s))}}

3.3.7 - Statement List

🌱 For a statement list, if all statements are well formed in the context of the symbol table, then the statement list will be well formed.

\footnotesize\color{lightblue} \underline{\forall\text{s}\in\text{elems(ls)}\ \bullet\ \text{(syms }\vdash \text{ WFStatement(s))}}\\ \text{syms}\vdash\text{WFStatement(list(ls))}

4.0 - Implementation of Static Semantic Checking

🌱 The PL0 Static Checker is implemented in the tree package.

4.1 - Abstract Syntax Tree Classes

The tree package contains all of the abstract syntax tree classes.
- The StatementNode class implements all of the statement classes
  $\color{lightblue}\footnotesize\text{s\ \ \ \ ::== assign(lv,e)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | write(e)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | read(lv)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | call(lv)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | if(e,s,s)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | while(e,s)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | list(seq s)}\\$
- The ExpNode class implements all of the expression classes
  $\footnotesize\color{lightblue}\\\text{e\ \ \ \ :==\ \ \ \ n | lv | op(unary, e) | op(binary, (e,e))}\\$

4.2 - Symbol Table Implementations

The syms package contains all of the data type implementations from the symbol table.
- Type is an abstract class which other classes can use to implement their own desired behaviour.
- E.g. ScalarType, ProductType, ReferenceType, FunctionType
- In the Predefined class, we configure all of our predefined types and symbols in our symbol table.
  - E.g. Scalars - Integers and Booleans
  - Predefined operators (the actual symbol is defined in tree/Operator.java as enum)

predefined.addOperator(Operator.EQUALS_OP, ErrorHandler.NO_LOCATION, LOGICAL_BINARY);
predefined.addOperator(Operator.NEQUALS_OP, ErrorHandler.NO_LOCATION, LOGICAL_BINARY);
predefined.addOperator(Operator.NEG_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_UNARY);
predefined.addOperator(Operator.ADD_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.SUB_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.MUL_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.DIV_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.EQUALS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.NEQUALS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.GREATER_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.LESS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.GEQUALS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.LEQUALS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);

Notice that our addOperator function takes three inputs:
- Note that when we define these, we use the ErrorHandler.NO_LOCATION as a location - they’re predefined and thus are not defined in the source code.

SymEntry.OperatorEntry addOperator(Operator op, Location loc, Type.FunctionType type);

For our arithmetic operators, we define a FunctionType above:

FunctionType ARITHMETIC_BINARY = new FunctionType(PAIR_INTEGER_TYPE, INTEGER_TYPE);

4.3 - Static Checker Implementation

The static checker traverses the abstract syntax tree and type checks the tree as it traverses through
- As we traverse the AST, we may be updating it with type information
- The tree traversal is done using the Visitor pattern

Consider the following code, which is essentially an abstraction and simplification of an AST

public abstract class Tree {
  public abstract void accept(Visitor v);
  public static class ATree extends Tree {
    // Tree class with two children
    Tree t1, t2;
    
    Public ATree(Tree t1, Tree t2) {
      super();  this.t1 = t1;  this.t2 = t2;
    }

    public void accept(Visitor v) {  v.visitATree(this);  }
  }

  public static class BTree extends Tree {
    // A leaf node of the tree (i.e. has no subtrees)
    int n;

    public BTree(int n) {
      super();  this.n = n;
    }

    public void accept(Visitor v) {  v.visitBTree(this);  }
  }

  public static class CTree extends Tree {
    // Tree subclass with three children
    Tree t1,t2,t3;

    public CTree extends Tree {
      public CTree(Tree t1, Tree t2, Tree t3) {
        super();  this.t1 = t1;  this.t2 = t2;  this.t3 = t3;
      }
    }
  }
}

4.3.1 - Visitor Pattern

We would like to create the visit methods shown here to implement tree traversal using the visitor pattern.
Create individual methods for visiting ATree, BTree and CTree
Ideally, we’d just like to do something like this:

public class TraversalIdeal {
  public void visit(Tree.ATree t) {
    visit(t.t1);
    visit(t.t2);
  }
  public void visit(Tree.BTree t) {
     System.out.println(t.n);
  }
  public void visit(Tree.CTree t) {
    visit(t.t1);
    visit(t.t2);
    visit(t.t3);
  }
}

However, we don’t know actually know the (dynamic) type of t.tn - we know that it’s a Tree (static type), but we don’t know what implementation of Tree it is.
- The compiler is looking for a method of the form:
```
public void visit(Tree t) { ... }
```

We can solve this by using the visitor pattern - we create an interface that has a traversal method for each tree subtype.

public interface Visitor {
  public void visitATree(Tree.ATree t);
	public void visitBTree(Tree.BTree t);
	public void visitCTree(Tree.CTree t);
}

We then add an accept method in each of our tree implementations:

public abstract class Tree {
  public abstract void accept(Visitor v);
  public static class ATree extends Tree {
		Tree t1,t2;
    public ATree(Tree t1, Tree t2){ super(); this.t1 = t1; this.t2 = t2; }
    public void accept(Visitor v) { v.visitATree(this); }
  }
}

And then our Traversal class implements these methods:

public class Traversal implements Visitor {
  @Override
  public void visitATree(Tree.ATree t) {
    // Traversal using the aceept() method instead of visit() method
    // Dynamically dispatches the call to the right visitXTree method.
		t.t1.accept(this);  t.t2.accept(this); 
	}
}