Week 3.2

CourseCompilers & Interpreters
SemesterS1 2022

Week 3.2

🏷️ Replace this section with some keywords that can be used for searching, tagging and the indexing of this page.

Sections

🌱 Links to all Heading1 Blocks in this page.

  1. Recursive Descent Parsing

  2. Static Semantics of PL0

  3. Symbol Table

    1. Well-Typed Expressions
    2. Static Semantic Rules
    3. Well-Formed Rules
  4. Implementation of Static Semantic Checking

1.0 - Recursive Descent Parsing

1.1 - Tutorial 4 Recursive Descent Parser Code

🌱 So we’ve got some Java code that can parse PL0 code. How do we get it to actually parse the code?

  • The src/parse/Parser.java file contains the definition for the Parser class.

  • When opened in IntelliJ, we have a run configuration that is configured to run on the file that is currently open.

  • We are essentially running the PL0 Recursive Descent Parser, pl0.PL0_RD on the file that’s currently open (denoted by $FilePath$) in the current directory $MODULE_WORKING_DIRS$

  • Suppose we open the test-base0-abs.pl0 file in the test-pgm directory of the project root and compile it:

Untitled

Compiling test-base0-abs.pl0
Parsing complete
Static semantic analysis complete
Running ...
100
var x: int;
    y: int;
begin
  x := -100;
  if x < 0 then y := -x else y := x;
  write y
end
  • Parsing complete indicates that the program has no syntactical errors.
  • Static semantic analysis complete indicates that there are no type checking errors

🌱 The PL0 Recursive Descent Parser implemented here is using the error recovery strategy described in the previous lecture. What would the output be if we intentionally mess up the code?

  • Suppose in the source code we replace the assignment operator := with =
var x: int;
    y: int;
begin
  x = -100; // Replaced :=  with =
  if x < 0 then y := -x else y := x;
  write y
end
Compiling test-base0-abs.pl0
    4   x = -100;
*****     ^  Error: Parse error, expecting ':=' in Assignment
Parsing complete
Static semantic analysis complete
1 error detected.
  • What if we had two errors in our source code:
    1. The replacement from before
    2. The extra then keyword
var x: int;
    y: int;
begin
  x = -100; // Replaced :=  with =
  if x < 0 then then y := -x else y := x; // Extra then keyword.
  write y
end
Compiling test-base0-abs.pl0
    4   x = -100;
*****     ^  Error: Parse error, expecting ':=' in Assignment
    5   if x < 0 then then y := -x else y := x;
*****                 ^ Error: 'then' cannot start a statement.
Parsing complete
Static semantic analysis complete
2 errors detected.

1.2 - Tutorial 4 PL0 Parser Implementation

🌱 Specifically looking at the implementation of the PL0 parser for Expressions and Statements.

  • This parser is based on the error recovery scheme that was discussed in the previous lecture.
  • To implement this, we have a series of parse methods that perform the synchronisation for us at the start and end of each parse method for Expressions, Statements etc.
private final ParseMethod exp = new ParseMethod<>(
    (Location loc) -> new ExpNode.ErrorNode(loc));
  • Configure exp.parse to return an ExpNode
  • If it encounters an error during the synchronisation phase, return an ExpNode.ErrorNode to indicate that there was a problem.

1.2.1 - Constant Declarations

🌱 Create constants that represent the start sets for our non-terminal symbols.

/**
 * Set of tokens that may start an LValue.
 */
private final static TokenSet LVALUE_START_SET =
        new TokenSet(Token.IDENTIFIER);
private final static TokenSet FACTOR_START_SET =
        LVALUE_START_SET.union(Token.NUMBER, Token.LPAREN);
private final static TokenSet TERM_START_SET =
        FACTOR_START_SET;
private final static TokenSet EXP_START_SET =
        TERM_START_SET.union(Token.PLUS, Token.MINUS);
private final static TokenSet REL_CONDITION_START_SET =
        EXP_START_SET;
private final static TokenSet CONDITION_START_SET =
        REL_CONDITION_START_SET;

1.2.2 - ParseRelCondition

private ExpNode parseRelCondition(TokenSet recoverSet) {
    return exp.parse("RelCondition", REL_CONDITION_START_SET, recoverSet,
            () -> {
                /* The current token is in REL_CONDITION_START_SET */
                ExpNode cond = parseExp(recoverSet.union(REL_OPS_SET));
                if (tokens.isIn(REL_OPS_SET)) {
                    Location loc = tokens.getLocation();
                    Operator operatorCode =
                            parseRelOp(recoverSet.union(EXP_START_SET));
                    ExpNode right = parseExp(recoverSet);
                    cond = new ExpNode.BinaryNode(loc, operatorCode, cond, right);
                }
                return cond;
            });
}
  1. Note that TokenSet recoverSet is passed as a parameter - this is required as we’re doing error recovery.

  2. Use exp.parse(...) to perform the synchronisation at the start and end of the parsing for us. The parameters to the method are:

    1. Name This is used in the debugger to print out more meaningful messages of what failed.
    2. StartSet The set of tokens from which a RelCondition can start with
    3. RecoverSet The set of tokens from which a RelCondition can start with
    4. Anonymous Function A function that does parsing for the RelCondition
    5. The Anonymous Function is used to define the behaviour of the parser for a RelCondition (in this case)
  3. Since we want to build up the AST representation of the code that we’re parsing, we want to return an ExpNode

    • So rather than using parse(...) we use return exp.parse(...)

    • A RelCondition is defined by the following production:

      RelConditionExp [RelOp Exp]\text{RelCondition}\rightarrow\text{Exp [RelOp Exp]}

    • If the RelCondition is just an Expression, we can just use the parse method and return it

      ExpNode cond = parseExp(recoverSet.unino(REL_OPS_SET));
      ...
      return cond;
      
    • Otherwise, we have to parse the [RelOp Exp]\text{[RelOp Exp]} component of the production

      ExpNode cond = parseExp(recoverSet.union(REL_OPS_SET));
      if (tokens.isIn(REL_OPS_SET)) { // I.e. are we at the start of a RelOp
          Location loc = tokens.getLocation(); // Required for BinaryNode constructor
          Operator operatorCode =
                  parseRelOp(recoverSet.union(EXP_START_SET));
          ExpNode right = parseExp(recoverSet);
          cond = new ExpNode.BinaryNode(loc, operatorCode, cond, right);
      }
      return cond;
      
      • The parseRelOp operation is defined below here
      • Note that the recoverSet is anything that a RelCondition can start with, and the set of tokens that can follow what we’re parsing
        • So when we’re parsing the first Expression in the production, the recoverSet added with the RelOp start set is what we use as the recover set in this instance.
        • When we’re parsing the RelOp, the recoverSet added with the Expression start set is what we use as the recover set in this instance.
        • When we’re parsing the final expression, the recoverSet is just the set of tokens that the production can start with (as there is nothing that follows)
      • After we finish parsing the tokens in the production, we can create the BinaryNode and return it.

1.2.3 - ParseRelOp

🌱 Note that the parseRelOp method doesn’t return an ExpNode - it returns an instance of Operator.

  • The parseRelOp method must parse the following production

    RelOpEQUALS | NEQUALS | LESS | GREATER | LEQUALS | GEQUALS\small\text{RelOp}\rightarrow\text{EQUALS | NEQUALS | LESS | GREATER | LEQUALS | GEQUALS}

  • To do this, we initialise a parse method that returns an Operator:

    private final ParseMethod op = new ParseMethod<>(
                (Location loc) -> Operator.INVALID_OP);
    
    • Note here that if there’s an error, we return an Operator.INVALID_OP instead of an ExpNode.ErrorNode
    • Also note that the tree package contains all implementations of the nodes used in an Abstract Syntax Tree (AST), such as ExpNode and StatementNode
  • We then define the parseRelOp method itself

    private Operator parseRelOp(TokenSet recoverSet) {
        return op.parse("RelOp", REL_OPS_SET, recoverSet,
                () -> {
                    Operator operatorCode = Operator.INVALID_OP;
                    switch (tokens.getKind()) {
                        case EQUALS:
                            operatorCode = Operator.EQUALS_OP;
                            tokens.match(Token.EQUALS); /* cannot fail */
                            break;
                        case NEQUALS:
                            operatorCode = Operator.NEQUALS_OP;
                            tokens.match(Token.NEQUALS); /* cannot fail */
                            break;
                        case LESS:
                            operatorCode = Operator.LESS_OP;
                            tokens.match(Token.LESS); /* cannot fail */
                            break;
                        case GREATER:
                            operatorCode = Operator.GREATER_OP;
                            tokens.match(Token.GREATER); /* cannot fail */
                            break;
                        case LEQUALS:
                            operatorCode = Operator.LEQUALS_OP;
                            tokens.match(Token.LEQUALS); /* cannot fail */
                            break;
                        case GEQUALS:
                            operatorCode = Operator.GEQUALS_OP;
                            tokens.match(Token.GEQUALS); /* cannot fail */
                            break;
                        default:
                            // If we get to here, there has been a fatal implementation 
                            // error in the cmocompiler.
                            fatal("parseRelOp");
                    }
                    return operatorCode;
                });
    }
    
    • Note here that we could choose conditional statements to implement the choice in this production
      • However, here we use a switch statement as all of the branch have a start set that contains just one token.
      • If we get to the default: case, there is a fatal error, as these are all of the possible terminal symbols for the RelOp non-terminal symbol.

1.3 - StatementNodes

🌱 Parsing Statements

  • As before, we define a ParseMethod that returns a StatementNode

    private final ParseMethod stmt = new ParseMethod<>(
            (Location loc) -> return new StatementNode.ErrorNode(loc));
    
    • Returns a StatementNode.ErrorNode if an error is encountered when parsing

1.3.1 - Parsing a Conditional Statement / If-Statement

private StatementNode parseIfStatement(TokenSet recoverSet) {
  return stmt.parse("If Statement", Token.KW_IF, recoverSet,
      () -> {
          /* The current token is KW_IF */
          tokens.match(Token.KW_IF); /* cannot fail */
          Location loc = tokens.getLocation();
          ExpNode cond = parseCondition(recoverSet.union(Token.KW_THEN));
          tokens.match(Token.KW_THEN, STATEMENT_START_SET);
          StatementNode thenClause =
              parseStatement(recoverSet.union(Token.KW_ELSE));
          tokens.match(Token.KW_ELSE, STATEMENT_START_SET);
          StatementNode elseClause = parseStatement(recoverSet);
          return new StatementNode.IfNode(loc, cond, thenClause, elseClause);
      });
}
  • We use the stmt.parse(...) method to perform synchronisation at both the start and end of the parsing

  • Supply as arguments

    1. Name of rule,
    2. Start set
    3. Recover set (provided as parameter)
    4. Anonymous parsing function
  • In the anonymous parsing function, we want to parse the production

    IfStatementKW_IF Condition KW_THEN Statement KW_ELSE Statement\scriptsize\text{IfStatement}\rightarrow\text{KW\_IF Condition KW\_THEN Statement KW\_ELSE Statement}

    • This line cannot fail as it should only be called if the current token is Token.KW_IF
      • Therefore, don’t need to use the error recovery version.
    tokens.match(Token.KW_IF); // Match and consume
    
    • We then store the current location for later.
      • This is done, as we essentially want to treat the condition of the if statement as the location of the Condition itself (instead of the location of Token.KW_IF)
    Location loc = tokens.getLocation();
    
    • Following this, we parse the condition itself.
      • The recoverSet in this case is anything that the if statement can start with and the next token in the production (KW_THEN since we’re parsing Condition)
    ExpNode cond = parseCondition(recoverSet.union(Token.KW_THEN));
    
    • We then parse the KW_THEN token - we need to use the version with error recovery, as we can’t guarantee that the next token is KW_THEN.
      • The set of tokens that can immediately follow KW_THEN are contained in the TokenSet, STATEMENT_START_SET
    tokens.match(Token.KW_THEN, STATEMENT_START_SET);
    
    • After we parse the KW_THEN token, we parse the statement immediately following it (the then clause).
      • We store this so that we can use it to build up our AST
    StatementNode thenClause = parseStatement(recoverSet.union(Token.KW_ELSE));
    
    • Next, we’re expecting the KW_ELSE token but we’re not sure so we use the error recovery version of tokens.match
    tokens.match(Token.KW_ELSE, STATEMENT_START_SET);
    
    • After we parse the KW_ELSE token we parse the statement immediately following it (the then clause).
      • Here, the recoverSet is just the recoverSet parameter passed in as we are at the end of the production
    StatementNode elseClause = parseStatement(recoverSet);
    
    • We then construct the StatementNode.IfNode object and return that as part of our AST
    return new StatementNode.IfNode(loc, cond, thenClause, elseClause);
    });
    

1.3.2 - Parsing a While Statement

  • From the parseStatement function, we have:

    private StatementNode parseStatement(TokenSet recoverSet) {
        return stmt.parse("Statement", STATEMENT_START_SET, recoverSet,
            () -> {
                /* The current token is in STATEMENT_START_SET.
                 * Instead of using a cascaded if-the-else, as indicated in
                 * the recursive descent parsing notes, a simpler approach
                 * of using a switch statement can be used because the
                 * start set of every alternative contains just one token. */
                switch (tokens.getKind()) {
                    case IDENTIFIER:
                        return parseAssignment(recoverSet);
                    case KW_WHILE:
                        return parseWhileStatement(recoverSet);
                    ...
                    default:
                        fatal("parseStatement");
                        // To keep the Java compiler happy - can't reach here
                        return new StatementNode.ErrorNode(tokens.getLocation());
                }
            });
    }
    
  • So when we call parseWhileStatement we know that the token is Token.KW_WHILE.

  • We still perform synchronisation so that (a) the format is consistent (b) synchronisation is performed at the end.

  • The production for a While Statement is given as:

    WhileStatementKW_WHILE Condition KW_DO Statement\scriptsize\text{WhileStatement}\rightarrow\text{KW\_WHILE Condition KW\_DO Statement}

    private StatementNode parseWhileStatement(TokenSet recoverSet) {
        return stmt.parse("While Statement", Token.KW_WHILE, recoverSet,
                () -> {
                    /* The current token is KW_WHILE */
                    tokens.match(Token.KW_WHILE); /* cannot fail */
                    Location loc = tokens.getLocation();
                    ExpNode cond = parseCondition(recoverSet.union(Token.KW_DO));
                    tokens.match(Token.KW_DO, STATEMENT_START_SET);
                    StatementNode statement = parseStatement(recoverSet);
                    return new StatementNode.WhileNode(loc, cond, statement);
                });
    }
    

1.4 - Types of AST Nodes

1.4.1 - Expression Nodes

  • The ExpNode class in the tree package implements subclasses for each of the subtypes of ExpNodes
    • ErrorNode, ConstNode, IdentifierNode, VariableNode, BinaryNode, UnaryNode, DereferenceNode, NarrowSubrangeNode, WidenSubrangeNode

    • Each of these nodes have a location and type as well as their own individual parameters

      • For example, a BinaryNode has the constructor:

        public BinaryNode(Location loc, Operator op, ExpNode left, ExpNode right);
        
    • We define more constant declarations that will help us with parsing statements:

      /**
       * Set of tokens that may start a Statement.
       */
      private final static TokenSet STATEMENT_START_SET =
              LVALUE_START_SET.union(Token.KW_WHILE, Token.KW_IF,
                      Token.KW_READ, Token.KW_WRITE,
                      Token.KW_CALL, Token.KW_BEGIN);
      

2.0 - Static Semantics of PL0

🌱 If implemented correctly, the static semantic checker of our compiler should pick up on all of the type errors.

  • Consider the following code - it is syntactically correct but it is not semantically / type correct.

    const C = 42;
    type  S = [-C..C];
    var   b : boolean;
          y : S;
    begin // main
      y := b + 42;   // Addition between boolean and integer
      C := 27;       // Assigmment to a constant
      if y then y := 0 else y := 1 // y as condition (subrange type) is not boolean
    end
    

2.1 - PL0 Concrete vs Abstract Syntax

🌱 Our static semantics rules determine what we allow (and conversely, what we don’t allow) in our programming language grammar

  • The parsing phase of the compiler parses the concrete syntax of the program and generates an abstract syntax tree
  • The (concrete) syntax of a programming language describes the form of the language when it is treated as a sequence of terminal symbols (lexical tokens)
  • The abstract syntax of a language represents the same constructs as a data structure, an abstract syntax tree
    • All static semantics rules are expressed using the abstract syntax of the language

2.1.1 - Abstract Syntax of PL0

program    ::==    blockblock    ::== blk(ds,s)ds    ::== iddd    ::==    const(c)              | type(t)            | var(t)                     | proc(block)c    ::== n | id | op(-_,c)t    ::==    id | [c..c]s    ::== assign(lv,e)                 | write(e)                 | read(lv)               | call(lv)               | if(e,s,s)                   | while(e,s)                   | list(seq s)lv    ::==    ide    :==    n | lv | op(unary, e) | op(binary, (e,e))unary    ::==    -_binary    ::==    _+_ | _-_ | _*_ | _/_ | _=_ | __ | _<_ | __ | _>_ | __\text{program\ \ \ \ ::==\ \ \ \ block}\\ \text{block\ \ \ \ ::== blk(ds,s)}\\ \text{ds\ \ \ \ ::== id}\mapsto \text{d}\\ \text{d\ \ \ \ ::==\ \ \ \ const(c)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ | type(t)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ | var(t)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | proc(block)}\\ \text{c\ \ \ \ ::== n | id | op(-\_,c)}\\ \text{t\ \ \ \ ::==\ \ \ \ id | [c..c]}\\ \text{s\ \ \ \ ::== assign(lv,e)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | write(e)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | read(lv)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | call(lv)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | if(e,s,s)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | while(e,s)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | list(seq s)}\\ \text{lv\ \ \ \ ::==\ \ \ \ id}\\ \text{e\ \ \ \ :==\ \ \ \ n | lv | op(unary, e) | op(binary, (e,e))}\\ \text{unary\ \ \ \ ::==\ \ \ \ -\_}\\ \text{binary\ \ \ \ ::==\ \ \ \ \_+\_ | \_-\_ | \_*\_ | \_/\_ | \_=\_ | \_}\ne\text{\_ | \_<\_ | \_}\le\text{\_ | \_>\_ | \_}\ge\text{\_}

  • program    ::==    block\color{lightblue}\text{program\ \ \ \ ::==\ \ \ \ block} A program is a block
  • block    ::==    blk(ds,s)\color{lightblue}\text{block\ \ \ \ ::==\ \ \ \ blk(ds,s)} A block has both declarations (ds) and statements (s)
  • ds    ::==    idd\color{lightblue}\text{ds\ \ \ \ ::==\ \ \ \ id}\mapsto\text{d} A declaration is a mapping from identifiers to individual declarations
  • d    ::==    const(c) | type(t) | var(t) | proc(block)\color{lightblue}\text{d\ \ \ \ ::==\ \ \ \ const(c) | type(t) | var(t) | proc(block)} Each individual declaration is either:
    • A constant
    • A type
    • A variable
    • A procedure
  • c    ::== n | id | op(-_,c)\color{lightblue}\text{c\ \ \ \ ::== n | id | op(-\_,c)} A constant has a parameter c, which is either a number, identifier, or unary operator applied to a number
  • t    ::==    id | [c..c]\color{lightblue}\text{t\ \ \ \ ::==\ \ \ \ id | [c..c]} A type t\text{t} is either an identifier or subrange from one constant to another

\color{lightblue}{\text{s\ \ \ \ ::== assign(lv,e) | write(e)}\ \text{ | read(lv)}\ \text{ | call(lv)}\ \text{ | if(e,s,s)}\ \text{ | while(e,s)}\ \text{ | list(seq s)}\}

A statement is either an assignment, a write operation, read operation, call operation, conditional statement, while loop or a list of sequences - $\color{lightblue}lv\ \ \ \ ::==\ \ \ \ id$ A left value is an identifier - $\color{lightblue} \text{e\ \ \ \ :==\ \ \ \ n | lv | op(unary, e) | op(binary, (e,e))}$ Expressions are either numbers, left values, unary operators or binary operators. - $\color{lightblue}\text{unary\ \ \ \ ::==\ \ \ \ -\_}$ Our starting unary operator is the $\color{lightblue}-$ symbol (the underscore indicates where arguments shall go.) - $\color{lightblue}\text{binary\ \ \ \ ::==\ \ \ \ \_+\_ | \_-\_ | \_*\_ | \_/\_ | \_=\_ | \_}\ne\text{\_ | \_<\_ | \_}\le\text{\_ | \_>\_ | \_}\ge\text{\_}$ The binary operators in PL0 are indicated above, where the underscore indicates where arguments shall go ### 2.1.2 - Abstract Syntax Example 🌱 Consider the following code, and its’ abstract syntax. ```pascal const C = 42; type S = [-C..C]; var b : boolean; y : S; ``` - From this, let’s construct it’s abstract syntax - Let’s first map out the declarations, $\color{lightblue}\text{ds}$

\color{lightblue}\text{ds}=\{C\mapsto\text{const(42),}\\
\text{S}\mapsto\text{type([op(-\_,C)..C]}\\
\text{b}\mapsto\text{var(boolean),}\\
\text{y}\mapsto\text{var(S)}

\}
$$
  • Next, let’s map out the statements:

    s=assign(y, op(_+_,(op(-_,C),27)))\color{lightblue}\text{s}=\text{assign(y, op(\_+\_,(op(-\_,C),27)))}

    • What does this actually mean?
    1. We assign the value of op(_+_,(op(-_,C),27))\color{lightblue}\text{op(\_+\_,(op(-\_,C),27))} to the variable y\color{lightblue}\text{y}
    2. In the statement op(_+_,(op(-_,C),27))\color{lightblue}\text{op(\_+\_,(op(-\_,C),27))}, we are applying the binary operator +\color{lightblue}\text{+} to two arguments - op(-_,C)\color{lightblue}\text{op(-\_,C)} and 27\color{lightblue}\text{27}.
    3. The statement op(-_,C)\color{lightblue}\text{op(-\_,C)} applies the unary operator \color{lightblue}- to the constant 27\color{lightblue}27

2.2 - PL0 Types - Scalar and Reference Types

  • The scalar types are:

    • Integers (type int)
    • Booleans (type boolean)
    • Subrange, where subrange(T, lower, upper) is the subrange of values in base type T from lower to upper where lowerupper\text{lower}\le\text{upper}
    • These are our primitive, predefined types.
  • There are also reference (or location) types, ref(T) that describe variables

    • To fully describe variables, we not only need to know its type, but also where it’s stored
    • Hence, variables are reference types.
    const  C = 42;          // C has type int
    type   S = [-C..C];     // S is the type subrange(int, -42, 42)
    var    b : boolean;     // b has type ref(boolean)
           y : S;           // y has type ref(subrange(int, -42, 42)
    
  • There are product types, where T1×T2\color{lightblue}T_1\times T_2 is the type of pairs (x,y)\color{lightblue}(x,y) for xT1,yT2\color{lightblue}x\in T_1,y\in T_2

  • There are function types, where T1T2\color{lightblue}T_1\rightarrow T_2is the type of functions with arguments of type T1\color{lightblue}T_1 and results of type T2\color{lightblue}T_2

  • We can combine product types and function types to create more sophisticated operators:

    _-\_intint\text{int}\rightarrow\text{int}
    _+_\_+\_int×intint\text{int}\times\text{int}\rightarrow\text{int}
    _=_\_=\_int×intboolean\text{int}\times\text{int}\rightarrow\text{boolean}
    _=_\_=\_boolean×booleanboolean\text{boolean}\times\text{boolean}\rightarrow\text{boolean}
    • Note here that the definition for equality is overloaded - for both int×int\color{lightblue}\text{int}\times\text{int} and boolean×boolean\color{lightblue}\text{boolean}\times\text{boolean}
  • The following summarises the semantic types shown above:

    T  ::==  int | boolean | subrange(T,Z,Z) | ref(T) | T×T | TT\color{lightblue}T\ \ ::==\ \ \text{int | boolean | subrange(T,}\Z,\Z\text{) | ref(T) | T}\times\text{T | T}\rightarrow\text{T}

    • Where T for a subrange must be int\color{lightblue}\text{int} or boolean\color{lightblue}\text{boolean}
    • Where Z\color{lightblue}\Z stands for the integers

3.0 - Symbol Table

🌱 In the Parsing phase of a compiler, an AST and Symbol Table are created.

  • Entries in the symbol table are mappings from identifiers (id\color{lightblue}\text{id}) to symbol table entries SymEntry\color{lightblue}\text{SymEntry}.

    symsidSymEntry\color{lightblue}\text{syms}\in\text{id}\mapsto\text{SymEntry}

  • We have different types for Constants, Types, Variables and Procedures in our symbol table

    SymEntry  ::== ContEntry(T,Z) | TypeEntry(T) | VarEntry(T) | ProcEntry(block)\color{lightblue}\small\text{SymEntry\ \ ::== ContEntry(T,}\Z\text{) | TypeEntry(T) | VarEntry(T) | ProcEntry(block)}

  • From before, we have the following code:

    const  C = 42;          // C has type int
    type   S = [-C..C];     // S is the type subrange(int, -42, 42)
    var    b : boolean;     // b has type ref(boolean)
           y : S;           // y has type ref(subrange(int, -42, 42)
    
  • And we’ve generated its declarations

    ds={Cconst(42),Stype([op(-_,C)..C]bvar(boolean),yvar(S)}\color{lightblue}\text{ds}=\{C\mapsto\text{const(42),}\\ \text{S}\mapsto\text{type([op(-\_,C)..C]}\\ \text{b}\mapsto\text{var(boolean),}\\ \text{y}\mapsto\text{var(S)} \}

  • We then generate the symbol table entries for the declarations

    syms = {CConstEntry(int, 42),STypeEntry(subrange(int,-42,42)),bVarEntry(ref(boolean)),yVarEntry(ref(subrange(int,-42,42))}\color{lightblue}\text{syms = \{C}\mapsto \text{ConstEntry(int, 42),}\\ \text{S}\mapsto\text{TypeEntry(subrange(int,-42,42)),}\\ \text{b}\mapsto\text{VarEntry(ref(boolean)),}\\ \text{y}\mapsto\text{VarEntry(ref(subrange(int,-42,42))} \}

3.1 - Well-Typed Expressions

  • Expressions need to be well-typed and we need to be able to determine the type of an expression

  • We have type inferences which can be denoted as:

    symse:T\color{lightblue}\text{syms}\vdash e:T

    • Which essentially states that in the context of symbol table syms\color{lightblue}\text{syms}, the expression e\color{lightblue}\text{e} is well-typed and has type T\color{lightblue}\text{T}

3.2 - Rules of Static Semantics

3.2.1 - Integer Value

symsn:int\color{lightblue}\text{syms}\vdash n : \text{int}

  • This rule has a conclusion without any premises.
  • Any expression n\color{lightblue}\text{n}, where n\color{lightblue}\text{n} is an integer is well typed, and has type int\color{lightblue}\text{int}

Example: Type Inference for 27

syms27:int\color{lightblue}\text{syms}\vdash 27 : \text{int}

3.2.2 - Symbolic Constant

iddom(syms)syms(id) = ConstEntry(T,v)symsid : T\color{lightblue}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id) = ConstEntry(T,v)}}\\ \text{syms}\vdash \text{id : T}

  • This is essentially saying that an identifier id\color{lightblue}\text{id} of type T\color{lightblue}\text{T} is well typed iff:
    1. If the identifier id\color{lightblue}\text{id} is in the domain of the symbol table (i.e. iddom(syms)\color{lightblue}\text{id}\in\text{dom(syms)}) and;
    2. If the entry associated with that identifier syms(id)\color{lightblue}\text{syms(id)} is of type ConstEntry(T,v)\color{lightblue}\text{ConstEntry(T,v)}

Example: Type Inference for Constant C

  • From before, we have the following entry in our symbol table

    syms={,CConstEntry(int,42),}\color{lightblue}\text{syms}=\{\cdots\text{,C}\mapsto\text{ConstEntry(int,42),}\cdots\}

  • So we know that Cdom(syms)\color{lightblue}\text{C}\in\text{dom(syms)} and additionally, from the symbol table entry we know that syms(C)=ConstEntry(int,42)\color{lightblue}\text{syms(C)=ConstEntry(int,42)}.

  • Therefore, symsC : int\color{lightblue}\text{syms}\vdash\text{C : int}

3.2.3 - Variable Identifier

iddom(syms)syms(id)=VarEntry(T)symsid : T\color{lightblue}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id)=VarEntry(T)}}\\ \text{syms}\vdash\text{id : T}

  • This is essentially saying that an identifier id\color{lightblue}\text{id} of type T\color{lightblue}\text{T} is well typed iff:
    1. If the identifier id\color{lightblue}\text{id} is in the domain of the symbol table (i.e. iddom(syms)\color{lightblue}\text{id}\in\text{dom(syms)}) and;
    2. If the entry associated with that identifier syms(id)\color{lightblue}\text{syms(id)} is of type VarEntry(T,v)\color{lightblue}\text{VarEntry(T,v)}

Example: Type Inference for b

  • From before, we have the following entry in our symbol table

    syms={,bVarEntry(ref(boolean)),}\color{lightblue}\text{syms}=\{\cdots\text{,b}\mapsto\text{VarEntry(ref(boolean)),}\cdots\}

3.2.4 - Unary Negation

symse : intsymsop(-_,e) : int\color{lightblue}\underline{\text{syms}\vdash {\text{e : int}}}\\ \text{syms}\vdash \text{op(-\_,e) : int}

  • This is essentially saying that for the unary operator \color{lightblue}- applied to an expression e\color{lightblue}\text{e} is well typed in the context of the symbol table syms\color{lightblue}\text{syms} and is of type int\color{lightblue}\text{int} iff:
    • The expression e\color{lightblue}\text{e} is well typed in the context of the same symbol table and is of type integer

Example: Type Inference for -C

symsC : intsymsop(-_,C) : int\color{lightblue}\underline{\text{syms}\vdash {\text{C : int}}}\\ \text{syms}\vdash \text{op(-\_,C) : int}

  • Therefore, we only need to show that symsC : int\color{lightblue}\text{syms}\vdash {\text{C : int}} to prove that symsop(-_,C) : int\color{lightblue}\text{syms}\vdash\text{op(-\_,C) : int} is well typed.
  • From before, we proved that the symbol with the declaration CConstEntry(int,42)\color{lightblue}C\mapsto\text{ConstEntry(int,42)} is type correct and therefore op(-_,C)\color{lightblue}\text{op(-\_,C)} is type correct.

3.2.5 - Binary Operator

symse1 : T1symse2 : T2syms__ : T1×T2T3symsop(__,e1,e2)) : T3\color{lightblue} \text{syms}\vdash\text{e1 : T1}\\ \text{syms}\vdash\text{e2 : T2}\\ \underline{\text{syms}\vdash\_\odot\_\ :\ T1\times T2\rightarrow T3}\\ \text{syms}\rightarrow \text{op}(\_\odot\_,\text{e1,e2)) : T3}

  • Note that the symbol \color{lightblue}\odot denotes any binary operator.
  • We have the expression op(__,(e1,e2))\color{lightblue}\text{op(}\_\odot\_\text{,(e1,e2))} which essentially denotes the application of an arbitrary binary operator \color{lightblue}\odot being applied to expressions e1, e2\color{lightblue}\text{e1, e2} is type correct, and has type T3\color{lightblue}\text{T3} iff
    1. e1\color{lightblue}e_1 is well typed in the context of the symbol table syms\color{lightblue}\text{syms} and has type T1\color{lightblue}\text{T1}
    2. e2\color{lightblue}e_2 is well typed in the context of the symbol table syms\color{lightblue}\text{syms} and has type T2\color{lightblue}\text{T2}
    3. There exists a binary operator \color{lightblue}\odot that takes in type T1,T2\color{lightblue}\text{T1,T2} and produces type T3\color{lightblue}\text{T3}

Example: Type Inference for -C + 27

  • We want to know whether C+27\color{lightblue}-C+27 is well typed.

  • In this interpretation, \color{lightblue}\odot is replaced with +\color{lightblue}+

  • To know whether C+27\color{lightblue}-C+27 is type correct, we essentially want to know if the following holds true

    symsop(_+_,op(-_,C),27)) : int\color{lightblue}\text{syms}\vdash \text{op(\_+\_,op(-\_,C),27)) : int}

  • To do this, we can use the rule described above, and prove the following three things:

    1. symsop(-_,C) : int\color{lightblue}\text{syms}\vdash\text{op(-\_,C) : int} Is the first argument op(-_,C)\color{lightblue}\text{op(-\_,C)} or -C\color{lightblue}\text{-C} type correct, and is it an integer?
      • In the context of a symbol table with the following declaration:

        syms={,CConstentry(int,42),}\color{lightblue}\text{syms}=\{\cdots,\text{C}\mapsto\text{Constentry(int,42)},\cdots\}

      • We can conclude that the statement is type correct by Unary Negation, and is in fact an integer

    2. syms27:int\color{lightblue}\text{syms} \vdash27:\text{int} Is the argument 27\color{lightblue}\text{27} type correct, and is it an integer?
      • By Integer Value we can conclude that this statement is type correct and is in fact an integer
    3. syms_+_:int×intint\color{lightblue}\text{syms}\vdash\_+\_ : \text{int} \times \text{int}\rightarrow\text{int} In the context of our symbol table syms\color{lightblue}\text{syms}, is addition well typed, takes input int×int\color{lightblue}\text{int}\times\text{int} and produce a value of type int\color{lightblue}\text{int}
      • This holds true by definition of the addition binary operator.

3.2.6 - Dereference

🌱 The dereference rule allows us to treat an expression of type ref(T)\color{lightblue}\text{ref(T)} as an expression of type T\color{lightblue}\text{T} itself.

symse : ref(T)symse : T\color{lightblue}\underline{\text{syms}\vdash\text{e : ref(T)}}\\ \text{syms}\vdash\text{e : T}

  • In the context of a symbol table syms\color{lightblue}\text{syms}, an expression e\color{lightblue}\text{e} is well typed (& can be treated as if it was of type T\color{lightblue}\text{T}) iff the expression e\color{lightblue}\text{e} is well typed, and of type ref(T)\color{lightblue}\text{ref(T)} in the context of the symbol table syms\color{lightblue}\text{syms}

Example: Type Inference for y

symsy : ref(T)symsy : T\color{lightblue}\underline{\text{syms}\vdash\text{y : ref(T)}}\\ \text{syms}\vdash\text{y : T}

  • Recall that we have an entry in our symbol table:

    syms={, yVarEntry(ref(subrange(int,-42,42))),}\color{lightblue}\text{syms}=\{\cdots\text{, y}\mapsto\text{VarEntry(ref(subrange(int,-42,42))),}\cdots\}

  • And therefore, since y\color{lightblue}\text{y} is well typed and of type ref(subrange(...))\color{lightblue}\text{ref(subrange(...))} so we can reason that y\color{lightblue}\text{y} is wel typed, and can be treated as type subrange(...)\color{lightblue}\text{subrange(...)} as a result of the premise symsref(subrange(int,-42,42))\color{lightblue}\text{syms}\vdash\text{ref(subrange(int,-42,42))} holding.

  • We know that this premise holds in the context of the symbol table syms\color{lightblue}\text{syms} by the Variable Identifier rule.

3.2.7 - Widen Subrange

symse:subrange(T,i,j)symse:T\color{lightblue}\underline{\text{syms} \vdash e :\text{subrange(T,i,j)}}\\ \text{syms}\vdash e : T

  • Suppose we had a subrange of type TT from ii to jj. We can treat this as a variable of type TT

Proof for Type Inference for y - Widen Subrange

ydom(syms)syms(id)=VarEntry(ref(subrange(int, -42, 42)))symsy : ref(subrange(int, -42, 42))symsy : subrange(int, -42, 42)symsy : inty\in\text{dom(syms)}\\ \underline{\text{syms(id)}=\text{VarEntry(ref(subrange(int, -42, 42)))}}\\ \underline{\text{syms}\vdash \text{y : ref(subrange(int, -42, 42))}}\\ \underline{\text{syms}\vdash\text{y : subrange(int, -42, 42)}}\\ \text{syms}\vdash\text{y : int}

Step 1: Use Variable Identifier Rule

  • Since ydom(syms)\color{lightblue}y\in\text{dom(syms)}, we can modify the expression from syms(id)=VarEntry(ref(subrange(int,-42,42))\color{lightblue}\text{syms(id)}=\text{VarEntry(ref(subrange(int,-42,42))} to symsy:ref(subrange(int,-42,42))\color{lightblue}\text{syms}\vdash \text{y:ref(subrange(int,-42,42))}

iddom(syms)syms(id)=VarEntry(T)symsid : T\color{lightblue}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id)=VarEntry(T)}}\\ \text{syms}\vdash\text{id : T}

Step 2: Use Dereference Rule

  • Since we have a statement of the form symse:ref(T)\color{lightblue}\text{syms}\vdash \text{e:ref(T)} we can use the Dereference rule to modify the expression from symsy:ref(subrange(int,-42,42))\color{lightblue}\text{syms}\vdash \text{y:ref(subrange(int,-42,42))} to symsy:subrange(int,-42,42)\color{lightblue}\text{syms}\vdash \text{y:subrange(int,-42,42)}

symse : ref(T)symse : T\color{lightblue}\underline{\text{syms}\vdash\text{e : ref(T)}}\\ \text{syms}\vdash\text{e : T}

Step 3: Use Widen Subrange Rule

  • Since we have a statement of the form symse : subrange(T,i,j)\color{lightblue}\text{syms}\vdash \text{e : subrange(T,i,j)} we can modify the expression from symsy:subrange(int,-42,42)\color{lightblue}\text{syms}\vdash \text{y:subrange(int,-42,42)} to symsy : int\color{lightblue}\text{syms}\vdash \text{y : int}

symse:subrange(T,i,j)symse:T\color{lightblue}\underline{\text{syms} \vdash e :\text{subrange(T,i,j)}}\\ \text{syms}\vdash e : T

3.2.8 - Narrow Subrange

symse : TijT{int, boolean}symse : subrange(T,i,j)\color{lightblue}\text{syms}\vdash\text{e : T}\\ i \le j\\ \underline{T\in\{\text{int, boolean}\}}\\ \text{syms}\vdash\text{e : subrange(T,i,j)}

  • Given that e\color{lightblue}\text{e} is well typed and of type T\color{lightblue}\text{T}, where T{int, boolean}\color{lightblue}\text{T}\in\{\text{int, boolean}\} and ij\color{lightblue}i \le j, then we can say that symse : subrange(T,i,j)\color{lightblue}\text{syms}\vdash\text{e : subrange(T,i,j)}

Example: Type Inference for 27

syms27 : int4242int{int, boolean}syms27 : subrange(int, -42, 42)\color{lightblue}\text{syms}\vdash\text{27 : int}\\ -42 \le42\\ \underline{\text{int}\in\{\text{int, boolean}\}}\\ \text{syms}\vdash\text{27 : subrange(int, -42, 42)}

  • Suppose we were trying to argue that the integer 50 was in the subrange of -42 to 42
    • This wouldn’t cause a static semantics error - we treat it as a runtime error.
    • We treat it as a runtime error as the integer that we’re trying to coerce into the subrange may not be a constant value and we therefore cannot guarantee the value.

3.3 - Well-Formed Statements

  • Statements need to be well-formed

  • Our type inferences will be of the form:

    symsWFStatement(s)\color{lightblue}\text{syms}\vdash\text{WFStatement(s)}

    • This inference states that in the context of the symbol table syms\color{lightblue}\text{syms}, the statement s\color{lightblue}\text{s} is well-formed.

3.3.1 - Assignment

🌱 How do we know whether an assignment is well formed?

  • That is, how do we prove that assign(lv,e)\color{lightblue}\text{assign(lv,e)} is well formed?

  • In the context of the symbol table syms\color{lightblue}\text{syms}, we require:

    • The left value lv\color{lightblue}\text{lv} to be well typed, and has to be a reference to some type T\color{lightblue}\text{T}, ref(T)\color{lightblue}\text{ref(T)}
    • The expression e\color{lightblue}\text{e} to be well typed in the context of the symbol table, and have type T\color{lightblue}\text{T}

    symslv : ref(T)symse : TsymsWFStatement(assign(lv,e))\color{lightblue}\text{syms}\vdash\text{lv : ref(T)}\\ \underline{\text{syms}\vdash\text{e : T}}\\ \text{syms}\vdash\text{WFStatement(assign(lv,e))}


Example - Type Inference for y := 4

symsy : ref(subrange(int, -42,42))syms4 : subrange(int,-42,-42)symsWFStatement(assign(y,4))\color{lightblue}\text{syms}\vdash\text{y : ref(subrange(int, -42,42))}\\ \underline{\text{syms}\vdash\text{4 : subrange(int,-42,-42)}}\\ \text{syms}\vdash\text{WFStatement(assign(y,4))}

  • In this example, we instantiate T\color{lightblue}T to be the subrange from (42,42)\color{lightblue}(-42,42)
  • For the statement to be well formed, we also require that 4\color{lightblue}4 is also of the same type

We can reason that the inference syms4 : subrange(int,-42,42)\color{lightblue}\text{syms}\vdash\text{4 : subrange(int,-42,42)} is correct based on the following:

  • We can use the Narrowing Subrange rule to justify the inference:
    • Since 4242\color{lightblue}-42\le42 and y\color{lightblue}y is an int\color{lightblue}\text{int} we can reason that 4\color{lightblue}4 is well typed and is contained within the subrange type subrange(int, -42, 42)\color{lightblue}\text{subrange(int, -42, 42)}

symse : TijT{int, boolean}symse : subrange(T,i,j)\footnotesize\color{lightblue}\text{syms}\vdash\text{e : T}\\ i \le j\\ \underline{T\in\{\text{int, boolean}\}}\\ \text{syms}\vdash\text{e : subrange(T,i,j)}

We can reason that the inference symsy : ref(subrange(int, -42,42))\color{lightblue}\text{syms}\vdash\text{y : ref(subrange(int, -42,42))} based on the following:

  • We can use the Variable Identifier rule to justify the inference:
    • Since ydom(syms)\color{lightblue}y\in\text{dom(syms)} and y\color{lightblue}\text{y} has a VarEntry of the subrange type:

iddom(syms)syms(id)=VarEntry(T)symsid : T\scriptsize\color{lightblue}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id)=VarEntry(T)}}\\ \text{syms}\vdash\text{id : T}

ydom(syms)syms(y)=VarEntry(ref(subrange(int,-42,42)))symsy : ref(subrange(int,-42,42))\color{lightblue}\text{y}\in\text{dom(syms)}\\ \underline{\text{syms(y)}=\text{VarEntry(ref(subrange(int,-42,42)))}}\\ \text{syms}\vdash\text{y : ref(subrange(int,-42,42))}

3.3.2 - Procedure Call

iddom(syms)syms(id)=ProcEntry(block)symsWFStatement(call(id))\color{lightblue}\text{id}\in\text{dom(syms)}\\ \underline{\text{syms(id)=ProcEntry(block)}}\\ \text{syms}\vdash\text{WFStatement(call(id))}

  • A procedure call will be a well formed statement in the context of the symbol table as long as:
    1. iddom(syms)\color{lightblue}\text{id}\in\text{dom(syms)} We call an identifier that is in the domain of the symbol table
    2. syms(id)=ProcEntry(block)\color{lightblue}\text{syms(id)=ProcEntry(block)} The identifier is a procedure entry.

3.3.3 - Read

🌱 In other programming languages, read and write wouldn’t be primitives, but we do this in PL0 for simplicity.

symslv : ref(T)T=intT=subrange(int,i,j)symsWFStatement(read(lv))\color{lightblue}\text{syms}\vdash\text{lv : ref(T)}\\ \underline{\text{T=int}\vee\text{T=subrange(int,i,j)}}\\ \text{syms}\vdash\text{WFStatement(read(lv))}

  • read(lv)\color{lightblue}\text{read(lv)} is well formed in the context of the symbol table iff:
    • lv : ref(T)\color{lightblue}\text{lv : ref(T)} The left value is a reference type (i.e. something that we can assign to)
    • T=intT=subrange(int,i,j)\color{lightblue}\text{T=int}\vee\text{T=subrange(int,i,j)} The value read has a type of either an integer or subrange of integers.

3.3.4 - Write

symse : intsymsWFStatement(write(e))\color{lightblue}\text{syms}\vdash\text{e : int}\\ \overline{\text{syms}\vdash\text{WFStatement(write(e))}}

  • write(e)\color{lightblue}\text{write(e)} will be well formed in the context of the symbol table iff:
    1. e\color{lightblue}\text{e} is of type integer\color{lightblue}\text{integer}
    2. e\color{lightblue}\text{e} is well typed

3.3.5 - Conditional Rule

symse : booleansymsWFStatement(s1)symsWFStatement(s2)symsWFStatement(if(e,s1,s2))\footnotesize\color{lightblue}\text{syms}\vdash\text{e : boolean}\\ \text{syms}\vdash\text{WFStatement(s1)}\\ \text{syms}\vdash\text{WFStatement(s2)}\\ \overline{\text{syms}\vdash\text{WFStatement(if(e,s1,s2))}}

  • For a conditional (denoted if(e,s1,s2)\color{lightblue}\text{if(e,s1,s2)}) to be well formed, we require:
    • The condition e\color{lightblue}\text{e} to be well-typed, and return a boolean type in the context of the symbol table
    • The statements s1,s2\color{lightblue}\text{s1,s2} to be well formed statements in the context of the symbol table.

Example - Well Formed Conditional

🌱 Show that if x < 0 then y := -x else y := x is a well formed conditional statement.

  • From this statement, we can generate the following:

    symsop(_<_,(x,0)) : booleansymsWFStatement(assign(y,op(-_,x)))symsWFStatement(assign(y,x))symsWFStatement(if(op(_<_,(x,0)),assign(y,op(-_,x)),assign(y,x)))\color{lightblue}\text{syms}\vdash\text{op(\_<\_,(x,0)) : boolean}\\ \text{syms}\vdash\text{WFStatement(assign(y,op(-\_,x)))}\\ \text{syms}\vdash\text{WFStatement(assign(y,x))}\\ \overline{\text{syms}\vdash\text{WFStatement(if(op(\_<\_,(x,0)),assign(y,op(-\_,x)),assign(y,x)))}}

  • From our symbol table, we know that:

    syms={,xVarEntry(ref(int))yVarEntry(ref(int)),}\color{lightblue}\text{syms}=\{\cdots,\\\text{x}\mapsto\text{VarEntry(ref(int))}\\ \text{y}\mapsto\text{VarEntry(ref(int))},\\\cdots\}

Firstly, we prove that the statement op(_<_,(x,0))\color{lightblue}\text{op(\_<\_,(x,0))} is well formed:

That is: symsop(_<_,(x,0)) : boolean\color{lightblue}\text{syms}\vdash\text{op(\_<\_,(x,0)) : boolean}

  1. From the Variable Identifier rule, we can infer that:

xdom(syms)syms(x)=VarEntry(int)syms x : ref(int)\footnotesize\color{lightblue}\text{x}\in\text{dom(syms)}\\ \text{syms(x)=VarEntry(int)}\\ \overline{\text{syms}\vdash\text{ x : ref(int)}}

iddom(syms)syms(id)=VarEntry(T)symsid : T\scriptsize\color{gray}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id)=VarEntry(T)}}\\ \text{syms}\vdash\text{id : T}

  • We know that x\color{lightblue}\text{x} is in the domain of our symbol table as it is inherently in the symbol table:

    syms={,xVarEntry(ref(int)), }\color{lightblue}\text{syms}=\{\cdots,\text{x}\mapsto\text{VarEntry(ref(int)), }\cdots\}

  1. From the Dereference rule, we can infer that:

symsx : ref(int)syms x : int\footnotesize\color{lightblue}\underline{\text{syms}\vdash\text{x : ref(int)}}\\\text{syms}\vdash\text{ x : int}

symse : ref(T)symse : T\footnotesize\color{gray}\underline{\text{syms}\vdash\text{e : ref(T)}}\\ \text{syms}\vdash\text{e : T}

  1. From the Integer Value rule, we can inherently infer that:

syms0 : int\footnotesize\color{lightblue}\text{syms}\vdash\text{0 : int}

symsn:int\color{gray}\text{syms}\vdash n : \text{int}

  1. From the Binary Operator rule, we can inherently infer that:

syms(op(_<_,(x,0)) : boolean)\footnotesize\color{lightblue}\text{syms}\vdash\text{(op(\_<\_,(x,0)) : boolean)}

Secondly, we prove that the statement assign(y,-x)\color{lightblue}\text{assign(y,-x)} is well formed:

That is: symsWFStatement(assign(y,-x))\color{lightblue}\text{syms}\vdash\text{WFStatement(assign(y,-x))}

  1. From the Variable Identifier rule, we can infer that:

ydom(syms)syms(y)=VarEntry(int)syms y : ref(int)\footnotesize\color{lightblue}\text{y}\in\text{dom(syms)}\\ \text{syms(y)=VarEntry(int)}\\ \overline{\text{syms}\vdash\text{ y : ref(int)}}

iddom(syms)syms(id)=VarEntry(T)symsid : T\scriptsize\color{gray}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id)=VarEntry(T)}}\\ \text{syms}\vdash\text{id : T}

  • This is inherently true as y is in the domain of the symbol table BECAUSE it is in the symbol table.

  1. We can also do the same for the variable x\color{lightblue}\text{x}

xdom(syms)syms(x)=VarEntry(int)syms x : ref(int)\footnotesize\color{lightblue}\text{x}\in\text{dom(syms)}\\ \text{syms(x)=VarEntry(int)}\\ \overline{\text{syms}\vdash\text{ x : ref(int)}}

iddom(syms)syms(id)=VarEntry(T)symsid : T\scriptsize\color{gray}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id)=VarEntry(T)}}\\ \text{syms}\vdash\text{id : T}

  1. From the Dereference rule, we can infer that:

symsx : ref(int)syms x : int\footnotesize\color{lightblue}\underline{\text{syms}\vdash\text{x : ref(int)}}\\\text{syms}\vdash\text{ x : int}

symse : ref(T)symse : T\footnotesize\color{gray}\underline{\text{syms}\vdash\text{e : ref(T)}}\\ \text{syms}\vdash\text{e : T}

  1. From the Unary Negation rule, we can infer that:

symsx : intsymsop(-_,x) : int\footnotesize\color{lightblue}\text{syms}\vdash\text{x : int}\\ \overline{\text{syms}\vdash\text{op(-\_,x) : int}}

symse : intsymsop(-_,e) : int\footnotesize\color{gray}\text{syms}\vdash\text{e : int}\\ \overline{\text{syms}\vdash\text{op(-\_,e) : int}}

  1. And then combining Step 1 and Step 4 using the Statement Assignment rule.

symsy : ref(int)symsop(-_,x) : intsymsWFStatement(assign(y,op(-_, x)))\footnotesize\color{lightblue} \text{syms}\vdash\text{y : ref(int)}\\ \text{syms}\vdash\text{op(-\_,x) : int}\\ \overline{\text{syms}\vdash\text{WFStatement(assign(y,op(-\_, x)))}}

Thirdly, we prove the statement assign(y,x)\color{lightblue}\footnotesize\text{assign(y,x)} is well formed:

That is: symsWFStatement(assign(y,x))\color{lightblue}\text{syms}\vdash\text{WFStatement(assign(y,x))}

  1. From the Variable Identifier rule, we can infer that:

ydom(syms)syms(y)=VarEntry(int)syms y : ref(int)\footnotesize\color{lightblue}\text{y}\in\text{dom(syms)}\\ \text{syms(y)=VarEntry(int)}\\ \overline{\text{syms}\vdash\text{ y : ref(int)}}

iddom(syms)syms(id)=VarEntry(T)symsid : T\scriptsize\color{gray}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id)=VarEntry(T)}}\\ \text{syms}\vdash\text{id : T}

  • This is inherently true as y is in the domain of the symbol table BECAUSE it is in the symbol table.

  1. We can also do the same for the variable x\color{lightblue}\text{x}

xdom(syms)syms(x)=VarEntry(int)syms x : ref(int)\footnotesize\color{lightblue}\text{x}\in\text{dom(syms)}\\ \text{syms(x)=VarEntry(int)}\\ \overline{\text{syms}\vdash\text{ x : ref(int)}}

iddom(syms)syms(id)=VarEntry(T)symsid : T\scriptsize\color{gray}\text{id}\in\text{dom(syms)}\\ \text{\underline{syms(id)=VarEntry(T)}}\\ \text{syms}\vdash\text{id : T}

  1. From the Dereference rule, we can infer that:

symsx : ref(int)syms x : int\footnotesize\color{lightblue}\underline{\text{syms}\vdash\text{x : ref(int)}}\\\text{syms}\vdash\text{ x : int}

symse : ref(T)symse : T\footnotesize\color{gray}\underline{\text{syms}\vdash\text{e : ref(T)}}\\ \text{syms}\vdash\text{e : T}

  1. And then combining Step 1 and Step 3 using the Statement Assignment rule.

symsy : ref(int)symsx : intsymsWFStatement(assign(y,x))\footnotesize\color{lightblue} \text{syms}\vdash\text{y : ref(int)}\\ \text{syms}\vdash\text{x : int}\\ \overline{\text{syms}\vdash\text{WFStatement(assign(y,x))}}

Finally, putting it all together:

symsy : ref(int)symsop(-_,x) : intsymsWFStatement(assign(y,op(-_, x)))\footnotesize\color{lightblue} \text{syms}\vdash\text{y : ref(int)}\\ \text{syms}\vdash\text{op(-\_,x) : int}\\ \overline{\text{syms}\vdash\text{WFStatement(assign(y,op(-\_, x)))}}

symsy : ref(int)symsx : intsymsWFStatement(assign(y,x))\footnotesize\color{lightblue} \text{syms}\vdash\text{y : ref(int)}\\ \text{syms}\vdash\text{x : int}\\ \overline{\text{syms}\vdash\text{WFStatement(assign(y,x))}}

syms(op(_<_,(x,0)) : boolean)\footnotesize\color{lightblue}\text{syms}\vdash\text{(op(\_<\_,(x,0)) : boolean)}


  • Combining all of these conclusions using the Conditional Statement rule, we get:

syms(op(_<_,(x,0)) : boolean)symsWFStatement(assign(y,op(-_, x)))symsWFStatement(assign(y,x))symsWFStatement(if(op(_<_,(x,0), assign(y,op(-_,x)),assign(y,x)))\footnotesize\color{lightblue}\text{syms}\vdash\text{(op(\_<\_,(x,0)) : boolean)}\\ \text{syms}\vdash\text{WFStatement(assign(y,op(-\_, x)))}\\ \text{syms}\vdash\text{WFStatement(assign(y,x))}\\ \overline{\text{syms}\vdash\text{WFStatement(if(op(\_<\_,(x,0), assign(y,op(-\_,x)),assign(y,x)))}}

3.3.6 - Iteration (While Rule)

🌱 For a while loop, if the expression is well formed in the context of the symbol table and is a boolean AND if the statement is well formed, then the while statement will be well formed.

symse : booleansymsWFStatement(s)symsWFStatement(while(e,s))\color{lightblue}\footnotesize \text{syms}\vdash\text{e : boolean}\\ \text{syms}\vdash\text{WFStatement(s)}\\ \overline{\text{syms}\vdash\text{WFStatement(while(e,s))}}

3.3.7 - Statement List

🌱 For a statement list, if all statements are well formed in the context of the symbol table, then the statement list will be well formed.

selems(ls)  (syms  WFStatement(s))symsWFStatement(list(ls))\footnotesize\color{lightblue} \underline{\forall\text{s}\in\text{elems(ls)}\ \bullet\ \text{(syms }\vdash \text{ WFStatement(s))}}\\ \text{syms}\vdash\text{WFStatement(list(ls))}

4.0 - Implementation of Static Semantic Checking

🌱 The PL0 Static Checker is implemented in the tree package.

4.1 - Abstract Syntax Tree Classes

  • The tree package contains all of the abstract syntax tree classes.
    • The StatementNode class implements all of the statement classes

      s    ::== assign(lv,e)                 | write(e)                 | read(lv)               | call(lv)               | if(e,s,s)                   | while(e,s)                   | list(seq s)\color{lightblue}\footnotesize\text{s\ \ \ \ ::== assign(lv,e)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | write(e)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | read(lv)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | call(lv)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | if(e,s,s)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | while(e,s)}\\ \text{\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | list(seq s)}\\

    • The ExpNode class implements all of the expression classes

      e    :==    n | lv | op(unary, e) | op(binary, (e,e))\footnotesize\color{lightblue}\\\text{e\ \ \ \ :==\ \ \ \ n | lv | op(unary, e) | op(binary, (e,e))}\\

4.2 - Symbol Table Implementations

  • The syms package contains all of the data type implementations from the symbol table.
    • Type is an abstract class which other classes can use to implement their own desired behaviour.
    • E.g. ScalarType, ProductType, ReferenceType, FunctionType
    • In the Predefined class, we configure all of our predefined types and symbols in our symbol table.
      • E.g. Scalars - Integers and Booleans
      • Predefined operators (the actual symbol is defined in tree/Operator.java as enum)
predefined.addOperator(Operator.EQUALS_OP, ErrorHandler.NO_LOCATION, LOGICAL_BINARY);
predefined.addOperator(Operator.NEQUALS_OP, ErrorHandler.NO_LOCATION, LOGICAL_BINARY);
predefined.addOperator(Operator.NEG_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_UNARY);
predefined.addOperator(Operator.ADD_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.SUB_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.MUL_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.DIV_OP, ErrorHandler.NO_LOCATION, ARITHMETIC_BINARY);
predefined.addOperator(Operator.EQUALS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.NEQUALS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.GREATER_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.LESS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.GEQUALS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
predefined.addOperator(Operator.LEQUALS_OP, ErrorHandler.NO_LOCATION,
        INT_RELATIONAL_TYPE);
  • Notice that our addOperator function takes three inputs:
    • Note that when we define these, we use the ErrorHandler.NO_LOCATION as a location - they’re predefined and thus are not defined in the source code.
SymEntry.OperatorEntry addOperator(Operator op, Location loc, Type.FunctionType type);
  • For our arithmetic operators, we define a FunctionType above:
FunctionType ARITHMETIC_BINARY = new FunctionType(PAIR_INTEGER_TYPE, INTEGER_TYPE);

4.3 - Static Checker Implementation

  • The static checker traverses the abstract syntax tree and type checks the tree as it traverses through

    • As we traverse the AST, we may be updating it with type information
    • The tree traversal is done using the Visitor pattern
  • Consider the following code, which is essentially an abstraction and simplification of an AST

    public abstract class Tree {
      public abstract void accept(Visitor v);
      public static class ATree extends Tree {
        // Tree class with two children
        Tree t1, t2;
        
        Public ATree(Tree t1, Tree t2) {
          super();  this.t1 = t1;  this.t2 = t2;
        }
    
        public void accept(Visitor v) {  v.visitATree(this);  }
      }
    
      public static class BTree extends Tree {
        // A leaf node of the tree (i.e. has no subtrees)
        int n;
    
        public BTree(int n) {
          super();  this.n = n;
        }
    
        public void accept(Visitor v) {  v.visitBTree(this);  }
      }
    
      public static class CTree extends Tree {
        // Tree subclass with three children
        Tree t1,t2,t3;
    
        public CTree extends Tree {
          public CTree(Tree t1, Tree t2, Tree t3) {
            super();  this.t1 = t1;  this.t2 = t2;  this.t3 = t3;
          }
        }
      }
    }
    

4.3.1 - Visitor Pattern

  • We would like to create the visit methods shown here to implement tree traversal using the visitor pattern.
  • Create individual methods for visiting ATree, BTree and CTree
  • Ideally, we’d just like to do something like this:
public class TraversalIdeal {
  public void visit(Tree.ATree t) {
    visit(t.t1);
    visit(t.t2);
  }
  public void visit(Tree.BTree t) {
     System.out.println(t.n);
  }
  public void visit(Tree.CTree t) {
    visit(t.t1);
    visit(t.t2);
    visit(t.t3);
  }
}
  • However, we don’t know actually know the (dynamic) type of t.tn - we know that it’s a Tree (static type), but we don’t know what implementation of Tree it is.

    • The compiler is looking for a method of the form:

      public void visit(Tree t) { ... }
      
  • We can solve this by using the visitor pattern - we create an interface that has a traversal method for each tree subtype.

    public interface Visitor {
      public void visitATree(Tree.ATree t);
    	public void visitBTree(Tree.BTree t);
    	public void visitCTree(Tree.CTree t);
    }
    
  • We then add an accept method in each of our tree implementations:

    public abstract class Tree {
      public abstract void accept(Visitor v);
      public static class ATree extends Tree {
    		Tree t1,t2;
        public ATree(Tree t1, Tree t2){ super(); this.t1 = t1; this.t2 = t2; }
        public void accept(Visitor v) { v.visitATree(this); }
      }
    }
    
  • And then our Traversal class implements these methods:

    public class Traversal implements Visitor {
      @Override
      public void visitATree(Tree.ATree t) {
        // Traversal using the aceept() method instead of visit() method
        // Dynamically dispatches the call to the right visitXTree method.
    		t.t1.accept(this);  t.t2.accept(this); 
    	}
    }