Types

Our specifications and implementations of value-of specify and implement the dynamic semantics of expressions.

Expressions can also have static semantics, which concern the properties of expressions that can be deduced without executing the expressions.

Type safety is an important property of expressions. Whether type safety is static or dynamic depends upon the programming language. If type safety is a static property, then we say the language is strongly typed.

Some programming languages allow the type of an expression to be calculated without executing the expression. We say these languages are statically typed.

A language can be statically typed without being strongly typed. C and C++, for example, are statically typed but not strongly typed, because type safety is not a static property of those languages. The problem is that the C/C++ type system is unsound. Although the type of a C/C++ expression can be calculated statically, that type is not always a reliable prediction of the expression's value at run time.

For example:

#include <stdio.h>
#include <stdlib.h>

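double f (double * p) {
  *p = 3.14159;               /* stores a double wherever p points */
  return *p;
}

int main (int argc, char* argv[]) {
  int n = 12345;
  double * p = (void *) &n;   /* type-unsafe: p claims to point to a double but points to an int */
  double x = f(p);            /* overwrites n, and more if a double is larger than an int */
  double y = f(p+1);          /* writes past n */
  double z = f(p+2);          /* writes even farther past n */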
  printf ("n =  %d\n", n);
  printf ("x =  %lf\n", x);
}

What output is printed by that program?

On one machine, the output was:

n =  1074340345
x =  3.141590

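On another machine, the program terminated with a segmentation fault. The problem is that the program is not type-safe: the static type of p is double *, but p actually points to an int, so every assignment through p, p+1, and p+2 writes a double into memory that was never set aside to hold one.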
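The value printed for n on the first machine is not arbitrary. As a rough illustration (in Python rather than C), the following sketch reinterprets the first four bytes of the double 3.14159 as an int. The 4-byte int, the 8-byte IEEE 754 double, and the big-endian byte order are assumptions; the notes do not say what the first machine was.

```python
import struct

# Assumptions: 8-byte IEEE 754 double, 4-byte int, big-endian byte order.
double_bytes = struct.pack(">d", 3.14159)

# Under those assumptions, n occupies the first 4 of the 8 bytes
# that the assignment *p = 3.14159 writes.
n_after_write = struct.unpack(">i", double_bytes[:4])[0]

print(n_after_write)  # prints 1074340345
```

That matches the output shown above, which is consistent with (but does not prove) a big-endian machine. On a little-endian machine the same four bytes of n would receive the other half of the double.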

Definition of Type Safety for LET, PROC, or LETREC

For every evaluation of a variable, the variable is bound.
For every evaluation of a difference expression (diff-exp exp₁ exp₂), the values of exp₁ and exp₂ are both numbers.
For every evaluation of an expression of the form (zero?-exp exp₁), the value of exp₁ is a number.
For every evaluation of a conditional expression (if-exp exp₁ exp₂ exp₃), the value of exp₁ is a boolean.
For every evaluation of a procedure call (call-exp exp₁ exp₂), the value of exp₁ is a procedure.

If one of those conditions is violated, we call it a type error. (Hence not all errors are type errors.) We say that a LET, PROC, or LETREC program is type-safe if and only if its execution cannot possibly involve a type error.
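To see where those five conditions arise at run time, here is a minimal sketch of value-of in Python rather than in the notation used in these notes. The tuple encoding of expressions, the dict used as an environment, and the names value_of and RuntimeTypeError are assumptions made for illustration only.

```python
class RuntimeTypeError(Exception):
    """Raised when one of the five type-safety conditions is violated."""

def is_number(v):
    # Python's bool is a subclass of int, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool)

def value_of(exp, env):
    tag = exp[0]
    if tag == "const":
        return exp[1]
    if tag == "var":
        if exp[1] not in env:
            raise RuntimeTypeError(f"unbound variable {exp[1]}")
        return env[exp[1]]
    if tag == "diff":
        v1, v2 = value_of(exp[1], env), value_of(exp[2], env)
        if not (is_number(v1) and is_number(v2)):
            raise RuntimeTypeError("diff-exp needs two numbers")
        return v1 - v2
    if tag == "zero?":
        v1 = value_of(exp[1], env)
        if not is_number(v1):
            raise RuntimeTypeError("zero?-exp needs a number")
        return v1 == 0
    if tag == "if":
        v1 = value_of(exp[1], env)
        if not isinstance(v1, bool):
            raise RuntimeTypeError("if-exp needs a boolean test")
        return value_of(exp[2] if v1 else exp[3], env)
    if tag == "let":
        _, var, exp1, body = exp
        return value_of(body, {**env, var: value_of(exp1, env)})
    if tag == "proc":
        _, var, body = exp
        return ("closure", var, body, env)
    if tag == "call":
        rator = value_of(exp[1], env)
        rand = value_of(exp[2], env)
        if not (isinstance(rator, tuple) and rator[0] == "closure"):
            raise RuntimeTypeError("call-exp needs a procedure")
        _, var, body, saved_env = rator
        return value_of(body, {**saved_env, var: rand})
    raise ValueError(f"unknown expression: {exp!r}")
```

With this encoding, evaluating ("call", ("const", 0), ("const", 0)), which is the expression (0 0), raises RuntimeTypeError, while ("diff", ("const", 5), ("const", 3)) evaluates to 2.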

Not all LET, PROC, and LETREC programs are type-safe. That leads to the following question:

Is type safety a static property of LET, PROC, or LETREC?

That's the same as asking whether LET, PROC, and LETREC are strongly typed.

It so happens that LET is strongly typed. That is not terribly interesting, however, because LET is not a very expressive language. For example, there is no LET expression exp such that, for all integer values n, let x = n in exp evaluates to the absolute value of n. For another example, it is not possible to write an infinite loop in the LET language.

The interesting question is whether PROC is strongly typed.

If PROC were strongly typed, then type safety would be a static property of PROC programs. In other words, there would be some algorithm that takes an arbitrary PROC program as input and decides whether the program is type-safe. In particular, that algorithm would be able to decide whether an arbitrary program of the form

    if <expression>
       then (0 0)
       else (0 0)

is type-safe.

It should be obvious that programs of that form are type-safe if and only if the <expression> does not halt. If PROC were strongly typed, therefore, then there would be some algorithm that takes an arbitrary expression as input and decides whether the expression halts.
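The reduction can be sketched in Python as follows. The tuple encoding of expressions and the names make_test_program and halts_via are assumptions, and is_type_safe stands for the hypothetical decision algorithm whose existence is being refuted; no such function can actually be written for PROC.

```python
def make_test_program(expression):
    """Build the program: if <expression> then (0 0) else (0 0)."""
    zero_applied_to_zero = ("call", ("const", 0), ("const", 0))
    return ("if", expression, zero_applied_to_zero, zero_applied_to_zero)

def halts_via(is_type_safe, expression):
    """Decide halting, given a (hypothetical) decider for type safety.

    The test program is type-safe exactly when the expression never
    finishes evaluating, so the expression halts exactly when the
    test program is NOT type-safe.
    """
    return not is_type_safe(make_test_program(expression))
```

If is_type_safe existed, halts_via would solve the halting problem, which is the contradiction used in the argument that follows.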

Theorem. For all Turing-complete programming languages, the halting problem is undecidable.

PROC (unlike LET) is Turing-complete. Because the halting problem is undecidable, no algorithm is able to decide whether an arbitrary PROC program is type-safe. In other words, PROC is not strongly typed. Since PROC is a proper subset of the LETREC language, LETREC is not strongly typed either.

The undecidability of the halting problem tells us that no general-purpose programming language can be strongly typed, assuming type safety and strong typing are defined as above.

That's not the answer we want.

We can't have the answer we want.

We can, however, change the definition of type safety and/or strong typing so we can pretend to have the answer we want. The standard way to do that is:

Define a static type system.
Define the well-typed programs.
Show that every well-typed program is type-safe in the sense defined earlier.
Show that well-typedness is statically decidable.
Pretend that's good enough.

That last step means we redefine strongly typed to mean

well-typedness is a static property
well-typedness implies type safety

Assigning a Type to an Expression

We'll start by defining a static type system for PROC:

Typing rules for PROC

(type-of (const-exp num) tenv) = int

(type-of (var-exp var) tenv) = tenv(var)

(type-of exp₁ tenv) = int
(type-of exp₂ tenv) = int
--------------------------------------------------------------------
(type-of (diff-exp exp₁ exp₂) tenv) = int

(type-of exp₁ tenv) = int
----------------------------------------------------------------
(type-of (zero?-exp exp₁) tenv) = bool

(type-of exp₁ tenv) = bool
(type-of exp₂ tenv) = t
(type-of exp₃ tenv) = t
--------------------------------------------------------------------
(type-of (if-exp exp₁ exp₂ exp₃) tenv) = t

(type-of exp₁ tenv) = t₁
(type-of body [var₁=t₁]tenv) = t
------------------------------------------------------------------------
(type-of (let-exp var₁ exp₁ body) tenv) = t

(type-of body [var₁=t₁]tenv) = t₂
----------------------------------------------------------------------------
(type-of (proc-exp var₁ body) tenv) = (t₁ → t₂)

(type-of exp₁ tenv) = (t₁ → t₂)
(type-of exp₂ tenv) = t₁
--------------------------------------------------------------------
(type-of (call-exp exp₁ exp₂) tenv) = t₂

Well-typed PROC programs

Definition. A PROC program (a-program exp) is well-typed if and only if there exists some type t such that the typing rules for PROC can be used to prove (type-of exp tenv₀) = t

where tenv₀ = [i:int,v:int,x:int] is the initial type environment that specifies the types of all variables bound in the standard initial environment.

The next step is to prove

Type Soundness

Theorem. If P is a well-typed PROC program, then P is type-safe.

That theorem is proved by induction on the number of calls to value-of that occur during the evaluation of P.

The next step is to prove that well-typedness is statically decidable.

The usual way to prove the decidability of some problem is to describe an algorithm that decides the problem. Such an algorithm is said to be a decision procedure.

It is easy to describe a decision procedure for determining whether a LET program is well-typed:

Decision Procedure for Well-typedness of LET Programs

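Algorithm. Given a LET program (a-program exp), use the following recursive algorithm, starting with tenv = tenv₀, to decide whether exp is well-typed with respect to the initial type environment tenv₀. Whenever an expression is well-typed, its type is the one given by the corresponding typing rule.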

If exp is a constant expression, then exp is well-typed with respect to tenv.

If exp is a variable x, then exp is well-typed with respect to tenv if and only if x is bound in the type environment tenv.

If exp is of the form (diff-exp exp₁ exp₂), then exp is well-typed with respect to tenv if and only if both exp₁ and exp₂ are well-typed in the type environment tenv and are of type int.

If exp is of the form (zero?-exp exp₁), then exp is well-typed with respect to tenv if and only if exp₁ is well-typed in the type environment tenv and is of type int.

If exp is of the form (if-exp exp₁ exp₂ exp₃), then exp is well-typed with respect to tenv if and only if exp₁, exp₂, and exp₃ are well-typed in the type environment tenv, exp₁ is of type bool, and exp₂ and exp₃ are of the same type.

If exp is of the form (let-exp var₁ exp₁ body), then exp is well-typed with respect to tenv if and only if exp₁ is well-typed in the type environment tenv and body is well-typed in the type environment [var₁=t₁]tenv, where t₁ is the type of exp₁.

That decision procedure is just the obvious algorithm that uses the typing rules for the LET language to compute the type of an expression.
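The decision procedure above can be sketched in Python. This is only one possible rendering: the tuple encoding of expressions, the dict used as a type environment, the strings "int" and "bool" used as types, and the convention of returning None for an ill-typed expression are all assumptions.

```python
def type_of(exp, tenv):
    """Return "int" or "bool" if exp is well-typed in tenv, else None."""
    tag = exp[0]
    if tag == "const":
        return "int"
    if tag == "var":
        return tenv.get(exp[1])        # None when the variable is unbound
    if tag == "diff":
        t1, t2 = type_of(exp[1], tenv), type_of(exp[2], tenv)
        return "int" if t1 == "int" and t2 == "int" else None
    if tag == "zero?":
        return "bool" if type_of(exp[1], tenv) == "int" else None
    if tag == "if":
        t1 = type_of(exp[1], tenv)
        t2 = type_of(exp[2], tenv)
        t3 = type_of(exp[3], tenv)
        # Both branches must be well-typed and have the same type.
        return t2 if t1 == "bool" and t2 is not None and t2 == t3 else None
    if tag == "let":
        _, var, exp1, body = exp
        t1 = type_of(exp1, tenv)
        if t1 is None:
            return None
        return type_of(body, {**tenv, var: t1})
    raise ValueError(f"not a LET expression: {exp!r}")

# The initial type environment tenv0 from the definition of well-typedness.
TENV0 = {"i": "int", "v": "int", "x": "int"}

def is_well_typed(program):
    _, exp = program                   # program is ("a-program", exp)
    return type_of(exp, TENV0) is not None

print(is_well_typed(("a-program", ("zero?", ("diff", ("var", "x"), ("const", 1))))))  # True
```

Every recursive call is on a proper subexpression, so the procedure terminates on every input, which is what makes it a decision procedure.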

If we try to extend that algorithm to proc expressions, we run into a problem:

(type-of body [var₁=t₁]tenv) = t₂
----------------------------------------------------------------------------
(type-of (proc-exp var₁ body) tenv) = (t₁ → t₂)

With the other rules, every type that occurs in the hypotheses of the rule is either a fixed type (such as int) or is the type of some subexpression. With proc expressions, however, it looks like we'd have to guess the type of the bound variable.

There are two standard ways to deal with this problem:

Make the programmer tell us the type of the bound variable.
Make the type checker infer the type of the bound variable.
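As a minimal sketch of the first approach, suppose proc expressions carry the programmer's declared type for the bound variable. The annotated form ("proc", var, var_type, body), the tuple ("->", t₁, t₂) for procedure types, and the None result for ill-typed expressions are all hypothetical choices made for this illustration.

```python
def type_of(exp, tenv):
    """Return the type of exp in tenv, or None if exp is not well-typed."""
    tag = exp[0]
    if tag == "const":
        return "int"
    if tag == "var":
        return tenv.get(exp[1])
    if tag == "diff":
        t1, t2 = type_of(exp[1], tenv), type_of(exp[2], tenv)
        return "int" if t1 == "int" and t2 == "int" else None
    if tag == "proc":
        # The declared type of the bound variable removes the guesswork.
        _, var, var_type, body = exp
        body_type = type_of(body, {**tenv, var: var_type})
        return None if body_type is None else ("->", var_type, body_type)
    if tag == "call":
        rator_type = type_of(exp[1], tenv)
        rand_type = type_of(exp[2], tenv)
        # The operator must have a procedure type whose argument type
        # matches the type of the operand.
        if isinstance(rator_type, tuple) and rator_type[1] == rand_type:
            return rator_type[2]
        return None
    raise ValueError(f"unknown expression: {exp!r}")

decrement = ("proc", "x", "int", ("diff", ("var", "x"), ("const", 1)))
print(type_of(decrement, {}))                           # ('->', 'int', 'int')
print(type_of(("call", decrement, ("const", 5)), {}))   # int
```

With the annotation in hand, the proc rule is handled like every other rule: each type in its hypothesis is either given or computed from a subexpression.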

Each of these approaches has its own advantages:

Make the programmer tell us the type of the bound variable.
    - Advantage: The language is easier to implement.
    - Advantage: Programs are easier to understand.
    - Advantage: More sophisticated types can be expressed.

Make the type checker infer the type of the bound variable.
    - Advantage: Programs are easier to write.
    - Advantage: Programs are less cluttered.
    - Advantage: Programs are more general, hence more reusable.
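The second approach can be sketched as well. These notes do not say which inference algorithm to use; the sketch below assumes one common technique, in which the bound variable gets a fresh type variable that is later solved by unification. The tuple encodings and every function name here are assumptions, and only a few expression forms are covered.

```python
import itertools

_fresh_ids = itertools.count()

def fresh_type_variable():
    return ("tvar", next(_fresh_ids))

def is_tvar(t):
    return isinstance(t, tuple) and t[0] == "tvar"

def is_arrow(t):
    return isinstance(t, tuple) and t[0] == "->"

def resolve(t, subst):
    # Follow bindings until t is not a bound type variable.
    while is_tvar(t) and t in subst:
        t = subst[t]
    return t

def occurs(tv, t, subst):
    t = resolve(t, subst)
    if t == tv:
        return True
    return is_arrow(t) and (occurs(tv, t[1], subst) or occurs(tv, t[2], subst))

def unify(a, b, subst):
    """Try to make a and b equal by extending subst; return success."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return True
    if is_tvar(a):
        if occurs(a, b, subst):      # would create an infinite type
            return False
        subst[a] = b
        return True
    if is_tvar(b):
        return unify(b, a, subst)
    if is_arrow(a) and is_arrow(b):
        return unify(a[1], b[1], subst) and unify(a[2], b[2], subst)
    return False

def infer(exp, tenv, subst):
    tag = exp[0]
    if tag == "const":
        return "int"
    if tag == "var":
        return tenv.get(exp[1])
    if tag == "diff":
        t1, t2 = infer(exp[1], tenv, subst), infer(exp[2], tenv, subst)
        if t1 is None or t2 is None:
            return None
        ok = unify(t1, "int", subst) and unify(t2, "int", subst)
        return "int" if ok else None
    if tag == "proc":
        _, var, body = exp
        param_type = fresh_type_variable()   # the "guess", solved later
        body_type = infer(body, {**tenv, var: param_type}, subst)
        return None if body_type is None else ("->", param_type, body_type)
    if tag == "call":
        rator_type = infer(exp[1], tenv, subst)
        rand_type = infer(exp[2], tenv, subst)
        if rator_type is None or rand_type is None:
            return None
        result_type = fresh_type_variable()
        ok = unify(rator_type, ("->", rand_type, result_type), subst)
        return result_type if ok else None
    raise ValueError(f"unknown expression: {exp!r}")

def apply_subst(t, subst):
    t = resolve(t, subst)
    if is_arrow(t):
        return ("->", apply_subst(t[1], subst), apply_subst(t[2], subst))
    return t

def infer_type(exp, tenv):
    subst = {}
    t = infer(exp, tenv, subst)
    return None if t is None else apply_subst(t, subst)

print(infer_type(("proc", "x", ("diff", ("var", "x"), ("const", 1))), {}))  # ('->', 'int', 'int')
```

For the identity procedure ("proc", "x", ("var", "x")) the result still contains an unsolved type variable, which is one way to read the claim that inferred programs are more general. The extra machinery compared with a checker that reads declared types also suggests why this approach is harder to implement.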

Historically, most programming languages have been designed by the same people who implement them, so most programming languages place the burden on users instead of implementors. Although that is a fairly trivial basis for making such an important design decision, it really does seem to have been the most influential factor in most programming languages.

Last updated 22 March 2008.