Our specifications and implementations of value-of specify and implement the dynamic semantics of expressions.
Expressions can also have static semantics, which concern the properties of expressions that can be deduced without executing the expressions.
Type safety is an important property of expressions. Whether type safety is static or dynamic depends upon the programming language. If type safety is a static property, then we say the language is strongly typed.
Some programming languages allow the type of an expression to be calculated without executing the expression. We say these languages are statically typed.
A language can be statically typed without being strongly typed. C and C++, for example, are statically typed but not strongly typed, because type safety is not a static property of those languages. The problem is that the C/C++ type system is unsound. Although the type of a C/C++ expression can be calculated statically, that type is not always a reliable prediction of the expression's value at run time.
For example:
    #include <stdio.h>
    #include <stdlib.h>

    double f (double * p) {
      *p = 3.14159;
      return *p;
    }

    int main (int argc, char* argv[]) {
      int n = 12345;
      double * p = (void *) &n;
      double x = f(p);
      double y = f(p+1);
      double z = f(p+2);
      printf ("n = %d\n", n);
      printf ("x = %lf\n", x);
    }
What output is printed by that program?
On one machine, the output was:
    n = 1074340345
    x = 3.141590
On another machine, the program terminated with a segmentation fault. The problem is that the program is not type-safe.
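For the LET, PROC, and LETREC languages, we want the following conditions to hold throughout the execution of a program:
    When evaluating (diff-exp exp1 exp2), the values of exp1 and exp2 are both numbers.
    When evaluating (zero?-exp exp1), the value of exp1 is a number.
    When evaluating (if-exp exp1 exp2 exp3), the value of exp1 is a boolean.
    When evaluating (call-exp exp1 exp2), the value of exp1 is a procedure.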
If one of those conditions is violated, we call it a type error. (Hence not all errors are type errors.) We say that a LET, PROC, or LETREC program is type-safe if and only if its execution cannot possibly involve a type error.
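The four conditions can be made concrete with a small interpreter that checks each one at run time. The following is only a minimal sketch in Python, not the value-of from our textbook: the tuple encoding of expressions, the Closure class, and the RunTimeTypeError exception are all assumptions made for this illustration.

```python
class RunTimeTypeError(Exception):
    """Raised when one of the four run-time conditions is violated."""


class Closure:
    """A procedure value: bound variable, body, and saved environment."""
    def __init__(self, var, body, env):
        self.var, self.body, self.env = var, body, env


def is_num(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool)


def value_of(exp, env):
    """Evaluate a tuple-encoded expression (hypothetical encoding)."""
    tag = exp[0]
    if tag == "const":
        return exp[1]
    if tag == "var":
        # An unbound variable raises KeyError: an error, but not a type error.
        return env[exp[1]]
    if tag == "diff":
        v1, v2 = value_of(exp[1], env), value_of(exp[2], env)
        if not (is_num(v1) and is_num(v2)):
            raise RunTimeTypeError("diff-exp: operands must be numbers")
        return v1 - v2
    if tag == "zero?":
        v1 = value_of(exp[1], env)
        if not is_num(v1):
            raise RunTimeTypeError("zero?-exp: operand must be a number")
        return v1 == 0
    if tag == "if":
        v1 = value_of(exp[1], env)
        if not isinstance(v1, bool):
            raise RunTimeTypeError("if-exp: test must be a boolean")
        return value_of(exp[2] if v1 else exp[3], env)
    if tag == "let":
        return value_of(exp[3], {**env, exp[1]: value_of(exp[2], env)})
    if tag == "proc":
        return Closure(exp[1], exp[2], env)
    if tag == "call":
        rator = value_of(exp[1], env)
        if not isinstance(rator, Closure):
            raise RunTimeTypeError("call-exp: operator must be a procedure")
        rand = value_of(exp[2], env)
        return value_of(rator.body, {**rator.env, rator.var: rand})
    raise ValueError(f"unknown expression: {exp!r}")
```

With this encoding, ("call", ("const", 0), ("const", 0)) stands for (0 0), and evaluating it raises RunTimeTypeError because the operator is a number rather than a procedure.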
Not all LET, PROC, and LETREC programs are type-safe. That leads to the following question:
Is type safety a static property of LET, PROC, or LETREC?
That's the same as asking whether LET, PROC, and LETREC are strongly typed.
It so happens that LET is strongly typed. That is not terribly interesting, however, because LET is not a very expressive language. For example, there is no LET expression exp such that, for all integer values n, let x = n in exp evaluates to the absolute value of n. For another example, it is not possible to write an infinite loop in the LET language.
The interesting question is whether PROC is strongly typed.
If PROC were strongly typed, then type safety would be a static property of PROC programs. In other words, there would be some algorithm that takes an arbitrary PROC program as input and decides whether the program is type-safe. In particular, that algorithm would be able to decide whether an arbitrary program of the form
if <expression> then (0 0) else (0 0)
is type-safe.
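Programs of that form are type-safe if and only if the <expression> does not halt: (0 0) tries to call the number 0 as a procedure, which is a type error, so a type error occurs whenever the evaluation of the <expression> finishes, no matter which branch is taken.
If PROC were strongly typed, therefore, there would be some algorithm that takes an arbitrary expression as input and decides whether the expression halts.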
Theorem. For all Turing-complete programming languages, the halting problem is undecidable.
PROC (unlike LET) is Turing-complete. Because the halting problem is undecidable, no algorithm is able to decide whether an arbitrary PROC program is type-safe. In other words, PROC is not strongly typed. Since PROC is a proper subset of the LETREC language, LETREC is not strongly typed either.
The undecidability of the halting problem tells us that no general purpose programming language can be strongly typed, assuming type safety and strong typing are defined as above.
That's not the answer we want.
We can't have the answer we want.
We can, however, change the definition of type safety and/or strong typing so we can pretend to have the answer we want. The standard way to do that is:
That last step means we redefine strongly typed to mean
We'll start by defining a static type system for PROC:
    (type-of (const-exp num) tenv) = int

    (type-of (var-exp var) tenv) = tenv(var)

    (type-of exp1 tenv) = int
    (type-of exp2 tenv) = int
    -----------------------------------------
    (type-of (diff-exp exp1 exp2) tenv) = int

    (type-of exp1 tenv) = int
    --------------------------------------
    (type-of (zero?-exp exp1) tenv) = bool

    (type-of exp1 tenv) = bool
    (type-of exp2 tenv) = t
    (type-of exp3 tenv) = t
    --------------------------------------------
    (type-of (if-exp exp1 exp2 exp3) tenv) = t

    (type-of exp1 tenv) = t1
    (type-of body [var1:t1]tenv) = t
    --------------------------------------------
    (type-of (let-exp var1 exp1 body) tenv) = t

    (type-of body [var1:t1]tenv) = t2
    ------------------------------------------------
    (type-of (proc-exp var1 body) tenv) = (t1 → t2)

    (type-of exp1 tenv) = (t1 → t2)
    (type-of exp2 tenv) = t1
    -----------------------------------------
    (type-of (call-exp exp1 exp2) tenv) = t2
The next step is to define what it means for a program to be well-typed:
Definition. A PROC program (a-program exp) is well-typed if and only if there exists some type t such that the typing rules for PROC can be used to prove (type-of exp tenv0) = t, where tenv0 = [i:int,v:int,x:int] is the initial type environment that specifies the types of all variables bound in the standard initial environment.
The next step is to prove that every well-typed program is type-safe:
Theorem. If P is a well-typed PROC program, then P is type-safe.
That theorem is proved by induction on the number of calls to value-of that occur during the evaluation of P.
The next step is to prove that well-typedness is statically decidable.
The usual way to prove the decidability of some problem is to describe an algorithm that decides the problem. Such an algorithm is said to be a decision procedure.
It is easy to describe a decision procedure for determining whether a LET program is well-typed:
Algorithm. Given a LET program (a-program exp), use the following algorithm to decide whether exp is well-typed with respect to the initial type environment tenv0. In the cases below, tenv is the current type environment, which is tenv0 at the outermost level.
If exp is a constant expression, then exp is well-typed with respect to tenv.
If exp is a variable x, then exp is well-typed with respect to tenv if and only if x is bound in the type environment tenv.
If exp is of the form (diff-exp exp1 exp2), then exp is well-typed with respect to tenv if and only if both exp1 and exp2 are well-typed in the type environment tenv and are of type int.
If exp is of the form (zero?-exp exp1), then exp is well-typed with respect to tenv if and only if exp1 is well-typed in the type environment tenv and is of type int.
If exp is of the form (if-exp exp1 exp2 exp3), then exp is well-typed with respect to tenv if and only if exp1, exp2, and exp3 are well-typed in the type environment tenv, exp1 is of type bool, and exp2 and exp3 are of the same type.
If exp is of the form (let-exp var1 exp1 body), then exp is well-typed with respect to tenv if and only if exp1 is well-typed in the type environment tenv and body is well-typed in the type environment [var1:t1]tenv, where t1 is the type of exp1.
That decision procedure is just the obvious algorithm that uses the typing rules for the LET language to compute the type of an expression.
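The decision procedure for LET can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the type checker from our textbook: expressions are encoded as tuples such as ("diff", e1, e2), types are the strings "int" and "bool", type environments are dictionaries, and the names type_of, well_typed, and TENV0 are chosen for this sketch.

```python
# Initial type environment tenv0 = [i:int, v:int, x:int].
TENV0 = {"i": "int", "v": "int", "x": "int"}


def type_of(exp, tenv):
    """Return the type of a LET expression, or None if it is not well-typed.

    Expressions use a hypothetical tuple encoding:
    ("const", n), ("var", x), ("diff", e1, e2), ("zero?", e1),
    ("if", e1, e2, e3), ("let", x, e1, body).
    """
    tag = exp[0]
    if tag == "const":
        return "int"
    if tag == "var":
        # Well-typed if and only if the variable is bound in tenv.
        return tenv.get(exp[1])
    if tag == "diff":
        both_int = (type_of(exp[1], tenv) == "int"
                    and type_of(exp[2], tenv) == "int")
        return "int" if both_int else None
    if tag == "zero?":
        return "bool" if type_of(exp[1], tenv) == "int" else None
    if tag == "if":
        t1 = type_of(exp[1], tenv)
        t2 = type_of(exp[2], tenv)
        t3 = type_of(exp[3], tenv)
        if t1 == "bool" and t2 is not None and t2 == t3:
            return t2
        return None
    if tag == "let":
        t1 = type_of(exp[2], tenv)
        if t1 is None:
            return None
        # Check the body in the extended environment [var1:t1]tenv.
        return type_of(exp[3], {**tenv, exp[1]: t1})
    raise ValueError(f"not a LET expression: {exp!r}")


def well_typed(exp):
    """Decide whether the LET program (a-program exp) is well-typed."""
    return type_of(exp, TENV0) is not None
```

Every recursive call is on a proper subexpression, so the procedure always terminates, which is what makes it a decision procedure.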
If we try to extend that algorithm to proc expressions, we run into a problem:

    (type-of body [var1:t1]tenv) = t2
    ------------------------------------------------
    (type-of (proc-exp var1 body) tenv) = (t1 → t2)

With the other rules, every type that occurs in the hypotheses of the rule is either a fixed type (such as int) or is the type of some subexpression. With proc expressions, however, it looks like we'd have to guess the type of the bound variable.
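There are two standard ways to deal with this problem: require the programmer to declare the type of every bound variable, or have the implementation infer those types.
Each of these approaches has its own advantages and disadvantages: declared types place the burden on the programmer, while inferred types place the burden on the implementor.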
Historically, most programming languages have been designed by the same people who implement them, so most programming languages place the burden on users instead of implementors. Although that is a fairly trivial basis for making such an important design decision, it really does seem to have been the most influential factor in most programming languages.
If we make the programmer tell us the types of bound variables, then we'll have to change the syntax of PROC and LETREC to accommodate those types. Section 7.3 of our textbook shows one possible syntax, and also implements a type checker along the lines of the type checker for LET.
Section 7.4 of our textbook changes the syntax yet again. This is unnecessary for PROC and LETREC, but it facilitates certain extensions of PROC and LETREC that are explored in the exercises.
We will show how to infer types for the original syntax of LET, PROC, and LETREC.
We have already described an algorithm that decides whether a LET program is well-typed; that algorithm works by inferring a type for every subexpression that appears within the program.
To extend that algorithm to the PROC and LETREC languages, we must find some alternative to guessing the type of each bound variable.
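When designing algorithms, there are two standard techniques for avoiding such guesswork: try every possibility, backtracking whenever a choice fails, or introduce variables that stand for the unknown quantities, generate the constraints those variables must satisfy, and then solve the constraints.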
The first of those techniques generally leads to an exponential algorithm, which is impractical. For type inference (and many other static analyses), the second technique is standard.
Algorithm. Given a LETREC program, translate the program into an equivalent program (a-program exp) by renaming all bound variables so no variable is bound in more than one place. Invent a fresh type variable tx for every variable x that is bound within exp. Invent a fresh type variable te for every subexpression e that occurs within exp. Use the following algorithm to generate the type constraints that the tx and te must satisfy. Then solve those type constraints for the values of the tx and te.
If the type constraints have a solution, then the program is well-typed. If the type constraints are not satisfiable, then the program is not well-typed.
Algorithm. Given a LETREC expression exp, return the set of type constraints generated from exp. Each case below lists the constraints contributed by exp itself; the complete set also contains the constraints generated in the same way from every subexpression of exp.
If exp is (const-exp n), then the type constraints are
    texp = int
If exp is (var-exp x), then the type constraints are
    texp = tx
If exp is (diff-exp e1 e2), then the type constraints are
    te1 = int
    te2 = int
    texp = int
If exp is (zero?-exp e1), then the type constraints are
    te1 = int
    texp = bool
If exp is (if-exp e1 e2 e3), then the type constraints are
    te1 = bool
    te2 = texp
    te3 = texp
If exp is (let-exp x1 e1 body), then the type constraints are
    tx1 = te1
    tbody = texp
If exp is (proc-exp x1 body), then the type constraints are
    texp = (tx1 → tbody)
If exp is (call-exp e1 e2), then the type constraints are
    te1 = (te2 → texp)
If exp is (letrec-exp x0 x1 e1 body), then the type constraints are
    tx0 = (tx1 → te1)
    texp = tbody
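The constraint-generation rules translate almost line for line into code. The sketch below is one possible rendering in Python, under several assumptions that are not part of these notes: expressions are tuples such as ("call", e1, e2), bound variables have already been renamed apart, the type variable for a bound variable x is the string "t_x", the type variables for subexpressions are "t1", "t2", ... numbered in the order the subexpressions are finished, and an arrow type (a → b) is the tuple ("->", a, b).

```python
import itertools


def constraints_of(exp):
    """Return (type variable of exp, list of (lhs, rhs) constraints)."""
    counter = itertools.count(1)
    out = []

    def fresh():
        # Fresh type variable for a subexpression: "t1", "t2", ...
        return f"t{next(counter)}"

    def tvar(x):
        # Type variable for the bound variable x (assumed renamed apart).
        return "t_" + x

    def visit(e):
        tag = e[0]
        if tag == "const":
            te = fresh()
            out.append((te, "int"))
        elif tag == "var":
            te = fresh()
            out.append((te, tvar(e[1])))
        elif tag == "diff":
            t1, t2 = visit(e[1]), visit(e[2])
            te = fresh()
            out.extend([(t1, "int"), (t2, "int"), (te, "int")])
        elif tag == "zero?":
            t1 = visit(e[1])
            te = fresh()
            out.extend([(t1, "int"), (te, "bool")])
        elif tag == "if":
            t1, t2, t3 = visit(e[1]), visit(e[2]), visit(e[3])
            te = fresh()
            out.extend([(t1, "bool"), (t2, te), (t3, te)])
        elif tag == "let":        # ("let", x1, e1, body)
            t1, tb = visit(e[2]), visit(e[3])
            te = fresh()
            out.extend([(tvar(e[1]), t1), (tb, te)])
        elif tag == "proc":       # ("proc", x1, body)
            tb = visit(e[2])
            te = fresh()
            out.append((te, ("->", tvar(e[1]), tb)))
        elif tag == "call":       # ("call", e1, e2)
            t1, t2 = visit(e[1]), visit(e[2])
            te = fresh()
            out.append((t1, ("->", t2, te)))
        elif tag == "letrec":     # ("letrec", x0, x1, e1, body)
            t1, tb = visit(e[3]), visit(e[4])
            te = fresh()
            out.extend([(tvar(e[1]), ("->", tvar(e[2]), t1)), (te, tb)])
        else:
            raise ValueError(f"unknown expression: {e!r}")
        return te

    root = visit(exp)
    return root, out
```

For example, constraints_of(("proc", "x", ("var", "x"))) returns the root type variable "t2" together with the constraints t1 = t_x and t2 = (t_x → t1). A constraint such as t1 = int can appear more than once when two rules generate it; duplicates are harmless.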
We will need an algorithm for solving sets of type constraints, but that algorithm will turn out to be little more than a systematic algorithm for substituting equals for equals. Before we consider that systematic algorithm, let's infer the types for this example:
    let a = proc (i) proc (j) -(i, -(0,j))
    in letrec m (x) = proc (y)
                        if zero?(x)
                        then 0
                        else ((a ((m -(x,1)) y)) y)
       in ((m 11) 12)
The type variables that are associated with the bound variables are
    ta     a
    tm     m
    ti     i
    tj     j
    tx     x
    ty     y
The type variables that are associated with the subexpressions are
    t1     11
    t2     12
    t3     m
    t4     (m 11)
    t5     ((m 11) 12)
    t6     x
    t7     zero?(x)
    t8     0
    t9     x
    t10    1
    t11    -(x,1)
    t12    m
    t13    (m -(x,1))
    t14    y
    t15    ((m -(x,1)) y)
    t16    a
    t17    (a ((m -(x,1)) y))
    t18    y
    t19    ((a ((m -(x,1)) y)) y)
    t20    if zero?(x) then 0 else ((a ((m -(x,1)) y)) y)
    t21    proc (y) if zero?(x) then ... else ...
    t22    letrec m (x) = proc (y) ... in ((m 11) 12)
    t23    0
    t24    j
    t25    -(0,j)
    t26    i
    t27    -(i, -(0,j))
    t28    proc (j) -(i, -(0,j))
    t29    proc (i) proc (j) -(i, -(0,j))
    t30    let a = ... in letrec m (x) = ... in ...
The type constraints that are generated by those subexpressions are
    t1 = int
    t2 = int
    t3 = tm
    t3 = (t1 → t4)
    t4 = (t2 → t5)
    t6 = tx
    t6 = int
    t7 = bool
    t8 = int
    t9 = tx
    t10 = int
    t9 = int
    t10 = int
    t11 = int
    t12 = tm
    t12 = (t11 → t13)
    t14 = ty
    t13 = (t14 → t15)
    t16 = ta
    t16 = (t15 → t17)
    t18 = ty
    t17 = (t18 → t19)
    t7 = bool
    t8 = t20
    t19 = t20
    t21 = (ty → t20)
    tm = (tx → t21)
    t22 = t5
    t23 = int
    t24 = tj
    t23 = int
    t24 = int
    t25 = int
    t26 = ti
    t25 = int
    t26 = int
    t27 = int
    t28 = (tj → t27)
    t29 = (ti → t28)
    ta = t29
    t30 = t22
Any solution to those type constraints will prove that the program is well-typed. To obtain a decision procedure, however, we will need a systematic solver that always finds a solution if a solution exists, and always terminates with a failure notice if no solution exists.
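To give a feel for what such a solver might look like, here is a minimal sketch in Python of the usual substitute-equals-for-equals approach (unification with an occurs check). It is an assumption of this sketch, not something specified in these notes, that type variables are arbitrary strings other than "int" and "bool", that an arrow type (a → b) is the tuple ("->", a, b), and that constraints are (lhs, rhs) pairs; the names solve, resolve, and occurs are likewise hypothetical.

```python
def is_var(t):
    # Any string other than the two base types is treated as a type variable.
    return isinstance(t, str) and t not in ("int", "bool")


def resolve(t, subst):
    """Replace every solved type variable in t by its current value."""
    if is_var(t):
        return resolve(subst[t], subst) if t in subst else t
    if isinstance(t, tuple):
        return ("->", resolve(t[1], subst), resolve(t[2], subst))
    return t


def occurs(v, t):
    """Does type variable v occur inside the (already resolved) type t?"""
    if t == v:
        return True
    return isinstance(t, tuple) and (occurs(v, t[1]) or occurs(v, t[2]))


def solve(constraints):
    """Return a dict mapping type variables to types, or None on failure."""
    subst = {}
    work = list(constraints)
    while work:
        a, b = work.pop()
        a, b = resolve(a, subst), resolve(b, subst)
        if a == b:
            continue
        if is_var(a):
            if occurs(a, b):
                return None          # e.g. t = (t -> int) has no finite solution
            subst[a] = b
        elif is_var(b):
            if occurs(b, a):
                return None
            subst[b] = a
        elif isinstance(a, tuple) and isinstance(b, tuple):
            # (a1 -> a2) = (b1 -> b2) holds iff a1 = b1 and a2 = b2.
            work.append((a[1], b[1]))
            work.append((a[2], b[2]))
        else:
            return None              # clash such as int = bool or int = (.. -> ..)
    return {v: resolve(v, subst) for v in subst}
```

For instance, the constraints t1 = int and t1 = (t2 → t3), which are what the expression (0 0) gives rise to, make solve return None, whereas t1 = int, t2 = (t1 → t3), t3 = bool yields t2 = (int → bool). Each step either binds a type variable or breaks an arrow type into smaller pieces, so the loop always terminates.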
Last updated 24 March 2008.