The first programming languages we will study are expression languages. We will use SLLgen grammars to specify the syntax of these languages and the representations of their abstract syntax trees. We will then specify the semantics of these languages by writing interpreters for the abstract syntax trees. These interpreters take an environment as their second argument, which records the value of any variables that may appear free within the expression.
(value-of exp ρ) = val
means that the value of expression exp in environment ρ should be val.
The source language is the language we are defining, specifying, or implementing. The implementation language (usually Scheme with EoPL extensions) is the language in which we write our interpreters.
The front end of an interpreter or compiler translates the source language into abstract syntax trees. A compiler translates abstract syntax trees into some target language, such as Intel x86-32 machine code or JVM byte code. The abstract syntax trees or target language can then be executed by some interpreter. For example, an Intel Core 2 Duo contains an extremely efficient interpreter for Intel x86-32 machine code:
> (define add1 (lambda (n) (+ n 1)))
> add1
#<PROCEDURE add1>
> (nasm-disassemble add1)
00000000  83FB04            cmp ebx,byte +0x4
00000003  7411              jz 0x16
00000005  C7452C04000000    mov dword [ebp+0x2c],0x4
0000000C  FF9500020000      call near [ebp+0x200]
00000012  90                nop
00000013  90                nop
00000014  EBEA              jmp short 0x0
00000016  F6C103            test cl,0x3
00000019  750A              jnz 0x25
0000001B  89CB              mov ebx,ecx
0000001D  83C304            add ebx,byte +0x4
00000020  710E              jno 0x30
00000022  83EB04            sub ebx,byte +0x4
00000025  89CB              mov ebx,ecx
00000027  B804000000        mov eax,0x4
0000002C  FF551C            call near [ebp+0x1c]
0000002F  90                nop
00000030  C3                ret
0
> (add1 (expt 10 70))
10000000000000000000000000000000000000000000000000000000000000000000001
Our interpreters will not be as efficient as the Intel Core 2 Duo, but they will be much simpler, much easier to build, and much easier to understand.
Scanning divides the plain text of a source program into meaningful substrings called tokens. The tokens are described by a lexical specification.
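As a rough illustration of scanning, here is a minimal sketch in Python (not the course's Scheme, and not SLLgen). The lexical specification TOKEN_SPEC, the token kind names, and the function scan are all hypothetical, chosen only for this example. Keywords such as let and in are scanned as ordinary identifiers here, and negative number literals are not handled.

```python
import re

# Hypothetical lexical specification: (token kind, regular expression).
# These names and patterns are assumptions made for this sketch.
TOKEN_SPEC = [
    ("whitespace", r"\s+"),
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z][A-Za-z0-9?]*"),
    ("punctuation", r"[-(),=]"),
]
MASTER = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC)
)


def scan(text):
    """Split text into a list of (kind, lexeme) tokens, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {text[pos]!r} at position {pos}"
            )
        if match.lastgroup != "whitespace":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token in scan("let x = 4 in -(x,-(1,x))"):
        print(token)
```

Each token pairs a kind with the substring it covers; a parser would then consume this list rather than the raw characters.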
Parsing translates the sequence of tokens into an abstract syntax tree. The syntactically legal sequences of tokens are described by the source language's grammar.
A parser generator is a program whose inputs include a lexical specification, a grammar, and a description of the abstract syntax trees to be constructed for each production of the grammar. The main outputs of the parser generator are a scanner and parser.
We will use the SLLgen parser generator for most of this
course. For MP3, however, the file mp3-data-structures.scm
will contain a hand-written scanner and a complete parser
that was generated by a different parser generator.
This is just to show you what a scanner and parser look like.
In future assignments, where the scanners and parsers will be
more complicated, you will see the lexical specifications and
the grammars but will not see the scanners and parsers built
from them.
The main thing to remember is that scan&parse
takes a string containing the plain text representation of a
program, and returns the abstract syntax tree for that program.
Program    ::= Expression
               a-program (exp1)
Expression ::= Number
               const-exp (num)
Expression ::= -(Expression , Expression)
               diff-exp (exp1 exp2)
Expression ::= zero? (Expression)
               zero?-exp (exp1)
Expression ::= if Expression then Expression else Expression
               if-exp (exp1 exp2 exp3)
Expression ::= Identifier
               var-exp (var)
Expression ::= let Identifier = Expression in Expression
               let-exp (var exp1 body)
For example,
(scan&parse "let x = 4 in -(x,-(1,x))")
evaluates to the abstract syntax tree that is the result of
(a-program (let-exp 'x (const-exp 4) (diff-exp (var-exp 'x) (diff-exp (const-exp 1) (var-exp 'x)))))
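To make the string-to-tree step concrete, the following is a hand-written recursive-descent sketch in Python rather than the course's Scheme. It is not the SLLgen-generated scan&parse. The names tokenize and scan_and_parse, and the choice to represent abstract syntax trees as nested tuples tagged with the constructor name, are assumptions made for this example only.

```python
import re

# One token: a number, an identifier (possibly ending in ?), or punctuation.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z][A-Za-z0-9?]*|[-(),=])")
KEYWORDS = {"let", "in", "if", "then", "else", "zero?"}


def tokenize(text):
    """Return the list of token strings in text."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def scan_and_parse(text):
    """Parse a LET program into nested tuples such as ("const-exp", 4)."""
    tokens = tokenize(text)
    pos = 0

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def expect(expected):
        token = next_token()
        if token != expected:
            raise SyntaxError(f"expected {expected!r}, got {token!r}")

    def identifier():
        token = next_token()
        if not token[0].isalpha() or token in KEYWORDS:
            raise SyntaxError(f"expected an identifier, got {token!r}")
        return token

    def expression():
        token = next_token()
        if token.isdigit():
            return ("const-exp", int(token))
        if token == "-":
            expect("(")
            exp1 = expression()
            expect(",")
            exp2 = expression()
            expect(")")
            return ("diff-exp", exp1, exp2)
        if token == "zero?":
            expect("(")
            exp1 = expression()
            expect(")")
            return ("zero?-exp", exp1)
        if token == "if":
            exp1 = expression()
            expect("then")
            exp2 = expression()
            expect("else")
            exp3 = expression()
            return ("if-exp", exp1, exp2, exp3)
        if token == "let":
            var = identifier()
            expect("=")
            exp1 = expression()
            expect("in")
            body = expression()
            return ("let-exp", var, exp1, body)
        if token[0].isalpha() and token not in KEYWORDS:
            return ("var-exp", token)
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expression()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected trailing token {tokens[pos]!r}")
    return ("a-program", tree)
```

Under these assumptions, scan_and_parse("let x = 4 in -(x,-(1,x))") returns a tuple with the same shape as the tree shown above, with strings standing in for Scheme symbols.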
For any programming language, the expressed values are the possible values of an expression, and the denoted values are the values to which a variable can be bound in some environment.
For LET, the expressed and denoted values happen to be the same:
ExpVal = Int + Bool
DenVal = Int + Bool
The expressed and denoted values will be abstract data types with this algebraic specification:
num-val      : Int → ExpVal
bool-val     : Bool → ExpVal
expval->num  : ExpVal → Int
expval->bool : ExpVal → Bool

(expval->num (num-val n)) = n
(expval->bool (bool-val b)) = b
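One way to satisfy this specification is sketched below in Python (the course itself would use Scheme). The class names NumVal and BoolVal and the underscore spellings of the four operations are assumptions made for this example. The specification says nothing about applying an extractor to the wrong kind of value, so raising TypeError in that case is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumVal:
    """An expressed value that wraps an integer."""
    num: int


@dataclass(frozen=True)
class BoolVal:
    """An expressed value that wraps a boolean."""
    boolean: bool


def num_val(n):
    """num-val : Int -> ExpVal"""
    return NumVal(n)


def bool_val(b):
    """bool-val : Bool -> ExpVal"""
    return BoolVal(b)


def expval_to_num(val):
    """expval->num : ExpVal -> Int"""
    if isinstance(val, NumVal):
        return val.num
    raise TypeError(f"expected a num-val, got {val!r}")


def expval_to_bool(val):
    """expval->bool : ExpVal -> Bool"""
    if isinstance(val, BoolVal):
        return val.boolean
    raise TypeError(f"expected a bool-val, got {val!r}")
```

The two equations of the specification then hold by construction: extracting from a freshly built value returns what was put in.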
We use the following abbreviations:
ρ ranges over environments
[] denotes the empty environment
[var = val]ρ denotes (extend-env var val ρ)
[var = val] denotes [var = val][]
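The environment operations behind these abbreviations can be sketched as follows, in Python rather than Scheme. Representing an environment as a chain of (var, val, rest) tuples ending in None is an assumption of this example, as is raising KeyError for an unbound variable.

```python
def empty_env():
    """[] : the empty environment, represented here as None."""
    return None


def extend_env(var, val, env):
    """[var = val]env : a new environment that binds var in front of env."""
    return (var, val, env)


def apply_env(env, var):
    """Return the value of the most recent binding of var in env."""
    while env is not None:
        saved_var, saved_val, saved_env = env
        if saved_var == var:
            return saved_val
        env = saved_env
    raise KeyError(f"no binding for {var}")


if __name__ == "__main__":
    # [x = 1][y = 2][]
    rho = extend_env("x", 1, extend_env("y", 2, empty_env()))
    print(apply_env(rho, "y"))
```

Because extend_env builds a new tuple instead of mutating its argument, an inner binding shadows an outer one without destroying it.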
const-exp : Int → Exp
zero?-exp : Exp → Exp
if-exp    : Exp × Exp × Exp → Exp
diff-exp  : Exp × Exp → Exp
var-exp   : Symbol → Exp
let-exp   : Symbol × Exp × Exp → Exp
value-of  : Exp × Env → ExpVal
(value-of (const-exp n) ρ) = (num-val n)
(value-of (var-exp var) ρ) = (apply-env ρ var)
(value-of (diff-exp exp1 exp2) ρ)
= (num-val (- (expval->num (value-of exp1 ρ)) (expval->num (value-of exp2 ρ))))
For LET, specifying the behavior of programs amounts to specifying the initial environment. For most programming languages, the initial environment consists of a standard set of predefined libraries that every implementation of the language is supposed to provide. For LET, we'll mimic that by providing three predefined identifiers.
(value-of-program exp) = (value-of exp ρ0)
where ρ0 = [i=1, v=5, x=10]
(value-of exp1 ρ) = val1
(expval->num val1) = 0
------------------------------------
(value-of (zero?-exp exp1) ρ) = (bool-val #t)

(value-of exp1 ρ) = val1
(expval->num val1) = n
n ≠ 0
------------------------------------
(value-of (zero?-exp exp1) ρ) = (bool-val #f)
(value-of exp1 ρ) = val1
(expval->bool val1) = #t
----------------------------------------------------
(value-of (if-exp exp1 exp2 exp3) ρ) = (value-of exp2 ρ)

(value-of exp1 ρ) = val1
(expval->bool val1) = #f
----------------------------------------------------
(value-of (if-exp exp1 exp2 exp3) ρ) = (value-of exp3 ρ)
(value-of exp1 ρ) = val1
------------------------------------
(value-of (let-exp var exp1 body) ρ) = (value-of body [var=val1]ρ)
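The equations and rules above translate almost line for line into an interpreter. The sketch below is in Python rather than the course's Scheme and rests on several assumptions made only for this example: abstract syntax trees are nested tuples tagged with the constructor name, expressed values are tagged pairs, environments are chains of tuples, and the initial environment binds i, v, and x to num-val 1, 5, and 10. The result of diff-exp is wrapped in num_val so that value_of always returns an ExpVal, as the signature value-of : Exp × Env → ExpVal requires.

```python
def num_val(n):
    return ("num-val", n)


def bool_val(b):
    return ("bool-val", b)


def expval_to_num(val):
    tag, payload = val
    if tag != "num-val":
        raise TypeError(f"expected a num-val, got {val!r}")
    return payload


def expval_to_bool(val):
    tag, payload = val
    if tag != "bool-val":
        raise TypeError(f"expected a bool-val, got {val!r}")
    return payload


def extend_env(var, val, env):
    return (var, val, env)


def apply_env(env, var):
    while env is not None:
        saved_var, saved_val, env = env
        if saved_var == var:
            return saved_val
    raise KeyError(f"no binding for {var}")


def value_of(exp, env):
    """Return the expressed value of exp in env, one case per constructor."""
    tag = exp[0]
    if tag == "const-exp":
        return num_val(exp[1])
    if tag == "var-exp":
        return apply_env(env, exp[1])
    if tag == "diff-exp":
        left = expval_to_num(value_of(exp[1], env))
        right = expval_to_num(value_of(exp[2], env))
        return num_val(left - right)
    if tag == "zero?-exp":
        return bool_val(expval_to_num(value_of(exp[1], env)) == 0)
    if tag == "if-exp":
        test = expval_to_bool(value_of(exp[1], env))
        return value_of(exp[2] if test else exp[3], env)
    if tag == "let-exp":
        _, var, exp1, body = exp
        val1 = value_of(exp1, env)
        return value_of(body, extend_env(var, val1, env))
    raise ValueError(f"unknown expression {exp!r}")


def value_of_program(program):
    """Evaluate ("a-program", exp) in the initial environment [i=1, v=5, x=10]."""
    _, exp = program
    init_env = extend_env("i", num_val(1),
               extend_env("v", num_val(5),
               extend_env("x", num_val(10), None)))
    return value_of(exp, init_env)


if __name__ == "__main__":
    # let x = 4 in -(x,-(1,x))
    program = ("a-program",
               ("let-exp", "x", ("const-exp", 4),
                ("diff-exp", ("var-exp", "x"),
                 ("diff-exp", ("const-exp", 1), ("var-exp", "x")))))
    print(value_of_program(program))  # ('num-val', 7)
```

In the example program the let binds x to 4, which shadows the initial x = 10, so the body computes 4 - (1 - 4) = 7.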
Last updated 28 January 2008.