Assignment 2: Adder: A starter language
Due: Tue 1/21 at 8:59pm
git clone
Note: The starter code for this assignment contains answers for the previous assignment. If you are taking a late day on that assignment, you must not view this code until you have completed the previous one.
Note: You will be working with partners on this assignment. Make sure that you’ve requested a team on https://handins.khoury.northeastern.edu/ well before the deadline, so that we can approve the team. Working solo is not allowed without checking with me beforehand. Also, if either member of a team is enrolled in CS6410 (graduate), you will be graded at the graduate level, rather than the undergrad level. If you are working in a team of three, you will also be graded at the graduate level.
In this assignment you’ll implement a compiler for a small language called Adder (because it primarily adds things).
Reminder: Test names cannot have spaces; this is due to the way the Makefile
relies
on test names being used for filenames.
1 The Adder Language
In each of the next several assignments, we’ll introduce a language that we’ll
implement. We’ll start small, and build up features incrementally. We’re
starting with Adder, which has just a few features —
There are a few pieces that go into defining a language for us to compile.
A description of the concrete syntax —
the text the programmer writes A description of the abstract syntax —
how to express what the programmer wrote in a data structure our compiler uses. The semantics — or description of the behavior — of the abstract syntax, so our compiler knows what the code it generates should do.
1.1 Concrete Syntax
The concrete syntax of Adder is:
‹expr› NUMBER IDENTIFIER ( let ( ‹bindings› ) ‹expr› ) ( add1 ‹expr› ) ( sub1 ‹expr› ) ‹bindings› ( IDENTIFIER ‹expr› ) ( IDENTIFIER ‹expr› ) ‹bindings›
Here, a let
expression can have one or more bindings. An
IDENTIFIER is any non-empty sequence of
letters and digits (starting with a letter). Any highlighted text is a literal token, meaning the programmer must
type exactly those characters or keywords.
1.2 Abstract Syntax
The abstract syntax of Adder is an OCaml datatype, and corresponds nearly one-to-one with the concrete syntax.
type prim1 =
| Add1
| Sub1
type 'a expr =
| Number of int64 * 'a
| Prim1 of prim1 * 'a expr * 'a
| Let of (string * 'a expr) list * 'a expr * 'a
| Id of string * 'a
1.3 Semantics
An Adder program always evaluates to a single integer. Number
s evaluate to
themselves (so a program just consisting of Number(5L)
should evaluate to the
integer 5L
). Primitive expressions perform addition or subtraction by one on
their argument. Let bindings should evaluate all the binding expressions to
values one by one, and after each, store a mapping from the given name to the
corresponding value in both (a) the rest of the bindings, and (b) the body of
the let expression.1If you are familiar with Racket, this is the
semantics
of let*, although we are enforcing a constraint that the names be
unique. Identifiers evaluate to whatever their current stored
value is. There are several examples further down to make this concrete.
The compiler should signal an error if:
There is a binding list containing two or more bindings with the same name
An identifier is unbound (there is no surrounding let binding for it)
Here are some examples of Adder programs:
Concrete Syntax |
| Abstract Syntax |
| Answer |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
2 Starter code for this assignment
You’ve been given a starter codebase that has several pieces of infrastructure:
A parser for s-expressions (
sexp.ml
), which takes concrete syntax (text files) and turns it into instances of thesexp
datatype. You don’t need to edit this, but you should compare it to your prior homework, both to confirm that your homework was correct, and to understand how it works.A main program (
main.ml
) that uses the parser and compiler to produce assembly code from an input Adder text file. You don’t need to edit this.A
Makefile
that buildsmain.ml
, builds a tester for Adder that you will modify (test.ml
), and manipulates assembly programs created by the Adder compiler. You don’t need to edit theMakefile
, but you will edittest.ml
.An OCaml program (
runner.ml
) that works in concert with theMakefile
to allow you to compile and run an Adder program from within OCaml, which is quite useful for testing. You don’t need to editrunner.ml
.
All of your edits —test.ml
and compile.ml
.
3 Implementing a Compiler for Adder
3.1 Writing the Compiler
The primary task of writing the Adder compiler is simple to state: take an
instance of the expr
datatype and turn it into a list of assembly
instructions. The provided compiler skeleton is set up to do just this,
broken up over a few functions.
The first is
compile : pos expr -> instruction list
which takes a expr
value (abstract syntax, tagged with source location
information) and turns it into a list of assembly instructions, represented by
the instruction
type. Use only the provided instruction types for this
assignment; we will be gradually expanding this set of instructions as the
semester progresses. This function has an associated helper that takes some
extra arguments to track the variable environment and stack offset.
The other component you need to implement is:
to_asm_string : instruction list -> string
which renders individual instances of the instruction datatype into a string
representation of the instruction (this is done for you for mov
).
This second step is straightforward, but forces you to understand the syntax
of the assembly code you are generating. Most of the compiler concepts happen
in the first step, where we are generating assembly instructions from abstract
syntax. Do use this assembly
guide —
3.2 Testing the Compiler
The test file has two helper functions that will be useful to you:
t : string -> string -> string -> OUnit.test
The first string given to t
(test) is a test name, followed by an adder
program (in concrete syntax) to compile and evaluate, followed by a string for
the expected output of the program (this will just be an integer in quotes).
This helper compiles, links, and runs the given program, and if the compiler
ends in error, it will report the error message as a string. This includes
problems building at the assembler/linker level, as well as any explicit
failwith
statements in the compiler itself.
te : string -> string -> string -> OUnit.test
The first string given to te
(test-error) is a test name, followed by a
program in concrete syntax to compile and evaluate, followed by a string that
is a substring of the expected error message. For example, in the
starter code there is a test that fails with the substring "not yet"
,
because the Let
case fails with an exception that mentions it is
"not yet implemented"
. You should use this helper to explicitly test for
the two error cases mentioned above. Your compiler should generate such errors
by calling raise
2Calling failwith "some string"
function in
OCaml is equivalent to calling raise (Failure "some string")
– that is,
it fails with a very generic Failure
type of exception. You can define
your own exception types, allowing for more fine-grained error handling than if
everything were just a single, simple Failure
exception.3If you stop
to think about it for a bit, this error type is rather weird: all other data
types in OCaml have a finite set of constructors that are enumerated in a
type
definition, but these exceptions can be declared anywhere in your
program, and do not have an a priori known finite set... It takes
special work in OCaml’s type system to support this, and the exception type is
the only type for which this is true. on some value of the
appropriate exception type (which we have provided for you) and with an
appropriately descriptive and distinctive message.
Both of these functions store the assembly file your compiler generated in the
output/
directory, with the name of the test suffixed by .s
, if
it was possible to generate. So, for example, the starter tests generate a
file called test1.s
in the output/
directory, containing the
compiled code for that case (assuming the file parses and the compiler even
gets to run – right now, the starter code doesn’t quite get that far). This
can be useful for debugging. If you hand-edit a generated assembly file (to
tweak it for an attempted fix), you can then run:
$ make output/test1.run
to trigger the build from that assembly file. This can be helpful if you
think you’re generating mostly-correct code, but just want to try a small edit
to fix something up. It’s also helpful if you want to hand-write a small
assembly example: you can create .s
files from scratch in the output
directory to experiment with, if you want to practice with assembly
instructions without the compiler in the way.
For files that built completely, the generated *.run
executable is also
stored in output/
.
To run all of your tests, run
$ make test
$ ./test
3.3 Running main
The main
program built with make main
takes a single file as its
command-line argument, and outputs the compiled assembly string on standard
out. If you want to write files containing Adder code, you can write files in
the input/
directory with the suffix .adder
, and then run
make output/yourfile.run
, which will trigger the same process that happens in
test.ml
—main.c
to create an executable. The intermediate files will be
stored in output/yourfile.s
, output/yourfile.o
, and output/yourfile.run
.
You can run the file by executing:
$ ./output/yourfile.run
4 List of Deliverables
At a minimum, you must provide
your
compile.ml
any additional modules you saw fit to write
tests in an OUnit test module (
test.ml
)any test input programs (
input/*.adder
files)
Preferably, provide a zip containing all the sources of your code (including
the Makefile
, runner.ml
, sexp.ml
and main.ml
files)
so that we can simply download your code and run it. Please do not
provide your .git/
or _build/
directories, or any other extraneous
files! (The easiest way to ensure this is to run make clean
before
archiving your code directory.)
Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!
5 Grading Standards
We have provided you with several starter functions, and they have all been fully annotated with types. If you define any additional top-level functions, please annotate them in the same way. (Local definitions do not have to be annotated, but they shouldn’t be too hard to decipher, either.) There are no formal OCaml style guidelines, though ocaml.org defines some good ones to follow.
For this assignment, you will be graded on
Whether your code implements the specification (functional correctness),
the clarity and cleanliness of your code, and
the comprehensiveness of your test coverage
(We may attempt to run each of your tests against each of your classmates’ compilers. If so, part of grading your tests will be how well they trip up bugs in your classmates, while still working properly with our reference solution. We don’t have this infrastructure set up yet, so this may not happen.)
6 Submission
Wait! Please read the assignment again and verify that you have not forgotten anything!
Please submit your homework to https://handins.khoury.northeastern.edu/ by the above deadline.
1If you are familiar with Racket, this is the semantics of let*, although we are enforcing a constraint that the names be unique.
2Calling failwith "some string"
function in
OCaml is equivalent to calling raise (Failure "some string")
– that is,
it fails with a very generic Failure
type of exception. You can define
your own exception types, allowing for more fine-grained error handling than if
everything were just a single, simple Failure
exception.3If you stop
to think about it for a bit, this error type is rather weird: all other data
types in OCaml have a finite set of constructors that are enumerated in a
type
definition, but these exceptions can be declared anywhere in your
program, and do not have an a priori known finite set... It takes
special work in OCaml’s type system to support this, and the exception type is
the only type for which this is true.
3If you stop
to think about it for a bit, this error type is rather weird: all other data
types in OCaml have a finite set of constructors that are enumerated in a
type
definition, but these exceptions can be declared anywhere in your
program, and do not have an a priori known finite set... It takes
special work in OCaml’s type system to support this, and the exception type is
the only type for which this is true.