Assignment 2: Adder: A starter language

8.14

Assignment 2: Adder: A starter language🔗

Due: Tue 1/21 at 8:59pm

git clone

Note: The starter code for this assignment contains answers for the previous assignment. If you are taking a late day on that assignment, you must not view this code until you have completed the previous one.

Note: You will be working with partners on this assignment. Make sure that you’ve requested a team on https://handins.khoury.northeastern.edu/ well before the deadline, so that we can approve the team. Working solo is not allowed without checking with me beforehand. Also, if either member of a team is enrolled in CS6410 (graduate), you will be graded at the graduate level, rather than the undergrad level. If you are working in a team of three, you will also be graded at the graduate level.

In this assignment you’ll implement a compiler for a small language called Adder (because it primarily adds things).

Reminder: Test names cannot have spaces; this is due to the way the Makefile relies on test names being used for filenames.

1 The Adder Language🔗

In each of the next several assignments, we’ll introduce a language that we’ll implement. We’ll start small, and build up features incrementally. We’re starting with Adder, which has just a few features — defining variables, and primitive operations on numbers.

There are a few pieces that go into defining a language for us to compile.

A description of the concrete syntax — the text the programmer writes
A description of the abstract syntax — how to express what the programmer wrote in a data structure our compiler uses.
The semantics — or description of the behavior — of the abstract syntax, so our compiler knows what the code it generates should do.

1.1 Concrete Syntax🔗

The concrete syntax of Adder is:

‹expr›: | NUMBER | IDENTIFIER | ( let ( ‹bindings› ) ‹expr› ) | ( add1 ‹expr› ) | ( sub1 ‹expr› ) ‹bindings›: | ( IDENTIFIER ‹expr› ) | ( IDENTIFIER ‹expr› ) ‹bindings›

Here, a let expression can have one or more bindings. An IDENTIFIER is any non-empty sequence of letters and digits (starting with a letter). Any highlighted text is a literal token, meaning the programmer must type exactly those characters or keywords.

1.2 Abstract Syntax🔗

The abstract syntax of Adder is an OCaml datatype, and corresponds nearly one-to-one with the concrete syntax.

type prim1 =
  | Add1
  | Sub1

type 'a expr =
  | Number of int64 * 'a
  | Prim1 of prim1 * 'a expr * 'a
  | Let of (string * 'a expr) list * 'a expr * 'a
  | Id of string * 'a

1.3 Semantics🔗

An Adder program always evaluates to a single integer. Numbers evaluate to themselves (so a program just consisting of Number(5L) should evaluate to the integer 5L). Primitive expressions perform addition or subtraction by one on their argument. Let bindings should evaluate all the binding expressions to values one by one, and after each, store a mapping from the given name to the corresponding value in both (a) the rest of the bindings, and (b) the body of the let expression.1If you are familiar with Racket, this is the semantics of let*, although we are enforcing a constraint that the names be unique. Identifiers evaluate to whatever their current stored value is. There are several examples further down to make this concrete.

The compiler should signal an error if:

There is a binding list containing two or more bindings with the same name
An identifier is unbound (there is no surrounding let binding for it)

Here are some examples of Adder programs:

Concrete Syntax	Abstract Syntax	Answer
`5`	`Number(5L)`	`5`
`(sub1 (add1 (sub1 5)))`	`Prim1(Sub1, Prim1(Add1, Prim1(Sub1, Number(5L))))`	`4`
`(let ((x 5)) (add1 x))`	`Let([("x", Number(5L))], Prim1(Add1, Id("x")))`	`6`
`(let ((x 5) (y (sub1 x))) (sub1 y))`	`Let([("x", Number(5L)); ("y", Prim1(Sub1, Id("x")))], Prim1(Sub1, Id("y")))`	`3`

2 Starter code for this assignment🔗

You’ve been given a starter codebase that has several pieces of infrastructure:

A parser for s-expressions (sexp.ml), which takes concrete syntax (text files) and turns it into instances of the sexp datatype. You don’t need to edit this, but you should compare it to your prior homework, both to confirm that your homework was correct, and to understand how it works.
A main program (main.ml) that uses the parser and compiler to produce assembly code from an input Adder text file. You don’t need to edit this.
A Makefile that builds main.ml, builds a tester for Adder that you will modify (test.ml), and manipulates assembly programs created by the Adder compiler. You don’t need to edit the Makefile, but you will edit test.ml.
An OCaml program (runner.ml) that works in concert with the Makefile to allow you to compile and run an Adder program from within OCaml, which is quite useful for testing. You don’t need to edit runner.ml.

All of your edits — which will be to write the compiler for Adder, and test it — will happen in test.ml and compile.ml.

3 Implementing a Compiler for Adder🔗

3.1 Writing the Compiler🔗

The primary task of writing the Adder compiler is simple to state: take an instance of the expr datatype and turn it into a list of assembly instructions. The provided compiler skeleton is set up to do just this, broken up over a few functions.

The first is

compile : pos expr -> instruction list

which takes a expr value (abstract syntax, tagged with source location information) and turns it into a list of assembly instructions, represented by the instruction type. Use only the provided instruction types for this assignment; we will be gradually expanding this set of instructions as the semester progresses. This function has an associated helper that takes some extra arguments to track the variable environment and stack offset.

The other component you need to implement is:

to_asm_string : instruction list -> string

which renders individual instances of the instruction datatype into a string representation of the instruction (this is done for you for mov). This second step is straightforward, but forces you to understand the syntax of the assembly code you are generating. Most of the compiler concepts happen in the first step, where we are generating assembly instructions from abstract syntax. Do use this assembly guide — or ask! — if you have questions about the concrete syntax of an instruction.

3.2 Testing the Compiler🔗

The test file has two helper functions that will be useful to you:

t : string -> string -> string -> OUnit.test

The first string given to t (test) is a test name, followed by an adder program (in concrete syntax) to compile and evaluate, followed by a string for the expected output of the program (this will just be an integer in quotes). This helper compiles, links, and runs the given program, and if the compiler ends in error, it will report the error message as a string. This includes problems building at the assembler/linker level, as well as any explicit failwith statements in the compiler itself.

te : string -> string -> string -> OUnit.test

The first string given to te (test-error) is a test name, followed by a program in concrete syntax to compile and evaluate, followed by a string that is a substring of the expected error message. For example, in the starter code there is a test that fails with the substring "not yet", because the Let case fails with an exception that mentions it is "not yet implemented". You should use this helper to explicitly test for the two error cases mentioned above. Your compiler should generate such errors by calling raise2Calling failwith "some string" function in OCaml is equivalent to calling raise (Failure "some string") – that is, it fails with a very generic Failure type of exception. You can define your own exception types, allowing for more fine-grained error handling than if everything were just a single, simple Failure exception.3If you stop to think about it for a bit, this error type is rather weird: all other data types in OCaml have a finite set of constructors that are enumerated in a type definition, but these exceptions can be declared anywhere in your program, and do not have an a priori known finite set... It takes special work in OCaml’s type system to support this, and the exception type is the only type for which this is true. on some value of the appropriate exception type (which we have provided for you) and with an appropriately descriptive and distinctive message.

Both of these functions store the assembly file your compiler generated in the output/ directory, with the name of the test suffixed by .s, if it was possible to generate. So, for example, the starter tests generate a file called test1.s in the output/ directory, containing the compiled code for that case (assuming the file parses and the compiler even gets to run – right now, the starter code doesn’t quite get that far). This can be useful for debugging. If you hand-edit a generated assembly file (to tweak it for an attempted fix), you can then run:

$ make output/test1.run

to trigger the build from that assembly file. This can be helpful if you think you’re generating mostly-correct code, but just want to try a small edit to fix something up. It’s also helpful if you want to hand-write a small assembly example: you can create .s files from scratch in the output directory to experiment with, if you want to practice with assembly instructions without the compiler in the way.

For files that built completely, the generated *.run executable is also stored in output/.

To run all of your tests, run

$ make test
$ ./test

3.3 Running main🔗

The main program built with make main takes a single file as its command-line argument, and outputs the compiled assembly string on standard out. If you want to write files containing Adder code, you can write files in the input/ directory with the suffix .adder, and then run make output/yourfile.run, which will trigger the same process that happens in test.ml — the compiler will run on your adder file, and it will attempt to link it with main.c to create an executable. The intermediate files will be stored in output/yourfile.s, output/yourfile.o, and output/yourfile.run.

You can run the file by executing:

$ ./output/yourfile.run

4 List of Deliverables🔗

At a minimum, you must provide

your compile.ml
any additional modules you saw fit to write
tests in an OUnit test module (test.ml)
any test input programs (input/*.adder files)

Preferably, provide a zip containing all the sources of your code (including the Makefile, runner.ml, sexp.ml and main.ml files) so that we can simply download your code and run it. Please do not provide your .git/ or _build/ directories, or any other extraneous files! (The easiest way to ensure this is to run make clean before archiving your code directory.)

Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!

5 Grading Standards🔗

We have provided you with several starter functions, and they have all been fully annotated with types. If you define any additional top-level functions, please annotate them in the same way. (Local definitions do not have to be annotated, but they shouldn’t be too hard to decipher, either.) There are no formal OCaml style guidelines, though ocaml.org defines some good ones to follow.

For this assignment, you will be graded on

Whether your code implements the specification (functional correctness),
the clarity and cleanliness of your code, and
the comprehensiveness of your test coverage

(We may attempt to run each of your tests against each of your classmates’ compilers. If so, part of grading your tests will be how well they trip up bugs in your classmates, while still working properly with our reference solution. We don’t have this infrastructure set up yet, so this may not happen.)

6 Submission🔗

Wait! Please read the assignment again and verify that you have not forgotten anything!

Please submit your homework to https://handins.khoury.northeastern.edu/ by the above deadline.

1If you are familiar with Racket, this is the semantics of let*, although we are enforcing a constraint that the names be unique.

2Calling failwith "some string" function in OCaml is equivalent to calling raise (Failure "some string") – that is, it fails with a very generic Failure type of exception. You can define your own exception types, allowing for more fine-grained error handling than if everything were just a single, simple Failure exception.3If you stop to think about it for a bit, this error type is rather weird: all other data types in OCaml have a finite set of constructors that are enumerated in a type definition, but these exceptions can be declared anywhere in your program, and do not have an a priori known finite set... It takes special work in OCaml’s type system to support this, and the exception type is the only type for which this is true.

3If you stop to think about it for a bit, this error type is rather weird: all other data types in OCaml have a finite set of constructors that are enumerated in a type definition, but these exceptions can be declared anywhere in your program, and do not have an a priori known finite set... It takes special work in OCaml’s type system to support this, and the exception type is the only type for which this is true.

1	The Adder Language
2	Starter code for this assignment
3	Implementing a Compiler for Adder
4	List of Deliverables
5	Grading Standards
6	Submission