Assignment 1: OCaml warmup, part 2: trees
Due: Tue 01/14 at 8:59pm
git clone
Almost every assignment we work on in this course will involve transforming list- or tree-shaped data from one form to another. In this assignment, you will be working with traversing a tree and producing results, and traversing a list and producing a tree.
1 Evaluating arithmetic
Let’s define a data type for describing simple arithmetic expressions:
type arith =
| Plus of arith * arith
| Times of arith * arith
| Variable of string
| Num of int
"Math" syntax |
|
|
|
|
|
|
|
|
This is a miniature language, and we reasonably want to do typical things with it: evaluate it and get an answer, or print it back out as text. Evaluation is straightforward, except for handling variables: we need some sort of environment to look up their values.
In the starter file
expr.ml
, we have given you thearith
data type, and a type for environments. Implement the five functions in that file:(* Looks up the given variable name in the given environment *) get : env string -> int option (* Determines whether the given environment contains a binding for the given variable *) contains : env string -> bool (* Adds a new variable-->value mapping to an environment, and produces a new environment *) add : env string int -> env (* Evaluates the given expression in the given environment *) evaluate : arith env -> int (* Neatly prints the given expression as a string *) pretty : arith -> string
Enhance the provided starter file
test.ml
to thoroughly test your functions. Think carefully about how to test them, and be sure to check for sneaky edge cases: get used to thinking about that now, while the languages are still small, so that it becomes second-nature when the languages get bigger!
2 Parsing S-expressions
In this part, you will be working on obtaining a tree of data from a string input: in other words, you will be parsing the data. Generally speaking, manually parsing an arbitrary language is tedious, so we will be working with a particularly simple language instead: s-expressions.
The first step in parsing a file is tokenization, that is, a function
tokenize : string -> token list
You can think of this as breaking up the string into its component words and punctuation marks. The starter file you are given contains a definition of this function, and a definition of the token type, but those definitions are slightly more intricate than above. When you make a mistake writing a program in almost any language, the compiler will give you an error message that contains the location of the error in the source. Accordingly, we need to define
(* startline, startcol, endline, endcol *)
type pos = int * int * int * int
To define tokens, we could explicitly include position information in each token, but we’ll be more general: we’ll let tokens include any information, and position information will often be handy:
type 'a tok =
| LPAREN of 'a
| RPAREN of 'a
| TSym of string * 'a
| TInt of int * 'a
| TBool of bool * 'a
Now the tokenize
function actually has signature
tokenize : string -> pos tok list
Explain in a short paragraph how you think this function works. You should be sure to explain why it uses
List.fold_left
, what thetry/with
construction is for, what the call toregexp
is likely doing, and how the function keeps track of position information.
The starter file also gives you a type definition of s-expressions, which have also been decorated:
type 'a sexp =
| Sym of string * 'a
| Int of int * 'a
| Bool of bool * 'a
| Nest of 'a sexp list * 'a
The last function in the file produces a value of type
result
,
which is predefined as:
type ('a , 'b) result =
| Ok of 'a
| Error of 'b
Define a function
that parses a given list of tokens with position information (theparse_toks : pos tok list -> (pos sexp list, string) result
pos tok list
argument) into aresult
: either it successfully produces a list of valid s-expressions from the token list, or else it should produce an error message explaining the error...and referencing the location at which the error occurred. For example:(parse_toks (tokenize "(a b)")) ==> Ok [Nest ([Sym ("a", (0, 1, 0, 2)); Sym ("b", (0, 3, 0, 4))], (0, 0, 0, 5))] (parse_toks (tokenize "(a (b true) 3)")) ==> Ok [Nest ([Sym ("a", (0, 1, 0, 2)); Nest ([Sym ("b", (0, 4, 0, 5)); Bool (true, (0, 6, 0, 10))], (0, 3, 0, 11)); Int (3, (0, 12, 0, 13))], (0, 0, 0, 14))] (parse_toks (tokenize "(a")) ==> Error "Unmatched left paren at line 0, col 0" (parse_toks (tokenize "(a (b c")) ==> Error "Unmatched left paren at line 0, col 3"
Hint: the signature above is not general enough to be sufficient to parse
s-expressions, and you’ll need one or more helper functions. First consider
what a plausible “subtask” might be, to make progress on producing a
pos sexp list
. Next, consider the data definition for sexp
carefully, and look very carefully at the second example above: at the point
just after the token true
and before the following RPAREN, how might you
describe your intermediate state? What have you built up so far, what are you
in the process of building, and what is left over?
Hint: you can and should solve this problem by looking at only one token
at a time. Trying to “look ahead” in the list is pretty much guaranteed not
to work, so you need to use some other technique to match up left and right
parentheses. One mnemonic I was shown a long time ago: to check if an
s-expression is balanced, mentally start a running count at zero, scan across
the expression from left to right, increment the count at each left
parenthesis, and decrement it at each right parenthesis. If the count is zero
at the end, and never became negative, then you’ve got a correctly-matched
s-expression. How might you carry the idea of this across into your code?
(You won’t need to maintain an actual int
counter; you’re implicitly
maintaining a stack...)
Define a function
that is more convenient to use thanparse : string -> (pos sexp list, string) result
parse_toks
.Test your parser carefully.
3 Grading standards
For this assignment, you will be graded on
whether your code compiles,
whether your code implements the specification (functional correctness),
whether you thoroughly test every method that you write, and
how readable your code is (indented well, commented well, etc).
4 Submission
4.1 Deliverables
Your submission should include all the provided files; you should not need to create any new ones.
Please ensure that your code compiles! On this assignment, half of your grade is for correctness as determined by automated testing, so code that doesn’t compile is subject to a 50% penalty up front. Zip the provided files together:
$ zip submission.zip Makefile sexp.ml expr.ml test.ml
and submit that zip.