Assignment 0: OCaml warmup, part 1: basics
Due: Sun 01/12 at 8:59pm
git clone
We will use three programming languages in this course: OCaml, C (not C++), and x64 assembly. Depending on your experience so far, you should all have some background in C and possibly x64, and the C we use won’t be surprising. You may not have seen OCaml before, and warrants an introduction.
This writeup serves as both a reference for some of the OCaml topics we’ll need in the course, and your first assignment. You should do all the numbered Exercises throughout the document for your first assignment.
1 Setup
1.1 Using login
.khoury
​.northestern
.edu
Add the following lines to your .bash_profile
file:
# OPAM configuration
export OPAMROOT=/net/proj/ocaml/.opam
export PATH=/net/proj/ocaml/opam:$PATH
. /net/proj/ocaml/.opam/opam-init/init.sh > /dev/null 2> /dev/null || true
Log out and log back in. You should now have a working ocaml
, with the
libraries above already installed for you. Confirm this by running
ocaml --version
to ensure that it’s on your path and runnable. By default this
should give you version 4.14.2; you can also use opam
to switch to
version 5.2.1.
1.2 On your own computer
OPAM is a package manager for OCaml.
If you know of pip
or easy_install
for Python, it’s the same idea.
OCaml is installed on the department machines, but you need to do a small bit
of setup to use some libraries we’ll need for the course. Assuming you’re
using Bash as your shell, open up the file
.profile
in your home directory (with e.g. emacs ~/.profile
), and add
this line at the bottom, if it’s not already there (replacing my username with
your own):
test -r /home/blerner/.opam/opam-init/init.sh && . /home/blerner/.opam/opam-init/init.sh > /dev/null 2> /dev/null || true
Then run:
$ source ~/.bashrc
(In this and all subsequent assignments, the $
indicates the command
prompt; you do not type that character!) This makes it so each time you open a
terminal, the build commands you run for OCaml will be able to use some
libraries we need for the course.
Be sure to install a recent version of OPAM – the latest is currently v2.3.0.
Many distributions have not updated their packages to the latest version, so
you’ll need to install a binary in order to get a newer version.
(WARNING! If you are using the Fish shell with older
OPAM versions (2.0.5 or lower), setting up OPAM will possibly clobber your
MANPATH
environment variable, such that trying to run man <anything>
will fail. If you’re stuck with this older version:
Run opam config var prefix
to find out the path
where OPAM has installed your current setup. On my machine, that gives
/home/blerner/.opam/4.14.2
. Within that directory, edit the file
.opam-switch/environment
, and comment out the line that sets
MANPATH
with a #
. You will need to redo this step every time
you run an opam install
or opam update
command.)
Once you’ve installed OPAM, install the needed OCaml packages:
$ opam init
# this takes a while; you could also install 5.2.1
$ opam switch create 4.14.2
$ opam install extlib ounit batteries
# wait a while
# follow the directions for setting up your environment for opam, then
# the Makefile should work for you
1.3 OSX Instructions
If you want to work on your OSX laptop, you can install OCaml and OPAM via Homebrew first, then install the necessary packages manually:
$ brew install ocaml opam
Then follow the Setup instructions above.
1.4 Windows Instructions
Sorry, it’s not feasible for me to support Windows, BSD, or any other more exotic platforms. You’re free to try building things on those platforms, but make sure your assignments run on the department machines before submitting. The assignments should work on either OSX or on the department machines; I’ll do my best to support both and note any exceptions as time goes on. When in doubt, use a department machine.
So far, I’ve had some success using Windows Subsystem for Linux, specifically the newer version 2. Within WSL, install Ubuntu, and then continue with the generic Linux instructions above.
2 Programming in OCaml — The Basics
This section covers some basics first, and then gives a structured introduction to how we’ll program in OCaml in this course.
2.1 The Simplest Program
OCaml files are written with a .ml
extension. An OCaml file is somewhat
similar to a Python file: when run, it evaluates the file directly (unlike,
for example, C++, which designates a special main
function).
An OCaml file consists of
(Optionally) a series of
open
statements for including other modulesA series of declarations for defining datatypes, functions, and constants
A series of (though often just one) toplevel expressions to evaluate.
For example:
open Printf
let message = "Hello world";;
(printf "%s\n" message)
The first line includes the built-in library for printing, which provides
functions similar to fprintf
and printf
from stdlib
in C. The
next two lines define a constant named message
, and then call the
printf
function with a format string (where %s
means “format as
string”), and the constant message
we defined on the line before.
Put this in a file called hello.ml
and run it with:
$ ocaml hello.ml
Hello world
Most of our programs will be more complex than this, and much of the infrastructure for the “main” function will be provided, but it’s useful to see the simplest case first.
2.2 Defining and Calling Functions
One thing that we’ll do over and over again is define functions. Here’s an example of a function definition in OCaml:
let max (n : int) (m : int) : int =
if n > m then n else m;;
It uses the keyword let
followed by the name of the function (here,
max
). Next comes the parameter list. Each parameter is enclosed in
parentheses, and has the parameter’s name, then a colon, then its type. So
n
and m
are the parameters of max
, and they are both type
int
. The final colon followed by int
describes the return type of
the function. Then there is an =
sign, followed by the function body.
This declaration is similar to the following in C/C++/Java:
int max(int n, int m) {
if(n > m) { return n; }
else { return m; }
}
One notable difference is that the OCaml function does not have any
return
statements. We’ll talk about how to think about the “return”
value of a function without return
statements next. It’s also important
to note that the declaration in OCaml ends in ;;
. This is a required
syntactic convention for all top-level declarations in OCaml.
We’ll get to a more robust notion of testing in a little bit.
We can check that max
works by defining a useful top-level call with
print statements:
open Printf
let max (n : int) (m : int) : int =
if n > m then n else m;;
(printf "Should be 5: %d\n" (max 5 4));
(printf "Should be 4: %d\n" (max 3 4));
(printf "Should be 4: %d\n" (max 4 4));
You can copy this program into a file called max.ml
and run it with
ocaml max.ml
.
There are a few things to explain here. First, the syntax for function calls in OCaml is different than you may be used to. Instead of writing
some_function(arg1, arg2, ...)
// for example
max(4, 5)
The surrounding parentheses can be omitted in some cases, but it’s always safe to include them to be clear. as we would in C++ or Python, in OCaml we write
(some_function arg1 arg2 ...)
(* for example *)
(max 4 5)
This is just a useful model; everything you know about stacks and memory
diagrams is still true (and in fact, we’ll talk about stacks in quite a bit of
detail this semester). But substitution is a very helpful model for reasoning
in the style of programming we’ll do.
That’s just a syntactic difference. There’s also a useful distinction in how
I prefer to think about what happens when we call a function in OCaml. Rather
than thinking about a call to max
creating a new stack frame, let’s think
about what happens if we substitute the provided argument values for
the parameters in the body of max
, and continue by evaluating the
function body.
So, for example, the call to max
below takes a step to the
substituted form:
(max 4 5)
=> if 4 > 5 then 4 else 5
Then we can think about how the if
expression takes steps. First, it
evaluates the conditional part, and based on that value being true
or
false
, it evaluates one or the other branch:
if 4 > 5 then 4 else 5
=> if false then 4 else 5
=> 5
From this sequence of steps, we say that (max 4 5)
evaluates to
5
. This gives us a way to think about evaluation that doesn’t require a
notion of a return
statement.
With this idea of substitution in mind, we can think about how the sequence of
printf
expressions we wrote will evaluate:
(printf "Should be 5: %d\n" (max 5 4));
(printf "Should be 4: %d\n" (max 3 4));
(printf "Should be 4: %d\n" (max 4 4));
=> (printf "Should be 5: %d\n" (if 5 > 4 then 5 else 4));
(printf "Should be 4: %d\n" (max 3 4));
(printf "Should be 4: %d\n" (max 4 4));
=> (printf "Should be 5: %d\n" (if true then 5 else 4));
(printf "Should be 4: %d\n" (max 3 4));
(printf "Should be 4: %d\n" (max 4 4));
=> (printf "Should be 5: %d\n" 5);
(printf "Should be 4: %d\n" (max 3 4));
(printf "Should be 4: %d\n" (max 4 4));
The rule for semicolon-separated sequences is that they are evaluated in order, and the value resulting from each expression is ignored once it is done.
=> <internal-to-OCaml-printing of "Should be 5: 5\n">;
(printf "Should be 4: %d\n" (max 3 4));
(printf "Should be 4: %d\n" (max 4 4));
=> (printf "Should be 4: %d\n" (max 3 4));
(printf "Should be 4: %d\n" (max 4 4));
=> (printf "Should be 4: %d\n" (if 3 > 4 then 3 else 4));
(printf "Should be 4: %d\n" (max 4 4));
=> (printf "Should be 4: %d\n" (if false then 3 else 4));
(printf "Should be 4: %d\n" (max 4 4));
=> (printf "Should be 4: %d\n" 4);
(printf "Should be 4: %d\n" (max 4 4));
=> <internal-to-OCaml-printing of "Should be 4: 4\n">;
(printf "Should be 4: %d\n" (max 4 4));
... and so on
2.3 Recursive Functions
This is because, as we’ll see, there are lots of trees to process in a
compiler, and trees have a fundamentally recursive structure.
A lot of the code we write this semester will be recursive
functions. OCaml distinguishes between functions that can contain recursive
calls and functions that cannot. We saw the latter kind above in max
which simply used the let
keyword. We can define a recursive function by
using let rec
:
let rec factorial (n : int) : int =
if n <= 1 then 1
else n * (factorial (n - 1));;
The substitution-based rules are a little more interesting when thinking about
evaluating a call to factorial
:
(factorial 3)
=> (if 3 <= 1 then 1 else 3 * (factorial (3 - 1)))
=> (if false then 1 else 3 * (factorial (3 - 1)))
=> (3 * (factorial (3 - 1)))
=> (3 * (factorial 2))
=> (3 * (if 2 <= 1 then 1 else 2 * (factorial (2 - 1)))
...
=> (3 * (2 * (factorial (2 - 1))))
...
=> (3 * (2 * 1))
=> 6
Here, we can see the chained multiplications “stack up” during the recursive calls. Writing this in a substitution-based style makes it easy to track where the return values of function calls end up.
2.4 Testing with OUnit
Testing by printing values becomes pretty onerous when we want to write more than a few examples. In this course, we’ll use a library called OUnit to write tests.
With OUnit, we will write declarations in one file, and test them in another.
The code provided in your checkout has two files: functions.ml
, which
you’ll fill in with some implementations in the rest of the exercises, and
test.ml
, which will contain tests. This will become a common layout for
how we write our programs in this course.
The weird-looking >::
operator is described
here
in terms of more basic concepts, if you’re interested. It’s just a shorthand
for constructing a test value with a name.
A test in OUnit is a name paired with a function of one argument. The
function uses one of several different test predicates to check a
computation; the one we’ll use most commonly is assert_equal
. The syntax
>::
is used to combine the name and the function together into a test.
Here’s an example:
open OUnit2
let check_fun _ = (* a function of one argument *)
assert_equal (2 + 2) 4;;
let my_first_test = "my_first_test">::check_fun;;
Most of this is just boilerplate that you won’t have to think much, if
at all, about. But it’s useful to explain it once. Now my_first_test
is
a named test value. Please be careful not to use spaces in test names,
because running the test suite will produce one log file per test, and those files
will have spaces in their names, which may cause some headaches.
Note that we used an underscore when defining the
parameter of check_fun
; we can use an underscore to indicate to OCaml
that we don’t care about the argument (there needs to be a parameter because
of how the testing library works, even though we won’t use the parameter). We
can run our test by creating a suite out of a list of tests,Again, please
avoid spaces in the suite name. and running the suite:
let suite = "suite">:::[my_first_test];;
run_test_tt_main suite
The command used is not ocaml
in this case,
but a wrapper around ocaml
called ocamlfind
that knows how to search
your system for packages installed with e.g. OPAM. Candidly, OCaml’s build
system can be a little onerous, so I’m not teaching it explicitly. If you
want to use OCaml for a large project outside this course, I recommend
learning about the corebuild
tool that comes with Real World OCaml.
To build and run the given skeleton, use the provided Makefile that does the
work of building for you. In this case, you just need to run
$ make test
$ ./test
in order to run the tests.
We can also add tests that fail to see what happens:
let check_fun2 _ = (* a failing test *)
assert_equal (2 + 2) 5;;
let my_second_test = "my_second_test">::check_fun2;;
If we add this test to the suite and run, we get a failure:
$ ./test
.F
==============================================================================
Error: suite:1:my_second_test.
File "/Users/joe/.../oUnit-suite-prob#02.log", line 2, characters 1-1:
Error: suite:1:my_second_test (in the log).
Raised at file "src/oUnitAssert.ml", line 45, characters 8-27
Called from file "src/oUnitRunner.ml", line 46, characters 13-26
not equal
------------------------------------------------------------------------------
Ran: 2 tests in: 0.14 seconds.
FAILED: Cases: 2 Tried: 2 Errors: 0 Failures: 1 Skip: 0 Todo: 0 Timeouts: 0.
This output identifies the failing test by name (my_second_test), though it
doesn’t tell us much more than that. Another annoying thing about the way we
wrote those tests is that defining a new function for every test causes
significant extra typing. To get a little more information out, we can pass
assert_equal
an optional argument that specifies how to turn the values
under test into a string for printing. We can bundle that up inside a
function that creates the test with its name. So, for example, we can define
a function that creates tests comparing integers to integers:
let t_int name value expected = name>::
(fun _ -> assert_equal expected value ~printer:string_of_int);;
let my_third_test = t_int "my_third_test" (2 + 2) 7;;
let my_fourth_test = t_int "my_fourth_test" (2 + 2) 4;;
If we add these two tests to the suite, we see a much more useful failure
report that says expected: 7 but got: 4
. I’ll often provide useful
helper functions for testing with examples, but you may also decide to write
your own for different kinds of tests as the semester goes on.
(Note: the funny ~printer:
syntax is a
labeled
argument, and is a mechanism for passing arguments by name rather than
just by position. Functions must be defined to expect labeled
arguments; you can’t simply assume that an arbitrary argument happens to have a
name. Additionally, labeled arguments might be optional, and there
might be a default value supplied for them. Labeled, optional arguments are
prohibited from being the final argument for a function (why do you
suppose that must be?). If you want to supply a different value-to-string
converter for assert_equal
, you’d need to write
~printer:your_function_here
.)
2.5 Exercises
Implement
fibonacci
as an OCaml function that takes an integern
and returns the \(n^{th}\) Fibonacci number. Write out the evaluation of(fibonacci 3)
in substitution style.Write tests for
max
andfibonacci
usingt_int
.
3 Programming in OCaml — Datatypes
Programming with only integers, we wouldn’t make much progress on building a compiler. The next thing we need to do is understand how to create new datatypes in OCaml, and program with them.
3.1 Binary Trees with type
Let’s start with a datatype we all ought to know well: binary trees. We
know we’ll need to represent a binary tree node somehow, which has a value and
two children. For now, let’s say the value has to be a string. In OCaml, we
can define such a node using the keyword type
:
type btnode =
| Leaf
| Node of string * btnode * btnode
Translated into English, this reads:
A binary tree node is either a leaf of the tree, which has no fields, or a node, which has three fields: a string, a binary tree node, and another binary tree node.
This defines what we call constructors for Leaf
and Node
,
which we can use to construct trees. Here are a few examples of trees and
their corresponding btnode
value:
"a" Node("a", |
/ \ Node("b", Leaf, Leaf), Node("c", Leaf, Leaf)) |
"b" "c" |
"a" Node("a", |
/ Node("b", Leaf, Leaf), Leaf) |
"b" |
"a" Node("a", |
/ Node("b", |
"b" Leaf, Node("c", Leaf, Leaf)), Leaf) |
\ |
"c" |
In C++ implementation of binary trees, we would commonly use NULL
to
represent a Leaf
; in Fundies 2 or OOD, we would likely use a dedicated
Leaf
class instead. Unlike NULL
in C++, Leaf
can only be a
btnode
and can never be confused for a value of some other type. Each
position with no child corresponds to a Leaf
, and the others correspond to
uses of Node
. We call Leaf
and Node
variants of the
btnode
type.
3.2 Manipulating Data with match
The next question is how to work with these values. For example, how can we
construct an in-order concatenation of the strings in a btnode
as we’ve
defined it? That is, how do we fill in this function:
let rec inorder_str (btn : btnode) : string =
...
The next feature we need to introduce is match
, which allows us to
examine which variant of a type a particular value has, and extract the values
of its fields. Here are some examples:
let m1 = match Leaf with
| Leaf -> true
| Node(s, left, right) -> false;;
(* m1 is true *)
let m2 = match Leaf with
| Leaf -> 44
| Node(s, left, right) -> 99;;
(* m2 is 44 *)
let m3 = match Node("a", Leaf, Leaf) with
| Leaf -> "z"
| Node(s, left, right) -> s;;
(* m3 is "a" *)
let m4 = match Node("a", Node("b", Leaf, Leaf), Leaf) with
| Leaf -> "z"
| Node(s, left, right) ->
match left with
| Leaf -> "y"
| Node(s2, left2, right2) -> s2;;
(* m4 is "b" *)
From these examples, we can see how match
must work. It inspects the
value after the match
keyword, and selects the branch that corresponds to
the variant of that value. Then it extracts the fields from the value, and
substitutes them for the names given in the branch. Let’s use the m4
example to make that concrete:
match Node("a", Node("b", Leaf, Leaf), Leaf) with
| Leaf -> "z"
| Node(s, left, right) ->
match left with
| Leaf -> "y"
| Node(s2, left2, right2) -> s2
(* substitute Node("b", Leaf, Leaf) for left *)
=> match Node("b", Leaf, Leaf) with
| Leaf -> "y"
| Node(s2, left2, right2) -> s2
(* substitute "b" for s2 *)
=> "b"
With match
available, we can now fill in the body for inorder_str
.
We can start by writing out a skeleton of the match structure for a
btnode
, as most functions over btnode
s will need to match
on
the node to decide what to do.
let rec inorder_str (bt : btnode) : string =
match bt with
| Leaf -> ...
| Node(s, left, right) -> ...
Now we can ask what the inorder traversal should yield in the case of a leaf
of the tree (or an empty tree altogether). In this case, that ought to be an
empty string. So the Leaf
case should be filled in with ""
. How
about for Node
s? We know an inorder traversal should have the elements
to the left in order, then the current node, then the elements to the right.
We can get the elements to either side via a recursive call, and then we just
need one more piece of information, which is that ^
is the operator for
concatenating strings in OCaml:
let rec inorder_str (bt : btnode) : string =
match bt with
| Leaf -> ""
| Node(s, left, right) ->
(inorder_str left) ^ s ^ (inorder_str right)
3.3 Exercises
This is a trick question.Write a test function
t_string
that’s liket_int
, but tests for equality of strings. Can you write a function that produces a string form of the results liket_int
did for integers?Write at least five interesting tests for
inorder_str
.Write out the substitution-based evaluation of
inorder_str
on a tree with at least 3 nodes.Write a function
size
that takes abtnode
and produces an integer that is the number ofNode
s in the tree.Write a function
height
that takes abtnode
and produces an integer that is the height of the tree.Make sure to test the above two functions.
4 Programming in OCaml — Lists and Parametric Polymorphism
4.1 Linked Lists, By Hand
Since we’ve seen binary trees, it’s natural to think about a similar definition for the nodes of a linked list. One OCaml datatype we could write is:
type llist =
| Empty
| Link of string * llist
That is, a list is either Empty
(the end of the list), or a Link
of
a string onto another list. Of course, this would require that we write
additional datatype declarations for lists of numbers, lists of booleans,
lists of binary trees, and so on, if we needed those shapes of data. The
natural solution is to make the datatype generic over the kind of data it
uses. OCaml lets us do this by defining datatypes with type variables
that can be filled with any type. Type variables are written with a leading
'
character:
This idea corresponds to the use of templates in C++, or generic types in Java.
type 'a llist =
| Empty
| Link of 'a * 'a llist
The types of the fields in Link
have changed with this addition. The
first field can now hold a value of the list’s type, and the second must hold
a llist
that contains elements of that type as well. That is, this
describes a homogeneous linked list, where all elements will have the
same type.
We won’t have much use for heterogeneous lists, which would introduce
different complications.
Lets say we want to write a function sum
that takes a llist
of
numbers and adds them up. We now need to describe its type in terms of the
contents, which will be an int
:
let rec sum (l : int llist) : int =
match l with
| Empty -> 0
| Link(first, rest) -> first + (sum rest)
When we construct llist
s, we do not need to provide any extra type
information —
let l1 = Link(1, Empty);;
let l2 = Link("a", Empty);;
Here, l1
will have type int llist
, and l2
will have type
string llist
.
4.2 Linked Lists, Built-in
It turns out that our definition of llist
above is important enough that a
version of it is built into OCaml, just with slightly different names and
syntax. The built-in equivalent of Empty
is written []
, and
Link(first, rest)
is written first :: rest
. The syntax [a; b; c]
is shorthand for a::b::c::[]
. (Whitespace is not required before or
after the colons and semicolons here, but is useful for readability.) The type
of built-in lists is 'a list
, which can be specialized for any list
contents. For example, we could rewrite sum
above as:
let rec sum2 (l : int list) : int =
match l with
| [] -> 0
| first::rest -> first + (sum2 rest)
And we could test it by creating tests with t_int
:
(* these would go in the suite list *)
t_int "sum2_empty" (sum2 []) 0;
t_int "sum2_single" (sum2 [5]) 5;
t_int "sum2_longer" (sum2 [3; 4; 5]) 12;
t_int "sum2_longer2" (sum2 (3::4::5::[])) 12;
Note that the last two tests mean the same thing; they are just different ways of writing the same list containing 3, 4, and 5.
Since lists are quite a fundamental structure, we will end up using them frequently; handy functions to use with lists are here, and we’ll talk about them more as we build up more experience with OCaml. I’ll mention here, since it’s slightly tricky to search for, that OCaml provides two ways to append lists:
(* This function ... *)
List.append [1;2;3] [4;5;6] ====> [1;2;3;4;5;6]
(* ... is synonymous with this infix operator *)
[1;2;3] @ [4;5;6] ====> [1;2;3;4;5;6]
4.3 Exercises
Write and test a function
increment_all
that takes anint list
and produces a newint list
with all the elements increased by 1.Write and test a function
long_strings
that takes astring list
and anint
and produces a newstring list
that contains all the strings that had length greater than the givenint
. You can get the length of a string with the functionString.length
. Other string functions are documented here.Write and test a function
every_other
that takes a'a list
(a list with elements of any one type), and produces a new list that contains every other element from the given list, starting with the first element.Write and test a function
sum_all
that takes anint list list
(that is, a list of lists of integers), and returns anint list
that contains the sums of the sub-lists.
5 Tuples
There are many times in programs where we wish to return more than one value. For example, when returning a pair of key and value from a hash-table data structure, when returning an average and its standard deviation, or when representing a two (or three)-dimensional point, to name a few.
OCaml has a built-in way of handling these cases called tuples. To create a tuple, we enclose two or more values in parentheses:
let tup = (1, "a", []);;
To access the values in a tuple, we can use a special kind of let binding, where we give names to the positions of a tuple value:
let tup = (1, "a", []);;
let (one, a, empty_list) = tup;
(*
one is 1
a is "a"
empty_list is []
*)
Since pairs —fst
and snd
, that get the first and second
component of a two-element tuple, respectively.
The type of a tuple is written with *
characters separating the
components’ types, and surrounded by parentheses.
let increment_snd (t : (string * int)) : (string * int) =
(fst t, 1 + (snd t));;
(increment_snd ("a", 5)) (* returns the pair ("a", 6) *)
Warning! It’s really tempting to write [1, 2, 3]
and think you have
an int list
. You don’t! You actually have a one-element list, of
type (int * int * int) list
. Commas are reserved for creating tuple
values, and semicolons are used for creating list values. This typo can cause
some very confusing error messages, where the types reported in the error bear
little resemblance to what you think ought to be in your code...
In the second part of this assignment, you will use tuples to represent source locations of tokens, when parsing S-expressions (and of course, tuples will be useful for many other things, later!).
6 The option
Type
A common way of handling failure that we’ve already seen is raising exceptions
with failwith
. This works well when an operation is truly nonsensical.
However, it forces programs to use a different class of features —
Consider the problem of finding and returning the first element in a list that matches a particular predicate:
let rec find (l : 'a list) (pred : 'a -> bool) : 'a =
match l with
| [] -> failwith "Not found"
| x::xs -> if pred x then x else find xs pred;;
(find [1;2;3] (fun n -> n > 4);; (* raises an error *)
(find [1;2;3] (fun n -> n > 2);; (* returns 3 *)
When the element isn’t found, we cannot return a value of type 'a
,
because the algorithm hasn’t found one. It seems we have to throw an error,
as there is nothing left for us to do. This certainly limits the utility of
find
, which now needs a companion contains
if is to be useful on
lists that aren’t already known to have a matching element.
It would be convenient if we had a value that represented that there is
no appropriate value to return in the empty case. Similarly, it would
be useful to have the counterpart, a representation of being able to provide
some appropriate value. OCaml provides just such a datatype, called
option
, which is built-in. If we wrote the definition ourselves, it
would look like:
type 'a option =
| None
| Some of 'a
That is, an option
is either None
, which we can use to indicate
failure or the lack of an appropriate value, or Some
, which contains a
single field that is a value of the option’s type. To write find
using
option, we would write:
let rec find2 (l : 'a list) (pred : 'a -> bool) : 'a option =
match l with
| [] -> None
| x::xs -> if pred x then Some(x) else find2 xs pred;;
(find2 [1;2;3] (fun n -> n > 4);; (* returns None *)
(find2 [1;2;3] (fun n -> n > 2);; (* returns Some(3) *)
Now a program that calls find
, rather than using an exception handler to
manage the not found case, can simply match
on the option
that is
returned to decide what to do.
Note that option
s aren’t always better than exceptions, as sometimes it’s
difficult for the caller to know what to do when None
is returned. But
in many cases, when “failure” is something that the caller can reasonably
react to, returning an option
is a much more natural choice.
(For those coming from C/C++ backgrounds, note that an 'a option
is very
different from a “pointer that might be NULL
”. Because of ML’s type
system, you can’t simply “dereference the pointer”: the only thing you can do
with a value of a datatype is match
on it, which means you can never
forget to check if the value is actually None
.)
7 Grading standards
For this assignment, you will be graded on
whether your code compiles,
whether your code implements the specification (functional correctness),
whether you thoroughly test every method that you write, and
how readable your code is (indented well, commented well, etc).
8 Submission
8.1 Deliverables
Your submission should include the three files provided: Makefile
, functions.ml
, and test.ml
.
Please ensure that your code compiles! On this assignment, half of your grade is for correctness as determined by automated testing, so code that doesn’t compile is subject to a 50% penalty up front. Zip the files together:
$ zip submission.zip Makefile functions.ml test.ml
and submit that zip. (Note: we may or may not wind up using automated unit tests on the handin server itself, but we will run the unit tests ourselves offline.)
8.2 Instructions
You will submit your homework at https://handins.khoury.northeastern.edu. Follow the instructions here on how to use the server.