
Lecture 20: Mutable data structures

Removing items from a list, sentinels, and wrappers

In the last lecture, we considered phone lists and dealt with the possibility that people might move and change their phone numbers. In this lecture, we deal with a related but more subtle situation, where we lose contact with someone altogether, and need to remove them from a phone list. Once again, we’ll start with the non-generic ILoPerson type to focus on the essential new aspects of the problem; generalizing to IList<Person> is straightforward.

20.1 Removing items from a list: the setup

As it turns out, removing an item from a list is a bit subtle. First, we need a signature and a few examples of what we expect to happen when we remove a person from a list:
    // In ILoPerson
    // EFFECT: modifies this list and removes the person with the given name from it
    void removePerson(String name);

    // In ExamplePhoneLists
    // Tests removing the first person in a list
    void testRemoveFirstPerson(Tester t) {
      this.initData();
      ILoPerson list1 =
        new ConsLoPerson(this.anne, new ConsLoPerson(this.clyde,
          new ConsLoPerson(this.henry, new MtLoPerson())));
      ILoPerson list2 =
        new ConsLoPerson(this.anne, new ConsLoPerson(this.dana,
          new ConsLoPerson(this.gail, new MtLoPerson())));
      // Check initial conditions
      t.checkExpect(list1.contains("Anne"), true);
      t.checkExpect(list2.contains("Anne"), true);
      // Modify list1
      list1.removePerson("Anne");
      // Check that list1 has been modified...
      t.checkExpect(list1.contains("Anne"), false);
      // ...but that list2 has not
      t.checkExpect(list2.contains("Anne"), true);
    }

    // Tests removing a middle person in a list
    void testRemoveMiddlePerson(Tester t) {
      this.initData();
      ILoPerson list1 =
        new ConsLoPerson(this.anne, new ConsLoPerson(this.clyde,
          new ConsLoPerson(this.henry, new MtLoPerson())));
      ILoPerson list2 =
        new ConsLoPerson(this.dana, new ConsLoPerson(this.clyde,
          new ConsLoPerson(this.gail, new MtLoPerson())));
      // Check initial conditions
      t.checkExpect(list1.contains("Clyde"), true);
      t.checkExpect(list2.contains("Clyde"), true);
      // Modify list1
      list1.removePerson("Clyde");
      // Check that list1 has been modified...
      t.checkExpect(list1.contains("Clyde"), false);
      // ...but that list2 has not
      t.checkExpect(list2.contains("Clyde"), true);
    }

    // Tests removing the last person in a list
    void testRemoveLastPerson(Tester t) {
      this.initData();
      ILoPerson list1 =
        new ConsLoPerson(this.anne, new ConsLoPerson(this.clyde,
          new ConsLoPerson(this.henry, new MtLoPerson())));
      ILoPerson list2 =
        new ConsLoPerson(this.dana, new ConsLoPerson(this.gail,
          new ConsLoPerson(this.henry, new MtLoPerson())));
      // Check initial conditions
      t.checkExpect(list1.contains("Henry"), true);
      t.checkExpect(list2.contains("Henry"), true);
      // Modify list1
      list1.removePerson("Henry");
      // Check that list1 has been modified...
      t.checkExpect(list1.contains("Henry"), false);
      // ...but that list2 has not
      t.checkExpect(list2.contains("Henry"), true);
    }

Do Now!

These examples are not complete. What other edge cases might there be?

Let’s try to design removePerson.

20.2 Removing a person by name, part 1

We can dispense with the empty case easily enough: there’s no person to be removed, so there’s nothing to do.
    // In MtLoPerson
    void removePerson(String name) { return; }

The non-empty case is a bit trickier. The recursive part is easy: if we haven’t found the person by name yet, keep searching.
    // In ConsLoPerson
    void removePerson(String name) {
      if (this.first.name.equals(name)) {
        ???
      }
      else {
        this.rest.removePerson(name);
      }
    }
What should we do when we finally find the name? We need somehow to remove the current ConsLoPerson from the list. Is there anything we can do to modify the current object to accomplish that? No! Each ConsLoPerson only contains a reference to the next item in the list, so all we could do is modify the node after the current one. But the link we need to modify is the one held by the node before the current one, and we don’t have a reference to that node!

Exercise

Well...maybe we don’t need to remove this ConsLoPerson from the list — maybe we just need to remove its data. Why couldn’t we just “move” the data from this.rest.first into this.first, and then do the same thing to this.rest, moving the items up one by one to eliminate the current one?

Give three technical reasons why this approach fails. (Hint: consider types, aliasing, and any base cases.)

(It’s tempting to think that perhaps all we need to do is “set this to this.rest”, and then we’d be done. But this is not a variable, it’s a keyword. It is a pronoun, referring to the current object, and we cannot modify it to mean something else. Instead, we have to find where the references to the current object are being held—i.e., in the preceding node of this list—and modify them.)

It sounds like we need to keep track of additional information: we need an accumulator parameter that tells us what the node is before this one, so that we can modify it as needed. Note that the node before the current one is guaranteed to be a ConsLoPerson, not merely an ILoPerson, so we can define the following signature:
    // In ILoPerson
    void removePerson(String name);
    void removePersonHelp(String name, ConsLoPerson prev);

    // In MtLoPerson
    void removePerson(String name) { return; }
    void removePersonHelp(String name, ConsLoPerson prev) { return; }

    // In ConsLoPerson
    void removePersonHelp(String name, ConsLoPerson prev) {
      if (this.first.name.equals(name)) {
        prev.rest = this.rest; // Modify the previous node to bypass this node
      }
      else {
        this.rest.removePersonHelp(name, this); // this is the previous node of this.rest
      }
    }
Of course, this isn’t the method we want to define; we still need to implement removePerson itself. To do that, we’ll need to pass to removePersonHelp some particular ConsLoPerson, and the only one we have available is this:
    // In ConsLoPerson
    void removePerson(String name) {
      this.rest.removePersonHelp(name, this);
    }
If we try running the tests above, most of them pass.

Do Now!

There is a serious but subtle problem with this single line of code. What is it?

Before we tackle that problem, there’s another potential hazard to recognize and avoid.

20.3 Aliasing, again, and removing items from a list

Suppose we had a different setup of friends and coworkers than the last lecture: suppose we work in an office where we’re friends with everyone; additionally, we have friends outside of work.
    // In ExamplePhoneLists
    void initData() {
      // ... initialize Anne, Bob, etc...
      this.work =
        new ConsLoPerson(this.bob, new ConsLoPerson(this.clyde,
          new ConsLoPerson(this.dana, new ConsLoPerson(this.eric,
            new ConsLoPerson(this.frank, new MtLoPerson())))));
      // We're friends with everyone at work, and also with other people
      this.friends =
        new ConsLoPerson(this.anne, new ConsLoPerson(this.gail,
          new ConsLoPerson(this.henry, this.work)));
    }
Now suppose Eric quits his job. We need to remove him from the list of work contacts, but we still remain friends with him outside of work. In other words, the following test should pass:
    // In ExamplePhoneLists
    void testRemoveCoworker(Tester t) {
      this.initData();
      // Test that Eric is a coworker and a friend
      t.checkExpect(this.work.findPhoneNum("Eric"), this.eric.num);
      t.checkExpect(this.friends.findPhoneNum("Eric"), this.eric.num);
      // Remove Eric from coworkers
      this.work.removePerson("Eric");
      // Check that Eric is no longer a coworker
      t.checkExpect(this.work.findPhoneNum("Eric"), -1);
      // Check that Eric is still a friend
      t.checkExpect(this.friends.findPhoneNum("Eric"), this.eric.num);
    }

Do Now!

Do these tests pass? Why or why not? Draw an object diagram to illustrate the situation.

Here is the object diagram that represents our initial data, before removing Eric.
+------+ +------+ +-------+ +------+ +-------+ +-------+ +------+ +-------+ +-------+ +---------+
| Anne | | Bob  | | Clyde | | Dana | | Eric  | | Frank | | Gail | | Henry | | Irene | | Jenny   |
| 1234 | | 3456 | | 6789  | | 1357 | | 12469 | | 7924  | | 9345 | | 8602  | | 91302 | | 8675309 |
+------+ +------+ +-------+ +------+ +-------+ +-------+ +------+ +-------+ +-------+ +---------+
   ^        ^        ^          ^      ^         ^          ^         ^
   |        |        |          |      |         +----------|---------|-----------------+
   |        |        |          |      +--------------------|---------|------+          |
   |        |        |          +---------------------------|-----+   |      |          |
   |        |        +---------------------------------+    |     |   |      |          |
   |        +-------------------------------+          |    |     |   |      |          |
   |                                        |          |    |     |   |      |          |
   |          +-----------------------------|----------|----+     |   |      |          |
   |          |          +------------------|----------|----------|---+      |          |
+--|----+  +--|----+  +--|----+             |          |          |          |          |
| first |  | first |  | first |          +--|----+  +--|----+  +--|----+  +--|----+  +--|----+  ++
| rest --->| rest --->| rest ----------->| first |  | first |  | first |  | first |  | first |  ||
+-------+  +-------+  +-------+          | rest --->| rest --->| rest --->| rest --->| rest --->||
  ^                                      +-------+  +-------+  +-------+  +-------+  +-------+  ++
  |                                         ^
friends                                     |
                                          work
The first test certainly passes, since Eric is present as the fourth item of the work list. Notice that in the friends list, the fourth ConsLoPerson node is the first node of the work list: in other words, work is aliased as friends.rest.rest.rest. So finding Eric in the friends list (test 2) also passes; he’s present as the seventh item of the list. But as a result of the aliasing, after removing Eric from the list of work colleagues, we get this:
+------+ +------+ +-------+ +------+ +-------+ +-------+ +------+ +-------+ +-------+ +---------+
| Anne | | Bob  | | Clyde | | Dana | | Eric  | | Frank | | Gail | | Henry | | Irene | | Jenny   |
| 1234 | | 3456 | | 6789  | | 1357 | | 12469 | | 7924  | | 9345 | | 8602  | | 91302 | | 8675309 |
+------+ +------+ +-------+ +------+ +-------+ +-------+ +------+ +-------+ +-------+ +---------+
   ^        ^        ^          ^                ^          ^         ^
   |        |        |          |                +----------|---------|-----------------+
   |        |        |          |                           |         |                 |
   |        |        |          +---------------------------|-----+   |                 |
   |        |        +---------------------------------+    |     |   |                 |
   |        +-------------------------------+          |    |     |   |                 |
   |                                        |          |    |     |   |                 |
   |          +-----------------------------|----------|----+     |   |                 |
   |          |          +------------------|----------|----------|---+                 |
+--|----+  +--|----+  +--|----+             |          |          |                     |
| first |  | first |  | first |          +--|----+  +--|----+  +--|----+             +--|----+  ++
| rest --->| rest --->| rest ----------->| first |  | first |  | first |             | first |  ||
+-------+  +-------+  +-------+          | rest --->| rest --->| rest -------------->| rest --->||
  ^                                      +-------+  +-------+  +-------+             +-------+  ++
  |                                         ^
friends                                     |
                                          work
We’ve successfully removed Eric from the work list, so the third test passes. But because of the aliasing, we’ve inadvertently removed Eric from the friends list as well, so the fourth test fails.

There is no way to avoid this problem, except by avoiding unintended aliasing in mutable data structures. (For immutable data structures, sharing like this is perfectly fine, because there will never be a modification that could expose the sharing.) When constructing values of mutable data types, pay extra attention to whether objects are being reused, and whether that reuse could become problematic later in the design of the program.
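The aliasing hazard can be reproduced in a few lines. The following is a minimal sketch, not the course's actual code: the class bodies are simplified stand-ins, and the listOf helper and the AliasingDemo class are hypothetical names introduced only for this illustration. It builds one friends list that shares the work nodes and another that has its own nodes, then removes Eric from work.

```java
// Simplified stand-ins for the lecture's classes (assumptions for this sketch)
class Person {
  String name;
  int num;
  Person(String name, int num) { this.name = name; this.num = num; }
}

interface ILoPerson {
  boolean contains(String name);
  void removePerson(String name);
  void removePersonHelp(String name, ConsLoPerson prev);
}

class MtLoPerson implements ILoPerson {
  public boolean contains(String name) { return false; }
  public void removePerson(String name) { }
  public void removePersonHelp(String name, ConsLoPerson prev) { }
}

class ConsLoPerson implements ILoPerson {
  Person first;
  ILoPerson rest;
  ConsLoPerson(Person first, ILoPerson rest) { this.first = first; this.rest = rest; }
  public boolean contains(String name) {
    return this.first.name.equals(name) || this.rest.contains(name);
  }
  // First attempt from section 20.2
  public void removePerson(String name) { this.rest.removePersonHelp(name, this); }
  public void removePersonHelp(String name, ConsLoPerson prev) {
    if (this.first.name.equals(name)) { prev.rest = this.rest; }
    else { this.rest.removePersonHelp(name, this); }
  }
}

class AliasingDemo {
  // Hypothetical helper: puts brand-new nodes for the given people in front of tail
  static ILoPerson listOf(ILoPerson tail, Person... people) {
    ILoPerson result = tail;
    for (int i = people.length - 1; i >= 0; i--) {
      result = new ConsLoPerson(people[i], result);
    }
    return result;
  }

  // Returns { work has Eric, sharing friends has Eric, separate friends has Eric }
  static boolean[] demo() {
    Person anne = new Person("Anne", 1234);
    Person gail = new Person("Gail", 9345);
    Person henry = new Person("Henry", 8602);
    Person[] coworkers = {
      new Person("Bob", 3456), new Person("Clyde", 6789), new Person("Dana", 1357),
      new Person("Eric", 12469), new Person("Frank", 7924) };

    ILoPerson work = listOf(new MtLoPerson(), coworkers);
    // Reuses the work nodes themselves, as initData does
    ILoPerson sharedFriends = listOf(work, anne, gail, henry);
    // Same Person objects, but a separate chain of nodes
    ILoPerson separateFriends = listOf(listOf(new MtLoPerson(), coworkers), anne, gail, henry);

    work.removePerson("Eric");
    return new boolean[] {
      work.contains("Eric"), sharedFriends.contains("Eric"), separateFriends.contains("Eric") };
  }

  public static void main(String[] args) {
    boolean[] r = demo();
    System.out.println(r[0]); // false: Eric is gone from work
    System.out.println(r[1]); // false: the sharing list lost him too
    System.out.println(r[2]); // true: the list with its own nodes is unaffected
  }
}
```

Only the nodes are duplicated in the second list; the Person objects are still shared, which is harmless here because removal modifies nodes, not people.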

20.4 Removing from a list, part 2: removing the first item

In our initial attempt at removing items from a list, there was a suspicious line of code:
    // In ConsLoPerson
    void removePerson(String name) {
      this.rest.removePersonHelp(name, this);
    }
When does this line of code fail? This method completely ignores this.first, so if we need to remove the first item of the list, we’ll never even examine it! But how can we possibly remove the front item of the list—what can we remove it from? There is no reference to the first node from some previous node that we can modify. Is there?
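To see the failure concretely, here is a small self-contained sketch of the first attempt. The class bodies are simplified stand-ins for the lecture's classes (storing just a name instead of a Person), and FirstItemDemo is a hypothetical name used only here. Removing a middle person works, but removing the first person silently does nothing.

```java
// Simplified stand-ins for the lecture's classes (assumptions for this sketch)
interface ILoPerson {
  boolean contains(String name);
  void removePerson(String name);
  void removePersonHelp(String name, ConsLoPerson prev);
}

class MtLoPerson implements ILoPerson {
  public boolean contains(String name) { return false; }
  public void removePerson(String name) { }
  public void removePersonHelp(String name, ConsLoPerson prev) { }
}

class ConsLoPerson implements ILoPerson {
  String first; // just the person's name, to keep the sketch short
  ILoPerson rest;
  ConsLoPerson(String first, ILoPerson rest) { this.first = first; this.rest = rest; }
  public boolean contains(String name) {
    return this.first.equals(name) || this.rest.contains(name);
  }
  // The suspicious method: it starts the search at this.rest and skips this.first
  public void removePerson(String name) { this.rest.removePersonHelp(name, this); }
  public void removePersonHelp(String name, ConsLoPerson prev) {
    if (this.first.equals(name)) { prev.rest = this.rest; }
    else { this.rest.removePersonHelp(name, this); }
  }
}

class FirstItemDemo {
  // Returns { list has Clyde after removing him, list has Anne after removing her }
  static boolean[] demo() {
    ILoPerson list =
      new ConsLoPerson("Anne", new ConsLoPerson("Clyde",
        new ConsLoPerson("Henry", new MtLoPerson())));
    list.removePerson("Clyde");
    boolean hasClyde = list.contains("Clyde");
    list.removePerson("Anne");
    boolean hasAnne = list.contains("Anne");
    return new boolean[] { hasClyde, hasAnne };
  }

  public static void main(String[] args) {
    boolean[] r = demo();
    System.out.println(r[0]); // false: the middle removal worked
    System.out.println(r[1]); // true: Anne, the first item, was never examined
  }
}
```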

There is one reference to the front of the list: after all, the variables friends and work, etc., are how we refer to the lists! Perhaps we could write something like this?
 work.removePersonHelp("Bob", work)
This will not succeed, and illustrates an important point about variables. If we run the line of code above, it will evaluate work to a particular object value, and then pass that value in to the removePersonHelp method, where it will be given the name prev. We cannot pass the variable work itself in as a parameter—that’s simply meaningless in Java—so we cannot modify that variable from within the removePersonHelp method.
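The difference between reassigning a parameter and modifying the object it refers to can be seen in a minimal sketch. The Node class below is a simplified, hypothetical stand-in for the lecture's list nodes (it uses null to end the chain purely for brevity), and PassByValueDemo is likewise a name made up for this illustration.

```java
// Hypothetical, simplified node type used only for this illustration
class Node {
  String name;
  Node rest; // null marks the end of the chain in this sketch
  Node(String name, Node rest) { this.name = name; this.rest = rest; }
}

class PassByValueDemo {
  // Reassigns the parameter: only the local name prev changes,
  // the caller's variable still refers to the same object as before
  static void reassign(Node prev) {
    prev = prev.rest;
  }

  // Modifies a field of the object: the change is visible through
  // every reference to that object, including the caller's variable
  static void bypassNext(Node prev) {
    prev.rest = prev.rest.rest;
  }

  // Returns { name at the front after reassign, name of the second node after bypassNext }
  static String[] demo() {
    Node work = new Node("Bob", new Node("Clyde", new Node("Dana", null)));
    reassign(work);
    String frontAfterReassign = work.name;
    bypassNext(work);
    String secondAfterBypass = work.rest.name;
    return new String[] { frontAfterReassign, secondAfterBypass };
  }

  public static void main(String[] args) {
    String[] r = demo();
    System.out.println(r[0]); // Bob: the variable work was not changed by reassign
    System.out.println(r[1]); // Dana: Clyde's node was bypassed by the field update
  }
}
```

This is why removePersonHelp can bypass a node through prev.rest but can never change which node the variable work itself refers to.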

If the problem with removing the first item of the list is that it’s the first item of the list, and therefore there is no reference to it from some previous cell, then the only way forward is to make it not be the first item of the list!

20.5 Revising our data structure: Introducing sentinels

What if we created a “dummy” node to be the very first node of the list, but that didn’t contain any data? Instead, it merely serves to hold a reference to the second node of the list, which will be the first item that actually stores data. That is, instead of a picture that looks like this:
+------+   +------+   +-------+
| Anne |   | Gail |   | Henry |
| 1234 |   | 9345 |   | 8602  |
+------+   +------+   +-------+
   ^          ^          ^
+--|----+  +--|----+  +--|----+  ++
| first |  | first |  | first |  ||
| rest --->| rest --->| rest --->||
+-------+  +-------+  +-------+  ++
  ^
  |
friends
We’d have a picture that looks more like this:
           +------+   +------+   +-------+
           | Anne |   | Gail |   | Henry |
           | 1234 |   | 9345 |   | 8602  |
           +------+   +------+   +-------+
              ^          ^          ^
+-------+  +--|----+  +--|----+  +--|----+  ++
| rest --->| first |  | first |  | first |  ||
+-------+  | rest --->| rest --->| rest --->||
  ^        +-------+  +-------+  +-------+  ++
  |
friends
The first node holds no data. It is a sentinel, a “guard” or intermediate object that always sits at the front of the list, before any of the interesting data. Now, the first data-carrying node isn’t so special after all, because there’s a node that comes before it: the sentinel! We’ve gotten rid of the special case that broke our tests earlier...assuming we can design the sentinel properly.

This particular technique of introducing an extra object between what we have (the variable friends) and what we want (the data in the list) is called adding a layer of indirection. In courses on algorithms and data structures, you will see many, many more examples of using indirection to solve problems that seem difficult or impossible, otherwise.

Introducing the idea of a sentinel node is the first time we have encountered a data definition where not every part of the data definition is dedicated to the purpose of holding data. In all our types before this—trees with nodes to hold data and leaves to mark the end; lists with conses to hold data and emptys to mark the end; shapes that hold data; books and authors and posns and all the rest—every class we defined had a clear purpose for representing information. Now, this new Sentinel class that we are about to define has as its sole purpose to make it easier to operate on our data. In fact, this is a concept that will reappear over and over in future courses: sometimes the challenge is not merely to design a representation for our data, but to design a representation that is efficient or convenient to work with.

Do Now!

Before adding the sentinel node, we could access the first data item of a non-empty list using theList.first. Now that we have a sentinel node, how can we access the first item of the list?

How can we define this Sentinel class, and how can we use it? At the very least, it must have a rest field capable of containing a ConsLoPerson, or else we couldn’t place it at the front of the list. And Sentinel and ConsLoPerson must share a common base class, because the node before a ConsLoPerson could now be either a ConsLoPerson or a Sentinel, but either way, removePersonHelp needs a way to access the rest field of such an object.

This leads to the following tentative class design:
+------------------------------------------------+
| ILoPerson                                      |
+------------------------------------------------+
| void removePerson(String name)                 |
| void removePersonHelp(String name, ANode prev) |
+------------------------------------------------+
               /_\                     /_\
                |                       |
       +----------------+               |
       | ANode          |               |
       +----------------+               |
       | ILoPerson rest |               |
       +----------------+               |
         /_\       /_\                  |
          |         |                   |
     +----+         |                   |
     |              |                   |
+----------+    +--------------+  +------------+
| Sentinel |    | ConsLoPerson |  | MtLoPerson |
+----------+    +--------------+  +------------+
+----------+    | Person data  |  +------------+
                +--------------+
A Sentinel and a ConsLoPerson are both kinds of ANodes, in that they have rest fields. But it’s not entirely clear whether ANode should implement the ILoPerson interface: it still makes no sense to implement removePerson on ConsLoPerson (because we don’t know the node preceding the current one); we’re creating this Sentinel class precisely so that we only have to implement removePersonHelp.

Worse, users of this data type will have to create Sentinels instead of just creating ConsLoPersons and MtLoPersons—and if users forget to create the Sentinel, then removePerson might simply break!

So now we have a data type that we can manipulate correctly, but that doesn’t implement the interface the way we want it, and that isn’t pleasant to use!

20.6 Revising our data structure: introducing wrappers

If the problem is that we want to ensure that users of our data structure can’t mistakenly forget to create a Sentinel, perhaps we should create another class that manages the Sentinel for them. We call this a wrapper class, since it wraps around the fiddly details of managing Sentinels and ConsLoPersons and such. A user merely has to create an instance of this wrapper class, and can then ignore the implementation details inside it.

We’re introducing another extra object between what we have to manipulate (the Sentinel object) and what we want to manipulate (the list overall). This is adding another layer of indirection. Looks like we’re seeing another example of this much sooner than we thought!

Introducing the idea of a wrapper is the second time we have encountered a data definition where not every part of the data definition is dedicated to the purpose of holding data.

So our new, final class design for mutable person lists is:
         +---------------------------------------+
         | MutablePersonList                     |
         +---------------------------------------+
         | void removePerson(String name)        |
         | void addPerson(String name, int num)  |
 +-------- Sentinel sentinel                     |
 |       +---------------------------------------+
 |
 |      +------------------------------------------------+
 |      | APersonList                                    |
 |      +------------------------------------------------+
 |      | void removePersonHelp(String name, ANode prev) |
 |      +------------------------------------------------+
 |                  /_\              /_\
 |                   |                |
 |         +------------------+       |
 |         | ANode            |       |
 |         +------------------+       |
 |         | APersonList rest |       |
 |         +------------------+       |
 |           /_\   /_\                |
 |            |     |                 |
 |   +--------+     |                 |
 V   |              |                 |
+----------+    +--------------+  +------------+
| Sentinel |    | ConsLoPerson |  | MtLoPerson |
+----------+    +--------------+  +------------+
+----------+    | Person data  |  +------------+
                +--------------+

Our users of this data type will construct a new MutablePersonList(). Given that object, they can invoke addPerson to add new contacts by name and number. And again given that object, they can invoke removePerson to remove a person by name. We will implement removePerson to delegate to removePersonHelp in such a way that we will always take care of the Sentinel properly. Let’s see how it all works.

20.6.1 Implementing the nodes of the list

    // Represents a sentinel at the start, a node in the middle,
    // or the empty end of a list
    abstract class APersonList {
      abstract void removePersonHelp(String name, ANode prev);
      APersonList() { } // nothing to do here
    }

    // Represents a node in a list that has some list after it
    abstract class ANode extends APersonList {
      APersonList rest;
      ANode(APersonList rest) {
        this.rest = rest;
      }
    }

    // Represents the empty end of the list
    class MtLoPerson extends APersonList {
      MtLoPerson() { } // nothing to do
      void removePersonHelp(String name, ANode prev) { return; }
    }
The ConsLoPerson class is pretty much as we’ve seen it before; we only have to implement removePersonHelp on it, but we can assume that the MutablePersonList class will invoke this method properly:
    // Represents a data node in the list
    class ConsLoPerson extends ANode {
      Person data;
      ConsLoPerson(Person data, APersonList rest) {
        super(rest);
        this.data = data;
      }
      void removePersonHelp(String name, ANode prev) {
        if (this.data.name.equals(name)) {
          prev.rest = this.rest; // bypass this node
        }
        else {
          this.rest.removePersonHelp(name, this);
        }
      }
    }
The Sentinel class is slightly funny, in that we have to implement removePersonHelp (because otherwise we’d get a type error in the code above), but we know that we’ll never invoke this method (because it wouldn’t make any sense):
    // Represents the dummy node before the first actual node of the list
    class Sentinel extends ANode {
      Sentinel(APersonList rest) {
        super(rest);
      }
      void removePersonHelp(String name, ANode prev) {
        throw new RuntimeException("Can't try to remove on a Sentinel!");
      }
    }

20.6.2 Implementing MutablePersonList itself

Now finally we have to implement the MutablePersonList class. Since the purpose of this class is to hide the Sentinel and other nodes, we’ll design a utility constructor that takes zero arguments, and sets up a Sentinel and empty list for us:
    class MutablePersonList {
      Sentinel sentinel;
      MutablePersonList() {
        this.sentinel = new Sentinel(new MtLoPerson());
      }
    }
How should we implement removePerson? We know that the first data item (if any) can be found at this.sentinel.rest. So we invoke removePersonHelp on that object, passing in the sentinel as the preceding node:
    // In MutablePersonList
    void removePerson(String name) {
      this.sentinel.rest.removePersonHelp(name, this.sentinel);
    }

(It’s basically ok to use the field-of-field access pattern here, because we know exactly how this.sentinel works, since the MutablePersonList class is the only place that we manipulate sentinels.)
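Putting the pieces together, the following self-contained sketch assembles the classes from this section and checks that the first data item can now be removed. Two things are assumptions added only for this illustration: the contains method (used to observe the result) and the way the demo fills the list by assigning this.sentinel.rest directly, which sidesteps addPerson so as not to give away the exercise below. SentinelDemo is also a hypothetical name.

```java
class Person {
  String name;
  int num;
  Person(String name, int num) { this.name = name; this.num = num; }
}

abstract class APersonList {
  abstract void removePersonHelp(String name, ANode prev);
  abstract boolean contains(String name); // hypothetical helper, not in the lecture
}

abstract class ANode extends APersonList {
  APersonList rest;
  ANode(APersonList rest) { this.rest = rest; }
}

class MtLoPerson extends APersonList {
  void removePersonHelp(String name, ANode prev) { }
  boolean contains(String name) { return false; }
}

class ConsLoPerson extends ANode {
  Person data;
  ConsLoPerson(Person data, APersonList rest) { super(rest); this.data = data; }
  void removePersonHelp(String name, ANode prev) {
    if (this.data.name.equals(name)) { prev.rest = this.rest; } // prev may be the sentinel
    else { this.rest.removePersonHelp(name, this); }
  }
  boolean contains(String name) {
    return this.data.name.equals(name) || this.rest.contains(name);
  }
}

class Sentinel extends ANode {
  Sentinel(APersonList rest) { super(rest); }
  void removePersonHelp(String name, ANode prev) {
    throw new RuntimeException("Can't try to remove on a Sentinel!");
  }
  boolean contains(String name) { return this.rest.contains(name); }
}

class MutablePersonList {
  Sentinel sentinel;
  MutablePersonList() { this.sentinel = new Sentinel(new MtLoPerson()); }
  void removePerson(String name) {
    this.sentinel.rest.removePersonHelp(name, this.sentinel);
  }
  boolean contains(String name) { return this.sentinel.contains(name); }
}

class SentinelDemo {
  // Returns { has Anne, has Gail, has Henry } after removing Anne and Henry
  static boolean[] demo() {
    MutablePersonList list = new MutablePersonList();
    // Shortcut for this sketch only: fill the list without addPerson
    list.sentinel.rest =
      new ConsLoPerson(new Person("Anne", 1234),
        new ConsLoPerson(new Person("Gail", 9345),
          new ConsLoPerson(new Person("Henry", 8602), new MtLoPerson())));
    list.removePerson("Anne");  // first item: the sentinel is its previous node
    list.removePerson("Henry"); // last item
    list.removePerson("Zed");   // not present: nothing happens
    return new boolean[] {
      list.contains("Anne"), list.contains("Gail"), list.contains("Henry") };
  }

  public static void main(String[] args) {
    boolean[] r = demo();
    System.out.println(r[0]); // false
    System.out.println(r[1]); // true
    System.out.println(r[2]); // false
  }
}
```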

Exercise

Try implementing addPerson that inserts a person with the given name and number onto the front of the list. How does having a sentinel help here?

20.7 Discussion

Removing an item from a list is a surprisingly tricky operation. There was only one case that our initial, simple attempt couldn’t handle: when the item to be removed was the first item of the list. But handling that case gracefully is hard: we were forced to add a sentinel node to “make the first node not be first”, and then were driven to add a wrapper around that, to hide the details that were getting somewhat messy. But now that it’s done, we have a data structure that’s rather elegant: the methods available on it are precisely the ones we want, and none of the helper methods or helper classes are visible to users.

It is technically possible, now that we have the MutablePersonList wrapper class, to eliminate the sentinel class. But doing so would clutter the code of the MutablePersonList class, and require changing several other methods in the other classes. Besides, now that we have identified a way to eliminate a special case in our code (handling the first item of the list), why would we want to go back to making it special and tricky again?

Exercise

Try eliminating the Sentinel class, while maintaining the method signatures for MutablePersonList exactly as they currently are.

20.8 Generalizing from MutablePersonLists to mutable lists of arbitrary data

What is the analogue of IList<T> here, now that we have a MutablePersonList? It’s not enough merely to add more methods to IList<T>, because that interface was used to define immutable lists. Instead, we might want to define the following hypothetical interface:
    interface IMutableList<T> {
      // adds an item to the (front of) the list
      void add(T t);
      // removes an item from list
      void remove(T t);
    }
But this interface is not flexible enough: how exactly does remove decide which item to remove?

Do Now!

Which notion of equality—extensional or intensional—could we possibly use here that works on values of arbitrary type T?

And, why can we only add to the front of the list? Why not the end? Or in the middle? Now that we’ve separated the implementation details of the list from the interface we’d like to use, let’s actually think about the methods we’d want to use! Here is a more flexible version of this interface, one that’s closer to what we might want to use:
    interface IMutableList<T> {
      // adds an item to the (front of) the list
      void addToFront(T t);
      // adds an item to the end of the list
      void addToEnd(T t);
      // removes an item from list (uses intensional equality)
      void remove(T t);
      // removes the first item from the list that passes the predicate
      void remove(IPred<T> whichOne);
      // gets the numbered item (starting at index 0) of the list
      T get(int index);
      // sets (i.e. replaces) the numbered item (starting at index 0) with the given item
      void set(int index, T t);
      // inserts the given item at the numbered position
      void insert(int index, T t);
      // returns the length of the list
      int size();
    }

As it turns out, we will not have to implement this interface ourselves. Java defines several classes that implement (roughly) this interface for us. In the next lecture, we’ll start to see how they work, and see how the index-based methods drive us towards yet another language feature in Java.
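As a rough preview, and only as an illustration, here is how the operations above map onto java.util.LinkedList, one of the standard classes in question. The correspondence is approximate: remove(Object) compares with equals rather than intensional equality, and removeIf removes every item that passes the predicate, not just the first. BuiltInListDemo is a hypothetical name used only for this sketch.

```java
import java.util.LinkedList;
import java.util.List;

class BuiltInListDemo {
  static List<String> demo() {
    LinkedList<String> names = new LinkedList<>();
    names.addFirst("Clyde");   // like addToFront
    names.addLast("Henry");    // like addToEnd
    names.addFirst("Anne");    // list is now Anne, Clyde, Henry
    names.add(1, "Bob");       // like insert(1, ...): Anne, Bob, Clyde, Henry
    names.set(3, "Gail");      // like set(3, ...):    Anne, Bob, Clyde, Gail
    names.remove("Clyde");     // remove by equals():  Anne, Bob, Gail
    names.removeIf(n -> n.startsWith("G")); // remove by predicate: Anne, Bob
    return names;
  }

  public static void main(String[] args) {
    List<String> names = demo();
    System.out.println(names.size()); // 2
    System.out.println(names.get(0)); // Anne
    System.out.println(names.get(1)); // Bob
  }
}
```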