Lecture 22: ArrayLists

Binary search over sorted ArrayLists, sorting ArrayLists

In the last lecture we began implementing several functions over ArrayLists as methods in a helper utility class. We continue that work in this lecture, designing methods to find an item in an ArrayList matching a predicate, and to sort an ArrayList according to some comparator.

22.1 Finding an item in an arbitrary ArrayList

To find an item that matches a given predicate, we need to add a new method to our utility class as well. Our initial guess at a signature is

// In ArrayUtils
<T> ??? find(ArrayList<T> arr, IPred<T> whichOne) {
  ???
}

What should the return type of our method be? If we simply return the item (i.e. have a return type of T), we are limited in what we can do: we can modify the item, perhaps, but we cannot remove the item from the ArrayList itself, because removal requires specifying an index. Therefore we should return the index where we found the item. If no item is found, we could throw an exception, but since it is a common occurrence for a value not to be found, we should instead return some invalid index, rather than aborting our program:

// In ArrayUtils
// Returns the index of the first item passing the predicate,
// or -1 if no such item was found
<T> int find(ArrayList<T> arr, IPred<T> whichOne) {
  ???
}

How can we implement this method? We don’t have ConsList and MtList against which to dynamically dispatch to methods. Even if we did, we’d need to keep count of the index we were at (using an accumulator parameter) so that we could return it. Here, we can use that index to drive the iteration as well. We define a helper method

// In ArrayUtils
// Returns the index of the first item passing the predicate at or after the
// given index, or -1 if no such item was found
<T> int findHelp(ArrayList<T> arr, IPred<T> whichOne, int index) {
  if (whichOne.apply(arr.get(index))) {
    return index;
  }
  else {
    return findHelp(arr, whichOne, index + 1);
  }
}

Do Now!
What’s wrong with this code?

We’ve forgotten our base case: this code will continue its recursion until index gets too big, at which point get will throw an exception. We need to compare index with the size of the list:

// In ArrayUtils
// Returns the index of the first item passing the predicate at or after the
// given index, or -1 if no such item was found
<T> int findHelp(ArrayList<T> arr, IPred<T> whichOne, int index) {
  if (index >= arr.size()) {
    return -1;
  }
  else if (whichOne.apply(arr.get(index))) {
    return index;
  }
  else {
    return findHelp(arr, whichOne, index + 1);
  }
}
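
Assembled into something that can be run, the pieces so far might look like the following. This is a minimal sketch: the shape of IPred (a single apply method) is assumed from how it is used above, and the StartsWith predicate and FindDemo class are hypothetical names made up for the demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;

// Assumed shape of the predicate interface; the real IPred may differ
interface IPred<T> {
  boolean apply(T t);
}

// Hypothetical predicate for this demo: does a string start with a prefix?
class StartsWith implements IPred<String> {
  String prefix;

  StartsWith(String prefix) {
    this.prefix = prefix;
  }

  public boolean apply(String s) {
    return s.startsWith(this.prefix);
  }
}

class ArrayUtils {
  // Returns the index of the first item passing the predicate,
  // or -1 if no such item was found
  <T> int find(ArrayList<T> arr, IPred<T> whichOne) {
    return this.findHelp(arr, whichOne, 0);
  }

  // Returns the index of the first item passing the predicate at or after
  // the given index, or -1 if no such item was found
  <T> int findHelp(ArrayList<T> arr, IPred<T> whichOne, int index) {
    if (index >= arr.size()) {
      return -1;
    }
    else if (whichOne.apply(arr.get(index))) {
      return index;
    }
    else {
      return this.findHelp(arr, whichOne, index + 1);
    }
  }
}

class FindDemo {
  public static void main(String[] args) {
    ArrayList<String> words =
        new ArrayList<String>(Arrays.asList("apple", "banana", "cherry"));
    ArrayUtils utils = new ArrayUtils();
    System.out.println(utils.find(words, new StartsWith("b"))); // prints 1
    System.out.println(utils.find(words, new StartsWith("z"))); // prints -1
  }
}
```

Starting the helper at index 0 is what makes find report the first match in the whole list.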

Do Now!
What would happen if we had used > instead of >=?

22.2 Finding an item in a sorted ArrayList – version 1

Suppose we happen to know that our ArrayList contains items that are comparable, and that the ArrayList itself is sorted. Can we do better than blindly scanning through the entire ArrayList? For concreteness, let’s assume our ArrayList is an ArrayList<String> and we’ll use the built-in comparisons on Strings. We’ll revisit this decision after we’ve developed the method, and generalize it to arbitrary element types.

To guide our intuition on a better searching algorithm, consider a well-known sorted list of strings: a dictionary, whose entries are alphabetized. Here is a sample dictionary of words, along with their 0-based indices:

   0       1       2      3    4     5       6       7         8
[apple, banana, cherry, date, fig, grape, honeydew, kiwi, watermelon]

Suppose we were searching for “grape”.

We know that words beginning with ‘g’ are not likely to appear at the very front of the dictionary, nor are they likely to appear at the back. Instead we start our search somewhere in the middle of the dictionary. In this case, the middle of our dictionary is index 4, “fig”. Because the dictionary is alphabetized, and “grape” comes after “fig” in the alphabet, we now know that all indices of 4 and below will definitely not contain the word we seek. Instead, we turn our attention to indices 5 (which is one more than the middle index, 4+1) through 8 (our upper bound on which indices might contain our word).
We could begin blindly scanning through all those items (and indeed, in this particular example, we’d luckily find our target on the very next try!), but our first approach of checking the “middle” index and eliminating half the dictionary in one shot worked so well; let’s try it again. This time, the middle index is 6 (or 7; either will work, but since indices must be integers, we will use integer division, allowing Java to truncate any fractional part and we’ll get 6 as our answer), “honeydew”. Since “grape” precedes “honeydew”, we now know that indices 6 and up will definitely not contain the word we seek. So we continue with indices 5 (our lower bound) through 5 (which is one less than the middle index, 6-1).
Happily, index 5 contains “grape”, so we return 5 as our answer.

Do Now!
What indices would we check if we were searching for “blueberry”?

Let’s trace through the steps involved in searching for “blueberry”.

Once again, we consider the entire ArrayList, from index 0 through index 8, and start our search at the middle index 4, “fig”, which is greater than our target word. So we eliminate indices 4 and up, and focus on indices 0 (our lower bound on where to find the word) through 3 (which is 4-1).
Our middle index is 1 (integer division truncates (0+3)/2), corresponding to “banana”, which is less than “blueberry”, so we eliminate indices 1 and below, and focus on indices 2 (which is 1+1) through 3 (our upper bound).
Now our middle index is 2, “cherry”, which is greater than our target, so we eliminate indices 2 and up, and focus on indices 2 (our lower bound) through 1 (which is 2-1).
Now our bounds have crossed: our lower bound is greater than our upper bound, so there are no possible words in the dictionary that might be our target. We must not have the target word in our ArrayList; we therefore return -1.

Let’s see how to translate this description into code. This search technique, which splits the search space in half each time, is known as binary search, so we’ll implement a new method to distinguish it from our previous find operation:

// In ArrayUtils
// Returns the index of the target string in the given ArrayList,
// or -1 if the string is not found
// Assumes that the given ArrayList is sorted alphabetically
int binarySearch(ArrayList<String> strings, String target) {
  ???
}

Once again, we find ourselves in need of a helper method: we need to keep track of the lower and upper bounds on the indices where our target string might be found.

// In ArrayUtils
// Returns the index of the target string in the given ArrayList,
// or -1 if the string is not found
// Assumes that the given ArrayList is sorted alphabetically
int binarySearchHelp_v1(ArrayList<String> strings, String target, int lowIdx, int highIdx) {
  int midIdx = (lowIdx + highIdx) / 2;
  if (target.compareTo(strings.get(midIdx)) == 0) {
    return midIdx; // found it!
  }
  else if (target.compareTo(strings.get(midIdx)) > 0) {
    return this.binarySearchHelp_v1(strings, target, midIdx + 1, highIdx); // too low
  }
  else {
    return this.binarySearchHelp_v1(strings, target, lowIdx, midIdx - 1); // too high
  }
}

Do Now!
What’s wrong with this code?

Once again we forgot our base case: when the indices cross, the target must not be present:

// In ArrayUtils
// Returns the index of the target string in the given ArrayList,
// or -1 if the string is not found
// Assumes that the given ArrayList is sorted alphabetically
int binarySearchHelp_v1(ArrayList<String> strings, String target, int lowIdx, int highIdx) {
  int midIdx = (lowIdx + highIdx) / 2;
  if (lowIdx > highIdx) {
    return -1; // not found
  }
  else if (target.compareTo(strings.get(midIdx)) == 0) {
    return midIdx; // found it!
  }
  else if (target.compareTo(strings.get(midIdx)) > 0) {
    return this.binarySearchHelp_v1(strings, target, midIdx + 1, highIdx); // too low
  }
  else {
    return this.binarySearchHelp_v1(strings, target, lowIdx, midIdx - 1); // too high
  }
}

Do Now!
What would happen if we didn’t add or subtract 1 from midIdx in the recursive calls?

Consider searching for “clementine”, this time without adding or subtracting 1:

We start the search between indices 0 and 8. The middle index is 4, and “fig” is bigger than “clementine”, so we search from the lower bound to the middle index.
We search between indices 0 and 4. The middle index is 2, and “cherry” is smaller than “clementine”, so we search from the middle index to the upper bound.
We search between indices 2 and 4. The middle index is 3, and “date” is bigger than “clementine”, so we search from the lower bound to the middle index.
We search between indices 2 and 3. The middle index is 2, and “cherry” is smaller than “clementine”, so we search from the middle index to the upper bound.
We search between indices 2 and 3...

If we don’t add or subtract 1, then we can easily get stuck comparing the same items with the same upper and lower bounds indefinitely. Once again, when dealing with indices, we have to be particularly careful about our edge cases.
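
To watch this happen without actually hanging the program, here is a small experiment. It is only a sketch: StuckSearchDemo and searchNoAdjust are hypothetical names, and the return value -2 is a sentinel invented for this demo to mean "the next call would have exactly the same bounds as this one". The method prints the bounds of every call so the repetition is visible.

```java
import java.util.ArrayList;
import java.util.Arrays;

class StuckSearchDemo {
  // Closed-interval binary search WITHOUT the +1/-1 adjustments.
  // Returns the index of the target, -1 if the bounds cross, or -2
  // (a made-up sentinel) when the recursion would repeat the same bounds.
  static int searchNoAdjust(ArrayList<String> strings, String target,
      int lowIdx, int highIdx) {
    int midIdx = (lowIdx + highIdx) / 2;
    System.out.println("searching " + lowIdx + " through " + highIdx
        + ", middle index " + midIdx);
    if (lowIdx > highIdx) {
      return -1;
    }
    int comparison = target.compareTo(strings.get(midIdx));
    if (comparison == 0) {
      return midIdx;
    }
    // Deliberately keep midIdx inside the next interval
    int newLow = (comparison > 0) ? midIdx : lowIdx;
    int newHigh = (comparison > 0) ? highIdx : midIdx;
    if (newLow == lowIdx && newHigh == highIdx) {
      return -2; // stuck: the real code would recur forever from here
    }
    return searchNoAdjust(strings, target, newLow, newHigh);
  }

  public static void main(String[] args) {
    ArrayList<String> dictionary = new ArrayList<String>(Arrays.asList(
        "apple", "banana", "cherry", "date", "fig",
        "grape", "honeydew", "kiwi", "watermelon"));
    System.out.println(searchNoAdjust(dictionary, "clementine", 0, 8)); // prints -2
    System.out.println(searchNoAdjust(dictionary, "grape", 0, 8));      // prints 5
  }
}
```

Words that are present can still be found this way; it is the absent words that never let the bounds cross.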

Do Now!
What would happen if our exit condition were if (lowIdx >= highIdx)...?

Now that we have a working helper, we just need to invoke it from the main binarySearch method:

// In ArrayUtils
int binarySearch_v1(ArrayList<String> strings, String target) {
  return this.binarySearchHelp_v1(strings, target, 0, strings.size() - 1);
}
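
Putting the helper and the main method together gives a version that can be run against the sample dictionary. This is a sketch for experimenting rather than a definitive implementation; the BinarySearchV1Demo class is a hypothetical name added only to drive the code.

```java
import java.util.ArrayList;
import java.util.Arrays;

class ArrayUtils {
  // Returns the index of the target string in the given ArrayList,
  // or -1 if the string is not found
  // Assumes that the given ArrayList is sorted alphabetically
  int binarySearch_v1(ArrayList<String> strings, String target) {
    return this.binarySearchHelp_v1(strings, target, 0, strings.size() - 1);
  }

  // Searches only the indices from lowIdx through highIdx, inclusive
  int binarySearchHelp_v1(ArrayList<String> strings, String target,
      int lowIdx, int highIdx) {
    int midIdx = (lowIdx + highIdx) / 2;
    if (lowIdx > highIdx) {
      return -1; // not found
    }
    else if (target.compareTo(strings.get(midIdx)) == 0) {
      return midIdx; // found it!
    }
    else if (target.compareTo(strings.get(midIdx)) > 0) {
      return this.binarySearchHelp_v1(strings, target, midIdx + 1, highIdx);
    }
    else {
      return this.binarySearchHelp_v1(strings, target, lowIdx, midIdx - 1);
    }
  }
}

class BinarySearchV1Demo {
  public static void main(String[] args) {
    ArrayList<String> dictionary = new ArrayList<String>(Arrays.asList(
        "apple", "banana", "cherry", "date", "fig",
        "grape", "honeydew", "kiwi", "watermelon"));
    ArrayUtils utils = new ArrayUtils();
    System.out.println(utils.binarySearch_v1(dictionary, "grape"));     // prints 5
    System.out.println(utils.binarySearch_v1(dictionary, "blueberry")); // prints -1
    // An empty list starts with bounds 0 and -1, which have already crossed
    System.out.println(utils.binarySearch_v1(new ArrayList<String>(), "grape")); // prints -1
  }
}
```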

22.3 Finding an item in a sorted ArrayList – version 2

Functionally, the code above works great: we’ve covered all cases, and it computes the correct answer. Aesthetically, though, it’s a bit...fiddly. All that adding and subtracting of 1s from the indices is tricky to get right, and if we miss even one of them, our code could loop indefinitely. Perhaps there’s a cleaner, less brittle way we could organize our code to avoid these adjustments.

Recall our discussions from Fundies I about semi-open intervals: a semi-open interval \([m, n)\) consists of all numbers \(x\) such that \(m \leq x < n\), i.e. it includes \(m\) (and so is “closed” on the left) and excludes \(n\) (and so is “open” on the right). As a degenerate case, the interval \([m, m)\) is empty, because it must both include and exclude its edge values. How might we use this concept in our binary search?

Do Now!
What kind of intervals were we using in version 1 of our binary search code?

We never actually stated explicitly what lowIdx and highIdx meant in our code above! We just blindly manipulated them arithmetically, but never specifically gave them an interpretation. We can infer their meaning by looking at the initial call to binarySearchHelp_v1 in binarySearch_v1 itself: we pass in 0 for the lower bound, and strings.size() - 1 for the upper bound. Apparently, the lower bound means the lowest possible valid index where the data could be found, and the upper bound means the highest possible valid index where the data could be found. Because lowIdx and highIdx are inclusive bounds, they represent a closed interval.

Ironically, the mathematical terminology here is to say that closed intervals are not “closed under splitting.” Further ironically, semi-open intervals are “closed under splitting.” Mathematicians overload the term “closed” with multiple meanings.

Arithmetically, what we’ve noticed in our code, with its adding and subtracting 1s everywhere, is that it’s hard to split a closed interval into two pieces that are themselves closed intervals — and we need the two pieces to be closed intervals, or else we can’t pass them to recursive calls. On the other hand, it’s easy to split a semi-open interval in two: we can split an interval \([l, h)\) into \([l, m)\) and \([m, h)\), for any \(l \leq m \leq h\), and it’s straightforward to check that both smaller intervals contain all the values of the original interval, and that the smaller intervals do not overlap.

Do Now!
Confirm this — use the definition of semi-open above.
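
One way to write out the check, as a sketch of the argument using only the definition of a semi-open interval and the assumption \(l \leq m \leq h\):

```latex
\begin{align*}
x \in [l, h)
  &\iff l \leq x < h \\
  &\iff (l \leq x < m) \lor (m \leq x < h)
     && \text{because } l \leq m \leq h \text{, and either } x < m \text{ or } m \leq x \\
  &\iff x \in [l, m) \lor x \in [m, h)
\end{align*}
% No overlap: x in both pieces would require x < m and m <= x at once.
```

So every value of the original interval lands in one of the two pieces, and no value lands in both.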

What if we used a semi-open interval for our indices, instead of a closed one? The skeleton of our code will be identical to the version above, but a few details will change.

// In ArrayUtils
// Returns the index of the target string in the given ArrayList,
// or -1 if the string is not found
// Assumes that the given ArrayList is sorted alphabetically
// Assumes that [lowIdx, highIdx) is a semi-open interval of indices
int binarySearchHelp_v2(ArrayList<String> strings, String target, int lowIdx, int highIdx) {
  int midIdx = (lowIdx + highIdx) / 2;
  if (lowIdx ??? highIdx) {
    return -1; // not found
  }
  else if (target.compareTo(strings.get(midIdx)) == 0) {
    return midIdx; // found it!
  }
  else if (target.compareTo(strings.get(midIdx)) > 0) {
    return this.binarySearchHelp_v2(strings, target, midIdx ???, highIdx); // too low
  }
  else {
    return this.binarySearchHelp_v2(strings, target, lowIdx, midIdx ???); // too high
  }
}

Read the calls to binarySearchHelp_v2 as “find the index of the target string in the given list, knowing that it must be at least at the low index and before the high index.” We have three holes to fill in, which we’ll examine out of order:

We need a base case to determine when there are no valid indices left to check. This now falls out of the definition of semi-open intervals: the interval is empty when lowIdx >= highIdx.
Otherwise we split the interval in half. If the item at midIdx is too high (it comes after the target), then midIdx is too big. We need to exclude it in the recursive call, and since the interpretation of the high index is that it’s excluded, we can simply pass midIdx directly, with no subtracting 1.
If the item at midIdx is too low (it comes before the target), then midIdx is too small. We can exclude it from the recursive call by adding 1 to it. Sadly, this addition is necessary and can’t be eliminated: indices are integers, not reals, so without it, computing midIdx from the new bounds could give back exactly the same numbers we started with, and we would recur forever.

Do Now!
Suppose we didn’t add 1 in the last case. Construct a test case that causes the search to recur forever.

Our final code, then, is this:

// In ArrayUtils
// Returns the index of the target string in the given ArrayList,
// or -1 if the string is not found
// Assumes that the given ArrayList is sorted alphabetically
// Assumes that [lowIdx, highIdx) is a semi-open interval of indices
int binarySearchHelp_v2(ArrayList<String> strings, String target, int lowIdx, int highIdx) {
  int midIdx = (lowIdx + highIdx) / 2;
  if (lowIdx >= highIdx) {
    return -1; // not found
  }
  else if (target.compareTo(strings.get(midIdx)) == 0) {
    return midIdx; // found it!
  }
  else if (target.compareTo(strings.get(midIdx)) > 0) {
    return this.binarySearchHelp_v2(strings, target, midIdx + 1, highIdx); // too low
  }
  else {
    return this.binarySearchHelp_v2(strings, target, lowIdx, midIdx); // too high
  }
}

It should be clear from looking at the code that we split the original interval \([lowIdx, highIdx)\) into \([lowIdx, midIdx)\) and \([midIdx + 1, highIdx)\). Together with the index midIdx itself, which we have already checked, these cover the original interval with no overlap (we’re only considering integers, so nothing lies strictly between midIdx and midIdx + 1).

Finally, we need our initial routine that calls the helper. Now, since our upper bound is excluded, we don’t need to subtract 1 from the size of the list, because we’ll never consider the initial upper bound as a valid index:

// In ArrayUtils
int binarySearch_v2(ArrayList<String> strings, String target) {
  return this.binarySearchHelp_v2(strings, target, 0, strings.size());
}
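
As with version 1, the semi-open version can be assembled and tried on the sample dictionary. The sketch below is for experimenting only; BinarySearchV2Demo is a hypothetical driver class. The first and last words are worth trying in particular, since those are the places where an off-by-one mistake in the bounds would show up.

```java
import java.util.ArrayList;
import java.util.Arrays;

class ArrayUtils {
  // Returns the index of the target string in the given ArrayList,
  // or -1 if the string is not found
  // Assumes that the given ArrayList is sorted alphabetically
  int binarySearch_v2(ArrayList<String> strings, String target) {
    return this.binarySearchHelp_v2(strings, target, 0, strings.size());
  }

  // Assumes that [lowIdx, highIdx) is a semi-open interval of indices
  int binarySearchHelp_v2(ArrayList<String> strings, String target,
      int lowIdx, int highIdx) {
    int midIdx = (lowIdx + highIdx) / 2;
    if (lowIdx >= highIdx) {
      return -1; // the interval is empty: not found
    }
    else if (target.compareTo(strings.get(midIdx)) == 0) {
      return midIdx; // found it!
    }
    else if (target.compareTo(strings.get(midIdx)) > 0) {
      return this.binarySearchHelp_v2(strings, target, midIdx + 1, highIdx);
    }
    else {
      return this.binarySearchHelp_v2(strings, target, lowIdx, midIdx);
    }
  }
}

class BinarySearchV2Demo {
  public static void main(String[] args) {
    ArrayList<String> dictionary = new ArrayList<String>(Arrays.asList(
        "apple", "banana", "cherry", "date", "fig",
        "grape", "honeydew", "kiwi", "watermelon"));
    ArrayUtils utils = new ArrayUtils();
    System.out.println(utils.binarySearch_v2(dictionary, "apple"));      // prints 0
    System.out.println(utils.binarySearch_v2(dictionary, "watermelon")); // prints 8
    System.out.println(utils.binarySearch_v2(dictionary, "blueberry"));  // prints -1
  }
}
```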

22.4 Generalizing to arbitrary element types

For completeness, here is the version of binarySearch that works for arbitrary element types. Our signature gets slightly more complicated, but the logic behind the index computations and comparisons remains the same:

// In ArrayUtils
<T> int gen_binarySearch_v2(ArrayList<T> arr, T target, IComparator<T> comp) {
  return this.gen_binarySearchHelp_v2(arr, target, comp, 0, arr.size());
}
<T> int gen_binarySearchHelp_v2(ArrayList<T> arr, T target, IComparator<T> comp,
                                int lowIdx, int highIdx) {
  int midIdx = (lowIdx + highIdx) / 2;
  if (lowIdx >= highIdx) {
    return -1;
  }
  else if (comp.compare(target, arr.get(midIdx)) == 0) {
    return midIdx;
  }
  else if (comp.compare(target, arr.get(midIdx)) > 0) {
    return this.gen_binarySearchHelp_v2(arr, target, comp, midIdx + 1, highIdx);
  }
  else {
    return this.gen_binarySearchHelp_v2(arr, target, comp, lowIdx, midIdx);
  }
}

(Note that obviously in practice, these methods would lose the gen_ and _v2 affixes, which were added here only to distinguish the various versions of our code.)
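
To try the generic version on something other than strings, a comparator is needed. The sketch below assumes IComparator has a single compare method that returns a negative number, zero, or a positive number (the same convention as compareTo); that shape is inferred from how comp is used above. IntsAscending and GenericSearchDemo are hypothetical names made up for the demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;

// Assumed shape of the comparator interface; the real IComparator may differ
interface IComparator<T> {
  int compare(T t1, T t2);
}

// Hypothetical comparator for this demo: orders integers from small to large
class IntsAscending implements IComparator<Integer> {
  public int compare(Integer n1, Integer n2) {
    return Integer.compare(n1, n2);
  }
}

class ArrayUtils {
  // Assumes that arr is sorted according to the given comparator
  <T> int gen_binarySearch_v2(ArrayList<T> arr, T target, IComparator<T> comp) {
    return this.gen_binarySearchHelp_v2(arr, target, comp, 0, arr.size());
  }

  // Assumes that [lowIdx, highIdx) is a semi-open interval of indices
  <T> int gen_binarySearchHelp_v2(ArrayList<T> arr, T target, IComparator<T> comp,
      int lowIdx, int highIdx) {
    int midIdx = (lowIdx + highIdx) / 2;
    if (lowIdx >= highIdx) {
      return -1;
    }
    else if (comp.compare(target, arr.get(midIdx)) == 0) {
      return midIdx;
    }
    else if (comp.compare(target, arr.get(midIdx)) > 0) {
      return this.gen_binarySearchHelp_v2(arr, target, comp, midIdx + 1, highIdx);
    }
    else {
      return this.gen_binarySearchHelp_v2(arr, target, comp, lowIdx, midIdx);
    }
  }
}

class GenericSearchDemo {
  public static void main(String[] args) {
    ArrayList<Integer> numbers =
        new ArrayList<Integer>(Arrays.asList(1, 3, 5, 7, 9, 11));
    ArrayUtils utils = new ArrayUtils();
    System.out.println(utils.gen_binarySearch_v2(numbers, 7, new IntsAscending())); // prints 3
    System.out.println(utils.gen_binarySearch_v2(numbers, 4, new IntsAscending())); // prints -1
  }
}
```

The list has to be sorted according to the same comparator that is passed to the search; otherwise the halves that get discarded might contain the target.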

22.5 Sorting an ArrayList

Picture a set of cards spread out in a row on a table, each with a word on them:

   0       1     2      3      4     5       6         7        8
[kiwi, cherry, apple, date, banana, fig, watermelon, grape, honeydew]

How would we sort this? There are many, many techniques we could use, but since we have only two hands to move the cards around, one of the most natural might be the following. We pick up the first card, “kiwi”, and look for the card that ought to go in that spot — “apple” — and replace “kiwi” with “apple”. Since we do not want to lose “kiwi”, and since we have to set it down again somewhere, we might as well place it in the spot where “apple” was: we exchange them.

   0       1     2      3      4     5       6         7        8
[apple, cherry, kiwi, date, banana, fig, watermelon, grape, honeydew]
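
In code, the exchange we just performed might be sketched as follows. The sorting code later in this section calls a helper named swap without showing it, so the definition here is an assumption about what that helper does (exchange the values at two indices); SwapDemo is a hypothetical driver class.

```java
import java.util.ArrayList;
import java.util.Arrays;

class ArrayUtils {
  // EFFECT: Exchanges the values at the given two indices in the given list
  <T> void swap(ArrayList<T> arr, int index1, int index2) {
    // Hold on to the first card so it is not lost when we overwrite its spot
    T oldValueAtIndex1 = arr.get(index1);
    arr.set(index1, arr.get(index2));
    arr.set(index2, oldValueAtIndex1);
  }
}

class SwapDemo {
  public static void main(String[] args) {
    ArrayList<String> cards = new ArrayList<String>(Arrays.asList(
        "kiwi", "cherry", "apple", "date", "banana",
        "fig", "watermelon", "grape", "honeydew"));
    new ArrayUtils().swap(cards, 0, 2);
    // "apple" is now at index 0 and "kiwi" is at index 2
    System.out.println(cards);
  }
}
```

The temporary variable plays the role of the hand holding “kiwi” while “apple” is moved into its place.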

Do Now!
How did we decide that “apple” was the appropriate replacement for “kiwi”?

Next, we pick up the second card, “cherry”, and look for the card that ought to go in that spot — “banana” — and exchange them.

   0       1     2      3      4     5       6         7        8
[apple, banana, kiwi, date, cherry, fig, watermelon, grape, honeydew]

Do Now!
How did we decide that “banana” was the appropriate replacement for “cherry”?

Let’s be a bit more rigorous about what we’re doing here. In the first case, when we were searching for a replacement for “kiwi”, we were looking for the smallest item of the list. In the second case, we could not possibly have been searching for the smallest item of the list, or else we’d have found “apple” again! Instead, we were searching for the smallest item of the rest of the list, beyond the location we were swapping. Why does this work? Our algorithm essentially partitions the list into two segments: the front of the list has been processed, while the back of the list remains to be processed. Moreover, the front of the list is guaranteed to be sorted.

   0       1   ||  2      3      4     5       6         7        8
[apple, banana,|| kiwi, date, cherry, fig, watermelon, grape, honeydew]
    SORTED  <--++-->  NOT YET SORTED

By searching for the smallest item of the not-yet-sorted portion of the list, and exchanging it with the first item in the not-yet-sorted portion, we have essentially sorted that one item:

                                MIN
   0       1   ||  2      3      4     5       6         7        8
[apple, banana,|| kiwi, date, cherry, fig, watermelon, grape, honeydew]
    SORTED  <--++-->  NOT YET SORTED

    Swap items at index 2 and index 4...

   0       1      2    ||  3     4     5       6         7        8
[apple, banana, cherry,|| date, kiwi, fig, watermelon, grape, honeydew]
        SORTED      <--++-->  NOT YET SORTED

Now if we repeat this process for each index in the list, we’ll have grown the sorted section to encompass the entire list: we’ll have sorted the list.

But how to do that? We cannot use a for-each loop here, because we specifically care about the indices, more than we care about the particular items. We could write our code using a recursive method and an accumulator parameter:

// In ArrayUtils
// EFFECT: Sorts the given list of strings alphabetically
void sort(ArrayList<String> arr) {
  this.sortHelp(arr, 0);                   // (1)
}
// EFFECT: Sorts the given list of strings alphabetically, starting at the given index
void sortHelp(ArrayList<String> arr, int minIdx) {
  if (minIdx >= arr.size()) {              // (2)
    return;
  }
  else {                                   // (3)
    int idxOfMinValue = ...find minimum value in not-yet-sorted part...
    this.swap(arr, minIdx, idxOfMinValue);
    this.sortHelp(arr, minIdx + 1);        // (4)
  }
}

But this feels clumsy: there’s too much clutter surrounding the actual activity of sortHelp. Again, since iterating over all items by position is such a common operation, Java provides syntax to make it easier: a counted-for loop, or just a for loop. We introduce counted-for loop syntax by rewriting sort and sortHelp to use one:

// In ArrayUtils
// EFFECT: Sorts the given list of strings alphabetically
void sort(ArrayList<String> arr) {
  for (int idx = 0;                        // (1)
       idx < arr.size();                   // (2)
       idx = idx + 1) {                    // (4)
    // (3)
    int idxOfMinValue = ...find minimum value in not-yet-sorted part...
    this.swap(arr, idx, idxOfMinValue);
  }
}

A for loop consists of four parts, which are numbered here (and their corresponding parts are numbered in the recursive version of the code). First is the initialization statement, which declares the loop variable and initializes it to its starting value. This is run only once, before the loop begins. Second is the termination condition, which is checked before every iteration of the loop body. As soon as the condition evaluates to false, the loop terminates. Third is the loop body, which is executed every iteration of the loop. Fourth is the update statement, which is executed after each loop body and is used to advance the loop variable to its next value. Read this loop aloud as “For each value of idx starting at 0 and continuing while idx < arr.size(), advancing by 1, execute the body.”
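
As another instance of the same four parts, the find method from the start of this lecture could also be phrased with a counted-for loop instead of a recursive helper. This is a sketch under the same assumption as before about the shape of IPred; StartsWith and ForFindDemo are hypothetical names for the demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;

// Assumed shape of the predicate interface; the real IPred may differ
interface IPred<T> {
  boolean apply(T t);
}

// Hypothetical predicate for this demo: does a string start with a prefix?
class StartsWith implements IPred<String> {
  String prefix;

  StartsWith(String prefix) {
    this.prefix = prefix;
  }

  public boolean apply(String s) {
    return s.startsWith(this.prefix);
  }
}

class ArrayUtils {
  // Returns the index of the first item passing the predicate,
  // or -1 if no such item was found
  <T> int find(ArrayList<T> arr, IPred<T> whichOne) {
    for (int idx = 0;          // (1) start at the first index
         idx < arr.size();     // (2) keep going while idx is a valid index
         idx = idx + 1) {      // (4) advance to the next index
      // (3) the body: check the item at the current index
      if (whichOne.apply(arr.get(idx))) {
        return idx;
      }
    }
    return -1; // the loop ended without finding a match
  }
}

class ForFindDemo {
  public static void main(String[] args) {
    ArrayList<String> words =
        new ArrayList<String>(Arrays.asList("apple", "banana", "cherry"));
    ArrayUtils utils = new ArrayUtils();
    System.out.println(utils.find(words, new StartsWith("c"))); // prints 2
    System.out.println(utils.find(words, new StartsWith("z"))); // prints -1
  }
}
```

The termination condition takes over the job of the base case in findHelp, and the update statement takes over the job of passing index + 1 to the recursive call.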

The initialization, termination condition, and update statement used here are pretty typical: loops often count by ones, starting at zero and continuing until some upper bound. But these loops can be far more flexible: they could start counting at some large number, and count down to some lower bound:

for (int idx = bigNumber; idx >= smallNumber; idx = idx - 1) { ... }

or count only odd numbers:

for (int idx = smallOddNumber; idx < bigNumber; idx = idx + 2) { ... }

or anything else that the problem at hand requires.
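
To see which values of idx these two variations actually visit, here is a small sketch that collects them into a list. The class and method names (LoopDemo, countDown, oddsBelow) are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;

class LoopDemo {
  // Collects the values of idx when counting down from high to low, inclusive
  static ArrayList<Integer> countDown(int high, int low) {
    ArrayList<Integer> seen = new ArrayList<Integer>();
    for (int idx = high; idx >= low; idx = idx - 1) {
      seen.add(idx);
    }
    return seen;
  }

  // Collects the values of idx when counting up by twos from the given
  // odd number, stopping before the given limit
  static ArrayList<Integer> oddsBelow(int smallOddNumber, int limit) {
    ArrayList<Integer> seen = new ArrayList<Integer>();
    for (int idx = smallOddNumber; idx < limit; idx = idx + 2) {
      seen.add(idx);
    }
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(countDown(5, 1));  // prints [5, 4, 3, 2, 1]
    System.out.println(oddsBelow(1, 10)); // prints [1, 3, 5, 7, 9]
  }
}
```

If the termination condition is already false at the start, as in countDown(1, 5), the body never runs and the result is an empty list.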

Exercise
Practice using the counted-for loop: design a method
<T> ArrayList<T> interleave(ArrayList<T> arr1, ArrayList<T> arr2)
that takes two ArrayLists of the same size, and produces an output ArrayList consisting of one item from arr1, then one from arr2, then another from arr1, etc.
Design a method
<T> ArrayList<T> unshuffle(ArrayList<T> arr)
that takes an input ArrayList and produces a new list containing the first, third, fifth ... items of the list, followed by the second, fourth, sixth ... items.

22.6 Finding the minimum value

Exercise
Design the missing method to finish the sort method above: this method should find the minimum value in the not-yet-sorted part of the given ArrayList<String>.
