On this page:
11.1 Finishing Heap  Sort
11.2 Dependencies as Directed Acyclic Graphs
11.3 Implementing Topological Sort
11.4 Discussion

Lab 11: Depth-First Search and Topological Sort🔗

Goals: To practice graph traversal algorithms and learn about topological sort.

Reminder: If you have not emailed Prof Lerner with exam conflicts for Tuesday, April 9 at 6-9pm Boston time, please do so ASAP.

11.1 Finishing HeapSort🔗

If you have not completed implementing Heapsort from last week’s lab, please do so.

11.2 Dependencies as Directed Acyclic Graphs🔗

The focus of this lab is on directed, acyclic graphs (called DAGs). These kinds of graphs are often used for showing dependencies between different objects. For example, here’s a subset of the computer science courses offered at Northeastern and their prerequisites.

It’s much easier to view this information as a DAG, where there’s an edge from course A to course B if course A has course B as a prerequisite. Draw this graph on paper for yourself as an exercise; you will use it below.

A particular student has decided they want to take all of these courses, but they can only take one course per semester (this student is particularly ambitious, so they’re also majoring in biology, and have a full-time job, and lead the debate club, and ...). With all of these prerequisites, though, it’s a little difficult for them to figure out a valid order they can take these courses in so they can plan their schedule for the next few years, and they’ve enlisted your help.

Let’s try doing this by hand first. By looking at the graph you drew, figure out and write down some possible order in which the student can take those classes so that they always satisfy the prerequisites for the next course. (Note: there are multiple possible orders, but we only need to find one of them.) What mental “algorithm” did you use to figure it out?

11.3 Implementing Topological Sort🔗

This is not quite the same kind of sorting as the other algorithms we’ve seen: the nodes of the graph are not comparable in all cases. We don’t know, based on the example above, whether "Object-Oriented Design" should come before or after "Computer Systems", but we do know that both of them must come after "Fundamentals 2".

It turns out this student is indecisive, though, and wants to look at lots of possible sets of courses. Figuring out a valid order for each set by hand would be tedious, so let’s come up with an algorithm to do it, instead. The general problem is called topological sort, and there are a few algorithms to compute such an ordering. The purpose of these algorithms is to take some DAG and produce an ordered list of its vertices such that if there is an edge from vertex u to vertex v, then u comes after v in the list.

For this lab, you’ll use the following data definitions. They are simplified from the ones used in lecture (we’re skipping a separate class for edges, since there is no information we need to store on those edges for this problem):

class Curriculum {
ArrayList<Course> courses;
Curriculum() { this.courses = new ArrayList<Course>(); }
// EFFECT: adds another course to the set of known courses void addCourse(Course c) { this.courses.add(c); }
// add methods here... }
class Course {
String name;
ArrayList<Course> prereqs;
Course(String name) { this.name = name; this.prereqs = new ArrayList<Course>(); }
// EFFECT: adds a course as a prereq to this one void addPrereq(Course c) { this.prereqs.add(c); }
// add methods here }

Fortunately, the algorithm we’ll work through here is based on something you’ve already seen: depth-first search. Recall that in depth-first search, processing a node involves the following steps:

In DFS, the point of the algorithm was to determine whether some vertex was reachable from another one. Topological sort, however, isn’t looking for any particular vertex. Instead, the point is to traverse the graph in depth-first order and add a vertex to a sorted result list only after all of its “prerequisite” vertices have been added to the result list. As a result, the process method for topological sort consists of the following steps.

Note that here we add the processed vertex to the result list after processing its children, rather than before as we did in the DFS algorithm. This is important for two reasons. First, because we’re assuming that topological sort will be given only acyclic graphs, there is no risk of the algorithm running into an infinite loop. Second, by adding the vertex to the list only after its prerequisite vertices have been processed, we ensure that that vertex comes after its prerequisites (as well as their prerequisites, and their prerequisites’ prerequisites, and so on).

Just doing a single depth-first search is not enough to cover all of the courses, though. For example, doing a depth-first search starting from "Compilers" would never process "Algorithms and Data", because neither one (transitively) depends on the other. So topological sort does one depth-first traversal starting from every vertex in the graph. This ensures that every vertex is processed at least once, but subsequent re-processing of a vertex won’t cause any harm because we don’t do anything for vertices that have already been seen.

Follow these steps to implement topological sort.

Challenge: If you’ve gotten this working, try enhancing the algorithm to detect if it’s actually been given a cyclic graph. Your enhanced algorithm should take in any graph and either return an appropriate topologically-sorted list, or throw an exception if a cycle is detected. What technique can you (re)use from algorithms you may have already seen, to help in this process?

11.4 Discussion🔗

Topological sort is used in many, many places, wherever scheduling constraints might be interesting. Consider baking a fancy pastry, which requires filling, cake layers, and frosting, each of which requires several preparation stages in advance, each of those of which require prepping ingredients, etc... Or more progammatically, we might worry about which order to load various support libraries in order to boot up the computer and eventually load and run our own programs. When there are additional kinds of constraints (like finishing the overall processing in as short a time as possible), then we need fancier algorithms yet than this, but topological sorting gives us a baseline to compare those other algorithms against.