Assignment 6

 

CS 3500 - Summer 1 2014 - Assignment #6

Posted: Wednesday, May 21, 2014

Due: Wednesday, May 28, 2014 at 11:59pm


The purposes of this assignment are:

* to design code to timing test your KVMap<K,V> implementation with and without a

       Comparator

* to timing test your KVMap<K,V> implementation

* to understand the performance of your KVMap<K,V> implementation


Collaboration between students is forbidden on this assignment. You are responsible for keeping your code hidden from all other students. You will submit this assignment within Web-CAT.


The assignment shall be implemented in Java and will be tested using Sun's Java 2 Runtime Environment, Standard Edition, version 1.6.0 on the Linux machines in the labs. Your assignment must run in the designated environment without any changes.


Part of your grade will depend on the quality and correctness of your code, part will depend on the readability of your code (comments and indentation), and part will depend on how well you follow the procedure above for submitting your work. Assignments submitted between 12:00 am and 11:59 pm the day after the due date will receive a 20 percentage point penalty on the assignment.


This assignment is worth 100 points: 25 points for automated code style, 10 points for automated testing, and 65 points from grader.


-------------------------------------------------- 


For this assignment, you will be writing a timing program for your implementation of KVMap<K,V> from Assignment 5. (Note: If your KVMap<K,V> implementation had errors, you will need to correct them before you can complete this assignment.)


You will submit your timing program, your updated KVMap<K,V> implementation that you timed, and a report of your timing output. Format of report is given below. You do not need to submit the files that you are given. Name your zip or tar files with your CCIS username.


--------------------------------------------------


Your timing program will contain two sets of timing---timing on lists of words and testing on classical literature.


A StringIterator program is provided to iterate through a text file.


The timing program sets up the data needed to perform the timing tests, runs the timing tests and collects the timing data, then reports on the results. When you cannot estimate

how long the timing tests will run, it helps if you also print the results of each test, as soon as it completes. You will not be using JUnit or the Tester library as you are not writing that type of test. Instead you are writing a program to collect the timing data.


Rather than writing out 180 different timing tests, your timing program should use methods, fields, and/or loops to perform the timing tests.


Testing on lists of words:


Your timing program should test all four different operations on the three different data sets (files) with four varying numbers of Strings inserted using three different Comparators to define the ordering of the data. When you run the program, the time that each of these operations took to complete the task should be output.


   Operations:

     - Building a KVMap with given number of Strings from the

       input file, where the key is the String and the value

       is what word it is in the file (5th word has value

       of 5). An example of how this could be written:

         Iterator<String> sit = new StringIterator(inputFile);

         int count = 0;

         while (count < numString && sit.hasNext()) {

            count++;

            mm = mm.assign(sit.next(), count);

         }

     - Creating an iterator for your KVMap.

         Iterator<String> it = mm.iterator();    

     - Iterating through your KVMap

         while(it.hasNext()){ it.next(); }

     - Checking whether the KVMap contains each String from

       contains_list.txt as a key. You will keep track of the

       number of checked Strings contained in your KVMap.

         StringIterator si = new StringIterator(

                            "contains_list.txt");

         for (String s : si) {

           if (mm.containsKey(s)) {

              countContainsTrue++;

           }

           countContains++;

         }


   Files:

     - Words lexicographically ordered - lexicographically_ordered.txt

     - Words ordered according to StringReverseByLex - reverse_ordered.txt

     - Words in random order - random_order.txt


   Number of words to iterate through:

       2000 words

       4000 words

       8000 words

      16000 words

  

   Comparators:

     - StringByLex (should build as BST)

     - StringReverseByLex (should build as BST)

     - No Comparator



Testing on classical literature:


Your timing program should test all four different operations on the three complete data sets (files) using three different Comparators to define the ordering of the data. When you run the program, the time that each of these operations took to complete the task should be output.


   Operations:

     - Building a KVMap from the input file, where the key is

       the String and the value is the what word it is in the

       file (5th word has value of 5). An example of how this

       could be written:

         Iterator<String> sit = new StringIterator(inputFile);

         int count = 0;

         while (sit.hasNext()) {

            count++;

            mm = mm.assign(sit.next(), count);

         }

     - Creating an iterator for your KVMap.

         Iterator<String> it = mm.iterator();    

     - Iterating through your KVMap

         while(it.hasNext()){ it.next(); }

     - Checking whether the KVMap contains each String from

       contains_list.txt as a key. You will keep track of the

       number of checked Strings contained in your KVMap.

         StringIterator si = new StringIterator(

                            "contains_list.txt");

         for (String s : si) {

           if (mm.containsKey(s)) {

              countContainsTrue++;

           }

           countContains++;

         }

        

   Files:

     - hippooath.txt

     - Confucius_The_Great_Learning.txt

     - Apology_Plato.txt

  

   Comparators:

     - StringByLex (should build as BST) 

     - StringReverseByLex (should build as BST)

     - No Comparator

    

--------------------------------------------------


Your report of timing output should be submitted as an Excel file. Your file should have two main worksheets---(1) timing on list of words and (2) timing on classical literature. These worksheets should contain the following headers:

  Comparator, File, Num Strings, Size (#), Build (ms),

  Iterator (ms), Iterate (ms), Contains (ms), Num Contained


You should be able to format your output using commas or tabs so that you can easily open/put the data in Excel either by outputting directly to file or by copy/paste from console.


Along with the two worksheets with the raw data, you should also create six charts. See course directory for examples.

  - Building on lexicographically_ordered.txt

  - Contains on lexicographically_ordered.txt

  - Building on random_order.txt

  - Contains on random_order.txt

  - Building on classical literature

  - Contains on classical literature


--------------------------------------------------


You can download the input files, contains file, comparators, and examples from the course directory:  /course/cs3500su14/Assignments/A06/


--------------------------------------------------------------------------------


System.currentTimeMillis() will be helpful you when completing the timings.

  http://docs.oracle.com/javase/6/docs/api/java/lang/System.html 


To run the timing program, you will need a computer that (1) supports a version of Java similar to the version installed on the CCIS Linux machines and (2) is not being used for any purpose other than running the timing program. (If the machine were being used by other people or to run other programs at the same time, then the other users or programs would probably have some influence on the timings.)

 

If your timing is slow, you may have to expand the stack segment from its default value.  If you need to increase the size of the stack segment, then it is your responsibility to figure out how to do that on your chosen machine. On CCIS Linux machines, you can run the program with a 20 megabyte stack segment as follows:

    % java -Xss20M <name of timing program>     



--------------------------------------------------------------------------------


Warning:


Do not wait until the last minute to submit your work to Web-CAT. Do it early and often - there is no limit on the number of submissions and you can retrieve your earlier submissions as if they were saved in a version control system. It takes a bit of time to do the automatic grading and the system can be slow when too many students are submitting simultaneously. Also, use the feedback Web-CAT to improve your code.