Design and Analysis of Algorithms

CSE 5311 Section 003 Fall 2004

Program 1

Due: September 29, 2004 (midnight). No late submissions accepted.

Consider the problem of maintaining a database of employees indexed by a unique 10-digit identification number. We want the database to support fast insert and search operations, but also be able to output the numbers in sorted order. Normally, we would store other information about the employee, but for the purposes of this assignment, we will just store the ID number. We will try two solutions to the problem: a hash table and a red-black tree. The hash table supports fast insert and search, but producing sorted output requires a full sort of the table's contents. The red-black tree provides more balanced support for all three operations. Our goal is to compare the running time and memory requirements of the two solutions on test data. The specifics of the assignment follow.

  1. Implement an open-addressing hash table consistent with the algorithms described in the textbook. The hash function and probe technique are up to you, but should be chosen in order to maximize hash table performance and minimize memory usage (i.e., the size of the hash table). The hash table size, hash function and probe technique should remain fixed throughout all experiments. You may implement the hash table solution yourself, or use code obtained from another source (e.g., the internet). However, if you use any code that is not your own, you must provide a reference (location and author) for the code in a comment and in the summary document described later. You cannot use code written by other students in the class.

  2. Implement an efficient algorithm that produces a separate array containing the numbers in the hash table sorted in increasing order. This should be implemented by you alone, but can make use of the programming language's built-in sorting functionality.

  3. Implement a red-black tree consistent with the algorithms described in the textbook. You only need to implement insert and search operations, as well as an efficient traversal algorithm for producing the sorted array of ID numbers. Again, you may implement the red-black tree yourself, or use code obtained from another source (e.g., the internet). However, if you use any code that is not your own, you must provide a reference (location and author) for the code in a comment and in the summary document described later. You cannot use code written by other students in the class.

  4. Implement code that can generate a sequence of N unique, random 10-digit numbers. This should be implemented by you alone, but can make use of the programming language's built-in random number generation functionality.

  5. Integrate the hash table and red-black tree solutions into one program. The program should also include a control algorithm that can allocate and deallocate the data structures, generate a sequence of unique, random 10-digit numbers, call the various data structure operations and collect the time spent in each phase (insert, search, sort). The control algorithm should be implemented by you alone. Specifically, your control algorithm should do the following:
    for N = 1000, 2000, 3000, ..., 10000
      allocate array A of N unique random 10-digit numbers
      allocate new hash table H and red-black tree T
      insert each element of A into H (record total hash insert time)
      insert each element of A into T (record total tree insert time)
      search for each element of A in H (record total hash search time)
      search for each element of A in T (record total tree search time)
      sort H into A (record total hash sort time)
      sort T into A (record total tree sort time)
      deallocate A, H, T and any other allocated memory
      output N, six individual timings, and two total times (hash and tree)
    

  6. Produce one or more plots showing the eight different timings (six individual and two total) versus N. The number of plots and plotting format should be chosen in order to best compare the two solutions. These plots should be included in the summary document described below. I recommend using Microsoft Excel or a similar spreadsheet or plotting application to generate the plots.

  7. Finally, write a summary of your results. The summary should cite any code used from other sources (location and author), explain your choices for the hash table implementation, compare the memory usage of the hash table and red-black tree, and discuss the timings, i.e., which solution was faster for insert, search, sort, and overall, and why. The summary should be submitted in Microsoft Word, PDF or PostScript format.

  8. After ensuring that your program compiles and runs correctly on the omega.uta.edu system, submit all source code and the summary document to me (holder@cse.uta.edu) by the above deadline. Please send only one email message; use attachments if sending multiple files. In addition to correct functionality and satisfaction of the above constraints, your submission will be graded based on good programming style and documentation (this includes code from other sources).
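
As an illustration of item 1 only, and not a required design, an open-addressing table with double hashing might be sketched in Python as follows. The class name, the table size of 20011 (a prime a little over twice the largest N), and both hash functions are assumptions made for this example; the assignment leaves all of these choices to you. Deletion is omitted because only insert and search are needed.

```python
class OpenAddressHashTable:
    """Fixed-size open-addressing table of integer keys (no deletion)."""

    def __init__(self, size=20011):
        # Assumed size: a prime above 2 * 10000, so the load factor stays below 0.5.
        self.size = size
        self.slots = [None] * size   # None marks an empty slot
        self.count = 0

    def _probe(self, key):
        # Double hashing: h(k, i) = (h1(k) + i * h2(k)) mod m.
        h1 = key % self.size
        h2 = 1 + (key % (self.size - 1))   # in 1..m-1, so never zero
        for i in range(self.size):
            yield (h1 + i * h2) % self.size

    def insert(self, key):
        """Store key and return its slot index; duplicates are not stored twice."""
        for j in self._probe(key):
            if self.slots[j] is None:
                self.slots[j] = key
                self.count += 1
                return j
            if self.slots[j] == key:
                return j
        raise OverflowError("hash table overflow")

    def search(self, key):
        """Return the slot index holding key, or None if key is absent."""
        for j in self._probe(key):
            if self.slots[j] is None:
                return None
            if self.slots[j] == key:
                return j
        return None
```

Because the table size is prime and the step h2 lies between 1 and m-1, each probe sequence visits every slot once before giving up. Memory usage is fixed at one slot per table entry regardless of N, which is the quantity to report in the summary.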
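
The two "sort" phases in items 2 and 3 differ in kind: the hash table needs a real sort, while a binary search tree only needs an in-order walk. The sketch below shows both under stated assumptions. The Node class is a bare stand-in with only key, left and right fields (color and balancing are omitted, since the traversal does not depend on them), and the hash table is assumed to expose its slots as a list in which None marks an empty slot.

```python
class Node:
    """Minimal tree node for illustration; a red-black node would also carry a color."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def inorder_to_list(root):
    """Iterative in-order walk: returns the keys in increasing order in O(n) time."""
    out, stack, node = [], [], root
    while stack or node is not None:
        while node is not None:      # descend as far left as possible
            stack.append(node)
            node = node.left
        node = stack.pop()           # smallest key not yet emitted
        out.append(node.key)
        node = node.right
    return out


def hash_slots_to_sorted(slots):
    """Skip empty slots (None), then sort what remains: O(m + n log n) for m slots."""
    return sorted(k for k in slots if k is not None)
```

The iterative form avoids recursion-depth limits, although a balanced tree of 10000 keys is shallow enough that a recursive walk would also work.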
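
For item 4, one simple way to guarantee uniqueness is to draw numbers into a set until it holds N of them. The sketch below assumes that "10-digit" means the range 1000000000 to 9999999999 (no leading zero), which the assignment does not state explicitly; the function name and the optional rng parameter are likewise only illustrative.

```python
import random


def unique_ids(n, rng=random):
    """Return n distinct random integers in [10**9, 10**10), in random order."""
    seen = set()
    while len(seen) < n:
        seen.add(rng.randrange(10**9, 10**10))   # duplicates are silently dropped
    ids = list(seen)
    rng.shuffle(ids)   # set iteration order is not random, so shuffle explicitly
    return ids
```

With 9 billion possible values and at most 10000 draws, collisions are rare, so the loop runs only slightly more than N times in expectation. Passing a seeded generator such as random.Random(1) makes a run repeatable.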
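
The control loop in item 5 can be sketched along the following lines. This is only an outline of the timing logic: SetBacked is a placeholder built on Python's set so that the example runs, and the insert, search and sorted_keys method names are an assumed interface that your own hash table and red-black tree would need to provide.

```python
import random
import time


class SetBacked:
    """Placeholder with the assumed interface; swap in the real hash table or tree."""

    def __init__(self):
        self._keys = set()

    def insert(self, key):
        self._keys.add(key)

    def search(self, key):
        return key in self._keys

    def sorted_keys(self):
        return sorted(self._keys)


def time_each(op, keys):
    """Total wall-clock seconds spent calling op(k) for every k in keys."""
    start = time.perf_counter()
    for k in keys:
        op(k)
    return time.perf_counter() - start


def time_call(fn):
    """Wall-clock seconds for a single call of fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_experiment(make_hash, make_tree, sizes=range(1000, 10001, 1000)):
    """Return one row per N: N, six phase timings, then the hash and tree totals."""
    rows = []
    for n in sizes:
        a = random.sample(range(10**9, 10**10), n)   # N unique 10-digit IDs
        h, t = make_hash(), make_tree()
        h_ins = time_each(h.insert, a)
        t_ins = time_each(t.insert, a)
        h_sea = time_each(h.search, a)
        t_sea = time_each(t.search, a)
        h_sort = time_call(h.sorted_keys)
        t_sort = time_call(t.sorted_keys)
        rows.append((n, h_ins, t_ins, h_sea, t_sea, h_sort, t_sort,
                     h_ins + h_sea + h_sort, t_ins + t_sea + t_sort))
        del a, h, t   # explicit deallocation matters more in C or C++
    return rows


if __name__ == "__main__":
    for row in run_experiment(SetBacked, SetBacked):
        print(row[0], *(f"{x:.6f}" for x in row[1:]))
```

Each printed line can be pasted directly into a spreadsheet for the plots in item 6. Individual timings at these sizes are small, so repeating each N several times and averaging may give smoother curves.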