
CSE 2320 Section 501/571 Fall 1999

Program 2

Due: October 12, 1999, 5:30pm (accepted until October 13, 1999, 5:00pm with a 10% penalty)

Implement a program that produces a histogram of the unique tokens in a text file. Your program will read tokens from a given text file and search for each one in a hash table that uses collision resolution by chaining. If a token is not found, you will insert it into the table with a count of one. If it is found, you will increment its count by one. After reading the entire file, you will print the tokens in decreasing order of count, and in lexicographic order among tokens with the same count. Finally, your program will print the minimum, maximum, mean, and standard deviation of the hash table chain lengths, and the hash table load factor (tokens in table / size of table). Specifically:

You may write your code in C or C++, but you may not use global variables. If you use C++ classes, you may use state variables for the major data structures, but do not use state variables as a substitute for global variables. Follow the Coding Standards referenced in Program 1, and write modular, well-documented code.

A token is any maximal string of characters with ASCII codes 33-126, delimited by whitespace (spaces, tabs, or newlines). You may assume the input file contains only these characters and whitespace.

Your program must process each token as it is read from the file; do not read in all the tokens before processing them.
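A minimal sketch in C of reading one token at a time, so each token can be handed to the hash table before the next one is read. The function name read_token and its grow-by-doubling buffer are illustrative assumptions, not part of the assignment. The sketch uses isspace as the whitespace test; isspace also accepts carriage returns, vertical tabs, and form feeds, which the input is assumed not to contain.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the next whitespace-delimited token from fp
   into *buf, growing the buffer with realloc as needed. Returns 1 if a
   token was read, 0 at end of file (or if realloc fails). *buf and *cap
   persist across calls, so the caller frees *buf once, after the last token. */
int read_token(FILE *fp, char **buf, size_t *cap)
{
    int c;
    size_t len = 0;

    assert(fp != NULL && buf != NULL && cap != NULL);

    /* Skip leading spaces, tabs, and newlines. */
    do {
        c = getc(fp);
    } while (c != EOF && isspace(c));
    if (c == EOF)
        return 0;

    /* Collect characters until the next whitespace or end of file. */
    while (c != EOF && !isspace(c)) {
        if (len + 1 >= *cap) {              /* keep room for the '\0' */
            size_t new_cap = (*cap == 0) ? 16 : *cap * 2;
            char *p = realloc(*buf, new_cap);
            if (p == NULL)
                return 0;                   /* out of memory: stop reading */
            *buf = p;
            *cap = new_cap;
        }
        (*buf)[len++] = (char)c;
        c = getc(fp);
    }
    (*buf)[len] = '\0';
    return 1;
}
```

A caller would loop with while (read_token(fp, &buf, &cap)), pass buf to the hash table on each iteration, and free buf once at end of file. Because the buffer is reused, the hash table must store its own copy of each new token.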

The hash table, which resolves collisions by chaining, should be of size m = 256 (define this as a constant). Use the division-method hash function h(k) = k mod m, where k is the sum of the ASCII values of the first ten characters of the token. If the token has fewer than ten characters, sum all of its characters.
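A minimal C sketch of the division-method hash and a chained search-then-insert, assuming each chain node stores a private copy of the token and its count, and that new nodes go at the head of their chain. All names here (TABLE_SIZE, HASH_PREFIX, Node, HashTable, hash_token, table_add, table_free) are hypothetical. The table is passed by pointer rather than held in a global variable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 256   /* m */
#define HASH_PREFIX 10   /* number of leading characters summed into k */

/* One chain entry: a unique token and how many times it has been seen. */
typedef struct Node {
    char *token;
    int count;
    struct Node *next;
} Node;

typedef struct {
    Node *chains[TABLE_SIZE];
    int num_tokens;              /* unique tokens stored (for the load factor) */
} HashTable;

/* h(k) = k mod m, where k sums the ASCII codes of the first ten characters
   (or of every character when the token is shorter than ten). */
unsigned hash_token(const char *token)
{
    unsigned k = 0;
    int i;

    assert(token != NULL);
    for (i = 0; i < HASH_PREFIX && token[i] != '\0'; i++)
        k += (unsigned char)token[i];
    return k % TABLE_SIZE;
}

void table_init(HashTable *t)
{
    int i;
    for (i = 0; i < TABLE_SIZE; i++)
        t->chains[i] = NULL;
    t->num_tokens = 0;
}

/* Searches the token's chain: if found, increments its count; otherwise
   inserts a new node with count 1 at the head of the chain.
   Returns the token's count after the update, or -1 if malloc fails. */
int table_add(HashTable *t, const char *token)
{
    unsigned h = hash_token(token);
    Node *n;
    size_t len;

    for (n = t->chains[h]; n != NULL; n = n->next) {
        if (strcmp(n->token, token) == 0)
            return ++n->count;
    }
    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    len = strlen(token);
    n->token = malloc(len + 1);
    if (n->token == NULL) {
        free(n);
        return -1;
    }
    memcpy(n->token, token, len + 1);   /* copy, since the caller's buffer is reused */
    n->count = 1;
    n->next = t->chains[h];
    t->chains[h] = n;
    t->num_tokens++;
    return 1;
}

/* Releases every node and token string. */
void table_free(HashTable *t)
{
    int i;
    for (i = 0; i < TABLE_SIZE; i++) {
        Node *n = t->chains[i];
        while (n != NULL) {
            Node *next = n->next;
            free(n->token);
            free(n);
            n = next;
        }
        t->chains[i] = NULL;
    }
    t->num_tokens = 0;
}
```

As a worked example, "abc" gives k = 97 + 98 + 99 = 294 and h(k) = 294 mod 256 = 38. "cab" has the same sum, so both land in chain 38: a character-sum key sends every anagram to the same chain.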

You may use an auxiliary array for sorting the histogram. You must implement the sorting algorithm(s) yourself, choosing from those discussed in class; you may not use the C/C++ library sorting routines (such as qsort or std::sort).
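A minimal C sketch of the two-key ordering, assuming the tokens have been copied from the hash table into an auxiliary array and assuming merge sort is among the algorithms discussed in class (substitute whichever one you choose). Entry, entry_before, and sort_histogram are hypothetical names. strcmp compares characters as unsigned char values, which for ASCII 33-126 is plain ASCII order.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One histogram row copied out of the hash table. */
typedef struct {
    const char *token;
    int count;
} Entry;

/* Nonzero if a belongs before (or ties with) b: higher count first,
   then lexicographic (strcmp) order among equal counts. */
static int entry_before(const Entry *a, const Entry *b)
{
    if (a->count != b->count)
        return a->count > b->count;
    return strcmp(a->token, b->token) <= 0;
}

/* Merges the sorted runs a[lo..mid) and a[mid..hi) through scratch space tmp. */
static void merge(Entry a[], Entry tmp[], size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;

    while (i < mid && j < hi)
        tmp[k++] = entry_before(&a[i], &a[j]) ? a[i++] : a[j++];
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    for (k = lo; k < hi; k++)
        a[k] = tmp[k];
}

static void merge_sort_rec(Entry a[], Entry tmp[], size_t lo, size_t hi)
{
    size_t mid;

    if (hi - lo < 2)
        return;                      /* zero or one element: already sorted */
    mid = lo + (hi - lo) / 2;
    merge_sort_rec(a, tmp, lo, mid);
    merge_sort_rec(a, tmp, mid, hi);
    merge(a, tmp, lo, mid, hi);
}

/* Sorts a[0..n) in place; returns 0 on success, -1 if malloc fails. */
int sort_histogram(Entry a[], size_t n)
{
    Entry *tmp;

    assert(a != NULL || n == 0);
    if (n < 2)
        return 0;
    tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    merge_sort_rec(a, tmp, 0, n);
    free(tmp);
    return 0;
}
```

Because every token in the table is unique, two entries never tie on both keys, so stability does not affect the output; merge sort is used here for its O(n log n) worst case.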

Follow the instructions from Program 1 for handing in the file containing all your source code using the program /public/cse/2320-501/handin2.
