CSE 2320 Section 501/571 Fall 1999
Program 3
Due: November 2, 1999, 5:30pm (November 3, 1999, 5:00pm for -10%)
For this program you will implement the same basic functionality as
Program 2, but the underlying data structure will be a binary-search tree.
As before, you will implement a program that produces a histogram of unique
tokens from a text file. Your program will read in tokens from a given
text file and search for each one in a binary-search tree. If the token is
not found, you will insert it into the tree with a count of one; if it is
found, you will increment its count by one. Tokens should be inserted so
that an inorder traversal of the tree visits them in lexicographic order.
After reading the entire file, you will
print out the binary-search tree (see below for details), and then
print out the tokens in decreasing order by count, and in lexicographic
order among tokens having the same count. Lastly, your program will print
out the minimum, maximum, mean and standard deviation of the token depths
in the binary-search tree. Specifically,
1. You may write your code in C or C++, but you may not use global
   variables. Those of you using C++ classes may use state variables for
   the major data structures, but do not use state variables as de facto
   global variables. Follow the Coding Standards referenced in Program 1
   and be sure to write modular, well-documented code.
2. A token is any string of characters (ASCII codes 33-126) separated by
   whitespace (spaces, tabs or newlines). You may assume the input file
   consists of only these characters.
3. Your program must process each token as it is read in from the file.
   Do not read in all of the tokens before processing them.
4. Your implementation of the binary-search tree and its operations
   (Insert and Search) should follow the pseudocode from the textbook.
5. The binary-search tree should be printed as if turned 90 degrees
   counter-clockwise, using indentation to indicate the levels of the
   tree; that is, children are indented three spaces more than their
   parent. A node in the tree should be printed as the token followed by
   the count in parentheses. For example, the binary-search tree on the
   left would be printed as shown on the right.
6. You may use an auxiliary array for sorting the histogram. You should
   sort the histogram only once, by count, because an inorder traversal of
   the tree already yields the tokens in lexicographic order. You must
   implement the sorting algorithm yourself, choosing from those discussed
   in class; you may not use the C/C++ library sorting procedures.
7. Follow the instructions from Program 1 for handing in the file
   containing all your source code using the program
   /public/cse/2320-501/handin3.
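
As a rough illustration of the tree requirements above, the following C sketch shows one possible shape for the insert-or-increment step, the rotated printout, and the depth statistics. It is not the textbook's pseudocode and not a reference solution. All names (Node, tree_add, print_rotated, DepthStats, collect_depths, depth_mean, depth_variance) are hypothetical. The sketch also assumes several things the assignment does not state: search and insert are folded into a single loop, the root has depth 0, the variance is the population variance, and a node prints as "token (count)" with one space before the parenthesis.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout; the assignment does not prescribe one. */
typedef struct Node {
    char *token;
    int count;
    struct Node *left, *right;
} Node;

/* Search for token: increment its count if present, otherwise insert it
   with a count of one. Returns the (possibly new) root of the tree. */
Node *tree_add(Node *root, const char *token) {
    Node *parent = NULL, *cur = root;
    int cmp = 0;
    assert(token != NULL);
    while (cur != NULL) {
        cmp = strcmp(token, cur->token);
        if (cmp == 0) { cur->count++; return root; }
        parent = cur;
        cur = (cmp < 0) ? cur->left : cur->right;
    }
    Node *n = malloc(sizeof *n);
    char *copy = malloc(strlen(token) + 1);
    if (n == NULL || copy == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
    strcpy(copy, token);
    n->token = copy;
    n->count = 1;
    n->left = n->right = NULL;
    if (parent == NULL) return n;      /* tree was empty */
    if (cmp < 0) parent->left = n;     /* cmp still holds the last comparison */
    else parent->right = n;
    return root;
}

/* Print the tree as if turned 90 degrees counter-clockwise: right subtree
   first, then the node, then the left subtree, three spaces per level.
   The "token (count)" spacing is an assumption. */
void print_rotated(const Node *t, int depth, FILE *out) {
    if (t == NULL) return;
    print_rotated(t->right, depth + 1, out);
    fprintf(out, "%*s%s (%d)\n", 3 * depth, "", t->token, t->count);
    print_rotated(t->left, depth + 1, out);
}

/* Running totals for the depth statistics; initialize with {0}. */
typedef struct {
    int n, min, max;
    double sum, sumsq;
} DepthStats;

/* Visit every node once, recording its depth (root depth assumed 0). */
void collect_depths(const Node *t, int depth, DepthStats *s) {
    if (t == NULL) return;
    if (s->n == 0 || depth < s->min) s->min = depth;
    if (s->n == 0 || depth > s->max) s->max = depth;
    s->n++;
    s->sum += depth;
    s->sumsq += (double)depth * depth;
    collect_depths(t->left, depth + 1, s);
    collect_depths(t->right, depth + 1, s);
}

double depth_mean(const DepthStats *s) {
    return s->n ? s->sum / s->n : 0.0;
}

/* Population variance; the standard deviation is its square root. */
double depth_variance(const DepthStats *s) {
    double m = depth_mean(s);
    return s->n ? s->sumsq / s->n - m * m : 0.0;
}
```

A driver could read each token with fscanf(fp, "%255s", buf) into a 256-byte buffer (the length limit is an assumption) and pass it straight to tree_add, which satisfies the requirement to process tokens as they are read. For the token sequence m c x c a m m, print_rotated(root, 0, stdout) would produce:

```
   x (1)
m (3)
   c (2)
      a (1)
```

The standard deviation can then be obtained by applying sqrt() from <math.h> to the value returned by depth_variance.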