CSE 2320 Section 501/571 Fall 1999

**Program 3**

Due: November 2, 1999, 5:30pm (November 3, 1999, 5:00pm for -10%)

For this program you will implement the same basic functionality as Program 2, but the underlying data structure will be a binary-search tree. As before, you will implement a program that produces a histogram of unique tokens from a text file. Your program will read tokens from a given text file and search for each one in a binary-search tree. If the token is not found, you will insert it into the tree with a count of one. Tokens should be inserted so that an inorder traversal of the tree visits them in lexicographic order. If the token is found, you will increment its count by one. After reading the entire file, you will print out the binary-search tree (see below for details), and then print out the tokens in decreasing order by count, with tokens having the same count listed in lexicographic order. Lastly, your program will print out the minimum, maximum, mean and standard deviation of the depths of the tokens in the binary-search tree. Specifically,

- 1.
- You may write your code in C or C++, but you cannot use global variables.
Those of you using C++ classes may use state variables for the major data
structures, but avoid inappropriate use of state variables as global
variables. Follow the Coding Standards referenced in Program 1 and be sure
to write modular, well-documented code.
- 2.
- A *token* is any string of characters (ASCII codes 33-126) separated by whitespace (spaces, tabs or newlines). You may assume the input file consists of only these characters.
- 3.
- Your program must process each token as it is read in from the file.
You should not first read in all tokens prior to processing them.
- 4.
- Your implementation of the binary-search tree and its operations
(Insert and Search) should follow the pseudocode from the textbook.
- 5.
- The binary-search tree should be printed as if turned 90 degrees
counter-clockwise, using indentation to indicate levels of the tree; that is,
children are indented three spaces more than their parent. A node in the
tree should be printed as the token followed by the count in parentheses.
For example, the following binary-search tree on the left would be printed
as shown on the right.

- 6.
- You may use an auxiliary array for sorting the histogram. You should
sort the histogram only once, by count, because the tokens are already in
the correct order in the tree. You must implement the sorting algorithm(s)
yourself from one of those discussed in class; you may not use the C/C++
library sorting procedures.
- 7.
- Follow the instructions from Program 1 for handing in the file
containing all your source code using the program
`/public/cse/2320-501/handin3`.
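
To make items 4 and 5 and the depth statistics more concrete, here is a minimal C99 sketch of one possible shape for the tree code. It is not the textbook's pseudocode: it folds Search and Insert into a single pass for brevity, whereas item 4 asks you to follow the textbook's operations. The names `Node`, `DepthStats`, `tree_add`, `print_rotated` and `collect_depths` are hypothetical, as are the assumptions that the root has depth 0 and that a node is printed as `token (count)` with a space before the parenthesis.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout; the assignment does not prescribe one. */
typedef struct Node {
    char *token;                 /* private copy of the token text */
    int count;                   /* number of occurrences seen so far */
    struct Node *left, *right;
} Node;

/* Search for token. If found, increment its count; otherwise attach a new
 * leaf with count 1. Returns the node, or NULL if allocation fails.
 * strcmp compares by character code, so an inorder walk is lexicographic. */
Node *tree_add(Node **root, const char *token)
{
    assert(root != NULL && token != NULL);
    Node **link = root;          /* the pointer that would hold a new node */
    while (*link != NULL) {
        int cmp = strcmp(token, (*link)->token);
        if (cmp == 0) {
            (*link)->count++;
            return *link;
        }
        link = (cmp < 0) ? &(*link)->left : &(*link)->right;
    }
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->token = malloc(strlen(token) + 1);
    if (n->token == NULL) {
        free(n);
        return NULL;
    }
    strcpy(n->token, token);
    n->count = 1;
    n->left = n->right = NULL;
    *link = n;
    return n;
}

/* Print the tree turned 90 degrees counter-clockwise: right subtree first,
 * then the node, then the left subtree, three spaces per level. */
void print_rotated(const Node *n, int depth, FILE *out)
{
    if (n == NULL)
        return;
    print_rotated(n->right, depth + 1, out);
    fprintf(out, "%*s%s (%d)\n", 3 * depth, "", n->token, n->count);
    print_rotated(n->left, depth + 1, out);
}

/* Running totals for node depths; initialise with DepthStats s = {0}. */
typedef struct {
    long n;                      /* number of nodes visited */
    int min, max;                /* smallest and largest depth seen */
    double sum, sum_sq;          /* sum of depths and of squared depths */
} DepthStats;

/* Accumulate depth totals over the subtree rooted at n (root depth = 0). */
void collect_depths(const Node *n, int depth, DepthStats *s)
{
    if (n == NULL)
        return;
    if (s->n == 0 || depth < s->min) s->min = depth;
    if (s->n == 0 || depth > s->max) s->max = depth;
    s->n++;
    s->sum += depth;
    s->sum_sq += (double)depth * depth;
    collect_depths(n->left, depth + 1, s);
    collect_depths(n->right, depth + 1, s);
}
```

With these totals the mean depth is `sum / n`, and the population variance is `sum_sq / n - mean * mean`; the standard deviation is its square root (for example via `sqrt` from `<math.h>`). Whether the population or the sample form is intended is not stated above, so treat that choice as an assumption. Calling `tree_add` for each token as it is read, for instance inside a `fscanf(fp, "%s", buf)` loop with a suitably bounded buffer, would also satisfy item 3.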