CSE 2320 Section 501/571 Fall 1999

**Sample Dynamic Programming / Greedy Algorithms Solution**

- 1.
- Consider the problem of constructing an optimal binary search tree (in terms of search time) for a set of integer keys, given the frequencies with which each key will be accessed. Assume there are *n* keys with values 1 to *n*, and key *i* has frequency $f_i$ such that $f_i \ge 0$ and $\sum_{i=1}^{n} f_i = 1$. The goal is to minimize the value of $\sum_{i=1}^{n} f_i d_i$, where $d_i$ is the depth of key *i* in the binary search tree.
- (a)
- Let $c(i,j)$ be the optimal cost of a binary search tree containing the keys $i, i+1, \ldots, j$. Assuming the optimal choice for the root of this tree is key *k*, give an expression for $c(i,j)$ in terms of $c(i,k-1)$ and $c(k+1,j)$.

$c(i,j) = c(i,k-1) + c(k+1,j) + \sum_{l=i}^{j} f_l - f_k$

The last two terms derive from the fact that attaching the left and right subtrees to the root key *k* adds one to the depth of each key in the subtrees. Thus, each $f_i d_i$ term in the original summation becomes $f_i(d_i + 1)$, which adds an extra $f_i$ to the cost for each key in the subtrees, except for key *k*.

- (b)
- Show that this problem exhibits optimal substructure.

Let $c_T(i,j)$ be the cost of a binary search tree *T* for the keys *i* through *j*. Assume we have an optimal tree $T_{opt}$ for the keys *i* through *j*, consisting of a root key *k* with left subtree $T_L$ containing keys *i* to *k* - 1 and right subtree $T_R$ containing keys *k* + 1 to *j* (see figure below). We can express the cost of the tree $T_{opt}$ as

$c_{T_{opt}}(i,j) = c_{T_L}(i,k-1) + c_{T_R}(k+1,j) + \sum_{l=i}^{j} f_l - f_k \qquad (1)$
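This decomposition can be spot-checked numerically. The sketch below uses frequencies and a tree shape of my own choosing (not from the problem): keys 1 to 3, with root key $k = 2$, key 1 as the left subtree, and key 3 as the right subtree.

```python
# Hypothetical frequencies for keys 1..3 (my own example, not from the problem)
f = {1: 0.3, 2: 0.5, 3: 0.2}

def cost(tree, depth=0):
    """Sum of f_i * d_i over a tree given as (key, left, right) or None."""
    if tree is None:
        return 0.0
    key, left, right = tree
    return f[key] * depth + cost(left, depth + 1) + cost(right, depth + 1)

t_left = (1, None, None)      # subtree for keys i..k-1
t_right = (3, None, None)     # subtree for keys k+1..j
t = (2, t_left, t_right)      # whole tree with root key k = 2

# Equation (1): c(i,j) = c(i,k-1) + c(k+1,j) + sum_{l=i}^{j} f_l - f_k
lhs = cost(t)
rhs = cost(t_left) + cost(t_right) + sum(f.values()) - f[2]
```

Here both sides come out to 0.5: each of the two subtree keys sits one level deeper than it would in its subtree alone, contributing the extra $\sum f_l - f_k = 0.3 + 0.2$.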

To prove optimal substructure, we need to show that the subtrees $T_L$ and $T_R$ contained in $T_{opt}$ are optimal binary search trees for the keys *i* to *k* - 1 and *k* + 1 to *j*, respectively. Using proof by contradiction, assume that there is a tree $T'_L$ having a lower cost than $T_L$; i.e., $c_{T'_L}(i,k-1) < c_{T_L}(i,k-1)$. Then we can build a new tree $T'$ by replacing $T_L$ with $T'_L$ in $T_{opt}$. The cost of tree $T'$ will be the same as equation 1 with $c_{T_L}(i,k-1)$ replaced by $c_{T'_L}(i,k-1)$. Since $c_{T'_L}(i,k-1) < c_{T_L}(i,k-1)$, we have $c_{T'}(i,j) < c_{T_{opt}}(i,j)$. However, this contradicts the original assumption that $T_{opt}$ is the optimal tree. Therefore, the subtrees must be optimal as well, and the problem exhibits optimal substructure.

- (c)
- Define a recursive solution for computing $c(i,j)$ and write pseudocode for a divide-and-conquer algorithm implementing your solution.

Equation 1 already describes a recursive solution to the problem when we know the value of *k*. In general, we do not know the optimal key *k* for the root, so, as in the matrix-chain multiplication problem, we can try each possible *k* and retain the minimum-cost choice. The only other aspect of the recursive solution is the stopping condition, which occurs when computing the cost $c(i,j)$ with $i \ge j$. In this case, the cost for an empty tree or a tree containing one key is zero. Thus, the recursive solution is

$c(i,j) = 0$ if $i \ge j$; otherwise $c(i,j) = \min_{i \le k \le j} \left\{ c(i,k-1) + c(k+1,j) + \sum_{l=i}^{j} f_l - f_k \right\}$

`RECURSIVE-TREE(i, j, f)`

     1  if i ≥ j
     2     then return 0
     3     else c = ∞
     4          for k = i to j
     5              c1 = 0
     6              for l = i to j
     7                  c1 = c1 + f_l
     8              c1 = c1 - f_k
     9              q = RECURSIVE-TREE(i, k-1, f) + RECURSIVE-TREE(k+1, j, f) + c1
    10              if q < c
    11                 then c = q
    12          return c

- (d)
- Give a recurrence $T(n)$ for the running time of your recursive solution in part c, where $n = j - i + 1$, and show that $T(n) = \Omega(2^n)$.

The recurrence for RECURSIVE-TREE is shown below (best, average, and worst cases are all the same). Note the addition of the $\Theta(n)$ term representing the computation of the added depth cost in lines 5-8.

$T(n) = \Theta(1)$ for $n \le 1$, and $T(n) = \sum_{k=1}^{n} \left[ T(k-1) + T(n-k) + \Theta(n) \right]$ for $n > 1$
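A direct Python transcription of RECURSIVE-TREE (a sketch; the function name, call counter, and sample frequencies are mine) makes this growth easy to observe.

```python
calls = 0  # global counter to observe how many recursive calls are made

def recursive_tree(i, j, f):
    """Divide-and-conquer cost of an optimal BST over keys i..j.

    f maps key -> access frequency (keys 1..n); the root is at depth 0.
    """
    global calls
    calls += 1
    if i >= j:                      # empty tree or single key: cost 0
        return 0.0
    best = float("inf")
    for k in range(i, j + 1):       # try each key k as the root
        # added depth cost: every key in i..j except the root k
        c1 = sum(f[l] for l in range(i, j + 1)) - f[k]
        q = recursive_tree(i, k - 1, f) + recursive_tree(k + 1, j, f) + c1
        best = min(best, q)
    return best

f = {1: 0.1, 2: 0.1, 3: 0.8}        # example frequencies (as in problem 2(a))
opt = recursive_tree(1, 3, f)       # optimal cost 0.3, reached in 15 calls
```

Even for $n = 3$ the procedure makes 15 calls, many of them on the same $(i,j)$ ranges; part (e) counts how few of those ranges are actually distinct.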

Using the substitution method, we can show $T(n) \ge c\,2^n$ using the inductive hypothesis $T(k) \ge c\,2^k$ for $k < n$:

$T(n) \ge \sum_{k=1}^{n} \left[ c\,2^{k-1} + c\,2^{n-k} \right] = 2c\,(2^n - 1) = c\,2^{n+1} - 2c \ge c\,2^n$

The last inequality is true only if $c\,2^n \ge 2c$, or $2^n \ge 2$, which is true for any constant *c* and sufficiently large *n*. Thus, $T(n) = \Omega(2^n)$.

- (e)
- Show that the recursive solution has overlapping subproblems and compute the number of unique subproblems in terms of *n*.

Below is a portion of the recursion tree for the computation of $c(1,4)$, with overlapping subproblems indicated.

For each subproblem $c(i,j)$, we have $1 \le i \le j \le n$. Thus, for a particular value of *i*, *j* can range from *i* to *n*, for a total of $n - i + 1$ values. Summing over all values of *i* from 1 to *n*, we get the following number of unique subproblems:

$\sum_{i=1}^{n} (n - i + 1) = n + (n-1) + \cdots + 1 = \frac{n(n+1)}{2} = \Theta(n^2)$

There are only a polynomial number of unique subproblems.
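A short memoized sketch (the function names and the 4-key frequencies are my own) confirms both the optimal cost and the $n(n+1)/2$ count of distinct subproblems with $i \le j$.

```python
from functools import lru_cache

def count_subproblems(f):
    """Memoized c(i,j) for a frequency list f (key i has frequency f[i-1]).

    Returns (optimal cost, number of distinct subproblems with i <= j).
    """
    n = len(f)
    seen = set()

    @lru_cache(maxsize=None)        # each distinct (i,j) is solved only once
    def c(i, j):
        if i <= j:
            seen.add((i, j))        # record each distinct non-empty range
        if i >= j:
            return 0.0
        total = sum(f[l - 1] for l in range(i, j + 1))
        return min(c(i, k - 1) + c(k + 1, j) + total - f[k - 1]
                   for k in range(i, j + 1))

    cost = c(1, n)
    return cost, len(seen)

cost, distinct = count_subproblems([0.1, 0.2, 0.3, 0.4])   # n = 4
```

For $n = 4$ this reports $10 = 4(5)/2$ distinct subproblems, matching the sum above.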

- (f)
- Write pseudocode for an $O(n^3)$ bottom-up, dynamic programming solution to the optimal binary search tree problem. Justify the $O(n^3)$ running time of your solution. Note that your solution must keep track of the root keys for each subtree and must include a procedure for actually constructing the tree.

Following the MATRIX-CHAIN-ORDER algorithm given in class, the following OPTIMAL-ROOTS(*f*) algorithm computes the optimal root keys for the entire tree and each of its subtrees, given the key frequencies *f*.

`OPTIMAL-ROOTS(f)`

     1  n = length(f)
     2  allocate c[0..n+1, 0..n+1]
     3  allocate r[0..n+1, 0..n+1]
     4  allocate sf[0..n+1, 0..n+1]
     5  for i = 0 to n+1
     6      for j = 0 to n+1
     7          if i = j then sf[i,j] = f_i else sf[i,j] = 0    ▹ take f_0 = f_{n+1} = 0
     8          if i < j
     9             then c[i,j] = ∞
    10             else c[i,j] = 0; r[i,j] = i
    11  for ws = 2 to n                ▹ ws is the width (number of keys) of the subtree
    12      for i = 1 to n - ws + 1
    13          j = i + ws - 1
    14          for k = i to j
    15              sf[i,j] = sf[i,k-1] + sf[k+1,j] + f_k
    16              q = c[i,k-1] + c[k+1,j] + sf[i,j] - f_k
    17              if q < c[i,j]
    18                 then c[i,j] = q
    19                      r[i,j] = k

Analysis and Explanation: Line 1 determines the number of keys *n* from the length of the frequency list *f* in $\Theta(1)$ time. Lines 2-4 each take $\Theta(n^2)$ time to allocate the arrays used by the procedure. Array c[i,j] will hold the optimal cost for a tree containing keys *i* to *j*. Array r[i,j] will hold the value of the key at the root of the optimal tree containing keys *i* to *j*. Array sf[i,j] will hold $\sum_{l=i}^{j} f_l$, computed in a bottom-up fashion to avoid making the algorithm $O(n^4)$ by computing this sum on demand. Lines 5-10 consist of two nested `for` loops initializing the arrays in $\Theta(n^2)$ time. Lines 11-19 follow the MATRIX-CHAIN-ORDER algorithm to determine the optimal tree costs and root keys using three nested `for` loops in a worst-case running time of $O(n^3)$. The two differences worth noting are that *k* goes from *i* to *j* instead of to *j* - 1, and that sf[i,j] is calculated bottom-up along with c[i,j] and r[i,j]. Thus, the total worst-case running time of OPTIMAL-ROOTS is dominated by the nested `for` loops, taking $O(n^3)$ time.

All that remains is an algorithm to generate the actual tree once the roots are determined. The following OPTIMAL-BST(*r*, *i*, *j*) algorithm returns a pointer to the optimal binary search tree for the keys *i* to *j*, according to the optimal roots in *r* previously computed by calling OPTIMAL-ROOTS. The initial call would be OPTIMAL-BST(*r*, 1, *n*).

`OPTIMAL-BST(r, i, j)`

    x = ALLOCATE-BST-NODE()
    k = r[i,j]
    key(x) = k
    left(x) = NIL
    right(x) = NIL
    if i < j
       then if k > i
               then left(x) = OPTIMAL-BST(r, i, k-1)
            if k < j
               then right(x) = OPTIMAL-BST(r, k+1, j)
    return x

Since the analysis of this algorithm is similar to that of a binary search tree traversal, the running time is $\Theta(n)$. Thus, the running time for generating the optimal binary search tree is dominated by the OPTIMAL-ROOTS procedure, and is therefore $O(n^3)$.
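Both procedures can be sketched together in Python (the table layout and helper names are mine; the pseudocode's 1-based indexing is kept by padding the tables so that empty ranges like c[i, i-1] are valid lookups). The 3-key frequencies from problem 2(a) serve as a test input.

```python
import math

def optimal_roots(f):
    """Bottom-up DP in the style of OPTIMAL-ROOTS.

    Returns (c, r): c[i][j] is the optimal cost for keys i..j and
    r[i][j] the root key of that optimal subtree (1-based indexing).
    """
    n = len(f)
    c = [[0.0] * (n + 2) for _ in range(n + 2)]   # rows/cols 0..n+1
    r = [[0] * (n + 2) for _ in range(n + 2)]
    sf = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        sf[i][i] = f[i - 1]       # width-1 frequency sums
        r[i][i] = i               # a single key is its own root
    for ws in range(2, n + 1):            # subtree width
        for i in range(1, n - ws + 2):
            j = i + ws - 1
            c[i][j] = math.inf
            for k in range(i, j + 1):
                # same value for every k: sum of f_i..f_j, built bottom-up
                sf[i][j] = sf[i][k - 1] + sf[k + 1][j] + f[k - 1]
                q = c[i][k - 1] + c[k + 1][j] + sf[i][j] - f[k - 1]
                if q < c[i][j]:
                    c[i][j] = q
                    r[i][j] = k
    return c, r

def optimal_bst(r, i, j):
    """Build the tree as nested (key, left, right) tuples from the r table."""
    k = r[i][j]
    left = optimal_bst(r, i, k - 1) if i < j and k > i else None
    right = optimal_bst(r, k + 1, j) if i < j and k < j else None
    return (k, left, right)

f = [0.1, 0.1, 0.8]              # frequencies from problem 2(a)
c, r = optimal_roots(f)
tree = optimal_bst(r, 1, len(f))
```

For these frequencies the tables give cost $c(1,3) = 0.3$ with key 3 at the root; note that when several roots tie, the constructed tree may differ in shape from the figure while having the same cost.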

- 2.
- Suppose we want to design a greedy algorithm for the optimal binary
search tree problem. Below are two possible greedy choices we could try.
Unfortunately, neither satisfies the greedy choice property. For each,
prove by counterexample that the greedy choice does not satisfy the greedy
choice property.
- (a)
- Choose the middle ($\lceil n/2 \rceil$) key as the root.

Consider a problem with *n* = 3 keys with frequencies $f_1 = 0.1$, $f_2 = 0.1$, and $f_3 = 0.8$. In the figure below, the tree on the left results from the greedy choice of taking the middle (second) key as the root, with a cost of $(0.1)(1) + (0.1)(0) + (0.8)(1) = 0.9$. However, the optimal tree on the right has a better cost of $(0.1)(2) + (0.1)(1) + (0.8)(0) = 0.3$.

Therefore, the greedy choice is not in the optimal solution and does not satisfy the greedy choice property.

- (b)
- Choose the highest-frequency key as the root.

Consider a problem with *n* = 4 keys with frequencies $f_1 = 0.1$, $f_2 = 0.2$, $f_3 = 0.3$, and $f_4 = 0.4$. In the figure below, the tree on the left results from the greedy choice of always taking the highest-frequency key as the root, with a cost of $(0.1)(3) + (0.2)(2) + (0.3)(1) + (0.4)(0) = 1.0$. However, the optimal tree on the right has a better cost of $(0.1)(2) + (0.2)(1) + (0.3)(0) + (0.4)(1) = 0.8$.

Therefore, the greedy choice is not in the optimal solution and does not satisfy the greedy choice property.
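Both counterexamples can be checked mechanically. The sketch below (function names and tie-breaking are my own choices; a plain recursive optimum is fine at this size) recomputes the four costs quoted above.

```python
def bst_cost(f, root_rule):
    """Cost of the BST built by recursively applying root_rule(i, j, f)
    to pick the root of each subtree of keys i..j; the root has depth 0."""
    def build(i, j, depth):
        if i > j:
            return 0.0
        k = root_rule(i, j, f)
        return (f[k - 1] * depth
                + build(i, k - 1, depth + 1)
                + build(k + 1, j, depth + 1))
    return build(1, len(f), 0)

def optimal_cost(f):
    """Plain recursive optimal cost c(1, n)."""
    def c(i, j):
        if i >= j:
            return 0.0
        total = sum(f[l - 1] for l in range(i, j + 1))
        return min(c(i, k - 1) + c(k + 1, j) + total - f[k - 1]
                   for k in range(i, j + 1))
    return c(1, len(f))

# Greedy rule (a): middle key of the range; (b): highest-frequency key
middle = lambda i, j, f: (i + j) // 2
heaviest = lambda i, j, f: max(range(i, j + 1), key=lambda k: f[k - 1])

fa = [0.1, 0.1, 0.8]             # part (a) counterexample
fb = [0.1, 0.2, 0.3, 0.4]        # part (b) counterexample
greedy_a = bst_cost(fa, middle)      # 0.9, as in the figure
opt_a = optimal_cost(fa)             # 0.3
greedy_b = bst_cost(fb, heaviest)    # 1.0
opt_b = optimal_cost(fb)             # 0.8
```

In both cases the greedy root choice is strictly worse than the optimum, matching the hand computations above.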