# Homework 4

#### Due: April 28, 2005 (midnight). No late submissions accepted.

For this homework you will utilize Java implementations for Bayesian inference (JavaBayes) and reinforcement learning to apply some of the concepts we learned in class to the Wumpus world. These problems will be solved in a standalone fashion, independent of the Wumpus simulator.

1. For this problem we will use the JavaBayes system to implement a Bayesian network for computing the probability of a wumpus in a location given stench information.

   1. Use JavaBayes to create the Bayes network shown below. You may assume a 10x10 wumpus world with one stationary wumpus, where each cell is equally likely to contain the wumpus. You should ignore effects near the walls. Save your network to a file.
   2. Use the network to compute the probability of a wumpus with no stench information, then with one, two, three, and four stenches. To do this, use the "Observe" feature to set a stench node to true, and then use the "Query" feature to query the probability of the wumpus node. The result is printed to the console window; save that output to a file.
   3. Collect the wumpus network file with no evidence, a file with an explanation of your rationale for the CPT values of each node, and the file containing the dump of the console window.
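
As a sanity check on the JavaBayes queries, the posterior can also be written out by hand. The figure is not reproduced here, so the following is only a sketch under an assumed structure: a single wumpus node $W$ that is the parent of four stench nodes $S_1, \dots, S_4$. If your network is structured differently, the factorization changes accordingly. Under that assumption, with $k$ stench nodes observed to be true:

$$P(W \mid s_1, \dots, s_k) = \alpha \, P(W) \prod_{i=1}^{k} P(s_i \mid W), \qquad P(W = \text{true}) = \frac{1}{100}$$

Here $\alpha$ is the constant that makes the two values for $W$ sum to 1, and the prior of 1/100 follows from the uniform 10x10 assumption. The terms $P(s_i \mid W)$ are the CPT entries whose rationale you are asked to explain.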

2. For this problem you will implement a reinforcement learning (RL) agent to navigate the specific wumpus world of Figure 7.2 in Russell and Norvig. The RL agent will select actions based on the utility of states in the wumpus world according to the following formula:
$$U(i) = R(i) + \max_a \sum_j M^a_{ij}\, U(j)$$

The state i is a four-tuple [X,Y,Orient,Gold], where X,Y (1 ≤ X,Y ≤ 4) is the location of the agent, Orient ∈ {up, down, left, right} is the orientation of the agent, and Gold ∈ {no, yes} indicates whether the agent has the gold. Each action a is one of {goforward, turnleft, turnright, grab}. The reward R(i) for state i is -0.05, except for the following 36 terminal states (an underscore matches any value of that component):

R([1,3,_,_]) = -1.0
R([3,3,_,_]) = -1.0
R([3,1,_,_]) = -1.0
R([4,4,_,_]) = -1.0
R([1,1,_,yes]) = 1.0
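
The reward table above can be sketched in Java as follows. This is a minimal illustration, not a required design: the class and method names (`WumpusReward`, `isDeadly`, `isGoal`, `reward`, `countStates`) are hypothetical, and Gold is encoded as a boolean. The `main` method enumerates every state and counts how many are terminal, which is a quick way to check the state and terminal-state counts given in the text.

```java
class WumpusReward {
    // Orientation names as used in the assignment text.
    static final String[] ORIENTS = {"up", "down", "left", "right"};

    /** True for the four squares with reward -1.0: [1,3], [3,3], [3,1], [4,4]. */
    static boolean isDeadly(int x, int y) {
        return (x == 1 && y == 3) || (x == 3 && y == 3)
            || (x == 3 && y == 1) || (x == 4 && y == 4);
    }

    /** True for the goal states [1,1,_,yes]. */
    static boolean isGoal(int x, int y, boolean gold) {
        return x == 1 && y == 1 && gold;
    }

    static boolean isTerminal(int x, int y, boolean gold) {
        return isDeadly(x, y) || isGoal(x, y, gold);
    }

    /** R(i); the orientation does not affect the reward, so it is not a parameter. */
    static double reward(int x, int y, boolean gold) {
        if (isDeadly(x, y)) return -1.0;
        if (isGoal(x, y, gold)) return 1.0;
        return -0.05;
    }

    /** Returns {total number of states, number of terminal states}. */
    static int[] countStates() {
        int total = 0, terminal = 0;
        for (int x = 1; x <= 4; x++)
            for (int y = 1; y <= 4; y++)
                for (int o = 0; o < ORIENTS.length; o++)
                    for (boolean gold : new boolean[] {false, true}) {
                        total++;
                        if (isTerminal(x, y, gold)) terminal++;
                    }
        return new int[] {total, terminal};
    }

    public static void main(String[] args) {
        int[] counts = countStates();
        System.out.println(counts[0] + " states, " + counts[1] + " terminal");
    }
}
```

The count works out to 4 squares x 4 orientations x 2 gold values = 32 deadly states, plus the 4 goal states at [1,1] with the gold.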

$M^a_{ij}$ is the probability of being in state j after taking action a in state i. You may assume the agent's actions always work as expected; therefore, $M^a_{ij}$ is always either 0 or 1. For example, the probability of being in state [1,2,right,no] after executing goforward in state [1,1,right,no] is 1. The probability of being in state [1,2,right,no] after executing goforward in state [1,1,up,no] is 0. $M^a_{ij} = 0$ when i is one of the above terminal states. Specifically,

   1. Write a method UpdateUtility that makes a single pass over each of the 128 states updating the utility according to the above formula for U(i). The U(j) values in the formula should all come from the previous state utilities, not newly computed ones. In other words, you should not update the current utilities until all of them have been recomputed. Initially, U(i) = 0 for all states i.
   2. Write a method RLagent that uses the current utility values to select actions and move through the states of the wumpus world from Figure 7.2. Given that the agent is in state i, the action to choose will be the one maximizing utility:

      $$\text{action} = \arg\max_a \sum_j M^a_{ij}\, U(j)$$

      RLagent should return a sequence of actions for getting from the initial state [1,1,right,no] to the goal state [1,1,_,yes], or NULL if the agent is killed or exceeds some upper bound (e.g., 100) on the number of actions.
   3. The main procedure of your program should iterate between calling UpdateUtility and RLagent until the agent successfully gets the gold and returns to location (1,1) without being killed. Your program should then output the successful action sequence and the number of iterations.
   4. Collect your well-documented, object-oriented Java code for the RL solution along with a log of the run.
3. Submit the collected files from the above two problems in a zip file to me (holder@cse.uta.edu) by the above deadline. Also include in your submission a README file that describes what files are in your submission and any special instructions for building and running. In addition to correct functionality and satisfaction of the above constraints, your submission will be graded based on good programming style and documentation.
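
For problem 2, because every transition probability is 0 or 1, the sum over j in the utility formula collapses to the utility of the single successor state, and for terminal states (all probabilities 0) the utility is just R(i). The sketch below illustrates one synchronous sweep and the greedy action choice under that reading. It is not a reference solution, and it rests on several assumptions that are not in the text: the integer encodings of orientations and actions, the `index` layout, that "up" increases X (the text's example only fixes that facing right increases Y), that walking into a wall leaves the agent in place, that grab only has an effect on the gold square, that ties are broken by the lowest action code, and the gold square itself (`GOLD_X`, `GOLD_Y`), which should be read off Figure 7.2. All class and method names are hypothetical.

```java
class UtilitySweepSketch {
    static final int N = 4;                                    // 4x4 grid
    static final int UP = 0, RIGHT = 1, DOWN = 2, LEFT = 3;    // assumed orientation codes
    static final int GOFORWARD = 0, TURNLEFT = 1, TURNRIGHT = 2, GRAB = 3;
    // Per the text's example, facing right increases Y; "up increases X" is an assumption.
    static final int[] DX = {1, 0, -1, 0};
    static final int[] DY = {0, 1, 0, -1};
    // ASSUMPTION: the gold square is not stated in the text; take it from Figure 7.2.
    static final int GOLD_X = 3, GOLD_Y = 2;

    /** Maps [x,y,o,g] to an array index in 0..127. */
    static int index(int x, int y, int o, int g) {
        return (((x - 1) * N + (y - 1)) * 4 + o) * 2 + g;
    }

    static boolean isDeadly(int x, int y) {
        return (x == 1 && y == 3) || (x == 3 && y == 3)
            || (x == 3 && y == 1) || (x == 4 && y == 4);
    }

    static boolean isTerminal(int x, int y, int g) {
        return isDeadly(x, y) || (x == 1 && y == 1 && g == 1);
    }

    static double reward(int x, int y, int g) {
        if (isDeadly(x, y)) return -1.0;
        return (x == 1 && y == 1 && g == 1) ? 1.0 : -0.05;
    }

    /** Index of the unique state j with M^a_ij = 1 for a non-terminal state i. */
    static int successor(int x, int y, int o, int g, int a) {
        switch (a) {
            case GOFORWARD: {
                int nx = x + DX[o], ny = y + DY[o];
                // Assumed: bumping into a wall leaves the agent where it is.
                if (nx < 1 || nx > N || ny < 1 || ny > N) { nx = x; ny = y; }
                return index(nx, ny, o, g);
            }
            case TURNLEFT:  return index(x, y, (o + 3) % 4, g);
            case TURNRIGHT: return index(x, y, (o + 1) % 4, g);
            default:        // GRAB: assumed to pick up the gold only on the gold square
                return index(x, y, o, (x == GOLD_X && y == GOLD_Y) ? 1 : g);
        }
    }

    /** One synchronous sweep: reads only from old, writes only to the new array. */
    static double[] updateUtility(double[] old) {
        double[] next = new double[old.length];
        for (int x = 1; x <= N; x++)
            for (int y = 1; y <= N; y++)
                for (int o = 0; o < 4; o++)
                    for (int g = 0; g < 2; g++) {
                        double best = 0.0;                 // terminal: every M^a_ij is 0
                        if (!isTerminal(x, y, g)) {
                            best = Double.NEGATIVE_INFINITY;
                            for (int a = 0; a < 4; a++)
                                best = Math.max(best, old[successor(x, y, o, g, a)]);
                        }
                        next[index(x, y, o, g)] = reward(x, y, g) + best;
                    }
        return next;
    }

    /** Greedy action for a non-terminal state; ties go to the lowest action code. */
    static int greedyAction(double[] u, int x, int y, int o, int g) {
        int bestA = 0;
        for (int a = 1; a < 4; a++)
            if (u[successor(x, y, o, g, a)] > u[successor(x, y, o, g, bestA)]) bestA = a;
        return bestA;
    }

    public static void main(String[] args) {
        double[] u = new double[128];                      // U(i) = 0 initially
        for (int sweep = 0; sweep < 2; sweep++) u = updateUtility(u);
        System.out.println("U([1,1,right,no]) = " + u[index(1, 1, RIGHT, 0)]);
        System.out.println("greedy action in [2,1,down,yes] = " + greedyAction(u, 2, 1, DOWN, 1));
    }
}
```

Returning a fresh array from `updateUtility` is one simple way to guarantee that no newly computed utility is read during the same pass. After the first sweep from all zeros, every non-terminal state holds -0.05 and every terminal state holds its reward.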