Homework #1

This homework is to be completed on your own, without input, code, or
assistance from other students. See me or the TA if you have questions.

1. Show the decision tree that would be learned by C4.5 assuming that it is
given the five training examples for the EnjoySport target concept shown in
the table below. Show the value of the information gain for each candidate
attribute at each step in growing the tree. Break ties randomly.

   Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
   -------------------------------------------------------------
   sunny   warm     normal    strong  warm   same      yes
   sunny   warm     high      strong  warm   same      yes
   rainy   cold     high      strong  warm   change    no
   sunny   warm     high      strong  cool   change    yes
   sunny   warm     normal    weak    warm   same      no

Solution:

Let S represent the entire training set.

Entropy(S) = -(3/5)lg(3/5) - (2/5)lg(2/5)
           = 0.4421794 + 0.5287712
           = 0.9709506

We can use this value to compute the gain of each of the six attributes.

Gain(S, Sky)      = 0.97 - ((4/5)E(Sky_sunny) + (1/5)E(Sky_rainy))
                  = .97 - (.8*(-(3/4)lg(3/4) - (1/4)lg(1/4)) + .2*0)
                  = .97 - .65 = .32

Gain(S, AirTemp)  = 0.97 - ((4/5)E(AirTemp_warm) + (1/5)E(AirTemp_cold))
                  = .97 - (.8*(-(3/4)lg(3/4) - (1/4)lg(1/4)) + .2*0)
                  = .97 - .65 = .32

Gain(S, Humidity) = 0.97 - ((3/5)E(Humidity_high) + (2/5)E(Humidity_normal))
                  = .97 - (.6*(-(2/3)lg(2/3) - (1/3)lg(1/3)) + .4*(-(1/2)lg(1/2) - (1/2)lg(1/2)))
                  = .97 - (.55 + .4) = .02

Gain(S, Wind)     = 0.97 - ((4/5)E(Wind_strong) + (1/5)E(Wind_weak))
                  = .97 - (.8*(-(3/4)lg(3/4) - (1/4)lg(1/4)) + .2*0)
                  = .97 - .65 = .32

Gain(S, Water)    = 0.97 - ((4/5)E(Water_warm) + (1/5)E(Water_cool))
                  = .97 - (.8*(-(1/2)lg(1/2) - (1/2)lg(1/2)) + .2*0)
                  = .97 - .8 = .17

Gain(S, Forecast) = 0.97 - ((3/5)E(Forecast_same) + (2/5)E(Forecast_change))
                  = .97 - (.6*(-(2/3)lg(2/3) - (1/3)lg(1/3)) + .4*1)
                  = .97 - (.6*.92 + .4) = .02

This means there is a three-way tie between Sky, AirTemp, and Wind for the
choice of attribute to place at the root.
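The entropy and gain calculations above can be checked mechanically. The
following sketch (Python, illustrative only and not part of the assignment)
recomputes the information gain of each attribute over the full training set:

```python
import math

# The five training examples from the table above; attributes indexed 0..5.
EXAMPLES = [
    (("sunny", "warm", "normal", "strong", "warm", "same"),   "yes"),
    (("sunny", "warm", "high",   "strong", "warm", "same"),   "yes"),
    (("rainy", "cold", "high",   "strong", "warm", "change"), "no"),
    (("sunny", "warm", "high",   "strong", "cool", "change"), "yes"),
    (("sunny", "warm", "normal", "weak",   "warm", "same"),   "no"),
]
ATTRS = ["Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast"]

def entropy(examples):
    """Entropy of the yes/no label distribution, in bits."""
    n = len(examples)
    pos = sum(1 for _, label in examples if label == "yes")
    return -sum(p * math.log2(p) for p in (pos / n, (n - pos) / n) if p > 0)

def gain(examples, i):
    """Information gain of splitting `examples` on attribute index `i`."""
    remainder = 0.0
    for v in {x[i] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[i] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(examples) - remainder

for i, name in enumerate(ATTRS):
    print(f"Gain(S, {name}) = {gain(EXAMPLES, i):.2f}")
# prints 0.32, 0.32, 0.02, 0.32, 0.17, 0.02 -- matching the hand calculation
```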
If we place Sky at the root, the decision tree becomes

                Sky
               /   \
         sunny/     \rainy
             /       \
         3+,1-        NO

The entropy of the new non-leaf node is

E(Sunny) = -(3/4)lg(3/4) - (1/4)lg(1/4) = 0.81

Gain(Sunny, AirTemp)  = .81 - (1.0*(-(3/4)lg(3/4) - (1/4)lg(1/4)))
                      = .81 - .81 = 0.0

Gain(Sunny, Humidity) = .81 - ((2/4)E(Humidity_high) + (2/4)E(Humidity_normal))
                      = .81 - (.5*0 + .5*(-(1/2)lg(1/2) - (1/2)lg(1/2)))
                      = .81 - .5 = .31

Gain(Sunny, Wind)     = .81 - ((3/4)E(Wind_strong) + (1/4)E(Wind_weak))
                      = .81 - (.75*0 + .25*0) = .81

Gain(Sunny, Water)    = .81 - ((3/4)E(Water_warm) + (1/4)E(Water_cool))
                      = .81 - (.75*(-(2/3)lg(2/3) - (1/3)lg(1/3)) + .25*0)
                      = .81 - .69 = .12

Gain(Sunny, Forecast) = .81 - ((3/4)E(Forecast_same) + (1/4)E(Forecast_change))
                      = .81 - (.75*(-(2/3)lg(2/3) - (1/3)lg(1/3)) + .25*0)
                      = .81 - .69 = .12

The attribute with the largest gain is Wind, so the decision tree becomes

                Sky
               /   \
         sunny/     \rainy
             /       \
          Wind        NO
          /  \
   strong/    \weak
        /      \
     YES        NO

If AirTemp had been chosen initially instead of Sky, the rest of the tree
(and all of the calculations) would be the same as above, switching
attribute Sky with attribute AirTemp.
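The recursive attribute selection used above can be sketched as a small
ID3-style routine (Python, illustrative only). Note one simplification: it
breaks ties by attribute order rather than randomly, which selects Sky at
the root and therefore reproduces the Sky/Wind tree derived by hand:

```python
import math

# Same five training examples as above; attributes indexed 0..5.
EXAMPLES = [
    (("sunny", "warm", "normal", "strong", "warm", "same"),   "yes"),
    (("sunny", "warm", "high",   "strong", "warm", "same"),   "yes"),
    (("rainy", "cold", "high",   "strong", "warm", "change"), "no"),
    (("sunny", "warm", "high",   "strong", "cool", "change"), "yes"),
    (("sunny", "warm", "normal", "weak",   "warm", "same"),   "no"),
]
ATTRS = ["Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast"]

def entropy(examples):
    n = len(examples)
    pos = sum(1 for _, label in examples if label == "yes")
    return -sum(p * math.log2(p) for p in (pos / n, (n - pos) / n) if p > 0)

def gain(examples, i):
    remainder = 0.0
    for v in {x[i] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[i] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(examples) - remainder

def id3(examples, attr_indices):
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:      # pure node -> leaf labeled yes/no
        return labels[0]
    # max() keeps the first maximum, so the Sky/AirTemp/Wind tie at the
    # root is resolved in favor of Sky.
    best = max(attr_indices, key=lambda i: gain(examples, i))
    branches = {}
    for v in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == v]
        branches[v] = id3(subset, [a for a in attr_indices if a != best])
    return (ATTRS[best], branches)

print(id3(EXAMPLES, list(range(6))))
```

On this data the routine returns Sky at the root, the rainy branch as a NO
leaf, and Wind under sunny, exactly as in the tree above.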
If, however, we had picked Wind as the root node, the decision tree would
become

                Wind
               /    \
        strong/      \weak
             /        \
         3+,1-         NO

The examples with Wind = strong are:

   Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
   -------------------------------------------------------------
   sunny   warm     normal    strong  warm   same      yes
   sunny   warm     high      strong  warm   same      yes
   rainy   cold     high      strong  warm   change    no
   sunny   warm     high      strong  cool   change    yes

E(Strong) = -(3/4)lg(3/4) - (1/4)lg(1/4) = .81

Gain(Strong, Sky)      = .81 - ((3/4)E(Sky_sunny) + (1/4)E(Sky_rainy))
                       = .81 - (.75*0 + .25*0) = .81

Gain(Strong, AirTemp)  = .81 - ((3/4)E(AirTemp_warm) + (1/4)E(AirTemp_cold))
                       = .81 - (.75*0 + .25*0) = .81

Gain(Strong, Humidity) = .81 - ((3/4)E(Humidity_high) + (1/4)E(Humidity_normal))
                       = .81 - (.75*(-(2/3)lg(2/3) - (1/3)lg(1/3)) + .25*0)
                       = .81 - .69 = .12

Gain(Strong, Water)    = .81 - ((3/4)E(Water_warm) + (1/4)E(Water_cool))
                       = .81 - (.75*(-(2/3)lg(2/3) - (1/3)lg(1/3)) + .25*0)
                       = .81 - .69 = .12

Gain(Strong, Forecast) = .81 - ((1/2)E(Forecast_same) + (1/2)E(Forecast_change))
                       = .81 - (.5*0 + .5*(-(1/2)lg(1/2) - (1/2)lg(1/2)))
                       = .81 - .5 = .31

Here, the attributes with the largest gain are Sky and AirTemp, so the
decision tree becomes

                Wind
               /    \
        strong/      \weak
             /        \
          Sky          NO
          /  \
    sunny/    \rainy
        /      \
     YES        NO

or

                Wind
               /    \
        strong/      \weak
             /        \
       AirTemp         NO
          /  \
     warm/    \cold
        /      \
     YES        NO

2. Given the data used for problem 1, construct a neural net to classify
examples in terms of the EnjoySport concept. Assume that there are two
hidden nodes. Show the structure of the neural network with the number of
input nodes and output nodes. Assign each edge a weight of 0.5 and show
what the output of the network would be for the first example in the
dataset. For simplicity, use the step (perceptron) function instead of the
sigmoid rule and assume the threshold value is 0.

Solution:

The number of nodes in the input layer would be six, corresponding to the
six observable attributes.
All six input nodes would be connected to each of the two nodes in the
hidden layer, each edge with a weight of 0.5. The two hidden nodes would
each have a directed edge (also with weight 0.5) to the single output node.
Because the threshold value is assumed to be 0, no separate threshold
(bias) node is needed, and the weighted sums below include only the six
attribute inputs.

Use the following interpretations of input and output values:

   input node 1 (Sky):        0 = sunny,  1 = rainy
   input node 2 (AirTemp):    0 = warm,   1 = cold
   input node 3 (Humidity):   0 = high,   1 = normal
   input node 4 (Wind):       0 = strong, 1 = weak
   input node 5 (Water):      0 = warm,   1 = cool
   input node 6 (Forecast):   0 = same,   1 = change
   output node (EnjoySport):  1 = yes,   -1 = no

Each node applies the step function: its value is 1 if its weighted sum of
inputs is greater than 0, and -1 otherwise.

For training example 1 (label yes):
   input1=0, input2=0, input3=1, input4=0, input5=0, input6=0
   hidden1: weighted sum of inputs = 0.5, value = 1
   hidden2: weighted sum of inputs = 0.5, value = 1
   output:  weighted sum of inputs = 1.0, value = 1   (matches the label)

For training example 2 (label yes):
   input1=0, input2=0, input3=0, input4=0, input5=0, input6=0
   hidden1: weighted sum of inputs = 0, value = -1
   hidden2: weighted sum of inputs = 0, value = -1
   output:  weighted sum of inputs = -1.0, value = -1 (does not match the label)

For training example 3 (label no):
   input1=1, input2=1, input3=0, input4=0, input5=0, input6=1
   hidden1: weighted sum of inputs = 1.5, value = 1
   hidden2: weighted sum of inputs = 1.5, value = 1
   output:  weighted sum of inputs = 1.0, value = 1   (does not match the label)

For training example 4 (label yes):
   input1=0, input2=0, input3=0, input4=0, input5=1, input6=1
   hidden1: weighted sum of inputs = 1.0, value = 1
   hidden2: weighted sum of inputs = 1.0, value = 1
   output:  weighted sum of inputs = 1.0, value = 1   (matches the label)

For training example 5 (label no):
   input1=0, input2=0, input3=1, input4=1, input5=0, input6=0
   hidden1: weighted sum of inputs = 1.0, value = 1
   hidden2: weighted sum of inputs = 1.0, value = 1
   output:  weighted sum of inputs = 1.0, value = 1   (does not match the label)
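The forward-pass arithmetic above can be checked with a short sketch
(Python, illustrative only), using the stated 0/1 encodings, a weight of
0.5 on every edge, and a step function that outputs -1 when the weighted
sum is not positive:

```python
# Attribute encodings as given in the solution above.
ENC = [
    {"sunny": 0, "rainy": 1},    # input node 1: Sky
    {"warm": 0, "cold": 1},      # input node 2: AirTemp
    {"high": 0, "normal": 1},    # input node 3: Humidity
    {"strong": 0, "weak": 1},    # input node 4: Wind
    {"warm": 0, "cool": 1},      # input node 5: Water
    {"same": 0, "change": 1},    # input node 6: Forecast
]
W = 0.5  # weight on every edge

def step(s):
    """Step (perceptron) activation with threshold 0."""
    return 1 if s > 0 else -1

def forward(attrs):
    x = [ENC[i][v] for i, v in enumerate(attrs)]
    h = step(W * sum(x))         # both hidden nodes see identical inputs,
    return step(W * h + W * h)   # so they always take the same value

EXAMPLES = [
    (("sunny", "warm", "normal", "strong", "warm", "same"),   "yes"),
    (("sunny", "warm", "high",   "strong", "warm", "same"),   "yes"),
    (("rainy", "cold", "high",   "strong", "warm", "change"), "no"),
    (("sunny", "warm", "high",   "strong", "cool", "change"), "yes"),
    (("sunny", "warm", "normal", "weak",   "warm", "same"),   "no"),
]
for attrs, label in EXAMPLES:
    print(f"output = {forward(attrs):2d}, label = {label}")
```

Because the two hidden nodes have identical weights, they always agree, and
the network outputs 1, -1, 1, 1, 1 on the five examples, matching the
values computed by hand.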