Russell and Norvig, Chapter 21: Knowledge in Learning

21.1 Knowledge in Learning
  - find Hypothesis satisfying entailment constraint
    - Hypothesis & Descriptions |= Classifications
  - cumulative learning process [Fig21.1, p626]
  - explanation-based learning
    - Hypothesis & Descriptions |= Classifications
    - Background |= Hypothesis
  - relevance-based learning
    - Hypothesis & Descriptions |= Classifications
    - Background & Descriptions & Classifications |= Hypothesis
  - inductive logic programming
    - Background & Hypothesis & Descriptions |= Classifications

21.2 Explanation-Based Learning (EBL)
  - Look what Zog (Thag) do! [Far Side cartoon]
  - generalize a proof of a query
    - identify necessary conditions
  - process [Fig21.2, p631]
    - given example, use BK to construct proof of goal predicate
      - e.g., simplify(1*(0+X),X)
    - construct generalized proof tree in parallel
      - same inferences using variablized goal
    - construct new rule
      - head is variablized goal, given bindings
      - body is leaves of proof tree, given bindings
    - drop conditions always true
      - e.g., ArithmeticUnknown(z) => Simplify(1*(0+z),z)
  - implementation in Prolog [Bratko, Fig20.3-4, p541-544]
  - improving efficiency
    - learn more general rules
      - which ones?
    - operationality
      - don't include simple proofs in rule body
    - utility problem
      - retain only rules having positive utility
    - learning control rules

21.3 Learning Using Relevance Information
  - relevance-based learning (RBL)
  - determinations
    - constrain the predicates used to construct hypotheses
    - e.g., forall(X) nationality(X,usa) => language(X,english)
      - i.e., nationality determines language
  - finding determinations [Fig21.3, p635]
    - find smallest subset of attributes such that examples agreeing on those attributes also agree on class value
    - can be used to filter attributes for IDT (RBDTL) [bottom, p635]
  - learning curves [Fig21.4, p636]

21.4 Inductive Logic Programming (ILP)
  - attribute-based learning algorithms cannot learn relations
  - background knowledge allows simpler hypotheses
    - e.g., having 'parent' instead of just 'father' and 'mother' to describe 'grandparent' relation
  - inverse resolution
    - given correct Hypothesis, resolution can derive Classifications
      - resolve(C1,C2) -> C
    - invert resolution to derive Hypothesis [Fig21.6, p639]
      - invert_resolve(C) -> C1, C2
      - invert_resolve(C,C1) -> C2
    - typically infinite number of candidate parent clauses C1, C2
      - restrict to Horn clauses, eliminate functions, etc.
    - generate new predicates
      - C1 and C2 contain a new predicate that is resolved away
      - e.g., [Fig21.7, p641]
  - top-down learning methods
    - similar to decision-tree learning
    - given examples and BK, start with most general clause
      - e.g., gpa(X,Y).
    - try adding one literal
      - e.g., gpa(X,Y) :- father(X,Y).
              gpa(X,Y) :- parent(X,Z).   % assuming parent in BK
              gpa(X,Y) :- father(X,Z).
    - keep clause describing most positive and fewest negative examples
      - e.g., gpa(X,Y) :- father(X,Z).
    - continue adding literals until clause consistent
      - e.g., gpa(X,Y) :- father(X,Z), parent(Z,Y).
    - remove positive examples covered
    - repeat to find more clauses until all positives covered
    - FOIL [Fig21.8, p643]
      - New-Literals
        - any literal from BK (negated or unnegated) including target
          - arguments must contain a variable used earlier in clause
        - X \== Y or X =< Y, where X and Y appear earlier in clause
          - X and Y can also be constants
        - constrained by type information
      - Choose-Literal
        - heuristic measure based on information gain
      - clauses pruned based on Ockham bias
        - when clause description length exceeds that of covered examples
      - solved many list exercises from Bratko's book
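
The top-down loop outlined above (start with the most general clause, add the best-scoring literal, stop when no negatives are covered, remove the covered positives, repeat) can be sketched in Python. This is a simplified illustration, not FOIL as given in Fig21.8. The family facts, the restriction to one extra variable Z, the omission of negated and comparison literals, and a gain computed over examples rather than variable bindings are all assumptions made to keep the sketch short.

```python
from itertools import product
from math import log2

# Hypothetical background knowledge: a small family tree (not from the notes).
FATHER = {("tom", "bob"), ("tom", "liz"), ("bob", "ann"), ("bob", "pat")}
MOTHER = {("pam", "bob"), ("pat", "jim")}
BK = {"father": FATHER, "mother": MOTHER, "parent": FATHER | MOTHER}
PEOPLE = sorted({p for pair in BK["parent"] for p in pair})

# Target relation gpa(X,Y): X is a grandfather of Y.
POSITIVES = {("tom", "ann"), ("tom", "pat"), ("bob", "jim")}
NEGATIVES = set(product(PEOPLE, repeat=2)) - POSITIVES

VARIABLES = ("X", "Y", "Z")  # head variables plus one extra variable (assumption)


def covers(body, example):
    """True if some binding of Z makes every body literal a BK fact."""
    x, y = example
    for z in PEOPLE:
        env = {"X": x, "Y": y, "Z": z}
        if all((env[a], env[b]) in BK[pred] for pred, (a, b) in body):
            return True
    return False


def new_literals(body):
    """Candidate literals: BK predicates over variables, at least one already used."""
    used = {"X", "Y"} | {v for _, args in body for v in args}
    for pred in BK:
        for a, b in product(VARIABLES, repeat=2):
            lit = (pred, (a, b))
            if a != b and (a in used or b in used) and lit not in body:
                yield lit


def gain(p0, n0, p1, n1):
    """Example-based simplification of an information-gain heuristic."""
    if p1 == 0:
        return float("-inf")
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))


def learn_clause(pos, neg):
    """Greedily add literals until the clause covers no negative example."""
    body, pos, neg = [], set(pos), set(neg)
    while neg:
        scored = []
        for lit in new_literals(body):
            p1 = {e for e in pos if covers(body + [lit], e)}
            n1 = {e for e in neg if covers(body + [lit], e)}
            scored.append((gain(len(pos), len(neg), len(p1), len(n1)), lit, p1, n1))
        best_gain, lit, p1, n1 = max(scored, key=lambda s: s[0])
        if best_gain <= 0:  # no literal helps; give up on this clause
            break
        body.append(lit)
        pos, neg = p1, n1
    return body, pos


def learn(pos, neg):
    """Learn clauses until every positive example is covered."""
    clauses, remaining = [], set(pos)
    while remaining:
        body, covered = learn_clause(remaining, neg)
        if not covered:
            break
        clauses.append(body)
        remaining -= covered
    return clauses


def show(body):
    """Render a clause body in the Prolog notation used in these notes."""
    return "gpa(X,Y) :- " + ", ".join(f"{p}({a},{b})" for p, (a, b) in body) + "."


if __name__ == "__main__":
    for clause in learn(POSITIVES, NEGATIVES):
        print(show(clause))  # prints: gpa(X,Y) :- father(X,Z), parent(Z,Y).
```

On this toy data the first literal chosen is father(X,Z), which keeps all three positives while cutting the covered negatives from 46 to 11. Adding parent(Z,Y) then removes the remaining negatives, mirroring the gpa example in the outline.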