Russell and Norvig, Chapter 15: Probabilistic Reasoning Systems

15.1 Representing Knowledge in an Uncertain Domain
- representation of uncertain knowledge from Chapter 14
  - complete joint probability distribution
  - conditional probabilities and Bayes' rule
  - assuming conditional independence
- belief networks
  - nodes represent random variables
  - directed link from X to Y implies that X "directly influences" Y
  - each node has a conditional probability table (CPT) quantifying the
    effects that the parents (sources of incoming links) have on the node
  - network is a DAG (no directed cycles)
- example belief network [Fig15.2, p439]
  - only represent what we know
  - the probabilities summarize what we don't know
    - reasons alarm doesn't go off
    - reasons John and Mary don't call

15.2 Semantics of Belief Networks
- network represents the joint probability distribution (JPD)
  - every entry of the JPD can be calculated from the network
  - P(X1=x1 ^ ... ^ Xn=xn) = P(x1,...,xn) = Prod(i=1..n) P(xi | Parents(Xi))
- how to construct the network?
  - network encodes conditional independence statements
  - chain rule: P(x1,...,xn) = Prod(i=1..n) P(xi | x(i-1),...,x1)
  - P(Xi | X(i-1),...,X1) = P(Xi | Parents(Xi))   (1)
    - provided Parents(Xi) is a subset of {X(i-1),...,X1}
    - i.e., order the nodes so that every parent precedes its children
      (roughly, breadth first from the root nodes)
  - those X(i-1),...,X1 not in Parents(Xi) are conditionally independent
    of Xi given Parents(Xi)
  - conditional independence implies no direct influence
  - e.g., MaryCalls and Earthquake are conditionally independent given
    Alarm; MaryCalls and Alarm are not
- network construction
  - algorithm
      choose variables (nodes) Xi
      order variables
      while variables left
        pick the next variable Xi and add a node for it
        select Parents(Xi) such that conditional independence (1) is satisfied
        define CPT for Xi
  - by construction, the designer is unable to violate the axioms of probability
  - order variables from causes to effects (causal model)
  - may not need to specify each entry of CPT
    - e.g., node represents disjunction of parents (add parents' probs.)
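
The product formula in Section 15.2 can be made concrete with a short sketch. It assumes the burglary network of Fig15.2; the notes do not reproduce the CPTs, so the numbers below are the commonly quoted values for that example and should be treated as an assumption. The names PARENTS, CPT, prob, and joint are hypothetical, chosen only for this illustration.

```python
# Hypothetical encoding of the burglary network. The CPT numbers are the
# commonly quoted textbook values, assumed here for illustration only.
PARENTS = {
    "Burglary": (),
    "Earthquake": (),
    "Alarm": ("Burglary", "Earthquake"),
    "JohnCalls": ("Alarm",),
    "MaryCalls": ("Alarm",),
}

# CPT[X][parent values] = P(X = True | parents); P(X = False | ...) is 1 - that.
CPT = {
    "Burglary": {(): 0.001},
    "Earthquake": {(): 0.002},
    "Alarm": {(True, True): 0.95, (True, False): 0.94,
              (False, True): 0.29, (False, False): 0.001},
    "JohnCalls": {(True,): 0.90, (False,): 0.05},
    "MaryCalls": {(True,): 0.70, (False,): 0.01},
}


def prob(var, value, assignment):
    """P(var = value | values of var's parents in the full assignment)."""
    parent_values = tuple(assignment[p] for p in PARENTS[var])
    p_true = CPT[var][parent_values]
    return p_true if value else 1.0 - p_true


def joint(assignment):
    """One JPD entry: the product over all nodes of P(xi | Parents(Xi))."""
    result = 1.0
    for var, value in assignment.items():
        result *= prob(var, value, assignment)
    return result


# P(JohnCalls ^ MaryCalls ^ Alarm ^ not Burglary ^ not Earthquake)
entry = joint({"Burglary": False, "Earthquake": False, "Alarm": True,
               "JohnCalls": True, "MaryCalls": True})
print(f"{entry:.6f}")  # prints 0.000628
```

Ten numbers (one per CPT row) are enough to recover all 32 entries of the joint distribution here, which is the practical payoff of the conditional independence assumptions.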
- conditional independence relations
  - are X and Y conditionally independent given evidence E?
  - depends on whether E "blocks" every path from X to Y
  - graphical depiction [Fig15.4, p445], example [Fig15.5, p446]
  - needed for inference

15.3 Inference in Belief Networks
- given network, compute P(Query|Evidence)
  - evidence obtained from percepts
- possible inferences
  - diagnostic: P(Burglary|JohnCalls)
  - causal: P(JohnCalls|Burglary)
  - intercausal: P(Burglary|Alarm ^ Earthquake)
  - mixed: combinations of the above
- algorithm for polytrees [Fig 15.7, p449; Fig15.8, p452]
  - polytree: at most one undirected path between any two nodes
  - example: P(Burglary|JohnCalls) = 0.016

15.4 Inference in Multiply-Connected Belief Networks
- clustering [Fig15.9-15.10, p454]
  - convert to polytree by combining nodes on alternative paths into a meganode
  - use polytree algorithm
  - queries on clustered variables averaged over values of others
  - CPTs grow exponentially in size
- cutset conditioning [Fig15.11, p455]
  - form multiple polytrees
    - choose cutset
    - form a polytree for each instantiation of variables in cutset
  - average results, weighting each polytree by the probability of its
    instantiation
  - exponential number of polytrees
  - bounded cutset conditioning
    - only use a subset of polytrees
- stochastic simulation
  - logic sampling
    - randomly choose values of root-node variables according to their priors
    - continue with next layer, using CPTs
    - iterate for a number of trials
    - P(Q|E) = freq(Q and E) / freq(E) in trials
    - difficult for rare values of E (few trials match the evidence),
      which motivates likelihood weighting
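
The logic sampling steps above can be sketched as follows. This is a minimal illustration, not the textbook's code: it assumes the burglary network with the commonly quoted CPT values (the notes do not list them), and the names NETWORK, sample_world, and logic_sampling are hypothetical. The query is the diagnostic example from 15.3, whose exact answer is quoted there as 0.016.

```python
import random

# Hypothetical encoding: (variable, parents, CPT), listed parents-first so
# that each "layer" is sampled after its parents. CPT values are assumed.
NETWORK = [
    ("Burglary", (), {(): 0.001}),
    ("Earthquake", (), {(): 0.002}),
    ("Alarm", ("Burglary", "Earthquake"),
     {(True, True): 0.95, (True, False): 0.94,
      (False, True): 0.29, (False, False): 0.001}),
    ("JohnCalls", ("Alarm",), {(True,): 0.90, (False,): 0.05}),
    ("MaryCalls", ("Alarm",), {(True,): 0.70, (False,): 0.01}),
]


def sample_world(rng):
    """Draw one complete assignment: roots from priors, then layer by layer."""
    world = {}
    for var, parents, cpt in NETWORK:
        p_true = cpt[tuple(world[p] for p in parents)]
        world[var] = rng.random() < p_true
    return world


def logic_sampling(query, evidence, trials, rng):
    """Estimate P(query=True | evidence) as freq(Q and E) / freq(E)."""
    n_e = 0    # trials that agree with the evidence
    n_qe = 0   # of those, trials where the query is also true
    for _ in range(trials):
        world = sample_world(rng)
        if all(world[v] == val for v, val in evidence.items()):
            n_e += 1
            if world[query]:
                n_qe += 1
    estimate = n_qe / n_e if n_e else None
    return estimate, n_e


rng = random.Random(0)  # fixed seed so repeated runs give the same estimate
estimate, matched = logic_sampling("Burglary", {"JohnCalls": True}, 200_000, rng)
print(estimate, matched)
```

Under these assumed CPTs only about 5% of the trials agree with JohnCalls = True, so most of the sampling effort is discarded; with rarer evidence the fraction of usable trials shrinks further.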
  - likelihood weighting
    - fix evidence variables to their given values, instead of sampling them
    - randomly choose values for the non-evidence variables
    - use CPTs to compute the likelihood of each evidence value given its parents
    - weight the trial's query value by the product of these likelihoods
    - iterate (choosing different random values)
    - compute P(Q|E) using frequencies weighted by likelihoods
    - example in text [p456]

15.5 Knowledge Engineering for Uncertain Reasoning
- decide what to talk about
- decide on vocabulary of random variables
  - discretize continuous variables
- encode knowledge about variable dependencies
  - links and CPTs
- encode case description
- pose queries
- case study: PathFinder IV
  - lymph node diseases
  - almost 90% accurate
  - better than the experts involved in design

15.6 Other Approaches to Uncertain Reasoning
- default reasoning
  - statements assumed true in the absence of contradictory evidence
- rule-based methods
  - locality
    - if A=>B and A, then conclude B without considering other evidence
  - detachment
    - once B is concluded, it is used regardless of how it was derived
      (despite its dependence on A)
  - these allow more efficiency
  - but are not appropriate for uncertain reasoning
  - add certainty factors (MYCIN)
    - but can lead to incorrect inferences
- representing ignorance
  - Dempster-Shafer theory
    - computes belief function
      - probability that the evidence supports the proposition
      - not the probability of the proposition itself
- representing vagueness
  - fuzzy sets and fuzzy logic
    - propositions are no longer true or false, but have a degree of truth
      between 0 and 1
    - evaluation of sentences similar to certainty factors
    - fuzzy logic addresses vagueness of meaning, which is not uncertainty
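
The likelihood weighting steps from Section 15.4 can be sketched as follows. As before, this is only an illustration under stated assumptions: the burglary network with the commonly quoted CPT values (not given in these notes), and hypothetical names NETWORK, weighted_sample, and likelihood_weighting. The text's own example [p456] may use a different network.

```python
import random

# Hypothetical encoding: (variable, parents, CPT), listed parents-first.
# CPT values are the commonly quoted ones and are an assumption here.
NETWORK = [
    ("Burglary", (), {(): 0.001}),
    ("Earthquake", (), {(): 0.002}),
    ("Alarm", ("Burglary", "Earthquake"),
     {(True, True): 0.95, (True, False): 0.94,
      (False, True): 0.29, (False, False): 0.001}),
    ("JohnCalls", ("Alarm",), {(True,): 0.90, (False,): 0.05}),
    ("MaryCalls", ("Alarm",), {(True,): 0.70, (False,): 0.01}),
]


def weighted_sample(evidence, rng):
    """One trial: evidence is fixed, other variables are sampled from CPTs.

    The returned weight is the product of P(evidence value | parents)
    over all evidence variables.
    """
    world, weight = {}, 1.0
    for var, parents, cpt in NETWORK:
        p_true = cpt[tuple(world[p] for p in parents)]
        if var in evidence:
            world[var] = evidence[var]
            weight *= p_true if evidence[var] else 1.0 - p_true
        else:
            world[var] = rng.random() < p_true
    return world, weight


def likelihood_weighting(query, evidence, trials, rng):
    """Estimate P(query=True | evidence) from likelihood-weighted counts."""
    w_query = 0.0   # total weight of trials where the query is true
    w_total = 0.0   # total weight of all trials
    for _ in range(trials):
        world, weight = weighted_sample(evidence, rng)
        w_total += weight
        if world[query]:
            w_query += weight
    return w_query / w_total


rng = random.Random(0)  # fixed seed so repeated runs give the same estimate
estimate = likelihood_weighting(
    "Burglary", {"JohnCalls": True, "MaryCalls": True}, 200_000, rng)
print(estimate)
```

Every trial now contributes to the estimate, since none is rejected for disagreeing with the evidence. Under the assumed CPTs the exact posterior is about 0.284; the estimate is still noisy for this query because Burglary itself is rarely sampled as true.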