Russell and Norvig, Chapter 16: Making Simple Decisions

16.1 Combining Beliefs and Desires Under Uncertainty
- maximum expected utility
  - a rational agent should choose an action that maximizes the agent's expected utility
  - expected utility EU(A|E) of action A given evidence E
    - EU(A|E) = Sum_i P(Result_i(A) | E, Do(A)) * U(Result_i(A))
    - Result_i(A) are the possible outcome states after executing action A
    - U(S) is the agent's utility for state S
    - Do(A) is the proposition that action A is executed in the current state
- "solves" AI, but only as a framework
  - knowing the current state requires perception, learning, knowledge representation (KR) and inference
  - P(Result_i(A) | E, Do(A)) requires a complete causal model of the world
  - U(Result_i(A)) requires search and planning, since the utility of a state depends on where we can get from that state
- if the utility function matches the performance measure, the agent will maximize performance
- one-shot vs. sequential decisions (Ch17)

16.2 Basis of Utility Theory
- any set of preferences can be captured by a utility function if certain constraints are satisfied
- lottery L = [p1,S1; p2,S2; ...; pn,Sn]
  - pi is the probability of possible outcome Si
  - Si can be another lottery
- given that lotteries satisfy the axioms of utility theory [p474]:
  - utility principle
    - U(A) > U(B) <=> A preferred to B
    - U(A) = U(B) <=> agent indifferent between A and B
  - maximum expected utility principle
    - U([p1,S1;...;pn,Sn]) = Sum_i p_i * U(S_i)

16.3 Utility Functions (e.g., money)
- example
  - possible outcomes
    - [1.0,$1000; 0.0,$0] (decline: a sure $1000)
    - [0.5,$3000; 0.5,$0] (accept: the gamble)
  - expected monetary value (EMV)
    - $1000 vs $1500
  - but the choice depends on current wealth k (S_k = state of having $k)
    - EU(accept) = 0.5*U(S_k) + 0.5*U(S_k+3000)
    - EU(decline) = U(S_k+1000)
    - will decline for some utility functions U, accept for others
- studies have found the utility of money to be roughly proportional to the logarithm of the amount: U(S_k+n) ~ log_2 n [Fig16.1, p477]
  - risk-averse agents in positive part of curve
  - risk-seeking in negative part of curve
- assigning utilities (i.e., the utility of outcome S)
  - choose best S_b and worst S_w possible outcomes
  - U(S_b) = 1, U(S_w) = 0 (normalization)
  - determine p when agent indifferent between S and [p,S_b; 1-p,S_w]
  - U(S) = p

16.4 Multi-Attribute Utility Functions
- attributes X1,X2,...
- strict dominance [Fig16.2, p481]
- stochastic dominance [Fig16.3, p482]
- preferences without uncertainty
  - mutual preferential independence (MPI) of attributes
  - if MPI, agent's preferred behavior can be found by maximizing
    - V(S) = Sum_i V_i(X_i(S))
    - V_i(X_i(S)) = (typically) w_i * X_i(S)
- preferences with uncertainty
  - mutual utility independence (MUI) of lottery attributes
  - if MUI, utility expressed as (for three attributes)
    - U = k1*U1 + k2*U2 + k3*U3 + k1*k2*U1*U2 + k1*k3*U1*U3 + k2*k3*U2*U3 + k1*k2*k3*U1*U2*U3

16.5 Decision Networks
- also called influence diagrams
- decision networks = belief networks + actions and utilities
- describes agent's:
  - current state
  - possible actions
  - state resulting from agent's action
  - utility of resulting state
- composed of [Fig16.4, p485]
  - chance node (oval)
    - random variable and CPT (i.e., same as belief network node)
  - decision node (rectangle)
    - can take on a value for each possible action
  - utility node (diamond)
    - parents are those chance nodes affecting utility
    - contains utility function mapping parents to utility value or lottery
- can remove outcome states [Fig16.5, p486]
  - utility node's parents are the current state and decision node
  - utility node's table is the expected utility of each action
  - less flexible to change in outcome CPTs
    - e.g., change in Noise = f(AirTraffic) not easy
- evaluating decision networks
  - set evidence variables according to current state
  - for each action value of decision node
    - set value of decision node to action
    - use belief-net inference to calculate posteriors for parents of utility node
    - calculate utility for action
  - return action with highest utility

16.6 Value of Information
- depends on
  - whether different outcomes make significant difference in action choice
  - likelihood of different outcomes
- value of perfect information (VPI)
  - see formulae on p488
  - alpha represents the best action
  - Ej is a random variable with values e_jk
- the value of information is
  - always non-negative: VPI_E(Ej) >= 0
  - not additive (in general): VPI_E(Ej,Ek) != VPI_E(Ej) + VPI_E(Ek)
  - order independent: VPI_E(Ej,Ek) = VPI_E(Ej) + VPI_{E,Ej}(Ek) = VPI_E(Ek) + VPI_{E,Ek}(Ej)
- information-gathering agent [Fig16.7, p490]
- sensible agent should
  - ask questions in reasonable order
  - avoid asking irrelevant questions
  - weigh importance of information against its cost
  - stop asking questions when appropriate
- agent is myopic
  - values each piece of information as if it were the only one that will ever be acquired
  - a greedy approach

16.7 Decision-Theoretic Expert Systems
- augmenting expert systems with decision theory
  - gives them ability to recommend actions in addition to factual conclusions

16.8 Summary
- decision theory = probability theory + utility theory
- rational agent chooses action maximizing expected utility
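
Worked example for 16.3 (money lottery). A minimal sketch of how the accept/decline choice can flip with current wealth k. The utility function here, log base 2 of total wealth, is an assumption chosen only to match the "roughly logarithmic" shape in the notes. The function names are hypothetical.

```python
import math

def utility(wealth):
    # ASSUMPTION: utility of total wealth is log2(wealth); illustrative only.
    return math.log2(wealth)

def eu_accept(k):
    # Gamble [0.5,$3000; 0.5,$0] taken at current wealth k.
    return 0.5 * utility(k) + 0.5 * utility(k + 3000)

def eu_decline(k):
    # Sure thing [1.0,$1000; 0.0,$0] taken at current wealth k.
    return utility(k + 1000)

def best_choice(k):
    return "accept" if eu_accept(k) > eu_decline(k) else "decline"

for k in (500, 10_000):
    print(k, round(eu_accept(k), 4), round(eu_decline(k), 4), best_choice(k))
```

Under this assumed utility the two options tie at k = $1000 in exact arithmetic: accepting wins exactly when k*(k+3000) > (k+1000)^2, i.e. when k > 1000. A poorer agent declines the gamble even though its EMV is higher, which is the risk-averse behavior described in 16.3.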
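
Worked example for 16.5 and 16.6 (evaluating a decision network, then VPI). A sketch on a made-up umbrella problem, not an example from the textbook: chance nodes Weather -> Forecast, a decision node Umbrella, and a utility node with parents Weather and Umbrella. All probabilities, utilities and names are hypothetical, and inference is done by brute-force enumeration rather than a general belief-net algorithm. For simplicity the decision does not influence any chance node.

```python
P_WEATHER = {"rain": 0.3, "sun": 0.7}                     # prior P(Weather)
P_FORECAST = {"rain": {"wet": 0.8, "dry": 0.2},           # P(Forecast | Weather)
              "sun":  {"wet": 0.1, "dry": 0.9}}
UTILITY = {("rain", "take"): 70, ("rain", "leave"): 0,    # U(Weather, Umbrella)
           ("sun", "take"): 20,  ("sun", "leave"): 100}
ACTIONS = ("take", "leave")

def posterior_weather(forecast=None):
    """P(Weather | Forecast=forecast); returns the prior if there is no evidence."""
    if forecast is None:
        return dict(P_WEATHER)
    joint = {w: P_WEATHER[w] * P_FORECAST[w][forecast] for w in P_WEATHER}
    z = sum(joint.values())
    return {w: p / z for w, p in joint.items()}

def expected_utility(action, forecast=None):
    # Posterior over the utility node's chance parent, weighted by utility.
    post = posterior_weather(forecast)
    return sum(post[w] * UTILITY[(w, action)] for w in post)

def best_action(forecast=None):
    # Try each value of the decision node, keep the one with highest EU.
    return max(ACTIONS, key=lambda a: expected_utility(a, forecast))

def vpi_forecast():
    """VPI_E(Ej) = Sum_k P(Ej=e_jk | E) * EU(alpha_ejk | E, Ej=e_jk) - EU(alpha | E)."""
    eu_now = expected_utility(best_action())
    eu_informed = 0.0
    for f in ("wet", "dry"):
        p_f = sum(P_WEATHER[w] * P_FORECAST[w][f] for w in P_WEATHER)
        eu_informed += p_f * expected_utility(best_action(f), f)
    return eu_informed - eu_now

print(best_action(), best_action("wet"), round(vpi_forecast(), 2))
```

With these made-up numbers the agent leaves the umbrella when it has no evidence (EU 70 vs 35), takes it after a "wet" forecast, and values the forecast at 11.2 utility units. An information-gathering agent in the sense of 16.6 would ask for the forecast only if it costs less than that.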
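
Worked example for 16.4 (multi-attribute utility). A small sketch of the two forms in the notes: the additive value function under MPI with the linear choice V_i = w_i * X_i, and the MUI expression. The MUI function generalizes the three-attribute formula by summing the product of k_i * U_i over every non-empty subset of attributes. That generalization and the function names are assumptions for illustration. For three attributes it expands to the seven terms listed in 16.4.

```python
from itertools import combinations
from math import prod

def additive_value(xs, weights):
    # V(S) = Sum_i w_i * X_i(S): the linear special case under MPI.
    return sum(w * x for w, x in zip(weights, xs))

def mui_utility(us, ks):
    # Sum over every non-empty subset of attributes of Prod_i (k_i * U_i).
    terms = [k * u for k, u in zip(ks, us)]
    return sum(prod(subset)
               for r in range(1, len(terms) + 1)
               for subset in combinations(terms, r))

# Hypothetical single-attribute utilities U1..U3 and scaling constants k1..k3.
us = (0.5, 0.2, 0.8)
ks = (0.3, 0.4, 0.1)
print(additive_value((2, 3), (0.5, 1.0)), mui_utility(us, ks))
```

Because every subset product appears exactly once, the same value can be computed as Prod_i (1 + k_i * U_i) - 1, which is a convenient sanity check on the expanded formula.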