Russell and Norvig, Chapter 5: Game Playing

5.1 Games as Search Problems
- well-defined, accessible environments
- differ from previously-discussed search problems
  - presence of opponent
  - uncertainty of opponent's actions (contingency problem)
  - huge search spaces
    - chess: 10^40 unique states, typical search space ~35^100
  - time limits

5.2 Perfect Decisions in Two-Person Games
- can search to terminal nodes (end of game)
- players are MAX and MIN
- MAX chooses a move based on maximizing the outcome given the possible, rational responses of MIN (see example in Fig5.2)
- MiniMax algorithm [Fig5.3]

5.3 Imperfect Decisions
- assumes search space too big to reach terminal nodes
- cutoff search at some point and apply heuristic evaluation to leaves
- weighted linear function typically used for heuristic evaluation
  - learning weights? features?
- where to cutoff
  - fixed depth
  - iterative-deepening until time runs out
  - further expand non-quiescent nodes
  - horizon problem

5.4 Alpha-Beta Pruning [Fig5.6]
- do not expand nodes that cannot possibly be better (worse)
  - MAX (alpha) node value less than the highest so far
  - MIN (beta) node value greater than the lowest so far
- in best case, alpha-beta allows twice-as-deep search
  - operator ordering based on knowledge or experience

5.5 Games That Include an Element of Chance
- chance nodes (in addition to min/max nodes) [Fig5.10]
  - one for each possible outcome (e.g., dice rolls) with associated probability
- calculate expected value (expectimax and expectimin)
  - replaces MiniMax-Value in MiniMax algorithm
- absolute differences in evaluation function can affect move choice [Fig5.11]
- pruning possible if values are bounded

5.6 State-of-the-Art Game Programs
- chess
  - Deep Thought 2 (1993)
    - rated among top 100 players
    - 1/2 billion positions per move, depths to 11
  - Deep Blue (1995?)
    - 100-200 billion positions per move, depths to 14
- checkers
  - Samuel's checkers program (1952)
    - learned evaluation function
  - Chinook (1992), current world champion
    - alpha-beta pruning with moderate evaluation function
    - exhaustive over knowledge-based approach
    - hard-coded end-game move library
- othello
- backgammon
  - BKG (1980)
  - Tesauro's system (1992)
    - neural net learning, Samuel-style
    - among top 3 players
- go
  - $2M prize to computer beating top-level player

5.7 Discussion: Improvements to the Game-Playing Approach
- use probability distributions over possible values instead of the raw values to ensure significance of value differences
- consider the utility of a node expansion, even if legal
- combine goal-directed behavior and search
  - e.g., get the queen
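
Example for 5.3 (weighted linear evaluation). The sketch below illustrates what a weighted linear heuristic can look like, using chess material as the only feature family. The weights are the conventional textbook material values (pawn 1, knight and bishop 3, rook 5, queen 9). The function name `material_eval` and the string encoding of a position (uppercase letters for MAX's pieces, lowercase for MIN's) are assumptions made for illustration, not something these notes specify.

```python
# Conventional material values; kings are deliberately left out (weight 0).
PIECE_WEIGHTS = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


def material_eval(pieces, weights=PIECE_WEIGHTS):
    """Weighted linear evaluation: sum of w_i * (MAX count_i - MIN count_i).

    `pieces` is a hypothetical position encoding: one letter per piece,
    uppercase for MAX's pieces and lowercase for MIN's.
    """
    score = 0
    for ch in pieces:
        w = weights.get(ch.upper(), 0)  # unknown letters (e.g. K) count as 0
        score += w if ch.isupper() else -w
    return score


# MAX has queen, rook, two pawns (16); MIN has two rooks and a pawn (11).
advantage = material_eval("KQRPPkrrp")
```

A real evaluation function would add further weighted features (mobility, king safety, and so on); learning the weights, as the notes ask, would mean fitting the entries of `PIECE_WEIGHTS` rather than fixing them by hand.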
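
Example for 5.2 to 5.4 (minimax with a depth cutoff and alpha-beta pruning). This is a minimal sketch, not the book's pseudocode: the game tree is a nested Python list with illustrative leaf values, and the helper names `children` and `evaluate` are assumptions introduced here. `evaluate` stands in for the heuristic evaluation applied at the cutoff and also records which nodes were actually examined, so the effect of pruning is visible.

```python
import math


def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Depth-limited minimax value of `node` with alpha-beta pruning."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)  # cutoff or terminal node
    if maximizing:
        best = -math.inf
        for kid in kids:
            best = max(best, alphabeta(kid, depth - 1, alpha, beta,
                                       False, children, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:  # MIN above will never allow this line
                break
        return best
    best = math.inf
    for kid in kids:
        best = min(best, alphabeta(kid, depth - 1, alpha, beta,
                                   True, children, evaluate))
        beta = min(beta, best)
        if alpha >= beta:  # MAX above already has something at least as good
            break
    return best


# Illustrative two-ply tree: MAX to move, three MIN nodes, nine leaves.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []


def children(node):
    return node if isinstance(node, list) else []


def evaluate(node):
    visited.append(node)
    # Leaves carry their own value; 0 is a placeholder heuristic for any
    # interior node reached at the depth cutoff.
    return node if not isinstance(node, list) else 0


root_value = alphabeta(TREE, 2, -math.inf, math.inf, True, children, evaluate)
```

On this tree the second MIN node is abandoned after its first leaf, because that leaf is already worse for MAX than the value found under the first MIN node. Trying the strongest moves first (the operator ordering mentioned in 5.4) is what makes such cutoffs happen early.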
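
Example for 5.5 (expected values at chance nodes). The sketch below replaces the plain minimax value with an expectation at chance nodes. The tuple encoding of nodes and the names `expectiminimax` and `best_move` are assumptions for illustration, and the leaf values and probabilities are made up. The two trees use leaf values in the same relative order (1 < 2 < 3 < 4 versus 1 < 20 < 30 < 400), which is enough to show that the magnitudes of evaluation values, not just their ordering, can change the chosen move.

```python
def expectiminimax(node):
    """Value of a node; leaves are numbers, interior nodes are (kind, branches)."""
    if isinstance(node, (int, float)):
        return node
    kind, branches = node
    if kind == "max":
        return max(expectiminimax(b) for b in branches)
    if kind == "min":
        return min(expectiminimax(b) for b in branches)
    if kind == "chance":
        # branches are (probability, child) pairs; take the expected value
        return sum(p * expectiminimax(b) for p, b in branches)
    raise ValueError(f"unknown node kind: {kind!r}")


def best_move(root):
    """Index of MAX's best move at the root, plus the value of each move."""
    _, branches = root
    values = [expectiminimax(b) for b in branches]
    return max(range(len(values)), key=values.__getitem__), values


# Same tree shape and same ordering of leaf values, different magnitudes.
small = ("max", [("chance", [(0.9, 2), (0.1, 3)]),
                 ("chance", [(0.9, 1), (0.1, 4)])])
scaled = ("max", [("chance", [(0.9, 20), (0.1, 30)]),
                  ("chance", [(0.9, 1), (0.1, 400)])])

move_small, values_small = best_move(small)
move_scaled, values_scaled = best_move(scaled)
```

With the small values the first move has the higher expectation; with the rescaled values the second move does, even though no leaf changed rank.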