J. Forbes, T. Huang, K. Kanazawa and S. Russell. "The BATmobile: Towards a Bayesian Automated Taxi," IJCAI-95, pages 1878-1885.

1 The BAT Project
  - under Intelligent Vehicle and Highway Systems (IVHS) project
    - reduce traffic congestion and accidents
    - IVHS funds California's Partners for Advanced Transit and Highways (PATH)
      - groups of closely-spaced vehicles improve highway capacity
    - PATH funds BAT, although BAT focuses on individual vehicle autonomy
  - BAT encompasses most of AI
    - low-level problems maturing
      - visual vehicle monitoring
      - visual lane following
    - knowledge representation, uncertainty, planning, learning, vision, speech and natural language processing, ...
  - environment is partially accessible, nondeterministic, nonepisodic, dynamic, continuous
    - input is a 3D rendering of a traffic situation
      - noisy, possibly failing, sensors
    - output is high-level actions like braking, accelerating, lane changing
    - maximum error rate of 1 per 100,000,000 seconds (about 3.17 years)
  - model driving problem as Partially Observable Markov Decision Process (POMDP)
    - optimal decision is a function of the current belief state only
    - current belief state can incorporate previous percepts

2 Maintaining the Current Belief State
  - state
    - BAT's and neighboring vehicles' position, velocity and intentions
    - road and weather conditions
  - only partially observable
    - noisy, possibly failing, sensors

2.1 Dynamic Probabilistic Networks (DPNs)
  - probabilistic networks (see Chap 15 R&N)
    - also called Belief Networks
    - directed, acyclic graph
    - nodes are random variables, each with a conditional probability table (CPT) based on its parents
    - edges denote causal dependence
    - succinctly represents an exponentially large joint probability distribution
  - dynamic PNs (see figure 2)
    - partially observable Markov process
      - P(S_{t+1} | S_t, S_{t-1}, ...) = P(S_{t+1} | S_t)
      - next state does not depend on earlier states once the current state is given
      - but past percepts influence the belief about the current state
    - state evolution and sensor models

2.2 Efficient Updating
  - update DPN by removing past time slices
  - Temporally Invariant Networks (see figure 3)
    - temporally variant networks require the addition of new arcs
    - invariant networks have a node deletion ordering that avoids new arcs
  - Stochastic Simulation in DPNs
    - generate samples of variable assignments biased by observed evidence
      - logic sampling uses no evidence bias
      - likelihood weighting biases samples based on CPTs
    - use samples to determine desired probabilities
    - still generates many samples far from evidence
      - don't allow unobserved parents of evidence variables
      - use likelihood weightings to choose only likely samples

2.3 Network Structure (see figure 5)
  - each vehicle represented by a separate DPN
    - note nodes linking in other vehicles
  - alternative of putting all vehicles in one network is too expensive

2.4 Sensor Models and Sensor Failures (see figure 6)
  - high variance in sensor values, or no values at all, indicates degradation or failure (SensorStatus)
  - linking SensorStatus_i nodes across time slices allows poor status to persist

3 Decision Making in the BAT
  - POMDP decision problem is hard
  - BAT uses approximate solutions (see 3.1, 3.2 and 3.3 below)

3.1 Dynamic Decision Networks (see figure 7)
  - add nodes for actions and utilities to DPNs
  - since partially observable, must maximize over unobserved evidence as well as actions
  - inefficiency constrains this approach to offline processing of training examples to generate an efficient state/action policy

3.2 Decision Tree Policy Representation
  - hand-constructed decision trees (e.g., see figure 8)
  - tests are actually sets of probability thresholds on variables

3.3 POMDP Policy Learning
  - reinforcement learning to learn action-value function Q(a,i)
    - expected utility of taking an action in a state
    - mapping of time-varying CPTs and evidence to action-value

4 Scenarios and Results
  - traffic scenarios drive a time-step simulator
    - simulator computes trajectories for all vehicles
    - vehicle percepts communicated through sensors (possibly noisy)
      - not via 3D images as indicated in Section 1
  - only one BAT in traffic
    - a single 300-node, highly connected network is already slow
  - BAT's goal is to maintain a target speed and target lane
  - five situations
    - passing a slower car (figure 9)
    - reacting to unsafe drivers (figure 10)
    - avoiding a stalled car (figure 11)
    - aborting a lane change (figure 12)
    - merging into traffic (figure 13)

5 Summary and Future Work
  - benefits
    - DPNs handle the noise and partial observability of the driving problem
    - one DPN per vehicle works
    - handles sensor degradation and failure
    - real-time updating using temporally invariant networks and stochastic simulation
    - decision-tree-based policies work
  - future
    - learning DPNs from time-series data
      - vision system -> learning models from videotape of human drivers
    - continuous variables in the DPN
    - appropriate utility functions
    - validation
      - real-world response from videotape

Automated Taxi?
  - grand challenge
  - taxis in "Total Recall"
  - possible?
  - guaranteed control versus rational autonomy
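
Sketch for Section 2 (maintaining the current belief state): the notes say the belief state can absorb all previous percepts, so past time slices can be dropped. The minimal Python example below illustrates that idea with exact predict-then-update filtering on a two-state model. It is not the paper's network: the states, the brake-light percept, and every probability are hypothetical values invented for illustration, and the function names `filter_step` and `run_filter` are assumptions of this sketch.

```python
# Hypothetical two-state model of a neighboring vehicle.
# All names and numbers here are invented for illustration only.
STATES = ("cruising", "braking")

# State evolution model P(S_{t+1} | S_t): rows are current states.
TRANSITION = {
    "cruising": {"cruising": 0.9, "braking": 0.1},
    "braking": {"cruising": 0.3, "braking": 0.7},
}

# Sensor model P(percept | S_t): a noisy brake-light detector.
SENSOR = {
    "cruising": {"lights_on": 0.2, "lights_off": 0.8},
    "braking": {"lights_on": 0.9, "lights_off": 0.1},
}


def filter_step(belief, percept):
    """Advance the belief by one time slice and condition on one percept."""
    # Predict: push the current belief through the state evolution model.
    predicted = {
        nxt: sum(belief[cur] * TRANSITION[cur][nxt] for cur in STATES)
        for nxt in STATES
    }
    # Update: weight each state by how well it explains the percept.
    unnormalized = {s: SENSOR[s][percept] * predicted[s] for s in STATES}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}


def run_filter(prior, percepts):
    """Fold a percept sequence into a belief, keeping only the latest slice."""
    belief = dict(prior)
    for percept in percepts:
        belief = filter_step(belief, percept)
    return belief


if __name__ == "__main__":
    prior = {"cruising": 0.5, "braking": 0.5}
    belief = run_filter(prior, ["lights_on", "lights_on", "lights_off"])
    for state in STATES:
        print(f"P({state} | percepts) = {belief[state]:.3f}")
```

Only the latest belief is stored between steps, which is the point of the Markov assumption in 2.1: the belief after n percepts is enough to process percept n+1. A real DPN factors the state into many variables and, per 2.2, may need stochastic simulation instead of this exact enumeration.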