Rajesh P. N. Rao and Dana H. Ballard. "Natural Basis Functions and Topographic Memory for Face Recognition", IJCAI 1995.

1 Introduction
- in general, visual object recognition
- specifically, expression-invariant face recognition
- difficulty representing faces geometrically (non-rigid body)
- some success using principal component analysis (PCA)
  - computationally intensive
- PCs of natural images found to be usually derivative-of-Gaussian
- points in image represented by n-dimensional response vector
  - n = mk = 9*5 = 45
  - m = number of Gaussians used to convolve at point (9)
  - k = number of scale factors used per Gaussian (5)
- faces represented by points selected from image (section 3)
- association accomplished by topographically distributed sparse distributed memory (SDM) (section 4)
- 33 points per face achieved 93.3% accuracy on a database of faces from 20 different people

2 Natural Basis Functions
- neighboring pixels in natural images highly correlated
- this redundancy should be reduced

2.1 Redundancy Reduction via Principal Component Analysis (PCA)
- PCA yields a set of orthogonal axes (eigenvectors or principal components) in decreasing order by variance
- eigenvectors comprise basis functions for representing input
- for example
  - for n NxN input images J_1,...,J_n
  - compute mean-centered vectors I_1,...,I_n by subtracting from each pixel value the mean value for that pixel over all input images
  - PCA finds n N^2-dimensional eigenvectors maximizing variance
  - all n eigenvectors needed to represent input images
  - but far fewer eigenvectors (m << n) represent significant variances
- face recognition achieved by projecting new faces onto chosen m eigenvectors
  - requires costly recomputation of eigenvectors for each new face

2.2 Unsupervised Learning of Basis Functions
- what do eigenvectors for sets of natural images look like?
- used Sanger's NN approach on natural images (fig 1a)
  - 32x32 pixel image patches (1024 inputs to NN)
  - Hebbian learning (nudge weights toward current I/O case)
- results in fig 1b
  - top nine eigenvectors represent zero, first, second and third order derivatives of Gaussians
- so, use 9 derivative-of-Gaussians as basis functions
  - also supported by neurobiological studies as the best fit to cortical receptive field profiles in primates

3 Iconic Representation of Faces
- use first (0 and 90 degrees), second (0/60/120 degrees) and third order (0/45/90/135 degrees) derivatives of Gaussians (fig 1c)
  - zero order left out to reduce dependence on intensity
  - higher orders left out to prevent noise overfitting
  - total of nine

3.1 Representing Image Regions
- response r_(i,j)(x0,y0) of image patch I centered at (x0,y0) to a particular Gaussian basis obtained by convolution (eqn 7)
- response vector ^r(x0,y0) = [r_(i,j,s)(x0,y0)]
  - i = 1,2,3 (order)
  - j = 1,...,i+1 denotes which orientation angle
  - s = s_min,...,s_max denotes scale (obtained?)
- scale invariant
- high (45) dimensionality promotes orthogonality: arbitrary pairs of vectors tend to be uncorrelated
- rotational invariance
  - current orientation = atan2(r_(1,1,s_max), r_(1,2,s_max)), where r_(1,1,s_max) is the 0 deg and r_(1,2,s_max) the 90 deg first-order response
  - rotate all responses to same canonical orientation

3.2 Representing Faces
- faces represented by response vectors from various points
- points are intersections of radial lines from the centroid with concentric circles of exponentially increasing radii (fig 2a)
- points limited to lie within face boundary

4 Topographic Sparse Distributed Memory
- needed for
  - long term storage for model faces
  - method for learning face representation to identity mapping
- SDM depicted in fig 3
  - model face response vectors used as memory addresses
  - one SDM per point in iconic face representation (distributedness)
  - memory addresses point to face identity vectors
  - identity vectors are elements of {-1,1}^k
- for a new response vector
  - compute normalized dot product d_i with each memory address
  - select those addresses whose d_i is above some threshold D
    - D values chosen by user (see section 6)
- training
  - add identity vector to contents of selected addresses
  - addresses will contain positive or negative integers
  - Hebbian learning
- recognition
  - add contents of selected memory addresses
  - threshold result to {-1,1}^k to yield identity

5 Implementation
- hardware allows convolutions at 30 frames/sec
- convolutions involve 8x8 kernels for each of the 9 derivative-of-Gaussians

6 Experimental Results
- D-thresholds chosen in the range 0.80-0.95
- experiment 1: discrimination ability of feature vectors
  - fig 5 shows the correlation of only the centroid vector from the first face to that of the other five faces
  - plot shows a D-threshold of at least 0.45 would discriminate
  - using more than just the one vector would improve performance
- experiment 2: varying facial expressions
  - fig 6 shows the correlation of two points in images of the same face with varying expressions
  - correlation varies, but is high (always above 0.45)
  - training on various expressions, combined with interpolation inherent in SDM, should yield expression-invariant recognition
- experiment 3: occlusion tolerance
  - fig 7 shows the correlation of two points in images of the same face with varying amounts of occlusion
  - minor occlusion handled, but hat in face 5 causes problems
  - suggests need for multiple points and better major-occlusion strategies
- experiment 4: recognition performance
  - trained system on 120 faces of 20 people with 6 different expressions
  - faces recorded at 128x128 pixels, greyscale, 8 bits per pixel
  - 60 test faces (not in training) used for performance results
  - best results at 33 points per face (93.3% correct, see plot in fig 8d)
  - 4 errors (one in fig 8c), correct identity second best in 3 of 4

7 Conclusions and Future Work
- iconic feature vectors for representing faces
  - :) achieve dimensionality reduction and orthogonality
  - :) expression (and somewhat facial feature) invariant
  - :) rotation and scale invariant
  - :) efficient to compute
  - :) real-time face recognition from large database of faces
- topographic sparse distributed memory (section 4)
  - :) interpolation
  - :) constant indexing time
  - :) human-motivated learning behavior
  - :) fault tolerance through distributedness
- saves storage (33 points * 9 kernels * 5 scales = 1485 values) versus 128x128 = 16384 pixels
- future work
  - color using RGB-component Gaussians
  - larger face databases
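
Reader's sketch (not from the paper): the training and recognition steps listed in section 4 can be illustrated roughly as below. The names PointSDM and recognize, the 3-dimensional toy response vectors (the paper uses 45 dimensions), and the identity vectors are all hypothetical. Three choices are assumptions the notes do not settle: every training response is stored as a new address, the per-point memories are combined by summing their outputs, and a zero sum is thresholded to +1.

```python
import math


def normalized_dot(u, v):
    """Normalized dot product (cosine) of two vectors; 0.0 if either is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


class PointSDM:
    """Memory for one sample point of the face (hypothetical name)."""

    def __init__(self, id_bits, threshold):
        self.id_bits = id_bits      # k, length of an identity vector
        self.threshold = threshold  # D, the selection threshold
        self.addresses = []         # stored model response vectors
        self.contents = []          # one list of integer counters per address

    def _selected(self, response):
        # Indices of addresses whose normalized dot product d_i is above D.
        return [i for i, addr in enumerate(self.addresses)
                if normalized_dot(response, addr) > self.threshold]

    def train(self, response, identity):
        # Assumption: each training response becomes a new address.
        self.addresses.append(list(response))
        self.contents.append([0] * self.id_bits)
        # Add the identity vector to the contents of every selected address.
        for i in self._selected(response):
            for b in range(self.id_bits):
                self.contents[i][b] += identity[b]

    def read(self, response):
        # Sum the contents of the selected addresses (no thresholding yet).
        total = [0] * self.id_bits
        for i in self._selected(response):
            for b in range(self.id_bits):
                total[b] += self.contents[i][b]
        return total


def recognize(memories, responses):
    """Sum the per-point reads (assumption), then threshold to {-1, 1}^k."""
    total = [0] * memories[0].id_bits
    for mem, resp in zip(memories, responses):
        for b, value in enumerate(mem.read(resp)):
            total[b] += value
    return [1 if value >= 0 else -1 for value in total]  # tie goes to +1 (assumption)


# Toy data: two sample points per face, two people.
alice = [1, -1, 1, -1]
bob = [-1, 1, 1, 1]
alice_resp = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
bob_resp = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.0]]

memories = [PointSDM(id_bits=4, threshold=0.9) for _ in range(2)]
for identity, face in ((alice, alice_resp), (bob, bob_resp)):
    for mem, resp in zip(memories, face):
        mem.train(resp, identity)

# A slightly perturbed view of the first face, standing in for a new expression.
probe = [[0.9, 0.1, 0.25], [0.05, 1.0, 0.0]]
print(recognize(memories, probe) == alice)
```

With D = 0.9 the perturbed probe still selects only the addresses written for the first face, which is the interpolation behavior the notes credit to the SDM; lowering D far enough would start selecting the second face's addresses as well.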