Instance-Based Learning


Key idea: simply store all training examples $\langle x_i, f(x_i) \rangle$ and defer generalization until a new query instance must be classified.


Nearest neighbor: given query instance $x_q$, first locate the nearest training example $x_n$, then estimate $\hat{f}(x_q) \leftarrow f(x_n)$.

k-Nearest neighbor: given $x_q$, take a vote among its k nearest neighbors (if $f$ is discrete-valued), or take the mean of the $f$ values of its k nearest neighbors (if $f$ is real-valued):

\begin{displaymath}\hat{f}(x_{q}) \leftarrow \frac{\sum_{i=1}^{k} f(x_{i})}{k}
\end{displaymath}
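A minimal sketch of both rules, assuming numeric attributes and Euclidean distance (the function name knn_predict and the toy data are illustrative, not from the original notes):

\begin{verbatim}
import numpy as np

def knn_predict(X_train, y_train, x_q, k=3, discrete=True):
    """Predict f(x_q) from the k nearest stored examples (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x_q, axis=1)   # distance to every stored example
    nearest = np.argsort(dists)[:k]                 # indices of the k nearest neighbors
    if discrete:
        # discrete-valued f: majority vote among the k neighbors
        values, counts = np.unique(y_train[nearest], return_counts=True)
        return values[np.argmax(counts)]
    # real-valued f: mean of the neighbors' values
    return y_train[nearest].mean()

# usage: 1-nearest neighbor is just k=1
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
y = np.array([0, 0, 1])
print(knn_predict(X, y, np.array([2.5, 2.5]), k=1))   # -> 1
\end{verbatim}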

When To Consider Nearest Neighbor


When to consider:
  - Instances map to points in $\Re^n$
  - Fewer than roughly 20 attributes per instance
  - Lots of training data

Advantages:
  - Training is very fast (just store the examples)
  - Can learn complex target functions
  - No information is lost

Disadvantages:
  - Slow at query time
  - Easily fooled by irrelevant attributes

Voronoi Diagram


[Figure: the decision surface induced by 1-nearest neighbor, i.e. the Voronoi diagram of the training set.] Each training example owns the region of instance space closer to it than to any other example; 1-nearest neighbor labels a query by the cell it falls into.
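A rough sketch of the same idea (the 2-D training points and grid below are invented for illustration): a query's Voronoi cell is simply the index of its nearest training example.

\begin{verbatim}
import numpy as np

# hypothetical 2-D training points; each one owns a Voronoi cell
points = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])

def voronoi_cell(q, points):
    """Index of the training point nearest to q, i.e. q's Voronoi cell."""
    return int(np.argmin(np.linalg.norm(points - q, axis=1)))

# every grid point is assigned the label of the cell it falls into
grid = [(x, y) for x in np.linspace(-1, 3, 5) for y in np.linspace(-1, 3, 5)]
cells = [voronoi_cell(np.array(g), points) for g in grid]
print(cells)
\end{verbatim}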

Behavior in the Limit


Consider p(x), the probability that instance x will be labeled 1 (positive) versus 0 (negative).


Nearest neighbor:

As the number of training examples $\rightarrow\infty$, the error rate of 1-nearest neighbor approaches at most twice the Bayes optimal error rate.

k-Nearest neighbor:

As the number of training examples $\rightarrow\infty$ and k grows large, approaches Bayes optimal.

Bayes optimal: if $p(x) > .5$ then predict 1, else 0
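A quick empirical check of this claim (the data-generating p(x), sample sizes, and k values below are invented for illustration): draw x uniformly, label it 1 with probability p(x), and compare kNN accuracy against the Bayes optimal rule as the training set grows.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
p = lambda x: 1.0 / (1.0 + np.exp(-4 * (x - 0.5)))   # assumed labeling probability p(x)

def sample(n):
    x = rng.uniform(0, 1, n)
    y = (rng.uniform(0, 1, n) < p(x)).astype(int)     # label 1 with probability p(x)
    return x, y

def knn(x_train, y_train, x_test, k):
    preds = []
    for xq in x_test:
        nearest = np.argsort(np.abs(x_train - xq))[:k]  # k nearest in 1-D
        preds.append(int(y_train[nearest].mean() > 0.5))
    return np.array(preds)

x_test, y_test = sample(2000)
bayes = (p(x_test) > 0.5).astype(int)                 # Bayes optimal: predict 1 iff p(x) > .5
print("Bayes optimal accuracy:", (bayes == y_test).mean())
for n, k in [(100, 5), (1000, 15), (10000, 51)]:
    x_tr, y_tr = sample(n)
    acc = (knn(x_tr, y_tr, x_test, k) == y_test).mean()
    print(f"n={n:6d}, k={k:3d}: kNN accuracy {acc:.3f}")
\end{verbatim}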

Distance-Weighted kNN


Might want to weight nearer neighbors more heavily:

\begin{displaymath}\hat{f}(x_{q}) \leftarrow\frac{\sum_{i=1}^{k} w_{i} f(x_{i})}{\sum_{i=1}^{k} w_{i}}
\end{displaymath}

where

\begin{displaymath}w_{i} \equiv \frac{1}{d(x_{q}, x_{i})^{2}} \end{displaymath}

and $d(x_q, x_i)$ is the distance between $x_q$ and $x_i$.



Note that it now makes sense to use all training examples instead of just the k nearest (when all examples are used, this is known as Shepard's method).
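A minimal sketch of the weighted estimate above, assuming numeric attributes and Euclidean distance (the function name distance_weighted_knn and the toy data are illustrative):

\begin{verbatim}
import numpy as np

def distance_weighted_knn(X_train, y_train, x_q, k=None):
    """Estimate f(x_q) as the distance-weighted average of neighbor values,
    with w_i = 1 / d(x_q, x_i)^2.  k=None uses all training examples."""
    dists = np.linalg.norm(X_train - x_q, axis=1)
    if np.any(dists == 0):                     # query coincides with a training point
        return y_train[dists == 0].mean()
    idx = np.argsort(dists) if k is None else np.argsort(dists)[:k]
    w = 1.0 / dists[idx] ** 2                  # nearer neighbors get larger weights
    return float(np.sum(w * y_train[idx]) / np.sum(w))

# usage
X = np.array([[0.0], [1.0], [4.0]])
y = np.array([0.0, 1.0, 1.0])
print(distance_weighted_knn(X, y, np.array([0.5]), k=2))  # average of the two nearest values
\end{verbatim}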

Example


Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     88           High      Weak    No (4)
D2   Sunny     80           High      Strong  No (2)
D3   Overcast  92           High      Weak    Yes (8)
D4   Rain      72           High      Weak    Yes (6)
D5   Rain      51           Normal    Weak    Yes (6)
D6   Rain      55           Normal    Strong  No (2)
D7   Overcast  60           Normal    Strong  Yes (10)
D8   Sunny     75           High      Weak    No (9)
D9   Sunny     48           Normal    Weak    Yes (7)
D10  Rain      68           Normal    Weak    Yes (6)
D11  Sunny     78           Normal    Strong  Yes (7)
D12  Overcast  77           High      Strong  Yes (8)
D13  Overcast  95           Normal    Weak    Yes (8)
D14  Rain      68           High      Strong  No (4)
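As a rough worked example on this table (an assumption, since the notes do not specify a distance measure for this data): classify a hypothetical new day by majority vote of the k days whose Temperature is closest.

\begin{verbatim}
import numpy as np

# PlayTennis data from the table: temperature and label (1 = Yes, 0 = No)
temps  = np.array([88, 80, 92, 72, 51, 55, 60, 75, 48, 68, 78, 77, 95, 68])
labels = np.array([ 0,  0,  1,  1,  1,  0,  1,  0,  1,  1,  1,  1,  1,  0])

def knn_on_temperature(t_query, k=3):
    """Majority vote of the k days with the closest Temperature (Temperature-only
    distance is an assumption made for this sketch)."""
    nearest = np.argsort(np.abs(temps - t_query))[:k]
    return "Yes" if labels[nearest].mean() > 0.5 else "No"

print(knn_on_temperature(66, k=3))   # hypothetical query day with Temperature 66
\end{verbatim}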

Curse of Dimensionality


Imagine instances described by 20 attributes, but only 2 are relevant to the target function.


Curse of dimensionality: nearest neighbor is easily misled when X is high-dimensional, because irrelevant attributes dominate the distance measure.


One approach: stretch the jth axis by a weight $z_j$, where $z_1, \ldots, z_n$ are chosen to minimize prediction error (e.g., via cross-validation); setting $z_j$ to zero eliminates that dimension altogether.
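A small simulation of the effect (the dimensions, sample sizes, and trial counts are invented for illustration): as irrelevant noise attributes are added, the Euclidean nearest neighbor is increasingly unlikely to be the example that is actually nearest in the two relevant attributes.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 500                                   # stored training examples

def nearest_agrees(extra_dims):
    """Fraction of queries whose nearest neighbor in (2 + extra_dims) dimensions
    is also the nearest neighbor in the 2 relevant dimensions."""
    hits, trials = 0, 200
    for _ in range(trials):
        X = rng.uniform(0, 1, (n, 2 + extra_dims))  # 2 relevant + irrelevant attributes
        q = rng.uniform(0, 1, 2 + extra_dims)
        nn_full = np.argmin(np.linalg.norm(X - q, axis=1))
        nn_rel  = np.argmin(np.linalg.norm(X[:, :2] - q[:2], axis=1))
        hits += (nn_full == nn_rel)
    return hits / trials

for extra in [0, 2, 8, 18]:
    print(f"{extra:2d} irrelevant attributes: agreement {nearest_agrees(extra):.2f}")
\end{verbatim}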