
Gradient Descent


To understand gradient descent, consider a simpler linear unit, where the output $o$ is


\begin{displaymath}o = w_{0} + w_{1}x_1 + \cdots + w_n x_n \end{displaymath}
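
As a concrete illustration (an addition, not part of the original notes), here is a minimal Python sketch of such a linear unit; the names w and x are hypothetical placeholders for the weight and input vectors.

\begin{verbatim}
def linear_unit_output(w, x):
    """Output o = w_0 + w_1*x_1 + ... + w_n*x_n.

    w = [w_0, w_1, ..., w_n]; x = [x_1, ..., x_n].
    """
    o = w[0]                            # bias term w_0
    for w_i, x_i in zip(w[1:], x):
        o += w_i * x_i
    return o
\end{verbatim}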

Let's learn the weights $w_i$ that minimize the squared error


\begin{displaymath}E[\vec{w}] \equiv \frac{1}{2}\sum_{d \in D}(t_{d} - o_{d})^{2} \end{displaymath}

where $D$ is the set of training examples, $t_d$ is the target output for training example $d$, and $o_d$ is the linear unit's output on example $d$.
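
Since this section is about gradient descent, a worked update may help: the gradient of $E$ with respect to $w_i$ is $\frac{\partial E}{\partial w_i} = -\sum_{d \in D}(t_d - o_d)\,x_{i,d}$ (taking $x_{0,d} = 1$ for the bias weight), so one batch step adds $\eta \sum_{d \in D}(t_d - o_d)\,x_{i,d}$ to each $w_i$. The Python sketch below implements this; the learning rate eta and the list-of-(x, t)-pairs data layout are assumptions for illustration, and linear_unit_output is the helper sketched above.

\begin{verbatim}
def squared_error(w, examples):
    """E[w] = 1/2 * sum over (x, t) in D of (t - o)^2."""
    return 0.5 * sum((t - linear_unit_output(w, x)) ** 2
                     for x, t in examples)

def gradient_descent_step(w, examples, eta=0.05):
    """One batch update: w_i <- w_i + eta * sum_d (t_d - o_d) * x_{i,d},
    where x_{0,d} = 1 multiplies the bias weight w_0."""
    grad_step = [0.0] * len(w)
    for x, t in examples:
        err = t - linear_unit_output(w, x)   # (t_d - o_d)
        xs = [1.0] + list(x)                 # prepend x_0 = 1 for the bias
        for i in range(len(w)):
            grad_step[i] += err * xs[i]
    return [w_i + eta * g for w_i, g in zip(w, grad_step)]
\end{verbatim}

Repeated calls to gradient_descent_step reduce $E$ for a sufficiently small eta, since each step moves $\vec{w}$ in the direction opposite the gradient of the error surface.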