Introduction to Machine Learning (Tamil)

Visual Guide to Orthogonal Projection

by

Arun Prakash A

Vectors and their linear combinations

e_1 = \begin{bmatrix}1\\0\end{bmatrix}, \quad e_2 = \begin{bmatrix}0\\1\end{bmatrix}

a = \begin{bmatrix}3\\1\end{bmatrix} = 3\begin{bmatrix}1\\0\end{bmatrix} + 1\begin{bmatrix}0\\1\end{bmatrix} = 3e_1 + 1e_2
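A quick numerical check of this linear combination (a minimal sketch; NumPy is an assumption, the slides don't prescribe a language):

```python
import numpy as np

e1 = np.array([1, 0])
e2 = np.array([0, 1])

# a is reached by the linear combination 3*e1 + 1*e2
a = 3 * e1 + 1 * e2
print(a)  # [3 1]
```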


Vectors and their linear combinations

e_1 = \begin{bmatrix}1\\0\end{bmatrix}, \quad e_2 = \begin{bmatrix}0\\1\end{bmatrix}

Can we reach all the points in \(\mathbb{R}^2\) using linear combinations of \(e_1, e_2\)?

How many points are there in \(\mathbb{R}^2\)?

Yes: \(e_1, e_2\) span the whole space.


Vectors and their linear combinations

e_1 = \begin{bmatrix}1\\1\end{bmatrix}, \quad e_2 = \begin{bmatrix}2\\2\end{bmatrix}

Can we reach all the points in \(\mathbb{R}^2\) using linear combinations of \(e_1, e_2\)?

No! Since \(e_2 = 2e_1\), they span only a subspace: a line.
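We can confirm this numerically: stacking the vectors as columns and computing the rank shows the span is 1-dimensional (a sketch assuming NumPy):

```python
import numpy as np

# Columns are e1 = [1, 1] and e2 = [2, 2]; e2 = 2 * e1
E = np.array([[1, 2],
              [1, 2]])

# Rank 1 means the columns span only a line, not all of R^2
print(np.linalg.matrix_rank(E))  # 1
```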

Back to Regression Problem

x1   x2   y
1    -1   3
2     2   2

\begin{bmatrix}1&-1 \\ 2&2 \end{bmatrix} \begin{bmatrix}w_1\\w_2 \end{bmatrix} = \begin{bmatrix}3\\2 \end{bmatrix}
Unique Solution

\begin{bmatrix}1&-1 \\ 2&2 \end{bmatrix} \begin{bmatrix}2\\-1 \end{bmatrix} = \begin{bmatrix}3\\2 \end{bmatrix}, \quad \text{i.e., } w_1 = 2,\ w_2 = -1

Error is Zero!
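Because the columns are independent, this square system has exactly one solution; a minimal check (assuming NumPy):

```python
import numpy as np

X = np.array([[1, -1],
              [2,  2]])
y = np.array([3, 2])

w = np.linalg.solve(X, y)   # unique solution of X w = y
print(w)                    # [ 2. -1.]
print(X @ w - y)            # [0. 0.]  -> the error is zero
```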

Orthogonal Projection

Now change \(x_2\) to different values:

x1   x2   y
1     2   3
2     4   2

\begin{bmatrix}1&2 \\ 2&4 \end{bmatrix} \begin{bmatrix}w_1\\w_2 \end{bmatrix} = \begin{bmatrix}3\\2 \end{bmatrix}

The label vector is not in the span of \(X\)!

No solution :-(

But we want one... It is OK if the error is not exactly zero!

[Figure: \(\mathbf{y}\) is projected onto the line spanned by the columns of \(X\), giving the prediction \(y'\) and the error vector \(e\)]

\begin{bmatrix}1&2 \\ 2&4 \end{bmatrix} \begin{bmatrix}0.28\\0.56 \end{bmatrix} = \begin{bmatrix}1.4\\2.8 \end{bmatrix}

There is an error in the prediction!
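These numbers fall out of the pseudo-inverse, which projects \(\mathbf{y}\) onto the line spanned by the columns (a sketch assuming NumPy):

```python
import numpy as np

X = np.array([[1, 2],
              [2, 4]])       # singular: second column = 2 * first
y = np.array([3, 2])

w = np.linalg.pinv(X) @ y    # least-squares, minimum-norm solution
print(w)                     # [0.28 0.56]
print(X @ w)                 # [1.4 2.8]  -> prediction, not equal to y
```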

Now, the matrix is rectangular!

x1   x2   y
1     2   3
2     4   2
3     6   2

\begin{bmatrix}1&2 \\ 2&4 \\ 3&6 \end{bmatrix} \begin{bmatrix}w_1\\w_2 \end{bmatrix} = \begin{bmatrix}3\\2\\2 \end{bmatrix}

The columns span only a 1-dimensional subspace (a line in \(\mathbb{R}^3\)).

\begin{bmatrix}w_1 \approx 0.18\\w_2 \approx 0.37 \end{bmatrix}

Orthogonal Projection

x1    x2   y
-2     4  -1
2.5   -1   1
0.5    3   4

\begin{bmatrix}-2&4 \\ 2.5&-1 \\ 0.5&3 \end{bmatrix}_{m \times n}

Are the features and labels points in \(m\)- or \(n\)-dimensional space?

Do the vectors \(\mathbf{X_1}, \mathbf{X_2}\) span the whole of \(\mathbb{R}^3\)?

Is the vector \(\mathbf{Y}\) in the space spanned by \(\mathbf{X_1}, \mathbf{X_2}\)?

No: the two columns span only a 2-dimensional subspace (a plane in \(\mathbb{R}^3\)), and \(\mathbf{Y}\) does not lie on it.

Let's project \(\mathbf{Y}\) onto the subspace spanned by the two feature vectors \(\mathbf{X_1}, \mathbf{X_2}\).

\begin{bmatrix}-2&4 \\ 2.5&-1 \\ 0.5&3 \end{bmatrix} \begin{bmatrix}w_1\\w_2 \end{bmatrix} = \begin{bmatrix}-1\\1\\4 \end{bmatrix} \implies \begin{bmatrix}w_1 \approx 1.21\\w_2 \approx 0.69 \end{bmatrix}

The prediction is

\begin{bmatrix}-2&4 \\ 2.5&-1 \\ 0.5&3 \end{bmatrix} \begin{bmatrix}1.21\\0.69 \end{bmatrix} \approx \begin{bmatrix}0.33\\2.33\\2.67 \end{bmatrix}

and the error vector is

\mathbf{e} = \mathbf{Y} - \mathbf{Y'} = \begin{bmatrix}-1\\1\\4 \end{bmatrix} - \begin{bmatrix}0.33\\2.33\\2.67 \end{bmatrix} \approx \begin{bmatrix}-1.33\\-1.33\\1.33 \end{bmatrix},

which is orthogonal to both \(\mathbf{X_1}\) and \(\mathbf{X_2}\).
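A sketch verifying the projection and the orthogonality of the error (assuming NumPy; `lstsq` solves the least-squares problem):

```python
import numpy as np

X = np.array([[-2.0,  4.0],
              [ 2.5, -1.0],
              [ 0.5,  3.0]])
y = np.array([-1.0, 1.0, 4.0])

w, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ w
e = y - y_hat

print(w)        # approx [1.21 0.69]
print(y_hat)    # approx [0.33 2.33 2.67]
print(X.T @ e)  # approx [0. 0.]  -> error orthogonal to both columns
```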

Summary

\underbrace{\begin{bmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{bmatrix}}_{m \times n}

  • \(m\) data points/samples
  • \(n\) features
  • \(m \times 1\) label vector

All of them (the \(n\) feature columns and the label vector) are points in \(m\)-dimensional space!

  • In the real world, generally \(m \gg n\), so \(\mathbf{X}\) is not square and has no inverse!
  • Therefore, we project \(\mathbf{Y}\) onto the column space of \(\mathbf{X}\) and use the pseudo-inverse, which guarantees minimum error in the least-squares sense.

\(\mathbf{X}\mathbf{w}=\mathbf{Y}\)
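For a tall \(\mathbf{X}\) with independent columns, the pseudo-inverse is \((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\); a minimal sketch (assuming NumPy and randomly generated data):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 3                    # many more samples than features
X = rng.normal(size=(m, n))
Y = rng.normal(size=m)

# Pseudo-inverse solution: projects Y onto the column space of X
w = np.linalg.inv(X.T @ X) @ X.T @ Y
print(np.allclose(w, np.linalg.pinv(X) @ Y))  # True
```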

What is a hypothesis (\(h\)) and the hypothesis space (\(H\))? How are they related to \(f(x)\)?

What is \(h(\mathbf{x})\)? How does it differ from \(f(\mathbf{x})\)?

(with a slight abuse of notation)

x y
1.22 0.44
1.3 0.51
1.4 0.56
1.49 0.61

How many functions are there that connect all these four points?

Here are a few candidates:

-0.17x + 0.27y = -0.08

f(x) = e^{x-2}

f(x) = 0.2x^2 + 0.15

f(x) = \sin(x - 0.4) - 0.29

There could be infinitely many such functions.
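A quick check that each candidate passes (approximately, since the tabulated values are rounded) through the four points; the function forms are from the slide, the code is a sketch assuming NumPy:

```python
import numpy as np

x = np.array([1.22, 1.3, 1.4, 1.49])
y = np.array([0.44, 0.51, 0.56, 0.61])

candidates = {
    "line":      lambda x: (0.17 * x - 0.08) / 0.27,  # -0.17x + 0.27y = -0.08
    "exp":       lambda x: np.exp(x - 2),
    "quadratic": lambda x: 0.2 * x**2 + 0.15,
    "sine":      lambda x: np.sin(x - 0.4) - 0.29,
}

for name, f in candidates.items():
    print(name, np.round(f(x) - y, 2))  # all residuals are small
```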

[Figure: the hypothesis space \(H\), a hypothesis \(h\) inside it, and the true function \(f(\mathbf{x})\)]

What is \(h(\mathbf{x})\)? How does it differ from \(f(\mathbf{x})\)?

x       y
1.22    0.44
1.3     0.51
1.4     0.56
1.49    0.61
2.17    1.18
-0.09   0.12

How many functions are there that connect all these points now?

The number of data points helps us choose a better function!

What is the difference between an analytical/closed-form solution and an iterative solution?

What is a closed-form solution to a problem?

Sum the first 100 natural numbers:

Iterative: \(1 + 2 + 3 + \cdots + 100\)

Closed form: \( \frac{n(n+1)}{2} = \frac{100 \times (100+1)}{2} = 5050 \)

Source: Wikipedia
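A tiny sketch contrasting the two approaches:

```python
# Iterative: add the numbers one by one
total = 0
for i in range(1, 101):
    total += i

# Closed form: one formula, no loop
n = 100
closed = n * (n + 1) // 2

print(total, closed)  # 5050 5050
```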
For linear regression, the closed-form solution is

w^* = \left(\sum_i \mathbf{x_i}\mathbf{x_i}^T\right)^{-1}\left(\sum_i \mathbf{x_i}y_i\right)
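A minimal sketch of this formula on the 3 x 2 example above (assuming NumPy); the summation form and the matrix form \((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}\) agree:

```python
import numpy as np

X = np.array([[-2.0,  4.0],
              [ 2.5, -1.0],
              [ 0.5,  3.0]])
y = np.array([-1.0, 1.0, 4.0])

# w* = (sum_i x_i x_i^T)^{-1} (sum_i x_i y_i), summing over rows x_i
A = sum(np.outer(xi, xi) for xi in X)
b = sum(xi * yi for xi, yi in zip(X, y))
w_star = np.linalg.solve(A, b)

print(w_star)                                      # approx [1.21 0.69]
print(np.allclose(w_star, np.linalg.pinv(X) @ y))  # True
```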

What is the difference between setting \(f(x)=0\) and \( \nabla f(x) =0\)? 

Consider a function \(f(x)=x^2-5x+4\)

1. \(f(x)=x^2-5x+4 = 0\) gives us \(x=1, x=4\)

2. \(\nabla f(x)=2x-5=0\) gives us \(x=2.5\)

Let's see geometrically by plotting the function in the interval \(0 \leq x \leq 5\).

We can't rely on plotting, because we don't know the range for \(x\) a priori!
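Instead of plotting, both answers can be computed directly (a sketch using NumPy's polynomial root finder):

```python
import numpy as np

# f(x) = x^2 - 5x + 4
print(np.roots([1, -5, 4]))  # [4. 1.]  -> where f(x) = 0
# f'(x) = 2x - 5
print(np.roots([2, -5]))     # [2.5]    -> where the gradient is 0
```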

What do you mean by gradient or slope of a function?

\(f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}\)

Let's set \(\Delta x=0.1\)
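With \(\Delta x = 0.1\), the finite difference already tracks the true slope \(2x - 5\) closely (a small sketch; plain Python):

```python
f = lambda x: x**2 - 5*x + 4
dx = 0.1

def grad(x, dx=dx):
    # Forward-difference approximation of f'(x)
    return (f(x + dx) - f(x)) / dx

print(grad(2.5))  # 0.1  (true slope at the minimum is 0)
print(grad(0.0))  # -4.9 (true slope is -5)
```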

Gradient Descent Playground

Follow the direction exactly opposite to where the gradient points to reach the minimum.

Gradient is your guide!
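A minimal gradient descent sketch on the same \(f(x) = x^2 - 5x + 4\) (the learning rate 0.1 and starting point 0 are arbitrary choices, not from the slides):

```python
f_prime = lambda x: 2*x - 5    # gradient of f(x) = x^2 - 5x + 4

x = 0.0                        # arbitrary starting point
lr = 0.1                       # learning rate (assumed)
for _ in range(100):
    x = x - lr * f_prime(x)    # step opposite to the gradient

print(x)  # approx 2.5, the minimum
```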

Gradient Descent in 2D
