Introduction to Machine Learning (Tamil)

Visual Guide to Orthogonal Projection

Arun Prakash A

Vectors and their linear combinations

\[ e_1 = \begin{bmatrix}1\\0\end{bmatrix}, \quad e_2 = \begin{bmatrix}0\\1\end{bmatrix} \]

\[ a = \begin{bmatrix}3\\1\end{bmatrix} = 3\begin{bmatrix}1\\0\end{bmatrix} + 1\begin{bmatrix}0\\1\end{bmatrix} = 3e_1 + 1e_2 \]

Can we reach all the points in \(\mathbb{R}^2\) using linear combinations of \(e_1,e_2\)?

How many points are there in \(\mathbb{R}^2\)?

Infinitely many, yet every point \((x, y)\) can be written as \(x\,e_1 + y\,e_2\), so \(e_1,e_2\) span the whole space

Vectors and their linear combinations

\[ e_1 = \begin{bmatrix}1\\1\end{bmatrix}, \quad e_2 = \begin{bmatrix}2\\2\end{bmatrix} \]

Can we reach all the points in \(\mathbb{R}^2\) using linear combinations of \(e_1,e_2\)?

No!

Since \(e_2 = 2e_1\), they span only a one-dimensional subspace (a line)
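The two cases above can be checked numerically. The sketch below assumes NumPy and uses a hypothetical helper, `spans_plane`, which tests whether two vectors are linearly independent by computing the rank of the matrix that has them as columns.

```python
import numpy as np

def spans_plane(v1, v2):
    """Return True if v1 and v2 span all of R^2 (rank 2), False otherwise."""
    A = np.column_stack([v1, v2])
    return bool(np.linalg.matrix_rank(A) == 2)

# The vector a written as a linear combination of the standard basis.
e1 = np.array([1, 0])
e2 = np.array([0, 1])
a = 3 * e1 + 1 * e2

standard = spans_plane([1, 0], [0, 1])   # the standard basis vectors
collinear = spans_plane([1, 1], [2, 2])  # the second vector is twice the first

print(a, standard, collinear)
```

A rank of 2 means the pair reaches every point of the plane; a rank of 1 means every combination stays on a single line.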

Back to Regression Problem

x1	x2	y
1	-1	3
2	2	2

\[ \begin{bmatrix}1&-1 \\ 2&2\end{bmatrix}\begin{bmatrix}w_1\\w_2\end{bmatrix} = \begin{bmatrix}3\\2\end{bmatrix} \]

Unique Solution

\[ \begin{bmatrix}1&-1 \\ 2&2\end{bmatrix}\begin{bmatrix}2\\-1\end{bmatrix} = \begin{bmatrix}3\\2\end{bmatrix} \]

Error is Zero!
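As a quick check of the unique solution, here is a minimal sketch assuming NumPy. It uses `np.linalg.solve`, which applies because this \(X\) is square and invertible; the variable names are illustrative.

```python
import numpy as np

X = np.array([[1.0, -1.0],
              [2.0,  2.0]])
y = np.array([3.0, 2.0])

# Exact solve: valid only because X is square with linearly independent columns.
w = np.linalg.solve(X, y)

# The error vector is the difference between the labels and the predictions.
error = y - X @ w

print(w, error)
```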

Orthogonal Projection

Now change \(x_2\) to different values

x1	x2	y
1	2	3
2	4	2

\[ \begin{bmatrix}1&2 \\ 2&4\end{bmatrix}\begin{bmatrix}w_1\\w_2\end{bmatrix} = \begin{bmatrix}3\\2\end{bmatrix} \]

The label vector is not in the span of the columns of \(X\)! The second column is twice the first, so the columns span only a line, and \(\begin{bmatrix}3\\2\end{bmatrix}\) does not lie on it.

No solution :-(

But we want one..

It is ok if the error is not exactly zero!

\[ \begin{bmatrix}1&2 \\ 2&4\end{bmatrix}\begin{bmatrix}0.28\\0.56\end{bmatrix} = \begin{bmatrix}1.4\\2.8\end{bmatrix} \]

There is an error in the prediction: \(\begin{bmatrix}1.4\\2.8\end{bmatrix} \neq \begin{bmatrix}3\\2\end{bmatrix}\).
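The weights on this slide can be reproduced with the Moore-Penrose pseudo-inverse. The following is a sketch assuming NumPy; `np.linalg.pinv` picks the least-squares solution with the smallest norm, which is an assumption about how the slide's numbers were obtained.

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # singular: the second column is twice the first
y = np.array([3.0, 2.0])

# Pseudo-inverse solution: minimum-norm weights that minimize ||Xw - y||.
w = np.linalg.pinv(X) @ y

y_hat = X @ w        # orthogonal projection of y onto the column space of X
error = y - y_hat    # what the model cannot explain

print(np.round(w, 2), np.round(y_hat, 2), np.round(error, 2))
```

The error vector is perpendicular to the column \((1, 2)\), which is exactly what makes the projection orthogonal.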

Orthogonal Projection

Now, the matrix is rectangular!

x1	x2	y
1	2	3
2	4	2
3	6	2

\[ \begin{bmatrix}1&2 \\ 2&4 \\ 3&6\end{bmatrix}\begin{bmatrix}w_1\\w_2\end{bmatrix} = \begin{bmatrix}3\\2\\2\end{bmatrix} \]

The columns span a one-dimensional subspace (a line) of \(\mathbb{R}^3\)

\[ w_1 \approx 0.19, \quad w_2 \approx 0.37 \quad \text{(minimum-norm least-squares solution)} \]
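For the rectangular, rank-deficient case, a sketch with `np.linalg.lstsq` (assuming NumPy) shows both the rank of \(X\) and the fact that the least-squares weights are not unique. The alternative weight vector `w_alt` is a hypothetical choice made only for illustration.

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
y = np.array([3.0, 2.0, 2.0])

# For a rank-deficient X, lstsq returns the minimum-norm least-squares solution.
w, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# Hypothetical alternative: move all the weight onto the first feature.
# Since column 2 = 2 * column 1, the prediction X @ w_alt is unchanged.
w_alt = np.array([w[0] + 2.0 * w[1], 0.0])

print(rank)
print(np.round(w, 4))
print(np.allclose(X @ w, X @ w_alt))
```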

Orthogonal Projection

x1	x2	y
-2	4	-1
2.5	-1	1
0.5	3	4

\[ \begin{bmatrix}-2&4 \\ 2.5&-1 \\ 0.5&3\end{bmatrix}_{m \times n} \]

Features and labels are points in which dimension: \(m\) or \(n\)?

Do the vectors \(\mathbf{X_1},\mathbf{X_2}\) span the whole of \(\mathbb{R}^3\)?

Is the vector \(\mathbf{Y}\) in the space spanned by \(\mathbf{X_1},\mathbf{X_2}\)?

The subspace is a two-dimensional plane in \(\mathbb{R}^3\)

Let's project \(\mathbf{Y}\) onto the subspace spanned by the two feature vectors.

\[ \begin{bmatrix}-2&4 \\ 2.5&-1 \\ 0.5&3\end{bmatrix}\begin{bmatrix}w_1\\w_2\end{bmatrix} = \begin{bmatrix}-1\\1\\4\end{bmatrix}, \quad w_1 \approx 1.21, \quad w_2 \approx 0.69 \]

\[ \begin{bmatrix}-2&4 \\ 2.5&-1 \\ 0.5&3\end{bmatrix}\begin{bmatrix}1.21\\0.69\end{bmatrix} \approx \begin{bmatrix}0.33\\2.33\\2.67\end{bmatrix} \]

Error vector:

\[ \begin{bmatrix}-1\\1\\4\end{bmatrix} - \begin{bmatrix}0.33\\2.33\\2.67\end{bmatrix} \approx \begin{bmatrix}-1.33\\-1.33\\1.33\end{bmatrix} \]
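The projection for this three-sample example can be verified with a short sketch, assuming NumPy. It computes the least-squares weights, the projected label vector, and checks that the error vector is perpendicular to both feature columns.

```python
import numpy as np

X = np.array([[-2.0,  4.0],
              [ 2.5, -1.0],
              [ 0.5,  3.0]])
y = np.array([-1.0, 1.0, 4.0])

# Least-squares weights (X has full column rank, so they are unique).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ w          # projection of y onto the plane spanned by the columns
residual = y - y_hat   # error vector

print(np.round(w, 4))
print(np.round(y_hat, 4))
print(np.round(X.T @ residual, 10))  # dot products with each column
```

Both dot products vanish (up to rounding), so the error is orthogonal to the plane.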

Summary

\(\mathbf{X}\): an \(m \times n\) data matrix

\(m\) data points/samples
\(n\) features
\(m \times 1\) label vector

Each feature column and the label vector is a point in \(m\)-dimensional space!

In general, \(m \gg n\) in the real world, so \(\mathbf{X}\) is not square and no inverse exists!
Therefore, we project \(\mathbf{Y}\) onto the column space of \(\mathbf{X}\) and use the pseudo-inverse, which gives the \(\mathbf{w}\) with minimum error in the least-squares sense.

\(\mathbf{X}\mathbf{w}=\mathbf{Y}\)
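The link between projection and the pseudo-inverse can be written out in three lines. This is a sketch under the assumption that the columns of \(\mathbf{X}\) are linearly independent, so that \(\mathbf{X}^T\mathbf{X}\) is invertible; \(\mathbf{w}^*\) denotes the least-squares weights.

```latex
\begin{aligned}
\mathbf{X}^T(\mathbf{Y}-\mathbf{X}\mathbf{w}^*) &= \mathbf{0}
  && \text{(the error is orthogonal to every column of } \mathbf{X}\text{)} \\
\mathbf{X}^T\mathbf{X}\,\mathbf{w}^* &= \mathbf{X}^T\mathbf{Y} \\
\mathbf{w}^* &= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}
\end{aligned}
```

The matrix \((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\) is the pseudo-inverse of \(\mathbf{X}\) in this full-column-rank case.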

What are a hypothesis \(h\) and the hypothesis space \(H\)? How are they related to \(f(x)\)?

What is \(h(\mathbf{x})\)? How does it differ from \(f(\mathbf{x})\)?

(With a slight abuse of notation)

x	y
1.22	0.44
1.3	0.51
1.4	0.56
1.49	0.61

How many functions pass (approximately) through all four of these points?

-0.17x+0.27y=-0.08

f(x)=e^{x-2}

f(x)=0.2x^2+0.15

f(x)=\sin(x-0.4)-0.29

There could be infinitely many such functions.

[Figure labels: \(H\), \(f(\mathbf{x})\), \(h\)]

x	y
1.22	0.44
1.3	0.51
1.4	0.56
1.49	0.61
2.17	1.18
-0.09	0.12

How many functions pass through all six of these points?

More data points help us choose a better function!
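To see how extra data narrows the choice, the sketch below (assuming NumPy) scores the four candidate functions from the slides on the first four points and then on all six. The dictionary keys and the helper `max_abs_error` are illustrative names, and the line \(-0.17x+0.27y=-0.08\) is solved for \(y\) before evaluation.

```python
import numpy as np

# Candidate hypotheses taken from the slides.
candidates = {
    "line": lambda x: (0.17 * x - 0.08) / 0.27,   # -0.17x + 0.27y = -0.08, solved for y
    "exp": lambda x: np.exp(x - 2),
    "quadratic": lambda x: 0.2 * x**2 + 0.15,
    "sine": lambda x: np.sin(x - 0.4) - 0.29,
}

x4 = np.array([1.22, 1.3, 1.4, 1.49])
y4 = np.array([0.44, 0.51, 0.56, 0.61])
x6 = np.append(x4, [2.17, -0.09])
y6 = np.append(y4, [1.18, 0.12])

def max_abs_error(h, x, y):
    """Largest absolute gap between the hypothesis h and the observed labels."""
    return float(np.max(np.abs(h(x) - y)))

errors4 = {name: max_abs_error(h, x4, y4) for name, h in candidates.items()}
errors6 = {name: max_abs_error(h, x6, y6) for name, h in candidates.items()}
best6 = min(errors6, key=errors6.get)

for name in candidates:
    print(f"{name:10s} four points: {errors4[name]:.3f}   six points: {errors6[name]:.3f}")
print("closest on six points:", best6)
```

On four points all candidates look similar; the two extra points separate them.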

What is the difference between an analytical (closed-form) solution and an iterative solution?

What is a closed-form solution to a problem?

Sum of the first 100 natural numbers

Iterative: \(1 + 2 + 3 + \dots + 100\)

Closed form: \( \frac{n(n+1)}{2}=\frac{100 \cdot (100+1)}{2} = 5050 \)

Source: Wikipedia
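The two ways of summing can be put side by side in code. This is a minimal sketch in plain Python; the function names are illustrative.

```python
def sum_iterative(n):
    """Add the numbers 1..n one at a time (n additions)."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total

def sum_closed_form(n):
    """Evaluate n(n+1)/2 directly (a fixed number of operations)."""
    return n * (n + 1) // 2

iterative_result = sum_iterative(100)
closed_form_result = sum_closed_form(100)
print(iterative_result, closed_form_result)
```

The iterative version does more work as \(n\) grows, while the closed form costs the same for any \(n\).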

\[ w^* = \left(\sum_i \mathbf{x_i}\mathbf{x_i}^T\right)^{-1}\left(\sum_i \mathbf{x_i}y_i\right) \]
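The closed-form expression can be evaluated sample by sample. The sketch below assumes NumPy and reuses the three-sample data from the projection example; each \(\mathbf{x_i}\) is taken to be one row of the data table, which is an assumption about the notation.

```python
import numpy as np

samples = np.array([[-2.0,  4.0],
                    [ 2.5, -1.0],
                    [ 0.5,  3.0]])
labels = np.array([-1.0, 1.0, 4.0])

# Accumulate the two sums in the formula, one sample at a time.
A = np.zeros((2, 2))   # sum of x_i x_i^T
b = np.zeros(2)        # sum of x_i y_i
for x_i, y_i in zip(samples, labels):
    A += np.outer(x_i, x_i)
    b += x_i * y_i

# Solving A w = b avoids forming the explicit inverse.
w_star = np.linalg.solve(A, b)
print(np.round(w_star, 4))
```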

What is the difference between setting \(f(x)=0\) and \( \nabla f(x) =0\)?

Consider a function \(f(x)=x^2-5x+4\)

1. \(f(x)=x^2-5x+4 = 0\) gives us \(x=1, x=4\)

2. \(\nabla f(x)=2x-5=0\) gives us \(x=2.5\)
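The two conditions answer different questions, as a short sketch with `np.roots` shows (assuming NumPy). Coefficients are listed from the highest power down.

```python
import numpy as np

f = np.poly1d([1, -5, 4])           # f(x) = x^2 - 5x + 4

# Setting f(x) = 0: where the curve crosses the x-axis.
roots = np.sort(np.roots([1, -5, 4]))

# Setting the derivative 2x - 5 = 0: where the curve is flat (here, its minimum).
stationary = np.roots([2, -5])
value_at_stationary = f(stationary[0])

print(roots, stationary, value_at_stationary)
```

The roots tell us where \(f\) is zero; the stationary point tells us where \(f\) is smallest, and \(f\) is not zero there.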

Let's see geometrically by plotting the function in the interval \(0 \leq x \leq 5\).

We can't rely on plotting, because we don't know the range of \(x\) a priori!

What do we mean by the gradient or slope of a function?

\(f'(x) \approx \frac{f(x+\Delta x)-f(x)}{\Delta x}\)

Let's set \(\Delta x=0.1\)
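With \(\Delta x = 0.1\), the slope estimate can be compared against the exact derivative \(2x-5\). This is a minimal sketch in plain Python; the evaluation point \(x=1\) is an arbitrary choice for illustration.

```python
def f(x):
    return x**2 - 5 * x + 4

def forward_difference(func, x, dx=0.1):
    """Approximate the slope of func at x using a step of size dx."""
    return (func(x + dx) - func(x)) / dx

def exact_derivative(x):
    return 2 * x - 5

x = 1.0
coarse = forward_difference(f, x, dx=0.1)
fine = forward_difference(f, x, dx=0.001)
exact = exact_derivative(x)
print(coarse, fine, exact)
```

Shrinking the step brings the estimate closer to the exact slope.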

Gradient Descent Playground

Move in the direction exactly opposite to the gradient to reach a minimum.

Gradient is your guide!
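Here is a minimal sketch of this idea on the earlier example \(f(x)=x^2-5x+4\), in plain Python. The starting point, learning rate, and number of steps are assumed values chosen for illustration.

```python
def gradient(x):
    """Derivative of f(x) = x^2 - 5x + 4."""
    return 2 * x - 5

x = 0.0               # assumed starting point
learning_rate = 0.1   # assumed step size

for step in range(100):
    # Step against the gradient: downhill on f.
    x = x - learning_rate * gradient(x)

print(round(x, 6))
```

The iterate settles at the same point found by solving \(\nabla f(x)=0\), without ever solving that equation.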

Gradient Descent in 2D
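The same update works with two parameters. The sketch below assumes NumPy and applies gradient descent to the least-squares loss \(\tfrac{1}{2}\lVert \mathbf{X}\mathbf{w}-\mathbf{Y}\rVert^2\) for the three-sample example used earlier. The choice of this loss, the learning rate, and the step count are assumptions made for illustration.

```python
import numpy as np

X = np.array([[-2.0,  4.0],
              [ 2.5, -1.0],
              [ 0.5,  3.0]])
y = np.array([-1.0, 1.0, 4.0])

def loss_gradient(w):
    """Gradient of 0.5 * ||Xw - y||^2 with respect to the two weights."""
    return X.T @ (X @ w - y)

w = np.zeros(2)         # assumed starting point
learning_rate = 0.03    # assumed; small enough for this X to converge

for _ in range(500):
    w = w - learning_rate * loss_gradient(w)

# Closed-form least-squares answer for comparison.
w_closed_form, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w, 4), np.round(w_closed_form, 4))
```

The iterative and closed-form routes arrive at the same weights.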

