Introduction to Machine Learning (Tamil)
Visual Guide to Orthogonal Projection
by
Arun Prakash A
Vectors and their linear combinations
Can we reach all the points in \(\mathbb{R}^2\) using the linear combination of \(e_1,e_2\)?
How many points are there in \(\mathbb{R}^2\)?
If we can, then \(e_1,e_2\) span the whole space.
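As a minimal sketch of this idea: the figure is not available here, so the code assumes \(e_1\) and \(e_2\) are the standard basis vectors, and the target point is a hypothetical one chosen only for illustration. NumPy can then recover the coefficients of the linear combination that reaches it:

```python
import numpy as np

# Assumption: e1 and e2 are the standard basis vectors of R^2.
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

# Hypothetical target point, chosen only for illustration.
target = np.array([3.5, -2.0])

# Solve a*e1 + b*e2 = target for the coefficients (a, b).
basis = np.column_stack([e1, e2])
a, b = np.linalg.solve(basis, target)

reached = a * e1 + b * e2
print(a, b, reached)
```

Any other target point works the same way, which is what "spanning the whole space" means.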
Vectors and their linear combinations
Can we reach all the points in \(\mathbb{R}^2\) using the linear combination of \(e_1,e_2\)?
No!
The vectors span only a subspace (a line).
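One way to check this numerically is the rank of the matrix whose columns are the two vectors. The vectors `v1` and `v2` below are hypothetical collinear vectors picked for illustration, since the slide's figure is not available:

```python
import numpy as np

# Hypothetical collinear vectors: v2 is exactly 2 * v1.
v1 = np.array([1.0, 2.0])
v2 = np.array([2.0, 4.0])

# The rank of the matrix with columns v1 and v2 is the
# dimension of the subspace they span.
rank = np.linalg.matrix_rank(np.column_stack([v1, v2]))
print(rank)  # 1 means a line, 2 would mean the whole plane
```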
Back to Regression Problem
x1 | x2 | y |
---|---|---|
1 | -1 | 3 |
2 | 2 | 2 |
Unique Solution
Error is Zero!
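A small sketch of why the solution is unique for the table with rows (1, -1, 3) and (2, 2, 2): the features are treated as the columns of a square matrix, and `np.linalg.solve` is used as one possible way to solve \(\mathbf{X}\mathbf{w}=\mathbf{Y}\). The variable names are illustrative.

```python
import numpy as np

# Columns of X are the features x1 and x2; y is the label vector.
X = np.array([[1.0, -1.0],
              [2.0,  2.0]])
y = np.array([3.0, 2.0])

# X is square with linearly independent columns, so X w = y
# has exactly one solution.
w = np.linalg.solve(X, y)
error = y - X @ w

print(w)      # approximately [ 2. -1.]
print(error)  # approximately [0. 0.]
```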
Orthogonal Projection
Now change \(x_2\) to different values
x1 | x2 | y |
---|---|---|
1 | 2 | 3 |
2 | 4 | 2 |
The label vector is not in the span of \(X\)!
No solution :-(
But we still want one.
It is OK if the error is not exactly zero!
There is an error in the prediction!
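To make the error concrete, here is a minimal sketch for the table with \(x_2 = 2x_1\). Both feature columns lie on the same line, so the closest reachable point to the label vector is its orthogonal projection onto that line. The formula used is the standard projection onto a single vector; the variable names are illustrative.

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second column is 2 * first column
y = np.array([3.0, 2.0])

# Both columns lie on the line through u, so the column space is that line.
u = X[:, 0]

# Orthogonal projection of y onto the line spanned by u.
y_hat = (u @ y) / (u @ u) * u
error = y - y_hat

print(y_hat)      # approximately [1.4 2.8]
print(error)      # approximately [ 1.6 -0.8]
print(u @ error)  # approximately 0: the error is orthogonal to the line
```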
Now, the matrix is rectangular!
x1 | x2 | y |
---|---|---|
1 | 2 | 3 |
2 | 4 | 2 |
3 | 6 | 2 |
The subspace spanned by the columns is a line (a copy of \(\mathbb{R}^1\)).
Are the features and labels points in \(m\)- or \(n\)-dimensional space?
x1 | x2 | y |
---|---|---|
-2 | 4 | -1 |
2.5 | -1 | 1 |
0.5 | 3 | 4 |
Do the vectors \(\mathbf{X_1},\mathbf{X_2}\) span the whole of \(\mathbb{R}^3\)?
Is the vector \(\mathbf{Y}\) in the space spanned by \(\mathbf{X_1},\mathbf{X_2}\)?
The subspace spanned by them is a plane (a copy of \(\mathbb{R}^2\)).
Let's project \(\mathbf{Y}\) onto the subspace spanned by the two feature vectors \(\mathbf{X_1},\mathbf{X_2}\).
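One standard way to compute this projection is through the normal equations \(\mathbf{X}^T\mathbf{X}\mathbf{w}=\mathbf{X}^T\mathbf{Y}\). The sketch below uses the three-sample table from these slides; the variable names are illustrative, and this is not necessarily the construction used in the figures.

```python
import numpy as np

X = np.array([[-2.0,  4.0],
              [ 2.5, -1.0],
              [ 0.5,  3.0]])
Y = np.array([-1.0, 1.0, 4.0])

# Normal equations: (X^T X) w = X^T Y.
w = np.linalg.solve(X.T @ X, X.T @ Y)

Y_hat = X @ w        # projection of Y onto the plane spanned by the columns
error = Y - Y_hat

print(w)
print(X.T @ error)   # approximately [0. 0.]
```

The last line checks the defining property of an orthogonal projection: the error vector is perpendicular to both feature columns.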
Summary
- \(m\) data points/samples
- \(n\) features
- \(m \times 1\) label vector
The feature vectors and the label vector are all points in \(m\)-dimensional space!
- In general, \(m \gg n\) in real-world problems, so \(\mathbf{X}\) is not square and no inverse exists!
- Therefore, we project and use the pseudo-inverse, which guarantees the minimum error in the least-squares sense.
\(\mathbf{X}\mathbf{w}=\mathbf{Y}\)
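A brief sketch of the pseudo-inverse in practice, reusing the three-sample table from the earlier slides. `np.linalg.pinv` and `np.linalg.lstsq` are real NumPy functions; the variable names are illustrative.

```python
import numpy as np

X = np.array([[-2.0,  4.0],
              [ 2.5, -1.0],
              [ 0.5,  3.0]])   # m = 3 samples, n = 2 features
Y = np.array([-1.0, 1.0, 4.0])

# Moore-Penrose pseudo-inverse of the rectangular matrix X.
X_pinv = np.linalg.pinv(X)
w = X_pinv @ Y

# np.linalg.lstsq minimizes ||X w - Y||^2 directly; the two should agree.
w_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(w)
print(np.sum((Y - X @ w) ** 2))  # squared error of the least-squares fit
```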
What are a hypothesis (\(h\)) and the hypothesis space (\(H\))? How are they related to \(f(x)\)?
What is \(h(\mathbf{x})\)? How does it differ from \(f(\mathbf{x})\)?
(With a slight abuse of notation)
x | y |
---|---|
1.22 | 0.44 |
1.3 | 0.51 |
1.4 | 0.56 |
1.49 | 0.61 |
How many functions pass through all four of these points?
There could be infinitely many such functions.
[Figure labels: \(H\), \(f(\mathbf{x})\), \(h\)]
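To see why many functions fit the same four points, here is a minimal sketch. It fits a cubic through the table's points and then adds any multiple of a polynomial that is zero at every data point. The helper `candidate` and the constants `c` are hypothetical, chosen only for illustration.

```python
import numpy as np

x = np.array([1.22, 1.3, 1.4, 1.49])
y = np.array([0.44, 0.51, 0.56, 0.61])

# A cubic has four coefficients, so it can pass through four points exactly.
cubic = np.polyfit(x, y, deg=3)

def candidate(t, c):
    """Cubic interpolant plus c times a polynomial that is zero at every x[i]."""
    bump = np.prod([t - xi for xi in x], axis=0)
    return np.polyval(cubic, t) + c * bump

# Every choice of c gives a different function through the same four points.
for c in (0.0, 1.0, -50.0):
    print(c, np.max(np.abs(candidate(x, c) - y)))
```

Each value of `c` defines a different hypothesis, yet all of them agree on the data, so the data alone cannot single one out.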
x | y |
---|---|
1.22 | 0.44 |
1.3 | 0.51 |
1.4 | 0.56 |
1.49 | 0.61 |
2.17 | 1.18 |
-0.09 | 0.12 |
How many functions pass through all of these points?
Having more data points helps us choose a better function!
What is the difference between an analytical (closed-form) solution and an iterative solution?
What is a closed-form solution to a problem?
Sum the first 100 natural numbers
Iterative: \(1+2+3+\dots+100\)
Closed form: \( \frac{n(n+1)}{2}=\frac{100 \times (100+1)}{2}=5050 \)
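The two approaches can be sketched side by side in a few lines of Python; the variable names are illustrative.

```python
n = 100

# Iterative: add the numbers one at a time (n additions).
total_iterative = 0
for k in range(1, n + 1):
    total_iterative += k

# Closed form: a single formula evaluation, regardless of n.
total_closed_form = n * (n + 1) // 2

print(total_iterative, total_closed_form)  # 5050 5050
```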
[Figure. Source: Wikipedia]
What is the difference between setting \(f(x)=0\) and \( \nabla f(x) =0\)?
Consider a function \(f(x)=x^2-5x+4\)
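1. \(f(x)=x^2-5x+4 = 0\) gives us \(x=1\) and \(x=4\), the points where the curve crosses the \(x\)-axis (the roots).
2. \(\nabla f(x)=2x-5=0\) gives us \(x=2.5\), the point where the slope is zero (here, the minimum).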
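Both conditions can be checked numerically. The sketch below uses NumPy's polynomial helpers on the coefficients of \(f(x)=x^2-5x+4\); the variable names are illustrative.

```python
import numpy as np

# Coefficients of f(x) = x^2 - 5x + 4, highest power first.
f = np.array([1.0, -5.0, 4.0])

# Setting f(x) = 0 finds where the curve crosses the x-axis.
roots = np.sort(np.roots(f))

# Setting f'(x) = 0 finds where the curve is flat.
f_prime = np.polyder(f)              # coefficients of 2x - 5
stationary = np.roots(f_prime)[0]

print(roots)                          # approximately [1. 4.]
print(stationary)                     # approximately 2.5
print(np.polyval(f, stationary))      # approximately -2.25, the minimum value
```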
Let's see this geometrically by plotting the function on the interval \(0 \leq x \leq 5\).
We can't rely on plotting, because we don't know the range of \(x\) a priori!
What do we mean by the gradient or slope of a function?
\(f'(x) \approx \frac{f(x+\Delta x)-f(x)}{\Delta x}\)
Let's set \(\Delta x=0.1\)
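A minimal sketch of this finite-difference slope for \(f(x)=x^2-5x+4\), compared against the exact derivative \(2x-5\). The helper name `forward_difference` and the sample points are hypothetical.

```python
def f(x):
    return x**2 - 5*x + 4

def forward_difference(func, x, dx=0.1):
    """Approximate the slope of func at x with a step of size dx."""
    return (func(x + dx) - func(x)) / dx

# Compare the approximation with the exact derivative 2x - 5.
for x in (0.0, 2.5, 5.0):
    exact = 2*x - 5
    print(x, forward_difference(f, x), exact)
```

For this quadratic the approximation differs from the exact slope by \(\Delta x\), so a smaller step gives a closer estimate.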
Gradient Descent Playground
Move in exactly the opposite direction to the gradient to reach the minimum.
Gradient is your guide!
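A minimal sketch of this rule applied to \(f(x)=x^2-5x+4\) from the earlier slide. The starting point, step size, and number of steps are hypothetical choices for illustration.

```python
def grad_f(x):
    # Derivative of f(x) = x^2 - 5x + 4.
    return 2*x - 5

x = 0.0              # hypothetical starting point
learning_rate = 0.1  # hypothetical step size

for step in range(100):
    # Step against the gradient, i.e. downhill.
    x = x - learning_rate * grad_f(x)

print(x)  # close to 2.5, where the gradient is zero
```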
Gradient Descent in 2D
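The slide's figure is not available, so as an assumption this sketch uses the least-squares loss \(\tfrac{1}{2}\lVert\mathbf{X}\mathbf{w}-\mathbf{Y}\rVert^2\) for the three-sample table from the projection slides, which is a function of two weights. The starting point and step size are hypothetical.

```python
import numpy as np

X = np.array([[-2.0,  4.0],
              [ 2.5, -1.0],
              [ 0.5,  3.0]])
Y = np.array([-1.0, 1.0, 4.0])

def loss_gradient(w):
    # Gradient of 0.5 * ||X w - Y||^2 with respect to w.
    return X.T @ (X @ w - Y)

w = np.zeros(2)        # hypothetical starting point
learning_rate = 0.03   # hypothetical step size, small enough to converge here

for step in range(500):
    w = w - learning_rate * loss_gradient(w)

# Closed-form least-squares solution for comparison.
w_closed_form, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(w, w_closed_form)
```

The iterative answer approaches the closed-form one, which ties gradient descent back to the projection view.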