Machine Learning Foundations

Week-1 Revision

Arun Prakash A

Applications

Weather prediction

Chatbots, voice assistants: Alexa

Gaming: AlphaGo

Recommendation: Amazon

Automobiles and Robotics: autonomous cars

ML or Not to ML:

Use ML when the rules are not well defined or not known; when the rules are well defined and known, a hand-coded rule-based program suffices.

Data 

Data (without labels):
x1 = [3,5,4]
x2 = [3,4,5]
x3 = [4,2,1]
x4 = [6,7,8]
x5 = [1,2,3]
x6 = [1,1,1]
x7 = [1,2,0]

Data with labels:
x1 = [3,5,4], y = 0
x2 = [3,4,5], y = 1
x3 = [4,2,1], y = 0
x4 = [6,7,8], y = 1
x5 = [1,2,3], y = 1
x6 = [1,1,1], y = 1
x7 = [1,2,0], y = 0

Terminology: features \(x_j^i\), number of samples \(n\), labels (ground truth) \(y^i\)

Features \(x_j^i\): the \(j\)-th feature of the \(i\)-th sample; indexing starts from 1

Number of samples: \(n = 7\)

Labels (ground truth): \(y^i\); for example, \(y^2 = 1\)
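A minimal NumPy sketch (my own illustration, not from the slides) showing how this notation maps onto arrays; note that NumPy indexes from 0 while the slides index from 1:

```python
import numpy as np

# Rows are samples x^1..x^7; columns are features x_1, x_2, x_3
X = np.array([[3, 5, 4],
              [3, 4, 5],
              [4, 2, 1],
              [6, 7, 8],
              [1, 2, 3],
              [1, 1, 1],
              [1, 2, 0]])
y = np.array([0, 1, 0, 1, 1, 1, 0])  # labels y^1..y^7

n = X.shape[0]   # number of samples, n = 7
print(X[1, 2])   # x_3^2, the 3rd feature of the 2nd sample (0-based: row 1, col 2) -> 5
print(y[1])      # y^2 -> 1
```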

Train, Validation and Test Data

Total samples: 569

Train set: 80% of total ≈ 455

Validation set: 20% of training = 91

Test set: 20% of total ≈ 114
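A hedged sketch of this split using scikit-learn's train_test_split; the data below is a random stand-in with 569 samples, since the slides do not show the actual dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 569 samples, 3 features, binary labels
X = np.random.rand(569, 3)
y = np.random.randint(0, 2, size=569)

# First hold out 20% of the total as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Then carve 20% of the training data out as the validation set
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=0)

print(len(X_train) + len(X_val), len(X_val), len(X_test))  # 455 91 114
```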

Types

Supervised (data with labels): Classification, Regression

Unsupervised (data without labels): Density Estimation, Dimensionality Reduction

Classification. Output: discrete and finite. Loss: 0-1 loss
\[ \frac{1}{n} \sum_{i=1}^n \mathbf{1}(f(x^i) \neq y^i) \]

Regression. Output: continuous, and in general infinite. Loss: MSE (mean squared error)
\[ \frac{1}{n} \sum_{i=1}^n (f(x^i) - y^i)^2 \]

Dimensionality Reduction. Encoder \(f\) and decoder \(g\) (compressor and decompressor), where \(x \in \mathbb{R}^d\), \(f: \mathbb{R}^d \rightarrow \mathbb{R}^{d'}\), \(g: \mathbb{R}^{d'} \rightarrow \mathbb{R}^d\), and \(d' \ll d\). Loss: reconstruction error
\[ \frac{1}{n} \sum_{i=1}^n \lVert g(f(x^i)) - x^i \rVert^2 \]

Density Estimation. Estimate the PDF (e.g., its mean and variance). Loss: negative log-likelihood
\[ \frac{1}{n} \sum_{i=1}^n -\log P(x^i) \]
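A minimal NumPy sketch of these four losses (my own illustration, not from the slides); predictions, reconstructions, and density values are assumed to be precomputed arrays:

```python
import numpy as np

def zero_one_loss(y_pred, y_true):
    # Classification: fraction of misclassified samples
    return np.mean(y_pred != y_true)

def mse(y_pred, y_true):
    # Regression: mean squared error
    return np.mean((y_pred - y_true) ** 2)

def reconstruction_error(x_recon, x):
    # Dimensionality reduction: average squared error ||g(f(x^i)) - x^i||^2
    return np.mean(np.sum((x_recon - x) ** 2, axis=1))

def neg_log_likelihood(p):
    # Density estimation: average negative log-likelihood, where p[i] = P(x^i)
    return np.mean(-np.log(p))
```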
Linear Classification Model

Data with labels:
x1 = [3,5,4], y = 0
x2 = [3,4,5], y = 1
x3 = [2,1,4], y = 0
x4 = [6,7,8], y = 1
x5 = [1,2,3], y = 1
x6 = [1,1,1], y = 1
x7 = [1,2,0], y = 0

Model: \(f(x) = w_1x_1 + w_2x_2 + w_3x_3\)

\(w_1, w_2, w_3\) are the parameters (weights) of the model. The best values for the parameters are learned from the data.

Training
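The slides do not specify the training algorithm; as one hedged illustration, a linear classifier can be fit to the seven samples above with scikit-learn's Perceptron:

```python
import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[3, 5, 4], [3, 4, 5], [2, 1, 4], [6, 7, 8],
              [1, 2, 3], [1, 1, 1], [1, 2, 0]])
y = np.array([0, 1, 0, 1, 1, 1, 0])

clf = Perceptron(random_state=0)   # learns weights w (and a bias) from the data
clf.fit(X, y)
print(clf.coef_, clf.intercept_)   # learned parameters
print(clf.score(X, y))             # accuracy on the training data
```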

Prediction:

\(f(x)=1x_1+0.5x_2-1x_3\)

Given a new sample, \( x = [1,-1,1]\), predict the output.
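Worked out: \(f(x) = 1(1) + 0.5(-1) - 1(1) = -0.5\). Assuming the usual convention of thresholding at 0 (predict 1 if \(f(x) \ge 0\), else 0), the predicted label is 0.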

Any Questions?