(For Beginners)
Data Collection
Transformations
Training model
Testing model
It is alright if some words are alien to you
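As a first taste, here is a self-contained sketch of this pipeline on synthetic data; everything in it (the data, the model, the learning rate) is an illustrative placeholder rather than the course's own example:

```python
import torch
from torch import nn

# 1. Data collection: synthetic data, 100 samples with 4 features
X = torch.randn(100, 4)
y = (X.sum(dim=1) > 0).float().unsqueeze(1)   # binary labels

# 2. Transformations: normalize each feature
X = (X - X.mean(dim=0)) / X.std(dim=0)

# 3. Training a model (logistic regression)
model = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCELoss()
for _ in range(100):
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

# 4. Testing the model (here, on the training data for brevity)
accuracy = ((model(X) > 0.5).float() == y).float().mean()
print(f"accuracy: {accuracy:.2f}")
```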
Matrix (Tensor) of dim \( m \times n \)
Image of size \(height \times width\)
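For instance (an illustrative snippet, not from the slides), a grayscale image is just such a 2-D tensor:

```python
import torch

# A grayscale "image" as a 2-D tensor (matrix) of shape height x width;
# the 28x28 size and random pixel values are placeholders.
height, width = 28, 28
img = torch.rand(height, width)
print(img.shape)   # torch.Size([28, 28])
print(img[0, 0])   # intensity of the top-left pixel
```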
" The course contents are well organized"
Tokenize
[The, course, contents, are, well, organized]
Numericalize
{The: 2, course: 4, contents: 1, are: 3, well: 5, organized: 6}
Embedding
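In plain Python, the first two steps might look as follows (the id mapping mirrors the slide; the embedding step is picked up again with the embedding-matrix slide later):

```python
sentence = "The course contents are well organized"

# Tokenize: split the sentence into words
tokens = sentence.split()   # ['The', 'course', 'contents', 'are', 'well', 'organized']

# Numericalize: map each token to an integer id (ids taken from the slide)
vocab = {'The': 2, 'course': 4, 'contents': 1, 'are': 3, 'well': 5, 'organized': 6}
ids = [vocab[t] for t in tokens]
print(ids)   # [2, 4, 1, 3, 5, 6]
```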
Ability to create \(n\)-dimensional arrays, also called tensors
Scalar
(0 dim Tensor)
Vector
(1 dim Tensor)
Matrix
(2 dim Tensor)
Stacked Matrix
(3 dim Tensor)
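One tensor of each dimensionality, as a quick illustration:

```python
import torch

scalar = torch.tensor(3.14)                  # 0-dim tensor
vector = torch.tensor([1.0, 2.0, 3.0])       # 1-dim tensor
matrix = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0]])          # 2-dim tensor
stacked = torch.stack([matrix, matrix])      # 3-dim tensor: 2 x 2 x 2

print(scalar.dim(), vector.dim(), matrix.dim(), stacked.dim())   # 0 1 2 3
```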
Provide efficient ways to manipulate them (accessing elements, mathematical operations, ...), sketched in code below
flatten()
reshape(1, 9)
transpose(0, 1)
cat(dim=1)
sum() → Scalar (dim=0)
sum(dim=0) → Vector (dim=1)
sum(dim=1) → Vector (dim=1)
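The operations above, applied to a small 3x3 example (results noted in the comments):

```python
import torch

t = torch.arange(9).reshape(3, 3)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

t.flatten()                # 1-dim tensor with 9 elements
t.reshape(1, 9)            # 1 x 9 matrix
t.transpose(0, 1)          # swap rows and columns
torch.cat((t, t), dim=1)   # 3 x 6: concatenated side by side

t.sum()                    # scalar: 36
t.sum(dim=0)               # column sums, a vector: [9, 12, 15]
t.sum(dim=1)               # row sums, a vector: [3, 12, 21]
```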
Reduction operations: mean; accessing elements
sigmoid()
softmax(dim=0) → across rows
softmax(dim=1) → across columns
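For example, on a 2x2 tensor (note which axis each softmax normalizes):

```python
import torch

t = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])

torch.sigmoid(t)          # elementwise, squashes values into (0, 1)
torch.softmax(t, dim=0)   # across rows: each column sums to 1
torch.softmax(t, dim=1)   # across columns: each row sums to 1
```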
Core: tensors
JIT
nn
Optim
multiprocessing
quantization
sparse
ONNX
Distributed
fast.ai
Detectron2
Horovod
Flair
AllenNLP
torchvision
BoTorch
Glow
Lightning
Skorch
Creating Tensors in PyTorch: Switch to Colab
Two configurations out of millions of possibilities
The parameter \(\mathbf{W}\) is randomly initialized
Compute the gradient of the loss w.r.t. \(\mathbf{W}\), \(\nabla_{\mathbf{W}} \mathscr{L}\)
Update rule: \(\mathbf{W} \leftarrow \mathbf{W} - \eta \, \nabla_{\mathbf{W}} \mathscr{L}\)
The non-linear function \(f\) in neurons is called the activation function
Check the performance with a set of criteria, and iteratively update the parameters to improve it
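A minimal sketch of this loop for a linear model, with synthetic data and an arbitrary learning rate of 0.1 (both are assumptions for illustration):

```python
import torch

# Synthetic data: 100 samples, 3 features, known "true" weights
x = torch.randn(100, 3)
y_true = x @ torch.tensor([[1.0], [-2.0], [0.5]])

W = torch.randn(3, 1, requires_grad=True)    # W is randomly initialized
for step in range(200):
    loss = ((x @ W - y_true) ** 2).mean()    # performance criterion (MSE)
    loss.backward()                          # gradient of loss w.r.t. W
    with torch.no_grad():
        W -= 0.1 * W.grad                    # update rule: W <- W - eta * grad
        W.grad.zero_()                       # reset for the next iteration
```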
Wait, the image is not 1D, so how do we feed it to a neuron as input?
Note: Input elements are real-valued in general.
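The usual answer, sketched below, is to flatten the 2-D image into a 1-D vector before feeding it to the neurons:

```python
import torch

img = torch.rand(28, 28)   # a 2-D image (placeholder values)
x = img.flatten()          # a 1-D input vector with 28 * 28 = 784 elements
print(x.shape)             # torch.Size([784])
```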
[Figure: a feed-forward network with inputs \(x_1, x_2, \dots, x_n\), pre-activations \(a_1, a_2, a_3\), hidden layers \(h_1, h_2\), output \(h_L = \hat{y} = \hat{f}(x)\), and parameters \(W_1, b_1, W_2, b_2, W_3, b_3\)]
\(a_i(x) = b_i +W_ih_{i-1}(x)\)
\(h_i(x) = g(a_i(x))\)
\(f(x) = h_L(x)=O(a_L(x))\)
\(a_i = b_i +W_ih_{i-1}\)
\(h_i = g(a_i)\)
\(f(x) = h_L=O(a_L)\)
\(\hat y_i = \hat{f}(x_i) = O(W_3 \, g(W_2 \, g(W_1 x_i + b_1) + b_2) + b_3)\)
\(\theta = \{W_1, \dots, W_L, b_1, \dots, b_L\}\) with \(L = 3\)
\(\min_{\theta} \cfrac{1}{N} \displaystyle \sum_{i=1}^N \sum_{j=1}^k (\hat y_{ij} - y_{ij})^2\)
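As a hedged sketch in torch.nn, the three-layer network above might be written as follows, with \(g\) = sigmoid and \(O\) = softmax; the layer sizes (3 inputs, 3 hidden units, 2 outputs) are taken from the worked example that follows:

```python
import torch
from torch import nn

class MLP(nn.Module):
    def __init__(self, n=3, hidden=3, k=2):
        super().__init__()
        self.layer1 = nn.Linear(n, hidden)        # a_1 = W_1 x + b_1
        self.layer2 = nn.Linear(hidden, hidden)   # a_2 = W_2 h_1 + b_2
        self.layer3 = nn.Linear(hidden, k)        # a_3 = W_3 h_2 + b_3

    def forward(self, x):
        h1 = torch.sigmoid(self.layer1(x))        # h_i = g(a_i)
        h2 = torch.sigmoid(self.layer2(h1))
        return torch.softmax(self.layer3(h2), dim=-1)   # h_L = O(a_L)

y_hat = MLP()(torch.tensor([1.5, 2.5, 3.0]))
print(y_hat)   # two probabilities summing to 1
```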
"Forward Pass"

Input: \(x = [1.5, 2.5, 3]\), target: \(y = [1, 0]\)
Biases: \(b_1 = [0.01, 0.02, 0.03]\), \(b_2 = [0.01, 0.02, 0.03]\), \(b_3 = [0.01, 0.02]\)
(the weight matrices \(W_1, W_2, W_3\) were given as figures on the slide)

\(a_1 = W_1 x + b_1 = [0.36, 0.37, 0.38]\)
\(h_1 = \text{sigmoid}(a_1) = [0.589, 0.591, 0.593]\)
\(a_2 = W_2 h_1 + b_2 = [0.054, 0.064, 0.074]\)
\(h_2 = \text{sigmoid}(a_2) = [0.513, 0.516, 0.518]\)
\(a_3 = W_3 h_2 + b_3 = [1.558, 1.568]\)
\(\hat{y} = h_3 = \text{softmax}(a_3) = [0.497, 0.502]\)

"Binary Cross Entropy Loss"
\(\mathscr{L}(\theta) = -\frac{1}{N} \sum_{i=1}^N \left(y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)\right) = 0.6981\)
Convolution: Image ∗ filter/kernel → Activation Map
Max Pooling
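A hedged illustration of one convolution and pooling step on a fake grayscale image (all sizes are arbitrary choices):

```python
import torch
from torch import nn

img = torch.rand(1, 1, 28, 28)   # batch of 1, 1 channel, 28 x 28 image

conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)   # 8 filters/kernels
amap = conv(img)                 # activation map: 1 x 8 x 26 x 26
pooled = nn.MaxPool2d(2)(amap)   # max pooling:    1 x 8 x 13 x 13

print(amap.shape, pooled.shape)
```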
Sequence
"Text is something different from images"
Tokenizer
[Text, is, something, different, from, images]
Numericalize
Text is an ordered sequence
Embedding Matrix
"Text:10"
Embedding
"Is:2"
Embedding
"Images:5"
Embedding
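A minimal sketch of this lookup with nn.Embedding; the vocabulary size of 20 and embedding dimension of 4 are arbitrary, while the token ids come from the slide:

```python
import torch
from torch import nn

embedding = nn.Embedding(num_embeddings=20, embedding_dim=4)

ids = torch.tensor([10, 2, 5])   # "Text", "is", "images" (ids from the slide)
vectors = embedding(ids)         # one 4-dim embedding row per token
print(vectors.shape)             # torch.Size([3, 4])
```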
Text is an ordered sequence
We want the model to capture this dependency!