Figure: feedforward neural network with inputs \(x_1, x_2, \dots, x_n\), pre-activations \(a_1, a_2, a_3\), activations \(h_1, h_2\), output \(h_L=\hat {y} = \hat{f}(x)\), and parameters \(W_1, b_1, W_2, b_2, W_3, b_3\)
\(a_i(x) = b_i +W_ih_{i-1}(x)\)
\(h_i(x) = g(a_i(x))\)
\(f(x) = h_L(x)=O(a_L(x))\)
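The three equations above can be sketched as a short loop. This is only an illustration under stated assumptions: the sigmoid for \(g\), the identity for \(O\), the layer sizes, and the names `forward`, `weights`, and `biases` are choices made here, not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, an assumed choice for g."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases, g=sigmoid, output_fn=lambda a: a):
    """Compute h_L for one input x.

    weights[i - 1] and biases[i - 1] hold W_i and b_i;
    output_fn plays the role of O (identity by default, an assumption).
    """
    h = x  # h_0 is the input itself
    L = len(weights)
    for i in range(1, L + 1):
        a = biases[i - 1] + weights[i - 1] @ h   # a_i = b_i + W_i h_{i-1}
        h = output_fn(a) if i == L else g(a)     # h_i = g(a_i), h_L = O(a_L)
    return h

rng = np.random.default_rng(0)
sizes = [4, 3, 3, 2]  # hypothetical: n = 4 inputs, two hidden layers, k = 2 outputs
weights = [rng.normal(size=(sizes[i + 1], sizes[i])) for i in range(3)]
biases = [np.zeros(sizes[i + 1]) for i in range(3)]
x = rng.normal(size=4)
y_hat = forward(x, weights, biases)
```

With three weight matrices the loop runs for \(L = 3\) layers, so the result has one entry per output unit.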
\(a_i = b_i +W_ih_{i-1}\)
\(h_i = g(a_i)\)
\(f(x) = h_L=O(a_L)\)
\(\hat y_i = \hat{f}(x_i) = O(W_3 g(W_2 g(W_1 x_i + b_1) + b_2) + b_3)\)
\(\theta = W_1, \dots, W_L, b_1, b_2, \dots, b_L \quad (L = 3)\)
\(\min_{\theta} \cfrac {1}{N} \displaystyle \sum_{i=1}^N \sum_{j=1}^k (\hat y_{ij} - y_{ij})^2\)
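As a small worked instance of this objective, the sketch below evaluates the double sum for \(N = 2\) examples with \(k = 3\) outputs each. The numbers and the function name `squared_error_loss` are made up for illustration.

```python
import numpy as np

def squared_error_loss(y_hat, y):
    """(1/N) * sum_i sum_j (y_hat_ij - y_ij)^2 for arrays of shape (N, k)."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    N = y.shape[0]
    return np.sum((y_hat - y) ** 2) / N

# Hypothetical targets and predictions: N = 2 examples, k = 3 outputs.
y = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
y_hat = np.array([[1.0, 2.0, 4.0],
                  [4.0, 7.0, 6.0]])
loss = squared_error_loss(y_hat, y)  # (1^2 + 2^2) / 2 = 2.5
```

Note that the sum runs over every output of every example and is divided by \(N\) only, not by \(N k\).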
\(t \gets 0;\)
\(max\_iterations \gets 1000; \)
\(Initialize\) \(w_0,b_0;\)
while \(t\)++ \(< max\_iterations\) do
    \(w_{t+1} \gets w_t - \eta \nabla w_t\)
    \(b_{t+1} \gets b_t - \eta \nabla b_t\)
end
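The update rule above can be tried on a toy problem. The sketch below assumes a linear model \(\hat y = wx + b\) with mean squared error, a learning rate \(\eta = 0.1\), and made-up data. None of these choices come from the slides; they only make the loop concrete.

```python
import numpy as np

def gradients(w, b, x, y):
    """Gradients of the mean squared error of y_hat = w*x + b (assumed toy model)."""
    err = (w * x + b) - y
    return 2.0 * np.mean(err * x), 2.0 * np.mean(err)

def gradient_descent(x, y, eta=0.1, max_iterations=1000):
    w, b = 0.0, 0.0                      # Initialize w_0, b_0
    for t in range(max_iterations):      # while t++ < max_iterations
        grad_w, grad_b = gradients(w, b, x, y)
        w = w - eta * grad_w             # w_{t+1} <- w_t - eta * grad w_t
        b = b - eta * grad_b             # b_{t+1} <- b_t - eta * grad b_t
    return w, b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0                        # made-up data on the line y = 2x + 1
w, b = gradient_descent(x, y)
```

On this data the iterates approach \(w = 2\), \(b = 1\). A much larger \(\eta\) would make the same loop diverge, so the step size is part of the assumption.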
\(t \gets 0;\)
\(max\_iterations \gets 1000; \)
\(Initialize\) \(\theta_0 = [w_0,b_0];\)
while \(t\)++ \(< max\_iterations\) do
    \(\theta_{t+1} \gets \theta_t - \eta \nabla \theta_t\)
end

\(t \gets 0;\)
\(max\_iterations \gets 1000; \)
\(Initialize\) \(\theta_0 = [W_1^0,...,W_L^0,b_1^0,...,b_L^0];\)
while \(t\)++ \(< max\_iterations\) do
    \(\theta_{t+1} \gets \theta_t - \eta \nabla \theta_t\)
end
\[
\nabla \theta =
\begin{bmatrix}
\frac {\partial \mathscr{L}(\theta)}{\partial W_{111}} & \dots & \frac {\partial \mathscr{L}(\theta)}{\partial W_{11n}} & \frac {\partial \mathscr{L}(\theta)}{\partial W_{211}} & \dots & \frac {\partial \mathscr{L}(\theta)}{\partial W_{21n}} & \dots & \frac {\partial \mathscr{L}(\theta)}{\partial W_{L,11}} & \dots & \frac {\partial \mathscr{L}(\theta)}{\partial W_{L,1k}} & \frac {\partial \mathscr{L}(\theta)}{\partial b_{11}} & \dots & \frac {\partial \mathscr{L}(\theta)}{\partial b_{L1}} \\
\frac {\partial \mathscr{L}(\theta)}{\partial W_{121}} & \dots & \frac {\partial \mathscr{L}(\theta)}{\partial W_{12n}} & \frac {\partial \mathscr{L}(\theta)}{\partial W_{221}} & \dots & \frac {\partial \mathscr{L}(\theta)}{\partial W_{22n}} & \dots & \frac {\partial \mathscr{L}(\theta)}{\partial W_{L,21}} & \dots & \frac {\partial \mathscr{L}(\theta)}{\partial W_{L,2k}} & \frac {\partial \mathscr{L}(\theta)}{\partial b_{12}} & \dots & \frac {\partial \mathscr{L}(\theta)}{\partial b_{L2}} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
\frac {\partial \mathscr{L}(\theta)}{\partial W_{1n1}} & \dots & \frac {\partial \mathscr{L}(\theta)}{\partial W_{1nn}} & \frac {\partial \mathscr{L}(\theta)}{\partial W_{2n1}} & \dots & \frac {\partial \mathscr{L}(\theta)}{\partial W_{2nn}} & \dots & \frac {\partial \mathscr{L}(\theta)}{\partial W_{L,n1}} & \dots & \frac {\partial \mathscr{L}(\theta)}{\partial W_{L,nk}} & \frac {\partial \mathscr{L}(\theta)}{\partial b_{1n}} & \dots & \frac {\partial \mathscr{L}(\theta)}{\partial b_{Lk}}
\end{bmatrix}
\]
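To make concrete that the gradient has one partial derivative per weight and per bias, the sketch below estimates every entry by central finite differences. It is an illustration, not the method used in the slides: the flat packing of \(\theta\), the tanh hidden activation, the linear output, the layer sizes, and all function names are assumptions, and finite differences serve here only as a simple check.

```python
import numpy as np

def predict(theta, x, shapes):
    """Forward pass with every W_i and b_i unpacked from the flat vector theta."""
    h, pos = x, 0
    for layer, (rows, cols) in enumerate(shapes):
        W = theta[pos:pos + rows * cols].reshape(rows, cols)
        pos += rows * cols
        b = theta[pos:pos + rows]
        pos += rows
        a = b + W @ h
        # tanh is an assumed hidden activation g; the output layer is linear.
        h = a if layer == len(shapes) - 1 else np.tanh(a)
    return h

def loss(theta, x, y, shapes):
    """Squared error for a single training example."""
    return np.sum((predict(theta, x, shapes) - y) ** 2)

def numerical_gradient(f, theta, eps=1e-6):
    """Central-difference estimate of dL/dtheta_p for every entry p of theta."""
    grad = np.zeros_like(theta)
    for p in range(theta.size):
        step = np.zeros_like(theta)
        step[p] = eps
        grad[p] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

shapes = [(3, 2), (3, 3), (2, 3)]   # hypothetical (rows, cols) of W_1, W_2, W_3
n_params = sum(r * c + r for r, c in shapes)
rng = np.random.default_rng(0)
theta = rng.normal(size=n_params)
x, y = np.array([0.5, -1.0]), np.array([1.0, 0.0])
grad = numerical_gradient(lambda th: loss(th, x, y, shapes), theta)
```

The resulting `grad` has the same length as `theta`, so a single update \(\theta_{t+1} \gets \theta_t - \eta \nabla \theta_t\) touches every weight and bias at once. This estimate needs two loss evaluations per parameter, which is why it is practical only for small networks.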
\(\mathscr {L}(\theta) = \cfrac {1}{N} \displaystyle \sum_{i=1}^N \sum_{j=1}^k (\hat y_{ij} - y_{ij})^2\)
Figure: neural network with \(L - 1\) hidden layers. The input \(x_i\) contains features such as isActor Damon and isDirector Nolan; the outputs are the imdb Rating, Critics Rating, and RT Rating, with targets \(y_j = \{7.5, 8.2, 7.7\}\).
Neural network with \(L - 1\) hidden layers
\(y = [1\ 0\ 0\ 0]\)
\(\mathscr {L}(\theta) = - \displaystyle \sum_{c=1}^k y_c \log \hat y_c \)
\(\hat y_\ell = [O(W_3 g(W_2 g(W_1 x + b_1) + b_2) + b_3)]_\ell\)
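A minimal sketch of this loss for the one-hot target \(y = [1\ 0\ 0\ 0]\) follows. It assumes the output function \(O\) is the softmax, which is the activation these slides pair with cross entropy. The pre-activation values and the function names are made up for illustration.

```python
import numpy as np

def softmax(a):
    """Softmax as the assumed output function O; subtracting max(a) avoids overflow."""
    e = np.exp(a - np.max(a))
    return e / np.sum(e)

def cross_entropy(y_hat, y):
    """-sum_c y_c * log(y_hat_c); assumes every y_hat_c is strictly positive."""
    return -np.sum(y * np.log(y_hat))

a_L = np.array([2.0, 1.0, 0.5, 0.1])   # made-up output pre-activations a_L
y = np.array([1.0, 0.0, 0.0, 0.0])     # one-hot true label
y_hat = softmax(a_L)
loss = cross_entropy(y_hat, y)
```

Because \(y\) is one-hot, only the term for the true class survives, so the loss reduces to \(-\log \hat y_\ell\) where \(\ell\) is the index of the correct class.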
| Outputs | Real Values | Probabilities |
|---|---|---|
| Output Activation | Linear | Softmax |
| Loss Function | Squared Error | Cross Entropy |