\(\text{Expected error} = \text{Bias}^2 + \text{Variance} + \sigma^2 \;(\text{irreducible error})\)

*(Figure: error vs. model complexity; as complexity grows, bias falls while variance rises, so the total error is U-shaped.)*
Assume the labels are noisy: \(y = f(x) + \varepsilon\), with \(\varepsilon \sim \mathcal{N}(0, \sigma^2)\). The quantity we care about is the true error \(E[(\hat f(x) - f(x))^2]\), but all we can measure is the observable error \(E[(\hat y - y)^2]\). Since \(\hat y - y = \hat f(x) - f(x) - \varepsilon\),

\(E[(\hat y - y)^2] = E[(\hat f(x) - f(x))^2] - 2E[\varepsilon(\hat f(x) - f(x))] + E[\varepsilon^2]\)

\(\therefore E[(\hat f(x) - f(x))^2] = E[(\hat y - y)^2] - E[\varepsilon^2] + 2E[\varepsilon(\hat f(x) - f(x))]\)

The same holds pointwise for each example \(i\), with \(\varepsilon_i\), \(\hat y_i\), \(y_i\) in place of \(\varepsilon\), \(\hat y\), \(y\). Because \(E[\varepsilon] = 0\), the last term is the covariance \(\operatorname{Cov}(\varepsilon, \hat f(x) - f(x))\). On held-out data, \(\varepsilon\) is independent of \(\hat f(x) - f(x)\), so this covariance is zero and the true error is simply the observable error minus \(\sigma^2\). On training data, \(\hat f\) was fit to the noise, so the term need not vanish.
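A quick Monte Carlo check of this identity on held-out data; the true function, the model \(\hat f\), and \(\sigma\) below are made-up for the demo:

```python
import numpy as np

# Check E[(f_hat(x) - f(x))^2] = E[(y_hat - y)^2] - sigma^2 on held-out data,
# where the noise is independent of the model.
rng = np.random.default_rng(0)
sigma = 0.5

f = np.sin                          # "true" function (assumed for this demo)
f_hat = lambda x: x - x**3 / 6      # an imperfect model of it

x = rng.uniform(-2, 2, 200_000)
y = f(x) + rng.normal(0, sigma, x.shape)       # noisy labels y = f(x) + eps

observable = np.mean((f_hat(x) - y) ** 2)      # E[(y_hat - y)^2]
true_error = np.mean((f_hat(x) - f(x)) ** 2)   # E[(f_hat(x) - f(x))^2]
print(observable - sigma**2, true_error)       # the two nearly match
```

Subtracting \(\sigma^2\) from the observable error recovers the true error, because the covariance term vanishes on held-out points.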
\(H = Q \Lambda Q^T\) [\(Q\) is orthogonal, \(QQ^T = Q^TQ = \mathbb I\)]
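This decomposition is easy to verify numerically: for a symmetric matrix, `np.linalg.eigh` returns exactly the \(Q\) and \(\Lambda\) above (the matrix here is a random symmetric stand-in for \(H\)):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
H = (A + A.T) / 2                    # a random symmetric "Hessian"

lam, Q = np.linalg.eigh(H)           # eigenvalues and orthonormal eigenvectors
Lambda = np.diag(lam)

print(np.allclose(Q @ Q.T, np.eye(4)))    # Q is orthogonal
print(np.allclose(Q @ Lambda @ Q.T, H))   # H = Q Lambda Q^T
```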
*(Figure: contours of the objective in the \((w_1, w_2)\) plane, showing the unregularized minimizer \(w^*\) and the regularized solution \(\tilde w\).)*
*(Figure: data augmentation. From a given training image with label = 2, we generate new examples that all keep label = 2:)*

- rotated by \(20^\circ\)
- rotated by \(65^\circ\)
- shifted vertically
- shifted horizontally
- blurred
- some pixels changed
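A minimal sketch of these augmentations, assuming `scipy` is available and using a random array as a stand-in for the digit image:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
img = rng.random((28, 28))      # stand-in for a 28x28 digit image
label = 2

augmented = [
    ndimage.rotate(img, 20, reshape=False),   # rotated by 20 degrees
    ndimage.rotate(img, 65, reshape=False),   # rotated by 65 degrees
    ndimage.shift(img, (3, 0)),               # shifted vertically
    ndimage.shift(img, (0, 3)),               # shifted horizontally
    ndimage.gaussian_filter(img, sigma=1.0),  # blurred
]

# "changed some pixels": overwrite a few random pixels
noisy = img.copy()
idx = rng.integers(0, 28, size=(10, 2))
noisy[idx[:, 0], idx[:, 1]] = rng.random(10)
augmented.append(noisy)

labels = [label] * len(augmented)   # every augmented copy keeps label = 2
```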
*(Figure: noise injection. Each input \(x_i\) is replaced by a noisy copy before being fed to the network: \(x_1 + \varepsilon_1,\ x_2 + \varepsilon_2,\ \dots,\ x_k + \varepsilon_k,\ \dots,\ x_n + \varepsilon_n\).)*
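Noise injection is a one-liner per batch; the noise scale \(\sigma = 0.1\) below is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((64, 10))              # a batch of 64 inputs with 10 features
eps = rng.normal(0, 0.1, x.shape)     # small Gaussian noise
x_noisy = x + eps                     # feed x_i + eps_i to the network instead of x_i
```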
For label = 2, the true distribution \(p\) is the one-hot vector:

| \(p_0\) | \(p_1\) | \(p_2\) | \(p_3\) | \(p_4\) | \(p_5\) | \(p_6\) | \(p_7\) | \(p_8\) | \(p_9\) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

minimize: \(-\displaystyle \sum_{i=0}^{9} p_i \log q_i\)
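For a one-hot \(p\), the sum collapses to \(-\log q_2\). A small sketch, with made-up logits for the predicted distribution \(q\):

```python
import numpy as np

p = np.zeros(10)
p[2] = 1.0                                  # one-hot target for label = 2

logits = np.arange(10, dtype=float)         # arbitrary scores for the demo
q = np.exp(logits) / np.exp(logits).sum()   # softmax predictions

cross_entropy = -np.sum(p * np.log(q))      # equals -log q_2 for one-hot p
print(cross_entropy)
```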
*(Figure: early stopping. Training error keeps falling with the number of steps, but validation error stops improving at step \(k-p\); if validation error has not improved for \(p\) steps, stop at step \(k\) and return the model from step \(k-p\).)*
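The stopping rule can be sketched as follows; the validation curve here is a made-up function that falls and then rises:

```python
# Early stopping with patience p: stop when the validation error has not
# improved for p consecutive steps, and keep the best checkpoint.
def val_error(step):
    return (step - 30) ** 2 / 1000 + 0.1    # assumed curve: minimum at step 30

p = 5
best_err, best_step = float("inf"), 0
for step in range(1000):
    err = val_error(step)          # in practice: train one step, then evaluate
    if err < best_err:
        best_err, best_step = err, step     # checkpoint the model here
    elif step - best_step >= p:
        break                      # no improvement for p steps: stop

print(best_step, step)             # best checkpoint at 30, stopped at 35
```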
*(Figure: an ensemble. The inputs \(x_1, x_2, x_3, x_4\) are fed to three different models, Logistic Regression, SVM, and Naive Bayes, which produce predictions \(y_{lr}\), \(y_{svm}\), \(y_{nb}\); these are combined into \(y_{final}\).)*
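A sketch of the combination step, using trivial stand-in predictors in place of trained Logistic Regression, SVM, and Naive Bayes models, and majority voting (one common choice) to form \(y_{final}\):

```python
import numpy as np

def lr_predict(x):  return (x.sum(axis=1) > 2.0).astype(int)  # stand-in for logistic regression
def svm_predict(x): return (x[:, 0] > 0.5).astype(int)        # stand-in for an SVM
def nb_predict(x):  return (x[:, 1] > 0.5).astype(int)        # stand-in for naive Bayes

rng = np.random.default_rng(0)
x = rng.random((8, 4))              # 8 examples with features x_1..x_4

votes = np.stack([lr_predict(x), svm_predict(x), nb_predict(x)])
y_final = (votes.sum(axis=0) >= 2).astype(int)   # majority vote of the three models
```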
*(Figure: bagging. Three Logistic Regression models, each trained on a different resample of the data, produce \(y_{lr1}\), \(y_{lr2}\), \(y_{lr3}\), which are combined into \(y_{final}\).)*
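The resampling step can be sketched as follows (bootstrap sampling with replacement; the dataset is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.random((n, 4))
y = (X.sum(axis=1) > 2).astype(int)

# One bootstrap resample per model: draw n points with replacement.
bags = []
for _ in range(3):
    idx = rng.integers(0, n, size=n)
    bags.append((X[idx], y[idx]))

# Each (X_b, y_b) pair would train one logistic regression;
# their predictions are then averaged or voted into y_final.
```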
Feel free to play with the loss landscape: https://losslandscape.com/explorer
Data \(x_i\)
1. Data Augmentation
2. Noise Injection
Architecture choice \(f(\cdot)\)
1. Dropout
2. Skip connections (CNN)
3. Weight sharing
4. Pooling
Penalize cost \(\mathscr{L}(\cdot)\)
1. \(L_1\)
2. \(L_2\)
Optimizer \(\nabla\)
1. SGD
2. Early stopping
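As an example of the penalty family, here is plain gradient descent on squared error with an \(L_2\) term added to the gradient; the data, \(\alpha\), and the learning rate are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 50)

alpha, lr = 0.1, 0.05                   # regularization strength and step size (assumed)
w = np.zeros(3)
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)   # gradient of the squared error
    grad += alpha * w                   # L2 penalty (alpha/2)*||w||^2 adds alpha*w
    w -= lr * grad
```

An \(L_1\) penalty would instead add `alpha * np.sign(w)` to the gradient, pushing weights to exactly zero.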
Explicit
1. Data Augmentation
2. Noise Injection
3. Dropout
4. Skip connections (CNN)
5. Weight sharing
6. Pooling
7. \(L_1\), \(L_2\)

Implicit
1. SGD
2. Large initial learning rate
3. Small initial learning rate
4. Early stopping
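Dropout, listed above, is simple to sketch. This uses "inverted" dropout, which rescales the kept activations so their expectation is unchanged; the keep probability of 0.8 is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.random(10)                  # activations h_i of one hidden layer

p_keep = 0.8
mask = rng.random(10) < p_keep      # drop each unit independently
h_dropout = h * mask / p_keep       # rescale so E[h_dropout] = h
```

At test time nothing is dropped; the rescaling during training makes the two regimes match in expectation.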