Note: in this workshop we will only briefly cover the core concepts, so that one can sail through the Hugging Face documentation.
For more details, you may read the lectures by Prof. Mitesh Khapra here (all lecture recordings will be made available on YouTube) or watch the lectures by Andrej Karpathy.
Specific Task Helpers: n-gram models
Task-agnostic Feature Learners: word2vec, context modelling
Transfer Learning for NLP: ELMo, GPT, BERT (pre-train, fine-tune)
General Language Models: GPT-3, InstructGPT (emergent abilities)
[Diagram: Transformer encoder-decoder. The Source Input Block feeds the Encoder (Multi-Head Attention → Add&Norm → Feed-forward NN → Add&Norm); the Target Input Block feeds the Decoder (Multi-Head Masked Attention → Add&Norm → Multi-Head Cross-Attention → Add&Norm → Feed-forward NN → Add&Norm), which produces the Output Block (tied).]
I am reading a book (source, English)
Naan oru puthagathai padiththu kondirukiren (target, Tamil: "I am reading a book")
Source Input Block / Target Input Block: Tokenizer → Token Ids → embeddings

Tokenizer:
"I am reading a book" → ["i", "am", "reading", "a", "book"]
Contains:
Normalizer: Lowercase (I → i)
Pre-tokenizer: Whitespace
Tokenization algorithm: BPE

Token Ids:
["i", "am", "reading", "a", "book"] → [i:2, am:8, reading:75, a:4, book:100]
With special tokens: [[BOS]:1, i:2, am:8, reading:75, a:4, book:100, [EOS]:3]
I am reading a book
Tokenizer → Token Ids → embeddings → position embeddings
[[BOS]:1, i:2, am:8, reading:75, a:4, book:100, [EOS]:3]
Source Input Block
Embedding for each input token
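A sketch of the "Embedding for each input token" step in PyTorch; the vocabulary size, model width, and learned positional embeddings are assumptions for illustration:

```python
import torch
import torch.nn as nn

vocab_size, max_len, d_model = 30000, 512, 768    # assumed sizes

token_emb = nn.Embedding(vocab_size, d_model)      # one vector per token id
pos_emb = nn.Embedding(max_len, d_model)           # one vector per position (learned variant)

# Token ids for "[BOS] i am reading a book [EOS]" from the toy vocabulary above
ids = torch.tensor([[1, 2, 8, 75, 4, 100, 3]])
positions = torch.arange(ids.size(1)).unsqueeze(0)  # positions 0..6

# Input to the first encoder block: token embedding + position embedding
x = token_emb(ids) + pos_emb(positions)
print(x.shape)  # torch.Size([1, 7, 768])
```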
[Diagram: the same encoder-decoder stack; cross-attention lets each target token (Naan, puthakathai, padithtu, kondirukiren) attend over the source tokens (I, am, reading, a, book).]
Position Encoding
Attention
Normalization
Activation
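A sketch of the Attention component (scaled dot-product attention, the core of Multi-Head Attention), assuming PyTorch; the multi-head variant just runs this in parallel over several projected subspaces:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, seq_len, d_k) tensors; returns the attention-weighted values."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # query-key similarities
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # e.g. causal mask in the decoder
    weights = torch.softmax(scores, dim=-1)                    # attention distribution over positions
    return weights @ v

# Toy self-attention: 1 sentence, 7 tokens, 64-dimensional projections (assumed sizes)
x = torch.randn(1, 7, 64)
out = scaled_dot_product_attention(x, x, x)   # q = k = v = x
print(out.shape)  # torch.Size([1, 7, 64])
```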
Input text → Predict the class/sentiment
Input text → Summarize
Question → Answer
Encoder-only stack: Input text → [Multi-Head Attention → Add&Norm → Feed-forward NN → Add&Norm]
Input \(x_1, <mask>, \cdots, x_{T}\) → predict the masked token, \(P(x_2 = ?)\)

Decoder-only stack: [Multi-Head Masked Attention → Add&Norm → Feed-forward NN → Add&Norm]
Input \(x_1, x_2, \cdots, x_{i-1}\) → predict the next token, \(P(x_i)\)
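A small sketch (PyTorch, assumed sequence length) contrasting the two attention patterns: the encoder lets every token see every other token, while the decoder's causal mask lets position i see only earlier positions:

```python
import torch

T = 5  # sequence length (assumed)

# Encoder-style (bidirectional) attention: every position may attend to every position.
full_mask = torch.ones(T, T)

# Decoder-style (causal) attention: position i attends only to positions <= i,
# which is what allows training on next-token prediction P(x_i | x_1, ..., x_{i-1}).
causal_mask = torch.tril(torch.ones(T, T))
print(causal_mask)
```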
Encoder-decoder stack: Encoder [Multi-Head Attention → Add&Norm → Feed-forward NN → Add&Norm] + Decoder [Multi-Head Masked Attention → Add&Norm → Multi-Head Cross-Attention → Add&Norm → Feed-forward NN → Add&Norm]
Encoder input \(x_1, <mask>, \cdots, x_{T}\); the decoder starts from \(<go>\) and predicts the target tokens step by step, e.g. \(P(x_2 \mid x_1, \ldots)\)
Transformer Block 1
Transformer Block 2
Transformer Block 3
Transformer Block 4
Transformer Block 5
Suitable for translation
Suitable for tasks like text generation and summarization
[Diagram: BERT-style masked language modelling. The masked input "[CLS] [mask] enjoyed the [mask] transformers [SEP] The visuals were amazing" passes through stacked Encoder Layers (attention, FFN, Normalization, Residual connection; i.e. Self-Attention + Feed Forward Network), and the model predicts the original tokens "I" and "movie" at the masked positions.]
Input: Sentence A, Sentence B
Label: IsNext
Special tokens: [CLS], [SEP]
[Diagram: BERT input representation. For the sequence "[CLS] I enjoyed the movie transformers [SEP] The visuals were amazing", each token's input vector is the sum of its Token Embedding, Segment Embedding, and Position Embedding; the masked version "[CLS] [mask] enjoyed the [mask] transformers [SEP] The [mask] were amazing" is fed through the stacked Encoder Layers (Self-Attention + Feed Forward Network).]
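A quick sketch with the Hugging Face transformers tokenizer showing the [CLS]/[SEP] special tokens and the segment ids for a sentence pair; bert-base-uncased is just an illustrative public checkpoint:

```python
from transformers import AutoTokenizer

# Assumption: bert-base-uncased is used purely as an illustrative checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

enc = tokenizer("I enjoyed the movie transformers", "The visuals were amazing")

# Tokens start with [CLS]; the two sentences are separated and terminated by [SEP].
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
# Segment (token_type) ids distinguish sentence A (0) from sentence B (1);
# BERT adds these segment embeddings to the token and position embeddings.
print(enc["token_type_ids"])
```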
[Table: model comparison along architecture (encoder vs decoder), scale, pre-training objective (e.g. MLM), fine-tuning setup, and hyper-parameters.]
All points are taken from the book Natural Language Processing with Transformers by Lewis Tunstall, Leandro von Werra, and Thomas Wolf.
Core: tensors, JIT, nn, optim, multiprocessing, quantization, sparse, ONNX, distributed
Ecosystem: fast.ai, Detectron2, Horovod, Flair, AllenNLP, torchvision, BoTorch, Glow, Lightning, Skorch
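A minimal sketch of how the core pieces named above (tensors, nn, optim, autograd) fit together; shapes and hyper-parameters are arbitrary, for illustration only:

```python
import torch
import torch.nn as nn

# A tiny model built from torch.nn, trained with torch.optim on random tensors.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 10)            # a batch of 16 examples (assumed shapes)
y = torch.randint(0, 2, (16,))     # random class labels

for step in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                # autograd computes gradients w.r.t. the parameters
    optimizer.step()
    print(step, loss.item())
```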
Hugging Face ecosystem: Transformers, Datasets, Evaluate, Trainer, Accelerate, Gradio, Inference Endpoints, PEFT
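As a first taste of the Transformers library, the pipeline entry point; the default checkpoint is whatever the library selects, so the exact output is illustrative:

```python
from transformers import pipeline

# Sentiment analysis with the library's default checkpoint for this task.
classifier = pipeline("sentiment-analysis")
print(classifier("I enjoyed the movie Transformers"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]
```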