Stone/Iron Age: Carved in stone -> Store and retrieve
Industrial Age: Written on paper -> Store and retrieve
Digital Age: Digitized -> Store and retrieve
The Age of AI [has begun]*: Parameterized -> Store and generate!
Specific task helpers: n-gram models
Task-agnostic feature learners: word2vec, context modelling
Transfer learning for NLP: ELMo, GPT, BERT (pre-train, fine-tune)
General language models: GPT-3, InstructGPT (emergent abilities)
[Figure: Multi-head masked attention generating a response one token at a time. Prompt: "tell me a joke about idli"; the model produces "why did the idli ..." token by token, each new token attending only to the tokens before it.]
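To make the token-by-token generation in the figure concrete, here is a minimal greedy-decoding sketch using the Hugging Face transformers library; GPT-2 as the model and the 20-token loop are illustrative assumptions, not part of the slides.

```python
# Sketch: autoregressive (token-by-token) generation with a causal language model.
# GPT-2 is used here only as an illustrative stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("tell me a joke about idli", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                       # generate 20 new tokens greedily
        logits = model(ids).logits            # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()      # most probable next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tok.decode(ids[0]))
```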
Input text -> Predict the class/sentiment
Input text -> Summarize
Question -> Answer
Prompt: Input text -> Output response conditioned on the prompt
Prompt examples: predict sentiment, summarize, fill in the blank, generate a story
Labelled data for task-1
Labelled data for task-2
Labelled data for task-3
Raw text data (cleaned)
Build one
Inadequate-quality datasets for Indic languages
English data
Use Instruction Fine-tuning and build datasets for the same
(Full) fine-tuning of LLMs on Indic datasets still requires a lot of compute and is expensive
[Figure: Transformer encoder-decoder for translation. The Source Input Block feeds the encoder (Multi-Head Attention, Add&Norm, Feed-forward NN, Add&Norm); the Target Input Block feeds the decoder (Multi-Head Masked Attention, Add&Norm, Multi-Head Cross-Attention, Add&Norm, Feed-forward NN, Add&Norm) and its Output Block (tied). Example: source "I am reading a book", target "Naan oru puthagathai padiththu kondirukiren" (Tamil for "I am reading a book").]
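As a rough sketch of how such an encoder-decoder could be wired up, the following uses PyTorch's built-in nn.Transformer; the model dimensions, vocabulary sizes, and dummy token IDs are illustrative assumptions, not values from the slides.

```python
# Sketch: encoder-decoder Transformer for translation (source -> target).
# All sizes and token IDs below are illustrative assumptions.
import torch
import torch.nn as nn

d_model, src_vocab, tgt_vocab = 512, 10_000, 10_000

src_embed = nn.Embedding(src_vocab, d_model)
tgt_embed = nn.Embedding(tgt_vocab, d_model)
transformer = nn.Transformer(d_model=d_model, nhead=8,
                             num_encoder_layers=6, num_decoder_layers=6,
                             batch_first=True)
out_proj = nn.Linear(d_model, tgt_vocab)     # output block over the target vocabulary
# (weight tying with the target embeddings is possible: out_proj.weight = tgt_embed.weight)

src = torch.randint(0, src_vocab, (1, 5))    # e.g. token IDs for "I am reading a book"
tgt = torch.randint(0, tgt_vocab, (1, 4))    # target tokens generated so far

# Causal mask: each target position attends only to earlier target positions.
tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))

hidden = transformer(src_embed(src), tgt_embed(tgt), tgt_mask=tgt_mask)
logits = out_proj(hidden)                    # (1, tgt_len, tgt_vocab): next-token scores
```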
Source/Target Input Block: "I am reading a book" -> Tokenizer -> Token IDs -> Embeddings
Tokenizer
"I am reading a book" -> ["i", "am", "reading", "a", "book"]
Contains:
Normalizer: Lowercase (I -> i)
Pre-tokenizer: Whitespace
Tokenization algorithm: BPE
Tokenizer
"I am reading a book" -> ["i", "am", "reading", "a", "book"]
Token IDs: [i: 2, am: 8, reading: 75, a: 4, book: 100]
With special tokens: [[BOS]: 1, i: 2, am: 8, reading: 75, a: 4, book: 100, [EOS]: 3]
"I am reading a book" -> Tokenizer -> Token IDs -> Embeddings
[[BOS]: 1, i: 2, am: 8, reading: 75, a: 4, book: 100, [EOS]: 3]
"I am reading a book" -> Tokenizer -> Token IDs -> Embeddings + Position embeddings
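A minimal PyTorch sketch of turning token IDs into input vectors by adding learned position embeddings to token embeddings; the vocabulary size, model dimension, and maximum length are illustrative assumptions.

```python
# Sketch: token embeddings + (learned) position embeddings for the input block.
# Vocabulary size, d_model, and max length below are assumptions.
import torch
import torch.nn as nn

vocab_size, d_model, max_len = 1000, 512, 128
tok_embed = nn.Embedding(vocab_size, d_model)
pos_embed = nn.Embedding(max_len, d_model)

# [BOS] i am reading a book [EOS]  (IDs taken from the running example)
token_ids = torch.tensor([[1, 2, 8, 75, 4, 100, 3]])
positions = torch.arange(token_ids.size(1)).unsqueeze(0)   # 0, 1, 2, ...

x = tok_embed(token_ids) + pos_embed(positions)   # (1, 7, d_model), input to the first block
```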
[Figure: The Source Input Block supplies one embedding per input token ("I", "am", "reading", "a", "book") and feeds the encoder stack (Multi-Head Attention, Add&Norm, Feed-forward NN, Add&Norm); the decoder consumes the target tokens ("Naan", "puthakathai", "padithtu", "kondirukiren") through Multi-Head Masked Attention and attends to the encoder output through Multi-Head Cross-Attention.]
Position Encoding (sinusoidal form sketched below)
Attention
Normalization
Activation
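For the position-encoding item above, here is a short sketch of the sinusoidal encoding used in the original Transformer; treating it as the variant the slide intends (rather than learned position embeddings) is an assumption.

```python
# Sketch: sinusoidal position encoding,
# PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model)).
# Sizes are illustrative.
import torch

def sinusoidal_position_encoding(max_len: int, d_model: int) -> torch.Tensor:
    positions = torch.arange(max_len).unsqueeze(1).float()            # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float()
                         * (-torch.log(torch.tensor(10000.0)) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(positions * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(positions * div_term)   # odd dimensions
    return pe                                       # added to the token embeddings

pe = sinusoidal_position_encoding(max_len=128, d_model=512)
```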
Tasks: Input text -> class/sentiment; Input text -> summary; Question -> answer
Encoder only: Multi-Head Attention, Add&Norm, Feed-forward NN, Add&Norm
Input \(x_1,<mask>,\cdots,x_{T}\); predict the masked token, \(P(x_2=?)\)
Decoder only: Multi-Head Masked Attention, Add&Norm, Feed-forward NN, Add&Norm
Input \(x_1,x_2,\cdots,x_{i-1}\); predict the next token, \(P(x_i)\)
Encoder-decoder: encoder block plus decoder block with Multi-Head Masked Attention and Multi-Head Cross-Attention
Encoder input \(x_1,<mask>,\cdots,x_{T}\); decoder starts from \(<go>\); predict \(P(x_2 \mid x_1,\cdots)\)
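To make the difference between the encoder's full self-attention and the decoder's masked self-attention concrete, here is a small single-head sketch; the dimensions and random inputs are illustrative assumptions.

```python
# Sketch: scaled dot-product attention with an optional causal mask.
# With the mask, position i attends only to positions <= i (decoder-style);
# without it, every position attends to every other position (encoder-style).
import math
import torch

def attention(q, k, v, causal: bool = False):
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # (T, T)
    if causal:
        T = scores.size(-1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))        # hide future positions
    return torch.softmax(scores, dim=-1) @ v

T, d = 5, 64                               # 5 tokens, 64-dim head (assumed sizes)
x = torch.randn(T, d)
full_attn = attention(x, x, x)                    # encoder-style self-attention
causal_attn = attention(x, x, x, causal=True)     # decoder-style masked self-attention
```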
Transformer Block 1
Transformer Block 2
Transformer Block 3
Transformer Block 4
Transformer Block 5
Suitable for translation
Suitable for tasks like text generation and summarization
[Figure: BERT-style encoder. The input pair "[CLS] I enjoyed the movie transformers [SEP] The visuals were amazing", with some tokens replaced by [mask], passes through a stack of encoder layers (self-attention, feed-forward network, normalization, residual connections).]
Input: Sentence A, Sentence B
Label: IsNext
Special tokens: [CLS], [SEP]
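A small sketch of how such a sentence pair is packed with [CLS], [SEP], and segment IDs using a pretrained BERT tokenizer from the Hugging Face transformers library; the checkpoint name is an illustrative assumption.

```python
# Sketch: packing a sentence pair for next-sentence prediction / classification.
# "bert-base-uncased" is an illustrative checkpoint choice.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tok("I enjoyed the movie transformers",
          "The visuals were amazing")

print(tok.convert_ids_to_tokens(enc["input_ids"]))
# roughly: ['[CLS]', 'i', 'enjoyed', 'the', 'movie', ..., '[SEP]',
#           'the', 'visuals', 'were', 'amazing', '[SEP]']
# (WordPiece may split some words further)
print(enc["token_type_ids"])   # segment IDs: 0 for sentence A, 1 for sentence B
```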
[Figure: BERT input representation. Each token of "[CLS] I enjoyed the movie transformers [SEP] The visuals were amazing" is represented as the sum of its token embedding, segment embedding, and position embedding, and the result passes through a stack of encoder layers (self-attention, feed-forward network).]
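A minimal PyTorch sketch of that sum of three embeddings; the vocabulary size, hidden size, and toy IDs are illustrative assumptions.

```python
# Sketch: BERT-style input = token embedding + segment embedding + position embedding.
# All sizes and IDs below are assumptions for illustration.
import torch
import torch.nn as nn

vocab_size, hidden, max_len = 30_000, 768, 512
tok_embed = nn.Embedding(vocab_size, hidden)
seg_embed = nn.Embedding(2, hidden)          # segment 0 = sentence A, 1 = sentence B
pos_embed = nn.Embedding(max_len, hidden)

token_ids = torch.randint(0, vocab_size, (1, 12))    # the 12 tokens of the example pair
segment_ids = torch.tensor([[0]*7 + [1]*5])          # sentence-A tokens, then sentence-B tokens
positions = torch.arange(token_ids.size(1)).unsqueeze(0)

x = tok_embed(token_ids) + seg_embed(segment_ids) + pos_embed(positions)
# x: (1, 12, hidden), fed to the first encoder layer
```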
[Figure: Masked-language-model training input: "[CLS] [mask] enjoyed the [mask] transformers [SEP] The [mask] were amazing"; the model is trained to predict the original tokens at the masked positions.]
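A small sketch of the masking step itself; the 15% masking rate is BERT's usual default rather than something stated on the slide, and the token IDs are illustrative.

```python
# Sketch: randomly replacing a fraction of tokens with [MASK] for MLM training.
# The 15% rate is BERT's common default; the IDs below are bert-base-uncased-style (assumed).
import torch

MASK_ID, CLS_ID, SEP_ID = 103, 101, 102
token_ids = torch.tensor([101, 5, 6, 7, 8, 9, 102, 10, 11, 12, 13, 102])

special = (token_ids == CLS_ID) | (token_ids == SEP_ID)
mask = (torch.rand(token_ids.shape) < 0.15) & ~special      # never mask [CLS]/[SEP]

labels = torch.where(mask, token_ids, torch.tensor(-100))   # -100 = ignored by the loss
inputs = torch.where(mask, torch.tensor(MASK_ID), token_ids)
# `inputs` goes into the encoder; the model predicts `labels` at the masked positions.
```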
[Table: two encoder-based and two decoder-based models compared by scale, pre-training objective (MLM for the encoder models), pre-training setup, fine-tuning setup, and hyper-parameters.]