AI: A Moving Frontier
Intro to Intelligence, Learning by Neurons, Generative AI, Importance of data quality
Mitesh M. Khapra, Arun Prakash A


AI4Bharat, Department of Computer Science and Engineering, IIT Madras
AI in the News
[Figure: AI news items dated 1958, 1966, 1970, 2014, and 2023]

What is Intelligence?
there appears to be no simple definition of intelligence that is satisfactory to most observers
Looked at in one way, everyone knows what intelligence is; looked at in another way, no one does.
No, we are not being pessimistic.
This means that intelligence is complex and difficult to define.
Pursuing something that is not well-defined is challenging, but that doesn't stop us from pushing the boundaries.
We all agree, without any definition of intelligence, that a rock of any size, irrespective of where it is in the universe, is not intelligent.
However, small insects, like ants and termites, are intelligent. Why?
Why? They can solve problems individually and collectively, which requires the ability to plan, communicate, and coordinate effectively.
Adaptive bridge

A termite mound is home to millions of termites. It is weather-resistant and well-ventilated, allowing fungus, their food, to thrive.

However, their intelligence is embedded/fixed (e.g., weaver birds have been building their nests in the same pattern for thousands of years) and never grows (learns) with age or experience.
Moreover, it does not vary across individuals (i.e., no weaver bird is superior to others in any way).
What about Human Intelligence (HI)?
Well, we know that HI is not fixed but fluid.
Human Intelligence

From ages 0 to 2, children can differentiate sounds, colors, and shapes while learning to walk and babble...
And intelligence grows with age through learning.
[Figure: diagram labelled "Intelligence" and "The world", captioned "What a beautiful world it is"]
Learn to read, write, speak, walk, run, jump, draw, drive, ...
Reading, writing, speaking, and driving are complex skills unique to humans.




We preserve knowledge through writing in books, on computers, and more, and we pass it on to the next generation by teaching it!
What is Human Intelligence?
“a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience.”
- From the book "The Neuroscience of Intelligence"
“Ability to learn and perform suitable techniques to solve problems and achieve goals, appropriate to the context in an uncertain, ever-varying world”
- Christopher Manning, HAI
There are many definitions of human intelligence, or of intelligence in general.
Where does intelligence come from?
To build (human-like) intelligence, we need to answer the following question:
From where does intelligence emerge?
Residence of Human Intelligence
It took thousands of years for Homo sapiens to find that the brain is the epicenter of intelligence!
The fundamental computational unit of the brain is the neuron.
The brain has billions of neurons connected to other neurons, forming trillions of connections.
Intelligence emerges due to the collective action of these neurons.
The collective actions can be grouped based on their functions, such as vision, speech, hearing, motor skills, and touch.
The brain processes raw data from all senses to build a model of the world.
Therefore, unlike other species, human intelligence varies from individual to individual.
What is Artificial intelligence?
the science and engineering of making intelligent machines
- John McCarthy
In 1997, an IBM machine named Deep Blue defeated a world chess champion.

Is that intelligence?
Yes and No! Yes for the public, no for specialists.
"With the same object, therefore, it is possible that one man would consider it as intelligent and another would not; the second man would have found out the rules of its behaviour."
- Alan Turing
Deep Blue's specific intelligence follows an explicit set of rules coded by specialists!
The machine didn't learn the rules by participating in thousands of tournaments!
So, how do we make machines learn from data without giving them an explicit set of instructions to follow?

Is there someone in the image on the left?
You quickly responded "yes" without a doubt (high confidence).
You have seen thousands of different objects in your life. How did you distinguish this object (a person) from the rest so quickly? Can you explain the process?
Is this problem easy or hard for you? Easy.
"We feel we know what's going on but can't describe it properly. How could anything seem so close, yet always keep beyond our reach?"
- Marvin Minsky

Is there someone in the image on the left? Yes.
Is the person male or female? Female.
Again, how did you do that? What cues did you use to distinguish a male from a female?
Though we might attempt to describe it, we do not know the exact process.
We have seen many examples since childhood and learned to distinguish males from females.
Is this problem easy or hard for you? Easy.

Is there someone in the image on the left? Yes.
Is that someone male or female? Female.
Is she happy? Yes (same reasoning).
Is she wearing a saree? Highly likely (we can reason/infer from cues). All of us arrive at more or less similar reasoning.
How many ears/hands does she have? Two (we use general knowledge even though the picture doesn't reveal that).
Are these problems easy or hard for you? Easy.

Among all possible routes from Chennai to Bengaluru, find the shortest route.
Is this problem easy or hard for you?
Hard.

However, it is easy for computers since we can provide an algorithm (an explicit set of rules) and instruct the computer to follow it.
Key takeaway: If we have an explicit set of rules to solve a problem, we can ask computers to solve it quickly.
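To make the takeaway concrete, here is a minimal sketch of one such explicit set of rules: Dijkstra's shortest-path algorithm, a standard choice that the slides do not name. The road map and all distances below are made up for illustration; they are not real road distances.

```python
import heapq

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total distance, list of cities)."""
    # Each queue entry is (distance so far, current city, path taken).
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        dist, city, path = heapq.heappop(queue)
        if city == goal:
            return dist, path
        if city in visited:
            continue
        visited.add(city)
        for neighbour, km in graph.get(city, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (dist + km, neighbour, path + [neighbour]))
    return float("inf"), []  # no route exists

# Hypothetical road map: the distances (in km) are illustrative only.
roads = {
    "Chennai": {"Vellore": 140, "Tirupati": 135},
    "Vellore": {"Chennai": 140, "Krishnagiri": 120},
    "Tirupati": {"Chennai": 135, "Kolar": 185},
    "Krishnagiri": {"Vellore": 120, "Bengaluru": 90},
    "Kolar": {"Tirupati": 185, "Bengaluru": 70},
    "Bengaluru": {"Krishnagiri": 90, "Kolar": 70},
}

distance, route = shortest_route(roads, "Chennai", "Bengaluru")
print(distance, route)
```

Every step here is spelled out by a programmer; nothing is learned from data, which is exactly why this kind of problem is easy for a computer.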
In reality, for many cognitive (vision, audio and text) problems, we do not have an explicit set of rules to code the machines to solve the problems.
Moravec’s paradox
Things that are easy for humans are difficult for AI and vice versa.
Make it Challenging

The cognitive problems become even harder to solve (for machines) under various transformations of the input data.
[Figure: the same image shown as original, rotated, with a change in illumination, and with a change in scale]
Is this woman real or AI-generated?
Apply your intelligence to this new problem :-)
Artificial Intelligence
AI is the science and engineering of making machines smart.
What to learn?
The algorithm (set of rules, function, patterns, ...) that solves the given problem.
How to learn?
There are many approaches. Here we focus on the approach of learning from data, using machine learning models such as Artificial Neural Networks (ANNs).
What is an artificial neuron in an ANN?
It is a highly simplified computational model that mimics neurons in the brain.
What exactly is being learned?
Let's see.
Decision Problems
[image] -> Model -> male or female
[image] -> Model -> happy or sad
"Intelligence is what you use when you don’t know what to do" -> Model -> positive/negative/neutral
"Intelligence is what you use when you ____ know what to do" -> Model -> predict the missing word
Most problems can be framed as decision (or prediction) problems.
While the models are complex and consist of billions of neurons, understanding how an individual neuron makes a decision based on a given input makes the underlying concepts clear.
Artificial Neuron
The output of a decision is either yes or no; for example, 1 means yes and 0 means no.
Machines can understand only numbers. Therefore, all the inputs and outputs should be represented as numbers.
How does a neuron make a decision?
The magical math operations involved are "multiply and add".
We want the output to be either 0 (no) or 1 (yes).
The neuron multiplies each input \(x_i\) by its weight \(w_i\) and adds the results to get \(z=\sum_i w_i x_i\); the output is 1 if \(z>32\) else 0.
Great, but what exactly is being learned?
We are closer to answering that question. Let's take an example.
Suppose you wish to buy a smartphone at a store.
How do you decide whether to buy a particular phone or not?
* The image is for representational purposes only.
We consider a few factors (which vary across individuals), say, weight (in kg), review stars, and price (in thousands).
Then for a particular phone these values will be \(x=[0.1,2,10]\) (that is, weight = 100 grams, stars = 2, price = 10k).
Then we internally weigh these factors, using weights \(w\), to make a decision.
That's exactly what the machine does! Even ChatGPT is not an exception!
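The "multiply and add" decision for the phone example can be sketched in a few lines. The input \(x=[0.1,2,10]\) and the threshold 32 come from the slides; the two weight vectors are hypothetical values chosen only to show that different weights lead to different decisions.

```python
def neuron(x, w, threshold=32):
    """Multiply each input by its weight, add them up, then threshold."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if z > threshold else 0  # 1 = buy, 0 = reject

# Input from the slides: weight = 0.1 kg, stars = 2, price = 10 (thousands).
x = [0.1, 2, 10]

# Hypothetical weights (not from the slides), one set per imaginary buyer.
w_cautious = [-10, 10, -0.5]   # z = -1 + 20 - 5 = 14
w_star_lover = [-10, 20, -0.5]  # z = -1 + 40 - 5 = 34

print(neuron(x, w_cautious))    # z is below the threshold
print(neuron(x, w_star_lover))  # z is above the threshold
```

The same phone is rejected by one set of weights and bought by the other, which is why the weights are the thing worth learning.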
Both \(x\) and \(w\) influence the decision.
We can fix \(w\) (already learned) and vary \(x\).
Let's see the decision for each configuration of \(x\) for a given \(w\).
Great, but what exactly is being learned?
The three weights in \(w\) are being learned from data.
For ChatGPT, it is not just three numbers but billions of numbers (technically called parameters).
Training Artificial Neuron
Training a neuron or neural network requires data
| example | weight | stars | price | label |
|---|---|---|---|---|
| 1 | 0.1 | 2 | 35 | 0 (reject) |
| 2 | 0.15 | 4 | 25 | 1 (buy) |
| 3 | 0.17 | 3 | 15 | 1 (buy) |
The data contains examples and ground-truth values (labels, annotations, ...) that act as feedback.
Suppose that, for the second example, the model outputs 0 (reject) but the desired output is 1 (buy).
So, we have to adjust the weights (change the sliders) until it outputs 1 (buy).
Weights are adjusted until the model predicts the output label correctly for all examples.
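One simple way to automate this "adjust until correct" loop is the classic perceptron learning rule, sketched below on the three examples from the table. This is an illustrative choice, not necessarily the procedure behind the slides' demo. As an assumption, the fixed threshold of 32 is replaced by a learnable bias `b` (the neuron fires when \(z + b > 0\)), so the threshold is learned along with the weights.

```python
def predict(x, w, b):
    """Multiply and add, then threshold at zero (bias b plays the role of the threshold)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0

def train(examples, max_epochs=10_000):
    """Perceptron rule: nudge the weights whenever a prediction is wrong."""
    w = [0.0, 0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in examples:
            error = label - predict(x, w, b)  # +1, 0, or -1
            if error != 0:
                mistakes += 1
                w = [wi + error * xi for wi, xi in zip(w, x)]
                b += error
        if mistakes == 0:  # every example is predicted correctly
            break
    return w, b

# (weight in kg, stars, price in thousands) -> label, copied from the table.
data = [
    ([0.1, 2, 35], 0),   # reject
    ([0.15, 4, 25], 1),  # buy
    ([0.17, 3, 15], 1),  # buy
]

w, b = train(data)
predictions = [predict(x, w, b) for x, _ in data]
print(w, b, predictions)
```

The labels are the only feedback the loop receives, so a wrong label would push the weights in the wrong direction.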
Importance of Quality
In practice, the weights are adjusted automatically by using the labels/annotations as feedback.
Labeling errors in the data degrade the model's performance.
Moreover, if the model makes a mistake, it can be quite difficult to trace it back to a labeling error.
Quality Data \(\rightarrow\) Quality Model
Narrow vs General Intelligence

[image] -> Model-1 -> male or female
[image] -> Model-2 -> happy or sad
"Intelligence is what you use when you don’t know what to do" -> Model-3 -> positive/negative/neutral
"Intelligence is what you use when you ____ know what to do" -> Model-4 -> predict the missing word
Narrow AI: AI for one particular task (emotion recognition, sentiment classification, ...) using different architectures (CNN, RNN, Transformers, ...).
AGI*: Broad intelligence that solves more than one task (helpful for human interactions with chatbots like ChatGPT).
* There is no common agreement on the concept of AGI. An alternative proposal is Advanced Machine Intelligence (AMI).
Module 1 : From Discriminative to Generative


Not every problem is discriminative
The illustrative examples we have seen are discriminative, as they differentiate between categories such as male/female or happy/sad.
For decades, the majority of AI research has focused on developing algorithms to solve discriminative problems.
However, humans not only recognize (discriminate) patterns; they are also creative (generative/innovative) in both science and the arts.

Sakunthala by Ravi Varma
It is very simple to be happy, but it is very difficult to be simple.
-Rabindranath Tagore
"An excited polar bear wearing swim goggles and an inner tube in a children's book illustration style with soft pastel colors. The simple character should be running on an empty, snow-dusted beach with subtle waves crashing on the shore nearby."
Imagine approaching an artist and narrating a scene in your mind to create a piece of art.
The artist needs to understand the description provided in natural language.

If an AI is able to do that, then we call it generative AI.
As you might have guessed, the image is generated by an AI.
Today, one can generate not only images but also videos and text.
This has introduced a paradigm shift in the way we access information.



We are in the Generative Era
| Era | How knowledge is recorded | How it is used |
|---|---|---|
| Stone/Iron Age | Carved in stones | Store and Retrieve |
| Industrial Age | Written on paper | Store and Retrieve |
| Digital Age | Digitized | Store and Retrieve |
| The Age of AI [has begun]* | Parameterized | Store and Generate! |










Model with Billions of Parameters





Creative Text Generation
"Any sufficiently advanced technology is indistinguishable from magic"





Text to Image Generation
"An excited polar bear wearing swim goggles and an inner tube in a children's book illustration style with soft pastel colors. The simple character should be running on an empty, snow-dusted beach with subtle waves crashing on the shore nearby."






Text to Video Generation
"This medium shot, with a shallow depth of field, portrays a cute cartoon girl with wavy brown hair, sitting upright in a 1980s kitchen. Her hair is medium length and wavy. She has a small, slightly upturned nose, and small, rounded ears. She is very animated and excited as she talks to the camera."
Generative AI has disrupted the way we search for information on Google.
Module 2 : How Does a Generative Model work?







[Figure: GenAI combines trillions of tokens, billions of parameters, and zettaFLOPs of compute]
Pretraining: Adjust Billions of Parameters



The model outputs are generated from the information it learned (parameterized) by churning through billions of pages from the internet during training.
The idea is simple
The model predicts the next word (token) in a sequence given the previous words (tokens).
tell me a ____ -> joke (predicted word, likely)
tell me a joke ____ -> book (predicted word, less likely)
Keep predicting the next word across the trillions of words in a corpus.
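The next-word idea can be illustrated with a toy model that simply counts which word follows which. This is a deliberately crude stand-in: real generative models use neural networks that look at all previous tokens, not just the last one, and the tiny corpus below is made up for illustration.

```python
from collections import Counter, defaultdict

# Made-up miniature corpus; real models see trillions of tokens.
corpus = "tell me a joke . tell me a story . tell me a joke about idli .".split()

# Count how often each word follows each previous word (a bigram model).
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_probs(prev):
    """Turn the follow-up counts for `prev` into probabilities."""
    total = sum(counts[prev].values())
    return {word: c / total for word, c in counts[prev].items()}

probs = next_word_probs("a")
best = max(probs, key=probs.get)
print(probs, best)
```

After "a", the word "joke" was seen twice and "story" once, so "joke" is the more likely prediction and "story" the less likely one.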





It would take a human about 150,000 years to read the training data (reading 12 hours/day at 256 words/minute).
Online content can contain both valuable and misleading information. Eliminating the bad or misleading data is not a simple task.
This information could manifest in the generated response (we will see some examples soon).
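The reading-time estimate above can be turned back into a word count with a little arithmetic. The reading speed, hours per day, and number of years are the figures quoted on the slide; the 365 days per year is an assumption made here.

```python
words_per_minute = 256   # from the slide
hours_per_day = 12       # from the slide
years = 150_000          # from the slide
days_per_year = 365      # assumption: every day of the year is a reading day

words_per_day = words_per_minute * 60 * hours_per_day
total_words = words_per_day * days_per_year * years
print(f"{total_words:,} words")
```

This comes to roughly 10 trillion words, which matches the "trillions of tokens" scale mentioned for pretraining.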
Instruction Tuning
We cannot directly use a pre-trained model for any application.





Model

If we use it, it will generate a coherent response, as shown in the figure on the right.

This is not what the user wants. We say the model doesn't understand the user's intent.
Well, we didn't train it to align its response with the user's intent!
Using instruction tuning techniques, we align the model's responses with user intent.
The model continues to predict the next token after instruction tuning.





[Figure: Neural Language Models. Given the tokens "tell me a joke about idli", the model predicts "why", then "did", then "the", one token at a time]



However, recall that
Online content can contain both valuable and misleading information. Eliminating the bad or misleading data is not a simple task.
This information could manifest in the generated response (we will see some examples soon)
Because the model predicts the next word based on previous words, it can create false information, produce toxic responses, and may reflect stereotypes found in the data.
There is no doubt that GenAI is closing the gap between humans and machines.


Some examples
Hallucination (factual error): fine for creative applications (writing poems, jokes, ...) but not for all applications
Gender Bias and Stereotypes
Harmful Generation





Quality Wins Quantity



Therefore, ensuring the quality of data (factual correctness, correct labels/annotations, ...) significantly increases the reliability of the model's output.
Data annotators play an important role in producing helpful AI tools.
Human involvement is needed to enhance the reliability of the model's output.
Continual Pre-training
Pre-training is an expensive operation (costs billions of dollars)
Additionally, the pre-training data is dominated by English.
For example, taking a snapshot of the entire internet for a year (2019) and counting the number of tokens for each language yields the following distribution.
[Figure: token counts per language, with labels \(\approx\)550 Billion tokens and \(\approx 1.5\) Billion tokens]
We have about 1.1 Billion tokens for Hindi and Tamil.
Therefore, these pre-trained models miss the nuances of languages other than English.
For example, even the recent GPT-4o generates a wrong interpretation of the word "vaalarivan" in Tamil.
One way to improve its performance is to create high-quality data in Tamil and continue pre-training the model using the new dataset.
This is called continual pretraining.
It is more cost-effective than pre-training

Building these datasets is both time-consuming and costly due to the involvement of humans.
AI is not Mystic, it is a mashup spirit of data labellers
- Andrej Karpathy
For example when you ask eg “top 10 sights in Amsterdam” or something, some hired data labeler probably saw a similar question at some point, researched it for 20 minutes using Google and Trip Advisor or something, came up with some list of 10, which literally then becomes the correct answer, training the AI to give that answer for that question. If the exact place in question is not in the finetuning training set, the neural net imputes a list of statistically similar vibes based on its knowledge gained from the pretraining stage (language modeling of internet documents).
Effort by AI4Bharat



English data
Capture all India-specific knowledge in all Indian languages!

A Long Way to Go

Accurately recognizing the contents of rich historical documents, scripts, and ancient books is essential for understanding the rich history and diverse culture of people.
Building a high-performing OCR model requires high-quality annotated data.
Errors in recognition impact the performance of language models that depend on them.

शाखा कार्यालय / கிளை அனுவலகம்
चेन्ने शाखा कार्यालय - 1
चेन्ने शाखा कार्यालय - 11
मदुरै शाख कार्यालय
कोयंबट्र शाखा कार्यालय
சென்னை கிளை அலுவலகம் - 1
சென்னை கிளை அலுவலகம் - 11
மதுவர கிலை அறுவலைகள்
கோயம்புத்தூர் கிளை அலுவலகம்
mdbo-bis@bis.gov.in
cnbo1@bis.gov.in
cnbo2@bis.gov.in
cbto@bis.gov.in
Sample
Annotation

Prompt: Shakespeare, Aristotle and Plato happily talk to each other in a church

Prompt: Thiruvalluvar, Rabindranath Tagore and Bharathiyar happily talk to each other in a temple

The models were trained to understand the prompt using examples that contain a textual description of a scene (written by a group of people), as shown on the right.

The model may not have seen a sample that contained a description similar to this.
What if we use a different model for image generation?
Prompt: Thiruvalluvar, Rabindranath Tagore and Bharathiyar having a conversation in a temple

Though not fully correct, it could understand the prompt to some extent.
Using more annotated data that captures cultural intricacies might help the model generate better images.
Again, this requires human effort, which is time-consuming and costly.
Multi-Modal LLMs

A model that is trained on multiple modalities (text, image, audio, and video) is called a multi-modal LLM.
Again, what about the data for the Indian context?
A long way to go!
User Input: What is Googolplex?
Generated Output (from the trained model): A Googolplex is an extremely large number.
[Figure: trillions of words, billions of parameters, and massive compute produce the trained model]
[Figure: dataset of input prompts, pre-trained model, model response, human feedback]
Tokenizer
"What is Googolplex?" (three words) becomes "What", "is", "Goog", "ol", "plex", "?" (six tokens)
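The word-versus-token distinction can be sketched with a toy tokenizer that greedily matches the longest known piece. The vocabulary here is hypothetical and hand-picked to reproduce the split shown on the slide; real tokenizers learn their vocabulary from data and may split the same text differently.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenizer over whitespace-separated words."""
    tokens = []
    for word in text.split():
        start = 0
        while start < len(word):
            # Try the longest remaining piece first, then shrink it.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                # Fall back to a single character if nothing in vocab matches.
                if piece in vocab or end == start + 1:
                    tokens.append(piece)
                    start = end
                    break
    return tokens

# Hypothetical vocabulary, chosen only for this illustration.
vocab = {"What", "is", "Goog", "ol", "plex", "?"}

text = "What is Googolplex?"
words = text.split()
tokens = tokenize(text, vocab)
print(len(words), "words ->", len(tokens), "tokens:", tokens)
```

A rare word such as "Googolplex" is broken into several smaller pieces, which is why token counts are usually larger than word counts.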
















