AI: A Moving Frontier

Intro to Intelligence, Learning by Neurons, Generative AI, Importance of data quality

Mitesh M. Khapra, Arun Prakash A

AI4Bharat, Department of Computer Science and Engineering, IIT Madras

AI in the News

(Images: news items from 1958, 1966, 1970, 2014, and 2023)

What is Intelligence?

There appears to be no simple definition of intelligence that is satisfactory to most observers.

Looked at in one way, everyone knows what intelligence is; looked at in another way, no one does. 

No, we are not being pessimistic.

This means that intelligence is complex and difficult to define.

Pursuing something that is not well-defined is challenging, but that doesn't stop us from pushing the boundaries.

What is Intelligence?

We all agree, without any definition of intelligence, that a rock of any size, irrespective of where it is in the universe, is not intelligent.

However, small insects, like ants and termites, are intelligent. Why?

What is Intelligence?

Why? They can solve problems individually and collectively, which requires the ability to plan, communicate, and coordinate effectively.

Adaptive bridge

A termite mound is home to millions of termites. It is weather-resistant and well-ventilated, allowing fungus, their food, to thrive.

However, their intelligence is embedded/fixed (e.g., weaver birds have been building their nests in the same pattern for thousands of years) and never grows (learns) with age or experience.

Moreover, it does not vary across individuals (i.e., no weaver bird is superior to the others in any way).

What about Human Intelligence (HI)? 

Well, we know that HI is not fixed but fluid.

Human Intelligence

From ages 0 to 2, children can differentiate sounds, colors, and shapes while learning to walk and babble...

And intelligence grows with age through learning.

Intelligence

The world

What a beautiful world it is

Learn to read, write, speak, walk, run, jump, draw, drive, ...

Human Intelligence

Reading, writing, speaking, and driving are complex skills unique to humans.

We preserve knowledge through writing in books, on computers, and more, and we pass it on to the next generation by teaching it!

Human Intelligence

What is Human Intelligence?

“a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience.”

- From the book "The Neuroscience of Intelligence"

Ability to learn and perform suitable techniques to solve problems and achieve goals, appropriate to the context in an uncertain, ever-varying world

- Christopher Manning, HAI

There are many definitions of human intelligence or intelligence in general

Where does intelligence come from?

Looked at in one way, everyone knows what intelligence is; looked at in another way, no one does.

To build (human-like) intelligence, we need to answer the following question

Residence of Human intelligence

From where does intelligence emerge?

It took thousands of years for Homo sapiens to find that the brain is the epicenter of intelligence!

The fundamental computational unit of the brain is the neuron.

The brain has billions of neurons connected to one another, forming trillions of connections.

Intelligence emerges from the collective action of these neurons.

The collective actions can be grouped by function, such as vision, speech, hearing, motor skills, and touch.

The brain processes raw data from all the senses to build a model of the world.

Therefore, unlike in other species, human intelligence varies from individual to individual.

What is Artificial intelligence?

the science and engineering of making intelligent machines

- John McCarthy

In 1997, an IBM machine named Deep Blue defeated the world chess champion.

Is that intelligence? 

Yes and No!

"With the same object, therefore, it is possible that one man would consider it as intelligent and another would not; the second man would have found out the rules of its behaviour."

- Alan Turing

Yes for the public

No for specialists

The specific intelligence follows an explicit set of rules coded by specialists!

The machine didn't learn the rules by participating in thousands of tournaments!

So, how do we make machines learn from data without giving them an explicit set of instructions to follow?

Is there someone in the image on the left?

You quickly responded "yes" without a doubt (high confidence). 

You have seen thousands of different objects in your life. How did you quickly distinguish this object (a person) from the rest? Can you explain the process?

Is this problem easy or hard for you?

Easy.

"We feel we know what's going on but can't describe it properly. How could anything seem so close, yet always keep beyond our reach?"

- Marvin Minsky

Is there someone in the image on the left? Yes.

Is the person male or female? Female.

Again, how did you do that? What cues did you use to distinguish a male from a female?

Though we might attempt to describe it, we do not know the exact process.

We have seen many examples since childhood and learned to distinguish males from females.

Is this problem easy or hard for you? Easy.

Is there someone in the image on the left? Yes.

Is that someone male or female? Female.

Is she happy? Yes (same reasoning).

Is she wearing a saree? Highly likely (we can reason/infer from cues). All of us come up with more or less similar reasoning.

How many ears/hands does she have? Two (we use general knowledge even though the picture doesn't reveal that).

Are these problems easy or hard for you? Easy.

Among all possible routes from Chennai to Bengaluru, find the shortest route. 

Is this problem easy or hard for you?

Hard. 

However, it is easy for computers since we can provide an algorithm (an explicit set of rules) and instruct the computer to follow it.

Key takeaway: If we have an explicit set of rules to solve a problem, we can ask computers to solve it quickly.
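As a minimal sketch of what "an explicit set of rules" looks like for the route problem, the snippet below uses Dijkstra's algorithm, one well-known shortest-path method (the slides do not name a specific algorithm). The road network and all distances are hypothetical placeholders chosen for illustration, not measured values.

```python
import heapq

# Hypothetical road network: the city names are real, but the distances (km)
# are illustrative placeholders, not measured values.
ROADS = {
    "Chennai": {"Vellore": 140, "Tirupati": 135},
    "Vellore": {"Chennai": 140, "Krishnagiri": 120, "Bengaluru": 215},
    "Tirupati": {"Chennai": 135, "Bengaluru": 250},
    "Krishnagiri": {"Vellore": 120, "Bengaluru": 90},
    "Bengaluru": {"Vellore": 215, "Tirupati": 250, "Krishnagiri": 90},
}


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_distance, path) from start to goal."""
    queue = [(0, start, [start])]  # (distance so far, current city, path taken)
    visited = set()
    while queue:
        dist, city, path = heapq.heappop(queue)  # closest unexplored city first
        if city == goal:
            return dist, path
        if city in visited:
            continue
        visited.add(city)
        for neighbour, km in graph[city].items():
            if neighbour not in visited:
                heapq.heappush(queue, (dist + km, neighbour, path + [neighbour]))
    return float("inf"), []


distance, route = shortest_route(ROADS, "Chennai", "Bengaluru")
print(distance, route)
# 350 ['Chennai', 'Vellore', 'Krishnagiri', 'Bengaluru']
```

Every step here is spelled out by a person in advance; the computer only follows the rules, which is why such problems are easy for machines.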

In reality, for many cognitive (vision, audio and text) problems, we do not have an explicit set of rules to code the machines to solve the problems.

Moravec’s paradox

Things that are easy for humans are difficult for AI and vice versa.

Make it Challenging

The cognitive problems become even harder to solve (for machines) under various transformations of the input data:

original, rotated, change in illumination, change in scale

Is this woman real or AI-generated?

Apply your intelligence to this new problem :-)

Artificial intelligence

What to learn?

The algorithm (set of rules, function, patterns, ...) that solves the given problem

There are many approaches. Here we focus on the approach of learning from data

How to learn?

Using machine learning models such as Artificial Neural Networks (ANN)

What is an artificial neuron in ANN?

It is a highly simplified computational model that mimics neurons in the brain

What exactly is being learned?

Let's see

AI is the science and engineering of making machines smart

Decision Problems

(image) → Model → male or female

(image) → Model → happy or sad

"Intelligence is what you use when you don’t know what to do" → Model → positive/negative/neutral

"Intelligence is what you use when you ____ know what to do" → Model → predict the missing word

Most problems can be framed as decision (or prediction) problems.

While the models are complex and consist of billions of neurons, understanding how an individual neuron makes a decision based on a given input makes the underlying concepts clear.

Artificial Neuron

The output of a decision is either yes or no, for example, 1 means yes and 0 means no.

How does a neuron make a decision?

The magical math operations involved are "multiply and add"

\(x = [0.1, 2, 10]\)
\(w = [-1, 0.5, 1]\)

Multiply: \(x \times w = [0.1 \times -1,\ 2 \times 0.5,\ 10 \times 1] = [-0.1, 1, 10]\)

Add: \(z = -0.1 + 1 + 10 = 10.9\)

We want the output to be either 0 (no) or 1 (yes).

Machines can understand only numbers. 

Therefore all the inputs and outputs should be represented as numbers

To get an output of either 0 (no) or 1 (yes), we apply a threshold to \(z\): \(y = 1\) if \(z > 32\) else \(0\). With \(z = 10.9\), we get \(y = 0\).

Great, but what exactly is being learned?

We are closer to answering that question. Let's take an example.

Artificial Neuron

Suppose, you wish to buy a smart phone at a store. 

How do you decide whether to buy a particular phone or not?

* the image is for representational purposes only.

We consider a few factors (which vary across individuals), say, weight (in kg), review stars, and price (in thousands).

Then, for a particular phone, these values will be \(x=[0.1,2,10]\) (that is, weight = 100 grams, stars = 2, price = 10k).

Then we internally weigh these factors, using weights \(w\), to make a decision.

That's exactly what the machine does! Even ChatGPT is not an exception!
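The "multiply and add" step followed by a threshold can be written in a few lines. This is a minimal sketch using the values from the slides (\(x\), \(w\), and the threshold 32); the function name `neuron` is just an illustrative choice.

```python
def neuron(x, w, threshold):
    """Multiply each input by its weight, add the products, then threshold."""
    z = sum(xi * wi for xi, wi in zip(x, w))  # multiply and add
    y = 1 if z > threshold else 0             # 1 means yes (buy), 0 means no
    return z, y


x = [0.1, 2, 10]   # weight (kg), review stars, price (thousands)
w = [-1, 0.5, 1]   # how strongly each factor counts
z, y = neuron(x, w, threshold=32)
print(round(z, 2), y)
# 10.9 0
```

Since \(z = 10.9\) does not exceed 32, the neuron answers 0 (do not buy) for this phone.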

Artificial Neuron

Both \(x\) and \(w\) influence the decision.

We can fix \(w\) (already learned) and vary \(x\) 

Let's see the decision for each configuration of \(x\) for a given \(w\) 
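To make this concrete, the sketch below keeps \(w\) fixed at the values from the slides and tries a few phones. Phone A is the example from the slides; phones B and C are hypothetical configurations added for illustration.

```python
def decide(x, w, threshold=32):
    """Return 1 (buy) if the weighted sum exceeds the threshold, else 0."""
    z = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if z > threshold else 0


w = [-1, 0.5, 1]  # fixed (already learned) weights

# Each phone: [weight in kg, review stars, price in thousands].
phones = {
    "A": [0.1, 2, 10],   # the phone from the slides
    "B": [0.2, 5, 30],   # hypothetical
    "C": [0.15, 4, 25],  # hypothetical
}

decisions = {name: decide(x, w) for name, x in phones.items()}
print(decisions)
# {'A': 0, 'B': 1, 'C': 0}
```

The weights stay the same throughout; only the input changes, and with it the decision.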

Great, but what exactly is being learned?

The three weights in \(w\) are being learned from data

For ChatGPT, it is not just three numbers, but billions of numbers (technically called parameters)

Training Artificial Neuron

Training a neuron or neural network requires data 

example   weight (kg)   stars   price (thousands)   label
1         0.1           2       35                  0 (reject)
2         0.15          4       25                  1 (buy)
3         0.17          3       15                  1 (buy)

The data contains examples and ground-truth values (labels, annotations, ...) that act as feedback.

For the second example, the model outputs 0 (reject) but the desired output is 1 (buy)

So, we have to adjust the weights (change the sliders) until it outputs 1 (buy)

Weights are adjusted until the model predicts the output label correctly for all examples
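One simple way to "move the sliders" automatically is a perceptron-style update, sketched below on the three examples from the table. This particular rule is an assumption made for illustration (the slides do not say which update rule is used); the threshold is kept fixed at 32 as in the earlier example, and the starting weights are the ones used so far.

```python
# Training data from the table: ([weight in kg, stars, price in thousands], label)
examples = [
    ([0.1, 2, 35], 0),   # reject
    ([0.15, 4, 25], 1),  # buy
    ([0.17, 3, 15], 1),  # buy
]
THRESHOLD = 32  # kept fixed; only the weights are adjusted


def predict(x, w):
    z = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if z > THRESHOLD else 0


def train(examples, w, max_epochs=10_000):
    """Perceptron-style rule (an illustrative choice): when the model says 0
    but the label is 1, add the input to the weights; when it says 1 but the
    label is 0, subtract it. Stop once every example is predicted correctly."""
    w = list(w)
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in examples:
            error = label - predict(x, w)  # +1, 0, or -1
            if error != 0:
                mistakes += 1
                w = [wi + error * xi for wi, xi in zip(w, x)]
        if mistakes == 0:
            return w
    raise RuntimeError("did not converge within max_epochs")


start = [-1, 0.5, 1]
before = [predict(x, start) for x, _ in examples]
learned = train(examples, start)
after = [predict(x, learned) for x, _ in examples]
print(before, after)
# [1, 0, 0] [0, 1, 1]
```

The labels are the only feedback the procedure receives, which is why their correctness matters so much.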

Importance of Quality

In practice, the weights are adjusted automatically by using the labels/annotations as feedback.

Labeling errors in the data degrade the model's performance.

Moreover, when the model makes a mistake, it can be quite difficult to trace it back to a labeling error.

Quality Data \(\rightarrow\) Quality Model

Narrow vs General Intelligence

(image) → Model-1 → male or female

(image) → Model-2 → happy or sad

"Intelligence is what you use when you don’t know what to do" → Model-3 → positive/negative/neutral

"Intelligence is what you use when you ____ know what to do" → Model-4 → predict the missing word

Narrow AI: AI for one particular task (emotion recognition, sentiment classification, ...) using different architectures (CNN, RNN, Transformers, ...)

AGI*: Broad intelligence that solves more than one task (helpful for human interactions with chatbots like ChatGPT)

* There is no common agreement on the concept of AGI. An alternative proposal is Advanced Machine Intelligence (AMI).

Module 1 : From Discriminative to Generative

Not every problem is discriminative

The illustrative examples we have seen are discriminative, as they differentiate between categories such as male/female or happy/sad.

For decades, the majority of AI research has focused on developing algorithms to solve discriminative problems.

However, humans not only recognize (discriminate) patterns; they are also creative (generative/innovative) in both science and the arts.

Sakunthala by Ravi Varma

It is very simple to be happy, but it is very difficult to be simple.

-Rabindranath Tagore

"An excited polar bear wearing swim goggles and an inner tube in a children's book illustration style with soft pastel colors. The simple character should be running on an empty, snow-dusted beach with subtle waves crashing on the shore nearby."

Imagine approaching an artist and narrating a scene in your mind to create a piece of art.

The artist needs to understand the description provided in natural language.

If an AI is able to do that, then we call it generative AI.

As you might have guessed, the image is generated by an AI.

Today, one can generate not only images but also videos and texts. 

This has introduced a paradigm shift in the way we access information 

We are in the Generative Era

Stone/Iron Age: carved in stones (Store and Retrieve)

Industrial Age: written on papers (Store and Retrieve)

Digital Age: digitized (Store and Retrieve)

The Age of AI [has begun]*: parameterized (Store and Generate!)

Model with Billions of Parameters

"Any sufficiently advanced technology is indistinguishable from magic"

Creative Text Generation

Text to Image Generation

"An excited polar bear wearing swim goggles and an inner tube in a children's book illustration style with soft pastel colors. The simple character should be running on an empty, snow-dusted beach with subtle waves crashing on the shore nearby."

Text to Video Generation

"This medium shot, with a shallow depth of field, portrays a cute cartoon girl with wavy brown hair, sitting upright in a 1980s kitchen. Her hair is medium length and wavy. She has a small, slightly upturned nose, and small, rounded ears. She is very animated and excited as she talks to the camera."

Generative AI has disrupted the way we search for information on Google.

Module 2 : How Does a Generative Model work?

Trillions of Tokens + Billions of Parameters + Zetta FLOPS of Compute = GenAI

Pretraining: Adjust Billions of Parameters

The model outputs are generated from the information it learned (parameterized) by churning through billions of pages from the internet during training.

The idea is simple

The model predicts the next word (token) in a sequence given the previous words (tokens).

tell me a ____

joke: predicted word (likely)

book: predicted word (less likely)

tell me a joke ____

Keep predicting the next word for trillions of words in a corpus.
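The idea of next-word prediction can be illustrated with simple counting. The sketch below is a stand-in, not how the actual models work: real language models are neural networks that condition on all previous tokens, whereas this toy version only counts which word follows the previous one. The tiny corpus is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; real models train on trillions of tokens.
corpus = (
    "tell me a joke . tell me a joke . tell me a story . "
    "give me a book . tell me a joke about idli ."
).split()

# Count how often each word follows each previous word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1


def next_word_probs(prev):
    """Turn the follow-up counts for `prev` into probabilities."""
    counts = following[prev]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.most_common()}


probs = next_word_probs("a")
print(probs)
# {'joke': 0.6, 'story': 0.2, 'book': 0.2}
```

In this toy corpus, "joke" is the likely continuation of "tell me a" and "book" a less likely one, mirroring the example above.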

It would take about 150,000 years for a human to read the training data (reading 12 hours/day at 256 words/minute) [Source]
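The reading-time figure can be turned back into a word count with simple arithmetic. The sketch below assumes 365 reading days per year, which the slide does not state.

```python
YEARS = 150_000
HOURS_PER_DAY = 12
WORDS_PER_MINUTE = 256
DAYS_PER_YEAR = 365  # assumption: not stated on the slide

words = YEARS * DAYS_PER_YEAR * HOURS_PER_DAY * 60 * WORDS_PER_MINUTE
print(f"{words:,} words (about {words / 1e12:.1f} trillion)")
# 10,091,520,000,000 words (about 10.1 trillion)
```

Under these assumptions the figure corresponds to roughly ten trillion words, consistent with the "trillions of tokens" scale mentioned earlier.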

Online content can contain both valuable and misleading information. Eliminating the bad or misleading data is not a simple task.

This information could manifest in the generated response (we will see some examples soon)

Instruction Tuning

We cannot directly use a pre-trained model for any application.

If we use it, it will generate a coherent response, as shown in the figure on the right.

This is not what the user wants. We say the model doesn't understand the user's intent.

Well, we didn't train it to align its response with the user's intent!

Using instruction tuning techniques, we align the model's responses with user intent.

The model continues to predict the next token after instruction tuning

Neural Language Models

(Figure: a neural language model with \(N\) stacked layers and an output projection \(W_v\). Given the prompt "tell me a joke about idli", it generates the response "why did the ..." one token at a time, feeding each predicted token back in as input.)
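Because the model still predicts the next token after instruction tuning, one prompt/response pair can be unrolled into several next-token examples. The sketch below shows one plausible way to do this; using only the response tokens as targets is an assumption (a common choice, but not confirmed by the slides), and the function name is hypothetical.

```python
prompt = ["tell", "me", "a", "joke", "about", "idli"]
response = ["why", "did", "the"]  # truncated in the source figure; left as-is


def next_token_pairs(prompt, response):
    """Build (context, target) pairs: the model sees the prompt plus the
    response so far and must predict the next response token."""
    pairs = []
    for i, target in enumerate(response):
        context = prompt + response[:i]
        pairs.append((context, target))
    return pairs


pairs = next_token_pairs(prompt, response)
for context, target in pairs:
    print(" ".join(context), "->", target)
```

Each printed line is one training example: everything to the left of the arrow is the input, and the token on the right is what the model should predict.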

However, recall that online content can contain both valuable and misleading information, and that eliminating the bad or misleading data is not a simple task. This information could manifest in the generated response.

Because the model predicts the next word based on previous words, it can create false information, produce toxic responses, and may reflect stereotypes found in the data.

There is no doubt that GenAI is closing the gap between humans and machines.

Some examples

Hallucination

Gender Bias and Stereotypes

Harmful Generation

Factual error

(Hallucination is fine for creative applications such as writing poems or jokes, but not for all applications.)

Quality Wins Over Quantity

Therefore, ensuring the quality of data (factualness, correct labels/annotations, ...) significantly increases the reliability of the model's output.

Data annotators play an important role in producing helpful AI tools.

Human involvement is needed to enhance the reliability of the model's output.

Continual Pre-training

Pre-training is an expensive operation (costs billions of dollars)

Additionally, the pre-training data is dominated by English.

For example, taking a snapshot of the entire internet for a year (2019) and counting the number of tokens for each language yields the following distribution.

(Figure labels: \(\approx\)550 billion tokens; \(\approx 1.5\) billion tokens)

We have about 1.1 billion tokens for Hindi and Tamil.

Therefore, these pre-trained models miss the nuances of languages other than English.

For example, even the recent GPT-4o generates a wrong interpretation of the Tamil word "vaalarivan".

One way to improve its performance is to create high-quality data in Tamil and continue pre-training the model using the new dataset. 

This is called continual pre-training.

It is more cost-effective than pre-training from scratch.

Building these datasets is both time-consuming and costly due to the involvement of humans.

AI is not Mystic, it is a mashup spirit of data labellers

- Andrej Karpathy

For example when you ask eg “top 10 sights in Amsterdam” or something, some hired data labeler probably saw a similar question at some point, researched it for 20 minutes using Google and Trip Advisor or something, came up with some list of 10, which literally then becomes the correct answer, training the AI to give that answer for that question. If the exact place in question is not in the finetuning training set, the neural net imputes a list of statistically similar vibes based on its knowledge gained from the pretraining stage (language modeling of internet documents).

Effort by AI4Bharat 

English data

Capture all India-specific knowledge in all Indian languages!

A Long Way to Go

Accurately recognizing the contents of historical documents, scripts, and ancient books is essential for understanding the rich history and diverse culture of people.

Building a high-performing OCR model requires high-quality annotated data.

Errors in recognition impact the performance of language models that depend on them.

शाखा कार्यालय / கிளை அனுவலகம்
चेन्ने शाखा कार्यालय - 1
चेन्ने शाखा कार्यालय - 11
मदुरै शाख कार्यालय
कोयंबट्र शाखा कार्यालय
சென்னை கிளை அலுவலகம் - 1
சென்னை கிளை அலுவலகம் - 11
மதுவர கிலை அறுவலைகள்
கோயம்புத்தூர் கிளை அலுவலகம்
mdbo-bis@bis.gov.in
cnbo1@bis.gov.in
cnbo2@bis.gov.in
cbto@bis.gov.in

Sample

Annotation

A Long Way to Go

Prompt: Shakespeare, Aristotle and Plato happily talk to each other in a church

Prompt: Thiruvalluvar, Rabindranath Tagore and Bharathiyar happily talk to each other in a temple

The models were trained to understand the prompt using examples that contain a textual description of a scene (written by a group of people), as shown on the right.

The model may not have seen a sample that contained a description similar to this.

What if we use a different model for image generation?

A Long Way to Go

Prompt: Thiruvalluvar, Rabindranath Tagore and Bharathiyar having a conversation in a temple

Though not fully correct, it could understand the prompt to some extent.

Using more annotated data that captures cultural intricacies might help the model generate better images.

Again, this requires human effort, which is time-consuming and costly.

Multi-Modal LLMs

A model that is trained on multiple modalities (text, image, audio and video) is called a multi-modal LLM.

Again, what about the data for the Indian context?

A long way to go!

User Input: What is Googolplex?

Trained Model

Generated Output: A Googolplex is an extremely large number.

Trillions of words + Billions of Parameters + Massive Compute = Trained Model

(Figure labels: Human Feedback, Pre-trained model, Model response, Dataset of input prompts)

Tokenizer

What is Googolplex? (three words)

What is Goog ol plex ? (six tokens)
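The split from three words into six tokens can be reproduced with a toy tokenizer. The sketch below uses greedy longest-match against a small hypothetical vocabulary chosen to give exactly the split shown; real tokenizers learn their vocabularies from data and may split the text differently.

```python
# Hypothetical vocabulary chosen to reproduce the split shown above;
# real tokenizers learn their vocabularies from large corpora.
VOCAB = {"What", "is", "Goog", "ol", "plex", "?"}


def tokenize(text, vocab):
    """Greedy longest-match: at each position take the longest vocab entry."""
    tokens = []
    for word in text.split():
        start = 0
        while start < len(word):
            for end in range(len(word), start, -1):
                if word[start:end] in vocab:
                    tokens.append(word[start:end])
                    start = end
                    break
            else:
                # No vocab entry matches here: emit the character by itself.
                tokens.append(word[start])
                start += 1
    return tokens


text = "What is Googolplex?"
tokens = tokenize(text, VOCAB)
print(len(text.split()), "words ->", len(tokens), "tokens:", tokens)
# 3 words -> 6 tokens: ['What', 'is', 'Goog', 'ol', 'plex', '?']
```

Rare words such as "Googolplex" are broken into several smaller pieces, which is why token counts are usually higher than word counts.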

Data_Annotation_Lecture_1

By Arun Prakash
