Naive Bayes in scikit-learn

Dr. Ashish Tendulkar

Machine Learning Practice

IIT Madras

Naive Bayes Classifier

  • The Naive Bayes classifier applies Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features, given the value of the class variable.

Naive Bayes classifier

For a given class variable \(y\) and dependent feature vector \(x_1\) through \(x_m\), the naive conditional independence assumption is given by:

\[ P(x_i \mid y, x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_m) = P(x_i \mid y) \]
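Combining this assumption with Bayes’ theorem gives the decision rule shared by the estimators listed below. A short sketch of the standard derivation, using the same symbols as above:

\[ P(y \mid x_1, \ldots, x_m) = \frac{P(y)\, P(x_1, \ldots, x_m \mid y)}{P(x_1, \ldots, x_m)} = \frac{P(y) \prod_{i=1}^{m} P(x_i \mid y)}{P(x_1, \ldots, x_m)} \]

Since \(P(x_1, \ldots, x_m)\) is constant for a given input, the predicted class is

\[ \hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{m} P(x_i \mid y) \]

The variants differ mainly in the form they assume for \(P(x_i \mid y)\).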

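Naive Bayes learners and classifiers can be extremely fast compared to more sophisticated methods, because the independence assumption lets each class-conditional feature distribution be estimated separately as a one-dimensional distribution.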

List of NB Classifiers

ComplementNB

GaussianNB

BernoulliNB

CategoricalNB

MultinomialNB

  • All of them are implemented in the sklearn.naive_bayes module.
  • Each implements a fit method that estimates the parameters of the NB classifier from a feature matrix and a label vector.
  • Predictions are made with the predict method.
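A minimal sketch of the shared fit/predict interface, assuming a made-up random count matrix (non-negative integers, so every variant accepts it) and random labels; it illustrates the API only, not model quality.

```python
# Sketch: the five estimators in sklearn.naive_bayes share one fit/predict API.
# X and y are random, made-up data used purely to exercise the interface.
import numpy as np
from sklearn.naive_bayes import (
    BernoulliNB, CategoricalNB, ComplementNB, GaussianNB, MultinomialNB,
)

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(100, 6))   # 100 samples, 6 features, values 0..4
y = rng.integers(0, 2, size=100)        # two random class labels

predictions = {}
for Model in (GaussianNB, MultinomialNB, ComplementNB, BernoulliNB, CategoricalNB):
    clf = Model()
    clf.fit(X, y)                        # identical call for every variant
    predictions[Model.__name__] = clf.predict(X[:5])
    print(Model.__name__, predictions[Model.__name__])
```

Swapping one variant for another only changes the class being instantiated; what changes underneath is the assumed form of \(P(x_i \mid y)\).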

Which NB to use if data is only numerical (continuous)?

GaussianNB

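implements the Gaussian Naive Bayes algorithm for classification: the likelihood of each feature given the class is assumed to be Gaussian, with a per-class mean and variance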

from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(X_train, y_train)

Instantiate a GaussianNB estimator and then call its fit method with X_train and y_train.

Which NB to use if data is multinomially distributed?

MultinomialNB

implements the naive Bayes algorithm for multinomially distributed data, such as word counts in text classification

from sklearn.naive_bayes import MultinomialNB
mnb = MultinomialNB()
mnb.fit(X_train, y_train)

Instantiate a MultinomialNB estimator and then call its fit method with X_train and y_train.
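A minimal text-classification sketch for MultinomialNB, assuming a tiny made-up spam/ham corpus and bag-of-words counts from CountVectorizer. The key point is that new documents must be transformed with the same fitted vectorizer.

```python
# Sketch: MultinomialNB on word counts. Corpus and labels are made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = [
    "free money now",
    "win free prize money",
    "meeting schedule for monday",
    "project meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()          # one count column per token
X_train = vectorizer.fit_transform(docs)

mnb = MultinomialNB(alpha=1.0)          # alpha=1.0 is Laplace smoothing (the default)
mnb.fit(X_train, labels)

# Reuse the fitted vectorizer so new documents map to the same columns.
X_new = vectorizer.transform(["free prize", "project meeting"])
print(mnb.predict(X_new))               # ['spam' 'ham']
```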

What to do if data is imbalanced?

ComplementNB

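implements the complement naive Bayes (CNB) algorithm, an adaptation of the standard multinomial naive Bayes (MNB) algorithm that is particularly suited to imbalanced data sets: the weights for each class are computed from the statistics of that class's complement (all the other classes).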

from sklearn.naive_bayes import ComplementNB
cnb = ComplementNB()
cnb.fit(X_train, y_train)

Instantiate a ComplementNB estimator and then call its fit method with X_train and y_train.

CNB regularly outperforms MNB (often by a considerable margin) on text classification tasks.
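A sketch of ComplementNB used as a drop-in replacement for MultinomialNB, assuming a made-up imbalanced corpus with one "spam" document and four "ham" documents. It only demonstrates the usage; it does not reproduce the performance comparison above.

```python
# Sketch: ComplementNB on an imbalanced, made-up corpus (1 spam vs 4 ham).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import ComplementNB

docs = [
    "win free prize money",
    "meeting schedule monday",
    "project meeting notes",
    "lunch meeting monday",
    "project notes review",
]
labels = ["spam", "ham", "ham", "ham", "ham"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(docs)

# Each class's weights come from the word counts of all *other* classes,
# so the single spam document's weights are estimated from the larger ham data.
cnb = ComplementNB(alpha=1.0)
cnb.fit(X_train, labels)

X_new = vectorizer.transform(["free prize", "project meeting"])
print(cnb.predict(X_new))   # ['spam' 'ham']
```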

What to do if data follows multivariate Bernoulli distributions?

BernoulliNB

  • implements the naive Bayes algorithm for data that is distributed according to multivariate Bernoulli distributions

from sklearn.naive_bayes import BernoulliNB
bnb = BernoulliNB()
bnb.fit(X_train, y_train)

Instantiate a BernoulliNB estimator and then call its fit method with X_train and y_train.

  • each feature is assumed to be a binary-valued (Bernoulli, boolean) variable; non-binary input is thresholded by the binarize parameter (0.0 by default)
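A small BernoulliNB sketch on made-up binary features. It shows the default binarize=0.0 threshold turning a count into presence/absence, and that, unlike MultinomialNB, a feature's absence also contributes to the class score.

```python
# Sketch: BernoulliNB on made-up binary presence/absence features.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# 4 samples x 3 binary features, classes 0 and 1.
X_train = np.array([[1, 1, 0],
                    [1, 0, 0],
                    [0, 1, 1],
                    [0, 0, 1]])
y_train = np.array([0, 0, 1, 1])

# binarize=0.0 (the default) maps every value > 0 to 1.
bnb = BernoulliNB(alpha=1.0, binarize=0.0)
bnb.fit(X_train, y_train)

# A 0 adds log(1 - P(x_i = 1 | y)) to each class score, so absences count too.
X_new = np.array([[1, 0, 0],    # already binary
                  [0, 0, 5]])   # the count 5 is binarized to 1
print(bnb.predict(X_new))       # [0 1]
```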

What to do if data is categorical?

CategoricalNB

implements the categorical naive Bayes algorithm suitable for classification with discrete features that are categorically distributed

from sklearn.naive_bayes import CategoricalNB 
canb = CategoricalNB()
canb.fit(X_train, y_train)

Instantiate a CategoricalNB estimator and then call its fit method with X_train and y_train.

assumes that each feature \(i\) has its own categorical distribution; the categories of each feature are expected to be encoded as integers \(0, 1, \ldots, n_i - 1\).
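Because CategoricalNB expects integer-coded categories, a typical sketch pairs it with OrdinalEncoder. The fruit data below is made up; OrdinalEncoder sorts categories alphabetically before assigning codes.

```python
# Sketch: encode string categories as integers, then fit CategoricalNB.
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

X_raw = [["red", "small"], ["red", "large"], ["red", "small"],
         ["green", "small"], ["green", "large"], ["green", "large"]]
y = ["apple", "apple", "apple", "lime", "lime", "lime"]

encoder = OrdinalEncoder(dtype=int)     # green=0, red=1; large=0, small=1
X_train = encoder.fit_transform(X_raw)

canb = CategoricalNB(alpha=1.0)         # one smoothed categorical distribution per feature and class
canb.fit(X_train, y)

# New samples must go through the same fitted encoder.
X_new = encoder.transform([["red", "large"], ["green", "small"]])
print(canb.predict(X_new))              # ['apple' 'lime']
```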

Copy of Classification functions in sci-kit learn

By Swarnim POD
