Naive Bayes in scikit-learn
Dr. Ashish Tendulkar
Machine Learning Practice
IIT Madras
Naive Bayes Classifier
- Naive Bayes classifier applies Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable.
Naive Bayes classifier
For a given class variable \(y\) and dependent feature vector \(x_1\) through \(x_m\),
the naive conditional independence assumption is given by:
\[ P(x_i \mid y, x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_m) = P(x_i \mid y) \]
Naive Bayes learners and classifiers can be extremely fast compared to more sophisticated methods.
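As a sketch of the standard derivation (not taken from the slides), Bayes' theorem combined with the independence assumption gives the classification rule below. The symbol \(\hat{y}\) for the predicted class is notation introduced here.
\[ P(y \mid x_1, \ldots, x_m) = \frac{P(y)\, P(x_1, \ldots, x_m \mid y)}{P(x_1, \ldots, x_m)} \propto P(y) \prod_{i=1}^{m} P(x_i \mid y) \]
\[ \hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{m} P(x_i \mid y) \]
The denominator does not depend on \(y\), so it can be dropped when comparing classes. The individual NB classifiers differ mainly in the distribution they assume for \(P(x_i \mid y)\).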
List of NB Classifiers
ComplementNB
GaussianNB
BernoulliNB
CategoricalNB
MultinomialNB
- All are implemented in the sklearn.naive_bayes module.
- Each implements a fit method that estimates the parameters of the NB classifier from a feature matrix and labels.
- Prediction is performed with the predict method.
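The shared fit/predict interface can be sketched as follows. The data is a hypothetical random toy set (non-negative integers, so that every variant accepts it), used only to show the API and not to compare the classifiers.

```python
import numpy as np
from sklearn.naive_bayes import (
    BernoulliNB,
    CategoricalNB,
    ComplementNB,
    GaussianNB,
    MultinomialNB,
)

# Hypothetical toy data: 100 samples, 4 non-negative integer features, 2 classes.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(100, 4))
y = rng.integers(0, 2, size=100)

predictions = {}
for Estimator in (GaussianNB, MultinomialNB, ComplementNB, BernoulliNB, CategoricalNB):
    clf = Estimator()          # instantiate with default parameters
    clf.fit(X, y)              # estimate parameters from features and labels
    predictions[Estimator.__name__] = clf.predict(X[:5])  # predict class labels

for name, y_pred in predictions.items():
    print(name, y_pred)
```

Every estimator is used in exactly the same way; only the assumption about the feature distribution changes.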
Which NB to use if data is only numerical?
GaussianNB
implements the Gaussian naive Bayes algorithm for classification; the likelihood of each feature is assumed to be Gaussian
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(X_train, y_train)
Instantiate a GaussianNB estimator and then call its fit method with X_train and y_train.
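A minimal end-to-end sketch, assuming the iris dataset bundled with scikit-learn as the numerical data (the dataset and the split parameters are choices made here, not part of the slides):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Four continuous features, three classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

gnb = GaussianNB()
gnb.fit(X_train, y_train)

y_pred = gnb.predict(X_test)           # predicted class labels
accuracy = gnb.score(X_test, y_test)   # mean accuracy on the held-out half
print(f"accuracy: {accuracy:.3f}")

# theta_ holds the per-class mean of each feature: shape (n_classes, n_features).
print(gnb.theta_.shape)
```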
Which NB to use if data is multinomially distributed?
MultinomialNB
implements the naive Bayes algorithm for multinomially distributed data
(for example, word counts in text classification)
from sklearn.naive_bayes import MultinomialNB
mnb = MultinomialNB()
mnb.fit(X_train, y_train)
Instantiate a MultinomialNB estimator and then call its fit method with X_train and y_train.
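For text, the count features usually come from a vectorizer. The sketch below assumes a tiny made-up corpus and uses CountVectorizer to produce word counts; the documents and labels are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus.
docs_train = [
    "free prize money now",
    "win money free offer",
    "meeting schedule for project",
    "project report due tomorrow",
]
labels_train = ["spam", "spam", "ham", "ham"]

# CountVectorizer turns each document into a vector of word counts,
# which is the kind of input MultinomialNB expects.
text_clf = make_pipeline(CountVectorizer(), MultinomialNB())
text_clf.fit(docs_train, labels_train)

prediction = text_clf.predict(["free money offer"])
print(prediction)  # ['spam']
```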
What to do if data is imbalanced?
ComplementNB
implements the complement naive Bayes (CNB) algorithm.
from sklearn.naive_bayes import ComplementNB
cnb = ComplementNB()
cnb.fit(X_train, y_train)
Instantiate a ComplementNB estimator and then call its fit method with X_train and y_train.
ComplementNB (CNB) regularly outperforms MultinomialNB (MNB), often by a considerable margin, on text classification tasks.
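A small sketch of ComplementNB on an imbalanced set, assuming a made-up corpus with six "ham" documents and only two "spam" documents. It illustrates usage only and makes no claim about relative accuracy.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import make_pipeline

# Hypothetical imbalanced corpus: 2 spam documents versus 6 ham documents.
docs_train = [
    "free money now",
    "win free money",
    "project meeting today",
    "meeting notes attached",
    "project report due",
    "lunch meeting today",
    "report notes attached",
    "project lunch today",
]
labels_train = ["spam", "spam", "ham", "ham", "ham", "ham", "ham", "ham"]

# ComplementNB estimates each class's weights from the documents that do
# NOT belong to that class, which makes it less sensitive to class imbalance.
cnb_clf = make_pipeline(CountVectorizer(), ComplementNB())
cnb_clf.fit(docs_train, labels_train)

prediction = cnb_clf.predict(["free money"])
print(prediction)  # ['spam']
```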
What to do if data follows a multivariate Bernoulli distribution?
BernoulliNB
- implements the naive Bayes algorithm for data that is distributed according to multivariate Bernoulli distributions
from sklearn.naive_bayes import BernoulliNB
bnb = BernoulliNB()
bnb.fit(X_train, y_train)
Instantiate a BernoulliNB estimator and then call its fit method with X_train and y_train.
- each feature is assumed to be a binary-valued (Bernoulli, boolean) variable
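A minimal sketch with binary features, assuming a hypothetical six-sample toy matrix. Note that BernoulliNB has a binarize parameter (default 0.0) that thresholds non-binary input before fitting.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data: each feature is 0 or 1.
X_train = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
])
y_train = np.array([1, 1, 1, 0, 0, 0])

# Values above the binarize threshold (0.0 by default) are mapped to 1.
bnb = BernoulliNB()
bnb.fit(X_train, y_train)

prediction = bnb.predict(np.array([[1, 0, 0]]))
print(prediction)  # [1]
```

Unlike MultinomialNB, the Bernoulli model also uses the absence of a feature (a 0) as evidence when scoring a class.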
What to do if data is categorical?
CategoricalNB
implements the categorical naive Bayes algorithm suitable for classification with discrete features that are categorically distributed
from sklearn.naive_bayes import CategoricalNB
canb = CategoricalNB()
canb.fit(X_train, y_train)
Instantiate a CategoricalNB estimator and then call its fit method with X_train and y_train.
assumes that each feature, which is described by the index \(i\), has its own categorical distribution.
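CategoricalNB expects each category to be encoded as a non-negative integer. The sketch below assumes a hypothetical weather-style table and uses OrdinalEncoder for the encoding; the feature names and values are made up for illustration.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data: columns are (outlook, humidity).
X_raw = np.array([
    ["sunny", "high"],
    ["sunny", "high"],
    ["rainy", "low"],
    ["rainy", "low"],
    ["sunny", "low"],
    ["rainy", "high"],
])
y_train = np.array(["no", "no", "yes", "yes", "no", "yes"])

# Map each string category to an integer code, column by column.
encoder = OrdinalEncoder()
X_train = encoder.fit_transform(X_raw).astype(int)

canb = CategoricalNB()
canb.fit(X_train, y_train)

# Encode new samples with the same fitted encoder before predicting.
X_new = encoder.transform(np.array([["sunny", "low"]])).astype(int)
prediction = canb.predict(X_new)
print(prediction)  # ['no']
```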
Copy of Classification functions in sci-kit learn
By Swarnim POD