What is a Bayesian Classifier and how does it work?

September 12, 2017 Author: virendra

Bayesian classifier: introduction

A Bayesian classifier is based on the idea that the class can be used to predict the values of features for members of that class. In a data set, patterns are grouped into classes because they have features in common. Such classes are often called natural kinds.

The idea behind a Bayesian classifier is that, if an agent knows the class, it can predict the values of the other features. If it does not know the class, Bayes’ rule can be used to predict the class given the observed feature values. In a Bayesian classifier, the learning model is a probabilistic model of the features, and that model is used to predict the classification label of a new, similar pattern. A latent variable is a probabilistic variable that is not observed. A Bayesian classifier is a probabilistic model in which the classification is a latent variable that is probabilistically related to the observed variables. Classification then becomes inference in the probabilistic model.



Naive Bayes is a family of probabilistic algorithms that use probability theory and Bayes’ Theorem to predict the category of a sample (like a piece of news or a customer review). They are probabilistic, which means that they calculate the probability of each category for a given sample, and then output the category with the highest probability. The way they get these probabilities is by using Bayes’ Theorem, which describes the probability of an event based on prior knowledge of conditions that might be related to that event.

The simplest case is the naive Bayesian classifier, which makes the independence assumption that the input features are conditionally independent of each other given the classification. This independence is embodied in a particular belief network in which the features are the nodes, the target variable (the classification) has no parents, and the classification is the only parent of each input feature. This belief network requires the probability distributions \( P(Y) \) for the target feature \( Y \) and \( P(X_i \mid Y) \) for each input feature \( X_i \). For each example, the prediction can be computed by conditioning on the observed values of the input features and querying the classification.
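To make this structure concrete, here is a minimal Python sketch of naive Bayes inference; the feature names and probability tables are invented toy values, not taken from this article. It stores \( P(Y) \) and \( P(X_i \mid Y) \) as dictionaries and computes the posterior over the classification by conditioning on the observed feature values:

```python
# Minimal naive Bayes inference sketch (toy numbers, for illustration only).
# prior holds P(Y); likelihoods holds P(X_i | Y) for each binary input feature X_i.

prior = {"spam": 0.3, "ham": 0.7}  # P(Y), hypothetical values

likelihoods = {  # P(X_i = True | Y), hypothetical values
    "contains_offer": {"spam": 0.8, "ham": 0.1},
    "contains_greeting": {"spam": 0.2, "ham": 0.6},
}

def classify(observed):
    """Return P(Y | observed features) by conditioning and normalizing."""
    scores = {}
    for y, p_y in prior.items():
        score = p_y
        for feature, value in observed.items():
            p = likelihoods[feature][y]       # P(X_i = True  | Y = y)
            score *= p if value else (1 - p)  # P(X_i = False | Y = y)
        scores[y] = score
    total = sum(scores.values())              # normalize over the classes
    return {y: s / total for y, s in scores.items()}

print(classify({"contains_offer": True, "contains_greeting": False}))
```

Because the features are assumed conditionally independent given the class, the posterior is just the prior multiplied by one likelihood factor per observed feature, then normalized.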

Naive Bayes algorithm

It is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular attribute in a class is unrelated to the presence of any other attribute. For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of the other features, all of these properties independently contribute to the probability that the fruit is an apple, and that is why it is known as ‘Naive’.

A Naive Bayes model is easy to build and particularly useful for very large data sets. Along with its simplicity, Naive Bayes can outperform even highly sophisticated classification methods. Bayes’ theorem provides a way of calculating the posterior probability \( P(c \mid x) \) from \( P(c) \), \( P(x) \), and \( P(x \mid c) \). Look at the equations below:

\[ P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)} \]

For a vector of predictors \( X = (x_1, \dots, x_n) \), the naive independence assumption gives:

\[ P(c \mid X) \propto P(x_1 \mid c) \times P(x_2 \mid c) \times \cdots \times P(x_n \mid c) \times P(c) \]

where:

\( P(c \mid x) \) is the posterior probability of the class (c, target) given the predictor (x, attributes),

\( P(c) \) is the prior probability of the class,

\( P(x \mid c) \) is the likelihood, i.e., the probability of the predictor given the class, and

\( P(x) \) is the prior probability of the predictor.

Working of Naive Bayes Algorithm

Let’s understand it using an example. Below we have a training data set of weather conditions and the corresponding target variable ‘Play’ (indicating whether a game is played). We need to classify whether players will play or not based on the weather conditions. Let’s follow the steps below.

Step 1: Convert the data set into a frequency table.

Step 2: Create a likelihood table by finding the probabilities; for example, \( P(Overcast) = 4/14 \approx 0.29 \) and the probability of playing is \( 9/14 \approx 0.64 \).

[Figure: Bayesian Classifier example — frequency and likelihood tables]

Step 3: Now use the Naive Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.
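As a concrete illustration of Steps 1 and 2, here is a minimal Python sketch. The 14 observations are reconstructed from the counts quoted in this example (9 ‘Yes’ and 5 ‘No’, with Sunny split 3/2); the Overcast (4/0) and Rainy (2/3) splits follow the classic play-golf table and are an assumption here:

```python
from collections import Counter

# 14 observations reconstructed from the counts in this example; the
# Overcast/Rainy class splits follow the classic play-golf table (assumed).
data = (
    [("Sunny", "Yes")] * 3 + [("Sunny", "No")] * 2 +
    [("Overcast", "Yes")] * 4 +
    [("Rainy", "Yes")] * 2 + [("Rainy", "No")] * 3
)

# Step 1: frequency table of (weather, play) pairs.
freq = Counter(data)

# Step 2: likelihood table P(weather | play) plus the marginals.
n = len(data)
play_counts = Counter(play for _, play in data)
p_play = {c: play_counts[c] / n for c in play_counts}           # e.g. P(Yes) = 9/14
weather_counts = Counter(w for w, _ in data)
p_weather = {w: weather_counts[w] / n for w in weather_counts}  # e.g. P(Sunny) = 5/14
likelihood = {
    (w, c): freq[(w, c)] / play_counts[c] for (w, c) in freq    # e.g. P(Sunny|Yes) = 3/9
}

print(round(p_weather["Overcast"], 2))         # 0.29
print(round(p_play["Yes"], 2))                 # 0.64
print(round(likelihood[("Sunny", "Yes")], 2))  # 0.33
```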

Problem: Players will play if the weather is sunny. Is this statement correct?

We can solve it using the method of posterior probability discussed above.

\[ P(Yes \mid Sunny) = \frac{P(Sunny \mid Yes)\,P(Yes)}{P(Sunny)} \]

Here we have,

\[ P(Sunny \mid Yes) = 3/9 = 0.33, \quad P(Sunny) = 5/14 = 0.36, \quad P(Yes) = 9/14 = 0.64 \]

Now, \( P(Yes \mid Sunny) = (0.33 \times 0.64)/0.36 = 0.60 \). By the same method, using \( P(Sunny \mid No) = 2/5 = 0.40 \) and \( P(No) = 5/14 = 0.36 \), we get \( P(No \mid Sunny) = (0.40 \times 0.36)/0.36 = 0.40 \). Since \( P(Yes \mid Sunny) \) is the higher posterior probability, the predicted class is ‘Yes’.

Naive Bayes uses a similar method to predict the probability of each class based on various attributes. This algorithm is mostly used in text classification and in problems having multiple classes.
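Putting Step 3 into code, here is a minimal sketch that reproduces the calculation above; the fractions are the ones quoted in this example, including \( P(Sunny \mid No) = 2/5 \) derived above:

```python
# Step 3: posterior for each class given Weather = Sunny, using the
# fractions quoted in the example above.
likelihood = {"Yes": 3 / 9, "No": 2 / 5}   # P(Sunny | play)
prior = {"Yes": 9 / 14, "No": 5 / 14}      # P(play)
p_sunny = 5 / 14                           # P(Sunny)

posterior = {c: likelihood[c] * prior[c] / p_sunny for c in prior}
print({c: round(p, 2) for c, p in posterior.items()})  # {'Yes': 0.6, 'No': 0.4}
print(max(posterior, key=posterior.get))               # Yes
```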

Pros and Cons of Naive Bayes

Pros:

  • It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.
  • When the assumption of independence holds, a Naive Bayes classifier performs better compared to other models like logistic regression, and it needs less training data.
  • It performs well with categorical input variables compared to numerical variable(s). For numerical variables, a normal distribution is assumed (a bell curve, which is a strong assumption); a code sketch follows this list.
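As a concrete illustration of that Gaussian assumption, here is a minimal sketch using scikit-learn’s GaussianNB; the toy data set and numbers are invented for illustration, not taken from this article:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy numeric data (invented): two features, two classes.
X = np.array([[1.0, 2.1], [1.2, 1.9], [0.9, 2.0],   # class 0
              [3.0, 4.2], [3.1, 3.8], [2.9, 4.0]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = GaussianNB()  # assumes each feature is normally distributed within each class
clf.fit(X, y)

print(clf.predict([[1.1, 2.0]]))        # -> [0]
print(clf.predict_proba([[1.1, 2.0]]))  # class probabilities (see the caveat below)
```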

Cons:

  • If a categorical variable has a category in the test data set that was not observed in the training data set, the model will assign it a zero probability and will be unable to make a prediction. This is often known as the “Zero Frequency” problem. To solve it, we can use a smoothing technique; one of the simplest is Laplace estimation (see the sketch after this list).
  • On the other side, naive Bayes is also known to be a bad estimator, so the probability outputs from predict_proba are not to be taken too seriously.
  • Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is almost impossible to get a set of predictors that are completely independent.
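To make the Laplace (add-one) estimation mentioned above concrete, here is a minimal sketch; the counts and the unseen “Snowy” category are invented for illustration:

```python
# Laplace (add-one) smoothing for P(weather | play): add 1 to every count so
# that a category unseen in training never gets probability zero.
counts = {"Sunny": 3, "Overcast": 4, "Rainy": 2, "Snowy": 0}  # counts within class "Yes"; Snowy unseen (invented)
total = sum(counts.values())
k = len(counts)  # number of categories

mle = {w: c / total for w, c in counts.items()}                  # P(Snowy|Yes) = 0 breaks the product
laplace = {w: (c + 1) / (total + k) for w, c in counts.items()}  # never zero

print(mle["Snowy"])      # 0.0
print(laplace["Snowy"])  # 1/13 ~ 0.077
```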

