# Bayesian Classifier

September 12, 2017

# Bayesian classifier: introduction

A Bayesian classifier is based on the idea that the role of a (natural) class is to predict the values of features for members of that class. Examples are grouped in classes because they have common values for the features. Such classes are often called natural kinds. In this section, the target feature corresponds to a discrete class, which is not necessarily binary.

The idea behind a Bayesian classifier is that, if an agent knows the class, it can predict the values of the other features. If it does not know the class, Bayes’ rule can be used to predict the class given (some of) the feature values. In a Bayesian classifier, the learning agent builds a probabilistic model of the features and uses that model to predict the classification of a new example. A latent variable is a probabilistic variable that is not observed. A Bayesian classifier is a probabilistic model where the classification is a latent variable that is probabilistically related to the observed variables. Classification then becomes inference in the probabilistic model.

Naive Bayes is a family of probabilistic algorithms that use probability theory and Bayes’ Theorem to predict the category of a sample (like a piece of news or a customer review). They are probabilistic, which means that they calculate the probability of each category for a given sample and then output the category with the highest probability. They obtain these probabilities using Bayes’ Theorem, which describes the probability of an event based on prior knowledge of conditions that might be related to that event.

The simplest case is the naive Bayesian classifier, which makes the independence assumption that the input features are conditionally independent of each other given the classification. The independence of the naive Bayesian classifier is embodied in a particular belief network where the features are the nodes, the target variable (the classification) has no parents, and the classification is the only parent of each input feature. This belief network requires the probability distributions $$P(Y)$$ for the target feature $$Y$$ and $$P(X_i \mid Y)$$ for each input feature $$X_i$$. For each example, the prediction can be computed by conditioning on observed values for the input features and by querying the classification.

## Naive Bayes algorithm

Naive Bayes is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or upon the existence of the other features, the classifier treats each of them as contributing independently to the probability that the fruit is an apple, which is why it is known as ‘Naive’.

The Naive Bayes model is easy to build and particularly useful for very large data sets. Despite its simplicity, Naive Bayes can sometimes outperform more sophisticated classification methods. Bayes’ theorem provides a way of calculating the posterior probability $$P(c \mid x)$$ from $$P(c)$$, $$P(x)$$, and $$P(x \mid c)$$. Look at the equations below:

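$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$

Under the naive independence assumption, and dropping the denominator because it is the same for every class:

$P(c \mid X) \propto P(x_1 \mid c) \times P(x_2 \mid c) \times \cdots \times P(x_n \mid c) \times P(c)$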

where $$P(c \mid x)$$ is the posterior probability of the class ($$c$$, the target) given the predictor ($$x$$, the attributes),

$$P(c)$$ is the prior probability of the class,

$$P(x \mid c)$$ is the likelihood, which is the probability of the predictor given the class, and

$$P(x)$$ is the prior probability of the predictor. In the second expression, $$X = (x_1, \dots, x_n)$$ denotes the full set of predictors.
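The step from Bayes’ theorem to the product form can be spelled out as follows. This is a sketch of the standard derivation, assuming the predictors $$x_1, \dots, x_n$$ are conditionally independent given the class; the symbol $$\hat{c}$$ for the predicted class is introduced here for illustration and does not appear in the text above.

```latex
\begin{align*}
P(c \mid x_1, \dots, x_n)
  &= \frac{P(x_1, \dots, x_n \mid c)\, P(c)}{P(x_1, \dots, x_n)}
  && \text{(Bayes' theorem)} \\
  &= \frac{P(c) \prod_{i=1}^{n} P(x_i \mid c)}{P(x_1, \dots, x_n)}
  && \text{(conditional independence given } c\text{)} \\
  &\propto P(c) \prod_{i=1}^{n} P(x_i \mid c)
  && \text{(the denominator does not depend on } c\text{)} \\
\hat{c} &= \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
\end{align*}
```

Because the denominator is identical for every class, comparing the unnormalized products is enough to pick the most probable class.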

### Working of Naive Bayes Algorithm

Let’s understand it using an example. Consider a training data set of weather conditions and the corresponding target variable ‘Play’ (indicating whether a game was played). We need to classify whether players will play or not based on the weather condition. Let’s follow the steps below.

Step 1: Convert the data set into a frequency table.

Step 2: Create a likelihood table by computing the probabilities, for example $$P(\text{Overcast}) = 0.29$$ and $$P(\text{Yes}) = 0.64$$.

Step 3: Use the Naive Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.
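The three steps can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the rows in `DATA` are hypothetical, chosen only so that their totals agree with the figures quoted in the text, and the function names are invented for this sketch.

```python
from collections import Counter

# Hypothetical training data: (weather, play) pairs. The rows are an assumption,
# chosen only so that the totals agree with the figures quoted in the text
# (14 examples, 9 "Yes", 5 "Sunny" of which 3 are "Yes", 4 "Overcast").
DATA = (
    [("Sunny", "Yes")] * 3 + [("Sunny", "No")] * 2
    + [("Overcast", "Yes")] * 4
    + [("Rainy", "Yes")] * 2 + [("Rainy", "No")] * 3
)


def frequency_table(data):
    """Step 1: count how often each (weather, play) pair occurs."""
    return Counter(data)


def likelihood_table(freq):
    """Step 2: turn counts into P(class), P(weather) and P(weather | class)."""
    total = sum(freq.values())
    class_counts = Counter()
    weather_counts = Counter()
    for (weather, label), n in freq.items():
        class_counts[label] += n
        weather_counts[weather] += n
    prior = {c: n / total for c, n in class_counts.items()}
    evidence = {w: n / total for w, n in weather_counts.items()}
    likelihood = {
        (w, c): freq[(w, c)] / class_counts[c]
        for w in weather_counts
        for c in class_counts
    }
    return prior, evidence, likelihood


def posterior(weather, prior, evidence, likelihood):
    """Step 3: apply Bayes' theorem for every class and return all posteriors."""
    return {
        c: likelihood[(weather, c)] * prior[c] / evidence[weather]
        for c in prior
    }


freq = frequency_table(DATA)
prior, evidence, likelihood = likelihood_table(freq)
post = posterior("Rainy", prior, evidence, likelihood)
prediction = max(post, key=post.get)  # class with the highest posterior
print(round(evidence["Overcast"], 2), round(prior["Yes"], 2))
print(prediction, post)
```

The prediction for "Rainy" depends entirely on the assumed rows; only the totals for "Sunny", "Overcast" and "Yes" come from the text.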

Problem: Players will play if the weather is sunny. Is this statement correct?

We can solve it using the posterior probability method discussed above.

$P(\text{Yes} \mid \text{Sunny}) = \frac{P(\text{Sunny} \mid \text{Yes})\,P(\text{Yes})}{P(\text{Sunny})}$

Here we have,

$P(\text{Sunny} \mid \text{Yes}) = 3/9 = 0.33, \quad P(\text{Sunny}) = 5/14 = 0.36, \quad P(\text{Yes}) = 9/14 = 0.64$

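Now, $$P(\text{Yes} \mid \text{Sunny}) = \frac{0.33 \times 0.64}{0.36} \approx 0.60$$ (exactly $$3/5$$ when the unrounded fractions are used). This is higher than $$P(\text{No} \mid \text{Sunny}) = 1 - 0.60 = 0.40$$, so the prediction is that players will play.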
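The arithmetic can be checked with exact fractions. The sketch below uses only the counts quoted in the text; the variable names are illustrative. It also shows that plugging in the two-decimal values gives roughly 0.59, while the unrounded fractions give exactly 0.60.

```python
from fractions import Fraction

# Counts quoted in the text: 14 examples, 9 "Yes", 5 "Sunny", 3 of them "Yes".
n_total, n_yes, n_sunny, n_sunny_yes = 14, 9, 5, 3

p_sunny_given_yes = Fraction(n_sunny_yes, n_yes)  # 3/9
p_yes = Fraction(n_yes, n_total)                  # 9/14
p_sunny = Fraction(n_sunny, n_total)              # 5/14

# Bayes' theorem with exact fractions.
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
# The two posteriors sum to 1, so the complement is P(No | Sunny).
p_no_given_sunny = 1 - p_yes_given_sunny

# Same calculation using the two-decimal values printed in the text.
rounded = (0.33 * 0.64) / 0.36

print(p_yes_given_sunny, p_no_given_sunny)
print(round(rounded, 2))
```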

Naive Bayes uses a similar method to predict the probability of each class based on various attributes. This algorithm is mostly used in text classification and in problems with multiple classes.

### Pros and Cons of Naive Bayes

#### Pros:

• It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.
• When the independence assumption holds, a Naive Bayes classifier can perform better than other models such as logistic regression, and it needs less training data.
• It performs well with categorical input variables compared to numerical variables. For a numerical variable, a normal distribution (bell curve) is assumed, which is a strong assumption.
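For a numerical variable, the normal-distribution assumption means that the likelihood $$P(x \mid c)$$ is read off a Gaussian density fitted separately for each class. A minimal sketch, assuming hypothetical temperature readings (the values, the `temps` dictionary and the `gaussian_pdf` helper are invented for illustration):

```python
import math
from statistics import mean, pvariance

# Hypothetical numeric feature (temperature) grouped by class; the values
# are made up for illustration and do not come from the text.
temps = {
    "Yes": [21.0, 23.0, 24.0, 22.0],
    "No": [29.0, 31.0, 30.0],
}


def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Fit one mean and one variance per class.
params = {c: (mean(v), pvariance(v)) for c, v in temps.items()}
# Class priors from the relative class sizes.
n = sum(len(v) for v in temps.values())
priors = {c: len(v) / n for c, v in temps.items()}

# Score a new value: prior times Gaussian likelihood, per class.
x_new = 23.5
scores = {c: priors[c] * gaussian_pdf(x_new, *params[c]) for c in temps}
predicted = max(scores, key=scores.get)
print(predicted)
```

If the real feature is far from bell-shaped within a class, these densities can be a poor fit, which is why the assumption is described as strong.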

#### Cons:

• If a categorical variable has a category in the test data set that was not observed in the training data set, the model will assign it a probability of 0 (zero) and will be unable to make a prediction. This is often known as the “zero frequency” problem. To solve it, we can use a smoothing technique; one of the simplest is called Laplace estimation.
• Naive Bayes is also known to be a poor probability estimator, so the probability outputs (for example, from scikit-learn’s `predict_proba`) should not be taken too seriously.
• Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is almost impossible to get a set of predictors that are completely independent.
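The zero-frequency problem and its Laplace (add-one) fix can be sketched as follows. The counts and the function name `smoothed_likelihood` are hypothetical; `alpha = 1` corresponds to Laplace estimation.

```python
def smoothed_likelihood(count, class_total, n_categories, alpha=1.0):
    """Estimate P(category | class) with additive (Laplace) smoothing.

    count        -- times the category was seen with this class
    class_total  -- number of training examples in this class
    n_categories -- number of distinct values the feature can take
    alpha        -- pseudo-count added to every category (1.0 = Laplace)
    """
    return (count + alpha) / (class_total + alpha * n_categories)


# Hypothetical case: a category never seen with a class of 5 examples,
# for a feature with 3 possible values.
unsmoothed = 0 / 5
smoothed = smoothed_likelihood(0, 5, 3)
print(unsmoothed, smoothed)

# The smoothed estimates over all categories still sum to 1.
counts = [0, 2, 3]
total_probability = sum(smoothed_likelihood(k, 5, 3) for k in counts)
print(total_probability)
```

With smoothing, the unseen category gets a small non-zero likelihood, so a single missing combination no longer forces the whole product to zero.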

### References

[1] David Poole and Alan Mackworth, “Artificial Intelligence: Foundations of Computational Agents”, available online at: http://artint.info/html/ArtInt_181.html

[2] “A practical explanation of a Naive Bayes classifier”, available online at: https://monkeylearn.com/blog/practical-explanation-naive-bayes-classifier/

[3] Sunil Ray, “6 Easy Steps to Learn Naive Bayes Algorithm (with code in Python)”, September 13, 2015, available online at: https://www.analyticsvidhya.com/blog/2015/09/naive-bayes-explained/

[4] https://stats.stackexchange.com/questions/4949/calculating-the-error-of-bayes-classifier-analytically (image)
