Naive Bayes Classification


In this blog, I will explain the naive Bayes classifier through an example, and I will also provide source code.

1. Concept

Naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable.

For example, a fruit may be considered to be an apple if it is red, round, and about 10 cm in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that this fruit is an apple, regardless of any possible correlations between the color, roundness, and diameter features. For a fuller introduction to naive Bayes classifiers, see the Wikipedia article.
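
To make the independence assumption concrete, here is a minimal sketch in Python; the per-feature probabilities are made-up numbers for illustration, not values from any real dataset.

```python
# A naive Bayes classifier treats each feature as independent given the class,
# so the class-conditional probability of the whole feature vector factorizes
# into a product of per-feature probabilities.

# Hypothetical per-feature probabilities for the class "apple" (made-up numbers).
p_feature_given_apple = {
    "red": 0.7,        # p(color = red | apple)
    "round": 0.9,      # p(shape = round | apple)
    "about_10cm": 0.6, # p(diameter is about 10 cm | apple)
}

# Under the naive assumption:
# p(red, round, about_10cm | apple)
#     = p(red | apple) * p(round | apple) * p(about_10cm | apple)
p_features_given_apple = 1.0
for feature, p in p_feature_given_apple.items():
    p_features_given_apple *= p

print(p_features_given_apple)  # 0.7 * 0.9 * 0.6 = 0.378
```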

2. Introduction

Assume a two-dimensional dataset containing points of two colors.

  • The red points belong to class A.
  • The green points belong to class B.

We use PA(x,y) to denote the probability that point (x,y) belongs to class A, and PB(x,y) to denote the probability that point (x,y) belongs to class B. If there is a new point (x1,y1), we predict its class using the following strategy:

  • If PA(x1,y1) > PB(x1,y1), then point (x1,y1) belongs to class A.
  • If PA(x1,y1) < PB(x1,y1), then point (x1,y1) belongs to class B.

This means that the point is classified into the class with the higher probability, as sketched in the code below.
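
Here is a minimal sketch of this decision rule in Python; the probability functions are placeholders standing in for whatever model supplies PA and PB, and the example numbers are made up.

```python
def classify(x, y, p_a, p_b):
    """Assign point (x, y) to the class with the higher probability.

    p_a and p_b are functions returning the probability that (x, y)
    belongs to class A or class B, respectively.
    """
    return "A" if p_a(x, y) > p_b(x, y) else "B"

# Placeholder probability functions, for illustration only.
p_a = lambda x, y: 0.8 if x < 0 else 0.2
p_b = lambda x, y: 1.0 - p_a(x, y)

print(classify(-1.0, 2.0, p_a, p_b))  # -> A
print(classify(3.0, 1.0, p_a, p_b))   # -> B
```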

In Bayes classification, we usually use the following equation to calculate the conditional probability:

p(ci | w) = p(w | ci) × p(ci) / p(w)

Where:

  • p(ci) is the frequency with which class i appears in the prior knowledge (the prior).
  • p(w|ci) is the frequency with which word w appears in class i (the likelihood).
  • p(w) is the frequency with which word w appears across all classes (the evidence).
  • p(ci|w) is the probability that the sentence belongs to class i given that word w appears in it (the posterior); see the sketch below.
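
A direct translation of this equation into Python; the function and argument names are my own, chosen for readability:

```python
def posterior(p_w_given_ci, p_ci, p_w):
    """Bayes' rule: p(ci | w) = p(w | ci) * p(ci) / p(w).

    p_w_given_ci: frequency of word w in class i (likelihood)
    p_ci:         frequency of class i (prior)
    p_w:          frequency of word w across all classes (evidence)
    """
    return p_w_given_ci * p_ci / p_w
```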

3. Example

In the following, I will work through an example to help build a better understanding of naive Bayes classification.
For the sentence "this dog is very cute", the word vector is
w = ["this", "dog", "is", "very", "cute"].
The frequency of each word in this sentence is:

word        this    dog     is      very    cute
frequency   0.2     0.2     0.2     0.2     0.2
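
These within-sentence frequencies are easy to compute; a minimal sketch (each of the five words appears once, so each has frequency 1/5 = 0.2):

```python
from collections import Counter

sentence = "this dog is very cute"
words = sentence.split()  # ["this", "dog", "is", "very", "cute"]

# Count each word and divide by the total number of words.
counts = Counter(words)
frequencies = {word: count / len(words) for word, count in counts.items()}

print(frequencies)  # {'this': 0.2, 'dog': 0.2, 'is': 0.2, 'very': 0.2, 'cute': 0.2}
```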

We assume that in some dataset, the overall frequency of these words, p(w), is:

word        this    dog     is      very    cute
frequency   0.1     0.2     0.5     0.3     0.1

In class i, the frequency of these words, p(w|ci), is:

word        this    dog     is      very    cute
frequency   0.3     0.2     0.5     0.3     0.1

In the dataset, the frequency p(ci) with which class i appears is 0.4.
Then, since the word "this" appears in this sentence, the probability that the sentence belongs to class i is

p(ci | "this") = p("this" | ci) × p(ci) / p("this") = 0.3 × 0.4 / 0.1 = 1.2

(Note that these toy frequencies are illustrative rather than drawn from a consistent probability model, which is why the value exceeds 1; in practice only the relative scores across classes matter, since p(w) is the same for every class.)

The other probabilities can be calculated in the same way.
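
Putting the numbers together, here is a sketch that applies the same equation to every word in the sentence; the frequencies are copied from the tables above:

```python
# Frequencies copied from the tables above.
p_w = {"this": 0.1, "dog": 0.2, "is": 0.5, "very": 0.3, "cute": 0.1}           # p(w)
p_w_given_ci = {"this": 0.3, "dog": 0.2, "is": 0.5, "very": 0.3, "cute": 0.1}  # p(w | ci)
p_ci = 0.4                                                                     # p(ci)

for word in ["this", "dog", "is", "very", "cute"]:
    score = p_w_given_ci[word] * p_ci / p_w[word]
    print(f"p(ci | {word!r}) = {score:.2f}")

# p(ci | 'this') = 1.20; every other word gives 0.40.
```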

Reference

Book: Machine Learning in Action.
