Naive Bayes Classification


In this blog, I will explain the naive Bayes classifier through an example, and I will also provide source code.

1. Concept

Naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable.

For example, a fruit may be considered to be an apple if it is red, round, and about 10 cm in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that this fruit is an apple, regardless of any possible correlations between the color, roundness, and diameter features. For a fuller introduction to naive Bayes classifiers, see the Wikipedia article.
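
To make the independence assumption concrete, here is a minimal sketch in Python; the per-feature probabilities are made-up numbers for illustration, not values from any real dataset.

```python
# A naive Bayes classifier treats each feature as independent given the class,
# so the class-conditional probability of the whole feature vector factorizes
# into a product of per-feature probabilities.

# Hypothetical per-feature probabilities for the class "apple" (made-up numbers).
p_feature_given_apple = {
    "red": 0.7,        # p(color = red | apple)
    "round": 0.9,      # p(shape = round | apple)
    "about_10cm": 0.6, # p(diameter is about 10 cm | apple)
}

# Under the naive assumption:
# p(red, round, about_10cm | apple)
#     = p(red | apple) * p(round | apple) * p(about_10cm | apple)
p_features_given_apple = 1.0
for feature, p in p_feature_given_apple.items():
    p_features_given_apple *= p

print(p_features_given_apple)  # 0.7 * 0.9 * 0.6 = 0.378
```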

2. Introduction

Assume a two-dimensional dataset containing points of two colors.

  • The red points belong to class A.
  • The green points belong to class B.

We use PA(x,y) to denote the probability that point (x,y) belongs to class A, and PB(x,y) to denote the probability that point (x,y) belongs to class B. If there is a new point (x1,y1), we predict its class using the following strategy:

  • If PA(x1,y1) > PB(x1,y1), then point (x1,y1) belongs to class A.
  • If PA(x1,y1) < PB(x1,y1), then point (x1,y1) belongs to class B.

This means that the point is classified into the class with the higher probability, as sketched in the code below.
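
Here is a minimal sketch of this decision rule in Python; the probability functions are placeholders standing in for whatever model supplies PA and PB, and the example numbers are made up.

```python
def classify(x, y, p_a, p_b):
    """Assign point (x, y) to the class with the higher probability.

    p_a and p_b are functions returning the probability that (x, y)
    belongs to class A or class B, respectively.
    """
    return "A" if p_a(x, y) > p_b(x, y) else "B"

# Placeholder probability functions, for illustration only.
p_a = lambda x, y: 0.8 if x < 0 else 0.2
p_b = lambda x, y: 1.0 - p_a(x, y)

print(classify(-1.0, 2.0, p_a, p_b))  # -> A
print(classify(3.0, 1.0, p_a, p_b))   # -> B
```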

In Bayes classification, we usually use the following equation to calculate the conditional probability:

p(ci | w) = p(w | ci) × p(ci) / p(w)

Where:

  • p(ci) is the frequency with which class i appears in the prior knowledge (the prior).
  • p(w|ci) is the frequency with which word w appears in class i (the likelihood).
  • p(w) is the frequency with which word w appears across all classes (the evidence).
  • p(ci|w) is the probability that the sentence belongs to class i given that word w appears in it (the posterior); see the sketch below.
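
A direct translation of this equation into Python; the function and argument names are my own, chosen for readability:

```python
def posterior(p_w_given_ci, p_ci, p_w):
    """Bayes' rule: p(ci | w) = p(w | ci) * p(ci) / p(w).

    p_w_given_ci: frequency of word w in class i (likelihood)
    p_ci:         frequency of class i (prior)
    p_w:          frequency of word w across all classes (evidence)
    """
    return p_w_given_ci * p_ci / p_w
```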

3. Example

In the following, I will work through an example to help build a better understanding of naive Bayes classification.
For the sentence "this dog is very cute", the word vector is
w = ["this", "dog", "is", "very", "cute"].
The frequency of each word in this sentence is:

word        this    dog     is      very    cute
frequency   0.2     0.2     0.2     0.2     0.2
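
These within-sentence frequencies are easy to compute; a minimal sketch (each of the five words appears once, so each has frequency 1/5 = 0.2):

```python
from collections import Counter

sentence = "this dog is very cute"
words = sentence.split()  # ["this", "dog", "is", "very", "cute"]

# Count each word and divide by the total number of words.
counts = Counter(words)
frequencies = {word: count / len(words) for word, count in counts.items()}

print(frequencies)  # {'this': 0.2, 'dog': 0.2, 'is': 0.2, 'very': 0.2, 'cute': 0.2}
```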

We assume that in some dataset, the overall frequency of these words, p(w), is:

word        this    dog     is      very    cute
frequency   0.1     0.2     0.5     0.3     0.1

In class i, the frequency of these words, p(w|ci), is:

word        this    dog     is      very    cute
frequency   0.3     0.2     0.5     0.3     0.1

In the dataset, the frequency p(ci) with which class i appears is 0.4.
Then, since the word "this" appears in this sentence, the probability that the sentence belongs to class i is

p(ci | "this") = p("this" | ci) × p(ci) / p("this") = 0.3 × 0.4 / 0.1 = 1.2

(Note that these toy frequencies are illustrative rather than drawn from a consistent probability model, which is why the value exceeds 1; in practice only the relative scores across classes matter, since p(w) is the same for every class.)

The other probabilities can be calculated in the same way.
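
Putting the numbers together, here is a sketch that applies the same equation to every word in the sentence; the frequencies are copied from the tables above:

```python
# Frequencies copied from the tables above.
p_w = {"this": 0.1, "dog": 0.2, "is": 0.5, "very": 0.3, "cute": 0.1}           # p(w)
p_w_given_ci = {"this": 0.3, "dog": 0.2, "is": 0.5, "very": 0.3, "cute": 0.1}  # p(w | ci)
p_ci = 0.4                                                                     # p(ci)

for word in ["this", "dog", "is", "very", "cute"]:
    score = p_w_given_ci[word] * p_ci / p_w[word]
    print(f"p(ci | {word!r}) = {score:.2f}")

# p(ci | 'this') = 1.20; every other word gives 0.40.
```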

Reference

Book: Machine Learning in Action.
