Python: Naive Bayes’

Naive Bayes’ is a supervised machine learning classification algorithm based off of Bayes’ Theorem. If you don’t remember Bayes’ Theorem, here it is:

bayes

Seriously though, if you need a refresher, I have a lesson on it here: Bayes’ Theorem

The naive part comes from the idea that the probability of each column is computed alone. They are “naive” to what the other columns contain.

You can download the data file here: logi2

Import the Data

import pandas as pd
df = pd.read_excel("C:\Users\Benjamin\Documents\logi2.xlsx")
df.head()

nb.jpg

Let’s look at the data. We have 3 columns – Score, ExtraCir, Accepted. These represent:

  • Score – Student Test Score
  • ExtraCir – Was Student in an Extra Circular Activity
  • Accepted – Was the Student Accepted

Now the Accepted column is our result column – or the column we are trying to predict. Having a result in your data set makes this a supervised machine learning algorithm.

Split the Data

Next split the data into input(score and extracir) and results (accepted).

y = df.pop('Accepted')
X = df

y.head()

X.head()

nb1.jpg

Fit Naive Bayes

Lucky for us, scikitlearn has a bit in Naive Bayes algorithm – (MultinomialNB)

Import MultinomialNB and fit our split columns to it (X,y)

from sklearn.naive_bayes import MultinomialNB
classifier = MultinomialNB()
classifier.fit(X,y)

nb2.jpg

Run the some predictions

Let’s run the predictions below. The results show 1 (Accepted) 0 (Not Accepted)

#--score of 1200, ExtraCir = 1
print(classifier.predict([1200,1]))

#--score of 1000, ExtraCir = 0
print(classifier.predict([1000,0]))

nb3

The Code

import pandas as pd
df = pd.read_excel("C:\Users\Benjamin\Documents\logi2.xlsx")
df.head()

y = df.pop('Accepted')
X = df

y.head()
X.head()

from sklearn.naive_bayes import MultinomialNB
classifier = MultinomialNB()
classifier.fit(X,y)

#--score of 1200, ExtraCir = 1
print(classifier.predict([1200,1]))

#--score of 1000, ExtraCir = 0
print(classifier.predict([1000,0]))

 

One thought on “Python: Naive Bayes’

  1. shruthi

    Hi when i run print(classifier.predict([1200,1])) i m getting the following error:

    C:\Users\shruthibattula\AppData\Local\Continuum\anaconda2\lib\site-packages\sklearn\naive_bayes.py in predict(self, X)
    64 Predicted target values for X
    65 “””
    —> 66 jll = self._joint_log_likelihood(X)
    67 return self.classes_[np.argmax(jll, axis=1)]
    68

Leave a Reply