Site icon Analytics4All

Python: Logistic Regression

Advertisements

This lesson will focus more on performing a Logistic Regression in Python. If you are unfamiliar with Logistic Regression, check out my earlier lesson: Logistic Regression with Gretl

If you would like to follow along, please download the exercise file here: logi2

Import the Data

You should be good at this by now, use Pandas .read_excel().

df.head() gives us a the first 5 rows.

What we have here is a list of students applying to a school. They have a Score that runs from 0 -1600,  ExtraCir (extracurricular activity) 0 = no 1 = yes, and finally Accepted 0 = no 1 = yes

Create Boolean Result

We are going to create a True/False column for our dataframe.

What I did was:

What are we modeling?

The goal of our model is going to be to predict and output – whether or not someone gets Accepted based on some input – Score, ExtraCir.

So we feed our model 2 input (independent)  variables and 1 result (dependent) variable. The model then gives us coefficients. We place these coefficients(c,c1,c2) in the following formula.

y = c + c1*Score + c2*ExtraCir

Note the first c in our equation is by itself. If you think back to the basic linear equation (y= mx +b), the first c is b or the y intercept. The Python package we are going to be using to find our coefficients requires us to have a place holder for our y intercept. So, let’s do that real quick.

 

Let’s build our model

Let’s import statsmodels.api

From statsmodels we will use the Logit function. First giving it the dependent variable (result) and then our independent variables.

After we perform the Logit, we will perform a fit()

The summary() function gives us a nice chart of our results

If you are a stats person, you can appreciate this. But for what we need, let us focus on our coef.

remember our formula from above: y = c + c1*Score + c2*ExtraCir

Let’s build a function that solves for it.

Now let us see how a student with a Score of 1125 and a ExCir of 1 would fair.

okayyyyyy. So does 3.7089 mean they got in?????

Let’s take a quick second to think about the term logistic. What does it bring to mind?

Logarithms!!!

Okay, but our results equation was linear — y = c+ c1*Score + c2*ExCir

So what do we do.

So we need to remember y is a function of probability.

So to convert y our into a probability, we use the following equation

So let’s import numpy so we can make use of e (exp() in Python)

Run our results through the equation. We get .97. So we are predicting a 97% chance of acceptance.

Now notice what happens if I drop the test score down to 75. We end up with only a 45% chance of acceptance.


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Python content: Python

 

 

 

 

 

 

Exit mobile version