Python: Logistic Regression

This lesson will focus more on performing a Logistic Regression in Python. If you are unfamiliar with Logistic Regression, check out my earlier lesson: Logistic Regression with Gretl

If you would like to follow along, please download the exercise file here: logi2

Import the Data

You should be good at this by now, use Pandas .read_excel().

df.head() gives us a the first 5 rows.

What we have here is a list of students applying to a school. They have a Score that runs from 0 -1600,  ExtraCir (extracurricular activity) 0 = no 1 = yes, and finally Accepted 0 = no 1 = yes

logi1

Create Boolean Result

We are going to create a True/False column for our dataframe.

What I did was:

  • df[‘Accept’]   — create a new column named Accept
  • df[‘Accepted’]==1  — if my Accepted column is 1 then True, else False

logi1

What are we modeling?

The goal of our model is going to be to predict and output – whether or not someone gets Accepted based on some input – Score, ExtraCir.

So we feed our model 2 input (independent)  variables and 1 result (dependent) variable. The model then gives us coefficients. We place these coefficients(c,c1,c2) in the following formula.

y = c + c1*Score + c2*ExtraCir

Note the first c in our equation is by itself. If you think back to the basic linear equation (y= mx +b), the first c is b or the y intercept. The Python package we are going to be using to find our coefficients requires us to have a place holder for our y intercept. So, let’s do that real quick.

logi2

 

Let’s build our model

Let’s import statsmodels.api

From statsmodels we will use the Logit function. First giving it the dependent variable (result) and then our independent variables.

After we perform the Logit, we will perform a fit()

logi3.jpg

The summary() function gives us a nice chart of our results

logi4.jpg

If you are a stats person, you can appreciate this. But for what we need, let us focus on our coef.

logi45.jpg

remember our formula from above: y = c + c1*Score + c2*ExtraCir

Let’s build a function that solves for it.

Now let us see how a student with a Score of 1125 and a ExCir of 1 would fair.

logi9

okayyyyyy. So does 3.7089 mean they got in?????

Let’s take a quick second to think about the term logistic. What does it bring to mind?

Logarithms!!!

Okay, but our results equation was linear — y = c+ c1*Score + c2*ExCir

So what do we do.

So we need to remember y is a function of probability.

logis1

So to convert y our into a probability, we use the following equation

logis2

So let’s import numpy so we can make use of e (exp() in Python)

logi8.jpg

Run our results through the equation. We get .97. So we are predicting a 97% chance of acceptance.

logi10.jpg

Now notice what happens if I drop the test score down to 75. We end up with only a 45% chance of acceptance.

logi11.jpg


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Python content: Python

 

 

 

 

 

 

4 thoughts on “Python: Logistic Regression

    1. Thanks for the comment.

      I will go back and edit the page to clear this up. But for now, here is a brief explanation. p = probability. p/(1-p) is a well known equation called the odds ratio. Taking the natural log (ln) of the odds ratio gives us the logit function. The logit function (which is actually the inverse of logistic function when you graph it) has a special property in regression. It can be used to link our linear function: y = mx + b with our probability (p).

      Like

Please Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s