Statistics: Range, Variance, and Standard Deviation

On April 29, 2016April 29, 2016 By Ben Larson Ph.D.In statisticsLeave a comment

Measuring the spread can be a useful tool when analyzing a set of numbers. Three common measures of spread of range, variance, and standard deviation.

Here is the data set we will be working with: [2,4,6,7,8,10,15,18,22,26]

Range

Range is the simplest of the three measures. To find the range, all you need to do is subtract the smallest number in the set from the largest number

range = large-small

range = 26-2 = 24

Variance

Variance is created by taking the average of the squared difference between each value in the set minus the mean. We square the differences so that values above and below the mean do not cancel each other out.

Let’s find the mean:

statSpread

If you haven’t seen the x with the line over it before, this is referred to as x bar and it is used to represent the mean.

To find the variance you take the first number in your set, subtract the mean and square the result. You repeat that for each number in your list. Finally you add up all the results and divide by n (the number of items in your list)

ex – ((2-12)^2 + (4-12)^2 +….+(26-12)^2) / 10

statSpread1

variance = 58.56

Now I know what you are thinking, how can the average distance from the mean be 58.56 when the furthest point from the mean (26) is only 14? This is because we are squaring the differences. To get a number more in line with the data set, we have another measure called the standard deviation.

Standard Deviation

The standard deviation returns a value more in-line with what you would expect based on your data. To find the standard deviation – simply take the square root of the variance.

std dev = 7.65

Population vs Sample

The equations above work great if you have the entire population. What I mean by that is, if your data contains all the data in the set. Using our data, imagine if the numbers were ages of children in a large family. If there are 10 kids in the family, then I have all the ages, so I am dealing with the population.

However, if we instead have sampled 10 random ages from all the kids in a large extended family where the total number of kids is 90. In this case, since we are looking at 10 out of 90, we are not dealing with the population, but the sample.

When working with the sample, you need to make an adjustment to your variance and standard deviation equations. The change is simple. Instead of dividing by n you will now divide by n-1. This offset makes up for the fact you do not have all the data.

statSpread2

Python: Create, Import, and Use a Module

On April 28, 2016July 22, 2022 By Ben Larson Ph.D.In PythonLeave a comment

Today we are going to cover what I consider a flaw in the iPython Notebook environment. Code you write cannot be imported into another program like you can using just a standard Python compiler. At least it cannot under its native .ipynb format.

For this example, we are going to use the Spyder IDE that comes with Anaconda. You can use any IDE you want, in all honesty, you can use a simple notepad editor to do this if you don’t have Spyder installed.

Create a Module

I am going to create a little module called math_bl.py. This module contains 3 simple functions: sqr2, sub2, and add2

In my instance, my iPython notebooks are stored under my user directory. If you are not sure where you notebooks are, you can use the pwd command (print working directory). This will tell you where to save your math_bl.py file to.

** note the double \\ are due to a formatting function in Python. \ is a break command that can be used for many purposes. Example \t means to insert a tab.

So effectively you have to use \\ if you want a \ in Python, making my working directory C:\Users\Benjamin

Import the Module

Now if I open up a new notebook and try using the functions I have just created, I will error out.

So, what I need to do is import add2, then I can start calling on the functions I created in add2

If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT.

Next Lesson:

Return to: Python for Data Science Course

Follow this link for more Python content: Python

SQL: Intro to Joins

On April 28, 2016April 28, 2016 By Ben Larson Ph.D.In MS SQL Server, Uncategorized1 Comment

More often than not, the information you need does not all reside in one table. To query information from more than one table, use the Join command.

In our example, we are going to be using AdventureWorks2012. If you want to follow along, but do not have SQL Server or AdventureWorks2012 installed, click in the following links:

Install SQL Server: MS SQL Server: Installation
Load AdventureWorks2012: SQL: SELECT Statement

We are going to be using the following Tables for our Join

Notice they both have a matching field – BusinessEntityID – This field will be important when creating a join.

sqlJoin7

We can call up both tables individually through separate SQL Queries

sqljoin9

However, to combine the two tables into a single result, you will need a Join.

Inner Join

SQL uses Inner Joins by default. In this lesson, I am only going to focus on inner joins. Inner joins work by focusing on columns with matching information. In the example below, the columns with matching information are the Name columns.

The combined result only contains rows who have a match in both source tables. Notice that Sally and Sarah do not appear in the result table.

The syntax is as follows:

Select *
from Table A join Table B
on TableA.Match = TableB.Match

In our example, the matching column is BusinessEntityID in both tables.

To clean the code up a little bit, we can assign aliases to our tables using the “as” keyword. We are using E and S as our aliases. We can now use those aliases in our “on” clause.

sqlJoin12

And, just as in a regular select statement, we can choose which columns we want. You do however, have to identify which table the columns you are requesting are from. Notice how I did this using the E and S aliases.

MS SQL Server: Database Diagrams

On April 28, 2016April 28, 2016 By Ben Larson Ph.D.In MS SQL Server1 Comment

MS SQL Server comes with a function called Database Diagrams. You can use it to produce ERDs (Entity Relationship Diagrams).

Start by right clicking on Database Diagrams > New Database Diagram in the Object Explorer

sqlJoin

If you get the following error, you database does not have an owner assigned.

To assign an owner, right click on your database name and select Properties

sqljoin2

Under Select a page > Files – By Owner select … radio button> Browse

sqlJoin3

Select your user name from the list and then hit Okay until you are back to the main screen.

Right click Database Diagrams > New Database Diagram

sqlJoin

Click Yes

sqlJoin5

Now select some tables. In this example, I am choosing Employee and SalesPerson

sqljoin6

The diagram shows the tables you picked, and the data fields. It also shows connections.

Notice the tables do not all connect directly with each other. Another point to take note of is the small yellow keys located in from of some fields. These let you know that those fields are the primary key(s) for their respective table. The primary key is the unique identifier for the table.

sqlJoin8

Python: Error handling

On April 27, 2016July 25, 2022 By Ben Larson Ph.D.In Python, Python: Learn Python Course1 Comment

No matter how well crafted your code, you will inevitably come across an error every once in a while. Unfortunately, Python’s default setting when catching an error is to crash the program.

In the example below, I accidentally asked for a letter, when the code int(input()) is looking for an integer.

So when I enter a string, I get an error

Try your code in our online Python console:

Try

Using Try, we can handle such mishaps in a more elegant fashion

The syntax is a follows:

try:
        Your Code
except: 
         What to do if your code errors out
else:
         What to do if your code is successful

And as you can see from above, it doesn’t just protect against code errors, but it protects against user errors as well.

Using a while loop, we can give the user another chance if they don’t enter a number the first time.

Code explanation

while True: – starts an infinite loop
continue – returns to the beginning of the while loop
break – exits the loop

Try your code in our online Python console:

Finally

You can add the finally: command to the end of a try: statement if you have something you want to execute regardless of whether there was an error or not.

Try your code in our online Python console:

Exception types:

You can also determine what your program does based on the type of error

ValueError

ZeroDivisionError

No Error

Try your code in our online Python console:

A list of Exception types can be found here: Link

If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT.

Last Lesson: Enumerate() and Sort()

Next Lesson: Lambda, map, reduce, filter

Back to Python Course: Course

Python: Print Variables and User Input

On April 27, 2016July 25, 2022 By Ben Larson Ph.D.In Python2 Comments

Print

The basic Print statement in Python is Print (“”) – note Print (”) also works. You can choose either single or double quotes, but you cannot mix them in the same statement.

Print Variables

Printing variables by themselves is pretty straight forward:

Screenshot 2022-07-03 211010

Printing variables in line with some text is a little more complicated:

If the variable you want to print is a number, use the special string %d inside the quotes and % variable outside the quotes.

Screenshot 2022-07-04 203014

If you have more than one variable, you need to have a %d for each variable. Then place your variables inside () at the end of the print line.

Screenshot 2022-07-04 203207

To print a string, you use %s. Notice below you can mix strings and numbers in the same line.

Screenshot 2022-07-03 213335

You can use %r if you are not sure what value the variable will have. It will print both.

Try your code in our online Python console:

User Input

input() allows you ask the user for input.

You can assign the user input to a variable.

You can also pre-define what kind of input you want.

Screenshot 2022-07-03 213628

If I enter a string I get an error

Screenshot 2022-07-03 213915

But If I enter an integer, I do not get the error.

Screenshot 2022-07-03 214240

Try your code in our online Python console:

Previous Page: Printing and Variables

Next Page: Print with .format{}

Back to Python Course: Course

Excel: Scatter Plots

On April 27, 2016 By Ben Larson Ph.D.In excel1 Comment

Download worksheet here: ExcelCharts2

Our data set contains 3 sets of readings for 4 sensors (A,B,C,D)

excelCharts1

Scatter Plot

Reading 1

Highlight Columns A and B – From Ribbon > Insert>Scatter

Here is a close up of the Scatter Plot icon

Here is our plotting of Reading 1

excelCharts4

Reading 2

To add the Reading 2 column to the plot, right click on the chart area and Select Data

excelCharts5

Select Add

excelCharts6

Select the Heading from Column C for Series Name
A2:A5 as Series X values
C2:C5 as Series Y values

Reading 3

Repeat again for Reading 3

excelCharts8

Now, double click on the Y-Axis and go to the Format Axis box that will appear on your right.

Select the Bar Chart Icon, and change the Axis Bounds to minimum of 40 and max of 100.

This will help make the spacing between dots more pronounced. Generally altering a Y Axis away from 0 is considered bad taste as it tends to over-pronounce differences between elements.

excelCharts10

Now click on X-Axis and make the changes below.

excelCharts11

Next, go to Add Chart Element in the upper right corner. Legend>Right

excelCharts12

Here is our scatter plot

excelCharts13

Now wait a minute. I know what you are thinking, why is my X Axis 1 – 4 and not A,B,C,D

This is a flaw in Excel. There are some third party packages you can install that will allow you to rename the X Axis in a XY Scatter plot, but not with Excel in its default state.

You can however change the chart type to a Line Chart.

Go to Ribbon > Change Chart Type

Select Line

excelCharts14

Now your X-Axis is properly labeled

SQL: Aggregates, Group By, and Having

On April 26, 2016 By Ben Larson Ph.D.In MS SQL ServerLeave a comment

The Where clause is great. It helps you filter out items from a table, leaving only the records you want. But sometimes you are not looking for the actual records. Instead, sometimes you only want to know how many records.

In this example, I will using the [HumanResources].[EmployeeDepartmentHistory] from AdventureWorks2012:

Count and Group By

The Count(*) function is used to count the number of rows.

Now, when we combine count(*) with a Group By clause, we can count how many records exist for each element in our group by clause.

All non aggregate columns in the select statement need to be in the group by function or you will get an error.

The code below counts the amount of rows containing each departmentID

sqlAggr1

Having

Run the following query:

select BusinessEntityID, count(*) as amt
from [HumanResources].[EmployeeDepartmentHistory]
group by BusinessEntityID

You will get a list 290 records long. Most of these records will have a 1 in the amt column (meaning there is only one records containing the BusinessEntityID referenced)

If you want to know how many BusinessEntityIDs in this table have more than one record, we just have to add the Having clause to our query.

sqlAggr2

having can only be used in conjunction with aggregate functions.

Min(), Max()

Min and Max return the minimum and maximum of whatever column you you place in the parenthesis

sqlAgg4

Python: Line Graph

On April 26, 2016April 26, 2016 By Ben Larson Ph.D.In Python, VizualizationLeave a comment

Let’s build on what we learned in : Python: Intro to Graphs

First, import pyplot from matplotlib

Remember %matplotlib inline let’s you see your graphs in the jupyter notebooks

pythonLine

Line 1

Note I am using a list comprehension to fill the x axis.

pythonLine1

Now, let us do some formatting:

‘r:’ – red dotted line
label = ‘up’ – adds label to the line
py.legend(loc=9) – adds a legend to the chart

pythonLine2

Line 2

For line 2, I want to create a reverse curve:

d = u – list u is copied to d
d.reverse() – reverses values in d

pythonLine3

Plot the line.

‘g-.’ – green dashed line

pythonLine4

Combine the two lines

pythonLine5

Line 3

Here I use another list comprehension and zip to make our last like. This adds each element of list u and d in order.

‘b-‘ – blue solid line

pythonLine6

If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT.

Follow this link for more Python content: Python

Python: Pandas Intro (Dataframes)

On April 25, 2016July 28, 2022 By Ben Larson Ph.D.In Python1 Comment

Before we continue onto Dataframes, I want to clear up something from the Series exercise. Note the line (from pandas import Series, DataFrame)

pandas1

Using that line, I can call upon Series or DataFrame directly in my code

pandas2

In this example below, I did not directly import the methods Series and DataFrame, so I when I tried x = Series() I go an error.

I had to use the full method name of pd.Series() for this to work.

pdDF

DataFrame

DataFrames provide another level of data management for Python. Those of you who come from a more data driven background with appreciate DataFrames.

Let’s start by creating dictionary.

Now, pass the dictionary to the method DataFrame()

Note, now you have a table looking structure with named columns

pdDF2

You can call up a list of indexes or columns using the methods below:

DataFrame.info() will return a summary of your DataFrame

pdDF4

Head and Tail

Create a new DataFrame from a dictionary

If you want just see a few of the first elements, you can use the head() method

pdDF6

The tail() method does the last few. You can even choose how many rows you want.

pdDF7

Describe

The describe() method gives you some quick statistics on any numeric column

Slice

You can slice a DataFrame just as you would a list

pdDF9

Choose What to Display

DataFrames allow you to filter what rows to display by value or column

pdDF10

There is a lot more you can do with Series and DataFrame in pandas, and we will be covering them in later lessons. For now though, I think you have a general idea.

If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT.

Last Lesson: Pandas Series

Next Lesson: Working with DataFrames

Return to: Python for Data Science Course

Follow this link for more Python content: Python

	Vaibhav on Data Modeling
	Anonymous on Python: Accessing a SQL databa…
	Anonymous on Top 7 skills a Data Analyst ha…
	lovingfox4e1d0e653e on Data Jobs: What does a Data An…
	Anonymous on Top 7 skills a Data Analyst ha…

	Vaibhav on Data Modeling
	Anonymous on Python: Accessing a SQL databa…
	Anonymous on Top 7 skills a Data Analyst ha…
	lovingfox4e1d0e653e on Data Jobs: What does a Data An…
	Anonymous on Top 7 skills a Data Analyst ha…

	Vaibhav on Data Modeling
	Anonymous on Python: Accessing a SQL databa…
	Anonymous on Top 7 skills a Data Analyst ha…
	lovingfox4e1d0e653e on Data Jobs: What does a Data An…
	Anonymous on Top 7 skills a Data Analyst ha…

	Vaibhav on Data Modeling
	Anonymous on Python: Accessing a SQL databa…
	Anonymous on Top 7 skills a Data Analyst ha…
	lovingfox4e1d0e653e on Data Jobs: What does a Data An…
	Anonymous on Top 7 skills a Data Analyst ha…

	Vaibhav on Data Modeling
	Anonymous on Python: Accessing a SQL databa…
	Anonymous on Top 7 skills a Data Analyst ha…
	lovingfox4e1d0e653e on Data Jobs: What does a Data An…
	Anonymous on Top 7 skills a Data Analyst ha…

Analytics4All

Month: April 2016

Statistics: Range, Variance, and Standard Deviation

Range

Variance

Standard Deviation

Population vs Sample

Python: Create, Import, and Use a Module

Create a Module

Import the Module

SQL: Intro to Joins

Inner Join

MS SQL Server: Database Diagrams

Python: Error handling

Try

Finally

Exception types:

Python: Print Variables and User Input

Print

Print Variables

User Input

Excel: Scatter Plots

Scatter Plot

SQL: Aggregates, Group By, and Having

Count and Group By

Having

Min(), Max()

Python: Line Graph

Line 1

Line 2

Line 3

Python: Pandas Intro (Dataframes)

DataFrame

Head and Tail

Describe

Slice

Choose What to Display