python statistics – Analytics4All

Hypothesis testing is a first step into really understanding how to use statistics.

The purpose of the test is to tell if there is any significant difference between two data sets.

Consider the follow example:

Let’s say I am trying to decide between two computers. I want to use the computer to run advanced analytics, so the only thing I am concerned with is speed.

I pick a sorting algorithm and a large data set and run it on both computers 10 times, timing each run in seconds.

Now I put the results into two lists. A and B

a = [10,12,9,11,11,12,9,11,9,9]
b = [13,11,9,12,12,11,12,12,10,11]

A quick look at the data makes me think b is slower than a. But is it slower enough to mean something or are these results just a matter of chance (meaning if I ran the test 200 more times would the end result be closer to equal or further apart).

Hypothesis test

To find out, let’s do a hypothesis test.

Set our Hypothesis:

H0 = H1 – there is no significant difference between data sets
H0 <> H1 – there is a significant difference

To test our hypothesis, let’s run a t-test

import stats from scipy and run stats.ttest_ind().

Our output is the z-statistic and the p-value.

Our p-value is 0.08 – greater than the common significance value of 0.05. Since it is greater, we cannot reject H0=H1. This means both computers are effectively the same speed.

hypoTest

Let’s try a third computer – d

d = [13,12,9,12,12,13,12,13,10,11]

Now, let’s run a second T-test. This one comes back with a p-value of 0.026 – under 0.05. This means we can reject our hypothesis that a=d. The speed differences between a and d are significant.

hypoTest1

	Anonymous on Python Web Scraping / Automati…
	rajendarqvkelly on PIG: Use GRUNT to Access PIG f…
	A Transient Historic… on Inverted Index Database
	Anonymous on XML Parsing: Advanced SQL
	Database Development… on SQL Server: Importing Excel Fi…

	Anonymous on Python Web Scraping / Automati…
	rajendarqvkelly on PIG: Use GRUNT to Access PIG f…
	A Transient Historic… on Inverted Index Database
	Anonymous on XML Parsing: Advanced SQL
	Database Development… on SQL Server: Importing Excel Fi…

	Anonymous on Python Web Scraping / Automati…
	rajendarqvkelly on PIG: Use GRUNT to Access PIG f…
	A Transient Historic… on Inverted Index Database
	Anonymous on XML Parsing: Advanced SQL
	Database Development… on SQL Server: Importing Excel Fi…

	Anonymous on Python Web Scraping / Automati…
	rajendarqvkelly on PIG: Use GRUNT to Access PIG f…
	A Transient Historic… on Inverted Index Database
	Anonymous on XML Parsing: Advanced SQL
	Database Development… on SQL Server: Importing Excel Fi…

Analytics4All

Tag: python statistics

Python: Hypothesis Testing(T Test)

Hypothesis test