SQL: Where Clause

The Where clause in SQL works as a filter. In this lesson we will be using MS SQL Server and Microsoft’s free database example Adventure Works 2012.


If you want to follow along with the exercises:

MS SQL Server installation instructions here: MS SQL Server: Installation

Instructions for loading Adventure Works: SQL: SELECT Statement


We are going to be using some selected fields from the table HumanResouces.Employee in the AdventureWorks2012 database.

sqlwelcome.jpg

Find single value

Let’s use Where to find the record with the BusinessEntityID of 6:


SELECT [BusinessEntityID]
,[JobTitle]
,[BirthDate]
,[MaritalStatus]
,[Gender]
,[SalariedFlag]
FROM [AdventureWorks2012].[HumanResources].[Employee]
Where BusinessEntityID = 6


To find a single matching numeric field, use Where Field = Number

You can also use the following operators

  • = equals
  • > greater  than
  • < less than
  • != not

Look for String

To find a string value, we are going to use the Like keyword: Where Field like ‘String’


SELECT [BusinessEntityID]
,[JobTitle]
,[BirthDate]
,[MaritalStatus]
,[Gender]
,[SalariedFlag]
FROM [AdventureWorks2012].[HumanResources].[Employee]
Where JobTitle like ‘Design Engineer’


Using a Wildcard

When trying to search by part of a string, we use the % wildcard.

Consider the sentence “Dogs love chasing cats”

  • ‘Dog%’ = “Dogs love chasing cats”
  • ‘%cats’ = “Dogs love chasing cats”
  • ‘%love%’ = “Dogs love chasing cats”

Below we use a wildcard to find all job titles that end in Engineer


SELECT [BusinessEntityID]
,[JobTitle]
,[BirthDate]
,[MaritalStatus]
,[Gender]
,[SalariedFlag]
FROM [AdventureWorks2012].[HumanResources].[Employee]
Where JobTitle like ‘%Engineer’


Between

The Between keyword can used to find records that fall between 2 values


SELECT [BusinessEntityID]
,[JobTitle]
,[BirthDate]
,[MaritalStatus]
,[Gender]
,[SalariedFlag]
FROM [AdventureWorks2012].[HumanResources].[Employee]
Where BusinessEntityID between 5 and 20


And, Or

This query returns all Married employees who are Female


SELECT [BusinessEntityID]
,[JobTitle]
,[BirthDate]
,[MaritalStatus]
,[Gender]
,[SalariedFlag]
FROM [AdventureWorks2012].[HumanResources].[Employee]
Where MaritalStatus = ‘M’ and Gender = ‘F’


This returns all employees who are married or employees who are F regardless of martial status


SELECT [BusinessEntityID]
,[JobTitle]
,[BirthDate]
,[MaritalStatus]
,[Gender]
,[SalariedFlag]
FROM [AdventureWorks2012].[HumanResources].[Employee]
Where MaritalStatus = ‘M’ or Gender = ‘F’


()

Parenthesis are used just like your remember from Algebra class. They state that the items inside must performed first.

This query returns all Married Males as well as all Tool Designers regardless of gender or marital status.

 


SELECT [BusinessEntityID]
,[JobTitle]
,[BirthDate]
,[MaritalStatus]
,[Gender]
,[SalariedFlag]
FROM [AdventureWorks2012].[HumanResources].[Employee]
Where (MaritalStatus = ‘M’ and Gender = ‘M’) or JobTitle like ‘Tool Designer’


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more SQL content: SQL

Python: Generators

I apologize in advanced as this topic is going to get a little computer sciency. Unfortunately there is no way around it. At first look, generators are going to resemble all the other iterables we have covered so far. But unlike loops we have used so far, generators produce iterables “lazily”.

So what do I mean by “lazily”?

Let’s consider this infinite loop I created below. This will run forever, producing a List (at least until your computer’s memory runs out). And that is the issue. Eventually the computer will run out of memory.

pythonGen

**note if you actually decide to run the code above, you will need to force it to stop. Just closing the notebook won’t do it. Go to File>Close and Halt to stop this loop from running and eventually crashing your PC.

pythonGen1

This becomes are consideration when working with large data sets. The more memory you chew up, the worse you performance. So a work around is to use generators. Generators produce the data lazily, meaning they produce the iterator, yield it, and then forget it – they don’t put the values into a List like regular iterators do. They only yield one iterator at a time.

Notice my continued use of the world yield? There is a reason for that. Look at the code for a generator below:

Note that generators have to be functions. The yield command – which signifies a generator,  cannot be used outside of a function.

pythonGen2.jpg

Try your code in our online Python console:  

Now, I know this looks like I am actually repeating work here. I mean I am using two loops to do what I could do with one loop. This issue arises when we have large numbers. If you are only iterating a few hundred numbers, you probably don’t need to worry about generators. However, if you are going to be iterating 100,000 elements, you will see some major performance improvements by using the code above.


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Last lesson: list comprehension

Next Lesson: regular expressions

Back to Python Course: Course

Python: Tuples and Sets

Tuples

Tuples are a grouping of data in Python similar to a list, except tuples are immutable, meaning you cannot add or delete items from the tuple.

Tuples are created using  () instead of []. A tuple with a single value still requires a comma though. x = (1,)

pythontuple

Notice attempting to delete an element from the tuple results in an error.

Try your code in our online Python console:  

Set

Sets are another method of hold data collections, but they have some interesting factors other methods do not. Sets are very useful as they only return unique elements for the data they store.

pythontuple2.jpg

Looking at the example below, notice how the second set has 4 elements, but the output is only 3. This is due to the fact that sets only return unique values.

You create a set using set() like above, or you can just use {}

pythontuple1.jpg


Try your code in our online Python console:  

If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Last Lesson:  Working with dictionaries

Next Lesson: Conditional Logic

Back to Python Course: Course

Python: Working with Lists

Making a list in Python is simple: x = [1,2,3,4]

Now let us see how we can work with them.

Index

First thing to understand is that a list is indexed. Each item in the list is given a number that tells you its position in the list.

It is important to note that Python is a 0 index language. This means indexes begin at 0 not 1. So the first item in the list, is found by calling: x[0]

pythonlist

Note that x[4] returns an error. Since there are 4 items in the list, the indexes go 0,1,2,3. Index 4 is out of range.

Another interesting point to understand about indexes is that you can use a negative index. x[-2]  returns Duck

Try your code in our online Python console: 

Index Range

Use a “:” to return a range of items from a list: x[start:end]. If you leave out the start or end number, the index starts or ends at the start or end of the list

pythonlist1

Update Values in a List

If you want to change a value in a list, just assign it a new value like you would with a regular variable.

pythonlist2

Append()

If you want to add an item to then end of a list, you can use the Append() function

pythonlist3.jpg

Del

To delete a item from a list, use the Del command.

pythonlist4

Remove()

Remove works like Del, except instead of using index values, Remove() uses the values stored in the list.

pythonlist5.jpg

Try your code in our online Python console: 

Pop() and Insert()

Pop() simply returns the last item from a list

Insert() lets you add a item to a list, but it also lets you choose what position in the list to add it.

pythonlist6.jpg

Len()

Len() returns a count of the number of items in a list.

pythonlist7.jpg


Try your code in our online Python console: 

If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Last Lesson: Python Lists and Dictionaries

Next Lesson: Working with Dictionaries

Back to Python Course: Course

Python: Intro to Graphs

Visualizations are big part of analytics. You will need to produce visually engaging graphics for presentations, reports, and dashboards. You will also make graphs for your own use in data discovery and analysis. As bonus, unlike data cleaning, data viz can be pretty fun.

Matplotlab and Pyplot

Matplotlab is a module you can import into Python that will help you to build some basic graphs and charts. Pyplot is part of Matplotlab and the part we will be using in the following example.

**If you are using the Anaconda Python distribution, Matplotlab is already installed. If not, you may need to download it from another source.

Line Graph

Syntax

  • %matplotlib inline – this code allows you to view your graphs inside jupyter notebooks
  • from matplotlib import pyplot as plt – here we import pyplot from matplotlib into our program (note, we only want pyplot not all the functions in matplotlib).Adding “as plt” gives us a shorter alias to work with
  • age and height lines – fill our lists with age and height information for an individual
  • plt.plot(age, height, color = ‘blue’) – here we tell Python to plot age against height and to color our line blue
  • plt.show() – prints out our graph

pythonGraphs

Bar Chart

For this example, we will make a bar charting showing ages of 4 people.

Syntax

  • You should understand the first few lines from the first example
  • count_names = [i for i,_ in enumerate(name)]  – since the name list is a list of strings, we cannot really graph that onto a chart. We need a way to convert these strings into numbers.

Wait? What does for i,_ mean? Let’s jump to the next code sample

pythonGraphs1

While you don’t see it when making the list, a Python list is technically a list of tuples (index number, element). So if instead of i,_ we asked for both elements in the tuple, (i,j) we would get the following.

pythonGraphs2

So by iterating by for i,_ we only return the first element in the tuple (the index)

** notice we are using a list comprehension. If you are unfamiliar with list comprehensions, check out my earlier post: Python: List Comprehension

Let’s clean up our bar chart a little now.

  • plt.ylabel(‘Age’) – label the y-axis Age
  • plt.title(‘Age of People’) – give the graph a title
  • plt.xticks([i+0.5 for i,_ in enumerate(name)], name) – this label function is using a list comprehension to first chose the position on the X-axis, and name provides the person’s name for the label.

pythonGraphs3.jpg


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT

Follow this link for more Python content: Python

R: Intro to Statistics – Central Tendency

Central Tendency

One of the primary purposes of statistics is to find a way to summarize data. To do so, we often look for numbers known as collectively as measurements of central tendency (mean, median,  mode).

Look at the list below. This is a list of weekly gas expenditures for two vehicles. Can you tell me if one is better than the other, or are they both about the same?

rCentral

How about if I show you this?

rCentral1

Using the average, you can clearly see car2 is more cost efficient. At approx $21 a week, that is a savings of $630 over the course of the 30 weeks in the chart. That is a big difference, and one that is easy to see. We can see this using one of the 3 main measures of central tendency – the arithmetic mean – popularly called the average.

Mean

The mean – more accurately the arithmetic mean – is calculated by adding up the elements  in a list and dividing by the number of elements in the list.

rCentral2.jpg

In R, finding a mean is simple. Just put the values you want to average into a vector (**note to make a vector in R: var  <-c(x,y,z)) We then put the vector through the function mean()

rCentral3

Median

The median means, simply enough, the middle of an ordered list. This is also easy to find using R. As you can see, you do not even need to sort the list numerically first. Just feed the vector to the function median()

rCentral4.jpg

Mode

This is the last of 3 main measure. Mode returns the most common value found in a list. In the list 2,3,2,4,2 – the mode is 2. Unfortunately R does not have a built in mode function, so for this, we will have build our own function.

For those familiar with functions in programming, this shouldn’t be too foreign of a concept. However, if you don’t understand functions yet, don’t fret. We will get to them soon enough. For now, just read over the code and see if you can figure any of it out for yourself.

rCentral5

 

 

 

Python: List Comprehension

List comprehensions are a method for taking a list of elements and performing a transformation on each element.

In our first example, we want to take numbers 0-9, square them, and have the result end up in a list.

ln[5] shows how you would perform this task using for a loop to iterate.

ln[1] does the same thing, but it does it in one line.

 

pythonListcomp

Try your code in our online Python console:  

As you can see, the list comprehension basically crunches the for loop into one line. The syntax is simple enough:

S            =  [x**2 for x in range(10)]

Assign a variable = [operation for loop]

Find even numbers in a list:

Here we add an if statement inside the iteration. (x%2 is modulus, meaning it returns remainder. So 4%2 returns a remainder of 0 and 5%2 returns are remainder of 1)

pythonListcomp1.jpg

You can use list comprehensions with more complex formulas:

pythonListcomp3

You can even use functions from within a list comprehension

pythonListcomp4.jpg

Try your code in our online Python console:  


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT

Last Lesson: Zip and unpack

Next Lesson: Generators

Back to Python Course: Course