Excel: Intro to Charts

Excel makes building basic charts easy.

Download sample data set here: ExcelCharts1

The Data

Our sample data set here is simple enough. We have 5 sensors (A,B,C,D,E) and one reading from each sensor.

excelChart

Column Chart

Let’s start with a column chart. To create a column chart, first select (highlight) your data set and then go to Insert>Column Chart in the Ribbon Bar.

excelChart1

From the column chart options, I am going to pick the first 2D chart option

excelChart2

excelChart3

Once a chart is created, you will notice a new element – Chart Tools – in the Ribbon Bar.

The first thing I want you to explore is Chart Styles. Click on the different options and see what happens.

excelChart4

Next, let’s try changing the colors:

excelChart5.jpg

Now go to Add Chart Element

Go to Axis Titles>Primary Horizontal

excelChart6

Now click on the Axis Title and rename it.

excelChart7

Label this Axis “Sensors”

excelChart8

Now select Primary Vertical and name it “Readings”

excelChart9

Now to change the chart title. Click on the title:

excelChart10

Rename it “Sensor Readings”

excelChart11


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Excel content: Excel

Advertisements

Python: Pandas Intro (Series)

Pandas adds some great data management functionality to Python. Pandas allows you to create series and dataframes.

Series

Think of a series as combination of a list and a dictionary.

To use pandas, first we need to import it.

pandas1

To create as series with pandas, use the following syntax

  • variable = Series([item1, item2, … item_n])

** note Series is capitalized. This method is case sensitive

pandas Series can contain numbers or strings. You call elements in a Series like you do a list. variable[index]

pandas2

You can create you own index in a Series as well, making it function a lot like a dictionary.

Notice with the syntax leg[leg==4] I am able to filter by items equal to 4

pandas3

If you have an existing dictionary, you can convert it into a pandas Series

pandas4

You can call the index and values of a Series using the following methods:

pandas5


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Python content: Python

 

 

 

Excel: Intro to Pivot Tables

Pivot Tables are a built in feature to Excel that allows for better data analysis. Using Pivot Tables, you can work with aggregates with ease.

Download this Excel file to follow along: studGrades

Let’s look at the data:

This data set represents Grades for 5 students across four quarters.

pivot1

Trying to calculate overall averages for each student or each class would require a lot of work using just straight spreadsheet functions. Pivot Tables make such tasks a breeze though.

Create a Pivot Table

Highlight you data set and go up to the Ribbon Bar > Insert>Pivot Table

pivot2

We are just going to hit OK here. But if you want to know a bit more about the window:

  • Choose the data that you want to analyze: notice the table range is A1 through F20 on the Courses tab. This is our data set.  – Use an external data source lets you grab your data from another program if you want
  • Choose where you want the PivotTable report to be place. – Default New Worksheet opens the table in a new sheet (our option) – Existing Worksheet allows you to place the table on an existing page
  • Choose whether you want to analyze multiple tables – this is more advanced. This lets you take multiple tables and join them together for analysis.

pivot3

After you click okay, Excel opens up your new Pivot Table

pivot4

Average Student Grade for Q1

Drag Stud_ID down to the ROWS box and Q1 down to Values

pivot5

Now we don’t want a Sum, we want an Average. So click on the down arrow by Sum of Q1 in VALUES. Select Value Field Settings…

pivot6

Now select Average and click OK

pivot7

Now you have averages in your table

pivot8

Average grades in classes

To find averages for classes, drag Stud_ID out of ROWS and drag Class into ROWS

pivot9

Now you have average grades for classes.

pivot10

The other 3 quarters

Just drag Q2, Q3, and Q4 into VALUES. Note they will most likely default as SUM, so you will need to change them to Average same way you did above.

pivot11

pivot12


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Excel content: Excel

Excel: Data Cleaning

Data cleaning is one of the least enjoyable tasks for data professionals. Unfortunately, most data scientists I have spoken to state that data cleaning makes of 60% to 70% of their work load.

While there is no magic data cleaning machine I am aware of, there are some tips and tricks that you can learn to save you a lot of time and head aches. I have put together a 2 part video series using Excel that shows some of the most common data cleaning tasks:

  • Handling Duplicates
  • Spelling Errors
  • Missing Data
  • Converting Data
  • Merging Data Sets

I even have the Excel worksheet available for download so you can work along with the videos.

Download the file here: Data Cleaning

Videos

Below, you will find the links to my videos (video opens in new tab)

Excel: Data Cleaning with Excel Part 1 

Excel: Data Cleaning with Excel Part 2


If you enjoyed this lesson, click LIKE below, or better yet, leave me a COMMENT. 

Follow this link for more Excel content: Excel

Python: Numpy

First off, CONGRATS for making it this far. Numpy really signifies the first step in real data science with Python.

Numpy is a library that adds advanced mathematical capabilities to Python. If you are using the Anaconda release of iPython, you already have numpy installed. If not, you will need to go over to their site and download it. Link to numpy site: www.numpy.org

Numpy adds multi-dimensional array capabilities to Python. It also provides mathematical tools for dealing with these arrays.

Array

What is an array? In different programming languages, arrays are equivalent to lists in Python. In fact, a single dimension array in Python can be used interchangeably with lists in most cases.

What makes arrays special in Python is the ability to make multidimensional arrays. Those who have taken math courses like Linear Algebra will better know multidimensional arrays by the term matrix. And if you paid attention in Linear Algebra, you will know there is a lot of cool things you can do with matrices.

For those who didn’t take Linear Algebra (you probably went out and made friends and had a social life), don’t worry, I will fill you in on what you need to know.

Numpy

Let’s start by creating a single dimension array

  • import numpy as np   — imports numpy into your notebook, we set the alias to np
  • x = np.array([8,7,4])  — array is a method of np, so the syntax is np.array. **note the [] inside the ()
  • x[1] = you call array just like lists
  • x.shape = shows you shape of your array, since we are single dimension, the answer is a single number – **note 3L – the L indicates integer in Python

pythonnumpy.jpg

Now let’s make a 2 x 3 matrix. When describing a matrix (or multi-dim array) the first number indicates the number of elements across and the second number indicates down. This is standard mathematical notation.

You can see I create a new row by enclosing the number sets in [] separated by a ‘,’

Now when want to call an item from the multi-dim array, you need to use 2 index numbers. y[0,1] = y[row, column]

pythonnumpy2

np.arange()

np.arange() is a command we can use to auto populate an array.

  1. np.arang(5) –create a one – dim array with 5 elements
  2. np.arange(10).reshape(2,5) – reshape lets you convert a one-dim array into a multi-dim array

pythonnumpy4

Transpose

Transposing a matrix means flipping it on its axis. This comes in handy in applications like multi-variate regressions. To transpose our array in Python, we use the “.T” method

pythonnumpy5.jpg

dot Product

Performing a dot product on a matrix is a useful calculation, but it is very time consuming. Numpy makes it easy though.

pythonnumpy6.jpg

If you need a brush up on dot products, this is a great link: matrix multiplication


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Python content: Python

 

 

Excel: Convert Numbers from Text

Below is the link to an actual file I was emailed today:

Excel File: Scores Won’t Sum

This is a list of scores of test participants. Obviously all identifying information has been stripped from the file. I just want you to see that this is a real world example, not some silly made up sample.

This file was sent to me asking if I could return a pivot table showing the averages of all the grades. Note that not only are there multiple columns, but notice the ID field. One individual can represent multiple rows depending on how many courses they are taking.

grades

Now figuring this out by hand (especially considering the original file had over 500 rows) would be a pain in the rear, but it should be no issue for a Pivot Table.

But look what happens when I try to average S1 against ID in a Pivot Table

grades1.jpg

So I go back to the main sheet and try performing a SUM function on the first Row. This results in 0 as an answer.

grades2.jpg

I try my hidden value trick: Numbers don’t add up in Excel

But still nothing. Then I remember about Paste Special

To fix the problem, go find an empty cell and type 1. Make sure to take note of the placement of your 1 in the cell. It should hug the right cell wall. If it justifies left, Excel is seeing it as text – try another cell.

Also check that your cell is marked as General in the drop down menu on the Number portion of the Ribbon bar.

grades3

Copy the cell with 1 in it. Highlight the columns you are having trouble with – right click and select Paste Special

grades4.jpg

Click Multiply and Okay – this multiplies all the highlighted cells by 1

grades5.jpg

Notice my Sum function now works

grades6

You will notice that empty fields are now populated with 0. This isn’t an issue with summation, but it will give you bad readings on an average. I chose to get rid of those using an If statement. I inserted a column and ran the following statement.

=if(C2=0,””,C2)

(if you need a primer on if statements – follow this link: Excel: =If() – Conditional Formulas )

This states that if C2 = 0 replace with an empty space “”, else just populate C2

After that is done, I copy my new column, and Paste Special – Values  over the old column.

Now I can repeat for each column. When done, delete your inserted column.

grades7.jpg

Now my Pivot Table looks much better.

grades8

 

 

Python: **Kwargs and *Args

Outside of beings incredibly fun to say, Kwargs and Args can be pretty useful.

Here is how they work. Let us say you wanted to make a function that adds 4 numbers:

Notice what happens when we only provide 3 arguments, we error out. The same thing happens if we try to give the function 5 arguments.

pythonkwa

Well, if the world perfectly predictable, this wouldn’t be an issue. But in the real world, making a function that adds 4 and only 4 numbers isn’t very useful. A function that can add any given set of numbers however is very useful.

*args

*args to the rescue

Using the keyword *args inside the function parenthesis gives you the flexibility of having as many, or as few arguments as you want.

pythonkwa1.jpg

**kwargs

Kwargs are basically args with keywords. Think dictionaries.

pythonkwa2

You can combine regular arguments with *args and **kwargs

pythonkwa3

You can call functions with *args and **kwargs

oh, in case I forgot to mention earlier arg and kwarg are not required as keywords. As you can see in the examples below, it is the * or ** you need.

pythonkwa4.jpg


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Python content: Python

 

Python: Regular Expressions

Regular Expressions are used to parse text. In the world of Big Data, being able to work with unstructured text is a big advantage.

To use regular expressions, first you must import the module. This is done by placing the command import re at the top of your code.

re.search()

Now, let us examine this code below:

We want to see if dog (assigned to x) is in the sentence ‘I just a saw dog. He was chasing a cat.'(assigned to y)

Using the search() method from re, we ask if re.search(x,y). Note you place the item you are searching by first in the parenthesis. re.search() returns a boolean value (True, False).

pythonreg

You can use re.search with lists of search items as well.

Here z is taking one item from the list x at a time and running it through re.search. Notice ‘one’ returns True, while ‘two’ returns false.

pythonreg1.jpg

re.findall()

re.findall returns all instances of your search term. Notice it found water whether it was a stand alone word, or part of a larger word.

pythonreg2.jpg

re.split()

The re.split() method does pretty much what you would think it does. You can pick a delimiter and the method will split your string at that delimiter.

In the example below, ‘;‘ is my delimiter. Notice how it split my string in two, plus removed the delimiter for me.

pythonreg3

use re.search() to find position

You can use re.search() to find the starting and ending position of a search item in a string

pythonreg4

exclusion

If you want to exclude characters, use the ^ between square brackets [].

This example excludes the letter s = [^s] and puts the remaining characters in a list

In the second example, I add + after the []. This keeps all the characters together.

pythonreg5.jpg

This next example is a useful tool you will find yourself using in text mining. Here we use [^?!. ]+ to remove punctuation.

pythonreg6


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Python content: Python

 

 

Python: Enumerate() and Sort

Enumerate

Enumerate is a useful function in Python as it returns both the indexes and values of Lists.

Let’s look at the code:

Running the list through a For loop, we assign the index to x and the value to y

pythonEnum

Let’s say we only wanted to print every other value in a list

To do that, I could run a simple nested an IF statement inside my loop stating if the index an even number, then print the value. (x%2==0 is an easy trick for finding even or odd values. % means modulus – it returns the remainder of a division problem. Any even number divided by 2 will not have a remainder – so the modulus will be 0)

pythonEnum1

Sort

Python lists come with a two built in sort functions.

sorted()

The first: sorted() – sorts the items in a list, but does not alter the list.

As you can see, sorted(lst) returns a sorted list, but calling lst again shows the list is still in its original order.

pythonEnum2

To sort the items in descending order use the reverse=True command:

pythonEnum3

list.sort()

list.sort() is a method – meaning it is a function found with the class List.

list.sort() sorts items in a list and alters position of the items in the actual list. So the next time you run the list, the items are sorted in the list.

Also note that reverse = True works in the list.sort() method as well.

pythonEnum4


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Python content: Python