R: Manually entering data

You can use the edit() function to manually enter or edit data in an R data frame.

Start by creating a data frame.

Note that I am initializing each column with a zero-length vector of the type I want, such as character(0) or numeric(0). This tells R that while I want the name column to be character and the age column to be numeric, I am leaving the number of rows open. You can also create the columns with a preset length (for example, character(3)) if you so choose.

dfe <- data.frame(name = character(0), age = numeric(0), jobTitle = character(0))

Now, let’s use the edit() function to add some data to our data frame.

dfe <- edit(dfe)

When the new window pops up, fill in the data and simply click the X to close the window when you are done.

2018-05-27_16-21-46.png

You may get warning messages when you close your edit window. The messages I got simply informed me that name and jobTitle were set as factors by R. Remember that in R, warnings just want you to be aware of something; they are not errors.

Now, if you run dfe, you can see your data frame.

2018-05-27_16-23-32.png

By running the edit() function again, you can edit the values that currently exist in the data frame. In this example, I am going to change Philip’s age from 28 to 29.

If you want to add a column to the data frame, just add data to the next empty column.

2018-05-27_16-37-59.png

You can close the window now and rename the column in R, or click on the column header and rename it right there in the editor.

2018-05-27_16-38-24.png

Now we have a new column listing pets:

  name   age jobTitle pet
1 Ben    42  Data Sc  cat
2 Philip 29  Data Ana dog
3 Julia  36  Manager  frog

You can use the edit() function to manually edit existing data sets or data imported from other sources.

Below, I am editing the built-in ChickWeight data set.

2018-05-27_16-43-00.png

Python: Pandas, Working with DataFrames

Sure, DataFrames look nice, but how can I work with them?

Let’s cover some basic tasks in pandas to get you started.

Let’s start by building a DataFrame.

pyDF1
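Since the original code lives in a screenshot, here is a minimal sketch of building a comparable DataFrame. The names and numbers are made up for illustration; only the Age and Years Service column names come from this lesson.

```python
import pandas as pd

# Hypothetical employee data (values invented for this sketch)
data = {
    "Age": [42, 29, 36, 51],
    "Name": ["Ben", "Philip", "Julia", "Maria"],
    "Years Service": [10, 3, 8, 20],
}

# Each dictionary key becomes a column; each list becomes that column's values
df = pd.DataFrame(data)
print(df)
```

I put Age first on purpose here so the sketch has the same "Age is in the wrong place" problem described next.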

I don’t like where they placed Age on my dataframe. I want to move it.

To do so, we are going to cover a few new terms: axis, drop() and insert().

Axis

Using numpy and pandas, you will come across many functions that take an axis as a parameter. Axis 0 refers to your rows, while axis 1 refers to your columns. This follows the way matrix dimensions are named, rows first: a 3×2 matrix has 3 rows and 2 columns, and a 2×3 matrix has 2 rows and 3 columns.

pyDF2
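To make the axis idea concrete, here is a small sketch using a made-up 3×2 numpy array. Summing along axis 0 collapses the rows (one total per column), while summing along axis 1 collapses the columns (one total per row).

```python
import numpy as np

# A 3x2 array: 3 rows, 2 columns (values invented for this sketch)
arr = np.array([[1, 2],
                [3, 4],
                [5, 6]])

print(arr.shape)              # (3, 2): rows first, then columns

col_totals = arr.sum(axis=0)  # walk down the rows: one total per column
row_totals = arr.sum(axis=1)  # walk across the columns: one total per row

print(col_totals)             # [ 9 12]
print(row_totals)             # [ 3  7 11]
```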

drop()

To move the Age column, I am first going to create a copy of my dataframe minus the Age column. To do this, I am going to use the drop() function. In its simplest form, you call it as drop(labels, axis), and it returns a new dataframe rather than changing the original. In our case labels = ‘Age’ and axis = 1, since we are referring to a column.

pyDF3
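A minimal sketch of that step, using made-up employee data (the df and df2 variable names are my own choice for illustration):

```python
import pandas as pd

# Hypothetical data; only the column names follow the lesson
df = pd.DataFrame({
    "Age": [42, 29, 36],
    "Name": ["Ben", "Philip", "Julia"],
    "Years Service": [10, 3, 8],
})

# axis=1 tells drop() that "Age" is a column label, not a row label.
# drop() returns a new DataFrame; df itself still has the Age column.
df2 = df.drop("Age", axis=1)

print(df2.columns.tolist())  # ['Name', 'Years Service']
```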

insert()

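Now we want to insert the Age column. The syntax for the insert() function is insert(insert point, name, data), where the insert point is the zero-based column position. Note that insert() changes the dataframe in place instead of returning a new one.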

pyDF4
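Putting drop() and insert() together, a sketch of moving Age might look like this. The data is invented, and position 1 is just the spot I picked for the example.

```python
import pandas as pd

# Hypothetical data; only the column names follow the lesson
df = pd.DataFrame({
    "Age": [42, 29, 36],
    "Name": ["Ben", "Philip", "Julia"],
    "Years Service": [10, 3, 8],
})

# Copy of the dataframe without Age
df2 = df.drop("Age", axis=1)

# Put Age back at column position 1 (positions start at 0).
# insert() modifies df2 in place and returns None.
df2.insert(1, "Age", df["Age"])

print(df2.columns.tolist())  # ['Name', 'Age', 'Years Service']
```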

add a new column

Adding a new column is straightforward. Just use DataFrame[new column name] = value.

Below, I created a new column called ‘Age When Start’ that shows the age of employees when they started. I derived this value by subtracting the Years Service column from the Age column.

pyDF5.jpg
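Here is a sketch of that calculation with made-up numbers. The subtraction is applied row by row across the two columns.

```python
import pandas as pd

# Hypothetical data; only the column names follow the lesson
df = pd.DataFrame({
    "Name": ["Ben", "Philip", "Julia"],
    "Age": [42, 29, 36],
    "Years Service": [10, 3, 8],
})

# Assigning to a column name that does not exist yet creates the column
df["Age When Start"] = df["Age"] - df["Years Service"]

print(df["Age When Start"].tolist())  # [32, 26, 28]
```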

boolean column

You can create a boolean column by applying a comparison operator (such as > or ==) to an existing column.

pyDF6
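As a rough sketch, assuming made-up data and a hypothetical ‘Over 35’ column name of my own choosing:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Name": ["Ben", "Philip", "Julia"],
    "Age": [42, 29, 36],
})

# The comparison is evaluated for every row and yields True or False
df["Over 35"] = df["Age"] > 35

print(df["Over 35"].tolist())  # [True, False, True]
```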

sort_values()

You can sort a dataframe by any column using sort_values().

pyDF7.jpg

Sorting is ascending by default. To reverse it, set ascending = False.

Note: remember that in Python, True and False must start with a capital letter.

pyDF8.jpg
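Both sort directions can be sketched like this, again with invented data. sort_values() returns a new, sorted dataframe and leaves the original order untouched.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Name": ["Ben", "Philip", "Julia", "Maria"],
    "Age": [42, 29, 36, 51],
})

youngest_first = df.sort_values(by="Age")                 # ascending by default
oldest_first = df.sort_values(by="Age", ascending=False)  # reversed

print(youngest_first["Name"].tolist())  # ['Philip', 'Julia', 'Ben', 'Maria']
print(oldest_first["Name"].tolist())    # ['Maria', 'Ben', 'Julia', 'Philip']
```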

slicing

by rows

Slicing by rows works just like slicing a list: the start position is included and the stop position is not.

pyDF9
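A quick sketch of row slicing on made-up data:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Name": ["Ben", "Philip", "Julia", "Maria"],
    "Age": [42, 29, 36, 51],
})

# Rows at positions 1 and 2 (position 3 is excluded, as with a list)
middle = df[1:3]

print(middle["Name"].tolist())  # ['Philip', 'Julia']
```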

by columns

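Slicing by columns is a bit more complex. Older versions of pandas used the dataframe.ix command for this, but .ix was deprecated and then removed in pandas 1.0. To slice by column name, use dataframe.loc instead; to slice by column position, use dataframe.iloc.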

pyDF10
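The screenshot above uses the old .ix syntax, which no longer exists in current pandas. Here is a sketch of the same idea with .loc and .iloc, assuming made-up data. Keep in mind that a label slice with .loc includes both endpoints, unlike a list slice.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Name": ["Ben", "Philip", "Julia"],
    "Age": [42, 29, 36],
    "Years Service": [10, 3, 8],
})

# All rows, columns from "Name" through "Age" (both endpoints included)
by_label = df.loc[:, "Name":"Age"]

# All rows, columns at positions 1 and 2 (stop position excluded)
by_position = df.iloc[:, 1:3]

# A list of names picks specific columns in any order
picked = df[["Name", "Years Service"]]

print(by_label.columns.tolist())     # ['Name', 'Age']
print(by_position.columns.tolist())  # ['Age', 'Years Service']
```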



Python: Pandas Intro (Dataframes)

Before we continue on to DataFrames, I want to clear up something from the Series exercise. Note the line (from pandas import Series, DataFrame).

pandas1

Using that line, I can call upon Series or DataFrame directly in my code

pandas2

In the example below, I did not import Series and DataFrame by name, so when I tried x = Series() I got an error.

I had to use the fully qualified name pd.Series() for this to work.

pdDF
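The two import styles can be sketched side by side. This is my own minimal example, not the code from the screenshots; the values are made up.

```python
import pandas as pd
from pandas import Series, DataFrame

# With the from-import, the bare names work
s1 = Series([1, 2, 3])

# With only "import pandas as pd", you need the pd. prefix
s2 = pd.Series([1, 2, 3])

# Both names point at the same class, so the results are identical.
# Without the from-import line, calling Series(...) raises a NameError.
print(Series is pd.Series)  # True
print(s1.equals(s2))        # True
```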

DataFrame

DataFrames provide another level of data management for Python. Those of you who come from a more data-driven background will appreciate DataFrames.

Let’s start by creating a dictionary.

pdDF1.jpg

Now, pass the dictionary to DataFrame().

Note that you now have a table-like structure with named columns.

pdDF2
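A minimal sketch of the dictionary-to-DataFrame step, with made-up names and ages standing in for the data in the screenshots:

```python
import pandas as pd

# Hypothetical dictionary: keys are column names, lists are column values
people = {
    "Name": ["Ben", "Philip", "Julia"],
    "Age": [42, 29, 36],
}

df = pd.DataFrame(people)
print(df)

# Each row gets a default integer label starting at 0
print(df.index.tolist())  # [0, 1, 2]
```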

You can call up the index (row labels) or the column names using the attributes below:

pdDF3.jpg

DataFrame.info() prints a summary of your DataFrame, including each column's data type and non-null count.

pdDF4
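Here is a rough sketch of all three on a small made-up DataFrame. Note that index and columns are attributes (no parentheses), while info() is a method that prints its summary.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Name": ["Ben", "Philip", "Julia"],
    "Age": [42, 29, 36],
})

print(df.index)    # the row labels (a RangeIndex from 0 to 2 here)
print(df.columns)  # the column names

# Prints the column names, non-null counts, dtypes and memory usage
df.info()
```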

Head and Tail

Create a new DataFrame from a dictionary

pdDF5.jpg

If you just want to see the first few rows, you can use the head() method.

pdDF6

The tail() method shows the last few rows. With either method, you can choose how many rows you want by passing a number.

pdDF7
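A small sketch of head() and tail() on a made-up ten-row DataFrame. Both methods show 5 rows unless you pass a different number.

```python
import pandas as pd

# Hypothetical ten-row DataFrame for illustration
df = pd.DataFrame({
    "n": list(range(10)),
    "square": [i * i for i in range(10)],
})

first_rows = df.head()   # first 5 rows by default
last_rows = df.tail(3)   # last 3 rows

print(first_rows["n"].tolist())  # [0, 1, 2, 3, 4]
print(last_rows["n"].tolist())   # [7, 8, 9]
```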

Describe

The describe() method gives you some quick statistics (count, mean, standard deviation, min, max and quartiles) on the numeric columns.

pdDF8.jpg
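As a sketch with invented data, notice that the text column is left out of the result by default:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Name": ["Ben", "Philip", "Julia"],
    "Age": [40, 30, 35],
    "Years Service": [10, 2, 6],
})

stats = df.describe()
print(stats)

# Individual statistics can be looked up by row label and column name
print(stats.loc["mean", "Age"])  # 35.0
```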

Slice

You can slice a DataFrame just as you would a list

pdDF9

Choose What to Display

DataFrames allow you to filter which rows to display based on their values, and to choose which columns to display.

pdDF10
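One common way to do this is with a boolean condition inside the square brackets. The sketch below uses made-up data, and the age cutoff of 30 is just an example.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Name": ["Ben", "Philip", "Julia"],
    "Age": [42, 29, 36],
    "Years Service": [10, 3, 8],
})

# Keep only the rows where the condition is True
over_30 = df[df["Age"] > 30]

# Filter rows and pick columns in one step with .loc
names_over_30 = df.loc[df["Age"] > 30, ["Name", "Age"]]

print(over_30["Name"].tolist())        # ['Ben', 'Julia']
print(names_over_30.columns.tolist())  # ['Name', 'Age']
```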

There is a lot more you can do with Series and DataFrame in pandas, and we will be covering them in later lessons. For now though, I think you have a general idea.

