Excel Data Cleaning: Split name column into First and Last at comma

So my wife works at a high school and had an issue trying to upload an Excel file to a database. She asked me if I could look at the file and fix any errors.

**Note the names below have been completely made up, these are not real students. I have not and would not share any private information on my site

You can download the file below to play along:

Okay, so the database she is trying to upload the file too has request the following columns:

ID > Last_Name > First_Name > Grade > Email

**For my non US readers, in the United States we commonly refer to the year you are in school as the grade, so Grade 12 is the final year for high school students.

First thing first, lets split our Student column into first and last name

  1. Let’s make some space, right click grade column and click insert to add a new column

Now, click on student column, select Data from the menu bar, and Text to Columns from the Ribbon bar

Note all the names are different lengths and some students have two last names, while others have a first and middle name. So our best bet is to split this by the delimiter (in this case the comma)

Select comma from the choices, you will see your data preview window what the results will look like

I don’t need to convert data types, so I just ignore this window > Hit Finish

Now all you have to do is change the column names

Next, lets check for bad email addresses. One good thing about email addresses in a single organization is they are often all the same after the @ sign, so we can use the RIGHT() function to count backwards

A quick count of characters (make sure to include the period and @) tells me that starting from the end of the email address, there are 11 characters up to and including the @

So my formula would be =right(E2,11)

With this nice short file, it is easy to see that the forth student has a mistake in his email address.

In the real world, where you have 100’s or even 1000’s of students in a school, it might be easier to use an IF statement to more easily tease out bad emails. Using the code below, we have nested our Right() function inside an IF statement. And we will return 0 for all good emails, 1 for a bad one

=IF(Right(E2,11)=”@school.edu”,0,1)

Even with long lists, the 1 usually jumps off the page against the rows and rows of 0s

Finally, files can some times have spaces before or after data in the cells, and this can mess up an upload to a database sometimes.

We can simply use the Trim function to get rid of extra whitespace.

Run Trim in a new column > Then right click and copy the whole column

Go back to your original column > Paste > Paste Special…>Paste Values (the blue arrow). That will paste just the data trimmed of whitespace, not the formula.

Once you are done, delete the added columns you were working with for the IF() and Trim() statements, make sure your columns are properly named, and you are good to go.

Leave a Reply