Python: Accessing a MySQL Database

Here is a code block to create a database, if you want to play along:

CREATE DATABASE sandbox;

USE sandbox;
CREATE TABLE employee_id (
        emp_nm varchar(30) NOT NULL,
        emp_id varchar(8),
        b_emp_id varchar(8),
        PRIMARY KEY (emp_nm) );

While loading data from Excel and CSV files is pretty straightforward in Python, getting data out of a SQL database is a skill you really should learn. SQL databases can store far more data than an Excel file, and they are used pretty much everywhere now.

This lesson will focus on MySQL; I have another lesson for SQL Server as well. The syntax does vary a little, and if you are working with Oracle or Teradata your syntax may change a bit too. Luckily the Internet has plenty of resources to point you in the right direction.

Now, for MySQL, let's get started. First you are going to want to install the mysql.connector module (if you have the Anaconda distribution that I recommend, it comes pre-installed).

If you don't have it installed, you can get it using pip or conda. I have a video showing you how to install a module here: video link (skip ahead to the 7 minute mark, where I explain how to install modules).

Once installed, import it into your notebook:
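The import is a single line, assuming the package installed cleanly:

import mysql.connector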

Now let's make a connection to our server. The syntax should be self-explanatory: localhost simply means MySQL is installed on my own computer; otherwise you would provide a network path to the server there.
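A minimal sketch of the connection; the user and password values here are placeholders you would swap for your own credentials:

import mysql.connector

conn = mysql.connector.connect(
        host='localhost',     # local install; use the server's network path otherwise
        user='root',          # placeholder credentials
        password='password')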

Let's start by looking around. First I want a list of all the databases in my instance:
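Something like the following, running SHOW DATABASES through a cursor (continuing from the connection above):

cursor = conn.cursor()
cursor.execute('SHOW DATABASES')
for db in cursor:
    print(db)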

Now in this example we will be working with the sandbox database. I provided code at the top of the lesson that you can paste and run in your MySQL instance to create it.

Now let's add a new element to our connection string: database.

And query the table names in the sandbox database:
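A sketch of both steps, reconnecting with the database argument and then listing the tables:

conn = mysql.connector.connect(
        host='localhost',
        user='root',          # placeholder credentials
        password='password',
        database='sandbox')   # the new element

cursor = conn.cursor()
cursor.execute('SHOW TABLES')
for tbl in cursor:
    print(tbl)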

Using the .execute() method, we can pass SQL commands to the MySQL instance

Below I am creating a table, inserting data into it, and committing the data:
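A sketch of the three steps; the inserted name and ID are made up, and IF NOT EXISTS guards against a clash if you already ran the SQL from the top of the lesson:

cursor.execute('''CREATE TABLE IF NOT EXISTS employee_id (
                      emp_nm varchar(30) not null,
                      emp_id varchar(8),
                      b_emp_id varchar(8),
                      PRIMARY KEY(emp_nm))''')

cursor.execute('INSERT INTO employee_id (emp_nm, emp_id) VALUES (%s, %s)',
               ('Bob', 'ti0987'))

conn.commit()   # make the insert permanent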

Without the commit() command, the data will not be saved into the table.

To add multiple rows to the table, I can use the executemany() method
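For example, passing a list of tuples (the rows here are made up):

sql = 'INSERT INTO employee_id (emp_nm, emp_id) VALUES (%s, %s)'
rows = [('Sarah', 'ab1234'),
        ('Ken', 'cd5678'),
        ('Amy', 'ef9012')]
cursor.executemany(sql, rows)
conn.commit()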

Now let's query our table. Note the .fetchall() method; this brings in all the rows from the query. I can then iterate through the list with a simple for loop:
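Something like this, reusing the same cursor:

cursor.execute('SELECT * FROM employee_id')
results = cursor.fetchall()   # brings in all the rows from the query
for row in results:
    print(row)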

I can also use .column_names on my cursor to return the column names of the last table I queried:
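column_names is a property of the mysql.connector cursor, so there are no parentheses:

print(cursor.column_names)   # e.g. ('emp_nm', 'emp_id', 'b_emp_id')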

Finally, we can use pandas to load this database table into a DataFrame we can work with:
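A minimal sketch with pandas.read_sql(); recent pandas versions may warn that they prefer a SQLAlchemy connectable, but the raw connection generally works:

import pandas as pd

df = pd.read_sql('SELECT * FROM employee_id', con=conn)
print(df)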

Python: Working with Rows in Pandas DataFrames

Here is a quick lesson on working with rows in DataFrames

Let's start by building a DataFrame:
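A small frame to practice on; the names and values are arbitrary:

import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Carol', 'Dan'],
                   'age': [34, 28, 45, 39]})
print(df)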

Now let's delete (or drop) a row:
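drop() takes the index label of the row to remove:

df.drop(0)   # returns a copy without row 0; use df = df.drop(0) to keep the change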


Let’s try dropping a couple of rows:
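Pass a list of index labels instead of a single one:

df.drop([1, 2])   # drops the rows labeled 1 and 2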

Using df.index[-1] allows us to drop the last row:
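df.index[-1] grabs the last index label, which drop() then removes:

df.drop(df.index[-1])   # drops the last row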

Side note:

Here is another piece of pandas you should know: the shape attribute (note it is an attribute, not a method). It returns (rows, columns):
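Since shape is an attribute, there are no parentheses:

print(df.shape)   # e.g. (4, 2) for the example frame above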

We can filter rows out using conditional logic
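A boolean mask keeps only the rows where the condition holds; the column and cutoff come from the example frame above:

df[df['age'] > 30]   # only rows where age is greater than 30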

The index attribute returns the row indexes:
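Again an attribute, not a method:

print(df.index)   # e.g. RangeIndex(start=0, stop=4, step=1)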

We can use the .index as well as conditional logic to add a new column to our dataframe
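One way to sketch this: filter the frame with a condition, then use .index to flag the matching rows:

over_30 = df[df['age'] > 30].index       # index labels of the matching rows
df['over_30'] = df.index.isin(over_30)   # True where the row's label was in that set
print(df)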

Using append(), we can add a dictionary as a new row to the DataFrame
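A sketch with a made-up row. Note that DataFrame.append() was removed in pandas 2.0, so the pd.concat() equivalent is shown as well:

new_row = {'name': 'Eve', 'age': 31}

# pandas < 2.0
# df = df.append(new_row, ignore_index=True)

# pandas >= 2.0
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)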

Finally, using .iloc we can iterate through the values of a row
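iloc selects a row by position, and iterating over it walks that row's values:

for value in df.iloc[0]:   # first row
    print(value)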

Last Lesson: Pandas Renaming Columns

Next Lesson: Python: Working with Rows in DataFrames

Return to: Python for Data Science Course

Follow this link for more Python content: Python

Data Jobs: What does a Data Analyst Do?

Data Analysts get a bad rap. With the advent of the Data Scientist, Data Analysts are often viewed as Data Scientists lite; however, I feel that is not an honest assessment. The truth is, there is a lot of overlap between the two fields. I will dive deeper into what a Data Scientist is in a future article, but know that in my opinion the definition of Data Scientist as a job is still a bit fuzzy, and I think the title may eventually be broken into a few different titles to better define the differences.

Data Analyst

So what does a Data Analyst do?

A lot, actually. You could put ten data analysts into a room and you would get ten different answers to this question. So the best I can do here is make sweeping generalities. As the old saying goes, “your results may vary.”

In general, data analysts perform statistical analysis, create reporting, run ad-hoc queries against data warehouses, create data visualizations, create and maintain dashboards, perform data mining, and create machine learning models (yes, ML is not only for data scientists). Their assignments are business driven. A data analyst is either embedded with a business unit (financial planning, fraud, risk management, cyber security, etc.) or working in a centralized reporting/analytics team. They use their skills to provide reporting and analytics for the business.

Tools used by Data Analysts

  • SQL – MySQL, SQL Server, Oracle, Teradata, Postgres – whether simply querying a data warehouse or creating and managing a local data mart, data analysts need to be advanced SQL programmers
  • Visualization tools – Tableau, Qlik, Power BI, Excel, analysts use these tools to create visualizations and dashboards
  • Python/R – Data analysts should be familiar with languages like Python or R to help manage data and perform statistical analysis or build machine learning models
  • Spreadsheets – Excel, Google Sheets, Smartsheet – used to create reports and pivot tables for analyzing the data
  • ETL tools – SSIS, Alteryx, Talend, Knime – these tools are designed to move data between databases, CSV files, and spreadsheets. Until the data is in a usable format, analysis cannot be performed.

Educational Requirements

Typically a data analyst position will ask for a bachelor's degree, preferably in computer science, statistics, database management, or even business. While the barrier to entry for a data analyst job is generally not as high as for a data scientist, that does not mean you cannot make a meaningful and well-paid career as a data analyst. Also, the demand for data professionals seems to keep going up and up, and it most likely will for the foreseeable future.

Free Courses

Database Development and Design

Files for Database Development and Design Labs

Introduction to Database Development and Design

Python for Data Science

  • Fundamentals (Available now)
  • Data Handling / Management (coming soon)
  • Statistics and Visualizations (coming soon)
  • Machine Learning (coming soon)

R for Data Science (coming soon)

Excel for Data Science (coming soon)

SQL for Data Science (coming soon)

Python Web Scraping / Automation: Connecting to Firefox with Selenium

Selenium is a Python package that allows you to control web browsers through Python. In this tutorial (and the following tutorials), we will be connecting to Mozilla's Firefox browser, though Selenium does work with other browsers as well.

First you will need to install Selenium. You can use one of the following commands, depending on your Python distribution:

c:\> pip install selenium

c:\> conda install selenium

If you are on a work computer or dealing with a restrictive VPN, the offline install option may help you: Selenium_Install_Offline

Next you need to download the driver that lets you manage Firefox through Python.

Start by determining what version of Firefox you have on your computer

Click the three horizontal lines in the upper right corner > Help > About Firefox

Search for geckodriver to download the file that matches your Firefox version. (Note: this is something you will need to do every time Firefox is updated, so get used to it.)

Open up the zip file you downloaded; inside you will find a file called geckodriver.exe

Put it somewhere you can find it, then use the following code to let Python know where to find it:

from selenium import webdriver
from selenium.webdriver.firefox.service import Service

# point Selenium at the geckodriver executable (Selenium 4 syntax)
opts = webdriver.FirefoxOptions()
dr = webdriver.Firefox(service=Service('C:/Users/larsobe/Desktop/geckodriver.exe'), options=opts)

Now to see if this works, use the following line (you can try another website if you choose):
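The original shows this step as a screenshot; any URL will do, for example:

dr.get('https://www.google.com')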

Note the message “Firefox is being controlled by automated test software.”

You are now running a web browser via Python.

How do I get a job in Data Science?

This has to be the most common question on data science I am asked, and honestly it is a hard one to answer. For everyone out there trying to get your foot in the door on your first data job, believe me, I feel for you. Multiple interviews without any offers, or not getting any interviews at all, can be beyond frustrating. Now unfortunately, I do not have any magic trick to get you into the data field, but I can share how I did it.

So, how did I get into the data science field…

Honestly, I “Made” my first job. My first career out of the Army was as a biomedical equipment technician. I fixed medical equipment like patient monitors, ultrasounds, and x-ray machines.

We had a ticketing system called MediMizer where all the repairs and routine maintenance jobs were recorded. My bosses would run monthly reports out of the system. I read some of the reports and just felt like we could do better.

I started with just Excel. I downloaded some data, created some pivot charts, and made some basic visualizations. I asked new questions of the data. I looked at angles that weren't covered in the existing reporting.

I showed these to my bosses, my co-workers, other department managers, basically anyone who would listen to me. Then I learned about Tableau, and using its free version I was able to create some more professional looking visualizations.

I learned how to make a dashboard, I started analyzing data sets from other departments, and I began feeding them my reports. I went back to school to get a degree and used what I was learning in school to improve my reporting skills.

While my job title didn’t change, I was now able to put data analysis skills on my resume. I was lucky enough to have very supportive management who saw the value in what I was doing, and allowed me to dedicate some of my time to it.

But most importantly, I was now a data professional (even if not in title). I was using data to solve real world problems. I put together a portfolio of some of the reporting I was doing. This allowed me to show my future employer that not only was I able to create reporting, but more importantly I was able to identify real world business problems and use data to help solve them.

The takeaway is: don't let your job title hold you back. Look around. What kind of problems do you see? Can you find a data-driven solution to help fix them? If you do this, you are now a data professional (even if not in title). A portfolio made from real-world examples can be more impressive than generic tutorial or Kaggle projects.

Remember, when trying to break into a new field, sometimes you need to make your own luck.

Python: Working with Dictionaries

A dictionary is an advanced data structure in Python. It holds data as key-value pairs.

Basic Syntax for a dictionary:

dictionary = { <key> : <value>,
               <key> : <value>,
               <key> : <value> }

I think the key/value relationship can be explained thinking about a car. A car has a few key values that describe it. Think: make, model, color, year

If I want to make a dictionary to describe a car, I could build it like this:

car = { 'Make': 'Ford',
        'Model': 'Mustang',
        'Color': 'Candy Apple Red',
        'Year': 1965 }

You can also use the dict() function to create a dictionary

car = dict([ ('Make', 'Ford'), 
             ('Model', 'Mustang'),
             ('Color', 'Candy Apple Red'),
             ('Year', 1965)])

Now, how do we access the values in a dictionary? I know in lists you can just refer to the index, but in dictionaries it doesn't work that way.


Instead, we refer to dictionaries by their key:
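For example, with the car dictionary from above:

print(car['Make'])       # 'Ford'
print(car.get('Year'))   # 1965; .get() returns None instead of raising an error for a missing key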

You can add a key/value pair:
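Assigning to a key that does not exist yet creates it (the 'Engine' key here is just an example):

car['Engine'] = 'V8'
print(car)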

You can change a dictionary value:
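Assigning to an existing key overwrites its value (the new color is arbitrary):

car['Color'] = 'Midnight Blue'
print(car)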

To remove values from a dictionary, use the del statement:
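For example:

del car['Color']   # removes the key and its value
print(car)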


Previous Lesson: Working with lists

Next Lesson: Tuples and Sets

Back to Python Course: Course

Python: Webscraping using Requests and BeautifulSoup to identify website content

When learning any new skill, it is always helpful to see a practical application. I currently work as a data scientist in a cyber-security department. One of our concerns is fake websites set up to look like real websites for my company. We search recently registered domain names for names that might mimic our company brand or name. We need to know if these are possibly malicious sites looking to steal our employee or customer information.

The problem is, we get hundreds of new domains registered a day that fit our search criteria. Instead of looking at each one manually, I created a Python script that scans the list of new domains for keywords. To do this, all you need is Requests and BeautifulSoup.

In this example, I am going to look at a couple of websites. I want to know if they discuss Python, podcasts, or neither.

Let us start by looking at my site: Analytics4All.org

I imported requests and BeautifulSoup, then ran requests.get() on my website to pull the HTML code.

import requests as re   # note: this alias shadows Python's built-in re (regex) module
from bs4 import BeautifulSoup

r = re.get("https://analytics4all.org")
print(r.text)

If you are not sure how to do this, check out my earlier tutorials on this topic here:

Intro to Requests

Requests and BeautifulSoup

Now let us search the HTML for our keywords

Using the .find() method, Python returns the first location of the keyword in the HTML. Notice 'podcast' returned -1; that means podcasts are not mentioned on my site. But 'Python' is found at position 35595. So I can label this site as mentioning Python, but not podcasts.

r.text.find('Python')    # returns 35595: the keyword is present
r.text.find('podcast')   # returns -1: not found

Let’s try another site, and use BeautifulSoup to search the code

In this example, we will look at iHeartRadio’s website: iheart.com

Using the same .find() as last time, we see that podcasts are mentioned on this website.

r1 = re.get("https://www.iheart.com/")
soup1 = BeautifulSoup(r1.content, 'html.parser')
print(soup1.prettify())  

r1.text.find('Podcasts')

Using BeautifulSoup we can limit our search to a more targeted element of the website, which is useful in reducing false positives, as sometimes you will find some weird stuff buried in the HTML that is honestly irrelevant to the website.

Below we pull just the title from the website and look for 'Podcasts'. And we find it.

a = soup1.title.string
print(a)
a = str(a)   # convert the NavigableString to a plain string
print(a.find('Podcasts'))

Finally, let us inspect the login page for Macy’s

Notice we are getting Access Denied when searching this site. Unfortunately Requests doesn't work in all cases. I will explain how to get around this problem in my lessons on Selenium.

r1 = re.get("https://www.macys.com/account/signin")
soup1 = BeautifulSoup(r1.content, 'html.parser')
print(soup1.prettify())    

But for now, just know that Requests does work for most websites, and it makes for a simple way to automate scanning websites for keywords that can be used to categorize a site, and in my case, find bad operators and shut them down.

Python: Webscraping using BeautifulSoup and Requests

I covered an introduction to webscraping with Requests in an earlier post. You can check it out here: Requests

As a quick refresher, the Requests module allows you to call on a website through Python and retrieve the HTML behind it. In this lesson we are going to add to this functionality with the module BeautifulSoup.

BeautifulSoup

BeautifulSoup provides a useful HTML parser that makes it much easier to work with the HTML results from Requests. Let's start by importing the libraries we will need for this lesson:
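Just two imports:

import requests
from bs4 import BeautifulSoup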

Next we are going to use requests to call on my website.
We will then pass the HTML code to BeautifulSoup

The syntax is BeautifulSoup(HTML, ‘html.parser’)

The HTML I am sending to BeautifulSoup comes from my requests.get() call. In the last lesson I used r.text to print out the HTML; here I am passing r.content to BeautifulSoup and printing out the results.

Note I am also using the soup.prettify() command to ensure my printout is easier for humans to read:
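Putting those steps together, using my site as in the earlier lesson:

r = requests.get('https://analytics4all.org')
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.prettify())   # indented, human-readable HTML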

BeautifulSoup makes parsing the HTML code easier. Below I am asking to see soup.title; this returns the HTML code with the "title" markup.

To take it a step further, we can use soup.title.string to get just the string without the markup tags:
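Both steps side by side:

print(soup.title)          # the full tag, e.g. <title>...</title>
print(soup.title.string)   # just the text inside the tag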

soup.get_text() returns all the text in the HTML code without the markups or other code
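For example:

print(soup.get_text())   # all visible text, markup stripped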

In HTML, the 'a' tag and its 'href' attribute signify a link

We can use that to build a for loop that reads all the links on the webpage.
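A standard pattern: find_all('a') returns every anchor tag, and .get('href') pulls out each link's address:

for link in soup.find_all('a'):
    print(link.get('href'))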