Python: Accessing a MySQL Database

Here is a code block to create a database if you want to play along

create database sandbox;

use sandbox;
CREATE TABLE employee_id (
        emp_nm varchar(30) not null,
        emp_id varchar(8),
        b_emp_id varchar(8),
        PRIMARY KEY(emp_nm) );   

While loading data from Excel files and CVS files is pretty straightforward in Python, getting database from a SQL database is really a skill you should learn. SQL databases can store much more data than an Excel file and are used pretty much everywhere now.

This lesson will focus on MySQL, I have another lesson for SQL Server as well. The syntax does vary a little and if you are working with Oracle or Teradata, your syntax may change a bit as well. Luckily the Internet has plenty of resources to point you in the right direction.

Now, for MySQL, lets get started. First you are going to want to install mysql.connector (if you have the Anaconda distribution that I recommend, it comes with this pre-installed.

If you don’t have it installed, you can get it using pip or conda – I have a video showing you how to install a module here: video link — skip ahead to the 7 minute mark where I explain how to install modules

Once install, import it into your notebook

Now lets make a connection to our server: syntax should be self explanatory – localhost simply means its installed on my computer — otherwise you would provide a network path to the server there.

Let’s start by looking around- First I want a name off all the databases in my instance

Now in this example we will be working with the Sandbox database — I provided code at the top of the lesson you can paste and run in your MySQL instance to create a Sandbox database

Now lets add a new element to our connection string – database

And query the table names in the Sandbox database

Using the .execute() method, we can pass SQL commands to the MySQL instance

Below I am creating a table -> passing data to the table -> and committing the data

Without the commit() command, the data will not be saved into the table.

To add multiple rows to the table, I can use the executemany() method

Now let’s query our table. Note the .fetchall() method — this brings in all the rows from the query. I can iterate through list by running a simple for loop

I can also use the command .column_names after my cursor to return the column names of the last table I queried

Finally, we can use Pandas to put this database table into a dataframe we can work with

SQL: Check to see if table exists MySQL

First thing first, if you would like to play along with this tutorial, here is a code block to create a database and table:

create database sandbox;

use sandbox;
CREATE TABLE employee_id (
        emp_nm varchar(30) not null,
        emp_id varchar(8),
        b_emp_id varchar(8),
        PRIMARY KEY(emp_nm) );             

To check if a table exists in MySql, you can use the INFORMATION_SCHEMA.TABLES table.

Running the following code, produces the results below:

Select *
from INFORMATION_SCHEMA.TABLES
where table_schema = 'Sandbox';
You can use this table with an IF THEN clause do determine how your query responds whether or not a table exists.
SELECT IF( EXISTS(
             SELECT *
             FROM INFORMATION_SCHEMA.TABLES
           WHERE TABLE_NAME = 'employee_id'), 1, 0);

And if I try it with a table name that does not exist:

One of the more common uses I find for this when I need to create a table in a script. I want to make sure a table with same name doesn’t already exist, or my query will fail. So I write a query like the one below.

drop table if exists employee_id;

 CREATE TABLE employee_id (
        emp_nm varchar(30) not null,
             emp_id varchar(8),
             b_emp_id varchar(8),
             PRIMARY KEY(emp_nm) );

INSERT INTO employee_id
       (emp_nm, emp_id)
VALUES
       ('Bob', 'A1234567'),
       ('Lisa', 'A1234568'),
       ('Priyanka', 'B1234567');

Data Jobs: What does a Data Analyst Do?

Data Analysts get a bad wrap. With the advent of the Data Scientist, Data Analysts are often viewed as Data Scientists lite, however I feel that is not the honest case. Truth is, there is a lot of overlap between the two fields. I will dive deeper into what a Data Scientist is in a future article, but just know my opinion is the definition of Data Scientist as a job is still a bit fuzzy and I think the job title may eventually be broken into a few different titles to better define the differences.

Data Analyst

So what does a Data Analyst do?

A lot actually. You could put 10 data analysts into a room and you would get ten different answers to this question. So the best I can do here is make sweeping generalities. As the old saying goes “Your results may vary”

In general, data analysts perform statistical analysis, create reporting, run ad-hoc queries from data warehouses, create data visualizations, create and maintain dashboards, perform data mining, and create machine learning models (yes, ML is not only for data scientists). Their assignments are business driven. A data analysts is either embedded with a business unit (financial planning, fraud, risk management, cyber security, etc.) or working in a centralized reporting/analytics team. They use their skills to provide reporting and analytics for the business.

Tools used by Data Analysts

  • SQL – MySql, SQL Server, Oracle, Teradata, Postgres – whether simply querying a data warehouse or creating and managing a local data mart, data analysts need to be advanced SQL programmers
  • Visualization tools – Tableau, Qlik, Power BI, Excel, analysts use these tools to create visualizations and dashboards
  • Python/R – Data analysts should be familiar with languages like Python or R to help manage data and perform statistical analysis or build machine learning models
  • Spreadsheets – Excel, Google Sheets, Smart Sheets are used to create reports, and pivot tables used to analyze the data
  • ETL tools – SSIS, Alteryx, Talend, Knime, these tools are design to move data to and from databases, CSV files, and spreadsheets. Until the data is in a usable format, analysis cannot be performed.

Educational Requirements

Typically a data analyst position will ask for a bachelors degrees, preferably in computer science, statistics, database management or even business. While the barrier to entry for a data analyst job is generally not as high as a data scientist, that does not mean you cannot make a meaningful and well paid career as a data analyst. Also, the demand for data professionals seems to keep going up and up and it most likely will for the foreseeable future.

SQL: Rollback and Commit – undo mistakes in SQL

If you ever want to experience a heart attack, may I advise accidently deleting a production table from a database. And even if you survive the heart attack, your job may not.

This is, IMHO, this single most important piece of SQL code you will ever learn. It is called Rollback and Commit.

What Rollback does, is reverse and changes you have made to the database, allowing you to undelete the table you accidentally sent to the data afterworld.

Here is how it works. You start by letting the system know to you are making changes that you may want to reverse. The syntax is the as follows

Start transaction;

delete from <table> where <column> = <Value>;

Now comes the life saving part. If you realize you made a mistake, simply type Rollback;

If, on the other hand, you like the results, then type Commit;

The results will be saved and the transaction instance will be closed out.

Now lets try adding a new row, using start transaction
If that is not what you want, just type Rollback; and your mistake is gone
On the other hand, if you want to save the results, simply type commit; and the new record will stay

Important NOTE: Rollback only works if you first Start Transaction: — it is two words that will save your job. Start every instance of adding or deleting data or object with Start Transaction: – trust me on this one.

SQL: What is SQL and its 5 Subgroups DQL, DML, TCL, DDL, DCL?

SQL – short for Sequel Query Language is a programming language designed to work with data stored in RDBMS (relational database management systems). The data managed by SQL is tabular, meaning it formatted in rows and columns, very much like an Excel spreadsheet.

SQL is broken down into 5 sets of command groups:

DQL = Data Query Language

Querying (SELECT, FROM, etc)

DML = Data Manipulation Language

Data manipulation (INSERT, DELETE, etc)

TCL = Transaction Control Language

Transaction mgt. (COMMIT, ROLLBACK, etc)

DDL = Data Definition Language

Data definition (CREATE, DROP, etc)

DCL = Data Control Language

Data control (GRANT, REVOKE, etc)

SQL: Drop, Delete, Truncate commands

When it comes to deleting data from a SQL server, you have 3 main options: Drop, Delete, and Truncate

Drop

The Drop command completely deletes a table from the database. Let’s take the table Employee

When I run the following code: Drop Table Employee;

You can see the entire table has been dropped from the database, data and all. There is no way to recover data from a dropped table, short of having a back up

Delete

Delete removes data from the table, not the table its. Also, Delete can be used in conjunction with a Where clause to choose exactly which data to delete

You see that only the row with ‘Chris’ was deleted

Without the Where clause, all data is deleted, however the table remains

Truncate

Truncate acts just like Delete but you can’t add a Where clause, it simply deletes all data from a table

SQL: Load (Insert) data into a table

Here are the steps to add data to existing table:

Let’s use the employee table created on the create table page: Create_table

To insert some data into this table, first you need to know what the data types of each of the columns in the table are. In this example I am using my MySQL. An easy way to see column data types is to Right Click on Table name> Send to SQL Editor > Create Statement

The results will appear in the query window. In this table we can see the columns are a integer (INT), string (varchar), DOB (date)

The syntax for inserting data into the table is as follows

insert into <table name> (column1, column2,...)
values (value1, value2,...)

In the example below, we are loading 3 rows in, separating each row by a comma:

If we run this, we can then check on the results with a Select statement

SQL: Create a Table

Create a table in MySql and Sql Server

Create table <table name> (
           <column 1> data type,
           <column 2> data type )

As an example, let’s create an employee table with 3 columns, ID, Name, and Date of Birth

create table Employee(
   Id int,
  Emp_NM varchar(255),
  DOB date);

** remember that unless you set your database to be case sensitive, most SQL databases are not

You may need to refresh your database explore view, but after running this command, you should now have a new table

If you need a reference table for the types of data types available, check out this page: data types

SQL: What is DDL and DML?

You might have heard of DDL and DML and been confused. Are they part of SQL or are they their own language? Actually Yes and No…

If you look at it as a purist computer programmer or an academic, then you will probably going to say they are all different languages, you would not be wrong. However for someone like me, I view them as a subset of commands used in the SQL language.

DDL: Data Definition Language is a set of commands used to create, modify, or drop databases, tables, views, indexes, schemas, and users.

DML: Data Manipulation Language is a set of commands used to add data to a table, move data around, read data, update data, or delete data.

SQL: Common Data Types in MySQL

Here is a table of the most commonly used data types in MySQL

Data TypeDescription
Char()Fixed length string, unused spaces get padded and eat up memory: size 0-255
Varchar()Variable length string, unused spaces don’t use memory: size 0 to 65535
MediumText()A string up to 16,777,215 characters long
LongText()A string up to 4,294,967,295 characters long
INTinteger (whole number, no decimals)
Double(x, d)floating point decimal number, x is size, d is number of places after the decimal
Bool or BooleanBinary choice, 0 = False and 1 = True
DateDate data type “YYYY-MM-DD” (if set to US settings)
DATETIMEdatetime data type “YYYY-MM-DD HH:MM:SS” (if set to US settings)
TIMETime “HH:MM:SS”
YEARyear in for digit representation (ex 1908,1965,2011)