R: Connecting to SQL Server Database

You can query data from a SQL Server database directly from R using the RODBC package.

install.packages("RODBC")

First you need to form a connection

library(RODBC)
##connection string
cn <- odbcDriverConnect(connection="Driver={SQL Server Native Client 11.0};server=localhost; database=SSRSTraining;trusted_connection=yes;")

We use the odbcDriverConnect() function. Inside we pass a connection = value

Driver = {SQL Server Native Client 11.0};  — this is based on the version of SQL Server you have

server=localhost;  — I used localhost because the SQL Server was on the same computer I was working with. Otherwise, pass the server name

database=SSRSTraining; — name of database I want to work with

trusted_connection=yes; — this means I am able to pass my Windows credentials.

If you don’t have a trusted connect pass the user Id and password like this

uid = userName; pwd = Password;

Note each parameter is separated by a semicolon

Query the database

> ##passes query to SQL Server
> df <- sqlQuery(cn, "select * FROM [SSRSTraining].[dbo].[JobDataSet]")
> head(df)

    Name              Job Hours Complete
1  Sally Predictive Model     1        n
2 Philip      Maintanence    10        n
3    Tom    Ad-hoc Report    12        y
4    Bob             SSRS     3        y
5 Philip         Tableau      7        n
6    Tom         Tableau      9        n

using sqlQuery() – pass through the connection string (cn) and enclose your query in ” ”

 

 

 

SQL Server: Linked Servers

SQL Server Linked Servers feature allows you to query data from other databases (even non SQL Server databases) from within SQL Server.

While I believe large data transfer jobs are best handled using an ETL solution like SSIS, sometimes all you want is a just a couple hundred rows of data from another database and setting up an ETL jobs might be overkill.

In this example, I am going to set up a link to a Teradata Data Warehouse.  This will allow me to query data from the warehouse and use it in my SQL Server environment.

(Note that you can connect to any database, such as another SQL Server or Oracle. I am just using Teradata as an example)

First go into SSMS and connect to your SQL Server

Now, in Object Explorer go to Server Objects -> Linked Servers

2018-03-29_12-44-42

Right click on Linked Servers and select New Linked Server…

2018-03-29_12-45-49

For a Teradata connection, I am choosing Microsoft OLE DB Provider for ODBC Drivers

(on a side note, you will need to have the ODBC driver for Teradata installed for this to work)

2018-03-29_12-46-25

Linked Server = just whatever name you want to give your linked server – I chose EDW (short for Enterprise Data Warehouse)

Product Name = again free text, I just wrote Teradata to let people know what kind of product you are connecting too.

Data Souce: This is the server name you are trying to connect to

2018-03-29_12-48-05

Next, click Security tab.

I want to use my own credentials to connect to Teradata so I clicked Be Made Using This Security Context: ( the green arrow)

Type in your Username and password. Click Ok

2018-03-29_12-48-34

Now go to create a new query in SSMS (SQL Server Management Studio)

We will be using a function called OPENQUERY. The syntax is as follows

Select * from OPENQUERY(LINKEDSERVER, QUERY)

2018-03-29_12-50-47

SELECT * FROM OPENQUERY(EDW, 'SELECT USER_NM, LAST_NM, FIRST_NM FROM EMPLOYEE_TABLE')

We can use the data returned from Teradata in SQL Server by simply putting it in a table or a temp table as shown below

SELECT * into ##TempTable FROM OPENQUERY(EDW, 'SELECT USER_NM, LAST_NM, FIRST_NM FROM EMPLOYEE_TABLE')

We can also pass variables to our Teradata query. My preferred method is simply using a stored procedure.

2018-03-29_12-54-32

CREATE PROCEDURE [LINKED_SERVER_QUERY]
       @USER VARCHAR(8) = NULL
AS

declare @query varchar(8000) = 'Select * from OPENQUERY(EDW, ''SELECT USER_NM, LAST_NM, FIRST_NM FROM EMPLOYEE_TABLE WHERE USER_NM =  ''''' + @USER + ''''''')' 
exec(@query)

GO

exec [LINKED_SERVER_QUERY] @USER = 'blarson'

Python: Accessing a SQL database

create database sandbox;

use sandbox;
CREATE TABLE employee_id (
        emp_nm varchar(30) not null,
        emp_id varchar(8),
        b_emp_id varchar(8),
        PRIMARY KEY(emp_nm) );   

If you really want to do data work, you need to be able to connect to a database. In this example I will show you how to connect to and query data from MS SQL Server with the AdventureWorks2012 database installed.

This lesson assumes some very basic knowledge of SQL. If SQL is a complete mystery, head over to my SQL page: SQL  If you check out the first 4 intro lessons, you will know everything about SQL you need to know for this lesson.

Install pyodbc

To connect to the database, we need to install pyodbc. Go to your Anaconda terminal and type: pip install pyodbc

sqlpython.jpg

Now open up your jupyter notebook and start a new notebook

Connect to Database

import pyodbc

cnxn is our variable – it is commonly used as a shorten version of connection

syntax: pyodbc.connect(‘DRIVER={SQL Server}; SERVER=server name; DATABASE=database name;UID = user name; PWD = password’)

finally cursor =cnxn.cursor() creates a cursor for us. In SQL, a cursor is used to step through your results one row at a time.

sqlpython1.jpg

cursor.execute(place sql query here)  – this is how you pass a sql query – note query goes in quotes

tables = cursor.fetchall() – fetch all the rows in your query results

sqlpython2

We can now iterate through the rows in tables.

sqlpython3.jpg

I don’t like the layout of this. Also we can’t really work with the data.

Pandas Dataframe

first import pandas

  • d= [] – create empty dictionary
  • d.dappend({‘Name’:row.Name, ‘Class’: row.GroupName}) – fill dictionary 1 row at a time
  • df = pd.DataFrame(d) – convert your dictionary to a dataframe

sqlpython4.jpg


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Python content: Python

SQL: Aggregates, Group By, and Having

The Where clause is great. It helps you filter out items from a table, leaving only the records you want. But sometimes you are not looking for the actual records. Instead, sometimes you only want to know how many records.

In this example, I will using the [HumanResources].[EmployeeDepartmentHistory] from AdventureWorks2012:

sqlAggr.jpg

Count and Group By

The Count(*) function is used to count the number of rows.

Now, when we combine count(*) with a Group By clause, we can count how many records exist for each element in our group by clause.

All non aggregate columns in the select statement need to be in the group by function or you will get an error.

The code below counts the amount of rows containing each departmentID

sqlAggr1

Having

Run the following query:

select BusinessEntityID, count(*) as amt
from [HumanResources].[EmployeeDepartmentHistory]
group by BusinessEntityID

You will get a list 290 records long. Most of these records will have a 1 in the amt column (meaning there is only one records containing the BusinessEntityID referenced)

If you want to know how many BusinessEntityIDs in this table have more than one record, we just have to add the Having clause to our query.

sqlAggr2

having can only be used in conjunction with aggregate functions.

Min(), Max()

Min and Max return the minimum and maximum of whatever column you you place in the parenthesis

sqlAgg4

 

 

SQL: SELECT Functions (Top, Count, Distinct, Max, Min)

I introduced you to the SELECT statement in SQL in my last SQL posting: SQL: SELECT Statement

Now I am going to show you some tricks that will provide you with more functionality in your Select statements.

**note, in the examples below, I am working with the HumanResources.Employee table from the AdventureWorks2012 data set. If you want to follow along with the exercises, go to my previous SQL posting: SQL: SELECT Statement where I show you how to download and attach the AdventureWorks2012 database.

Top

Top allows you to limit the number of rows that are returned. (note Top is a T-SQL (Microsoft’s version of SQL command. MySQL and Oracle use the term Limit).


Select Top 10 *
from HumanResources.Employee
go

Using Top 10 we are querying only the first 100 records from the Employee table. This is often useful when you are exploring a database and just want to see what is in a table. Using Select * in some larger tables without using the Top command to limit the amount of output can result in a query that could take minutes to hours to complete.

Note that the query above returns the first 10 records based usually on the order records were entered. If you want to top 10 newest employees you would need to add an Order By command.


Select Top 10 *
from HumandResources.Employee
order by HireDate desc
go

The order by command sorts the results by the dates in the column HireDate in descending order (desc) most recent to earliest. If I wanted to go the other way, I would use the command: order by HireDate asc – ascending(asc)

Count()

If yon want a count of the number of records in a table, SQL has a Count() function. The syntax is below.


Select Count(*)
from HumandResources.Employee
go

This returns the number of records in the table. If you wanted to know how many male employees are in the table, you can add a Where clause (I will cover Where more in-depth in later lessons)


Select Count(*)
from HumandResources.Employee
where Gender like 'M'
go

Distinct()

Notice how there are repeated JobTitles in the table below. What if you wanted a list of job titles without any repeating elements.

sqlSelect14

In this case, you will use the Distinct() function. Placing the column you want in the parenthesis and the query will return a non repeating list.


Select Distinct(JobTitle)
from HumandResources.Employee
go

Now, what if you want to know how many distinct job titles exist in the company. Well, we can use the Count() function in conjunction with Distinct()


Select Count(Distinct(JobTitle))
from HumandResources.Employee
go

The query above will return a count (number) of the distinct job titles found in the table

Max() and Min()

Max() and Min() do pretty much what you would expect them to do. They return the maximum or minimum value in column.


Select Max(HireDate) as Max, Min(HireDate) as Min
from HumandResources.Employee
go

** “as” command in a Select statement gives a name to result columns in query


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more SQL content: SQL

SQL: SELECT Statement

 

SELECT is doubtlessly the most important command in SQL. A database is completely useless unless you can query the data it is holding. SELECT is how we query.


If you want to follow along:

You will need MS SQL Server installed. Here is a link to instructions: InstallSQL

Microsoft provides a great sample database known as Adventure Works. I am working with the 2012 release. You can download it here: AdventureWorksDownload


Once you have Adventureworks downloaded, you will need attach it to SQL Server.

Copy and paste the download file to somewhere you can find it. I recommend:

C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA

** the portion in red changes based on what version of SQL Server you downloaded. 

sqlSelect

Open SQL Server Management Studio. Windows Key + R – Type SSMS

sqlinstall2

Login to your server

sqlinstall3

Go to Object Explorer – Right Click Database > Attach…

sqlSelect1

Click Add…

sqlSelect2

Select Adventureworks2012_Data.mdf

sqlSelect3.jpg

In the bottom window, highlight the log file and click Remove. Then Click Okay

sqlSelect4.jpg

Go back to Object Explorer. Expand Databases > AdventureWorks2012 > Tables

sqlSelect6

Just picking a table at random, I selected HumanResources.Employee. Let us see what is in the table. Click the New Query button up top, and a new query window will open.

sqlSelect7

Make sure you are working with the right database. Select AdventureWorks2012 from the drop down:

sqlSelect17

Cut and Paste the following into the new query window.


select *
from HumanResources.Employee
go

select JobTitle, BirthDate, Gender, HireDate
from HumanResources.Employee
go

select JobTitle as Job, BirthDate as DOB, Gender, HireDate as [Start Date]
from HumanResources.Employee
go

select HireDate as [Start Date], JobTitle as Job, BirthDate as DOB, Gender
from HumanResources.Employee
order by HireDate
go
select HireDate as [Start Date], JobTitle as Job, BirthDate as DOB, Gender
from HumanResources.Employee
where gender like ‘F’
order by HireDate
go


 

One cool thing about SQL code is that you can select only the  code you want to execute

Highlight the first group of code and click the ! Execute button

sqlSelect8

Syntax

  • Select * – Select all fields in the table. “*” is a wildcard in SQL
  • from HumanResources.Employee – this is the table we want to work with
  • go – this indicates and end to the code block. It is not needed when executing code block by block (like we are doing here), but it is a good practice to get in the habit of adding it

Notice the results show every field in the table. It displays in a spreadsheet like table

sqlSelect9.jpg

Now Select the next code group  and click ! Execute

sqlSelect10

Syntax

  • select JobTitle, BirthDate, Gender, HireDate – Here we are selecting four specific columns from the table
  • The rest is same as above

Results show the 4 columns selected above

sqlSelect14.jpg

Next block of code – click ! Execute

sqlSelect11.jpg

  • select JobTitle as Job, BirthDate as DOB, Gender, HireDate as [Start Date] – In this case, we use the “as” command to give the field names more User friendly names

Note the column names have changed

sqlSelect15.jpg

Next block of code – click ! Execute

sqlSelect12

syntax

  • select HireDate as [Start Date], JobTitle as Job, BirthDate as DOB, Gender – here we reordered the fields so that HireDate appears first
  • order by HireDate – this SQL for Sort by. The results will now be ordered by HireDate

Notice Start Date(HireDate) is now the first column and the records (rows) are sorted in order from oldest Start Date to newest

sqlSelect16.jpg

FInal block of code – click ! Execute

sqlSelect13

Syntax

  • where gender like ‘F’ – here we are filtering the results to only ones were Gender is F.

Note now only Female employees are in the results

sqlSelect18


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more SQL content: SQL