R: Connecting to SQL Server Database

You can query data from a SQL Server database directly from R using the RODBC package.


First you need to form a connection

##connection string
cn <- odbcDriverConnect(connection="Driver={SQL Server Native Client 11.0};server=localhost; database=SSRSTraining;trusted_connection=yes;")

We use the odbcDriverConnect() function. Inside we pass a connection = value

Driver = {SQL Server Native Client 11.0};  — this is based on the version of SQL Server you have

server=localhost;  — I used localhost because the SQL Server was on the same computer I was working with. Otherwise, pass the server name

database=SSRSTraining; — name of database I want to work with

trusted_connection=yes; — this means I am able to pass my Windows credentials.

If you don’t have a trusted connect pass the user Id and password like this

uid = userName; pwd = Password;

Note each parameter is separated by a semicolon

Query the database

> ##passes query to SQL Server
> df <- sqlQuery(cn, "select * FROM [SSRSTraining].[dbo].[JobDataSet]")
> head(df)

    Name              Job Hours Complete
1  Sally Predictive Model     1        n
2 Philip      Maintanence    10        n
3    Tom    Ad-hoc Report    12        y
4    Bob             SSRS     3        y
5 Philip         Tableau      7        n
6    Tom         Tableau      9        n

using sqlQuery() – pass through the connection string (cn) and enclose your query in ” ”




SQL Server: Linked Servers

SQL Server Linked Servers feature allows you to query data from other databases (even non SQL Server databases) from within SQL Server.

While I believe large data transfer jobs are best handled using an ETL solution like SSIS, sometimes all you want is a just a couple hundred rows of data from another database and setting up an ETL jobs might be overkill.

In this example, I am going to set up a link to a Teradata Data Warehouse.  This will allow me to query data from the warehouse and use it in my SQL Server environment.

(Note that you can connect to any database, such as another SQL Server or Oracle. I am just using Teradata as an example)

First go into SSMS and connect to your SQL Server

Now, in Object Explorer go to Server Objects -> Linked Servers


Right click on Linked Servers and select New Linked Server…


For a Teradata connection, I am choosing Microsoft OLE DB Provider for ODBC Drivers

(on a side note, you will need to have the ODBC driver for Teradata installed for this to work)


Linked Server = just whatever name you want to give your linked server – I chose EDW (short for Enterprise Data Warehouse)

Product Name = again free text, I just wrote Teradata to let people know what kind of product you are connecting too.

Data Souce: This is the server name you are trying to connect to


Next, click Security tab.

I want to use my own credentials to connect to Teradata so I clicked Be Made Using This Security Context: ( the green arrow)

Type in your Username and password. Click Ok


Now go to create a new query in SSMS (SQL Server Management Studio)

We will be using a function called OPENQUERY. The syntax is as follows




We can use the data returned from Teradata in SQL Server by simply putting it in a table or a temp table as shown below


We can also pass variables to our Teradata query. My preferred method is simply using a stored procedure.


       @USER VARCHAR(8) = NULL

declare @query varchar(8000) = 'Select * from OPENQUERY(EDW, ''SELECT USER_NM, LAST_NM, FIRST_NM FROM EMPLOYEE_TABLE WHERE USER_NM =  ''''' + @USER + ''''''')' 


exec [LINKED_SERVER_QUERY] @USER = 'blarson'

Python: Accessing a SQL database

create database sandbox;

use sandbox;
CREATE TABLE employee_id (
        emp_nm varchar(30) not null,
        emp_id varchar(8),
        b_emp_id varchar(8),
        PRIMARY KEY(emp_nm) );   

If you really want to do data work, you need to be able to connect to a database. In this example I will show you how to connect to and query data from MS SQL Server with the AdventureWorks2012 database installed.

This lesson assumes some very basic knowledge of SQL. If SQL is a complete mystery, head over to my SQL page: SQL  If you check out the first 4 intro lessons, you will know everything about SQL you need to know for this lesson.

Install pyodbc

To connect to the database, we need to install pyodbc. Go to your Anaconda terminal and type: pip install pyodbc


Now open up your jupyter notebook and start a new notebook

Connect to Database

import pyodbc

cnxn is our variable – it is commonly used as a shorten version of connection

syntax: pyodbc.connect(‘DRIVER={SQL Server}; SERVER=server name; DATABASE=database name;UID = user name; PWD = password’)

finally cursor =cnxn.cursor() creates a cursor for us. In SQL, a cursor is used to step through your results one row at a time.


cursor.execute(place sql query here)  – this is how you pass a sql query – note query goes in quotes

tables = cursor.fetchall() – fetch all the rows in your query results


We can now iterate through the rows in tables.


I don’t like the layout of this. Also we can’t really work with the data.

Pandas Dataframe

first import pandas

  • d= [] – create empty dictionary
  • d.dappend({‘Name’:row.Name, ‘Class’: row.GroupName}) – fill dictionary 1 row at a time
  • df = pd.DataFrame(d) – convert your dictionary to a dataframe


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more Python content: Python

SQL: Aggregates, Group By, and Having

The Where clause is great. It helps you filter out items from a table, leaving only the records you want. But sometimes you are not looking for the actual records. Instead, sometimes you only want to know how many records.

In this example, I will using the [HumanResources].[EmployeeDepartmentHistory] from AdventureWorks2012:


Count and Group By

The Count(*) function is used to count the number of rows.

Now, when we combine count(*) with a Group By clause, we can count how many records exist for each element in our group by clause.

All non aggregate columns in the select statement need to be in the group by function or you will get an error.

The code below counts the amount of rows containing each departmentID



Run the following query:

select BusinessEntityID, count(*) as amt
from [HumanResources].[EmployeeDepartmentHistory]
group by BusinessEntityID

You will get a list 290 records long. Most of these records will have a 1 in the amt column (meaning there is only one records containing the BusinessEntityID referenced)

If you want to know how many BusinessEntityIDs in this table have more than one record, we just have to add the Having clause to our query.


having can only be used in conjunction with aggregate functions.

Min(), Max()

Min and Max return the minimum and maximum of whatever column you you place in the parenthesis




SQL: SELECT Functions (Top, Count, Distinct, Max, Min)

I introduced you to the SELECT statement in SQL in my last SQL posting: SQL: SELECT Statement

Now I am going to show you some tricks that will provide you with more functionality in your Select statements.

**note, in the examples below, I am working with the HumanResources.Employee table from the AdventureWorks2012 data set. If you want to follow along with the exercises, go to my previous SQL posting: SQL: SELECT Statement where I show you how to download and attach the AdventureWorks2012 database.


Top allows you to limit the number of rows that are returned. (note Top is a T-SQL (Microsoft’s version of SQL command. MySQL and Oracle use the term Limit).

Select Top 10 *
from HumanResources.Employee

Using Top 10 we are querying only the first 100 records from the Employee table. This is often useful when you are exploring a database and just want to see what is in a table. Using Select * in some larger tables without using the Top command to limit the amount of output can result in a query that could take minutes to hours to complete.

Note that the query above returns the first 10 records based usually on the order records were entered. If you want to top 10 newest employees you would need to add an Order By command.

Select Top 10 *
from HumandResources.Employee
order by HireDate desc

The order by command sorts the results by the dates in the column HireDate in descending order (desc) most recent to earliest. If I wanted to go the other way, I would use the command: order by HireDate asc – ascending(asc)


If yon want a count of the number of records in a table, SQL has a Count() function. The syntax is below.

Select Count(*)
from HumandResources.Employee

This returns the number of records in the table. If you wanted to know how many male employees are in the table, you can add a Where clause (I will cover Where more in-depth in later lessons)

Select Count(*)
from HumandResources.Employee
where Gender like 'M'


Notice how there are repeated JobTitles in the table below. What if you wanted a list of job titles without any repeating elements.


In this case, you will use the Distinct() function. Placing the column you want in the parenthesis and the query will return a non repeating list.

Select Distinct(JobTitle)
from HumandResources.Employee

Now, what if you want to know how many distinct job titles exist in the company. Well, we can use the Count() function in conjunction with Distinct()

Select Count(Distinct(JobTitle))
from HumandResources.Employee

The query above will return a count (number) of the distinct job titles found in the table

Max() and Min()

Max() and Min() do pretty much what you would expect them to do. They return the maximum or minimum value in column.

Select Max(HireDate) as Max, Min(HireDate) as Min
from HumandResources.Employee

** “as” command in a Select statement gives a name to result columns in query

If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more SQL content: SQL

SQL: SELECT Statement


SELECT is doubtlessly the most important command in SQL. A database is completely useless unless you can query the data it is holding. SELECT is how we query.

If you want to follow along:

You will need MS SQL Server installed. Here is a link to instructions: InstallSQL

Microsoft provides a great sample database known as Adventure Works. I am working with the 2012 release. You can download it here: AdventureWorksDownload

Once you have Adventureworks downloaded, you will need attach it to SQL Server.

Copy and paste the download file to somewhere you can find it. I recommend:

C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA

** the portion in red changes based on what version of SQL Server you downloaded. 


Open SQL Server Management Studio. Windows Key + R – Type SSMS


Login to your server


Go to Object Explorer – Right Click Database > Attach…


Click Add…


Select Adventureworks2012_Data.mdf


In the bottom window, highlight the log file and click Remove. Then Click Okay


Go back to Object Explorer. Expand Databases > AdventureWorks2012 > Tables


Just picking a table at random, I selected HumanResources.Employee. Let us see what is in the table. Click the New Query button up top, and a new query window will open.


Make sure you are working with the right database. Select AdventureWorks2012 from the drop down:


Cut and Paste the following into the new query window.

select *
from HumanResources.Employee

select JobTitle, BirthDate, Gender, HireDate
from HumanResources.Employee

select JobTitle as Job, BirthDate as DOB, Gender, HireDate as [Start Date]
from HumanResources.Employee

select HireDate as [Start Date], JobTitle as Job, BirthDate as DOB, Gender
from HumanResources.Employee
order by HireDate
select HireDate as [Start Date], JobTitle as Job, BirthDate as DOB, Gender
from HumanResources.Employee
where gender like ‘F’
order by HireDate


One cool thing about SQL code is that you can select only the  code you want to execute

Highlight the first group of code and click the ! Execute button



  • Select * – Select all fields in the table. “*” is a wildcard in SQL
  • from HumanResources.Employee – this is the table we want to work with
  • go – this indicates and end to the code block. It is not needed when executing code block by block (like we are doing here), but it is a good practice to get in the habit of adding it

Notice the results show every field in the table. It displays in a spreadsheet like table


Now Select the next code group  and click ! Execute



  • select JobTitle, BirthDate, Gender, HireDate – Here we are selecting four specific columns from the table
  • The rest is same as above

Results show the 4 columns selected above


Next block of code – click ! Execute


  • select JobTitle as Job, BirthDate as DOB, Gender, HireDate as [Start Date] – In this case, we use the “as” command to give the field names more User friendly names

Note the column names have changed


Next block of code – click ! Execute



  • select HireDate as [Start Date], JobTitle as Job, BirthDate as DOB, Gender – here we reordered the fields so that HireDate appears first
  • order by HireDate – this SQL for Sort by. The results will now be ordered by HireDate

Notice Start Date(HireDate) is now the first column and the records (rows) are sorted in order from oldest Start Date to newest


FInal block of code – click ! Execute



  • where gender like ‘F’ – here we are filtering the results to only ones were Gender is F.

Note now only Female employees are in the results


If you enjoyed this lesson, click LIKE below, or even better, leave me a COMMENT. 

Follow this link for more SQL content: SQL