SQL Server: Importing Excel File to SQL Server

Working with data inside a database has many advantages to working with data in an Excel spreadsheet. Luckily SQL Server makes it relatively easy to import data from an Excel File into the database.

We will be using the Excel files below:

JobDataSet

JobDesc

Let’s start by opening SSMS (SQL Server Management Studio)

Next, let’s create a database to hold these files. You don’t need to create a new database to import data, but I am building this tutorial as part of a series on SSRS, so I am building a new database for that purpose.

To create a new Database, right click on Databases on the upper left and click New Database…

2018-04-09_8-55-16.png

Now name your new database, we will just accept the defaults

2018-04-09_8-59-56

Now go your newly created database, right click, and go to Tasks

2018-04-09_9-00-48

From the Tasks sub-menu, select Import Data

2018-04-09_8-57-20.png

The import Wizard will open, simply click Next

2018-04-09_9-01-12.png

Next, select Excel from the drop down

2018-04-09_9-01-41.png

Next, click Browse

2018-04-09_9-02-13

Select your file

2018-04-09_9-05-24.png

Make sure First row has column names is checked and click Next

2018-04-09_9-07-04.png

On the next screen select SQL Server Native Client 11.0 (If you don’t have 11.0 – 10.0 should work)

2018-04-09_9-11-31.png

Make sure the database you want is selected and click next

2018-04-09_9-11-52.png

In this example, we are going to use Copy data from one or more tables or views

2018-04-09_9-12-29.png

Make sure to name the table you want to create in SQL Server (red arrow)

2018-04-09_9-13-23

If you click Preview you can get a look at what the new table will be loaded with

2018-04-09_9-15-34.png

Click Okay on the preview window and click Next on the Import Wizard

Leave the default Run Immediately checked and click next

2018-04-09_9-16-07.png

Review info on the next window and click Finish

2018-04-09_9-16-31

The package will run

Note the blue lettering will let you see how many rows transfer from the Excel file to SQL Server

2018-04-09_9-16-53.png

If you check your database, you will see your tables. (I loaded both spreadsheets in to the database for the upcoming SSRS tutorial)

2018-04-09_9-19-58.png

Finally, run a select * on your new table to see the data you transferred into SQL Server

2018-04-09_9-20-35.png

Advertisements

SQL: User Defined Functions

Functions in SQL work just like functions in most programming languages. If you aren’t familiar with a function, you should know that you are already using them without even knowing it.

Consider this for example:

Select Count(*)
From Table

COUNT() is a function. When you pass it rows from your query, it counts the rows and returns a value in the form of an integer. But COUNT() is a built in function, meaning it came as part of SQL Server, you did not have to create it.

SQL Server allows you the option of creating User Defined Functions, meaning functions you develop yourself. These are handy when you find yourself handling repeated tasks, such as date formatting or string manipulation. Instead of having to repeatedly code a complex command, you can just build a function once and then call on it whenever needed.

Let’s start with a basic example:

2018-04-03_14-23-36.png

Here I created a function called ADD_UP that accepts 2 numbers and outputs the SUM of the two numbers ( yes I know this already available as the built in function SUM(), but I want to start nice and easy)

Lets start by discussing the syntax. The basic syntax for creating a function is as follows:

CREATE FUNCTION name (@var data-type)
RETURNS data-type
AS
BEGIN
   RETURNS (some type of action)
END

In my example we are naming the function ADD_UP and supplying two integer variables: @NUM1 and @NUM2

CREATE FUNCTION ADD_UP (@NUM1 INT, @NUM2 INT)

Then we define the data-type our function will return. In this case, since we are adding 2 integers, our function will return an INT

RETURNS INT

Next we wrap out function in

AS
BEGIN
ENDS

Finally, we perform an action

RETURNS (@NUM1+ @NUM2)

Finally, when you want to call the Function, just use it in a select statement.

(**Note, user defined functions require you to use the schema prefix. Since I just used the default dbo schema, this example uses dbo.ADD_UP)

select dbo.ADD_UP(2,3) as ADDED

and as you see, we get 5 as our answer.

2018-04-03_14-23-36

Now, let’s try something different. Here we are going to work with a date function. In this example I built a function called MNTH that accepts one variable @DT – a date data-type and returns an Integer representing the month of the date passed to it.

Again, all I am really doing is duplicating the built-in function MONTH(), but I wanted to show different data-types

2018-04-03_14-32-54.png

(** getdate() is a built-in function that returns the current date. I ran this SQL on 4/2/2018, so it returns 4 as a result)

Now finally here is an example of how you might use a function in real life. Let’s say you have lots of reports that call for your date to be represented in MM-YYYY format. Instead of having to repeatedly type a complex date formatting, you can build it into a User Defined Function and pass a regular date to the function.

2018-04-03_14-34-36

If you are not familiar with cast(concat(month(@DT),’-‘,year(@DT))as varchar(8))) statement, I’ll break it down here:

Let’s go from the inside out:

concat is a string function meaning to concatenate or “string together” – so

concat(month(@DT),’-‘,year(@DT))

concat(4, ‘-‘, 2018)

4-2018

Cast allows us to convert the output of the concat statement into a string (varchar) data-type

cast(4-2018 as varchar(8)) = ‘4-2018’

Finally, if you want to find your functions after you create them, they are located under your database -> Programmability -> Functions

In this case, I only built Scalar-valued Functions, I’ll cover the other types in future lessons.

2018-04-03_14-39-21.png

 

SQL Server: Linked Servers

SQL Server Linked Servers feature allows you to query data from other databases (even non SQL Server databases) from within SQL Server.

While I believe large data transfer jobs are best handled using an ETL solution like SSIS, sometimes all you want is a just a couple hundred rows of data from another database and setting up an ETL jobs might be overkill.

In this example, I am going to set up a link to a Teradata Data Warehouse.  This will allow me to query data from the warehouse and use it in my SQL Server environment.

(Note that you can connect to any database, such as another SQL Server or Oracle. I am just using Teradata as an example)

First go into SSMS and connect to your SQL Server

Now, in Object Explorer go to Server Objects -> Linked Servers

2018-03-29_12-44-42

Right click on Linked Servers and select New Linked Server…

2018-03-29_12-45-49

For a Teradata connection, I am choosing Microsoft OLE DB Provider for ODBC Drivers

(on a side note, you will need to have the ODBC driver for Teradata installed for this to work)

2018-03-29_12-46-25

Linked Server = just whatever name you want to give your linked server – I chose EDW (short for Enterprise Data Warehouse)

Product Name = again free text, I just wrote Teradata to let people know what kind of product you are connecting too.

Data Souce: This is the server name you are trying to connect to

2018-03-29_12-48-05

Next, click Security tab.

I want to use my own credentials to connect to Teradata so I clicked Be Made Using This Security Context: ( the green arrow)

Type in your Username and password. Click Ok

2018-03-29_12-48-34

Now go to create a new query in SSMS (SQL Server Management Studio)

We will be using a function called OPENQUERY. The syntax is as follows

Select * from OPENQUERY(LINKEDSERVER, QUERY)

2018-03-29_12-50-47

SELECT * FROM OPENQUERY(EDW, 'SELECT USER_NM, LAST_NM, FIRST_NM FROM EMPLOYEE_TABLE')

We can use the data returned from Teradata in SQL Server by simply putting it in a table or a temp table as shown below

SELECT * into ##TempTable FROM OPENQUERY(EDW, 'SELECT USER_NM, LAST_NM, FIRST_NM FROM EMPLOYEE_TABLE')

We can also pass variables to our Teradata query. My preferred method is simply using a stored procedure.

2018-03-29_12-54-32

CREATE PROCEDURE [LINKED_SERVER_QUERY]
       @USER VARCHAR(8) = NULL
AS

declare @query varchar(8000) = 'Select * from OPENQUERY(EDW, ''SELECT USER_NM, LAST_NM, FIRST_NM FROM EMPLOYEE_TABLE WHERE USER_NM =  ''''' + @USER + ''''''')' 
exec(@query)

GO

exec [LINKED_SERVER_QUERY] @USER = 'blarson'

SQL: Coalesce

Coalesce is a simple but useful SQL function I use quite often. It returns the first non-null value in a list of values.

A common real world application for this function is when you are trying to join data from multiple sources. If you work for a company known for acquiring other companies, your HR tables are bound to be full of all kinds of inconsistencies.

In this example below, you will see a snip-it of an HR table created by merging tables from two companies. The first two employees have an emp_id and the last one has a b_emp_id. The last one, Priyanka, is an employee from a newly acquired company, so her emp_id is from the HR table of the newly acquired company.

c1

So if you just want to merge the 2 emp_id columns into one, you can use the coalesce() function.

The syntax for coalesce is: coalesce (col1,col2,….)  as alias

Coalesce takes the first non-null. In the example below, we check emp_id first, if it has a value, we use that. If emp_id is null, we look for the value in b_emp_id.

select emp_nm,
coalesce (emp_id, b_emp_id) as emp_id
from employee_id

c2

If you want to try this for yourself, here is the code to build the table out for you.

CREATE TABLE employee_id (
        emp_nm nvarchar(30) not null,
             emp_id nvarchar(8),
             b_emp_id nvarchar(8)
             PRIMARY KEY(emp_nm) );
INSERT INTO employee_id
       (emp_nm, emp_id)
VALUES
       ('Bob', 'A1234567'),
       ('Lisa', 'A1234568')
INSERT INTO employee_id
       (emp_nm, b_emp_id)
VALUES
       ('Priyanka', 'B1234567');

 

SSIS Tutorial: Introduction

SSIS: Introduction

SQL Server Integration Services is the ETL tool for the Microsoft SQL Server platform. SSIS allows you to take data from various sources (from Excel files, to text files, to other databases, etc), and bring it all together.

If you are new to concept of ETL, SSIS is great place to start. Click here to learn about ETL

If you are well versed in another ETL platform, SSIS is a relatively easy system to get up to speed on.

SSIS comes as part of SQL Server Data Tools, which you should be able to install with your SQL Server installation software. You will need SQL Server Standard, Developer or above editions to run SQL Server Data Tools. SQL Server Express does not support Data Tools.

For some unknown reason, once it is installed, you will not find a program called SSIS. Maybe the engineers at Microsoft think this is funny, but in order to run SSIS you will need to look for the following program instead.

2018-02-26_14-26-37

I’m using 2015, but for most of what I am doing here, any version should be compatible.

When you launch data tools, you will notice it run in Visual Studios

2018-03-06_10-37-28

To start an SSIS job, you will either need to open an existing project, or create a new one. In this example, I will create a new one.

File ->New -> Project  (or Ctrl+Shift+N)
2018-03-06_10-37-54

Inside the Business Intelligence Templates, select Integration Services. I always just select Integrations Service Project, I’m not a big fan of the Wizard

2018-03-06_10-38-37

Next step: Name your project

2018-03-06_10-40-08

Now you are in SSIS. Here are the 3 main windows you will be starting with.

From right to left:

SSIS Toolbox

2018-03-06_10-43-40

Package Designer

2018-03-06_10-44-00

Solution Explorer

2018-03-06_10-44-19

Packages

Inside solutions, packages are the collections of jobs or scripts found inside a project. It is inside the package that you will build out your ETL job.

For our first lesson, we are just going to build a simple package. Your new solution should have opened with a new package when it opened. If it is, right click on the green arrow to rename it, if not, right click on the red arrow to create a new package.

 

2018-03-06_10-44-19_1

I renamed my package First_Package, you can name your package whatever you choose.

This first package will simply just display a pop up message. In the SSIS Toolbox, go to Script Task and drag it into the package designer window.

2018-03-06_13-09-46

2018-03-06_13-25-47

Double click on the Script Task Box in the Design window

Note in my example, I have C# set as the scripting language. The other default option is Visual Basic. If you are more comfortable with that, feel free to use it. I prefer C# mainly because I spent more time working with it.

You don’t need to know any C# for this tutorial. This is literally a single line of code assignment. I will cover more C# in the future.

Click Edit Script… to continue

2018-03-06_13-26-45

Note, this step can take a minute or so for the script editor to appear. Don’t panic if your computer appears locked up.

Once the script editor opens, don’t panic by all the code you see. Luckily Microsoft has done most of the ground work for us. We only need to scroll down until you see:

public void Main()

Now place your cursor below //TODO: Add your code here

The code you need to type for this script is:

MessageBox.Show(“This is my first SSIS Package”);

2018-03-06_15-12-02

Now I know you will be looking for a save button. But again, our friends at Microsoft might have been drinking when they coded this. Instead, just click the upper right X to close out the whole window –

I know – why would they do it that way? how much effort would a save and close button have cost them? I don’t know. It just is what it is. Just click the X and move on with your life.

2018-03-06_15-13-33

Now click the OK button – again I guess Save was too much to type

2018-03-06_15-14-08

Now right click on your package (green arrow) and click Execute Package

2018-03-06_10-44-19_1

Your message will pop up in a Message box window

2018-03-06_15-17-49

Click OK on the messagebox and Click the red square to end the package execution.

2018-03-06_15-18-09

Congrats, you have just built and executed your first SSIS Package.

 

XML Parsing: Advanced SQL

If you want to play along with the lesson, use the following code to create the table I will be using:

CREATE TABLE employee_lang (
        emp_nm nvarchar(30) not null,
             lang nvarchar(255),
             PRIMARY KEY(emp_nm) );
INSERT INTO employee_lang
       (emp_nm, lang)
VALUES
       ('Bob', 'Python, R, Java'),
       ('Lisa', 'R, Java, Ruby, JavaScript'),
       ('Priyanka', 'SQL, Python');

 

XML Parsing

XML parsing is a SQL method for separating string data found in a single field in a table. Look at the table below:

1

This table has two columns, emp_nm (employee name), lang (programming languages the employee is proficient in). Notice that the column lang has multiple values for each record. While this is easily human readable, if you want it to more machine usable (think Pivot tables or R statistical analysis), you are going to want your data to look more this this:

2

Notice now the table has the same two columns, but the lang column now only has 1 word per record. So the question is, how do you do this using SQL?

XML parsing

The code used to “parse” out the data in the lang column is below:

SELECT emp_nm
,SUBSTRING(LTRIM(RTRIM(m.n.value('.[1]','varchar(8000)'))),1,75) AS lang
FROM
(
SELECT
emp_nm
,CAST('<XMLRoot><RowData>' + REPLACE([lang],',','</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS x
FROM employee_lang
)t
CROSS APPLY x.nodes('/XMLRoot/RowData')m(n)

Let’s break it down a little first.

SELECT
emp_nm
,CAST('<XMLRoot><RowData>' + REPLACE([lang],',','</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS x
FROM  employee_lang

What we are doing with the code about is using a CAST and REPLACE functions to convert the elements in column lang into a XML line.  See results below:

3

For Bob, this is the result of the CAST/REPLACE code on the lang column

<XMLRoot><RowData>Python</RowData><RowData> R</RowData><RowData> Java</RowData></XMLRoot>

This is how the code works from the inside out.

REPLACE()

select REPLACE (‘SQL ROCKS’, ‘S’, ‘!’)

If you run the above code, it will return !QL ROCK!

Replace is saying — everywhere an S is in the string, replace it with a !

CAST()

The cast function is casting the string as an XML data point. This is needed for the next section, when we unpack the XLM string

Cross Apply / Value

SELECT emp_nm
,m.n.value('.[1]','varchar(8000)')
FROM
(
SELECT
emp_nm
,CAST('<XMLRoot><RowData>' + REPLACE([lang],',','</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS x
FROM employee_lang
)t
CROSS APPLY x.nodes('/XMLRoot/RowData')m(n)

So without going into way too much detail, you can find pages dedicated to Cross Apply and Value(), I’ll give you the quick breakdown.

First notice we aliased our CAST() statement as x.  So if you look at the CROSS APPLY, you will see we are asking to look at x.nodes. Had we aliased our cast y, we would be looking at y.nodes.

Now look at m(n) at the end of the line. This is like an array or list in programming languages. Keep that in mind for the next step.

Inside the x.nodes() is ‘/XMLRoot/RowData’ , this is telling us to assign to m everything following /XMLRoot and to iterate n by everything following /RowData, so for Bob:

m(n=1) = Python

m(n=2) = R

m(n=3) = Java

Now we pass that array m(n) to our Value() method.  Hence m.n.value().  Note m and n were just letters I picked, you can use others.

Inside m.n.value(‘.[1]’,’varchar(8000)’) was used as it should pretty much cover any size string you may have to deal with.

So the final iteration simply add as SUBSTRING to clean it up and get rid of white space

4

SQL: 4 Types of Joins

SQL Joins come in 4 major types. In T-SQL – Microsoft’s SQL language, unless otherwise specified, a Join defaults to Inner Join. The other three are Right Outer Join, Left Outer Join, and Full Outer Join.

Inner Join

Inner joins match up data that exists in both tables. In the example below, only A and C found in the result table since B,D don’t exist in TableB and E,F don’t exist in TableA.

Syntax

Select *
from TableA as A Join TableB as B
on A.ID = B.ID

innerjoin

Full Outer Join

A Full Outer Join puts all elements in both tables together. Where information is missing, Nulls will appear

Select *
from TableA as A Full Outer Join TableB as B
on A.ID = B.ID

fullouterjoin

Left Outer Join

Left Outer Join takes all data from the Left Table and joins up matching data from the Right table. Left and Right is determined based on the table’s position in the Where clause

Select *
from TableA as A Left Outer Join TableB as B
on A.ID = B.ID

leftouterjoin.jpg

Right Outer Join

Right Outer Join is the exact opposite of the Left Outer Join, with all elements from the Right Table being used and only matching elements from the Left Table.

Select *
from TableA as A Right Outer Join TableB as B
on A.ID = B.ID

rightouterjoin.jpg

SQL: Intro to Joins

More often than not, the information you need does not all reside in one table. To query information from more than one table, use the Join command.

In our example, we are going to be using AdventureWorks2012. If you want to follow along, but do not have SQL Server or AdventureWorks2012 installed, click in the following links:

We are going to be using the following Tables for our Join

Notice they both have a matching field – BusinessEntityID – This field will be important when creating a join.

sqlJoin7

We can call up both tables individually through separate SQL Queries

sqljoin9

However, to combine the two tables into a single result, you will need a Join.

Inner Join

SQL uses Inner Joins by default. In this lesson, I am only going to focus on inner joins. Inner joins work by focusing on columns with matching information. In the example below, the columns with matching information are the Name columns.

The combined result only contains rows who have a match in both source tables. Notice that Sally and Sarah do not appear in the result table.

sqljoin10.jpg

The syntax is as follows:

Select *
from Table A join Table B
on TableA.Match = TableB.Match

sqljoin11.jpg

In our example, the matching column is BusinessEntityID in both tables.

To clean the code up a little bit, we can assign aliases to our tables using the “as” keyword. We are using E and S as our aliases. We can now use those aliases in our “on” clause.

sqlJoin12

And, just as in a regular select statement, we can choose which columns we want. You do however, have to identify which table the columns you are requesting are from. Notice how I did this using the E and S aliases.

sqljoin13.jpg

 

MS SQL Server: Database Diagrams

MS SQL Server comes with a function called Database Diagrams. You can use it to produce ERDs (Entity Relationship Diagrams).

Start by right clicking on Database Diagrams > New Database Diagram in the Object Explorer

sqlJoin

If you get the following error, you database does not have an owner assigned.

sqlJoin1.jpg

To assign an owner, right click on your database name and select Properties

sqljoin2

Under Select a page > FilesBy Owner select radio button> Browse

sqlJoin3

Select your user name from the list and then hit Okay until you are back to the main screen.

sqlJoin4.jpg

Right click Database Diagrams > New Database Diagram

sqlJoin

Click Yes

sqlJoin5

Now select some tables. In this example, I am choosing Employee and SalesPerson

sqljoin6

The diagram shows the tables you picked, and the data fields. It also shows connections.

sqlJoin7.jpg

Notice the tables do not all connect directly with each other. Another point to take note of is the small yellow keys located in from of some fields. These let you know that those fields are the primary key(s) for their respective table. The primary key is the unique identifier for the table.

sqlJoin8

SQL: Aggregates, Group By, and Having

The Where clause is great. It helps you filter out items from a table, leaving only the records you want. But sometimes you are not looking for the actual records. Instead, sometimes you only want to know how many records.

In this example, I will using the [HumanResources].[EmployeeDepartmentHistory] from AdventureWorks2012:

sqlAggr.jpg

Count and Group By

The Count(*) function is used to count the number of rows.

Now, when we combine count(*) with a Group By clause, we can count how many records exist for each element in our group by clause.

All non aggregate columns in the select statement need to be in the group by function or you will get an error.

The code below counts the amount of rows containing each departmentID

sqlAggr1

Having

Run the following query:

select BusinessEntityID, count(*) as amt
from [HumanResources].[EmployeeDepartmentHistory]
group by BusinessEntityID

You will get a list 290 records long. Most of these records will have a 1 in the amt column (meaning there is only one records containing the BusinessEntityID referenced)

If you want to know how many BusinessEntityIDs in this table have more than one record, we just have to add the Having clause to our query.

sqlAggr2

having can only be used in conjunction with aggregate functions.

Min(), Max()

Min and Max return the minimum and maximum of whatever column you you place in the parenthesis

sqlAgg4