SSRS: Introdution: 1rst Report

SSRS stands for Sql Server Reporting Service. This is Microsoft’s BI reporting tool integrated into their Sql Server platform. SSRS allows you to create, deploy, and manage reports from a server platform.

SSRS comes as part of the SQL Server suite. It is not available as part of Express, but if you buy the developers edition, you will get SSRS (as well as SSIS and SSAS). You may need to download and install it separately. You’ll find it under the title, SQL Server Data Tools (SSDT).

SSDT download link

In this tutorial, I will walk you through creating your first SSRS report. In this example, I created two SQL Server tables from the following Excel files:

JobDataSet

JobDesc

If you aren’t sure how to upload an Excel file to SQL Server, you can go to my tutorial on how to do it. The SQL Server upload tutorial was actually created to support the SSRS lessons.

SQL Server: Importing Excel File to SQL Server

To create a new SSRS project, open Visual Studios (the platform SSDT runs on) and go to File->New->Project

2018-04-06_8-56-18

Select Reporting Services -> Report Server Project

Name your project. I typically leave Create directory for solution checked.

2018-04-06_8-58-27.png

Your new “solution” will open up. I still haven’t really figured out why MS changes the name from Project when creating and opening it to Solution once you are working on it. I am sure someone, somewhere had a reason for it. That person is probably retired now and won’t return any email requests as to why he decided on the wording choice.  We’ll just have to chalk it up to another one of life’s mysteries. Like why is Regis Philbin famous?

But I digress…

Now our “solution” is open, we will see 3 sub folders. Shared Data Sources, Shared Datasets and Reports

2018-04-06_9-01-16

To create a new report, right click on Reports -> Add -> New Item.  Don’t click Add New Report unless you want to meet the world’s most unhelpful Wizard. I call him Gandalf the Drunk…

2018-04-06_9-01-47

After clicking add new item, click Report and name your report.

2018-04-06_9-02-32

Now your new report will open up.

2018-04-06_9-33-25.png

Data Source

So, before we can actually report on anything, we are going to need data. And in order to get data, we are going to need a Data Source. Now you will notice you have Shared Data Sources on the right and Data Sources on the left. Shared Data Sources, once established, can be used in all reports you create. If you create an “embedded” data source in the report, you will only be able to use it in that report.

For this example, we will make a shared data source.

Go to the right under Solution Explorer and right click Shared Data Sources. This time you can click Add New Data Source.

2018-04-06_9-33-48

This wizard is just Gandolf the Tipsy. While I harbor a general dislike for most wizards, this one isn’t completely useless at least.

First name your data set something you will remember later.

Select Type ->Microsoft SQL Server

Click Edit

2018-04-06_9-37-25

Copy and paste your server name in the 2nd box. In this example, my SQL Server is locally installed on my computer, so I just used localhost as my server name. Next select the Database you want to work with from the drop down. I created a database call SSRSTraining for this example

2018-04-06_9-40-23.png

Hit Test Connection, you should get a success message.

2018-04-06_9-40-42

Click Okay, you’ll new see your Data Source in the Solution Explorer

2018-04-06_9-41-33.png

Now go to the left and right click on Data Source for your report. Select Add New Data Source

2018-04-06_9-43-10

Name your data source and click on the Use shared data source reference radio button.

Pick you data source from the drop down. There should only be one to choose from

2018-04-06_9-43-36

Datasets:

Now click okay, go back to the left and right click on Dataset.

2018-04-06_9-44-02

Select New Data Set,

Name the Data Set

Select Use a dataset embedded in my report. This is generally how I do things, as Data Sources are usually reusable, Datasets are more designed for specific reports, so I leave them embedded.

Select your Data source from the drop down

For Query type we are using Text in this example

I am using a simple select all statement from the dbo.JobDataSet table I created

2018-04-06_9-47-16.png

If you click on fields in the upper right, you’ll now see the columns that will be feeding in from the query. You can rename the columns if you wish

2018-04-06_9-48-00.png

For now, let’s just click Okay, now you will see your data set expanded on the left side of your screen.

2018-04-06_9-48-46

Now to the far left, you should see the word Toolbox, click on that. This is a list of the tools we can work with within SSRS. Let’s start with a table

2018-04-06_9-49-09

Click on the table and drag it into the design window in the middle of the screen

2018-04-06_9-49-27.png

Now you can simply drag and drop columns from your dataset into your new table.

2018-04-06_9-49-53.png

By default, a table comes with 3 columns. You can add columns to this table by dragging a field over to the end of the table (note you will see a blue bar indicator letting you know your mouse is in the right spot. The column will populate to the right of the blue bar).

2018-04-06_9-50-24

This will add a new column to the end of your table. You can also use this method to insert a column in between existing table columns

2018-04-06_9-50-39

Now click on Preview button above your table

2018-04-06_10-23-01

When you do, you will get to see your table complete with data. Notice how the job column is too small for the job description.

2018-04-06_9-51-14

Step 5: Formatting

To fix the job column, let’s go back to our Design screen. Do this by clicking on the Design tab in the upper left.

2018-04-06_10-23-01

Now hover your mouse over the table until you get the double arrow icon seen below

2018-04-06_10-23-28

Once you have that, simply click and drag the column over to make it wider

2018-04-06_10-24-09

Since we are in the design window anyway, let us do a little more formatting. Click on the gray box to the left the header row to highlight the entire row. Now we can do things like Bold the font or change the background color

2018-04-06_10-24-49.png

Go back to the preview window to check out your results.

2018-04-06_10-25-55

There you have your very first SSRS Report from top to bottom.

Advertisements

SQL Server: Importing Excel File to SQL Server

Working with data inside a database has many advantages to working with data in an Excel spreadsheet. Luckily SQL Server makes it relatively easy to import data from an Excel File into the database.

We will be using the Excel files below:

JobDataSet

JobDesc

Let’s start by opening SSMS (SQL Server Management Studio)

Next, let’s create a database to hold these files. You don’t need to create a new database to import data, but I am building this tutorial as part of a series on SSRS, so I am building a new database for that purpose.

To create a new Database, right click on Databases on the upper left and click New Database…

2018-04-09_8-55-16.png

Now name your new database, we will just accept the defaults

2018-04-09_8-59-56

Now go your newly created database, right click, and go to Tasks

2018-04-09_9-00-48

From the Tasks sub-menu, select Import Data

2018-04-09_8-57-20.png

The import Wizard will open, simply click Next

2018-04-09_9-01-12.png

Next, select Excel from the drop down

2018-04-09_9-01-41.png

Next, click Browse

2018-04-09_9-02-13

Select your file

2018-04-09_9-05-24.png

Make sure First row has column names is checked and click Next

2018-04-09_9-07-04.png

On the next screen select SQL Server Native Client 11.0 (If you don’t have 11.0 – 10.0 should work)

2018-04-09_9-11-31.png

Make sure the database you want is selected and click next

2018-04-09_9-11-52.png

In this example, we are going to use Copy data from one or more tables or views

2018-04-09_9-12-29.png

Make sure to name the table you want to create in SQL Server (red arrow)

2018-04-09_9-13-23

If you click Preview you can get a look at what the new table will be loaded with

2018-04-09_9-15-34.png

Click Okay on the preview window and click Next on the Import Wizard

Leave the default Run Immediately checked and click next

2018-04-09_9-16-07.png

Review info on the next window and click Finish

2018-04-09_9-16-31

The package will run

Note the blue lettering will let you see how many rows transfer from the Excel file to SQL Server

2018-04-09_9-16-53.png

If you check your database, you will see your tables. (I loaded both spreadsheets in to the database for the upcoming SSRS tutorial)

2018-04-09_9-19-58.png

Finally, run a select * on your new table to see the data you transferred into SQL Server

2018-04-09_9-20-35.png

SQL: User Defined Functions

Functions in SQL work just like functions in most programming languages. If you aren’t familiar with a function, you should know that you are already using them without even knowing it.

Consider this for example:

Select Count(*)
From Table

COUNT() is a function. When you pass it rows from your query, it counts the rows and returns a value in the form of an integer. But COUNT() is a built in function, meaning it came as part of SQL Server, you did not have to create it.

SQL Server allows you the option of creating User Defined Functions, meaning functions you develop yourself. These are handy when you find yourself handling repeated tasks, such as date formatting or string manipulation. Instead of having to repeatedly code a complex command, you can just build a function once and then call on it whenever needed.

Let’s start with a basic example:

2018-04-03_14-23-36.png

Here I created a function called ADD_UP that accepts 2 numbers and outputs the SUM of the two numbers ( yes I know this already available as the built in function SUM(), but I want to start nice and easy)

Lets start by discussing the syntax. The basic syntax for creating a function is as follows:

CREATE FUNCTION name (@var data-type)
RETURNS data-type
AS
BEGIN
   RETURNS (some type of action)
END

In my example we are naming the function ADD_UP and supplying two integer variables: @NUM1 and @NUM2

CREATE FUNCTION ADD_UP (@NUM1 INT, @NUM2 INT)

Then we define the data-type our function will return. In this case, since we are adding 2 integers, our function will return an INT

RETURNS INT

Next we wrap out function in

AS
BEGIN
ENDS

Finally, we perform an action

RETURNS (@NUM1+ @NUM2)

Finally, when you want to call the Function, just use it in a select statement.

(**Note, user defined functions require you to use the schema prefix. Since I just used the default dbo schema, this example uses dbo.ADD_UP)

select dbo.ADD_UP(2,3) as ADDED

and as you see, we get 5 as our answer.

2018-04-03_14-23-36

Now, let’s try something different. Here we are going to work with a date function. In this example I built a function called MNTH that accepts one variable @DT – a date data-type and returns an Integer representing the month of the date passed to it.

Again, all I am really doing is duplicating the built-in function MONTH(), but I wanted to show different data-types

2018-04-03_14-32-54.png

(** getdate() is a built-in function that returns the current date. I ran this SQL on 4/2/2018, so it returns 4 as a result)

Now finally here is an example of how you might use a function in real life. Let’s say you have lots of reports that call for your date to be represented in MM-YYYY format. Instead of having to repeatedly type a complex date formatting, you can build it into a User Defined Function and pass a regular date to the function.

2018-04-03_14-34-36

If you are not familiar with cast(concat(month(@DT),’-‘,year(@DT))as varchar(8))) statement, I’ll break it down here:

Let’s go from the inside out:

concat is a string function meaning to concatenate or “string together” – so

concat(month(@DT),’-‘,year(@DT))

concat(4, ‘-‘, 2018)

4-2018

Cast allows us to convert the output of the concat statement into a string (varchar) data-type

cast(4-2018 as varchar(8)) = ‘4-2018’

Finally, if you want to find your functions after you create them, they are located under your database -> Programmability -> Functions

In this case, I only built Scalar-valued Functions, I’ll cover the other types in future lessons.

2018-04-03_14-39-21.png

 

SQL Server: Linked Servers

SQL Server Linked Servers feature allows you to query data from other databases (even non SQL Server databases) from within SQL Server.

While I believe large data transfer jobs are best handled using an ETL solution like SSIS, sometimes all you want is a just a couple hundred rows of data from another database and setting up an ETL jobs might be overkill.

In this example, I am going to set up a link to a Teradata Data Warehouse.  This will allow me to query data from the warehouse and use it in my SQL Server environment.

(Note that you can connect to any database, such as another SQL Server or Oracle. I am just using Teradata as an example)

First go into SSMS and connect to your SQL Server

Now, in Object Explorer go to Server Objects -> Linked Servers

2018-03-29_12-44-42

Right click on Linked Servers and select New Linked Server…

2018-03-29_12-45-49

For a Teradata connection, I am choosing Microsoft OLE DB Provider for ODBC Drivers

(on a side note, you will need to have the ODBC driver for Teradata installed for this to work)

2018-03-29_12-46-25

Linked Server = just whatever name you want to give your linked server – I chose EDW (short for Enterprise Data Warehouse)

Product Name = again free text, I just wrote Teradata to let people know what kind of product you are connecting too.

Data Souce: This is the server name you are trying to connect to

2018-03-29_12-48-05

Next, click Security tab.

I want to use my own credentials to connect to Teradata so I clicked Be Made Using This Security Context: ( the green arrow)

Type in your Username and password. Click Ok

2018-03-29_12-48-34

Now go to create a new query in SSMS (SQL Server Management Studio)

We will be using a function called OPENQUERY. The syntax is as follows

Select * from OPENQUERY(LINKEDSERVER, QUERY)

2018-03-29_12-50-47

SELECT * FROM OPENQUERY(EDW, 'SELECT USER_NM, LAST_NM, FIRST_NM FROM EMPLOYEE_TABLE')

We can use the data returned from Teradata in SQL Server by simply putting it in a table or a temp table as shown below

SELECT * into ##TempTable FROM OPENQUERY(EDW, 'SELECT USER_NM, LAST_NM, FIRST_NM FROM EMPLOYEE_TABLE')

We can also pass variables to our Teradata query. My preferred method is simply using a stored procedure.

2018-03-29_12-54-32

CREATE PROCEDURE [LINKED_SERVER_QUERY]
       @USER VARCHAR(8) = NULL
AS

declare @query varchar(8000) = 'Select * from OPENQUERY(EDW, ''SELECT USER_NM, LAST_NM, FIRST_NM FROM EMPLOYEE_TABLE WHERE USER_NM =  ''''' + @USER + ''''''')' 
exec(@query)

GO

exec [LINKED_SERVER_QUERY] @USER = 'blarson'

SSIS Lesson 3 – Derived Column

SSIS Lesson 3 – Derived Column

In this lesson, we are going to take the first basic step towards building a data warehouse. We are going to create a historical table from the table we built in Lesson 2. If you didn’t do Lesson 2, you can find the SQL needed to build the table at the bottom of this lesson.

If you remember the table we built in Lesson 2, we had a list of students and the work teams they were assigned to. In lesson 2, this list was designed to change every week, and we only wanted to know what the current week data was.  However, now Principal Skinner has asked if we can keep a listing of all the weeks so he can go back and see how many times someone was assigned to one of the work teams.

The table below is our starting table.

l3

What we want to do is copy the table’s contents into a new historical table and mark it somehow so we know what week each kid was on each team. We are going to do this by creating a new column called INSERT_DT which will record the date the data was added to the historical table.

First things first, let’s build our historical table in SQL Server

CREATE TABLE [dbo].[STUDENT_TEAM_HIST](
       [STUDENT_NM] [varchar](50) NULL,
       [STUDENT_AGE] [int] NULL,
       [STUDENT_TEAM] [varchar](50) NULL,
       [INSERT_DT] [datetime] NULL
)

Now go into SSIS and go to your Training Project

Create a new Package, I called mine Lesson_3

2018-03-16_14-18-13.png

Drag a Data Flow object onto your design window and click on it

2018-03-16_15-03-23.png

Now drag an OLE DB Source box over and click on it

2018-03-16_14-22-51.png

Select dbo.Student_Name from the drop down and click OK

2018-03-16_14-22-36.png

Now we are going to use a new object – Derived Column, drag it over and connect the Blue arrow from the bottom of the OLE DB Source box to it

Click on Derived Column: Purple Arrow (far left) – Name your new column, I named mine INSERT_DT

Green Arrow (Middle) : in the Expression column, type GETDATE()

Yellow Arrow (far right) : once you tab out of the Expression column, you will see your Data Type changed itself to be a timestamp

Red Arrow (Bottom) : hit Okay

2018-03-16_14-25-18

What we just did was create a new column called INSERT_DT which will use the GETDATE() method to get the current date and time. Next we are going to use this column we just created.

Grab and OLE DB Destination and drag it over. Connect it with the outbound arrow from the Derived Column box

2018-03-16_14-41-01

Select the new dbo.STUDENT_TEAM_HIST table we created and next click on Mappings

2018-03-16_14-41-31

You will see you now have four columns feeding into your new destination table. If you go back and look at your source table, you will see you still only have 3 columns there. The fourth column is the one we just derived.

Hit okay, now we are ready to run the package

2018-03-16_14-42-16.png

Right click on Package and hit Execute Package

2018-03-16_14-42-41.png

It should run. Notice the little comments telling you how many rows moved.

2018-03-16_14-43-16.png

Go to SQL Server and check your new table. You will see the INSERT_DT

2018-03-16_14-43-59

Okay, now to see the historical data in action

Let’s change Micah’s team from Red to Blue in our STUDENT_TEAM table

update [dbo].[STUDENT_TEAM]
set STUDENT_TEAM = 'Blue'
where STUDENT_NM = 'Micah'

select *
from dbo.STUDENT_TEAM

2018-03-16_14-45-27.png

Go back to SSIS, hit the red box at the top to end debugging mode

2018-03-16_15-16-29.png

Right click and execute the package again

Now run your query again. Note the two different teams Micah was on are listed. Along with the timestamps so you can tell when he was on which team.

2018-03-16_14-46-29

SQL Query to build table from lesson 2

CREATE TABLE [dbo].[STUDENT_TEAM](
       [STUDENT_NM] [varchar](50) NULL,
       [STUDENT_AGE] [int] NULL,
       [STUDENT_TEAM] [varchar](50) NULL
)

insert into dbo.STUDENT_TEAM
       (STUDENT_NM, STUDENT_AGE, STUDENT_TEAM)
VALUES
       ('Bob', 15, 'Red'),
       ('Claire', 17, 'Blue'),
       ('Chris', 17, 'Blue'),
       ('Candice', 14, 'Red'),
       ('Holly', 16, 'Blue'),
       ('Micah',15,'Red'),
       ('Stephanie', 16, 'Blue'),
       ('Joshua', 14, 'Red'),
       ('Sherrie', 18, 'Blue')

SSIS: Lesson 2 Import data from CSV into database

SSIS: Lesson 2

Import data from CSV into database

In the first lesson we exported data from a table in our SQL Server database to a CSV file. Today we are going to do just the opposite, we are going to bring data from a CSV file into our database.

I like to try to provide close to real world scenarios when I make my tutorials. Obviously in the simpler intro lessons, that can be difficult. That being said, for this lesson, imagine you manage a database for a school. Every week 9 students are selected to be on either the Red Team (Hall Safety Monitors) or the Blue Team (Lunch Room Aids). Every Friday, the Vice Principal Smith picks the student names at random and puts them into a CSV file that is emailed out to all the teachers. You point out that if you could just get the names into your database, you could put the information on the school’s main Intranet page, so the teachers don’t need to download and open the same CSV file 300 times a week.

So our task is to upload the new weekly CSV file into the database every Friday at the end of the school day.

Here is this week’s CSV file:

Lesson_2

First, before heading into SSIS, we need to create a destination table in our database for the CSV file. Here is the file we want to import into our database, you’ll note it has three columns, two columns are strings and one is an integer:

2018-03-12_12-28-28

So go into SSMS (SQL Server Management Studio) and run the following query to create the table:

CREATE TABLE STUDENT_TEAM (
 STUDENT_NM varchar(50),
 STUDENT_AGE int,
 STUDENT_TEAM varchar(50))
GO

This code will create an empty table with 3 columns named STUDENT_NM, STUDENT_AGE, STUDENT_TEAM.

Now you can minimize SSMS and open up Visual Studios Data Tools

Once open, go into your Training Project and create a new SSIS Package. I’m naming this one Lesson_2

2018-03-12_12-22-19

Now grab an Execute SQL Task from the SSIS Toolbox and drag it to the Design window.

Notice the red arrow at the bottom, there is already a Connection set up for the Sandbox database. This is because when we made this connection in Lesson_1, we made it by clicking on the Connection Managers in the Solution Explorer window.

2018-03-12_12-24-46.png

1

This creates a (Project) connection – one that can be seen in every package. If you only want your connection to exist inside a single package (this becomes important as the number of packages you have grows, otherwise you’d have thousands of connections in Solution Explorer) – you can right click anywhere inside the Connection Managers box at bottom of the design window.

2018-03-12_12-32-41.png

But this particular package will work fine just using the existing connection, we can move on without creating a new one.

Double Click on your Execute SQL Task icon in the design window.

In the window that pops up, you can rename your Task, provide a description of the task if you would like. This does come in handy when you have to go back to old package. It makes it easier to understand what is going on.

Down at the green arrow in the picture, you’ll have to selection a Connection from the drop down. Luckily were currently only have one to choose from.

2018-03-12_12-38-43

Next go to SQL Statement and click on the little ‘…’ box that appears in the right corner.

2018-03-12_12-41-59

In the pop up window, this is where you enter your SQL Query. In this case were going to delete all data from table in this box. The reason for this step is so that last week’s students won’t still be in the table on Monday morning. We only want the current students in our database.

2018-03-12_14-18-53

Now click Okay on this window and click OKAY again to get back to the Design Window

Next drag over a Data Flow Task. Connect the Create Table task by dragging the arrow over from the Create Table task. Note the direction of the arrow, this is the order in which tasks will run.

2018-03-12_12-51-28

Now open Data Flow Task. In the SSIS Toolbox, go to Other Sources and drag a Flat File Source over to the Design Window.

Double click on your new Flat File Source box and a new window will open. Click on New… next to Flat File Connection Manager

2018-03-12_12-59-37.png

On the new window, click Browse

2018-03-12_13-01-20

Browse to your CSV File. I created a folder called SSIS Training to make it easy to store all my files.

2018-03-12_13-02-21

If you click on Columns in the left window, you can see what the CSV file contains.

2018-03-12_13-09-10

Finally, before existing this window, Click back on General and make sure Column names in the first data row is checked.

2018-03-12_13-10-32

Now go back to the Design window and drag an OLE DB Destination box onto the design window. Connect the blue arrow from the Flat File Source to the OLE DB Destination

2018-03-12_13-13-52.png

Now double click on OLE DB Destination

Select STUDENT_TEAM from Name of table or the view

2018-03-12_14-20-06_1

Now Click on Mappings

2018-03-12_14-20-06.png

Everything should line up nicely since we named the table columns to match the CSV file.

2018-03-12_14-26-33.png

Now click okay to go back to the Design Window, Click on Control Flow in the upper left to get back to the main page of the package

2018-03-12_14-27-44.png

Right click the package and click Execute Package

2018-03-12_14-28-06.png

If all goes well, you should end up with two green checks

2018-03-12_14-33-37

If this happens, check to make sure your CSV file is not currently open, that can cause errors. If so, close the file and try executing the package again

2018-03-12_14-31-00

Finally, check the table in your SQL Server to see if it populated

2018-03-12_14-35-30.png

If you want to see how it would work the next week, go into the CSV file and change the data. When you run the package again, the names in the SQL table will change too.

 

SQL: Check if table exists

To check if a table exists in SQL Server, you can use the INFORMATION_SCHEMA.TABLES table.

Running the following code, produces the results below:

USE SANDBOX 
GO;

Select *
from INFORMATION_SCHEMA.TABLES
GO;

1

You can use this table with an IF THEN clause do determine how your query responds whether or not a table exists.

IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES
           WHERE TABLE_NAME = N'employee_id')
BEGIN
  PRINT 'Yes'
END

2

IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES
           WHERE TABLE_NAME = N'employee_ids')
BEGIN
  PRINT 'Yes'
END

ELSE

BEGIN
  PRINT 'No'
End

3

One of the more common uses I find for this when I need to create a table in a script. I want to make sure a table with same name doesn’t already exist, or my query will fail. So I write a query like the one below.

IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES
           WHERE TABLE_NAME = N'employee_id')
BEGIN
  drop table employee_id
END

ELSE

BEGIN
 CREATE TABLE employee_id (
        emp_nm nvarchar(30) not null,
             emp_id nvarchar(8),
             b_emp_id nvarchar(8)
             PRIMARY KEY(emp_nm) );

INSERT INTO employee_id
       (emp_nm, emp_id)
VALUES
       ('Bob', 'A1234567'),
       ('Lisa', 'A1234568')

INSERT INTO employee_id
       (emp_nm, b_emp_id)
VALUES
       ('Priyanka', 'B1234567');
End

SQL: Coalesce

Coalesce is a simple but useful SQL function I use quite often. It returns the first non-null value in a list of values.

A common real world application for this function is when you are trying to join data from multiple sources. If you work for a company known for acquiring other companies, your HR tables are bound to be full of all kinds of inconsistencies.

In this example below, you will see a snip-it of an HR table created by merging tables from two companies. The first two employees have an emp_id and the last one has a b_emp_id. The last one, Priyanka, is an employee from a newly acquired company, so her emp_id is from the HR table of the newly acquired company.

c1

So if you just want to merge the 2 emp_id columns into one, you can use the coalesce() function.

The syntax for coalesce is: coalesce (col1,col2,….)  as alias

Coalesce takes the first non-null. In the example below, we check emp_id first, if it has a value, we use that. If emp_id is null, we look for the value in b_emp_id.

select emp_nm,
coalesce (emp_id, b_emp_id) as emp_id
from employee_id

c2

If you want to try this for yourself, here is the code to build the table out for you.

CREATE TABLE employee_id (
        emp_nm nvarchar(30) not null,
             emp_id nvarchar(8),
             b_emp_id nvarchar(8)
             PRIMARY KEY(emp_nm) );
INSERT INTO employee_id
       (emp_nm, emp_id)
VALUES
       ('Bob', 'A1234567'),
       ('Lisa', 'A1234568')
INSERT INTO employee_id
       (emp_nm, b_emp_id)
VALUES
       ('Priyanka', 'B1234567');

 

XML Parsing: Advanced SQL

If you want to play along with the lesson, use the following code to create the table I will be using:

CREATE TABLE employee_lang (
        emp_nm nvarchar(30) not null,
             lang nvarchar(255),
             PRIMARY KEY(emp_nm) );
INSERT INTO employee_lang
       (emp_nm, lang)
VALUES
       ('Bob', 'Python, R, Java'),
       ('Lisa', 'R, Java, Ruby, JavaScript'),
       ('Priyanka', 'SQL, Python');

 

XML Parsing

XML parsing is a SQL method for separating string data found in a single field in a table. Look at the table below:

1

This table has two columns, emp_nm (employee name), lang (programming languages the employee is proficient in). Notice that the column lang has multiple values for each record. While this is easily human readable, if you want it to more machine usable (think Pivot tables or R statistical analysis), you are going to want your data to look more this this:

2

Notice now the table has the same two columns, but the lang column now only has 1 word per record. So the question is, how do you do this using SQL?

XML parsing

The code used to “parse” out the data in the lang column is below:

SELECT emp_nm
,SUBSTRING(LTRIM(RTRIM(m.n.value('.[1]','varchar(8000)'))),1,75) AS lang
FROM
(
SELECT
emp_nm
,CAST('<XMLRoot><RowData>' + REPLACE([lang],',','</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS x
FROM employee_lang
)t
CROSS APPLY x.nodes('/XMLRoot/RowData')m(n)

Let’s break it down a little first.

SELECT
emp_nm
,CAST('<XMLRoot><RowData>' + REPLACE([lang],',','</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS x
FROM  employee_lang

What we are doing with the code about is using a CAST and REPLACE functions to convert the elements in column lang into a XML line.  See results below:

3

For Bob, this is the result of the CAST/REPLACE code on the lang column

<XMLRoot><RowData>Python</RowData><RowData> R</RowData><RowData> Java</RowData></XMLRoot>

This is how the code works from the inside out.

REPLACE()

select REPLACE (‘SQL ROCKS’, ‘S’, ‘!’)

If you run the above code, it will return !QL ROCK!

Replace is saying — everywhere an S is in the string, replace it with a !

CAST()

The cast function is casting the string as an XML data point. This is needed for the next section, when we unpack the XLM string

Cross Apply / Value

SELECT emp_nm
,m.n.value('.[1]','varchar(8000)')
FROM
(
SELECT
emp_nm
,CAST('<XMLRoot><RowData>' + REPLACE([lang],',','</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS x
FROM employee_lang
)t
CROSS APPLY x.nodes('/XMLRoot/RowData')m(n)

So without going into way too much detail, you can find pages dedicated to Cross Apply and Value(), I’ll give you the quick breakdown.

First notice we aliased our CAST() statement as x.  So if you look at the CROSS APPLY, you will see we are asking to look at x.nodes. Had we aliased our cast y, we would be looking at y.nodes.

Now look at m(n) at the end of the line. This is like an array or list in programming languages. Keep that in mind for the next step.

Inside the x.nodes() is ‘/XMLRoot/RowData’ , this is telling us to assign to m everything following /XMLRoot and to iterate n by everything following /RowData, so for Bob:

m(n=1) = Python

m(n=2) = R

m(n=3) = Java

Now we pass that array m(n) to our Value() method.  Hence m.n.value().  Note m and n were just letters I picked, you can use others.

Inside m.n.value(‘.[1]’,’varchar(8000)’) was used as it should pretty much cover any size string you may have to deal with.

So the final iteration simply add as SUBSTRING to clean it up and get rid of white space

4

SQL: Case Statement

The Case Statement is SQL’s version of IF/THEN logic statements.

In this example, I am using the AdventureWorks2012 database, which is free to download. You do not need this to follow the lesson, but if you want to play along, follow this link: SQL: SELECT Statement

 

Let’s start by looking at our data, using the following Select Statement:

SELECT 
 [NationalIDNumber]
 ,[LoginID]
 ,[JobTitle]
 ,[BirthDate]
 ,[MaritalStatus]
 ,[Gender]
 ,[HireDate] 
 FROM [AdventureWorks2012].[HumanResources].[Employee]

Here is the output

sqlagg3

Now what if you don’t really care about the particular employee birthday, you simply want to know who is over 40 and who is 40 or younger.

To do that, we need a mechanism to determine an age above or below 40 from the birth date. In most programming languages, you would simply use an If / Then statement. In SQL, we need to use a Case Statement.

The Case Syntax is simple. Let’s assume a variable A, set to either 1 or 0 representing True or False.

Case
     when A = 1 Then "True"
     when B = 0 Then "False"
End

Note the statement is started with the keyword Case and closed with the keyword End.

In the middle, the logic is controlled using When and Then. When <condition> Then  <Action>

If needed, you can append an Else to your statement to cover conditions not specified in your statements.

Case
     when A = 1 Then "True"
     when B = 0 Then "False"
     else "Neither" 
End

 

 

Now, let’s apply it to our case above. Notice the Case statement is put right into my Select statement just as if it were a common field.

select loginID,
 [NationalIDNumber]
 ,[LoginID]
 ,[JobTitle]
 ,
 case 
    when BirthDate < '1977-01-01' then 'Over40'
    when BirthDate >= '1977-01-01' then 'Under40'
 end
 ,[MaritalStatus]
 ,[Gender]
 ,[HireDate]
 FROM [AdventureWorks2012].[HumanResources].[Employee]

Here are my results. You will note a new column with values Over40 or Under40

2017-01-09_11-29-42.jpg

But what does that column mean? Let’s name it to make it clear.  After the End keyword add as Over40

 select loginID,
 [NationalIDNumber]
 ,[LoginID]
 ,[JobTitle],
 case 
        when BirthDate < '1977-01-01' then 'Over40'
         when BirthDate >= '1977-01-01' then 'Under40'
 end as Over40,
 [MaritalStatus]
 ,[Gender]
 ,[HireDate]
 FROM [AdventureWorks2012].[HumanResources].[Employee]

Here, now the column has a name.

2017-01-09_11-32-09.jpg