XML Parsing: Advanced SQL

If you want to play along with the lesson, use the following code to create the table I will be using:

CREATE TABLE employee_lang (
       emp_nm nvarchar(30) NOT NULL,
       lang nvarchar(255),
       PRIMARY KEY (emp_nm) );
INSERT INTO employee_lang
       (emp_nm, lang)
VALUES
       ('Bob', 'Python, R, Java'),
       ('Lisa', 'R, Java, Ruby, JavaScript'),
       ('Priyanka', 'SQL, Python');

 

XML Parsing

XML parsing is a SQL Server technique for splitting delimited string data stored in a single field into separate rows. Look at the table below:

emp_nm      lang
Bob         Python, R, Java
Lisa        R, Java, Ruby, JavaScript
Priyanka    SQL, Python

This table has two columns: emp_nm (employee name) and lang (the programming languages the employee is proficient in). Notice that the lang column holds multiple values in each record. While this is easy for a human to read, if you want the data to be more machine usable (think pivot tables or statistical analysis in R), you are going to want it to look more like this:

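emp_nm      lang
Bob         Python
Bob         R
Bob         Java
Lisa        R
Lisa        Java
Lisa        Ruby
Lisa        JavaScript
Priyanka    SQL
Priyanka    Python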

Notice that the table has the same two columns, but the lang column now holds only one value per record. So the question is, how do you do this using SQL?

XML parsing

The code used to “parse” out the data in the lang column is below:

SELECT emp_nm
      ,SUBSTRING(LTRIM(RTRIM(m.n.value('.[1]','varchar(8000)'))),1,75) AS lang
FROM
(
    SELECT
        emp_nm
       ,CAST('<XMLRoot><RowData>' + REPLACE([lang],',','</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS x
    FROM employee_lang
) t
CROSS APPLY x.nodes('/XMLRoot/RowData') m(n)

Let’s break it down a little first.

SELECT
    emp_nm
   ,CAST('<XMLRoot><RowData>' + REPLACE([lang],',','</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS x
FROM employee_lang

What the code above does is use the CAST and REPLACE functions to convert the comma-separated values in the lang column into a single XML value. See the results below:

For Bob, this is the result of the CAST/REPLACE code on the lang column:

<XMLRoot><RowData>Python</RowData><RowData> R</RowData><RowData> Java</RowData></XMLRoot>

This is how the code works from the inside out.

REPLACE()

SELECT REPLACE('SQL ROCKS', 'S', '!')

If you run the above code, it will return !QL ROCK!

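REPLACE is saying: everywhere an S is in the string, replace it with a !. In our query, every comma in lang is replaced with '</RowData><RowData>', which closes one RowData element and opens the next.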
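
To see the same function at work on our data, here is a minimal sketch that runs REPLACE on Bob's lang value by itself and then with the opening and closing tags added. The variable @lang and the column aliases are names I made up for illustration; they are not part of the original query.

```sql
-- @lang is a hypothetical variable holding Bob's value from employee_lang
DECLARE @lang nvarchar(255) = N'Python, R, Java';

-- Step 1: swap every comma for a closing tag followed by an opening tag
SELECT REPLACE(@lang, ',', '</RowData><RowData>') AS replaced_only;
-- returns: Python</RowData><RowData> R</RowData><RowData> Java

-- Step 2: add the outer tags so the text forms one complete XML document
SELECT '<XMLRoot><RowData>'
       + REPLACE(@lang, ',', '</RowData><RowData>')
       + '</RowData></XMLRoot>' AS xml_text;
```

At this point the result is still an ordinary string; it only looks like XML.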

CAST()

The CAST function converts the string to the XML data type. This is needed for the next section, where we unpack the XML string.
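
As a rough illustration of why the CAST matters, the sketch below casts Bob's string to XML and then calls an XML method on it, something that is only available on the XML data type. The variable @x and the column aliases are names I picked for the example.

```sql
-- @x is a hypothetical variable; the string is Bob's value after the REPLACE step
DECLARE @x xml = CAST('<XMLRoot><RowData>Python</RowData><RowData> R</RowData><RowData> Java</RowData></XMLRoot>' AS xml);

-- Once the value is XML, we can ask questions about its structure
SELECT @x.value('count(/XMLRoot/RowData)', 'int')          AS row_data_count  -- 3
      ,@x.value('(/XMLRoot/RowData)[1]', 'varchar(8000)')  AS first_lang;     -- Python
```

One assumption to keep in mind: the CAST only succeeds if the wrapped text is well-formed XML, so a lang value containing a character such as & or < would need to be escaped first.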

Cross Apply / Value

SELECT emp_nm
      ,m.n.value('.[1]','varchar(8000)')
FROM
(
    SELECT
        emp_nm
       ,CAST('<XMLRoot><RowData>' + REPLACE([lang],',','</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS x
    FROM employee_lang
) t
CROSS APPLY x.nodes('/XMLRoot/RowData') m(n)

Without going into way too much detail (you can find pages dedicated to CROSS APPLY and value()), I’ll give you the quick breakdown.

First, notice we aliased our CAST() expression as x. So if you look at the CROSS APPLY, you will see we are asking for x.nodes. Had we aliased our cast as y, we would be asking for y.nodes.

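Now look at m(n) at the end of the line. This names the result of nodes(): m is the alias for the table it returns and n is the alias for that table’s single column. You can think of it like an array or list in programming languages, with one entry per row. Keep that in mind for the next step.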

Inside x.nodes() is '/XMLRoot/RowData'. This path tells nodes() to find every RowData element under XMLRoot and return each one as its own row in m, so for Bob:

m(n=1) = Python

m(n=2) = R

m(n=3) = Java

Now we call the value() method on each of those rows. Hence m.n.value(): table m, column n, method value(). Note m and n were just letters I picked; you can use others.
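
Here is a small, self-contained sketch of that step using only Bob's XML, so you can see nodes() and value() without the rest of the query. The variable @x is my own addition, and the second query uses different alias names (langs and item, picked arbitrarily) to show that m and n are not special.

```sql
-- @x is a hypothetical variable holding the XML built for Bob
DECLARE @x xml = '<XMLRoot><RowData>Python</RowData><RowData> R</RowData><RowData> Java</RowData></XMLRoot>';

-- nodes() returns one row per RowData element; value() reads the text of each row
SELECT m.n.value('.[1]', 'varchar(8000)') AS lang
FROM @x.nodes('/XMLRoot/RowData') AS m(n);

-- Same query, different alias names
SELECT langs.item.value('.[1]', 'varchar(8000)') AS lang
FROM @x.nodes('/XMLRoot/RowData') AS langs(item);
```

Each query returns three rows: Python, R, and Java. The last two still carry the leading space that followed the comma in the original string.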

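Inside m.n.value('.[1]','varchar(8000)'), the first argument, '.[1]', selects the current RowData node, and the second is the SQL data type to return. varchar(8000) was used as it should pretty much cover any size string you may have to deal with.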

So the final iteration simply adds LTRIM and RTRIM to get rid of the white space, plus a SUBSTRING that caps each value at 75 characters.
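
To show what the clean-up functions do individually, here is a quick sketch on literal strings. The sample values and column aliases are mine, chosen to mimic the leading space left behind after splitting.

```sql
SELECT LTRIM(RTRIM(' Java'))          AS trimmed      -- 'Java' (spaces removed from both ends)
      ,SUBSTRING('JavaScript', 1, 4)  AS first_four;  -- 'Java' (4 characters starting at position 1)
```

In the full query the SUBSTRING length is 75, so it only matters for values longer than 75 characters.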
