R: Connecting to SQL Server Database

You can query data from a SQL Server database directly from R using the RODBC package.

install.packages("RODBC")

First you need to form a connection

library(RODBC)
##connection string
cn <- odbcDriverConnect(connection="Driver={SQL Server Native Client 11.0};server=localhost; database=SSRSTraining;trusted_connection=yes;")

We use the odbcDriverConnect() function. Inside we pass a connection = value

Driver = {SQL Server Native Client 11.0};  — this is based on the version of SQL Server you have

server=localhost;  — I used localhost because the SQL Server was on the same computer I was working with. Otherwise, pass the server name

database=SSRSTraining; — name of database I want to work with

trusted_connection=yes; — this means I am able to pass my Windows credentials.

If you don’t have a trusted connect pass the user Id and password like this

uid = userName; pwd = Password;

Note each parameter is separated by a semicolon

Query the database

> ##passes query to SQL Server
> df <- sqlQuery(cn, "select * FROM [SSRSTraining].[dbo].[JobDataSet]")
> head(df)

    Name              Job Hours Complete
1  Sally Predictive Model     1        n
2 Philip      Maintanence    10        n
3    Tom    Ad-hoc Report    12        y
4    Bob             SSRS     3        y
5 Philip         Tableau      7        n
6    Tom         Tableau      9        n

using sqlQuery() – pass through the connection string (cn) and enclose your query in ” ”

 

 

 

Advertisements

R: Importing Excel Files

To work with Excel files in R, you can use the readxl library

install.packages(“readxl”)

Use the read_excel() function to read and Excel workbook

> library(readxl)
>
> ## read excel file
> df1 <- read_excel("r_excel.xlsx")
> head(df1)
# A tibble: 6 x 4
  Name   Job              Hours Complete
  <chr>  <chr>            <dbl> <chr>  
1 Sally  Predictive Model    1. n      
2 Philip Maintanence        10. n      
3 Tom    Ad-hoc Report      12. y      
4 Bob    SSRS                3. y      
5 Philip Tableau             7. n      
6 Tom    Tableau             9. n

By default, read_excel() reads only the first sheet in the Excel file. To read other sheets using the sheet key word.

> ## read  sheet 2
> df2 <- read_excel("r_excel.xlsx", sheet =2)
> head(df2)

# A tibble: 6 x 2
  Animal Num_Legs
  <chr>     <dbl>
1 Dog          4.
2 Duck         2.
3 Snake        0.
4 Horse        4.
5 Spider       8.
6 Human        2.

The next example, we are reading range B2 – C6 on sheet 1 (same as Excel’s range function)

> ## read sheet 1, range B2 - C6
> df3 <- read_excel("r_excel.xlsx", sheet =1, range = "B2:C6")
> df3
# A tibble: 4 x 2
  `Predictive Model`   `1`
  <chr>              <dbl>
1 Maintanence          10.
2 Ad-hoc Report        12.
3 SSRS                  3.
4 Tableau               7.

In this last example, we are importing only the first 4 rows on sheet 2

> ## read sheet 2, first 4 rows only
> df4 <- read_excel("r_excel.xlsx", sheet = 2, n_max = 4)
> df4
# A tibble: 4 x 2
  Animal Num_Legs
  <chr>     <dbl>
1 Dog          4.
2 Duck         2.
3 Snake        0.
4 Horse        4.

Python: Simulate Blockchain Mining

In my earlier tutorial, I demonstrated how to use the Python library hashlib to create a sha256 hash function. Now, using Python, I am going to demonstrate the principle of blockchain mining. Again using BitCoin as my model, I will be trying to find a nonce value that will result in a hash value below a predetermined target.

We will start by simply enumerating an integer through our sha 256 hash function until we find a hash with 4 leading zeros.

I used a while loop, passing the variable “y” through my hashing function each time the loop runs. I then inspect the first 4 digits [:4] of my hash value. If the first four digits equal 0000 then I exit the loop by setting the found variable to 1

(*note, a hash value is a string – hence the need for quotes around ‘0000’)

2018-04-27_13-30-18

As you can see in the version above, it took 88445 iterations to find an acceptable hash value

Now, using the basic example of a blockchain I gave in an earlier lesson, let’s simulate mining a block

2018-04-04_12-39-34

You’ll see, I am now combining the block number, nonce, data, and previous hash of my simulated block and passing it through my encryption function.  Just like in BitCoin, the only value I change per iteration is the Nonce. I keep passing my block through the hashing function until I find the Nonce that gives me a hash below the target.

2018-04-27_13-36-38

Now, let’s lower the target value to 6 leading zeros. This should result in a longer runtime to get your hash

2018-04-27_13-37-16

To measure the run time difference, let’s add some time stamps to our code

2018-04-27_13-39-31.png

So, I am using the timestamp function twice. D1 will be our start time, d2 will be our end time, and I am subtracting d1 from d2 to get our elapsed time. In the example below, my elapsed time was 5 secs

2018-04-27_13-50-43

Now, let’s bump the target down to 7 leading zeros. Now this brings my elapsed computing time to 20 minutes. That is a considerable commitment of resources. You can see why they call it a “proof of work” now.

2018-04-27_14-26-04

 

Blockchain: Mining

This was, without a question, the most confusing aspect of blockchain to me when I first tried to learn it. Maybe it was because all the buzz around BitCoin mining. I usually find that when a topic becomes popular, misinformation spreads just as fast, if not faster, than actual information.

In an attempt to explain this topic I am going to be using BitCoin as my primary example. I am doing this mainly because I am sure that is where most of you first heard of the term blockchain mining. I will however, begin with a more generic explanation.

First off, let us separate blockchain mining from the idea of financial reward. Yes, in BitCoin, the miner who successfully mines a block is rewarded with BitCoin, but that is not a required part of a blockchain environment.

Mining, at its simplest form, just means successfully adding a new block to the blockchain.

So why can’t you just add a new block like you add a new element to a list or array in any other programming language? This has to do with the decentralized nature of blockchain. Since there is no centralized authority, blockchains rely on group consensus to verify that a new block added to the chain is valid. Keep in mind, this group is made up of anonymous nodes all over the world who do not know each other, and have no good reason to trust one another.

So, if you were to add a new block to the chain, the rest of the chain would need a mechanism in place that gives them time to update their copy of the chain and verify the block you added is good. So effectively we need a pause button.

How do we do that fairly and in a random manner that doesn’t allow for gaming of the system? We force anyone wishing to add a new block to show what is known as proof of work. Proof of work is proof that the person wishing to add the new block has completed a complex mathematical puzzle that required some level of resource allocation on their end. This effectively means they have skin in the game. There is a “cost” associated on their end. This “cost” deters the typical denial of service or blasting type attachk.

In BitCoin the mining process goes like this:

  1. Bob wants to buy a guitar from Philip. They agree on a price of .2 BitCoin and through what is known as their BitCoin wallet a transaction is sent out to the BitCoin universe. A BitCoin wallet is a software client that the person trading with BitCoin uses. From the end user’s point of view, it is a lot like a Paypal type interaction.
  2. Once the transaction is out in the ether, nodes (computers) known as BitCoin miners verify the transaction using a set of established rules built into the BitCoin software they are all running. These rules verify sender and receiver public keys, timestamps, etc. Once verified, the transaction is put into a queue.
  3. Next the bitcoin miners build a new block. The goal is to make their new block the next block in the chain. This block with contain the following items:
    1. Block number – just the next number in the line
    2. Previous Hash – this the hash value of the current last block in the chain
    3. Transactions – they will fill the block with verified transactions from the queue
    4. A timestamp
    5. The Nonce

On a side note – BitCoin only releases a block every 10 minutes. This is a design decision made by the makers of BitCoin, this is not a requirement of Blockchain

Once they have all of that information, they pass their block into a hashing algorithm and get a hash for their new block – This is where mining gets interesting

You see, not any old hash will do. In order for the block to be added to the chain, the hash must be less than the target hash.

The target hash is established by the initial creator of the blockchain. It can be whatever they chose. In the case of BitCoin, the target hash is actually programmed to drop lower and lower every couple thousand blocks.

Okay, so what is the big deal about trying to get below the hash?

Let’s consider this problem using a 6 digit number. (keep in mind that sha256 uses a 64 digit hexadecimal number – much bigger).

So we all agree that a hash is nothing more than creating a random number. Sure it looks funny in hexadecimal, but it can be converted to a base 10 number we all understand.

So, when I pass anything into my imaginary 6 digit hashing function, I can expect 1 million results, (if I only consider positive numbers) – 000,000 to 999,999

Now, let’s set a target for our hash. Let’s say that in order for a hash to be under the target, it must have a first digit of 0. I know what you are thinking – big deal!  Well, actually, it kind of is. We have just gone from 1 million possible hashes to 100,000.  We have effectively made 9 out of 10 available hashes invalid. We can now only accept 000,000 – 099,999. Now, let’s make the first 3 digits of our target 0.  So now we have gone from 1 million possible hashes to 1000. 1000/1,000,000 = .001. So every time you run a hashing function, you now have a 0.1% chance of getting a valid hash.

Now think about our 64 digit hexadecimal number. It has a maximum value of 18,446,744,073,709,551,615 when converted to base 10. So If I made a requirement for the first 6 digits to be 0, we are now looking at: 99,999,999,999,999/18,446,744,073,709,551,615

ca

Every time you run your hashing function, you have a 0.00054% chance of getting one below the target. So as you can see, in order to get a hash below the target, you will most likely have to use a brute force approach.

And that is what miners do. In fact the current difficultly related to finding a hash below the target in BitCoin has led to hundreds of thousands of nodes teaming up together to find a good hash in a brute force manner.

The way they find the hash is through the Nonce. You see, if you pass the word “dog” through my imaginary hash, you will get 123456. If you pass it to my hash 1 million times, you will get 123456 1 million times. So how it working in Blockchain, is we add the nonce to the hash, so dog+1 will give us 879602 and dog+2 will give us 258665. We will repeat this, enumerating the nonce until we get a good hash: dog+28549 gives us 000587. The nonce that gave us that hash 28549 – is called the golden nonce.

And in the world of BitCoin, once you have a golden nonce, you have your proof of work. You can now place your block on the chain.

It is called a proof of work, because other nodes on the blockchain  can very quickly verify your work. They just pass your block with your golden nonce and they will get a hash of 000587, which is below the target. When enough nodes have verified your proof of work ( a consensus), your block becomes locked into the chain, it can no longer be changed or removed.

Okay, so what about the reward BitCoin miners get? Well, built into the BitCoin algorithm, each block mined with worth an ever decreasing number of BitCoin. At the time of this writing, I believe a mined block is worth 12.5 BitCoins. On top of that, the miners also get transaction fees from the people wishing to buy something with BitCoin. In the example above, Bob might offer up .1 BitCoin as a transaction fee as encouragement for some miner to put their transaction into the Blockchain. Keep in mind, there is no bank here, no centralized entity. So the transaction fee is the fee you pay help encourage total strangers to use their time and electricity to verify and move your transaction into the blockchain.

I promise to create another lesson diving deeper into transaction fees, until then, I hope this offers up at basic understanding of how mining works.

Python: Create a Blockchain Hash Function

If you are at all like me, reading about a concept is one thing. Actually practicing it though, that helps me to actually understand it. If you have been reading my blockchain tutorial, or if you came from an outside tutorial, then you have undoubtedly read enough about cryptographic hashes.

Enough reading, let’s make one:

( if you are unfamiliar with crytographic hashes, you can reference my tutorial on them here: Blockchain: Cryptographic Hash )

For this example, I am using the Anaconda Python 3 distribution.

Like most things in Python, creating a hash is as simple as importing a library someone has already created for us. In this case, that library is: hashlib

So our first step is to import hashlib

import hashlib

Now let us take a moment to learn the syntax require to create a cryptographic hash with hashlib. In this example, I am using the SHA 256 hashing algorithm. I am using this because it is the same algorithm used by BitCoin.

Here is the syntax used

hashlib.sha256(string.encode()).hexdigest()

To understand the syntax, we are calling the hashlib method sha256(): hashlib.sha256()

Inside the brackets, we are entering the string we want to encode in the hash. Yes it must be a string for this function to work.

Still inside the brackets we use the method .encode() to (surprise, surprise) ENCODE the string as a hash

Finally, I added the method .hexdigest() to have the algorithm return our hash in hexadecimal format. This format will help in understanding future lessons on blockchain mining.

So in the example below, you can see that I assigned the variable x the string ‘doggy’. I then passed x to our hash function. The output can be seen below.

2018-04-19_15-52-05.png

Now a hash can hold much more than just a simple word. Below, I have passed the Gettysburg Address to the hashing function.

(**note the ”’ ”’ triple quotes. Those are used in Python if your string takes up more than one line **)

2018-04-19_15-54-59

Now I try passing a number. You will notice I get an error.

2018-04-19_15-55-41.png

To avoid the error, I turn the integer 8 into a string with the str() function

2018-04-19_15-56-17.png

Below I concatenation a string and an integer.

2018-04-19_15-57-13.png

Last I want to show the avalanche effect of the hash function.

2018-04-19_15-58-01

By simply changing the first letter from an uppercase T to a lowercase t the hash changes completely. This is a requirement for hashing functions. If the hash did not change dramatically from a small change to the string, it would be easy to reverse engineer the hash. This is known as the avalanche effect.

2018-04-19_15-58-27

 

SSRS: Drill down

Let’s make the report at little more readable and interactive.

We are going to do a drill down on the names in our report. This mean when the report first opens, you will only see one row for each name. If you want to see the details of that person, you can click on a + by their name and the details for that person will pop down like an accordion.

Below your report design window, you should see another window called Row Groups

Go to the area marked Details (green arrow) and right click – Group Properties…

1

Click Visibility > Hide > Display can be toggled.

(Note I am selecting Name1 not Name. If you remember Name was the column we hid. Name1 is the column created when we created the Parent Group)

2.png

Now when you open your report, you will see it is in drill down format.

3.png

SSRS: Grouping

In our last lesson we made the basic report seen below:

2018-04-06_10-25-55

**Note, you can follow the link below to the first lesson:

SSRS: Introdution: 1rst Report

Grouping

Our data is currently in table form, but is otherwise still nothing more than a raw data dump. Let’s make our report a little nicer with some grouping. Right click on the data row (not header) in your table. Mouse over to Add Group and select Parent Group…

2018-04-06_10-34-35.png

In the new window, select [Name] from the drop down.

2018-04-06_10-35-12

Click Add group header and Add group footer boxes. Now click OK

2018-04-06_10-35-27.png

Now a group has been added to your report.

2018-04-06_10-36-00

If you click on preview, you will now see the table is grouped by Names

2018-04-06_10-36-27.png

But you will notice we now have 2 columns showing the Name, one – our new grouping column and the other – the original column. This is redundant. To get rid of it, go back to Design, right click on the second name column and select Column Visibility…

2018-04-06_10-36-52

When the new window opens up, click Hide.

2018-04-06_10-37-11.png

Now when you look at the report now, you will see the second Name column is now hidden.

2018-04-06_10-38-16.png

Running Total:

Next, let’s set up a running total for Hours spent on each job. To do so, right click on the Hours text box and select Add Total

2018-04-06_10-39-04

Now when we go back to preview, we will see at total in the group footer for each person

2018-04-06_10-39-36

Average:

Now what if we wanted an average instead? Right click on the textbox that says [Sum(Hours)] and select Expression

2018-04-06_10-40-15

You can just type = Avg(Fields!Hours.Value) in the expression builder box, but if you don’t know the code, you can use information in the boxes below. As you can see in the example below, if you go to Common Functions > Aggregate you will see the code for lots of functions like Average, count, standard deviation.

2018-04-06_10-41-25

Now when you go to preview, you will see an average.

But now we have a new problem. If you are trying to average something like work hours, odds are you will not need to go out to 10 decimal places. So a number like 8.272727272727 is pretty much ridiculous for a report like this.

2018-04-06_10-42-19.png

Now go to Number > Number and set the Decimal Places to 2

2018-04-06_10-43-26.png

So if you look at it again, you will see you only have 2 decimal points now.

2018-04-06_10-46-41

 

 

Blockchain: P2p distributed networks

Another concept you need to be familiar with to understand Blockchain is the concept of a P2p distributed network. This is the physical architecture that allows Blockchain to work and provides a blockchain with redundancy.

The P2P in P2P distributed network stands for peer to peer, indicating a network comprised of peers. What do I mean by that? The majority of computer networks in place right now are what is known as Server/Client networks.

In the picture below, the center square represents a server, with the boxes around it representing nodes (or in your case, the computer/tablet/phone you are reading this on). When you want to view a web page, you send a request from your node to a server. The server will then respond with the requested information.

p2p1.png

While this works well, it does have some drawbacks.   First off, since the server is central point of communication and the holder of all the information (webpages, databases, etc), if the server goes down, the network is essentially dead. This is the whole idea behind one of the more successful methods of cyber attack – the Denial of Service in which a server is targeted with more traffic than it can handle, shutting it down. You will often see it called a Distributed Denial of Server of DDoS as in order to hit the server with enough traffic to break it, hackers use multiple computers synced to deliver enough requests to the server all at the same time, overwhelming it. In other words, the attack is “distributed” across multiple computers.

p2p2

Blockchain does not use a server client approach. Instead it uses a P2p or peer to peer network to function.  In a peer to peer, the nodes (laptops, tablets, etc) all talk directly to each other. Instead of a server holding all the information, the data that makes up the blockchain is instead distributed across all the different nodes. So the more nodes that are part of the blockchain, the more copies of it that exist.

p2p3.png

This works great for redundancy as even if you took out a couple of nodes in the network, it would still be able to function as normal. And as we will see a future lesson, even if you were able to hack in and corrupt the blockchain in one of the nodes, the fact that copies of it exist on all the other nodes protect it from corruption.

This architecture is also at the heart of philosophy around crypto-currencies like BitCoin. Unlike traditional banking systems that have centralized management, BitCoin is programmed with a deflationary policy that no one person (or group of person) can control. This is an interesting economic experiment unfurling before all of us. And I for one am curious to see how it plays out. While organizations like the Fed (in the United States) have done a relatively good job of keeping the US dollar strong, poor centralized economic management has spelled disaster in countries like Venezuela and Zimbabwe.

I’ll discuss more on the economic theory behind BitCoin in later lessons. It is enough for now for you to know that the P2P decentralized nature of the network is all part of the design in ensuring no one person can make such drastic changes.

If you want to learn more about networks and how they interact, here are some further resources:

https://www.khanacademy.org/computing/computer-science/internet-intro

https://www.techsoup.org/support/articles-and-how-tos/networking-101-concepts-and-definitions

http://www.tomshardware.com/reviews/local-area-network-wi-fi-wireless,3020.html

SSRS: Introdution: 1rst Report

SSRS stands for Sql Server Reporting Service. This is Microsoft’s BI reporting tool integrated into their Sql Server platform. SSRS allows you to create, deploy, and manage reports from a server platform.

SSRS comes as part of the SQL Server suite. It is not available as part of Express, but if you buy the developers edition, you will get SSRS (as well as SSIS and SSAS). You may need to download and install it separately. You’ll find it under the title, SQL Server Data Tools (SSDT).

SSDT download link

In this tutorial, I will walk you through creating your first SSRS report. In this example, I created two SQL Server tables from the following Excel files:

JobDataSet

JobDesc

If you aren’t sure how to upload an Excel file to SQL Server, you can go to my tutorial on how to do it. The SQL Server upload tutorial was actually created to support the SSRS lessons.

SQL Server: Importing Excel File to SQL Server

To create a new SSRS project, open Visual Studios (the platform SSDT runs on) and go to File->New->Project

2018-04-06_8-56-18

Select Reporting Services -> Report Server Project

Name your project. I typically leave Create directory for solution checked.

2018-04-06_8-58-27.png

Your new “solution” will open up. I still haven’t really figured out why MS changes the name from Project when creating and opening it to Solution once you are working on it. I am sure someone, somewhere had a reason for it. That person is probably retired now and won’t return any email requests as to why he decided on the wording choice.  We’ll just have to chalk it up to another one of life’s mysteries. Like why is Regis Philbin famous?

But I digress…

Now our “solution” is open, we will see 3 sub folders. Shared Data Sources, Shared Datasets and Reports

2018-04-06_9-01-16

To create a new report, right click on Reports -> Add -> New Item.  Don’t click Add New Report unless you want to meet the world’s most unhelpful Wizard. I call him Gandalf the Drunk…

2018-04-06_9-01-47

After clicking add new item, click Report and name your report.

2018-04-06_9-02-32

Now your new report will open up.

2018-04-06_9-33-25.png

Data Source

So, before we can actually report on anything, we are going to need data. And in order to get data, we are going to need a Data Source. Now you will notice you have Shared Data Sources on the right and Data Sources on the left. Shared Data Sources, once established, can be used in all reports you create. If you create an “embedded” data source in the report, you will only be able to use it in that report.

For this example, we will make a shared data source.

Go to the right under Solution Explorer and right click Shared Data Sources. This time you can click Add New Data Source.

2018-04-06_9-33-48

This wizard is just Gandolf the Tipsy. While I harbor a general dislike for most wizards, this one isn’t completely useless at least.

First name your data set something you will remember later.

Select Type ->Microsoft SQL Server

Click Edit

2018-04-06_9-37-25

Copy and paste your server name in the 2nd box. In this example, my SQL Server is locally installed on my computer, so I just used localhost as my server name. Next select the Database you want to work with from the drop down. I created a database call SSRSTraining for this example

2018-04-06_9-40-23.png

Hit Test Connection, you should get a success message.

2018-04-06_9-40-42

Click Okay, you’ll new see your Data Source in the Solution Explorer

2018-04-06_9-41-33.png

Now go to the left and right click on Data Source for your report. Select Add New Data Source

2018-04-06_9-43-10

Name your data source and click on the Use shared data source reference radio button.

Pick you data source from the drop down. There should only be one to choose from

2018-04-06_9-43-36

Datasets:

Now click okay, go back to the left and right click on Dataset.

2018-04-06_9-44-02

Select New Data Set,

Name the Data Set

Select Use a dataset embedded in my report. This is generally how I do things, as Data Sources are usually reusable, Datasets are more designed for specific reports, so I leave them embedded.

Select your Data source from the drop down

For Query type we are using Text in this example

I am using a simple select all statement from the dbo.JobDataSet table I created

2018-04-06_9-47-16.png

If you click on fields in the upper right, you’ll now see the columns that will be feeding in from the query. You can rename the columns if you wish

2018-04-06_9-48-00.png

For now, let’s just click Okay, now you will see your data set expanded on the left side of your screen.

2018-04-06_9-48-46

Now to the far left, you should see the word Toolbox, click on that. This is a list of the tools we can work with within SSRS. Let’s start with a table

2018-04-06_9-49-09

Click on the table and drag it into the design window in the middle of the screen

2018-04-06_9-49-27.png

Now you can simply drag and drop columns from your dataset into your new table.

2018-04-06_9-49-53.png

By default, a table comes with 3 columns. You can add columns to this table by dragging a field over to the end of the table (note you will see a blue bar indicator letting you know your mouse is in the right spot. The column will populate to the right of the blue bar).

2018-04-06_9-50-24

This will add a new column to the end of your table. You can also use this method to insert a column in between existing table columns

2018-04-06_9-50-39

Now click on Preview button above your table

2018-04-06_10-23-01

When you do, you will get to see your table complete with data. Notice how the job column is too small for the job description.

2018-04-06_9-51-14

Step 5: Formatting

To fix the job column, let’s go back to our Design screen. Do this by clicking on the Design tab in the upper left.

2018-04-06_10-23-01

Now hover your mouse over the table until you get the double arrow icon seen below

2018-04-06_10-23-28

Once you have that, simply click and drag the column over to make it wider

2018-04-06_10-24-09

Since we are in the design window anyway, let us do a little more formatting. Click on the gray box to the left the header row to highlight the entire row. Now we can do things like Bold the font or change the background color

2018-04-06_10-24-49.png

Go back to the preview window to check out your results.

2018-04-06_10-25-55

There you have your very first SSRS Report from top to bottom.