Data visualization is an essential part of data analysis and communication. Visualization tools enable users to transform data into charts, graphs, and other visual representations that are easier to understand and interpret. In this article, we will look at some of the top data visualization tools on the market.
1. Tableau
Tableau is a powerful data visualization tool that enables users to create interactive dashboards, reports, and charts. It has a user-friendly interface, which allows users to drag and drop data to create visuals quickly. Tableau is known for its robust features, including data blending, mapping, and real-time collaboration. It also has a vibrant community, which makes it easy to find resources and solutions to any challenge.
2. Power BI
Power BI is a popular data visualization tool developed by Microsoft. It enables users to create interactive dashboards and reports that can be shared across an organization. Power BI has a user-friendly interface and offers a wide range of features, including data modeling, forecasting, and natural language processing. It also integrates seamlessly with other Microsoft products like Excel, SharePoint, and Teams.
3. QlikView
QlikView is a business intelligence tool that enables users to create interactive visualizations, reports, and dashboards. It has an intuitive interface that allows users to drag and drop data and create charts and graphs quickly. QlikView also offers advanced features like data modeling, association analysis, and collaboration capabilities.
4. D3.js
D3.js is a data visualization library that allows users to create custom visualizations using web standards like HTML, CSS, and SVG. It provides a high degree of flexibility, allowing users to create unique visualizations that match their specific needs. D3.js has a steep learning curve, but its versatility and customization options make it a favorite among developers.
5. Google Data Studio
Google Data Studio is a free data visualization tool that enables users to create interactive reports and dashboards. It integrates with Google Analytics and other Google products, making it easy to gather and analyze data. Google Data Studio also offers collaboration capabilities, allowing teams to work together on reports and dashboards.
Conclusion
Data visualization tools play a crucial role in helping organizations make sense of their data. The tools mentioned above are just a few of the many on the market. When choosing a data visualization tool, it’s essential to consider factors like ease of use, features, and cost. Ultimately, the right tool will depend on the specific needs of your organization.
Word Clouds are a simple way of visualizing word frequency in a corpus of text. Word Clouds typically work by displaying frequently used words in a text corpus, with the most frequent words appearing in larger text.
Here is the data file I will be using in this example if you want to follow along:
As far as libraries go, you will need pandas, matplotlib, os, and wordcloud. If you are using the Anaconda Python distribution, you should already have every library except wordcloud, which you can install with pip or conda.
Let's start by loading the data:
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import os
#Set working directory
os.chdir('C:\\Users\\blars\\Documents')
#Import CSV
df = pd.read_csv("movies.csv")
#First look at the Data
df.head()
** Note: if you are using Jupyter notebooks to run this, add %matplotlib inline on its own line after the matplotlib import, otherwise you will not be able to see the word cloud
import matplotlib.pyplot as plt
%matplotlib inline
We can use df.info() to look a little closer at the data
We have to decide what column we want to build our word cloud from. In this example I will be using the title column, but feel free to use any text column you would like.
Let's look at the title column.
As you can see, we have 20 movie titles in our data set. The next thing we have to do is merge these 20 rows into one large string.
corpus = " ".join(tl for tl in df.title)
The code above is essentially a one-line for loop: for every row in the column df.title, join it with the next row, separated by a space " ".
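To see what the word cloud will ultimately visualize, we can compute the word frequencies ourselves with nothing but the standard library. This is just a sketch: the hand-made `titles` list stands in for `df.title`, and the actual titles in movies.csv will of course differ.

```python
from collections import Counter

# Hypothetical stand-in for df.title from the tutorial's DataFrame
titles = ["The Matrix", "The Godfather", "Toy Story", "The Terminator"]

# Same pattern as the tutorial: join every title into one large string
corpus = " ".join(t for t in titles)

# Count how often each (lower-cased) word appears -- this frequency
# table is exactly what a word cloud sizes its words by
freq = Counter(corpus.lower().split())
print(freq.most_common(3))
```

The most frequent words in `freq` are the ones WordCloud will draw largest.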
Shiny is an R package from RStudio that allows you to host your R code on a website. This is a great method for creating a dashboard or deploying a predictive model. The best part is, it is free. There are, however, paid services associated with RStudio and Shiny if you want professional support and such.
In this lesson, we are going to learn to build a Shiny server application step by step.
Let’s start at the beginning. First off, to use Shiny, you need to have RStudio installed on your computer. Once you have that, you will need to install the Shiny package:
install.packages('shiny')
Next, we are going to learn about the three parts of a Shiny server application.
Below is the basic framework of a Shiny app:
library(shiny)
ui <- fluidPage("Test")
server <- function(input, output) {}
shinyApp(ui = ui, server = server)
Let’s break this down:
ui: ui is the user interface. This is the code that makes up the web page the user will launch and interact with. If you are familiar with HTML, think of this as the basic markup. This is where you set the wording, fonts, colors, what kinds of objects will be on the page, etc.
server: server is the brains of the operation. This is where you load and manipulate any data used by your page. Again, if you are a web developer, think JavaScript here.
shinyApp: this is the code that launches your App.
If you run the code above, you will see something like this
Note the IP address in the upper left. If you copy this into a browser of your choice, it will run there too.
You will see the same address in the R console. As long as the Shiny server is running, you will see this Listening message. For those not familiar, 127.0.0.1 is the loopback address: it always points back to the machine you are on, which means the server is running locally. The :4654 is the port my computer grabbed; it may be different for you. The easiest way for me to explain the port without going too deep into network theory is: think of it like a TV channel. From your computer, you could technically be running hundreds of these Shiny servers. The port (TV channel) is what tells the computer which one of these web pages you want to visit.
If you happen to be on a network, you can go to another computer and type your host computer's network IP address (you can get it by going to a cmd console and typing ipconfig) into a browser. It will be the address I have the arrows pointing to in my example.
Then, using my example, type http://10.171.46.91:4654 into the second computer's browser and you should see your Shiny website.
Step 2
Let’s see if we can make our website more interesting. Let’s start easy: we will format the text.
First close out your shiny window to end the session.
Change your ui line to match the one below.
ui <- fluidPage(h1("Test"))
Now run your code again
library(shiny)
ui <- fluidPage(h1("Test"))
server <- function(input, output) {}
shinyApp(ui = ui, server = server)
Step 3
Okay, let’s make it a little more interesting by adding some detail to our page. We will do so by using panel methods inside fluidPage().
Let’s break down the new code. First things first, we are loading the built-in datasets library and grabbing the mtcars data set from it.
library(datasets)
mpgData <- mtcars
Next we are going to skip down to the server section, as this is where we work with our data:
The first thing you will notice is function(input, output). This is a standard signature that allows you to pass information back and forth through the server code. In our example, we are only using the output for now.
We create a variable called output$mpgPlot and assign it the Shiny method renderPlot. Inside that method, we create a simple boxplot comparing mpg to cyl using the mtcars data set that we labeled mpgData.
The boxplot rendered as output$mpgPlot is then passed to our ui.
Next, in the ui, I renamed the title panel, and in the mainPanel I added the command plotOutput("mpgPlot"). That allows me to post the boxplot I created in the server to the ui web page.
Now let’s add a little interactivity to our web page. We are going to give the user the option to view the boxplots with or without the outliers taken into effect.
Notice the variable name input$outliers: it is a composite of the inputId from our checkboxInput method and the input argument of the function(input, output) call that opens our server.
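The code itself appears in the original post as screenshots. Putting the pieces described above together, the full app would look roughly like this; the panel layout and labels are my assumptions, not a copy of the original:

```r
library(shiny)
library(datasets)

mpgData <- mtcars

ui <- fluidPage(
  titlePanel("Miles Per Gallon"),
  sidebarLayout(
    sidebarPanel(
      # checkbox the user toggles; its value arrives as input$outliers
      checkboxInput("outliers", "Show outliers", TRUE)
    ),
    mainPanel(
      # displays the plot the server renders as output$mpgPlot
      plotOutput("mpgPlot")
    )
  )
)

server <- function(input, output) {
  output$mpgPlot <- renderPlot({
    # boxplot of mpg by cyl; outliers drawn only when the box is checked
    boxplot(mpg ~ cyl, data = mpgData, outline = input$outliers)
  })
}

shinyApp(ui = ui, server = server)
```

Unchecking the box sets input$outliers to FALSE, which re-renders the boxplot without outliers.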
This can be a very frustrating problem for many Qlik users.
I have a list of the days of the week, but as you can see they are not in order. Unfortunately, sorting them alphabetically will not help either. The solution is to sort by expression.
Go into edit mode and click on the object in question (in this case a Filter). Now click on Sorting in the right side bar and uncheck Auto sorting.
Uncheck Sort numerically and Sort alphabetically, and check Sort by expression.
Click on the Expression box and the formula window opens.
The formula we are going to use is the Match() formula:
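The screenshot of the finished formula is not reproduced here, but a sort-by-expression for weekday names typically looks like the following (the field name Weekday is an assumption; substitute your own field):

```
=Match(Weekday, 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')
```

Match() returns the position of the field value in the argument list (Monday = 1, Tuesday = 2, and so on), so sorting ascending by this expression puts the days in calendar order.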
Word Clouds are great visualization techniques for dealing with text analytics. The idea behind them is they display the most common words in a corpus of text. The more often a word is used, the larger and darker it is.
Making a word cloud in R is relatively easy. The tm and wordcloud libraries from R’s CRAN repository are used to create one.
library(tm)
library(wordcloud)
If you do not have either of these installed on your machine, install them first with install.packages("tm") and install.packages("wordcloud").
Now in order to make a word cloud, you first need a collection of words. In our example I am going to use a text file I created from the Wikipedia page on R.
text <- readLines("rWiki.txt")
> head(text)
[1] "R is a programming language and software environment
[2] "The R language is widely used among statisticians and
[3] "Polls, surveys of data miners, and studies of scholarly
[4] "R is a GNU package.[9] The source code for the R
[5] "General Public License, and pre-compiled binary versions
[6] "R is an implementation of the S programming language "
>
Notice each line in the text file is an individual element in the vector text.
Now we need to move the text into a tm element called a Corpus. First we need to convert the vector text into a VectorSource.
wc <- VectorSource(text)
wc <- Corpus(wc)
Now we need to pre-process the data. Let’s start by removing punctuation from the corpus.
wc <- tm_map(wc, removePunctuation)
Next we need to set all the letters to lowercase. This is because R differentiates between upper and lower case letters, so “Program” and “program” would be treated as 2 different words. To avoid that, we set everything to lowercase.
wc <- tm_map(wc, content_transformer(tolower))
Next we will remove stopwords. Stopwords are commonly used words that provide no value to the evaluation of the text. Examples of stopwords are: the, a, an, and, if, or, not, with.
wc <- tm_map(wc, removeWords, stopwords("english"))
This is part two of my Qlik Sense tutorial on how to build a dashboard. In this part I will be covering some more advanced topics, namely formulas and set analysis.
Let’s start with formulas. Open up your Dashboard and hit the edit button to get to the editor
Now click on the Sum(Scheduled) KPI we built. Next, on the right side bar, click Data > the arrow by Sum(Scheduled)
Now click on the orange fx – this will launch the expression editor
This is the expression editor. It is currently summing up the values of the Scheduled column. Scheduled is a binary with 1 meaning scheduled inspection and 0 not scheduled. So summing up this column will tell me how many scheduled inspections I have in my data set.
Let’s alter the expression a little. Add '/100' to the end. This will divide our expression by 100.
Here is the result
Now, let’s go back into our expression editor. Notice the menus on the right: the Aggregation menu lists the options available for aggregation. Let’s use Count.
Delete the current expression and start typing Count(WorkOrderKey). You should notice a helper box pop up asking if this is the field you are looking for.
Notice the colors in your expression: blue is a keyword, while gold is a field name from your data.
Hit Apply, and now you have a count of all work orders in your data.
Go back to the expression editor one more time, and type the following:
Count(WorkOrderKey) – Sum(Scheduled)
This will give us the total work orders – number of scheduled work orders. The result being total unscheduled work orders.
Set Analysis
Set analysis is Qlik’s way of localizing filters. If I were to apply a filter of Scheduled = True to my dashboard, all charts and graphs in the dashboard would be filtered to show only Scheduled work orders. But what if I didn’t want that? What if I wanted to filter only my bar chart by Scheduled work orders? That is where set analysis comes in.
The syntax for set analysis is {<filtering expression>}
Bar Chart Part 1
Let’s do one, and you will understand better. Click on the bar chart > Measures > fx to open the expression editor.
Now inside the brackets of the expression add the following: {$<Scheduled={1}>}
Your final expression should look like: Count({$<Scheduled={1}>}WorkOrderKey)
Count( – count number of elements inside brackets
{ – start of set analysis
$ – this means I am going to filter on the data currently on the dashboard. So if I have Hospital A checked on my filter list, I am only going to look at data from Hospital A. If I wanted to use all data, regardless of active filters on the page, I would replace my $ with a 1
<> – our set analysis lives inside the angle brackets
Scheduled = {1} – filter my data to only show rows with Scheduled = 1. ** Note that I have placed 1 inside {}; this is a syntax requirement. The argument must be in {}
} – close out set analysis
WorkOrderKey – the element I am going to Count
) – close Count
End result. Only counting Scheduled Work Orders
Bar Chart Part 2 – date filtering
Now we are going to add one more element, and I promise, this will be the most difficult thing we tackle here. It looks a lot worse than it actually is.
autoCalendar is a feature Qlik adds to your data so you can easily work with parts of a date field. RequestedOn.autoCalendar.Month is simply the month from the field RequestedOn. So 9/28/15 = September. To simplify the explanation, I will be replacing RequestedOn.autoCalendar.Month with Month
Month = {“>=$(=date(max(Month)-2))”}
Month = — filter by Month
{ – remember the 1 had to be in {} for Scheduled; the argument must be in {}
">= – >= means greater than or equal to. We need the " because otherwise Qlik would think > meant the end of the set analysis
$ – again, this means only filter the active data currently on the dashboard
(=date(max(Month)-2)) – these are built-in functions: max() finds the latest value of the given field (Month) in the data set, and date() returns that value as a date rather than a string
-2 – go back 2 months from the max date
So all this intimidating-looking expression says is: count all scheduled work orders in the data set that have a RequestedOn date >= October (December - 2 months).
Still on the bar chart, now click Add under Measures
I have highlighted the only new code added. Closed = {0} means the work order is still open. So I have added a second measure that only counts open scheduled work orders
So now if you look at your graph, you have 2 bars per month. The blue is our first expression and the pink is our second. So we now have all Scheduled Work Orders and Open Scheduled Work Orders for each month for easy comparison.
Gauge – How many pumps are in the shop?
Grab gauge from the left bar and drag it to the bottom of your sheet.
Click Add Dimension > Fx to go to expression editor
The first expression simply states: count WorkOrderKeys of all rows where Closed is 0, Scheduled is 0, and ClassCode = PUMPINFU. In other words, count all open, unscheduled work orders where the piece of equipment is an infusion pump. (** PUMPINFU is just the class code we use in our database **)
The second expression says: count all rows that contain infusion pumps. I added the distinct keyword so that any individual piece of equipment that appears more than once in my list won’t be counted twice by accident.
So, my expression is – Number of Open Infusion Pump Work Orders / Number of Infusion Pumps
Now, let’s make it look nice.
On the right bar, set Number formatting to Number.
Change the formatting to % with 1 decimal place.
Now click Appearance, set our max Range Limit to 0.3 (30%), and click use segments
Click Add limit, use scroll bar or just type in 0.1
Hit Add limit again, set it for 0.2
Click on any of your boxes now and you will get color options
Now, I went with traditional green, yellow, and red, but you can use any color combination you want.
You now have all the information you need to finish the dashboard on your own – which is really the best way to learn. Go back, look at my examples. The set analysis really is the hardest part to worry about for this dashboard. If you get stuck, Qlik has a great user community that can answer most questions. Or, you can leave a question in the comments or email me at Analytics4all@yahoo.com
In this two part lesson, we are going to build the above dashboard using QLIK Sense. If you don’t already have QLIK Sense, you can download a free trial here: Download
You can download the data set I am working with here: WO
Create new app
Once you have QLIK Sense up and running, click Create new app
Name your new app. I am calling mine DASH.
Click Open App
Next click Add data
Now select Excel files – I created my real dashboard using the SQL connector, however it is easier to load the Excel files. So I am using Excel in this lesson. I will show how to load SQL into QLIK in another lesson.
Navigate to the folder you saved the downloaded files to and select WO.xlsx
Note – you can uncheck columns you do not want to use from your data, but I have already removed all unwanted columns for this exercise.
Click Load data and finish at the bottom.
Once data loads, click Edit the sheet
Now we have a blank sheet to work with
Build the Dashboard
Click on Bar chart and drag it into the upper right corner
You will now be asked for a Dimension and a Measure. If you are not familiar with these terms in data, a dimension is like a category while a measure is something you can add, count, etc.
For Dimension, we will select RequestedOn > Month
For Measure, select WorkOrderKey
Now select Count
Now resize your bar chart by dragging it by the corner.
Next, drag a KPI over from the options in the left column. Place it to the right of your Bar Chart
As a measure, select Scheduled > SUM
Now select Filter pane from the left column and place it next to your KPI. For dimension, select FacilityCode
Now, let’s drag over a Line Chart. Place it below your bar chart. Set Dimension to RequestedON> Date. Set your measure to WorkOrderKey> Count
Now, let’s add a Pie Chart to the right of our line chart. Set Dimension to DepartmentCode and Measure to WorkOrderKey > Count
Now let’s clean it up. Click the upper right corner of each chart where it says Click to add title, and give your charts some titles.
Once you have done that, click the Done button in the upper right corner.
Now play with your new dashboard.
In the next lesson, I will show how to add more versatility to your dashboard through the use of formulas and Set Analysis.
This article is more of a show and tell than a how-to lesson. I wanted to demonstrate a really cool product that I have been demoing for work called QLIK Sense. I work for a Clinical Engineering Department at a multi-site hospital system. While the main thrust of my job is data related, the department as a whole is responsible for the maintenance of medical equipment, everything from IV pumps to MRI machines. Anyone who has ever worked in a busy maintenance/service environment knows how difficult it can be to optimize labor so that proper resources are allocated to the right jobs.
One great tool that can help is an operations dashboard. This gives everyone in the shop a single point of reference for what is happening at any given time. Using QLIK Sense, I was able to quickly connect to my work order database, and in less than an hour I had the following dashboard created.
Closer Look at the Elements
Across the top I have, Scheduled Inspections (Total vs Still Open), Repairs Calls Opened Today, Repair Calls Closed Today, and Open High Risk Equipment Repair Calls
Next I have a trend line showing work orders created vs closed over time. Next to that, I have a pie chart showing what departments the majority of my repair calls are originating from.
On the bottom, I have a gauge that shows the number of IV Pumps in the shop for repair vs the total number of pumps in Inventory. The goal is to keep that number below 20%. So as the needle starts edging towards red, we know to move resources onto pumps.
Finally, I have open repair work orders by employee.
Cool Features
QLIK Sense provides more than just a dashboard. It is totally interactive. You can use basic filters like the one below:
Or you can simply click on any element within the dashboard to filter the data.
Notice that after selecting Monitor Technicians from the Department Pie Chart, my Repair Work Orders by Employee chart has automatically filtered down to only the employees with work orders open in that department.
With the trend line graph, I can use the “lasso” to select a range I want to filter by.
Of course, you can simply hover over any element in the dashboard to get information as well.
Now I have an effective BI tool
I know a lot of people out there are still living in the world of static reporting. Getting near real time information up in front of the decision makers can allow them to make near real time decisions. The days of basing decisions off of a static report run daily or even (gasp.. ) weekly are over. We need to start deploying tools like this dashboard so we can manage less by what we think is happening and more by what we know is happening.
In this example I am going to teach you to use facets to add another level to your ggplot data visualizations. In this example we will be working with the ChickWeight data set from R Datasets
First, let’s check out the data by running head()
head(ChickWeight)
We have 4 columns in our data, weight, Time, Chick, and Diet.
Now, let’s build our ggplot model. Below, we are setting our model to a variable “p1”. We then set ChickWeight as our data set and the weight column to our x axis. Remember, nothing will happen when running the line below.
p1 <- ggplot(data=ChickWeight, aes(x=weight))
Now let’s graph our plot as a histogram. We will set our binwidth to 10. If any of this is confusing you, go back and check out my earlier lessons on ggplot and ggplot histograms.
p1 + geom_histogram(binwidth = 10)
Let’s now add some color. I am going to set our “fill” to be Diet
#-- Look at data
head(ChickWeight)
#-- build ggplot model
p1 <- ggplot(data=ChickWeight, aes(x=weight))
#--plot ggplot as histogram
p1 + geom_histogram(binwidth = 10)
#-- set fill color to Diet
p1 + geom_histogram(binwidth = 10, aes(fill=Diet))
#-- create four histograms - one for each Diet
p1 + geom_histogram(binwidth = 10, aes(fill=Diet)) + facet_grid(Diet~.)
library(ggplot2)
head(chickwts)
#-- look at structure
str(chickwts)
#set data and x axis value
pl <- ggplot(data=chickwts, aes(x=weight))
#create histogram, binwidth 10
pl + geom_histogram(binwidth=10)
#change bars to blue
pl + geom_histogram(binwidth=10, fill="blue")
#set color of bars by feed column values
pl + geom_histogram(binwidth=10, aes(fill=feed))
#add black border around boxes
pl + geom_histogram(binwidth=10, aes(fill=feed), color="black")