Data visualization is an essential part of data analysis and communication. Visualization tools enable users to transform data into charts, graphs, and other visual representations that are easier to understand and interpret. In this article, we will look at some of the top data visualization tools on the market.
1. Tableau
Tableau is a powerful data visualization tool that enables users to create interactive dashboards, reports, and charts. It has a user-friendly interface, which allows users to drag and drop data to create visuals quickly. Tableau is known for its robust features, including data blending, mapping, and real-time collaboration. It also has a vibrant community, which makes it easy to find resources and solutions to any challenge.
2. Power BI
Power BI is a popular data visualization tool developed by Microsoft. It enables users to create interactive dashboards and reports that can be shared across an organization. Power BI has a user-friendly interface and offers a wide range of features, including data modeling, forecasting, and natural language processing. It also integrates seamlessly with other Microsoft products like Excel, SharePoint, and Teams.
3. QlikView
QlikView is a business intelligence tool that enables users to create interactive visualizations, reports, and dashboards. It has an intuitive interface that allows users to drag and drop data and create charts and graphs quickly. QlikView also offers advanced features like data modeling, association analysis, and collaboration capabilities.
4. D3.js
D3.js is a data visualization library that allows users to create custom visualizations using web standards like HTML, CSS, and SVG. It provides a high degree of flexibility, allowing users to create unique visualizations that match their specific needs. D3.js has a steep learning curve, but its versatility and customization options make it a favorite among developers.
5. Google Data Studio
Google Data Studio is a free data visualization tool that enables users to create interactive reports and dashboards. It integrates with Google Analytics and other Google products, making it easy to gather and analyze data. Google Data Studio also offers collaboration capabilities, allowing teams to work together on reports and dashboards.
Conclusion
Data visualization tools play a crucial role in helping organizations make sense of their data. The tools mentioned above are just a few of the many on the market. When choosing a data visualization tool, it’s essential to consider factors like ease of use, features, and cost. Ultimately, the right tool will depend on the specific needs of your organization.
Word Clouds are a simple way of visualizing word frequency in a corpus of text. Word Clouds typically work by displaying frequently used words in a text corpus, with the most frequent words appearing in larger text.
Here is the data file I will be using in this example if you want to follow along:
As far as libraries go, you will need pandas, matplotlib, os, and wordcloud. If you are using the Anaconda Python distribution, you should already have every library except wordcloud, which you can install with pip or conda.
Let's start by loading the data:
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import os
#Set working directory
os.chdir('C:\\Users\\blars\\Documents')
#Import CSV
df = pd.read_csv("movies.csv")
#First look at the Data
df.head()
** Note: if you are using Jupyter notebooks to run this, add %matplotlib inline on its own line after the matplotlib import, otherwise you will not be able to see the word cloud
import matplotlib.pyplot as plt
%matplotlib inline
We can use df.info() to look a little closer at the data
We have to decide what column we want to build our word cloud from. In this example I will be using the title column, but feel free to use any text column you would like.
Let's look at the title column.
As you can see, we have 20 movie titles in our data set. The next thing we have to do is merge these 20 rows into one large string.
corpus = " ".join(tl for tl in df.title)
The code above is essentially a one-line for loop: for every row in the column df.title, join it with the next row, separated by a space " ".
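To see what the word cloud will ultimately visualize, we can compute the word frequencies ourselves with nothing but the standard library. This is just a sketch: the hand-made `titles` list stands in for `df.title`, and the actual titles in movies.csv will of course differ.

```python
from collections import Counter

# Hypothetical stand-in for df.title from the tutorial's DataFrame
titles = ["The Matrix", "The Godfather", "Toy Story", "The Terminator"]

# Same pattern as the tutorial: join every title into one large string
corpus = " ".join(t for t in titles)

# Count how often each (lower-cased) word appears -- this frequency
# table is exactly what a word cloud sizes its words by
freq = Counter(corpus.lower().split())
print(freq.most_common(3))
```

The most frequent words in `freq` are the ones WordCloud will draw largest.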
Shiny is an R package from RStudio that allows you to host your R code on a website. This is a great method for creating a dashboard or deploying a predictive model. The best part is, it is free. There are, however, paid services associated with RStudio and Shiny if you want professional support and such.
In this lesson, we are going to learn to build a Shiny server application step by step.
Let’s start at the beginning. First off, to use Shiny, you need to have RStudio installed on your computer. Once you have that, you will need to install the Shiny package:
install.packages('shiny')
Next, we are going to learn about the three parts of a Shiny server application.
Below is the basic framework of a Shiny app:
library(shiny)
ui <- fluidPage("Test")
server <- function(input, output) {}
shinyApp(ui = ui, server = server)
Let’s break this down:
ui: ui is the user interface. This is the code that makes up the web page the user will launch and interact with. If you are familiar with HTML, think of this as the basic markup. This is where you set the wording, fonts, colors, what kinds of objects will be on the page, etc.
server: server is the brains of the operation. This is where you load and manipulate any data used by your page. Again, if you are a web developer, think JavaScript here.
shinyApp: this is the code that launches your App.
If you run the code above, you will see something like this
Note the IP address in the upper left. If you copy this into a browser of your choice, it will run there too.
You will see the same address in the R console. As long as the Shiny server is running, you will see this Listening message. For those not familiar, 127.0.0.1 is the loopback address: it always points back to the machine you are on, which means the server is running locally. The :4654 is the port my computer grabbed; it may be different for you. The easiest way for me to explain the port without going too deep into network theory is: think of it like a TV channel. From your computer, you could technically be running hundreds of these Shiny servers. The port (TV channel) is what tells the computer which one of these web pages you want to visit.
If you happen to be on a network, you can go to another computer and type your host computer's network IP address (you can get it by going to a cmd console and typing ipconfig) into a browser. It will be the address I have the arrows pointing to in my example.
Then, using my example, type http://10.171.46.91:4654 into the second computer's browser and you should see your Shiny website.
Step 2
Let’s see if we can make our website more interesting. Let’s start easy: we will format the text.
First close out your shiny window to end the session.
Change your ui line to match the one below.
ui <- fluidPage(h1("Test"))
Now run your code again
library(shiny)
ui <- fluidPage(h1("Test"))
server <- function(input, output) {}
shinyApp(ui = ui, server = server)
Step 3
Okay, let’s make it a little more interesting by adding some detail to our page. We will do so by using panel methods inside fluidPage().
Let’s break down the new code. First things first, we are loading the built-in datasets library and grabbing the mtcars data set from it.
library(datasets)
mpgData <- mtcars
Next we are going to skip down to the server section, as this is where we work with our data:
The first thing you will notice is function(input, output). This is a standard signature that allows you to pass information back and forth through the server code. In our example, we are only using the output for now.
We create a variable called output$mpgPlot and assign it the Shiny method renderPlot. Inside that method, we create a simple boxplot comparing mpg to cyl using the mtcars data set that we labeled mpgData.
The boxplot rendered as output$mpgPlot is then passed to our ui.
Next, in the ui, I renamed the title panel, and in the mainPanel I added the command plotOutput("mpgPlot"). That allows me to post the boxplot I created in the server to the ui web page.
Now let’s add a little interactivity to our web page. We are going to give the user the option to view the boxplots with or without the outliers taken into effect.
Notice the variable name input$outliers: it is a composite of the inputId from our checkboxInput method and the input argument of the function(input, output) call that opens our server.
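The code itself appears in the original post as screenshots. Putting the pieces described above together, the full app would look roughly like this; the panel layout and labels are my assumptions, not a copy of the original:

```r
library(shiny)
library(datasets)

mpgData <- mtcars

ui <- fluidPage(
  titlePanel("Miles Per Gallon"),
  sidebarLayout(
    sidebarPanel(
      # checkbox the user toggles; its value arrives as input$outliers
      checkboxInput("outliers", "Show outliers", TRUE)
    ),
    mainPanel(
      # displays the plot the server renders as output$mpgPlot
      plotOutput("mpgPlot")
    )
  )
)

server <- function(input, output) {
  output$mpgPlot <- renderPlot({
    # boxplot of mpg by cyl; outliers drawn only when the box is checked
    boxplot(mpg ~ cyl, data = mpgData, outline = input$outliers)
  })
}

shinyApp(ui = ui, server = server)
```

Unchecking the box sets input$outliers to FALSE, which re-renders the boxplot without outliers.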
This can be a very frustrating problem for many Qlik users.
I have a list of the days of the week, but as you can see they are not in order. Unfortunately, sorting them alphabetically will not help either. The solution is to sort by expression.
Go into edit mode and click on the object in question (in this case a Filter). Now click on Sorting in the right side bar and uncheck Auto sorting.
Uncheck Sort numerically and Sort alphabetically, and check Sort by expression.
Click on the Expression box and the formula window opens.
The formula we are going to use is the Match() formula:
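The screenshot of the finished formula is not reproduced here, but a sort-by-expression for weekday names typically looks like the following (the field name Weekday is an assumption; substitute your own field):

```
=Match(Weekday, 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')
```

Match() returns the position of the field value in the argument list (Monday = 1, Tuesday = 2, and so on), so sorting ascending by this expression puts the days in calendar order.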
Word Clouds are great visualization techniques for dealing with text analytics. The idea behind them is they display the most common words in a corpus of text. The more often a word is used, the larger and darker it is.
Making a word cloud in R is relatively easy. The tm and wordcloud libraries from R’s CRAN repository are used to create one.
library(tm)
library(wordcloud)
If you do not have either of these installed on your machine, install them first with install.packages("tm") and install.packages("wordcloud").
Now in order to make a word cloud, you first need a collection of words. In our example I am going to use a text file I created from the Wikipedia page on R.
text <- readLines("rWiki.txt")
> head(text)
[1] "R is a programming language and software environment
[2] "The R language is widely used among statisticians and
[3] "Polls, surveys of data miners, and studies of scholarly
[4] "R is a GNU package.[9] The source code for the R
[5] "General Public License, and pre-compiled binary versions
[6] "R is an implementation of the S programming language "
>
Notice each line in the text file is an individual element in the vector text.
Now we need to move the text into a tm element called a Corpus. First we need to convert the vector text into a VectorSource.
wc <- VectorSource(text)
wc <- Corpus(wc)
Now we need to pre-process the data. Let’s start by removing punctuation from the corpus.
wc <- tm_map(wc, removePunctuation)
Next we need to set all the letters to lowercase. This is because R differentiates between upper and lower case letters, so “Program” and “program” would be treated as 2 different words. To avoid that, we set everything to lowercase.
wc <- tm_map(wc, content_transformer(tolower))
Next we will remove stopwords. Stopwords are commonly used words that provide no value to the evaluation of the text. Examples of stopwords are: the, a, an, and, if, or, not, with.
wc <- tm_map(wc, removeWords, stopwords("english"))
This is part two of my Qlik Sense tutorial on how to build a dashboard. In this part I will be covering some more advanced topics, namely formulas and set analysis.
Let’s start with formulas. Open up your Dashboard and hit the edit button to get to the editor
Now click on the Sum(Scheduled) KPI we built. Next, on the right side bar, click Data > the arrow by Sum(Scheduled)
Now click on the orange fx – this will launch the expression editor
This is the expression editor. It is currently summing up the values of the Scheduled column. Scheduled is a binary with 1 meaning scheduled inspection and 0 not scheduled. So summing up this column will tell me how many scheduled inspections I have in my data set.
Let’s alter the expression a little. Add '/100' to the end. This will divide our expression by 100.
Here is the result
Now, let’s go back into our expression editor. Notice the menus on the right: the Aggregation menu lists the options available for aggregation. Let’s use Count.
Delete the current expression and start typing Count(WorkOrderKey). You should notice a helper box pop up asking if this is the field you are looking for.
Notice the colors in your expression: blue is a keyword, while gold is a field name from your data.
Hit Apply, and now you have a count of all work orders in your data.
Go back to the expression editor one more time, and type the following:
Count(WorkOrderKey) – Sum(Scheduled)
This will give us the total work orders – number of scheduled work orders. The result being total unscheduled work orders.
Set Analysis
Set analysis is Qlik’s way of localizing filters. If I were to apply a filter of Scheduled = True to my dashboard, all charts and graphs in the dashboard would be filtered to show only Scheduled work orders. But what if I didn’t want that? What if I wanted to filter only my bar chart by Scheduled work orders? That is where set analysis comes in.
The syntax for set analysis is {<filtering expression>}
Bar Chart Part 1
Let’s do one, and you will understand better. Click on the bar chart > Measures > fx to open the expression editor.
Now inside the brackets of the expression add the following: {$<Scheduled={1}>}
Your final expression should look like: Count({$<Scheduled={1}>}WorkOrderKey)
Count( – count number of elements inside brackets
{ – start of set analysis
$ – this means I am going to filter on the data currently on the dashboard. So if I have Hospital A checked on my filter list, I am only going to look at data from Hospital A. If I wanted to use all data, regardless of active filters on the page, I would replace my $ with a 1
<> – our set analysis lives inside the angle brackets
Scheduled = {1} – filter my data to only show rows with Scheduled = 1. ** Note that I have placed 1 inside {}; this is a syntax requirement. The argument must be in {}
} – close out set analysis
WorkOrderKey – the element I am going to Count
) – close Count
End result. Only counting Scheduled Work Orders
Bar Chart Part 2 – date filtering
Now we are going to add one more element, and I promise, this will be the most difficult thing we tackle here. It looks a lot worse than it actually is.
autoCalendar is a feature Qlik adds to your data so you can easily work with parts of a date field. RequestedOn.autoCalendar.Month is simply the month from the field RequestedOn. So 9/28/15 = September. To simplify the explanation, I will be replacing RequestedOn.autoCalendar.Month with Month
Month = {“>=$(=date(max(Month)-2))”}
Month = — filter by Month
{ – remember the 1 had to be in {} for Scheduled; the argument must be in {}
">= – >= means greater than or equal to. We need the " because otherwise Qlik would think > meant the end of the set analysis
$ – again, this means only filter the active data currently on the dashboard
(=date(max(Month)-2)) – these are built-in functions: max() finds the latest value of the given field (Month) in the data set, and date() returns that value as a date rather than a string
-2 – go back 2 months from the max date
So all this intimidating-looking expression says is: count all scheduled work orders in the data set that have a RequestedOn date >= October (December - 2 months).
Still on the bar chart, now click Add under Measures
I have highlighted the only new code added. Closed = {0} means the work order is still open. So I have added a second measure that only counts open scheduled work orders
So now if you look at your graph, you have 2 bars per month. The blue is our first expression and the pink is our second. So we now have all Scheduled Work Orders and Open Scheduled Work Orders for each month for easy comparison.
Gauge – How many pumps are in the shop?
Grab gauge from the left bar and drag it to the bottom of your sheet.
Click Add Dimension > Fx to go to expression editor
The first expression simply states: count WorkOrderKeys of all rows where Closed is 0, Scheduled is 0, and ClassCode = PUMPINFU. In other words, count all open, unscheduled work orders where the piece of equipment is an infusion pump. (** PUMPINFU is just the class code we use in our database **)
The second expression says: count all rows that contain infusion pumps. I added the distinct keyword so that any individual piece of equipment that appears more than once in my list won’t be counted twice by accident.
So, my expression is – Number of Open Infusion Pump Work Orders / Number of Infusion Pumps
Now, let’s make it look nice.
On the right bar, set Number formatting to Number.
Change the formatting to % with 1 decimal place.
Now click Appearance, set our max Range Limit to 0.3 (30%), and click use segments
Click Add limit, use scroll bar or just type in 0.1
Hit Add limit again, set it for 0.2
Click on any of your boxes now and you will get color options
Now, I went with traditional green, yellow, and red, but you can use any color combination you want.
You now have all the information you need to finish the dashboard on your own – which is really the best way to learn. Go back, look at my examples. The set analysis really is the hardest part to worry about for this dashboard. If you get stuck, Qlik has a great user community that can answer most questions. Or, you can leave a question in the comments or email me at Analytics4all@yahoo.com
In this two part lesson, we are going to build the above dashboard using QLIK Sense. If you don’t already have QLIK Sense, you can download a free trial here: Download
You can download the data set I am working with here: WO
Create new app
Once you have QLIK Sense up and running, click Create new app
Name your new app. I am calling mine DASH.
Click Open App
Next click Add data
Now select Excel files – I created my real dashboard using the SQL connector, however it is easier to load the Excel files. So I am using Excel in this lesson. I will show how to load SQL into QLIK in another lesson.
Navigate to the folder you saved the downloaded files to and select WO.xlsx
Note – you can uncheck columns you do not want to use from your data, but I have already removed all unwanted columns for this exercise.
Click Load data and finish at the bottom.
Once data loads, click Edit the sheet
Now we have a blank sheet to work with
Build the Dashboard
Click on Bar chart and drag it into the upper right corner
You will now be asked for a Dimension and a Measure. If you are not familiar with these terms in data, a dimension is like a category while a measure is something you can add, count, etc.
For Dimension, we will select RequestedOn > Month
For Measure, select WorkOrderKey
Now select Count
Now resize your bar chart by dragging it by the corner.
Next, drag a KPI over from the options in the left column. Place it to the right of your Bar Chart
As a measure, select Scheduled > SUM
Now select Filter pane from the left column and place it next to your KPI. For dimension, select FacilityCode
Now, let’s drag over a Line Chart. Place it below your bar chart. Set Dimension to RequestedON> Date. Set your measure to WorkOrderKey> Count
Now, let’s add a Pie Chart to the right of our line chart. Set Dimension to DepartmentCode and Measure to WorkOrderKey > Count
Now let’s clean it up. Click the upper right corner of each chart where it says Click to add title, and give your charts some titles.
Once you have done that, click the Done button in the upper right corner.
Now play with your new dashboard.
In the next lesson, I will show how to add more versatility to your dashboard through the use of formulas and Set Analysis.
This article is more of a show and tell than a how-to lesson. I wanted to demonstrate a really cool product that I have been demoing for work called QLIK Sense. I work for a Clinical Engineering Department at a multi-site hospital system. While the main thrust of my job is data related, the department as a whole is responsible for the maintenance of medical equipment, everything from IV pumps to MRI machines. Anyone who has ever worked in a busy maintenance/service environment knows how difficult it can be to optimize labor so that proper resources are allocated to the right jobs.
One great tool that can help is an operations dashboard. This gives everyone in the shop a single point of reference for what is happening at any given time. Using QLIK Sense, I was able to quickly connect to my work order database, and in less than an hour I had the following dashboard created.
Closer Look at the Elements
Across the top I have, Scheduled Inspections (Total vs Still Open), Repairs Calls Opened Today, Repair Calls Closed Today, and Open High Risk Equipment Repair Calls
Next I have a trend line showing work orders created vs closed over time. Next to that, I have a pie chart showing what departments the majority of my repair calls are originating from.
On the bottom, I have a gauge that shows the number of IV Pumps in the shop for repair vs the total number of pumps in Inventory. The goal is to keep that number below 20%. So as the needle starts edging towards red, we know to move resources onto pumps.
Finally, I have open repair work orders by employee.
Cool Features
QLIK Sense provides more than just a dashboard. It is totally interactive. You can use basic filters like the one below:
Or you can simply click on any element within the dashboard to filter the data.
Notice that after selecting Monitor Technicians from the Department Pie Chart, my Repair Work Orders by Employee chart has automatically filtered down to only the employees with work orders open in that department.
With the trend line graph, I can use the “lasso” to select a range I want to filter by.
Of course, you can simply hover over any element in the dashboard to get information as well.
Now I have an effective BI tool
I know a lot of people out there are still living in the world of static reporting. Getting near real time information up in front of the decision makers can allow them to make near real time decisions. The days of basing decisions off of a static report run daily or even (gasp.. ) weekly are over. We need to start deploying tools like this dashboard so we can manage less by what we think is happening and more by what we know is happening.
In this example I am going to teach you to use facets to add another level to your ggplot data visualizations. In this example we will be working with the ChickWeight data set from R Datasets
First, let’s check out the data by running head()
head(ChickWeight)
We have 4 columns in our data, weight, Time, Chick, and Diet.
Now, let’s build our ggplot model. Below, we are setting our model to a variable “p1”. We then set ChickWeight as our data set and the weight column to our x axis. Remember, nothing will happen when running the line below.
p1 <- ggplot(data=ChickWeight, aes(x=weight))
Now let’s graph our plot as a histogram. We will set our binwidth to 10. If any of this is confusing you, go back and check out my earlier lessons on ggplot and ggplot histograms.
p1 + geom_histogram(binwidth = 10)
Let’s now add some color. I am going to set our “fill” to be Diet
#-- Look at data
head(ChickWeight)
#-- build ggplot model
p1 <- ggplot(data=ChickWeight, aes(x=weight))
#--plot ggplot as histogram
p1 + geom_histogram(binwidth = 10)
#-- set fill color to Diet
p1 + geom_histogram(binwidth = 10, aes(fill=Diet))
#-- create four histograms - one for each Diet
p1 + geom_histogram(binwidth = 10, aes(fill=Diet)) + facet_grid(Diet~.)
library(ggplot2)
head(chickwts)
#-- look at structure
str(chickwts)
#set data and x axis value
pl <- ggplot(data=chickwts, aes(x=weight))
#create histogram, binwidth 10
pl + geom_histogram(binwidth=10)
#change bars to blue
pl + geom_histogram(binwidth=10, fill="blue")
#set color of bars by feed column values
pl + geom_histogram(binwidth=10, aes(fill=feed))
#add black border around boxes
pl + geom_histogram(binwidth=10, aes(fill=feed), color="black")