
A word cloud is a simple way of visualizing word frequency in a corpus of text. It displays the words that appear most often in the corpus, with the most frequent words drawn in the largest type.
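To make the idea of "frequency decides size" concrete, here is a minimal sketch using only the standard library. The sample sentence and the 10 to 40 point size range are made up for illustration, and the linear scaling is an assumption, not how the wordcloud library actually sizes its words.

```python
from collections import Counter

# Hypothetical sample text, not taken from the movies data set
text = "the quick fox and the lazy dog and the cat"

# Count how often each word occurs
counts = Counter(text.lower().split())

# Scale each count linearly into an illustrative font size between 10 and 40
max_count = max(counts.values())
sizes = {word: 10 + 30 * n / max_count for word, n in counts.items()}

# Show the most frequent words next to the size they would get
for word, n in counts.most_common(3):
    print(word, n, sizes[word])
```

The most frequent word gets the largest size, and words that appear only once get the smallest.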
Here is the data file I will be using in this example if you want to follow along:
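As far as libraries go, you will need pandas, matplotlib, os, and wordcloud. os ships with Python itself, and if you are using the Anaconda Python distribution you should already have pandas and matplotlib, so wordcloud is the only one you need to install. You can install it with pip (pip install wordcloud) or with conda (conda install -c conda-forge wordcloud).
Let's start by loading the data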
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import os
#Set working directory
os.chdir('C:\\Users\\blars\\Documents')
#Import CSV
df = pd.read_csv("movies.csv")
#First look at the Data
df.head()
** Note: if you are using Jupyter notebooks to run this, put %matplotlib inline on its own line right after the import matplotlib line (it cannot share a line with the import); otherwise the word cloud may not be displayed
import matplotlib.pyplot as plt
%matplotlib inline

We can use df.info() to look a little closer at the data

We have to decide which column we want to build our word cloud from. In this example I will be using the title column, but feel free to use any text column you would like.
Let's look at the title column
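A minimal sketch of inspecting a single column is shown here. The small DataFrame is a hypothetical stand-in for movies.csv (its column names and values are assumptions), so the snippet runs without the data file.

```python
import pandas as pd

# Hypothetical stand-in for movies.csv; the real file's columns may differ
df = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story", "Jumanji", "Heat"],
})

# Select the column by name; df.title is shorthand for the same thing
titles = df["title"]
print(titles)

# count() reports how many rows hold a non-missing title
print(titles.count())
```

With the real data, replace the hand-built DataFrame with the df loaded from the CSV earlier.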

As you can see, we have 20 movie titles in our data set. The next thing we have to do is merge these 20 rows into one large string
corpus = " ".join(tl for tl in df.title)
The code above is basically a one-line for loop (a generator expression). It walks through every row in the column df.title and joins the values into a single string, separating them with a space " "
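To show what the one-liner is doing, this sketch writes it out both ways. It uses a plain Python list of hypothetical titles in place of df.title, so it runs without pandas or the data file.

```python
# Hypothetical titles standing in for the df.title column
titles = ["Toy Story", "Jumanji", "Grumpier Old Men"]

# One-line version: the generator yields each title, join() glues them with spaces
corpus = " ".join(tl for tl in titles)

# The same thing written as an explicit loop
parts = []
for tl in titles:
    parts.append(tl)
corpus_loop = " ".join(parts)

print(corpus)
print(corpus == corpus_loop)
```

Both versions produce the same string. Note that join() only accepts strings, so a column with missing values would need them dropped first (for example with df.title.dropna()).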
Now build the word cloud
wordcloud = WordCloud(width=640, height=480, max_words=20).generate(corpus)
You can change the width and height, as well as the maximum number of words that will appear. Play around with the numbers and see how they change your output
Finally, let’s chart it, so we can see the cloud
plt.imshow(wordcloud,interpolation="bilinear")
plt.axis("off")
plt.show()

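interpolation="bilinear" smooths the rendered image so the words do not look pixelated. It does not control whether words run sideways or up and down; that comes from the WordCloud object itself (its prefer_horizontal parameter)
plt.axis("off") gets rid of the axis markers (see below)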

#Same word cloud on a white background, showing up to 25 words
wordcloud = WordCloud(width=640, height=480, background_color='white', max_words=25).generate(corpus)
plt.imshow(wordcloud,interpolation="bilinear")
plt.axis("off")
plt.show()
