Python: Create a Word Cloud

Word Clouds are a simple way of visualizing word frequency in a corpus of text. Word Clouds typically work by displaying frequently used words in a text corpus, with the most frequent words appearing in larger text.

Here is the data file I will be using in this example if you want to follow along:

As far as libraries go, you will need pandas, matplotlib, os, and wordcloud. If you are using the Anaconda python distribution you should have all the libraries but wordcloud. You can install it using PIP or Conda install.

Lets start by loading the data

import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import os

#Set working directory

#Import CSV
df = pd.read_csv("movies.csv")

#First look at the Data

** Note: if you are using Jupyter notebooks to run this, add %matplotlib inline to the end of the import matplotlib line, otherwise you will not be able to see the word cloud

import matplotlib.pyplot as plt %matplotlib inline

We can use to look a little closer at the data

We have to decide what column we want to build our word cloud from. In this example I will be using the title column, but feel free to use any text column you would like.

Let look at the title column

As you can see, we have 20 movie titles in our data set. Next thing we have to do is merge these 20 rows into one large string

corpus = " ".join(tl for tl in df.title)

The code above is basically a one line for loop. For every Row in the Column df.title, join it with the next row, separating by a space ” “

Now build the word cloud

wordcloud = WordCloud(width=640, height=480, max_words=20).generate(corpus)

You can change the width and height, number of words that will appear. Play around with the numbers, see how it changes your output

Finally, let’s chart it, so we can see the cloud


interpolation = “bilinear” is what lets the words so sideways and up and down

plt.axis(“off”) gets rid or axis markers (see below)

You can also go back to the word cloud and change the background color
wordcloud = WordCloud(width=640, height=480, background_color = 'white', max_words=25).generate(corpus)