[Tut] How to Create Word Clouds Using Python? - xSicKxBot - 04-15-2022 How to Create Word Clouds Using Python? <div><p>You may have already learned how to analyze quantitative data using graphs such as <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-plotting-part-1/" data-type="post" data-id="171808" target="_blank">bar charts</a> and <a rel="noreferrer noopener" href="https://blog.finxter.com/matplotlib-histogram/" data-type="post" data-id="5485" target="_blank">histograms</a>. </p> <p><strong>But do you know how to study textual data? </strong></p> <p>One way to analyze textual information is by using a <a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-generate-a-word-cloud-with-newspaper3k-and-python/" data-type="post" data-id="34485" target="_blank">word cloud</a>:</p> <div class="wp-block-image"> <figure class="aligncenter"><img src="https://lh3.googleusercontent.com/bTJyjmnwhWvtymj6pKB8DlGTXOKPNZl_bbnM3--78ddmz5HdCSm76q1YmFaivDaVHFRCqSvHl4Ax9p74kxkRnzW9Rjrv_dYnAgAEfQVCBG1xWta80ZAsmatcw7-M1_sItvpYK-vE" alt=""/><figcaption><strong>Figure 0</strong>: The word cloud you’ll learn how to create in this article.</figcaption></figure> </div> <p>There are many ways to create word clouds, but in this blog post we will use the <code>WordCloud</code> library, a Python library that generates word clouds from text.</p> <h2>What Are Word Clouds?</h2> <p class="has-global-color-8-background-color has-background"><img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Definition</strong>: A word cloud (also known as a <em>tag cloud</em>) is a visual representation of the words that appear most frequently in a given text. Word clouds can be used to summarize large bodies of text or to get a quick impression of the sentiment of a document. 
</p> <p>A word cloud is a graphical representation of text data in which the size of each word reflects how often it appears in the text: the more frequent a word, the larger it is drawn. </p> <p>You can use word clouds to quickly spot the most important words in a document or to get an overview of the sentiment of a piece of text. </p> <p>There are word cloud apps such as <strong>Wordle</strong>, but in this blog post, we will show how to create word clouds using the Python library <code>WordCloud</code>.</p> <h2>What’s the WordCloud Library in Python?</h2> <p>The <a href="https://pypi.org/project/wordcloud/" data-type="URL" data-id="https://pypi.org/project/wordcloud/" target="_blank" rel="noreferrer noopener">WordCloud library</a> is an open-source, easy-to-use library for creating word clouds in Python. </p> <p>It allows you to create word clouds in various formats, including PDF, SVG, and image files. </p> <p>In addition, it provides several options for customizing your word clouds, including the ability to control the font, color, and layout.</p> <p>You can install it using the following command in your terminal (without the <code>$</code> symbol):</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">$ pip install wordcloud</pre> <p>Related Articles:</p> <ul> <li><a href="https://blog.finxter.com/how-to-install-a-library-on-pycharm/" data-type="URL" data-id="https://blog.finxter.com/how-to-install-a-library-on-pycharm/" target="_blank" rel="noreferrer noopener">How to Install a Library on PyCharm?</a></li> <li><a href="https://blog.finxter.com/a-guide-of-all-pip-commands/" data-type="URL" data-id="https://blog.finxter.com/a-guide-of-all-pip-commands/" target="_blank" rel="noreferrer noopener">PIP Commands: A Simple Guide</a></li> </ul> <h2>Where Are Word Clouds Used?</h2> <p>Word clouds are a fun and easy way to visualize data. 
</p> <p>By displaying the most common words in a given text, they can provide insights into the overall themes and tone of the text. </p> <ul> <li>Word clouds can be used for various purposes, from educational to marketing. </li> <li>Teachers can use word clouds for vocabulary building and text analysis in the classroom. </li> <li>You can also use word clouds to generate leads or track customer sentiment. </li> <li>For businesses, word clouds can be used to create marketing materials, such as blog posts, infographics, and social media content. </li> <li>Word clouds can also help you monitor customer feedback or identify negative sentiment. </li> <li>Students can also use word clouds to analyze a piece of text. By visually highlighting the most frequent words, word clouds can help students identify the main ideas and make connections between different concepts.</li> </ul> <h2>Pros of Word Clouds</h2> <p>The advantages of using word clouds are:</p> <p>First, you can use them to <strong>summarize a large body of text</strong> quickly and easily. Identifying the most frequently used words in a text can provide a quick overview of the main points.</p> <p>Second, with word clouds, you can quickly <strong>visualize the sentiment</strong> in a document. The most prominent words in the word cloud can hint at the overall tone of the document. This is handy when analyzing a large body of text, such as customer feedback or reviews.</p> <p>Third, word clouds can be a valuable tool for identifying the <strong>most important keywords</strong> in a text. Because word size reflects word frequency, you can quickly see which terms are most prominent. This can be useful for monitoring changing trends, for example by comparing word clouds built from texts written at different times.</p> <p>Fourth, word clouds can be used to <strong>create designs</strong> that incorporate both visual and textual elements. 
By blending words and images, word clouds can add another layer of meaning to an already exciting design.</p> <h2>How to Create Word Clouds in Python?</h2> <p>We will use Disneyland reviews downloaded from Kaggle to create a word cloud data visualization. </p> <p>You can download the file from <a rel="noreferrer noopener" href="https://www.kaggle.com/datasets/arushchillar/disneyland-reviews?select=DisneylandReviews.csv" data-type="URL" data-id="https://www.kaggle.com/datasets/arushchillar/disneyland-reviews?select=DisneylandReviews.csv" target="_blank">here</a>.</p> <p>In this file, we will focus on the <code>Review_Text</code> column for creating a word cloud. You can ignore the other columns.</p> <p>First, install the WordCloud Python library if you haven't already. You can do this by running the following command in a terminal:</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">pip install wordcloud</pre> <p>Once you have installed <code>WordCloud</code>, import the <a href="https://blog.finxter.com/pandas-quickstart/" data-type="post" data-id="16511" target="_blank" rel="noreferrer noopener"><code>pandas</code></a>, <code>matplotlib.pyplot</code>, and <code>wordcloud</code> libraries.</p> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt</pre> <p>The <code>pandas</code> library reads the Disneyland reviews <a href="https://blog.finxter.com/read-and-write-flat-files-with-pandas/" data-type="post" data-id="62847" target="_blank" rel="noreferrer noopener">CSV file</a> into a data frame.</p> <p>We will explain what <code>STOPWORDS</code> does 
in the upcoming section.</p> <p>The following command reads the <code>DisneylandReviews.csv</code> file into a data frame stored in the variable <code>df</code>. Adjust the path to wherever you saved the file.</p> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df = pd.read_csv("/Users/mohamedthoufeeq/Downloads/DisneylandReviews.csv")</pre> <p>Now run the program and check the output.</p> <p>You get the following <code>UnicodeDecodeError</code>:</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf4 in position 121844: invalid continuation byte</pre> <p>This error means that the file contains bytes that are not valid UTF-8, which is the encoding <code>pd.read_csv()</code> assumes by default. In other words, the CSV file downloaded from Kaggle was saved in a different encoding.</p> <p>To solve this problem, you need to tell pandas which encoding the file uses. 
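</p> <p>To see why naming the encoding helps, here is a small, self-contained sketch. It decodes a made-up byte string (not taken from the dataset) that contains the same byte <code>0xf4</code> as the error message above. Read as UTF-8, the byte is invalid; read as ISO-8859-1 (Latin-1), it is simply the letter “ô”.</p> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">
# Hypothetical sample: the word "Hôtel" encoded in ISO-8859-1 (Latin-1)
raw = b"H\xf4tel"

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as err:
    # 0xf4 starts a multi-byte UTF-8 sequence, but "t" is not a valid continuation byte
    utf8_error = err.reason
    print("UTF-8 failed:", utf8_error)

# ISO-8859-1 maps every byte to exactly one character, so this decode cannot fail
latin1_text = raw.decode("ISO-8859-1")
print("Latin-1:", latin1_text)
</pre> <p>Because ISO-8859-1 assigns a character to every possible byte, decoding with it never raises an error. The trade-off is that if a file was actually written in another encoding, such as Windows-1252, a few characters (for example curly quotes) may be decoded incorrectly, so it is worth spot-checking some rows after loading.</p> <p>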
Pass the file's encoding to <code>pd.read_csv()</code> through its <code>encoding</code> parameter:</p> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df = pd.read_csv("/Users/mohamedthoufeeq/Downloads/DisneylandReviews.csv", encoding='ISO-8859-1')</pre> <p>The argument <code>encoding='ISO-8859-1'</code> tells pandas that the file uses the ISO-8859-1 (Latin-1) encoding.</p> <p>Next, create a word cloud using the <code>WordCloud</code> Python library.</p> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">wordcloud = WordCloud().generate(df['Review_Text'])</pre> <p>In the code above, <code>WordCloud()</code> creates a word cloud object, and its <code>generate()</code> method builds the word cloud from the text you pass in. </p> <p>The column we are interested in is <code>Review_Text</code>, which contains the reviews of Disneyland; its words are the ones we want to appear in the word cloud. Note, however, that <code>generate()</code> expects a single string of text, not a <a href="https://blog.finxter.com/python-lists/" data-type="post" data-id="7332" target="_blank" rel="noreferrer noopener">list</a> or a column of strings.</p> <p>Go ahead and run the code.</p> <p>You get another error:</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">TypeError: expected string or bytes-like object</pre> <p>The <code>TypeError</code> means that <code>generate()</code> expects a string or a bytes-like object. 
However, <code>df['Review_Text']</code> is a pandas Series, not a string.</p> <p>To solve this, join all the reviews into one string before passing them to <code>generate()</code>:</p> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">wordcloud = WordCloud().generate(' '.join(df['Review_Text']))</pre> <p>Here, <code>' '.join()</code> concatenates all the reviews in the Series into one long string, separating them with spaces.</p> <p>Next, draw the word cloud with Matplotlib:</p> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">plt.imshow(wordcloud)</pre> <p>The <code>plt.imshow()</code> call draws the word cloud as a 2D image on the current plot.</p> <p>Then hide the axes with the following command:</p> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">plt.axis("off")</pre> <p>The <code>"off"</code> argument hides the axis lines, ticks, and labels.</p> <p>Finally, the following command displays the word cloud image:</p> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">plt.show()</pre> <p>When you run the program, you will see a word cloud image like the one below:</p> <div class="wp-block-image"> <figure class="aligncenter"><img src="https://lh6.googleusercontent.com/dDqcONnxLEzoLmOdDevE606G1RaapwC6yleDjDH0k7CzA0Hy-DoO6Qfk6Q5bjj8vvZCM6H3054jjg2Hr6QgKZ5MafjwnyIc4FiaGXpHHcx2s4FAaN3eZTQQaOh79nmXkkqjFm12T" alt=""/><figcaption><em>Figure 1. 
</em></figcaption></figure> </div> <p>The word <code>"Park"</code> is drawn large, which means that it appears often in the reviews.</p> <p>But words such as <code>"Disneyland"</code>, <code>"went"</code>, <code>"will"</code>, <code>"park"</code>, <code>"go"</code>, <code>"day"</code>, and <code>"One"</code> are not useful for our analysis.</p> <p>We can exclude them from the word cloud by adding them to <code>STOPWORDS</code> and passing that set to the <code>stopwords</code> parameter:</p> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">STOPWORDS.update(['Disneyland', 'went', 'will', 'go', 'park', 'day', 'one'])
wordcloud = WordCloud(stopwords=STOPWORDS).generate(' '.join(df['Review_Text']))</pre> <p><code>STOPWORDS</code> is the set of common English words (such as "the" and "and") that comes with the <code>wordcloud</code> library. Passing it through the <code>stopwords</code> parameter tells <code>WordCloud</code> to remove all of these words, including the ones we just added, from the text before creating the word cloud.</p> <p>Now re-run the program, and you will get the following word cloud image.</p> <div class="wp-block-image"> <figure class="aligncenter"><img src="https://lh5.googleusercontent.com/AIwYU1SNe_8BqUUNG7TI0fH0o0vnm-lGgnz5A7kOaWc570rk2FrD00qB1X043vax4loe9iceeuhsBGl_CpKTQDilSrUFJXW6Whct67xib-6ulKrnA3s1i_-ppGYoCOiZq8EBszlE" alt=""/><figcaption><em>Figure 2. </em></figcaption></figure> </div> <p>Before we analyze the words, let us see how to customize the appearance of the word cloud, for example its font sizes and background color.</p> <p>The maximum font size can be set with the <code>max_font_size</code> option, and the minimum font size with the <code>min_font_size</code> option. 
The background color of the word cloud can be set with the <code>background_color</code> option.</p> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">wordcloud = WordCloud(min_font_size=10, max_font_size=70, stopwords=STOPWORDS, background_color="white").generate(' '.join(df['Review_Text']))</pre> <p>This code limits the font size to between 10 and 70 and sets the background color to white.</p> <p>Re-run the program, and you will get the following word cloud image.</p> <div class="wp-block-image"> <figure class="aligncenter"><img src="https://lh3.googleusercontent.com/ZN1j-Jeok_ZbM864_7qM44wG_rJTRmUlCPwL_4kSxCQqfSOkMR7Ane-k0oWIXtN03NF2lJcOLD9OMe47szW9VZo9o9LLsFms0bnjZgrbHXRku9ZSBjp6w7Gt5mi3n0wmPm91YD8N" alt=""/><figcaption><em>Figure 3. </em></figcaption></figure> </div> <p>You can also set the maximum number of words shown in the word cloud with the <code>max_words</code> parameter (the default is 200).</p> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">wordcloud = WordCloud(min_font_size=5, max_font_size=100, max_words=1000, stopwords=STOPWORDS, background_color="white").generate(' '.join(df['Review_Text']))</pre> <p>The code above raises the maximum number of words in the word cloud to 1000 and changes the minimum and maximum font sizes to 5 and 100.</p> <p>Re-run the program, and you will get the following word cloud.</p> <div class="wp-block-image"> <figure class="aligncenter"><img src="https://lh4.googleusercontent.com/hC_gXcLfCu2_jVCI1d89uD6Gt-GEn56eFUKTHG9zUr5TBAJiuMdHPrT80YN0tIwHYzqu5twX_IbU0PRxawt2BnHQXFdP-lgAdkyGXZ_wJQcKEquHKyHzbb8kSnZZX6pFpZnWI93h" alt=""/><figcaption><em>Figure 4. 
</em></figcaption></figure> </div> <p>As you can see, with up to 1000 words, the word cloud shows many more words, and the words that appear most often in the reviews are still drawn in the largest size. </p> <p>This makes it easy to find out which words are prominent. In this word cloud, you can see that <code>"ride"</code> is the largest word.</p> <p>You can also set the width and height of the word cloud image (in pixels).</p> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">wordcloud = WordCloud(width=350, height=350, min_font_size=5, max_font_size=100, max_words=1000, stopwords=STOPWORDS, background_color="white").generate(' '.join(df['Review_Text']))</pre> <p>The code above sets both the width and the height of the word cloud to 350 pixels.</p> <p>Re-run the program, and you will get the following word cloud image.</p> <div class="wp-block-image"> <figure class="aligncenter"><img src="https://lh3.googleusercontent.com/bTJyjmnwhWvtymj6pKB8DlGTXOKPNZl_bbnM3--78ddmz5HdCSm76q1YmFaivDaVHFRCqSvHl4Ax9p74kxkRnzW9Rjrv_dYnAgAEfQVCBG1xWta80ZAsmatcw7-M1_sItvpYK-vE" alt=""/><figcaption><em>Figure 5. </em></figcaption></figure> </div> <p>Now let’s analyze the word cloud to get some insights.</p> <p>The word <code>"ride"</code> appears large in the word cloud because it is the most frequent word in the text. This suggests that the rides are a highlight for many visitors to Disneyland. </p> <p>Next, the word <code>"attraction"</code> is also prominent, which shows that reviewers often write about the rides and attractions in Disneyland. </p> <p>Also, the word <code>"time"</code> appears frequently, which may indicate that people spend a lot of time in Disneyland. </p> <p>According to the reviews, the Disneyland staff were very friendly, which is reflected in the word cloud: the word <code>"nice"</code> appears frequently. 
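</p> <p>Impressions like these are easier to trust if you also look at the actual counts. The sketch below counts words with Python’s standard library. The three reviews and the small stopword set are made-up examples, not taken from the dataset, and the simple tokenizer only approximates what <code>WordCloud</code> does internally (for instance, <code>WordCloud</code> merges simple plurals such as "rides" and "ride" by default, while this sketch counts them separately).</p> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">
from collections import Counter
import re

# Hypothetical sample reviews (with the real data, use ' '.join(df['Review_Text']) instead)
reviews = [
    "The staff were so nice and the rides were fun",
    "Long lines for every ride, but the staff were nice",
    "Nice park, long queue times",
]

# Small, made-up stopword set standing in for wordcloud's STOPWORDS
stopwords = {"the", "and", "for", "every", "but", "were", "so", "park"}

# Join the reviews into one string, lower-case it, and split it into words
text = " ".join(reviews)
words = re.findall(r"[a-z']+", text.lower())

# Count every word that is not a stopword
counts = Counter(word for word in words if word not in stopwords)

top_words = counts.most_common(3)
print(top_words)  # [('nice', 3), ('staff', 2), ('long', 2)]
</pre> <p>With the library itself, <code>WordCloud(stopwords=STOPWORDS).process_text(text)</code> returns a similar dictionary of word counts, which you can sort in the same way to see the exact numbers behind the word sizes.</p> <p>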
The reviews also mention long queues and waiting times, which is reflected in the word cloud as well. </p> <p>The words <code>"lines"</code> and <code>"queue"</code> are also prominent in the text. </p> <p>In contrast, the word <code>"hotel"</code> is much less prominent. This may suggest that fewer reviewers stayed at a hotel and that many went home after spending the whole day in Disneyland, although the word cloud alone cannot confirm this. </p> <p class="has-global-color-8-background-color has-background"><img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f4ac.png" alt="💬" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Exercise</strong>: You can get more insights by analyzing the word cloud data. Try it out!</p> <h2>Summary</h2> <p>Word clouds are a great way to summarize large bodies of text or get a feel for a document’s sentiment, and they can be used for many other purposes. </p> <p>This blog post showed how to create word clouds using the Python library <code>WordCloud</code>. </p> <p>We also discussed how to customize the appearance of the word cloud and analyzed the word cloud to get insights into the text. </p> <p>What do you use?</p> <hr class="wp-block-separator"/> </div> https://www.sickgaming.net/blog/2022/04/09/how-to-create-word-clouds-using-python/