{"id":123709,"date":"2022-04-08T18:16:17","date_gmt":"2022-04-08T18:16:17","guid":{"rendered":"https:\/\/blog.finxter.com\/?p=292393"},"modified":"2022-04-08T18:16:17","modified_gmt":"2022-04-08T18:16:17","slug":"how-to-color-a-scatter-plot-by-category-using-matplotlib-in-python","status":"publish","type":"post","link":"https:\/\/sickgaming.net\/blog\/2022\/04\/08\/how-to-color-a-scatter-plot-by-category-using-matplotlib-in-python\/","title":{"rendered":"How to Color a Scatter Plot by Category using Matplotlib in Python"},"content":{"rendered":"<h2>Problem Formulation<\/h2>\n<p>Given three arrays:<\/p>\n<ul>\n<li>The first two arrays <code>x<\/code> and <code>y<\/code> of length <code>n<\/code> contain the <code>(x_i, y_i)<\/code> data of a 2D coordinate system.<\/li>\n<li>The third array <code>c<\/code> provides categorical label information so we essentially get <code>n<\/code> data bundles <code>(x_i, y_i, c_i)<\/code> for an arbitrary number of categories <code>c_i<\/code>. <\/li>\n<\/ul>\n<p class=\"has-global-color-8-background-color has-background\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/13.1.0\/72x72\/1f4ac.png\" alt=\"\ud83d\udcac\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\" \/> <strong>Question<\/strong>: How to plot the data so that <code>(x_i, y_i)<\/code> and <code>(x_j, y_j)<\/code> with the same category <code>c_i == c_j<\/code> have the same color?<\/p>\n<h2>Solution: Use Pandas groupby() and Call plt.plot() Separately for Each Group<\/h2>\n<p class=\"has-global-color-8-background-color has-background\">To plot data by category, you iterate over all groups separately by using the <code>data.groupby()<\/code> operation. 
For each group, you execute the <code>plt.plot()<\/code> operation to plot only the data in the group. Because each <code>plt.plot()<\/code> call takes the next color from Matplotlib&#8217;s default color cycle, every group automatically gets its own color.<\/p>\n<p>In particular, you perform the following steps:<\/p>\n<ol>\n<li>Call <code>data.groupby(\"Category\")<\/code>, assuming that <code>data<\/code> is a Pandas DataFrame containing the <code>X<\/code>, <code>Y<\/code>, and <code>Category<\/code> columns for <em>n<\/em> data points (rows).<\/li>\n<li>Iterate over all <code>(name, group)<\/code> tuples in the grouping result obtained from step one.<\/li>\n<li>Use <code>plt.plot(group[\"X\"], group[\"Y\"], marker=\"o\", linestyle=\"\", label=name)<\/code> to plot each group separately, using the group&#8217;s <code>X<\/code> and <code>Y<\/code> columns as data and <code>name<\/code> as the legend label. Setting <code>linestyle=\"\"<\/code> suppresses the connecting line, so only the markers are drawn.<\/li>\n<\/ol>\n<p>Here&#8217;s what that looks like in code:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"13-15\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import pandas as pd\nimport matplotlib.pyplot as plt\n\n# Generate the categorical data\nx = [1, 2, 3, 4, 5, 6]\ny = [42, 41, 40, 39, 38, 37]\nc = ['a', 'b', 'a', 'b', 'b', 'a']\n\ndata = pd.DataFrame({\"X\": x, \"Y\": y, \"Category\": c})\nprint(data)\n\n# Plot data by category\ngroups = data.groupby(\"Category\")\nfor name, group in groups:\n    plt.plot(group[\"X\"], group[\"Y\"], marker=\"o\", linestyle=\"\", label=name)\n\nplt.legend()\nplt.show()<\/pre>\n<p>Before I show you how the resulting plot looks, allow me to show you the data output from the <code><a href=\"https:\/\/blog.finxter.com\/python-print\/\" data-type=\"post\" data-id=\"20731\" target=\"_blank\" rel=\"noreferrer noopener\">print()<\/a><\/code> function. 
Here&#8217;s the output of the categorical data:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">   X   Y Category\n0  1  42        a\n1  2  41        b\n2  3  40        a\n3  4  39        b\n4  5  38        b\n5  6  37        a<\/pre>\n<p>Now, what does the plot colored by category look like? Here it is:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"727\" height=\"540\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/04\/image-61.png\" alt=\"\" class=\"wp-image-292400\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/04\/image-61.png 727w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2022\/04\/image-61-300x223.png 300w\" sizes=\"auto, (max-width: 727px) 100vw, 727px\" \/><\/figure>\n<\/div>\n<p>If you want to learn more about Matplotlib, feel free to check out our full blog tutorial series:<\/p>\n<ul>\n<li><a rel=\"noreferrer noopener\" href=\"https:\/\/blog.finxter.com\/matplotlib-full-guide\/\" data-type=\"URL\" data-id=\"https:\/\/blog.finxter.com\/matplotlib-full-guide\/\" target=\"_blank\">Python Matplotlib Full Guide<\/a><\/li>\n<li><a href=\"https:\/\/blog.finxter.com\/best-matplotlib-cheat-sheet\/\" data-type=\"post\" data-id=\"22273\" target=\"_blank\" rel=\"noreferrer noopener\">Matplotlib Cheat Sheets<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Problem Formulation Given three arrays: The first two arrays x and y of length n contain the (x_i, y_i) data of a 2D coordinate system. The third array c provides categorical label information so we essentially get n data bundles (x_i, y_i, c_i) for an arbitrary number of categories c_i. 
Question: How to plot the [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[857],"tags":[73,468,528],"class_list":["post-123709","post","type-post","status-publish","format-standard","hentry","category-python-tut","tag-programming","tag-python","tag-tutorial"],"_links":{"self":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/123709","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/comments?post=123709"}],"version-history":[{"count":0,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/123709\/revisions"}],"wp:attachment":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/media?parent=123709"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/categories?post=123709"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/tags?post=123709"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}