[Tut] Python List of Lists Group By – A Simple Illustrated Guide - xSicKxBot - 05-11-2020 <div><figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"> <div class="wp-block-embed__wrapper"> <div class="ast-oembed-container"><iframe title="Python List of Lists Group By – A Simple Illustrated Guide" width="1400" height="788" src="https://www.youtube.com/embed/NaItYmuwPcs?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></div> </div> </figure> <p>This tutorial shows you how to group the inner <a rel="noreferrer noopener" href="https://blog.finxter.com/python-lists/" target="_blank">lists</a> of a Python <a href="https://blog.finxter.com/python-list-of-lists/">list of lists</a> by a common element. There are three basic methods:</p> <ol> <li>Group the inner lists together by common element.</li> <li>Group the inner lists together by common element and aggregate them (e.g. by averaging).</li> <li>Group the inner lists together by common element and aggregate them (e.g. by averaging) using the external <a href="https://pandas.pydata.org/" target="_blank" rel="noreferrer noopener">Pandas</a> library.</li> </ol> <p>Before we explore these three options in more detail, here is the quick solution using the Pandas library in our interactive shell:</p> <figure><iframe src="https://repl.it/@finxter/pandaslistoflistsgroupby?lite=true" allowfullscreen="true" width="100%" height="1000px"></iframe></figure> <p>You can run this code in your browser. 
If you want to learn about the <a href="https://blog.finxter.com/python-crash-course/" target="_blank" rel="noreferrer noopener">Pythonic</a> alternatives or you need a few more explanations, then read on!</p> <h2>Method 1: Group List of Lists By Common Element in Dictionary</h2> <p><strong>Problem</strong>: Given a <a href="https://blog.finxter.com/python-list-of-lists/" target="_blank" rel="noreferrer noopener">list of lists</a>, group the inner lists by a common element and store the result in a <a href="https://blog.finxter.com/python-dictionary/" target="_blank" rel="noreferrer noopener">dictionary</a> (key = common element).</p> <figure class="wp-block-image size-large is-resized"><img src="https://blog.finxter.com/wp-content/uploads/2020/05/groupby-1024x576.jpg" alt="" class="wp-image-8405" width="768" height="432" srcset="https://blog.finxter.com/wp-content/uploads/2020/05/groupby-scaled.jpg 1024w, https://blog.finxter.com/wp-content/uploads/2020/05/groupby-300x169.jpg 300w, https://blog.finxter.com/wp-content/uploads/2020/05/groupby-768x432.jpg 768w" sizes="(max-width: 768px) 100vw, 768px" /></figure> <p><strong>Example</strong>: Say you’ve got a database with multiple rows (the list of lists) where each row consists of three attributes: Name, Age, and Income. You want to group by Name and store the result in a dictionary. The dictionary keys are given by the Name attribute. 
The dictionary values are lists of the rows that have this exact Name attribute.</p> <p><strong>Solution</strong>: Here’s the data and how you can group by a common attribute (e.g., Name).</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Database:
# row = [Name, Age, Income]
rows = [['Alice', 19, 45000],
        ['Bob', 18, 22000],
        ['Ann', 26, 88000],
        ['Alice', 33, 118000]]

# Create a dictionary grouped by Name
d = {}
for row in rows:

    # Add name to dict if not exists
    if row[0] not in d:
        d[row[0]] = []

    # Add all non-Name attributes as a new list
    d[row[0]].append(row[1:])

print(d)
# {'Alice': [[19, 45000], [33, 118000]],
#  'Bob': [[18, 22000]],
#  'Ann': [[26, 88000]]}</pre> <p>You can see that the result is a dictionary with one key per name (<code>'Alice'</code>, <code>'Bob'</code>, and <code>'Ann'</code>). Alice appears in two rows of the original database (list of lists). Thus, you associate two rows with her name, keeping only the Age and Income attributes per row.</p> <p>The strategy to accomplish this is simple:</p> <ul> <li>Create the empty dictionary.</li> <li>Go over each row in the list of lists. The first value of the row list is the Name attribute.</li> <li>Add the Name attribute <code>row[0]</code> to the dictionary if it doesn’t exist yet, initializing its value to an empty list. 
Now, you can be sure that the key exists in the dictionary.</li> <li>Append the <a rel="noreferrer noopener" href="https://blog.finxter.com/introduction-to-slicing-in-python/" target="_blank">sublist slice</a> <code>[Age, Income]</code> to the dictionary value so that this becomes a list of lists as well, with one list per database row.</li> <li>You’ve now grouped all database entries by a common attribute (=Name).</li> </ul> <p>So far, so good. But what if you want to perform some aggregation on the grouped database rows?</p> <h2>Method 2: Group List of Lists By Common Element and Aggregate Grouped Elements</h2> <p><strong>Problem</strong>: In the previous example, you’ve seen that each dictionary value is a list of lists because you store each row as a separate list. But what if you want to aggregate all grouped rows?</p> <p><strong>Example</strong>: The dictionary entry for the key <code>'Alice'</code> may be <code>[[19, 45000], [33, 118000]]</code> but you want to <a href="https://blog.finxter.com/how-to-average-a-list-of-lists-in-python/" target="_blank" rel="noreferrer noopener">average</a> the age and income values: <code>[(19+33)/2, (45000+118000)/2]</code>. How do you do that?</p> <p><strong>Solution</strong>: The solution is simply to add one post-processing step after the above code to aggregate all attributes using the <a rel="noreferrer noopener" href="https://blog.finxter.com/zip-unzip-python/" target="_blank"><code>zip()</code> function</a> as follows. 
Note that this is the exact same code as before (without aggregation) with three lines added at the end to aggregate the list of lists for each grouped Name into a single <a href="https://blog.finxter.com/python-list-average/" target="_blank" rel="noreferrer noopener">average</a> value.</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Database:
# row = [Name, Age, Income]
rows = [['Alice', 19, 45000],
        ['Bob', 18, 22000],
        ['Ann', 26, 88000],
        ['Alice', 33, 118000]]

# Create a dictionary grouped by Name
d = {}
for row in rows:

    # Add name to dict if not exists
    if row[0] not in d:
        d[row[0]] = []

    # Add all non-Name attributes as a new list
    d[row[0]].append(row[1:])

print(d)
# {'Alice': [[19, 45000], [33, 118000]],
#  'Bob': [[18, 22000]],
#  'Ann': [[26, 88000]]}

# AGGREGATION FUNCTION:
for key in d:
    d[key] = [sum(x) / len(x) for x in zip(*d[key])]

print(d)
# {'Alice': [26.0, 81500.0], 'Bob': [18.0, 22000.0], 'Ann': [26.0, 88000.0]}
</pre> <p>In the code, you use the aggregation function <code>sum(x) / len(x)</code> to calculate the average value for each attribute of the grouped rows. 
But you can replace this part with your own aggregation function such as <a href="https://blog.finxter.com/how-to-calculate-weighted-average-numpy-array-along-axis/" target="_blank" rel="noreferrer noopener">average</a>, <a href="https://blog.finxter.com/how-to-get-the-variance-of-a-list-in-python/" target="_blank" rel="noreferrer noopener">variance</a>, <a href="https://blog.finxter.com/python-list-length-whats-the-runtime-complexity-of-len/" target="_blank" rel="noreferrer noopener">length</a>, <a href="https://blog.finxter.com/how-to-get-the-key-with-minimum-value-in-a-python-dictionary/" target="_blank" rel="noreferrer noopener">minimum</a>, <a href="https://blog.finxter.com/how-to-get-the-key-with-the-maximum-value-in-a-dictionary/" target="_blank" rel="noreferrer noopener">maximum</a>, etc.</p> <p><strong>Explanation</strong>: </p> <ul> <li>You go over each key in the <a href="https://blog.finxter.com/python-dictionary/" target="_blank" rel="noreferrer noopener">dictionary</a> (the Name attribute) and aggregate the list of lists into a flat list of averaged attributes.</li> <li>You zip the attributes together. For example, <code>zip(*d['Alice'])</code> yields the tuples <code>(19, 33)</code> and <code>(45000, 118000)</code>, one per attribute.</li> <li>You iterate over each tuple <code>x</code> of this zipped sequence in the <a rel="noreferrer noopener" href="https://blog.finxter.com/list-comprehension/" target="_blank">list comprehension</a> statement.</li> <li>You aggregate the grouped attributes using your own custom function (e.g. <code>sum(x) / len(x)</code> to average the attribute values). 
</li> </ul> <p>See what happens in this code snippet in this interactive memory visualization tool (by clicking “Next”):</p> <p> <iframe width="800" height="500" frameborder="0" src="https://pythontutor.com/iframe-embed.html#code=%23%20Database%3A%0A%23%20row%20%3D%20%5BName,%20Age,%20Income%5D%0Arows%20%3D%20%5B%5B'Alice',%2019,%2045000%5D,%0A%20%20%20%20%20%20%20%20%5B'Bob',%2018,%2022000%5D,%0A%20%20%20%20%20%20%20%20%5B'Ann',%2026,%2088000%5D,%0A%20%20%20%20%20%20%20%20%5B'Alice',%2033,%20118000%5D%5D%0A%0A%0A%23%20Create%20a%20dictionary%20grouped%20by%20Name%0Ad%20%3D%20%7B%7D%0Afor%20row%20in%20rows%3A%0A%0A%20%20%20%20%23%20Add%20name%20to%20dict%20if%20not%20exists%0A%20%20%20%20if%20row%5B0%5D%20not%20in%20d%3A%0A%20%20%20%20%20%20%20%20d%5Brow%5B0%5D%5D%20%3D%20%5B%5D%0A%0A%20%20%20%20%23%20Add%20all%20non-Name%20attributes%20as%20a%20new%20list%0A%20%20%20%20d%5Brow%5B0%5D%5D.append%28row%5B1%3A%5D%29%0A%0A%23%20AGGREGATION%20FUNCTION%3A%0Afor%20key%20in%20d%3A%0A%20%20%20%20d%5Bkey%5D%20%3D%20%5Bsum%28x%29%20/%20len%28x%29%20for%20x%20in%20zip%28*d%5Bkey%5D%29%5D%0A%0Aprint%28d%29%0A&codeDivHeight=400&codeDivWidth=350&cumulative=false&curInstr=0&heapPrimitives=nevernest&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false"> </iframe> </p> <h2>Method 3: Pandas GroupBy</h2> <p>The <a href="https://blog.finxter.com/pandas-cheat-sheets/" target="_blank" rel="noreferrer noopener">Pandas library</a> has its own powerful implementation of the <a rel="noreferrer noopener" href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html" target="_blank">groupby() function</a>. 
Have a look at the code first:</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd

# Database:
# row = [Name, Age, Income]
rows = [['Alice', 19, 45000],
        ['Bob', 18, 22000],
        ['Ann', 26, 88000],
        ['Alice', 33, 118000]]

df = pd.DataFrame(rows)
print(df)
'''
       0   1       2
0  Alice  19   45000
1    Bob  18   22000
2    Ann  26   88000
3  Alice  33  118000
'''

print(df.groupby([0]).mean())
'''
        1      2
0
Alice  26  81500
Ann    26  88000
Bob    18  22000
'''
# Note: recent pandas versions may print these means as floats (26.0, 81500.0).</pre> <p><strong>Explanation</strong>:</p> <ul> <li>Import the pandas library. <a href="https://blog.finxter.com/pandas-cheat-sheets/" target="_blank" rel="noreferrer noopener">Find your quick refresher cheat sheets here.</a></li> <li>Create a DataFrame object from the rows. Think of it as an Excel spreadsheet in your code (with numbered rows and columns).</li> <li>Call the <code>groupby()</code> method on your DataFrame. Use the column index <code>[0]</code> (which is the Name attribute) to group your data. This creates a <code>DataFrameGroupBy</code> object.</li> <li>On the <code>DataFrameGroupBy</code> object, call the <code>mean()</code> method or any other aggregation method you want.</li> <li>The result is the “spreadsheet” with grouped Name attributes where multiple rows with the same Name attribute are averaged (element-wise).</li> </ul> <h2>Where to Go From Here?</h2> <p>Enough theory, let’s get some practice!</p> <p>To become successful in coding, you need to get out there and solve real problems for real people. That’s how you can become a six-figure earner easily. And that’s how you polish the skills you really need in practice. 
After all, what’s the use of learning theory that nobody ever needs?</p> <p><strong>Practice projects are how you sharpen your saw in coding!</strong></p> <p>Do you want to become a code master by focusing on practical code projects that actually earn you money and solve problems for people?</p> <p>Then become a Python freelance developer! It’s the best way of approaching the task of improving your Python skills, even if you are a complete beginner.</p> <p>Join my free webinar <a rel="noreferrer noopener" href="https://blog.finxter.com/webinar-freelancer/" target="_blank">“How to Build Your High-Income Skill Python”</a> and watch how I grew my coding business online and how you can, too, from the comfort of your own home.</p> <p><a href="https://blog.finxter.com/webinar-freelancer/" target="_blank" rel="noreferrer noopener">Join the free webinar now!</a></p> </div> https://www.sickgaming.net/blog/2020/05/10/python-list-of-lists-group-by-a-simple-illustrated-guide/ |
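<p>As a supplement to Methods 1 and 2 above, the grouping and aggregation steps can also be sketched with the standard library alone. This is only one possible variant, not the tutorial's own code: it swaps the manual "add key if missing" check for <code>collections.defaultdict</code> and the <code>sum(x) / len(x)</code> expression for <code>statistics.mean</code>. The helper names <code>group_rows</code> and <code>aggregate_groups</code> and the <code>key_index</code> and <code>func</code> parameters are hypothetical, chosen here purely for illustration.</p>

```python
from collections import defaultdict
from statistics import mean


def group_rows(rows, key_index=0):
    """Group rows by the value at key_index (hypothetical helper)."""
    groups = defaultdict(list)  # missing keys start out as an empty list
    for row in rows:
        # Keep every attribute except the grouping key itself.
        rest = row[:key_index] + row[key_index + 1:]
        groups[row[key_index]].append(rest)
    return dict(groups)


def aggregate_groups(groups, func=mean):
    """Apply func to each attribute column of every group (hypothetical helper)."""
    return {key: [func(column) for column in zip(*grouped)]
            for key, grouped in groups.items()}


# Same sample data as in the tutorial: row = [Name, Age, Income]
rows = [['Alice', 19, 45000],
        ['Bob', 18, 22000],
        ['Ann', 26, 88000],
        ['Alice', 33, 118000]]

grouped = group_rows(rows)                     # Method 1: group only
averaged = aggregate_groups(grouped)           # Method 2: average per attribute
maxima = aggregate_groups(grouped, func=max)   # any other aggregator works too

print(grouped)
print(averaged)
print(maxima)
```

<p>Passing the aggregator as an argument keeps the grouping step separate from the aggregation step, so the same grouped dictionary can be reused for averages, maxima, or any other function that accepts a sequence of values.</p>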