Create an account


Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
[Tut] Python – Finding the Most Common Element in a Column

#1
Python – Finding the Most Common Element in a Column

<div>
<div class="kk-star-ratings kksr-auto kksr-align-left kksr-valign-top" data-payload="{&quot;align&quot;:&quot;left&quot;,&quot;id&quot;:&quot;644227&quot;,&quot;slug&quot;:&quot;default&quot;,&quot;valign&quot;:&quot;top&quot;,&quot;ignore&quot;:&quot;&quot;,&quot;reference&quot;:&quot;auto&quot;,&quot;class&quot;:&quot;&quot;,&quot;count&quot;:&quot;1&quot;,&quot;readonly&quot;:&quot;&quot;,&quot;score&quot;:&quot;5&quot;,&quot;best&quot;:&quot;5&quot;,&quot;gap&quot;:&quot;5&quot;,&quot;greet&quot;:&quot;Rate this post&quot;,&quot;legend&quot;:&quot;5\/5 - (1 vote)&quot;,&quot;size&quot;:&quot;24&quot;,&quot;width&quot;:&quot;142.5&quot;,&quot;_legend&quot;:&quot;{score}\/{best} - ({count} {votes})&quot;,&quot;font_factor&quot;:&quot;1.25&quot;}">
<div class="kksr-stars">
<div class="kksr-stars-inactive">
<div class="kksr-star" data-star="1" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" data-star="2" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" data-star="3" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" data-star="4" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" data-star="5" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
</p></div>
<div class="kksr-stars-active" style="width: 142.5px;">
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
</p></div>
</div>
<div class="kksr-legend" style="font-size: 19.2px;"> 5/5 – (1 vote) </div>
</div>
<h2 class="wp-embed-aspect-16-9 wp-has-aspect-ratio">Problem Formulation and Solution Overview</h2>
<p class="wp-embed-aspect-16-9 wp-has-aspect-ratio">This article will show you how to find the most common element in a Pandas Column.</p>
<p>To make it more interesting, we have the following running scenario:</p>
<p><em>You have been provided with a <a rel="noreferrer noopener" href="https://blog.finxter.com/wp-content/uploads/2022/02/crimes.csv" data-type="URL" data-id="https://blog.finxter.com/wp-content/uploads/2022/02/crimes.csv" target="_blank">downloadable</a> CSV file containing crime statistics for the San Diego area, including their respective <a rel="noreferrer noopener" href="https://secure.ssa.gov/poms.nsf/lnx/0202613900" data-type="URL" data-id="https://secure.ssa.gov/poms.nsf/lnx/0202613900" target="_blank">NCIC</a> Crime Codes. </em></p>
<hr class="wp-block-separator has-alpha-channel-opacity wp-embed-aspect-16-9 wp-has-aspect-ratio"/>
<p class="wp-embed-aspect-16-9 wp-has-aspect-ratio has-global-color-8-background-color has-background"><em><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f4ac.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Question</strong>: How would you determine the most common <a rel="noreferrer noopener" href="https://secure.ssa.gov/poms.nsf/lnx/0202613900" data-type="URL" data-id="https://secure.ssa.gov/poms.nsf/lnx/0202613900" target="_blank">NCIC</a> Crime Code that occurs in San Diego’s jurisdiction?</em></p>
<p class="wp-embed-aspect-16-9 wp-has-aspect-ratio">We can accomplish this task by one of the following options:</p>
<ul type="video" class="wp-embed-aspect-16-9 wp-has-aspect-ratio">
<li><strong>Method </strong>1: Use Pandas <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-dataframe-mode-method/" data-type="URL" data-id="https://blog.finxter.com/pandas-dataframe-mode-method/" target="_blank"><code>mode()</code></a></li>
<li><strong>Method </strong>2: Use Pandas <a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-count-occurrences-of-elements-in-pandas/" data-type="URL" data-id="https://blog.finxter.com/how-to-count-occurrences-of-elements-in-pandas/" target="_blank"><code>value_counts()</code></a></li>
<li><strong>Method </strong>3: Use <a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-count-occurrences-of-elements-in-pandas/" data-type="URL" data-id="https://blog.finxter.com/how-to-count-occurrences-of-elements-in-pandas/" target="_blank"><code>value_counts()</code></a> and <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-dataframe-methods-idxmax-idxmin-reindex-reindex_like-rename-rename_axis/" data-type="URL" data-id="https://blog.finxter.com/pandas-dataframe-methods-idxmax-idxmin-reindex-reindex_like-rename-rename_axis/" target="_blank"><code>idxmax()</code></a></li>
<li><strong>Method 4</strong>: Use <a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-count-occurrences-of-elements-in-pandas/" data-type="URL" data-id="https://blog.finxter.com/how-to-count-occurrences-of-elements-in-pandas/" target="_blank"><code>value_counts()</code></a> and <a rel="noreferrer noopener" href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.keys.html" data-type="URL" data-id="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.keys.html" target="_blank"><code>keys()</code></a></li>
<li><strong>Method 5</strong>: Use Pandas <a rel="noreferrer noopener" href="https://blog.finxter.com/the-pandas-groupby-method/" data-type="URL" data-id="https://blog.finxter.com/the-pandas-groupby-method/" target="_blank"><code>groupby()</code></a></li>
</ul>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<h2 class="wp-embed-aspect-16-9 wp-has-aspect-ratio">Preparation</h2>
<p class="wp-embed-aspect-16-9 wp-has-aspect-ratio">Before moving forward, please ensure the <a href="https://blog.finxter.com/category/pandas-library/" data-type="URL" data-id="https://blog.finxter.com/category/pandas-library/" target="_blank" rel="noreferrer noopener">Pandas</a> library is installed. Click <a href="https://blog.finxter.com/how-to-install-pandas-in-python/" data-type="URL" data-id="https://blog.finxter.com/how-to-install-pandas-in-python/" target="_blank" rel="noreferrer noopener">here</a> if you require instructions.</p>
<p class="wp-embed-aspect-16-9 wp-has-aspect-ratio">Then, add the following code to the top of each script. This snippet will allow the code in this article to run error-free.</p>
<pre class="EnlighterJSRAW wp-embed-aspect-16-9 wp-has-aspect-ratio" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd</pre>
<p>After importing the <a rel="noreferrer noopener" href="https://blog.finxter.com/numpy-tutorial/" data-type="URL" data-id="https://blog.finxter.com/numpy-tutorial/" target="_blank"></a><a rel="noreferrer noopener" href="https://blog.finxter.com/category/pandas-library/" data-type="URL" data-id="https://blog.finxter.com/category/pandas-library/" target="_blank">Pandas</a> library, this library is referenced by calling the shortcode (<code>pd</code>).</p>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<h2>Method 1: Use Pandas mode()</h2>
<p class="has-global-color-8-background-color has-background">This example uses the <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-dataframe-mode-method/" target="_blank"><code>mode()</code></a> method to determine the single most common crime committed in San Diego on a given day.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="2" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df = pd.read_csv('crimes.csv', usecols=['crimedescr'])
max_crime = df['crimedescr'].mode()
print(max_crime)</pre>
<p>The above code reads in the <code>crimedescr </code>column from the <a rel="noreferrer noopener" href="https://blog.finxter.com/wp-content/uploads/2022/02/crimes.csv" data-type="URL" data-id="https://blog.finxter.com/wp-content/uploads/2022/02/crimes.csv" target="_blank">crimes.csv</a> file downloaded earlier. This saves to the DataFrame <code>df</code>.</p>
<p>Next, the <code>crimedescr </code>column is then accessed, and the <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-dataframe-mode-method/" data-type="URL" data-id="https://blog.finxter.com/pandas-dataframe-mode-method/" target="_blank"><code>mode()</code></a> method is appended. This method returns a value or set of values that appear most often along a selected axis. The results save to <code>max_crime</code>.</p>
<p>These results are output to the terminal.</p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td><code>0 10851(A)VC TAKE VEH W/O OWNER<br />Name: crimedescr, dtype: object</code></td>
</tr>
</tbody>
</table>
</figure>
<p>So, out of 7,854 rows of crimes committed on a given day for San Diego, the above offense was committed the highest number of times.</p>
<p>The above code only provides us with the name of the most common crime; what if we need the crime name and the respective count?</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="2-3" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df = pd.read_csv('crimes.csv', usecols=['crimedescr', 'ucr_ncic_code'])
max_crime = df['crimedescr'].mode()
max_count = df['ucr_ncic_code'].mode() print(max_crime)
print(max_count)</pre>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<p>The above code is output to the terminal and displays the following.</p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td><code>0 <strong>10851(A)VC TAKE VEH W/O OWNER</strong><br />Name: crimedescr, dtype: object<br />0 <strong>7000</strong><br />Name: ucr_ncic_code, dtype: int64</code></td>
</tr>
</tbody>
</table>
</figure>
<p>Now, you are equipped to return to your boss and tell them that 7,000 offenses of <strong>10851 (A) VC TAKE VEH W/O OWNER</strong> occurred on a given day in San Diego.</p>
<figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube"><a href="https://blog.finxter.com/python-finding-the-most-common-element-in-a-column/"><img src="https://blog.finxter.com/wp-content/plugins/wp-youtube-lyte/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2F-JKVy_HliQE%2Fhqdefault.jpg" alt="YouTube Video"></a><figcaption></figcaption></figure>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<h2>Method 2: Use value_counts()</h2>
<p class="has-global-color-8-background-color has-background">This example uses the <a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-count-occurrences-of-elements-in-pandas/" data-type="URL" data-id="https://blog.finxter.com/how-to-count-occurrences-of-elements-in-pandas/" target="_blank"><code>value_counts()</code></a> function to determine the top 5 most common crimes committed in San Diego on a given day.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="2" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df = pd.read_csv('crimes.csv', usecols=['crimedescr', 'ucr_ncic_code'])
top5_names = df['crimedescr'].value_counts()[:5].index.tolist()
print(top5_names)</pre>
<p>The above code reads in the <code>crimedescr</code> and <code>ucr_ncic_code </code>columns from the <a rel="noreferrer noopener" href="https://blog.finxter.com/wp-content/uploads/2022/02/crimes.csv" data-type="URL" data-id="https://blog.finxter.com/wp-content/uploads/2022/02/crimes.csv" target="_blank">crimes.csv</a> file downloaded earlier. This saves to the DataFrame <code>df</code>.</p>
<p>Then, the <code>crimedescr</code> column is accessed, and the <code><a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-dataframe-mode-method/" data-type="URL" data-id="https://blog.finxter.com/pandas-dataframe-mode-method/" target="_blank"></a><a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-count-occurrences-of-elements-in-pandas/" data-type="URL" data-id="https://blog.finxter.com/how-to-count-occurrences-of-elements-in-pandas/" target="_blank">value_counts()</a></code> function is appended. This function returns a series containing the counts of unique values. </p>
<p>However, since <a rel="noreferrer noopener" href="https://blog.finxter.com/introduction-to-slicing-in-python/" data-type="URL" data-id="https://blog.finxter.com/introduction-to-slicing-in-python/" target="_blank">slicing</a> is also appended (<code>[:5]</code>), only the top five (5) common crimes are retrieved and then converted to a <a rel="noreferrer noopener" href="https://blog.finxter.com/python-lists/" data-type="URL" data-id="https://blog.finxter.com/python-lists/" target="_blank">List</a>. The results save to <code>top5_names</code>.</p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td><code>['10851(A)VC TAKE VEH W/O OWNER', 'TOWED/STORED VEH-14602.6', '459 PC BURGLARY VEHICLE', 'TOWED/STORED VEHICLE', '459 PC BURGLARY RESIDENCE']</code></td>
</tr>
</tbody>
</table>
</figure>
<p>The above code only provides us with the names of the top 5 most common crimes; what if we need the names and their respective counts?</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="2" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df = pd.read_csv('crimes.csv', usecols=['crimedescr', 'ucr_ncic_code'])
top5 = df['crimedescr'].value_counts()[:5].sort_values(ascending=False)
print(top5)</pre>
<p>The above output is sent to the terminal.</p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td>10851(A)VC TAKE VEH W/O OWNER</td>
<td>653</td>
</tr>
<tr>
<td>TOWED/STORED VEH-14602.6</td>
<td>463</td>
</tr>
<tr>
<td>459 PC BURGLARY VEHICLE</td>
<td>462</td>
</tr>
<tr>
<td>TOWED/STORED VEHICLE</td>
<td>434</td>
</tr>
<tr>
<td>459 PC BURGLARY RESIDENCE</td>
<td>356</td>
</tr>
<tr>
<td>Name: crimedescr, dtype: int64</td>
<td></td>
</tr>
</tbody>
</table>
</figure>
<figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube"><a href="https://blog.finxter.com/python-finding-the-most-common-element-in-a-column/"><img src="https://blog.finxter.com/wp-content/plugins/wp-youtube-lyte/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2FD2ZueuWXST8%2Fhqdefault.jpg" alt="YouTube Video"></a><figcaption></figcaption></figure>
<p>A cleaner way to achieve the same results is to use the following code.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="2" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df = pd.read_csv('crimes.csv', usecols=['crimedescr', 'ucr_ncic_code'])
top5 = df['crimedescr'].value_counts().nlargest(5)
print(top5)</pre>
<p>The above code calls the <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-dataframe-nlargest-method/" data-type="URL" data-id="https://blog.finxter.com/pandas-dataframe-nlargest-method/" target="_blank"><code>nlargest()</code></a> method to determine and retrieve the top five (5) common crimes. The output is identical to the above.</p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td>10851(A)VC TAKE VEH W/O OWNER</td>
<td>653</td>
</tr>
<tr>
<td>TOWED/STORED VEH-14602.6</td>
<td>463</td>
</tr>
<tr>
<td>459 PC BURGLARY VEHICLE</td>
<td>462</td>
</tr>
<tr>
<td>TOWED/STORED VEHICLE</td>
<td>434</td>
</tr>
<tr>
<td>459 PC BURGLARY RESIDENCE</td>
<td>356</td>
</tr>
<tr>
<td>Name: crimedescr, dtype: int64</td>
<td></td>
</tr>
</tbody>
</table>
</figure>
<p>A much cleaner and more precise output to send to the boss!</p>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<h2>Method 3: Use value_counts() and idxmax()</h2>
<p class="has-global-color-8-background-color has-background">This example uses <a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-count-occurrences-of-elements-in-pandas/" data-type="URL" data-id="https://blog.finxter.com/how-to-count-occurrences-of-elements-in-pandas/" target="_blank"><code>value_counts()</code></a> and <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-dataframe-methods-idxmax-idxmin-reindex-reindex_like-rename-rename_axis/" target="_blank"><code>idxmax()</code></a> to determine the single most common crime committed in San Diego on a given day.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="2" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df = pd.read_csv('crimes.csv', usecols=['crimedescr', 'ucr_ncic_code'])
max_crime = df['crimedescr'].value_counts().idxmax()
print(max_crime)</pre>
<p>The above code reads in the <code>crimedescr</code> and <code>ucr_ncic_code </code>columns from the <a rel="noreferrer noopener" href="https://blog.finxter.com/wp-content/uploads/2022/02/crimes.csv" data-type="URL" data-id="https://blog.finxter.com/wp-content/uploads/2022/02/crimes.csv" target="_blank">crimes.csv</a> file downloaded earlier. This saves to the DataFrame <code>df</code>.</p>
<p>Then, the <code>crimedescr </code>column is accessed, and the <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-dataframe-mode-method/" data-type="URL" data-id="https://blog.finxter.com/pandas-dataframe-mode-method/" target="_blank"></a><a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-count-occurrences-of-elements-in-pandas/" data-type="URL" data-id="https://blog.finxter.com/how-to-count-occurrences-of-elements-in-pandas/" target="_blank"><code>value_counts()</code></a> function is appended. This function returns a series containing the count of unique values.</p>
<p>Next, <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-dataframe-methods-idxmax-idxmin-reindex-reindex_like-rename-rename_axis/" target="_blank"><code>idxmax()</code></a> is appended. This method returns the index of the first occurrence of the maximum index(es) over a selected axis. </p>
<p>The results save to <code>max_crime</code> and are output to the terminal.</p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td><code>10851(A)VC TAKE VEH W/O OWNER</code></td>
</tr>
</tbody>
</table>
</figure>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<h2>Method 4: Use value_counts() and keys()</h2>
<p class="has-global-color-8-background-color has-background">This example uses <a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-count-occurrences-of-elements-in-pandas/" target="_blank"><code>value_counts()</code></a> and <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-dataframe-methods-idxmax-idxmin-reindex-reindex_like-rename-rename_axis/" target="_blank"></a><a rel="noreferrer noopener" href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.keys.html" target="_blank"><code>keys()</code></a> to determine the top 5 most common crimes committed in unique grid areas of San Diego on a given day.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="2" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df = pd.read_csv('crimes.csv', usecols=['crimedescr', 'grid', 'ucr_ncic_code'])
top5_grids = df['grid'].value_counts().keys()[:5]
print(top5_grids)</pre>
<p>The above code reads in the <code>crimedescr</code>, <code>grid</code>, and the <code>ucr_ncic_code </code>columns from the <a rel="noreferrer noopener" href="https://blog.finxter.com/wp-content/uploads/2022/02/crimes.csv" data-type="URL" data-id="https://blog.finxter.com/wp-content/uploads/2022/02/crimes.csv" target="_blank">crimes.csv</a> file downloaded earlier. This saves to the DataFrame <code>df</code>.</p>
<p>Let’s break the highlighted line down. </p>
<p>If <code>df['grid'].value_counts()</code> was output to the terminal, the following would display (snippet). However, we have added a heading row to make it more understandable, and only five (5) rows are displayed.</p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td><strong>Grid #</strong></td>
<td><strong>Grid Total</strong></td>
</tr>
<tr>
<td>742</td>
<td>115</td>
</tr>
<tr>
<td>969</td>
<td>105</td>
</tr>
<tr>
<td>958 </td>
<td>100</td>
</tr>
<tr>
<td>564</td>
<td>80</td>
</tr>
<tr>
<td>1084</td>
<td>71</td>
</tr>
</tbody>
</table>
</figure>
<p>Next, the code <code>keys()[:5]</code> is appended. The final output displays as follows.</p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td><code>Int64Index([742, 969, 958, 564, 1084], dtype='int64')</code></td>
</tr>
</tbody>
</table>
</figure>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<h2>Method 5: Use groupby()</h2>
<p class="has-global-color-8-background-color has-background">This examples uses <a rel="noreferrer noopener" href="https://blog.finxter.com/the-pandas-groupby-method/" data-type="URL" data-id="https://blog.finxter.com/the-pandas-groupby-method/" target="_blank"><code>groupby()</code></a> to group our data on the Crime Code and displays the totals in descending order.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df = pd.read_csv('crimes.csv', usecols=['crimedescr', 'ucr_ncic_code']) res = (df.groupby(['ucr_ncic_code','crimedescr']).size() .sort_values(ascending=False) .reset_index(name='count'))
print(res)</pre>
<p>The above code reads in the <code>crimedescr</code> and the <code>ucr_ncic_code </code>columns from the <a rel="noreferrer noopener" href="https://blog.finxter.com/wp-content/uploads/2022/02/crimes.csv" data-type="URL" data-id="https://blog.finxter.com/wp-content/uploads/2022/02/crimes.csv" target="_blank">crimes.csv</a> file downloaded earlier. This saves to the DataFrame <code>df</code>.</p>
<p>Next, the <a rel="noreferrer noopener" href="https://blog.finxter.com/the-pandas-groupby-method/" data-type="URL" data-id="https://blog.finxter.com/the-pandas-groupby-method/" target="_blank"><code>groupby()</code></a> function is called and passed the first argument: <code>df.groupby(['ucr_ncic_code','crimedescr']).size()</code>. If this was output to the terminal at this point, the following would display (snippet).</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">print(df.groupby(['ucr_ncic_code','crimedescr']).size())</pre>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td>ucr_ncic_code</td>
<td>crimedescr</td>
<td></td>
</tr>
<tr>
<td>909 </td>
<td></td>
<td>2</td>
</tr>
<tr>
<td>999</td>
<td></td>
<td>1</td>
</tr>
<tr>
<td>197</td>
<td></td>
<td>1</td>
</tr>
<tr>
<td>664</td>
<td></td>
<td>1</td>
</tr>
<tr>
<td>1099</td>
<td></td>
<td>1</td>
</tr>
</tbody>
</table>
</figure>
<p>As you can see, the other arguments need to be added to turn this into something usable. Sorting the data in descending order and adding a count column will provide the results we are looking for. </p>
<p>If the original Method 5 code example was output to the terminal, the following would display.</p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td></td>
<td>ucr_ncic_code</td>
<td>crimedescr</td>
<td>count</td>
</tr>
<tr>
<td>0</td>
<td>2404</td>
<td>10851(A)VC TAKE VEH W/O OWNER</td>
<td>653</td>
</tr>
<tr>
<td>1</td>
<td>7000</td>
<td>TOWED/STORED VEH-14602.6</td>
<td>463</td>
</tr>
<tr>
<td>2</td>
<td>2299</td>
<td>459 PC BURGLARY VEHICLE</td>
<td>462</td>
</tr>
<tr>
<td>3</td>
<td>7000</td>
<td>TOWED/STORED VEHICLE</td>
<td>434</td>
</tr>
<tr>
<td>4</td>
<td>2204</td>
<td>459 PC BURGLARY RESIDENCE</td>
<td>356</td>
</tr>
</tbody>
</table>
</figure>
<figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube"><a href="https://blog.finxter.com/python-finding-the-most-common-element-in-a-column/"><img src="https://blog.finxter.com/wp-content/plugins/wp-youtube-lyte/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2FgUk4hMCuu2g%2Fhqdefault.jpg" alt="YouTube Video"></a><figcaption></figcaption></figure>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<h2>Summary</h2>
<p>This article has provided five (5) ways to find the most common element in a Panda Column. These examples should provide you with enough information to select the one that best meets your coding requirements.</p>
<p>Good Luck &amp; Happy Coding!</p>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<h2>Programming Humor – Python</h2>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" width="518" height="588" src="https://blog.finxter.com/wp-content/uploads/2022/07/image-65.png" alt="" class="wp-image-471102" srcset="https://blog.finxter.com/wp-content/uploads/2022/07/image-65.png 518w, https://blog.finxter.com/wp-content/uplo...64x300.png 264w" sizes="(max-width: 518px) 100vw, 518px" /><figcaption><em>“I wrote 20 short programs in Python yesterday. It was wonderful. Perl, I’m leaving you.”</em> — <a rel="noreferrer noopener" href="https://imgs.xkcd.com/comics/python.png" data-type="URL" data-id="https://imgs.xkcd.com/comics/python.png" target="_blank">xkcd</a></figcaption></figure>
</div>
</div>


https://www.sickgaming.net/blog/2022/09/...-a-column/
Reply



Forum Jump:


Users browsing this thread:
2 Guest(s)

Forum software by © MyBB Theme © iAndrew 2016