[Tut] How to Calculate the Column Standard Deviation of a DataFrame in Python Pandas?
xSicKxBot - 04-13-2020 <div><p>Want to calculate the standard deviation of a column in your <a rel="noreferrer noopener" href="https://pandas.pydata.org/" target="_blank">Pandas</a> DataFrame?</p> <p>In case your last statistics course was a few years ago, let’s quickly recap the <strong>definition of variance</strong>: it’s the <em>average squared deviation of the list elements from their average value.</em> The <strong>standard deviation</strong> is the square root of the variance, so it is expressed in the same units as the data.</p> <figure class="wp-block-image size-large is-resized"><img src="https://blog.finxter.com/wp-content/uploads/2020/04/image.png" alt="" class="wp-image-7490" width="185" height="66" srcset="https://blog.finxter.com/wp-content/uploads/2020/04/image.png 305w, https://blog.finxter.com/wp-content/uploads/2020/04/image-300x106.png 300w" sizes="(max-width: 185px) 100vw, 185px" /></figure> <div class="wp-block-image"> <figure class="aligncenter size-large"><img src="https://blog.finxter.com/wp-content/uploads/2020/04/image-1.png" alt="" class="wp-image-7491"/></figure> </div> <p><strong>To calculate it in pandas, use the DataFrame method <code>df.std()</code>, which computes the standard deviation of every numeric column. 
You can then get the column you’re interested in after the computation.</strong></p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd

# Create your Pandas DataFrame
d = {'username': ['Alice', 'Bob', 'Carl'],
     'age': [18, 22, 43],
     'income': [100000, 98000, 111000]}
df = pd.DataFrame(d)
print(df)</pre> <p>Your DataFrame looks like this:</p> <figure class="wp-block-table is-style-stripes"> <table> <thead> <tr> <th></th> <th>username</th> <th>age</th> <th>income</th> </tr> </thead> <tbody> <tr> <td>0</td> <td>Alice</td> <td>18</td> <td>100000</td> </tr> <tr> <td>1</td> <td>Bob</td> <td>22</td> <td>98000</td> </tr> <tr> <td>2</td> <td>Carl</td> <td>43</td> <td>111000</td> </tr> </tbody> </table> </figure> <p>Here’s how you can calculate the standard deviation of all numeric columns. The <code>numeric_only=True</code> argument skips the text column <code>username</code>: older pandas versions dropped such columns automatically, while recent versions raise an error instead.</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">print(df.std(numeric_only=True))</pre> <p>The output is the standard deviation of each numeric column:</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">age          13.428825
income     7000.000000
dtype: float64</pre> <p>By default, pandas computes the <em>sample</em> standard deviation (<code>ddof=1</code>): it divides the sum of squared deviations by <em>n</em> − 1 rather than by <em>n</em>. For the <code>age</code> column, the mean is about 27.67, the squared deviations sum to about 360.67, and dividing by 2 gives a variance of about 180.33, whose square root is about 13.428825. Pass <code>ddof=0</code> if you want to divide by <em>n</em>, as in the definition of variance given above.</p> <p>To get the standard deviation of an individual column, access it using simple indexing, or call <code>std()</code> on the column directly:</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">print(df.std(numeric_only=True)['age'])  # about 13.428825
print(df['age'].std())  # same value, computed for this column only</pre> <h2>Standard Deviation in the NumPy Library</h2> <p><a rel="noreferrer noopener" href="https://blog.finxter.com/numpy-tutorial/" target="_blank">NumPy</a>, Python’s core package for numerical computing, also has great statistics functionality. You can calculate all basic statistics such as <a rel="noreferrer noopener" href="https://blog.finxter.com/python-list-average/" target="_blank">average</a>, median, <a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-calculate-variance-numpy-array/" target="_blank">variance</a>, and <a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-calculate-column-standard-deviation-2d-numpy-array/" target="_blank">standard deviation</a> on NumPy arrays. Simply import the NumPy library and use the <code>np.std(a)</code> function to calculate the standard deviation of NumPy array <code>a</code>. Unlike pandas, <code>np.std()</code> divides by <em>n</em> by default (<code>ddof=0</code>); pass <code>ddof=1</code> to get the sample standard deviation that pandas reports.</p> <p>Here’s the code:</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import numpy as np

a = np.array([1, 2, 3])
print(np.std(a))  # 0.816496580927726
print(np.std(a, ddof=1))  # 1.0</pre> <h2>Where to Go From Here?</h2> <p>Before you can become a data science master, you first need to master Python. <a rel="noreferrer noopener" href="https://blog.finxter.com/subscribe/" target="_blank">Join my free Python email course</a> and receive your daily Python lesson directly in your inbox. 
It’s fun!</p> <p><a rel="noreferrer noopener" href="https://blog.finxter.com/subscribe/" target="_blank">Join The World’s #1 Python Email Academy [+FREE Cheat Sheets as PDF]</a></p> </div> https://www.sickgaming.net/blog/2020/04/12/how-to-calculate-the-column-standard-deviation-of-a-dataframe-in-python-pandas/
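<p>To put the pandas and NumPy examples above side by side, here is a minimal sketch that computes the sample and population standard deviations of the same example DataFrame and checks them against <code>np.std()</code> and a direct implementation of the definition (square root of the average squared deviation). It assumes a pandas version whose <code>DataFrame.std()</code> accepts the <code>ddof</code> and <code>numeric_only</code> keyword arguments; the variable names are illustrative only.</p>

```python
import numpy as np
import pandas as pd

# Same example data as in the tutorial above
d = {'username': ['Alice', 'Bob', 'Carl'],
     'age': [18, 22, 43],
     'income': [100000, 98000, 111000]}
df = pd.DataFrame(d)

# Sample standard deviation (pandas default ddof=1, divides by n - 1).
# numeric_only=True skips the string column 'username'.
sample_std = df.std(numeric_only=True)

# Population standard deviation (ddof=0, divides by n), like NumPy's default
population_std = df.std(ddof=0, numeric_only=True)

# Standard deviation of a single column, computed directly on the Series
age_std = df['age'].std()

# NumPy's np.std defaults to ddof=0; pass ddof=1 to match pandas
ages = df['age'].to_numpy()
np_population = np.std(ages)
np_sample = np.std(ages, ddof=1)

# Manual computation from the definition:
# square root of the average squared deviation from the mean
mean_age = ages.mean()
manual_population = np.sqrt(((ages - mean_age) ** 2).mean())

print(sample_std)
print(population_std)
print(age_std, np_sample)
print(np_population, manual_population)
```

<p>The only knob that matters here is <code>ddof</code> (delta degrees of freedom): both libraries divide the sum of squared deviations by <em>n</em> − <code>ddof</code>, so pandas and NumPy agree as soon as you pass them the same value.</p>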