{"id":111453,"date":"2020-04-12T15:50:41","date_gmt":"2020-04-12T15:50:41","guid":{"rendered":"https:\/\/blog.finxter.com\/?p=7512"},"modified":"2020-04-12T15:50:41","modified_gmt":"2020-04-12T15:50:41","slug":"how-to-calculate-the-column-standard-deviation-of-a-dataframe-in-python-pandas","status":"publish","type":"post","link":"https:\/\/sickgaming.net\/blog\/2020\/04\/12\/how-to-calculate-the-column-standard-deviation-of-a-dataframe-in-python-pandas\/","title":{"rendered":"How to Calculate the Column Standard Deviation of a DataFrame in Python Pandas?"},"content":{"rendered":"<p>Want to calculate the standard deviation of a column in your <a rel=\"noreferrer noopener\" href=\"https:\/\/pandas.pydata.org\/\" target=\"_blank\">Pandas<\/a> DataFrame?<\/p>\n<p>If your last statistics course was a few years ago, here&#8217;s a quick recap. The <strong>variance<\/strong> is the <em>average squared deviation of the values from their mean<\/em>, and the <strong>standard deviation<\/strong> is the square root of the variance.<\/p>\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2020\/04\/image.png\" alt=\"\" class=\"wp-image-7490\" width=\"185\" height=\"66\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2020\/04\/image.png 305w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2020\/04\/image-300x106.png 300w\" sizes=\"auto, (max-width: 185px) 100vw, 185px\" \/><\/figure>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2020\/04\/image-1.png\" alt=\"\" class=\"wp-image-7491\"\/><\/figure>\n<\/div>\n<p><strong>You can do this with the <code>df.std()<\/code> method, which calculates the standard deviation of each numeric column. 
You can then get the column you&#8217;re interested in after the computation.<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import pandas as pd\n\n# Create your Pandas DataFrame\nd = {'username': ['Alice', 'Bob', 'Carl'], 'age': [18, 22, 43], 'income': [100000, 98000, 111000]}\ndf = pd.DataFrame(d)\nprint(df)<\/pre>\n<p>Your DataFrame looks like this:<\/p>\n<figure class=\"wp-block-table is-style-stripes\">\n<table>\n<tbody>\n<tr>\n<td><\/td>\n<td>username<\/td>\n<td>age<\/td>\n<td>income<\/td>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>Alice<\/td>\n<td>18<\/td>\n<td>100000<\/td>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>Bob<\/td>\n<td>22<\/td>\n<td>98000<\/td>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td>Carl<\/td>\n<td>43<\/td>\n<td>111000<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p>Here&#8217;s how you can calculate the standard deviation of all numeric columns. In pandas 2.0 and later, pass <code>numeric_only=True<\/code> so that the text column <code>username<\/code> is skipped instead of raising an error:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">print(df.std(numeric_only=True))<\/pre>\n<p>The output is the standard deviation of each numeric column:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">age         13.428825\nincome    7000.000000\ndtype: float64<\/pre>\n<p>To get the standard deviation of an individual column, access it using simple indexing:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" 
data-enlighter-group=\"\">print(df.std(numeric_only=True)['age'])\n# approximately 13.4288<\/pre>\n<p>Together, the code looks as follows. Use the interactive shell to play with it!<\/p>\n<p> <iframe loading=\"lazy\" src=\"https:\/\/repl.it\/@finxter\/pandasstddev?lite=true\" scrolling=\"no\" allowtransparency=\"true\" allowfullscreen=\"true\" sandbox=\"allow-forms allow-pointer-lock allow-popups allow-same-origin allow-scripts allow-modals\" width=\"100%\" height=\"700px\" frameborder=\"no\"><\/iframe> <\/p>\n<h2>Standard Deviation in NumPy Library<\/h2>\n<p>Python&#8217;s package for data science computation, <a rel=\"noreferrer noopener\" href=\"https:\/\/blog.finxter.com\/numpy-tutorial\/\" target=\"_blank\">NumPy<\/a>, also has great statistics functionality. You can calculate all basic statistics such as <a rel=\"noreferrer noopener\" href=\"https:\/\/blog.finxter.com\/python-list-average\/\" target=\"_blank\">average<\/a>, median, <a rel=\"noreferrer noopener\" href=\"https:\/\/blog.finxter.com\/how-to-calculate-variance-numpy-array\/\" target=\"_blank\">variance<\/a>, and <a rel=\"noreferrer noopener\" href=\"https:\/\/blog.finxter.com\/how-to-calculate-column-standard-deviation-2d-numpy-array\/\" target=\"_blank\">standard deviation<\/a> on NumPy arrays. Simply import the NumPy library and use the <code>np.std(a)<\/code> function to calculate the standard deviation of NumPy array <code>a<\/code>. Note that <code>np.std()<\/code> divides by <em>n<\/em> by default (<code>ddof=0<\/code>), whereas pandas divides by <em>n<\/em> - 1 (<code>ddof=1<\/code>), so the two libraries return different results for the same data unless you set <code>ddof<\/code> explicitly.<\/p>\n<p>Here&#8217;s the code:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import numpy as np\n\na = np.array([1, 2, 3])\nprint(np.std(a))\n# 0.816496580927726\n<\/pre>\n<h2>Where to Go From Here?<\/h2>\n<p>Before you can become a data science master, you first need to master Python. 
<a rel=\"noreferrer noopener\" href=\"https:\/\/blog.finxter.com\/subscribe\/\" target=\"_blank\">Join my free Python email course<\/a> and receive your daily Python lesson directly in your INBOX. It&#8217;s fun!<\/p>\n<p><a rel=\"noreferrer noopener\" href=\"https:\/\/blog.finxter.com\/subscribe\/\" target=\"_blank\">Join The World&#8217;s #1 Python Email Academy [+FREE Cheat Sheets as PDF]<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Want to calculate the standard deviation of a column in your Pandas DataFrame? In case you&#8217;ve attended your last statistics course a few years ago, let&#8217;s quickly recap the definition of variance: it&#8217;s the average squared deviation of the list elements from the average value. You can do this by using the df.std() method that [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[857],"tags":[73,468,528],"class_list":["post-111453","post","type-post","status-publish","format-standard","hentry","category-python-tut","tag-programming","tag-python","tag-tutorial"],"_links":{"self":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/111453","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/comments?post=111453"}],"version-history":[{"count":0,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/111453\/revisions"}],"wp:attachment":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/media?parent=111453"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\
/v2\/categories?post=111453"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/tags?post=111453"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}