{"id":121910,"date":"2020-12-11T19:18:26","date_gmt":"2020-12-11T19:18:26","guid":{"rendered":"https:\/\/blog.finxter.com\/?p=18235"},"modified":"2020-12-11T19:18:26","modified_gmt":"2020-12-11T19:18:26","slug":"pandas-apply-a-helpful-illustrated-guide","status":"publish","type":"post","link":"https:\/\/sickgaming.net\/blog\/2020\/12\/11\/pandas-apply-a-helpful-illustrated-guide\/","title":{"rendered":"Pandas apply() \u2014 A Helpful Illustrated Guide"},"content":{"rendered":"<figure class=\"wp-block-image size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"624\" height=\"289\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2020\/12\/image-13.png\" alt=\"\" class=\"wp-image-18236\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2020\/12\/image-13.png 624w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2020\/12\/image-13-300x139.png 300w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2020\/12\/image-13-150x69.png 150w\" sizes=\"auto, (max-width: 624px) 100vw, 624px\" \/><\/figure>\n<p>The Pandas <code>apply()<\/code> method applies a function to a <a href=\"https:\/\/blog.finxter.com\/pandas-quickstart\/\" target=\"_blank\" rel=\"noreferrer noopener\" title=\"10 Minutes to Pandas (in 5 Minutes)\">Pandas<\/a> object: to each value of a Series, or to each column or row of a DataFrame. Pandas already provides many built-in aggregation methods for Series and DataFrame objects, such as <code>mean()<\/code> and <code>sum()<\/code>. When you need an application-specific function that is not built in, you can use <code>apply()<\/code> instead. 
In Pandas, <code>apply()<\/code> is available both as a Series method and as a DataFrame method.<\/p>\n<h1>Pandas apply function to one column &#8211; apply() as a Series method<\/h1>\n<p>Let\u2019s construct a DataFrame that holds information about four people.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> import pandas as pd\n>>> df = pd.DataFrame(\n...     {\n...         'Name': ['Edward', 'Natalie', 'Chris M', 'Priyatham'],\n...         'Sex': ['M', 'F', 'M', 'M'],\n...         'Age': [45, 35, 29, 26],\n...         'weight(kgs)': [68.4, 58.2, 64.3, 53.1]\n...     }\n... )\n>>> print(df)\n        Name Sex  Age  weight(kgs)\n0     Edward   M   45         68.4\n1    Natalie   F   35         58.2\n2    Chris M   M   29         64.3\n3  Priyatham   M   26         53.1<\/pre>\n<p><code>pandas.Series.apply<\/code> accepts two kinds of functions as its argument:<\/p>\n<ul>\n<li>Python functions<\/li>\n<li>NumPy\u2019s universal functions (ufuncs)<\/li>\n<\/ul>\n<h2>1. Python functions<\/h2>\n<p>Python functions generally come in three kinds:<\/p>\n<ul>\n<li>Built-in functions<\/li>\n<li>User-defined functions<\/li>\n<li>Lambda functions<\/li>\n<\/ul>\n<h2>a) Applying Python built-in functions on Series<\/h2>\n<p>To find the length of each person\u2019s name, we can use Python\u2019s built-in <code>len()<\/code> function.<\/p>\n<p>For example, this is how we get the length of the string \u201cPython\u201d:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> len(\"Python\")\n6<\/pre>\n<p>A single column in the DataFrame is a Series object. 
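<\/p>
<p>As a quick check of that statement, here is a minimal sketch. It rebuilds the four-person DataFrame from above and confirms that selecting one column with <code>df['Name']<\/code> returns a <code>pandas.Series<\/code>. That Series keeps the DataFrame index and uses the column label as its name.<\/p>

```python
import pandas as pd

# Rebuild the example DataFrame used in this section
df = pd.DataFrame({
    'Name': ['Edward', 'Natalie', 'Chris M', 'Priyatham'],
    'Sex': ['M', 'F', 'M', 'M'],
    'Age': [45, 35, 29, 26],
    'weight(kgs)': [68.4, 58.2, 64.3, 53.1],
})

# Selecting a single column label returns a Series
names = df['Name']
print(type(names).__name__)  # Series
print(names.name)            # Name
print(list(names.index))     # [0, 1, 2, 3]

# Selecting with a list of labels returns a one-column DataFrame instead
print(type(df[['Name']]).__name__)  # DataFrame
```

<p>The last check shows a common pitfall: <code>df[['Name']]<\/code> is a DataFrame, not a Series. This matters for <code>apply()<\/code>, because the Series method passes individual values to the function, whereas the DataFrame method passes whole columns or rows.<\/p>
<p>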
Now we want to apply the same <code>len()<\/code> function to the whole \u201cName\u201d column of the DataFrame. The <code>apply()<\/code> method does exactly that:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> df['Name'].apply(len)\n0    6\n1    7\n2    7\n3    9\nName: Name, dtype: int64<\/pre>\n<p>Notice that we pass <code>len<\/code> to <code>apply()<\/code> without calling it. There are no parentheses and no argument. A function normally needs some input data to operate on. In the <code>len(\"Python\")<\/code> snippet, that input is the string <code>\"Python\"<\/code>. Here, <code>apply()<\/code> supplies the input itself: it takes each value from the Series on which it was called and passes that value to <code>len<\/code>.<\/p>\n<p>When you apply a Python function, <code>apply()<\/code> calls it once for each value in the Series and collects the results in a new Series.<\/p>\n<p>The above process can be visualised as:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" src=\"https:\/\/media.giphy.com\/media\/LwwQ79YQfRgE7XVUuL\/source.gif\" alt=\"\"\/><\/figure>\n<\/div>\n<p>In the above visualisation, you can see that the elements of the Series are passed to the function one at a time.<\/p>\n<h2>b) Applying user-defined functions on Series<\/h2>\n<p>Let\u2019s assume that our data is a year old, so we want to add 1 to each person\u2019s age. 
We can do so by applying a user-defined function to the Series with the <code>apply()<\/code> method. To keep the original <code>df<\/code> unchanged for the later examples, we use <code>DataFrame.assign()<\/code>. It returns an updated copy instead of modifying <code>df<\/code> in place.<\/p>\n<p>The code for it is:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> def add_age(age):\n...     return age + 1\n...\n>>> df['Age'].apply(add_age)\n0    46\n1    36\n2    30\n3    27\nName: Age, dtype: int64\n>>> df.assign(Age=df['Age'].apply(add_age))\n        Name Sex  Age  weight(kgs)\n0     Edward   M   46         68.4\n1    Natalie   F   36         58.2\n2    Chris M   M   30         64.3\n3  Priyatham   M   27         53.1<\/pre>\n<p>The key point to note about the above result is:<\/p>\n<ul>\n<li>The index of the resulting <a href=\"https:\/\/blog.finxter.com\/pandas-quickstart\/\" target=\"_blank\" rel=\"noreferrer noopener\" title=\"10 Minutes to Pandas (in 5 Minutes)\">Series<\/a> is the same as the index of the Series that called <code>apply()<\/code>. This makes it easy to add the result to the DataFrame as a column.<\/li>\n<\/ul>\n<p>This works the same way as applying built-in functions: each element of the Series is passed to the function one at a time.<\/p>\n<ul>\n<li><strong>User-defined functions are mainly used when we need complex, application-specific logic.<\/strong><\/li>\n<\/ul>\n<h2>c) Applying Lambda functions on Series<\/h2>\n<p>Lambda functions are often used together with the <code>apply()<\/code> method. In the previous section, we defined a separate function just for a simple addition. 
Let\u2019s achieve the same result with a Lambda function.<\/p>\n<p>The code for it is:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> df['Age'].apply(lambda x: x + 1)\n0    46\n1    36\n2    30\n3    27\nName: Age, dtype: int64\n>>> # Compare the results of the user-defined function and the Lambda function\n>>> df['Age'].apply(lambda x: x + 1) == df['Age'].apply(add_age)\n0    True\n1    True\n2    True\n3    True\nName: Age, dtype: bool<\/pre>\n<p>As the comparison shows, the user-defined function and the Lambda function produce the same results.<\/p>\n<ul>\n<li><strong><a href=\"https:\/\/blog.finxter.com\/a-simple-introduction-of-the-lambda-function-in-python\/\" target=\"_blank\" rel=\"noreferrer noopener\" title=\"Lambda Functions in Python: A Simple Introduction\">Lambda functions<\/a> are mainly used for small, application-specific functions.<\/strong><\/li>\n<\/ul>\n<h2>2. NumPy\u2019s universal functions (ufuncs)<\/h2>\n<p><a href=\"https:\/\/blog.finxter.com\/numpy-tutorial\/\" target=\"_blank\" rel=\"noreferrer noopener\" title=\"NumPy Tutorial \u2013 Everything You Need to Know to Get Started\">NumPy<\/a> provides many built-in universal functions (<a href=\"https:\/\/numpy.org\/doc\/stable\/reference\/ufuncs.html\">ufuncs<\/a>). We can pass a ufunc as the argument to the Series <code>apply()<\/code> method. 
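<\/p>
<p>To make this concrete, here is a minimal sketch that uses the weight values from our DataFrame. It checks that passing <code>numpy.floor<\/code> to <code>apply()<\/code> gives the same result as calling the ufunc on the Series directly. It also counts how many times <code>apply()<\/code> calls a plain Python function. The helper <code>floor_one<\/code> is hypothetical and only exists for this comparison.<\/p>

```python
import math

import numpy as np
import pandas as pd

weights = pd.Series([68.4, 58.2, 64.3, 53.1], name='weight(kgs)')

# Passing a ufunc to apply() gives the same result as calling it on the Series
via_apply = weights.apply(np.floor)
direct = np.floor(weights)
print(via_apply.equals(direct))  # True

# Hypothetical plain-Python helper that records every call it receives
calls = []

def floor_one(value):
    calls.append(value)
    return float(math.floor(value))

via_python = weights.apply(floor_one)
print(len(calls))                 # 4 (one call per element)
print(via_python.equals(direct))  # True
```

<p>Both routes produce the same values here. The difference is that the Python helper is called once per element, which is why a ufunc is usually the faster choice on large Series.<\/p>
<p>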
A Series behaves much like a NumPy array, so a ufunc can operate on it directly.<\/p>\n<p>The difference between applying Python functions and ufuncs is:<\/p>\n<ul>\n<li>When applying a Python function, the function is called once for each element of the Series.<\/li>\n<li>When applying a ufunc, the ufunc is called once on the entire Series.<\/li>\n<\/ul>\n<p>Let\u2019s use a ufunc to round down (floor) the floating-point values of the weight column. NumPy\u2019s <code>numpy.floor()<\/code> ufunc does exactly this.<\/p>\n<p>The code for it is:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> import numpy as np\n>>> df['weight(kgs)']\n0    68.4\n1    58.2\n2    64.3\n3    53.1\nName: weight(kgs), dtype: float64\n>>> df['weight(kgs)'].apply(np.floor)\n0    68.0\n1    58.0\n2    64.0\n3    53.0\nName: weight(kgs), dtype: float64<\/pre>\n<p>In the above result, each value has been rounded down to the nearest whole number, and the column keeps its float64 data type.<\/p>\n<p>We can visualise the above process as:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" src=\"https:\/\/media.giphy.com\/media\/Glhk7adwbr0gRkvJdR\/source.gif\" alt=\"\"\/><\/figure>\n<\/div>\n<p>In the above visualisation, you can see that all elements of the Series are passed to the function at once.<\/p>\n<ul>\n<li><strong>Whenever a <code>ufunc<\/code> provides the functionality we need, we can use it instead of defining a Python function.<\/strong><\/li>\n<\/ul>\n<h1>Pandas apply() as a DataFrame method<\/h1>\n<p>Let\u2019s take a look at the official documentation of the DataFrame <code>apply()<\/code> method:<\/p>\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"624\" height=\"155\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2020\/12\/image-16.png\" alt=\"\" 
class=\"wp-image-18240\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2020\/12\/image-16.png 624w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2020\/12\/image-16-300x75.png 300w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2020\/12\/image-16-150x37.png 150w\" sizes=\"auto, (max-width: 624px) 100vw, 624px\" \/><\/figure>\n<p><code>pandas.DataFrame.apply<\/code> has two important arguments:<\/p>\n<ul>\n<li><code>func<\/code> &#8211; the function to apply along the given axis<\/li>\n<li><code>axis<\/code> &#8211; the axis along which the function is applied<\/li>\n<\/ul>\n<p>The <code>axis<\/code> argument has two possible values:<\/p>\n<ol type=\"1\">\n<li><code>axis=0<\/code> (the default) &#8211; apply the function to each column<\/li>\n<li><code>axis=1<\/code> &#8211; apply the function to each row<\/li>\n<\/ol>\n<h2>1. Pandas apply function to multiple columns<\/h2>\n<p>Let\u2019s say the people in our dataset have also provided their height (in cm). We can add it with the following code:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> df['height(cms)'] = [178, 160, 173, 168]\n>>> df\n        Name Sex  Age  weight(kgs)  height(cms)\n0     Edward   M   45         68.4          178\n1    Natalie   F   35         58.2          160\n2    Chris M   M   29         64.3          173\n3  Priyatham   M   26         53.1          168<\/pre>\n<p>We\u2019ll make the \u201cName\u201d column the index of the DataFrame. 
We\u2019ll also take the subset of the DataFrame with the \u201cAge\u201d, \u201cweight(kgs)\u201d, and \u201cheight(cms)\u201d columns.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> data = df.set_index('Name')\n>>> data\n          Sex  Age  weight(kgs)  height(cms)\nName\nEdward      M   45         68.4          178\nNatalie     F   35         58.2          160\nChris M     M   29         64.3          173\nPriyatham   M   26         53.1          168\n>>> data_subset = data[['Age', 'weight(kgs)', 'height(cms)']]\n>>> data_subset\n           Age  weight(kgs)  height(cms)\nName\nEdward      45         68.4          178\nNatalie     35         58.2          160\nChris M     29         64.3          173\nPriyatham   26         53.1          168<\/pre>\n<p>To get the average age, weight, and height of all the people, we can pass the NumPy function <code><a href=\"https:\/\/blog.finxter.com\/how-to-calculate-row-variance-numpy-array\/\" target=\"_blank\" rel=\"noreferrer noopener\" title=\"Python Numpy 101: How to Calculate the Row Variance of a Numpy 2D Array?\">numpy.mean()<\/a><\/code> to <code>apply()<\/code>. Unlike <code>numpy.floor()<\/code>, <code>numpy.mean()<\/code> is not a ufunc. It is a regular NumPy function that <code>apply()<\/code> calls on each column.<\/p>\n<p>The code for it is:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> import numpy as np\n>>> data_subset.apply(np.mean, axis=0)\nAge             33.75\nweight(kgs)     61.00\nheight(cms)    169.75\ndtype: float64<\/pre>\n<p>Pandas DataFrames also have a built-in aggregation method, <code>mean()<\/code>, that does the same thing:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> data_subset.mean()\nAge             33.75\nweight(kgs)     61.00\nheight(cms)    169.75\ndtype: float64<\/pre>\n<p>If 
you compare the two outputs above, you can see that <code>apply(np.mean)<\/code> and the built-in <code>mean()<\/code> method give identical results. So there is no need for the <code>apply()<\/code> method in simple scenarios like this one, where a built-in aggregation method is available.<\/p>\n<ul>\n<li><strong>Reserve the <code>apply()<\/code> method for complex functions on DataFrames that have no built-in equivalent.<\/strong><\/li>\n<\/ul>\n<h2>2. Pandas apply function to every row<\/h2>\n<p>Based on each person\u2019s height and weight, we can tell whether they are fit, thin, or obese. The fitness criteria differ for men and women, as set by international standards. Let\u2019s collect the fitness criteria for the heights of the people in our data.<\/p>\n<p>We can represent them with one dictionary per sex:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> male_fitness = {\n...     # height: (weight_lower_cap, weight_upper_cap)\n...     178: (67.5, 83),\n...     173: (63, 70.6),\n...     168: (58, 70.7),\n... }\n>>> female_fitness = {\n...     # height: (weight_lower_cap, weight_upper_cap)\n...     160: (47.2, 57.6),\n... }<\/pre>\n<p>In the above dictionaries, the keys are heights (in cm) and the values are tuples holding the lower and upper limits of the ideal weight (in kg).<\/p>\n<p>If someone weighs less than the ideal range for their height, they are \u201cThin\u201d. If someone weighs more than the ideal range for their height, they are \u201cObese\u201d. 
If someone\u2019s weight is within the ideal range (both limits included), they are \u201cFit\u201d.<\/p>\n<p>Let\u2019s build a function for the <code>apply()<\/code> method. With <code>axis=1<\/code>, <code>apply()<\/code> passes the rows to this function one at a time, each as a Series.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> def fitness_check(seq):\n...     # Choose the criteria dictionary that matches the person's sex\n...     criteria = male_fitness if seq.loc['Sex'] == 'M' else female_fitness\n...     lower, upper = criteria[seq.loc['height(cms)']]\n...     weight = seq.loc['weight(kgs)']\n...     if weight &lt; lower:\n...         return \"Thin\"\n...     elif weight > upper:\n...         return \"Obese\"\n...     else:\n...         return \"Fit\"  # lower &lt;= weight &lt;= upper\n...<\/pre>\n<p>The function returns whether a given person is \u201cFit\u201d, \u201cThin\u201d, or \u201cObese\u201d. 
It looks up the weight limits in the male or female fitness dictionary created above, depending on the person\u2019s sex.<\/p>\n<p>Finally, let\u2019s apply the above function to every row using the <code>apply()<\/code> method with <code>axis=1<\/code>:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> data.apply(fitness_check, axis=1)\nName\nEdward         Fit\nNatalie      Obese\nChris M        Fit\nPriyatham     Thin\ndtype: object<\/pre>\n<p>The result tells us who is Fit, Thin, or Obese.<\/p>\n<h1>Conclusion and Next Steps<\/h1>\n<p>Use the <code>apply()<\/code> method when you need complex, custom functionality. For common tasks, the built-in aggregation methods in Pandas are usually the simpler choice. If you liked this tutorial on the <code>apply()<\/code> function and enjoy quiz-based learning, consider reading our <a href=\"https:\/\/www.amazon.com\/gp\/product\/B08NG8QHW7\/ref=as_li_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=B08NG8QHW7&amp;linkCode=as2&amp;tag=finxter-20&amp;linkId=05c3c3b09840ec9cc56d5a7cad5a6398\" target=\"_blank\" rel=\"noreferrer noopener\">Coffee Break Pandas<\/a> book.<\/p>\n<p>The post <a href=\"https:\/\/blog.finxter.com\/pandas-apply-a-helpful-illustrated-guide\/\" target=\"_blank\" rel=\"noopener noreferrer\">Pandas apply() &#8212; A Helpful Illustrated Guide<\/a> first appeared on <a href=\"https:\/\/blog.finxter.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Finxter<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Pandas apply( ) function is used to apply the functions on the Pandas objects. We have so many built-in aggregation functions in pandas on Series and DataFrame objects. But, to apply some application-specific functions, we can leverage the apply( ) function. 
Pandas apply( ) is both the Series method and DataFrame method. Pandas apply [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[857],"tags":[73,468,528],"class_list":["post-121910","post","type-post","status-publish","format-standard","hentry","category-python-tut","tag-programming","tag-python","tag-tutorial"],"_links":{"self":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/121910","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/comments?post=121910"}],"version-history":[{"count":0,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/121910\/revisions"}],"wp:attachment":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/media?parent=121910"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/categories?post=121910"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/tags?post=121910"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}