{"id":114180,"date":"2020-06-15T19:15:29","date_gmt":"2020-06-15T19:15:29","guid":{"rendered":"https:\/\/blog.finxter.com\/?p=9780"},"modified":"2020-06-15T19:15:29","modified_gmt":"2020-06-15T19:15:29","slug":"python-join-list-of-dataframes","status":"publish","type":"post","link":"https:\/\/sickgaming.net\/blog\/2020\/06\/15\/python-join-list-of-dataframes\/","title":{"rendered":"Python Join List of DataFrames"},"content":{"rendered":"<p class=\"has-background has-luminous-vivid-amber-background-color\"><strong>To <a href=\"https:\/\/blog.finxter.com\/python-join-list\/\" target=\"_blank\" rel=\"noreferrer noopener\">join <\/a>a list of DataFrames, say <code>dfs<\/code>, use the <code>pandas.concat(dfs)<\/code> function that merges an arbitrary number of DataFrames to a single one.<\/strong><\/p>\n<p>When browsing <a rel=\"noreferrer noopener\" href=\"https:\/\/stackoverflow.com\/questions\/32444138\/concatenate-a-list-of-pandas-dataframes-together\" target=\"_blank\">StackOverflow<\/a>, I recently stumbled upon the following interesting problem. By thinking about solutions to those small data science problems, you can <a href=\"https:\/\/blog.finxter.com\/coffee-break-numpy\/\" target=\"_blank\" rel=\"noreferrer noopener\">improve your data science skills<\/a>, so let&#8217;s dive into the problem description.<\/p>\n<p><strong>Problem<\/strong>: Given a list of Pandas <a href=\"https:\/\/blog.finxter.com\/tilde-python-pandas-dataframe\/\" target=\"_blank\" rel=\"noreferrer noopener\">DataFrames<\/a>. How to merge them into a single DataFrame?<\/p>\n<p><strong>Example<\/strong>: You have the list of <a href=\"https:\/\/blog.finxter.com\/pandas-cheat-sheets\/\" target=\"_blank\" rel=\"noreferrer noopener\">Pandas <\/a>DataFrames:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">df1 = pd.DataFrame({'Alice' : [18, 'scientist', 24000], 'Bob' : [24, 'student', 12000]})\ndf2 = pd.DataFrame({'Alice' : [19, 'scientist', 25000], 'Bob' : [25, 'student', 11000]})\ndf3 = pd.DataFrame({'Alice' : [20, 'scientist', 26000], 'Bob' : [26, 'student', 10000]}) # List of DataFrames\ndfs = [df1, df2, df3]<\/pre>\n<p>Say, you want to get the following DataFrame:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"> Alice Bob\n0 18 24\n1 scientist student\n2 24000 12000\n0 19 25\n1 scientist student\n2 25000 11000\n0 20 26\n1 scientist student\n2 26000 10000<\/pre>\n<p>You can try the solution quickly in our interactive Python shell:<\/p>\n<p> <iframe loading=\"lazy\" height=\"800px\" width=\"100%\" src=\"https:\/\/repl.it\/@finxter\/pandasmergedf?lite=true\" scrolling=\"no\" frameborder=\"no\" allowtransparency=\"true\" allowfullscreen=\"true\" sandbox=\"allow-forms allow-pointer-lock allow-popups allow-same-origin allow-scripts allow-modals\"><\/iframe> <\/p>\n<p><em><strong>Exercise<\/strong>: Print the resulting DataFrame. Run the code. Which merging strategy is used?<\/em><\/p>\n<h2>Method 1: Pandas Concat<\/h2>\n<p>This is the easiest and most straightforward way to concatenate multiple DataFrames.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import pandas as pd df1 = pd.DataFrame({'Alice' : [18, 'scientist', 24000], 'Bob' : [24, 'student', 12000]})\ndf2 = pd.DataFrame({'Alice' : [19, 'scientist', 25000], 'Bob' : [25, 'student', 11000]})\ndf3 = pd.DataFrame({'Alice' : [20, 'scientist', 26000], 'Bob' : [26, 'student', 10000]}) # list of dataframes\ndfs = [df1, df2, df3] df = pd.concat(dfs)<\/pre>\n<p>This generates the following output:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">print(df) ''' Alice Bob\n0 18 24\n1 scientist student\n2 24000 12000\n0 19 25\n1 scientist student\n2 25000 11000\n0 20 26\n1 scientist student\n2 26000 10000 '''<\/pre>\n<p>The resulting DataFrames contains all original data from all three DataFrames.<\/p>\n<h2>Method 2: Reduce + DataFrame Merge<\/h2>\n<p>The following method uses the reduce function to repeatedly merge together all dictionaries in the list (no matter its size). To merge two dictionaries, the <a rel=\"noreferrer noopener\" href=\"https:\/\/pandas.pydata.org\/pandas-docs\/stable\/reference\/api\/pandas.DataFrame.merge.html\" target=\"_blank\"><code>df.merge()<\/code><\/a> method is used. You can use several merging strategies&#8212;in the example, you use <code>\"outer\"<\/code>:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import pandas as pd df1 = pd.DataFrame({'Alice' : [18, 'scientist', 24000], 'Bob' : [24, 'student', 12000]})\ndf2 = pd.DataFrame({'Alice' : [19, 'scientist', 25000], 'Bob' : [25, 'student', 11000]})\ndf3 = pd.DataFrame({'Alice' : [20, 'scientist', 26000], 'Bob' : [26, 'student', 10000]}) # list of dataframes\ndfs = [df1, df2, df3] # Method 2\nfrom functools import reduce\ndf = reduce(lambda df1, df2: df1.merge(df2, \"outer\"), dfs)<\/pre>\n<p>This generates the following output:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">print(df) ''' Alice Bob\n0 18 24\n1 scientist student\n2 24000 12000\n3 19 25\n4 25000 11000\n5 20 26\n6 26000 10000 '''<\/pre>\n<p>You can find a discussion of the different merge strategies <a rel=\"noreferrer noopener\" href=\"https:\/\/pandas.pydata.org\/pandas-docs\/stable\/reference\/api\/pandas.DataFrame.merge.html\" target=\"_blank\">here<\/a>. If you&#8217;d use the parameter <code>\"inner\"<\/code>, you&#8217;d obtain the following result:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"> Alice Bob\n0 scientist student<\/pre>\n<h2>Where to Go From Here?<\/h2>\n<p>Enough theory, let\u2019s get some practice!<\/p>\n<p>To become successful in coding, you need to get out there and solve real problems for real people. That\u2019s how you can become a six-figure earner easily. And that\u2019s how you polish the skills you really need in practice. After all, what\u2019s the use of learning theory that nobody ever needs?<\/p>\n<p><strong>Practice projects is how you sharpen your saw in coding!<\/strong><\/p>\n<p>Do you want to become a code master by focusing on practical code projects that actually earn you money and solve problems for people?<\/p>\n<p>Then become a Python freelance developer! It\u2019s the best way of approaching the task of improving your Python skills\u2014even if you are a complete beginner.<\/p>\n<p>Join my free webinar <a rel=\"noreferrer noopener\" href=\"https:\/\/blog.finxter.com\/webinar-freelancer\/\" target=\"_blank\">\u201cHow to Build Your High-Income Skill Python\u201d<\/a> and watch how I grew my coding business online and how you can, too\u2014from the comfort of your own home.<\/p>\n<p><a href=\"https:\/\/blog.finxter.com\/webinar-freelancer\/\" target=\"_blank\" rel=\"noreferrer noopener\">Join the free webinar now!<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>To join a list of DataFrames, say dfs, use the pandas.concat(dfs) function that merges an arbitrary number of DataFrames to a single one. When browsing StackOverflow, I recently stumbled upon the following interesting problem. By thinking about solutions to those small data science problems, you can improve your data science skills, so let&#8217;s dive into [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[857],"tags":[73,468,528],"class_list":["post-114180","post","type-post","status-publish","format-standard","hentry","category-python-tut","tag-programming","tag-python","tag-tutorial"],"_links":{"self":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/114180","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/comments?post=114180"}],"version-history":[{"count":0,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/114180\/revisions"}],"wp:attachment":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/media?parent=114180"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/categories?post=114180"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/tags?post=114180"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}