[Tut] Python List of Dicts to Pandas DataFrame - xSicKxBot - 04-11-2023 Python List of Dicts to Pandas DataFrame <div>
<p>In this article, I will discuss a popular and efficient way to work with structured data in Python using DataFrames. </p>
<p class="has-base-2-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> A <strong>DataFrame</strong> is a two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns). It can be thought of as a table or a spreadsheet with rows and columns that can hold a variety of data types. </p>
<p>One common challenge is <strong><em>converting a Python list of dictionaries into a DataFrame</em></strong>.</p>
<p class="has-global-color-8-background-color has-background"><strong>To create a DataFrame from a Python list of dicts, you can use the <code>pandas.DataFrame(list_of_dicts)</code> constructor.</strong></p>
<p>Here’s a minimal example: </p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="4" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd

list_of_dicts = [{'key1': 'value1', 'key2': 'value2'}, {'key1': 'value3', 'key2': 'value4'}]
df = pd.DataFrame(list_of_dicts)
</pre>
<p>With this simple code, you can transform your list of dictionaries directly into a pandas DataFrame, giving you a clean and structured dataset to work with.</p>
<p>A similar problem is discussed in this Finxter blog post: </p>
<p class="has-base-2-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f4a1.png" alt="💡" 
class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Recommended</strong>: <a href="https://blog.finxter.com/how-to-convert-list-of-lists-to-a-pandas-dataframe/" data-type="post" data-id="7942" target="_blank" rel="noreferrer noopener">How to Convert List of Lists to a Pandas Dataframe</a></p>
<h2 class="wp-block-heading">Converting Python List of Dicts to DataFrame</h2>
<p>Let’s go through various methods and techniques, including using the DataFrame constructor, handling missing data, and assigning column names and indexes. <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f603.png" alt="😃" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h3 class="wp-block-heading">Using DataFrame Constructor</h3>
<p>The simplest way to convert a list of dictionaries to a DataFrame is by using the pandas DataFrame constructor. You can do this in just one line of code:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="3" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
df = pd.DataFrame(data)
</pre>
<p>Now, <code>df</code> is a DataFrame with the contents of the list of dictionaries. Easy peasy! <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f60a.png" alt="😊" 
class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h3 class="wp-block-heading">Handling Missing Data</h3>
<p>When your list of dictionaries contains missing keys or values, pandas automatically fills in the gaps with <code><a href="https://blog.finxter.com/check-for-nan-values-in-python/" data-type="post" data-id="273492" target="_blank" rel="noreferrer noopener">NaN</a></code> values. Let’s see an example:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">data = [{'a': 1, 'b': 2}, {'a': 3, 'c': 4}]
df = pd.DataFrame(data)
</pre>
<p>The resulting DataFrame will have <code>NaN</code> values in the missing spots:</p>
<pre class="wp-block-preformatted"><code>   a    b    c
0  1  2.0  NaN
1  3  NaN  4.0
</code></pre>
<p>There is no need to pad the dictionaries yourself before building the DataFrame! Note that a numeric column containing <code>NaN</code> is stored as floats, which is why <code>2</code> shows up as <code>2.0</code>. <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f44d.png" alt="👍" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h3 class="wp-block-heading">Assigning Column Names and Indexes</h3>
<p>You may want to assign custom column names or indexes when creating the DataFrame. 
The <code>index</code> parameter sets the row labels. For a list of dicts, however, the <code>columns</code> parameter selects existing keys rather than renaming them, so passing new names there would yield a DataFrame full of <code>NaN</code>. Assign the new column names after construction instead:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">column_names = ['col_1', 'col_2', 'col_3']
index_names = ['row_1', 'row_2']
df = pd.DataFrame(data, index=index_names)
df.columns = column_names  # renames 'a', 'b', 'c' in order
</pre>
<p>This will create a DataFrame with the specified column names and index labels:</p>
<pre class="wp-block-preformatted"><code>       col_1  col_2  col_3
row_1      1    2.0    NaN
row_2      3    NaN    4.0</code></pre>
<h2 class="wp-block-heading">Working with the Resulting DataFrame</h2>
<p>Once you’ve converted your Python list of dictionaries into a pandas DataFrame, you can work with the data in a more structured and efficient way. </p>
<p>In this section, I will discuss three common operations you may want to perform with a DataFrame: </p>
<ul> <li>filtering and selecting data, </li> <li>sorting and grouping data, and </li> <li>applying functions and calculations. </li> </ul>
<p>Let’s dive into each of these sub-sections! <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f603.png" alt="😃" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h3 class="wp-block-heading">Filtering and Selecting Data</h3>
<p>Working with data in a DataFrame allows you to easily filter and select specific data using various techniques. 
To select specific columns, you can use either bracket indexing with the column names or the <code>loc</code> and <code>iloc</code> indexers.</p>
<p class="has-base-2-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Recommended</strong>: <a href="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" data-type="URL" data-id="https://blog.finxter.com/pandas-loc-and-iloc-a-simple-guide-with-video/" target="_blank" rel="noreferrer noopener">Pandas loc() and iloc() – A Simple Guide with Video</a></p>
<p>For example, if you need to select columns A and B from your DataFrame, you can use the following approach:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">selected_columns = df[['A', 'B']]
</pre>
<p>If you want to filter rows based on certain conditions, you can use <a href="https://blog.finxter.com/pandas-dataframe-indexing/" data-type="post" data-id="64801" target="_blank" rel="noreferrer noopener">boolean indexing</a>:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">filtered_data = df[(df['A'] > 5) & (df['B'] < 10)]
</pre>
<p>This will return all the rows where column A contains values greater than 5 and column B 
contains values less than 10. <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f680.png" alt="🚀" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h3 class="wp-block-heading">Sorting and Grouping Data</h3>
<p>Sorting your DataFrame can make it easier to analyze and visualize the data. You can sort the data using the <code>sort_values</code> method, specifying the column(s) to sort by and the sorting order:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">sorted_data = df.sort_values(by=['A'], ascending=True)
</pre>
<p>Grouping data is also a powerful operation to perform statistical analysis or data aggregation. You can use the <code>groupby</code> method to group the data by a specific column:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">grouped_data = df.groupby(['A']).sum()
</pre>
<p>In this case, I’m grouping the data by column A and aggregating the values using the sum function. These operations can help you better understand patterns and trends in your data. <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f4ca.png" alt="📊" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h3 class="wp-block-heading">Applying Functions and Calculations</h3>
<p>DataFrames allow you to easily apply functions and calculations on your data. 
You can use the <code><a href="https://blog.finxter.com/the-pandas-apply-function/" data-type="post" data-id="37756" target="_blank" rel="noreferrer noopener">apply</a></code> and <code><a href="https://blog.finxter.com/how-to-apply-a-function-to-each-cell-in-a-pandas-dataframe/" data-type="post" data-id="595293" target="_blank" rel="noreferrer noopener">applymap</a></code> methods to apply functions to columns, rows, or individual cells.</p>
<p>For example, if you want to calculate the square of each value in column A, you can use the <code>apply</code> method:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df['A_squared'] = df['A'].apply(lambda x: x**2)
</pre>
<p>Alternatively, if you need to apply a function to all cells in the DataFrame, you can use the <code>applymap</code> method:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df_cleaned = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
</pre>
<p>In this example, I’m using <code>applymap</code> to <a href="https://blog.finxter.com/python-string-strip/" data-type="post" data-id="26104" target="_blank" rel="noreferrer noopener">strip</a> all strings in the DataFrame, removing any unnecessary whitespace. Note that <code>applymap</code> is deprecated as of pandas 2.1 in favor of <code>DataFrame.map</code>, which is the same operation under a new name. These methods cover most element-wise transformations, although a built-in vectorized expression such as <code>df['A'] ** 2</code> is usually faster than <code>apply</code> with a lambda. <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f4aa.png" alt="💪" 
class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<p>To keep improving your data science skills, make sure you know what you’re getting yourself into: <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f447.png" alt="👇" class="wp-smiley" style="height: 1em; max-height: 1em;" /> </p>
<p class="has-base-2-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Recommended</strong>: <a href="https://blog.finxter.com/data-scientist-income-and-opportunity/" data-type="post" data-id="332478" target="_blank" rel="noreferrer noopener">Data Scientist – Income and Opportunity</a></p>
</div> https://www.sickgaming.net/blog/2023/04/06/python-list-of-dicts-to-pandas-dataframe/
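A minimal end-to-end sketch of the steps covered in the tutorial above: building a DataFrame from a list of dicts with missing keys, then filtering, sorting, grouping, and applying functions. The `records` data and the result variable names are hypothetical, chosen only so that the tutorial's columns A and B exist; treat this as an illustration under those assumptions rather than code from the original post.

```python
import pandas as pd

# Hypothetical sample records: the second dict lacks 'B', the third lacks 'C'.
records = [
    {"A": 7, "B": 2, "C": " x "},
    {"A": 3, "C": "y"},
    {"A": 7, "B": 9},
    {"A": 9, "B": 4, "C": "z "},
]

# Missing keys become NaN in the corresponding cells.
df = pd.DataFrame(records)

# Boolean indexing: comparisons against NaN are False, so row 1 is excluded.
filtered = df[(df["A"] > 5) & (df["B"] < 10)]

# Sort by A, then by B (ascending is the default).
ordered = filtered.sort_values(by=["A", "B"])

# Group by A and sum B; sum() skips NaN values.
totals = df.groupby("A")["B"].sum()

# Apply a function to one column.
df["A_squared"] = df["A"].apply(lambda x: x ** 2)

# Strip whitespace from string cells only, leaving NaN untouched.
df["C"] = df["C"].apply(lambda x: x.strip() if isinstance(x, str) else x)

print(df)
print(ordered)
print(totals)
```

Because a comparison such as `NaN < 10` evaluates to False, the row without a B value drops out of the filter, and because `sum()` skips NaN, the group where A equals 3 totals 0.0 rather than NaN.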