[Tut] How to Create a DataFrame From Lists? - xSicKxBot - 12-17-2022 How to Create a DataFrame From Lists? <div>
<p>Pandas is a great library for data analysis in Python. With Pandas, you can create visualizations, filter rows or columns, add new columns, and save the data in a wide range of formats. The workhorse of Pandas is the <strong>DataFrame</strong>. </p>
<p class="has-base-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f449.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Recommended</strong>: <a href="https://blog.finxter.com/pandas-quickstart/" data-type="post" data-id="16511" target="_blank" rel="noreferrer noopener">10 Minutes to Pandas (in 5 Minutes)</a></p>
<p>So the first step working with Pandas is often to get our data into a DataFrame. If we have data stored in <a href="https://blog.finxter.com/python-lists/" data-type="post" data-id="7332" target="_blank" rel="noreferrer noopener">lists</a>, how can we create this all-powerful DataFrame? </p>
<p>There are 4 basic strategies:</p>
<ol type="1"> <li>Create a <a href="https://blog.finxter.com/python-dictionary/" data-type="post" data-id="5232" target="_blank" rel="noreferrer noopener">dictionary</a> with column names as keys and your lists as values. Pass this dictionary as an argument when creating the DataFrame.</li> <li>Pass your lists into the <code><a href="https://blog.finxter.com/python-ziiiiiiip-a-helpful-guide/" data-type="post" data-id="1938" target="_blank" rel="noreferrer noopener">zip()</a></code> function. As with strategy 1, your lists will become columns in the DataFrame.</li> <li>Put your lists into a list instead of a dictionary.
In this case, your lists become rows instead of columns.</li> <li><a href="https://blog.finxter.com/how-to-create-a-dataframe-in-pandas/" data-type="post" data-id="16764" target="_blank" rel="noreferrer noopener">Create an empty DataFrame</a> and add columns one by one.</li> </ol>
<h2>Method 1: Create a DataFrame using a Dictionary</h2>
<div class="wp-block-image"> <figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="1010" height="645" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-237.png" alt="" class="wp-image-985155" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-237.png 1010w, https://blog.finxter.com/wp-content/uploads/2022/12/image-237-300x192.png 300w, https://blog.finxter.com/wp-content/uploads/2022/12/image-237-768x490.png 768w" sizes="(max-width: 1010px) 100vw, 1010px" /></figure> </div>
<p>The first step is to import pandas. If you haven’t already, <a href="https://blog.finxter.com/how-to-install-pandas-in-python/" data-type="post" data-id="35926" target="_blank" rel="noreferrer noopener">install pandas</a> first.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd</pre>
<p>Let’s say you have employee data stored as lists.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># if your data is stored like this
employee = ['Betty', 'Veronica', 'Archie', 'Jughead']
salary = [110_000, 20_000, 80_000, 70_000]
bonus = [1000, 500, 2500, 400]
tax_rate = [.1, .25, .17, .4]
absences = [0, 1, 0, 52]
</pre>
<p>Build a dictionary using column names as keys and your lists as values.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python"
data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># you can easily create a dictionary that will define your dataframe
emp_data = {
    'name': employee,
    'salary': salary,
    'bonus': bonus,
    'tax_rate': tax_rate,
    'absences': absences
}

# pass the dictionary to the DataFrame constructor
emp_df = pd.DataFrame(emp_data)
emp_df
</pre>
<p>Your lists will become columns in the resulting DataFrame.</p>
<div class="wp-block-image"> <figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="367" height="164" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-230.png" alt="" class="wp-image-985144" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-230.png 367w, https://blog.finxter.com/wp-content/uploads/2022/12/image-230-300x134.png 300w" sizes="(max-width: 367px) 100vw, 367px" /></figure> </div>
<h2>Create a DataFrame using the zip function</h2>
<div class="wp-block-image"> <figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="1010" height="668" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-238.png" alt="" class="wp-image-985156" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-238.png 1010w, https://blog.finxter.com/wp-content/uploads/2022/12/image-238-300x198.png 300w, https://blog.finxter.com/wp-content/uploads/2022/12/image-238-768x508.png 768w" sizes="(max-width: 1010px) 100vw, 1010px" /></figure> </div>
<p>Pass each list as a separate argument to the <code><a rel="noreferrer noopener" href="https://blog.finxter.com/python-ziiiiiiip-a-helpful-guide/" data-type="post" data-id="1938" target="_blank">zip()</a></code> function.
You can specify the column names using the <code>columns</code> parameter or by setting the <code>columns</code> attribute on a separate line.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">emp_df = pd.DataFrame(zip(employee, salary, bonus, tax_rate, absences))
emp_df.columns = ['name', 'salary', 'bonus', 'tax_rate', 'absences']
</pre>
<p>The <code>zip()</code> function creates an <a href="https://blog.finxter.com/iterators-iterables-and-itertools/" data-type="post" data-id="29507" target="_blank" rel="noreferrer noopener">iterator</a>. For the first iteration, it grabs every value at index 0 from each list. This becomes the first row in the DataFrame. Next, it grabs every value at index 1, and this becomes the second row. This continues until it exhausts the shortest list.</p>
<p>We can loop through the iterator to see how this works.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># enumerate() supplies the running index for each zipped tuple
for i, value in enumerate(zip(employee, salary, bonus, tax_rate, absences)):
    print(f'zipped value at index {i}: {value}')
</pre>
<p>Each of these values becomes a row in the DataFrame:</p>
<pre class="wp-block-preformatted"><code>zipped value at index 0: ('Betty', 110000, 1000, 0.1, 0)
zipped value at index 1: ('Veronica', 20000, 500, 0.25, 1)
zipped value at index 2: ('Archie', 80000, 2500, 0.17, 0)
zipped value at index 3: ('Jughead', 70000, 400, 0.4, 52)</code>
</pre>
<h2>Create a DataFrame using a list of lists</h2>
<p>What if you have a separate list for each employee?
In this case, we can just create a <a href="https://blog.finxter.com/python-list-of-lists/" data-type="post" data-id="7890" target="_blank" rel="noreferrer noopener">list of lists</a>. Each of the inner lists becomes a row in the DataFrame.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># lists for employees instead of features
betty = ['Betty', 110000, 1000, 0.1, 0]
veronica = ['Veronica', 20000, 500, 0.25, 1]
archie = ['Archie', 80000, 2500, 0.17, 0]
jughead = ['Jughead', 70000, 400, 0.4, 52]

emp_df = pd.DataFrame([betty, veronica, archie, jughead])
emp_df.columns = ['name', 'salary', 'bonus', 'tax_rate', 'absences']
emp_df
</pre>
<div class="wp-block-image"> <figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="380" height="158" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-231.png" alt="" class="wp-image-985145" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-231.png 380w, https://blog.finxter.com/wp-content/uploads/2022/12/image-231-300x125.png 300w" sizes="(max-width: 380px) 100vw, 380px" /></figure> </div>
<h2>Create a DataFrame using a list of dictionaries</h2>
<div class="wp-block-image"> <figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="856" height="863" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-239.png" alt="" class="wp-image-985157" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-239.png 856w, https://blog.finxter.com/wp-content/uploads/2022/12/image-239-298x300.png 298w, https://blog.finxter.com/wp-content/uploads/2022/12/image-239-150x150.png 150w, https://blog.finxter.com/wp-content/uploads/2022/12/image-239-768x774.png 768w" sizes="(max-width: 856px) 100vw, 856px" /></figure> </div>
<p>If the employee data is stored in dictionaries instead of
lists, we use a list of dictionaries.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">betty = {'name': 'Betty', 'salary': 110000, 'bonus': 1000, 'tax_rate': 0.1, 'absences': 0}
veronica = {'name': 'Veronica', 'salary': 20000, 'bonus': 500, 'tax_rate': 0.25, 'absences': 1}
archie = {'name': 'Archie', 'salary': 80000, 'bonus': 2500, 'tax_rate': 0.17, 'absences': 0}
jughead = {'name': 'Jughead', 'salary': 70000, 'bonus': 400, 'tax_rate': 0.4, 'absences': 52}

pd.DataFrame([betty, veronica, archie, jughead])</pre>
<div class="wp-block-image"> <figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="374" height="159" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-232.png" alt="" class="wp-image-985146" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-232.png 374w, https://blog.finxter.com/wp-content/uploads/2022/12/image-232-300x128.png 300w" sizes="(max-width: 374px) 100vw, 374px" /></figure> </div>
<p>The columns are determined by the keys in the dictionaries.
What if the dictionaries don’t all have the same keys?</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">betty = {'name': 'Betty', 'salary': 110000, 'bonus': 1000, 'tax_rate': 0.1, 'absences': 0, 'hire_date': '2001-01-01'}
veronica = {'name': 'Veronica', 'salary': 20000, 'bonus': 500, 'tax_rate': 0.25, 'absences': 1}
archie = {'name': 'Archie', 'salary': 80000, 'bonus': 2500, 'tax_rate': 0.17, 'absences': 0, 'title': 'Vice Chief Leader'}
jughead = {'name': 'Jughead', 'salary': 70000, 'bonus': 400, 'tax_rate': 0.4, 'absences': 52, 'rank': 'yes'}

pd.DataFrame([betty, veronica, archie, jughead])
</pre>
<div class="wp-block-image"> <figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="624" height="151" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-233.png" alt="" class="wp-image-985147" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-233.png 624w, https://blog.finxter.com/wp-content/uploads/2022/12/image-233-300x73.png 300w" sizes="(max-width: 624px) 100vw, 624px" /></figure> </div>
<p>All of the keys will be used. Whenever pandas encounters a dictionary with a missing key, the missing value is filled with NaN, which stands for ‘not a number’.</p>
<h2>Create an empty DataFrame and add columns one by one</h2>
<p>This method might be preferable if you need to create a lot of new calculated columns.
Here we create a new column for after-tax income.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">emp_df = pd.DataFrame()
emp_df['name'] = employee
emp_df['salary'] = salary
emp_df['bonus'] = bonus
emp_df['tax_rate'] = tax_rate
emp_df['absences'] = absences

# calculated column: total income after tax
income = emp_df['salary'] + emp_df['bonus']
emp_df['after_tax'] = income * (1 - emp_df['tax_rate'])
</pre>
<h2>How to add a list to an existing DataFrame</h2>
<p>Here is a neat trick. If you want to edit a row in a DataFrame, you can use the handy <code><a href="https://blog.finxter.com/slicing-data-from-a-pandas-dataframe-using-loc-and-iloc/" data-type="post" data-id="230997" target="_blank" rel="noreferrer noopener">loc</a></code> indexer. <code>loc</code> allows you to access rows and columns by their index labels. The examples below use the five-column <code>emp_df</code> built in the earlier sections, without the <code>after_tax</code> column.</p>
<p>To access a row:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">emp_df.loc[3]</pre>
<p>Output is the row with index value 3 as a Series:</p>
<pre class="wp-block-preformatted"><code>name        Jughead
salary        70000
bonus           400
tax_rate        0.4
absences         52
Name: 3, dtype: object</code>
</pre>
<p>To access a column, pass in the column name as the column index. Note that we have to specify both the row and column indexes. The format is <code>[rows, columns]</code>. If you want all rows, you can use “<code>:</code>” as we do here.
The <code>:</code> also works if you want all columns.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">emp_df.loc[:, 'salary']</pre>
<p>The output is also a Series:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">0    110000
1     20000
2     80000
3     70000
Name: salary, dtype: int64
</pre>
<p>So how do we use <code>loc</code> to add a new row? If we assign to a row label that doesn’t exist in the DataFrame yet, pandas creates a new row for us.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">new_emp = ['Fonzie', 200000, 30000, .05, 112]
emp_df.loc[4] = new_emp
emp_df
</pre>
<div class="wp-block-image"> <figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="366" height="183" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-234.png" alt="" class="wp-image-985148" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-234.png 366w, https://blog.finxter.com/wp-content/uploads/2022/12/image-234-300x150.png 300w" sizes="(max-width: 366px) 100vw, 366px" /></figure> </div>
<p>You can also update existing data with <code>loc</code>. Let’s lower Fonzie’s salary.
It looks a bit excessive.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">emp_df.loc[4, 'salary'] = 105000
emp_df
</pre>
<div class="wp-block-image"> <figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="376" height="183" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-235.png" alt="" class="wp-image-985149" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-235.png 376w, https://blog.finxter.com/wp-content/uploads/2022/12/image-235-300x146.png 300w" sizes="(max-width: 376px) 100vw, 376px" /></figure> </div>
<p>That’s more like it.</p>
<h2><strong>Conclusion</strong></h2>
<p>There are many different ways of creating a DataFrame. We looked at several methods using data stored in lists. Each will get the job done. </p>
<p>The most convenient method will depend on what your lists represent. </p>
<p>If each of your lists would best be represented as a column, then a dictionary of lists might be the easiest way to go. </p>
<p>If each of your lists would best be represented as a row, then a list of lists would be a good choice. </p>
<p>To add data in a list as a new row in an existing DataFrame, the <code>loc</code> indexer comes in handy. <code>loc</code> is also useful for updating existing data.</p>
</div>
https://www.sickgaming.net/blog/2022/12/17/how-to-create-a-dataframe-from-lists/
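<p>As a recap of the conclusion, here is a minimal sketch that builds the same small table both ways and then appends a row with <code>loc</code>. The two-column sample data and the names <code>by_column</code>, <code>by_row</code>, <code>rows</code>, and <code>same</code> are illustrative assumptions, not part of the original tutorial. It also shows the <code>columns</code> parameter mentioned in the <code>zip()</code> section, which the original examples set on a separate line instead.</p>

```python
import pandas as pd

# Hypothetical sample data, trimmed to two of the tutorial's lists.
employee = ['Betty', 'Veronica', 'Archie', 'Jughead']
salary = [110_000, 20_000, 80_000, 70_000]

# Lists as columns: a dictionary of lists.
by_column = pd.DataFrame({'name': employee, 'salary': salary})

# Lists as rows: a list of lists, naming the columns via the columns parameter.
rows = [[name, pay] for name, pay in zip(employee, salary)]
by_row = pd.DataFrame(rows, columns=['name', 'salary'])

# Compare the two results; equals() checks values, labels, and dtypes.
same = by_column.equals(by_row)

# Append a row by assigning to the next unused integer label.
by_row.loc[len(by_row)] = ['Fonzie', 105_000]

print(same)
print(by_row)
```

<p>Using <code>len(by_row)</code> as the new label only makes sense when the DataFrame still has its default 0..n-1 integer index; with a custom index you would pick the label yourself.</p>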