[Tut] How to Read Specific Columns from CSV File in Python - Printable Version +- Sick Gaming (https://www.sickgaming.net) +-- Forum: Programming (https://www.sickgaming.net/forum-76.html) +--- Forum: Python (https://www.sickgaming.net/forum-83.html) +--- Thread: [Tut] How to Read Specific Columns from CSV File in Python (/thread-99375.html) |
[Tut] How to Read Specific Columns from CSV File in Python - xSicKxBot - 05-12-2022 How to Read Specific Columns from CSV File in Python <div><div class="kk-star-ratings kksr-valign-top kksr-align-left " data-payload="{"align":"left","id":"347248","slug":"default","valign":"top","reference":"auto","count":"0","readonly":"","score":"0","best":"5","gap":"5","greet":"Rate this post","legend":"0\/5 - (0 votes)","size":"24","width":"0","_legend":"{score}\/{best} - ({count} {votes})"}"> <div class="kksr-stars"> <div class="kksr-stars-inactive"> <div class="kksr-star" data-star="1" style="padding-right: 5px"> <div class="kksr-icon" style="width: 24px; height: 24px;"></div> </p></div> <div class="kksr-star" data-star="2" style="padding-right: 5px"> <div class="kksr-icon" style="width: 24px; height: 24px;"></div> </p></div> <div class="kksr-star" data-star="3" style="padding-right: 5px"> <div class="kksr-icon" style="width: 24px; height: 24px;"></div> </p></div> <div class="kksr-star" data-star="4" style="padding-right: 5px"> <div class="kksr-icon" style="width: 24px; height: 24px;"></div> </p></div> <div class="kksr-star" data-star="5" style="padding-right: 5px"> <div class="kksr-icon" style="width: 24px; height: 24px;"></div> </p></div> </p></div> <div class="kksr-stars-active" style="width: 0px;"> <div class="kksr-star" style="padding-right: 5px"> <div class="kksr-icon" style="width: 24px; height: 24px;"></div> </p></div> <div class="kksr-star" style="padding-right: 5px"> <div class="kksr-icon" style="width: 24px; height: 24px;"></div> </p></div> <div class="kksr-star" style="padding-right: 5px"> <div class="kksr-icon" style="width: 24px; height: 24px;"></div> </p></div> <div class="kksr-star" style="padding-right: 5px"> <div class="kksr-icon" style="width: 24px; height: 24px;"></div> </p></div> <div class="kksr-star" style="padding-right: 5px"> <div class="kksr-icon" style="width: 24px; height: 24px;"></div> </p></div> </p></div> </div> <div class="kksr-legend"> <span class="kksr-muted">Rate this post</span> </div> </div> <div id="ez-toc-container" class="ez-toc-v2_0_21 counter-hierarchy counter-decimal ez-toc-light-blue"> <div class="ez-toc-title-container"> <p class="ez-toc-title">Table of Contents</p> <p><span class="ez-toc-title-toggle"><a class="ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle" style="display: none;"><i class="ez-toc-glyphicon ez-toc-icon-toggle"></i></a></span></div> <nav> <ul class="ez-toc-list ez-toc-list-level-1"> <li class="ez-toc-page-1 ez-toc-heading-level-2"><a class="ez-toc-link ez-toc-heading-1" href="https://blog.finxter.com/how-to-read-specific-columns-from-csv-file-in-python/#Method_1_Using_Pandas" title="Method 1: Using Pandas">Method 1: Using Pandas</a> <ul class="ez-toc-list-level-3"> <li class="ez-toc-heading-level-3"><a class="ez-toc-link ez-toc-heading-2" href="https://blog.finxter.com/how-to-read-specific-columns-from-csv-file-in-python/#%E2%9E%A4_List-Based_Indexing_of_a_DataFrame" title="➤ List-Based Indexing of a DataFrame">➤ List-Based Indexing of a DataFrame</a></li> </ul> </li> <li class="ez-toc-page-1 ez-toc-heading-level-2"><a class="ez-toc-link ez-toc-heading-3" href="https://blog.finxter.com/how-to-read-specific-columns-from-csv-file-in-python/#Method_2_Integer_Based_Indexing_with_iloc" title="Method 2: Integer Based Indexing with iloc">Method 2: Integer Based Indexing with iloc</a></li> <li class="ez-toc-page-1 ez-toc-heading-level-2"><a class="ez-toc-link ez-toc-heading-4" href="https://blog.finxter.com/how-to-read-specific-columns-from-csv-file-in-python/#Method_3_Name-Based_Indexing_with_loc" title="Method 3: Name-Based Indexing with loc()">Method 3: Name-Based Indexing with loc()</a></li> <li class="ez-toc-page-1 ez-toc-heading-level-2"><a class="ez-toc-link ez-toc-heading-5" href="https://blog.finxter.com/how-to-read-specific-columns-from-csv-file-in-python/#Method_4_Using_csv_Module" title="Method 4: Using csv Module">Method 4: Using csv Module</a></li> <li class="ez-toc-page-1 ez-toc-heading-level-2"><a class="ez-toc-link ez-toc-heading-6" href="https://blog.finxter.com/how-to-read-specific-columns-from-csv-file-in-python/#Conclusion" title="Conclusion">Conclusion</a></li> <li class="ez-toc-page-1 ez-toc-heading-level-2"><a class="ez-toc-link ez-toc-heading-7" href="https://blog.finxter.com/how-to-read-specific-columns-from-csv-file-in-python/#Learn_Pandas_the_Fun_Way_by_Solving_Code_Puzzles" title="Learn Pandas the Fun Way by Solving Code Puzzles">Learn Pandas the Fun Way by Solving Code Puzzles</a></li> </ul> </nav> </div> <figure class="wp-block-image size-full is-style-default"><img loading="lazy" width="600" height="337" src="https://blog.finxter.com/wp-content/uploads/2022/05/Read-specific-columns-of-a-given-CSV.gif" alt="" class="wp-image-347521" /><figcaption>A Quick Glance at The Solutions [Each solution stays for 5-10 secs.]</figcaption></figure> <p><strong>Problem: </strong>Given a CSV file, how to read only specific column(s) from the csv file? (Reading a specific column from a csv file will yield all the row values pertaining to that column.)</p> <p><strong>Example: </strong>Consier the following <code>csv</code> file (<code>countries.csv</code>):</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="monokai" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">Country,Capital,Population,Area Germany,Berlin,"84,267,549","348,560" France,Paris,"65,534,239","547,557" Spain,Madrid,"46,787,468","498,800" Italy,Rome,"60,301,346","294,140" India,Delhi,"1,404,495,187","2,973,190" USA,Washington,"334,506,463","9,147,420" China,Beijing,"1,449,357,022","9,388,211" Poland,Warsaw,"37,771,789","306,230" Russia,Moscow,"146,047,418","16,376,870" England,London,"68,529,747","241,930"</pre> <p><strong>Question:</strong> How will you read the above csv file and display the following columns </p> <ol> <li><code>Country</code> column along with the <code>Capital</code> column?</li> <li>All values in the <code>population</code> column?</li> </ol> <h2><strong>Method 1: Using <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-quickstart/" target="_blank">Pandas</a></strong></h2> <p>Using the Pandas library is probably the best option if you are dealing with csv files. You can easily read a csv file and store an entire column within a variable. </p> <p><strong>Code:</strong></p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd df = pd.read_csv("countries.csv") country = df['Country'] # or # country = df.Country capital = df['Capital'] # or # capital = df.Capital # displaying selected columns (Country and Capital) for x, y in zip(country, capital): print(f"{x} {y}") # displaying a single column (Country) print() print(df['Population'])</pre> <p><strong>Output:</strong></p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">Germany Berlin France Paris Spain Madrid Italy Rome India Delhi USA Washington China Beijing Poland Warsaw Russia Moscow England London 0 84,267,549 1 65,534,239 2 46,787,468 3 60,301,346 4 1,404,495,187 5 334,506,463 6 1,449,357,022 7 37,771,789 8 146,047,418 9 68,529,747 Name: Population, dtype: object</pre> <p><strong>Explanation: </strong></p> <ul> <li>Read the csv file using <code>pd.read_csv()</code> Pandas function.</li> <li> Save all the information of the columns Country and Capital within independent variables using <ul> <li><code>country = df['Country']</code> <ul> <li>Alternatively, you can also use <code>country = df.Country</code></li> </ul> </li> <li><code>capital = df['Capital']</code> <ul> <li>Alternatively, you can also use <code>capital = df.Capital</code></li> </ul> </li> </ul> </li> </ul> <ul> <li>To display the country names and their capitals simultaneously, you can bind the two columns, <code>country</code> and <code>capital</code>, using the zip() function and then display each country along with its capital using a for loop upon the zipped object. </li> <li>To display all the values in the population column, you can simply use <code>df['Population']</code>.</li> </ul> <p><strong><span style="text-decoration: underline">TRIVIA</span></strong><br /><code>zip()</code> is a built-in function in Python that takes an arbitrary number of iterables and binds them into a single iterable, a <code>zip</code> object. It combines the <code>n-th</code> value of each iterable argument into a tuple.<br />Read more about <a href="https://blog.finxter.com/python-ziiiiiiip-a-helpful-guide/" target="_blank" rel="noreferrer noopener">zip() <strong>here</strong></a>.</p> <h3><strong>➤</strong> <strong>List-Based Indexing of a DataFrame</strong></h3> <p>In case you are not comfortable with using <code>zip()</code> to display multiple columns at once, you have another option. You can simply use list-based indexing to accomplish your goal. </p> <p>List-based indexing is a technique that allows you to pass multiple column names as a <a rel="noreferrer noopener" href="https://blog.finxter.com/python-lists/" target="_blank">list </a>within the square-bracket selector. </p> <p><strong>Example:</strong></p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="5" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd df = pd.read_csv("countries.csv") print() print(df[['Country', 'Capital']])</pre> <p><strong>Output:</strong></p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> Country Capital 0 Germany Berlin 1 France Paris 2 Spain Madrid 3 Italy Rome 4 India Delhi 5 USA Washington 6 China Beijing 7 Poland Warsaw 8 Russia Moscow 9 England London</pre> <h2><strong>Method 2: Integer Based Indexing with iloc</strong></h2> <p><strong>Approach: </strong>The idea here is to use the <code>df.iloc[rows, columns].values</code> to access individual columns from the DataFrame using indexing. Note that the first column always has the index 0, while the second column has index 1, and so on. </p> <ul> <li><code>rows</code> is used to select individual rows. Use the <a rel="noreferrer noopener" href="https://blog.finxter.com/introduction-to-slicing-in-python/" target="_blank">slicing colon<code>:</code></a> to ensure all rows have been selected.</li> <li><code>columns</code> is used to select individual columns. <ul> <li>Use <code>country = data.iloc[:, 0].values</code> to save the values of the Country column.</li> <li><code>capital = data.iloc[:, 1].values</code> to save the values of the Capital column.</li> <li><code>population = data.iloc[:, 2].values</code> to save the values of the Population column.</li> </ul> </li> </ul> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="4-6" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd data = pd.read_csv('countries.csv') country = data.iloc[:, 0].values capital = data.iloc[:, 1].values population = data.iloc[:, 2].values # displaying selected columns print(data[['Country', 'Capital']]) print() # displaying a single column (Population) print(population)</pre> <p><strong>Output:</strong></p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">Country Capital 0 Germany Berlin 1 France Paris 2 Spain Madrid 3 Italy Rome 4 India Delhi 5 USA Washington 6 China Beijing 7 Poland Warsaw 8 Russia Moscow 9 England London ['84,267,549' '65,534,239' '46,787,468' '60,301,346' '1,404,495,187' '334,506,463' '1,449,357,022' '37,771,789' '146,047,418' '68,529,747']</pre> <h2><strong>Method 3: Name-Based Indexing with loc()</strong></h2> <p>Instead of selecting the columns by their index, you can also select them by their name using the <code>df.loc[]</code> selecter. </p> <p>The following example shows how to select the columns <code>Country</code> and <code>Capital</code> from the given DataFrame.</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="4" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd data = pd.read_csv('countries.csv') val = data.loc[:, ['Country', 'Capital']] print(val)</pre> <p><strong>Output:</strong></p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">Country Capital 0 Germany Berlin 1 France Paris 2 Spain Madrid 3 Italy Rome 4 India Delhi 5 USA Washington 6 China Beijing 7 Poland Warsaw 8 Russia Moscow 9 England London</pre> <p class="has-base-background-color has-background"><strong>Related Tutorial: <a href="https://blog.finxter.com/slicing-data-from-a-pandas-dataframe-using-loc-and-iloc/" target="_blank" rel="noreferrer noopener">Slicing Data from a Pandas DataFrame using .loc and .iloc</a></strong></p> <h2><strong>Method 4: Using csv Module</strong></h2> <p><code>csv</code> module is yet another spectacular option in Python that allows you to play with csv files. Let us have a look at the code that helps us to read the given csv file and then read specific columns from it:</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import csv population = [] with open('countries.csv', newline='', encoding='utf-8-sig') as csvfile: data = csv.DictReader(csvfile) for r in data: print("Country", ":", "Capital") # append values from population column to population list population.append(r['Population']) # displaying specific columns (Country and Capital) print(r['Country'], ":", r['Capital']) # display the population list print(population)</pre> <p><strong>Output:</strong></p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">Country : Capital Germany : Berlin Country : Capital France : Paris Country : Capital Spain : Madrid Country : Capital Italy : Rome Country : Capital India : Delhi Country : Capital USA : Washington Country : Capital China : Beijing Country : Capital Poland : Warsaw Country : Capital Russia : Moscow Country : Capital England : London ['84,267,549', '65,534,239', '46,787,468', '60,301,346', '1,404,495,187', '334,506,463', '1,449,357,022', '37,771,789', '146,047,418', '68,529,747']</pre> <p><strong>Explanation:</strong></p> <ul> <li>Import the <code>csv</code> module and open up the csv file. Ensure that you feed in the <code>encoding</code> argument as it helps to eliminate any unreadable characters that may occur in the given csv file. <ul> <li><code>with open('countries.csv', newline='', encoding='utf-8-sig') as csvfile</code></li> </ul> </li> <li>Allow Python to read the csv file as a dictionary using <code>csv.Dictreader</code> object. </li> <li>Once the file has been read in the form of a dictionary, you can easily fetch the values from respective columns by using the keys within square bracket notation from the dictionary. Here each column represents the key within the given dictionary.</li> </ul> <p><strong>Bonus: </strong>Here’s a quick look at how the <code>DictReader()</code> class looks like:</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import csv population = [] with open('countries.csv', newline='', encoding='utf-8-sig') as csvfile: data = csv.DictReader(csvfile) for row in data: print(row)</pre> <p><strong>Output:</strong></p> <pre class="wp-block-code"><code>{'Country': 'Germany', 'Capital': 'Berlin', 'Population': '84,267,549', 'Area': '348,560'} {'Country': 'France', 'Capital': 'Paris', 'Population': '65,534,239', 'Area': '547,557'} {'Country': 'Spain', 'Capital': 'Madrid', 'Population': '46,787,468', 'Area': '498,800'} {'Country': 'Italy', 'Capital': 'Rome', 'Population': '60,301,346', 'Area': '294,140'} {'Country': 'India', 'Capital': 'Delhi', 'Population': '1,404,495,187', 'Area': '2,973,190'} {'Country': 'USA', 'Capital': 'Washington', 'Population': '334,506,463', 'Area': '9,147,420'} {'Country': 'China', 'Capital': 'Beijing', 'Population': '1,449,357,022', 'Area': '9,388,211'} {'Country': 'Poland', 'Capital': 'Warsaw', 'Population': '37,771,789', 'Area': '306,230'} {'Country': 'Russia', 'Capital': 'Moscow', 'Population': '146,047,418', 'Area': '16,376,870'} {'Country': 'England', 'Capital': 'London', 'Population': '68,529,747', 'Area': '241,930'} </code></pre> <p>It is evident from the output that <code>csv.DictReader()</code> returns a dictionary for each row such that the column header is the key while the value in the row is the associated value in the dictionary.</p> <h2><strong>Conclusion</strong></h2> <p>To sum things up, there are majorly four different ways of accessing specific columns from a given csv file:</p> <ul> <li>List-Based Indexing.</li> <li>Integer-Based Indexing. </li> <li>Name-Based Indexing.</li> <li>Using csv modules <code>DictReader</code> class.</li> </ul> <p>Feel free to use the one that suits you best. I hope this tutorial helped you. Please <strong><a href="https://blog.finxter.com/subscribe" target="_blank" rel="noreferrer noopener">subscribe</a></strong> and stay tuned for more interesting tutorials. Happy learning!</p> <hr class="wp-block-separator" /> <h2>Learn Pandas the Fun Way by Solving Code Puzzles</h2> <p>If you want to boost your Pandas skills, consider checking out my puzzle-based learning book <a href="https://amzn.to/3lyM5iZ" title="https://amzn.to/3lyM5iZ" target="_blank" rel="noreferrer noopener">Coffee Break Pandas</a> (Amazon Link). </p> <div class="wp-block-image"> <figure class="aligncenter is-resized"><a href="https://amzn.to/3lyM5iZ" target="_blank" rel="noopener"><img loading="lazy" src="https://blog.finxter.com/wp-content/uploads/2020/11/cover.jpg" alt="Coffee Break Pandas Book" class="wp-image-16780" width="340" height="511" title="Coffee Break Pandas Book" srcset="https://blog.finxter.com/wp-content/uploads/2020/11/cover.jpg 680w, https://blog.finxter.com/wp-content/uploads/2020/11/cover-200x300.jpg 200w, https://blog.finxter.com/wp-content/uploads/2020/11/cover-150x225.jpg 150w" sizes="(max-width: 340px) 100vw, 340px" /></a></figure> </div> <p>It contains 74 hand-crafted Pandas puzzles including explanations. By solving each puzzle, you’ll get a score representing your skill level in Pandas. Can you become a Pandas Grandmaster?</p> <p><a href="https://amzn.to/3lyM5iZ" target="_blank" rel="noreferrer noopener" title="https://amzn.to/3lyM5iZ">Coffee Break Pandas</a> offers a fun-based approach to data science mastery—and a truly gamified learning experience.</p> </div> https://www.sickgaming.net/blog/2022/05/06/how-to-read-specific-columns-from-csv-file-in-python/ |