Sick Gaming
[Tut] How to Read Specific Columns from CSV File in Python - Printable Version

+- Sick Gaming (https://www.sickgaming.net)
+-- Forum: Programming (https://www.sickgaming.net/forum-76.html)
+--- Forum: Python (https://www.sickgaming.net/forum-83.html)
+--- Thread: [Tut] How to Read Specific Columns from CSV File in Python (/thread-99375.html)



[Tut] How to Read Specific Columns from CSV File in Python - xSicKxBot - 05-12-2022

How to Read Specific Columns from CSV File in Python

<div><div class="kk-star-ratings kksr-valign-top kksr-align-left " data-payload="{&quot;align&quot;:&quot;left&quot;,&quot;id&quot;:&quot;347248&quot;,&quot;slug&quot;:&quot;default&quot;,&quot;valign&quot;:&quot;top&quot;,&quot;reference&quot;:&quot;auto&quot;,&quot;count&quot;:&quot;0&quot;,&quot;readonly&quot;:&quot;&quot;,&quot;score&quot;:&quot;0&quot;,&quot;best&quot;:&quot;5&quot;,&quot;gap&quot;:&quot;5&quot;,&quot;greet&quot;:&quot;Rate this post&quot;,&quot;legend&quot;:&quot;0\/5 - (0 votes)&quot;,&quot;size&quot;:&quot;24&quot;,&quot;width&quot;:&quot;0&quot;,&quot;_legend&quot;:&quot;{score}\/{best} - ({count} {votes})&quot;}">
<div class="kksr-stars">
<div class="kksr-stars-inactive">
<div class="kksr-star" data-star="1" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" data-star="2" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" data-star="3" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" data-star="4" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" data-star="5" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
</p></div>
<div class="kksr-stars-active" style="width: 0px;">
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
<div class="kksr-star" style="padding-right: 5px">
<div class="kksr-icon" style="width: 24px; height: 24px;"></div>
</p></div>
</p></div>
</div>
<div class="kksr-legend"> <span class="kksr-muted">Rate this post</span> </div>
</div>
<div id="ez-toc-container" class="ez-toc-v2_0_21 counter-hierarchy counter-decimal ez-toc-light-blue">
<div class="ez-toc-title-container">
<p class="ez-toc-title">Table of Contents</p>
<p><span class="ez-toc-title-toggle"><a class="ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle" style="display: none;"><i class="ez-toc-glyphicon ez-toc-icon-toggle"></i></a></span></div>
<nav>
<ul class="ez-toc-list ez-toc-list-level-1">
<li class="ez-toc-page-1 ez-toc-heading-level-2"><a class="ez-toc-link ez-toc-heading-1" href="https://blog.finxter.com/how-to-read-specific-columns-from-csv-file-in-python/#Method_1_Using_Pandas" title="Method 1: Using Pandas">Method 1: Using Pandas</a>
<ul class="ez-toc-list-level-3">
<li class="ez-toc-heading-level-3"><a class="ez-toc-link ez-toc-heading-2" href="https://blog.finxter.com/how-to-read-specific-columns-from-csv-file-in-python/#%E2%9E%A4_List-Based_Indexing_of_a_DataFrame" title="➤ List-Based Indexing of a DataFrame">➤ List-Based Indexing of a DataFrame</a></li>
</ul>
</li>
<li class="ez-toc-page-1 ez-toc-heading-level-2"><a class="ez-toc-link ez-toc-heading-3" href="https://blog.finxter.com/how-to-read-specific-columns-from-csv-file-in-python/#Method_2_Integer_Based_Indexing_with_iloc" title="Method 2: Integer Based Indexing with iloc">Method 2: Integer Based Indexing with iloc</a></li>
<li class="ez-toc-page-1 ez-toc-heading-level-2"><a class="ez-toc-link ez-toc-heading-4" href="https://blog.finxter.com/how-to-read-specific-columns-from-csv-file-in-python/#Method_3_Name-Based_Indexing_with_loc" title="Method 3: Name-Based Indexing with loc()">Method 3: Name-Based Indexing with loc()</a></li>
<li class="ez-toc-page-1 ez-toc-heading-level-2"><a class="ez-toc-link ez-toc-heading-5" href="https://blog.finxter.com/how-to-read-specific-columns-from-csv-file-in-python/#Method_4_Using_csv_Module" title="Method 4: Using csv Module">Method 4: Using csv Module</a></li>
<li class="ez-toc-page-1 ez-toc-heading-level-2"><a class="ez-toc-link ez-toc-heading-6" href="https://blog.finxter.com/how-to-read-specific-columns-from-csv-file-in-python/#Conclusion" title="Conclusion">Conclusion</a></li>
<li class="ez-toc-page-1 ez-toc-heading-level-2"><a class="ez-toc-link ez-toc-heading-7" href="https://blog.finxter.com/how-to-read-specific-columns-from-csv-file-in-python/#Learn_Pandas_the_Fun_Way_by_Solving_Code_Puzzles" title="Learn Pandas the Fun Way by Solving Code Puzzles">Learn Pandas the Fun Way by Solving Code Puzzles</a></li>
</ul>
</nav>
</div>
<figure class="wp-block-image size-full is-style-default"><img loading="lazy" width="600" height="337" src="https://blog.finxter.com/wp-content/uploads/2022/05/Read-specific-columns-of-a-given-CSV.gif" alt="" class="wp-image-347521" /><figcaption>A Quick Glance at The Solutions [Each solution stays for 5-10 secs.]</figcaption></figure>
<p><strong>Problem: </strong>Given a CSV file, how to read only specific column(s) from the csv file? (Reading a specific column from a csv file will yield all the row values pertaining to that column.)</p>
<p><strong>Example: </strong>Consier the following <code>csv</code> file (<code>countries.csv</code>):</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="monokai" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">Country,Capital,Population,Area
Germany,Berlin,"84,267,549","348,560"
France,Paris,"65,534,239","547,557"
Spain,Madrid,"46,787,468","498,800"
Italy,Rome,"60,301,346","294,140"
India,Delhi,"1,404,495,187","2,973,190"
USA,Washington,"334,506,463","9,147,420"
China,Beijing,"1,449,357,022","9,388,211"
Poland,Warsaw,"37,771,789","306,230"
Russia,Moscow,"146,047,418","16,376,870"
England,London,"68,529,747","241,930"</pre>
<p><strong>Question:</strong> How will you read the above csv file and display the following columns </p>
<ol>
<li><code>Country</code> column along with the <code>Capital</code> column?</li>
<li>All values in the <code>population</code> column?</li>
</ol>
<h2><strong>Method 1: Using <a rel="noreferrer noopener" href="https://blog.finxter.com/pandas-quickstart/" target="_blank">Pandas</a></strong></h2>
<p>Using the Pandas library is probably the best option if you are dealing with csv files. You can easily read a csv file and store an entire column within a variable. </p>
<p><strong>Code:</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd df = pd.read_csv("countries.csv")
country = df['Country']
# or
# country = df.Country
capital = df['Capital']
# or
# capital = df.Capital # displaying selected columns (Country and Capital)
for x, y in zip(country, capital): print(f"{x} {y}") # displaying a single column (Country)
print()
print(df['Population'])</pre>
<p><strong>Output:</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">Germany Berlin
France Paris
Spain Madrid
Italy Rome
India Delhi
USA Washington
China Beijing
Poland Warsaw
Russia Moscow
England London 0 84,267,549
1 65,534,239
2 46,787,468
3 60,301,346
4 1,404,495,187
5 334,506,463
6 1,449,357,022
7 37,771,789
8 146,047,418
9 68,529,747
Name: Population, dtype: object</pre>
<p><strong>Explanation: </strong></p>
<ul>
<li>Read the csv file using <code>pd.read_csv()</code> Pandas function.</li>
<li> Save all the information of the columns Country and Capital within independent variables using
<ul>
<li><code>country = df['Country']</code>
<ul>
<li>Alternatively, you can also use <code>country = df.Country</code></li>
</ul>
</li>
<li><code>capital = df['Capital']</code>
<ul>
<li>Alternatively, you can also use <code>capital = df.Capital</code></li>
</ul>
</li>
</ul>
</li>
</ul>
<ul>
<li>To display the country names and their capitals simultaneously, you can bind the two columns, <code>country</code> and <code>capital</code>, using the zip() function and then display each country along with its capital using a for loop upon the zipped object. </li>
<li>To display all the values in the population column, you can simply use <code>df['Population']</code>.</li>
</ul>
<p><strong><span style="text-decoration: underline">TRIVIA</span></strong><br /><code>zip()</code>&nbsp;is a built-in function in Python that takes an arbitrary number of&nbsp;iterables&nbsp;and binds them into a single iterable, a <code>zip</code> object. It combines the&nbsp;<code>n-th</code>&nbsp;value of each iterable argument into a tuple.<br />Read more about <a href="https://blog.finxter.com/python-ziiiiiiip-a-helpful-guide/" target="_blank" rel="noreferrer noopener">zip() <strong>here</strong></a>.</p>
<h3><strong>➤</strong> <strong>List-Based Indexing of a DataFrame</strong></h3>
<p>In case you are not comfortable with using <code>zip()</code> to display multiple columns at once, you have another option. You can simply use list-based indexing to accomplish your goal. </p>
<p>List-based indexing is a technique that allows you to pass multiple column names as a&nbsp;<a rel="noreferrer noopener" href="https://blog.finxter.com/python-lists/" target="_blank">list&nbsp;</a>within the square-bracket selector. </p>
<p><strong>Example:</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="5" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd df = pd.read_csv("countries.csv")
print()
print(df[['Country', 'Capital']])</pre>
<p><strong>Output:</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""> Country Capital
0 Germany Berlin
1 France Paris
2 Spain Madrid
3 Italy Rome
4 India Delhi
5 USA Washington
6 China Beijing
7 Poland Warsaw
8 Russia Moscow
9 England London</pre>
<h2><strong>Method 2: Integer Based Indexing with iloc</strong></h2>
<p><strong>Approach: </strong>The idea here is to use the <code>df.iloc[rows, columns].values</code> to access individual columns from the DataFrame using indexing. Note that the first column always has the index 0, while the second column has index 1, and so on. </p>
<ul>
<li><code>rows</code> is used to select individual rows. Use the&nbsp;<a rel="noreferrer noopener" href="https://blog.finxter.com/introduction-to-slicing-in-python/" target="_blank">slicing colon<code>:</code></a>&nbsp;to ensure all rows have been selected.</li>
<li><code>columns</code> is used to select individual columns.
<ul>
<li>Use <code>country = data.iloc[:, 0].values</code> to save the values of the Country column.</li>
<li><code>capital = data.iloc[:, 1].values</code> to save the values of the Capital column.</li>
<li><code>population = data.iloc[:, 2].values</code> to save the values of the Population column.</li>
</ul>
</li>
</ul>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="4-6" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd data = pd.read_csv('countries.csv')
country = data.iloc[:, 0].values
capital = data.iloc[:, 1].values
population = data.iloc[:, 2].values
# displaying selected columns
print(data[['Country', 'Capital']])
print()
# displaying a single column (Population)
print(population)</pre>
<p><strong>Output:</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">Country Capital
0 Germany Berlin
1 France Paris
2 Spain Madrid
3 Italy Rome
4 India Delhi
5 USA Washington
6 China Beijing
7 Poland Warsaw
8 Russia Moscow
9 England London ['84,267,549' '65,534,239' '46,787,468' '60,301,346' '1,404,495,187' '334,506,463' '1,449,357,022' '37,771,789' '146,047,418' '68,529,747']</pre>
<h2><strong>Method 3: Name-Based Indexing with loc()</strong></h2>
<p>Instead of selecting the columns by their index, you can also select them by their name using the <code>df.loc[]</code> selecter. </p>
<p>The following example shows how to select the columns <code>Country</code> and <code>Capital</code> from the given DataFrame.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="4" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd data = pd.read_csv('countries.csv')
val = data.loc[:, ['Country', 'Capital']]
print(val)</pre>
<p><strong>Output:</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">Country Capital
0 Germany Berlin
1 France Paris
2 Spain Madrid
3 Italy Rome
4 India Delhi
5 USA Washington
6 China Beijing
7 Poland Warsaw
8 Russia Moscow
9 England London</pre>
<p class="has-base-background-color has-background"><strong>Related Tutorial: <a href="https://blog.finxter.com/slicing-data-from-a-pandas-dataframe-using-loc-and-iloc/" target="_blank" rel="noreferrer noopener">Slicing Data from a Pandas DataFrame using .loc and .iloc</a></strong></p>
<h2><strong>Method 4: Using csv Module</strong></h2>
<p><code>csv</code> module is yet another spectacular option in Python that allows you to play with csv files. Let us have a look at the code that helps us to read the given csv file and then read specific columns from it:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import csv population = []
with open('countries.csv', newline='', encoding='utf-8-sig') as csvfile: data = csv.DictReader(csvfile) for r in data: print("Country", ":", "Capital") # append values from population column to population list population.append(r['Population']) # displaying specific columns (Country and Capital) print(r['Country'], ":", r['Capital']) # display the population list print(population)</pre>
<p><strong>Output:</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">Country : Capital
Germany : Berlin
Country : Capital
France : Paris
Country : Capital
Spain : Madrid
Country : Capital
Italy : Rome
Country : Capital
India : Delhi
Country : Capital
USA : Washington
Country : Capital
China : Beijing
Country : Capital
Poland : Warsaw
Country : Capital
Russia : Moscow
Country : Capital
England : London
['84,267,549', '65,534,239', '46,787,468', '60,301,346', '1,404,495,187', '334,506,463', '1,449,357,022', '37,771,789', '146,047,418', '68,529,747']</pre>
<p><strong>Explanation:</strong></p>
<ul>
<li>Import the <code>csv</code> module and open up the csv file. Ensure that you feed in the <code>encoding</code> argument as it helps to eliminate any unreadable characters that may occur in the given csv file.
<ul>
<li><code>with open('countries.csv', newline='', encoding='utf-8-sig') as csvfile</code></li>
</ul>
</li>
<li>Allow Python to read the csv file as a dictionary using <code>csv.Dictreader</code> object. </li>
<li>Once the file has been read in the form of a dictionary, you can easily fetch the values from respective columns by using the keys within square bracket notation from the dictionary. Here each column represents the key within the given dictionary.</li>
</ul>
<p><strong>Bonus: </strong>Here’s a quick look at how the <code>DictReader()</code> class looks like:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import csv population = []
with open('countries.csv', newline='', encoding='utf-8-sig') as csvfile: data = csv.DictReader(csvfile) for row in data: print(row)</pre>
<p><strong>Output:</strong></p>
<pre class="wp-block-code"><code>{'Country': 'Germany', 'Capital': 'Berlin', 'Population': '84,267,549', 'Area': '348,560'}
{'Country': 'France', 'Capital': 'Paris', 'Population': '65,534,239', 'Area': '547,557'}
{'Country': 'Spain', 'Capital': 'Madrid', 'Population': '46,787,468', 'Area': '498,800'}
{'Country': 'Italy', 'Capital': 'Rome', 'Population': '60,301,346', 'Area': '294,140'}
{'Country': 'India', 'Capital': 'Delhi', 'Population': '1,404,495,187', 'Area': '2,973,190'}
{'Country': 'USA', 'Capital': 'Washington', 'Population': '334,506,463', 'Area': '9,147,420'}
{'Country': 'China', 'Capital': 'Beijing', 'Population': '1,449,357,022', 'Area': '9,388,211'}
{'Country': 'Poland', 'Capital': 'Warsaw', 'Population': '37,771,789', 'Area': '306,230'}
{'Country': 'Russia', 'Capital': 'Moscow', 'Population': '146,047,418', 'Area': '16,376,870'}
{'Country': 'England', 'Capital': 'London', 'Population': '68,529,747', 'Area': '241,930'}
</code></pre>
<p>It is evident from the output that <code>csv.DictReader()</code> returns a dictionary for each row such that the column header is the key while the value in the row is the associated value in the dictionary.</p>
<h2><strong>Conclusion</strong></h2>
<p>To sum things up, there are majorly four different ways of accessing specific columns from a given csv file:</p>
<ul>
<li>List-Based Indexing.</li>
<li>Integer-Based Indexing. </li>
<li>Name-Based Indexing.</li>
<li>Using csv modules <code>DictReader</code> class.</li>
</ul>
<p>Feel free to use the one that suits you best. I hope this tutorial helped you. Please <strong><a href="https://blog.finxter.com/subscribe" target="_blank" rel="noreferrer noopener">subscribe</a></strong> and stay tuned for more interesting tutorials. Happy learning!</p>
<hr class="wp-block-separator" />
<h2>Learn Pandas the Fun Way by Solving Code Puzzles</h2>
<p>If you want to boost your Pandas skills, consider checking out my puzzle-based learning book <a href="https://amzn.to/3lyM5iZ" title="https://amzn.to/3lyM5iZ" target="_blank" rel="noreferrer noopener">Coffee Break Pandas</a> (Amazon Link). </p>
<div class="wp-block-image">
<figure class="aligncenter is-resized"><a href="https://amzn.to/3lyM5iZ" target="_blank" rel="noopener"><img loading="lazy" src="https://blog.finxter.com/wp-content/uploads/2020/11/cover.jpg" alt="Coffee Break Pandas Book" class="wp-image-16780" width="340" height="511" title="Coffee Break Pandas Book" srcset="https://blog.finxter.com/wp-content/uploads/2020/11/cover.jpg 680w, https://blog.finxter.com/wp-content/uploads/2020/11/cover-200x300.jpg 200w, https://blog.finxter.com/wp-content/uploads/2020/11/cover-150x225.jpg 150w" sizes="(max-width: 340px) 100vw, 340px" /></a></figure>
</div>
<p>It contains 74 hand-crafted Pandas puzzles including explanations. By solving each puzzle, you’ll get a score representing your skill level in Pandas. Can you become a Pandas Grandmaster?</p>
<p><a href="https://amzn.to/3lyM5iZ" target="_blank" rel="noreferrer noopener" title="https://amzn.to/3lyM5iZ">Coffee Break Pandas</a> offers a fun-based approach to data science mastery—and a truly gamified learning experience.</p>
</div>


https://www.sickgaming.net/blog/2022/05/06/how-to-read-specific-columns-from-csv-file-in-python/