Posted on Leave a comment

How to Find the Max of List of Lists in Python?

Problem: Say you have a list of lists (nested list) and you want to find the maximum of this list. It’s not trivial to compare lists—what’s the maximum among lists after all? To define the maximum among the inner lists, you may want to consider different objectives.

  1. The first element of each inner list.
  2. The i-th element of each inner list.
  3. The sum of inner list elements.
  4. The maximum of inner list elements.
  5. The minimum of inner list elements.

Example: Given the list of lists [[1, 1, 1], [0, 2, 0], [3, 3, -1]], which is the maximum element?

  1. The first element of each inner list. The maximum is [3, 3, -1].
  2. The i-th element of each inner list (i = 2, counting from zero, so the third element). The maximum is [1, 1, 1].
  3. The sum of inner list elements. The maximum is [3, 3, -1].
  4. The maximum of inner list elements. The maximum is [3, 3, -1].
  5. The minimum of inner list elements. The maximum is [1, 1, 1].

So how do you accomplish this?

Solution: Use the max() function with key argument.

Syntax: The max() function is a built-in function in Python (Python versions 2.x and 3.x). Here’s the syntax:

max(iterable, key=None)

Arguments:

Argument    Description
iterable    The values among which you want to find the maximum. In our case, it’s a list of lists.
key         (Optional. Default None.) A function that takes a single argument and returns a comparable value. The function is applied to each element of the iterable. Then, max() finds the maximum based on the key function results rather than the elements themselves.

Let’s study the solution code for our different versions of calculating the maximum “list” of a list of lists (nested list).

lst = [[1, 1, 1], [0, 2, 0], [3, 3, -1]]

# Maximum using first element
print(max(lst, key=lambda x: x[0]))
# [3, 3, -1]

# Maximum using third element
print(max(lst, key=lambda x: x[2]))
# [1, 1, 1]

# Maximum using sum()
print(max(lst, key=sum))
# [3, 3, -1]

# Maximum using max
print(max(lst, key=max))
# [3, 3, -1]

# Maximum using min
print(max(lst, key=min))
# [1, 1, 1]
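Two details of max() with a key function are easy to miss, so here is a small sketch that goes slightly beyond the examples above. It shows a named key function (third_element is a made-up helper name, not part of the original solution), how ties are resolved, and how the default argument handles an empty list.

```python
lst = [[1, 1, 1], [0, 2, 0], [3, 3, -1]]

def third_element(inner):
    """Hypothetical key function: compare inner lists by index 2."""
    return inner[2]

best_by_third = max(lst, key=third_element)

# Ties: both inner lists have sum 3; max() keeps the first one it encounters
tie_winner = max([[1, 1, 1], [3, 0, 0]], key=sum)

# Empty input: the default is returned instead of raising ValueError
empty_result = max([], key=sum, default=None)

print(best_by_third)  # [1, 1, 1]
print(tie_winner)     # [1, 1, 1]
print(empty_result)   # None
```

A named function is a reasonable choice over a lambda when the same criterion is reused in several places.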


Where to Go From Here?

Enough theory, let’s get some practice!

To become successful in coding, you need to get out there and solve real problems for real people. That’s how you can become a six-figure earner easily. And that’s how you polish the skills you really need in practice. After all, what’s the use of learning theory that nobody ever needs?

Practice projects are how you sharpen your saw in coding!

Do you want to become a code master by focusing on practical code projects that actually earn you money and solve problems for people?

Then become a Python freelance developer! It’s the best way of approaching the task of improving your Python skills—even if you are a complete beginner.

Join my free webinar “How to Build Your High-Income Skill Python” and watch how I grew my coding business online and how you can, too—from the comfort of your own home.

Join the free webinar now!


How to Do a Backslash in Python?

The Python backslash ('\') is a special character that’s used for two purposes:

  1. The Python backslash can be part of a special character sequence such as the tab character '\t', the newline character '\n', or the carriage return '\r'.
  2. The Python backslash can escape other special characters in a Python string. For example, the first backslash in the string '\\n' escapes the second backslash and removes the special meaning so that the resulting string contains the two characters '\' and 'n' instead of the special newline character '\n'.
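You can check both purposes yourself by counting characters. The following is a minimal sketch; the variable names are chosen only for illustration.

```python
newline = '\n'    # escape sequence: a single newline character
escaped = '\\n'   # an escaped backslash followed by the letter n

print(len(newline))  # 1
print(len(escaped))  # 2
print(escaped)       # \n

# chr(92) is the backslash character, written without any escaping
print(escaped == chr(92) + 'n')  # True
```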


The backslash \ is an escape character: if used in front of another character, it changes the meaning of this character. For example, the character 'n' is just that, a simple character, but the character '\n' (yes, it’s one character consisting of two symbols) is the newline character. We say that it is escaped.

So how do we define a string consisting of the backslash? The problem is that if we use the backslash, Python thinks that the character that follows the backslash is escaped. Consider the statement print('\').

We want to print a string consisting of a single backslash, but the backslash escapes the closing quote ' of the string literal. Hence, the interpreter believes the string was never closed and raises a SyntaxError.
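You can reproduce this failure without breaking your own script. The sketch below assumes the failing statement is print('\'), that is, a single backslash between two single quotes. It builds that statement as text and hands it to the built-in compile() function, so the error shows up as an exception you can catch.

```python
# Build the text print('\') as data so that this example stays valid Python.
# chr(92) is the backslash character.
source = "print('" + chr(92) + "')"

try:
    compile(source, "<example>", "exec")
    outcome = "compiled"
except SyntaxError:
    outcome = "SyntaxError"

print(source)   # print('\')
print(outcome)  # SyntaxError
```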

The correct way of accomplishing this is to escape the escape character itself:

print('\\')
# \

This is exactly what we want to accomplish. The first character \ escapes the second character \ and therefore removes its special meaning. The second character \ is therefore interpreted as a simple backslash.


Python List of Lists Group By – A Simple Illustrated Guide

This tutorial shows you how to group the inner lists of a Python list of lists by common element. There are three basic methods:

  1. Group the inner lists together by common element.
  2. Group the inner lists together by common element AND aggregating them (e.g. averaging).
  3. Group the inner lists together by common element AND aggregating them (e.g. averaging) using the Pandas external library.


Method 1: Group List of Lists By Common Element in Dictionary

Problem: Given a list of lists, group the inner lists by a common element and store the result in a dictionary (key = common element).

Example: Say, you’ve got a database with multiple rows (the list of lists) where each row consists of three attributes: Name, Age, and Income. You want to group by Name and store the result in a dictionary. The dictionary keys are given by the Name attribute. The dictionary values are a list of rows that have this exact Name attribute.

Solution: Here’s the data and how you can group by a common attribute (e.g., Name).

# Database:
# row = [Name, Age, Income]
rows = [['Alice', 19, 45000], ['Bob', 18, 22000], ['Ann', 26, 88000], ['Alice', 33, 118000]]

# Create a dictionary grouped by Name
d = {}
for row in rows:
    # Add name to dict if not exists
    if row[0] not in d:
        d[row[0]] = []
    # Add all non-Name attributes as a new list
    d[row[0]].append(row[1:])

print(d)
# {'Alice': [[19, 45000], [33, 118000]],
# 'Bob': [[18, 22000]],
# 'Ann': [[26, 88000]]}

You can see that the result is a dictionary with one key per name ('Alice', 'Bob', and 'Ann'). Alice appears in two rows of the original database (list of lists). Thus, you associate two rows to her name—maintaining only the Age and Income attributes per row.

The strategy to accomplish this is simple:

  • Create the empty dictionary.
  • Go over each row in the list of lists. The first value of the row list is the Name attribute.
  • Add the Name attribute row[0] to the dictionary if it doesn’t exist yet, initializing its value to the empty list. Now you can be sure that the key exists in the dictionary.
  • Append the sublist slice [Age, Income] to the dictionary value so that this becomes a list of lists as well—one list per database row.
  • You’ve now grouped all database entries by a common attribute (=Name).
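If you don’t want to check for the key yourself, the standard library offers an alternative. The sketch below swaps the plain dictionary for collections.defaultdict, which is not what the solution above uses but produces the same grouping. The variable names groups, name, and attributes are chosen for illustration.

```python
from collections import defaultdict

rows = [['Alice', 19, 45000], ['Bob', 18, 22000],
        ['Ann', 26, 88000], ['Alice', 33, 118000]]

# A missing key is initialized to an empty list on first access
groups = defaultdict(list)
for name, *attributes in rows:
    groups[name].append(attributes)

print(dict(groups))
# {'Alice': [[19, 45000], [33, 118000]], 'Bob': [[18, 22000]], 'Ann': [[26, 88000]]}
```

The unpacking name, *attributes splits each row into the grouping key and the remaining values, which plays the role of row[0] and row[1:].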

So far, so good. But what if you want to perform some aggregation on the grouped database rows?

Method 2: Group List of Lists By Common Element and Aggregate Grouped Elements

Problem: In the previous example, you’ve seen that each dictionary value is a list of lists because you store each row as a separate list. But what if you want to aggregate all grouped rows?

Example: The dictionary entry for the key 'Alice' may be [[19, 45000], [33, 118000]] but you want to average the age and income values: [(19+33)/2, (45000+118000)/2]. How do you do that?

Solution: The solution is simply to add one post-processing step after the above code to aggregate all attributes using the zip() function as follows. Note that this is the exact same code as before (without aggregation) with three lines added at the end to aggregate the list of lists for each grouped Name into a single average value.

# Database:
# row = [Name, Age, Income]
rows = [['Alice', 19, 45000], ['Bob', 18, 22000], ['Ann', 26, 88000], ['Alice', 33, 118000]]

# Create a dictionary grouped by Name
d = {}
for row in rows:
    # Add name to dict if not exists
    if row[0] not in d:
        d[row[0]] = []
    # Add all non-Name attributes as a new list
    d[row[0]].append(row[1:])

print(d)
# {'Alice': [[19, 45000], [33, 118000]],
# 'Bob': [[18, 22000]],
# 'Ann': [[26, 88000]]}

# AGGREGATION FUNCTION:
for key in d:
    d[key] = [sum(x) / len(x) for x in zip(*d[key])]

print(d)
# {'Alice': [26.0, 81500.0], 'Bob': [18.0, 22000.0], 'Ann': [26.0, 88000.0]}

In the code, you use the aggregation function sum(x) / len(x) to calculate the average value for each attribute of the grouped rows. But you can replace this part with your own aggregation function such as variance, length, minimum, or maximum.

Explanation:

  • You go over each key in the dictionary (the Name attribute) and aggregate the list of lists into a flat list of averaged attributes.
  • You zip the attributes together. For example, zip(*d['Alice']) becomes [[19, 33], [45000, 118000]] (conceptually).
  • You iterate over each list x of this list of lists in the list comprehension statement.
  • You aggregate the grouped attributes using your own custom function (e.g. sum(x) / len(x) to average the attribute values).
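To see the zip step in isolation, here is a minimal sketch that uses only the rows grouped under 'Alice'. The names alice_rows, columns, and maxima are chosen for illustration. Note that zip() actually yields tuples, not lists.

```python
alice_rows = [[19, 45000], [33, 118000]]

# zip(*rows) pairs up the i-th entries of all rows (a transpose)
columns = list(zip(*alice_rows))
print(columns)  # [(19, 33), (45000, 118000)]

# The averaging step from the solution, applied per column
averages = [sum(col) / len(col) for col in columns]
print(averages)  # [26.0, 81500.0]

# Swapping in another aggregator, here the maximum per attribute
maxima = [max(col) for col in columns]
print(maxima)  # [33, 118000]
```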


Method 3: Pandas GroupBy

The Pandas library has its own powerful implementation of the groupby() function. Have a look at the code first:

import pandas as pd

# Database:
# row = [Name, Age, Income]
rows = [['Alice', 19, 45000], ['Bob', 18, 22000], ['Ann', 26, 88000], ['Alice', 33, 118000]]

df = pd.DataFrame(rows)
print(df)
'''
       0   1       2
0  Alice  19   45000
1    Bob  18   22000
2    Ann  26   88000
3  Alice  33  118000
'''

print(df.groupby([0]).mean())
'''
          1        2
0
Alice  26.0  81500.0
Ann    26.0  88000.0
Bob    18.0  22000.0
'''

Explanation:

  • Import the pandas library.
  • Create a DataFrame object from the rows—think of it as an Excel spreadsheet in your code (with numbered rows and columns).
  • Call the groupby() function on your DataFrame. Use the column index [0] (which is the Name attribute) to group your data. This creates a DataFrameGroupBy object.
  • On the DataFrameGroupBy object call the mean() function or any other aggregator function you want.
  • The result is the “spreadsheet” with grouped Name attributes where multiple rows with the same Name attributes are averaged (element-wise).


How to Filter a List of Lists in Python?

Short answer: To filter a list of lists for a condition on the inner lists, use the list comprehension statement [x for x in list if condition(x)] and replace condition(x) with your filtering condition that returns True to include inner list x, and False otherwise.

Lists belong to the most important data structures in Python—every master coder knows them by heart! Surprisingly, even intermediate coders don’t know the best way to filter a list—let alone a list of lists in Python. This tutorial shows you how to do the latter!

Problem: Say, you’ve got a list of lists. You want to filter the list of lists so that only those inner lists remain that satisfy a certain condition. The condition is a function of the inner list—such as the average or sum of the inner list elements.

Example: The following list of lists holds temperature measurements, with one inner list of seven values per week.

# Measurements of a temperature sensor (7 per week)
temperature = [[10, 8, 9, 12, 13, 7, 8],  # week 1
               [9, 9, 5, 6, 6, 9, 11],    # week 2
               [10, 8, 8, 5, 6, 3, 1]]    # week 3

How do you keep only the colder weeks, those with an average temperature below 8? This is the output you desire:

print(cold_weeks)
# [[9, 9, 5, 6, 6, 9, 11], [10, 8, 8, 5, 6, 3, 1]]

There are two semantically equivalent methods to achieve this: list comprehension and the filter() function. Let’s explore both variants next.


Method 1: List Comprehension

The most Pythonic way of filtering a list—in my opinion—is the list comprehension statement [x for x in list if condition]. You can replace condition with any function of x you would like to use as a filtering condition. Only elements that are in the list and meet the condition are included in the newly created list.

Solution: Here’s how you can solve the above problem to filter a list of lists based on a function of the inner lists:

# Measurements of a temperature sensor (7 per week)
temperature = [[10, 8, 9, 12, 13, 7, 8],  # week 1
               [9, 9, 5, 6, 6, 9, 11],    # week 2
               [10, 8, 8, 5, 6, 3, 1]]    # week 3

# How to filter weeks with average temperature <8?

# Method 1: List Comprehension
cold_weeks = [x for x in temperature if sum(x)/len(x)<8]
print(cold_weeks)
# [[9, 9, 5, 6, 6, 9, 11], [10, 8, 8, 5, 6, 3, 1]]

The second and third list in the list of lists meet the condition of having an average temperature of less than 8 degrees. So those are included in the variable cold_weeks.


This is a concise and efficient way of filtering a list, and it’s also the most Pythonic one. If you’re looking for alternatives, though, keep reading.


Method 2: filter() Function

The filter(function, iterable) function takes as input a function that takes one argument (a list element) and returns a Boolean value that indicates whether this list element should pass the filter. All elements that pass the filter are returned as a new iterable object (a filter object).

You can use the lambda function statement to create the function right where you pass it as an argument. The syntax of the lambda function is lambda x: expression and it means that you use x as an input argument and you return expression as a result (which may or may not use x to compute the return value). For more information, see my detailed blog article about the lambda function.

# Measurements of a temperature sensor (7 per week)
temperature = [[10, 8, 9, 12, 13, 7, 8],  # week 1
               [9, 9, 5, 6, 6, 9, 11],    # week 2
               [10, 8, 8, 5, 6, 3, 1]]    # week 3

# How to filter weeks with average temperature <8?

# Method 2: filter()
cold_weeks = list(filter(lambda x: sum(x) / len(x) < 8, temperature))
print(cold_weeks)
# [[9, 9, 5, 6, 6, 9, 11], [10, 8, 8, 5, 6, 3, 1]]

Again, the second and third list in the list of lists meet the condition of having an average temperature of less than 8 degrees. So those are included in the variable cold_weeks.

The filter() function returns a filter object that’s an iterable. To convert it to a list, you use the list(...) constructor.
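The sketch below puts both methods side by side and shows why the list(...) conversion matters: a filter object is lazy and can be consumed only once. The predicate is_cold and its threshold parameter are hypothetical names introduced for this illustration, and statistics.mean stands in for sum(x) / len(x).

```python
from statistics import mean

temperature = [[10, 8, 9, 12, 13, 7, 8],   # week 1
               [9, 9, 5, 6, 6, 9, 11],     # week 2
               [10, 8, 8, 5, 6, 3, 1]]     # week 3

def is_cold(week, threshold=8):
    """Hypothetical predicate: True if the week's average is below threshold."""
    return mean(week) < threshold

lazy = filter(is_cold, temperature)   # nothing is evaluated yet
via_filter = list(lazy)               # consumes the filter object
via_comprehension = [week for week in temperature if is_cold(week)]

print(via_filter == via_comprehension)  # True
print(list(lazy))                       # [] because the iterator is exhausted
```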


How to Show Only Unread Messages in Primary Gmail Tab?

This is a small trick I learned the hard way. When working through the massive amounts of emails, I often wondered: how to get only the unread ones in Gmail that are also in the primary tab?

Queries like these happen quite frequently when working with Gmail. As it turns out, there’s a simple solution:

Simply type the following command in your search bar:

in:inbox category:primary is:unread

For coders, this is an easily understandable filter operation. We want to retrieve all emails from your inbox (in:inbox) that are also in your primary tab (category:primary) and that are also unread (is:unread).

As it turns out, Gmail comes with powerful search and filtering operators way beyond what you’ve seen here.

Simply bookmark this page and come back if you run into the next Gmail search issue.


Why You Need to Stop Learning to Code [Video]


Configuring Azure Services and emulators using Visual Studio

Angelos Petropoulos


Starting with Visual Studio 16.6 Preview 2, the Connected Services tab offers a new experience called Service Dependencies. You can use it to connect your app to Azure services such as Azure SQL, Storage, Key Vault, and many others. Wherever possible, local emulation options are also available, and more are planned for the future.

Connected Services tab - Service Dependencies table

Add a new Service Dependency

You can easily and quickly get the right NuGet packages, start-up code and configuration added to your project for every supported Azure service. You simply click add, pick the service from the list and follow the 2-3 steps in the wizard. Here is an example of adding Azure Cosmos DB:

Connected Services tab - Add Azure CosmosDB

Provision a new instance of an Azure service without leaving the IDE

In the above example we re-used an existing instance of Azure Cosmos DB, but you can also create new instances of all the supported Azure services without leaving the IDE. Here is Azure Cosmos DB again as an example of provisioning Azure resources from within Visual Studio:

Connected Services tab - Create Azure Cosmos DB Instance

Configure service dependencies for remote environments

Using Visual Studio to publish your app to Azure App Service gives you the opportunity to configure these dependencies for the remote environment you are publishing to. Right click > Publish on your project in Solution Explorer and go through the wizard to create a new publish profile for Azure App Service. At the end you will see a Service Dependencies list already containing all of your application’s dependencies, ready to be configured for this remote environment:

Publish - Unconfigured Service Dependencies

How it works under the covers

To support all of this Visual Studio creates two new files visible in Solution Explorer under Properties called serviceDependencies.json and serviceDependencies.local.json. Both of these files are safe to check in as they do not contain any secrets.

serviceDependencies.json file

Visual Studio also creates a file called serviceDependencies.local.json.user which is not visible in Solution Explorer by default. This file contains information that could be considered a secret (e.g. resource IDs in Azure) and we do not recommend you check it in.

Service References

While working on the Connected Services tab we took the opportunity to consolidate our UX and make it the new home for the existing OpenAPI & gRPC Service References table. With everything being in one place now we have routed the Right Click > Add > Service Reference… context menu item in Solution Explorer to the consolidated Connected Services tab.

Connected Services Tab - Service References

Feedback

Please give all of the above a try and let us know what you think. Do you wish we supported a feature or Azure service that we don’t already? Please let us know! You can submit a new feature suggestion, leave us comments on this post and report any issues you may encounter using the built-in tools.


How to Average a List of Lists in Python?

Problem: You have a list of lists and you want to calculate the average of the different columns.

Example: Given the following list of lists with four rows and three columns.

data = [[0, 1, 0], [1, 1, 1], [0, 0, 0], [1, 1, 0]]

You want to have the average values of the three columns:

[average_col_1, average_col_2, average_col_3]

There are three methods that solve this problem.

Method 1: Average in Python (No Library)

A simple one-liner with list comprehension in combination with the zip() function on the unpacked list to transpose the list of lists does the job in Python.

data = [[0, 1, 0], [1, 1, 1], [0, 0, 0], [1, 1, 0]]

# Method 1: Pure Python
res = [sum(x) / len(x) for x in zip(*data)]
print(res)
# [0.5, 0.75, 0.25]

Do you love Python one-liners? I do for sure. I’ve even written a whole book about them with the San Francisco publisher No Starch Press: Python One-Liners.


Method 2: Average with NumPy Library

You create a NumPy array out of the data and pass it to the np.average() function.

data = [[0, 1, 0], [1, 1, 1], [0, 0, 0], [1, 1, 0]]

# Method 2: NumPy
import numpy as np
a = np.array(data)
res = np.average(a, axis=0)
print(res)
# [0.5 0.75 0.25]

The axis argument of the average function defines along which axis you want to calculate the average value. If you want to average columns, define axis=0. If you want to average rows, define axis=1. If you want to average over all values, skip this argument.

Method 3: Mean Statistics Library + Map()

Just to show you another alternative, here’s one using the map() function and our zip(*data) trick to transpose the “matrix” data.

data = [[0, 1, 0], [1, 1, 1], [0, 0, 0], [1, 1, 0]]

# Method 3: Statistics + Map()
import statistics
res = map(statistics.mean, zip(*data))
print(list(res))
# [0.5, 0.75, 0.25]

The map(function, iterable) function applies function to each element in iterable. As an alternative, you can also use list comprehension as shown in method 1 in this tutorial. In fact, Guido van Rossum, the creator of Python and its former benevolent dictator for life (BDFL), prefers list comprehension over the map() function.
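To compare the two styles directly, here is a minimal sketch that computes the column averages once with map() and once with a list comprehension. The variable names with_map and with_comprehension are chosen for illustration.

```python
import statistics

data = [[0, 1, 0], [1, 1, 1], [0, 0, 0], [1, 1, 0]]

# map() returns a lazy iterator, so it is wrapped in list(...)
with_map = list(map(statistics.mean, zip(*data)))

# The same computation written as a list comprehension
with_comprehension = [statistics.mean(col) for col in zip(*data)]

print(with_map)                        # [0.5, 0.75, 0.25]
print(with_map == with_comprehension)  # True
```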


How to Remove Duplicates From a Python List of Lists?

What’s the best way to remove duplicates from a Python list of lists? This is a popular coding interview question at Google, Facebook, and Amazon. In this article, I’ll show you how (and why) it works—so keep reading!

How to remove all duplicates of a given value in the list?

Method 1: Naive Method

Algorithm: Go over each element and check whether this element already exists in the duplicate-free list. If not, add it. The problem is that this method has quadratic time complexity because you need to check for each element whether it already exists in the list (n membership checks at O(n) each, so O(n²) for n elements).

lst = [[1, 1], [0, 1], [0, 1], [1, 1]]

dup_free = []
for x in lst:
    if x not in dup_free:
        dup_free.append(x)

print(dup_free)
# [[1, 1], [0, 1]]

Method 2: Temporary Dictionary Conversion

Algorithm: A more efficient way in terms of time complexity is to create a dictionary out of the elements in the list to remove all duplicates and convert the dictionary back to a list. This preserves the order of the original list elements.

lst = [[1, 1], [0, 1], [0, 1], [1, 1]]

# 1. Convert into list of tuples
tpls = [tuple(x) for x in lst]

# 2. Create dictionary with empty values and
# 3. convert back to a list (dups removed)
dct = list(dict.fromkeys(tpls))

# 4. Convert list of tuples to list of lists
dup_free = [list(x) for x in dct]

# Print everything
print(dup_free)
# [[1, 1], [0, 1]]

All of the following four steps are linear-runtime operations. Therefore, the algorithm has linear runtime complexity and is more efficient than the naive approach (method 1).

  1. Convert into a list of tuples using list comprehension [tuple(x) for x in lst]. Tuples are hashable and can be used as dictionary keys—while lists can not!
  2. Convert the list of tuples to a dictionary with dict.fromkeys(tpls) to map tuples to dummy values. Each dictionary key can exist only once so duplicates are removed at this point.
  3. Convert the dictionary into a list of tuples with list(...).
  4. Convert the list of tuples into a list of lists using list comprehension [list(x) for x in dct].

Each list element (= a list) becomes a tuple which becomes a new key to the dictionary. For example, the list [[1, 1], [0, 1], [0, 1]] becomes the list [(1, 1), (0, 1), (0, 1)] and then the dictionary {(1, 1):None, (0, 1):None}. All elements that occur multiple times will be assigned to the same key. Thus, the dictionary contains only unique keys. There cannot be multiple equal keys.

As dictionary values, dict.fromkeys() assigns the dummy value None by default.

Then, you convert the dictionary back to a list of lists, throwing away the dummy values.
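The four steps can also be folded into a single expression. This is a compact sketch of the same idea, assuming Python 3.7 or later so that the dictionary keeps the keys in insertion order.

```python
lst = [[1, 1], [0, 1], [0, 1], [1, 1]]

# tuples become dict keys (duplicates collapse), then turn back into lists
dup_free = [list(t) for t in dict.fromkeys(map(tuple, lst))]

print(dup_free)  # [[1, 1], [0, 1]]
```

Whether the one-liner or the step-by-step version is preferable is a matter of readability. Both do the same work.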


Do Python Dictionaries Preserve the Ordering of the Keys?

Surprisingly, the dictionary keys in Python preserve the order of the elements. So, yes, the order of the elements is preserved.

This is surprising to many readers because countless online resources argue that the order of dictionary keys is not preserved. They assume that the underlying implementation of the dictionary key iterables uses sets, and sets are well known to be agnostic to the ordering of elements. But this assumption is wrong. The built-in Python dictionary implementation in CPython preserves the order.

Here’s an example, feel free to create your own examples and tests to check if the ordering is preserved.

lst = ['Alice', 'Bob', 'Bob', 1, 1, 1, 2, 3, 3]
dic = dict.fromkeys(lst)
print(dic)
# {'Alice': None, 'Bob': None, 1: None, 2: None, 3: None}

You see that the order of elements is preserved so when converting it back, the original ordering of the list elements is still preserved:

print(list(dic))
# ['Alice', 'Bob', 1, 2, 3]

Since Python 3.7, this insertion order is guaranteed by the language itself, so you can rely on it in any conforming implementation. In CPython 3.6 it was only an implementation detail, and in older versions the order is not preserved.

If you need to be certain that the order is preserved on such older versions, you can use collections.OrderedDict from the standard library.
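Here is a minimal sketch of that fallback, assuming the ordered dictionary meant here is collections.OrderedDict from the standard library. It is used exactly like dict.fromkeys() in the example above.

```python
from collections import OrderedDict

lst = ['Alice', 'Bob', 'Bob', 1, 1, 1, 2, 3, 3]

# OrderedDict remembers the order in which keys were first inserted
ordered = OrderedDict.fromkeys(lst)

print(list(ordered))  # ['Alice', 'Bob', 1, 2, 3]
```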

Method 3: Set Conversion

Given a list of lists, the goal is to remove all elements that exist more than once in the list.

Sets in Python allow only a single instance of an element. So by converting the list to a set, all duplicates are removed. In contrast to the naive approach (checking all pairs of elements if they are duplicates) that has quadratic time complexity, this method has linear runtime complexity. Why? Because the runtime complexity of creating a set is linear in the number of set elements. Now, you convert the set back to a list, and voilà, the duplicates are removed.

lst = list(range(10)) + list(range(10))
lst = list(set(lst))
print(lst)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Does this also work for tuples? Yes!
lst = [(10,5), (10,5), (5,10), (3,2), (3, 4)]
lst = list(set(lst))
print(lst)
# [(3, 4), (10, 5), (5, 10), (3, 2)] (the order may vary)

However, converting a list to a set doesn’t guarantee to preserve the order of the list elements. The set loses all ordering information. Also, you cannot create a set of lists because lists are non-hashable data types:

>>> set([[1,2], [1,1]])
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    set([[1,2], [1,1]])
TypeError: unhashable type: 'list'

But we can find a simple workaround to both problems as you’ll see in the following method.

Linear-Runtime Method with Set to Remove Duplicates From a List of Lists

This approach uses a set to check if the element is already in the duplicate-free list. As checking membership on sets is much faster than checking membership on lists, this method has linear runtime complexity as well (membership has constant runtime complexity).

lst = [[1, 1], [0, 1], [0, 1], [1, 1]]

dup_free = []
dup_free_set = set()
for x in lst:
    if tuple(x) not in dup_free_set:
        dup_free.append(x)
        dup_free_set.add(tuple(x))

print(dup_free)
# [[1, 1], [0, 1]]

This approach of removing duplicates from a list while maintaining the order of the elements has linear runtime complexity as well. And it works for all programming languages without you having to know implementation details about the dictionary in Python. But, on the other hand, it’s a bit more complicated.

Where to Go From Here?

Enough theory, let’s get some practice!

To become successful in coding, you need to get out there and solve real problems for real people. That’s how you can become a six-figure earner easily. And that’s how you polish the skills you really need in practice. After all, what’s the use of learning theory that nobody ever needs?

Practice projects are how you sharpen your saw in coding!

Do you want to become a code master by focusing on practical code projects that actually earn you money and solve problems for people?

Then become a Python freelance developer! It’s the best way of approaching the task of improving your Python skills—even if you are a complete beginner.

Join my free webinar “How to Build Your High-Income Skill Python” and watch how I grew my coding business online and how you can, too—from the comfort of your own home.

Join the free webinar now!


How to Read a CSV File Into a Python List?


How to read data from a .csv file and add its column or row to the list? There are three main ways: 

  • Option 1 (the quickest): use the standard library 
  • Option 2 (the most preferred): use pandas.read_csv() 
  • Option 3 (optional): use csv.reader() 

Short answer 

The simplest way to read a .csv file into a list is to open it with with open("file") as f: and process its lines as needed. Keep in mind that this approach has its limitations, as you’ll see in the tutorial.

Prerequisites 

To read the .csv file, I used the following tools: 

  • Python 3.8
  • PyCharm IDE for convenient coding experience 
  • Sublime Text Editor for a manual check of the .csv file 

You can open and inspect CSV files with other tools, including programs that are pre-installed on your machine, so the choice of tools is a matter of preference. The code produces the same results in any editor or IDE; note that the file paths in the examples assume Windows.

What is the CSV Format? 

Nowadays, three main data formats are used for passing data from one machine to another: CSV, XML, and JSON. 

The abbreviation CSV stands for “comma-separated values”. As the name implies, it is just a list of elements separated by commas. It is the most straightforward format to transfer data and should be used if

  1. you need the most compact file size, or 
  2. you have a flat data structure. 

Keep in mind that CSV files do not give you such flexibility in presenting the data as the other two options. 
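To make the compactness point concrete, here is a small sketch that writes the same flat records once as CSV and once as JSON. The records are hypothetical values invented for this illustration; they are not taken from the ECB file.

```python
import csv
import io
import json

# Hypothetical sample records, made up for this illustration only.
records = [
    {"series": "A", "period": "2019Q4", "value": 1.5},
    {"series": "A", "period": "2020Q1", "value": 1.7},
    {"series": "B", "period": "2019Q4", "value": 0.4},
]

# CSV: the column names appear once, in the header row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["series", "period", "value"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

# JSON: every record repeats its keys.
json_text = json.dumps(records)

print(csv_text)
print(len(csv_text), "characters as CSV vs", len(json_text), "as JSON")
```

CSV stays smaller because the field names are stored only once, but it has no natural way to express nested structures.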


Example Task 

This is a real-world task in a simplified form. The goal is to read data from a CSV file (70 KB) and build a list of all series codes found in its second line.

The data is open statistical data from the European Central Bank (ECB) in CSV format and presents financial flows over a period of time. The file consists of three main fields:

  1. series code 
  2. observed date (period, e.g., 2019Q4, 2020Q1, etc.) 
  3. observed value (data point, float number) 

Direct download link.

Data Preparation 

To focus on the parsing option, I suggest you download and extract a file beforehand. In the examples, the file will be placed on the Desktop, but you can put it anywhere you like. 

Script: 

import os
import wget

# The URL is split across adjacent string literals; Python joins them into one string.
link = (
    "http://sdw.ecb.europa.eu/export.do?"
    "mergeFilter=&removeItem=L&REF_AREA.252=I8&COUNTERPART_AREA.252=W0"
    "&rc=&ec=&legendPub=published&oc=&df=true&DATASET=0&dc=&ACCOUNTING"
    "_ENTRY.252=A&node=9689710&showHide=&removedItemList=&pb=&legendNo"
    "r=&activeTab=&STO.252=F&STO.252=K7&STO.252=KA&STO.252=LE&legendRe"
    "f=reference&REF_SECTOR.252=S1&exportType=csv&ajaxTab=true"
)
path = f"C:{os.environ['HOMEPATH']}\\Desktop\\data.csv"
wget.download(link, path)

Script breakdown: 

import os
import wget

Import statements load code that someone else has already written so that you can use it just by referring to it. The os module ships with Python, but some packages (e.g., wget) must be installed first.

The following command installs the latest version of a module and its dependencies from the Python Package Index:

python -m pip install SomePackage 

The os package is used to perform basic operations with files and folders in your operating system.

The wget package is used to download files from websites.

link = (
    "http://sdw.ecb.europa.eu/export.do?"
    "mergeFilter=&removeItem=L&REF_AREA.252=I8&COUNTERPART_AREA.252=W0"
    "&rc=&ec=&legendPub=published&oc=&df=true&DATASET=0&dc=&ACCOUNTING"
    "_ENTRY.252=A&node=9689710&showHide=&removedItemList=&pb=&legendNo"
    "r=&activeTab=&STO.252=F&STO.252=K7&STO.252=KA&STO.252=LE&legendRe"
    "f=reference&REF_SECTOR.252=S1&exportType=csv&ajaxTab=true"
)

The string variable link holds the direct download link. You can test this link in any web browser.

path = f"C:{os.environ['HOMEPATH']}\\Desktop\\data.csv" 

The string variable path holds the location on your system where the file will be saved.

The prefix “f” before the string makes it an “f-string”, which means that you can embed other expressions in the string by using {placeholders}. In this case, os.environ['HOMEPATH'] reads a system environment variable (set by Windows, not by your Python script) and inserts its value into the string. HOMEPATH holds the current user’s home folder without the drive letter (for example, \Users\%user%), which is why the string starts with C:.
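As a quick illustration of how the placeholder is filled in, consider the sketch below. The user name alice is hypothetical, and the pathlib variant is an alternative added here for readers on other operating systems; it is not part of the original script.

```python
from pathlib import Path

user = "alice"  # hypothetical user name, for illustration only

# An f-string evaluates the expression inside {} and inserts the result.
windows_style = f"C:\\Users\\{user}\\Desktop\\data.csv"
print(windows_style)
# C:\Users\alice\Desktop\data.csv

# A portable alternative (an assumption, not the article's method):
# Path.home() resolves the current user's home directory on any OS.
portable_path = Path.home() / "Desktop" / "data.csv"
print(portable_path)
```

The second form avoids the Windows-only HOMEPATH variable, at the cost of returning a Path object instead of a plain string.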

wget.download(link, path)

The function call wget.download() downloads the file from the specified link and saves it to the specified path.

The result of this step is a ready-to-use CSV file on your Desktop. Now we can parse the data from the CSV file and extract the series codes into a list.
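If you would rather not install wget, the download step can probably be done with the standard library as well. The following is a sketch under that assumption: the helper name download is made up for this example, and urllib.request.urlretrieve is used in place of wget.download().

```python
import urllib.request
from pathlib import Path


def download(url, destination):
    """Save the resource at `url` to `destination` and return the path."""
    destination = Path(destination)
    # urlretrieve copies the response body into the given file name.
    urllib.request.urlretrieve(url, str(destination))
    return destination


# Usage (not executed here; pass the ECB link from the script above):
# download(link, Path.home() / "Desktop" / "data.csv")
```

Unlike wget.download(), this version prints no progress bar, which is usually fine for a file of about 70 KB.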

Data Exploration 

It is good practice to explore the data before you start parsing it. In this case, you can see that the series codes are in the second row of data.csv.
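Besides opening the file in a text editor, you can peek at the first few lines from Python. The helper below is a minimal sketch; the name preview and the default of three lines are choices made for this example.

```python
from itertools import islice


def preview(path, n=3):
    """Return the first `n` lines of a text file without reading the rest."""
    with open(path, "r") as f:
        # islice stops after n lines, so large files are not loaded fully.
        return [line.rstrip("\n") for line in islice(f, n)]


# Usage (assuming `path` points to the downloaded data.csv):
# for line in preview(path):
#     print(line[:80])  # show only the start of each long line
```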

Option 1 (the Fastest): Use the Standard Library 

This is the fastest option of reading a file using the standard library. Assuming the file is prepared and located on your Desktop, you can use the script below. This is the easiest way of getting data on the list. However, it has its drawbacks. 

Input: 

import os

path = f"C:{os.environ['HOMEPATH']}\\Desktop\\data.csv"
with open(path, "r") as f:
    print(list(f.readlines()[1].split(","))[1:])

Output: 

['QSA.Q.N.I8.W0.S1.S1.N.A.F.F._Z._Z.XDC._T.S.V.N._T',... 'QSA.Q.N.I8.W0.S1.S1.N.A.LE.F89.T._Z.XDC._T.S.V.N._T']

Script breakdown: 

with open(path, "r") as f:
    print(list(f.readlines()[1].split(","))[1:])

The import statement and the variable assignment are skipped because they were described previously; the focus is on the last statements.

This is a combined statement of three parts: 

  1. The “with” statement defines a block of code that runs while an object is “active”. In this case, it tells Python to perform some actions while the file is open and to close the file when all statements are completed.
  2. The “open” function opens a file and returns a file object that Python can read from. In this case, we open the previously given file (“path” variable) in “r” mode, which stands for “read”-only mode.
  3. The “print” function shows the output on your screen. In this case we
    1. take the file object f from with open(path, 'r') as f,
    2. read the second line with f.readlines()[1],
    3. split the line by the , separator in f.readlines()[1].split(","),
    4. convert the result to a list in list(f.readlines()[1].split(",")) (split() already returns a list, so this call is redundant but harmless),
    5. return the list starting from the second element, as the first one is empty, in list(f.readlines()[1].split(","))[1:], and
    6. print the result in print(list(f.readlines()[1].split(","))[1:]).
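The same steps can be spelled out one statement at a time. This is a sketch rather than the article's exact script: the function name second_line_codes is made up, it reads only the first two lines instead of calling readlines(), and it additionally strips the trailing newline from the last field.

```python
def second_line_codes(path):
    """Return the fields of the second line of `path`, without the first field."""
    with open(path, "r") as f:       # the file is closed when the block ends
        f.readline()                 # skip the first line
        second_line = f.readline()   # read the second line
    # Drop the trailing newline, then split on the comma separator.
    fields = second_line.rstrip("\n").split(",")
    return fields[1:]                # skip the empty first field


# Usage (assuming `path` points to the downloaded data.csv):
# print(second_line_codes(path))
```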

No separate documentation is needed here because this code uses only functions that are built into Python.

Pros/Cons: Such an approach gives you an instant view of the CSV file and lets you select the required data. You can use it for spot checks and simple transformations. Keep in mind that it offers the fewest adjustable settings and requires lots of workarounds when transformations are complicated.
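One typical case that needs a workaround is a field that itself contains a comma. The sketch below uses a hypothetical line (not taken from the ECB file) to compare plain split() with the csv module from the standard library.

```python
import csv

line = 'a,"b,c",d'  # hypothetical line with a quoted comma

naive = line.split(",")            # splits inside the quotes as well
parsed = next(csv.reader([line]))  # respects the quoting rules

print(naive)   # ['a', '"b', 'c"', 'd']
print(parsed)  # ['a', 'b,c', 'd']
```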

Option 2 (the Most Preferred): Use pandas.read_csv() 

The most preferred option for reading a .csv file is the Pandas library (Pandas cheat sheets here). Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language.

Pandas is usually used for more advanced data analysis where data is stored in a “DataFrame”, which is basically like a table in Excel. A DataFrame has a header row and an index column so that you can refer to table values by their “column x row” intersection.
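A tiny example may help to picture the header row and the index column. The labels and values below are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical values, made up for this illustration only.
df = pd.DataFrame(
    {"CODE.A": [1.0, 1.5], "CODE.B": [2.0, 2.5]},  # header row: column labels
    index=["2019Q4", "2020Q1"],                     # index column: row labels
)

# Look up a single cell by row label and column label.
print(df.loc["2020Q1", "CODE.B"])
# 2.5
```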

Script: 

import os
import pandas as pd

path = f"C:{os.environ['HOMEPATH']}\\Desktop\\data.csv"
df = pd.read_csv(path, delimiter=",", skiprows=[0])
list = df.columns.to_list()[1:]
print(list)

Output: 

['QSA.Q.N.I8.W0.S1.S1.N.A.F.F._Z._Z.XDC._T.S.V.N._T',... 'QSA.Q.N.I8.W0.S1.S1.N.A.LE.F89.T._Z.XDC._T.S.V.N._T']

Script breakdown: 

df = pd.read_csv(path, delimiter=",", skiprows=[0])

In this line, the variable df (a DataFrame) is created from the .csv file by calling the pandas function read_csv. Here, the function receives three arguments: the file, delimiter, and skiprows.

The file is the same as used before. The delimiter is "," which is the default for read_csv, so it can be omitted. But it’s good to know that you can use any other delimiter. The argument skiprows=[0] skips the first line of the file so that the second line, which contains the series codes, becomes the header row.

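list = df.columns.to_list()[1:]
print(list)

These lines select the column headers, put them into a list starting from the second element, and print the result. Note that naming the variable list shadows Python’s built-in list type; in real code, a different name such as series_codes is safer.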
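To try the same calls without downloading anything, you can feed read_csv a small in-memory stand-in. The sample text below is hypothetical; it only mimics the layout described above (a first line to skip, then a header line whose first cell is empty).

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: a title line, then a header line
# whose first cell is empty, then the observations.
sample = (
    "Some title line\n"
    ",CODE.A,CODE.B\n"
    "2019Q4,1.0,2.0\n"
    "2020Q1,1.5,2.5\n"
)

df = pd.read_csv(io.StringIO(sample), skiprows=[0])  # line 2 becomes the header
print(df.columns.to_list())
# ['Unnamed: 0', 'CODE.A', 'CODE.B']

series_codes = df.columns.to_list()[1:]  # drop the placeholder for the empty cell
print(series_codes)
# ['CODE.A', 'CODE.B']
```

Pandas fills the empty header cell with a placeholder name, which is why the slice [1:] is still needed.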

Pros/Cons: Such an approach is relatively fast, visually appealing to the reader, and fully adjustable in a consistent way. Compared to the first option, which uses only the standard library, it requires an additional package to be installed. I personally believe that this is not a problem and that the drawback can be neglected. But another point should not be skipped: the amount of data. By default, read_csv loads the whole file into memory, which is inefficient when most of the data is “side” data that is useless for your purpose.
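When only the header is needed, one possible way to avoid loading the “side” data is the nrows parameter of read_csv. The sketch below uses hypothetical in-memory data; for the real file you would presumably combine nrows=0 with skiprows=[0] as in the script above.

```python
import io

import pandas as pd

sample = ",CODE.A,CODE.B\n2019Q4,1.0,2.0\n2020Q1,1.5,2.5\n"  # hypothetical data

# nrows=0 parses the header line only, so no data rows are loaded.
header_only = pd.read_csv(io.StringIO(sample), nrows=0)
print(header_only.columns.to_list()[1:])
# ['CODE.A', 'CODE.B']
print(len(header_only))
# 0
```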

Full documentation is available here with more guides and instructions on how to use it. 

Option 3 (Optional): Use csv.reader()

There is also another way to read .csv files, which might be useful in certain circumstances. The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats.

Script: 

import os
import csv

path = f"C:{os.environ['HOMEPATH']}\\Desktop\\data.csv"
with open(path, 'r') as f:
    wines = list(csv.reader(f, delimiter=","))[1][1:]

print(wines)

Output: 

['QSA.Q.N.I8.W0.S1.S1.N.A.F.F._Z._Z.XDC._T.S.V.N._T',... 'QSA.Q.N.I8.W0.S1.S1.N.A.LE.F89.T._Z.XDC._T.S.V.N._T']

Script breakdown: 

with open(path, 'r') as f:
    wines = list(csv.reader(f, delimiter=","))[1][1:]

csv.reader() is a function that parses a .csv file with the specified delimiter. Wrapping it in list() reads all rows of the file into a list of lists.

After that, we select the second row using the first brackets "[1]" and then select all elements of that row starting from the second one with the slice "[1:]".
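Because csv.reader() returns an iterator, the second row can also be fetched without building a list of all rows first. The following is a sketch of that variation; the helper name second_row and the in-memory sample are hypothetical.

```python
import csv
import io


def second_row(lines):
    """Return the second parsed row from an iterable of CSV lines."""
    reader = csv.reader(lines, delimiter=",")
    next(reader)         # discard the first row
    return next(reader)  # parse and return the second row


# Hypothetical stand-in for the open file object `f`.
sample = io.StringIO("title\n,CODE.A,CODE.B\n2019Q4,1.0,2.0\n")
print(second_row(sample)[1:])
# ['CODE.A', 'CODE.B']
```

With a real file, you would pass the open file object f instead of the StringIO buffer.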

As this is a standard package, there is documentation at the official Python website.

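Pros/Cons: Such an approach is relatively simple and has just a few lines of code. It needs no additional installation because the csv module is part of Python’s standard library; you only have to import it. On the other hand, the script as written parses the whole file into a list even though only the second row is needed.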

Summary 

You should remember that there are different ways of reading data from CSV files. Select the one which suits your needs most or has the best performance and runtime. 
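If performance matters for your use case, it is better to measure than to guess. Below is a rough sketch that times two of the approaches with the standard timeit module. The data is hypothetical and kept in memory, so the numbers only hint at relative cost and will differ on your machine and on your files.

```python
import csv
import io
import timeit

# Hypothetical data: a title line, a header line, and 1,000 data rows.
rows = ["title", ",CODE.A,CODE.B"] + [f"row{i},1.0,2.0" for i in range(1000)]
text = "\n".join(rows) + "\n"


def with_split():
    """Option 1 style: read all lines, split the second one on commas."""
    return io.StringIO(text).readlines()[1].rstrip("\n").split(",")[1:]


def with_csv_reader():
    """Option 3 style: parse all rows with csv.reader, take the second one."""
    return list(csv.reader(io.StringIO(text)))[1][1:]


# Both approaches must agree before their timings are compared.
assert with_split() == with_csv_reader() == ["CODE.A", "CODE.B"]

for func in (with_split, with_csv_reader):
    seconds = timeit.timeit(func, number=100)
    print(f"{func.__name__}: {seconds:.4f} s for 100 runs")
```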
