Posted on Leave a comment

Get Key by Value in The Dictionary

Problem Statement: How to get a key by its value in a dictionary in Python

Example:

# Given dictionary
employee = {"Sam": 1010, "Bob": 2020, "Rob": 3030}
# Some way to extract the key 'Bob' using its value 2020

We have a clear idea about the problem now. So without further delay, let us dive into the solutions to our question.

Solution 1: Using dict.items()

Approach: One way to extract a key from a dictionary by its value is to use the dict.items() method. The idea is to create a function that takes the given value as input and compares it with every value in the dictionary. When a value matches, the function returns the key assigned to it.

Solution:

# Given dictionary
employee = {"Sam": 1010, "Bob": 2020, "Rob": 3030}
# Function that fetches key from value
def get_key(v):
    for key, value in employee.items():
        # return the key which matches the given value
        if v == value:
            return key
    return "The provided value is not present in the dictionary"
# Passing the value to the function
print("Employee ID - 2020 \nName - ", get_key(2020))

Output:

Employee ID - 2020
Name -  Bob

Note: dict.items() is a dictionary method in Python that returns a view object. The view presents the dictionary's key-value pairs as (key, value) tuples. Any changes made to the dictionary are also reflected in the view object.

Example: The following example demonstrates how the dict.items() method works.

# Given dictionary
employee = {"Sam": 1010, "Bob": 2020, "Rob": 3030}
item = employee.items()
employee['Tom'] = '4040'
print(item)

Output:

dict_items([('Sam', 1010), ('Bob', 2020), ('Rob', 3030), ('Tom', '4040')])
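As a compact variant of the loop in Solution 1, the same lookup can be sketched with next() and a generator expression over dict.items(). The helper name get_key_or_default and its default parameter are assumptions made for this illustration; they are not part of the original solution.

```python
# Given dictionary (same sample data as in Solution 1)
employee = {"Sam": 1010, "Bob": 2020, "Rob": 3030}

def get_key_or_default(mapping, target, default=None):
    """Return the first key whose value equals target, else default."""
    return next((key for key, value in mapping.items() if value == target), default)

print(get_key_or_default(employee, 2020))  # Bob
print(get_key_or_default(employee, 9999))  # None
```

Returning None (or another default) instead of a message string lets the caller tell a missing value apart from a real key.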

Solution 2: Using keys(), values() and index()

Approach: Another workaround to solve our problem is to extract the keys and values of the dictionary separately in two different lists with the help of the keys() and values() methods. Then find the index/position of the given value from the list that stores the values with the help of the index() method. Once the index is found, you can easily locate the key corresponding to this index from the list that stores all the keys.

Solution: Please follow the comments within the code to get an insight of the solution.

# Given dictionary
employee = {"Sam": 1010, "Bob": 2020, "Rob": 3030}
# store all the keys in a list
key = list(employee.keys())
# store all the values in another list
val = list(employee.values())
# find the index of the given value (2020 in this case)
loc = val.index(2020)
# Use the index to locate the key
print(key[loc])

Output:

Bob

Note:

  • keys() is a dictionary method that returns a view object containing the keys of the dictionary.
  • values() is a dictionary method that returns a view object containing the values of the dictionary.
  • The index() method is used to return the index of the specified item in a list. The method returns only the first occurrence of the matching item.

Example:

employee = {"Sam": 1010, "Bob": 2020, "Rob": 3030}
li = ['Lion', 'Dog', 'Cat', 'Mouse', 'Dog']
key = list(employee.keys())
val = list(employee.values())
loc = li.index('Dog')
print(f"Keys: {key}")
print(f"Values: {val}")
print(f"Index: {loc}")

Output:

Keys: ['Sam', 'Bob', 'Rob']
Values: [1010, 2020, 3030]
Index: 1
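One detail worth keeping in mind with Solution 2: list.index() raises a ValueError when the item is not in the list. The following is a minimal sketch of how that case could be handled; the function name key_by_index is a hypothetical helper introduced only for this example.

```python
employee = {"Sam": 1010, "Bob": 2020, "Rob": 3030}

def key_by_index(mapping, target):
    """Look up a key with the keys()/values()/index() approach."""
    keys = list(mapping.keys())
    values = list(mapping.values())
    try:
        # keys and values are in the same order, so the positions line up
        return keys[values.index(target)]
    except ValueError:
        # list.index() raises ValueError when target is not in the list
        return None

print(key_by_index(employee, 2020))  # Bob
print(key_by_index(employee, 5050))  # None
```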

Solution 3: Interchanging the Keys and Values

Approach: The given problem can be resolved using a single line of code. The idea is to build a new dictionary with the keys and values reversed, here by passing a generator expression to dict(). This means the keys in the original dictionary become the values in the newly created dictionary, while the values in the original dictionary become its keys. Once you have interchanged the keys and values, you can simply look up the key by its value.

Solution:

employee = {"Sam": 1010, "Bob": 2020, "Rob": 3030}
res = dict((val, key) for key, val in employee.items())
print("Original Dictionary: ", employee)
print("Modified Dictionary: ", res)
# using res dictionary to find out the required key from employee dictionary
print(res[2020])

Output:

Original Dictionary: {'Sam': 1010, 'Bob': 2020, 'Rob': 3030}
Modified Dictionary: {1010: 'Sam', 2020: 'Bob', 3030: 'Rob'}
Bob

Explanation:

  • employee dictionary has Name and Employee ID as Key-Value pairs.
  • res dictionary interchanges the keys and values of the employee dictionary. Therefore, res now has Employee ID and Name as Key-Value pairs.
  • Since we need the name corresponding to an Employee ID, we can simply look it up in the res dictionary using the Employee ID as the key.
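Reversing a dictionary only works cleanly when the values are unique. The sketch below shows what happens otherwise and one possible way to keep every key. The extra entry "Tim" is hypothetical sample data added to create a duplicate value.

```python
from collections import defaultdict

# "Tim" is a made-up entry that shares an Employee ID with "Bob"
employee = {"Sam": 1010, "Bob": 2020, "Rob": 3030, "Tim": 2020}

# Plain inversion with a dictionary comprehension:
# a later key overwrites an earlier key that has the same value
inverted = {value: key for key, value in employee.items()}

# Grouped inversion: each value maps to a list of all its keys
grouped = defaultdict(list)
for key, value in employee.items():
    grouped[value].append(key)

print(inverted[2020])  # Tim
print(grouped[2020])   # ['Bob', 'Tim']
```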

Solution 4: Using zip()

Provided that the values in the given dictionary are unique, you can solve the problem with a single line of code. The idea is to use the keys() and values() dictionary methods to extract the keys and values from the dictionary and then tie them together, values first, with the help of the zip() function to produce a reversed dictionary.

employee = {"Sam": 1010, "Bob": 2020, "Rob": 3030}
name = dict(zip(employee.values(), employee.keys()))[2020]
print(f'Name: {name} \nEmployee ID: 2020')

Output:

Name: Bob
Employee ID: 2020

Solution 5: Using Pandas

We can also use a Pandas DataFrame to get the key by its value. In this approach, we first convert the given dictionary into a data frame, naming the column that holds the keys “key” and the column that holds the values “value”. To get the key for a given value, we return the entry of the “key” column from the row where the “value” column equals the required value.

Example:

# Importing the pandas module
import pandas as pd
# Given dictionary
employee = {"Sam": 1010, "Bob": 2020, "Rob": 3030}
# list to store the keys from the dictionary
key = list(employee.keys())
# list to store the values from the dictionary
val = list(employee.values())
# Converting the dictionary into a dataframe
df = pd.DataFrame({'key': key, 'value': val})
print("The data frame:")
print(df)
# Given Value
v = 2020
print("The given value is", v)
# Searching for the key by the given value
k = (df.key[df.value == v].unique()[0])
print("The key associated with the given value is:", k)

Output:

The data frame:
   key  value
0  Sam   1010
1  Bob   2020
2  Rob   3030
The given value is 2020
The key associated with the given value is: Bob

Note on df.key[df.value == v].unique()[0]: the expression df.key[df.value == v] does not return a plain string but a pandas Series object, so printing it directly would also show the index of the data frame column. Hence, we convert it using the unique() method (sum() is another option) and take the first element with [0].

Conclusion

That’s all about how to get a key by the value in the dictionary. I hope you found it helpful. Please stay tuned and subscribe for more interesting tutorials. Happy Learning!


‘Pip’ Is Not Recognized As An Internal Or External Command [FIXED]

Many factors could lead to the error: ‘pip’ is not recognized as an internal or external command. Two of the most common ones are an incorrect installation of Python or pip and a missing path in the system environment variables.

This tutorial deeply explains the concept of environment variables, system paths, and pip’s way of storing packages to enable you to track the source of the error comfortably.

It then takes you through a step-by-step way to solve the error. Apart from Windows, you will see how to solve related errors in Linux. What is more? Read on to find out.

What Are Environment Variables?

Understanding environment variables is one of the most crucial steps to solving pip’s errors.

A computing environment is a platform consisting of the operating system and the processor. On the other hand, a variable is a place for storing a value. The variable can be binary, text, number, filename, or any other data type. It gets its name during creation and can be displayed, updated, and deleted.

The combination of a computing environment and variable is an environment variable, a dynamic value affecting the behavior of a computer process. A computer process is an instance of a program.

# Determine the value of a variable
echo %VARIABLE%    # in Windows
echo $VARIABLE     # in Linux

# Display environment variables
set                # in Windows: list all environment variables
env                # in Linux: print all environment variables
printenv VARIABLE  # in Linux: show a single environment variable

Features Of Environment Variables

  • They can be created, read, edited, and deleted.
  • Each process has its own set of environment variables. A newly created process inherits a copy of its parent’s runtime environment.
  • Environment variables occur in scripts and the command line.
  • Shell scripts and batch files use environment variables to communicate data and processes to child processes or temporarily store data.
  • A running process can access the environment variables for configuration reasons.
  • A collection of environment variables behaves like an associative array, with keys and values stored as strings.
  • Environment variables may differ depending on the operating system.
  • Windows stores the default environment variable values in the registry; legacy DOS and Windows 9x systems set them in the AUTOEXEC.BAT file.
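Since the rest of this tutorial is about Python, here is a small sketch of how these features look from inside a Python process, using the standard os and subprocess modules. The variable name DEMO_VARIABLE is a hypothetical name chosen only for this illustration.

```python
import os
import subprocess
import sys

# os.environ behaves like a dictionary with string keys and string values
path_value = os.environ.get("PATH", "")
print(type(path_value).__name__)  # str

# Create a variable in the current process (DEMO_VARIABLE is a made-up name)
os.environ["DEMO_VARIABLE"] = "hello"

# A child process started from here inherits a copy of the environment
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('DEMO_VARIABLE'))"],
    capture_output=True,
    text=True,
)
child_output = result.stdout.strip()
print(child_output)

# Delete the variable again; this only affects the current process
del os.environ["DEMO_VARIABLE"]
```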

Examples Of Environment Variables

Here are the typical environment variables that interact with pip.

PATH

The PATH variable lists the directories where your system searches for executables. It enables you to run a program without typing its full path.

In Windows, the default PATH typically includes C:\Windows and C:\Windows\System32. In Linux, it typically includes directories such as /usr/bin and /usr/sbin.
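To see which directories your own system searches, and whether pip is found in any of them, a short Python sketch along these lines may help. It relies only on the standard library; the output depends entirely on your machine.

```python
import os
import shutil

# Split PATH into individual directories
# (os.pathsep is ';' on Windows and ':' on Linux)
path_dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
for directory in path_dirs:
    print(directory)

# shutil.which() searches those directories for an executable
# and returns None when nothing is found
pip_location = shutil.which("pip")
print(pip_location if pip_location else "pip was not found on PATH")
```

If the last line reports that pip was not found, the shell will not find it either, which is exactly the situation behind the error discussed in this tutorial.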

HOME

It shows the default path to the user’s home directory. For instance, in Windows, app settings are stored in APPDATA under the user’s home directory. In Linux, the settings are found in HOME/{.App Name}.

In Windows, the misplaced APPDATA lands in the USERPROFILE environment variable, which should instead be used for dialogs to allow a user to choose between folders. LOCALAPPDATA stores local app settings.

TEMP

It stores the location of the directory used for temporary files.

Now that you understand how environment variables play a massive role in how packages work, you should find out specific ways to solve pip’s errors.

Solution 1: Ensure Pip Is Installed Correctly And Up-to-date

Windows

Pip packages are stored in Python’s installation directory. For instance, installing Python in C:\Python\ stores the default library in C:\Python\Lib\, while the third-party packages reside in C:\Python\Lib\site-packages.

If you install a specific Python version as a stand-alone, pip packages reside in APPDATA.

C:\Users\<username>\AppData\Roaming\Python\Python<version-subversion>\site-packages\
# the version can be 310 for Python 3.10 or 38 for Python 3.8

If you install a pip package that does not use a specific location, it lands in Scripts.

C:\Python310\Scripts\ 

Pip gets installed by default when you install most Python 3 versions. You can confirm the installation by checking pip’s version or running its help command.

pip -V
# OR
pip help

You should get pip’s version, installation folder, and the Python version running it.

pip 22.0.4 from C:\Users\<username>\AppData\Local\Programs\Python\Python310\lib\site-packages\pip (python 3.10)

Otherwise, you could get the error

'pip' is not recognized as an internal or external command

or, if you try running the python command,

python

the error

'python' is not recognized as an internal or external command, operable program or batch file.

If you run the above commands without seeing Python, pip, or the installed package, you should download Python.

Install pip as a stand-alone package if pip is still unavailable after installing Python. Download get-pip, and run the following command on the command prompt.

python get-pip.py

Lastly, you can upgrade the pip version and check if the error persists.

python -m pip install --upgrade pip

If the problem is still not solved, try adding Python to the system path variable, as explained in solution 2 of this tutorial.
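If the python command itself works, you can also check for pip from inside Python. The sketch below is one possible way to do that with the standard library: it tests whether the pip package is importable and, if so, runs the equivalent of python -m pip --version for the current interpreter.

```python
import importlib.util
import subprocess
import sys

# True if the pip package can be imported by this interpreter
pip_installed = importlib.util.find_spec("pip") is not None
print("pip importable:", pip_installed)

if pip_installed:
    # Same as typing "python -m pip --version" for this interpreter
    completed = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    print(completed.stdout.strip())
```

When pip is importable but the plain pip command still fails, the installation is fine and the problem is most likely the PATH variable.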

Linux

The /usr directory is one of the most crucial folders in Linux. It stores information like user binaries, libraries, documentation, and header files. It is where packages that pip manages get installed.

Say we want to install Python 3.10 on Ubuntu 20.04. We can do that by downloading Python from the source or using the deadsnakes custom PPA as follows.

# Update the system, ensuring the required packages are installed.
sudo apt update && sudo apt upgrade -y
# Install the required dependency needed to add the custom PPAs.
sudo apt install software-properties-common -y
# Add the deadsnakes PPA to the list of APT package manager sources.
sudo add-apt-repository ppa:deadsnakes/ppa
# Download Python 3.10
sudo apt install python3.10
# Confirm successful installation
python3.10 --version

The next step is to locate pip.

# pip
pip --version
# OR
pip -V
pip list -v
# pip3
pip3 -V
pip3 list -v

Either way, you may get the following errors.

# pip
Command 'pip' not found, but can be installed with:
sudo apt install python3-pip
# pip3
Command 'pip3' not found, but can be installed with:
sudo apt install python3-pip

You get a similar error when you try installing a package.

# pip
pip install django
Command 'pip' not found, but can be installed with:
sudo apt install python3-pip
# pip3
pip3 install django
Command 'pip3' not found, but can be installed with:
sudo apt install python3-pip

Let’s install pip.

sudo apt install python3-pip

Solution 2: Add The Path Of Pip Installation To The PATH System Variable

You can use the terminal or the GUI. To use the terminal, run the setx command in the Command Prompt.

setx PATH "%PATH%;C:\Python<version-subversion>\Scripts"
# For example, for Python 3.10
setx PATH "%PATH%;C:\Python310\Scripts"

To use the GUI,

  1. Copy the full path of the Scripts folder: C:\Users\<username>\AppData\Local\Programs\Python\Python310\Scripts
  2. Type Edit the Environment Variables on the search bar.
  3. On the pop-up window, click on the Advanced tab followed by Environment Variables.
  4. You are presented with two boxes. Highlight Path in the first box, then click the Edit button below the box.
  5. Click on New, paste the Scripts path you copied earlier, then click OK at the bottom of the screen.
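After changing the variable, open a new terminal so the updated value is picked up, then confirm that the Scripts folder really is on PATH. The following sketch shows one way to check this from Python. The function name is_on_path and the sample folder are assumptions for illustration; substitute the path of your own installation.

```python
import os

def is_on_path(directory, path_value, separator=os.pathsep):
    """Return True if directory is one of the entries in path_value."""
    wanted = os.path.normcase(os.path.normpath(directory))
    entries = [entry for entry in path_value.split(separator) if entry]
    return any(os.path.normcase(os.path.normpath(e)) == wanted for e in entries)

# Hypothetical example folder; replace it with your own Scripts folder
scripts_dir = r"C:\Python310\Scripts"
print(is_on_path(scripts_dir, os.environ.get("PATH", "")))
```

Normalizing both sides means a trailing slash or, on Windows, a difference in letter case does not cause a false negative.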

Conclusion

You have learned the leading causes of the error, “‘pip’ is not recognized as an internal or external command,” while installing packages and two typical ways to correct it.

You can check whether your installation was successful and whether pip is up to date and lies in the correct path. Otherwise, you can take the most appropriate step, as explained in this tutorial.

Please stay tuned and subscribe for more interesting discussions.


How to Sort a List of Tuples by Second Value

In this article, you’ll learn how to sort a list of tuples by the second value in Python.

To make it more fun, we have the following running scenario:

BridgeTech is a bridge restoration company. They have asked you to sort and return the Top 10 elements from the Periodic Table based on the ‘Atomic Radius’ in descending order.

The atomic radius of a chemical element is a measure of the size of its atom, usually the mean or typical distance from the center of the nucleus to the outermost isolated electron.

Wikipedia

Click here to download the Periodic Table. Save this file as periodic_table.csv and move it to the current working directory.

💬 Question: How would you write the Python code to accomplish this task?

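We can accomplish this task by one of the following options:

  • Method 1: Use Sort and a Lambda
  • Method 2: Use Sort & Itemgetter
  • Method 3: Use Sorted & Lambda
  • Method 4: Use Bubble Sort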

Preparation

Before any data manipulation can occur, one (1) new library will require installation.

  • The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required library.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
from operator import itemgetter

💡 Note: The operator library is built-in to Python and does not require installation.


Method 1: Use Sort and a Lambda

To sort a list of tuples based on the second element, use sort() and lambda in the one-liner expression tups.sort(key=lambda x: x[1], reverse=True).

Here’s an example:

df = pd.read_csv('periodic_table.csv', usecols=['Name', 'AtomicRadius'])
tups = [tuple(x) for x in df.values.tolist()]
tups.sort(key=lambda x: x[1], reverse=True)
print(tups[0:10])

The CSV file is read in preparation, and two (2) columns save to a DataFrame. The DataFrame then converts to a list of tuples (tups) using List Comprehension.

We are ready to sort!

A lambda is passed as a parameter to sort() indicating the sort element (x[1]), and the sort order is set to descending (reverse=True). The results save to tups.

To complete the process, slicing is performed, and the Top 10 elements are sent to the terminal.

Output

[('Francium', 348.0), ('Cesium', 343.0), ('Rubidium', 303.0), ('Radium', 283.0), ('Potassium', 275.0), ('Barium', 268.0), ('Actinium', 260.0), ('Strontium', 249.0), ('Curium', 245.0), ('Californium', 245.0)]
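If you do not have the CSV file at hand, the same sort can be tried on a small hand-written list. The sketch below reuses four of the (name, radius) pairs from the output above as sample data, so no file or Pandas is needed.

```python
# Sample data taken from the output above: (element name, atomic radius)
tups = [('Radium', 283.0), ('Francium', 348.0), ('Cesium', 343.0), ('Rubidium', 303.0)]

# Sort in place by the second item of each tuple, largest first
tups.sort(key=lambda x: x[1], reverse=True)

# Slice off the first two entries
top_two = tups[0:2]
print(top_two)  # [('Francium', 348.0), ('Cesium', 343.0)]
```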

Method 2: Use Sort & Itemgetter

To sort a list of tuples by the second element, use the sort() and itemgetter() functions in the expression tups.sort(key=itemgetter(1), reverse=True).

Here’s an example:

df = pd.read_csv('periodic_table.csv', usecols=['Name', 'AtomicRadius'])
tups = [tuple(x) for x in df.values.tolist()]
tups.sort(key=itemgetter(1), reverse=True)
print(tups[0:10])

The CSV file is read in preparation, and two (2) columns save to a DataFrame. The DataFrame then converts to a List of Tuples (tups) using List Comprehension.

We are ready to sort!

The sort() function passes a key (itemgetter(n)) where n is the sort element (itemgetter(1)), and the sort order is set to descending (reverse=True).

The results save to tups.

To complete the process, slicing is performed, and the Top 10 elements are sent to the terminal.

💡 Note: The itemgetter() function is slightly faster than a lambda. Use itemgetter if speed and memory are a factor.
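To make the role of itemgetter() more concrete, here is a small sketch on hand-written sample data (three pairs borrowed from the Method 1 output). It shows that itemgetter(1) picks the same item as the lambda, and how ties are handled: Python's sort is stable, so tuples with equal radius keep their original relative order.

```python
from operator import itemgetter

# Sample data; Curium and Californium share the same radius
tups = [('Curium', 245.0), ('Barium', 268.0), ('Californium', 245.0)]

# itemgetter(1) builds a function that returns item 1 of its argument
get_radius = itemgetter(1)
print(get_radius(tups[0]))  # 245.0

# Same result as key=lambda x: x[1]
by_radius = sorted(tups, key=get_radius, reverse=True)
print(by_radius)
```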


Method 3: Use Sorted & Lambda

To sort a list of tuples by the second element, combine the functions sorted() and lambda in the expression sorted(tups, key=lambda x:(x[1]), reverse=True) and assign the resulting sorted list to the original variable tups.

Here’s an example:

df = pd.read_csv('periodic_table.csv', usecols=['Name', 'AtomicRadius'])
tups = [tuple(x) for x in df.values.tolist()]
tups = sorted(tups, key=lambda x:(x[1]), reverse=True)
print(tups[0:10])

The CSV file is read in preparation, and two (2) columns save to a DataFrame. The DataFrame then converts to a List of Tuples (tups) using List Comprehension.

We are ready to sort!

A lambda is passed as a parameter to sorted(), indicating the sort element (x[1]), and the sort order is set to descending (reverse=True). The results save to tups.

To complete the process, slicing is performed, and the Top 10 elements are sent to the terminal.
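The practical difference between sorted() in this method and sort() in Methods 1 and 2 is where the result ends up. A minimal sketch with hand-written sample data:

```python
# Sample data: (element name, atomic radius)
tups = [('Barium', 268.0), ('Francium', 348.0), ('Radium', 283.0)]
original = list(tups)

# sorted() returns a new list and leaves tups untouched
ranked = sorted(tups, key=lambda x: x[1], reverse=True)
unchanged = (tups == original)
print(unchanged)  # True

# list.sort() sorts in place and returns None
result = tups.sort(key=lambda x: x[1], reverse=True)
print(result)     # None
```

Use sorted() when the original order is still needed afterwards, and sort() when it is not.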


Method 4: Use Bubble Sort

To sort a List of Tuples by the second element, you can also modify a sorting algorithm from scratch such as Bubble Sort to access the second (or n-th) tuple value as a basis for sorting.

Here’s an example:

df = pd.read_csv('periodic_table.csv', usecols=['Name', 'AtomicRadius'])
tups = [tuple(x) for x in df.values.tolist()]

def sort_tuples_desc(tups, idx):
    length = len(tups)
    for i in range(0, length):
        for j in range(0, length-i-1):
            if (tups[j][idx] < tups[j + 1][idx]):
                tmp = tups[j]
                tups[j] = tups[j+1]
                tups[j+1] = tmp
    return tups

print(sort_tuples_desc(tups, 1)[0:10])

The CSV file is read in preparation, and two (2) columns save to a DataFrame. The DataFrame then converts to a List of Tuples (tups) using List Comprehension.

We are ready to sort!

A sort function sort_tuples_desc is created and passed two (2) parameters: a List of Tuples (tups), and the sort element (idx). Then, the infamous Bubble Sort is performed on the elements.

This function returns a List of Tuples sorted in descending order.

To complete the process, slicing is performed, and the Top 10 elements are sent to the terminal.
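A hand-rolled sort is easy to get subtly wrong, so it is worth checking it against the built-in sorted(). The sketch below is a self-contained variant of the function above; working on a copy, using tuple swapping, and stopping early when a pass makes no swaps are choices made for this example rather than part of the original code.

```python
def sort_tuples_desc(tups, idx):
    """Bubble sort a copy of tups in descending order of element idx."""
    items = list(tups)  # copy, so the caller's list stays untouched
    length = len(items)
    for i in range(length):
        swapped = False
        for j in range(length - i - 1):
            if items[j][idx] < items[j + 1][idx]:
                # Swap neighbours that are in the wrong order
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:
            break  # no swaps in this pass: the list is already sorted
    return items

# Sample data: (element name, atomic radius)
sample = [('Radium', 283.0), ('Francium', 348.0), ('Barium', 268.0), ('Cesium', 343.0)]
print(sort_tuples_desc(sample, 1))
```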


Summary

These four (4) methods of sorting a List of Tuples based on the second element should give you enough information to select the best one for your coding requirements.

Good Luck & Happy Coding!



Writing a List to a File in Python

Problem Statement: How to write a list to a file with Python?

Python programmers often use persistent storage such as databases or files to store serialized data structures like arrays, lists, and dictionaries. Databases and files are reusable: after analyzing the given data, we can store it in a file and read it back later in an application. There are many ways to write a list to a file in Python. Let’s look at some of them:

Method 1- Using Read And Write

Python facilitates us with standard methods used to read the data from the file and to write the data to a file. While dealing with single lines, we can use the read() and write() methods, respectively. Suppose we have the following list of strings and we have to store each string in a file using Python:

colors = ["red", "black", "white", "yellow", "blue"]

To write the list to a file, we have to follow the steps given below:

  • Firstly, open the file in write mode by passing the file path and access mode “w” to the open() function.
  • Next, we have to use the “for” loop to iterate the list. In each iteration, we will get a list item that we need to write in the file using the write() method.
  •  After iterating through the whole list, we need to ensure that we have closed the file. For that, we use the close() method.

Let’s visualize the above demonstration with the help of the following snippet:

# List of colours
colors = ["red", "black", "white", "yellow", "blue"]
# Opening the file in write mode
file = open('colors.txt', 'w')
# Writing the list to the file
for color in colors:
    file.write(color + '\n')
# Closing the file
file.close()

Output:

red
black
white
yellow
blue

Note: The ‘\n‘ character is used for a new line at the end of each item in the list.

Let’s have a look at a situation that demonstrates how we can read the list from the file:

Example:

# Empty list that will read from the file
colors = []
# Opening the file in read mode
with open(r'colors.txt', 'r') as file:
    for color in file:
        x = color[:-1]
        colors.append(x)
print(colors)

Output:

['red', 'black', 'white', 'yellow', 'blue']
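The write and read steps of Method 1 can also be combined into one round trip. The sketch below uses the with statement for both steps, so the file is closed automatically, and writes into a temporary directory (an assumption made here so the example leaves no files behind).

```python
import os
import tempfile

colors = ["red", "black", "white", "yellow", "blue"]

# Work in a temporary directory that is removed afterwards
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "colors.txt")

    # The with statement closes the file automatically, even on errors
    with open(path, "w") as file:
        for color in colors:
            file.write(color + "\n")

    # rstrip("\n") removes only the trailing newline of each line
    with open(path, "r") as file:
        loaded = [line.rstrip("\n") for line in file]

print(loaded)  # ['red', 'black', 'white', 'yellow', 'blue']
```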

Recommended Read: How to Read a File Line-By-Line and Store Into a List?

Method 2- Using Writelines() Method

While dealing with multiple lines, we can use the readlines() and writelines() file methods in Python. Hence, we can write the entire list into a file using the writelines() method.

Example:

# List of colours
colors = ["red", "black", "white", "yellow", "blue"]
# Opening the file in write mode
with open('colors.txt', 'w') as file:
    # Writing the entire list to the file, one item per line
    file.writelines("%s\n" % color for color in colors)

Output:

red
black
white
yellow
blue

⦿ The following example shows how to use readlines() to read the entire list from a file in Python:

Example:

# Empty list that will read from the file
colors = []
# Opening the file in read mode
with open(r'colors.txt', 'r') as file:
    colors = [color.rstrip() for color in file.readlines()]
print(colors)

Output:

['red', 'black', 'white', 'yellow', 'blue']
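A common surprise with writelines() is that, despite its name, it does not add line endings; it writes the strings exactly as given. The following sketch demonstrates this in a temporary directory (used here only to keep the example free of leftover files).

```python
import os
import tempfile

colors = ["red", "black", "white", "yellow", "blue"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "colors.txt")

    # Without explicit newlines, all items end up on a single line
    with open(path, "w") as file:
        file.writelines(colors)
    with open(path) as file:
        joined = file.read()

    # Appending "\n" to each item gives one item per line
    with open(path, "w") as file:
        file.writelines(color + "\n" for color in colors)
    with open(path) as file:
        lines = file.read().splitlines()

print(joined)  # redblackwhiteyellowblue
print(lines)   # ['red', 'black', 'white', 'yellow', 'blue']
```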

Method 3- Using The Pickle Module

Pickle is a Python module used to serialize and de-serialize object structures. We can use it to serialize a list to a file for later use. The module’s dump() method writes the list to a file; it takes the list and the file reference as its parameters. Because the method stores the list as a binary data stream, the file must be opened in binary write mode (wb). Using the module, we can convert objects such as lists or dictionaries into a byte stream that carries the information needed to reconstruct the object later.

Approach: To write a list to a file, first import the pickle module at the start of the program. Then open the file in binary write mode (wb). If the file already exists, open() truncates it; otherwise, it creates a new one. Finally, the dump() method serializes the object and writes it to the file.

Example:

# Importing the pickle module
import pickle
# Writing the list to the binary file
def writel(a):
    # Storing the list in binary file (wb) mode
    with open('file', 'wb') as fp:
        pickle.dump(a, fp)
        print('Completed the process of writing the list into a binary file')
# Reading the list to memory
def readl():
    # Reading the same file in binary file (rb) mode
    with open('file', 'rb') as fp:
        n = pickle.load(fp)
        return n
# List of colors
colors = ["red", "black", "white", "yellow", "blue"]
# Calling the writel method
writel(colors)
color = readl()
# Printing the list
print(color)

Output:

Completed the process of writing the list into a binary file
['red', 'black', 'white', 'yellow', 'blue']
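To see the binary stream that pickle produces without touching the file system, the module also offers dumps() and loads(), which work on bytes in memory. A minimal sketch:

```python
import pickle

colors = ["red", "black", "white", "yellow", "blue"]

# dumps() serializes to a bytes object instead of a file
data = pickle.dumps(colors)
print(type(data).__name__)  # bytes

# loads() rebuilds an equal, but separate, list from those bytes
restored = pickle.loads(data)
print(restored == colors)   # True
print(restored is colors)   # False
```

Keep in mind that unpickling can execute arbitrary code, so only load pickle data that comes from a source you trust.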

Method 4- Using The Json Module

We can use the json module to convert the list into JSON format and then write it into a file using the json.dump() method. For example, many web APIs answer a GET request with a response in JSON format. We can then store the JSON response in a file for any future use.

# Importing the json module
import json
# Writing the list to a JSON file
def writel(a):
    with open("colors.json", "w") as fp:
        json.dump(a, fp)
        print('Completed the process of writing json data into json file')
# Reading the list to memory
def readl():
    with open('colors.json', 'r') as fp:
        n = json.load(fp)
        return n
# List of colors
colors = ["red", "black", "white", "yellow", "blue"]
writel(colors)
color = readl()
# Printing the list
print(color)

Output:

Completed the process of writing json data into json file
['red', 'black', 'white', 'yellow', 'blue']
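Unlike pickle, JSON is a plain-text format, which is easy to see with the in-memory functions json.dumps() and json.loads(). The sketch below also shows one limitation to be aware of: JSON has no tuple type, so a tuple comes back as a list. The pair ("x", 1) is made-up sample data.

```python
import json

colors = ["red", "black", "white", "yellow", "blue"]

# dumps() returns the JSON text that dump() would write to the file
text = json.dumps(colors)
print(text)  # ["red", "black", "white", "yellow", "blue"]

# loads() parses the text back into a Python list
restored = json.loads(text)
print(restored == colors)  # True

# A tuple is stored as a JSON array and is read back as a list
pair = json.loads(json.dumps(("x", 1)))
print(pair)  # ['x', 1]
```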

Conclusion

That’s all about how to write a list to a file with Python. I hope you found it helpful. Please stay tuned and subscribe for more interesting articles. Happy learning!

Recommended: Correct Way to Write line To File in Python

Authors: Rashi Agarwal and Shubham Sayon


How to Count the Occurrences of a List Element

In this article, you’ll learn how to count the occurrences of a selected List element in Python.

To make it more fun, we have the following running scenario:

A Teacher from Orchard Elementary would like a script created for the 4th-grade students called “Count-Me“. She would like this script to do the following:

  • First, generate and display 10 random numbers on a single line.
  • Next, generate and display one (1) random number to find.
  • Prompt for the total occurrences found.
  • Display a message validating the solution.

💬 Question: How would we write the Python code to accomplish this task?

We can accomplish this task by one of the following options:

  • Method 1: Use NumPy and count()
  • Method 2: Use operator countOf()
  • Method 3: Use a For Loop
  • Method 4: Use a Counter()

Preparation

Before any data manipulation can occur, one (1) new library will require installation.

  • The NumPy library supports multi-dimensional arrays and matrices in addition to a collection of mathematical functions.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install numpy

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required library.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import numpy as np
import random
import operator
from collections import Counter

💡 Note: The random, operator, and collections libraries are built-in to Python and do not require installation.


Method 1: Use NumPy and count()

To count the total occurrences of an element inside a List, this example will use NumPy and the count() function.

# tolist() converts the NumPy array to a list of plain Python integers
the_list = np.random.choice(20, 20).tolist()
dup_num = the_list[random.randint(0, 19)]
dup_count = the_list.count(dup_num)

try:
    print(the_list)
    check = int(input(f'How many times does the number {dup_num} appear in the list? '))
    if check == dup_count:
        print(f'Correct! The answer is {check}.')
    else:
        print(f'Sorry! Try again!')
except ValueError:
    print(f'Incorrect value. Bye')

The previous code snippet performs the following steps:

  • Our first line generates and saves 20 random numbers to the_list.
  • Next, dup_num is created by generating and saving one (1) random number from the_list.
  • Finally, we determine how many occurrences of dup_num were found using count().
  • The result saves to dup_count.

Inside the try statement, the_list is output to the terminal.

The user is prompted to enter the total number of occurrences. To confirm, the user presses the <Enter> key. The value entered is then compared to dup_count, and a message indicates the outcome.

💡 Note: Click here for details on the try/except statement.
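For experimenting with count() without typing an answer each time, the counting part can be separated from the prompt. The sketch below is a non-interactive variant that uses only the built-in random module instead of NumPy; the fixed seed is an arbitrary choice that makes the run repeatable.

```python
import random

random.seed(42)  # arbitrary seed so that every run produces the same list

# 20 random numbers between 0 and 19
the_list = [random.randrange(0, 20) for _ in range(20)]

# Pick one of the generated numbers, so it occurs at least once
dup_num = random.choice(the_list)

# list.count() returns how many items are equal to dup_num
dup_count = the_list.count(dup_num)

print(the_list)
print(f'The number {dup_num} appears {dup_count} time(s).')
```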


Method 2: Use operator countOf()

To count the total occurrences of a specified element inside a List, this example will use the countOf() function.

the_list = [random.randrange(0, 20) for num in range(20)]
dup_num = the_list[random.randint(0, 19)]
dup_count = operator.countOf(the_list, dup_num)

try:
    print(the_list)
    check = int(input(f'How many times does the number {dup_num} appear in the list? '))
    if check == dup_count:
        print(f'Correct! The answer is {check}.')
    else:
        print(f'Sorry! Try again!')
except ValueError:
    print(f'Incorrect value. Bye')

This code snippet performs the following steps:

  • Our first line generates and saves 20 random numbers to the_list.
  • Next, dup_num is created by generating and saving one (1) random number from the_list.
  • Finally, we determine how many occurrences of dup_num were found using operator.countOf().
  • The result saves to dup_count.

Inside the try statement, the_list is output to the terminal.

The user is prompted to enter the total number of occurrences. To confirm, the user presses the <Enter> key.

The value entered is then compared to dup_count, and a message indicates the outcome.
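One point that sets operator.countOf() apart from the list method count() is that it accepts any iterable, not only lists. A short sketch with hand-written sample values:

```python
import operator

# Works on a list, like the_list.count(3)
in_list = operator.countOf([3, 7, 3, 1, 3], 3)
print(in_list)    # 3

# Also works on other iterables, such as strings and tuples
in_string = operator.countOf("banana", "a")
print(in_string)  # 3

# Returns 0 when the item does not occur
in_tuple = operator.countOf((1, 2), 5)
print(in_tuple)   # 0
```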


Method 3: Use a For Loop

To count the total occurrences of a specified element inside a List, this example will use the For Loop.

the_list = [random.randrange(0, 20) for num in range(20)]
dup_num = the_list[random.randint(0, 19)]
dup_count = 0
for i in the_list:
    if i == dup_num:
        dup_count += 1

try:
    print(the_list)
    check = int(input(f'How many times does the number {dup_num} appear in the list? '))
    if check == dup_count:
        print(f'Correct! The answer is {check}.')
    else:
        print(f'Sorry! Try again!')
except ValueError:
    print(f'Incorrect value. Bye')

The previous code snippet performs the following steps:

  • Our first line generates and saves 20 random numbers to the_list.
  • Next, dup_num is created by generating and saving one (1) random number from the_list.
  • Finally, a For Loop is instantiated. Upon each Loop, the element is matched against dup_num.
  • If found, dup_count is increased by one (1).

Inside the try statement, the_list is output to the terminal.

The user is prompted to enter the total number of occurrences. To confirm, the user presses the <Enter> key.

The value entered is then compared to dup_count, and a message indicates the outcome.


Method 4: Counter()

To count the total occurrences of a specified element inside a List, this example will use the Counter() initializer method.

import random
from collections import Counter

the_list = [random.randrange(0, 20) for num in range(20)]
dup_num = the_list[random.randint(0, 19)]
d = Counter(the_list)
dup_count = d[dup_num]

try:
    print(the_list)
    check = int(input(f'How many times does the number {dup_num} appear in the list? '))
    if check == dup_count:
        print(f'Correct! The answer is {check}.')
    else:
        print('Sorry! Try again!')
except ValueError:
    print('Incorrect value. Bye')

The previous code snippet performs the following steps:

  • Our first line generates and saves 20 random numbers to the_list.
  • Next, dup_num is created by generating and saving one (1) random number from the_list.
  • Then, Counter(the_list) builds a dictionary-like object that maps each element to the number of times it occurs. The result saves to d.
  • Finally, dup_count is set to d[dup_num], the count stored for dup_num.

Inside the try statement, the_list is output to the terminal.

The user is prompted to enter the total number of occurrences. To confirm, the user presses the <Enter> key.

The value entered is then compared to dup_count, and a message indicates the outcome.
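As a small illustration of how a Counter behaves, the sketch below uses a fixed list (the sample data is made up for this purpose). It shows a lookup, a lookup for a missing element, and the most_common() helper.

```python
from collections import Counter

sample = ['a', 'b', 'a', 'c', 'a', 'b']  # hypothetical sample data
counts = Counter(sample)

print(counts['a'])            # 3
print(counts['z'])            # 0 (missing keys count as zero, no KeyError)
print(counts.most_common(1))  # [('a', 3)]
```

Because the Counter tallies every element in one pass, it is a reasonable choice when you need the counts of several different elements rather than just one.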


Summary

These four (4) methods of counting occurrences of a specified element inside a List should give you enough information to select the best one for your coding requirements.

Good Luck & Happy Coding!



How to Create Word Clouds Using Python?

You may have already learned how to analyze quantitative data using graphs such as bar charts and histograms.

But do you know how to study textual data?

One way to analyze textual information is by using a word cloud:

Figure 0: Word cloud you’ll learn how to create in this article.

There are many ways to create word clouds, but we will use the WordCloud library in this blog post. WordCloud is a Python library that makes word clouds from text files.

What Are Word Clouds?

💬 Definition: A word cloud (also known as a tag cloud) is a visual representation of the words that appear most frequently in a given text. They can be used to summarize large bodies of text or to visualize the sentiment of a document.

A word cloud is a graphical representation of text data in which the size of each word is proportional to the number of times it appears in the text.
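To make "proportional to the number of times it appears" concrete, the following sketch counts word frequencies with the standard library and scales each count relative to the most frequent word. It only approximates the idea and is not how the WordCloud library computes sizes internally; the sample text is made up.

```python
from collections import Counter
import re

text = "the park was great and the rides in the park were great fun"

# Split the text into lowercase words
words = re.findall(r"[a-z']+", text.lower())
freq = Counter(words)

# Relative weight: 1.0 for the most frequent word
top = max(freq.values())
weights = {word: count / top for word, count in freq.items()}

# Show the three heaviest words
for word, weight in sorted(weights.items(), key=lambda kv: -kv[1])[:3]:
    print(word, round(weight, 2))
```

Note that the most frequent word here is "the", which says nothing about the text. This is why common filler words are usually excluded before drawing a word cloud.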

They can be used to visualize the most critical words in a document quickly or to get an overview of the sentiment of a piece of text.

There are word cloud apps such as Wordle, but in this blog post, we will show how to create word clouds using the Python library WordCloud.

What’s the WordCloud Library in Python?

The WordCloud library is open source and easy to use to create word clouds in Python.

It lets you export word clouds as image files (for example PNG, via to_file()) or as SVG (via to_svg()).

In addition, it provides several options for customizing your word clouds, including the ability to control the font, color, and layout.

You can install it using the following command in your terminal (without the $ symbol):

$ pip install wordcloud


Where Are Word Clouds Used?

Word clouds are a fun and easy way to visualize data.

By displaying the most common words in a given text, they can provide insights into the overall themes and tone of the text.

  • Word clouds can be used for various purposes, from educational to marketing.
  • Teachers can use word clouds for vocabulary building and text analysis in the classroom.
  • You can also use word clouds to generate leads or track customer sentiment.
  • For businesses, word clouds can be used to create marketing materials, such as blog posts, infographics, and social media content.
  • Word clouds can also monitor customer feedback or identify negative sentiment.
  • Students can also use word clouds to analyze a piece of text. By visually highlighting the most important words, word clouds can help students identify the main ideas and make connections between different concepts.

Pros of Word Clouds

The advantages of using word clouds are:

First, you can use them to summarize a large body of text quickly and easily. Identifying the most frequently used words in a text can provide a quick overview of the main points.

Second, with word clouds, you can quickly visualize the sentiment in a document. The size and placement of words in the word cloud can give you insights into the overall tone of the document. This is handy when analyzing a large body of text, such as customer feedback or reviews.

Third, word clouds can be a valuable tool for identifying the most critical keywords in a text. By analyzing the distribution of words, you can quickly identify which terms are most prominent. This can be beneficial when monitoring changing trends or assessing the relative importance of topics.

Fourth, word clouds can be used to create designs that incorporate both visual and textual elements. By blending words and images, word clouds can add another layer of meaning to an already exciting design.

How to Create Word Clouds in Python?

We will be using Disneyland reviews downloaded from Kaggle to create a word cloud data visualization. 

You can download the file from here.

In this file, we will be focussing on the Review_Text column for creating a word cloud. You can ignore other columns.

First, you have to install the WordCloud Python library. You can do this by running the following command in a terminal:

pip install wordcloud

Once you have installed WordCloud, you must import pandas, matplotlib.pyplot, and wordcloud libraries.

import pandas as pd
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

The pandas library reads the Disneyland reviews CSV file into a data frame.

We will show you the use of STOPWORDS in the upcoming section.

The data frame variable “df” stores the data from the disneylandreviews.csv file with the following command.

df = pd.read_csv("/Users/mohamedthoufeeq/Downloads/DisneylandReviews.csv")

Now run the program and see the output.

You get the following Unicode decode error.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf4 in position 121844: invalid continuation byte

The Unicode decode error means that the file's bytes could not be decoded as UTF-8. This can happen when a file downloaded from Kaggle is stored in a different encoding.

To solve this problem, you need to specify the encoding when reading the file. Replace the read_csv() line in your script with the following:

df = pd.read_csv("/Users/mohamedthoufeeq/Downloads/DisneylandReviews.csv",encoding='ISO-8859-1')

The encoding = 'ISO-8859-1' tells pandas that the file is in the ISO-8859-1 encoding format.
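The same effect can be reproduced without pandas. The sketch below decodes a short, made-up byte string that contains the byte 0xf4 from the error message: UTF-8 rejects it, while ISO-8859-1 maps it to a character. The byte string is a hypothetical stand-in, not taken from the reviews file.

```python
raw = b"Caf\xf4"  # hypothetical bytes; a lone 0xf4 is not valid UTF-8

try:
    raw.decode("utf-8")
    decoded_utf8 = True
except UnicodeDecodeError as err:
    decoded_utf8 = False
    print("UTF-8 failed:", err.reason)

# ISO-8859-1 maps every possible byte to a character, so decoding never fails
text = raw.decode("ISO-8859-1")
print(text)
```

Since ISO-8859-1 never raises an error, it will also "succeed" on files that are really in another encoding, in which case some characters may come out wrong. It is worth checking a few non-ASCII reviews after loading.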

Next, create a word cloud using the WordCloud Python library.

wordcloud = WordCloud().generate(df['Review_Text'])

In the above code, WordCloud().generate() is used to create a word cloud object.

Here we pass the Review_Text column of the data frame, which contains the reviews about Disneyland whose words should appear in the word cloud.

Go ahead and run the code.

You again get an error:

TypeError: expected string or bytes-like object

The type error means that generate() expects a string or a bytes-like object, but the Review_Text column is a pandas Series.

To solve this, join the reviews into one string:

wordcloud = WordCloud().generate(' '.join(df['Review_Text']))

The above command joins all reviews in the Series into a single string, separated by spaces.

plt.imshow(wordcloud)

The plt.imshow() call renders the word cloud as a 2D image.

Then remove the axis with the following command:

plt.axis("off")

The "off" parameter removes the axis from the plot.

Finally, the command below displays the image of the word cloud.

plt.show()

Once you run the program, you will see a word cloud image as shown below:

Figure 1. 

The word "Park" is bigger, representing that this word appears more in reviews.

But there are words such as "Disneyland", "went", "will", "park", "go", "day", and "One" that are not useful for the analysis.

So we can exclude them from the word cloud with the following command using the stopwords parameter.

STOPWORDS.update(['Disneyland', 'went', 'will', 'go', 'park', 'day', 'one'])
wordcloud = WordCloud(stopwords = STOPWORDS).generate(' '.join(df['Review_Text']))

The words in STOPWORDS are removed from the text before the word cloud is created. The set is passed to WordCloud() through the stopwords parameter.
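Conceptually, stop-word removal is just a filter applied before counting. The sketch below shows that idea with the standard library only. It is a simplified illustration, not the WordCloud library's internal code, and both the sentence and the stop-word set are made up.

```python
from collections import Counter

text = "we went to the park and the park was fun"
stopwords = {"we", "went", "to", "the", "and", "was", "park"}  # hypothetical set

# Keep only the words that are not in the stop-word set
words = text.lower().split()
kept = [w for w in words if w not in stopwords]

print(Counter(kept))  # Counter({'fun': 1})
```

Only the words that survive the filter are counted, so they are the only ones that can appear in the cloud.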

Now re-run the program, and you will get the following word cloud image.

Figure 2. 

Before we analyze the words, let us see how to customize the appearance of the word cloud by changing the font size and background color.

The maximum font size can be set with the max_font_size option, and the minimum font size can be set with the min_font_size option. The background color of the word cloud can be set with the background_color option.

wordcloud = WordCloud(min_font_size = 10, max_font_size = 70, stopwords = STOPWORDS, background_color="white").generate(' '.join(df['Review_Text']))

The code sets the font size to a minimum of 10 points and a maximum of 70 points, and the background color to white.

Re-run the program, and you will get the following word cloud image.

Figure 3. 

Also, you can set the maximum amount of words to be generated using the max_words parameter.

wordcloud = WordCloud(min_font_size = 5, max_font_size = 100, max_words = 1000, stopwords = STOPWORDS, background_color="white").generate(' '.join(df['Review_Text']))

The above code sets the maximum number of words generated in the word cloud to 1000. It also changes the minimum and maximum font sizes to 5 and 100.

Re-run the program, and you will get the following word cloud.

Figure 4. 

As you can see, when you increase the number of words to 1000, the words that are repeated more in the reviews are shown in a larger size.

This makes it easier to find out which words are prominent. In this word cloud, you can see that "ride" is the largest word.

You can also set the width and height of the word cloud image.

wordcloud = WordCloud(width=350, height=350, min_font_size=5, max_font_size=100, max_words=1000, stopwords=STOPWORDS, background_color="white").generate(' '.join(df['Review_Text']))

The above code sets both the width and the height of the word cloud to 350 pixels.

Re-run the program, and you will get the following word cloud image.

Figure 5. 

Now let’s analyze the word cloud to get some insights.

The word "ride" appears large in the word cloud as it is the most frequent word in the text. Many reviewers write about the rides in Disneyland, which is reflected in the word cloud.

Next, the word "attraction" is also popular. It shows that reviewers often mention the rides and attractions in Disneyland.

Also, the word "time" appears frequently. This may indicate that people spend a lot of time in Disneyland.

The word "nice" also appears frequently, which suggests that many visitors had a pleasant experience, for example with the staff. The reviews also mention long queues and waiting times, which is reflected in the word cloud as well.

The words "lines" and "queue" are also among the more prominent words in the text.

The word "hotel", however, is not prominent in the text. One possible reading is that many visitors do not stay in a hotel and go back home after spending the whole day in Disneyland.

💬 Exercise: You can get more insights by analyzing the word cloud data. Try it out!

Summary

Word clouds are a great way to summarize and understand large bodies of text or to visualize a document’s sentiment, and they can be used for various purposes.

This blog post showed how to create word clouds using the Python library WordCloud.

We also discussed how to customize the appearance of the word cloud and analyzed the word cloud data to get insights into the text.

What do you use?



How to Color a Scatter Plot by Category using Matplotlib in Python

Problem Formulation

Given three arrays:

  • The first two arrays x and y of length n contain the (x_i, y_i) data of a 2D coordinate system.
  • The third array c provides categorical label information so we essentially get n data bundles (x_i, y_i, c_i) for an arbitrary number of categories c_i.

💬 Question: How to plot the data so that (x_i, y_i) and (x_j, y_j) with the same category c_i == c_j have the same color?

Solution: Use Pandas groupby() and Call plt.plot() Separately for Each Group

To plot data by category, you iterate over all groups separately by using the data.groupby() operation. For each group, you execute the plt.plot() operation to plot only the data in the group.

In particular, you perform the following steps:

  1. Use the data.groupby("Category") function assuming that data is a Pandas DataFrame containing the x, y, and category columns for n data points (rows).
  2. Iterate over all (name, group) tuples in the grouping operation result obtained from step one.
  3. Use plt.plot(group["X"], group["Y"], marker="o", linestyle="", label=name) to plot each group separately using the x, y data and name as a label.

Here’s what that looks like in code:

import pandas as pd
import matplotlib.pyplot as plt

# Generate the categorical data
x = [1, 2, 3, 4, 5, 6]
y = [42, 41, 40, 39, 38, 37]
c = ['a', 'b', 'a', 'b', 'b', 'a']

data = pd.DataFrame({"X": x, "Y": y, "Category": c})
print(data)

# Plot data by category
groups = data.groupby("Category")
for name, group in groups:
    plt.plot(group["X"], group["Y"], marker="o", linestyle="", label=name)

plt.legend()
plt.show()

Before I show you how the resulting plot looks, allow me to show you the data output from the print() function. Here’s the output of the categorical data:

   X   Y Category
0  1  42        a
1  2  41        b
2  3  40        a
3  4  39        b
4  5  38        b
5  6  37        a
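If it helps to see what the grouping step does to this data, here is a rough standard-library sketch that collects the x and y values per category. It stands in for data.groupby("Category") for illustration only; pandas does not work this way internally.

```python
from collections import defaultdict

x = [1, 2, 3, 4, 5, 6]
y = [42, 41, 40, 39, 38, 37]
c = ['a', 'b', 'a', 'b', 'b', 'a']

# Map each category to a pair of lists: (x values, y values)
groups = defaultdict(lambda: ([], []))
for xi, yi, ci in zip(x, y, c):
    groups[ci][0].append(xi)
    groups[ci][1].append(yi)

for name, (xs, ys) in sorted(groups.items()):
    print(name, xs, ys)
```

Each (xs, ys) pair corresponds to one plt.plot() call in the solution, which is why every category ends up with its own color.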

Now, what does the colored category plot look like? Here’s how:


How to Use Pandas Rolling – A Simple Illustrated Guide

This article will demonstrate how to use a pandas dataframe method called rolling().

What does the pandas.DataFrame.rolling() method do?

In short, it performs rolling windows calculations.

It is often used when working with time-series data or signal processing. I will shortly dive into a few practical examples to clarify what this means in practice.

The method takes a parameter that specifies the size of the window over which the desired calculations are performed.

A simple example of using time series data could be that each row of a pandas dataframe represents a day with some values.

Let’s say that the desired window size is five days. The rolling method is given a five as input, and it will perform the expected calculation based on steps of five days. 
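As a rough picture of what a five-row window means, here is a plain-Python sketch of a rolling mean. It is written for illustration and is not pandas code: the function name rolling_mean and the sample prices are made up, and None stands in for the NaN that pandas shows while the window is not yet full.

```python
def rolling_mean(values, window):
    """Mean of the last `window` values; None until the window is full."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)  # pandas would show NaN here
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result

prices = [10, 20, 30, 40, 50, 60, 70]  # hypothetical daily values
means = rolling_mean(prices, 5)
print(means)  # [None, None, None, None, 30.0, 40.0, 50.0]
```

The window slides forward one row at a time, so each result uses the current row and the four rows before it.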

Before an example of this, let’s see the method, its syntax, and its parameters.  

pandas.DataFrame.rolling()

DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, method='single')

Let’s dive into the parameters one by one:

window

window: int, offset, or BaseIndexer subclass

This is the size of the moving window.

If an integer, the fixed number of observations is used for each window.

If an offset, the time period of each window. Each window will be variable-sized based on the observations included in the time period. This is only valid for datetime-like indexes.

If a BaseIndexer subclass, the window boundaries are based on the defined get_window_bounds() method. Additional rolling keywords argument, namely min_periods, center, and closed will be passed to get_window_bounds().

min_periods

min_periods: int, default None

This is the minimum number of observations in the window required to have a value.

Otherwise, the result is assigned np.nan.

  • For a window that is specified by an offset, min_periods will default to 1.
  • For a window specified by an integer, min_periods will default to the size of the window. 

center

center: bool, default False

If False, set the window labels as the right edge of the window index. If True, set the window labels as the center of the window index. 

win_type

win_type: str, default None

If None, all points are evenly weighted.

If a string, it must be a valid window function from scipy.signal

Some of the Scipy window types require additional parameters to be passed in the aggregation function.

The additional parameters must match the keywords specified in the Scipy window type method signature. 

on

on: str, optional

For a DataFrame, a column label or index level on which to calculate the rolling window, rather than the DataFrame's index.

The provided integer column is ignored and excluded from the result since an integer index is not used to calculate the rolling window. 

axis

axis: int or str, default 0

If 0 or 'index', roll across the rows. If 1 or 'columns', roll across the columns. 

closed

closed: str, default None

  • If 'right', the first point in the window is excluded from calculations.
  • If 'left', the last point in the window is excluded from calculations.
  • If 'both', then no points in the window are excluded from the calculations.
  • If 'neither', the first and last points in the window are excluded from the calculations.

Default None means 'right'.

method

method: str {'single', 'table'}, default 'single'

Execute the rolling operation per single column or row for 'single' or over the entire object for 'table'.

This argument is only implemented when specifying engine='numba' in the method call. 


This part was obtained from the official pandas documentation

Data

The data I will be working with for this tutorial is historical data for a stock, the Amazon stock.

I use the Python package yfinance to import the data. I will use data starting from 2021-04-01 and running one year forward in time.

The data only includes trading days, i.e., days when the stock market was open.

# Get the stock data from Yahoo finance
AmazonData1y = yfinance.Ticker("AMZN").history(period='1y', actions=False, end='2022-04-01')
display(AmazonData1y.head(20))

The resulting dataframe contains data about the opening price, the highest price, the lowest price, the closing price, and the trading volume for each day. 

Calculating moving averages

The first calculations using the rolling method I will do are some different moving averages values. They are often applied in stock analysis.

💡 A moving average value is a statistic that captures the average change in a data series over time. (source)

Let’s calculate the moving averages for seven days and 15 days for the stock closing price and add those values as new columns to the existing amazon dataframe.

They are named '7MA' and '15MA'.

# Calculating the 7 and 15 day moving averages based on closing price
# and adding them as new columns
AmazonData1y['7MA'] = AmazonData1y['Close'].rolling(7).mean()
AmazonData1y['15MA'] = AmazonData1y['Close'].rolling(15).mean()

display(AmazonData1y.head(20))

Since there is no data before 2021-04-01, no seven-day moving average can be calculated before 2021-04-13 and no 15-day moving average before 2021-04-23

Calculating the Sum of Trading Volume

Let’s now instead use the rolling method to calculate the sum of the volume from the last five trading days to spot if there was any spike in volume.

It is done in the same way as for the moving average, but here the sum() method is used together with the rolling method instead of the mean() method.

I will also add this as a new column to the existing Amazon dataframe. 

# Calculating 5 day volume using rolling
AmazonData1y['5VOL'] = AmazonData1y['Volume'].rolling(5).sum()

display(AmazonData1y.head(20))

This metric might not be the most useful but it is a good way to explain how you could use the rolling method together with the sum() method. 

Using rolling() with Aggregation

If combining the rolling() method with the aggregation method agg(), it is easy to perform rolling calculations on multiple columns simultaneously.

Say that I would like to find the highest high and the lowest low for the last seven days. 

# Performing rolling calculations on multiple columns at the
# same time using .agg()
SevenHighAndLow = AmazonData1y.rolling(7).agg({'High': 'max', 'Low': 'min'})

display(SevenHighAndLow.head(20))
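To check the idea by hand, the plain-Python sketch below applies an arbitrary function to each full trailing window. The helper rolling_apply and the sample numbers are hypothetical and only mimic the behavior described above; None marks the rows where pandas would show NaN.

```python
def rolling_apply(values, window, func):
    """Apply func to each full trailing window; None where the window is incomplete."""
    return [
        func(values[i + 1 - window : i + 1]) if i + 1 >= window else None
        for i in range(len(values))
    ]

highs = [5, 7, 6, 9, 8]  # hypothetical daily highs
lows = [3, 4, 2, 6, 5]   # hypothetical daily lows

rolling_high = rolling_apply(highs, 3, max)
rolling_low = rolling_apply(lows, 3, min)

print(rolling_high)  # [None, None, 7, 9, 9]
print(rolling_low)   # [None, None, 2, 2, 2]
```

Using a different function per column, as in the agg() dictionary, simply means calling the same windowing logic once per column with its own aggregation.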

Plotting the Values

This part will be included to visualize the value calculated. It’s a bit more appealing than simply just looking at columns of a dataframe. 

First, let’s plot the calculated moving averages values alongside the closing price. 

# Plotting the closing price with the 7 and 15 day moving averages
AmazonData1y.plot(y=['Close', '7MA', '15MA'], kind='line', figsize=(14,12))
plt.title('Closing price, 7MA and 15MA', fontsize=16)
plt.xlabel('Date')
plt.ylabel('Stock price($)')
plt.show()

And then the accumulated 5 day volume alongside the closing price. 

# Plotting the closing price alongside the 5 day volume
AmazonData1y.plot(y=['Close', '5VOL'], secondary_y='5VOL', kind='line', ylabel='Stock Price ($)', figsize=(14,12))
plt.title('Closing price and 5 day accumulated volume', fontsize=16)
plt.xlabel('Date')
plt.ylabel('Volume')
plt.show()

Summary

This was a short tutorial on applying the rolling() method on a pandas dataframe using some statistics.

The goal of this article was to demonstrate some simple examples of how the rolling() method works, and I hope that it did accomplish that goal.

The rolling() method can be used for most statistics calculations, so try and explore it using other methods than those used for this article. 



A Simple Guide to Get Absolute Path in Python

What’s the Absolute Path of a File?

The absolute path (i.e., full path) is just what it sounds like — it’s the exact path to, and location of, the file entered as your function’s parameter, within the hierarchical structure on your machine.

The absolute path always starts at the root directory with no regard for your current working directory (CWD).

That’s it!  So let’s get into some code.

Import Python Module to Get Absolute Path

With more than 200 core modules, Python can do amazing things.

But, this can also make it seem daunting to the beginner.  As we go through this one aspect, it should become much more clear to you how you can navigate your way around and find the specific tool for your project.

I have included some links and examples to help get you started.

We will be using the built-in os module, so we need to import that first.

import os

We could just write the code for the absolute path here and then dissect the output, but I want to give you a deeper look at what’s available to you in Python.

In order to get the absolute path in Python, we first check the output of the dir() statement on the os module:

print(dir(os))

This simple code prints a list of the names (attributes, functions, and sub-modules) defined in the os module.

Output:

# Output:
['DirEntry', 'F_OK', 'MutableMapping', 'O_APPEND', 'O_BINARY', 'O_CREAT', 'O_EXCL', 'O_NOINHERIT', 'O_RANDOM', 'O_RDONLY', 'O_RDWR', 'O_SEQUENTIAL', 'O_SHORT_LIVED', 'O_TEMPORARY', 'O_TEXT', 'O_TRUNC', 'O_WRONLY', 'P_DETACH', 'P_NOWAIT', 'P_NOWAITO', 'P_OVERLAY', 'P_WAIT', 'PathLike', 'R_OK', 'SEEK_CUR', 'SEEK_END', 'SEEK_SET', 'TMP_MAX', 'W_OK', 'X_OK', '_AddedDllDirectory', '_Environ', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_check_methods', '_execvpe', '_exists', '_exit', '_fspath', '_get_exports_list', '_putenv', '_unsetenv', '_wrap_close', 'abc', 'abort', 'access', 'add_dll_directory', 'altsep', 'chdir', 'chmod', 'close', 'closerange', 'cpu_count', 'curdir', 'defpath', 'device_encoding', 'devnull', 'dup', 'dup2', 'environ', 'error', 'execl', 'execle', 'execlp', 'execlpe', 'execv', 'execve', 'execvp', 'execvpe', 'extsep', 'fdopen', 'fsdecode', 'fsencode', 'fspath', 'fstat', 'fsync', 'ftruncate', 'get_exec_path', 'get_handle_inheritable', 'get_inheritable', 'get_terminal_size', 'getcwd', 'getcwdb', 'getenv', 'getlogin', 'getpid', 'getppid', 'isatty', 'kill', 'linesep', 'link', 'listdir', 'lseek', 'lstat', 'makedirs', 'mkdir', 'name', 'open', 'pardir', 'path', 'pathsep', 'pipe', 'popen', 'putenv', 'read', 'readlink', 'remove', 'removedirs', 'rename', 'renames', 'replace', 'rmdir', 'scandir', 'sep', 'set_handle_inheritable', 'set_inheritable', 'spawnl', 'spawnle', 'spawnv', 'spawnve', 'st', 'startfile', 'stat', 'stat_result', 'statvfs_result', 'strerror', 'supports_bytes_environ', 'supports_dir_fd', 'supports_effective_ids', 'supports_fd', 'supports_follow_symlinks', 'symlink', 'sys', 'system', 'terminal_size', 'times', 'times_result', 'truncate', 'umask', 'uname_result', 'unlink', 'urandom', 'utime', 'waitpid', 'walk', 'write']

You can see that it gives us a list of ALL the sub-modules and methods available to us. The 'path' sub-module in the output is the one we use to get the absolute path next.

Next, we combine the os module and the path sub-module to get a directory of the methods and functions we have available.

print(dir(os.path)) # os + .path 

(If you are very new to Python, the hash in front of the highlighted section creates a comment)

Output:

# Output:
['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_abspath_fallback', '_get_bothseps', '_getfinalpathname', '_getfinalpathname_nonstrict', '_getfullpathname', '_getvolumepathname', '_nt_readlink', '_readlink_deep', 'abspath', 'altsep', 'basename', 'commonpath', 'commonprefix', 'curdir', 'defpath', 'devnull', 'dirname', 'exists', 'expanduser', 'expandvars', 'extsep', 'genericpath', 'getatime', 'getctime', 'getmtime', 'getsize', 'isabs', 'isdir', 'isfile', 'islink', 'ismount', 'join', 'lexists', 'normcase', 'normpath', 'os', 'pardir', 'pathsep', 'realpath', 'relpath', 'samefile', 'sameopenfile', 'samestat', 'sep', 'split', 'splitdrive', 'splitext', 'stat', 'supports_unicode_filenames', 'sys']

It gives us another list of Python tools, and I want to highlight the string name abspath.  Can you see how we are building the code as we go?

💡 Hint: os + .path + .abspath

If you want more information on any one of these tools for the os module you can find it HERE.

Now, let’s get to the absolute path

Using the abspath() Function

💡 To get the absolute path of a filename in Python, use the os.path.abspath(filename) function call.

I have included all of the code here with the filename entered as the parameter in the abspath() method.

import os
os.path.abspath('Demo_abspath') # Enter file name as a string


 Output for this code:

'C:\\Users\\tberr\\FinxterProjects1\\Demo_abspath'

As we can see, this returns the absolute path of Demo_abspath inside the current directory of the Jupyter Notebook that I’m using to write and test my code.  It is returned as a string data type.

  • C:\\ : I’m on a Windows machine, and here we have the root directory.
  • Users\\tberr : Users, then my username, are the next two steps.
  • FinxterProjects1 : The folder in my Jupyter notebook that the file is in.
  • Demo_abspath : And finally, the file name entered into the function.
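The breakdown above can be checked with a few more os.path calls. The sketch below is a small illustration using the same file name; it relies on the documented behavior that abspath() combines the current working directory with the given name and normalizes the result, without checking whether the file exists.

```python
import os

name = 'Demo_abspath'  # the file does not need to exist
full = os.path.abspath(name)

print(os.path.isabs(name))  # False
print(os.path.isabs(full))  # True

# On most platforms abspath() is equivalent to normpath(join(getcwd(), path))
same = full == os.path.normpath(os.path.join(os.getcwd(), name))
print(same)
```

The exact string you get for full depends on your operating system and on the directory you run the code from.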

Python Absolute Path vs Relative Path

Now that you understand a bit about absolute path in Python, we should take a look at the relative path, which does take the CWD (current working directory) into consideration.

First let’s get the CWD.

print(os.getcwd())

Output:

C:\Users\tberr\FinxterProjects1

We get everything except the file itself, which in this simple example is the relative path.

print(os.path.relpath('Demo_abspath'))

Output:

Demo_abspath

So, why not just use the absolute path?  As I’ve said, this is a very simple example.  When we get into deeply nested directories, the absolute path can get very complicated.

This is where the relative path becomes very useful (and can save you some typing!).
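With a deeper directory structure, the difference becomes easier to see. The following sketch builds a hypothetical nested path under the current working directory (the folder and file names are made up and need not exist) and asks for its relative path from two different starting points using the optional start argument of os.path.relpath().

```python
import os

cwd = os.getcwd()
nested = os.path.join(cwd, 'data', 'reports', 'summary.txt')  # hypothetical path

# Relative to the current working directory (the default start)
rel = os.path.relpath(nested)
print(rel)

# Relative to an explicit start directory
rel_from_data = os.path.relpath(nested, start=os.path.join(cwd, 'data'))
print(rel_from_data)
```

The relative paths stay short no matter how long the absolute path to the working directory is.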

Summary

Use the os.path.abspath() function to get the absolute path without regard to the cwd.

Use os.path.relpath() function to get the relative path to the file with regard to the cwd.

I hope this article was helpful and gave you a beginner's introduction to abspath() and the os module in Python.  I was hooked on Python my first day.  So maybe this will inspire you to dig deeper and explore all the amazing things Python can do, and you’ll be hooked too!



How to Filter a List in Python?

How can you filter a list in Python using an arbitrary condition? The most Pythonic and most efficient way is to use the list comprehension [x for x in list if condition] to filter all elements of a list.

Filtering with List Comprehension

The most Pythonic way to filter a list, in my opinion, is the list comprehension statement [x for x in list if condition]. You can replace condition with any function of x that you want to use as the filtering condition.

For example, if you want to keep all elements smaller than, say, 10, you would use the list comprehension statement [x for x in list if x<10] to create a new list with all list elements smaller than 10.

Here are three examples of filtering a list:

  • Get elements smaller than eight: [x for x in lst if x<8].
  • Get even elements: [x for x in lst if x%2==0].
  • Get odd elements: [x for x in lst if x%2].

lst = [8, 2, 6, 4, 3, 1]

# Filter all elements <8
small = [x for x in lst if x<8]
print(small)

# Filter all even elements
even = [x for x in lst if x%2==0]
print(even)

# Filter all odd elements
odd = [x for x in lst if x%2]
print(odd)

The output is:

# Elements <8
[2, 6, 4, 3, 1]
# Even Elements
[8, 2, 6, 4]
# Odd Elements
[3, 1]

This is the most efficient way to filter a list, and also the most Pythonic. But if you are looking for alternatives, keep reading, because this comprehensive guide explains every nuance of filtering lists in Python.

Filtering a List in Python with filter()

The filter(function, iterable) function takes as input a function that receives one argument (a list element) and returns a Boolean value indicating whether this list element should pass the filter. All elements that pass the filter are returned as a new iterable object (a filter object).

You can use a lambda function statement to create the function right where you pass it as an argument. The syntax of the lambda function is lambda x: expression, which means that you use x as the input argument and return expression as the result (which may or may not use x to decide on the return value). For more information, see my detailed blog article on the lambda function.

lst = [8, 2, 6, 4, 3, 1]

# Filter all elements <8
small = filter(lambda x: x<8, lst)
print(list(small))

# Filter all even elements
even = filter(lambda x: x%2==0, lst)
print(list(even))

# Filter all odd elements
odd = filter(lambda x: x%2, lst)
print(list(odd))

The output is:

# Elements <8
[2, 6, 4, 3, 1]
# Even Elements
[8, 2, 6, 4]
# Odd Elements
[3, 1]

The filter() function returns a filter object, which is iterable. To convert it into a list, use the list(...) constructor.
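One consequence of getting an iterator back is worth seeing once: a filter object can be consumed only a single time. The short sketch below reuses the list from the example; the variable names first_pass and second_pass are just for illustration.

```python
lst = [8, 2, 6, 4, 3, 1]

small = filter(lambda x: x < 8, lst)
first_pass = list(small)
second_pass = list(small)  # the iterator is already exhausted

print(first_pass)   # [2, 6, 4, 3, 1]
print(second_pass)  # []
```

If you need the filtered values more than once, convert the filter object to a list first and reuse that list.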


Filtering a List in Python with map()

I only add this option because some people still try to use the map() function to filter elements from a list. This is clearly the wrong way to do it. The reason is that the map() function only lets you transform each element of a list into a new element, so you still end up with the same number of elements in the list. Therefore, you need an additional step to filter out the elements (for example, using a list comprehension). But if you are willing to take this additional step, you could just as well have used a list comprehension to filter in the first place.

Here is what I mean:

lst = [8, 2, 6, 4, 3, 1] # Filter all elements <8
small = list(map(lambda x: x if x<8 else None, lst))
small = [x for x in small if x!=None]
print(small) # Filter all even elements
even = list(map(lambda x: x if x%2==0 else None, lst))
even = [x for x in even if x!=None]
print(even) # Filter all odd elements
odd = list(map(lambda x: x if x%2 else None, lst))
odd = [x for x in odd if x!=None]
print(odd)

El resultado vuelve a ser el mismo:

[2, 6, 4, 3, 1]
[8, 2, 6, 4]
[3, 1]

Pero el método para obtener este resultado es claramente ineficaz y no resulta legible.

Filter a List in Python with a Generator Expression

A generator expression creates an iterator over a sequence of values. It works just like a list comprehension, but without building a list in memory, which makes it slightly more efficient.

You can use generator expressions in any function call that expects an iterable as input, for example to compute the sum of all values in a list that satisfy a certain condition:

  • First, determine the iterable of values that satisfy the condition.
  • Second, sum all those values with the sum() function.

Here is a code example that sums all integer values in a list (and ignores the rest) by using a generator expression to filter out the non-integers:

lst = [6, 8, 2, 8, 'Alice']
print(sum(x for x in lst if type(x) == int))
# 24

The expression checks the type(x) of each element and compares it with the int type. The comparison returns True if the element is in fact an integer.
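A related detail worth knowing is how the exact type check differs from isinstance(). The following is a minimal sketch with an extended example list (the extra elements True and 2.5 are assumptions added for illustration): isinstance(x, int) also accepts bool values, because bool is a subclass of int, while the exact type check does not.

```python
lst = [6, 8, 2, 8, 'Alice', True, 2.5]

# isinstance() also accepts subclasses of int, so True (a bool) is summed as 1
total_isinstance = sum(x for x in lst if isinstance(x, int))

# The exact type check keeps plain integers only: True and 2.5 are skipped
total_exact = sum(x for x in lst if type(x) is int)

print(total_isinstance)  # 25
print(total_exact)       # 24
```

Which variant is appropriate depends on whether Boolean values should count as integers in your data.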

Filter a List in Python with a Condition

You can define an arbitrarily complex condition on a list element to decide whether it passes the filter. Just create your own function (for example, condition(x)) that takes a list element as input and returns the Boolean value True if the condition is met and False otherwise.

Here is a code example:

def condition(x):
    '''Define your arbitrarily complicated condition here'''
    return x<10 and x>0

lst = [11, 14, 3, 0, -1, -3]

# Filter out all elements that do
# not meet condition
filtered = [x for x in lst if condition(x)]
print(filtered)
# [3]

All elements that are less than 10 and greater than 0 are included in the filtered list. Therefore, only the element 3 remains.

Filter a List in Python with Multiple Conditions

The same applies if you want to combine several conditions. Say you want to filter out all elements with x>9 or x<1. These are two (simple) conditions. As before, put them into your own function (for example, condition(x)) that takes a list element as input and returns True if the element should be kept and False otherwise.

Here is the same code example as before:

def condition(x):
    '''Define your arbitrarily complicated condition here'''
    return x<10 and x>0

lst = [11, 14, 3, 0, -1, -3]

# Filter out all elements that do
# not meet condition
filtered = [x for x in lst if condition(x)]
print(filtered)
# [3]

All elements that are less than 10 and greater than 0 are included in the filtered list. Therefore, only the element 3 remains.
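When the number of conditions grows, one possible way to keep them manageable is to store them in a list and combine them with the built-in all(). This is only a sketch: the conditions list and the third (odd-number) condition are hypothetical additions for illustration, and the example list is slightly extended.

```python
lst = [11, 14, 3, 0, -1, -3, 4, 7]

# Hypothetical list of simple conditions; each takes one element
conditions = [
    lambda x: x > 0,       # positive
    lambda x: x < 10,      # single digit
    lambda x: x % 2 == 1,  # odd
]

# Keep x only if every condition returns True
filtered = [x for x in lst if all(cond(x) for cond in conditions)]
print(filtered)  # [3, 7]
```

Replacing all() with any() would keep the elements that satisfy at least one of the conditions instead.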

Filter a List in Python with Regex

Problem: Given a list of strings, how can you keep only those that match a certain regular expression?

Example: Say you have the list ['Alice', 'Anne', 'Ann', 'Tom'] and want to discard the elements that do not match the regex pattern 'A.*e'. You expect the filtered list to be ['Alice', 'Anne'].

Solution: Use the list comprehension filtering pattern [x for x in list if match] to filter out all elements that do not match the given pattern.

import re

# Define the list and the regex pattern to match
customers = ['Alice', 'Anne', 'Ann', 'Tom']
pattern = 'A.*e'

# Keep only the elements that match the pattern
filtered = [x for x in customers if re.match(pattern, x)]

print(filtered)
# ['Alice', 'Anne']

You use the re.match() function, which returns a match object if the pattern matches at the beginning of the string and None otherwise. In Python, a match object evaluates to True in a Boolean context, whereas None (like 0, 0.0, and empty containers) evaluates to False. If you need to refresh your basic knowledge of the re.match() function, take a look at my detailed blog article that walks you step by step through this powerful Python tool.
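Because re.match() only anchors the pattern at the start of the string, a longer name can pass the filter even if it does not end where the pattern ends. The sketch below adds a hypothetical customer 'Alexander' to show the difference between match() and fullmatch() on a compiled pattern.

```python
import re

# 'Alexander' is an extra, hypothetical entry added for illustration
customers = ['Alice', 'Anne', 'Ann', 'Tom', 'Alexander']
pattern = re.compile('A.*e')

# match() succeeds if the pattern matches a prefix of the string
prefix_matches = [x for x in customers if pattern.match(x)]

# fullmatch() succeeds only if the whole string matches the pattern
full_matches = [x for x in customers if pattern.fullmatch(x)]

print(prefix_matches)  # ['Alice', 'Anne', 'Alexander']
print(full_matches)    # ['Alice', 'Anne']
```

Compiling the pattern once with re.compile() also avoids looking it up again for every list element.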

How to Filter a List of Lists in Python?

Short answer: To filter a list of lists by a condition on the inner lists, use the list comprehension statement [x for x in list if condition(x)] and replace condition(x) with your filtering condition, which returns True to include the inner list x and False otherwise.

Lists are among the most important data structures in Python, and every expert programmer knows them inside out. Surprisingly, even intermediate programmers often do not know the best way to filter a list, let alone a list of lists. This tutorial shows you how to do the latter!

Problem: Say you have a list of lists. You want to filter it so that only the inner lists that satisfy a certain condition remain. The condition is a function of the inner list, such as the average or the sum of its elements.

Example: Consider the following list of lists with weekly temperature measurements, one inner list per week.

# Measurements of a temperature sensor (7 per week)
temperature = [[10, 8, 9, 12, 13, 7, 8],  # week 1
               [9, 9, 5, 6, 6, 9, 11],    # week 2
               [10, 8, 8, 5, 6, 3, 1]]    # week 3

How do you filter for the colder weeks with an average temperature <8? This is the result you want:

print(cold_weeks)
# [[9, 9, 5, 6, 6, 9, 11], [10, 8, 8, 5, 6, 3, 1]]

There are two semantically equivalent methods to achieve this: list comprehension and the filter() function.
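The text does not show the filtering code itself, so here is a minimal sketch of how cold_weeks could be computed, once with a list comprehension and once with filter(). The helper name average is an assumption introduced for this example.

```python
# Measurements of a temperature sensor (7 per week)
temperature = [[10, 8, 9, 12, 13, 7, 8],  # week 1
               [9, 9, 5, 6, 6, 9, 11],    # week 2
               [10, 8, 8, 5, 6, 3, 1]]    # week 3


def average(week):
    """Hypothetical helper: mean temperature of one week."""
    return sum(week) / len(week)


# Variant 1: list comprehension
cold_weeks = [week for week in temperature if average(week) < 8]

# Variant 2: filter() with a lambda function
cold_weeks_filter = list(filter(lambda week: average(week) < 8, temperature))

print(cold_weeks)
# [[9, 9, 5, 6, 6, 9, 11], [10, 8, 8, 5, 6, 3, 1]]
```

Week 1 has an average of roughly 9.6 and is dropped; weeks 2 and 3 average roughly 7.9 and 5.9 and pass the filter.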

Filter a List of Strings in Python

Problem: Given a list of strings and a search string, how can you keep only the strings that contain the search string?

Example: Say you have the list ['Alice', 'Anne', 'Ann', 'Tom'] and want to get all elements that contain the substring 'An'. You expect the filtered list to be ['Anne', 'Ann'].

Solution: Use the list comprehension filtering pattern [x for x in list if condition] to discard all elements that do not contain the other string.

# Define the list
customers = ['Alice', 'Anne', 'Ann', 'Tom']

# Keep all elements that contain 'An'
filtered = [x for x in customers if 'An' in x]

print(filtered)
# ['Anne', 'Ann']

You use the basic string membership operator "in" to check whether an element passes the filter.

Filter a List in Python to Remove Empty Strings

Problem: Given a list of strings, how can you remove all empty strings?

Example: Say you have the list ['Alice', 'Anne', '', 'Ann', '', 'Tom'] and want to get a new list ['Alice', 'Anne', 'Ann', 'Tom'] of non-empty strings.

Solution: Use the list comprehension filtering pattern [x for x in list if x] to filter out all empty strings.

# Define the list
customers = ['Alice', 'Anne', '', 'Ann', '', 'Tom']

# Keep only the non-empty strings
filtered = [x for x in customers if x]

print(filtered)
# ['Alice', 'Anne', 'Ann', 'Tom']

You take advantage of the fact that Python assigns the Boolean value False to the empty string ''.
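The same truthiness rule can be used with the built-in filter(): passing None as the function keeps only the truthy elements. The sketch below also shows a variation for whitespace-only strings, which are truthy and would otherwise survive; the list messy is a hypothetical example.

```python
customers = ['Alice', 'Anne', '', 'Ann', '', 'Tom']

# filter(None, iterable) keeps only the elements that evaluate to True
filtered = list(filter(None, customers))
print(filtered)  # ['Alice', 'Anne', 'Ann', 'Tom']

# Whitespace-only strings are truthy, so strip them before testing
messy = ['Alice', '  ', '', 'Tom']
cleaned = [x for x in messy if x.strip()]
print(cleaned)  # ['Alice', 'Tom']
```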

Filter a List in Python with endswith() and startswith()

Problem: Given a list of strings, how can you keep only those that start with another string (or end with another string)? In other words, you want to get all strings that have another string as a prefix or suffix.

Example: Say you have the list ['Alice', 'Anne', 'Ann', 'Tom'] and want to get all elements that start with 'An'. You expect the filtered list to be ['Anne', 'Ann'].

Solution: Use the list comprehension filtering pattern [x for x in list if x.startswith('An')] to keep all elements that start with 'An'. If you want to check for strings that end with another string, you can use the str.endswith() method instead.

# Define the list
customers = ['Alice', 'Anne', 'Ann', 'Tom']

# Keep all elements that start with 'An'
filtered = [x for x in customers if x.startswith('An')]

print(filtered)
# ['Anne', 'Ann']

# Keep all elements that end with 'e'
filtered = [x for x in customers if x.endswith('e')]

print(filtered)
# ['Alice', 'Anne']

You use the startswith() and endswith() methods as filtering conditions.
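Both methods also accept a tuple of candidate strings, which is handy when several prefixes or suffixes should pass the filter. A short sketch (the tuple prefixes and the case-insensitive variant are illustrative assumptions, not part of the original example):

```python
customers = ['Alice', 'Anne', 'Ann', 'Tom']

# startswith() accepts a tuple: any one of the prefixes is enough
prefixes = ('An', 'To')
filtered = [x for x in customers if x.startswith(prefixes)]
print(filtered)  # ['Anne', 'Ann', 'Tom']

# Case-insensitive check: normalize the string before testing
filtered_ci = [x for x in customers if x.lower().startswith('an')]
print(filtered_ci)  # ['Anne', 'Ann']
```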

Filter a List in Python with lambda

The filter(function, iterable) function takes as an argument a filtering function that receives a list element as input and returns the Boolean value True if the condition is met and False otherwise. This function decides whether an element is included in the filtered list.

To define this function, you can use the lambda keyword. A lambda function is an anonymous function: think of it as a throwaway function that is only needed as an argument and nowhere else in the code.

Here is the code that shows how to use a lambda function to filter a list and return only its odd values:

# Create the list
lst = [1, 2, 3, 4]

# Get all odd values
print(list(filter(lambda x: x%2, lst)))
# [1, 3]

The lambda function lambda x: x%2 takes one argument x (the element to be checked against the filter) and returns the result of the expression x%2. This modulo expression returns 1 if the integer is odd and 0 if it is even. Therefore, all odd elements pass the test.
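For comparison, the same filter can be written with an ordinary named function. This sketch uses the hypothetical name is_odd; both variants produce the same result, so the choice is mainly about readability and reuse.

```python
def is_odd(x):
    """Named equivalent of lambda x: x % 2."""
    return x % 2


lst = [1, 2, 3, 4]

with_lambda = list(filter(lambda x: x % 2, lst))
with_def = list(filter(is_odd, lst))

print(with_lambda)  # [1, 3]
print(with_def)     # [1, 3]
```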

Filter a List in Python with Another List

Problem: Given a list of values lst and a list of Booleans filter_lst, how do you filter the first list using the second? More specifically, you want to create a new list that includes the i-th element of lst if the i-th element of filter_lst is True.

Example: Here are two example lists:

lst = [1, 2, 3, 4]
filter_lst = [True, False, False, True]

And you want to obtain this list:

[1, 4]

Solution: Use a simple list comprehension statement [lst[i] for i in range(len(lst)) if filter_lst[i]] that checks for each index i whether the corresponding Boolean filter value is True. If so, the element at index i in lst is added to the new filtered list. Here is the code:

lst = [1, 2, 3, 4]
filter_lst = [True, False, False, True]

res = [lst[i] for i in range(len(lst)) if filter_lst[i]]
print(res)
# [1, 4]

The Boolean list serves as a "mask" that determines which elements pass the filter and which do not.

An alternative is to use the zip() function to iterate over multiple sequences without touching any index:

lst = [1, 2, 3, 4]
filter_lst = [True, False, False, True]

res = [x for (x, boo) in zip(lst, filter_lst) if boo]
print(res)
# [1, 4]

Need to improve your understanding of zip()? Check out our in-depth blog article!
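The standard library also offers a ready-made function for this masking pattern: itertools.compress(data, selectors). It is not mentioned in the original text, so treat this as an additional option rather than the recommended approach.

```python
from itertools import compress

lst = [1, 2, 3, 4]
filter_lst = [True, False, False, True]

# compress() yields the items of lst whose selector is truthy
res = list(compress(lst, filter_lst))
print(res)  # [1, 4]
```

Like zip(), compress() stops at the shorter of the two inputs, so a mask that is too short silently drops the remaining elements.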

Filter a List in Python with Indices

Problem: Given a list of values and a list of indices, how do you keep only the elements whose indices appear in the second list?

Example: You have the list ['Alice', 'Bob', 'Ann', 'Frank'] and the indices [1, 2]. What you are looking for is the filtered list ['Bob', 'Ann'].

Solution: Go over all indices in the second list and include the corresponding list elements with a simple list comprehension statement [lst[i] for i in indices].

lst = ['Alice', 'Bob', 'Ann', 'Frank']
indices = [1, 2]

res = [lst[i] for i in indices]
print(res)
# ['Bob', 'Ann']

Only the two elements with indices 1 and 2 pass the filter.
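Note that this comprehension follows the order and multiplicity of the index list. If the indices may be unsorted or repeated and you want each selected element once, in its original position, a possible variation is to iterate over the list with enumerate() and test membership in a set. The index list below is a hypothetical example.

```python
lst = ['Alice', 'Bob', 'Ann', 'Frank']
indices = [2, 1, 2]  # unsorted, with a repeated index

# Follows the index list: order and duplicates are kept
by_indices = [lst[i] for i in indices]
print(by_indices)  # ['Ann', 'Bob', 'Ann']

# Follows the original list: each selected element appears once
index_set = set(indices)
by_position = [x for i, x in enumerate(lst) if i in index_set]
print(by_position)  # ['Bob', 'Ann']
```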

Filter a List of Dictionaries in Python

Problem: Given a list of dictionaries, each consisting of one or more (key, value) pairs. You want to filter them by the value of a certain dictionary key (attribute). How can you do that?

Minimal example: Consider the following example with three user dictionaries that have the keys username, age, and play_time. You want to get a list of all users that meet a certain condition, such as play_time > 100. This is what you are trying to achieve:

users = [{'username': 'alice', 'age': 23, 'play_time': 101},
         {'username': 'bob', 'age': 31, 'play_time': 88},
         {'username': 'ann', 'age': 25, 'play_time': 121},]

superplayers = ...  # Filtering magic here

print(superplayers)

The result should look like this, where the play_time attribute determines whether a dictionary passes the filter, that is, play_time>100:

[{'username': 'alice', 'age': 23, 'play_time': 101},
{'username': 'ann', 'age': 25, 'play_time': 121}]

Solution: Use the list comprehension [x for x in lst if condition(x)] to create a new list of dictionaries that meet the condition. All dictionaries in lst that do not meet the condition are filtered out. You can define your own condition on the list element x.

Here is the code that shows how to filter out all user dictionaries that do not meet the condition of having played more than 100 hours.

users = [{'username': 'alice', 'age': 23, 'play_time': 101},
         {'username': 'bob', 'age': 31, 'play_time': 88},
         {'username': 'ann', 'age': 25, 'play_time': 121},]

superplayers = [user for user in users if user['play_time']>100]

print(superplayers)

The output is the filtered list of dictionaries that meet the condition:

[{'username': 'alice', 'age': 23, 'play_time': 101},
{'username': 'ann', 'age': 25, 'play_time': 121}]
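The comprehension above raises a KeyError if one of the dictionaries lacks the play_time key. A possible way to handle that, sketched here under the assumption that a missing value should count as 0, is to use dict.get() with a default. The incomplete record for 'bob' is a hypothetical variation of the example data.

```python
users = [{'username': 'alice', 'age': 23, 'play_time': 101},
         {'username': 'bob', 'age': 31},  # no play_time recorded
         {'username': 'ann', 'age': 25, 'play_time': 121}]

# get() returns the default 0 when the key is missing, so no KeyError
superplayers = [user for user in users if user.get('play_time', 0) > 100]

names = [user['username'] for user in superplayers]
print(names)  # ['alice', 'ann']
```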

Filter Unique Elements in a Python List: Remove Duplicates

How do you remove all duplicates of a given value from a list?

The naive approach is to go through each element and check whether it already exists in the list. If so, remove it. However, this takes a few lines of code.

A shorter and more concise way is to create a dictionary from the list elements. Each list element becomes a new dictionary key. All elements that appear multiple times map to the same key. A dictionary contains only unique keys; there cannot be several equal keys.

As dictionary values, you simply take dummy (default) values.

Then, you only have to convert the dictionary back to a list, throwing away the dummy values. Because dictionary keys keep their insertion order (guaranteed since Python 3.7), the order of the elements in the original list is preserved.

Here is the code:

>>> lst = [1, 1, 1, 3, 2, 5, 5, 2]
>>> dic = dict.fromkeys(lst)
>>> dic
{1: None, 3: None, 2: None, 5: None}
>>> duplicate_free = list(dic)
>>> duplicate_free
[1, 3, 2, 5]
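For reference, the naive approach mentioned at the start of this section could look like the sketch below. It builds a new list instead of removing elements in place, which is an assumption about how to implement it. It is slower for long lists, but it also works for unhashable elements such as inner lists, which cannot be used as dictionary keys.

```python
lst = [1, 1, 1, 3, 2, 5, 5, 2]

# Naive approach: append an element only if it has not been seen yet
duplicate_free = []
for x in lst:
    if x not in duplicate_free:
        duplicate_free.append(x)
print(duplicate_free)  # [1, 3, 2, 5]

# The same loop handles unhashable elements such as lists
pairs = [[1, 2], [3, 4], [1, 2]]
unique_pairs = []
for p in pairs:
    if p not in unique_pairs:
        unique_pairs.append(p)
print(unique_pairs)  # [[1, 2], [3, 4]]
```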

Filter a Range in a Python List

Filter all elements in a list that fall within the range of values between the given start and stop values.

lst = [3, 10, 3, 2, 5, 1, 11]
start, stop = 2, 9

filtered_lst = [x for x in lst if x>=start and x<=stop]
print(filtered_lst)
# [3, 3, 2, 5]

You use the condition x>=start and x<=stop to check whether the list element x lies within the range [start, stop].
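Python also allows chained comparisons, which express the same range check more compactly. The sketch below uses a smaller stop value than the example above (an assumption chosen so that the inclusive and half-open variants give different results).

```python
lst = [3, 10, 3, 2, 5, 1, 11]
start, stop = 2, 5

# Inclusive range [start, stop], written as a chained comparison
inclusive = [x for x in lst if start <= x <= stop]
print(inclusive)  # [3, 3, 2, 5]

# Half-open range [start, stop): the stop value itself is excluded
half_open = [x for x in lst if start <= x < stop]
print(half_open)  # [3, 3, 2]
```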

Filter a Python List with Greater Than and Less Than

Filter all elements in a list that are greater than a given value y.

lst = [3, 10, 3, 2, 5, 1, 11]
y = 2

filtered_lst = [x for x in lst if x>y]
print(filtered_lst)
# [3, 10, 3, 5, 11]

You use the condition x > y to check whether the list element x is greater than y. If it is, the element is included in the filtered list. If it is not, the element is left out.

You can use the same idea with the less than operator < via the list comprehension statement [x for x in lst if x<y].

Count Filtered List Elements in Python

How can you count elements under a certain condition in Python? For example, what if you want to count all even values in a list? Or all prime numbers? Or all strings that start with a certain character? There are multiple ways to accomplish this, so let's discuss them one by one.

Say you have a condition for each element x. Let's turn it into a function named condition(x). You can define any condition you want, just put it in your function. For example, this condition returns True for all elements that are greater than the integer 10:

def condition(x):
    return x > 10

print(condition(10))
# False
print(condition(2))
# False
print(condition(11))
# True

But you can also define more complicated conditions, such as checking whether the elements are prime numbers.
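As an illustration of such a more complicated condition, here is a minimal sketch of a primality test by trial division. The function name is_prime and the example list are assumptions; the point is only that any function returning True or False can serve as the condition.

```python
def is_prime(n):
    """Return True if n is a prime number (simple trial division)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


lst = [10, 11, 42, 1, 2, 3]

# Use the condition to filter ...
primes = [x for x in lst if is_prime(x)]
print(primes)  # [11, 2, 3]

# ... or to count the matching elements
prime_count = sum(is_prime(x) for x in lst)
print(prime_count)  # 3
```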

Count List Elements in Python with if

How can you count the list elements IF a condition is met?

The answer is to use a simple generator expression sum(condition(x) for x in lst):

>>> def condition(x):
...     return x>10
...
>>> lst = [10, 11, 42, 1, 2, 3]
>>> sum(condition(x) for x in lst)
2

The result indicates that there are two elements greater than 10. You used a generator expression that returns an iterator of Booleans. Note that bool is a subclass of int: the Boolean True counts as the integer 1 and the Boolean False as the integer 0. That is why you can simply sum all the Booleans to get the number of elements for which the condition holds.

Count List Elements in Python with Greater / Less Than

If you want to determine the number of elements that are greater or less than a specified value, you only have to modify the condition in this example:

>>> def condition(x):
...     return x>10
...
>>> lst = [10, 11, 42, 1, 2, 3]
>>> sum(condition(x) for x in lst)
2

For example, to find the number of elements less than 5, use the condition x<5 in the generator expression:

>>> lst = [10, 11, 42, 1, 2, 3]
>>> sum(x<5 for x in lst)
3

Count Zero / Non-Zero List Elements in Python

To count the number of zeros in a given list, use the method call list.count(0).

To count the number of non-zeros in a given list, use conditional counting as discussed above:

def condition(x):
    return x!=0

lst = [10, 11, 42, 1, 2, 0, 0, 0]
print(sum(condition(x) for x in lst))
# 5
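The list.count(0) call mentioned above is not shown in code, so here is a short sketch on the same list. It also shows that the number of non-zero elements can be derived from the zero count.

```python
lst = [10, 11, 42, 1, 2, 0, 0, 0]

# count() returns how many elements compare equal to the argument
zeros = lst.count(0)
print(zeros)  # 3

# Everything that is not a zero is a non-zero
non_zeros = len(lst) - zeros
print(non_zeros)  # 5
```

Keep in mind that count(0) uses equality, so values such as 0.0 and False are counted as zeros too.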

Count List Elements in Python with lambda + map

An alternative is to use a combination of the map and lambda functions.

Here is the code:

>>> sum(map(lambda x: x%2==0, [1, 2, 3, 4, 5]))
2

It counts the number of even integers in the list.

  • The lambda function returns a Boolean value for a given element x.
  • The map function transforms each list element into a Boolean value (True or False, which count as 1 and 0).
  • The sum function adds up the "1"s.

The result is the number of elements for which the condition evaluates to True.

Filter a List by String Length in Python

Given a list of strings, how do you get all elements that have more than x characters? In other words: how do you filter a list by string length?

coders = ['Ann', 'Alice', 'Frank', 'Pit']
filtered = [x for x in coders if len(x)>3]
print(filtered)
# ['Alice', 'Frank']

The list comprehension statement [x for x in coders if len(x)>3] keeps all strings that have more than three characters.

Filter None Elements from a List in Python

How do you remove all None values from a list? For example, you have the list ['Alice', None, 'Ann', None, None, 'Bob'] and want the list ['Alice', 'Ann', 'Bob']. How can you do that?

coders = ['Alice', None, 'Ann', None, None, 'Bob']
filtered = [x for x in coders if x]
print(filtered)
# ['Alice', 'Ann', 'Bob']

In Python, every object has an associated Boolean value, so you can use any Python object as a condition. The value None is associated with the Boolean value False.
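One consequence of relying on truthiness is that other falsy values, such as 0, the empty string, or an empty list, are removed along with None. If only None should be dropped, an explicit identity test is a safer choice. The list values below is a hypothetical example that mixes several falsy values.

```python
values = ['Alice', None, 0, '', 'Ann', None, []]

# Truthiness test: removes None, but also 0, '' and []
truthy_only = [x for x in values if x]
print(truthy_only)  # ['Alice', 'Ann']

# Identity test: removes only the None values
not_none = [x for x in values if x is not None]
print(not_none)  # ['Alice', 0, '', 'Ann', []]
```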

Filter a JSON List in Python

Problem: Say you have a JSON list object. You want to filter the list based on an attribute. How do you do that?

Example: Consider the following JSON list.

json = [
  { "user": "alice", "type": "free" },
  { "user": "ann", "type": "paid" },
  { "user": "bob", "type": "paid" }
]

You want to find all users that have the account type 'paid'.

[
  { "user": "ann", "type": "paid" },
  { "user": "bob", "type": "paid" }
]

Solution: Use the list comprehension [x for x in json if x['type']=='paid'] to filter the list and get a new JSON list with the objects that pass the filter.

json = [
  { "user": "alice", "type": "free" },
  { "user": "ann", "type": "paid" },
  { "user": "bob", "type": "paid" }
]

filtered = [x for x in json if x['type']=='paid']
print(filtered)
# [{'user': 'ann', 'type': 'paid'},
# {'user': 'bob', 'type': 'paid'}]

Only Ann and Bob have a paid account and pass the test x['type']=='paid'.
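In practice, JSON data often arrives as text rather than as a ready-made Python list. The following sketch assumes the data comes as a string (the variable raw is hypothetical) and uses the standard json module to parse it, filter it, and serialize the result again. The parsed data is stored under a different name so that it does not shadow the json module.

```python
import json

# Hypothetical JSON text, e.g. read from a file or an HTTP response
raw = ('[{"user": "alice", "type": "free"}, '
       '{"user": "ann", "type": "paid"}, '
       '{"user": "bob", "type": "paid"}]')

# Parse the text into a list of dictionaries
records = json.loads(raw)

# Filter exactly as before
paid = [x for x in records if x['type'] == 'paid']

# Serialize the filtered list back to JSON text
print(json.dumps(paid))
# [{"user": "ann", "type": "paid"}, {"user": "bob", "type": "paid"}]
```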

Filter a List in Python in One Line

Do you want to filter your list by a given condition in one line of code? Use the list comprehension statement [x for x in list if condition], where the condition part can be any Boolean expression on x. This one-liner returns a new list object with all elements that pass the filtering "test".

Here is an example:

lst = ['Alice', 3, 5, 'Bob', 10]

# ONE-LINER:
f = [x for x in lst if type(x)==str]

print(f)
# ['Alice', 'Bob']

The statement goes over all elements of the list and checks whether they are of type string. If they are, they pass the test and are included in the new list.

If you like one-liners, you will love my book Python One-Liner (NoStarch Press 2020). It shows you exactly how to write Pythonic code and compress your thinking and coding into the most minimalistic form.

How to Filter a List in Python Efficiently: filter() vs. List Comprehension

[Spoiler] Which is faster for filtering a list: filter() or list comprehension? For large lists with one million elements, filtering with list comprehension is 40% faster than the built-in filter() function.

To answer this question, I wrote a short script that tests the runtime performance of filtering large lists of increasing sizes with the filter() and list comprehension methods.

My hypothesis is that the list comprehension method should be slightly faster for larger list sizes, because it takes advantage of CPython's efficient implementation of list comprehension and does not need to call an additional function.

I used my laptop with an Intel(R) Core(TM) i7-8565U processor at 1.8 GHz (with Turbo Boost up to 4.6 GHz) and 8 GB of RAM.

Next, I created 100 lists with both methods, with sizes ranging from 10,000 elements to 1,000,000 elements. As elements, I simply incremented integers by one, starting from 0.

Here is the code I used to measure and plot the results: which method is faster, filter() or list comprehension?

import time

# Compare runtime of both methods
list_sizes = [i * 10000 for i in range(100)]
filter_runtimes = []
list_comp_runtimes = []

for size in list_sizes:
    lst = list(range(size))

    # Get time stamps
    time_0 = time.time()
    list(filter(lambda x: x%2, lst))
    time_1 = time.time()
    [x for x in lst if x%2]
    time_2 = time.time()

    # Calculate runtimes
    filter_runtimes.append((size, time_1 - time_0))
    list_comp_runtimes.append((size, time_2 - time_1))

# Plot everything
import matplotlib.pyplot as plt
import numpy as np

f_r = np.array(filter_runtimes)
l_r = np.array(list_comp_runtimes)

print(filter_runtimes)
print(list_comp_runtimes)

plt.plot(f_r[:,0], f_r[:,1], label='filter()')
plt.plot(l_r[:,0], l_r[:,1], label='list comprehension')

plt.xlabel('list size')
plt.ylabel('runtime (seconds)')

plt.legend()
plt.savefig('filter_list_comp.jpg')
plt.show()

The code compares the runtimes of the filter() function and the list comprehension variant for filtering a list. Note that the filter() function returns a filter object, so you have to convert it to a list with the list() constructor.

Here is the resulting plot that compares the runtime of the two methods. On the x axis, you can see the list size from 0 to 1,000,000 elements. On the y axis, you can see the runtime in seconds needed to execute the respective functions.

The resulting plot shows that both methods are extremely fast for a few tens of thousands of elements. In fact, they are so fast that the time() function of the time module cannot capture the elapsed time.

But as you increase the size of the lists to hundreds of thousands of elements, the list comprehension method starts to win:

For large lists with one million elements, filtering with list comprehension is 40% faster than the built-in filter() function.

The reason is the efficient implementation of the list comprehension statement. However, the following observation is interesting. If you do not convert the filter object to a list, you get the following result:

Suddenly, the filter() function has a constant runtime close to 0 seconds, no matter how many elements are in the list. Why is this happening?

The explanation is simple: the filter() function returns an iterator, not a list. The iterator does not need to compute a single element until it is asked for the next element via next(). Therefore, the filter() function computes the next element only when it is needed. Only if you convert it to a list does it have to compute all values. Otherwise, it does not compute any value ahead of time.
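This lazy behavior can be made visible with a small experiment. The sketch below is not part of the original benchmark: it records every call of a hypothetical condition function is_odd in the list calls, which shows that creating the filter object does no work until elements are requested.

```python
calls = []  # records every element the condition was asked about


def is_odd(x):
    calls.append(x)
    return x % 2 == 1


# Creating the filter object evaluates nothing yet
lazy = filter(is_odd, range(10))
print(calls)  # []

# Asking for one element evaluates just enough inputs to find it
first = next(lazy)
print(first, calls)  # 1 [0, 1]

# Converting to a list forces the remaining elements
rest = list(lazy)
print(rest)        # [3, 5, 7, 9]
print(len(calls))  # 10
```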

Where to Go From Here

This tutorial has shown you the ins and outs of the filter() function in Python and compared it with the list comprehension way of filtering: [x for x in list if condition]. You have seen that the latter is not only more readable and more Pythonic, but also faster. So take the list comprehension approach to filter lists!

If you love coding and want to do it full-time from the comfort of your own home, you are in luck:

I have created a free webinar that shows how I got started as a Python freelancer after my computer science studies, working from home (and watching my kids grow up) while earning a full-time income working only part-time hours.

Webinar: How to Become a Six-Figure Python Freelancer?