
How to Extract Emails from any Website using Python?


The article begins by formulating a problem regarding how to extract emails from any website using Python, gives you an overview of solutions, and then goes into great detail about each solution for beginners.

At the end of this article, you will know the results of comparing methods of extracting emails from a website. Continue reading to find out the answers.

You may want to read the disclaimer on web scraping here:

⚖ Recommended Tutorial: Is Web Scraping Legal?

You can find the full code of both web scrapers on our GitHub here. 👈

Problem Formulation

Marketers build email lists to generate leads.

Statistics show that 33% of marketers send weekly emails, and 26% send emails multiple times per month. An email list is a fantastic tool for both companies and job seekers.

For instance, to find out about employment openings, you can look up the email address of an employee at your desired company.

However, manually locating, copying, and pasting emails into a CSV file takes time, costs money, and is prone to error. There are a lot of online tutorials for building email extraction bots.

When attempting to extract email from a website, these bots experience some difficulty. The issues include the lengthy data extraction times and the occurrence of unexpected errors.

So how can you obtain email addresses from a company website in the most efficient manner? How can we use Python to extract the data robustly?

Method Summary

This post will provide two ways to extract emails from websites. They are referred to as Direct Email Extraction and Indirect Email Extraction, respectively.

💡 Our Python code will search for emails on the target page of a given company or specific website when using the direct email extraction method.

For instance, when a user enters “www.scrapingbee.com” at the prompt, our Python email extractor bot scrapes the website’s URLs. It then uses a regular expression to look for emails before saving them in a CSV file.

💡 The second method, the indirect email extraction method, leverages Google.com’s Search Engine Result Page (SERP) to extract email addresses instead of using a specific website.

For instance, a user may type “scrapingbee.com” as the website name. The email extractor bot searches Google for this term and collects the result pages. The bot then uses regex to extract the email addresses from these search results and stores them in a CSV file.

👉 In the next section, you will learn more about these methods in more detail.

These two techniques are excellent email list-building tools.

As mentioned above, the main issue with other email extraction techniques posted online is that they crawl hundreds of irrelevant website URLs that don’t contain emails. Programs built on these approaches run for several hours.

Continue reading to discover our two methods.

Solution

Method 1: Direct Email Extraction

This method will outline the step-by-step process for obtaining an email address from a particular website.

Step 1: Install Libraries.

The script relies on the following Python libraries. Only the third-party ones need to be installed with pip; re, collections, and urllib ship with Python:

  1. Regular expressions (re) match an email address’s format.
  2. The requests library sends HTTP requests.
  3. bs4 (Beautiful Soup) parses web pages; the lxml parser used in Step 7 must be installed as well.
  4. deque from the collections module stores the queue of URLs that still have to be scraped.
  5. urlsplit from urllib.parse splits a URL into five parts (scheme, netloc, path, query, fragment).
  6. pandas saves the emails in a DataFrame for further processing.
  7. The tld library helps to keep only the relevant emails.
pip install requests
pip install bs4
pip install lxml
pip install pandas
pip install tld

Step 2: Import Libraries.

Import the libraries as shown below:

import re
import requests
from bs4 import BeautifulSoup
from collections import deque
from urllib.parse import urlsplit
import pandas as pd
from tld import get_fld

Step 3: Create User Input.

Ask the user to enter the desired website for extracting emails with the input() function and store them in the variable user_url:

user_url = input("Enter the website url to extract emails: ")
if not user_url.startswith(("http://", "https://")):
    user_url = "https://" + user_url
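The same check can be wrapped in a small function so that it is easy to test. This is only a sketch; the name normalize_url is hypothetical and not part of the original script.

```python
def normalize_url(raw: str) -> str:
    """Return raw with an https:// scheme if it has none (hypothetical helper)."""
    raw = raw.strip()                                # drop accidental spaces
    if raw.startswith(("http://", "https://")):
        return raw                                   # scheme already present
    return "https://" + raw                          # assume https by default


print(normalize_url("www.scrapingbee.com"))
```

Using startswith() instead of the in operator avoids adding a second scheme to a URL that merely contains "https://" somewhere in the middle.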

Step 4: Set up variables.

Before we start writing the code, let’s define some variables.

Create two variables using the command below to store the URLs of scraped and un-scraped websites:

unscraped_url = deque([user_url])
scraped_url = set()

You can save the URLs of websites that are not scraped using the deque container. Additionally, the URLs of the sites that were scraped are saved in a set data format.

As seen below, the variable list_emails contains the retrieved emails:

list_emails = set()

Utilizing a set data type is primarily intended to eliminate duplicate emails and keep just unique emails.

Let us proceed to the next step of our main program to extract email from a website.

Step 5: Adding Urls for Content Extraction.

To begin extracting content, each web page URL is moved from the unscraped_url queue to the scraped_url set.

while len(unscraped_url):
    url = unscraped_url.popleft()
    scraped_url.add(url)

The popleft() method removes the web page URLs from the left side of the deque container and saves them in the url variable. 

Then the url is stored in scraped_url using the add() method.
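To see how the deque and the set work together, here is a minimal sketch that runs without any network access. The links dictionary is a made-up stand-in for real pages and their hyperlinks.

```python
from collections import deque

# Hypothetical link graph standing in for real pages (no network access).
links = {
    "https://example.com/": ["https://example.com/about", "https://example.com/contact"],
    "https://example.com/about": ["https://example.com/", "https://example.com/contact"],
    "https://example.com/contact": [],
}

unscraped_url = deque(["https://example.com/"])
scraped_url = set()
visit_order = []

while unscraped_url:
    url = unscraped_url.popleft()      # take the oldest queued URL
    scraped_url.add(url)               # remember that it has been handled
    visit_order.append(url)
    for link in links.get(url, []):
        # queue a link only if it is neither waiting nor already visited
        if link not in unscraped_url and link not in scraped_url:
            unscraped_url.append(link)

print(visit_order)
```

Each page is visited exactly once, even though the pages link back to each other.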

Step 6: Splitting of URLs and merging them with base URL.

The website contains relative links that you cannot access directly.

Therefore, we must merge the relative links with the base URL. We need the urlsplit() function to do this.

parts = urlsplit(url)

The parts variable holds the segmented URL, as shown below.

SplitResult(scheme='https', netloc='www.scrapingbee.com', path='/', query='', fragment='')

As an example shown above, the URL https://www.scrapingbee.com/  is divided into scheme, netloc, path, and other elements.

The split result’s netloc variable contains the website’s name. Continue reading to learn how this procedure benefits our programming.

base_url = "{0.scheme}://{0.netloc}".format(parts)

Next, we create the base URL by merging the scheme and netloc.

The base URL is the website’s main address, the one you type into the browser’s address bar.

If the URL points to a page below the site root, we also need the part of the URL up to its last slash so that relative links can be attached to it. We can accomplish this by using the command:

if '/' in parts.path:
    part = url.rfind("/")
    path = url[0:part + 1]
else:
    path = url

Let us understand how each line of the above command works. 

Suppose the user enters the following URL:

This URL points to a page on the site rather than to the site root, and the commands above trim it back to https://www.scrapingbee.com/. Let’s see how it works.

If the condition finds a “/” in the path of the URL, the rfind() method returns the index of the last slash in the URL. Here, that “/” is at index 27.

The next line of code stores the URL from index 0 up to, but not including, index 27 + 1 = 28, i.e., https://www.scrapingbee.com/. Thus, the URL is reduced to the base URL plus a trailing slash.

In the else branch, the URL has no path at all, so it already equals the base URL and is stored in the path variable unchanged.

The following command prints the URL that the program is currently scraping:
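Here is a small sketch that applies the Step 6 logic to two URLs. The function name base_and_path and the /blog/web-scraping address are made up for illustration.

```python
from urllib.parse import urlsplit


def base_and_path(url):
    """Return (base_url, path) the way Step 6 derives them (hypothetical helper)."""
    parts = urlsplit(url)
    base_url = "{0.scheme}://{0.netloc}".format(parts)
    if "/" in parts.path:
        path = url[: url.rfind("/") + 1]   # everything up to the last slash
    else:
        path = url                         # no path component at all
    return base_url, path


print(base_and_path("https://www.scrapingbee.com/blog/web-scraping"))
print(base_and_path("https://www.scrapingbee.com"))
```

Note that for a page inside a subdirectory, path keeps that subdirectory, so it is only identical to the base URL for pages directly below the site root.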

print("Searching for Emails in %s" % url)

Step 7: Extracting Emails from the URLs.

An HTTP GET request accesses the user-entered website.

response = requests.get(url)

Then, extract all email addresses from the response variable using a regular expression, and update them to the list_emails set.

new_emails = re.findall(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b", response.text, re.I)
list_emails.update(new_emails)

The regular expression matches the syntax of an email address, and re.findall() pulls every match out of the page content in response.text into the new_emails variable. The re.I flag makes the match case-insensitive. The list_emails set is then updated with the new emails.
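The pattern can be tried out on a plain string before pointing it at a live page. The sample text below is invented. The top-level-domain part is written as [A-Za-z]{2,}, because a | inside square brackets is a literal pipe character rather than an "or".

```python
import re

# Pre-compiled email pattern; [A-Za-z]{2,} is the top-level domain part.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

# Made-up sample text standing in for response.text.
sample = "Write to Support@Example.com or sales@example.co.uk. Not an email: user@localhost"

found = set(EMAIL_RE.findall(sample))   # a set removes duplicate matches
print(sorted(found))
```

The address without a dot in its domain part (user@localhost) is not matched, because the pattern requires a dot followed by at least two letters.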

The next step is to find all of the website’s links so that the remaining pages can be searched for email addresses as well. You can use the powerful Beautiful Soup module to carry out this procedure.

soup = BeautifulSoup(response.text, 'lxml')

As shown in the above command, Beautiful Soup parses the HTML document of the web page that was just requested.

You can find out how many emails have been extracted with the following command.

print("Email Extracted: " + str(len(list_emails)))

The URLs related to the website can be found with “a href” anchor tags. 

for tag in soup.find_all("a"):
    if "href" in tag.attrs:
        weblink = tag.attrs["href"]
    else:
        weblink = ""

Beautiful Soup finds all the anchor tags “a” on the web page.

If a tag has an href attribute, its URL is stored in the weblink variable; otherwise, weblink is an empty string.

if weblink.startswith('/'):
    weblink = base_url + weblink
elif not weblink.startswith('https'):
    weblink = path + weblink

Often the href contains only a link to a particular page, beginning with “/” followed by the page name, without the base URL.

For instance, you can see the following links on the ScrapingBee website:

  • <a href="/#pricing" class="block hover:underline">Pricing</a>
  • <a href="/#faq" class="block hover:underline">FAQ</a>
  • <a href="/documentation" class="text-white hover:underline">Documentation</a>

Thus, the above command combines the extracted href link and the base URL.

For example, in the case of pricing, the weblink variable is as  follows:

weblink = "https://www.scrapingbee.com/#pricing"

In some cases, href doesn’t start with either “/” or “https”; in that case, the command combines the path with that link.

For example, href is like below:

<a href="mailto:support@scrapingbee.com?subject=Enterprise plan&amp;body=Hi there, I'd like to discuss the Enterprise plan." class="btn btn-sm btn-black-o w-full mt-13">1-click quote</a>
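As an alternative to joining the strings by hand, the standard library’s urljoin() can resolve an href against the current page. This is a swapped-in technique, not the method used in this article, and the page URL and href list are made-up examples. The sketch also skips mailto: links, which are not pages that can be requested.

```python
from urllib.parse import urljoin

page_url = "https://www.scrapingbee.com/blog/"      # hypothetical current page
hrefs = [
    "/#pricing",
    "/documentation",
    "post-1",
    "https://codesubmit.io/",
    "mailto:support@scrapingbee.com",
]

resolved = []
for href in hrefs:
    if href.startswith(("mailto:", "tel:", "javascript:")):
        continue                                    # not a fetchable page
    resolved.append(urljoin(page_url, href))        # handles all relative forms

print(resolved)
```

The email address inside a mailto: link is still picked up by the regular expression in Step 7, because that regex runs on the full page text.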

Now let’s complete the code with the following command:

if weblink not in unscraped_url and weblink not in scraped_url:
    unscraped_url.append(weblink)

print(list_emails)

The above command appends URLs that have not been scraped yet to the unscraped_url variable. To view the results, print list_emails.

Run the program.

What if the program doesn’t work?

Are you getting errors or exceptions of Missing Schema, Connection Error, or Invalid URL?

For some reason, you may not be able to access certain websites.

Don’t worry! Let’s see how to handle these errors.

Use a try/except block to skip the failing URLs as shown below:

try:
    response = requests.get(url)
except (requests.exceptions.MissingSchema, requests.exceptions.ConnectionError, requests.exceptions.InvalidURL):
    continue

Use this block in place of the plain requests.get(url) line. Precisely, it belongs above the new_emails variable.

Run the program now.

Did the program work?

Does it keep on running for several hours and not complete it?

The program searches and extracts all the URLs from the given website. It also extracts links to websites on other domains. For example, the ScrapingBee website has URLs such as https://seekwell.io/, https://codesubmit.io/, and more.

A well-built website can have up to 100 links on a single page, so the program will take several hours to extract the links.

Sorry about it. You have to face this issue to get your target emails.

Bye Bye, the article ends here……..

No, I am just joking!

Fret Not! I will give you the best solution in the next step.

Step 8: Fix the code problems.

Here is the solution code for you:

if base_url in weblink:  # code 1
    if ("contact" in weblink or "Contact" in weblink or "About" in weblink or "about" in weblink or 'CONTACT' in weblink or 'ABOUT' in weblink or 'contact-us' in weblink):  # code 2
        if not weblink in unscraped_url and not weblink in scraped_url:
            unscraped_url.append(weblink)

First, code 1 keeps only links that contain the base URL, which prevents the program from scraping websites on other domains.

Since most emails are listed on the “contact us” and “about” pages, only links to those pages are queued (see code 2). Other pages are not considered.

Finally, the unscraped URLs are added to the unscraped_url variable.
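The two filters can also be written as one small function. The sketch below lowercases the link once instead of listing every spelling; the name should_queue is hypothetical, and the check is slightly broader than the original because it also accepts mixed-case variants.

```python
KEYWORDS = ("contact", "about")


def should_queue(weblink, base_url):
    """True if weblink is on the same site and looks like a contact/about page."""
    if base_url not in weblink:        # code 1: stay on the same website
        return False
    lowered = weblink.lower()          # one check covers Contact, CONTACT, ...
    return any(word in lowered for word in KEYWORDS)   # code 2


print(should_queue("https://www.scrapingbee.com/Contact-Us", "https://www.scrapingbee.com"))
```

Because “contact-us” already contains “contact”, it needs no separate keyword.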

Step 9: Exporting the Email Address to CSV file.

Finally, we can save the email addresses in a CSV file (email2.csv) through a pandas DataFrame.

url_name = "{0.netloc}".format(parts)
col = "List of Emails " + url_name
df = pd.DataFrame(list_emails, columns=[col])
s = get_fld(base_url)
df = df[df[col].str.contains(s) == True]
df.to_csv('email2.csv', index=False)

We use get_fld to save emails belonging to the first level domain name of the base URL. The s variable contains the first level domain of the base URL. For example, the first level domain is scrapingbee.com.

We include only emails ending with the website’s first-level domain name in the data frame. Other domain names that do not belong to the base URL are ignored. Finally, the data frame transfers emails to a CSV file.
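For comparison, here is a sketch of the same filtering idea with only the standard library. It is a stand-in, not the article’s pandas code: the helper emails_for_domain is hypothetical, the domain is passed in directly instead of being derived with get_fld, and the addresses are invented. It compares the part after the @ sign, whereas str.contains() treats its argument as a regular expression by default and matches it anywhere in the string.

```python
import csv
import io


def emails_for_domain(emails, domain):
    """Keep addresses whose domain part is `domain` or a subdomain of it."""
    domain = domain.lower()
    kept = []
    for email in emails:
        host = email.rsplit("@", 1)[-1].lower()     # text after the last @
        if host == domain or host.endswith("." + domain):
            kept.append(email)
    return sorted(kept)


found = {"info@scrapingbee.com", "jobs@mail.scrapingbee.com", "someone@gmail.com"}
rows = emails_for_domain(found, "scrapingbee.com")

buffer = io.StringIO()      # stands in for open("email2.csv", "w", newline="")
writer = csv.writer(buffer)
writer.writerow(["List of Emails scrapingbee.com"])
writer.writerows([email] for email in rows)
print(buffer.getvalue())
```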

As previously stated, a web admin can maintain up to 100 links per page.

Because a normal website has more than 30 hyperlinks on each page, the program will still take some time to finish. If you believe that the program has extracted enough emails, you can halt it manually with Ctrl+C: wrap the loop in try/except KeyboardInterrupt and raise SystemExit, as shown below:

try:
    while len(unscraped_url):
        …
        if base_url in weblink:
            if ("contact" in weblink or "Contact" in weblink or "About" in weblink or "about" in weblink or 'CONTACT' in weblink or 'ABOUT' in weblink or 'contact-us' in weblink):
                if not weblink in unscraped_url and not weblink in scraped_url:
                    unscraped_url.append(weblink)
    url_name = "{0.netloc}".format(parts)
    col = "List of Emails " + url_name
    df = pd.DataFrame(list_emails, columns=[col])
    s = get_fld(base_url)
    df = df[df[col].str.contains(s) == True]
    df.to_csv('email2.csv', index=False)
except KeyboardInterrupt:
    url_name = "{0.netloc}".format(parts)
    col = "List of Emails " + url_name
    df = pd.DataFrame(list_emails, columns=[col])
    s = get_fld(base_url)
    df = df[df[col].str.contains(s) == True]
    df.to_csv('email2.csv', index=False)
    print("Program terminated manually!")
    raise SystemExit

Run the program and enjoy it…

Let’s see what our fantastic email scraper application produced. The website I have entered is www.abbott.com.

Output:

Method 2: Indirect Email Extraction

You will learn the steps to extract email addresses from Google.com using the second method.

Step 1: Install Libraries.

Using the pip command, install the third-party libraries from the following list; re and time ship with Python and need no installation:

  1. bs4 (Beautiful Soup) parses the Google result pages.
  2. The pandas module can save emails in a DataFrame for future processing.
  3. Regular expressions (re) match the email address format.
  4. The requests library sends HTTP requests.
  5. The tld library helps to keep only the relevant emails.
  6. The time library delays the scraping of pages.
pip install bs4
pip install pandas
pip install requests
pip install tld

Step 2: Import Libraries.

Import the libraries.

from bs4 import BeautifulSoup
import pandas as pd
import re
import requests
from tld import get_fld
import time

Step 3: Constructing Search Query.

The search query is written in the format “@websitename.com“.

Create an input for the user to enter the URL of the website.

user_keyword = input("Enter the Website Name: ")
user_keyword = '"@' + user_keyword + '"'

The format of the search query is “@websitename.com,” as indicated in the code for the user_keyword variable above. The search query has opening and ending double quotes.

Step 4: Define Variables.

Before moving on to the heart of the program, let’s first set up the variables.

page = 0
list_email = set()

The page variable lets you move through multiple Google search result pages, and the list_email set holds the extracted emails.

Step 5: Requesting Google Page.

In this step, you will learn how to create a Google URL link using a user keyword term and request the same.

The Main part of coding starts as below:

while page <= 100:
    print("Searching Emails in page No " + str(page))
    time.sleep(20.00)
    google = "https://www.google.com/search?q=" + user_keyword + "&ei=dUoTY-i9L_2Cxc8P5aSU8AI&start=" + str(page)
    response = requests.get(google)
    print(response)

Let’s examine what each line of code does.

  • The while loop lets the email extraction bot retrieve emails from a fixed number of result pages: page runs from 0 to 100 in steps of 10, which covers 11 pages.
  • The code prints the offset of the Google page being extracted. The first page is represented by 0, the second by 10, the third by 20, and so on.
  • To avoid having our IP address blocked by Google, the program pauses for 20 seconds before each request.

Before creating a google variable, let us learn more about the google search URL.

Suppose you search the keyword “Germany” on google.com. Then the Google search URL will be as follows

  • https://www.google.com/search?q=germany

If you click the second page of the Google search result, then the link will be as follows:

  • https://www.google.com/search?q=germany&ei=dUoTY-i9L_2Cxc8P5aSU8AI&start=10

How does that link work?

  • The user keyword is inserted after the “q=” symbol, and the page number is added after the “start=” as shown above in the google variable.
  • The program then requests the Google page and prints the response to test whether the request worked. A 200 response code means the page was accessed successfully. A 429 means that you have hit the request limit and must wait, about two hours in our experience, before making any more requests.
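The URL construction described above can be sketched with urlencode(), which percent-encodes the quotes and the @ sign for you. The helper name google_search_url is hypothetical, and the ei parameter from the article’s URL is left out here on the assumption that it is not required.

```python
from urllib.parse import urlencode


def google_search_url(domain, start):
    """Build a results-page URL for the quoted query "@domain" (hypothetical helper)."""
    query = '"@{}"'.format(domain)
    return "https://www.google.com/search?" + urlencode({"q": query, "start": start})


# Offsets 0, 10, ..., 100 correspond to the loop condition page <= 100.
urls = [google_search_url("scrapingbee.com", start) for start in range(0, 101, 10)]
print(len(urls))
print(urls[1])
```

The sketch only builds the addresses; it sends no requests.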

Step 6: Extracting Email Address.

In this step, you will learn how to extract the email address from the google search result contents.

soup = BeautifulSoup(response.text, 'html.parser')
new_emails = re.findall(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b", soup.text, re.I)
list_email.update(new_emails)
page = page + 10

Beautiful Soup parses the web page and extracts the content of the HTML page.

With the regex findall() function, you can obtain the email addresses, as shown above. The new emails are then added to the list_email set, and page is increased by 10 to move to the next results page.

n = len(user_keyword)-1
base_url = "https://www." + user_keyword[2:n]
col = "List of Emails " + user_keyword[2:n]
df = pd.DataFrame(list_email, columns=[col])
s = get_fld(base_url)
df = df[df[col].str.contains(s) == True]
df.to_csv('email3.csv', index=False)

Finally, the lines of code above save the target emails to the CSV file. The slice user_keyword[2:n] starts at index 2, right after the opening quote and the @ sign, so that only the domain name remains.

Run the program and see the output.

Method 1 Vs. Method 2 

Can we determine which approach is more effective for building an email list: Method 1 Direct Email Extraction or Method 2 Indirect Email Extraction? The email lists in the output were generated from the website abbott.com.

Let’s contrast two email lists that were extracted using Methods 1 and 2.

  • From Method 1, the extractor has retrieved 60 emails.
  • From Method 2, the extractor has retrieved 19 emails.
  • 17 of the emails from Method 2 are not included in Method 1’s list.
  • These emails are employee-specific rather than company-wide. Method 1 also contains more employee emails.

Thus, we are unable to recommend one procedure over another. Both techniques provide fresh email lists. As a result, both of these methods will increase your email list.
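A comparison like this is easy to reproduce with Python sets. The addresses below are invented placeholders; in practice the two sets would be loaded from email2.csv and email3.csv.

```python
# Hypothetical results; the real lists come from email2.csv and email3.csv.
method1 = {"info@example.com", "jobs@example.com", "press@example.com"}
method2 = {"jobs@example.com", "jane.doe@example.com"}

only_in_2 = method2 - method1          # found via Google but not on the site
in_both = method1 & method2            # found by both methods
combined = method1 | method2           # merged, de-duplicated list

print(sorted(only_in_2), len(combined))
```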

Summary

Building an email list is crucial for businesses and freelancers alike to increase sales and leads.

This article offers instructions on using Python to retrieve email addresses from websites.

The best two methods to obtain email addresses are provided in the article.

In order to provide a recommendation, the two techniques are finally compared.

The first approach is a direct email extractor from any website, and the second method is to extract email addresses using Google.com.


Regex Humor

Wait, forgot to escape a space. Wheeeeee[taptaptap]eeeeee. (source)

How to Apply a Function to a List


Problem Formulation and Solution Overview

This article will show you how to apply a function to a List in Python.

To make it more interesting, we have the following running scenario:

As a Python assignment, you have been given a List of Integers and asked to apply a function to each List element in various ways.


💬 Question: How would we write code to apply a function to a List in Python?

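We can accomplish this task by one of the following options:

  • Method 1: Use a Generator Expression
  • Method 2: Use List Comprehension
  • Method 3: Use a Lambda and map()
  • Method 4: Use a For Loop
  • Bonus: Calculate Commissions on each List Element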


Preparation

These examples use functions from the math library.

Add the following code to the top of each script. This snippet will allow the code in this article to run error-free.

import math

Method 1: Use a Generator Expression

This example uses a Generator Expression. The expression returns an iterable generator object that computes each value only when it is requested. This is a memory-efficient option because the full list of results is never held in memory at once.

nums = [18, 43, 54, 65, 31, 21, 27]
nums = (math.pow(num,2) for num in nums)
print(nums)

The above code declares a List of Integers and saves it to the variable nums.

Next, a Generator Expression is called and applies the math.pow() function from Python’s built-in math library to each list element. The results save back to nums.

If output to the terminal at this point, an iterable Generator Object similar to the following displays.

<generator object <genexpr> at 0x000002468D9B59A0>

To turn the Generator Object into a List, run the following code.

print(list(nums))

The content of nums is as follows.

[324.0, 1849.0, 2916.0, 4225.0, 961.0, 441.0, 729.0]

💡Note: The math.pow() function accepts two (2) numbers as arguments: x (the value) and y (the power), and returns x raised to the power of y as a float.
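The short sketch below illustrates that a generator produces its values on demand and can be consumed only once. It uses a shortened list for brevity.

```python
import math

nums = [18, 43, 54]
squares = (math.pow(num, 2) for num in nums)   # nothing is computed yet

first = next(squares)        # computes only the first value
rest = list(squares)         # consumes the remaining values
again = list(squares)        # the generator is now exhausted

print(first, rest, again)
```

If the results are needed more than once, convert the generator to a List first or use List Comprehension instead.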


Method 2: Use List Comprehension

This example uses List Comprehension to perform an operation on each List element.

nums = [18, 43, 54, 65, 31, 21, 27]
nums = [math.sqrt(num) for num in nums]
print(nums)

The above code declares a List of Integers and saves it to the variable nums.

Next, List Comprehension is called and applies the math.sqrt() function from Python’s built-in math library to each List element. The results save back to nums.

If output to the terminal, the following displays.

[4.242640687119285, 6.557438524302, 7.3484692283495345, 8.06225774829855, 5.5677643628300215, 4.58257569495584, 5.196152422706632]

💡Note: The math.sqrt() function accepts a number as an argument and returns the square root of said argument as a float.


Method 3: Use a Lambda and map()

This example uses Python’s lambda function combined with map() and List to apply a mathematical operation to each List element.

nums = [18, 43, 54, 65, 31, 21, 27]
nums = list(map(lambda x: math.degrees(x), nums))
print(nums)

The above code declares a List of numbers and saves it to the variable nums.

Next, map() applies the lambda function, which calls math.degrees() from Python’s built-in math library, to each List element, and list() converts the resulting map object into a List. The result saves back to nums.

If output to the terminal, the following displays.

[1031.324031235482, 2463.71851906254, 3093.9720937064453, 3724.225668350351, 1776.169164905552, 1203.2113697747288, 1546.9860468532227]

💡Note: The math.degrees() function accepts an angle as an argument, converts this argument from radians to degrees and returns the result.
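When the lambda only forwards its argument to another function, that function can be passed to map() directly. A small sketch with angles given in radians (the angles list is an illustrative choice):

```python
import math

angles = [0, math.pi / 2, math.pi]             # radians
degrees = list(map(math.degrees, angles))      # no lambda needed
print(degrees)
```

A lambda is still useful when the call needs extra arguments or more than one step, for example lambda x: round(math.degrees(x), 1).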


Method 4: Use a For Loop

This example uses a loop, here a while loop with a counter, to apply a mathematical operation to each List element.

nums = [18, 43, 54, 65, 31, 21, 27]
i = 0
while i < len(nums):
    nums[i] = round(math.sqrt(nums[i]), 2)
    i += 1
print(nums)

The above code declares a List of Integers and saves it to the variable nums. Then, a counter variable, i, is declared and set to 0.

Next, a while loop iterates through each List element, applying the math.sqrt() function and limiting the decimal places to two (2). The results save back to the appropriate element in nums.

Upon completion of the iteration, the output is sent to the terminal.

[4.24, 6.56, 7.35, 8.06, 5.57, 4.58, 5.2]
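The same result can be produced with a for loop. The sketch below uses enumerate() so that no manual counter is needed:

```python
import math

nums = [18, 43, 54, 65, 31, 21, 27]
for i, num in enumerate(nums):
    nums[i] = round(math.sqrt(num), 2)   # overwrite each element in place
print(nums)
```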

Bonus: Calculate Commissions on each List Element

This bonus code extracts two (2) columns from a real-estate.csv file, the street and price columns, and converts each into a List.

Then, the street column is converted from UPPERCASE to Title Case by applying the title() function. Next, Sales Commissions of 6% are calculated for each price element and rounded to two decimal places using round().

import pandas as pd

df = pd.read_csv('real-estate.csv', usecols=['street', 'price']).head(5)
street = list(df['street'])
street = [item.title() for item in street]
prices = list(df['price'])
commis = [round(p*.06,2) for p in prices]
print(street)
print(prices)

The output is as follows.

['3526 High St', '51 Omaha Ct', '2796 Branch St', '2805 Janette Way', '6001 Mcmahon Dr']
[59222, 68212, 68880, 69307, 81900]

🌟Finxter Challenge!
Convert these Lists into a Dictionary format.


Summary

This article has provided four (4) ways to apply a function to each List element to select the best fit for your coding requirements.

Good Luck & Happy Coding!


Programming Humor – Python

“I wrote 20 short programs in Python yesterday. It was wonderful. Perl, I’m leaving you.” (xkcd)

How to Find a Partial String in a Python List?


Problem Formulation

💬 Challenge: Given a Python list of strings and a query string, find the strings that partially match the query string.

Example 1:

  • Input: ['hello', 'world', 'python'] and 'pyth'
  • Output: ['python']

Example 2:

  • Input: ['aaa', 'aa', 'a'] and 'a'
  • Output: ['aaa', 'aa', 'a']

Example 3:

  • Input: ['aaa', 'aa', 'a'] and 'b'
  • Output: []

Let’s dive into several methods that solve this and similar types of problems. We start with the most straightforward solution.

Method 1: Membership + List Comprehension

The most Pythonic way to find a list of partial matches of a given string query in a string list lst is to use the membership operator in and the list comprehension statement like so: [s for s in lst if query in s].

Here’s a simple example:

def partial(lst, query):
    return [s for s in lst if query in s]

# Example 1:
print(partial(['hello', 'world', 'python'], 'pyth'))
# ['python']

# Example 2:
print(partial(['aaa', 'aa', 'a'], 'a'))
# ['aaa', 'aa', 'a']

# Example 3:
print(partial(['aaa', 'aa', 'a'], 'b'))
# []

In case you need some background information, feel free to check out our two tutorials and the referenced videos.

👉 Recommended Tutorial: List Comprehension in Python


👉 Recommended Tutorial: The Membership Operator in Python


Method 2: list() and filter()

To find a list of partial query matches given a string list lst, combine the membership operator with the filter() function in which you pass a lambda function that evaluates the membership operation for each element in the list like so: list(filter(lambda x: query in x, lst)).

Here’s an example:

def partial(lst, query):
    return list(filter(lambda x: query in x, lst))

# Example 1:
print(partial(['hello', 'world', 'python'], 'pyth'))
# ['python']

# Example 2:
print(partial(['aaa', 'aa', 'a'], 'a'))
# ['aaa', 'aa', 'a']

# Example 3:
print(partial(['aaa', 'aa', 'a'], 'b'))
# []

Beautiful Python one-liner, isn’t it? 🦄

I recommend you check out the following tutorial with video to shed some light on the background information here:

👉 Recommended Tutorial: Python Filtering


Generally, I like list comprehension more than the filter() function because the former is more concise (e.g., no need to convert the result to a list) and slightly faster. But both work perfectly fine!

Method 3: Regex Match + List Comprehension

The most flexible way to find a list of partial query matches given a string list lst is provided by Python’s powerful regular expressions functionality. For example, the expression [x for x in lst if re.match(pattern, x)] finds all strings that match a certain query pattern as defined by you.

The following examples showcase this solution:

import re

def partial(lst, query):
    pattern = '.*' + query + '.*'
    return [x for x in lst if re.match(pattern, x)]

# Example 1:
print(partial(['hello', 'world', 'python'], 'pyth'))
# ['python']

# Example 2:
print(partial(['aaa', 'aa', 'a'], 'a'))
# ['aaa', 'aa', 'a']

# Example 3:
print(partial(['aaa', 'aa', 'a'], 'b'))
# []

In this example, we use the dummy pattern .*query.* that simply matches words that contain the query string. However, you could also do more advanced pattern matching—regex to the rescue!
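One example of such more advanced matching is a case-insensitive search. The sketch below is a variation on the function above, with the hypothetical name partial_ci. It uses re.search() so that no surrounding .* is needed, and re.escape() so that characters such as + or . in the query are taken literally rather than as regex syntax.

```python
import re


def partial_ci(lst, query):
    """Case-insensitive partial match; the query is treated as literal text."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [x for x in lst if pattern.search(x)]


print(partial_ci(['hello', 'World', 'C++ rocks'], 'c++'))
```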

Again, I’d recommend you check out the background info on regular expressions:

👉 Recommended Tutorial: Python Regex match() — A Simple Illustrated Guide



Python TypeError ‘set’ object is not subscriptable


Minimal Error Example

Consider the following minimal example where you create a set and attempt to access an element of this set using indexing or slicing:

my_set = {1, 2, 3}
my_set[0]

If you run this code snippet, Python raises the TypeError: 'set' object is not subscriptable:

Traceback (most recent call last):
  File "C:\Users\xcent\Desktop\code.py", line 2, in <module>
    my_set[0]
TypeError: 'set' object is not subscriptable

Why Does the Error Occur?

The Python TypeError: 'set' object is not subscriptable occurs if you try to access an element of a set using indexing or slicing that imply an ordering of the set.

However, sets are unordered collections of unique elements: they have no ordering of elements. Thus, you cannot use slicing or indexing, operations that are only possible on an ordered type.

🌍 Recommended Tutorial: The Ultimate Guide to Python Sets

How to Fix the Error?

How to fix the TypeError: 'set' object is not subscriptable?

To fix the TypeError: 'set' object is not subscriptable, either convert the unordered set to an ordered list or tuple before accessing it or get rid of the indexing or slicing call altogether.

Here’s an example where you convert the unordered set to an ordered list first. Only then do you use indexing or slicing, so the error doesn’t occur anymore:

my_set = {1, 2, 3}

# Convert set to list:
my_list = list(my_set)

# Indexing:
print(my_list[0])
# 1

# Slicing:
print(my_list[:-1])
# [1, 2]

Alternatively, you can also convert the set to a tuple to avoid the TypeError: 'set' object is not subscriptable:

my_tuple = tuple(my_set)
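If the goal is only to get one element or a well-defined order, indexing can often be avoided altogether. A short sketch of three such options:

```python
my_set = {3, 1, 2}

smallest = min(my_set)            # no indexing needed
ordered = sorted(my_set)          # a new list with a defined order
some_item = next(iter(my_set))    # an arbitrary element, order not guaranteed

print(smallest, ordered[0], ordered[-1])
```

Using sorted() is the safer choice whenever the position of an element matters, because a plain list(my_set) does not promise any particular order.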

Let’s end this article with a bit of humor, shall we? 🙂

Programmer Humor

There are only 10 kinds of people in this world: those who know binary and those who don’t.
👩🧔‍♂️
~~~

There are 10 types of people in the world. Those who understand trinary, those who don’t, and those who mistake it for binary.
👩🧔‍♂️👱‍♀️


Python Convert Image (JPG, PNG) to CSV


Given an image as a .png or .jpeg file, how do you convert it to a CSV file in Python?

Example image:

Convert the image to a CSV using the following steps:

  1. Read the image into a PIL.Image object.
  2. Convert the PIL.Image object to a 3D NumPy array with the dimensions rows, columns, and RGB values.
  3. Convert the 3D NumPy array to a 2D list of lists by collapsing the RGB values into a single value (e.g., a string representation).
  4. Write the 2D list of lists to a CSV using normal file I/O in Python.

Here’s the code that applies these four steps, assuming the image is stored in a file named 'c++.jpg':

from PIL import Image
import numpy as np

# 1. Read image
img = Image.open('c++.jpg')

# 2. Convert image to NumPy array
arr = np.asarray(img)
print(arr.shape)
# (771, 771, 3)

# 3. Convert 3D array to 2D list of lists
lst = []
for row in arr:
    tmp = []
    for col in row:
        tmp.append(str(col))
    lst.append(tmp)

# 4. Save list of lists to CSV
with open('my_file.csv', 'w') as f:
    for row in lst:
        f.write(','.join(row) + '\n')

Note that the resulting CSV file has very long rows: one cell per pixel column of the image.

Each CSV cell (column) value is a representation of the RGB value at that specific pixel. For example, [255 255 255] represents the color white at that pixel.
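Storing each pixel as a string such as [255 255 255] makes the file awkward to parse again. As an alternative sketch, not the layout used above, you could write one CSV row per pixel with separate r, g, and b columns. The example below uses a tiny hand-written 2x2 nested list named pixels as a stand-in for the NumPy array (an assumption, so the snippet runs without an image file) and a hypothetical helper called pixel_rows():

```python
import csv
import io

# Stand-in for np.asarray(img): 2 rows x 2 columns of RGB triples.
pixels = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]

def pixel_rows(image):
    """Yield (row, col, r, g, b) for every pixel of a nested-list image."""
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            yield (y, x, r, g, b)

# Write to an in-memory buffer; replace it with
# open('my_file.csv', 'w', newline='') to write a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['row', 'col', 'r', 'g', 'b'])
writer.writerows(pixel_rows(pixels))

csv_text = buffer.getvalue()
print(csv_text)
```

This layout produces many short rows instead of a few very long ones, which most spreadsheet tools handle more comfortably.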


For more information and some background on file I/O, check out our detailed tutorial on converting a list of lists to a CSV:

🌍 Related Tutorial: How to Convert a List to a CSV File in Python [5 Ways]


How to Find the Most Common Element in a Python Dictionary


Problem Formulation and Solution Overview

This article will show you how to find the most common element in a Python Dictionary. However, since all Dictionary Keys are unique, this article focuses on searching for the most common Dictionary Value.

To make it more fun, we have the following running scenario:

Marty Smart, a Math Teacher at Harwood High, has amassed his student’s grades for the semester and has come to you to write a script to determine the most common grade. Below is sample data.

students = {'Marc': 99, 'Amie': 76, 'Jonny': 98, 'Anne': 99, 'Andy': 77, 'Elli': 98, 'Acer': 67, 'Joan': 61, 'Mike': 54, 'Anna': 76, 'Bobi': 67, 'Kate': 99, 'Todd': 98, 'Emma': 49, 'Stan': 76, 'Harv': 99, 'Ward': 67, 'Hank': 54, 'Wendy': 98, 'Sven': 100}

💬 Question: How would we write code to locate the most common value in a Dictionary?

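We can accomplish this task by one of the following options:

  • Method 1: Use statistics mode()
  • Method 2: Use collections.Counter
  • Method 3: Use for loop and max()
  • Method 4: Use max()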


Method 1: Use statistics mode()

This example uses mode() from the statistics library. This function returns the single most common element found in the passed argument.

from statistics import mode
common_val = mode(students.values())

The above code imports mode() from the statistics library.

The following line uses the mode() function and passes the values from the key:value pair of students as an argument. The results save to common_val.

If the contents of students.values() are output to the terminal, the following will display.

print(students.values())
dict_values([99, 76, 98, 99, 77, 98, 67, 61, 54, 76, 67, 99, 98, 49, 76, 99, 67, 54, 98, 100])

Run the code below to find the most common value.

print(common_val)
99

This is correct!
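One detail worth knowing: in the sample data, the grade 98 also occurs four times. Since Python 3.8, mode() returns the first mode it encounters, while multimode() from the same library returns all of them. A short sketch using the students data from above:

```python
from statistics import mode, multimode

students = {
    'Marc': 99, 'Amie': 76, 'Jonny': 98, 'Anne': 99, 'Andy': 77,
    'Elli': 98, 'Acer': 67, 'Joan': 61, 'Mike': 54, 'Anna': 76,
    'Bobi': 67, 'Kate': 99, 'Todd': 98, 'Emma': 49, 'Stan': 76,
    'Harv': 99, 'Ward': 67, 'Hank': 54, 'Wendy': 98, 'Sven': 100,
}

grades = list(students.values())

# mode() reports only the first of the tied values.
single = mode(grades)
print(single)
# 99

# multimode() reports every value that shares the highest count.
all_modes = multimode(grades)
print(all_modes)
# [99, 98]
```

If ties matter for your report, multimode() is the safer choice.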


Method 2: Use collections.Counter

This example uses the Counter class from the collections library to keep track of each element count.

from collections import Counter
common_val = Counter(students.values()).most_common

The above code imports the Counter class from Python’s built-in collections library.

Next, Counter() is called and is passed all values from the key:value pairs of students as an argument. Then, most_common is appended, but without parentheses. The results save to common_val.

If this was output to the terminal, the following would display.

<bound method Counter.most_common of Counter({99: 4, 98: 4, 76: 3, 67: 3, 54: 2, 77: 1, 61: 1, 49: 1, 100: 1})>

This isn’t the result we want. How can we get this result?

common_val = Counter(students.values()).most_common(1) 

If we call most_common(1) with parentheses and the argument 1, a List containing one Tuple is returned.

[(99, 4)]

To extract the data further, use indexing ([0]) to reference the Tuple and unpack it into two variables.

value, count = Counter(students.values()).most_common(1)[0]
print(value, count)

Much clearer! The grade of 99 appears 4 times in students.

99 4
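Once you have the most common grade, a natural next step is to find out which students earned it. The sketch below reuses the students data and adds a list comprehension; the variable name names is only illustrative:

```python
from collections import Counter

students = {
    'Marc': 99, 'Amie': 76, 'Jonny': 98, 'Anne': 99, 'Andy': 77,
    'Elli': 98, 'Acer': 67, 'Joan': 61, 'Mike': 54, 'Anna': 76,
    'Bobi': 67, 'Kate': 99, 'Todd': 98, 'Emma': 49, 'Stan': 76,
    'Harv': 99, 'Ward': 67, 'Hank': 54, 'Wendy': 98, 'Sven': 100,
}

value, count = Counter(students.values()).most_common(1)[0]

# Map the most common grade back to the students who earned it.
names = [name for name, grade in students.items() if grade == value]
print(names)
# ['Marc', 'Anne', 'Kate', 'Harv']
```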

Method 3: Use For Loop and max()

This example locates the most common value in a Dictionary using a for loop and max() without importing a library.

tally = {}
for k, v in students.items():
    if v not in tally:
        tally[v] = 1
    else:
        tally[v] += 1
print(max(tally, key=tally.get))

The above code declares an empty Dictionary tally.

Then a for loop iterates over each key:value pair in the Dictionary students.

If v (the value) is not in tally, then the count for v is set to 1, since this is its first occurrence.

Otherwise, if v (the value) is in tally, the count is increased by 1.

Once the iteration is complete, the max() function is called with key=tally.get to get the value with the highest count in tally, which is output to the terminal.

99
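A slightly more compact way to build this kind of tally is dict.get() with a default of 0, so every grade counts as 1 the first time it is seen. The sketch below works on the list of grades directly; grades and most_common are illustrative names:

```python
grades = [99, 76, 98, 99, 77, 98, 67, 61, 54, 76,
          67, 99, 98, 49, 76, 99, 67, 54, 98, 100]

tally = {}
for grade in grades:
    # get() returns 0 for a grade that has not been seen yet.
    tally[grade] = tally.get(grade, 0) + 1

most_common = max(tally, key=tally.get)
print(most_common, tally[most_common])
# 99 4
```

Because max() returns the first maximum it meets, the tie between 99 and 98 resolves to 99, the grade that was inserted first.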

Method 4: Use max()

This example uses max() to retrieve the most common value in a Python dictionary. Simple and clean, although count() rescans the whole list for every value, so it suits small dictionaries best.

common_val = max(list(students.values()), key=list(students.values()).count)

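The code above calls the max() function and passes two (2) arguments: a List of the values of the key:value pairs of students and, as the key argument, the count method of that List. max() then returns the value for which count() is highest.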

If output to the terminal, these two (2) arguments contain the following.

print(list(students.values()))
print(list(students.values()).count)
[99, 76, 98, 99, 77, 98, 67, 61, 54, 76, 67, 99, 98, 49, 76, 99, 67, 54, 98, 100]
<built-in method count of list object at 0x00000239566D3540>

To retrieve the most common element, run the following code.

print(common_val)
99

Summary

This article has provided four (4) ways to find the most common element in a Python Dictionary. These examples should give you enough information to select the best fit for your coding requirements.

Good Luck & Happy Coding!


Programmer Humor – Blockchain

“Blockchains are like grappling hooks, in that it’s extremely cool when you encounter a problem for which they’re the right solution, but it happens way too rarely in real life.” source xkcd

Solidity by Example – Simple Open Auction (Explained)


This article continues the series we started last time: Solidity smart contract examples that implement a simplified real-world process.

Here, we’re walking through an example of a simple open auction.

🌍 Original Source Code: Solidity Docs

We’ll first lay out the entire smart contract example without the comments for readability and development purposes.

Then we’ll dissect it part by part, analyze it and explain it.

Following this path, we’ll get a hands-on experience with smart contracts, as well as good practices in coding, understanding, and debugging smart contracts.

Smart contract – Simple Open Auction

// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.4;

contract SimpleAuction {
    address payable public beneficiary;
    uint public auctionEndTime;

    address public highestBidder;
    uint public highestBid;

    mapping(address => uint) pendingReturns;

    bool ended;

    event HighestBidIncreased(address bidder, uint amount);
    event AuctionEnded(address winner, uint amount);

    error AuctionAlreadyEnded();
    error BidNotHighEnough(uint highestBid);
    error AuctionNotYetEnded(uint timeToAuctionEnd);
    error AuctionEndAlreadyCalled();

    constructor(
        uint biddingTime,
        address payable beneficiaryAddress
    ) {
        beneficiary = beneficiaryAddress;
        auctionEndTime = block.timestamp + biddingTime;
    }

    function bid() external payable {
        if (block.timestamp > auctionEndTime)
            revert AuctionAlreadyEnded();

        if (msg.value <= highestBid)
            revert BidNotHighEnough(highestBid);

        if (highestBid != 0) {
            pendingReturns[highestBidder] += highestBid;
        }

        highestBidder = msg.sender;
        highestBid = msg.value;
        emit HighestBidIncreased(msg.sender, msg.value);
    }

    function withdraw() external returns (bool) {
        uint amount = pendingReturns[msg.sender];
        if (amount > 0) {
            pendingReturns[msg.sender] = 0;

            if (!payable(msg.sender).send(amount)) {
                pendingReturns[msg.sender] = amount;
                return false;
            }
        }
        return true;
    }

    function auctionEnd() external {
        if (block.timestamp < auctionEndTime)
            revert AuctionNotYetEnded(auctionEndTime - block.timestamp);
        if (ended)
            revert AuctionEndAlreadyCalled();

        ended = true;
        emit AuctionEnded(highestBidder, highestBid);

        beneficiary.transfer(highestBid);
    }
}

Code breakdown and analysis

// SPDX-License-Identifier: GPL-3.0

Compiles only with Solidity compiler version 0.8.4 and later, but before version 0.9.

🌍 Learn More: Layout of a Solidity File

pragma solidity ^0.8.4;

contract SimpleAuction {

Parameters of the auction are variables beneficiary and auctionEndTime which we’ll initialize with contract creation arguments while the contract gets created, i.e. in the contract constructor.

Data type for time variables is unsigned integer uint, so that we can represent either absolute Unix timestamps (seconds since 1970-01-01) or time periods in seconds (seconds lapsed from the reference moment we chose).

    address payable public beneficiary;
    uint public auctionEndTime;

The current state of the auction is reflected in two variables, highestBidder and highestBid.

    address public highestBidder;
    uint public highestBid;

Previous bids can be withdrawn, which is why we use a mapping data structure to record pendingReturns.

 mapping(address => uint) pendingReturns;

Indicator flag variable for the auction end. By default, the flag is initialized to false; we’ll prevent changing it once it switches to true.

 bool ended;

When changes occur, we want our smart contract to emit the corresponding change events.

    event HighestBidIncreased(address bidder, uint amount);
    event AuctionEnded(address winner, uint amount);

We’re defining four errors to describe relevant failures. Along with these errors, we’ll also introduce “triple-slash” comments, commonly known as natspec comments. They enable users to see comments when an error is displayed or when users are asked to confirm the transaction.

🌍 Learn More: Natspec comments are formally defined in Ethereum Natural Language Specification Format.

    /// The auction has already ended.
    error AuctionAlreadyEnded();
    /// There is already a higher or equal bid.
    error BidNotHighEnough(uint highestBid);
    /// The auction has not ended yet, the remaining seconds are displayed.
    error AuctionNotYetEnded(uint timeToAuctionEnd);
    /// The function auctionEnd has already been called.
    error AuctionEndAlreadyCalled();

Initialization of the contract with the contract creation arguments biddingTime and beneficiaryAddress.

    /// Create a simple auction with `biddingTime`
    /// seconds bidding time on behalf of the
    /// beneficiary address `beneficiaryAddress`.
    constructor(
        uint biddingTime,
        address payable beneficiaryAddress
    ) {
        beneficiary = beneficiaryAddress;
        auctionEndTime = block.timestamp + biddingTime;
    }

A bidder bids by sending the currency (paying) to the smart contract representing the beneficiary, hence the bid() function is defined as payable.

🌍 Learn More: What is payable in Solidity?

    /// Bid on the auction with the value sent
    /// together with this transaction.
    /// The value will only be refunded if the
    /// auction is not won.
    function bid() external payable {

The function call reverts if the bidding period ended.

        if (block.timestamp > auctionEndTime)
            revert AuctionAlreadyEnded();

The function reverts the transaction, which returns the sent funds to the bidder, if the bid does not exceed the highest one.

        if (msg.value <= highestBid)
            revert BidNotHighEnough(highestBid);

The previous highest bidder was outbid and his bid is added to his previous bids reserved for a refund.

💡 A direct refund is considered a security risk due to the possibility of executing an untrusted contract.

Instead, the bidders (recipients) will withdraw their bids themselves by using withdraw() function below.

        if (highestBid != 0) {
            pendingReturns[highestBidder] += highestBid;
        }

The new highest bidder and his bid are recorded; the event HighestBidIncreased is emitted carrying this information pair.

        highestBidder = msg.sender;
        highestBid = msg.value;
        emit HighestBidIncreased(msg.sender, msg.value);
    }

Bidders call the withdraw() function to retrieve the amount they bid.

    /// Withdraw a bid that was overbid.
    function withdraw() external returns (bool) {
        uint amount = pendingReturns[msg.sender];
        if (amount > 0) {

It is possible to call the withdraw() function again before the send() function returns. That’s the reason why we need to disable multiple sequential withdrawals from the same sender by setting the pending returns for a sender to 0.

 pendingReturns[msg.sender] = 0;
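To see why the balance is set to 0 before the payout, here is a toy model in plain Python. It is not Solidity and not part of the original contract; the names Vault, attack, and zero_first are hypothetical, and the callback on_receive stands in for untrusted recipient code that runs during the payout:

```python
class Vault:
    """Toy stand-in for the contract's pendingReturns bookkeeping."""

    def __init__(self, pending, zero_first):
        self.pending = dict(pending)
        self.zero_first = zero_first
        self.paid_out = 0

    def withdraw(self, account, on_receive):
        amount = self.pending.get(account, 0)
        if amount == 0:
            return
        if self.zero_first:
            self.pending[account] = 0  # effect happens before the interaction
        self.paid_out += amount
        on_receive()                   # untrusted code runs here
        if not self.zero_first:
            self.pending[account] = 0  # too late: re-entry already happened


def attack(vault, account, max_calls=3):
    """Call withdraw() again from inside the payout callback."""
    calls = 0

    def reenter():
        nonlocal calls
        calls += 1
        if calls < max_calls:
            vault.withdraw(account, reenter)

    vault.withdraw(account, reenter)
    return vault.paid_out


safe_total = attack(Vault({'a': 10}, zero_first=True), 'a')
unsafe_total = attack(Vault({'a': 10}, zero_first=False), 'a')
print(safe_total, unsafe_total)
# 10 30
```

In the model, zeroing the balance first limits the payout to the 10 units owed, while zeroing it afterwards pays out three times.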

Variable type of msg.sender is not address payable, therefore we need to convert it explicitly by using function payable() as a wrapping function.

If the send() function ends with an error, we’ll just reset the pending amount and return false.

            if (!payable(msg.sender).send(amount)) {
                // No need to call throw here, just reset the amount owing
                pendingReturns[msg.sender] = amount;
                return false;
            }
        }
        return true;
    }

The auctionEnd() function ends the auction and sends the highest bid to the beneficiary.

The official Solidity documentation recommends dividing the interacting functions into three functional parts:

  • checking the conditions,
  • performing the actions, and
  • interacting with other contracts.

Otherwise, if these parts are mixed rather than kept separate, the other contract could call back into the current contract and modify its state, or cause effects (such as a payout) to be performed multiple times.

    /// End the auction and send the highest bid
    /// to the beneficiary.
    function auctionEnd() external {

Checking the conditions…

        if (block.timestamp < auctionEndTime)
            revert AuctionNotYetEnded(auctionEndTime - block.timestamp);
        if (ended)
            revert AuctionEndAlreadyCalled();

…performing the actions…

        ended = true;
        emit AuctionEnded(highestBidder, highestBid);

…and interacting with other contracts.

        beneficiary.transfer(highestBid);
    }
}

Our smart contract example is simple but powerful: it lets us bid an amount of currency in an auction run on behalf of the beneficiary.

When the contract instantiates via its constructor, it sets the auction end time and its beneficiary, i.e. beneficiary address.

The contract has three simple features, implemented via dedicated functions: bidding, withdrawing the bids and ending the auction.

A new bid is accepted only if its amount is strictly larger than the current highest bid. When a new bid is accepted, the previous highest bid is added to the previous highest bidder’s pending returns for later withdrawal. The new bidder becomes the current highest bidder and the new bid becomes the current highest bid.

Bid withdrawing returns all summed previous bids to each bidder (mapping pendingReturns).

Contract Test Scenario

Open auction duration (in seconds): 240

Beneficiary: 0x5B38Da6a701c568545dCfcB03FcB875f56beddC4

Testing/demonstration steps:

  1. Account 0xAb8483F64d9C6d1EcF9b849Ae677dD3315835cb2 bids 10 Wei;
  2. Account 0x4B20993Bc481177ec7E8f571ceCaE8A9e22C02db bids 25 Wei;
  3. Account 0x78731D3Ca6b7E34aC0F824c42a7cC18A495cabaB bids 25 Wei (rejected);
  4. Account 0x617F2E2fD72FD9D5503197092aC168c91465E7f2 bids 35 Wei;
  5. Account 0xAb8483F64d9C6d1EcF9b849Ae677dD3315835cb2 bids 40 Wei + initiates premature auction end;
  6. Account 0xAb8483F64d9C6d1EcF9b849Ae677dD3315835cb2 withdraws his bids;
  7. Account 0x4B20993Bc481177ec7E8f571ceCaE8A9e22C02db withdraws his bids;
  8. Account 0x78731D3Ca6b7E34aC0F824c42a7cC18A495cabaB withdraws his bids;
  9. Account 0x78731D3Ca6b7E34aC0F824c42a7cC18A495cabaB initiates timely auction end;
  10. Account 0x617F2E2fD72FD9D5503197092aC168c91465E7f2 withdraws his bids;
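Before running the scenario in Remix, it can help to predict the bookkeeping on paper. The sketch below is a plain-Python model of the bid and withdraw logic only. It is an illustration under simplifying assumptions (no timing, no gas, no real transfers), the class name AuctionModel is hypothetical, and the account addresses are shortened to their first characters:

```python
class AuctionModel:
    """Plain-Python model of the bid/withdraw bookkeeping."""

    def __init__(self):
        self.highest_bidder = None
        self.highest_bid = 0
        self.pending_returns = {}

    def bid(self, sender, value):
        # Mirrors: if (msg.value <= highestBid) revert BidNotHighEnough(...)
        if value <= self.highest_bid:
            return False
        if self.highest_bid != 0:
            previous = self.highest_bidder
            self.pending_returns[previous] = (
                self.pending_returns.get(previous, 0) + self.highest_bid
            )
        self.highest_bidder = sender
        self.highest_bid = value
        return True

    def withdraw(self, sender):
        # Returns the refunded amount and clears the balance.
        return self.pending_returns.pop(sender, 0)


auction = AuctionModel()
bids = [('0xAb84', 10), ('0x4B20', 25), ('0x7873', 25),
        ('0x617F', 35), ('0xAb84', 40)]
results = [auction.bid(sender, value) for sender, value in bids]
print(results)
# [True, True, False, True, True]

print(auction.highest_bidder, auction.highest_bid)
# 0xAb84 40

accounts = ('0xAb84', '0x4B20', '0x7873', '0x617F')
refunds = {who: auction.withdraw(who) for who in accounts}
print(refunds)
# {'0xAb84': 10, '0x4B20': 25, '0x7873': 0, '0x617F': 35}
```

The rejected bidder has nothing to withdraw, and the winner gets back only the earlier bid of 10 Wei, not the winning 40 Wei.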

Appendix – The Contract Arguments

This section contains additional information for running the contract. We should expect that our example accounts may change with each refresh/reload of Remix.

Our contract creation arguments are the open auction duration (in seconds) and the beneficiary address (copy this line when deploying the example):

300, 0x5B38Da6a701c568545dCfcB03FcB875f56beddC4

💡 Info: we could’ve used any amount of time, but I went with 300 seconds because it leaves enough time to simulate both a rejected attempt to end the auction and the successful ending of the auction.

Conclusion

We continued our smart contract example series with this article that implements a simple open auction.

First, we laid out clean source code (without any comments) for readability purposes. Omitting the comments is not recommended, but we love living on the edge – and trying to be funny! 😀

Second, we dissected the code, analyzed it, and explained each possibly non-trivial segment. Just because we’re terrific, safe players who never risk it and do everything by the book 🙂


Programmer Humor – Blockchain

“Blockchains are like grappling hooks, in that it’s extremely cool when you encounter a problem for which they’re the right solution, but it happens way too rarely in real life.” source xkcd

Python – Finding the Most Common Element in a Column


Problem Formulation and Solution Overview

This article will show you how to find the most common element in a Pandas Column.

To make it more interesting, we have the following running scenario:

You have been provided with a downloadable CSV file containing crime statistics for the San Diego area, including their respective NCIC Crime Codes.


💬 Question: How would you determine the most common NCIC Crime Code that occurs in San Diego’s jurisdiction?

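We can accomplish this task by one of the following options:

  • Method 1: Use Pandas mode()
  • Method 2: Use value_counts()
  • Method 3: Use value_counts() and idxmax()
  • Method 4: Use value_counts() and keys()
  • Method 5: Use groupby()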


Preparation

Before moving forward, please ensure the Pandas library is installed. Click here if you require instructions.

Then, add the following code to the top of each script. This snippet will allow the code in this article to run error-free.

import pandas as pd

After importing the Pandas library, it is referenced through its alias (pd).


Method 1: Use Pandas mode()

This example uses the mode() method to determine the single most common crime committed in San Diego on a given day.

df = pd.read_csv('crimes.csv', usecols=['crimedescr'])
max_crime = df['crimedescr'].mode()
print(max_crime)

The above code reads in the crimedescr column from the crimes.csv file downloaded earlier. This saves to the DataFrame df.

Next, the crimedescr column is then accessed, and the mode() method is appended. This method returns a value or set of values that appear most often along a selected axis. The results save to max_crime.

These results are output to the terminal.

0 10851(A)VC TAKE VEH W/O OWNER
Name: crimedescr, dtype: object

So, out of 7,854 rows of crimes committed on a given day for San Diego, the above offense was committed the highest number of times.

The above code only provides us with the name of the most common crime; what if we also need the most common NCIC crime code?

df = pd.read_csv('crimes.csv', usecols=['crimedescr', 'ucr_ncic_code'])
max_crime = df['crimedescr'].mode()
max_code = df['ucr_ncic_code'].mode()

print(max_crime)
print(max_code)

The above code is output to the terminal and displays the following.

0 10851(A)VC TAKE VEH W/O OWNER
Name: crimedescr, dtype: object
0 7000
Name: ucr_ncic_code, dtype: int64

Now, you are equipped to return to your boss and tell them that 10851(A)VC TAKE VEH W/O OWNER was the most common offense and that 7000 was the most common NCIC code on a given day in San Diego. Note that mode() is computed for each column independently, so 7000 is a crime code, not the number of times the offense occurred. For the counts, see value_counts() in Method 2.


Method 2: Use value_counts()

This example uses the value_counts() function to determine the top 5 most common crimes committed in San Diego on a given day.

df = pd.read_csv('crimes.csv', usecols=['crimedescr', 'ucr_ncic_code'])
top5_names = df['crimedescr'].value_counts()[:5].index.tolist()
print(top5_names)

The above code reads in the crimedescr and ucr_ncic_code columns from the crimes.csv file downloaded earlier. This saves to the DataFrame df.

Then, the crimedescr column is accessed, and the value_counts() function is appended. This function returns a series containing the counts of unique values.

However, since slicing is also appended ([:5]), only the top five (5) common crimes are retrieved and then converted to a List. The results save to top5_names.

['10851(A)VC TAKE VEH W/O OWNER', 'TOWED/STORED VEH-14602.6', '459 PC BURGLARY VEHICLE', 'TOWED/STORED VEHICLE', '459 PC BURGLARY RESIDENCE']

The above code only provides us with the names of the top 5 most common crimes; what if we need the names and their respective counts?

df = pd.read_csv('crimes.csv', usecols=['crimedescr', 'ucr_ncic_code'])
top5 = df['crimedescr'].value_counts()[:5].sort_values(ascending=False)
print(top5)

The above output is sent to the terminal.

10851(A)VC TAKE VEH W/O OWNER 653
TOWED/STORED VEH-14602.6 463
459 PC BURGLARY VEHICLE 462
TOWED/STORED VEHICLE 434
459 PC BURGLARY RESIDENCE 356
Name: crimedescr, dtype: int64

A cleaner way to achieve the same results is to use the following code.

df = pd.read_csv('crimes.csv', usecols=['crimedescr', 'ucr_ncic_code'])
top5 = df['crimedescr'].value_counts().nlargest(5)
print(top5)

The above code calls the nlargest() method to determine and retrieve the top five (5) common crimes. The output is identical to the above.

10851(A)VC TAKE VEH W/O OWNER 653
TOWED/STORED VEH-14602.6 463
459 PC BURGLARY VEHICLE 462
TOWED/STORED VEHICLE 434
459 PC BURGLARY RESIDENCE 356
Name: crimedescr, dtype: int64

A much cleaner and more precise output to send to the boss!


Method 3: Use value_counts() and idxmax()

This example uses value_counts() and idxmax() to determine the single most common crime committed in San Diego on a given day.

df = pd.read_csv('crimes.csv', usecols=['crimedescr', 'ucr_ncic_code'])
max_crime = df['crimedescr'].value_counts().idxmax()
print(max_crime)

The above code reads in the crimedescr and ucr_ncic_code columns from the crimes.csv file downloaded earlier. This saves to the DataFrame df.

Then, the crimedescr column is accessed, and the value_counts() function is appended. This function returns a series containing the count of unique values.

Next, idxmax() is appended. This method returns the index label of the first occurrence of the maximum value, which here is the crime description with the highest count.

The results save to max_crime and are output to the terminal.

10851(A)VC TAKE VEH W/O OWNER
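If you want the count together with the name, you can keep the result of value_counts() in a variable and read both from it. This is a minimal sketch; the small DataFrame below is hypothetical sample data standing in for the crimedescr column of crimes.csv:

```python
import pandas as pd

# Hypothetical sample data standing in for the crimedescr column.
df = pd.DataFrame({
    'crimedescr': ['BURGLARY', 'THEFT', 'BURGLARY',
                   'VANDALISM', 'BURGLARY', 'THEFT'],
})

counts = df['crimedescr'].value_counts()
max_crime = counts.idxmax()  # label with the highest count
max_count = counts.max()     # the count itself
print(max_crime, max_count)
# BURGLARY 3
```

Storing counts once avoids scanning the column twice.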

Method 4: Use value_counts() and keys()

This example uses value_counts() and keys() to determine the top 5 most common crimes committed in unique grid areas of San Diego on a given day.

df = pd.read_csv('crimes.csv', usecols=['crimedescr', 'grid', 'ucr_ncic_code'])
top5_grids = df['grid'].value_counts().keys()[:5]
print(top5_grids)

The above code reads in the crimedescr, grid, and the ucr_ncic_code columns from the crimes.csv file downloaded earlier. This saves to the DataFrame df.

Let’s break the second line down.

If df['grid'].value_counts() was output to the terminal, the following would display (snippet). However, we have added a heading row to make it more understandable, and only five (5) rows are displayed.

Grid # Grid Total
742 115
969 105
958 100
564 80
1084 71

Next, the code keys()[:5] is appended. The final output displays as follows.

Int64Index([742, 969, 958, 564, 1084], dtype='int64')

Method 5: Use groupby()

This example uses groupby() to group our data on the Crime Code and displays the totals in descending order.

df = pd.read_csv('crimes.csv', usecols=['crimedescr', 'ucr_ncic_code'])
res = (df.groupby(['ucr_ncic_code','crimedescr']).size()
       .sort_values(ascending=False)
       .reset_index(name='count'))
print(res)

The above code reads in the crimedescr and the ucr_ncic_code columns from the crimes.csv file downloaded earlier. This saves to the DataFrame df.

Next, groupby() groups the rows by the ucr_ncic_code and crimedescr columns, and size() counts the rows in each group: df.groupby(['ucr_ncic_code','crimedescr']).size(). If this was output to the terminal at this point, the following would display (snippet).

print(df.groupby(['ucr_ncic_code','crimedescr']).size())
ucr_ncic_code crimedescr
909 2
999 1
197 1
664 1
1099 1

As you can see, the remaining method calls need to be added to turn this into something usable. Sorting the data in descending order and adding a count column will provide the results we are looking for.

If the original Method 5 code example was output to the terminal, the following would display.

ucr_ncic_code crimedescr count
0 2404 10851(A)VC TAKE VEH W/O OWNER 653
1 7000 TOWED/STORED VEH-14602.6 463
2 2299 459 PC BURGLARY VEHICLE 462
3 7000 TOWED/STORED VEHICLE 434
4 2204 459 PC BURGLARY RESIDENCE 356

Summary

This article has provided five (5) ways to find the most common element in a Pandas column. These examples should provide you with enough information to select the one that best meets your coding requirements.

Good Luck & Happy Coding!


Programming Humor – Python

“I wrote 20 short programs in Python yesterday. It was wonderful. Perl, I’m leaving you.” source xkcd

Python TypeError: NoneType is Not Subscriptable (Fix This Stupid Bug)


Do you encounter the following error message?

TypeError: NoneType is not subscriptable

You’re not alone! This short tutorial will show you why this error occurs, how to fix it, and how to never make the same mistake again.

So, let’s get started!

Summary

Python raises the TypeError: 'NoneType' object is not subscriptable if you try to index x[i] or slice x[i:j] a None value. The None type is not indexable, i.e., it doesn’t define the __getitem__() method. You can fix it by removing the indexing or slicing call, or by using an object that defines the __getitem__() method instead of None.

Example

 TypeError: 'NoneType' object is not subscriptable

The following minimal example leads to the error:

x = None
print(x[0])
# TypeError: 'NoneType' object is not subscriptable

You set the variable to the value None. The value None is not a container object, so it doesn’t contain other objects. The code really doesn’t make any sense: which result do you expect from the indexing operation?


If you struggle with indexing in Python, have a look at the following articles on the Finxter blog—especially the third!

🌍 Related Articles:

Fixes

You can fix the non-subscriptable TypeError by wrapping the non-indexable values into a container data type such as a list in Python:

x = [None]
print(x[0])
# None

The output now is the value None and the script doesn’t yield an error message anymore.

An alternative is to define the __getitem__() method in your code:

class X:
    def __getitem__(self, i):
        return f"Value {i}"

variable = X()
print(variable[0])
# Value 0

🌍 Related Tutorial: Python __getitem__() magic method

You define the __getitem__ method, which takes one (index) argument i (in addition to the obligatory self argument) and returns the i-th value of the “container”.

In our case, we just return a string "Value 0" for the element variable[0] and "Value 10" for the element variable[10].
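In real code, the None usually comes from a function or method that returns nothing rather than from an explicit x = None. A common case is list.sort(), which sorts in place and returns None. The sketch below shows the pitfall and two hedged ways around it, a None check and sorted(); the variable names are only illustrative:

```python
numbers = [3, 1, 2]

# list.sort() sorts in place and returns None, so this assigns None.
result = numbers.sort()
print(result)
# None

# Guard before indexing so a None value never reaches the brackets.
if result is not None:
    print(result[0])

# sorted() returns a new list, which is safe to index.
ordered = sorted([3, 1, 2])
print(ordered[0])
# 1
```

Checking where the variable was assigned is often the fastest way to find the source of the None.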

🌍 Full Guide: Python Fixing This Subscriptable Error (General)

What’s Next?

I hope you were able to fix the bug in your code! Before you go, check out our free Python cheat sheets that’ll teach you the basics of Python in minimal time:


[Fixed] Matplotlib: TypeError: ‘AxesSubplot’ object is not subscriptable


Problem Formulation

Say, you’re me 👱‍♂️ five minutes ago, and you want to create a Matplotlib plot using the following (genius) code snippet:

import matplotlib.pyplot as plt

fig, axes = plt.subplots()
axes[0, 0].plot([1, 2, 3], [9, 8, 7])
plt.show()

If you run this code, instead of the desired plot, you get the following TypeError: 'AxesSubplot' object is not subscriptable:

Traceback (most recent call last):
  File "C:\Users\xcent\Desktop\code.py", line 4, in <module>
    axes[0, 0].plot([1, 2, 3], [9, 8, 7])
TypeError: 'AxesSubplot' object is not subscriptable

💬 Question: How to resolve the TypeError: 'AxesSubplot' object is not subscriptable in your Python script?

Don’t panic! 📘 The solution is easier than you think…

Fix Not Subscriptable TypeError on ‘AxesSubplot’ Object

💡 Generally, Python raises the TypeError XXX object is not subscriptable if you use indexing with the square bracket notation on an object that is not indexable. In this case, you tried to index an Axes object because you thought it was an array of Axes objects.

Let’s go over the code to understand why the error happened!

First, you assign the result of the plt.subplots() function to the two variables fig and axes.

fig, axes = plt.subplots()

If you don’t pass an argument in the plt.subplots() function, it creates a Figure with one Axes object.

So if you try to subscript using axes[0,0], axes[0], or any other indexing scheme, Python will raise an error. It’s simple: axes doesn’t hold a container type so it cannot be indexed using the square bracket notation!

So to fix the TypeError: 'AxesSubplot' object is not subscriptable, simply remove the indexing notation on the axes object obtained by plt.subplots() called without arguments.

import matplotlib.pyplot as plt

fig, axes = plt.subplots()
axes.plot([1, 2, 3], [9, 8, 7]) # not: axes[0, 0]
plt.show()

Now it works — here’s the output:

What is the Reason for the Error?

However, this error is tough to spot because if you request more than one subplot from the plt.subplots() function, it creates a Figure and a NumPy array of Axes objects, which you store in fig and axes respectively.

For example, this creates a non-subscriptable axes because you don’t pass any argument:

fig, axes = plt.subplots()

By contrast, this creates a subscriptable one-dimensional array of axes because you request three subplots:

fig, axes = plt.subplots(3)

And this creates a subscriptable two-dimensional array of axes because you request a grid of three rows and two columns:

fig, axes = plt.subplots(3, 2)

No wonder you thought that you could call axes[0,0] or axes[0] on the return value of the plt.subplots() function! However, doing so is only possible if you requested more than one subplot: axes[0] works on the one-dimensional array, and axes[0, 0] works on the two-dimensional grid.
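If you would rather index the result the same way no matter how many subplots you request, plt.subplots() accepts a squeeze parameter. With squeeze=False, it always returns a two-dimensional array of Axes, even for a single subplot. A minimal sketch follows; the Agg backend is selected only so the example runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

# squeeze=False always returns a 2D array of Axes.
fig, axes = plt.subplots(squeeze=False)
print(axes.shape)
# (1, 1)

# Indexing now works even with a single subplot.
axes[0, 0].plot([1, 2, 3], [9, 8, 7])

fig2, axes2 = plt.subplots(3, 2, squeeze=False)
print(axes2.shape)
# (3, 2)
```

This costs a slightly more verbose index for the single-plot case, but code that loops over subplots no longer needs a special case.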

Make sure you never run into similar errors by spending a couple of minutes understanding the plt.subplots() function once and for all!

Learn More about plt.subplots()

To further understand the subplots() function, check out our detailed guide on the Finxter blog and the following video:

YouTube Video

🌍 Full Tutorial: Matplotlib Subplots – A Helpful Illustrated Guide