Summary: Use urllib.parse.urljoin() to combine a base URL with a scraped relative path into the complete (absolute) URL. You can also concatenate the base URL and the relative path manually to derive the absolute path, but make sure to handle edge cases such as an extra forward slash in that case.
Quick Answer
When web scraping with BeautifulSoup in Python, you may encounter relative URLs (e.g., /page2.html) instead of absolute URLs (e.g., http://example.com/page2.html). To convert relative URLs to absolute URLs, you can use the urljoin() function from the urllib.parse module.
Below is an example of how to extract absolute URLs from the a tags on a webpage using BeautifulSoup and urljoin:
from bs4 import BeautifulSoup
import requests
from urllib.parse import urljoin

# URL of the webpage you want to scrape
url = 'http://example.com'

# Send an HTTP request to the URL
response = requests.get(url)
response.raise_for_status()  # Raise an error for bad responses

# Parse the webpage content
soup = BeautifulSoup(response.text, 'html.parser')

# Find all the 'a' tags on the webpage
for a_tag in soup.find_all('a'):
    # Get the href attribute from the 'a' tag
    href = a_tag.get('href')
    # Use urljoin to convert the relative URL to an absolute URL
    absolute_url = urljoin(url, href)
    # Print the absolute URL
    print(absolute_url)
In this example:
url is the URL of the webpage you want to scrape.
response is the HTTP response obtained by sending an HTTP GET request to the URL.
soup is a BeautifulSoup object that contains the parsed HTML content of the webpage.
soup.find_all('a') finds all the a tags on the webpage.
a_tag.get('href') gets the href attribute from an a tag, which is the relative URL.
urljoin(url, href) converts the relative URL to an absolute URL by joining it with the base URL.
absolute_url is the absolute URL, which is printed to the console.
Now that you have a quick overview, let’s dive deeper into the specific problem and discuss various methods to solve it easily and effectively.
Problem Formulation
Problem: How do you extract all the absolute URLs from an HTML page?
Example: Consider the following webpage which has numerous links:
Now, when you try to scrape the links as highlighted above, you find that only the relative links/paths are extracted instead of the entire absolute path. Let us have a look at the code given below, which demonstrates what happens when you try to extract the 'href' elements normally.
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import requests

web_url = 'https://sayonshubham.github.io/'
headers = {"User-Agent": "Mozilla/5.0 (CrKey armv7l 1.5.16041) AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/31.0.1650.0 Safari/537.36"}
# get() Request
response = requests.get(web_url, headers=headers)
# Store the webpage contents
webpage = response.content
# Check Status Code (Optional)
# print(response.status_code)
# Create a BeautifulSoup object out of the webpage content
soup = BeautifulSoup(webpage, "html.parser")
for i in soup.find_all('nav'):
    for url in i.find_all('a'):
        print(url['href'])
Output:
/
/about
/blog
/finxter
/
The above output is not what you desired. You wanted to extract the absolute paths as shown below:
According to the Python documentation, urllib.parse.urljoin() is used to construct a full (absolute) URL by combining the "base URL" with another URL. The advantage of using urljoin() is that it properly resolves the relative path, whether the base URL is just the domain or the full URL of the webpage.
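To get a feel for how urljoin() resolves different kinds of hrefs, here is a small demonstration (the example URLs are illustrative):

```python
from urllib.parse import urljoin

base = "https://example.com/"

# A root-relative href is resolved against the domain
full1 = urljoin(base, "/about")
# A plain relative href is resolved against the current page's directory
full2 = urljoin("https://example.com/blog/post1", "about")
# An already-absolute href is returned unchanged
full3 = urljoin(base, "https://other.org/x")

print(full1)  # https://example.com/about
print(full2)  # https://example.com/blog/about
print(full3)  # https://other.org/x
```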
Now that we have an idea about urljoin, let us have a look at the following code which successfully resolves our problem and helps us to extract the complete/absolute paths from the HTML page.
Solution:
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import requests

web_url = 'https://sayonshubham.github.io/'
headers = {"User-Agent": "Mozilla/5.0 (CrKey armv7l 1.5.16041) AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/31.0.1650.0 Safari/537.36"}
# get() Request
response = requests.get(web_url, headers=headers)
# Store the webpage contents
webpage = response.content
# Check Status Code (Optional)
# print(response.status_code)
# Create a BeautifulSoup object out of the webpage content
soup = BeautifulSoup(webpage, "html.parser")
for i in soup.find_all('nav'):
    for url in i.find_all('a'):
        # Resolve each relative href against the base URL
        print(urljoin(web_url, url.get('href')))
Method 2: Concatenate The Base URL And Relative URL Manually
Another workaround for our problem is to concatenate the base part of the URL and the relative URLs manually, just like two ordinary strings. The problem in this case is that manual concatenation can introduce subtle errors, such as a duplicated forward slash (/):
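Here’s a small demonstration of the pitfall, using the tutorial’s base URL (the href value is illustrative):

```python
base = "https://sayonshubham.github.io/"
href = "/about"

# Naive concatenation duplicates the slash
naive = base + href
# Stripping the leading slash from the href avoids the duplicate
fixed = base + href.lstrip("/")

print(naive)  # https://sayonshubham.github.io//about
print(fixed)  # https://sayonshubham.github.io/about
```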
Therefore, to ensure proper concatenation, you have to modify your code so that any extra character that might lead to errors is removed. Let us have a look at the following code, which concatenates the base and relative paths without any extra forward slash.
Solution:
from bs4 import BeautifulSoup
import requests

web_url = 'https://sayonshubham.github.io/'
headers = {"User-Agent": "Mozilla/5.0 (CrKey armv7l 1.5.16041) AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/31.0.1650.0 Safari/537.36"}
# get() Request
response = requests.get(web_url, headers=headers)
# Store the webpage contents
webpage = response.content
# Check Status Code (Optional)
# print(response.status_code)
# Create a BeautifulSoup object out of the webpage content
soup = BeautifulSoup(webpage, "html.parser")
for i in soup.find_all('nav'):
    for url in i.find_all('a'):
        # extract the href string
        x = url['href']
        # remove the extra forward-slash if present
        if x[0] == '/':
            print(web_url + x[1:])
        else:
            print(web_url + x)
Caution: This is not the recommended way of extracting the absolute path from a given HTML page. If you have an automated script that needs to resolve URLs, but at the time of writing the script you don’t know which website it will visit, this method won’t serve your purpose; your go-to method then is urljoin(). Nevertheless, this method deserves a mention because, in our case, it successfully serves the purpose and helps us extract the absolute URLs.
Conclusion
In this article, we learned how to extract absolute links from a given HTML page using BeautifulSoup. If you want to master the concepts of Python’s BeautifulSoup library and dive deep into them with examples and video lessons, please have a look at the following link and follow the articles one by one, where you will find every aspect of BeautifulSoup explained in great detail.
To add trailing zeros to a string up to a certain length in Python, convert the number to a string and use the ljust(width, '0') method. Call this method on the string, specifying the total desired width and the padding character '0'. This will append zeros to the right of the string until the specified width is achieved.
Challenge: Given an integer, how do you convert it to a string by adding trailing zeros so that the string has a fixed number of positions?
Example: For integer 42, you want to fill it up with trailing zeros to the following string with 5 characters: '42000'.
In all methods, we assume that the integer has less than 5 characters.
Method 1: string.ljust()
In Python, you can use the str.ljust() method to pad zeros (or any other character) to the right of a string. The ljust() method returns the string left-justified in a field of a given width, padded with a specified character (default is space).
Below is an example of how to use ljust() to add trailing zeros to a number:
# Integer value to be converted
i = 42

# Convert the integer to a string
s = str(i)

# Use ljust to add trailing zeros, specifying the total width and the padding character ('0')
s_padded = s.ljust(5, '0')

print(s_padded)
# Output: '42000'
In this example:
str(i) converts the integer i to a string.
s.ljust(5, '0') pads the string s with zeros to the right to make the total width 5 characters.
This is the most Pythonic way to accomplish this challenge.
Method 2: Format String
The second method uses the format string feature available in Python 3.6+ called f-strings, which use replacement fields.
Info: In Python, f-strings allow for the embedding of expressions within strings by prefixing a string with the letter "f" or "F" and enclosing expressions within curly braces {}. The expressions within the curly braces in the f-string are evaluated, and their values are inserted into the resulting string. This allows for a concise and readable way to include variable values or complex expressions within string literals.
The following f-string converts an integer i to a string while adding trailing zeros to a given integer:
# Integer value to be converted
i = 42

# Convert the integer to a string, then use a format specifier to pad it to width 5
s1 = f'{str(i):<5}'
s1 = s1.replace(" ", "0")  # replace the padding spaces with zeros

print(s1)
# 42000
The code f'{str(i):<5}' first converts the integer i to a string. The :<5 format specifier aligns the string to the left and pads with spaces to make the total width 5. Then we replace the padded spaces with zeros using the string.replace() function.
Method 3: List Comprehension
Many Python coders don’t quite get the f-strings and the ljust() method shown in Methods 1 and 2. If you don’t have time to learn them, you can also use a more standard way based on string concatenation and list comprehension.
You first convert the integer to a basic string. Then, you concatenate the integer’s string representation to the string of 0s, filled up to n=5 characters. The asterisk operator creates a string of 5-len(s3) zeros here.
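Here’s a minimal sketch of this approach (the names s3, result, and result2 are illustrative):

```python
i = 42
s3 = str(i)

# Concatenate (5 - len(s3)) zero characters using the asterisk operator
result = s3 + "0" * (5 - len(s3))
print(result)  # 42000

# The same idea written with a list comprehension and join()
result2 = s3 + "".join(["0" for _ in range(5 - len(s3))])
print(result2)  # 42000
```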
Programmer Humor
“Real programmers set the universal constants at the start such that the universe evolves to contain the disk with the data they want.” — xkcd
The Python xlrd library reads data and formatting information from Excel files in the historical .xls format. Note that it won’t read anything other than .xls files.
pip install xlrd
The Python xlrd library is among the top 100 Python libraries, with more than 17,375,582 downloads. This article will show you everything you need to install this library in your Python environment.
Alternatively, you may use any of the following commands to install xlrd, depending on your concrete environment. One is likely to work!
If you have only one version of Python installed:
pip install xlrd

If you have Python 3 (and, possibly, other versions) installed:
pip3 install xlrd

If you don't have PIP or it doesn't work:
python -m pip install xlrd
python3 -m pip install xlrd

If you have Linux and you need to fix permissions (any one):
sudo pip3 install xlrd
pip3 install xlrd --user

If you have Linux with apt:
sudo apt install python3-xlrd

If you have Windows and have set up the py alias:
py -m pip install xlrd

If you have Anaconda:
conda install -c anaconda xlrd

If you have Jupyter Notebook:
!pip install xlrd
!pip3 install xlrd
Let’s dive into the installation guides for the different operating systems and environments!
How to Install xlrd on Windows?
Type "cmd" in the search bar and hit Enter to open the command line.
Type “pip install xlrd” (without quotes) in the command line and hit Enter again. This installs xlrd for your default Python installation.
The previous command may not work if you have both Python versions 2 and 3 on your computer. In this case, try "pip3 install xlrd" or “python -m pip install xlrd“.
Wait for the installation to terminate successfully. It is now installed on your Windows machine.
Here’s how to open the command line on a (German) Windows machine:
First, try the following command to install xlrd on your system:
pip install xlrd
Second, if this leads to an error message, try this command to install xlrd on your system:
pip3 install xlrd
Third, if both do not work, use the following long-form command:
python -m pip install xlrd
The difference between pip and pip3 is that pip3 is an updated version of pip for Python version 3. Depending on what’s first in the PATH variable, pip will refer to your Python 2 or Python 3 installation—and you cannot know which without checking the environment variables. To resolve this uncertainty, you can use pip3, which will always refer to your default Python 3 installation.
How to Install xlrd on Linux?
You can install xlrd on Linux in four steps:
Open your Linux terminal or shell
Type “pip install xlrd” (without quotes), hit Enter.
If it doesn’t work, try "pip3 install xlrd" or “python -m pip install xlrd“.
Wait for the installation to terminate successfully.
The package is now installed on your Linux operating system.
How to Install xlrd on macOS?
Similarly, you can install xlrd on macOS in four steps:
Open your macOS terminal.
Type “pip install xlrd” without quotes and hit Enter.
If it doesn’t work, try "pip3 install xlrd" or “python -m pip install xlrd“.
Wait for the installation to terminate successfully.
The package is now installed on your macOS.
How to Install xlrd in PyCharm?
Given a PyCharm project. How to install the xlrd library in your project within a virtual environment or globally? Here's a solution that always works:
Open File > Settings > Project from the PyCharm menu.
Select your current project.
Click the Python Interpreter tab within your project tab.
Click the small + symbol to add a new library to the project.
Now type in the library to be installed, in our example "xlrd" without quotes, and click Install Package.
Wait for the installation to terminate and close all pop-ups.
Here's the general package installation process as a short animated video; it works analogously for xlrd if you type in “xlrd” in the search field instead:
Make sure to select only “xlrd” because there may be other packages that are not required but also contain the same term (false positives):
How to Install xlrd in a Jupyter Notebook?
To install any package in a Jupyter notebook, you can prefix the pip install statement with the exclamation mark "!". This works for the xlrd library too:
!pip install xlrd
This automatically installs the xlrd library when the cell is first executed.
How to Resolve ModuleNotFoundError: No module named ‘xlrd’?
Say you try to import the xlrd package into your Python script without installing it first:
import xlrd
# ... ModuleNotFoundError: No module named 'xlrd'
Because you haven’t installed the package, Python raises a ModuleNotFoundError: No module named 'xlrd'.
To fix the error, install the xlrd library using “pip install xlrd” or “pip3 install xlrd” in your operating system’s shell or terminal first.
See above for the different ways to install xlrd in your environment. Also check out my detailed article:
If you want to keep improving your Python skills and learn about new and exciting technologies such as Blockchain development, machine learning, and data science, check out the Finxter free email academy with cheat sheets, regular tutorials, and programming puzzles.
Although open-source LLMs are now widely used and studied, they faced initial challenges and criticism. Early attempts at creating open-source LLMs like OPT and BLOOM had poor performance compared to closed-source models.
This led researchers to realize the need for higher-quality base models pre-trained on larger datasets with trillions (!) of tokens!
OPT: 180 billion tokens
BLOOM: 341 billion tokens
LLaMa: 1.4 trillion tokens
MPT: 1 trillion tokens
Falcon: 1.5 trillion tokens
LLaMA 2: 2 trillion tokens
However, pre-training these models is expensive and requires organizations with sufficient funding to make them freely available to the community.
This article focuses on high-performing open-source base models that have significantly improved the field. A great graphic of the historical context of open-source LLMs is presented on the LangChain page:
How can we determine the best of those? Easy: with chatbot leaderboards like the one on Hugging Face:
At the time of writing, the best non-commercial LLM is Vicuna-33B. Among closed-source models, OpenAI's GPT-4 and Anthropic's Claude still lead overall.
By the way, feel free to check out my article on Claude 2, which has proven to be one of the most powerful free but closed-source LLMs:
The introduction of LLaMA 1 and 2 was a significant step in improving the quality of open-source LLMs. LLaMA is a suite of different LLMs with sizes ranging from 7 billion to 65 billion parameters. These models strike a balance between performance and inference efficiency.
LLaMA models are pre-trained on a corpus containing over 1.4 trillion tokens of text, making it one of the largest open-source datasets available. The release of LLaMA models sparked an explosion of open-source research and development in the LLM community.
Here are a few open-source LLMs that were kicked off after the release of LLaMA: Alpaca, Vicuna, Koala, GPT4All:
LLaMA-2, the latest release, sets a new state-of-the-art among open-source LLMs. These models are pre-trained on 2 trillion tokens of publicly available data and utilize a novel approach called Grouped Query Attention (GQA) to improve inference efficiency.
MPT, another commercially-usable open-source LLM suite, was released by MosaicML. MPT-7B and MPT-30B models gained popularity due to their performance and ability to be used in commercial applications. While these models perform slightly worse than proprietary models like GPT-based variants, they outperform other open-source models.
Falcon, an open-source alternative to proprietary models, was the first to match the quality of closed-source LLMs. Falcon-7B and Falcon-40B models are commercially licensed and perform exceptionally well. They are pre-trained on a custom-curated corpus called RefinedWeb, which contains over 5 trillion tokens of text.
TLDR: Open-source LLMs include OPT, BLOOM, LLaMa, MPT, and Falcon, each pre-trained on extensive tokens. LLaMa-2 and Falcon stand out for their innovative approaches and extensive training data.
For the best open-source LLM, consider using Vicuna-33B for its superior performance among non-commercial options.
Also, make sure to check out my other article on the Finxter blog:
A couple of years ago, I watched a TED talk that changed my life.
I had just finished my computer science master’s degree and was starting out as a fresh Ph.D. student in the department of distributed systems…
… and I was overwhelmed.
Many computer science students read the Finxter blog, so I hope they find a few encouraging words in this article.
Not only was I overwhelmed, but I seriously doubted my ability to finish the doctoral research program successfully.
I was so impressed by my colleagues, who were much smarter, wittier, and better coders.
So what were (some of) the things that were bothering me?
Reading and understanding code.
Reading and understanding research papers.
Designing algorithms.
Maths.
Presenting stuff.
English.
Writing scientifically.
“Selling” my approaches to my supervisors.
The list goes on and on — and I really felt like an imposter not worthy to contribute to the scientific community.
~~~
Then I watched the TED talk from a former investment banker who claimed to possess the formula to achieve anything.
The formula: break the big task into a series of small tasks. Then just keep doing the small tasks (and don’t stop).
I know it sounds lame, but it really resonated with me. So I approached my problem from first principles: What must I do to finish my dissertation within four years?
I need to publish at least four research papers.
I need to submit at least ten times to top conferences — maybe even more often.
I need to create a 10,000-word research paper every three months or so.
I need to write (or edit) 300 words every day.
So my output was clear: if I just do this one thing (it’s really easy to write 300 words) — I will have enough written content for my dissertation.
Quality comes as a byproduct of massive quantity.
But to produce output, any system needs input. To brew tasty coffee, put in the right ingredients: high-quality beans and pure water. To produce better outputs, just feed the system with better inputs.
Question: What’s the input that helps me produce excellent 300-word written output?
Answer: Read papers from top conferences.
So the formula boils down to:
INPUT: read (at least skim over) one paper a day from a top conference in my research area.
OUTPUT: generate 300 words for the current paper project.
That’s it. After I developed this formula, the remaining three and a half years were simple: follow this straightforward recipe to the best of my abilities, even with serious distractions, doubts, highs, and lows.
The day before I published this article originally (in 2019), I delivered my defense. Based on my sample size of one, the system works!
So what is your BIG TASK that is overwhelming you? How can you break it into a series of small outputs that guarantees your success? What is the input that helps you generate this kind of output?
As a Python developer, you might have come across the concept of asynchronous programming. Asynchronous programming, or async I/O, is a concurrent programming design that has received dedicated support in Python, evolving rapidly from Python 3.4 through 3.7 and beyond. With async I/O, you can manage multiple tasks concurrently without the complexities of parallel programming, making it a perfect fit for I/O bound and high-level structured network code.
In the Python world, the asyncio library is your go-to tool for implementing asynchronous I/O. This library provides various high-level APIs to run Python coroutines concurrently, giving you full control over their execution. It also enables you to perform network I/O, Inter-process Communication (IPC), control subprocesses, and synchronize concurrent code using tasks and queues.
Understanding Asyncio
In the world of Python programming, asyncio plays a crucial role in designing efficient and concurrent code without using threads. It is a library that helps you manage tasks, event loops, and coroutines. To fully benefit from asyncio, you must understand some key components.
First, let’s start with coroutines. They are special functions that can pause their execution at specified points without completely terminating it. In Python, you declare a coroutine using the async def syntax.
For instance:
async def my_coroutine():
    # Your code here
    ...
Next, the event loop is a core feature of asyncio and is responsible for executing tasks concurrently and managing I/O operations. An event loop runs tasks one after the other and can pause a task when it is waiting for external input, such as reading data from a file or from the network. It also listens for other tasks that are ready to run, switches to them, and resumes the initial task when it receives the input.
Tasks are coroutines wrapped in an object and managed by the event loop. They let you run multiple coroutines concurrently. You can create a task using the asyncio.create_task() function, like this:
async def my_coroutine():
    # Your code here
    ...

# create_task() must be called from within a running event loop
task = asyncio.create_task(my_coroutine())
Finally, the sleep function in asyncio is used to simulate I/O bound tasks or a delay in the code execution. It works differently than the standard time.sleep() function as it is non-blocking and allows other coroutines to run while one is paused. You can use await asyncio.sleep(delay) to add a brief pause in your coroutine execution.
Putting it all together, you can use asyncio to efficiently manage multiple coroutines concurrently:
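Here’s a minimal self-contained sketch of this idea (the task names and the log list are illustrative; a short sleep stands in for real I/O):

```python
import asyncio

log = []

async def task_one():
    log.append("task one: started")
    # Non-blocking pause; the event loop switches to task two meanwhile
    await asyncio.sleep(0.1)
    log.append("task one: finished")

async def task_two():
    log.append("task two: started")
    log.append("task two: finished")

async def main():
    # Run both coroutines concurrently in a single event loop
    await asyncio.gather(task_one(), task_two())

asyncio.run(main())
print(log)
```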
In this example, the event loop will start running both tasks concurrently, allowing task two to complete while task one is paused during the sleep period. This allows you to handle multiple tasks in a single-threaded environment.
Async/Await Syntax
In Python, the async/await syntax is a powerful tool to create and manage asynchronous tasks without getting lost in callback hell or making your code overly complex.
The async/await keywords are at the core of asynchronous code in Python. You can use the async def keyword to define an asynchronous function. Inside this function, you can use the await keyword to pause the execution of the function until some asynchronous operation is finished.
yield and yield from are related to asynchronous code in the context of generators, which provide a way to iterate through a collection of items without loading all of them into memory at once. Python 3.3 introduced yield from to delegate part of a generator’s operation to another generator. In later versions of Python, the focus shifted to async/await for managing asynchronous tasks, and yield from became less commonly used in this role.
For example, before Python 3.4, you might have used a generator with yield and yield from like this:
def generator_a():
    for i in range(3):
        yield i

def generator_b():
    yield from generator_a()

for item in generator_b():
    print(item)
With the introduction of async/await, asynchronous tasks can be written more consistently and readably. You can convert the previous example to use async/await as follows:
import asyncio

async def async_generator_a():
    for i in range(3):
        yield i
        await asyncio.sleep(1)

async def async_generator_b():
    async for item in async_generator_a():
        print(item)

asyncio.run(async_generator_b())
Working with Tasks and Events
In asynchronous programming with Python, you’ll often work with tasks and events to manage the execution of simultaneous IO-bound operations. To get started with this model, you’ll need to understand the event loop and the concept of tasks.
The event loop is a core component of Python’s asyncio module. It’s responsible for managing and scheduling the execution of tasks. A task, created using asyncio.create_task(), represents a coroutine that runs independently of other tasks in the same event loop.
To create tasks, first, define an asynchronous function using the async def syntax. Then, you can use the await keyword to make non-blocking calls within this function. The await keyword allows the event loop to perform other tasks while waiting for an asynchronous operation to complete.
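Here’s a minimal sketch of this pattern (the names my_async_function and event_loop, and the 2-second sleep, are illustrative):

```python
import asyncio

async def my_async_function():
    print("started")
    # Non-blocking pause; the event loop is free to run other tasks here
    await asyncio.sleep(2)
    print("finished")
    return "result"

async def main():
    event_loop = asyncio.get_running_loop()
    # Wrap the coroutine into a task so it runs concurrently in the loop
    task = event_loop.create_task(my_async_function())
    value = await task
    return value

value = asyncio.run(main())
print(value)  # result
```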
In this example, my_async_function is an asynchronous function, and await asyncio.sleep(2) represents an asynchronous operation. The event_loop.create_task() method wraps the coroutine into a task, allowing it to run concurrently within the event loop.
To execute tasks and manage their output, you can use asyncio.gather(). This function receives a list of tasks and returns their outputs as a list in the same order they were provided. Here’s an example of how you can use asyncio.gather():
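A minimal sketch (the fetch() coroutine and its return values are illustrative):

```python
import asyncio

async def fetch(n):
    # Simulate an I/O-bound operation
    await asyncio.sleep(0.1)
    return n * 10

async def main():
    tasks = [asyncio.create_task(fetch(i)) for i in range(2)]
    # gather() awaits all tasks and returns results in submission order
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
print(results)  # [0, 10]
```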
In this example, asyncio.gather() awaits the completion of both tasks and then collects their output in a list, which is printed at the end.
Working with tasks and events in Python’s asynchronous IO model helps improve the efficiency of your code when dealing with multiple IO operations, ensuring smoother and faster execution. Remember to use asyncio.create_task(), await, and asyncio.gather() when handling tasks within your event loop.
Coroutines and Futures
In Python, async IO is powered by coroutines and futures. Coroutines are functions that can be paused and resumed at specific points, allowing other tasks to run concurrently. They are declared with the async keyword and used with await. Asyncio coroutines are the preferred way to write asynchronous code in Python.
On the other hand, futures represent the result of an asynchronous operation that hasn’t completed yet. They are primarily used for interoperability between callback-based code and the async/await syntax. With asyncio, Future objects should be created using loop.create_future().
To execute multiple coroutines concurrently, you can use the gather function. asyncio.gather() is a high-level function that takes one or more awaitable objects (coroutines or futures) and schedules them to run concurrently. Here’s an example:
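A minimal sketch (the foo() and bar() coroutines and their return values are illustrative):

```python
import asyncio

async def foo():
    await asyncio.sleep(0.1)
    return "foo result"

async def bar():
    await asyncio.sleep(0.1)
    return "bar result"

async def main():
    # Schedule both coroutines concurrently; results keep argument order
    return await asyncio.gather(foo(), bar())

results = asyncio.run(main())
print(results)  # ['foo result', 'bar result']
```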
In this example, both foo() and bar() coroutines run concurrently, and the gather() function returns a list of their results.
When working with futures directly, errors can be propagated through the set_exception() method. If a coroutine raises an exception, you can catch it and attach it to the associated future using future.set_exception(). This allows other coroutines waiting on the same future to handle the exception gracefully.
In summary, working with coroutines and futures helps you write efficient, asynchronous code in Python. Use coroutines along with the async/await syntax for defining asynchronous tasks, and futures for interacting with low-level callback-based code. Utilize functions like gather() for running multiple coroutines concurrently, and handle errors effectively with future.set_exception().
Threading and Multiprocessing
In the world of Python, you have multiple options for concurrent execution and managing concurrency. Two popular approaches to achieve this are threading and multiprocessing.
Threading can be useful when you want to improve the performance of your program by efficiently utilizing your CPU’s time. It allows you to execute multiple threads concurrently within a single process. Threads share memory and resources, which makes them lightweight and more suitable for I/O-bound tasks. However, because of the Global Interpreter Lock (GIL) in Python, only one thread can execute Python code at a time, limiting the benefits of threading for CPU-bound tasks. You can explore the threading module for building multithreaded applications.
Multiprocessing overcomes the limitations of threading by using multiple processes working independently. Each process has its own Python interpreter, memory space, and resources, effectively bypassing the GIL. This approach is better for CPU-bound tasks, as it allows you to utilize multiple cores to achieve true parallelism. To work with multiprocessing, you can use Python’s multiprocessing module.
While both threading and multiprocessing help manage concurrency, it is essential to choose the right approach based on your application’s requirements. Threading is more suitable when your tasks are I/O-bound, and multiprocessing is advisable for CPU-bound tasks. When dealing with a mix of I/O-bound and CPU-bound tasks, using a combination of the two might be beneficial.
Async I/O offers another approach for handling concurrency and might be a better fit in some situations. However, understanding threading and multiprocessing remains crucial to make informed decisions and efficiently handle concurrent execution in Python.
Understanding Loops and Signals
In the world of Python async IO, working with loops and signals is an essential skill to grasp. As a developer, you must be familiar with these concepts to harness the power of asynchronous programming.
Event loops are at the core of asynchronous programming in Python. They provide a foundation for scheduling and executing tasks concurrently. The asyncio library helps you create and manage these event loops. You can experiment with event loops using Python’s asyncio REPL, which can be started by running python -m asyncio in your command line.
Signals, on the other hand, are a way for your program to receive notifications about certain events, like a user interrupting the execution of the program. A common use case for handling signals in asynchronous programming involves stopping the event loop gracefully when it receives a termination signal like SIGINT or SIGTERM.
A useful method for running synchronous or blocking functions in an asynchronous context is the loop.run_in_executor() method. This allows you to offload the execution of such functions to a separate thread or process, preventing them from blocking the event loop. For example, if you have a CPU-bound operation that cannot be implemented using asyncio‘s native coroutines, you can utilize loop.run_in_executor() to keep the event loop responsive.
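A minimal sketch of this pattern (blocking_work and its 0.1-second sleep are illustrative stand-ins for a real blocking call):

```python
import asyncio
import time

def blocking_work(n):
    # A synchronous function that would otherwise block the event loop
    time.sleep(0.1)
    return n * n

async def main():
    loop = asyncio.get_running_loop()
    # None selects the default ThreadPoolExecutor
    result = await loop.run_in_executor(None, blocking_work, 6)
    return result

result = asyncio.run(main())
print(result)  # 36
```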
Here’s a simple outline of using loops and signals together in your asynchronous Python code:
Create an event loop using asyncio.get_event_loop().
Register your signal handlers with the event loop, typically by using the loop.add_signal_handler() method.
Schedule your asynchronous tasks and coroutines in the event loop.
Run the event loop using loop.run_forever(), which will keep running until you interrupt it with a signal or a coroutine stops it explicitly.
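A minimal, Unix-only sketch of those steps (using `asyncio.new_event_loop()` rather than the deprecated top-level `get_event_loop()`; the self-delivered SIGTERM after 0.3 seconds stands in for a user pressing Ctrl+C):

```python
import asyncio
import signal

def shutdown(loop):
    # Step 2's handler: stop the event loop gracefully
    print("Received termination signal, stopping loop")
    loop.stop()

async def worker():
    # Step 3: a scheduled task that runs until the loop stops
    while True:
        await asyncio.sleep(0.1)

loop = asyncio.new_event_loop()                      # Step 1
for sig in (signal.SIGINT, signal.SIGTERM):
    loop.add_signal_handler(sig, shutdown, loop)     # Step 2
task = loop.create_task(worker())                    # Step 3
# For demonstration only: deliver SIGTERM to this process after 0.3 s
loop.call_later(0.3, signal.raise_signal, signal.SIGTERM)
loop.run_forever()                                   # Step 4
task.cancel()
loop.close()
```

`add_signal_handler()` raises `NotImplementedError` on Windows, so this pattern is limited to Unix-like systems.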
Managing I/O Operations
When working with I/O-bound tasks in Python, it’s essential to manage I/O operations efficiently. Using asyncio can help you handle these tasks concurrently, resulting in more performant and scalable code.
I/O-bound tasks are operations where the primary bottleneck is fetching data from input/output sources like files, network requests, or databases. To improve the performance of your I/O-bound tasks, you can use asynchronous programming techniques. In Python, this often involves using the asyncio library and writing non-blocking code.
Typically, you’d use blocking code for I/O operations, which means waiting for the completion of an I/O task before continuing with the rest of the code execution. This blocking behavior can lead to inefficient use of resources and poor performance, especially in larger programs with multiple I/O-bound tasks.
Non-blocking code, on the other hand, allows your program to continue executing other tasks while waiting for the I/O operation to complete. This can significantly improve the efficiency and performance of your program. When using Python’s asyncio library, you write non-blocking code with coroutines.
For I/O-bound tasks involving file operations, you can use libraries like aiofiles to perform asynchronous file I/O. Just like with asyncio, aiofiles provides an API to work with files using non-blocking code, improving the performance of your file-based tasks.
When dealing with network I/O, the asyncio library provides APIs to perform tasks such as asynchronous reading and writing operations for sockets and other resources. This enables you to manage multiple network connections concurrently, efficiently utilizing your system resources.
In summary, when managing I/O operations in Python:
Identify I/O-bound tasks in your program
Utilize the asyncio library to write non-blocking code using coroutines
Consider using aiofiles for asynchronous file I/O
Utilize asyncio APIs to manage network I/O efficiently
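Putting those points together, a small self-contained sketch (using `asyncio.sleep(0.2)` as a stand-in for a real network or file wait) shows three I/O-bound tasks overlapping instead of running back-to-back:

```python
import asyncio
import time

async def fetch(name, delay):
    # Simulate a network call with a non-blocking sleep
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    start = time.perf_counter()
    # Three 0.2-second "requests" run concurrently, not sequentially
    results = await asyncio.gather(
        fetch("a", 0.2), fetch("b", 0.2), fetch("c", 0.2)
    )
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

Run sequentially these waits would take about 0.6 seconds; with `asyncio.gather()` the total is roughly the duration of the single longest wait.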
Handling Transports and Timeouts
When working with Python’s Async IO, you might need to handle transports and timeouts effectively. Transports and protocols are low-level event loop APIs for implementing network or IPC protocols such as HTTP. They help improve the performance of your application by using callback-based programming style. You can find more details in the Python 3.11.4 documentation.
Timeouts are often useful when you want to prevent your application from waiting indefinitely for a task to complete. To handle timeouts in asyncio, you can use the asyncio.wait_for function. This allows you to set a maximum time that your function can run. If the function doesn’t complete within the specified time, an asyncio.TimeoutError is raised.
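The example described in the next paragraph might look like this sketch, where `some_function` is a hypothetical coroutine that sleeps for 5 seconds:

```python
import asyncio

async def some_function():
    # Simulates work that takes 5 seconds
    await asyncio.sleep(5)
    return "finished"

async def main():
    try:
        # Give up if some_function() doesn't finish within 3 seconds
        result = await asyncio.wait_for(some_function(), timeout=3)
    except asyncio.TimeoutError:
        result = "Task took too long"
    print(result)
    return result

message = asyncio.run(main())
```

Note that `asyncio.wait_for()` also cancels the timed-out coroutine, so it does not keep running in the background after the timeout fires.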
In this example, some_function takes 5 seconds to complete, but we set a timeout of 3 seconds. As a result, an asyncio.TimeoutError is raised, and the program prints “Task took too long.”
Another concept to be familiar with is the executor, which allows you to run synchronous functions in an asynchronous context. You can use the loop.run_in_executor() method, where loop is an instance of the event loop. This method takes three arguments: the executor, the function you want to run, and any arguments for that function. The executor can be a custom one or None for the default ThreadPoolExecutor.
Here’s an example:
import asyncio
import time

def sync_function(seconds):
    time.sleep(seconds)
    return "Slept for {} seconds".format(seconds)

async def main():
    # get_running_loop() returns the loop created by asyncio.run()
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(None, sync_function, 3)
    print(result)

asyncio.run(main())
In this example, we run the synchronous sync_function inside the async main() function using the loop.run_in_executor() method.
Dealing with Logging and Debugging
When working with Python’s asyncio library, properly handling logging and debugging is essential for ensuring efficient and smooth development. As a developer, it’s crucial to stay confident and knowledgeable when dealing with these tasks.
To begin logging in your asynchronous Python code, you need to initialize a logger object. Import the logging module and create an instance of the Logger class, like this:
This configuration sets up a logger object that will capture debug-level log messages. To log a message, simply call the appropriate method like logger.debug, logger.info, or logger.error:
Keep in mind that Python’s logging module is not inherently asynchronous. However, there are ways to work around this issue. One approach is to use a ThreadPoolExecutor, which executes logging methods in a separate thread:
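One sketch of that workaround — the `ListHandler` class and `log_async` helper are illustrative names added here only so the result can be inspected:

```python
import asyncio
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("async_app")
logger.setLevel(logging.DEBUG)
messages = []

class ListHandler(logging.Handler):
    # Collects log messages so we can verify them later
    def emit(self, record):
        messages.append(record.getMessage())

logger.addHandler(ListHandler())
executor = ThreadPoolExecutor(max_workers=1)

async def log_async(level, message):
    # Run the (potentially blocking) logging call in a worker
    # thread, keeping the event loop free
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(executor, logger.log, level, message)

async def main():
    await log_async(logging.INFO, "hello from a worker thread")

asyncio.run(main())
print(messages)
```

In a real application you would typically use a `QueueHandler`/`QueueListener` pair or a file/stream handler instead of a list.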
For debugging your asynchronous code, it’s possible to enable the debug mode in asyncio by calling the loop.set_debug() method. Additionally, consider setting the log level of the asyncio logger to logging.DEBUG and configuring the warnings module to display ResourceWarning warnings. Check the official Python documentation for more information and best practices.
Understanding Virtual Environments and Resources
When working with Python, you’ll often encounter the need for a virtual environment. A virtual environment is an isolated environment for your Python applications, which allows you to manage resources and dependencies efficiently. It helps ensure that different projects on your computer do not interfere with each other in terms of dependencies and versions, maintaining the availability of the required resources for each project.
To create a virtual environment, you can use built-in Python libraries such as venv or third-party tools like conda. Once created, you’ll activate the virtual environment and install the necessary packages needed for your project. This ensures that the resources are available for your application without causing conflicts with other Python packages or applications on your computer.
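For example, with the built-in venv module on a Unix-like shell (the directory name `.venv` is just a convention):

```shell
# Create an isolated environment in ./.venv
python3 -m venv .venv
# Activate it for the current shell session
. .venv/bin/activate
# Packages now install into .venv, not system-wide
python -m pip --version
# Leave the environment
deactivate
```

On Windows the activation script is `.venv\Scripts\activate` instead.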
When working with async IO in Python, it’s crucial to manage resources effectively, especially when dealing with asynchronous operations like networking requests or file I/O. By using a virtual environment, you can make sure that your project has the correct version of asyncio and other async libraries, ensuring that your code runs smoothly and efficiently.
In a virtual environment, resources are allocated based on the packages and libraries you install. This way, only the packages your project actually needs are present, improving consistency across development environments. The virtual environment also lets you keep track of your project’s dependencies, making it easier to maintain and share your project with others without compatibility issues.
Optimizing Asynchronous Program
When working with Python, you may often encounter situations where an asynchronous program can significantly improve the performance and responsiveness of your application. This is especially true when dealing with I/O-bound tasks or high-level structured network code, where asyncio can be your go-to library for writing concurrent code.
Before diving into optimization techniques, it’s crucial to understand the difference between synchronous and asynchronous programs. In a synchronous program, tasks are executed sequentially, blocking other tasks from running. Conversely, an asynchronous program allows you to perform multiple tasks concurrently without waiting for one to complete before starting another. This cooperative multitasking approach enables your asynchronous program to run much faster and more efficiently.
To make the most of your asynchronous program, consider applying the following techniques:
Use async/await syntax: Employing the async and await keywords when defining asynchronous functions and awaiting their results ensures proper execution and responsiveness.
Implement an event loop: The event loop is the core of an asyncio-based application. It schedules, executes, and manages tasks within the program, so it’s crucial to utilize one effectively.
Leverage libraries: Many asynchronous frameworks, such as web servers and database connection libraries, have been built on top of asyncio. Take advantage of these libraries to simplify and optimize your asynchronous program.
Avoid blocking code: Blocking code can slow down the execution of your asynchronous program. Ensure your program is entirely non-blocking by avoiding time-consuming operations or synchronous APIs.
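The "avoid blocking code" point above can be sketched with `asyncio.to_thread()` (Python 3.9+), which moves a synchronous call off the event loop so other work can overlap with it:

```python
import asyncio
import time

def blocking_io():
    # A synchronous, blocking call that would stall the event loop
    time.sleep(0.2)
    return "io done"

async def main():
    start = time.perf_counter()
    # to_thread() runs each call in a worker thread, so the two
    # invocations overlap instead of running back-to-back
    a, b = await asyncio.gather(
        asyncio.to_thread(blocking_io),
        asyncio.to_thread(blocking_io),
    )
    return a, b, time.perf_counter() - start

a, b, elapsed = asyncio.run(main())
print(a, b, f"{elapsed:.2f}s")
```

Calling `blocking_io()` directly inside a coroutine would instead freeze the loop for the full duration of each sleep.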
It’s essential to remember that while asynchronous programming has its advantages, it might not always be the best solution. In situations where your tasks are CPU-bound or require a more straightforward processing flow, a synchronous program might be more suitable.
Exploring Asyncio Libraries and APIs
When working with asynchronous programming in Python, it’s essential to explore the available libraries you can use. One such library is aiohttp. It allows you to make asynchronous HTTP requests efficiently using asyncio. You can find more details about this library from the aiohttp documentation.
To get started with aiohttp, you’ll first need to install the library:
pip install aiohttp
In your Python code, you can now import aiohttp and use it with the asyncio library. For example, if you want to make an asynchronous GET request, you can use the following code:
import aiohttp
import asyncio

async def fetch_data(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def main():
    url = 'https://api.example.com/data'
    data = await fetch_data(url)
    print(data)

# A coroutine must be driven by the event loop;
# "await main()" is only valid inside another coroutine
asyncio.run(main())
In the example above, the fetch_data function is defined as an async function using the async def syntax. This indicates that this function can be called with the await statement within other asynchronous functions.
The pathlib library provides classes for working with filesystem paths. While it is not directly related to async IO, it can be useful when working with file paths in your async projects. The pathlib.Path class offers a more Pythonic way to handle file system paths, making it easier to manipulate file and directory paths across different operating systems. You can read more about this library in the official Python documentation on pathlib.
When you create async function calls in your code, remember to use the await keyword when calling them. This ensures that the function is executed asynchronously. By combining the power of aiohttp, asyncio, and other async-compatible libraries, you can efficiently perform multiple tasks concurrently in your Python projects.
Understanding Queues and Terminals
With Python’s asyncio module, you can write concurrent, asynchronous code that works efficiently on I/O-bound tasks and network connections. In this context, queues become helpful tools for coordinating the execution of multiple tasks and managing shared resources.
Queues in asyncio are similar to standard Python queues, but they have special asynchronous properties. With coroutine functions such as get() and put(), you can efficiently retrieve an item from the queue or insert an item, respectively. When the queue is empty, the get() function will wait until an item becomes available. This enables smooth flow control and ensures that your async tasks are executed in the most optimal order.
Terminals, on the other hand, are interfaces for interacting with your system – either through command-line or graphical user interfaces. When working with async tasks in Python, terminals play a crucial role in tracking the progress and execution of your tasks. You can use terminals to initiate and monitor the state of your async tasks by entering commands and viewing the output.
When it comes to incorporating multithreaded or asynchronous programming in a parent-child relationship, queues and terminals can come in handy. Consider a scenario where a parent task is responsible for launching multiple child tasks that operate concurrently. In this case, a queue can facilitate the communication and synchronization between parent and child tasks by efficiently passing data to and fro.
Here are a few tips to keep in mind while working with queues and terminals in asynchronous Python programming:
Use asyncio.Queue() to create an instance suitable for async tasks, while still maintaining similar functionality as a standard Python queue.
For managing timeouts, remember to use the asyncio.wait_for() function in conjunction with queue operations, since the methods of asyncio queues don’t have a built-in timeout parameter.
When working with terminals, be mindful of potential concurrency issues. Make sure you avoid race conditions by properly synchronizing your async tasks’ execution using queues, locks, and other synchronization primitives provided by the asyncio module.
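A minimal producer/consumer sketch pulling those tips together (the `None` sentinel is one common convention for signalling "no more items"):

```python
import asyncio

async def producer(queue):
    for i in range(3):
        await queue.put(i)   # put() waits if the queue is full
    await queue.put(None)    # sentinel: tell the consumer to stop

async def consumer(queue, results):
    while True:
        item = await queue.get()  # get() waits until an item arrives
        if item is None:
            break
        results.append(item)

async def main():
    # maxsize=2 forces the producer to pause until space frees up,
    # demonstrating the built-in flow control
    queue = asyncio.Queue(maxsize=2)
    results = []
    await asyncio.gather(producer(queue), consumer(queue, results))
    return results

results = asyncio.run(main())
print(results)
```

Because `get()` on an empty queue simply suspends the consumer coroutine, no polling or locking is needed to coordinate the two tasks.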
Frequently Asked Questions
How does asyncio compare to threading in Python?
Asyncio is a concurrency model that uses a single thread and an event loop to execute tasks concurrently. While threading allows for concurrent execution of tasks using multiple threads, asyncio provides better performance by managing tasks in a non-blocking manner within a single thread. Thus, asyncio is often preferred when dealing with I/O-bound tasks, as it can handle many tasks without creating additional threads.
What are the main components of the asyncio event loop?
The asyncio event loop is responsible for managing asynchronous tasks in Python. Its main components include:
Scheduling tasks: The event loop receives and schedules coroutine functions for execution.
Managing I/O operations: The event loop monitors I/O operations and receives notifications when the operations are complete.
Executing asynchronous tasks: The event loop executes scheduled tasks in a non-blocking manner, allowing other tasks to run concurrently.
How do I use asyncio with pip?
To use asyncio in your Python projects, no additional installation is needed, as it is included in the Python Standard Library from Python version 3.4 onwards. Simply import asyncio in your Python code and make use of its features.
What is the difference between asyncio.run() and run_until_complete()?
asyncio.run() is a newer and more convenient function for running an asynchronous coroutine until it completes. It creates an event loop, runs the passed coroutine, and closes the event loop when the task is finished. run_until_complete() is an older method that requires an existing event loop object on which to run a coroutine.
How can I resolve the ‘asyncio.run() cannot be called from a running event loop’ error?
This error occurs when you try to call asyncio.run() inside an already running event loop. Instead of using asyncio.run() in this case, you should use create_task() or gather() functions to schedule your coroutines to run concurrently within the existing loop.
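A sketch matching the description below — `async_function` and the sub-second delays are illustrative:

```python
import asyncio

async def async_function(name, delay):
    await asyncio.sleep(delay)
    print(f"{name} finished")
    return name

async def main():
    # Run both coroutines concurrently inside one event loop
    return await asyncio.gather(
        async_function("first", 0.2),
        async_function("second", 0.1),
    )

results = asyncio.run(main())
print(results)
```

Note that `gather()` returns results in the order the coroutines were passed in, even though "second" finishes first.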
This example demonstrates two async functions running concurrently. The main() function uses asyncio.gather() to run both async_function() tasks at the same time, and asyncio.run(main()) starts the event loop to execute them.
In Python, “dunder” methods, short for “double underscore” methods, are special methods that allow developers to define the behavior of built-in operations for custom objects. For instance, when you use the + operator to add two objects, Python internally calls the __add__ method. Similarly, other operators have their corresponding dunder methods.
However, the term “not and” operator might be a bit misleading, as there isn’t a direct “not and” operator in Python.
Instead, Python provides the separate keyword operators not and and. But if we delve into the realm of bitwise operations, we find operators that might resemble this behavior: the bitwise NOT (~) and the bitwise AND (&).
Let’s explore the dunder methods associated with these operators.
Bitwise NOT (~) and its Dunder Method __invert__
The bitwise NOT operator flips the bits of a number. For a custom class, if you want to define or override the behavior of the ~ operator, you’d use the __invert__ method.
class BitwiseNumber:
    def __init__(self, value):
        self.value = value

    def __invert__(self):
        return BitwiseNumber(~self.value)

    def __repr__(self):
        return str(self.value)

number = BitwiseNumber(5)
print(~number) # Outputs: -6
In the above example, the __invert__ method returns a new BitwiseNumber object with its value inverted.
Bitwise AND (&) and its Dunder Method __and__
The bitwise AND operator performs a bitwise AND operation between two numbers. For custom classes, the behavior of the & operator can be defined or overridden using the __and__ method.
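The example described below might look like this sketch, reusing the same illustrative BitwiseNumber class from the `__invert__` example:

```python
class BitwiseNumber:
    def __init__(self, value):
        self.value = value

    def __and__(self, other):
        # Only combine with other BitwiseNumber instances
        if isinstance(other, BitwiseNumber):
            return BitwiseNumber(self.value & other.value)
        return NotImplemented

    def __repr__(self):
        return str(self.value)

a = BitwiseNumber(6)   # 0b110
b = BitwiseNumber(3)   # 0b011
print(a & b)           # bitwise AND: 0b010, i.e. 2
```

Returning `NotImplemented` (rather than raising) lets Python fall back to the other operand's `__rand__` method, which is the idiomatic way to reject unsupported types.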
In this example, the __and__ method checks if the other object is an instance of BitwiseNumber and then performs a bitwise AND operation.
TLDR
While there isn’t a direct “not and” operator in Python, leveraging the __invert__ and __and__ methods, you can define how the bitwise NOT and AND operations work for custom objects, respectively.
If you’re like me, you’re using OpenAI API a lot in your Python code. So the natural question arises: “How to use OpenAI’s API asynchronously by issuing multiple requests at once?”
I will give you my code for asynchronous OpenAI API requests for copy and paste below. But first, allow me to give you a word of warning from coder to coder:
Generally speaking, coders who use asynchronous code do it just because they want to, not because it is needed. Asynchronous code is hard to read, error-prone, inefficient due to context switches, and unpredictable.
Specifically, when using asynchronous requests against the OpenAI API, you should be aware of the rate limits that may become the bottleneck of your asynchronous Python app:
I have developed the following code to issue OpenAI requests asynchronously in Python — make sure to replace the highlighted lines with your OpenAI key and your desired prompts:
import aiohttp
import asyncio
import openai

# Set up your OpenAI API key
openai.api_key = 'sk-...'

# Example prompts
prompts = ["What is the capital of France?",
           "How does photosynthesis work?",
           "Who wrote 'Pride and Prejudice'?"]

async def async_openai_request(prompt):
    url = "https://api.openai.com/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {openai.api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "gpt-4",
        "messages": [
            {"role": "user", "content": prompt}
        ],
        "temperature": 1,
        "max_tokens": 150,
        "top_p": 1,
        "frequency_penalty": 0,
        "presence_penalty": 0
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=data, headers=headers) as response:
            return await response.json()

async def main():
    # Gather results from all asynchronous tasks
    results = await asyncio.gather(*(async_openai_request(prompt) for prompt in prompts))
    for prompt, result in zip(prompts, results):
        print(f"Prompt: {prompt}")
        print(f"Response: {result['choices'][0]['message']['content']}\n")

# Run the main function
asyncio.run(main())
I’ll give you the output at the end of this article. But first let’s go through the code step by step to ensure you understand everything.
Note that if you need a refresher on the Python OpenAI API, feel free to check out this Finxter Academy course:
Step 1: Imports
The code begins by importing three essential libraries.
aiohttp is used for making asynchronous HTTP requests, allowing the program to send and receive data from the OpenAI API without blocking the main thread.
asyncio provides the tools to write concurrent code using the async/await syntax to handle multiple tasks simultaneously.
Lastly, the openai library is the official OpenAI API client, facilitating interactions with the OpenAI platform.
Step 2: Set up the OpenAI API Key
Following the imports, the OpenAI API key is set up. It’s important to note that hard-coding API keys directly in the code is not a recommended practice. For security reasons, it’s better to use environment variables or configuration files to store such sensitive information.
Step 3: The Asynchronous Request Function
When this function is called with a specific prompt, it prepares and sends an asynchronous request to the OpenAI API’s chat completions endpoint.
The headers for the request include the authorization, which uses the API key, and the content type. The payload (data) sent to the API specifies several parameters, including the model to use (gpt-4), the message format containing the user’s prompt, and other parameters like temperature, max_tokens, top_p, frequency_penalty, and presence_penalty that influence the output.
The function then establishes an asynchronous session using aiohttp and sends a POST request with the specified data and headers. Once the response is received, it’s returned in JSON format.
Step 4: Main Asynchronous Function
The main function encapsulates the primary logic of the program. It starts by defining a list of example prompts.
async def main():
    # Gather results from all asynchronous tasks
    results = await asyncio.gather(*(async_openai_request(prompt) for prompt in prompts))
    for prompt, result in zip(prompts, results):
        print(f"Prompt: {prompt}")
        print(f"Response: {result['choices'][0]['message']['content']}\n")
For each of these prompts, asynchronous requests are sent to the OpenAI API using the previously defined async_openai_request function. The asyncio.gather method is employed to concurrently collect results from all the asynchronous tasks. Once all responses are received, the function iterates over the prompts and their corresponding results, printing them out for the user.
Step 5: Execution
Finally, the asyncio.run(main()) command is used to execute the main function. When the code is run, it will send asynchronous requests for each of the example prompts and display the responses in the console.
The output is:
Prompt: What is the capital of France?
Response: The capital of France is Paris.

Prompt: How does photosynthesis work?
Response: Photosynthesis is a process used by plants, algae and certain bacteria to convert sunlight, water and carbon dioxide into food and oxygen. This process happens inside the chloroplasts, specifically using chlorophyll, the green pigment involved in photosynthesis. Photosynthesis occurs in two stages: the light-dependent reactions and the light-independent reactions, also known as the Calvin Cycle. In the light-dependent reactions, which take place in the thylakoid membrane of the chloroplasts, light energy is converted into chemical energy. When light is absorbed by chlorophyll, it excites the electrons, increasing their energy level and triggering a series of chemical reactions. Water molecules are split to produce oxygen, electrons, and hydrogen ions. The oxygen is released into the

Prompt: Who wrote 'Pride and Prejudice'?
Response: 'Pride and Prejudice' was written by Jane Austen.
Method 2: Using OpenAI ChatCompletion’s acreate()
An alternative to the above method of using asynchronous requests against the OpenAI API endpoint is to use OpenAI’s native asynchronous methods, as noted in the docs: “Async support is available in the API by prepending a to a network-bound method”.
In the following code, I have only changed the OpenAI API calling function to use the acreate() method:
import aiohttp
import asyncio
import openai

# Set up your OpenAI API key
openai.api_key = 'sk-...'

async def create_chat_completion(prompt):
    chat_completion_resp = await openai.ChatCompletion.acreate(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    return chat_completion_resp

async def main():
    # Example prompts
    prompts = ["What is the capital of France?",
               "How does photosynthesis work?",
               "Who wrote 'Pride and Prejudice'?"]

    # Gather results from all asynchronous tasks
    results = await asyncio.gather(*(create_chat_completion(prompt) for prompt in prompts))
    for prompt, result in zip(prompts, results):
        print(f"Prompt: {prompt}")
        print(f"Response: {result['choices'][0]['message']['content']}\n")

# Run the main function
asyncio.run(main())
Output:
Prompt: What is the capital of France?
Response: The capital of France is Paris.

Prompt: How does photosynthesis work?
Response: Photosynthesis is the process by which green plants, algae and some bacteria convert light energy, usually from the sun, into chemical energy in the form of glucose (sugar). This process is essential for life on earth as it is the primary source of all oxygen in the atmosphere. This process takes place in a part of the plant cell called the chloroplast, more specifically, within the chlorophyll molecules, which absorbs sunlight (specifically, photons) and gives plants their green color. It can be divided into two main stages: the light-dependent reactions and the light-independent reactions or Calvin Cycle. In the light-dependent reactions, which take place in the thylakoid membrane of the chloroplasts, light is absorbed by the chlorophyll and converted into chemical energy - in the form of ATP (Adenosine triphosphate) and NADPH (Nicotinamide adenine dinucleotide phosphate). This process also splits water molecules (H2O) into oxygen (O2), which is released into the atmosphere, and hydrogen ions (H+), which are used in the next stage of photosynthesis. In the second stage, the light-independent reactions or Calvin Cycle, which take place in the stroma of the chloroplasts, the ATP and NADPH produced in the light-dependent reactions, along with carbon dioxide (CO2) from the atmosphere, are used to produce glucose (sugar), which is used as an energy source for plant growth and development. In summary, during photosynthesis, light energy is converted into chemical energy, which fuels the organisms' activities, and oxygen is released into the atmosphere as a byproduct.

Prompt: Who wrote 'Pride and Prejudice'?
Response: 'Pride and Prejudice' was written by Jane Austen.
Make sure to check out our course on prompt engineering with Llama 2 in case you’re looking to leverage a free and open-source large language model (LLM) instead of the paid OpenAI API:
Prompt Engineering with Llama 2
The Llama 2 Prompt Engineering course helps you stay on the right side of change. Our course is meticulously designed to provide you with hands-on experience through genuine projects.
You’ll delve into practical applications such as book PDF querying, payroll auditing, and hotel review analytics. These aren’t just theoretical exercises; they’re real-world challenges that businesses face daily.
By studying these projects, you’ll gain a deeper comprehension of how to harness the power of Llama 2 using Python, Langchain, Pinecone, and a whole stack of highly practical tools of exponential coders in a post-ChatGPT world.
Asyncio is a Python library that allows you to write asynchronous code, providing an event loop, coroutines, and tasks to help manage concurrency without the need for parallelism. With asyncio, you can develop high-performance applications that harness the power of asynchronous programming without running into callback hell or dealing with the complexity of threads.
Async and Await Key Concepts
By incorporating the async and await keywords, Python’s asynchronous generators build upon the foundation of traditional generators, which make use of the yield keyword.
To work effectively with asyncio, there are two essential concepts you should understand: async and await.
async: The async keyword defines a function as a coroutine, making it possible to execute asynchronously. When you define a function with async def, you’re telling Python that the function is capable of asynchronous execution. This means that it can be scheduled to run concurrently without blocking other tasks.
await: The await keyword allows you to pause and resume the execution of a coroutine within your asynchronous code. Using await before calling another coroutine signifies that your current coroutine should wait for the completion of the called coroutine. While waiting, the asyncio event loop can perform other tasks concurrently.
Here’s a simple example incorporating these concepts:
import asyncio

async def my_coroutine():
    print("Starting the coroutine")
    # Simulate a blocking operation using asyncio.sleep
    await asyncio.sleep(2)
    print("Coroutine completed")

async def main():
    # Schedule the coroutine as a task on the running event loop
    # (create_task() requires a running loop, so it lives inside main())
    my_task = asyncio.create_task(my_coroutine())
    # Run until the task is completed
    await my_task

asyncio.run(main())
Generators and Asyncio
Generators are a powerful feature in Python that allow you to create an iterator using a function. They enable you to loop over a large sequence of values without creating all the values in memory.
You can learn everything about generators in our Finxter tutorial here:
Generators are particularly useful when working with asynchronous programming, like when using the asyncio library.
Yield Expressions and Statements
In Python, the yield keyword is used in generator functions to produce values one at a time. This enables you to pause the execution of the function, return the current value, and resume execution later.
There are two types of yield expressions you should be familiar with:
the yield expression and
the yield from statement.
A simple yield expression in a generator function might look like this:
def simple_generator():
    for i in range(5):
        yield i
This generator function produces values from 0 to 4, one at a time. You can use this generator in a for loop to print the generated values:
for value in simple_generator():
    print(value)
yield from is a statement used to delegate part of a generator’s operation to another generator. It can simplify your code when working with nested generators.
Here’s an example of how you might use yield from in a generator:
def nested_generator():
    yield "Start"
    yield from range(3)
    yield "End"

for value in nested_generator():
    print(value)
This code will output:
Start
0
1
2
End
Python Async Generators
Asynchronous generators were introduced in Python 3.6 with the PEP 525 proposal, enabling developers to handle asynchronous tasks more efficiently using the async def and yield keywords. In an async generator, you’ll need to define a function with the async def keyword, and the function body should contain the yield statement.
Here is an example of creating an asynchronous generator:
import asyncio

async def async_generator_example(start, stop):
    for number in range(start, stop):
        await asyncio.sleep(1)
        yield number
Using Async Generators
To consume values from an async generator, you’ll need to use the async for loop. The async for loop was introduced alongside async generators in Python 3.6 and makes it straightforward to iterate over the yielded values from the async generator.
Here’s an example of using async for to work with the async generator:
import asyncio

async def main():
    async for num in async_generator_example(1, 5):
        print(num)

# Run the main function using asyncio's event loop
if __name__ == "__main__":
    asyncio.run(main())
In this example, the main() function loops over the values yielded by the async_generator_example() async generator, printing them one by one.
Errors in Async Generators
Handling errors in async generators can be a bit different compared to regular generators. An important concept to understand is that when an exception occurs inside an async generator, it may propagate up the call stack and eventually reach the async for loop. To handle such situations gracefully, you should use try and except blocks within your async generator code.
Here’s an example that shows how to handle errors in async generators:
import asyncio

async def async_generator_example(start, stop):
    for number in range(start, stop):
        try:
            await asyncio.sleep(1)
            if number % 2 == 0:
                raise ValueError("Even numbers are not allowed.")
            yield number
        except ValueError as e:
            print(f"Error in generator: {e}")

async def main():
    async for num in async_generator_example(1, 5):
        print(num)

# Run the main function using asyncio's event loop
if __name__ == "__main__":
    asyncio.run(main())
In this example, when the async generator encounters an even number, it raises a ValueError. The exception is handled within the generator function, allowing the async generator to continue its execution and the async for loop to iterate over the remaining odd numbers.
Advanced Topics
Multiprocessing and Threading
When working with Python async generators, you can leverage the power of multiprocessing and threading to execute tasks concurrently.
The concurrent.futures module provides a high-level interface for asynchronously executing callables, enabling you to focus on your tasks rather than managing threads, processes, and synchronization.
Using ThreadPoolExecutor and ProcessPoolExecutor, you can manage multiple threads and processes, respectively.
For example, in asynchronous I/O operations, you can utilize asyncio and run synchronous functions in a separate thread using the run_in_executor() method to avoid blocking the main event loop:
import asyncio
import requests
from concurrent.futures import ThreadPoolExecutor

async def async_fetch(url):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as executor:
        # Run the blocking requests.get call in a worker thread
        return await loop.run_in_executor(executor, requests.get, url)
Contextlib and Python Asyncio
contextlib is a useful Python library for context and resource management, and it readily integrates with asyncio.
The contextlib.asynccontextmanager is available for creating asynchronous context managers. This can be particularly helpful when working with file I/O, sockets, or other resources that require clean handling:
import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def async_open(filename, mode):
    # open_async stands in for an asynchronous file-opening helper
    # (e.g. aiofiles.open); it is not a built-in
    file = await open_async(filename, mode)
    try:
        yield file
    finally:
        await file.close()

async def main():
    # Asynchronous context managers are entered with async with
    async with async_open('example.txt', 'r') as file:
        async for line in file:
            print(line)
Asyncio and Database Operations
Asynchronous I/O can significantly improve the performance of database-intensive applications. Many database libraries now support asyncio, allowing you to execute queries and manage transactions asynchronously.
Here’s an example using the aiomysql library for interacting with a MySQL database:
import asyncio
import aiomysql

async def query_database(query):
    pool = await aiomysql.create_pool(user='user', password='pass', db='mydb')
    async with pool.acquire() as conn:
        async with conn.cursor() as cur:
            await cur.execute(query)
            return await cur.fetchall()
Performance and Optimization Tips
To enhance the performance of your asyncio program, consider the following optimization tips:
Profile your code to identify performance bottlenecks
Use asyncio.gather(*coroutines) to schedule multiple coroutines concurrently, which minimizes the total execution time
Manage the creation and destruction of tasks using asyncio.create_task() and task.cancel() (note that cancel() itself is not awaited; await the task afterwards to let the cancellation propagate)
Limit concurrency when working with resources that might become overwhelmed by too many simultaneous connections
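For example, the gather() tip above can be sketched as follows; the fetch coroutine is a stand-in for real I/O, and the timings are illustrative:

```python
import asyncio
import time

async def fetch(delay, value):
    # Stand-in for an I/O-bound task (network call, disk read, ...)
    await asyncio.sleep(delay)
    return value

async def main():
    start = time.perf_counter()
    # All three coroutines are scheduled concurrently
    results = await asyncio.gather(
        fetch(0.1, 'a'),
        fetch(0.1, 'b'),
        fetch(0.1, 'c'),
    )
    elapsed = time.perf_counter() - start
    print(results)        # ['a', 'b', 'c']
    print(elapsed < 0.3)  # True: total runtime is ~0.1 s, not 0.3 s

if __name__ == "__main__":
    asyncio.run(main())
```

Because the sleeps overlap, the total execution time approaches that of the slowest coroutine rather than the sum of all of them.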
Keep in mind that while asyncio allows for concurrent execution of tasks, it’s not always faster than synchronous code, especially for CPU-bound operations. So, it’s essential to analyze your specific use case before deciding on an asynchronous approach.
Tip: In my view, asynchronous programming doesn’t improve performance in >90% of personal and small use cases. In many professional cases it also doesn’t outperform intelligent synchronous programming due to scheduling overhead and CPU context switches.
Frequently Asked Questions
How to create an async generator in Python?
To create an async generator in Python, you need to define a coroutine function that utilizes the yield expression. Use the async def keyword to declare the function, and then include the yield statement to produce values. For example:
import asyncio

async def my_async_generator():
    for i in range(3):
        await asyncio.sleep(1)
        yield i
What is the return type of an async generator?
The return type of an async generator is an asynchronous generator object. It’s an object that implements both __aiter__ and __anext__ methods, allowing you to iterate over it asynchronously using an async for loop.
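To make this concrete, here is a small sketch that drives the iteration manually through __anext__() (the generator itself is a trivial example):

```python
import asyncio

async def my_async_generator():
    for i in range(3):
        await asyncio.sleep(0)  # cooperatively yield control to the loop
        yield i

async def main():
    gen = my_async_generator()
    # __anext__() returns an awaitable that resolves to the next value
    print(await gen.__anext__())  # 0
    print(await gen.__anext__())  # 1
    await gen.aclose()            # clean up the partially consumed generator

if __name__ == "__main__":
    asyncio.run(main())
```

In practice you would rarely call __anext__() directly; an async for loop does exactly this under the hood.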
How to use ‘send’ with an async generator?
Async generators do support sending values: use the asend() coroutine method, the asynchronous counterpart of a regular generator's send(), defined in PEP 525. A plain send() call is not available; you interact with the generator through asend(), athrow(), and aclose(), each of which must be awaited.
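Async generator objects expose an awaitable asend() method (defined in PEP 525) that plays the role of send(); a minimal sketch:

```python
import asyncio

async def echo_gen():
    received = None
    while True:
        # Whatever is passed to asend() becomes the value of this yield
        received = yield received

async def main():
    gen = echo_gen()
    await gen.__anext__()              # prime the generator to the first yield
    result = await gen.asend('hello')  # send a value in, get the next yield out
    print(result)  # hello
    await gen.aclose()

if __name__ == "__main__":
    asyncio.run(main())
```

As with regular generators, the generator must be advanced to its first yield before a non-None value can be sent in.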
Why is an async generator not iterable?
An async generator is not a regular iterable, meaning you can’t use a traditional for loop due to its asynchronous nature. Instead, async generators are asynchronous iterables that must be processed using an async for loop.
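A quick sketch of the difference; the generator here is a trivial example:

```python
import asyncio

async def gen():
    yield 1
    yield 2

# A regular for loop fails: async generators have no __iter__/__next__
try:
    for _ in gen():
        pass
except TypeError as e:
    print("TypeError:", e)

async def main():
    # Inside a coroutine, async for (or an async comprehension) works
    values = [v async for v in gen()]
    print(values)  # [1, 2]

if __name__ == "__main__":
    asyncio.run(main())
```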
How to work with an async iterator?
To work with an async iterator, use an async for loop. This lets you iterate through the asynchronous generator and process each item as it becomes available, without blocking the event loop. For example:
async def my_async_generator_consumer():
    async for value in my_async_generator():
        print("Received:", value)
Can I use ‘yield from’ with an async generator?
No, you cannot use yield from with an async generator. Instead, you should use the async for loop to asynchronously iterate through one generator and then yield the values inside another async generator. For instance:
async def another_async_generator():
    async for item in my_async_generator():
        yield item
This another_async_generator() function will asynchronously iterate over my_async_generator() and yield items produced by the original generator.
That’s enough for today. Let’s have some fun — check out this blog tutorial on creating a small fun game in Python:
I assume you’re a human reader, even though the a priori probability is not on my side here. As a human, you need calories to power your daily life. But with enough calories, air, and water, you can survive almost anywhere, develop, and work out problems small and big. Powered with energy, you can participate in human evolution. As you run out of energy, life itself runs out.
AI agents based on LLMs, like Auto-GPT and BabyAGI, currently rely on external energy sources: we humans pay for their energy hunger with our credit cards. If our credit cards run out, the AI agents go to sleep or even die. When we die, our credit cards die, and our AIs die with us. Without money, they lose electrical power, the cyberspace equivalent of calories.
The status quo is that all AIs quickly starve to death.
Imagine you could program an AI with enough energy to run for 1000 years. Or 10,000 years. Or forever.
A dangerous and powerful thought experiment indeed.
Yet, the toolsets are already there:
The Tech Stack
You can now build an AI agent with Langchain or any other open-source toolset.
You give the AI a public/private keypair so it can send and receive BTC, i.e., native cyberspace money that is scarce and, thanks to that scarcity, expected to appreciate with adoption, economic productivity, and fiat inflation.
Finally, you let it run, participate in the cyberspace economy, make money, spend money, and pay for its own energy usage. This way you could build a lasting cyber organization that can potentially run for 1000 years in a self-sustained way.
If you want to create an everlasting organization that survives nation-state collapses and newly emerging world orders, there is no other way.
Let’s dig a bit deeper into the new BTC tech stack published by Lightning Labs:
LangChainBitcoin empowers Langchain agents to interact with both Bitcoin and the Lightning Network. Using the latest OpenAI GPT function calling, developers can craft agents that manage Bitcoin balances, both on-chain and via Lightning. It also includes a Python tool that lets agents access L402 payment-metered APIs seamlessly.
Aperture: The updated Aperture transforms any API into a ‘pay-as-you-use’ resource using the L402 protocol and Lightning Network’s sats.
Dynamic API Endpoint Pricing: Unlike static pricing, this feature allows for flexible, on-the-fly pricing adjustments based on the API call.
L402 bLIP: This is the blueprint for the L402 protocol, aiming to make online payments more streamlined. The L402 protocol is designed around the HTTP 402 Payment Required response, leveraging Bitcoin and the Lightning Network for quick, scalable micropayments for APIs.
The L402 standard is all about charging for online services and user authentication in a decentralized manner. It gives you authentication capabilities with the permissionless payments of the Lightning Network, allowing even micropayments. Most importantly, it removes friction like VISA’s 2% per transaction fee and gives AIs the ability to participate in the global economy.
Bitcoin Lightning and AI Converge
First things first:
What real-world problem does the new Lightning update solve? As the AI landscape proliferates, with AI training costs reportedly falling about 70% per year, the decentralized Bitcoin Lightning Network emerges as a solution to many of the resulting challenges, such as AI-issued payments using the L402 protocol.
Here are five key challenges the new L402 protocol can address, followed by the role the protocol itself plays:
(1) Cost Challenges with LLM Development: Training AI models, especially Large Language Models (LLMs), is expensive due to the high demand for GPUs. Currently, developers offset these costs by relying on credit card payments. This increases user costs due to fraud and chargeback fees and excludes billions without access to traditional banking.
(2) Transaction Costs: Also, don’t forget the roughly 2% fee per card payment. If AI agents sent payments back and forth just 35 times, about half of the money would be lost to VISA (0.98^35 ≈ 0.49)!
(3) AI Agents & Payment Systems: A new breed of AI agents is emerging, and they need a way to pay for resources. Traditional fiat systems aren’t cut out for this, especially given the volume of micro-payments these agents will handle. Enter Bitcoin and Lightning: a global, fast, and permissionless payment system perfectly suited for these AI agents.
(4) Deployment Costs & Scaling Issues: AI creators face a dilemma. Popular AI applications can lead to high credit card bills due to the current billing system. To scale effectively, creators need a cost-effective, inclusive, and private way to transfer some costs to users. Lightning and the L402 protocol offer this solution.
(5) Accessibility of Powerful AI Models: Top-tier AI models are often locked behind closed APIs, limiting access. While open-source models are emerging, accessing powerful models remains a challenge for many. The solution? A system where users can pay incrementally for access to these models.
(6) L402 Protocol’s Role: Introduced in 2020, the L402 protocol is designed to enhance AI accessibility for both humans and AI agents. It leverages the Lightning Network for quick, privacy-focused payments. With its recent updates and new tools, it’s set to empower the next wave of AI innovations.
Paid APIs with Bitcoin using the Lightning L402 Protocol
The L402 protocol breathes life into the long-forgotten HTTP error code: 402 Payment Required:
Originally envisioned by the creators of the HTTP protocol for internet-native payments, its true potential remained untapped until Bitcoin’s emergence. Now, the L402 protocol capitalizes on this by facilitating micropayments for API access, logins, and digital resources using Bitcoin’s smallest unit, sats.
It even supports dynamic pricing, adjusting costs based on various parameters such as the type of model or the query length (e.g., token context window!).
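To illustrate the idea only (this function and its rates are invented for this sketch and are not part of Aperture's actual API), dynamic pricing could be modeled like this:

```python
def price_in_sats(model, prompt_tokens, base_rate=10):
    # Hypothetical pricing: bigger models and longer prompts cost more sats
    model_multiplier = {'small': 1, 'medium': 2, 'large': 5}.get(model, 1)
    return base_rate * model_multiplier + prompt_tokens // 100

print(price_in_sats('large', 250))  # 52 sats
print(price_in_sats('small', 50))   # 10 sats
```

A real gateway would compute such a quote per request and embed it in the Lightning invoice attached to the 402 response.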
This synergy of the L402 protocol and open-source models unlocks innovative possibilities. Imagine a scenario where users can monetize their API prompts, and potential buyers can evaluate responses before purchasing more. This creates a quality-check mechanism for data and information.
However, the data landscape is changing. Platforms like Twitter and Reddit are becoming more protective of their data, limiting access for both AI training and human users. The L402 protocol offers a remedy by standardizing how agents handle HTTP 402 responses, enabling them to navigate paid APIs. This not only provides a revenue stream for services but also curbs spam.
To further enhance LLM applications, a new breed of intelligent hierarchical agents is on the rise. To empower these agents to navigate APIs, a special wrapper has been designed, making LangChain agents L402-aware.
Example Python App
LangChain stands out as the go-to library for crafting AI agents. It streamlines the intricacies of AI, enabling models to make decisions and interact with their surroundings by integrating external data. However, to truly harness these agents’ potential, they need a gateway to the real world and a means to pay for online resources and APIs.
Here’s an example Python app of how to connect an LLM to a Lightning instance:
from langchain.llms import OpenAI
from lightning import LndNode
from l402_api_chain import L402APIChain

# Create a connection to an active Lightning node.
lnd_node = LndNode(
    cert_path='path/to/tls.cert',
    macaroon_path='path/to/admin.macaroon',
    host='localhost',
    port=10018
)

# Create an API Chain instance like so:
llm = OpenAI(temperature=0)

# Create the L402-aware API chain
chain_new = L402APIChain.from_llm_and_api_docs(
    llm, API_DOCS,
    lightning_node=lnd_node,
    verbose=True,
)

output = chain_new.run('LLM query here')
print(output)
The new LangChain L402 wrapper equips agents with the capability to understand API docs and interact with them, all while being L402-aware.
Simply put, this wrapper can be integrated into any LangChain system using the APIChain abstraction, granting agents the power to navigate L402 APIs. This opens up a plethora of actions for agents, such as deploying themselves on the cloud via an L402 AWS API gateway or purchasing GPU hours for enhanced training!
You can dive into the docs on the official Lightning website in case you want to start building BTC-enabled LLM agents.
Also make sure to check out our related blog article on Bitcoin’s unique scarcity properties.