Most Python developers may have only worked with synchronous code. However, if you are a data scientist, you may have used the multiprocessing library to run calculations in parallel, and if you are a web developer, you may have come across concurrency with threading. Both multiprocessing and threading are advanced concepts in Python, each with its own field of application.

What Is Asyncio?

Asyncio is a Python library that’s used to write concurrent code with the async and await syntax. It’s used primarily for I/O-bound tasks, such as serving web requests or fetching data from APIs.

Aside from multiprocessing and threading, there is another, newer member of Python’s concurrency family: asyncio. Asyncio is a library used to write concurrent code with the async/await syntax. Similar to threading, asyncio is suited to I/O-bound tasks, which are very common in practice. We’ll go over the basics of asyncio and demonstrate how to use this library to write asynchronous code.


What’s the Difference Between CPU-Bound and I/O-Bound Tasks?

Before we get started with the asyncio library, it’s important to understand what CPU-bound and I/O-bound tasks are because they determine which library should be used to solve your particular problem.

A CPU-bound task spends most of its time doing heavy calculations with the CPUs. If you are a data scientist and need to crunch a huge amount of data to train machine learning models, then it’s a CPU-bound task. In this case, you should use multiprocessing to run your jobs in parallel and make full use of your CPUs.
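As a rough sketch (the sum_of_squares function and the input sizes below are made up for illustration), a CPU-bound job can be spread over several processes with the standard multiprocessing library:

import multiprocessing

def sum_of_squares(n):
    # Pure computation with no waiting on I/O, so it keeps a CPU core busy.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Run the calculations in parallel across several worker processes.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(sum_of_squares, [10_000_000, 20_000_000, 30_000_000])
    print(results)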

An I/O-bound task spends most of its time waiting for I/O responses, which can be responses from web pages, databases or disks. If you’re developing a web page where a request needs to fetch data from APIs or databases, it’s an I/O-bound task. Concurrency can be achieved for I/O-bound tasks with either threading or asyncio to minimize the waiting time from external resources.
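For example, a minimal sketch of spreading an I/O-bound job over threads with the standard concurrent.futures module might look like this (the URLs are placeholders):

from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url):
    # Most of the time here is spent waiting for the network response.
    with urlopen(url) as resp:
        return len(resp.read())

urls = ["https://www.example.com", "https://www.example.org"]

# While one thread waits on the network, another thread can run.
with ThreadPoolExecutor(max_workers=2) as executor:
    sizes = list(executor.map(fetch, urls))

print(sizes)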



Asyncio vs. Threading

Now that you know that both threading and asyncio are suitable for I/O-bound tasks, what are the differences?

First, threading uses multiple threads, whereas asyncio uses only one. Threading is easier to understand because the threads take turns to run the code and thus achieve concurrency. But how is it possible to achieve concurrency with a single thread?

Well, threading achieves concurrency with preemptive multitasking: we can’t determine when a given piece of code runs in which thread. The operating system decides that, and it can switch control between threads at any point. This is why we often see nondeterministic results with threading.
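A small sketch of this (not from the original example): two threads printing in a loop. The interleaving of their output is decided by the operating system and can change from run to run.

import threading

def worker(name):
    for i in range(3):
        # The operating system decides when each thread runs, so the
        # interleaving of these lines can differ between runs.
        print(f"{name}: step {i}")

threads = [threading.Thread(target=worker, args=(f"thread-{n}",)) for n in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()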

On the other hand, asyncio achieves concurrency with cooperative multitasking. We decide which part of the code can be awaited, which then switches the control to run other parts of the code. The tasks need to cooperate and announce when the control will be switched out. And all this is done in a single thread with the await command. This will be clearer when we see the code later.


What Is a Coroutine in Asyncio?

Coroutine is a fancy, exotic-sounding name, and it’s not easy to explain. Many tutorials don’t explain the concept at all and just demonstrate it with some code. Let’s try to understand it anyway, starting with the definition from the Python documentation.

“Coroutines are a more generalized form of subroutines. Subroutines are entered at one point and exited at another point. Coroutines can be entered, exited, and resumed at many different points,” according to the Python documentation.

While this may still seem pretty confusing, it will make more sense once you have more experience with asyncio.

In this definition, we can understand subroutines as functions, despite the differences between the two. Normally, a function is entered and exited only once when it’s called. However, there is a special kind of function in Python called a generator, which can be entered and exited many times.

Coroutines behave like generators. In older versions of Python, coroutines were defined with generators; these are called generator-based coroutines. However, coroutines are now a native feature of Python and can be defined with the async def syntax. Even though generator-based coroutines are deprecated, their history and existence can help us understand what a coroutine is and how control is switched, or yielded, between different parts of the code. PEP 492 is a good reference to learn more about the history and specification of coroutines in Python, although it may be difficult for beginners to read.
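For historical context only, this is roughly what a generator-based coroutine looked like next to a native one. The @asyncio.coroutine decorator was deprecated and has been removed in Python 3.11, so the old style is shown purely for illustration and won’t run on current versions:

import asyncio

# Old style: a generator-based coroutine built on a generator
# (deprecated and removed in Python 3.11).
@asyncio.coroutine
def old_style_coro():
    yield from asyncio.sleep(1)

# New style: a native coroutine defined with the async def syntax.
async def native_coro():
    await asyncio.sleep(1)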

Even if you don’t understand all the concepts right away, it’s OK. They will become clearer over time when you write and read more and more asynchronous code with the asyncio library.


How to Define a Coroutine Function in Asyncio

Now that the basic concepts have been introduced, we can write our first coroutine function:

async def coro_func():
    print("Hello, asyncio!")

coro_func() is a coroutine function, and when it’s called it will return a coroutine object:

coro_obj = coro_func()

type(coro_obj)
# <class 'coroutine'>

Note that the term coroutine can refer to either a coroutine function or a coroutine object, depending on the context.

As you may have noticed, when the coroutine function is called, the print function isn’t called. If you have worked with generators, you won’t be surprised because it behaves similarly to generator functions:

def gen_func():
    yield "Hello, generator!"

generator = gen_func()
type(generator)
# <class 'generator'>

In order to run the code in a generator, you need to iterate it. For example, you can use the next function to iterate it:

next(generator)
# 'Hello, generator!'

Similarly, to run the code defined in a coroutine function, you need to await it. However, you can’t await it in the same way as you iterate a generator. A coroutine can only be awaited inside another coroutine defined by the async def syntax:

async def coro_func():
    print("Hello, asyncio!")

async def main():
    print("In the entrypoint coroutine.")
    await coro_func()

Now, the question is: how can we run the main() coroutine function? We can’t just wrap it in yet another coroutine function and await it, because that outer coroutine would face the same problem.

For the top-level entry point coroutine function, which is normally named as main(), we need to use asyncio.run() to run it:

import asyncio

async def coro_func():
    print("Hello, asyncio!")

async def main():
    print("In the entrypoint coroutine.")
    await coro_func()

asyncio.run(main())
# In the entrypoint coroutine.
# Hello, asyncio!

We need to import the built-in asyncio library here.

Under the hood, the coroutine is run by an event loop, which asyncio.run() creates and manages for you. With modern Python, you normally don’t need to worry about those details.
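For reference, before asyncio.run() was added in Python 3.7, the event loop had to be managed by hand. A rough sketch of that older pattern looks like this:

import asyncio

async def main():
    print("In the entrypoint coroutine.")

# Pre-Python 3.7 style: create, run and close the event loop manually.
loop = asyncio.new_event_loop()
try:
    loop.run_until_complete(main())
finally:
    loop.close()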


How to Return a Value in a Coroutine Function in Asyncio

We can return a value in a coroutine function. The value is returned with the await command and can be assigned to a variable:

import asyncio

async def coro_func():
    return "Hello, asyncio!"

async def main():
    print("In the entrypoint coroutine.")
    result = await coro_func()
    print(result)

asyncio.run(main())
# In the entrypoint coroutine.
# Hello, asyncio!


How to Run Multiple Coroutines Concurrently in Asyncio

It’s neither much fun nor very useful to have a single coroutine in your code. Coroutines really shine when several of them run concurrently.

Let’s first look at an example where coroutines are awaited incorrectly:

import asyncio
from datetime import datetime

async def async_sleep(num):
    print(f"Sleeping {num} seconds.")
    await asyncio.sleep(num)

async def main():
    start = datetime.now()

    for i in range(1, 4):
        await async_sleep(i)
    
    duration = datetime.now() - start
    print(f"Took {duration.total_seconds():.2f} seconds.")

asyncio.run(main())
# Sleeping 1 seconds.
# Sleeping 2 seconds.
# Sleeping 3 seconds.
# Took 6.00 seconds.

First, we need to use the asyncio.sleep() function in a coroutine function to simulate the I/O blocking time.

Second, the three coroutine objects created are awaited one by one. Since control is only handed back to the next line of code (the next loop iteration here) once the awaited coroutine has completed, the three coroutines effectively run one after another. As a result, the code took six seconds, the same as running it synchronously.

To achieve concurrency, we need to run multiple coroutines with the asyncio.gather() function.

asyncio.gather() is used to run multiple awaitables concurrently. An awaitable is something that can be awaited with the await command. It can be a coroutine, a task, a future or anything that implements the __await__() magic method.

Let’s see the usage of asyncio.gather():

import asyncio
from datetime import datetime

async def async_sleep(num):
    print(f"Sleeping {num} seconds.")
    await asyncio.sleep(num)


async def main():
    start = datetime.now()

    coro_objs = []
    for i in range(1, 4):
        coro_objs.append(async_sleep(i))
    
    await asyncio.gather(*coro_objs)
    
    duration = datetime.now() - start
    print(f"Took {duration.total_seconds():.2f} seconds.")

asyncio.run(main())
# Sleeping 1 seconds.
# Sleeping 2 seconds.
# Sleeping 3 seconds.
# Took 3.00 seconds.

Note that we need to unpack the list of coroutine objects with the * operator when passing it to asyncio.gather().

This time the coroutine objects were run concurrently and the code only took three seconds.

If you check the return type of asyncio.gather(), you will see that it’s a Future object. A Future object is a special data structure representing work that is being done somewhere else and may or may not have been completed yet. When a Future object is awaited, three things can happen:

  1. When the future has been resolved successfully, meaning the underlying work has been completed, it will return immediately with the returned value, if available.
  2. When the future has been resolved unsuccessfully and an exception is raised, the exception will be propagated to the caller.
  3. When the future has not been resolved yet, the code will wait until it’s resolved.
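As a small illustration of the first two cases (the work() coroutine below is made up for the example), asyncio.gather() returns the results in order, and an exception raised by any awaitable propagates to the caller:

import asyncio

async def work(num):
    await asyncio.sleep(0.1)
    if num == 0:
        raise ValueError("num must be positive")
    return num * 2

async def main():
    # Case 1: all awaitables succeed and the results come back in order.
    results = await asyncio.gather(work(1), work(2), work(3))
    print(results)  # [2, 4, 6]

    # Case 2: an exception in one awaitable propagates to the caller.
    try:
        await asyncio.gather(work(1), work(0))
    except ValueError as exc:
        print(f"Caught: {exc}")

asyncio.run(main())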



Async and Aiohttp Example in Asyncio

In the previous example, we wrote some dummy code to demonstrate the basics of asyncio. Now, let’s write some more practical code to further demonstrate the use of asyncio.

We will write code that fetches responses from several web pages concurrently, which is a classic I/O-bound task.

We can’t use our familiar requests library to get the responses here because the requests library does not support asyncio. This is a major limitation of asyncio: many classic Python libraries still don’t support it. However, this will get better over time as more asynchronous libraries become available.

To work around this limitation of the requests library, we need to use the aiohttp library, which is designed for making asynchronous HTTP requests (and more).

We need to install aiohttp first as it’s still an external library:

pip install aiohttp

It’s highly recommended to install new libraries in a virtual environment so that they don’t impact system libraries and you won’t have compatibility issues.

This is the code for using the aiohttp library to perform HTTP requests, which also uses the async with syntax heavily:

import asyncio
import aiohttp

async def scrape_page(session, url):
    print(f"Scraping {url}")
    async with session.get(url) as resp:
        return len(await resp.text())

async def main():
    urls = [
        "https://www.superdataminer.com/posts/66cff907ce8e",
        "https://www.superdataminer.com/posts/f21878c9897",
        "https://www.superdataminer.com/posts/b24dec228c43"
    ]

    coro_objs = []

    async with aiohttp.ClientSession() as session:
        for url in urls:
            coro_objs.append(
                scrape_page(session, url)
            )
    
        results = await asyncio.gather(*coro_objs)

    for url, length in zip(urls, results):
        print(f"{url} -> {length}")

asyncio.run(main())
# Scraping https://www.superdataminer.com/posts/66cff907ce8e
# Scraping https://www.superdataminer.com/posts/f21878c9897
# Scraping https://www.superdataminer.com/posts/b24dec228c43
# https://www.superdataminer.com/posts/66cff907ce8e -> 12873
# https://www.superdataminer.com/posts/f21878c9897 -> 12809
# https://www.superdataminer.com/posts/b24dec228c43 -> 12920

The async with statement makes it possible to perform asynchronous calls when entering or exiting a context. Under the hood, it’s achieved by the async def __aenter__() and async def __aexit__() magic methods, which is a pretty advanced topic. If you are interested, you should first learn more about regular context managers in Python. Normally, though, you don’t need to dive that deep unless you want to create your own asynchronous context managers.
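If you’re curious, a minimal sketch of a custom asynchronous context manager (a made-up AsyncTimer class, just for illustration) could look like this:

import asyncio

class AsyncTimer:
    # A toy asynchronous context manager: both entering and exiting
    # the context can await asynchronous work.
    async def __aenter__(self):
        await asyncio.sleep(0)  # e.g. open a connection asynchronously
        self.start = asyncio.get_running_loop().time()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.sleep(0)  # e.g. close the connection asynchronously
        elapsed = asyncio.get_running_loop().time() - self.start
        print(f"Block took {elapsed:.2f} seconds.")

async def main():
    async with AsyncTimer():
        await asyncio.sleep(1)

asyncio.run(main())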

Aside from the async with syntax, the usage of the aiohttp library is actually very similar to that of the requests library.

In this post, we have introduced the basic concepts of asynchronous programming. With this knowledge, you should be able to read and write basic asynchronous code with the asyncio library and work more comfortably with asynchronous API frameworks like FastAPI.
