Introduction to asynchronous programming

Asynchronous programming is a broad term that describes ways to run code concurrently. Let me give you an introduction to how it can speed things up.

This content was originally meant to be just a talk, entitled “Introduction to asynchronous programming”, at PyCon Canada 2019. However, as it would be a while before a recording was published, I thought it would be beneficial to write a blog post on the same topic.

When I researched how asyncio in Python 3 works, I struggled to differentiate between asynchronous programming and multithreaded programming. Then one of my friends mentioned that asynchronous programming is a broad term. So, what exactly is it?

What is Asynchronous programming?

Asynchronous programming is a way to run code in parallel with the main thread and to notify the main thread when that code finishes running, fails to complete, or is still running. However, as I mentioned earlier, asynchronous programming is a very broad term. People generally confuse it with cooperative multitasking. In this blog post, I will focus only on multithreading (MT) and cooperative multitasking (CM).

For the rest of this blog post, I will refer to multithreading as MT and cooperative multitasking as CM wherever I feel too lazy to type them out in full. Sorry. :P

What is Cooperative multitasking?

Here is the long version.

Cooperative multitasking, also known as non-preemptive multitasking, is a style of computer multitasking in which the operating system never initiates a context switch from a running process to another process. Instead, processes voluntarily yield control periodically or when idle or logically blocked in order to enable multiple applications to be run concurrently. This type of multitasking is called “cooperative” because all programs must cooperate for the entire scheduling scheme to work. In this scheme, the process scheduler of an operating system is known as a cooperative scheduler, having its role reduced down to starting the processes and letting them return control back to it voluntarily. – Wikipedia

By definition, the main difference is that, unlike in the MT architecture, the scheduler never forces a context switch from one running CM task to another; each task gives up control voluntarily. Or at least the context switching should be minimal.

How it works

Since it is pretty difficult to explain here, I will use an example as a guide.

All examples are from https://github.com/shiroyuki/2019-talk-demo-async/tree/master/demo.

# Based on 001-001-base.py
import time

def func_001():
    time.sleep(2)
    return 'r001'

def func_002():
    time.sleep(2)
    return 'r002'

def main():
    p001 = func_001()
    p002 = func_002()

if __name__ == '__main__':
    main()

As you can see, this code will take 4 seconds to finish. But we can make it faster.

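# Based on 001-002-threading.py
import threading
import time
from typing import Dict

def func_001(shared_memory: Dict[str, str], key: str):
    time.sleep(2)
    # A thread target cannot return a value, so write it to the shared dict.
    shared_memory[key] = 'r001'

def func_002(shared_memory: Dict[str, str], key: str):
    time.sleep(2)
    shared_memory[key] = 'r002'

def main():
    shared_memory = dict()
    t1 = threading.Thread(target=func_001,
                          args=(shared_memory, 'p001'))
    t2 = threading.Thread(target=func_002,
                          args=(shared_memory, 'p002'))
    t1.start()
    t2.start()
    # Wait for both threads; the results are then in shared_memory.
    t1.join()
    t2.join()

if __name__ == '__main__':
    main()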

At this point, func_001 and func_002 run at the same time, which reduces the execution time by 50%, from 4 seconds to about 2. However, the code is pretty ugly at this stage because of the setup code and the limitations of Thread, which cannot hand back a return value directly. So, the next question is whether we can simplify this. Yes, we can. Let’s see if ThreadPoolExecutor can do the magic.

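# Based on 001-003-tpe.py
import logging
import time
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

def func_001():
    time.sleep(3)
    return 'r001'

def func_002():
    time.sleep(2)
    return 'r002'

def main():
    with ThreadPoolExecutor() as pool:
        # submit() returns a Future right away; the call runs in the pool.
        p001 = pool.submit(func_001)
        p002 = pool.submit(func_002)
        # result() blocks until the corresponding call has finished.
        logger.debug(f'main: p001 = {p001.result()}')
        logger.debug(f'main: p002 = {p002.result()}')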

While ThreadPoolExecutor makes the code much more manageable and readable, you still need to wait for the results manually with either Future.result() or as_completed(List[Future]).
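To make the second option concrete, here is a minimal sketch of as_completed, which hands back futures in the order they finish rather than the order they were submitted. The helper names slow_task and collect_in_completion_order are made up for this illustration, and the delays are shortened so it runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_task(name, delay):
    # Hypothetical stand-in for func_001 / func_002.
    time.sleep(delay)
    return name


def collect_in_completion_order():
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(slow_task, "slow", 0.3),
            pool.submit(slow_task, "fast", 0.05),
        ]
        # as_completed yields each future as soon as it is done,
        # so the faster task comes out first here.
        return [future.result() for future in as_completed(futures)]


if __name__ == "__main__":
    print(collect_in_completion_order())
```

Either way, the waiting is still something you write by hand.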

Now, let’s rewrite with coroutines.

# Modified
# based on demo/001-004-asyncio.py
import asyncio
import time

async def func_001():
    loop = asyncio.get_running_loop()
    # Run the blocking sleep in the default thread pool.
    await loop.run_in_executor(None, time.sleep, 3)
    return 'r001'

async def func_002():
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, time.sleep, 2)
    return 'r002'

async def main():
    t1 = asyncio.create_task(func_001())
    t2 = asyncio.create_task(func_002())
    p1 = await t1  # main() yields here, so the loop can run t1 and t2.
    p2 = await t2

# asyncio.run() creates the event loop and runs main() in it.
asyncio.run(main())

Now, it is much simpler to follow, just like the base example, while it is still as fast as the MT solutions. But how is it still as fast as using threads?

Event loops and coroutines

In CM, an event loop (NOTE: in this example, it is created and started when you call asyncio.run) manages task scheduling and context switching, and it runs in a single thread. This means it can only run one task at a time. The loop relies on await to suspend and resume scheduled tasks: a task keeps running until it reaches an await that has to wait for something, and only then does control go back to the loop. This means the active task must await in order to yield control to other tasks.
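Here is a small sketch of that hand-over, assuming nothing beyond asyncio itself. The coroutine name worker and the shared log list are made up for this illustration; await asyncio.sleep(0) is used only as a way to yield to the loop without actually waiting.

```python
import asyncio


async def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")
        # Yield control to the event loop so another task can take a turn.
        await asyncio.sleep(0)


async def main():
    log = []
    # gather() wraps both coroutines in tasks and waits for them.
    await asyncio.gather(worker("a", 2, log), worker("b", 2, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['a:0', 'b:0', 'a:1', 'b:1']
```

The two workers take turns only because each of them awaits inside its loop body.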

While you may have more than one event loop, the term “the event loop” only refers to the default event loop created as the result of executing asyncio.run(...).

What happens to CPU-bound code?

As an event loop runs in a single thread, CPU-bound (or otherwise blocking) code will block the loop until that code finishes running. To avoid the blockage, the loop has the method run_in_executor to delegate such code to a thread pool and unblock the loop.
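As a rough sketch of that delegation, the snippet below hands a blocking call to the default thread pool while another coroutine keeps ticking on the loop. The names blocking_work and heartbeat are hypothetical, and time.sleep stands in for whatever blocking code you have.

```python
import asyncio
import time


def blocking_work(seconds):
    # Hypothetical blocking call; it would stall the loop if called directly.
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks, interval):
    count = 0
    for _ in range(ticks):
        await asyncio.sleep(interval)
        count += 1
    return count


async def main():
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    result, beats = await asyncio.gather(
        loop.run_in_executor(None, blocking_work, 0.2),
        heartbeat(3, 0.05),
    )
    return result, beats


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The heartbeat can keep running because the blocking call lives in a worker thread, not in the loop's thread.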

Let’s look a bit deeper into the example.

First, with async def func_001(), type(func_001) is still an ordinary function. However, calling func_001() returns a coroutine object instead of running the body.
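You can check this yourself with a few lines. This is only a sketch: func_001 is a trimmed-down copy without the sleep, and describe is a helper made up for this illustration.

```python
import asyncio
import inspect


async def func_001():
    return 'r001'


def describe():
    coro = func_001()  # Nothing in the body has run yet.
    facts = {
        "function_type": type(func_001).__name__,
        "is_coroutine_function": inspect.iscoroutinefunction(func_001),
        "call_result_type": type(coro).__name__,
    }
    # The body only runs once an event loop drives the coroutine.
    facts["result"] = asyncio.run(coro)
    return facts


if __name__ == "__main__":
    print(describe())
```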

Next, you can see that we intentionally create tasks for each coroutine:

t1 = asyncio.create_task(func_001())
t2 = asyncio.create_task(func_002())
p1 = await t1  # main() yields here, so the loop can run t1 and t2.
p2 = await t2

When we call asyncio.create_task(func_001()), it not only creates a task but also schedules the task to run in the event loop. However, the task will not start running until main() yields control to the event loop at the first await.
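The ordering can be observed with a short sketch like this one, which assumes the default (non-eager) task behaviour. The coroutine child and the log list are made up for this illustration.

```python
import asyncio


async def child(log):
    log.append("child started")


async def main():
    log = []
    task = asyncio.create_task(child(log))  # Scheduled, but not running yet.
    log.append("after create_task")
    await task  # main() yields here, so the loop can finally run child().
    log.append("after await")
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The entry "child started" lands between the other two, which shows that the task only runs once main() reaches its await.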

In Python 3, you can simplify the task-based version of main() by writing:

p1 = await func_001()  # func_001 runs to completion before func_002 is called.
p2 = await func_002()

This is simpler, but the simplicity comes with a tradeoff.

When you await a coroutine directly, Python does not create a separate task for it; the coroutine runs as part of the current task, and the caller is suspended until it finishes. In this particular example, func_002 won’t even be called until func_001 has finished, so the two run one after the other (about 5 seconds instead of 3). Thus, the placement of await and explicit task creation are critical, as context switching is handled by the event loop.
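The tradeoff shows up clearly if you time both styles. The following is a sketch with shortened delays; step, sequential, and concurrent are names made up for this illustration, and asyncio.sleep stands in for real I/O.

```python
import asyncio
import time


async def step(name, delay):
    await asyncio.sleep(delay)
    return name


async def sequential():
    started = time.perf_counter()
    await step("a", 0.2)  # "b" is not even created until "a" is done.
    await step("b", 0.1)
    return time.perf_counter() - started


async def concurrent():
    started = time.perf_counter()
    t1 = asyncio.create_task(step("a", 0.2))
    t2 = asyncio.create_task(step("b", 0.1))
    await t1  # Both tasks make progress while main waits here.
    await t2
    return time.perf_counter() - started


if __name__ == "__main__":
    print(f"sequential: {asyncio.run(sequential()):.2f}s")
    print(f"concurrent: {asyncio.run(concurrent()):.2f}s")
```

The sequential version should take roughly the sum of the delays, while the task-based version should take roughly the longest one.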

In summary…

Aspect	Multithreading	Cooperative multitasking
Function declaration	`def foo()`	`async def foo()`
What calling `foo()` returns	Whatever is in `return`	A coroutine
Scheduler	Operating system (kernel)	A corresponding event loop

Now, why do we care about cooperative multitasking?

While there are many reasons and considerations why you may or may not want to write CM code, I will illustrate just a few: code simplicity, cancellable tasks, and thread (un)safety.

Code simplicity

With PEP 492, we can write cooperative multitasking code as a simple sequence of instructions, yet each instruction can be executed in a much more complicated order based on I/O and the completion of upstream tasks.

In conventional multithreading code, by comparison, developers have to manage threads on their own.

If you look back at the previous set of examples, you will see that the cooperative multitasking code is almost as straightforward as the non-optimized code.

Runaway Threads versus Cancellable Tasks

When you write MT code, it is impossible to stop any active thread within the same process. There are suggestions to use the magical-yet-undocumented Thread._stop(), but it does not work in the way its name suggests. Why is that?

Each thread in Python has a state lock, called _tstate_lock, which is held while the thread is running and released only at the end of the thread’s life. Thread._stop() asserts that Thread._tstate_lock is no longer held. This means Thread._stop() only works if that thread has already stopped.

Fun fact: As of Python 3.7, if you try to call Thread._stop() while the thread is still active, you will get AssertionError with no message.

So, we can’t really stop the thread. But can we stop a (CM) task?

The short answer is “kind of YES”, by using Task.cancel(). Why only “kind of”?

When a task is cancelled, the corresponding coroutine can catch CancelledError, an exception from asyncio, so that the coroutine can run a cleanup or teardown procedure.

In Python 3.7:

If the coroutine is cancelled before running, the exception will not be raised.

If the exception is not caught inside the coroutine, it will bubble up to the parent coroutine.

So, as you can see, the cancellation is not guaranteed. Why? Depending on the implementation, a coroutine may suppress the cancellation and keep running as if nothing had happened. I agree with the official documentation that CancelledError should not be suppressed.
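A minimal sketch of a well-behaved cancellable coroutine could look like this. The names worker and log are made up for this illustration; the key point is that the coroutine cleans up and then re-raises CancelledError instead of swallowing it.

```python
import asyncio


async def worker(log):
    try:
        await asyncio.sleep(10)  # Stand-in for a long-running operation.
    except asyncio.CancelledError:
        log.append("cleanup")  # Teardown goes here.
        raise  # Re-raise so the task really ends up cancelled.


async def main():
    log = []
    task = asyncio.create_task(worker(log))
    await asyncio.sleep(0)  # Let worker() start and reach its await.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append("cancelled")
    return log, task.cancelled()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If worker() dropped the raise, the task would finish normally and task.cancelled() would report False, which is exactly the suppression described above.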

Can you stop a future (concurrent.futures.Future) from ThreadPoolExecutor? The answer is the same as the one about Thread._stop(): once the call is running, it cannot be stopped.

Fun fact: From many discussions with lots of people, observation, and research, killing an active thread is generally discouraged in several programming languages, as it could leave your program in an undesirable state, with problems such as broken memory management, unrecovered memory, and deadlocks.
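To see what Future.cancel() can and cannot do, here is a small sketch with a single-worker pool, so that one call is running and another is still queued. The names demo and job are made up for this illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def demo():
    started = threading.Event()
    release = threading.Event()

    def job():
        started.set()
        release.wait()  # Stay busy until the main thread lets go.
        return "finished"

    with ThreadPoolExecutor(max_workers=1) as pool:
        running = pool.submit(job)
        started.wait()  # The first job is now executing.
        queued = pool.submit(job)  # Stuck behind it in the queue.
        cancelled_running = running.cancel()  # Too late to cancel.
        cancelled_queued = queued.cancel()  # Still pending, so this works.
        release.set()
        return cancelled_running, cancelled_queued, running.result()


if __name__ == "__main__":
    print(demo())  # (False, True, 'finished')
```

In other words, cancel() only removes work that has not started yet; the running call carries on to the end.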

Thread safety

Let’s start with the definition of thread safety.

Thread safety is a computer programming concept applicable to multi-threaded code. Thread-safe code only manipulates shared data structures in a manner that ensures that all threads behave properly and fulfill their design specifications without unintended interaction. There are various strategies for making thread-safe data structures.

A program may execute code in several threads simultaneously in a shared address space where each of those threads has access to virtually all of the memory of every other thread. Thread safety is a property that allows code to run in multithreaded environments by re-establishing some of the correspondences between the actual flow of control and the text of the program, by means of synchronization.

– Wikipedia

Why am I mentioning this? As stated in the official documentation, asyncio is NOT thread-safe. Here is an example: BaseEventLoop.call_soon from asyncio.base_events (from a 2019-11-22 snapshot of Python 3.8).

728      def call_soon(self, callback, *args, context=None):
729          """Arrange for a callback to be called as soon as possible.
730
731          This operates as a FIFO queue: callbacks are called in the
732          order in which they are registered.  Each callback will be
733          called exactly once.
734
735          Any positional arguments after the callback will be passed to
736          the callback when it is called.
737          """
738          self._check_closed()
739          if self._debug:
740              self._check_thread()
741              self._check_callback(callback, 'call_soon')
742          handle = self._call_soon(callback, args, context)
743          if handle._source_traceback:
744              del handle._source_traceback[-1]
745          return handle

757      def _call_soon(self, callback, args, context):
758          handle = events.Handle(callback, args, self, context)
759          if handle._source_traceback:
760              del handle._source_traceback[-1]
761          self._ready.append(handle)
762          return handle

In lines 743-744 and 759-760, you will see that handle._source_traceback is prone to a race condition, and thus call_soon is not thread-safe.
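When a callback really has to be scheduled from another thread, asyncio offers loop.call_soon_threadsafe as the thread-safe counterpart of call_soon. The sketch below shows one way to use it; the worker function and the done future are made up for this illustration.

```python
import asyncio
import threading


async def main():
    loop = asyncio.get_running_loop()
    done = loop.create_future()

    def worker():
        # Runs in a separate thread, so it must not call loop.call_soon
        # or done.set_result directly.
        loop.call_soon_threadsafe(done.set_result, "from thread")

    thread = threading.Thread(target=worker)
    thread.start()
    result = await done  # Resumed by the callback the thread scheduled.
    thread.join()
    return result


if __name__ == "__main__":
    print(asyncio.run(main()))
```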

So, should you write asynchronous code?

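Generally, you should avoid concurrency as much as you can. However, when you need to speed up your code, writing asynchronous code is usually one way to do it. But out of what I introduced in this post, which approach should you choose?

Multithreading is generally a good approach if your code is CPU-intensive, although in CPython the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so the gain depends on how much of the work releases the GIL.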

Cooperative multitasking is good for a few situations.

For example, your code needs to be more responsive. Without context switching between threads, your code does not have to sacrifice CPU time to switch between tasks. Also, as the event loop runs in a single thread, your code tends to use less memory.

Fun fact: While threads share heap memory, each thread has its own stack memory.

Also, writing CM code may be suitable if you can tolerate occasional blockages in the event loop caused by not-so-intense CPU-bound code.

While writing a CM app seems cool or trendy, it is more difficult to design a CM application that runs as fast as a multithreaded app. If some of your coroutines either never yield control back to the event loop or have inefficient await placements, your code is only as good as normal sequential code.

Fun fact: In asyncio, when you create tasks from coroutines (with asyncio.create_task), all tasks are scheduled right away. This means that, as soon as your code starts awaiting one of the tasks, the other tasks will be executed.
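As a sketch of the “never yields” case, the coroutine below calls the blocking time.sleep and contains no await at all, so the event loop cannot switch away from it. The name greedy and the log list are made up for this illustration.

```python
import asyncio
import time


async def greedy(name, log):
    for i in range(2):
        time.sleep(0.01)  # Blocking call: the event loop is stuck here.
        log.append(f"{name}:{i}")


async def main():
    log = []
    await asyncio.gather(greedy("a", log), greedy("b", log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['a:0', 'a:1', 'b:0', 'b:1']
```

Task "a" runs from start to finish before "b" gets a turn, exactly as plain sequential code would. Swapping time.sleep for await asyncio.sleep would let the two take turns.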

If you can live with that, welcome to the party.

Disclaimer, copyright, and acknowledgements

I’m not a core developer. This post is based on my research as part of a technology evaluation for a collaborative project with my friend.
Definitions are derived from the documentation published by Microsoft and Wikipedia. The missing links to the source materials will be posted later.
I would like to thank the many people who helped me review the actual presentation at PyCon Canada 2019.
The source code in this post belongs to the public domain, except the code from CPython, which has its own license. The examples here are modified versions of the ones at https://github.com/shiroyuki/2019-talk-demo-async. Please feel free to play around.
Slide: https://www.slideshare.net/jutinoppornpitak/pyconcanada-2019-introduction-to-asynchronous-programming