Multithreading and multiprocessing are two ways to achieve multitasking in Python. Multitasking is useful for running functions and code concurrently or in parallel, such as breaking a mathematical computation into multiple smaller parts or splitting the iterations of a for loop when they are independent of each other.

Multithreading vs. Multiprocessing Defined

  • Multithreading: This refers to the ability of a program to run multiple threads concurrently within a single process, where the threads share that process’s memory.
  • Multiprocessing: This refers to the ability of a system to run multiple processes in parallel, where each process has its own memory and can run one or more threads.

This article compares multithreading and multiprocessing, explains when to use each method and shows how to implement them in Python.

Here’s what we’ll cover:

  1. Multithreading vs. multiprocessing
  2. Multithreading as a Python function
  3. Multithreading as a Python class
  4. Multiprocessing as a Python function

 

Multithreading vs. Multiprocessing

Multithreading refers to running multiple threads concurrently within a single process. A thread is the smallest unit of execution, and all threads of a process share that process’s memory.

Multiprocessing refers to running multiple processes in parallel, typically on separate CPU cores. Each process has its own memory and can run one or more threads.

Multithreading vs. Multiprocessing illustration. | Image by author

From the diagram above, we can see that in multithreading (middle diagram), multiple threads share the same code, data and files, but each thread has its own registers and stack. Multiprocessing (the right diagram) replicates a single process, duplicating the code, data and files, which incurs more overhead.
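
The memory-sharing difference can be sketched in a few lines. This is only an illustration: the names counter, increment and run_demo are invented for the sketch and are not part of the article's examples. A thread's change to a module-level variable is visible to the main program, while a child process only changes its own copy.

```python
import multiprocessing
import threading

counter = 0  # module-level data, used to show what is shared (hypothetical name)


def increment():
    """Add 1 to the module-level counter of whichever process runs this."""
    global counter
    counter += 1


def run_demo():
    """Return the parent's counter after a thread, then a process, increments it."""
    global counter
    counter = 0

    # A thread lives inside this process, so it updates the same variable.
    t = threading.Thread(target=increment)
    t.start()
    t.join()
    after_thread = counter

    # A child process increments its own copy; the parent's value is untouched.
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()
    after_process = counter

    return after_thread, after_process


if __name__ == "__main__":
    after_thread, after_process = run_demo()
    print(f"After thread: {after_thread}, after process: {after_process}")
    # After thread: 1, after process: 1
```

The parent sees the counter go from 0 to 1 after the thread runs, and it stays at 1 after the process runs, because the child process incremented a separate copy.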


Multithreading vs. Multiprocessing Advantages and Disadvantages

Multithreading is useful for IO-bound tasks, such as reading files from a network or database, since each thread can wait on its IO concurrently. Multiprocessing is useful for CPU-bound tasks, such as computationally heavy work, since it benefits from having multiple processor cores, similar to how multicore computers work faster than computers with a single core.

There is a difference between concurrency and parallelism. Parallelism allows multiple tasks to execute at the same time, whereas concurrency allows multiple tasks to make progress by executing one at a time in an interleaved manner.

Due to the Python global interpreter lock (GIL), only one thread can execute Python bytecode at a time in the standard CPython interpreter. Therefore, multithreading achieves concurrency but not parallelism. It still helps IO-bound tasks because a thread releases the GIL while it waits on IO. On the other hand, multiprocessing achieves parallelism, since each process has its own interpreter and its own GIL.

Using multithreading for CPU-bound tasks might slow down performance: the threads compete for the GIL, so only one can execute at a time, and overhead is incurred in switching between them.
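
To see this effect on your own machine, a rough timing sketch like the one below can help. The function names (sum_squares, run_sequential, run_threaded) and the workload size are assumptions chosen for illustration. Exact timings vary by machine and Python version, so no expected numbers are shown.

```python
import threading
import time


def sum_squares(n, results, index):
    """CPU-bound pure-Python work; stores its answer in results[index]."""
    results[index] = sum(i * i for i in range(n))


def run_sequential(n, repeats):
    """Run the work one call after another; return (elapsed seconds, results)."""
    results = [None] * repeats
    start = time.perf_counter()
    for index in range(repeats):
        sum_squares(n, results, index)
    return time.perf_counter() - start, results


def run_threaded(n, repeats):
    """Run the same work with one thread per call; return (elapsed seconds, results)."""
    results = [None] * repeats
    threads = [
        threading.Thread(target=sum_squares, args=(n, results, index))
        for index in range(repeats)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results


if __name__ == "__main__":
    n = 2_000_000  # arbitrary workload size
    seq_time, seq_results = run_sequential(n, 2)
    thr_time, thr_results = run_threaded(n, 2)
    print(f"Sequential: {seq_time:.2f}s")
    print(f"Threaded:   {thr_time:.2f}s")
    print(f"Same results: {seq_results == thr_results}")
```

On an interpreter with the GIL, the threaded run is usually no faster than the sequential one for this kind of workload, even though both produce the same results.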

On the other hand, multiprocessing can be used for IO-bound processes. However, the overhead for managing multiple processes is higher than managing multiple threads as illustrated above. You may notice that multiprocessing might lead to higher CPU utilization due to multiple CPU cores being used by the program, which is expected.

 

Multithreading as a Python Function

Multithreading can be implemented using the threading module from the Python standard library and is done in the following order:

  1. Create thread: Each thread is tagged to a Python function with its arguments.
  2. Start task execution.
  3. Wait for the thread to complete execution: Useful to ensure completion or ‘checkpoints.’

In the code snippet below, the steps above are implemented, together with a threading lock (thread_lock) to handle competing resources, which is optional in our case.

import os
import threading
import time


def task_sleep(sleep_duration, task_number, lock):
    lock.acquire()
    # Perform operation that require a common data/resource
    lock.release()

    time.sleep(sleep_duration)
    print(f"Task {task_number} done (slept for {sleep_duration}s)! "
          f"Main thread: {threading.main_thread().name}, "
          f"Current thread: {threading.current_thread().name}, "
          f"Process ID: {os.getpid()}")


if __name__ == "__main__":
    time_start = time.time()

    # Create lock (optional)
    thread_lock = threading.Lock()

    # Create thread
    t1 = threading.Thread(target=task_sleep, args=(2, 1, thread_lock))
    t2 = threading.Thread(target=task_sleep, args=(2, 2, thread_lock))

    # Start task execution
    t1.start()
    t2.start()

    # Wait for thread to complete execution
    t1.join()
    t2.join()

    time_end = time.time()
    print(f"Time elapsed: {round(time_end - time_start, 2)}s")

    # Task 2 done (slept for 2s)! Main thread: MainThread, Current thread: Thread-67, Process ID: 6068
    # Task 1 done (slept for 2s)! Main thread: MainThread, Current thread: Thread-66, Process ID: 6068
    # Time elapsed: 2.03s

There are a few notable observations:

  • Printed output: The tasks run on different threads (thread name) but within the same process (process ID).
  • Lock in task_sleep: If time.sleep(sleep_duration) were called between acquiring and releasing the lock instead, the threads would run sequentially and there would not be any time savings.
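
The second observation can be checked with a small experiment. The sketch below is illustrative only: the helper names (sleep_outside_lock, sleep_inside_lock, timed_run) are made up here, and it uses the with statement as an equivalent of calling lock.acquire() and lock.release().

```python
import threading
import time


def sleep_outside_lock(duration, lock):
    with lock:
        pass  # touch the shared resource only briefly
    time.sleep(duration)  # the wait happens without holding the lock


def sleep_inside_lock(duration, lock):
    with lock:
        time.sleep(duration)  # other threads are blocked during this wait


def timed_run(task, duration, n_threads=2):
    """Run task in n_threads threads sharing one lock; return elapsed seconds."""
    lock = threading.Lock()
    threads = [
        threading.Thread(target=task, args=(duration, lock))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"Sleep outside lock: {timed_run(sleep_outside_lock, 1):.2f}s")
    print(f"Sleep inside lock:  {timed_run(sleep_inside_lock, 1):.2f}s")
```

With two threads sleeping for one second each, the first run should take roughly one second because the sleeps overlap, while the second takes at least two seconds because the lock forces the sleeps to happen one after another.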

 

Multithreading as a Python Class

For users who prefer object-oriented programming, multithreading can be implemented as a Python class that inherits from the threading.Thread superclass. One benefit of using classes instead of functions is the ability to share variables via class objects.

The difference between implementing multithreading as a function rather than as a class is in the first step, creating the thread, since the thread is now tagged to a class method instead of a function. The subsequent steps to call t1.start() and t1.join() remain the same.

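import threading
import time


class Sleep(threading.Thread):
    def __init__(self, sleep_duration):
        # Initialise the Thread superclass before adding attributes
        super().__init__()
        self.sleep_duration = sleep_duration

    def sleep(self):
        time.sleep(self.sleep_duration)


if __name__ == "__main__":
    # Create thread, tagged to a method of the class instance
    sleep_class = Sleep(2)
    t1 = threading.Thread(target=sleep_class.sleep)

    # Start task execution
    t1.start()

    # Wait for thread to complete execution
    t1.join()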
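
A common alternative, shown here as a sketch rather than as the approach used above, is to put the work in an overridden run() method so that the subclass instance is itself the thread. The class name SleepThread and the slept attribute are hypothetical. The attribute illustrates how a result can be shared through the object once the thread has finished.

```python
import threading
import time


class SleepThread(threading.Thread):
    """Thread subclass: start() runs the overridden run() in a new thread."""

    def __init__(self, sleep_duration):
        super().__init__()  # set up the Thread machinery first
        self.sleep_duration = sleep_duration
        self.slept = None  # filled in by run(), readable after join()

    def run(self):
        time.sleep(self.sleep_duration)
        self.slept = self.sleep_duration


if __name__ == "__main__":
    # Create thread: the instance is the thread, no target argument needed
    t1 = SleepThread(2)

    # Start task execution and wait for it to complete
    t1.start()
    t1.join()

    print(f"Thread reported sleeping for {t1.slept}s")
    # Thread reported sleeping for 2s
```

Calling start() rather than run() matters here: start() creates the new thread and then invokes run() inside it, whereas calling run() directly would execute the work in the calling thread.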

Multiprocessing as a Python Function

Multiprocessing can be implemented with the multiprocessing module from the Python standard library using two different methods: Process and Pool.

The Process method is similar to the multithreading method above, where each process is tagged to a function with its arguments. In the code snippet below, we can see that the time taken is longer for multiprocessing than for multithreading, since there is more overhead in starting and managing multiple processes.

import multiprocessing
import os
import time


def task_sleep(sleep_duration, task_number):
    time.sleep(sleep_duration)
    print(f"Task {task_number} done (slept for {sleep_duration}s)! "
          f"Process ID: {os.getpid()}")


if __name__ == "__main__":
    time_start = time.time()

    # Create process
    p1 = multiprocessing.Process(target=task_sleep, args=(2, 1))
    p2 = multiprocessing.Process(target=task_sleep, args=(2, 2))

    # Start task execution
    p1.start()
    p2.start()

    # Wait for process to complete execution
    p1.join()
    p2.join()

    time_end = time.time()
    print(f"Time elapsed: {round(time_end - time_start, 2)}s")

    # Task 1 done (slept for 2s)! Process ID: 11544
    # Task 2 done (slept for 2s)! Process ID: 23724
    # Time elapsed: 2.81s

The Pool method allows users to define the number of worker processes and distributes all tasks to available workers in a first-in-first-out schedule, handling process scheduling automatically. The Pool method is used to break a job into multiple small parts using map or starmap (the pool.starmap call below), running the same function with different input arguments, whereas the Process method is used to run different functions.

import multiprocessing
import os
import time


def task_sleep(sleep_duration, task_number):
    time.sleep(sleep_duration)
    print(f"Task {task_number} done (slept for {sleep_duration}s)! "
          f"Process ID: {os.getpid()}")


if __name__ == "__main__":
    time_start = time.time()

    # Create pool of workers (always at least one)
    num_cpu = max(1, multiprocessing.cpu_count() - 1)
    pool = multiprocessing.Pool(processes=num_cpu)

    # Map pool of workers to process (blocks until all tasks finish)
    pool.starmap(func=task_sleep, iterable=[(2, 1)] * 10)

    # Stop accepting new tasks, then wait for the workers to exit
    pool.close()
    pool.join()

    time_end = time.time()
    print(f"Time elapsed: {round(time_end - time_start, 2)}s")

    # Task 1 done (slept for 2s)! Process ID: 20464
    # Task 1 done (slept for 2s)! Process ID: 22308
    # Task 1 done (slept for 2s)! Process ID: 20464
    # Task 1 done (slept for 2s)! Process ID: 22308
    # Task 1 done (slept for 2s)! Process ID: 20464
    # Task 1 done (slept for 2s)! Process ID: 22308
    # Task 1 done (slept for 2s)! Process ID: 20464
    # Task 1 done (slept for 2s)! Process ID: 22308
    # Task 1 done (slept for 2s)! Process ID: 20464
    # Task 1 done (slept for 2s)! Process ID: 20464
    # Time elapsed: 12.58s
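
As a further illustration of splitting one computation into smaller parts, the sketch below uses pool.map to sum squares over several ranges and then combines the partial results. The function names, the chunking scheme and the default worker count are assumptions made for this example. It also uses the pool as a context manager, which cleans up the workers once the with block exits.

```python
import multiprocessing


def sum_of_squares(bounds):
    """Sum i*i for i in [start, stop); one chunk of the overall job."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, num_workers=2, num_chunks=4):
    """Split range(n) into chunks, process them in a pool, combine the results."""
    step = max(1, -(-n // num_chunks))  # ceiling division, at least 1
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]

    # map blocks until every chunk has been processed and returns results in order
    with multiprocessing.Pool(processes=num_workers) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)

    return sum(partial_sums)


if __name__ == "__main__":
    n = 1_000_000
    total = parallel_sum_of_squares(n)
    print(f"Matches single-process result: {total == sum(i * i for i in range(n))}")
```

Unlike the sleep example, each task here returns a value, and map collects those values in the same order as the input chunks.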

The Python examples are skeleton code snippets that you can replace with your functions and you’re good to go.  
