Python Concurrency in Practice: When to Use Threads, Processes, or Asyncio

A practical guide to choosing between multithreading, multiprocessing, and async code in real backend systems.

Introduction: Why Concurrency Exists

Modern backend systems rarely do one thing at a time.

A web service may need to handle many requests, call external APIs, query a database, read from object storage, publish messages to a queue, and run background jobs. If each task had to fully finish before the next one started, the system would waste a lot of time waiting.

That is the problem concurrency tries to solve.

Concurrency is not automatically about making code faster. More often, it is about making better use of time. If one task is waiting for a database response, the program can make progress on something else. If one request is waiting for a network call, another request can be handled in the meantime.

But not all slow programs are slow for the same reason.

Some workloads are I/O-bound. They spend most of their time waiting on external systems such as databases, APIs, files, sockets, or queues. For these workloads, concurrency can improve throughput because the program can switch to other work while waiting.

Other workloads are CPU-bound. They spend most of their time calculating: parsing large datasets, processing images, running simulations, transforming data, or performing model inference. For these workloads, the goal is often true parallel execution across multiple CPU cores.

This distinction matters in Python because multithreading, multiprocessing, and asyncio are not interchangeable tools. They solve different problems and come with different trade-offs.

A common engineering mistake is to choose a concurrency model based on popularity rather than workload. Threads are not always faster. Async code is not automatically more scalable. Multiprocessing is powerful, but it adds memory and communication overhead.

The practical question is not:

Which one is best?

The better question is:

What kind of waiting or work is my system actually doing?

That answer should drive the choice between threads, processes, and async code.

Concurrency vs Parallelism

Concurrency and parallelism are related, but they are not the same thing.

Concurrency means a system can manage multiple tasks during the same period of time. The tasks may not literally run at the exact same instant. Instead, the program can switch between them, allowing progress to happen while other tasks are waiting.

For example, a backend service might start a database query, then handle another request while waiting for the database response. The service is not necessarily doing both pieces of work at the exact same CPU moment, but it is making progress on multiple tasks within the same time window.

Parallelism means multiple tasks are actually executing at the same time. This usually requires multiple CPU cores. For example, one process could parse a large file while another process transforms a dataset at the same time on a different core.

A simple way to think about it:

Concurrency is about dealing with many tasks. Parallelism is about doing many tasks at the same time.

This distinction matters because different Python tools are designed for different kinds of work.

Multithreading and asyncio are often useful for concurrent I/O-bound workloads, where the program spends time waiting on external systems. Multiprocessing is usually better for parallel CPU-bound workloads, where the program needs to use multiple cores for computation.

The mistake is assuming that any concurrency tool will speed up any slow program. If the program is slow because it is waiting on network calls, concurrency can help. If the program is slow because one CPU core is overloaded with computation, the solution is usually parallelism, not just more concurrent scheduling.

That is why the first engineering question should be:

Is this workload mostly waiting, or mostly calculating?

That answer determines whether threads, processes, or async code is the right starting point.
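One rough way to answer that question is to compare CPU time with wall-clock time for a piece of work. The sketch below is only an illustration: `cpu_fraction`, `simulated_io`, and `simulated_compute` are hypothetical names, and the sleep and the arithmetic loop are stand-ins for real I/O and real computation. A ratio near zero suggests the code is mostly waiting; a ratio near one suggests it is mostly calculating.

```python
import time
from typing import Callable


def cpu_fraction(func: Callable[[], object]) -> float:
    """Return CPU time divided by wall-clock time for one call of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def simulated_io() -> None:
    # Hypothetical stand-in for a network or database call: the process just waits.
    time.sleep(0.2)


def simulated_compute() -> int:
    # Hypothetical stand-in for CPU-heavy work: the process is busy the whole time.
    return sum(i * i for i in range(2_000_000))


io_ratio = cpu_fraction(simulated_io)
cpu_ratio = cpu_fraction(simulated_compute)
print(f"I/O-like task: {io_ratio:.2f} of wall time spent on the CPU")
print(f"CPU-like task: {cpu_ratio:.2f} of wall time spent on the CPU")
```

Note that `time.process_time()` measures the whole current process and does not include child processes, so this is a quick local check rather than a replacement for a profiler.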

Multithreading: Useful, but Limited by the GIL

Multithreading lets a program run multiple threads inside the same process.

A thread is a unit of execution inside a process, and it shares memory with the other threads in that process. This makes threads lightweight compared with processes, and convenient when tasks need access to the same application state, although that shared state then has to be protected against race conditions.

In backend systems, threads are often useful when the program spends time waiting.

For example, imagine a service that needs to call five external APIs. If each request is made one after another, the total time is roughly the sum of all five waiting times. With threads, the service can start multiple requests at once and wait for them together.

This works well because network calls, database queries, file reads, and other I/O operations spend much of their time outside Python execution. While one thread is waiting, another thread can make progress.
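The "sum of the waits" versus "wait for them together" difference is easy to see with a simulated call. This is a minimal sketch, assuming each API call takes about 0.1 seconds; `simulated_api_call` is a hypothetical stand-in that only sleeps, so no real network is involved.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_api_call(call_id: int) -> int:
    # Hypothetical stand-in for a blocking network call.
    time.sleep(0.1)
    return call_id


def run_sequential(call_ids: list[int]) -> tuple[list[int], float]:
    """Run the calls one after another and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [simulated_api_call(call_id) for call_id in call_ids]
    return results, time.perf_counter() - start


def run_threaded(call_ids: list[int]) -> tuple[list[int], float]:
    """Run the calls in a thread pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(call_ids)) as pool:
        results = list(pool.map(simulated_api_call, call_ids))
    return results, time.perf_counter() - start


call_ids = [1, 2, 3, 4, 5]
sequential_results, sequential_seconds = run_sequential(call_ids)
threaded_results, threaded_seconds = run_threaded(call_ids)
print(f"sequential: {sequential_seconds:.2f}s, threaded: {threaded_seconds:.2f}s")
```

The sequential run should take roughly five times the single-call delay, while the threaded run should take close to one delay, because the waits overlap.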

The limitation is the Global Interpreter Lock, usually called the GIL.

In standard CPython, the GIL allows only one thread to execute Python bytecode at a time. That means multiple Python threads cannot run CPU-heavy Python code in true parallel on multiple cores. Native extensions can release the GIL during long-running operations, so the limit applies mainly to pure-Python computation. CPython has offered an optional free-threaded build since Python 3.13, but the GIL remains an important practical consideration for most existing Python systems and libraries.

This is why multithreading is usually a poor choice for speeding up CPU-heavy Python code. If several threads are all trying to perform heavy calculations, they may end up taking turns under the GIL rather than executing in parallel.
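The effect can be checked with a small experiment. The sketch below runs the same pure-Python calculation sequentially and in a thread pool and prints both timings; `count_primes` is a hypothetical, deliberately naive workload chosen only to keep the CPU busy. On a GIL-based build the two timings are typically similar; on a free-threaded build they may differ. The `sys._is_gil_enabled()` check exists only on Python 3.13 and later, so the code falls back to assuming the GIL is enabled.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit: int) -> int:
    """Count primes below limit using deliberately simple trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


limits = [20_000] * 4

start = time.perf_counter()
sequential = [count_primes(limit) for limit in limits]
sequential_seconds = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(count_primes, limits))
threaded_seconds = time.perf_counter() - start

# sys._is_gil_enabled() is only available on Python 3.13+; assume True otherwise.
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
print(f"GIL enabled: {gil_enabled}")
print(f"sequential: {sequential_seconds:.2f}s, threaded: {threaded_seconds:.2f}s")
```

The results are identical either way; only the elapsed time tells you whether the threads actually ran in parallel.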

However, this does not make threads useless.

Threads are still effective when the bottleneck is waiting rather than calculating, because CPython releases the GIL while a thread is blocked on I/O.

| Good fit for threads | Poor fit for threads |
| --- | --- |
| Blocking API calls | Heavy numerical loops in Python |
| Database queries using blocking drivers | CPU-heavy data transformations |
| File or object storage operations | Image processing in pure Python |
| Background I/O tasks | Work that needs clean multi-core CPU scaling |

A practical backend example is a worker that downloads files from object storage, calls an external API, and writes results to a database. If most of the time is spent waiting on network or storage responses, a thread pool can improve throughput without requiring the whole application to be rewritten as async code.

from concurrent.futures import ThreadPoolExecutor
import requests

def fetch_user_profile(user_id: int) -> dict:
    response = requests.get(
        f"https://api.example.com/users/{user_id}",
        timeout=5,
    )
    response.raise_for_status()
    return response.json()

user_ids = [101, 102, 103, 104, 105]

with ThreadPoolExecutor(max_workers=5) as pool:
    profiles = list(pool.map(fetch_user_profile, user_ids))

This is useful when requests.get() spends most of its time waiting for the network. It would be much less useful if fetch_user_profile() were doing heavy Python computation instead.
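One practical detail with `pool.map()` is that an exception from any call is re-raised while the results are being collected, so a single failed request discards the others. A possible variation, sketched below, uses `submit()` and `as_completed()` to keep successes and failures separate. `fetch_profile_stub` is a hypothetical stand-in for the real HTTP call, and the failing user ID is an assumption made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_profile_stub(user_id: int) -> dict:
    """Hypothetical stand-in for a real HTTP call; fails for one user ID."""
    if user_id == 103:
        raise RuntimeError(f"upstream error for user {user_id}")
    return {"id": user_id}


def fetch_many(user_ids: list[int]) -> tuple[dict[int, dict], dict[int, Exception]]:
    """Fetch all profiles, collecting results and errors per user ID."""
    profiles: dict[int, dict] = {}
    errors: dict[int, Exception] = {}
    with ThreadPoolExecutor(max_workers=5) as pool:
        # Map each future back to the user ID it was submitted for.
        future_to_id = {pool.submit(fetch_profile_stub, uid): uid for uid in user_ids}
        for future in as_completed(future_to_id):
            uid = future_to_id[future]
            try:
                profiles[uid] = future.result()
            except Exception as exc:  # keep going; record the failure
                errors[uid] = exc
    return profiles, errors


profiles_by_id, errors_by_id = fetch_many([101, 102, 103, 104, 105])
print(f"fetched: {sorted(profiles_by_id)}, failed: {sorted(errors_by_id)}")
```

Whether to collect errors or fail fast is a design choice; collecting them tends to suit batch workers, while failing fast may suit a request that cannot succeed without every result.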

The key point is:

Threads help when Python is waiting. They usually do not help much when Python is calculating.

That distinction is the reason multiprocessing exists.

Multiprocessing: Real Parallelism for CPU-Bound Work

Multiprocessing is Python’s usual answer when the workload needs real CPU parallelism.

Instead of running multiple threads inside one process, multiprocessing runs work in separate Python processes. Each process has its own Python interpreter and memory space. This matters because separate processes can run on different CPU cores at the same time.

That makes multiprocessing a strong fit for CPU-bound work.

| Good fit for multiprocessing | Poor fit for multiprocessing |
| --- | --- |
| Parsing large files | Small tasks where startup overhead dominates |
| Image or video processing | Tasks that constantly share state |
| CPU-heavy data transformations | Simple database or API calls |
| Batch feature extraction | Workloads with large objects passed between workers |
| CPU-based model inference | Low-latency tasks that need fast startup |

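The advantage is that each process has its own interpreter and therefore its own GIL, so one process never blocks another from running Python code. That is why multiprocessing is often used when Python code needs to make better use of multiple CPU cores in GIL-based Python builds.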

The trade-off is overhead.

Processes are heavier than threads. Starting them takes more time. Each process has separate memory. Data passed between processes usually needs to be serialized, sent across a process boundary, and reconstructed on the other side. With ProcessPoolExecutor, this means the function, its arguments, and its return values all need to be picklable.
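The serialization cost can be inspected before committing to a process pool. The sketch below uses the standard `pickle` module to check whether an object can cross a process boundary and roughly how many bytes it would occupy. `is_picklable`, `pickled_size`, and the sample payloads are hypothetical helpers and data introduced for illustration.

```python
import pickle
import threading


def is_picklable(obj: object) -> bool:
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def pickled_size(obj: object) -> int:
    """Return the number of bytes pickle needs to serialize obj."""
    return len(pickle.dumps(obj))


small_payload = {"doc_id": 1, "text": "short"}
large_payload = {"doc_id": 2, "text": "x" * 1_000_000}

print("plain dict picklable:", is_picklable(small_payload))
print("lambda picklable:", is_picklable(lambda text: text.upper()))
print("lock picklable:", is_picklable(threading.Lock()))
print("bytes to send:", pickled_size(small_payload), "vs", pickled_size(large_payload))
```

Plain data such as dicts, lists, and strings serializes without trouble, whereas lambdas and objects holding locks generally do not. Large payloads are copied for every task, which is one reason to pass file paths or identifiers to workers rather than the data itself where that is an option.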

That overhead is worth paying when each unit of work is large enough.

For example, multiprocessing can make sense if a backend worker receives a batch of documents and needs to extract features, run CPU-heavy transformations, or perform expensive validation. Each process can take part of the batch and use a separate core.

from concurrent.futures import ProcessPoolExecutor

def count_tokens(document: str) -> int:
    # Stand-in for a CPU-heavy parsing or feature extraction step.
    # str.split() with no arguments never yields empty tokens, so no filter is needed.
    return len(document.split())

documents = [
    "large document text ...",
    "another large document ...",
    "more text to process ...",
]

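if __name__ == "__main__":
    # The guard stops worker processes from re-running this block when they
    # import the module (required with the spawn start method, e.g. on Windows and macOS).
    with ProcessPoolExecutor() as pool:
        token_counts = list(pool.map(count_tokens, documents))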

In a real backend pipeline, each process might parse a document, transform a large payload, extract features, or run a CPU-heavy validation step.
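When individual items are small, the per-task overhead can be spread out by sending batches of items to each worker instead of one item at a time. The sketch below shows one way to do that; `make_batches`, `count_tokens_in_batch`, `count_tokens_parallel`, and the batch size of 1,000 are assumptions for illustration, not tuned values.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Optional


def count_tokens_in_batch(batch: list[str]) -> list[int]:
    """Process a whole batch inside one worker task."""
    return [len(document.split()) for document in batch]


def make_batches(items: list[str], batch_size: int) -> list[list[str]]:
    """Split items into consecutive lists of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def count_tokens_parallel(
    documents: list[str],
    batch_size: int = 1_000,
    max_workers: Optional[int] = None,
) -> list[int]:
    """Count tokens per document, one batch per worker task, preserving order."""
    batches = make_batches(documents, batch_size)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        per_batch = pool.map(count_tokens_in_batch, batches)
        return [count for batch_counts in per_batch for count in batch_counts]


if __name__ == "__main__":
    sample_documents = ["one two three", "four five"] * 5_000
    counts = count_tokens_parallel(sample_documents, batch_size=1_000)
    print(len(counts), sum(counts))
```

`ProcessPoolExecutor.map()` also accepts a `chunksize` argument that groups items per task in a similar way; explicit batches are shown here only because they make the unit of work visible. The right batch size depends on the workload and is best found by measuring.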

But multiprocessing may be the wrong tool for lightweight I/O work. If a service is mostly waiting for API responses or database queries, creating multiple processes may add complexity without improving the real bottleneck.

A useful rule is:

Use multiprocessing when the work is expensive enough that parallel CPU execution is worth the overhead.

For backend and AI systems, this often means using multiprocessing in workers, batch jobs, offline pipelines, or CPU-heavy preprocessing steps rather than inside every web request path.

Multiprocessing is powerful, but it is not free. It works best when the tasks are large, mostly independent, and expensive enough to justify separate processes.

Asyncio: High-Concurrency I/O Without Many Threads

asyncio is Python’s main model for asynchronous programming.

Instead of creating many threads or processes, asyncio runs tasks on an event loop. The event loop keeps track of work that is ready to continue and work that is currently waiting.

The important idea is cooperative scheduling.

An async function pauses when it reaches an await. That tells the event loop:

This task is waiting. You can run something else for now.

For example, an async web service might be waiting for a database query, an HTTP response, or a message from a queue. While one request is waiting, the event loop can continue handling other requests.
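The hand-off at each `await` can be made visible with two simulated requests. This is a minimal sketch in which `asyncio.sleep()` stands in for real I/O; `handle_request` and the request names are hypothetical.

```python
import asyncio


async def handle_request(name: str, wait_seconds: float, log: list[str]) -> None:
    log.append(f"{name}: started")
    # Simulated I/O wait; control returns to the event loop here.
    await asyncio.sleep(wait_seconds)
    log.append(f"{name}: finished")


async def main() -> list[str]:
    log: list[str] = []
    await asyncio.gather(
        handle_request("request-a", 0.2, log),
        handle_request("request-b", 0.1, log),
    )
    return log


event_log = asyncio.run(main())
for line in event_log:
    print(line)
```

Both requests start before either finishes, and the one with the shorter wait finishes first, even though everything runs on a single thread.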

This makes asyncio useful for high-concurrency I/O workloads.

| Good fit for asyncio | Poor fit for asyncio |
| --- | --- |
| High-concurrency API clients | CPU-heavy computation |
| Async web servers | Blocking database drivers |
| WebSocket services | Long-running synchronous functions |
| Queue consumers | Libraries with no async support |
| Many concurrent network calls | Code that blocks the event loop |

The strength of asyncio is that it can handle many waiting tasks without creating a large number of operating system threads. This can be efficient for backend systems that deal with thousands of connections or many external I/O calls.

But async code only works well when the code actually cooperates with the event loop.

If an async function calls blocking code, the event loop can freeze. For example, using a blocking HTTP client or a blocking database driver inside an async route can prevent other async tasks from running. The function may be marked async, but the behaviour is still blocking.

This is one of the most common mistakes with asyncio.
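When a blocking library cannot be replaced, one common workaround is to push the call onto a worker thread with `asyncio.to_thread()` (available from Python 3.9). The sketch below compares the two cases by counting how often a background "ticker" task gets to run while the call is in progress. `blocking_call`, `count_ticks`, and `run_call` are hypothetical names, and `time.sleep()` stands in for a blocking driver.

```python
import asyncio
import contextlib
import time


def blocking_call(seconds: float) -> str:
    """Hypothetical stand-in for a blocking driver or HTTP client call."""
    time.sleep(seconds)
    return "done"


async def count_ticks(ticks: list[int], interval: float) -> None:
    """Increment ticks[0] every interval seconds while the event loop is free."""
    while True:
        await asyncio.sleep(interval)
        ticks[0] += 1


async def run_call(offload: bool, seconds: float = 0.2) -> int:
    """Run blocking_call and return how many ticks happened meanwhile."""
    ticks = [0]
    ticker = asyncio.create_task(count_ticks(ticks, 0.01))
    await asyncio.sleep(0)  # give the ticker a chance to start
    if offload:
        await asyncio.to_thread(blocking_call, seconds)  # event loop stays free
    else:
        blocking_call(seconds)  # event loop is stuck until this returns
    ticker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await ticker
    return ticks[0]


ticks_when_blocking = asyncio.run(run_call(offload=False))
ticks_when_offloaded = asyncio.run(run_call(offload=True))
print(f"ticks while blocking: {ticks_when_blocking}")
print(f"ticks while offloaded: {ticks_when_offloaded}")
```

With the direct call, the ticker makes essentially no progress; with the offloaded call, it keeps running. Offloading still uses a thread per in-flight call, so it is a bridge for blocking libraries rather than a substitute for an async-native client.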

Another mistake is using async code for CPU-heavy work. asyncio does not make CPU-bound Python code run across multiple cores. If a task is spending its time calculating rather than waiting, it never reaches an await, so the event loop has no opportunity to switch to other work.
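If an async service does need an occasional CPU-heavy step, one option is to hand that step to an executor with `loop.run_in_executor()` and await the result. The sketch below takes the executor as a parameter so the caller decides where the work runs; `checksum` and `checksum_all` are hypothetical names, and the byte-by-byte loop is only a stand-in for expensive computation.

```python
import asyncio
from concurrent.futures import Executor, ProcessPoolExecutor


def checksum(data: bytes) -> int:
    """Hypothetical CPU-heavy step: a deliberately slow byte-by-byte sum."""
    total = 0
    for byte in data:
        total = (total + byte) % 65_521
    return total


async def checksum_all(executor: Executor, payloads: list[bytes]) -> list[int]:
    """Run checksum() for each payload in the executor and await all results."""
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(executor, checksum, payload) for payload in payloads]
    return await asyncio.gather(*futures)


if __name__ == "__main__":
    sample_payloads = [bytes(range(256)) * 4_000 for _ in range(4)]
    # A process pool gives the CPU-heavy step its own cores and its own GIL.
    with ProcessPoolExecutor() as pool:
        print(asyncio.run(checksum_all(pool, sample_payloads)))
```

The event loop stays responsive while the worker processes compute. The usual multiprocessing costs still apply: the function and its arguments must be picklable, and small payloads may not justify the round trip.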

A practical backend example is a service that calls several external APIs per request. If the HTTP client is async-compatible, the service can fire off several requests and await their results together. That can reduce waiting time and improve throughput without adding threads for every request.

import asyncio
import httpx

async def fetch_json(client: httpx.AsyncClient, url: str) -> dict:
    response = await client.get(url, timeout=5)
    response.raise_for_status()
    return response.json()

async def fetch_all(urls: list[str]) -> list[dict]:
    async with httpx.AsyncClient() as client:
        tasks = [fetch_json(client, url) for url in urls]
        return await asyncio.gather(*tasks)

urls = [
    "https://api.example.com/prices",
    "https://api.example.com/inventory",
    "https://api.example.com/recommendations",
]

results = asyncio.run(fetch_all(urls))

The important detail is that the HTTP client is async-compatible. If this code used a blocking client inside the async function, each call would block the event loop until it returned, and the requests would effectively run one after another.
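With many URLs, an unbounded `gather()` starts every request at once, and one failure raises out of the whole call. A common refinement, sketched below, caps the number of in-flight calls with `asyncio.Semaphore` and collects failures with `return_exceptions=True`. `fetch_stub` is a hypothetical stand-in that sleeps instead of making a real request, and the limit of 3 and the `/broken` URL are arbitrary choices for illustration.

```python
import asyncio


async def fetch_stub(url: str, state: dict[str, int]) -> dict:
    """Hypothetical stand-in for an async HTTP call; tracks concurrent calls."""
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    try:
        await asyncio.sleep(0.05)  # simulated network wait
        if url.endswith("/broken"):
            raise RuntimeError(f"simulated failure for {url}")
        return {"url": url}
    finally:
        state["active"] -= 1


async def fetch_limited(urls: list[str], limit: int) -> tuple[list, int]:
    """Fetch all URLs with at most `limit` calls in flight; return (results, peak)."""
    semaphore = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}

    async def guarded(url: str) -> dict:
        async with semaphore:  # wait here if `limit` calls are already running
            return await fetch_stub(url, state)

    # return_exceptions=True puts exceptions in the result list instead of raising.
    results = await asyncio.gather(*(guarded(url) for url in urls), return_exceptions=True)
    return results, state["peak"]


sample_urls = [f"https://api.example.com/items/{i}" for i in range(9)]
sample_urls.append("https://api.example.com/broken")
limited_results, peak_concurrency = asyncio.run(fetch_limited(sample_urls, limit=3))
failures = [r for r in limited_results if isinstance(r, Exception)]
print(f"peak concurrency: {peak_concurrency}, failures: {len(failures)}")
```

Results come back in the same order as the input URLs, so each exception can be matched to the request that caused it.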

The key point is:

Asyncio is best when many tasks are waiting on I/O and the stack supports async properly.

It is not a replacement for multiprocessing, and it is not automatically better than threads. It is a strong tool when the workload, libraries, and framework fit the async model.

Choosing the Right Model in Backend Systems

The best concurrency model depends on the bottleneck.

Before choosing threads, processes, or asyncio, ask one simple question:

Is the program mostly waiting, or mostly calculating?

If the program is waiting on I/O, concurrency can help. If the program is calculating, parallelism is usually the better target.

| Model | Best for | Not good for | Main trade-off |
| --- | --- | --- | --- |
| Multithreading | Blocking I/O | CPU-heavy Python code | Simple model, but limited by the GIL |
| Multiprocessing | CPU-heavy work | Lightweight tasks with lots of shared state | Real parallelism, but higher overhead |
| asyncio | High-concurrency I/O | Blocking or CPU-heavy code | Efficient, but requires async-compatible libraries |

A practical decision flow looks like this:

flowchart TD
    A[What is the workload mostly doing?] --> B{Mostly waiting on I/O?}
    B -->|Yes| C{Is the stack async-compatible?}
    C -->|Yes| D[Use asyncio]
    C -->|No, mostly blocking libraries| E[Use threads]
    B -->|No| F{Is it CPU-heavy?}
    F -->|Yes| G[Use multiprocessing]
    F -->|No| H[Keep it simple and synchronous]

This is not a perfect rule, but it is a good starting point.
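The same decision flow can be written as a small function, which is sometimes handy in design notes or review checklists. This is a direct transcription of the flowchart; the `Model` enum, the `choose_model` function, and its parameter names are hypothetical.

```python
from enum import Enum


class Model(Enum):
    ASYNCIO = "asyncio"
    THREADS = "threads"
    MULTIPROCESSING = "multiprocessing"
    SYNCHRONOUS = "synchronous"


def choose_model(
    mostly_waiting_on_io: bool,
    async_compatible_stack: bool,
    cpu_heavy: bool,
) -> Model:
    """Return a starting-point concurrency model for a workload."""
    if mostly_waiting_on_io:
        # I/O-bound: async if the libraries support it, otherwise threads.
        return Model.ASYNCIO if async_compatible_stack else Model.THREADS
    if cpu_heavy:
        return Model.MULTIPROCESSING
    return Model.SYNCHRONOUS


print(choose_model(mostly_waiting_on_io=True, async_compatible_stack=False, cpu_heavy=False))
```

As with the flowchart, the output is a starting point to measure against, not a verdict.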

Use multithreading when the code is mostly blocking on external systems and you want a simple implementation. For example, a worker that calls several third-party APIs or reads many files from storage may work well with a thread pool.

Use multiprocessing when the work is genuinely CPU-heavy and can be split into independent chunks. This is common in batch jobs, data processing, CPU-based model inference, feature extraction, and expensive transformations.

Use asyncio when the system needs to manage many concurrent I/O operations and the surrounding stack is async-friendly. This is common in async web frameworks, WebSocket services, API clients, queue consumers, and services that make many network calls.

The common mistakes are usually caused by choosing the tool before understanding the workload.

One mistake is using threads to speed up CPU-heavy Python code. Threads may make the code more complex without giving real multi-core execution.

Another mistake is using multiprocessing for small I/O tasks. If each task is tiny, process startup, memory usage, and serialization overhead can dominate the actual work.

A third mistake is using asyncio while still calling blocking libraries. An async function that runs blocking code can freeze the event loop and reduce the benefit of async architecture.

There is also a design mistake: adding concurrency before the simple version is understood.

Concurrency makes systems harder to debug. It introduces timeouts, cancellation, shared state issues, race conditions, backpressure, and failure modes that are not always obvious in local testing.
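Timeouts and cancellation are a good example of behaviour that needs explicit handling. The sketch below wraps a slow call in `asyncio.wait_for()`, falls back to a default when the deadline passes, and records that the inner call was cancelled so it has a chance to clean up. `slow_dependency`, `call_with_timeout`, and the fallback value are hypothetical.

```python
import asyncio


async def slow_dependency(delay: float, events: list[str]) -> str:
    """Hypothetical stand-in for a call to a slow external system."""
    try:
        await asyncio.sleep(delay)
        return "fresh value"
    except asyncio.CancelledError:
        # Cleanup hook: release connections, roll back work, and so on.
        events.append("cancelled")
        raise


async def call_with_timeout(
    delay: float, timeout: float, fallback: str, events: list[str]
) -> str:
    try:
        return await asyncio.wait_for(slow_dependency(delay, events), timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for() cancels the inner call before raising the timeout.
        return fallback


fast_events: list[str] = []
slow_events: list[str] = []
fast_result = asyncio.run(call_with_timeout(0.01, 0.5, "cached value", fast_events))
slow_result = asyncio.run(call_with_timeout(0.5, 0.05, "cached value", slow_events))
print(fast_result, fast_events)
print(slow_result, slow_events)
```

The call that finishes in time returns its own result; the one that exceeds the deadline is cancelled and replaced by the fallback. Re-raising `CancelledError` after cleanup matters, because swallowing it can leave the caller waiting on work that was supposed to stop.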

For backend engineering, the best approach is usually:

  1. Start with the workload.
  2. Identify the bottleneck.
  3. Choose the simplest concurrency model that matches it.
  4. Measure the result.

A useful final rule is:

If the program is waiting, consider threads or asyncio. If the program is calculating, consider multiprocessing. If neither is a real bottleneck, keep it simple.

That is the practical difference between Python’s concurrency tools. They are not competing answers to the same problem. They are different tools for different kinds of work.

Closing Thoughts: Match the Model to the Bottleneck

Python gives engineers several ways to handle more than one task at a time, but they are not interchangeable.

Threads, processes, and asyncio each solve a different kind of problem. Threads are useful when work is mostly waiting on I/O. Processes are useful when work needs real CPU parallelism. asyncio is useful when many I/O-bound tasks can cooperate through an event loop.

The mistake is treating concurrency as a general performance upgrade.

In backend systems, the better approach is to start with the bottleneck. Is the service waiting on databases, APIs, queues, sockets, or files? Or is it spending most of its time calculating, transforming, parsing, or running inference?

That answer should drive the design.

For data-heavy and AI-heavy systems, this distinction matters even more. A pipeline might use async I/O to fetch documents, multiprocessing to process them, and threads to integrate with blocking libraries. Real systems often combine models, but each one should be used deliberately.

The practical takeaway is simple:

  • Use threads when blocking I/O is the problem.
  • Use multiprocessing when CPU-bound work is the problem.
  • Use asyncio when high-concurrency I/O is the problem and the stack supports async properly.
  • When none of those are clearly needed, keep the code synchronous.

The goal is not to make Python code more complicated. The goal is to make the system use time and CPU resources more intelligently.

Concurrency is useful when it matches the workload. Otherwise, it is just extra complexity.