The Why, When, and How of Using Python Multi-threading and Multi-Processing

Let’s see why this happens.

Much like the wizard being limited by his human nature and only being able to calculate one number at a time, Python comes with something called the Global Interpreter Lock (GIL).

Python will happily let you spawn as many threads as you like, but the GIL ensures that only one of those threads will ever be executing at any given time.

For an IO-bound task, that is perfectly fine.

One thread fires off a request to a URL and while it is waiting for a response, that thread can be swapped out for another thread that fires another request to another URL.

Since a thread doesn’t have to do anything until it receives a response, it doesn’t really matter that only one thread is executing at a given time.
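As a rough sketch of what that looks like in code (the URLs here are placeholders, not ones from any particular example), a thread pool of downloaders might be written like this:

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Placeholder URLs -- substitute whatever endpoints you are actually fetching.
URLS = [
    "https://example.com",
    "https://example.org",
    "https://example.net",
]

def fetch(url):
    # Each thread spends almost all of its time waiting on the network,
    # so it can be swapped out while another thread fires its own request.
    with urlopen(url) as response:
        return url, len(response.read())

with ThreadPoolExecutor(max_workers=len(URLS)) as pool:
    for url, size in pool.map(fetch, URLS):
        print(f"{url}: {size} bytes")
```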

For a CPU-bound task, having multiple threads is about as useful as nipples on a breastplate.

Because only one thread is executed at a time, even if you spawn multiple threads, each with its own number to be checked for prime-ness, the CPU is still only going to be dealing with one thread at a time.

In effect, the numbers will still be checked one after the other.

The overhead of managing multiple threads contributes to the performance degradation you may observe if you use multithreading for a CPU-bound task.
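To make that concrete, here is a hedged sketch of what such an attempt looks like; is_prime is a stand-in for whatever prime check you are running, and the numbers are arbitrary:

```python
import threading

def is_prime(n):
    # Stand-in for the CPU-bound prime check discussed above.
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True

NUMBERS = [15485863, 32452843, 49979687, 67867967]

# Four threads, but the GIL lets only one run Python bytecode at a time,
# so the checks still happen one after the other -- plus thread-management overhead.
threads = [threading.Thread(target=is_prime, args=(n,)) for n in NUMBERS]
for t in threads:
    t.start()
for t in threads:
    t.join()
```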

To get around this ‘limitation’, we use the multiprocessing module.

Instead of using threads, multiprocessing uses, well, multiple processes.

Each process gets its own interpreter and memory space, so the GIL won’t be holding things back.

In essence, each process uses a different CPU core to work on a different number, at the same time.
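A minimal multiprocessing version of the same stand-in prime check might look like this (reusing the hypothetical is_prime from above):

```python
from multiprocessing import Pool

def is_prime(n):
    # Same stand-in prime check as before, now run in separate processes.
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True

NUMBERS = [15485863, 32452843, 49979687, 67867967]

if __name__ == "__main__":
    # Each worker process has its own interpreter and its own GIL,
    # so the checks genuinely run in parallel on separate cores.
    with Pool() as pool:
        results = pool.map(is_prime, NUMBERS)
    print(dict(zip(NUMBERS, results)))
```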

Sweet!

You may notice that CPU utilization goes much higher when you are using multiprocessing compared to using a simple for loop, or even multithreading.

That is because multiple CPU cores are being used by your program, rather than just a single core.

This is a good thing!

Keep in mind that multiprocessing comes with its own overhead to manage multiple processes, which typically tends to be heavier than multithreading overhead.

(Multiprocessing spawns a separate interpreter and assigns a separate memory space for each process, so duh!)

This means that, as a rule of thumb, it is better to use the lightweight multithreading when you can get away with it (read: IO-bound tasks).

When CPU processing becomes your bottleneck, it’s generally time to summon the multiprocessing module.

But remember, with great power comes great responsibility.

If you spawn more processes than your CPU can handle at a time, you will notice your performance starting to drop.

This is because the operating system now has to do more work swapping processes in and out of the CPU cores since you have more processes than cores.

The reality is a bit more complicated than this simple explanation, but that’s the basic idea.

You can see a drop-off in performance on my system when we reach 16 processes.

This is because my CPU only has 16 logical cores.
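A practical guard against over-subscription is simply to cap the pool at os.cpu_count() (which is also what Pool() defaults to when you pass nothing); the worker function below is just a throwaway CPU-burner for illustration:

```python
import os
from multiprocessing import Pool

def burn(n):
    # Throwaway CPU-bound stand-in task.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    n_cores = os.cpu_count() or 1  # reports 16 on a machine with 16 logical cores
    # More worker processes than cores just means the OS has to keep
    # swapping them in and out, which is where the drop-off comes from.
    with Pool(processes=n_cores) as pool:
        pool.map(burn, [2_000_000] * n_cores)
```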

Chapter 4: TLDR

For IO-bound tasks, using multithreading can improve performance.

For IO-bound tasks, using multiprocessing can also improve performance, but the overhead tends to be higher than with multithreading.

The Python GIL means that only one thread can be executing at any given time in a Python program.

For CPU-bound tasks, using multithreading can actually worsen the performance.

For CPU-bound tasks, using multiprocessing can improve performance.

Wizards are awesome!

That concludes this introduction to multithreading and multiprocessing in Python.

Go forth and conquer!
