Python Multiprocessing

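Now we’ll try true parallelism with the multiprocessing library. It runs each task in its own operating-system process, and each process has its own Python interpreter: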

import time
import multiprocessing

# A CPU-heavy calculation, just
# as an example. This can be
# anything you like.
def heavy(n, myid):
    for x in range(1, n):
        for y in range(1, n):
            x**y  # result is discarded; we only burn CPU time
    print(myid, "is done")

def multiproc(n):
    processes = []

    # Start one process per task
    for i in range(n):
        p = multiprocessing.Process(target=heavy, args=(500, i))
        processes.append(p)
        p.start()

    # Wait until every process has finished
    for p in processes:
        p.join()

if __name__ == "__main__":
    start = time.time()
    multiproc(80)
    end = time.time()
    print("Took: ", end - start)

This takes about 23 seconds, roughly half the time of the threaded version.

As you can see, the code looks almost the same as the threaded version. The threading and multiprocessing libraries intentionally share a very similar API: multiprocessing.Process mirrors threading.Thread. But this time the 80 invocations of heavy finish roughly twice as fast!
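The similarity goes far enough that the worker class can be swapped without touching anything else. The sketch below is a minimal illustration of that idea. The names work and run_all are hypothetical helpers that are not part of the original code, and a queue is used to collect results because processes do not share memory with the parent.

```python
import multiprocessing
import queue
import threading

# Hypothetical placeholder task: compute something and report
# the result back through a queue shared with the parent.
def work(myid, results):
    results.put((myid, sum(i * i for i in range(10_000))))

def run_all(worker_class, results, count=4):
    # threading.Thread and multiprocessing.Process accept the same
    # target=/args= keywords and expose the same start()/join().
    workers = [worker_class(target=work, args=(i, results)) for i in range(count)]
    for w in workers:
        w.start()
    # Drain the queue before joining, so no process is left
    # waiting to flush data that nobody has read yet.
    collected = sorted(results.get() for _ in range(count))
    for w in workers:
        w.join()
    return collected

if __name__ == "__main__":
    # Same function, two execution models: threads share one
    # interpreter, while each process gets its own.
    print(run_all(threading.Thread, queue.Queue()))
    print(run_all(multiprocessing.Process, multiprocessing.Queue()))
```

Only the class and the queue type change between the two calls. Threads can use the standard queue.Queue, but processes need multiprocessing.Queue to pass data back to the parent.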

My test system (a small desktop computer) has only two CPU cores, which explains the factor of two: each process can run on its own core, and separate processes are not serialized by CPython's Global Interpreter Lock the way threads are. If I run this code on my brand-new laptop, which has four faster CPU cores, it is more than four times faster. This demonstrates the roughly linear speedup, in the number of cores, that multiprocessing offers for CPU-bound code.
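To check that speedup on your own machine, you can time a sequential baseline against a process pool, using the Pool helper described in the next section. This is a minimal sketch, not the benchmark behind the numbers above. The functions busy, sequential, parallel and timed are hypothetical, and the timings you see depend entirely on your hardware.

```python
import multiprocessing
import os
import time

# Hypothetical stand-in for heavy(): pure CPU work that returns
# a value, so sequential and parallel results can be compared.
def busy(n):
    total = 0
    for x in range(1, n):
        for y in range(1, n):
            total += (x * y) % 7
    return total

def sequential(tasks, n):
    return [busy(n) for _ in range(tasks)]

def parallel(tasks, n):
    # Pool() with no argument starts one worker per CPU.
    with multiprocessing.Pool() as pool:
        return pool.map(busy, [n] * tasks)

def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    tasks, n = 16, 600
    seq_result, seq_time = timed(sequential, tasks, n)
    par_result, par_time = timed(parallel, tasks, n)
    assert seq_result == par_result  # same work, same answers
    print("CPUs:", os.cpu_count())
    print(f"sequential {seq_time:.2f}s, parallel {par_time:.2f}s, "
          f"speedup {seq_time / par_time:.1f}x")
```

Process startup has a fixed cost, so for very small workloads the parallel version can even be slower. The speedup only approaches the core count when each task does a substantial amount of work.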

Using multiprocessing with a pool

We can make the multiprocessing version a little more elegant by using multiprocessing.Pool(p). This helper creates a pool of p worker processes. If you don’t supply a value for p, it defaults to the number of CPU cores in your system, which is a sensible choice.
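A quick way to see the pool size in action is to record which process ID handled each task. In this sketch, whoami and worker_pids are hypothetical helpers. The pool size is passed with the processes keyword. When that keyword is omitted, Python sizes the pool from os.cpu_count(). Recent versions may use os.process_cpu_count() instead, which respects CPU affinity.

```python
import multiprocessing
import os

def whoami(task_id):
    # Report which worker process handled this task.
    return task_id, os.getpid()

def worker_pids(pool_size, tasks=20):
    # processes= sets the pool size explicitly; omit it to get
    # one worker per CPU.
    with multiprocessing.Pool(processes=pool_size) as pool:
        results = pool.map(whoami, range(tasks))
    return {pid for _, pid in results}

if __name__ == "__main__":
    print("os.cpu_count():", os.cpu_count())
    pids = worker_pids(2)
    print(f"20 tasks ran in {len(pids)} worker process(es)")
```

However many tasks you submit, they are all handled by at most pool_size processes, and none of them run in the parent process.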

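We submit work to the pool with Pool.map(func, iterable). It calls func once for each item in iterable, spreads those calls over the worker processes, and returns the results as a list in input order. Because each call receives exactly one argument, we wrap heavy in a one-argument helper, doit: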
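Since heavy only prints, the return-value side of Pool.map is easy to miss. This minimal sketch uses a hypothetical square function to show that map collects each call's result and keeps the input order, no matter which worker finished first.

```python
import multiprocessing

def square(x):
    return x * x

def squares(n):
    with multiprocessing.Pool() as pool:
        # map() blocks until every call has returned, then gives
        # back the results in the same order as the input.
        return pool.map(square, range(n))

if __name__ == "__main__":
    print(squares(10))  # → [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```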

import time
import multiprocessing

# A CPU-heavy calculation, just
# as an example. This can be
# anything you like.
def heavy(n, myid):
    for x in range(1, n):
        for y in range(1, n):
            x**y  # result is discarded; we only burn CPU time
    print(myid, "is done")

# Pool.map() passes exactly one argument,
# so adapt heavy() to that signature
def doit(n):
    heavy(500, n)

def pooled(n):
    # By default, the pool has one worker
    # process per CPU core
    with multiprocessing.Pool() as pool:
        pool.map(doit, range(n))

if __name__ == "__main__":
    start = time.time()
    pooled(80)
    end = time.time()
    print("Took: ", end - start)

The runtime for this version is roughly the same as the non-pooled version, but it is more efficient because it creates fewer processes. Instead of starting 80 processes, the pool starts one per CPU core (four on a four-core machine) and reuses them for all 80 tasks.
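The doit wrapper exists only because Pool.map hands each call a single argument. Pool.starmap avoids the wrapper by unpacking each tuple of arguments. The sketch below assumes a variant of heavy that returns myid instead of printing it, so the results can be checked. Another option is functools.partial(heavy, 500) with plain map, which also produces the calls heavy(500, i).

```python
import multiprocessing

# Variant of the article's heavy() that returns myid instead of
# printing it, so the result of each call can be checked.
def heavy(n, myid):
    for x in range(1, n):
        for y in range(1, n):
            x**y
    return myid

def pooled_starmap(count, n=500):
    with multiprocessing.Pool() as pool:
        # starmap() unpacks each tuple into positional arguments,
        # so this calls heavy(n, 0), heavy(n, 1), ... directly.
        return pool.starmap(heavy, [(n, i) for i in range(count)])

if __name__ == "__main__":
    print(pooled_starmap(8, n=100))  # → [0, 1, 2, 3, 4, 5, 6, 7]
```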
