Python multi-process parallel programming in practice: taking the multiprocessing module as an example



Pytlab is a columnist for the Python Chinese community. The author works mainly on applications of scientific computing and high-performance computing, primarily in Python, C, and C++, and is familiar with numerical algorithms (optimization methods, Monte Carlo algorithms, etc.), parallelization (multithreaded and multi-process approaches such as OpenMP and MPI), and Python optimization techniques. The author often writes C++ extensions for Python.




Parallel computing is the use of parallel computers to reduce the time required to solve a single computational problem. We design parallel programs by using a programming language to state explicitly how different parts of the computation execute simultaneously on different processors, with the goal of making the program run faster.

As is well known, the GIL in Python prevents Python threads from running in parallel on multiple CPU cores. We can still make Python use multiple cores in other ways, such as multithreading or multiprocessing inside C/C++ extensions, or multi-process programming directly with Python's built-in multiprocessing module.

This article tries to speed up my own kinetics calculation program using only Python's built-in multiprocessing module. It covers:

- Using multi-core resources on a single machine for parallelism, and comparing the speedups
- Using the managers module to implement simple multi-machine distributed computing

This article is not a translated walkthrough of the multiprocessing module's interface. Readers who want to get familiar with multiprocessing can refer to the official documentation.


Recently I wanted to use my own micro-kinetics program to solve a series of cases and plot the results as a two-dimensional map for visualization. This requires computing many points on the map, collecting the results, and drawing them. Each point requires an ODE integration and solving a system of equations with Newton's method, so drawing the whole map serially is very slow. When tuning parameters in particular, every plot means a long wait. Since each point of the map is computed independently, parallel processing is the natural choice.

The original serial version

The script is relatively long and depends on my own program, so only its general structure is described here: it is essentially a double loop whose loop variables are the partial pressures of the reactant gases (O2 and CO).
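
The double loop described above might look roughly like the following sketch. The original script is not shown, so everything here is an assumption for illustration: `solve_point` is a hypothetical placeholder for the real ODE integration and Newton solve, and the 10 x 10 pressure grid is invented to match the 100 points mentioned in the text.

```python
import time


def solve_point(p_o2, p_co):
    """Hypothetical stand-in for the real per-point solve (ODE + Newton)."""
    return p_o2 * p_co / (1.0 + p_o2 + p_co) ** 2


def serial_scan(o2_pressures, co_pressures):
    """Evaluate every (O2, CO) pressure pair one after another."""
    grid = []
    for p_o2 in o2_pressures:          # outer loop over O2 partial pressure
        row = []
        for p_co in co_pressures:      # inner loop over CO partial pressure
            row.append(solve_point(p_o2, p_co))
        grid.append(row)
    return grid


if __name__ == "__main__":
    # Hypothetical 10 x 10 grid of partial pressures (100 points in total).
    pressures = [0.1 * (i + 1) for i in range(10)]
    start = time.perf_counter()
    grid = serial_scan(pressures, pressures)
    elapsed = time.perf_counter() - start
    print(f"{len(grid) * len(grid[0])} points in {elapsed:.4f} s")
```

Because no iteration depends on any other, either loop can be handed to separate processes without changing the result.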

The overall process is as simple as that. What I need to do is to use the multiprocessing interface to parallelize this double loop.

Computing the 100 points serially on a single core took 240.76 seconds in total.

The effect of two-dimensional map drawing is as follows:

Multi-process parallel processing

multiprocessing module

The multiprocessing module provides an interface similar to that of the threading module and wraps the various process operations well. It also provides inter-process communication interfaces such as Pipe and Queue, which help us implement communication and synchronization between processes.

Use the Process class to create processes dynamically for parallelism

The multiprocessing module provides the Process class. We create a Process object and call its start method to launch a real process that performs the task. The interface is similar to that of the Thread class in the threading module.

When the number of tasks is small, we can create processes dynamically with Process. If the number of processes required is large, however, manually limiting the number of processes and handling the return values of the different processes becomes unusually cumbersome, so in that case we use a process pool to simplify things.
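
As a minimal sketch of the Process approach, the example below starts one process per input value and collects the return values through a multiprocessing Queue. The worker `square_worker` and the helper `run_with_processes` are hypothetical names chosen for illustration, and the input values are assumed to be unique because results are matched back by value.

```python
import multiprocessing as mp


def square_worker(x, out_queue):
    """Compute x * x and send (input, result) back to the parent process."""
    out_queue.put((x, x * x))


def run_with_processes(values):
    """Start one process per (unique) value and return results in input order."""
    out_queue = mp.Queue()
    procs = [mp.Process(target=square_worker, args=(v, out_queue)) for v in values]
    for p in procs:
        p.start()
    # Read all results before joining so that no child blocks on a full pipe.
    results = dict(out_queue.get() for _ in procs)
    for p in procs:
        p.join()
    return [results[v] for v in values]


if __name__ == "__main__":
    print(run_with_processes([1, 2, 3, 4]))
```

Even in this small case the bookkeeping (starting, collecting, joining, reordering) is all manual, which is the overhead a process pool removes.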

Use process pool to manage processes

The multiprocessing module provides the Pool class, which creates process pool objects and offers methods for handing computing tasks to different child processes and conveniently obtaining their return values. The parallel loop we want here is easy to implement with it.

For single instruction, multiple data style parallelism, we can directly use Pool.map to map the function over the parameter list. Pool.map is in effect a parallel version of the built-in map function. It blocks until all tasks have finished, and the results are returned in the same order as the inputs.

1. First, I wrap the processing for each pair of pressure values in a function, so that the function object can be passed to the child processes for execution.
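
The relationship between Pool.map and the built-in map can be illustrated with a small sketch. The function `cube` and the pool size of 2 are arbitrary choices for the example, not taken from the original script.

```python
from multiprocessing import Pool


def cube(x):
    """A trivial function applied independently to every input."""
    return x ** 3


if __name__ == "__main__":
    data = list(range(8))
    serial = list(map(cube, data))          # built-in map, one process
    with Pool(processes=2) as pool:
        parallel = pool.map(cube, data)     # blocks until every item is done
    print(parallel == serial)               # same values in the same order
```

The function passed to Pool.map has to be defined at module level so that it can be pickled and sent to the child processes.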
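
A sketch of how this wrapping could look is given below. It is not the original code: `solve_point` is a hypothetical placeholder for the real per-point calculation, the pressure values are invented, and the choice to make one task equal to one outer-loop iteration (a full row of the map) is an assumption.

```python
from multiprocessing import Pool

# Hypothetical inner-loop values (CO partial pressures).
CO_PRESSURES = [0.1 * (i + 1) for i in range(10)]


def solve_point(p_o2, p_co):
    """Hypothetical stand-in for the real per-point solve (ODE + Newton)."""
    return p_o2 * p_co / (1.0 + p_o2 + p_co) ** 2


def task(p_o2):
    """One outer-loop iteration: a full row of the map for a fixed O2 pressure."""
    return [solve_point(p_o2, p_co) for p_co in CO_PRESSURES]


def parallel_scan(o2_pressures, n_procs):
    """Distribute the outer loop over n_procs worker processes."""
    with Pool(processes=n_procs) as pool:
        return pool.map(task, o2_pressures)   # rows come back in input order


if __name__ == "__main__":
    o2_pressures = [0.1 * (i + 1) for i in range(10)]
    grid = parallel_scan(o2_pressures, n_procs=2)
    print(len(grid), "rows of", len(grid[0]), "points")
```

With this split there are only as many tasks as outer-loop values. Mapping over individual pressure pairs instead would give finer-grained tasks at the cost of more inter-process traffic.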

Using two cores, the calculation time drops from 240.76 s to 148.61 s, a speedup of 1.62.

Test the speedup with different numbers of cores

To see how the number of cores affects the program's efficiency, I measured the running time and speedup for different numbers of cores and plotted them. The results are as follows:

Number of running cores and program running time:

Number of operating cores and speedup ratio:

It can be seen that, since my outer loop runs only 10 times, using more than 10 cores does not speed the program up any further. The extra cores are simply wasted.

Use manager to implement simple distributed computing

Previously, we used the interface provided by the multiprocessing package to run multi-core parallel computations on a single machine, but multiprocessing has more uses. Through the multiprocessing.managers module, we can implement simple multi-machine distributed parallel computing and distribute computing tasks to different computers.

Managers provide an additional tool for inter-process communication: interfaces and data objects for sharing data between multiple computers. All of these data objects are implemented through proxy classes, for example ListProxy and DictProxy, which implement the same interfaces as the native list and dict but can be shared over the network among processes on different computers.

For detailed usage of the manager module interface, please refer to the official document:

Okay, now let us transform the drawing program into one that can run distributed and in parallel on multiple computers. The main ideas of the transformation are:

  1. Use one computer as the server. This computer manages the shared objects, task allocation, and result reception through a Manager object, and then collects the results for post-processing (drawing the two-dimensional map).
  2. Any number of other computers act as clients. They receive data from the server, perform the calculations, and write the results into the shared data so that the server can collect them. Each client can also use the multi-process parallelism implemented above to make full use of its own cores.

It can be roughly summarized as the following figure:

Service process

1. The server needs a manager object to manage shared objects

  1. BaseManager.register is a class method that binds a type or callable object to the manager class and shares it over the network, so that other computers in the network can obtain the corresponding object. For example, with JobManager.register('get_jobid_queue', callable=lambda: jobid_queue) I bind a function that returns the task queue to the manager and share it over the network, so that any process in the network can obtain the same queue through the get_jobid_queue method of its own manager object, thus realizing data sharing.
  2. Two parameters are required when creating the manager object:
    • address is the IP of the machine where the manager runs and the port number on which it listens for connections. For example, if I listen on port 5000 of the server's address in the internal network, this parameter is a tuple of that IP address string and 5000.
    • authkey, as the name implies, is an authentication key used to verify that a client is allowed to connect to the server. This parameter must be a byte string (bytes in Python 3; a plain str was accepted in Python 2).
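
Put together, the registration and construction steps might be sketched as follows. This is an illustration rather than the original server code: the `get_result_list` typeid, the `N_POINTS` constant, the loopback address, and the key are all hypothetical. Only `JobManager`, `get_jobid_queue`, and port 5000 appear in the text.

```python
import queue
from multiprocessing.managers import BaseManager, ListProxy

N_POINTS = 100                      # hypothetical: a 10 x 10 pressure grid

# These objects live in the server process and are shared through the manager.
jobid_queue = queue.Queue()         # task indices waiting to be handed out
result_list = [None] * N_POINTS     # one slot per task, filled in by clients


def get_jobid_queue():
    return jobid_queue


def get_result_list():
    return result_list


class JobManager(BaseManager):
    """Subclass so that registrations do not modify BaseManager itself."""


# register is a class method: it is called on the class, not on an instance.
JobManager.register('get_jobid_queue', callable=get_jobid_queue)
JobManager.register('get_result_list', callable=get_result_list,
                    proxytype=ListProxy)


def make_manager(host, port, authkey):
    """Create (but do not start) a manager bound to the given address."""
    return JobManager(address=(host, port), authkey=authkey)


if __name__ == "__main__":
    # Hypothetical address and key; authkey must be bytes in Python 3.
    manager = make_manager('127.0.0.1', 5000, b'hypothetical-key')
    print(manager.address)
```

Named module-level functions are used here instead of the lambda quoted in the text because lambdas cannot be pickled, which matters when the manager's server process is launched with the spawn start method.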

Assign tasks

Above we bound a task queue to the manager object. Now I need to fill the queue so that tasks can be issued to different clients for parallel execution.

A task here is simply the index of the corresponding parameter in the parameter list. This way, the results obtained on different computers can be written into the result list at the corresponding index, and the server can collect the results computed by every computer through the shared network.
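
The index-as-task idea can be sketched with plain local objects. Here a `queue.Queue` and a list stand in for the shared queue and result list that the manager would provide, and the parameter list and the placeholder computation are invented for illustration.

```python
import queue


def fill_jobid_queue(jobid_queue, n_points):
    """Put one task per parameter: the task is just the parameter's index."""
    for index in range(n_points):
        jobid_queue.put(index)


def store_result(results, index, value):
    """Write a result into the slot that belongs to its task index."""
    results[index] = value


if __name__ == "__main__":
    # Hypothetical parameter list of (O2, CO) partial-pressure pairs.
    params = [(p_o2, p_co) for p_o2 in (0.1, 0.2) for p_co in (0.1, 0.2)]
    jobs = queue.Queue()                 # local stand-in for the shared queue
    results = [None] * len(params)       # local stand-in for the shared list
    fill_jobid_queue(jobs, len(params))
    while not jobs.empty():
        i = jobs.get()
        p_o2, p_co = params[i]
        store_result(results, i, p_o2 * p_co)   # placeholder computation
    print(results)
```

Because every result lands at its own index, the order in which clients finish does not matter to the server.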

Start the server to listen
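
One possible way to run the server is sketched below: start the manager, hand out the task indices, poll until every result slot is filled, then shut down. It assumes a manager object whose class has registered the typeids `get_jobid_queue` and `get_result_list` (the latter is a hypothetical name), and the polling approach is an assumption, not necessarily what the original script does.

```python
import time


def run_server(manager, n_points, poll_seconds=1.0):
    """Start the manager, hand out task indices, wait for results, shut down."""
    manager.start()                        # launches the manager's server process
    try:
        jobid_queue = manager.get_jobid_queue()
        results = manager.get_result_list()
        for index in range(n_points):      # one task per parameter index
            jobid_queue.put(index)
        while None in results:             # some slot has not been filled yet
            time.sleep(poll_seconds)
        # Copy the shared list into a plain local list before shutting down.
        return [results[i] for i in range(n_points)]
    finally:
        manager.shutdown()
```

An alternative is `manager.get_server().serve_forever()`, which serves in the current process and blocks. The start and shutdown pair is used here so that the server script can return once all results are in.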

Task process

The service process is responsible for simple task allocation and scheduling, while the task process is only responsible for obtaining tasks and performing the calculations.

The core code in the task process (client) is basically the same as the single-machine multi-core script above, because the same function handles different data. The client additionally needs to create its own manager to fetch tasks and send back results.

On the client side, we can still use multi-core resources in multiple processes to speed up calculations.
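
A client along these lines might be sketched as follows. The names `solve_index`, `take_batch`, `run_client`, `connect_to_server`, and the `get_result_list` typeid are hypothetical, and taking tasks in small batches is an assumption made so that several clients can share one queue. The demo at the bottom runs against local stand-ins rather than a real server.

```python
import queue
from multiprocessing import Pool
from multiprocessing.managers import BaseManager, ListProxy


class JobManager(BaseManager):
    """Client-side manager class; it only needs the typeid names."""


def connect_to_server(host, port, authkey):
    """Connect to a running server and return (task queue, result list) proxies."""
    JobManager.register('get_jobid_queue')
    JobManager.register('get_result_list', proxytype=ListProxy)
    manager = JobManager(address=(host, port), authkey=authkey)
    manager.connect()
    return manager.get_jobid_queue(), manager.get_result_list()


def solve_index(index):
    """Hypothetical stand-in for solving the parameter set at this index."""
    return index * index


def take_batch(jobid_queue, batch_size):
    """Take up to batch_size task indices, stopping early if the queue is empty."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(jobid_queue.get_nowait())
        except queue.Empty:
            break
    return batch


def run_client(jobid_queue, results, batch_size=4, map_func=map):
    """Process tasks until the queue is empty; return the number completed."""
    done = 0
    while True:
        batch = take_batch(jobid_queue, batch_size)
        if not batch:
            return done
        for index, value in zip(batch, map_func(solve_index, batch)):
            results[index] = value          # write into the task's own slot
        done += len(batch)


if __name__ == "__main__":
    # Local demonstration without a server: plain queue and list stand-ins.
    jobs = queue.Queue()
    for i in range(10):
        jobs.put(i)
    results = [None] * 10
    with Pool(processes=2) as pool:         # multi-core work inside the client
        n_done = run_client(jobs, results, batch_size=4, map_func=pool.map)
    print(n_done, results)
```

In a real run, the two stand-ins would be replaced by the proxies returned from `connect_to_server`, and the batch size would typically be tied to the number of cores on the client.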

Below I run a simple distributed computing test on 3 computers in the same local area network:

- one is the management node of the laboratory cluster, addressed by its internal network IP
- another is a compute node in the cluster, with 12 cores
- the last is my notebook, with 4 cores

  1. First, run the service script on the server for task assignment and monitoring:
  2. Then run the task script on the two clients to fetch tasks from the task queue and execute them.

When the task queue is empty and its tasks are finished, the task process terminates. Once all results in the result list have been collected, the service process terminates as well.

The execution result is as follows:

The top panel shows the server's monitoring output, the bottom-left panel shows the run on my own notebook, and the bottom-right panel shows one of the nodes in the cluster.

It can be seen that the running time is 56.86 s. Unfortunately, my notebook held the whole run back (-_-!)


This article used Python's built-in multiprocessing module to achieve multi-core parallelism on a single machine and simple distributed parallel computing across multiple computers. multiprocessing provides a well-packaged and friendly interface that lets Python programs use multi-core resources to speed up their computations. I hope it is helpful to readers who use Python for parallel computing.


Reference: Python multi-process parallel programming practice: taking the multiprocessing module as an example-Cloud + Community-Tencent Cloud