Column
❈ PytLab, a columnist for the Python Chinese community, works mainly on applications of scientific and high-performance computing; his main languages are Python, C, and C++. He is familiar with numerical algorithms (optimization methods, Monte Carlo algorithms, etc.) and parallelization techniques (multi-threaded and multi-process parallelism such as MPI and OpenMP), as well as Python optimization, and often writes C++ extensions for Python.
blog: http://ipytlab.com
github: https://github.com/PytLab
❈
Preface
Parallel computing uses parallel computers to reduce the time required to solve a single computational problem. By using a programming language that lets us state explicitly how different parts of a computation execute simultaneously on different processors, we can design parallel programs and thereby improve program efficiency.
As is well known, Python's GIL prevents Python threads from exploiting multiple CPU cores in parallel, but we can still make Python truly use multi-core resources in several other ways, for example via multi-threading/multi-processing inside C/C++ extensions, or by doing multi-process programming directly with Python's built-in multiprocessing module.
This article tries to optimize my own kinetics calculation program using nothing but Python's built-in multiprocessing module, covering:
- using multi-core resources on a single machine for parallelism, with speedup comparisons
- using the managers submodule to implement simple multi-machine distributed computing
This article is not a translated introduction to the interface of Python's multiprocessing module; readers who need to get familiar with multiprocessing can refer to the official documentation: https://docs.python.org/2/library/multiprocessing.html .
Recently I wanted to use my own micro-kinetics program to run a series of calculations and visualize the results as a two-dimensional map. This requires computing many points on the map, collecting the results, and plotting them. Each point needs an ODE integration plus a Newton solve of a system of equations, so drawing the whole map serially is extremely slow, especially while testing parameters, when every plot means a long wait. Since every point in the map is computed independently, parallel processing is the natural choice.
The script is fairly long and the implementation is my own program, so here is only its general structure: essentially a double loop whose variables are the partial pressures of the reactant gases (O2 and CO).
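Since the full script cannot be shown, here is a minimal sketch of that structure; solve_point, the pressure ranges, and the dummy return value are hypothetical stand-ins for the actual kinetics code:

```python
import time

import numpy as np

def solve_point(p_O2, p_CO):
    """Hypothetical stand-in for one point of the 2D map: the real code
    performs an ODE integration followed by a Newton solve here."""
    return p_O2 / (p_O2 + p_CO)  # dummy placeholder result

pO2s = np.linspace(1e-5, 0.5, 10)   # O2 partial pressures (outer loop)
pCOs = np.linspace(1e-5, 0.5, 10)   # CO partial pressures (inner loop)

start = time.time()
results = []
for p_O2 in pO2s:        # outer loop over O2 partial pressures
    for p_CO in pCOs:    # inner loop over CO partial pressures
        results.append(solve_point(p_O2, p_CO))
print("Time used: {:.2f} s".format(time.time() - start))
```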
The overall flow is as simple as that; what I need to do is parallelize this double loop with the multiprocessing interfaces.
Drawing all 100 points serially on a single core took 240.76 seconds in total.
The resulting two-dimensional map looks like this:
Multi-process parallel processing
The multiprocessing module provides an interface similar to that of the threading module and wraps the various process operations nicely. It offers several inter-process communication primitives, such as Pipe and Queue, which help us implement communication, synchronization, and other operations between processes.
Using the Process class to create processes dynamically
The multiprocessing module provides a Process class that lets us create a process object and call its start() method to spawn a real process that executes a task. The interface is similar to that of the Thread class in the threading module.
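A minimal sketch of that interface (the worker function here is just an illustration):

```python
from multiprocessing import Process

def worker(idx):
    # Placeholder task; imagine one unit of the map calculation here.
    print("hello from process {}".format(idx))

if __name__ == "__main__":
    processes = [Process(target=worker, args=(i,)) for i in range(4)]
    for p in processes:
        p.start()   # spawn a real OS process executing worker
    for p in processes:
        p.join()    # wait for every child process to finish
```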
Creating a handful of Process objects dynamically works fine when there are few tasks, but if many processes are required, manually limiting the number of processes and handling the return values of the different processes becomes unusually cumbersome. That is when a process pool simplifies things.
The multiprocessing module provides a Pool class for creating process-pool objects, together with methods that offload computing tasks to child processes and collect their return values conveniently. The parallel loop we need here, for example, can be implemented with it easily.
For single-instruction, multiple-data parallelism we can simply use Pool.map() to map a function over a parameter list. Pool.map is effectively a parallel version of the built-in map: it blocks until all processes have finished, and the order of the returned results is preserved.
First, I encapsulate the processing of each set of pressure data into a function, so that the function object can be passed to a child process for execution.
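A sketch of that change, reusing the hypothetical solve_point, pO2s, and pCOs from the serial version above; here each task handles one outer-loop O2 pressure and sweeps all CO pressures, which is consistent with the 10-task limit discussed below:

```python
from multiprocessing import Pool

def task(p_O2):
    """Process one outer-loop pressure: for this O2 partial pressure,
    sweep all CO partial pressures and return the row of results."""
    return [solve_point(p_O2, p_CO) for p_CO in pCOs]

if __name__ == "__main__":
    pool = Pool(processes=2)      # a pool with two worker processes
    rows = pool.map(task, pO2s)   # blocks until done; result order preserved
    pool.close()
    pool.join()
```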
Using two cores, the computation time drops from 240.76 s to 148.61 s, a speedup of 1.62.
Testing the speedup with different numbers of cores
To see how the number of cores improves program efficiency, I timed the program with different core counts and plotted the speedups. The results are as follows:
Number of cores vs. running time:
Number of cores vs. speedup:
As the plots show, since my outer loop iterates only 10 times, using more than 10 cores brings no further acceleration; the extra cores are simply wasted.
Multi-machine distributed computing
So far we have used the multiprocessing interfaces for multi-core parallelism on a single machine, but multiprocessing can do more. Through the multiprocessing.managers submodule we can implement simple multi-machine distributed parallel computing, distributing computing tasks to different computers.
Managers provide an additional inter-process communication tool: interfaces and data objects for sharing data between multiple computers. Each of these data objects is accessed through a proxy class, such as ListProxy or DictProxy; the proxies implement the same interfaces as the native list and dict, but they can be shared over the network among processes on different computers.
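As a minimal single-machine illustration of the proxy idea (using the high-level multiprocessing.Manager), a ListProxy behaves like a native list even though the data lives in a manager process:

```python
from multiprocessing import Manager, Process

def append_square(shared_list, x):
    shared_list.append(x * x)   # the proxy forwards this call to the manager

if __name__ == "__main__":
    manager = Manager()
    shared = manager.list()     # a ListProxy with the native list interface
    procs = [Process(target=append_square, args=(shared, i)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(list(shared))         # squares of 0..3; order may vary
```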
For detailed usage of the managers interface, see the official documentation: https://docs.python.org/2/library/multiprocessing.html#managers
Okay, now let us transform the drawing program into one that can run distributed across multiple computers. The main ideas are:
1. Use one computer as the server. Through a Manager object, it manages the shared objects, task distribution, and result collection, and afterwards post-processes the gathered results (drawing the two-dimensional map).
2. Any number of other computers act as clients: they fetch data from the server, perform the computation, and write the results into the shared data so the server can collect them. At the same time, each client can use the multi-process parallelism implemented above to take full advantage of its own cores.
It can be roughly summarized as the following figure:
Service process
First, the server needs a manager object to manage the shared objects.
BaseManager.register is a class method that binds a type or a callable object to the manager object and shares it over the network, so that other computers on the network can obtain the corresponding object. For example,
JobManager.register('get_jobid_queue', callable=lambda: jobid_queue)
binds a function object that returns the task queue to the manager object and shares it over the network; any process on the network can then call get_jobid_queue through its own manager object to obtain the very same queue, which realizes the data sharing. When creating the manager object we also pass its listening address; for example, if the server listens on port 5000 of the internal-network address 192.168.0.1, this parameter is ('192.168.0.1', 5000).
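Putting this together, here is a sketch of the server-side setup; the name get_jobid_queue comes from the article itself, while the result-list name and length, the get_results name, and the authkey are my assumptions:

```python
from multiprocessing.managers import BaseManager, ListProxy
from queue import Queue  # on Python 2, as in the article: from Queue import Queue

jobid_queue = Queue()           # task indices waiting to be handed out
shared_results = [None] * 100   # one result slot per point of the 2D map

class JobManager(BaseManager):
    pass

# Share the task queue and the result list over the network.
JobManager.register('get_jobid_queue', callable=lambda: jobid_queue)
JobManager.register('get_results', callable=lambda: shared_results,
                    proxytype=ListProxy)  # expose list methods, incl. __setitem__
```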
Assign tasks
Above we bound a task queue to the manager object. Now I need to fill this queue so that tasks can be handed out to different clients for parallel execution.
A task here is simply the index of the corresponding parameters in the list; results computed on different machines are written into the result list at the corresponding index, which lets the server collect every computer's results through the shared network objects.
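A sketch of the filling step under the same assumptions; each job id is simply an index into the parameter list:

```python
def fill_jobid_queue(manager, njobs):
    """Put every task index into the shared queue for clients to fetch."""
    queue = manager.get_jobid_queue()
    for jobid in range(njobs):
        queue.put(jobid)
```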
Start the server to listen
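Continuing the sketch, the service process can start the manager server, fill the queue, and then wait until the clients have filled every result slot (the busy-wait here is a deliberate simplification):

```python
import time

if __name__ == "__main__":
    manager = JobManager(address=('', 5000), authkey=b'secret')
    manager.start()                   # launch the manager server in a subprocess
    fill_jobid_queue(manager, njobs=100)
    results = manager.get_results()   # ListProxy to the shared result list
    while None in results:            # wait until every slot has been filled
        time.sleep(1)
    # ... post-processing: reshape the results and draw the 2D map ...
    manager.shutdown()
```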
Task process
The service process handles only simple task distribution and scheduling, while the task processes do nothing but fetch tasks and perform the calculations.
The core code of a task process (client) is essentially the same as the single-machine multi-core script above (the same function processes different data), but the client also needs to create its own manager to fetch tasks and return results.
On the client side we can still use multiple processes to exploit the local multi-core resources and speed up the computation.
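A sketch of such a worker script under the same assumptions; the registered names, authkey, and port must match the server's, and solve_point plus the flat parameter list are the hypothetical ones from before:

```python
import multiprocessing
from multiprocessing.managers import BaseManager
from queue import Empty  # Python 2: from Queue import Empty

class JobManager(BaseManager):
    pass

# Register the same names as the server did (clients pass no callables).
JobManager.register('get_jobid_queue')
JobManager.register('get_results')

# Hypothetical flat parameter list; must match the server's job indexing.
pressure_pairs = [(p_O2, p_CO) for p_O2 in pO2s for p_CO in pCOs]

def compute(jobid):
    """Map a task index back to its parameters and solve that point."""
    p_O2, p_CO = pressure_pairs[jobid]
    return jobid, solve_point(p_O2, p_CO)

if __name__ == "__main__":
    manager = JobManager(address=('10.10.10.245', 5000), authkey=b'secret')
    manager.connect()                    # connect to the running server
    jobid_queue = manager.get_jobid_queue()
    results = manager.get_results()

    pool = multiprocessing.Pool()        # still use every local core
    while True:
        try:
            jobid = jobid_queue.get_nowait()   # fetch one task index
        except Empty:
            break                              # queue drained: we are done
        pool.apply_async(compute, (jobid,),    # write result back by index
                         callback=lambda r: results.__setitem__(*r))
    pool.close()
    pool.join()
```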
Below is a simple distributed-computing test on 3 computers in the same local area network:
- one is the management node of our laboratory cluster, internal-network IP 10.10.10.245
- another is a compute node of the cluster, with 12 cores
- the last is my own notebook, with 4 cores
Run the service script on the server and the task script on each client:

```
python server.py
python worker.py
```
The task processes terminate once the task queue is empty and their running tasks have finished; the service process terminates once all entries of the result list have been collected.
The execution result is as follows:
The top panel is the server monitor, the bottom-left panel is my own notebook running, and the bottom-right panel is one of the cluster nodes.
The total running time comes to 56.86 s, although I have to admit my notebook dragged the whole job down (-_-!)
Summary
This article used Python's built-in multiprocessing module to achieve multi-core parallelism on a single machine as well as simple distributed parallel computing across multiple computers. multiprocessing gives us a well-wrapped, friendly interface that lets our Python programs use multi-core resources to speed up their computations; I hope it proves helpful to readers who want to parallelize their own Python programs.