Column
❈Pytlab is a columnist for the Python Chinese community. Pytlab works mainly on scientific computing and high-performance computing applications, primarily in Python, C, and C++, and is familiar with numerical algorithms (optimization methods, Monte Carlo algorithms, etc.), parallelization (multithreaded and multi-process approaches such as MPI and OpenMP), and Python optimization techniques. Pytlab often writes Python extensions in C++.
blog: http://ipytlab.com
github: https://github.com/PytLab
❈ ——
Recently, I improved my catalytic kinetics simulation program by using descriptors. With Python descriptors I achieved more flexible and powerful attribute management, so that the data of each component of the program is encapsulated more completely and in a more organized way.
Taking the use of descriptors for attribute management in my own program as an example, this article introduces the concept of descriptors in Python and how to use them well. This powerful tool helps us manage data access control effectively in Python programs.
Main text
In other languages we often implement getter and setter methods in classes. These methods help define the class interface and let developers conveniently encapsulate functionality such as validation or range checking. In Python, however, we usually start by writing plain public attributes. Only when an attribute has special requirements, such as type validation (Python is dynamically typed), value range checking, or returning a deep copy rather than a reference, do we generally consider using the @property decorator.
The @property decorator is convenient and quick to use, for example:
    class KineticModel(object):
        # ...
        @property
        def kB(self):
            return self.__kB

        # The setter decorator is named after the property itself (kB).
        @kB.setter
        def kB(self, kB):
            # ...
But the disadvantage of @property is that it cannot be reused: the same logic cannot be shared between different attributes. If, in addition to the Boltzmann constant, there is a Planck constant that needs to be handled in the same way, do we have to write the setter and getter functions all over again? That is obviously "repeating yourself".
This is where Python's descriptor mechanism comes in. It exists so that developers can reuse attribute-related logic.
The Python descriptor protocol specifies what happens when an attribute is referenced on an object. Python translates attribute access operations in a particular way, and the descriptor protocol determines that translation. With the descriptor protocol, we can implement something similar to private variables in Python.
The descriptor protocol includes several methods:
- descr.__get__(self, obj, type=None) ---> value, used to access the attribute
- descr.__set__(self, obj, value) ---> None, used to set the value of the attribute
- descr.__delete__(self, obj) ---> None, used to control the delete operation of the attribute
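As a rough illustration of the three protocol methods, the sketch below defines a descriptor that records which method each operation triggers. The names `Logged`, `Model`, and `events` are hypothetical and exist only for this example.

```python
class Logged(object):
    """Descriptor that implements all three protocol methods."""

    def __init__(self, name):
        self.name = name
        self.events = []  # records which protocol method was triggered

    def __get__(self, obj, type=None):
        self.events.append("get")
        if obj is None:  # accessed through the class, not an instance
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        self.events.append("set")
        obj.__dict__[self.name] = value

    def __delete__(self, obj):
        self.events.append("delete")
        obj.__dict__.pop(self.name, None)


class Model(object):
    temperature = Logged("temperature")


m = Model()
m.temperature = 500.0   # triggers Logged.__set__
value = m.temperature   # triggers Logged.__get__
del m.temperature       # triggers Logged.__delete__

# Reading the class dictionary directly does not trigger __get__.
events = Model.__dict__["temperature"].events
print(value, events)  # 500.0 ['set', 'get', 'delete']
```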
Any object that defines any of the above methods implements the descriptor protocol and is therefore a descriptor. By moving the logic from the earlier getter and setter methods into the __get__ and __set__ methods, we can apply the same logic to different attributes in different classes.
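To make the reuse concrete, here is a minimal sketch in which one descriptor class validates several attributes across two classes. The names `Positive`, `Constants`, and `Reactor` are hypothetical, and the "positive float" rule is only an assumed example of shared logic.

```python
class Positive(object):
    """Reusable check: the value must be a positive float."""

    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, float) or value <= 0.0:
            raise ValueError("{} must be a positive float".format(self.name))
        instance.__dict__[self.name] = value


class Constants(object):
    # One descriptor class serves both constants; no getter/setter is repeated.
    kB = Positive("kB")
    h = Positive("h")


class Reactor(object):
    # The same logic is reused in an unrelated class.
    temperature = Positive("temperature")


c = Constants()
c.kB = 1.380649e-23   # Boltzmann constant in J/K
c.h = 6.62607015e-34  # Planck constant in J*s

r = Reactor()
r.temperature = 298.15

try:
    c.h = -1.0        # rejected by the shared validation logic
    rejected = False
except ValueError:
    rejected = True
```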
Only the approach of creating descriptors by defining a class with these methods is covered here.
The KineticModel class in my kinetic model needs attributes of many types, such as the elementary reaction expressions rxn_expression (represented here by a list of strings) and the temperature at which the model reaction occurs, temperature (represented by a float).
To check the type whenever one of these attributes is assigned, I defined descriptors for several basic types, each with the corresponding type-checking logic. The following is a simple float descriptor (of course, this is not the final version I use):
    class Float(object):
        def __init__(self, name, default=0.0):
            self.name = name
            # __get__ relies on self.default; the 0.0 fallback is an assumption.
            self.default = default

        def __get__(self, instance, owner):
            private_name = "_{}__{}".format(instance.__class__.__name__, self.name)
            if private_name not in instance.__dict__:
                instance.__dict__[private_name] = self.default
            return instance.__dict__[private_name]

        def __set__(self, instance, value):
            # Check the type
            if type(value) is not float:
                msg = "{} ({}) is not a float number".format(self.name, value)
                raise ValueError(msg)
            # Assign the value to the corresponding attribute of the object.
            # Note that a mangled name is used to make it private.
            private_name = "_{}__{}".format(instance.__class__.__name__, self.name)
            instance.__dict__[private_name] = value
We can then define a class attribute in our class as the corresponding descriptor object. Afterwards it is used like a normal attribute, but with type checking built in:
    ...
    class KineticModel(object):
        # Set temperature as a class attribute
        temperature = Float("temperature")
        ...
When I try to assign a string to it, a ValueError is raised.
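The original output is not reproduced here, so the following self-contained sketch shows the behaviour instead. It restates the Float descriptor in compact form; the `default` parameter and the `instance is None` guard are assumptions added for the example.

```python
class Float(object):
    """Type-checked float attribute, following the descriptor above."""

    def __init__(self, name, default=0.0):  # `default` is an assumption
        self.name = name
        self.default = default

    def __get__(self, instance, owner):
        if instance is None:
            return self
        private_name = "_{}__{}".format(instance.__class__.__name__, self.name)
        return instance.__dict__.get(private_name, self.default)

    def __set__(self, instance, value):
        if type(value) is not float:
            msg = "{} ({}) is not a float number".format(self.name, value)
            raise ValueError(msg)
        private_name = "_{}__{}".format(instance.__class__.__name__, self.name)
        instance.__dict__[private_name] = value


class KineticModel(object):
    temperature = Float("temperature")


model = KineticModel()
model.temperature = 500.0      # accepted: the value is a float

try:
    model.temperature = "500"  # rejected by Float.__set__
except ValueError as err:
    message = str(err)

print(message)  # temperature (500) is not a float number
```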
Principle of Descriptor
The sections above showed how to create a basic descriptor and what using it looks like. So how does a descriptor actually make it possible to manipulate attributes in this way?
In one sentence: it works through the translation of attribute access.
When we access an attribute, the descriptor is triggered (if the attribute is defined as a descriptor). When we access the attribute d of an object obj, that is obj.d, the triggering process is roughly as follows: first d is looked up in the dictionary of obj; if d is an object that defines __get__(), then d.__get__(obj) is called directly.
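The translation can be observed directly. In this minimal sketch (the names `Constant` and `Model` are hypothetical), the descriptor simply returns the arguments it was called with, so ordinary attribute access can be compared with an explicit call to `__get__`.

```python
class Constant(object):
    """Descriptor that reports how it was called."""

    def __get__(self, obj, objtype=None):
        return ("called with", obj, objtype)


class Model(object):
    d = Constant()


obj = Model()

# Access through the instance and the explicit protocol call agree.
via_attribute = obj.d
via_protocol = type(obj).__dict__["d"].__get__(obj, type(obj))

# Access through the class passes None in place of the instance.
via_class = Model.d
```

Here `via_attribute` and `via_protocol` are the same tuple, while `via_class` carries `None` as its second element because no instance is involved.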
The official documentation describes the trigger details more fully. They depend on whether the attribute is accessed through an instance or through a class:
1. For instance attribute access, the key to the translation lies in the __getattribute__ method of the base class object. This built-in method is called unconditionally on every attribute access, and it translates obj.d into type(obj).__dict__['d'].__get__(obj, type(obj)). For the C code of its implementation, see: https://docs.python.org/3/c-api/object.html#c.PyObject_GenericGetAttr
2. For class attribute access, the key lies in the __getattribute__ method of the metaclass type, which translates cls.d into cls.__dict__['d'].__get__(None, cls). Here the instance argument of __get__() is None because there is no corresponding instance. The C code of its implementation can be found at: https://hg.python.org/cpython/file/3.5/Objects/typeobject.c#l2936

First of all, descriptors come in two kinds:
1. If an object defines both the __get__() and __set__() methods, it is called a data descriptor.
2. If an object defines only the __get__() method, it is called a non-data descriptor.
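One way to check which kind a descriptor is would be the standard library function `inspect.isdatadescriptor`. The sketch below applies it to two hypothetical descriptor classes, `DataDesc` and `NonDataDesc`.

```python
import inspect


class DataDesc(object):
    """Defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, instance, owner):
        return "data"

    def __set__(self, instance, value):
        pass


class NonDataDesc(object):
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, instance, owner):
        return "non-data"


is_data = inspect.isdatadescriptor(DataDesc())
is_non_data = inspect.isdatadescriptor(NonDataDesc())
print(is_data, is_non_data)  # True False
```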
When we access an attribute, several kinds of objects can be involved:
1. data descriptors
2. non-data descriptors
3. the instance's dictionary
4. the built-in __getattr__() method
Their order of priority is:
data descriptor >> instance's dict >> non-data descriptor >> __getattr__()
That is to say, if the instance obj has both a data descriptor d and an instance attribute of the same name d, then when we access obj.d, Python calls type(obj).__dict__['d'].__get__(obj, type(obj)) instead of returning obj.__dict__['d'], because the data descriptor has the higher priority.
But if the descriptor is a non-data descriptor, the opposite is true, and Python returns obj.__dict__['d'].
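The priority order can be checked with a small experiment. In this sketch (all class and attribute names are hypothetical), values are written straight into the instance dictionary so that each attribute name exists both there and on the class.

```python
class DataDesc(object):
    def __get__(self, instance, owner):
        return "from data descriptor"

    def __set__(self, instance, value):
        raise AttributeError("read-only")


class NonDataDesc(object):
    def __get__(self, instance, owner):
        return "from non-data descriptor"


class Model(object):
    a = DataDesc()
    b = NonDataDesc()

    def __getattr__(self, name):
        # Called only when every other lookup has failed.
        return "from __getattr__"


obj = Model()
# Write into the instance dictionary directly, bypassing __set__.
obj.__dict__["a"] = "from instance dict"
obj.__dict__["b"] = "from instance dict"

results = (obj.a, obj.b, obj.c)
print(results)
# ('from data descriptor', 'from instance dict', 'from __getattr__')
```

The data descriptor wins over the instance dictionary, the instance dictionary wins over the non-data descriptor, and `__getattr__` is consulted only for the name `c`, which is found nowhere else.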
In many cases, we do not need to initialize an attribute when the object itself is initialized. Instead, we can initialize it the first time it is used and simply return the stored result on later accesses. This reduces the amount of computation and, to some extent, the memory requirements.
Therefore, in the __get__() method of my own descriptor, I check whether the corresponding instance attribute has been initialized. If not, it is initialized first. If it has been, the value is returned directly. This achieves lazy access:
    def __get__(self, instance, owner):
        private_name = "_{}__{}".format(instance.__class__.__name__, self.name)
        # Whether the instance attribute already exists
        if private_name not in instance.__dict__:
            instance.__dict__[private_name] = self.default
        return instance.__dict__[private_name]
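A minimal sketch of the lazy behaviour, assuming a hypothetical `LazyFloat` descriptor with a `default` argument: the instance dictionary stays empty until the attribute is read for the first time.

```python
class LazyFloat(object):
    def __init__(self, name, default=0.0):
        self.name = name
        self.default = default

    def __get__(self, instance, owner):
        if instance is None:
            return self
        private_name = "_{}__{}".format(instance.__class__.__name__, self.name)
        # Initialize on first access only.
        if private_name not in instance.__dict__:
            instance.__dict__[private_name] = self.default
        return instance.__dict__[private_name]


class KineticModel(object):
    temperature = LazyFloat("temperature", default=298.15)


model = KineticModel()
before = dict(model.__dict__)  # nothing has been stored yet
first = model.temperature      # the first access initializes the value
after = dict(model.__dict__)

print(before, first, after)
# {} 298.15 {'_KineticModel__temperature': 298.15}
```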
When we want an attribute (descriptor) to be protected from modification by the caller, we can raise an AttributeError exception in the __set__() method, for example:
    def __set__(self, instance, value):
        private_name = "_{}__{}".format(instance.__class__.__name__, self.name)
        # After the first assignment, the value of the attribute cannot be modified
        if private_name not in instance.__dict__:
            instance.__dict__[private_name] = value
        else:
            msg = "Changing value of {}.{} is not allowed".format(instance.__class__.__name__, self.name)
            raise AttributeError(msg)
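Put into a complete descriptor, the write-once rule might look like the sketch below. The class name `WriteOnce` and the helper `_private_name` are hypothetical.

```python
class WriteOnce(object):
    def __init__(self, name):
        self.name = name

    def _private_name(self, instance):
        return "_{}__{}".format(instance.__class__.__name__, self.name)

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__[self._private_name(instance)]

    def __set__(self, instance, value):
        private_name = self._private_name(instance)
        # Only the first assignment is allowed.
        if private_name in instance.__dict__:
            msg = "Changing value of {}.{} is not allowed".format(
                instance.__class__.__name__, self.name)
            raise AttributeError(msg)
        instance.__dict__[private_name] = value


class KineticModel(object):
    kB = WriteOnce("kB")


model = KineticModel()
model.kB = 1.380649e-23  # the first assignment succeeds

try:
    model.kB = 1.0       # the second assignment is refused
except AttributeError as err:
    message = str(err)

print(message)  # Changing value of KineticModel.kB is not allowed
```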
In this way, the effect of private variables can be achieved and the data is encapsulated more safely, preventing external callers from accidentally modifying the object's data and causing unwanted results.
If the instance attribute is a mutable object such as a dictionary or a list, Python returns a reference to it, so its internal data can still be modified after the value has been obtained. Therefore, if you really want this attribute to be protected from any modification, you can use deepcopy to return a deep copy of the object. Whatever is then done to the returned object externally has no effect on the original.
    import copy

    def __get__(self, instance, owner):
        private_name = "_{}__{}".format(instance.__class__.__name__, self.name)
        if private_name not in instance.__dict__:
            instance.__dict__[private_name] = self.default
        # Return a deep copy so that callers cannot modify the stored object
        if self.deepcopy:
            return copy.deepcopy(instance.__dict__[private_name])
        else:
            return instance.__dict__[private_name]
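The difference between returning a reference and returning a deep copy can be seen in a short experiment. This is a sketch with a hypothetical `Protected` descriptor; the reaction string and the `species` attribute are placeholder data, not taken from the real program.

```python
import copy


class Protected(object):
    def __init__(self, name, deepcopy=True):
        self.name = name
        self.deepcopy = deepcopy

    def __get__(self, instance, owner):
        if instance is None:
            return self
        value = instance.__dict__[self.name]
        return copy.deepcopy(value) if self.deepcopy else value

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class KineticModel(object):
    rxn_expression = Protected("rxn_expression", deepcopy=True)
    species = Protected("species", deepcopy=False)


model = KineticModel()
model.rxn_expression = ["CO_g + *_s -> CO_s"]  # placeholder reaction string
model.species = ["CO_g"]

model.rxn_expression.append("tampered")  # changes only a temporary copy
model.species.append("tampered")         # changes the stored list itself

print(model.rxn_expression)  # ['CO_g + *_s -> CO_s']
print(model.species)         # ['CO_g', 'tampered']
```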
Descriptors are class attributes, so storing data in the descriptor object itself leads to unexpected results. For example, suppose I want a height descriptor for a student class and keep the height data inside the descriptor itself.
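The original listing is not reproduced here. A minimal sketch of such a problematic descriptor, with hypothetical names `Height` and `Student`, could look like this:

```python
class Height(object):
    """Problematic: the value lives on the descriptor, not on the instance."""

    def __init__(self):
        self.value = None

    def __get__(self, instance, owner):
        return self.value

    def __set__(self, instance, value):
        self.value = value


class Student(object):
    height = Height()


tom = Student()
jerry = Student()
tom.height = 170.0
jerry.height = 180.0  # overwrites the single value held by the descriptor

print(tom.height, jerry.height)  # 180.0 180.0
```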
If we then create two student instances, their height attribute turns out to be the same object. This is because the descriptor is a class attribute, so every instance accesses the same class-level object.
In this case, instead of putting the data in the descriptor, we can create private variables in the corresponding instance objects. Different objects then hold different variables, and the problem described above does not arise.
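    class Height(object):
        def __init__(self, name):
            # name is the per-instance storage name, e.g. "_height". It must
            # differ from the class attribute name to avoid infinite recursion.
            self.name = name

        def __get__(self, instance, cls):
            return getattr(instance, self.name)

        def __set__(self, instance, value):
            setattr(instance, self.name, value)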
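A short usage sketch of this per-instance version follows. The `Student` class and the storage name `"_height"` are assumptions for the example. The storage name is deliberately different from the class attribute name `height`, because otherwise `getattr` would invoke the descriptor again.

```python
class Height(object):
    def __init__(self, name):
        self.name = name

    def __get__(self, instance, cls):
        return getattr(instance, self.name)

    def __set__(self, instance, value):
        setattr(instance, self.name, value)


class Student(object):
    # "_height" is where each instance keeps its own value.
    height = Height("_height")


tom = Student()
jerry = Student()
tom.height = 170.0
jerry.height = 180.0

print(tom.height, jerry.height)  # 170.0 180.0
```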
Alternatively, each instance and its value can be stored as a key-value pair in a dictionary held by the descriptor. However, the dictionary then keeps a reference to every instance, so their reference counts never drop to 0 and they cannot be garbage collected, which risks memory leaks. That approach is therefore not described in detail here.
This article summarizes the concepts and usage of descriptors in Python. Descriptors help us achieve powerful and flexible attribute management, and combining them well can lead to elegant code. At the same time, we should remain cautious and avoid the unnecessary complexity that comes from overriding normal object behavior.