Effective python attribute management: the use of descriptors

Effective python attribute management: the use of descriptors

Column

❈Pytlab, columnist of the Python Chinese community. Mainly engaged in the application of scientific computing and high-performance computing, the main languages ​​are Python, C, C++. Familiar with numerical algorithms (optimization methods, Monte Carlo algorithms, etc.) and parallelization algorithms (multithreaded and multi-process parallelization such as MPI, OpenMP) and python optimization methods, and often use C++ to write extensions to python.

blog: http://ipytlab.com

github: https://github.com/PytLab

——

Preface

Recently, I have improved my catalytic kinetics simulation program by using descriptors. With the help of Python descriptors, I have realized more flexible and powerful and effective attribute management, making the data packaging of each component of the program more complete and organized.

This article uses descriptors in our own programs for effective python attribute management as an example, introduces the concept of descriptors in python and how to better use descriptors. This powerful tool helps us effectively manage data access control in python programs. .

text

We often implemented in other languages in classes getterand setterother methods tools, which helps the class interface definition, but also allows developers to easily encapsulation functions, such as authentication or usage range of detection. But in Python we usually publicwrite directly from attributes, but when we have special requirements for attributes, such as type verification (Python is a dynamic type), value range detection, and return deep copy (rather than reference), we Generally consider using:

  1. Built-in @propertydecorator
  2. Use descriptor

@propertyDecorators are convenient and quick to use, for example

class KineticModel(obejct):

    # ...

    @property

    def kB(self):

        return self.__kB



    @property.setter

    def kB(self, kB):

        #...

But @propertythe disadvantage is that it cannot be reused, and the same set of logic cannot be reused between different attributes. In this way, in addition to processing the Boltzmann constant, if there is a Planck constant that needs to be processed in the same way, is it necessary? Write setterand getterfunction repeatedly ? This is obviously "Repeat yourself".

At this time, it is necessary to call Python's descriptor mechanism. Its existence is that Python developers can reuse logic related to attributes.

Descriptor Protocol

The Python Descriptor Protocol is a method for events that will occur when referring to attributes in the model. Python will perform a certain translation of attribute access operations, and this translation method is determined by the descriptor protocol. With the help of the descriptor protocol provided to us by Python, we can use it to implement functions similar to private variables in Python.

The descriptor protocol includes several methods:--- descr.__get__(self, obj, type=None)> value, used to access the attribute descr.__set__(self, obj, value)---> None, used to set the value of the attribute-descr.__delete__(self, obj) control the delete operation of the attribute

Any object that defines any of the above methods will implement the descriptor protocol, and it will become a descriptor. By rewriting the logic in the previous getterand settermethod to the __get__and __set__method, we can apply the same set of logic to different attributes in different classes.

Create descriptor

Here only introduces the use of class methods to create descriptors.

The KineticModel in my kinetic model requires many types of attributes, such as the basic elementary reaction formula rxn_expression(here I use a list containing a string to represent it), and the temperature at which the model reaction occurs temperature(represented by a float type).

In order to be able to perform the corresponding type detection when assigning a value to the attribute, I have defined several basic types of descriptors and provided the corresponding logic for detecting the data type. The following is a simple integer descriptor (of course this is not the final Version used):

class Float(object):

    def __init__(self, name):

        self.name = name



    def __get__(self):

        private_name = "_{}__{}".format(instance.__class__.__name__, self.name)

        if private_name not in instance.__dict__:

            instance.__dict__[private_name] = self.default



    def __set__(self, instance, value):

        # Detection type

        if type(value) is not float:

            msg = "{} ({}) is not a float number".format(self.name, value)

            raise ValueError(msg)

        # Assign values ​​to the corresponding properties of the object. Note that I used `mangled name` for privatization

        private_name = "_{}__{}".format(instance.__class__.__name__, self.name)

        instance.__dict__[private_name] = value

In this way, we can define the corresponding class attribute in our class as the corresponding descriptor object, and later we can use it like a normal attribute, but it has the type detection function:

...



class KineticModel(obejct):



    # Set temperature as a class attribute

    temperature = Float("temperature")



...

When I try to assign a string to it, an exception is thrown:

Principle of Descriptor

The basic descriptor creation and use effects have been performed above, so how does the descriptor work to allow us to manipulate attributes in this way?

Sentence summary is carried out by the property access translation .

Descriptor trigger

When we access the attribute, it will trigger the descriptor (if this attribute has a descriptor definition). When we access objthe attribute of the object , the triggering process of the descriptor is roughly: first look for d in the dictionary of the object obj , If it is a contained object, call it directly .dobj.dd__get__()d.__get__(obj)

The specific trigger details are described in more detail in the official document. The specific trigger is divided into whether we are accessing class attributes or instance attributes: 1. If we are accessing instance attributes, the key to the translation of attribute access lies in the base. objectThe __getattribute__method of the class , we know that this built-in method is called unconditionally during attribute access, so this method will be obj.dtranslated into type(obj).__dict__['d'].__get__(obj, type(obj)) the C code of its implementation. See: https://docs.python.org/3/c- api/object.html#c.PyObject_GenericGetAttr

  1. If it is to access the properties of the class object, the key to the translation of the access of the properties lies in typethe __getattribute__method of the metaclass , which will cls.dbe translated into cls.__dict__['d'].__get__(None, cls), and there __get__()is instanceno corresponding None. The C code of its implementation can be found at: https://hg.python.org/cpython/file/3.5/Objects/typeobject.c#l2936

Descriptor priority

First of all, there are differences between descriptors and descriptors : 1. If an object defines __get__()and __set__()methods at the same time , this descriptor is called data descriptor 2. If an object only defines __get__()methods, then this descriptor is called For non-data descriptor

When we access the attributes, we need a few lines to deal with, which basically include these objects: 1. data descriptor 2. non-data descriptor 3. dictionary of instances 4. built-in__getattr__() functions

The order of their priority is:

data descriptor >> instance's dict >> non-data descriptor >> __getattr__()

That is to say, if the instance objreproduces the data descriptor dand instance attributes of the same name d, when we access d, because the data descriptor has a higher priority, python will call type(obj).__dict__['d'].__get__(obj, type(obj))instead of returnobj.__dict__['d']

But if the descriptor is a non-data descriptor, the opposite is true, and python will returnobj.__dict__['d']

Descriptor for lazy access (access on demand)

In many cases, we don’t need to initialize the properties of a class when the class is initialized. We can initialize this property by the way when we use this property for the first time , so that it will return directly when we reuse this property later. The result is fine, which not only reduces the number of calculations, but also reduces the memory requirements to a certain extent.

Therefore, when I defined my own descriptor, I __get__()judged whether the corresponding instance attribute has been initialized, if it is not initialized, initialize it, if it has been initialized, return directly, achieving the purpose of lazy access:

def __get__(self, instance, owner):

    private_name = "_{}__{}".format(instance.__class__.__name__, self.name)

    # Whether the instance attribute already exists

    if private_name not in instance.__dict__:

        instance.__dict__[private_name] = self.default

    return instance.__dict__[private_name]

Create read-only descriptor

When we want an attribute (descriptor) to prohibit the caller from modifying it, we can __set__()throw it in the methodAttributeError an exception , for example:

def __set__(self, instance, value):

    private_name = "_{}__{}".format(instance.__class__.__name__, self.name)

    # After the first assignment, the value of the attribute cannot be modified

    if private_name not in instance.__dict__:

        instance.__dict__[private_name] = value

    else:

        msg ="Changing value of {}.{} is not allowed".format(instance.__class__.__name__,

                                                             self.name)

        raise AttributeError(msg)

In this way, the effect of private variables can be realized, and the data can be encapsulated more safely, preventing unwanted results from accidentally modifying the object's data during external calls.

Deep copy can be used for mutable variables

If the instance attribute is a variable such as a dictionary or a list, python will return a reference to the object, so it is possible to modify its internal data after obtaining its value. Therefore, if you really want this attribute to not be modified in any way, you can Use deepcopythe deep copy of the directly returned object, so that no matter how you damage the object externally, it has nothing to do with the object itself that is returned.

def __get__(self, instance, owner):

    private_name = "_{}__{}".format(instance.__class__.__name__, self.name)

    if private_name not in instance.__dict__:

        instance.__dict__[private_name] = self.default

    if self.deepcopy:

        return copy.deepcopy(instance.__dict__[private_name])

    else:

        return instance.__dict__[private_name]

Points to note

Descriptors are all for class attributes, so if you store data in the descriptor object, unexpected results will occur. For example, I want to create a corresponding height descriptor for each student class, and put the height data in the descriptor, I can define the descriptor like this:

We created two student instances, but the height attribute is the same object. This is because the descriptor is a class attribute, so each instance is a reference to the accessed class attribute when it is accessed .

At this time, we can not put the data in the descriptor, but create private variables in the corresponding instance objects, so that the private variables of different objects are different variables, and the problem of the picture above will not arise.

class Height(object):

    def __init__(self, name):

        self.name = name



    def __get__(self, instance, cls):

        return getattr(instance, self.name)



    def __set__(self, instance, value):

        setattr(instance, self.name, value)

At the same time, the corresponding object and value can be stored in the dictionary of the descriptor as the key-value pair of the dictionary, but this will cause the reference count to be unable to be 0 and the garbage collection will not be possible, resulting in the risk of memory leaks, so this method is not detailed Described.

summary

This article summarizes the concepts and usage of descriptors in Python. Descriptors can help us achieve powerful and flexible attribute management. Elegant programming can be achieved through the combined use of descriptors, but at the same time, we should maintain a cautious attitude to avoid overwriting Unnecessary code complexity due to normal object behavior.

reference

Reference: https://cloud.tencent.com/developer/article/1033133 Effective python attribute management: the use of descriptors-Cloud + Community-Tencent Cloud