Deep Learning Series: Structural Changes in Convolutional Neural Networks - Deformable Convolutional Networks


In this article, we will delve into the world of convolutional neural networks (CNNs) and explore a novel approach to modeling geometric transformations: deformable convolutional networks [1]. This technique adds learnable flexibility to the convolution process itself, which is beneficial in applications such as image recognition and object detection.

Why Do We Need This Unusual Structure?

In the previous article, we introduced Spatial Transformer Networks (STNs) [3], which let a CNN learn affine transformations of its feature maps. The goal was to increase the robustness of the network by letting it learn to handle rotation, translation, scaling, and cropping. However, an STN applies a single global transformation to the whole feature map, which limits how well it can model local geometric variation. The authors therefore propose an alternative: deformable convolutional networks.

Deformable Convolutional Networks

The deformable convolutional network can be seen as a successor to the STN approach. In this method, the sampling grid of the CNN kernel is no longer fixed: it can deform irregularly, covering effects such as dilation, rotation, and scaling as special cases. This lets the network adapt its receptive field to the input data and model more complex geometric structure.

Deformable Convolution

The deformable convolution process can be understood by examining the traditional definition of convolution:

Here R defines the sampling grid of the receptive field; for example, for a 3 × 3 kernel, R = {(-1, -1), (-1, 0), …, (0, 1), (1, 1)}.

For each output position p0, the standard convolution is:

y(p0) = Σ_{pn ∈ R} w(pn) · x(p0 + pn)

where w is the kernel weight and x is the input feature map.
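This weighted sum over the grid R can be sketched in a few lines of NumPy (all names here are illustrative, not from the paper's implementation):

```python
import numpy as np

def conv_at(x, w, p0):
    """Regular convolution output at position p0: a weighted sum over
    the fixed sampling grid R."""
    R = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]  # 3x3 grid
    return sum(w[i + 1, j + 1] * x[p0[0] + i, p0[1] + j] for (i, j) in R)

x = np.arange(25, dtype=float).reshape(5, 5)  # toy feature map
w = np.ones((3, 3)) / 9.0                     # averaging kernel
print(conv_at(x, w, (2, 2)))                  # mean of the 3x3 patch around (2, 2)
```

Every output pixel is produced by sliding this same fixed grid across the input.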

The deformable convolution instead adds a learned offset Δpn to each sampling position:

y(p0) = Σ_{pn ∈ R} w(pn) · x(p0 + pn + Δpn)

Returning to Figure 1, the convolution is split into two branches: a separate convolutional layer learns the offsets Δpn, producing an output of size H × W × 2N, where N = |R| is the number of grid points and the factor 2 accounts for offsets in both the x and y directions.

With these offsets, the convolution window is no longer the regular sliding window of the original operation (green boxes in Figure 1); instead, each sampling location is translated by its offset (blue boxes) before the data is read, and the convolution computation itself then proceeds exactly as before.
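A minimal sketch of that sampling process (illustrative code, not the paper's implementation): each grid point is shifted by its learned offset to a fractional position, which is read with bilinear interpolation before the usual weighted sum:

```python
import numpy as np

def bilinear(x, py, px):
    """Sample feature map x at fractional position (py, px) from the
    four surrounding integer pixels."""
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    dy, dx = py - y0, px - x0
    return ((1 - dy) * (1 - dx) * x[y0, x0] + (1 - dy) * dx * x[y0, x0 + 1]
            + dy * (1 - dx) * x[y0 + 1, x0] + dy * dx * x[y0 + 1, x0 + 1])

def deformable_conv_at(x, w, p0, offsets):
    """Deformable convolution at p0: each grid point pn is shifted by its
    (dy, dx) offset before sampling. offsets has shape (N, 2) with N = |R|."""
    R = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    out = 0.0
    for n, (i, j) in enumerate(R):
        dy, dx = offsets[n]  # in a real network, predicted by the offset branch
        out += w[i + 1, j + 1] * bilinear(x, p0[0] + i + dy, p0[1] + j + dx)
    return out

x = np.arange(49, dtype=float).reshape(7, 7)
w = np.ones((3, 3)) / 9.0
zero = np.zeros((9, 2))  # with all-zero offsets this reduces to regular convolution
print(deformable_conv_at(x, w, (3, 3), zero))
```

With zero offsets the result matches the regular convolution, which is exactly why the module can be dropped into an existing network and initialized to behave like its ordinary counterpart.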

Deformable Region of Interest Pooling

Deformable RoI pooling is the second new module introduced by the authors. For each bin of a region of interest (RoI), a fully connected layer predicts a normalized offset Δp̂_ij, which is scaled by the RoI's width and height together with a small scale factor r (γ = 0.1 in the paper) to give the actual offset:

Δp_ij = r · Δp̂_ij ∘ (w, h)

Once the offsets are available, each bin is pooled over its shifted window:

y(i, j) = Σ_{p ∈ bin(i, j)} x(p0 + p + Δp_ij) / n_ij

where n_ij is the number of pixels in bin (i, j) and p0 is the RoI's top-left corner.
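The pooling step can be sketched as follows (an illustrative simplification with integer per-bin offsets and average pooling; the paper's version uses fractional offsets with bilinear interpolation):

```python
import numpy as np

def deformable_roi_pool(x, roi, k, offsets):
    """Average-pool an RoI into a k x k grid, shifting each bin by its offset.
    roi = (y, x, h, w): top-left corner and size of the region.
    offsets has shape (k, k, 2), holding a per-bin (dy, dx) shift."""
    y0, x0, h, w = roi
    bh, bw = h // k, w // k  # bin size
    out = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            dy, dx = offsets[i, j]
            ys = y0 + i * bh + dy  # bin window shifted by its learned offset
            xs = x0 + j * bw + dx
            out[i, j] = x[ys:ys + bh, xs:xs + bw].mean()
    return out

x = np.arange(64, dtype=float).reshape(8, 8)
zero = np.zeros((2, 2, 2), dtype=int)  # zero offsets -> ordinary RoI pooling
print(deformable_roi_pool(x, (0, 0, 4, 4), 2, zero))
```

As with deformable convolution, zero offsets recover the ordinary module, so the deformable variant is a strict generalization.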

As before, because the learned offsets are fractional rather than integer pixel positions, each sampled value must be computed with bilinear interpolation.
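Bilinear interpolation at a fractional position is simply a weighted average of the four surrounding pixels (a standalone sketch):

```python
import numpy as np

def bilinear(x, py, px):
    """Interpolate feature map x at fractional coordinates (py, px),
    weighting the four surrounding integer pixels by proximity."""
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    dy, dx = py - y0, px - x0
    return ((1 - dy) * (1 - dx) * x[y0, x0] + (1 - dy) * dx * x[y0, x0 + 1]
            + dy * (1 - dx) * x[y0 + 1, x0] + dy * dx * x[y0 + 1, x0 + 1])

x = np.array([[0.0, 1.0],
              [2.0, 3.0]])
print(bilinear(x, 0.5, 0.5))  # midpoint of the four pixels -> 1.5
```

Because this weighting is differentiable in (py, px), gradients can flow back to the offset-predicting layers, which is what makes the offsets learnable end to end.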

Experimental Results

The authors validate both modules by plugging them into standard architectures for object detection and semantic segmentation, where they report consistent accuracy improvements over the regular counterparts; see the paper [1] for the detailed numbers.

Conclusion

In conclusion, deformable convolution and deformable RoI pooling are two novel modules that improve the performance of CNNs by letting the sampling locations adapt to the input. Because they reduce to their regular counterparts when the offsets are zero, they can be used as drop-in replacements in existing networks. The experimental results demonstrate their effectiveness, and we can expect to see them adopted widely.

Reference Material

[1] Dai, J., et al. "Deformable Convolutional Networks." ICCV 2017.
[2] He, K., et al. "Mask R-CNN." ICCV 2017.
[3] Jaderberg, M., et al. "Spatial Transformer Networks." NIPS 2015.