From MobileNet V1 to MobileNet V2: A Journey of Innovation

In deep learning, the twin goals of accuracy and efficiency have driven the development of many neural network architectures. One notable line of work is the MobileNet series, which underwent a significant redesign between MobileNet V1 and MobileNet V2. In this article, we trace the evolution from MobileNet V1 to MobileNet V2, exploring the innovations that made these networks more efficient and accurate.

From MobileNet V1 to MobileNet V2

MobileNet V1 was a pioneering effort to design a neural network architecture optimized for mobile and embedded devices. The primary goal was to balance model size, inference speed, and accuracy, trading a small drop in accuracy for large gains in efficiency. The core building block was the Depthwise Separable Convolution (DWSC) layer, inspired by the Xception architecture.

However, MobileNet V1 had its limitations, and researchers sought to improve on it. The result was MobileNet V2, which introduced two new layer structures: an inverted residual block with a shortcut connection, and the expansion-convolution-projection (ECP) sequence that forms its body. The expansion layer raises the dimensionality of the feature maps so the depthwise convolution can filter in a richer space, and the projection layer brings the dimensionality back down, improving both efficiency and accuracy.

MobileNet V1

MobileNet V1 has a relatively simple overall structure: a stack of convolutional layers, each followed by Batch Normalization and a ReLU activation. The key difference from architectures such as VGG is the use of Depthwise Separable Convolution (DWSC) layers, inspired by the Xception architecture: a depthwise convolution filters each input channel independently, and a 1x1 pointwise convolution then combines the channels.

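To see why DWSC is cheaper, compare the parameter counts of a standard convolution and its depthwise separable replacement. The helper below is a minimal sketch; the function name and the example shapes are illustrative, not taken from the original code:

def conv_param_counts(k, c_in, c_out):
    # Standard convolution: one k x k x c_in kernel per output channel.
    standard = k * k * c_in * c_out
    # Depthwise separable: a k x k kernel per input channel, plus a
    # 1x1 pointwise convolution to mix the channels.
    separable = k * k * c_in + c_in * c_out
    return standard, separable

standard, separable = conv_param_counts(k=3, c_in=64, c_out=128)
print(standard, separable, separable / standard)  # 73728 8768 ~0.12
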
Here is a simplified representation of the MobileNet V1 architecture, written against the TensorFlow 1.x API:

import tensorflow as tf

def mobilenet_v1(inputs, alpha, is_training):
    if alpha not in [0.25, 0.50, 0.75, 1.0]:
        raise ValueError('alpha must be one of `0.25`, `0.50`, `0.75` or `1.0` only.')

    filter_initializer = tf.contrib.layers.xavier_initializer()

    def _conv2d(inputs, filters, kernel_size, stride, scope=''):
        # Standard convolution followed by Batch Normalization and ReLU.
        with tf.variable_scope(scope):
            outputs = tf.layers.conv2d(inputs, filters, kernel_size,
                                       strides=(stride, stride), padding='same',
                                       activation=None, use_bias=False,
                                       kernel_initializer=filter_initializer)
            outputs = tf.layers.batch_normalization(outputs, training=is_training)
            outputs = tf.nn.relu(outputs)
            return outputs

    def _mobilenet_v1_conv2d(inputs, pointwise_conv_filters,
                             depthwise_conv_kernel_size, stride, scope=''):
        with tf.variable_scope(scope):
            with tf.variable_scope('depthwise_conv'):
                # Depthwise convolution: passing num_outputs=None skips the
                # pointwise stage, so each input channel is filtered independently.
                outputs = tf.contrib.layers.separable_conv2d(
                    inputs, None, depthwise_conv_kernel_size,
                    depth_multiplier=1, stride=(stride, stride), padding='SAME',
                    activation_fn=None, weights_initializer=filter_initializer,
                    biases_initializer=None)
                outputs = tf.layers.batch_normalization(outputs, training=is_training)
                outputs = tf.nn.relu(outputs)

            with tf.variable_scope('pointwise_conv'):
                # 1x1 pointwise convolution mixes the channels; alpha is the
                # width multiplier that thins the network uniformly.
                pointwise_conv_filters = int(pointwise_conv_filters * alpha)
                outputs = tf.layers.conv2d(outputs, pointwise_conv_filters, (1, 1),
                                           padding='same', activation=None,
                                           use_bias=False,
                                           kernel_initializer=filter_initializer)
                outputs = tf.layers.batch_normalization(outputs, training=is_training)
                outputs = tf.nn.relu(outputs)
                return outputs

    # ... (rest of the code remains the same)
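
As a usage sketch, the builder above would be invoked roughly as follows; the input shape and alpha value are illustrative, and the elided body is assumed to return the network output:

# Build the graph for 224x224 RGB inputs with width multiplier 1.0.
inputs = tf.placeholder(tf.float32, shape=(None, 224, 224, 3))
outputs = mobilenet_v1(inputs, alpha=1.0, is_training=True)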

MobileNet V2

MobileNet V2 introduced two new layer structures: the inverted residual block and the expansion-convolution-projection (ECP) sequence that forms its body. The shortcut connection in the residual block eases gradient flow through the network, while the ECP sequence first expands the feature maps with a 1x1 convolution, filters them with a depthwise convolution, and then projects them back to a lower dimensionality with a linear 1x1 convolution; the projection deliberately has no activation, since applying ReLU in the low-dimensional space would destroy information.

Here is a simplified representation of the MobileNet V2 building blocks:

def mobilenet_v2_func_blocks(is_training):
    filter_initializer = tf.contrib.layers.xavier_initializer()
    # MobileNet V2 uses ReLU6, which behaves well under low-precision inference.
    activation_func = tf.nn.relu6

    def conv2d(inputs, filters, kernel_size, stride, scope=''):
        with tf.variable_scope(scope):
            with tf.variable_scope('conv2d'):
                outputs = tf.layers.conv2d(inputs, filters, kernel_size,
                                           strides=(stride, stride), padding='same',
                                           activation=None, use_bias=False,
                                           kernel_initializer=filter_initializer)
                outputs = tf.layers.batch_normalization(outputs, training=is_training)
                outputs = activation_func(outputs)
                return outputs

    def _1x1_conv2d(inputs, filters, stride):
        # Plain 1x1 convolution with Batch Normalization and no activation.
        kernel_size = [1, 1]
        with tf.variable_scope('1x1_conv2d'):
            outputs = tf.layers.conv2d(inputs, filters, kernel_size,
                                       strides=(stride, stride), padding='same',
                                       activation=None, use_bias=False,
                                       kernel_initializer=filter_initializer)
            outputs = tf.layers.batch_normalization(outputs, training=is_training)
            return outputs

    def expansion_conv2d(inputs, expansion, stride):
        # Expansion: a 1x1 convolution that multiplies the channel count by
        # `expansion`. In MobileNet V2 the stride is normally carried by the
        # depthwise convolution, so this is typically called with stride 1.
        input_shape = inputs.get_shape().as_list()
        assert len(input_shape) == 4
        filters = input_shape[3] * expansion
        kernel_size = [1, 1]
        with tf.variable_scope('expansion_1x1_conv2d'):
            outputs = tf.layers.conv2d(inputs, filters, kernel_size,
                                       strides=(stride, stride), padding='same',
                                       activation=None, use_bias=False,
                                       kernel_initializer=filter_initializer)
            outputs = tf.layers.batch_normalization(outputs, training=is_training)
            outputs = activation_func(outputs)
            return outputs

    def projection_conv2d(inputs, filters, stride):
        # Projection: a linear 1x1 convolution that reduces the channel count.
        # Deliberately no activation here (the "linear bottleneck"): ReLU in
        # the low-dimensional space would destroy information.
        kernel_size = [1, 1]
        with tf.variable_scope('projection_1x1_conv2d'):
            outputs = tf.layers.conv2d(inputs, filters, kernel_size,
                                       strides=(stride, stride), padding='same',
                                       activation=None, use_bias=False,
                                       kernel_initializer=filter_initializer)
            outputs = tf.layers.batch_normalization(outputs, training=is_training)
            return outputs

    # ... (rest of the code remains the same)
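
The elided portion assembles these helpers into the inverted residual block. Below is a minimal sketch of that assembly, following the structure described in the MobileNet V2 paper rather than the elided code; the depthwise_conv2d helper (analogous to the depthwise stage in the V1 code) is assumed:

def inverted_residual_block(inputs, filters, stride, expansion=6, scope=''):
    # Expansion -> depthwise convolution -> linear projection, with a
    # shortcut connection when the input and output shapes match.
    assert stride in [1, 2]
    with tf.variable_scope(scope):
        net = expansion_conv2d(inputs, expansion, stride=1)
        net = depthwise_conv2d(net, kernel_size=[3, 3], stride=stride)  # assumed helper
        net = projection_conv2d(net, filters, stride=1)
        if stride == 1 and inputs.get_shape().as_list()[3] == filters:
            net = net + inputs  # residual shortcut
        return net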

Android Performance Bottlenecks

Mobile devices have limited computational resources, which makes deploying deep learning models on them challenging. The primary performance bottlenecks on Android devices are:

  1. Memory constraints: limited RAM restricts how large a model can be loaded and run.
  2. Computational constraints: mobile CPUs and GPUs offer far less raw compute than server hardware, so heavy computations run slowly.
  3. Power constraints: a limited battery budget makes sustained, computationally intensive inference costly.

TensorFlow Lite

TensorFlow Lite is a lightweight version of TensorFlow optimized for mobile and embedded devices. It provides several features that make it well suited to Android deployment:

  1. Model compression: the converter produces a compact flat-buffer model, reducing storage and memory footprint.
  2. Quantization: weights (and optionally activations) can be stored at reduced precision, typically 8-bit integers, which shrinks the model and speeds up inference; see the conversion sketch after this list.
  3. Multi-threading: inference can be spread across multiple CPU cores to improve throughput.
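
As an illustration of conversion and quantization, here is a minimal sketch using the TensorFlow 1.x converter API; the file names and input/output tensor names are assumptions for this example, not values from the article:

import tensorflow as tf

# Convert a frozen MobileNet graph to TensorFlow Lite with post-training
# weight quantization. Paths and tensor names below are illustrative.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    'mobilenet_v2_frozen.pb',                    # assumed frozen-graph file
    input_arrays=['input'],                      # assumed input tensor name
    output_arrays=['MobilenetV2/Predictions'])   # assumed output tensor name
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()
with open('mobilenet_v2.tflite', 'wb') as f:
    f.write(tflite_model)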

In conclusion, the evolution from MobileNet V1 to MobileNet V2 has been a significant milestone in the development of efficient deep learning architectures. The inverted residual block, with its expansion-convolution-projection (ECP) body, improved the accuracy-efficiency trade-off and made the model better suited to deployment on Android devices. TensorFlow Lite plays a complementary role, providing the compression, quantization, and multi-threading support needed to run such models on mobile hardware.