Deep convolutional neural network (pooling functions for object detection and segmentation)
Date
2022
Publisher
UMT Lahore
Abstract
Deep Convolutional Neural Networks (DCNNs) deliver state-of-the-art performance in object detection, classification, and segmentation, and are used in a wide range of computer vision applications. Pooling is an important component of many DCNN architectures: pooling layers reduce the spatial size of the input to lower the architecture's resource consumption. The pooling functions commonly used in DCNNs are static and do not adapt to the feature maps being pooled. The goal of this work is to investigate the impact of adaptive pooling functions in a DCNN for detection and segmentation. This thesis explains the theoretical foundations of DCNNs, common modifications, and prominent architectural designs. The adaptive pooling algorithms Gated Pooling and Tree Pooling are then implemented as Keras layers: Gated Pooling learns a combination of Max and Average Pooling, while Tree Pooling learns both the pooling functions themselves and how to combine them. Additionally, it is explained how the MaskX R-CNN architecture is developed by extending an existing Mask R-CNN implementation. This entails constructing a weight transfer function from the bounding-box head to the mask head and combining class-specific and class-agnostic mask predictions. Following this approach, several MaskX R-CNN configurations are trained on all 80 categories of the Microsoft COCO dataset using these pooling methods. This thesis demonstrates that, when trained and tested on CIFAR-10 and CIFAR-100, using these adaptive pooling functions in a small CNN yields better performance than Max Pooling. Analysis of the Mask and MaskX R-CNN models reveals that the basic Mask R-CNN implementation produces somewhat better results. A MaskX R-CNN architecture trained and evaluated with Tree Pooling inside the ResNet backbone performs marginally worse than the MaskX R-CNN baseline, and the MaskX R-CNN whose heads used Gated Pooling had the worst performance of all the trained models. However, the performance of the MaskX R-CNN implementations may be explained by the fact that they have around 4.5 times as many parameters as the Mask R-CNN baseline while following the same training regimen.
Because the MaskX R-CNN architecture is more complex, it is expected that (1) a prolonged training schedule will improve the performance of both the baseline and the Tree Pooling implementation, and (2) with further training, Tree Pooling will eventually outperform the baselines, as suggested by the performance gains on CIFAR-10 and CIFAR-100.
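The gated-pooling idea the abstract describes can be sketched in a few lines of NumPy. This is a simplified single-window illustration, not the thesis's actual Keras layer: the function name `gated_pool` and the learned weight vector `w` are hypothetical, and in a real layer the gate weights would be trained by backpropagation rather than supplied by hand.

```python
import numpy as np

def gated_pool(region, w):
    """Gated pooling over one flattened pooling window.

    A sigmoid gate, computed from the window contents and a learned
    weight vector `w`, mixes the max- and average-pooled responses:
        out = gate * max(region) + (1 - gate) * mean(region)
    """
    gate = 1.0 / (1.0 + np.exp(-np.dot(w, region)))  # sigmoid gate in (0, 1)
    return gate * region.max() + (1.0 - gate) * region.mean()
```

With `w` at zero the gate is 0.5 and the result is the plain average of max and mean pooling; as training pushes the gate toward 0 or 1, the layer smoothly interpolates between average-like and max-like behaviour per window.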