• Laser & Optoelectronics Progress
  • Vol. 60, Issue 24, 2410003 (2023)
Xiaoqiang Gao1, Kan Chang1,2,*, Mingyang Ling1, and Mengyu Yin1
Author Affiliations
  • 1School of Computer and Electronic Information, Guangxi University, Nanning 530004, Guangxi, China
  • 2Guangxi Key Laboratory of Multimedia Communications and Network Technology, Nanning 530004, Guangxi, China
    DOI: 10.3788/LOP230856
    Xiaoqiang Gao, Kan Chang, Mingyang Ling, Mengyu Yin. Object Detection via Multimodal Adaptive Feature Fusion[J]. Laser & Optoelectronics Progress, 2023, 60(24): 2410003

    Abstract

    With the advancement of deep learning, object detection methods based on convolutional neural networks (CNNs) have achieved tremendous success. Existing CNN-based object detection models typically employ single-modal RGB images for training and testing; however, their detection performance degrades significantly in low-light conditions. To address this issue, a multimodal object detection network built on YOLOv5 is proposed, which integrates RGB and thermal infrared imagery to fully exploit the information provided by the fusion of multimodal features, thereby increasing object detection accuracy. To fuse multimodal feature information effectively, a multimodal adaptive feature fusion (MAFF) module is introduced; it adaptively selects features from the different modalities and exploits the complementary information between them. Experimental results demonstrate that the proposed algorithm effectively merges features from distinct modalities and significantly increases detection accuracy.
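    The abstract does not disclose the internal design of the MAFF module, so the following is only a minimal sketch of one common form of adaptive two-modality fusion: a learned per-channel gate, normalized with a softmax so the RGB and thermal contributions form convex weights. All function names, shapes, and the gating scheme here are illustrative assumptions, not the paper's actual implementation.

    ```python
    import numpy as np

    def softmax(x, axis=0):
        """Numerically stable softmax along the given axis."""
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def adaptive_fuse(f_rgb, f_ir, w_rgb, w_ir):
        """Adaptively fuse RGB and thermal feature maps (illustrative sketch).

        f_rgb, f_ir : (C, H, W) feature maps from the two backbone branches.
        w_rgb, w_ir : (C,) gating logits; in a real network these would be
                      learned (e.g., produced by a small attention sub-network),
                      not fixed constants as in this toy example.
        """
        # Softmax over the two modalities yields per-channel convex weights,
        # so each output channel is a weighted average of the two inputs.
        gates = softmax(np.stack([w_rgb, w_ir]), axis=0)  # (2, C)
        a_rgb = gates[0][:, None, None]                   # broadcast to (C, 1, 1)
        a_ir = gates[1][:, None, None]
        return a_rgb * f_rgb + a_ir * f_ir

    # Toy usage: with equal gating logits, fusion reduces to a plain average.
    rng = np.random.default_rng(0)
    f_rgb = rng.standard_normal((4, 8, 8))
    f_ir = rng.standard_normal((4, 8, 8))
    fused = adaptive_fuse(f_rgb, f_ir, np.zeros(4), np.zeros(4))
    ```

    The convex (softmax-normalized) weighting keeps the fused feature magnitudes comparable to the inputs; when one modality is uninformative (e.g., RGB at night), a learned gate can shift weight toward the thermal branch channel by channel.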