• Acta Optica Sinica
  • Vol. 44, Issue 7, 0715001 (2024)
Jianming Chen1,2, Dingjian Li1, Xiangjin Zeng1,2, Zhenbo Ren3, Jianglei Di1,*, and Yuwen Qin1,2,**
Author Affiliations
  • 1Key Laboratory of Photonic Technology for Integrated Sensing and Communication, Ministry of Education, Guangdong Provincial Key Laboratory of Information Photonics Technology, School of Information Engineering of Guangdong University of Technology, Institute of Advanced Photonics Technology, Guangzhou 510006, Guangdong, China
  • 2Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 519082, Guangdong, China
  • 3Key Laboratory of Light-Field Manipulation and Information Acquisition, Ministry of Industry and Information Technology, Shaanxi Key Laboratory of Photonics Technology for Information, School of Physical Science and Technology, Northwestern Polytechnical University, Xi'an 710129, Shaanxi, China
    DOI: 10.3788/AOS231907
    Jianming Chen, Dingjian Li, Xiangjin Zeng, Zhenbo Ren, Jianglei Di, Yuwen Qin. Cross-Modal Optical Information Interaction and Template Dynamic Update for RGBT Target Tracking Method[J]. Acta Optica Sinica, 2024, 44(7): 0715001

    Abstract

    Objective

    RGB and thermal infrared (RGBT) tracking exploits the complementary strengths of the two optical modalities and offers an effective way to track targets in complex environments. However, many tracking algorithms neglect information exchange between the modalities, which limits their performance. In addition, existing Siamese-network trackers keep the tracking template fixed, so they adapt poorly to changes in target appearance and are prone to tracking drift. Improving tracker performance in complex environments therefore remains challenging.

    Methods

    The proposed algorithm uses a Siamese network tracker as its basic framework and introduces a feature interaction module that enhances inter-modal information exchange by reconstructing the information proportions of the different modalities. Following the anchor-free concept, a prediction network is built to directly classify each position point in the search region and regress the target bounding box there. To address the mismatch between target and template that arises during Siamese tracking, we propose a template update strategy that dynamically updates the tracking template using the prediction from the previous frame.
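
    The anchor-free prediction step described above can be illustrated with a minimal sketch. The abstract does not give the regression parameterization, so the code below assumes a common one: every grid point of the score map regresses its distances (left, top, right, bottom) to the box edges, and the best-scoring point is decoded. The function name `decode_anchor_free` and the `stride` and `offset` parameters are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in search-region pixels


def decode_anchor_free(
    cls_scores: List[List[float]],
    reg_ltrb: List[List[Tuple[float, float, float, float]]],
    stride: int = 8,
    offset: int = 0,
) -> Tuple[Box, float]:
    """Pick the best-scoring grid point and decode its box.

    Assumed (not taken from the paper): grid point (i, j) maps to the pixel
    (offset + j * stride, offset + i * stride) and regresses the distances
    (left, top, right, bottom) from that pixel to the box edges.
    """
    best_score = float("-inf")
    best_ij = (0, 0)
    for i, row in enumerate(cls_scores):
        for j, score in enumerate(row):
            if score > best_score:
                best_score, best_ij = score, (i, j)

    i, j = best_ij
    cx = offset + j * stride  # x coordinate of the chosen grid point
    cy = offset + i * stride  # y coordinate of the chosen grid point
    left, top, right, bottom = reg_ltrb[i][j]
    return (cx - left, cy - top, cx + right, cy + bottom), best_score


# Toy 2x2 score map: the point at row 1, column 0 has the highest score.
scores = [[0.1, 0.2], [0.9, 0.3]]
offsets = [
    [(1.0, 1.0, 1.0, 1.0), (2.0, 2.0, 2.0, 2.0)],
    [(3.0, 4.0, 5.0, 6.0), (1.0, 1.0, 1.0, 1.0)],
]
box, score = decode_anchor_free(scores, offsets, stride=8, offset=4)
print(box, score)  # (1.0, 8.0, 9.0, 18.0) 0.9
```

    Because each point predicts its box directly, no predefined anchor shapes are needed, which is one plausible reason such a head copes with large scale changes.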

    Results and Discussions

    Qualitative and quantitative experiments, together with ablation studies, compare the proposed tracker, SiamCTU, with advanced RGBT tracking models. The comparisons against state-of-the-art trackers are run on three benchmark datasets (GTOT, RGBT234, and LasHeR). Figs. 6, 7, and 9 show the quantitative comparisons between SiamCTU and advanced RGBT tracking algorithms on the three datasets, respectively. SiamCTU tracks well on all three benchmarks, which supports the effectiveness of the proposed method; on GTOT and LasHeR in particular, it achieves top rankings in both precision rate (PR) and success rate (SR). Fig. 8 and Table 1 present the challenge-attribute results on GTOT and RGBT234, respectively. SiamCTU performs well under the various challenge attributes, suggesting that it handles complex tracking scenarios effectively.

    Fig. 10 visualizes the tracking results for a more intuitive comparison. In the LightOcc sequence [Fig. 10(a)], the proposed tracker, which uses the template update strategy, tracks the target continuously and stably despite occlusion and low illumination. Under significant scale variation [Fig. 10(b)], it outperforms the compared tracker, demonstrating the benefit of building the prediction network on the anchor-free concept. Figs. 10(c) and 10(d) show that the proposed tracker exploits the complementary advantages of the RGB and T modalities and reduces interference from similar objects.

    The tracking-efficiency comparison on the GTOT dataset (Table 2) indicates that SiamCTU significantly improves tracking accuracy with minimal loss of tracking speed, and that it is both faster and more precise than the advanced MDNet-based tracker. In the ablation experiments (Table 3), the proposed tracker surpasses the baseline tracker, which shows that each designed module contributes substantially and that together the modules improve the tracker's ability to handle complex tracking scenarios. Specifically, removing the feature interaction module lowers the overall performance of SiamCTU by 3.1% on the more complex RGBT234 dataset. Finally, varying the template update parameter (Table 4) shows that, with an appropriate value of λ, the feature-level template update significantly improves the tracker's performance.
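
    The role of the update parameter λ can be made concrete with a small sketch. The abstract does not state the update rule, so the code assumes a simple linear blend between the initial template features and the online template features taken from the latest prediction; the function name `update_template` and the flat-list feature representation are hypothetical.

```python
from typing import List


def update_template(initial: List[float], online: List[float], lam: float) -> List[float]:
    """Blend initial and online template features element-wise.

    Assumed form (not confirmed by the abstract):
        updated = (1 - lam) * initial + lam * online
    lam = 0 keeps the initial template; lam = 1 uses only the online one.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(initial) != len(online):
        raise ValueError("templates must have the same length")
    return [(1.0 - lam) * a + lam * b for a, b in zip(initial, online)]


# Toy feature vectors standing in for flattened template feature maps.
initial_feat = [1.0, 0.0, 2.0]
online_feat = [0.0, 1.0, 4.0]
updated = update_template(initial_feat, online_feat, lam=0.25)
print(updated)  # [0.75, 0.25, 2.5]
```

    Under this assumed rule, a small λ keeps the template close to the reliable first-frame annotation, while a large λ follows appearance changes more quickly but risks absorbing errors from a drifting prediction, which is consistent with the need to choose λ appropriately.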

    Conclusions

    To address target tracking challenges in complex environments, we propose a cross-modal optical information interaction method for RGBT target tracking. The tracking model uses a Siamese network as its basic framework and incorporates a feature interaction module. This module enhances inter-modal information exchange by reconstructing the information proportions of the different optical modalities, which mitigates the effect of complex backgrounds on tracking performance. To handle the relationship between the tracker's initial template and the online template, we then introduce a dynamic template update strategy. This strategy updates the tracking template using the predicted results, capturing the real-time state of the target and improving the algorithm's robustness. Evaluations on three benchmark datasets (GTOT, RGBT234, and LasHeR) demonstrate that the proposed method surpasses current advanced RGBT tracking methods in tracking accuracy. It also meets real-time tracking requirements and has potential for broad application in optical information detection, perception, and recognition of targets in complex environments.
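
    One way to picture "reconstructing information proportions" is a gating step that assigns each modality a weight and fuses the features accordingly. The sketch below is only an illustration of that idea and not the module from the paper: it assumes the weights come from a softmax over the mean activation of each modality, and the names `modality_weights` and `fuse` are hypothetical.

```python
import math
from typing import List, Tuple


def modality_weights(rgb: List[float], tir: List[float]) -> Tuple[float, float]:
    """Softmax over the mean activation of each modality (an assumed gating rule)."""
    m_rgb = sum(rgb) / len(rgb)
    m_tir = sum(tir) / len(tir)
    peak = max(m_rgb, m_tir)  # subtract the max for numerical stability
    e_rgb = math.exp(m_rgb - peak)
    e_tir = math.exp(m_tir - peak)
    total = e_rgb + e_tir
    return e_rgb / total, e_tir / total


def fuse(rgb: List[float], tir: List[float]) -> List[float]:
    """Weighted element-wise sum of RGB and thermal features."""
    if len(rgb) != len(tir):
        raise ValueError("feature vectors must have the same length")
    w_rgb, w_tir = modality_weights(rgb, tir)
    return [w_rgb * a + w_tir * b for a, b in zip(rgb, tir)]


# Equal mean activation gives equal weights, so fusion reduces to averaging.
print(fuse([1.0, 3.0], [3.0, 1.0]))  # [2.0, 2.0]

# A more strongly activated thermal branch receives the larger weight.
print(modality_weights([0.0, 0.0], [2.0, 2.0]))
```

    In a learned module the weights would come from trained layers rather than raw activations; the sketch only shows how per-modality proportions can shift the fused feature toward the more informative modality.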
