Shibei LIU, Ying CHEN. Audio object detection network with multimodal cross level feature knowledge transfer[J]. Optics and Precision Engineering, 2024, 32(2): 237

- Optics and Precision Engineering
- Vol. 32, Issue 2, 237 (2024)

Fig. 1. Schematic of RGB, depth and audio information

Fig. 2. Multimodal knowledge distillation target detection network

Fig. 3. Feature heatmaps with and without cross-level fusion

Fig. 4. Cross-level feature knowledge transfer loss based on attentional fusion

Fig. 5. Attention fusion module (AFM) and KL divergence calculation module (KLD)

Fig. 6. Selection diagram of image and audio

Fig. 7. Example images of MAVD dataset

Fig. 8. Comparison of object detection capability under different network architecture

Fig. 9. Schematic diagram of different fusion modes

Fig. 10. Qualitative comparison of vehicle detection capability with or without LDLoss

Fig. 11. Loss curves for MTALoss and MCFTLoss

Fig. 12. Qualitative comparison of vehicle detection capability between the baseline network and the proposed method
Table 1. Comparison of results for the proposed method and the baseline network under different teacher modalities
Table 2. Comparison of the proposed method with classical object detection networks
Table 3. Ablation studies for both losses
Table 4. Ablation study of the two hyperparameters in the loss function
Table 5. Ablation study of the three hyperparameters in the loss function
Table 6. Ablation studies with different fusion methods and loss calculation methods
