• Chinese Optics Letters
  • Vol. 20, Issue 8, 081101 (2022)
Zhenqi Liang1, Jingshi Wang1,2, Gang Xiao1,*, and Liu Zeng1
Author Affiliations
  • 1School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China
  • 2Jiangsu Automation Research Institute, Lianyungang 222061, China
    DOI: 10.3788/COL202220.081101
    Zhenqi Liang, Jingshi Wang, Gang Xiao, Liu Zeng. FAANet: feature-aligned attention network for real-time multiple object tracking in UAV videos[J]. Chinese Optics Letters, 2022, 20(8): 081101

    Abstract

    Multiple object tracking (MOT) in unmanned aerial vehicle (UAV) videos has attracted attention. Because of the UAV's observation perspective, objects are relatively small and their scale changes dramatically. In addition, most MOT algorithms for UAV videos follow the tracking-by-detection paradigm and therefore cannot run in real time. We propose a feature-aligned attention network (FAANet), which mainly consists of a channel and spatial attention module and a feature-aligned aggregation module. We also improve real-time performance by adopting the joint-detection-embedding paradigm and the structural re-parameterization technique. Extensive experiments on the UAV detection and tracking benchmark validate the method's effectiveness: FAANet achieves a new state-of-the-art 44.0 MOTA and 64.6 IDF1 at a running speed of 38.24 frames per second on a single 1080Ti graphics processing unit.
    $$w_{\mathrm{channel}} = \sigma(\mathrm{Conv1d}(g(F))),$$

    $$F_{\mathrm{channel}} = w_{\mathrm{channel}}\,\mathrm{Conv2d}(F) + \mathrm{Conv2d}(F),$$
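The two channel-attention equations above can be traced on a toy tensor. The following is a minimal sketch rather than the authors' implementation, under several assumptions that are not stated in this excerpt: `g` is taken to be global average pooling, the product of `w_channel` with the feature map is taken to be a per-channel scaling, and `Conv2d` is replaced by the identity so the example stays dependency-free. The helper names (`global_avg_pool`, `conv1d_same`, `channel_attention`) are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def global_avg_pool(feat):
    """Assumed g(F): reduce each channel of a C x H x W nested list to its mean."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def conv1d_same(values, kernel):
    """1D cross-correlation along the channel axis, zero padded (odd kernel size)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(values) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(values))
    ]


def channel_attention(feat, kernel):
    """Return (w_channel, F_channel) with F_channel = w_channel * X + X.

    Assumption: X = Conv2d(F) is taken as the identity, so X is feat itself.
    """
    weights = [sigmoid(v) for v in conv1d_same(global_avg_pool(feat), kernel)]
    out = [
        [[w * x + x for x in row] for row in ch]
        for w, ch in zip(weights, feat)
    ]
    return weights, out


# Toy input: 3 channels of size 2 x 2 with means 1, 0 and -1.
feat = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]
# The kernel [0, 1, 0] makes the 1D convolution a pass-through for easy checking.
weights, f_channel = channel_attention(feat, kernel=[0.0, 1.0, 0.0])
```

With the pass-through kernel, each channel weight is simply the sigmoid of that channel's mean, so the all-zero channel receives a weight of 0.5. The residual term keeps every channel's original response even when its weight is small.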

    $$F_1 = \mathrm{softmax}(f_1(g(\mathrm{Conv2d}(F_{\mathrm{channel}})))),$$

    $$F_2 = f_2(\mathrm{Conv2d}(F_{\mathrm{channel}})),$$

    $$w_{\mathrm{spatial}} = \sigma(f_3(F_1 F_2)),$$

    $$F_{\mathrm{spatial}} = w_{\mathrm{spatial}}\,F_{\mathrm{channel}},$$

    $$F_{\mathrm{output}} = F_{\mathrm{channel}} + F_{\mathrm{spatial}}.$$
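The spatial-attention equations above can be followed step by step on a small example. This is a rough sketch under assumptions that this excerpt does not confirm: `f_1`, `f_2` and `f_3` are read as reshape operators, `g` as global average pooling, the product of `F_1` and `F_2` as a matrix product of a 1 x C vector with a C x HW matrix, and both `Conv2d` layers are replaced by the identity. The function name `spatial_attention` is hypothetical.

```python
import math


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(f_channel):
    """Return (w_spatial, F_output) for a C x H x W nested list.

    Assumption: every Conv2d in the equations is taken as the identity.
    """
    C, H, W = len(f_channel), len(f_channel[0]), len(f_channel[0][0])
    # Assumed f_2: flatten each channel to a row, giving a C x HW matrix.
    f2 = [[x for row in ch for x in row] for ch in f_channel]
    # Assumed g and f_1: pool each channel to one value, then softmax over channels.
    f1 = softmax([sum(row) / (H * W) for row in f2])
    # F_1 F_2: (1 x C) times (C x HW) gives one score per pixel.
    scores = [sum(f1[c] * f2[c][p] for c in range(C)) for p in range(H * W)]
    # Assumed f_3: reshape the scores to H x W; the sigmoid maps them into (0, 1).
    w_spatial = [
        [1.0 / (1.0 + math.exp(-scores[h * W + w])) for w in range(W)]
        for h in range(H)
    ]
    # F_output = F_channel + w_spatial * F_channel, weight shared across channels.
    out = [
        [
            [f_channel[c][h][w] * (1.0 + w_spatial[h][w]) for w in range(W)]
            for h in range(H)
        ]
        for c in range(C)
    ]
    return w_spatial, out


# Toy input: two identical channels with a single bright pixel at (0, 1).
f_channel = [
    [[0.0, 2.0], [0.0, 0.0]],
    [[0.0, 2.0], [0.0, 0.0]],
]
w_spatial, f_output = spatial_attention(f_channel)
```

Because the two channels are identical, the softmax assigns them equal weight, and the bright pixel receives the largest spatial weight while zero-valued pixels receive 0.5.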

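    $$F_{\mathrm{concat}} = \mathrm{Concat}(\mathrm{upsample}(F_{\mathrm{low}}), F_{\mathrm{high}}),$$

    $$\Delta_{\mathrm{low}} = \mathrm{Conv2d}_{3\times 3}(\mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv2d}_{1\times 1}(F_{\mathrm{concat}})))),$$

    $$\Delta_{\mathrm{high}} = \mathrm{Conv2d}_{3\times 3}(\mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv2d}_{1\times 1}(F_{\mathrm{concat}})))),$$

    $$F_{\mathrm{output}} = f(\mathrm{upsample}(F_{\mathrm{low}}), \Delta_{\mathrm{low}}) + f(F_{\mathrm{high}}, \Delta_{\mathrm{high}}),$$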

    $$A_{h,w} = \sum_{h'=1}^{H} \sum_{w'=1}^{W} F_{h',w'} \cdot \max(0, 1 - |h + \Delta_{1hw} - h'|) \cdot \max(0, 1 - |w + \Delta_{2hw} - w'|),$$
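The sampling formula above is a bilinear kernel: each output position reads the input at the offset location `(h + Δ1, w + Δ2)` and blends the four surrounding pixels. The sketch below evaluates that sum directly for a single-channel map and then adds two aligned maps, as in the aggregation equation. It is an illustration under stated assumptions, not the paper's code: indices are 0-based rather than 1-based, the offsets are passed in directly instead of being predicted by the convolution branches, and the names `sample` and `aligned_sum` are hypothetical.

```python
def sample(feat, offsets):
    """Evaluate A[h][w] = sum over (h', w') of
    F[h'][w'] * max(0, 1 - |h + d1 - h'|) * max(0, 1 - |w + d2 - w'|).

    feat: H x W nested list; offsets: H x W nested list of (d1, d2) pairs.
    Locations outside the map contribute zero.
    """
    H, W = len(feat), len(feat[0])
    out = [[0.0] * W for _ in range(H)]
    for h in range(H):
        for w in range(W):
            d1, d2 = offsets[h][w]
            y, x = h + d1, w + d2
            total = 0.0
            for hp in range(H):
                ky = max(0.0, 1.0 - abs(y - hp))
                if ky == 0.0:
                    continue  # this row is more than one pixel away
                for wp in range(W):
                    kx = max(0.0, 1.0 - abs(x - wp))
                    total += feat[hp][wp] * ky * kx
            out[h][w] = total
    return out


def aligned_sum(f_low_up, d_low, f_high, d_high):
    """F_output = f(upsample(F_low), d_low) + f(F_high, d_high), one channel.

    Assumption: f_low_up is already upsampled to the size of f_high.
    """
    a = sample(f_low_up, d_low)
    b = sample(f_high, d_high)
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


feat = [[0.0, 10.0], [20.0, 30.0]]
zero = [[(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]
# Shift only the top-left output half a pixel down and to the right.
shifted = [[(0.5, 0.5), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]

identity = sample(feat, zero)
blended = sample(feat, shifted)
fused = aligned_sum(feat, zero, feat, zero)
```

Zero offsets reproduce the input unchanged, while a half-pixel offset in both directions yields the average of the four neighbouring pixels. A practical implementation would vectorize this rather than loop over every pixel pair.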
