• Acta Optica Sinica
  • Vol. 37, Issue 11, 1115005 (2017)
Xin Wang*, Zhiqiang Hou, Wangsheng Yu, Zefenfen Jin, and Xianxiang Qin
Author Affiliations
  • Information and Navigation College, Air Force Engineering University, Xi'an, Shaanxi 710077, China
    DOI: 10.3788/AOS201737.1115005
    Xin Wang, Zhiqiang Hou, Wangsheng Yu, Zefenfen Jin, Xianxiang Qin. Target Scale Adaptive Robust Tracking Based on Fusion of Multilayer Convolutional Features[J]. Acta Optica Sinica, 2017, 37(11): 1115005
    Fig. 1. Schematic of deep convolution network of VGG-Net-19
    Fig. 2. Visualizations for different convolutional layers of VGG-Net-19. (a) Input images; (b) Conv3-4; (c) Conv4-4; (d) Conv5-4
    Fig. 3. Construction of the target scale pyramid by multi-scale sampling
    Fig. 4. Flow chart of proposed algorithm
    Fig. 5. Comparison of partial tracking results of seven trackers
    Fig. 6. Center location error curves of eight test sequences
    Fig. 7. Overlap rate curves of eight test sequences
    Fig. 8. (a) Success rate curves and (b) precision curves of 28 test sequences
    Fig. 9. Tracking performance analysis for different feature combinations. (a) Success rate curves; (b) precision curves
    Input: image sequence I_1, I_2, …, I_n; initial target position p_0 = (x_0, y_0); initial target scale s_0 = (w_0, h_0).
    Output: estimated target position p_t = (x_t, y_t) and estimated target scale s_t = (w_t, h_t).
    For t = 1, 2, …, n, do:
    1. Locate the center of the target
    1.1 Crop the ROI image in frame #t centered at p_{t-1}, and extract the hierarchical convolutional features;
    1.2 Learn the correlation response map for each convolutional layer using Eq. (5) and Eq. (7);
    1.3 Fuse the multiple correlation response maps using Eq. (8) to obtain the composite response map;
    1.4 Locate the target center p_t in frame #t using Eq. (9).
    2. Estimate the scale of the target
    2.1 Obtain the multi-scale sample images I_s = {I_s1, …, I_sm} in frame #t based on p_t and s_{t-1};
    2.2 Build the scale filters by extracting HOG features from these multi-scale sample images;
    2.3 Compute the correlation response score using Eq. (10) and Eq. (11);
    2.4 Estimate the optimal target scale s_t in frame #t using Eq. (12).
    3. Update the model
    3.1 Update the position filters using Eq. (13);
    3.2 Update the scale filters using Eq. (14).
    Until the end of the image sequence.
    Table 1. Scale adaptive robust tracker based on fusion of multilayer convolutional features
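    The equations cited in Table 1, Eqs. (5) to (14), are not reproduced on this page, so the following is only a minimal sketch of the three stages under stated assumptions: the per-layer response maps are fused by a weighted sum, the scale candidates form a geometric pyramid around the previous size, and both filters are refreshed by a running average. The function names, the layer weights, and the values of `num_scales`, `step`, and `lr` are hypothetical and are not taken from the paper.

```python
def fuse_and_locate(responses, weights):
    """Steps 1.3-1.4 (assumed form): weighted sum of response maps, then argmax."""
    rows, cols = len(responses[0]), len(responses[0][0])
    fused = [[sum(w * r[i][j] for w, r in zip(weights, responses))
              for j in range(cols)] for i in range(rows)]
    # The peak of the fused map is taken as the new target center.
    peak = max(((i, j) for i in range(rows) for j in range(cols)),
               key=lambda ij: fused[ij[0]][ij[1]])
    return fused, peak


def scale_pyramid(size, num_scales=33, step=1.02):
    """Step 2.1 (assumed form): candidate (w, h) sizes around the previous scale."""
    w, h = size
    half = (num_scales - 1) / 2
    factors = [step ** (n - half) for n in range(num_scales)]
    return [(w * f, h * f) for f in factors], factors


def update_model(old, new, lr=0.01):
    """Step 3 (assumed form): running average of the old model and the new estimate."""
    return [(1 - lr) * o + lr * n for o, n in zip(old, new)]


# Toy 2x2 response maps standing in for two convolutional layers.
conv5 = [[0.0, 0.2], [0.9, 0.1]]
conv4 = [[0.1, 0.8], [0.3, 0.0]]
fused, peak = fuse_and_locate([conv5, conv4], weights=[1.0, 0.5])
sizes, factors = scale_pyramid((40, 20), num_scales=5, step=1.1)
```

    In this toy run the deeper layer gets the larger weight, so its peak decides the fused location; in a real tracker the winning scale from the pyramid would be the one whose HOG-based scale filter gives the highest response.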
    Algorithm  SV(28)   IV(15)   OCC(16)  BC(11)   DEF(9)   MB(8)    FM(12)   IPR(18)  OPR(23)  OV(4)    LR(3)
    Proposed   0.880    0.838    0.841¯   0.861¯   0.932    0.870    0.772    0.879    0.855¯   0.702    0.873
    HCF        0.880    0.858    0.847    0.867    0.927¯   0.844¯   0.757¯   0.873¯   0.857    0.656    0.863¯
    FCNT       0.830¯   0.779    0.737    0.713    0.925    0.740    0.715    0.774    0.798    0.691¯   0.686
    CNN-SVM    0.827    0.751    0.733    0.689    0.890    0.725    0.685    0.793    0.800    0.650    0.606
    CNT        0.662    0.521    0.667    0.463    0.686    0.479    0.477    0.583    0.630    0.481    0.410
    DSST       0.740    0.681    0.785    0.610    0.733    0.635    0.539    0.714    0.725    0.453    0.402
    KCF        0.680    0.632    0.744    0.578    0.734    0.679    0.586    0.619    0.678    0.639    0.233
    Table 2. Comparison of the tracking precision of the algorithms under different attributes
    Algorithm  SV(28)   IV(15)   OCC(16)  BC(11)   DEF(9)   MB(8)    FM(12)   IPR(18)  OPR(23)  OV(4)    LR(3)
    Proposed   0.600    0.556    0.582    0.586    0.629    0.591¯   0.554    0.591    0.579    0.527    0.574
    HCF        0.531    0.509    0.514    0.573¯   0.589    0.594    0.545¯   0.532¯   0.525    0.522    0.497¯
    FCNT       0.558¯   0.551¯   0.517¯   0.506    0.628¯   0.552    0.533    0.504    0.539¯   0.573    0.451
    CNN-SVM    0.513    0.477    0.473    0.500    0.594    0.535    0.513    0.480    0.504    0.536¯   0.373
    CNT        0.508    0.425    0.506    0.372    0.541    0.426    0.411    0.442    0.475    0.417    0.342
    DSST       0.451    0.412    0.462    0.421    0.491    0.457    0.411    0.441    0.446    0.405    0.238
    KCF        0.427    0.389    0.458    0.398    0.501    0.512    0.450    0.383    0.425    0.520    0.209
    Table 3. Comparison of the tracking success rates of the algorithms under different attributes
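    Precision and success rate are conventionally derived from the two per-frame quantities plotted in Fig. 6 and Fig. 7, the center location error and the overlap rate. The sketch below assumes the common benchmark definitions: boxes given as (x, y, w, h), precision as the fraction of frames whose center error is within a pixel threshold, and success rate as the fraction of frames whose overlap exceeds a threshold. The default thresholds of 20 pixels and 0.5 are widely used conventions, not values stated on this page, and all function names are hypothetical.

```python
import math


def center_error(pred, gt):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    (px, py, pw, ph), (gx, gy, gw, gh) = pred, gt
    return math.hypot(px + pw / 2 - gx - gw / 2, py + ph / 2 - gy - gh / 2)


def overlap_rate(pred, gt):
    """Intersection area divided by union area of two (x, y, w, h) boxes."""
    (px, py, pw, ph), (gx, gy, gw, gh) = pred, gt
    iw = max(0.0, min(px + pw, gx + gw) - max(px, gx))
    ih = max(0.0, min(py + ph, gy + gh) - max(py, gy))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    return inter / union if union > 0 else 0.0


def precision(preds, gts, threshold=20.0):
    """Fraction of frames with center error at or below the threshold (pixels)."""
    errors = [center_error(p, g) for p, g in zip(preds, gts)]
    return sum(e <= threshold for e in errors) / len(errors)


def success_rate(preds, gts, threshold=0.5):
    """Fraction of frames with overlap rate above the threshold."""
    overlaps = [overlap_rate(p, g) for p, g in zip(preds, gts)]
    return sum(o > threshold for o in overlaps) / len(overlaps)


# Two-frame toy sequence: a perfect match, then a box shifted by 5 pixels.
preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
gts = [(0, 0, 10, 10), (5, 0, 10, 10)]
```

    Sweeping `threshold` over a range of values and plotting the result gives curves of the kind shown in Fig. 8.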
    Video           CarScale  Dog1  Doll  Ironman  MotorRolling  Skiing  Soccer  Walking2  Average
    Tracking speed  9.0       8.3   9.7   6.7      3.1           12.1    4.7     9.6       7.9
    Table 4. Tracking speed of the proposed algorithm for the eight videos (frame/s)
    Tracker                 Proposed  CNT  FCNT     CNN-SVM  HCF  MDNet    DeepTrack[29]  STCT[30]
    Code                    M+C       M    M        C+M      M+C  M        M              C+M
    Platform                CPU+GPU   CPU  CPU+GPU  CPU+GPU  GPU  CPU+GPU  CPU+GPU        CPU+GPU
    Average tracking speed  8.5       5    3        -        10   1        2.5            2.5
    Table 5. Comparison of the average tracking speed of the trackers based on deep learning (frame/s)