Opto-Electronic Advances, Vol. 3, Issue 9, 190018-1 (2020)
Haorui Zuo1,2,3,*, Zhiyong Xu1,2,3, Jianlin Zhang1,2, and Ge Jia1,2
Author Affiliations
  • 1Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610209, China
  • 2University of Chinese Academy of Sciences, Beijing 100049, China
  • 3Key Laboratory of Optical Engineering, Chinese Academy of Sciences, Chengdu 610209, China
DOI: 10.29026/oea.2020.190018
Haorui Zuo, Zhiyong Xu, Jianlin Zhang, Ge Jia. Visual tracking based on transfer learning of deep salience information[J]. Opto-Electronic Advances, 2020, 3(9): 190018-1
Fig. 1. The FCNs for salience detection. Our network is designed for static salience detection: it takes a single frame as input and outputs an estimate of the static salience map of the image.
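To make the idea in Fig. 1 concrete, a minimal PyTorch sketch of a fully convolutional salience network follows. The layer widths, depths, and the SalienceFCN name are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a fully convolutional salience network (assumed layer sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SalienceFCN(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: three conv blocks, each halving the spatial resolution.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        # 1x1 conv collapses the features to a single salience channel.
        self.score = nn.Conv2d(128, 1, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feat = self.encoder(x)
        score = self.score(feat)
        # Upsample the coarse score map back to the input size, squash to [0, 1].
        score = F.interpolate(score, size=(h, w), mode='bilinear', align_corners=False)
        return torch.sigmoid(score)

# Usage: a single frame in, a dense salience map of the same spatial size out.
frame = torch.randn(1, 3, 224, 224)
salience_map = SalienceFCN()(frame)   # shape: (1, 1, 224, 224)
```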
Fig. 2. Salience sketch images from videos. The 1st row shows frames from the skier video and the 2nd row shows their salience maps; the 3rd row shows frames from the leopard video and the 4th row shows their salience maps.
Fig. 3. The architecture of the new multi-domain network.
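As a rough illustration of the multi-domain idea behind Fig. 3 (shared layers trained across all training videos, plus one video-specific binary target/background head per domain, in the spirit of MDNet-style trackers), the sketch below uses assumed layer sizes and names; it is not the authors' exact architecture.

```python
# Sketch of a multi-domain network: shared backbone + per-video classification heads.
import torch
import torch.nn as nn

class MultiDomainNet(nn.Module):
    def __init__(self, num_domains, feat_dim=512):
        super().__init__()
        # Shared backbone (assumed sizes, for illustration only).
        self.shared = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, feat_dim), nn.ReLU(inplace=True),
        )
        # One domain-specific head per training sequence (target vs. background).
        self.domain_heads = nn.ModuleList(
            [nn.Linear(feat_dim, 2) for _ in range(num_domains)]
        )

    def forward(self, x, domain_idx):
        feat = self.shared(x)
        return self.domain_heads[domain_idx](feat)

# Offline training updates the shared layers plus only the head belonging
# to the video the current mini-batch came from.
net = MultiDomainNet(num_domains=10)
patches = torch.randn(8, 3, 107, 107)    # candidate patches from one video
scores = net(patches, domain_idx=3)      # (8, 2) target/background scores
```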
Fig. 4. Representative images selected from the Car2 sequence of VOT15. As can be seen from the three similar segments, many frames show little difference from one another. Generally, only after going through at least 20 frames do changes such as scale variation, illumination variation, and background clutter appear.
Fig. 5. The weights assigned to the images by a Gaussian distribution to generate certain numbers of samples. As can be seen from the figure, frames at the beginning and the end of a group receive less weight than those in the middle. Frames located closer to the center of the group are designed to generate more samples because they provide greater distinction.
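A small NumPy sketch of this weighting scheme is given below; the group length, the Gaussian spread, and the total sample budget are assumed values for illustration rather than the paper's settings.

```python
# Sketch: Gaussian weights over frame positions in a group, used to split a
# fixed sample budget so that central frames generate the most samples.
import numpy as np

def gaussian_frame_weights(group_len=20, sigma=None):
    """Weights over frame positions in a group, peaked at the group centre."""
    idx = np.arange(group_len)
    centre = (group_len - 1) / 2.0
    sigma = sigma or group_len / 4.0        # assumed spread
    w = np.exp(-0.5 * ((idx - centre) / sigma) ** 2)
    return w / w.sum()                      # normalise so the weights sum to 1

def samples_per_frame(total_samples=500, group_len=20):
    """Allocate the sample budget across frames according to the weights."""
    w = gaussian_frame_weights(group_len)
    return np.round(w * total_samples).astype(int)

print(samples_per_frame())   # frames near the middle of the group get the most samples
```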
Fig. 6. The precision plots and the success plots on the OTB50 dataset. We compare the results as the number of salience extraction layers changes and find that three layers perform best.
Fig. 7. The precision plots and the success plots on the OTB50 dataset.
    Fig. 8. Comparison with the state-of-the-art methods on the OTB100 dataset.
Fig. 9. Comparison of the proposed method with several deep-learning and traditional methods on UAV123.
Fig. 10. Tracking examples in which our proposed algorithm is compared with other trackers. As illustrated by the challenging videos, the other trackers are vulnerable and sensitive to interference caused by generic objects and background clutter, while our tracker recognizes the target in the most difficult cases.
Tracker          Accuracy   Failures   Overlap   Speed (fps)
Struck [32]      0.4129     103        0.2014    2
DLT [6]          0.4345     113        0.2152    0.5
SO-DLT [20]      0.5086     117        0.2006    6
DeepSRDCF [43]   0.5216     64         0.2931    0.3
SiameseFC [44]   0.4931     95         0.2307    35
MD-CNNs [1]      0.5543     49         0.3488    1
Ours             0.5592     55         0.3694    1
    Table 1. The testing results of our proposed method and some typical trackers on the VOT15 challenge.