Opto-Electronic Advances, Vol. 3, Issue 9, 190018-1 (2020)
Haorui Zuo1,2,3,*, Zhiyong Xu1,2,3, Jianlin Zhang1,2, and Ge Jia1,2
Author Affiliations
  • 1Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610209, China
  • 2University of Chinese Academy of Sciences, Beijing 100049, China
  • 3Key Laboratory of Optical Engineering, Chinese Academy of Sciences, Chengdu 610209, China
DOI: 10.29026/oea.2020.190018
Haorui Zuo, Zhiyong Xu, Jianlin Zhang, Ge Jia. Visual tracking based on transfer learning of deep salience information[J]. Opto-Electronic Advances, 2020, 3(9): 190018-1
Fig. 1. The FCNs for salience detection. Our network is designed for static salience detection: it takes a single frame as input and outputs an estimate of the static salience map of the image.
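To make the idea in Fig. 1 concrete, a minimal PyTorch sketch of a fully convolutional salience network follows. The layer widths, depths, and the SalienceFCN name are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a fully convolutional salience network (assumed layer sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SalienceFCN(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: three conv blocks, each halving the spatial resolution.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        # 1x1 conv collapses the features to a single salience channel.
        self.score = nn.Conv2d(128, 1, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feat = self.encoder(x)
        score = self.score(feat)
        # Upsample the coarse score map back to the input size, squash to [0, 1].
        score = F.interpolate(score, size=(h, w), mode='bilinear', align_corners=False)
        return torch.sigmoid(score)

# Usage: a single frame in, a dense salience map of the same spatial size out.
frame = torch.randn(1, 3, 224, 224)
salience_map = SalienceFCN()(frame)   # shape: (1, 1, 224, 224)
```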
Fig. 2. Salience sketch images from videos. The 1st row shows frames from the skier video and the 2nd row shows their salience maps; the 3rd row shows frames from the leopard video and the 4th row shows their salience maps.
Fig. 3. The architecture of the new multi-domain network.
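As a rough illustration of the multi-domain idea behind Fig. 3 (shared layers trained across all training videos, plus one video-specific binary target/background head per domain, in the spirit of MDNet-style trackers), the sketch below uses assumed layer sizes and names; it is not the authors' exact architecture.

```python
# Sketch of a multi-domain network: shared backbone + per-video classification heads.
import torch
import torch.nn as nn

class MultiDomainNet(nn.Module):
    def __init__(self, num_domains, feat_dim=512):
        super().__init__()
        # Shared backbone (assumed sizes, for illustration only).
        self.shared = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, feat_dim), nn.ReLU(inplace=True),
        )
        # One domain-specific head per training sequence (target vs. background).
        self.domain_heads = nn.ModuleList(
            [nn.Linear(feat_dim, 2) for _ in range(num_domains)]
        )

    def forward(self, x, domain_idx):
        feat = self.shared(x)
        return self.domain_heads[domain_idx](feat)

# Offline training updates the shared layers plus only the head belonging
# to the video the current mini-batch came from.
net = MultiDomainNet(num_domains=10)
patches = torch.randn(8, 3, 107, 107)    # candidate patches from one video
scores = net(patches, domain_idx=3)      # (8, 2) target/background scores
```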
Fig. 4. Representative images selected from the Car2 sequence of VOT15. As can be seen from the three similar segments, many frames show little difference from one another. Generally, only after going through at least 20 frames do changes such as scale variation, illumination variation, and background clutter appear.
Fig. 5. The weights assigned to the images by a Gaussian distribution to generate certain numbers of samples. As can be seen from the figure, frames at the beginning and the end of a group receive less weight than those in the middle. Frames located closer to the center of the group are designed to generate more samples because they provide greater distinction.
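A small NumPy sketch of this weighting scheme is given below; the group length, the Gaussian spread, and the total sample budget are assumed values for illustration rather than the paper's settings.

```python
# Sketch: Gaussian weights over frame positions in a group, used to split a
# fixed sample budget so that central frames generate the most samples.
import numpy as np

def gaussian_frame_weights(group_len=20, sigma=None):
    """Weights over frame positions in a group, peaked at the group centre."""
    idx = np.arange(group_len)
    centre = (group_len - 1) / 2.0
    sigma = sigma or group_len / 4.0        # assumed spread
    w = np.exp(-0.5 * ((idx - centre) / sigma) ** 2)
    return w / w.sum()                      # normalise so the weights sum to 1

def samples_per_frame(total_samples=500, group_len=20):
    """Allocate the sample budget across frames according to the weights."""
    w = gaussian_frame_weights(group_len)
    return np.round(w * total_samples).astype(int)

print(samples_per_frame())   # frames near the middle of the group get the most samples
```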
Fig. 6. The precision plots and the success plots on the OTB50 dataset. We compare the results as the number of salience extraction layers changes and find that three layers perform best.
Fig. 7. The precision plots and the success plots on the OTB50 dataset.
    Fig. 8. Comparison with the state-of-the-art methods on the OTB100 dataset.
Fig. 9. Comparison of the proposed method with several deep-learning and traditional methods on UAV123.
Fig. 10. Tracking examples in which our proposed algorithm is compared with other trackers. As illustrated by the challenging videos, the other trackers are vulnerable and sensitive to interference caused by generic objects and background clutter, while our tracker recognizes the target in the most difficult cases.
Tracker          Accuracy   Failures   Overlap   Speed (fps)
Struck [32]      0.4129     103        0.2014    2
DLT [6]          0.4345     113        0.2152    0.5
SO-DLT [20]      0.5086     117        0.2006    6
DeepSRDCF [43]   0.5216     64         0.2931    0.3
SiameseFC [44]   0.4931     95         0.2307    35
MD-CNNs [1]      0.5543     49         0.3488    1
Ours             0.5592     55         0.3694    1
    Table 1. The testing results of our proposed method and some typical trackers on the VOT15 challenge.