
- Journal of Electronic Science and Technology
- Vol. 23, Issue 2, 100314 (2025)
Abstract
Keywords
1 Introduction
Target tracking is an important research area in computer vision [1]. It is widely used in robotics [2], security surveillance [3], visual navigation [4], precision guidance [5], and other areas [6,7]. Among the current tracking methods, those based on correlation filters (CFs) are especially intriguing, due to their high efficiency and broad usability [8,9]. In 2010, Bolme et al. [10] proposed the first CF-based tracking method by utilizing the minimum output sum of squared error (MOSSE) filter. Later, two representative methods were successively reported by Henriques et al. [11,12]. One is based on the circulant structure of tracking-by-detection with kernels (CSK) that exploits large amounts of cyclically shifted samples for learning [11]. Another is based on a kernelized correlation filter (KCF) by adding a kernel mechanism [12]. These studies largely provoke the development of CF-based methods. However, early CF-based methods mainly adopt a cyclic sampling process, where unrealistic samples are periodically taken [11,12]. Thus, they are apt to produce boundary effects. To overcome such limitation, Danelljan et al. [13] proposed an algorithm based on the spatially regularized discriminative correlation filters (SRDCF) and utilized the spatial regularization term to constrain the filter, in order to fit the actual target. Galoogahi et al. [14] proposed background-aware correlation filters (BACF) based on the idea that background redundancy information in the correlation filters with limited boundaries (CFLB) [15] can alleviate the boundary effects. However, neither adding spatial regularization nor adding more background redundancy information can prevent model drift when the target appearance changes significantly, eventually leading to tracking failure. This model drift mainly originates from the following two aspects: i) Only the information in the current frame is considered. When the target appearance changes significantly, the update direction of the filter cannot be effectively constrained. ii) There is no validation for filter model updates. For example, when a tracking target is occluded, the occlusion may be misrecognized as the target, and updating the filter parameters will lead to identifying the wrong target.
To alleviate the model drift problem, introducing the passive-aggressive (PA) idea into SRDCF to constrain the update filter has been widely demonstrated to be viable [16,17]. PA is a method to enhance the robustness of the model by introducing adversarial perturbations. Its core is to optimize the model update process by simulating potential attacks, thereby improving the model’s adaptability to the diverse targets with different appearances. However, it is assumed simply based on the prior information from the previous frame, whose efficacy is unreliable. For example, when the tracking target is occluded, the spatial-temporal regularized correlation filters (STRCF) [16] will continue to use the previous occlusion frame and update CF with the wrong samples. As a result, the updating errors are gradually accumulated during the tracking process, finally resulting in tracking failure. This is even worse than SRDCF without prior information. Based on SRDCF, Danelljan et al. [17] further optimized the spatially regularized discriminative correlation filter with decontamination (SRDCFdecon) by integrating a multi-sample weighted update strategy to effectively mitigate model drift. However, the calculation of SRDCF is complicated due to the additional multiple optimization processes, which dims the high-speed advantage of CFs. Although great achievements have been made in recent years, a filter that can achieve acceptable precision and robustness simultaneously is still desirable, especially when the target changes significantly.
As an alternative, BACF is attractive due to its high efficiency of multi-channel parallel computation. Moreover, it can alleviate boundary effects by sampling with redundant background information. However, it does not take temporal information into consideration, leading to errors in selecting temporal-confidence samples. To address this issue, this paper proposes a target-tracking method based on the temporal regularization CF, named temporal-regulation background-aware correlation filter (TBACF), to address model drift caused by significant changes in the target appearance. TBACF adopts BACF as a baseline and simultaneously integrates the temporal regularization term and the peak side-lobe to peak correlation energy (PSPCE)-based updating strategy. Due to the convexity of the temporal regularization term, the alternating direction method of multipliers (ADMM) [18] is used to improve the computational efficiency of TBACF. The contributions of this paper are as follows:
• A new temporal regularization term is proposed for CF-based tracking methods, based on temporal-confidence samples generated by high-confidence samples selected along the time axis. Such samples ensure the precision of filter updates in the tracking process, thus eliminating the negative influence of variations in the target appearance on the model performance.
• An innovative update strategy derived from the PSPCE criterion is introduced, which is able to effectively select high-confidence samples over time, update the temporal-confidence samples, and maintain the integrity of the tracking process.
• By comparing with some state-of-the-art (SOTA) methods, it has been demonstrated that the proposed method is superior in precision, robustness, and speed and exhibits a strong adaptability to broad applications. It means that the proposed temporal regularization term and update strategy are feasible to be integrated into other baseline models to improve their performance.
2 Related work
Despite various visual tracking methods have been reported, CF-based methods are highly attractive for their high efficiency and precision. MOSSE is the first method, which applies CF to the tracking task [10]. Subsequently, CSK simplifies the computation of filter parameters in the frequency domain and significantly improves the computational efficiency by introducing a kernel mechanism and cyclic matrices [11]. Based on CSK, KCF further improves the tracking precision by leveraging the histogram of oriented gradients (HOG) features for filter training [12].
On the basis of these priori experiences, different advanced tracking techniques have been realized to conquer different aspects of the challenge. For instance, Bertinetto et al. [19] reported a method named the sum of template and pixel-wise learners (Staple), which integrates template learning with pixel-wise learners and incorporates color histograms to improve tracking robustness. It is high-efficient and particularly desirable in environments where color plays a crucial role, such as wildlife monitoring and urban surveillance. Using a scale pool, Li and Zhu [18] introduced scale adaptation to improve the tracking performance across various image scales and named this method as scale adaptive with multiple features (SAMF). SAMF is crucial for applications, like traffic monitoring, where vehicles may appear at different distances from the camera. Danelljan et al. [20] realized a discriminative scale space tracker (DSST), which offers an advanced approach to managing the scale by using different filters for the position and scale. It is suitable for fine-grained size adjustments in retail and crowd monitoring.
Currently, the performance of tracking systems used in dynamic environments like public spaces has been significantly improved by innovative methods. For example, the discriminative CF with channel and spatial reliability (CSR-DCF) [21] employs spatially constrained masks to address boundary effects and SRDCF [13] expands the feature learning area through spatial regularization. Both of them can enhance the system’s robustness. Deep learning technology has also been introduced into CF-based tracking methods. Replacing traditional HOG features with convolutional layer outputs, the tracking performance of deep spatially regularized discriminative correlation filters (DeepSRDCF) [13] in complex scenarios, such as security surveillance, has been largely boosted. By using multi-layer convolutional features, the hierarchical convolutional feature (HCF) [22] is able to improve the efficiency of CF-based methods for diverse applications, such as automated manufacturing and sports analytics.
Focusing on specific challenges in modern tracking tasks, such as tracking objects in drastically changing or occluded environments, Cai et al. [23] proposed the multi-object tracking with memory (MeMOT) strategy, which employs spatiotemporal memory to improve the tracking performance. Qin et al. [24] proposed MotionTrack for multi-object tracking by using a strategy of scale pool. Hoorick et al. [25] proposed an effective strategy named containers and occluders in the wild (TCOW) to reinforce tracking capabilities. Ren et al. [26] explored the fine-grained object representations through a combination of a flow alignment feature pyramid network (FAFPN), multi-head part mask generator, and shuffle-group sampling strategy, and demonstrated its superior performance on benchmark datasets.
However, the above-mentioned methods suffer from model drift when there are significant changes in the target appearance. To conquer this problem, Ma et al. [27] introduced a long-term correlation tracking (LCT) algorithm, which incorporates spatiotemporal context to maintain the precision in background clutter and occlusions. Similarly, SRDCFdecon [17] integrates a strategy of weighting multiple frames based on the calculated loss function, which can effectively mitigate model drift. However, both of them exhibit a high computational load, and especially under the situation that the target appearance changes significantly, they still face certain limitations. Despite STRCF can stabilize filter changes over time by using previous frame data in current updates [16], cumulative errors will be generated under the condition of long-time occlusions, leading to aiming at incorrect targets. In summary, although existing methods have alleviated the problem of model drift to some extent, how to enhance the robustness and precision of models while maintaining low computational complexity when the target appearance undergoes significant changes remains a crucial issue that urgently needs to be addressed in this field.
3 Temporal-regulation background-aware correlation filter
In this section, BACF is first introduced, following with the principle of the proposed TBACF with temporal regularization terms and introduction of the optimization algorithm, and finally the ADMM-based solution of TBACF is stated.
3.1 Background-aware correlation filter
BACF is a common method used to mitigate boundary effects encountered in CF approaches. The objective function for BACF is defined as
where
Obviously, (1) only considers the information of the current frame during the tracking process and does not take the prior temporal information to constrain the filter update when the target appearance changes significantly.
3.2 Principle of temporal-regulation background-aware correlation filter
To eliminate the negative effects of drastic changes between neighboring frames, TBACF is proposed in this paper. It can ensure the precision of the current tracking target as much as possible and simultaneously update the filter to be similar to the previous one. Inspired by PA [28], a temporal regularization term is proposed based on a temporal-confidence sample, namely
where
The system function in (2) is set as
It is obvious that the essential difference between TBACF and BACF is the temporal regularization term
3.3 Optimization algorithm
Solving the objective function of CF is usually performed in the frequency domain [12,14]. Thus, (2) is further converted into a frequency domain, expressed as
where the symbol
where
Equation (5) satisfies two conditions for ADMM convergence [29]:
•
• The Lagrange function (non-augmented)
For the subproblem (5a),
For the subproblem (5b), it has the complexity of
where
Equation (7) has the complexity of
The complexity of (8) is reduced to
The formula for Lagrangian update is as follows:
where
The step size parameter
3.4 PSPCE-based update strategy
In large-margin object tracking with circulant feature (LMCF) maps [31], the average peak-to-correlation energy (APCE) coefficient is usually utilized to discriminate whether the current target is reliable. However, the APCE coefficient only considers the maximum fluctuation (the difference between the maximum and minimum values) in the response map, which leads to tracking failure when the target is occluded. Because the APCE coefficient will treat the occlusion as the “real” target, thus causing CF to focus on the misjudged target—occlusion.
To eliminate the negative effect of such ambiguous targets, we introduce the concept of side lobes from the radar field. There usually have two or more lobes in antenna patterns. Amongst, the one with the highest radiation intensity is called the main lobe while the remaining called the side lobes. The width of the main lobe indicates the extent to which the energy radiation is concentrated, so the width of the side lobes should be as small as possible, that is, the energy should be focused on the correct target. As all we know, the peak side lobe ratio (PSLR) represents the ratio of the side lobe value to the maximum value of the main lobe in the signal field, thus it can be utilized to overcome this limitation of ambiguous targets. As a result, we propose an updated strategy for temporal-confidence samples based on PSPCE, which is defined as
where
Figure 1.Tracking results of TBACF in different frames: (a) normal frame and (d) corresponding frame with a significant change in the appearance, where the green rectangle is the tracking result of TBACF; (b) and (e) response maps after the green rectangles in (a) and (d) passing through the filter; (c) and (f) response maps after suppressing the maximum value and its surroundings.
Similarly, when the tracking target is occluded by something with similar features to the targets, misjudgment in both PSPCE1 and PSPCE2 may arise. To solve this problem, a queue
Figure 2.Temporal-confidence sample updating: (a) PSPCE variation and (b) ratio of PSPCE1 and PSPCE2 during the tracking process.
Table Infomation Is Not EnableWhen ADMM is completed, all frames enter the update strategy to obtain the results. The calculation cost resulted by the update strategy is fixed, and the time complexity is
4 Experimental results and discussion
The simulation is conducted by using MATLAB-2018a with an Intel i7-
4.1 OTB-100
The OTB-100 benchmark is a mainstream tracking public dataset with 11 attributes, including but not limited to deformation, rotation, occlusion, background clutter, and illumination variation. The trackers are evaluated by the one pass evaluation (OPE) protocol proposed in Ref. [36], where the overlap precision (OP) metric is used to quantify the proportion of bounding box overlaps in the sequence that exceeds a customized threshold. As a comparison, the SOTA methods are considered, including convolutional neural network-support vector machine (CNN-SVM) [33], HCF [22], hedged deep tracking (HDT) [37], DeepSRDCF [13], efficient convolution operator (ECO) [38], deep learning methods (such as fully-convolutional Siamese networks for object tracking (SiamFC) [39] and Siamese region proposal network (SiamRPN) [40]), and none-deep learning methods (such as multi-expert entropy minimization (MEEM) [41], BACF [14], LCT [27], Staple [19], STRCF [16], KCF [12], SRDCF [13], SAMF [18], SRDCFdecon [17], and DSST [20]).
The experimental results of success rate and precision under the OPE protocol are shown in Fig. 3. Obviously, TBACF is superior than most of the existing methods in terms of both success rate and precision. Especially when compared with these non-deep learning methods, TBACF exhibits the best performance. In addition, the success rate of TBACF is also higher than the majority of other competitors, such as 0.8% higher than ECO, 1.2% higher than SiamRPN, 3.3% higher than STRCF, 4.5% higher than BACF, 6.1% higher than LCT, and 25.0% higher than DSST. It is also worthy to note that an obvious advantage in precision has been achieved by TBACF. Although it is slightly inferior to ECO, its precision is larger than HCF, HDT, and DSST. Especially for DSST, such enhancement is high to 19.1%. These results demonstrate that TBACF is highly competitive among the SOTA methods.
Figure 3.Comparison of TBACF with the SOTA trackers on OTB-100.
For further comparison, another two metrics (AUC and FPS) are adopted, where AUC denotes the area under the curve, i.e., the area under the receiver operating characteristic (ROC) curve, and FPS indicates the frames per second, i.e., the number of image frames that can be processed per second, which is an indicator to measure the video processing speed. As shown in Table 1, a tantalizing AUC result is realized by our proposed TBACF because this value is larger than those of almost all SOTA methods except ECO. With regards to the computational efficiency, TBACF operates at 30.7 FPS. It is highly advantageous to SRDCFdecon and LCT, and also comparable to BACF and DeepSRDCF. Although its speed is slower than SiamRPN, the precision of our proposed TBACF is higher. This indicates TBACF can achieve a balance between precision and speed, endowing it especially attractive in practical applications.
Metric | Method | |||||||
ECO [ | STRCF [ | SiamRPN [ | BACF [ | SRDCFdecon [ | LCT [ | DeepSRDCF [ | TBACF | |
AUC (%) | 69.0 | 67.6 | 66.6 | 64.9 | 63.9 | 63.8 | 63.3 | 68.0 |
FPS | 10.3 | 21.1 | 89.3 | 35.6 | 2.5 | 23.6 | 1.2 | 30.7 |
Table 1. Results of top-8 trackers on OTB-100.
To demonstrate its robustness, the success rates of TBACF and the above-mentioned SOTA methods are also investigated on OTB-100 with the following attributes: In-plane rotation, out-of-plane rotation, deformation, background clutter, and illumination variation, to mimic the circumstances when the target appearance is significantly changed. The obtained results are shown in Fig. 4. It is obvious that TBACF is able to successfully track the target with a success rate of >78.5% under the illumination variation situation. This rate is even higher under other situations. Especially for out-of-plane rotation, deformation, and background clutter, as high as 88.5%, 88.2%, and 91.1% are realized by our proposed TBACF, respectively. This indicates that TBACF has an excellent capacity to working with complex environmental changes and resisting substantial background interference. Under the in-plane rotation situation, although its performance is slightly inferior to SiamRPN, the success rate is still high to 84.9%. All the results obtained with OTB-100 demonstrate the potential of TBACF for tracking applications in complex scenes.
Figure 4.Performance evaluation and comparison on OTB-100 with attributes: (a) and (b) in-plane rotation, (c) and (d) out-of-plane rotation, and (e) and (f) deformation.
4.2 TColor-128
TColor-128 is a benchmark for evaluating how color information affects the performance of the tracker. It contains a total of 128 color video sequences. As a comparison, the SOTA methods are considered, including ECO [38], continuous convolution operator tracker (CCOT) [17], tracking-hand-crafted feature version (ECO-HC) [38], STRCF [16], BACF [14], and DSST [20]. The results shown in Fig. 5 reveal that TBACF has an enhanced performance on TColor-128, compared with DSST and BACF. Especially for DSST, such enhancement is remarkable with a 21.1% higher success rate and 21.3% higher precision. Despite this deviation is reduced between TBACF and BACF, the increased amounts are still high to 7.1% and 10.4%, respectively. Moreover, TBACF is also comparable to efficient convolution operators for ECO-HC with negligible deviations in terms of both success rate and precision. These experimental results indicate that our proposed TBACF can make good use of color features. However, the performance of TBACF is slightly inferior to ECO and CCOT with respectively 5.1% and 1.8% lower in success rate and 5.0% and 3.3% lower in precision. This is attributed to richer feature representation of ECO and CCOT. ECO and CCOT employ more complex feature fusion strategies, which enable them to better capture the multi-scale and multi-directional features of the target. Moreover, more advanced optimization strategies are adopted during the optimization process of ECO and CCOT, which can adapt to the changes in the target appearance at a faster speed, thereby enhancing tracking precision. Therefore, it is possible to further enhance TBACF by integrating multiple types of features to strengthen the target representation capability, and thus improve its success rate and precision.
Figure 5.Performance evaluation and comparison on TColor-128.
4.3 UAV123
The UAV123 benchmark contains 123 fully-annotated high-definition videos captured by professional-grade drones, featuring with viewing-angle changes, smaller targets, etc. Therefore, to evaluate the performance of TBACF on resisting occlusion and adapting to rapidly changing angles, simulations are conducted on UAV123 and also compared with ECO [38], multi-cue correlation filter based tracker (MCCT) [42], CCOT [17], STRCF [16], BACF [14], SRDCF [13], SRDCFdecon [17], hierarchical convolutional feature (CF2) [22], Staple [19], SAMF [18], DSST [20], and KCF [12]. The calculated results in Fig. 6 show that TBACF can perform better on UAV123, despite ECO, MCCT, and CCOT exhibit a tiny superiority. For example, CCOT is 1.0% and 1.8% higher in success rate and precision, respectively. However, its performance enhancement is remarkable with respect to KCF and DSST. Therefore, it can be concluded that TBACF is highly competitive among the existing methods.
Figure 6.Performance evaluation and comparison on UAV123.
4.4 LaSOT
The LaSOT dataset [35] is a recently proposed large-scale database for tracking. It contains
Figure 7.Performance evaluation and comparison on LaSOT.
4.5 VOT2016
The VOT2016 dataset [34] contains 60 challenging sequences. With the aim to substantiate the efficacy of TBACF in addressing the challenges posed by multiple targets, intricate scenarios, and varied motion dynamics, TBACF is evaluated on this dataset and compared with several representative trackers: Staple [19], ECO [38], tree-structure convolutional neural network (TCNN) [55], BACF [14], SRDCF [13], SRDCFdecon [17], and STRCF [16]. The obtained results are shown in Table 2. Obviously, TBACF achieves the highest precision of 57% and relatively higher results of the expected average overlap (EAO) at 30% and robustness at 36%. Its overall performance is significantly better than that of BACF because the temporal regularization adopted in our TBACF is especially advantageous to bad conditions. With respect to ECO and TCNN, both EAO and robustness of our TBACF are slightly inferior, however, its precision is superior. This means that TBACF is potential to eliminate the compromise between EAO, precision, and robustness among existing tracker and achieves a balanced performance.
Metric | Method | |||||||
ECO [ | Staple [ | TCNN [ | SRDCF [ | BACF [ | SRDCFdecon [ | STRCF [ | TBACF | |
EAO (%) | 29 | 25 | 22 | 26 | 27 | |||
Precision (%) | 54 | 54 | 52 | 53 | 53 | |||
Robustness (%) | 38 | 44 | 48 | 41 | 38 |
Table 2. Results of top-8 trackers on VOT2016.
Based on the above results obtained on various datasets, it can be concluded that our TBACF proposed in this paper still situates a leading position among the current representative trackers, even if some methods show a better performance on certain datasets.
5 Ablation study
5.1 Effectiveness of different components
The ablation experiments are conducted on OTB-100, aiming at verifying the contribution of critical components in our TBACF. The results in Fig. 8(a) show that the “baseline” which refers to the original BACF with no update strategy (UP) and temporal regularization item (TR) achieves an AUC value of 83.6%. As desired, both “baseline+TR”, which is optimized with TR based on the temporal-confidence sample, and “baseline+TR+UP” (TBACF proposed in this paper) exhibit an enhanced performance with increased AUC of 86.7% and 88.1%, respectively. These results further demonstrate that the improvement of update strategy and the introduction of the temporal regularization item in our method play a critical role in this performance enhancement.
Figure 8.Results of ablation analysis on OTB-100: (a) baseline and (b) modified methods.
5.2 Effectiveness of different baselines
The temporal regularization term and update strategy are also embedded in SRDCF [13] as plug-ins, named TSRDCF, to verify the generalizability of our proposed method. As can be seen from Fig. 8(b), TSRDCF and TBACF modified with the proposed method in this paper are advantageous to their respective baselines: SRDCF and BACF. Compared with 83.6% AUC of BACF and 78.2% AUC of SRDCF, these values of TBACF and TSRDCF have increased 4.5% and 2.1%, respectively, and reached 80.3% and 88.1%. This means that the introduction of the temporal regularization term and the improvement of the update strategy proposed in this paper are effective and meanwhile universally applicable to improving the tracking performance of the existing methods.
6 Conclusions
This paper proposes a temporal regularization term to address the model drift problem caused by significant changes of target appearance. By introducing a temporal-confidence sample as the temporal regularization term, the filter can avoid focusing on erroneous targets, such as occlusions, during the update process, ensuring the correct update of the filter parameters. An increased update strategy is further proposed to ensure the high confidence of temporal-confidence samples. As a result, the temporal regularization term and update strategies are incorporated into the objective function of the baseline BACF, thus contributing to a more robust TBACF model which is especially desirable when the target appearance changes significantly. Its effectiveness is verified on OTB100, TC-128, UAV123, VOT2016, and LaSOT. The experimental results on the OTB-100 dataset demonstrate that our model is robust even under the situations of background clutter, deformation, occlusion, rotation, and illumination variation. The superiority and versatility of the proposed method are verified with ablation studies by applying the temporal regularization term and the update strategy as plug-ins in different baselines.
Despite these advances, our method also suffers from some limitations, such as complex computations mainly originating from the optimization of ADMM, reliance on manual adjustment of parameters, and the difficulty that the model is hard to autonomously learn parameter changes. In the future, further evaluations on other various datasets will be performed, to verify our method’s superior performance on handling occlusions in complex backgrounds as well as its robustness. We will also plan to integrate our method with deep learning by leveraging the excellent feature extraction capabilities and generalization performance of deep learning to further improve the method’s stability and robustness.
Disclosures
The authors declare no conflicts of interest.
References
[9] Z.Y. Huang, C.H. Fu, Y.M. Li, F.L. Lin, P. Lu, Learning aberrance repressed crelation filters f realtime UAV tracking, in: Proc. of the IEEECVF Intl. Conf. on Computer Vision, Seoul, Republic of Kea, 2019, pp. 2891–2900.
[10] D.S. Bolme, J.R. Beveridge, B.A. Draper, Y.M. Lui, Visual object tracking using adaptive crelation filters, in: Proc. of IEEE Computer Society Conf. on Computer Vision Pattern Recognition, San Francisco, USA, 2010, pp. 2544–2550.
[11] J.F. Henriques, R. Caseiro, P. Martins, J. Batista, Exploiting the circulant structure of trackingbydetection with kernels, in: Proc. of the 12th European Conf. on Computer Vision, Flence, Italy, 2012, pp. 702–715.
[13] M. Danelljan, G. Häger, F.S. Khan, M. Felsberg, Learning spatially regularized crelation filters f visual tracking, in: Proc. of the IEEE Intl. Conf. on Computer Vision, Santiago, Chile, 2015, pp. 4310–4318.
[14] H.K. Galoogahi, A. Fagg, S. Lucey, Learning backgroundaware crelation filters f visual tracking, in: Proc. of the IEEE Intl. Conf. on Computer Vision, Venice, Italy, 2017, pp. 1144–1152.
[15] H.K. Galoogahi, T. Sim, S. Lucey, Crelation filters with limited boundaries, in: Proc. of the IEEE Conf. on Computer Vision Pattern Recognition, Boston, USA, 2015, pp. 4603–4638.
[16] F. Li, C. Tian, W.M. Zuo, L. Zhang, M.H. Yang, Learning spatialtempal regularized crelation filters f visual tracking, in: Proc. of the IEEE Conf. on Computer Vision Pattern Recognition, Salt Lake City, USA, 2018, pp. 4904–4913.
[17] M. Danelljan, A. Robinson, F.S. Khan, M. Felsberg, Beyond crelation filters: learning continuous convolution operats f visual tracking, in: Proc. of the 14th European Conf. on Computer Vision, Amsterdam, The herls, 2016, pp. 472–488.
[18] Y. Li, J.K. Zhu, A scale adaptive kernel crelation filter tracker with feature integration, in: Proc. of the European Conf. on Computer Vision, Zurich, Switzerl, 2015, pp. 254–265.
[19] L. Bertito, J. Valmadre, S. Golodetz, O. Miksik, P.H.S. Tr, Staple: complementary learners f realtime tracking, in: Proc. of the IEEE Conf. on Computer Vision Pattern Recognition, Las Vegas, USA, 2016, pp. 1401–1409.
[20] M. Danelljan, G. Häger, F.S. Khan, M. Felsberg, Accurate scale estimation f robust visual tracking, in: Proc. of the British Machine Vision Conf., Nottingham, UK, 2014, pp. 1–11.
[21] A. Lukezic, T. Vojir, L.C. Zajc, J. Matas, M. Kristan, Discriminative crelation filter with channel spatial reliability, in: Proc. of the IEEE Conf. on Computer Vision Pattern Recognition, Honolulu, USA, 2017, pp. 4847–4856.
[22] C. Ma, J.B. Huang, X.K. Yang, M.H. Yang, Hierarchical convolutional features f visual tracking, in: Proc. of the IEEE Intl. Conf. on Computer Vision, Santiago, Chile, 2015, pp. 3074–3082.
[23] J.R. Cai, M.Z. Xu, W. Li, et al., MeMOT: multiobject tracking with memy, in: Proc. of the IEEECVF Conf. on Computer Vision Pattern Recognition, New leans, USA, 2022, pp. 8090–8100.
[24] Z. Qin, S.P. Zhou, L. Wang, J.H. Duan, G. Hua, W. Tang, MotionTrack: learning robust shtterm longterm motions f multiobject tracking, in: Proc. of the IEEECVF Conf. on Computer Vision Pattern Recognition, Vancouver, Canada, 2023, pp. 17939–17948.
[25] B. Van Hoick, P. Tokmakov, S. Stent, J. Li, C. Vondrick, Tracking through containers occluders in the wild, in: Proc. of the IEEECVF Conf. on Computer Vision Pattern Recognition, Vancouver, Canada, 2023, pp. 13802–13812.
[26] H. Ren, S.D. Han, H.L. Ding, Z.W. Zhang, H.W. Wang, F.Q. Wang, Focus on details: online multiobject tracking with diverse finegrained representation, in: Proc. of the IEEECVF Conf. on Computer Vision Pattern Recognition, Vancouver, Canada, 2023, pp. 11289–11298.
[27] C. Ma, X.K. Yang, C.Y. Zhang, M.H. Yang, Longterm crelation tracking, in: Proc. of the IEEE Conf. on Computer Vision Pattern Recognition, Boston, USA, 2015, pp. 5388–5396.
[28] Crammer K., Dekel O., Keshet J., Shalev-Shwartz S., Singer Y., Warmuth M.K.. Online passive-aggressive algorithms. J. Mach. Learn. Res., 7, 551-585(2006).
[29] Boyd S., Parikh N., Chu E., Peleato B., Eckstein J.. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Le., 3, 1-122(2011).
[31] M.M. Wang, Y. Liu, Z.Y. Huang, Large margin object tracking with circulant feature maps, in: Proc. of the IEEE Conf. on Computer Vision Pattern Recognition, Honolulu, USA, 2017, pp. 4800–4808.
[33] U.T. Benchmark, A benchmark simulat f UAV tracking, in: Proc. of the European Conf. on Computer Vision, Amsterdam, The herls, 2016, pp. 1–14.
[35] H. Fan, L.T. Lin, F. Yang, et al., LaSOT: a highquality benchmark f largescale single object tracking, in: Proc. of the IEEECVF Conf. on Computer Vision Pattern Recognition, Long Beach, USA, 2019, pp. 5369–5378.
[36] Y. Wu, J. Lim, M.H. Yang, Online object tracking: a benchmark, in: Proc. of the IEEE Conf. on Computer Vision Pattern Recognition, Ptl, USA, 2013, pp. 2411–2418.
[37] Y.K. Qi, S.P. Zhang, L. Qin, et al., Hedged deep tracking, in: Proc. of the IEEE Conf. on Computer Vision Pattern Recognition, Las Vegas, USA, 2016, pp. 4303–4311.
[38] M. Danelljan, G. Bhat, F.S. Khan, M. Felsberg, ECO: efficient convolution operats f tracking, in: Proc. of the IEEE Conf. on Computer Vision Pattern Recognition, Honolulu, USA, 2017, pp. 6931–6939.
[39] L. Bertito, J. Valmadre, J.F. Henriques, A. Vedaldi, P.H.S. Tr, Fullyconvolutional Siamese wks f object tracking, in: Proc. of the European Conf. on Computer Vision, Amsterdam, The herls, 2016, pp. 850–965.
[40] B. Li, J.J. Yan, W. Wu, Z. Zhu, X.L. Hu, High perfmance visual tracking with Siamese region proposal wk, in: Proc. of the IEEE Conf. on Computer Vision Pattern Recognition, Salt Lake City, USA, 2018, pp. 8971–8980.
[41] J.M. Zhang, S.G. Ma, S. Sclaroff, MEEM: robust tracking via multiple experts using entropy minimization, in: Proc. of the 13th European Conf. on Computer Vision, Zurich, Switzerl, 2014, pp. 188–203.
[42] N. Wang, W. Zhou, Q. Tian, et al., Multicue crelation filters f robust visual tracking, in: Proc. of the IEEE Conf. on Computer Vision Pattern Recognition, Salt Lake City, USA, 2018, pp. 4844–4853.
[43] J. Karoliny, B. Etzlinger, A. Springer, Mixture density wks f WSN localization, in: Proc. of the IEEE Intl. Conf. on Communications Wkshops, Dublin, Irel, 2020, pp. 1–5.
[44] Y. Song, C. Ma, X. Wu, et al., VITAL: visual tracking via adversarial learning, in: Proc. of the IEEE Conf. on Computer Vision Pattern Recognition, Salt Lake City, USA, 2018, pp. 8990–8999.
[45] Zhu J., Wang D., Lu H.. Visual tracking by learning spatiotemporal consistency in correlation filters. Sci. China Inf. Sci., 50, 128-150(2020).
[46] Y. Zhang, L. Wang, J. Qi, et al., Structured Siamese wk f realtime visual tracking, in: Proc. of the European Conf. on Computer Vision, Munich, Germany, 2018, pp. 351–366.
[47] D. Held, S. Thrun, S. Savarese, Learning to track at 100 fps with deep regression wks, in: Proc. of the 14th European Conf. on Computer Vision, Amsterdam, The herls, 2016, pp. 749–765.
[48] J. Choi, H.J. Chang, T. Fischer, et al., Contextaware deep feature compression f highspeed visual tracking, in: Proc. of the IEEE Conf. on Computer Vision Pattern Recognition, Salt Lake City, USA, 2018, pp. 479–488.
[49] G. Zhang, Z. Li, J. Li, et al., CF: cade fusion wk f dense prediction, [Online]. Available, https:arxiv.gabs2302.06052, February 2023.
[50] H. Fan, H. Ling, Parallel tracking verifying: a framewk f realtime high accuracy visual tracking, in: Proc. of the IEEE Intl. Conf. on Computer Vision, Venice, Italy, 2017, pp. 5486–5494.
[51] C. Ma, J.B. Huang, X. Yang, et al., Hierarchical convolutional features f visual tracking, in: Proc. of the IEEE Intl. Conf. on Computer Vision, Santiago, Chile, 2015, pp. 3074–3082.
[53] J. Choi, H.J. Chang, J. Jeong, et al., Visual tracking using attentionmodulated disintegration integration, in: Proc. of the IEEE Conf. on Computer Vision Pattern Recognition, Las Vegas, USA, 2016, pp. 4321–4330.
[54] M. Danelljan, F.S. Khan, M. Felsberg, et al., Adaptive col attributes f realtime visual tracking, in: Proc. of the IEEE Conf. on Computer Vision Pattern Recognition, Columbus, USA, 2014, pp. 1090–1097.
[55] H. Nam, M. Baek, B. Han, Modeling propagating CNNs in a tree structure f visual tracking [Online]. Available, https:arxiv.gabs1608.07242, August 2016.

Set citation alerts for the article
Please enter your email address