Author Affiliations
1 Postgraduate Brigade, Engineering University of PAP, Xi'an, Shaanxi 710086, China
2 School of Information Engineering, Engineering University of PAP, Xi'an, Shaanxi 710086, China
Fig. 1. Kalman filter tracker flow diagram
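The flow in Fig. 1 reduces to a predict-correct loop over the target state. Below is a minimal sketch of that loop for a constant-velocity motion model; the state layout, noise values, and example measurement are illustrative assumptions, not the implementation of any cited tracker.

```python
import numpy as np

# Minimal constant-velocity Kalman filter for 2-D center tracking.
# State x = [cx, cy, vx, vy]; measurement z = [cx, cy] from a detector.
dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)      # state transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)       # observation model
Q = np.eye(4) * 1e-2                            # process noise (assumed)
R = np.eye(2) * 1.0                             # measurement noise (assumed)

def kf_predict(x, P):
    x = F @ x
    P = F @ P @ F.T + Q
    return x, P

def kf_update(x, P, z):
    y = z - H @ x                               # innovation
    S = H @ P @ H.T + R                         # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)              # Kalman gain
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P

# One tracking step: predict the target center, then correct it with
# the detector/matcher response if the target is visible.
x = np.array([100.0, 80.0, 0.0, 0.0])           # initial state
P = np.eye(4) * 10.0
x, P = kf_predict(x, P)
z = np.array([103.0, 82.0])                     # measured center (example)
x, P = kf_update(x, P, z)
print(x[:2])                                    # filtered center estimate
```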
Fig. 2. Particle filter tracker flow diagram
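By contrast, the particle filter loop in Fig. 2 replaces the closed-form predict/correct step with sampling, weighting, and resampling. The sketch below illustrates one such step; the random-walk motion model, the toy likelihood, and all parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def propagate(particles, motion_std=2.0):
    # Diffuse particles with a simple random-walk motion model (assumed).
    return particles + rng.normal(0.0, motion_std, particles.shape)

def likelihood(particles, observation, obs_std=5.0):
    # Toy observation model: closer to the observed center -> higher weight.
    d2 = np.sum((particles - observation) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * obs_std ** 2))

def resample(particles, weights):
    # Systematic resampling to fight particle degeneracy.
    n = len(particles)
    positions = (rng.random() + np.arange(n)) / n
    indexes = np.searchsorted(np.cumsum(weights), positions)
    return particles[indexes]

# One tracking step over N particles holding candidate centers (cx, cy).
particles = rng.normal([100.0, 80.0], 5.0, size=(200, 2))
particles = propagate(particles)
z = np.array([103.0, 82.0])                     # observed center (example)
weights = likelihood(particles, z)
weights /= weights.sum()
estimate = weights @ particles                  # weighted-mean state estimate
particles = resample(particles, weights)
print(estimate)
```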
| Feature type | Feature descriptor | Representation | Representative methods | Adaptable scenes | Limitations |
|---|---|---|---|---|---|
| Manual visual feature | Intensity | Gray | MOSSE[8], CSK[9] | Single stable scene | Limited to single stable scenes |
| | Color | CN, Lab | CN[10], ASMS[11], DAT[12], CSCT[13] | Partial occlusion, fast motion, scale invariance | Inapplicable to illumination changes |
| | Gradient | HOG, SIFT | KCF[14], BACF[15], LCT[16], DSST[17] | Translation, rotation, illumination invariance | Inapplicable to deformation and motion blur |
| | Texture | LBP | TLD[18] | Illumination, scale, fast-motion invariance | Inapplicable to deformation; time-consuming |
| | Optical flow | Optical flow | FlowTrack[19] | Short-term occlusion and scale change | Unsuitable for severe occlusion, camera shake, and out-of-view |
| Deep feature | | | DeepSRDCF[20], CFNet[21] | Strong robustness to complex scenes including occlusion | Poor real-time performance and interpretability |
Table 1. Scene adaptability of tracking features and representative methods
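As a concrete reference for the intensity, color, and gradient rows of Table 1, the snippet below extracts a raw gray-scale patch, a coarse color histogram, and a HOG descriptor with OpenCV and scikit-image. The library choices, file name, and parameter values are illustrative assumptions rather than the settings of any cited tracker.

```python
import cv2
import numpy as np
from skimage.feature import hog

patch = cv2.imread("target_patch.png")          # BGR target patch (example file)

# Intensity feature: raw gray-scale values (as used by MOSSE/CSK-style trackers).
gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0

# Color feature: a coarse 3-D color histogram (stand-in for CN/Lab statistics).
hist = cv2.calcHist([patch], [0, 1, 2], None, [8, 8, 8],
                    [0, 256, 0, 256, 0, 256]).flatten()
hist /= hist.sum() + 1e-12

# Gradient feature: HOG descriptor (as used by KCF/DSST-style trackers).
hog_vec = hog(gray, orientations=9, pixels_per_cell=(4, 4),
              cells_per_block=(2, 2), feature_vector=True)

print(gray.shape, hist.shape, hog_vec.shape)
```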
| Limitation of traditional methods | Related improvement measures | Methods |
|---|---|---|
| Time-consuming | Keypoint optical flow calculation instead of the global field | [22-23] |
| Limited to small displacements | Construct an optical flow pyramid for downsampling | [24] |
| Inaccurate expression | Deep optical flow estimation | [25-26] |
| Uncertainty under occlusion | Symmetry between the optical flow map and the occlusion area | [26] |
Table 2. Development of optical flow estimation methods
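The first two improvements in Table 2, tracking sparse key points rather than a dense global field and handling larger displacements through an image pyramid, correspond directly to pyramidal Lucas-Kanade flow as exposed by OpenCV. The sketch below shows that usage; the file names and parameter values are illustrative.

```python
import cv2
import numpy as np

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)   # example frames
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Track only sparse key points instead of computing a dense global field.
pts = cv2.goodFeaturesToTrack(prev, maxCorners=200,
                              qualityLevel=0.01, minDistance=7)

# Pyramidal Lucas-Kanade: maxLevel sets how many downsampled levels are used,
# which relaxes the small-displacement assumption of plain Lucas-Kanade.
next_pts, status, err = cv2.calcOpticalFlowPyrLK(
    prev, curr, pts, None,
    winSize=(21, 21), maxLevel=3,
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

good_old = pts[status.flatten() == 1]
good_new = next_pts[status.flatten() == 1]
motion = np.median(good_new - good_old, axis=0)   # robust global shift estimate
print("median keypoint displacement:", motion)
```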
| Feature type | High-level features of VGG-Net | Low-level features of VGG-Net |
|---|---|---|
| Extraction layer | Conv4-4, Conv5-4 | Conv1-2, Conv2-2 |
| Description | Semantic information | Contour and texture |
| Advantages | Robust to appearance change and occlusion | Precise positioning |
| Disadvantages | Low spatial resolution | Poor anti-interference ability |
Table 3. Expression traits of features extracted from different layers of VGG-Net
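To make the layer names in Table 3 concrete, the sketch below pulls low-level (Conv1-2, Conv2-2) and high-level (Conv4-4, Conv5-4) activations from torchvision's VGG-19 using forward hooks. The layer indices assume torchvision's standard vgg19 layout, and the random input stands in for a normalized search region.

```python
import torch
from torchvision import models

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()

# Indices of the named conv layers in torchvision's vgg19 `features` module
# (assumed layout: conv1_2 -> 2, conv2_2 -> 7, conv4_4 -> 25, conv5_4 -> 34).
layers = {"conv1_2": 2, "conv2_2": 7, "conv4_4": 25, "conv5_4": 34}
features = {}

def make_hook(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

for name, idx in layers.items():
    vgg[idx].register_forward_hook(make_hook(name))

x = torch.randn(1, 3, 224, 224)        # stand-in for a normalized search region
with torch.no_grad():
    vgg(x)

for name, feat in features.items():
    # Low-level maps keep high spatial resolution; high-level maps are coarse
    # but semantically richer, matching the trade-off in Table 3.
    print(name, tuple(feat.shape))
```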
| Type | Algorithm | Features extracted | Object model characteristics | Fitted scenes |
|---|---|---|---|---|
| Color and gradient features | SAMF[29] | HOG, CN, Gray | Scale adaptive with multiple features | Partial occlusion, deformation, rotation, interference |
| | MOCA[30] | MC-HOG | HOG extracted from CN channels | |
| | Staple[31] | Color, HOG | Global-local features, 80 fps speed | |
| | SAT[32] | RGB, SIFT | Encoding structure, keypoint spatial layout | |
| | DPCF[33] | Color, HOG | Collaborative and deformable model | |
| | MvCFT[35] | HOG, CN, Gray | Fusion model of multi-perspective features | |
| Appearance features of different layers | HCF[37] | Conv3-4, Conv4-4, Conv5-4 | Holistic model with weighted multi-feature filter confidence maps | Occlusion, clutter, deformation, semantic distractors |
| | C-COT[38] | Conv1, Conv5 | Continuous spatial interpolation, more precise | |
| | ECO[39] | Conv1, Conv5 | PCA-decomposed convolution, fast | |
| | SANet[44] | CNN and RNN features | Skip concatenation, self-structure encoding | |
| Appearance and motion features | DFCNet[41] | Optical flow, CNN features | Adaptive keyframe scheduling mechanism | Clutter, fast movement, deformation, heavy occlusion |
| | DCTN[43] | Optical flow, CNN features | Pyramidal feature hierarchy | |
| | STSGS[42] | Conv3-4, Conv4-4, Conv5-4 | Quantum-mechanics-based saliency detection and motion flow map generation | |
| | FPRNet[45] | Optical flow, CNN and RNN features | Multi-scale spatiotemporal representations via a flow pyramid recurrent framework | |
Table 4. Object model traits and fitted scenes of representative algorithms based on fusion features
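A common pattern behind the "weighted multi-feature filter confidence map" entries in Table 4 is a coarse-to-fine weighted sum of per-layer correlation responses. The sketch below shows only that fusion step; the response maps, weights, and resizing choice are illustrative assumptions rather than the code of any specific algorithm in the table.

```python
import numpy as np
import cv2

def fuse_responses(responses, weights, out_size):
    """Weighted sum of per-layer correlation response maps.

    responses: list of 2-D response maps (one per feature layer).
    weights:   per-layer confidence weights, e.g. larger for deeper layers.
    out_size:  (width, height) of the fused map.
    """
    fused = np.zeros((out_size[1], out_size[0]), dtype=np.float32)
    for resp, w in zip(responses, weights):
        resized = cv2.resize(resp.astype(np.float32), out_size)
        fused += w * resized
    return fused

# Example: three layers with different resolutions and hand-picked weights.
responses = [np.random.rand(31, 31), np.random.rand(15, 15), np.random.rand(7, 7)]
weights = [0.25, 0.5, 1.0]                      # deeper layer trusted more (assumed)
fused = fuse_responses(responses, weights, out_size=(62, 62))
dy, dx = np.unravel_index(np.argmax(fused), fused.shape)
print("peak of the fused confidence map:", dx, dy)
```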
| Methods | Type | System state | System noise | Characteristics | Applicability |
|---|---|---|---|---|---|
| [34, 47-50] | KF | Linear | Gaussian | Vulnerable to interference | Applicable to severe occlusion temporarily |
| [51] | EKF | Non-linear | Gaussian | Easy to accumulate error | |
| [52-53] | UKF | Non-linear | Non-Gaussian | Higher speed and accuracy | |
Table 5. Characteristics and scenario applicability of three kinds of KF tracking methods
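The EKF row of Table 5 differs from the plain KF only in that the nonlinear model is linearized around the current estimate at each step. A minimal sketch of that update with a toy nonlinear range-bearing observation of the target center (the models and noise values are illustrative assumptions):

```python
import numpy as np

# State x = [cx, cy, vx, vy] with a linear constant-velocity motion model;
# the nonlinearity sits in the observation h(x) = (range, bearing) to the target.
def h(x):
    return np.array([np.hypot(x[0], x[1]), np.arctan2(x[1], x[0])])

def H_jacobian(x):
    r = np.hypot(x[0], x[1])
    return np.array([[x[0] / r, x[1] / r, 0, 0],
                     [-x[1] / r**2, x[0] / r**2, 0, 0]])

F = np.eye(4); F[0, 2] = F[1, 3] = 1.0
Q = np.eye(4) * 1e-2
R = np.diag([1.0, 1e-3])

def ekf_step(x, P, z):
    # Predict with the (here linear) motion model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Correct using the Jacobian of h evaluated at the predicted state.
    Hj = H_jacobian(x)
    S = Hj @ P @ Hj.T + R
    K = P @ Hj.T @ np.linalg.inv(S)
    x = x + K @ (z - h(x))
    P = (np.eye(4) - K @ Hj) @ P
    return x, P

x, P = np.array([100.0, 80.0, 1.0, 0.5]), np.eye(4) * 10.0
x, P = ekf_step(x, P, z=np.array([129.0, 0.67]))   # example measurement
print(x)
```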
| Characteristic | Kalman filter | Particle filter |
|---|---|---|
| Probability model | State and observation models | Weighted particle-swarm estimate |
| Requirements | Gaussian and linear system | Non-linear and non-Gaussian system |
| Anti-occlusion | Prediction mechanism with fast convergence | Prediction and multi-modal maintenance |
| Applicable scenes | Short-term occlusion | Partial occlusion, background interference |
| Disadvantage | Limited applicable scenes | Particle degeneracy, time-consuming |
Table 6. Traits and scene applicability of the state estimation models of the Kalman filter and the particle filter
| Anti-occlusion scheme | Methods | Contribution and characteristics | Occlusion application |
|---|---|---|---|
| Occlusion detection | [54] | Normalization factor for occlusion detection | Long-term, full, partial |
| | [55] | Internality of weight value and distribution region | |
| | [56] | Third-order cumulants of the reconstruction error | |
| Trajectory prediction | [57] | Bhattacharyya coefficient as the judgment criterion; modified model to maintain particle diversity | Short-term |
| Robust observation | [58] | Multiple likelihood models of HSV and HOG | Partial, serious, long-term |
| | [59] | Color distribution model | |
| | [60] | Deep features and color histogram in adaptive mode | |
| Redistribution | [61] | Independent particle generation | Serious |
| | [62] | Redistribution based on region growth | |
| Memory mechanism | [63] | Memory-based state estimation scheme | Serious, long-term |
| | [64] | Adaptive update strategy with the well-trained model saved | |
| Mark flag | [65] | Binary occlusion-flag state representation for particles | Partial |
| Deep learning | [66] | Observation model built by an RBF neural network | Long-term |
Table 7. Anti-occlusion schemes, innovations, and occlusion adaptation of algorithms based on PF state estimation
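Several of the occlusion-detection entries in Table 7 boil down to monitoring how well the particle set explains the current frame. The sketch below illustrates that general idea using the sum of unnormalized particle likelihoods as the confidence signal; the threshold and the "freeze the appearance update, keep predicting" policy are assumptions for illustration, not the cited methods themselves.

```python
import numpy as np

def occlusion_state(raw_weights, threshold=1e-3):
    """Flag occlusion when the particle set poorly explains the observation.

    raw_weights: unnormalized particle likelihoods for the current frame.
    Returns (occluded, normalized_weights).
    """
    norm_factor = raw_weights.sum()             # low value -> weak evidence
    occluded = norm_factor < threshold
    if occluded:
        # Keep uniform weights so the filter coasts on its motion model
        # and the appearance model is not corrupted by the occluder.
        weights = np.full_like(raw_weights, 1.0 / len(raw_weights))
    else:
        weights = raw_weights / norm_factor
    return occluded, weights

# Example: likelihoods collapse when the target is covered.
visible = np.random.rand(200) * 0.05
covered = np.random.rand(200) * 1e-6
print(occlusion_state(visible)[0], occlusion_state(covered)[0])
```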
| Characteristic | Convolutional neural network (CNN) | Recurrent neural network: LSTM | Recurrent neural network: GRU |
|---|---|---|---|
| Network composition | Convolutional, pooling, and downsampling layers | Forget gate | Update gate, reset gate |
| Structural features | Local receptive field, weight sharing, spatiotemporal downsampling | Better effect | Higher efficiency |
| Tracking information | Spatial semantic information | Temporal information | Temporal information |
| Disadvantage | Sensitive to similar interference, lacking relevant information, may lose spatial details | High time consumption | High time consumption |

Note: both LSTM and GRU obtain memory and feedback functions by connecting current and historical states.
Table 8. Comparison of structure and tracking-application characteristics between CNN and RNN
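The complementary roles in Table 8, CNN for per-frame spatial features and RNN for temporal memory, are often combined by feeding a sequence of CNN feature vectors into an LSTM/GRU. The PyTorch sketch below shows only that wiring; the layer sizes and the final box-regression head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CNNGRUTracker(nn.Module):
    """Toy CNN + GRU pipeline: spatial features per frame, temporal fusion across frames."""

    def __init__(self, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                 # per-frame spatial encoder
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.gru = nn.GRU(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 4)          # (x, y, w, h) box regression (assumed)

    def forward(self, clip):
        # clip: (batch, time, 3, H, W) search regions over consecutive frames.
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)   # per-frame features
        memory, _ = self.gru(feats)                           # temporal information
        return self.head(memory[:, -1])                       # box for the last frame

model = CNNGRUTracker()
boxes = model(torch.randn(2, 5, 3, 64, 64))
print(boxes.shape)                                # torch.Size([2, 4])
```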
| Tracker | Object model | Success rate (All) | Success rate (Occ) | Precision (All) | Precision (Occ) | FPS | Platform (CPU, RAM, GPU) | Test dataset |
|---|---|---|---|---|---|---|---|---|
| CSK | Grey | 0.398 | | 0.545 | | 152 | Intel i5-760 2.80 GHz CPU, 16 GB RAM | OTB50 |
| KCF | G | 0.514 | 0.513 | 0.740 | 0.749 | 172 | Intel i5-760 2.80 GHz CPU, 16 GB RAM | OTB50 |
| CN | C | 0.444 | 0.428 | 0.635 | 0.629 | 105 | Intel i5-760 2.80 GHz CPU, 16 GB RAM | OTB50 |
| SAMF | G+C | 0.567 | 0.613 | 0.774 | 0.839 | 7 | Intel i5-760 2.80 GHz CPU, 16 GB RAM | OTB50 |
| DSST | G | 0.555 | 0.534 | 0.549 | | 24 | Intel i5-4590 CPU, 8 GB RAM | OTB50 |
| MvCFT | MAF | 0.532 | 0.555 | | | 25.5 | Intel i5-4590 CPU, 8 GB RAM | OTB50 |
| MOCA | G+C | 0.569 | | 0.824 | 0.877 | 16.5 | Intel Xeon 2.50 GHz CPU, 256 GB RAM | OTB50 |
| Staple | G+C | 0.694 | | 0.513 | | 80 | Intel Core i7-4790K 4.0 GHz CPU | OTB50 |
| FPRNet | DMF, DAF | 0.613 | 0.854 | | | 1 | Intel Core i7-4790K 4.0 GHz CPU | OTB50 |
| RPT | PF, Pb | 0.576 | | 0.812 | | 4.15 | Intel i7-3770 3.4 GHz CPU, 16 GB RAM | OTB50 |
| SPWT | DMF, Kb | | 0.530 | | 0.792 | 62.3 | Intel i7-6700 CPU | OTB50 |
| RTT | Pb, RF | 0.588 | 0.827 | | | 3-4 | 2.80 GHz CPU, 16 GB RAM | OTB50 |
| Re3 | DAF, T-C | 0.422 | 0.390 | | | 150 | Intel Xeon 2.20 GHz CPU, Nvidia Titan X | OTB50 |
| KCF | G | 0.476 | | 0.693 | 0.625 | 172 | Intel i7 3.7 GHz CPU, 12 GB RAM | OTB100 |
| C-COT | DAF | 0.673 | | 0.902 | | 0.3 | Intel i7 3.7 GHz, GTX TITAN Z GPU | OTB100 |
| SANet | DAF, RF | 0.692 | | 0.928 | | 1 | Intel i7 3.7 GHz, GTX TITAN Z GPU | OTB100 |
| SAMF | G+C | 0.535 | 0.529 | 0.743 | | 16.8 | Intel Xeon 2.6 GHz CPU, 256 GB RAM | OTB100 |
| SAMF-CA | ST-C | 0.575 | 0.550 | 0.793 | | 13 | Intel Xeon 2.6 GHz CPU, 256 GB RAM | OTB100 |
| Staple | G+C | 0.579 | 0.543 | 0.784 | | 59.8 | Intel Xeon 2.6 GHz CPU, 256 GB RAM | OTB100 |
| Staple-CA | ST-C | 0.579 | 0.558 | 0.810 | | 35.2 | Intel Xeon 2.6 GHz CPU, 256 GB RAM | OTB100 |
| DSST | G | 0.475 | | 0.695 | 0.615 | 24 | Intel i7 3.7 GHz CPU, 12 GB RAM | OTB100 |
| LGCF | LGPb | 0.585 | | 0.782 | 0.719 | 8 | Intel i7 3.7 GHz CPU, 12 GB RAM | OTB100 |
| RPT | Kb | 0.715 | | 0.936 | | 20 | GeForce GTX 1080Ti GPU | OTB100 |
Table 9. Performance comparison of representative trackers with various models on OTB50 and OTB100
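For readers unfamiliar with the OTB protocol behind Table 9, the success rate is normally reported as the area under the overlap-threshold curve and the precision as the fraction of frames whose center location error stays within 20 pixels. The sketch below computes both from predicted and ground-truth boxes; the box format and 20-pixel threshold follow the usual OTB convention, and the data here are placeholders.

```python
import numpy as np

def iou(a, b):
    # Boxes as (x, y, w, h); returns per-frame intersection-over-union.
    x1 = np.maximum(a[:, 0], b[:, 0]); y1 = np.maximum(a[:, 1], b[:, 1])
    x2 = np.minimum(a[:, 0] + a[:, 2], b[:, 0] + b[:, 2])
    y2 = np.minimum(a[:, 1] + a[:, 3], b[:, 1] + b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    return inter / (a[:, 2] * a[:, 3] + b[:, 2] * b[:, 3] - inter)

def otb_metrics(pred, gt):
    ol = iou(pred, gt)
    thresholds = np.linspace(0, 1, 21)
    success_auc = np.mean([np.mean(ol > t) for t in thresholds])   # success rate
    c_pred = pred[:, :2] + pred[:, 2:] / 2
    c_gt = gt[:, :2] + gt[:, 2:] / 2
    err = np.linalg.norm(c_pred - c_gt, axis=1)
    precision_20 = np.mean(err <= 20)                              # precision @ 20 px
    return success_auc, precision_20

# Placeholder boxes for a short sequence.
gt = np.array([[50, 60, 40, 80]] * 5, dtype=float)
pred = gt + np.array([3, -2, 1, 0], dtype=float)
print(otb_metrics(pred, gt))
```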