Robust visual tracking via collaborative and reinforced convolutional feature learning
Convolutional neural networks are potent models that yield hierarchies of features and have drawn increasing interest in the visual tracking field. In this paper, we design an end-to-end trainable tracking framework based on a Siamese network, in which low-level fine-grained and high-level semantic representations are learned simultaneously so that they mutually benefit each other. Because the feature hierarchies have distinct and complementary characteristics, different tracking mechanisms are adopted for different feature layers: the low-level features are exploited and updated with a correlation filter layer for adaptive tracking, while the high-level features are compared directly through cross-correlation for robust tracking. The two levels of features are trained jointly and end-to-end with a multi-task loss function. The proposed tracker thus takes full advantage of the adaptability of the low-level features and the generalization ability of the high-level features. Extensive experiments on the widely used OTB and TC128 benchmarks demonstrate the superiority of our tracker, which also runs at a real-time speed.
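To illustrate the two tracking mechanisms named above, the following is a minimal NumPy sketch, not the authors' implementation: a closed-form correlation filter learned in the Fourier domain for the adaptive low-level branch, and a direct cross-correlation (matched-filter) comparison for the robust high-level branch. Single-channel 2-D feature maps, the Gaussian label width, and the regularization weight `lam` are all illustrative assumptions.

```python
import numpy as np

def correlation_filter_response(template_feat, search_feat, lam=1e-2):
    """Adaptive branch (sketch): fit a correlation filter to the low-level
    template features via ridge regression in the Fourier domain, then
    evaluate it on the search-region features. A peak at (0, 0) means no
    displacement; a circular shift of the target moves the peak accordingly."""
    h, w = template_feat.shape
    # Gaussian-shaped desired response, shifted so its peak sits at the origin
    ys, xs = np.mgrid[0:h, 0:w]
    label = np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * 2.0 ** 2))
    X = np.fft.fft2(template_feat)
    Y = np.fft.fft2(np.fft.ifftshift(label))
    W = np.conj(X) * Y / (np.conj(X) * X + lam)  # closed-form filter
    Z = np.fft.fft2(search_feat)
    return np.real(np.fft.ifft2(W * Z))          # response map

def cross_correlation_response(template_feat, search_feat):
    """Robust branch (sketch): slide the high-level template features over
    the larger search-region features and record the inner product at each
    offset; the maximum marks the best match."""
    th, tw = template_feat.shape
    sh, sw = search_feat.shape
    out = np.zeros((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(template_feat * search_feat[i:i + th, j:j + tw])
    return out
```

In a full tracker the two response maps would be fused (e.g., a weighted sum) and the target located at the fused maximum; the branches would also share a jointly trained backbone rather than operate on raw arrays as here.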