Lifelong Fine-grained Image Retrieval

Fine-grained image retrieval has been extensively explored in a zero-shot manner. A deep model is trained on the seen part of a dataset, and its generalization performance is then evaluated on the unseen part. However, this setting is infeasible for many real-world applications because (1) the retrieval dataset is often not fixed, with new data added constantly, and (2) data samples of the seen categories are also common in practice and are important for evaluation. In this paper, we explore lifelong fine-grained image retrieval (LFGIR), which learns continuously on a sequence of new tasks with data from different datasets. We first use knowledge distillation to minimize catastrophic forgetting on old tasks. However, training continuously on different datasets causes large domain shifts between the old and new tasks, while image retrieval is sensitive to even small shifts in the features. This tends to weaken the effectiveness of knowledge distillation from the frozen teacher. To mitigate the impact of domain shifts, we use the network inversion method to generate images of the old tasks. In addition, we design an on-the-fly teacher, which transfers knowledge captured on a new task to the student to improve generalization performance, thereby achieving a better balance between old and new tasks. We name the whole framework Dual Knowledge Distillation (DKD); its efficacy is demonstrated by extensive experimental results on sequential tasks including 7 datasets.
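
As a rough illustration of the two-teacher idea described above, the following sketch combines a task loss with two distillation terms: one against a frozen teacher on old-task features and one against an on-the-fly teacher on new-task features. The abstract does not specify the form of the distillation loss, so matching pairwise cosine similarities is an assumption made here for illustration only. The function names (`similarity_distillation`, `dual_distillation_loss`) and the weights `w_old` and `w_new` are hypothetical, and the features are assumed to be already extracted; network inversion itself is not shown.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a feature vector to unit length (eps guards against zero norm)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def cosine_matrix(feats):
    """Pairwise cosine similarities between all feature vectors in a batch."""
    unit = [l2_normalize(f) for f in feats]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


def similarity_distillation(student_feats, teacher_feats):
    """Mean squared gap between student and teacher similarity matrices.

    Assumed loss form: the student is pushed to preserve the teacher's
    pairwise relations between samples, which is what retrieval depends on.
    """
    s = cosine_matrix(student_feats)
    t = cosine_matrix(teacher_feats)
    n = len(s)
    return sum((s[i][j] - t[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


def dual_distillation_loss(student_old, frozen_old, student_new, fly_new,
                           task_loss, w_old=1.0, w_new=1.0):
    """Task loss plus two distillation terms (hypothetical weighting).

    student_old / frozen_old: student and frozen-teacher features on
        old-task images (e.g. images generated by network inversion).
    student_new / fly_new: student and on-the-fly-teacher features on
        new-task images.
    """
    old_term = similarity_distillation(student_old, frozen_old)
    new_term = similarity_distillation(student_new, fly_new)
    return task_loss + w_old * old_term + w_new * new_term


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0]]
    collapsed_student = [[1.0, 0.0], [1.0, 0.0]]
    print(dual_distillation_loss(collapsed_student, teacher,
                                 teacher, teacher, task_loss=0.3))
```

Because the similarities are computed on normalized features, the distillation terms ignore feature scale and penalize only changes in how samples relate to one another. In a real training loop these terms would be computed on framework tensors so that gradients flow to the student.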