Temporal Self-Ensembling Teacher for Semi-Supervised Object Detection

This paper focuses on semi-supervised object detection (SSOD), which exploits unlabeled data to boost detection performance. Two obstacles arise when adapting the knowledge distillation (KD) framework to SSOD. (1) The teacher model plays a dual role as teacher and student, so its predictions on unlabeled images may limit the upper bound of the student. (2) The large number of consistent predictions between the teacher and the student causes a data imbalance that hinders efficient knowledge transfer between them. To mitigate these issues, we propose a novel SSOD model called Temporal Self-Ensembling Teacher (TSET). Our teacher model ensembles its temporal predictions on unlabeled images under stochastic perturbations, and it further ensembles its weights with those of the student model via an exponential moving average (EMA). These two ensembling strategies ensure data and model diversity and lead to better teacher predictions on unlabeled images. In addition, we adapt the focal loss to formulate the consistency loss and thereby handle the data imbalance. Combined with a thresholding method, the focal loss automatically reweights inconsistent predictions, preserving the knowledge of objects that are difficult to detect in the unlabeled images. The mAP of our model reaches 80.73% on the VOC2007 test set and 40.52% on the COCO2014 minival5k set, outperforming a strong fully supervised detector by 2.37% and 1.49%, respectively. Moreover, our 80.73% mAP sets a new state-of-the-art performance for SSOD on the VOC2007 test set.
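The three mechanisms named in the abstract (EMA weight ensembling, temporal prediction ensembling, and a focal-style consistency loss with thresholding) can be illustrated with a minimal sketch. The function names, the scalar weight representation, and the hyperparameter values (`alpha`, `beta`, `gamma`, `thresh`) below are illustrative assumptions, not the paper's actual implementation; the real model operates on detector weights and per-box class probabilities.

```python
def ema_update(teacher_w, student_w, alpha=0.999):
    """Model ensembling: blend teacher weights toward the student's
    weights with an exponential moving average (illustrative)."""
    return {k: alpha * teacher_w[k] + (1.0 - alpha) * student_w[k]
            for k in teacher_w}

def temporal_ensemble(running_pred, new_pred, beta=0.6):
    """Data ensembling: accumulate the teacher's predictions on an
    unlabeled image across training iterations (illustrative)."""
    return [beta * r + (1.0 - beta) * n
            for r, n in zip(running_pred, new_pred)]

def focal_consistency(student_prob, teacher_prob, gamma=2.0, thresh=0.05):
    """Focal-style consistency loss (hypothetical form): the factor
    d**gamma down-weights predictions that already agree, while the
    threshold drops near-consistent pairs entirely, so the gradient
    concentrates on inconsistent (hard-to-detect) objects."""
    total = 0.0
    for s, t in zip(student_prob, teacher_prob):
        d = abs(s - t)
        if d < thresh:          # thresholding: ignore consistent pairs
            continue
        total += (d ** gamma) * d
    return total / len(student_prob)
```

With `alpha` close to 1, the teacher changes slowly and stays a stable target for the student, while the focal weighting keeps the loss from being dominated by the many predictions the two models already agree on.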