Salient object detection with CNNs and multi-scale CRFs

Recent CNN-based salient object detection approaches often append a fully connected Conditional Random Field (CRF) layer as a post-processing step to refine the saliency maps produced by the CNN. Motivated by the significant performance gain that the CRF layer provides, we propose a more flexible CRF refinement framework that embeds CRF inference at multiple levels of CNN side outputs for multi-scale saliency refinement. First, a fully convolutional neural network based on a simple yet effective encoder-decoder architecture with only three scales of side output maps is pre-trained. Then, a CRF layer is embedded at each scale of the side outputs to compensate for the defects of the corresponding side output map. Finally, the refined side output maps are fused and refined by another CRF inference to produce the final saliency map. The proposed multi-scale CRFs model (MCRF) is trained at low computational cost and shows competitive performance on four datasets in comparison with existing state-of-the-art saliency models.
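The data flow described above (per-scale refinement, fusion, then a final refinement) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the three side outputs are given as plain 2D lists of foreground probabilities, the fully connected CRF is replaced by a much simpler mean-field update on a 4-connected grid with an intensity-based pairwise kernel, fusion is a plain average, and the names `resize`, `crf_refine`, and `mcrf_sketch` as well as the `weight` and `sigma` values are hypothetical.

```python
import math

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))

def resize(grid, out_h, out_w):
    """Nearest-neighbour resize of a 2D list (assumed resampling scheme)."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def crf_refine(prob, guide, iters=5, weight=2.0, sigma=0.2):
    """Simplified binary mean-field refinement on a 4-connected grid.

    Stand-in for a fully connected CRF: each pixel's log-odds are its unary
    logit plus messages from neighbours with similar guide intensity.
    """
    h, w = len(prob), len(prob[0])
    eps = 1e-6
    unary = [[math.log((p + eps) / (1.0 - p + eps)) for p in row] for row in prob]
    q = [row[:] for row in prob]
    for _ in range(iters):
        new_q = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                msg = 0.0
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        diff = guide[y][x] - guide[ny][nx]
                        k = math.exp(-diff * diff / (2.0 * sigma * sigma))
                        msg += k * (2.0 * q[ny][nx] - 1.0)
                log_odds = unary[y][x] + weight * msg
                new_q[y][x] = 1.0 / (1.0 + math.exp(-log_odds))
        q = new_q
    return q

def mcrf_sketch(side_outputs, image):
    """Refine each side output at its own scale, fuse, then refine once more."""
    h, w = len(image), len(image[0])
    refined = []
    for side in side_outputs:
        guide = resize(image, len(side), len(side[0]))  # image at this scale
        refined.append(resize(crf_refine(side, guide), h, w))
    # Assumed fusion rule: pixel-wise average of the refined maps.
    fused = [[sum(r[y][x] for r in refined) / len(refined) for x in range(w)]
             for y in range(h)]
    return crf_refine(fused, image)

# Toy input: bright object on the left half, weak side outputs at three scales.
image = [[1.0 if x < 4 else 0.0 for x in range(8)] for _ in range(8)]
sides = [[[0.7 if x < s // 2 else 0.3 for x in range(s)] for _ in range(s)]
         for s in (8, 4, 2)]
result = mcrf_sketch(sides, image)
```

In this toy case the refined map pushes the left half toward 1 and the right half toward 0, because messages pass freely within each uniform region and are suppressed across the intensity edge. A faithful reproduction would use a dense CRF with learned parameters and the side outputs of the pre-trained encoder-decoder network.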