Scalable multi-label canonical correlation analysis for cross-modal retrieval

Multi-label canonical correlation analysis (ml-CCA) has been developed for cross-modal retrieval. However, computing ml-CCA requires the eigendecomposition of dense matrices, which can be computationally expensive. In addition, ml-CCA considers only semantic correlation and ignores the correlation between features of different modalities. In this paper, we propose a novel framework that simultaneously integrates semantic correlation and cross-modal feature correlation for cross-modal retrieval. We show that, by using the semantic transformation, our model avoids computing the covariance matrix explicitly, which greatly reduces the computational cost. Further analysis shows that the proposed method can be solved via singular value decomposition with linear time complexity. Experimental results on three multi-label datasets demonstrate the accuracy and efficiency of the proposed method.
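The idea of solving a CCA-type problem with SVD instead of eigendecomposing covariance matrices can be illustrated with plain two-view CCA. The sketch below is a generic illustration under stated assumptions, not the proposed method or ml-CCA: it does not use labels or the semantic transformation. It relies on the classical fact that canonical correlations are the singular values of `Ux.T @ Uy`, where `Ux` and `Uy` are orthonormal bases of the centered data matrices. No `d x d` covariance matrix is ever formed, and the thin SVDs cost O(n d^2), which is linear in the number of samples n. The function name `cca_via_svd` is hypothetical, and the code assumes each centered view has full column rank with `n >= d`.

```python
import numpy as np


def cca_via_svd(X, Y, k):
    """Top-k canonical correlations and projection directions for two views.

    X: (n, dx), Y: (n, dy); row i of X is paired with row i of Y.
    Assumes both centered views have full column rank and n >= dx, dy.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Thin SVDs of the data matrices themselves (no X^T X or X^T Y formed).
    Ux, sx, Vxt = np.linalg.svd(Xc, full_matrices=False)
    Uy, sy, Vyt = np.linalg.svd(Yc, full_matrices=False)
    # Singular values of Ux^T Uy are the canonical correlations (cosines of
    # the principal angles between the two column spaces).
    P, rho, Qt = np.linalg.svd(Ux.T @ Uy)
    # Map back to feature space so that Xc @ Wx = Ux @ P[:, :k]
    # and Yc @ Wy = Uy @ Q[:, :k].
    Wx = Vxt.T @ (P[:, :k] / sx[:, None])
    Wy = Vyt.T @ (Qt.T[:, :k] / sy[:, None])
    return rho[:k], Wx, Wy


# Usage: two views sharing one latent signal z plus independent noise.
rng = np.random.default_rng(0)
n = 500
z = rng.standard_normal((n, 1))
X = np.hstack([z + 0.1 * rng.standard_normal((n, 1)), rng.standard_normal((n, 3))])
Y = np.hstack([z + 0.1 * rng.standard_normal((n, 1)), rng.standard_normal((n, 2))])
rho, Wx, Wy = cca_via_svd(X, Y, k=2)
```

The projected columns `Xc @ Wx` and `Yc @ Wy` have unit norm, and their pairwise inner products are exactly `rho`. This is the property a retrieval system would exploit when it compares items from different modalities in the shared space.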