Fine-grained image classification with Gaussian mixture layer

Fine-grained image classification aims to recognize subordinate categories within one basic-level category, for example, distinguishing species of birds. Compared with basic-level classification, it exhibits both low inter-class variance and high intra-class variance, so exploiting discriminative parts is crucial. In this paper, we propose a Gaussian mixture model that fuses part features through a Gaussian mixture layer. Specifically, the model first generates a set of part proposals by selective search. It then extracts image feature maps from the mid-layers of a convolutional neural network, and the feature maps and part proposals are combined via spatial pyramid pooling to compute part features. Next, the Gaussian mixture layer treats the part features as data points and models their distribution with several Gaussian components. It finds clusters in the input and generates an output feature from a combination of the cluster centers. Finally, this output feature represents the whole image and is used for classification. Training consists of two loops: the outer loop optimizes the whole network, and the inner loop runs the EM algorithm inside the Gaussian mixture layer. Experiments on four fine-grained data sets show performance higher than or similar to that of state-of-the-art methods. Further discussion of the Gaussian mixture layer is also provided.
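The abstract describes the Gaussian mixture layer and its EM inner loop only at a high level. The following is a minimal, hypothetical sketch of that inner loop in plain Python. It makes several assumptions of its own: diagonal covariances, plain EM with a variance floor, and an output feature formed by concatenating the component means, each scaled by its mixing weight. That last choice is one possible reading of "a combination of the cluster centers" and is not confirmed by the source. The names `gmm_em` and `gaussian_mixture_layer` are illustrative. Selective search, spatial pyramid pooling, and the outer network-training loop are not shown.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def _log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log-density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_em(points: List[Vector], k: int, iters: int = 50, seed: int = 0,
           min_var: float = 1e-3) -> Tuple[List[float], List[Vector], List[Vector]]:
    """Fit a k-component diagonal GMM to `points` with plain EM (assumed inner loop)."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    means = [list(p) for p in rng.sample(points, k)]  # initialise means from random points
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities resp[i][j] = p(component j | point i)
        resp = []
        for x in points:
            logs = [math.log(weights[j]) + _log_gauss_diag(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            probs = [math.exp(l - top) for l in logs]  # log-sum-exp trick for stability
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: update mixing weights, means and diagonal variances
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = [sum(resp[i][j] * points[i][t] for i in range(n)) / nj
                        for t in range(d)]
            variances[j] = [max(min_var,
                                sum(resp[i][j] * (points[i][t] - means[j][t]) ** 2
                                    for i in range(n)) / nj)
                            for t in range(d)]
    return weights, means, variances


def gaussian_mixture_layer(part_features: List[Vector], k: int) -> Vector:
    """Hypothetical forward pass: map a variable number of part features to a
    fixed-length (k * d) image feature built from weight-scaled cluster centres."""
    weights, means, _ = gmm_em(part_features, k)
    # Order components by their mean vectors so the output layout is deterministic.
    order = sorted(range(k), key=lambda j: means[j])
    out: Vector = []
    for j in order:
        out.extend(weights[j] * m for m in means[j])
    return out
```

For example, six 2-D "part features" drawn from two well-separated groups yield a 4-dimensional image feature (k = 2 components times d = 2 dimensions), whatever the number of parts. A fixed output length is what lets a variable number of part proposals feed a standard classifier. In the actual model the part features would come from spatial pyramid pooling over CNN feature maps, and the outer loop would train the network around this layer.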