Learning from hierarchical spatiotemporal descriptors for micro-expression recognition

Micro-expression recognition aims to infer, from facial video clips, the genuine emotions that people try to conceal. The task is challenging because micro-expressions are low in intensity and short in duration, which makes them difficult to observe. Researchers have recently designed various spatiotemporal descriptors to describe micro-expressions. To better capture low-intensity facial muscle movement, a fixed spatial division grid, 8×8 for example, is commonly used to partition the facial images into facial blocks before descriptors are extracted. However, it is hard to choose a single ideal division grid for all micro-expression samples, because the choice of grid affects how well the spatiotemporal descriptors discriminate between micro-expressions. To address this problem, we design a hierarchical spatial division scheme for spatiotemporal descriptor extraction. With the proposed scheme, it is no longer necessary to determine which division grid best suits each micro-expression sample. Furthermore, we propose a kernelized group sparse learning (KGSL) model to process the descriptors extracted under the hierarchical scheme so that they are more effective for micro-expression recognition. To evaluate the proposed method, which consists of the hierarchical-scheme-based spatiotemporal descriptors and KGSL, we conduct extensive experiments on two public micro-expression databases, CASME II and SMIC. Compared with many recent state-of-the-art approaches, our method achieves more promising recognition results.
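
As a rough illustration of the hierarchical spatial division idea, the sketch below partitions a face region with several grids at once and concatenates one descriptor per block. It is only a minimal sketch under stated assumptions: the grid levels (1×1, 2×2, 4×4, 8×8), the function names, and the toy intensity-histogram descriptor are all hypothetical choices made for illustration, not the descriptors or settings of the method described here, and the KGSL model is not implemented.

```python
from typing import List, Sequence, Tuple

# A block is (top, bottom, left, right) with half-open pixel ranges.
Block = Tuple[int, int, int, int]
Frame = Sequence[Sequence[int]]  # 2-D grayscale image, values in 0..255


def grid_blocks(height: int, width: int, n: int) -> List[Block]:
    """Split a height x width image into an n x n grid of blocks."""
    rows = [i * height // n for i in range(n + 1)]
    cols = [j * width // n for j in range(n + 1)]
    return [(rows[i], rows[i + 1], cols[j], cols[j + 1])
            for i in range(n) for j in range(n)]


def hierarchical_blocks(height: int, width: int,
                        grids: Sequence[int] = (1, 2, 4, 8)) -> List[Block]:
    """Collect the blocks of every grid level (the levels are an assumption)."""
    blocks: List[Block] = []
    for n in grids:
        blocks.extend(grid_blocks(height, width, n))
    return blocks


def block_histogram(frames: Sequence[Frame], block: Block,
                    bins: int = 8) -> List[float]:
    """Toy stand-in descriptor: normalized intensity histogram of one block,
    accumulated over all frames of the clip."""
    top, bottom, left, right = block
    hist = [0] * bins
    for frame in frames:
        for row in frame[top:bottom]:
            for value in row[left:right]:
                hist[min(value * bins // 256, bins - 1)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * bins


def hierarchical_descriptor(frames: Sequence[Frame],
                            grids: Sequence[int] = (1, 2, 4, 8),
                            bins: int = 8) -> List[float]:
    """Concatenate the per-block descriptors from every grid level."""
    height, width = len(frames[0]), len(frames[0][0])
    feature: List[float] = []
    for block in hierarchical_blocks(height, width, grids):
        feature.extend(block_histogram(frames, block, bins))
    return feature


# Tiny synthetic clip: 3 frames of 16 x 16 pixels.
clip = [[[(r * 16 + c + t) % 256 for c in range(16)] for r in range(16)]
        for t in range(3)]
feature = hierarchical_descriptor(clip)
```

With the assumed four grid levels, the clip yields 1 + 4 + 16 + 64 = 85 blocks, so blocks from coarse and fine grids all contribute to the same feature vector. A learning model can then weight the blocks, rather than the grid being fixed in advance.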