Multiscale 3D-Shift Graph Convolution Network for Emotion Recognition From Human Actions
Emotion recognition from body gestures is challenging because similar emotions can be expressed by arbitrary spatial configurations of joints, so recognition must rely on modeling spatial-temporal patterns at a more global level. However, most recent graph convolution networks (GCNs) separate spatial and temporal modeling into isolated processes: the GCN models spatial interactions using partially fixed adjacency matrices, and a 1D convolution captures temporal dynamics, which is insufficient for emotion recognition. In this work, we propose the 3D-Shift GCN, which enables interactions of joints within a spatial-temporal volume for global feature extraction. We further develop a multiscale architecture, the MS-Shift GCN, to fuse features captured under different temporal ranges and thereby model richer dynamics. Evaluations on two regular action recognition benchmarks and two gesture-based emotion recognition datasets show that the proposed method outperforms several state-of-the-art methods.
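To make the idea of joint interactions within a spatial-temporal volume more concrete, the following is a minimal NumPy sketch of one possible shift operation over a skeleton feature tensor of shape (frames, joints, channels). It is an illustration under stated assumptions, not the authors' implementation: the function names `shift_3d` and `multiscale_shift`, the parameter `t_range`, the rule that assigns a temporal and a joint offset to each channel, the circular shift over joints, and the zero padding over time are all hypothetical choices made for this example.

```python
import numpy as np


def shift_3d(x, t_range=1):
    """Shift each channel of x (shape: T frames, V joints, C channels)
    by a channel-dependent offset along both the time and joint axes.

    Assumed rule (hypothetical): channel c gets a temporal offset
    dt in [-t_range, t_range] and a joint offset dv in [0, V), so that
    out[t, v, c] = x[t + dt, (v + dv) % V, c], with zeros where t + dt
    falls outside the sequence.
    """
    T, V, C = x.shape
    out = np.zeros_like(x)
    n_t = 2 * t_range + 1  # number of distinct temporal offsets
    for c in range(C):
        dt = (c % n_t) - t_range   # temporal offset for this channel
        dv = (c // n_t) % V        # joint offset for this channel
        # Circular shift over joints: shifted[t, v] = x[t, (v + dv) % V, c]
        shifted = np.roll(x[:, :, c], shift=-dv, axis=1)
        # Zero-padded shift over time
        if dt > 0:
            out[:-dt, :, c] = shifted[dt:]
        elif dt < 0:
            out[-dt:, :, c] = shifted[:dt]
        else:
            out[:, :, c] = shifted
    return out


def multiscale_shift(x, t_ranges=(1, 2)):
    """Apply shift_3d with several temporal ranges and concatenate the
    results along the channel axis (one simple way to fuse scales)."""
    return np.concatenate([shift_3d(x, r) for r in t_ranges], axis=-1)
```

In a full model, each shift would typically be followed by a learned pointwise (per-joint, per-frame) linear layer that mixes the shifted channels, so that every output feature can draw on neighboring frames and other joints at once. That layer, and the way the scales are actually fused, are omitted here because the abstract does not specify them.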