A spatiotemporal convolutional neural network for automatic pain intensity estimation from facial dynamics

Devising computational models that detect disease-related abnormalities from facial structure is an emerging field of research in automatic face analysis. In this paper, we focus on automatic pain intensity estimation from faces, which has considerable diagnostic potential in healthcare applications. In this context, we present a novel 3D deep model for dynamic spatiotemporal representation of faces in videos. Using several convolutional layers with diverse temporal depths, the proposed model captures a wide range of spatiotemporal variations in the face. Moreover, we introduce a cross-architecture knowledge transfer technique for training 3D convolutional neural networks using a pre-trained 2D architecture. This strategy is a practical approach for training 3D models, especially when the training database is relatively small. Extensive experiments and analysis on two publicly available benchmark databases, namely the UNBC-McMaster shoulder pain database and the BioVid database, show that the proposed method consistently outperforms many state-of-the-art methods in automatic pain intensity estimation.
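
As an illustration of what "diverse temporal depths" can mean in practice, the following minimal sketch applies temporal kernels of several depths to the same sequence of per-frame features and concatenates the results. It is not the architecture proposed in the paper: the function names `temporal_conv` and `multi_depth_block`, the depths (1, 2, 3), and the fixed averaging weights are assumptions chosen for illustration, whereas a real 3D network would learn its kernels and also convolve over the spatial dimensions.

```python
def temporal_conv(frames, kernel):
    """Valid convolution (cross-correlation) along the time axis only.

    frames: list of T per-frame feature vectors (lists of floats).
    kernel: list of d temporal weights (hypothetical, fixed here).
    Returns T - d + 1 feature vectors.
    """
    depth = len(kernel)
    channels = len(frames[0])
    out = []
    for t in range(len(frames) - depth + 1):
        window = frames[t:t + depth]
        out.append([
            sum(w * frame[c] for w, frame in zip(kernel, window))
            for c in range(channels)
        ])
    return out


def multi_depth_block(frames, depths=(1, 2, 3)):
    """Run one branch per temporal depth and concatenate along channels.

    Each branch uses an averaging kernel as a stand-in for learned weights.
    Branch outputs are cropped to the shortest branch so they can be joined.
    """
    branches = [temporal_conv(frames, [1.0 / d] * d) for d in depths]
    t_out = min(len(branch) for branch in branches)
    return [
        [value for branch in branches for value in branch[t]]
        for t in range(t_out)
    ]


# Four frames with a single feature channel each.
features = multi_depth_block([[0.0], [2.0], [4.0], [6.0]])
print(len(features), len(features[0]))
```

Short kernels respond to rapid changes between adjacent frames, while deeper kernels summarize slower changes over longer windows, so the concatenated output describes the same clip at several temporal scales.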