This paper presents a novel method for remote heart rate (HR) estimation. Recent studies have proved that blood pumping by the heart is highly correlated to the intense color of face pixels, and surprisingly can be utilized for remote HR estimation. Researchers successfully proposed several methods for this task, but making it work in realistic situations is still a challenging problem in computer vision community. Furthermore, learning to solve such a complex task on a dataset with very limited annotated samples is not reasonable. Consequently, researchers do not prefer to use the deep learning approaches for this problem. In this paper, we propose a simple yet efficient approach to benefit the advantages of the Deep Neural Network (DNN) by simplifying HR estimation from a complex task to learning from very correlated representation to HR. Inspired by previous work, we learn a component called Front-End (FE) to provide a discriminative representation of face videos, afterward a light deep regression auto-encoder as Back-End (BE) is learned to map the FE representation to HR. Regression task on the informative representation is simple and could be learned efficiently on limited training samples. Beside of this, to be more accurate and work well on low-quality videos, two deep encoder–decoder networks are trained to refine the output of FE. We also introduce a challenging dataset (HR-D) to show that our method can efficiently work in realistic conditions. Experimental results on HR-D and MAHNOB datasets confirm that our method could run as a real-time method and estimate the average HR better than state-of-the-art ones.