Abstract:
Human Pose Estimation (HPE) is a relatively new and significant field of computer vision with a wide range of applications. HPE is the process of estimating the locations of human body joints from an image or video. Accurate estimates of the body joints are used to track even minimal human activities in real-time applications. HPE is an extensive research area, particularly where many individuals must be monitored. HPE can be categorized in two ways: (a) by the number of humans whose pose is to be estimated, i.e., single-person or multi-person pose estimation, and (b) by the dimensionality of the environment used, i.e., two-dimensional (2D) or three-dimensional (3D). This paper first presents the traditional approaches in brief and then focuses on recent advances in HPE using deep learning. A rigorous review of deep learning methods, covering both top-down and bottom-up approaches, is presented, and the methods are compared using various evaluation metrics and the models' accuracy. It is observed that most models perform better on the MPII dataset than on the COCO dataset. The Distributed aware architecture gives the best performance among all models, achieving 97% Percentage of Correct Keypoints (PCK) on the MPII dataset and 78.9% average precision (AP) on the COCO dataset. For multi-person HPE, AP is limited to 78.6%, obtained with the AlphaPose model.
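
To make the PCK metric quoted above concrete, the following is a minimal sketch of how such a score could be computed. It assumes that a predicted joint counts as correct when its Euclidean distance to the ground-truth joint is within a fraction `alpha` of a per-person reference length (for example, a head or torso size). The function name `pck`, its parameters, and the toy coordinates are illustrative assumptions and are not taken from any of the reviewed models or from a benchmark's official evaluation code.

```python
import math


def pck(pred, gt, ref_lens, alpha=0.5):
    """Fraction of predicted keypoints within alpha * ref_len of the ground truth.

    pred, gt : list of persons, each a list of (x, y) keypoints
    ref_lens : one reference length per person (assumed normaliser,
               e.g. head or torso size)
    alpha    : tolerance as a fraction of the reference length (assumed 0.5)
    """
    correct = 0
    total = 0
    for pred_person, gt_person, ref in zip(pred, gt, ref_lens):
        for (px, py), (gx, gy) in zip(pred_person, gt_person):
            # A keypoint is "correct" if it lies within the scaled threshold.
            if math.hypot(px - gx, py - gy) <= alpha * ref:
                correct += 1
            total += 1
    return correct / total if total else 0.0


# Toy example: one person, four joints, reference length 10 pixels.
gt = [[(10.0, 10.0), (20.0, 20.0), (30.0, 30.0), (40.0, 40.0)]]
pred = [[(11.0, 10.0), (20.0, 22.0), (38.0, 30.0), (40.0, 40.0)]]
score = pck(pred, gt, ref_lens=[10.0], alpha=0.5)
print(f"PCK@0.5 = {score:.2f}")
```

In this toy case the joint errors are 1, 2, 8, and 0 pixels against a threshold of 5 pixels, so three of the four joints count as correct. Published benchmark figures depend on each dataset's own choice of normaliser and threshold, so this sketch is not a substitute for the official evaluation protocols.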