| Summary: | Hand pose estimation from egocentric video is a topic of significant interest with broad implications for human-computer interaction, assistive technologies, activity recognition, and robotics. The efficacy of modern machine learning models depends on the quality of the data used to train them. This work therefore analyzes state-of-the-art egocentric datasets suitable for 2D hand pose estimation. We propose a novel protocol for dataset evaluation, which includes quantitative accuracy assessments, analysis of variability and challenging scenarios in dataset contents, an assessment of realism, and the identification of dataset shortcomings through the performance evaluation of leading hand pose estimation models (OpenPose, DetNet, HRNetv2, and MediaPipe). Our study reveals that, despite the availability of numerous egocentric datasets intended for 2D hand pose estimation, the majority are tailored to specific use cases. There is no ideal benchmark dataset yet; however, the H2O and GANerated Hands datasets emerge as the most promising real and synthetic datasets, respectively.
|