Summary: Accurate human pose embedding is crucial for action recognition. While traditional convolutional neural networks (CNNs) have advanced pose feature extraction, they struggle to model structural relationships and long-range dependencies between keypoints and are less robust to occlusion. To address these limitations, we present SAGCN, a novel model that integrates a graph convolutional network (GCN) with self-attention. SAGCN leverages the GCN to preserve keypoint structure and self-attention to capture long-range dependencies. We further introduce probabilistic pose embeddings to represent the inherent uncertainty of multi-view poses. Evaluated on Human3.6M and MPI-INF-3DHP for cross-view pose retrieval, and on Penn Action for sequence alignment, SAGCN outperforms existing methods in retrieval and achieves alignment results competitive with specialized approaches.
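To make the GCN-plus-self-attention combination concrete, the following is a minimal toy sketch, not the authors' implementation: all names, layer sizes, and the toy skeleton are assumptions. A normalized skeleton-graph convolution aggregates features from adjacent keypoints (preserving structure), and a scaled dot-product self-attention layer then lets every keypoint attend to every other one (capturing long-range dependencies).

```python
import numpy as np

# Hypothetical minimal sketch of a GCN layer followed by self-attention
# over keypoints (toy sizes; not the SAGCN implementation).

def gcn_layer(X, A, W):
    """One graph convolution: a symmetrically normalized adjacency with
    self-loops aggregates features from neighboring keypoints."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt     # D^-1/2 (A+I) D^-1/2
    return np.maximum(A_norm @ X @ W, 0.0)       # ReLU activation

def self_attention(H, Wq, Wk, Wv):
    """Scaled dot-product self-attention: each keypoint attends to all
    others, regardless of their distance on the skeleton graph."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)            # row-wise softmax
    return w @ V

rng = np.random.default_rng(0)
J, D = 5, 8                                      # 5 keypoints, 8-dim features (toy)
A = np.array([[0, 1, 0, 0, 0],                   # chain-shaped toy skeleton
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
X = rng.standard_normal((J, D))
W = rng.standard_normal((D, D))
Wq, Wk, Wv = (rng.standard_normal((D, D)) for _ in range(3))

emb = self_attention(gcn_layer(X, A, W), Wq, Wk, Wv)
print(emb.shape)                                 # one embedding per keypoint
```

In a probabilistic variant, the final projection would output a mean and a variance per keypoint instead of a single point embedding, so multi-view ambiguity can be expressed as embedding uncertainty.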
|