HiTIM: Hierarchical Task Information Mining for Few-Shot Action Recognition

Bibliographic Details
Main Authors: Chen, P. (Author), Dang, Y. (Author), Huan, R. (Author), Jiang, L. (Author), Yu, J. (Author)
Format: Article
Language: English
Published: MDPI 2023
Subjects:
Online Access: View Fulltext in Publisher
View in Scopus
LEADER 02746nam a2200241Ia 4500
001 10.3390-app13095277
008 230529s2023 CNT 000 0 und d
020 |a 2076-3417 (ISSN) 
245 1 0 |a HiTIM: Hierarchical Task Information Mining for Few-Shot Action Recognition 
260 0 |b MDPI  |c 2023 
856 |z View Fulltext in Publisher  |u https://doi.org/10.3390/app13095277 
856 |z View in Scopus  |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-85159370170&doi=10.3390%2fapp13095277&partnerID=40&md5=c44a211c5fc2b7659ef3bbbd00ce9f40 
520 3 |a Although existing few-shot action recognition methods have achieved impressive results, they suffer from two major shortcomings. (a) During feature extraction, few-shot tasks are not distinguished and task-irrelevant features are obtained, resulting in the loss of task-specific, critical discriminative information. (b) During feature matching, information critical to the features within a task, i.e., self-information and mutual information, is ignored, so accuracy is degraded by redundant or irrelevant information. To overcome these two limitations, we propose a hierarchical task information mining (HiTIM) approach for few-shot action recognition that incorporates two key components: an inter-task learner and an attention-matching module with an intra-task learner. The inter-task learner learns knowledge across different tasks and builds a task-related feature space for obtaining task-specific features. The proposed matching module with the intra-task learner consists of two branches: spatiotemporal self-attention matching (STM) and correlated cross-attention matching (CM), which respectively reinforce key spatiotemporal information within features and mine regions with strong correlations between features. The shared intra-task learner further optimizes both STM and CM. Our method can use either a 2D convolutional neural network (CNN) or a 3D CNN as the embedding. In comparable experiments with the two embeddings on five-way one-shot and five-way five-shot tasks, the proposed method outperformed other state-of-the-art (SOTA) few-shot action recognition methods on the HMDB51 dataset and was comparable to SOTA methods on the UCF101 and Kinetics datasets. © 2023 by the authors. 
650 0 4 |a action recognition 
650 0 4 |a attention mechanism 
650 0 4 |a dynamic network 
650 0 4 |a few-shot learning 
700 1 0 |a Chen, P.  |e author 
700 1 0 |a Dang, Y.  |e author 
700 1 0 |a Huan, R.  |e author 
700 1 0 |a Jiang, L.  |e author 
700 1 0 |a Yu, J.  |e author 
773 |t Applied Sciences (Switzerland)
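
Note: the 520 abstract above describes a two-branch attention-matching design, with spatiotemporal self-attention matching (STM) and correlated cross-attention matching (CM). The PyTorch sketch below is only an illustrative reading of that description, not the authors' implementation: the class name TwoBranchAttentionMatching, the token shapes, the shared multi-head attention layers, and the cosine-similarity scoring are all assumptions made for exposition, and the inter-task and intra-task learners are omitted.

    # Illustrative sketch only; names, shapes, and the similarity metric are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TwoBranchAttentionMatching(nn.Module):
        """Toy two-branch matcher: a self-attention branch (STM-like) re-weights each
        video's own spatiotemporal tokens, and a cross-attention branch (CM-like)
        emphasizes regions correlated between support and query features."""
        def __init__(self, dim: int):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

        def forward(self, support: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
            # support, query: (B, T, D) sequences of spatiotemporal tokens per video.
            s_self, _ = self.self_attn(support, support, support)   # STM-like branch
            q_self, _ = self.self_attn(query, query, query)
            q_cross, _ = self.cross_attn(query, support, support)   # CM-like branch
            s_cross, _ = self.cross_attn(support, query, query)
            # Pool over tokens and score with cosine similarity (assumed metric).
            s_vec = (s_self + s_cross).mean(dim=1)
            q_vec = (q_self + q_cross).mean(dim=1)
            return F.cosine_similarity(q_vec, s_vec, dim=-1)        # (B,) matching scores

    if __name__ == "__main__":
        matcher = TwoBranchAttentionMatching(dim=64)
        support = torch.randn(5, 8, 64)   # toy 5-way support set, 8 tokens per video
        query = torch.randn(5, 8, 64)     # one query paired with each support class
        print(matcher(support, query).shape)  # torch.Size([5])

In the paper, the shared intra-task learner further guides both branches and the embedding may be a 2D or 3D CNN; those components are left out here for brevity.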