HiTIM: Hierarchical Task Information Mining for Few-Shot Action Recognition

Bibliographic Details
Main Authors: Chen, P. (Author), Dang, Y. (Author), Huan, R. (Author), Jiang, L. (Author), Yu, J. (Author)
Format: Article
Language: English
Published: MDPI 2023
Subjects:
Online Access: View Fulltext in Publisher
View in Scopus
LEADER 02746nam a2200241Ia 4500
001 10.3390-app13095277
008 230529s2023 CNT 000 0 und d
020 |a 2076-3417 (ISSN) 
245 1 0 |a HiTIM: Hierarchical Task Information Mining for Few-Shot Action Recognition 
260 0 |b MDPI  |c 2023 
856 |z View Fulltext in Publisher  |u https://doi.org/10.3390/app13095277 
856 |z View in Scopus  |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-85159370170&doi=10.3390%2fapp13095277&partnerID=40&md5=c44a211c5fc2b7659ef3bbbd00ce9f40 
520 3 |a Although existing few-shot action recognition methods have achieved impressive results, they suffer from two major shortcomings. (a) During feature extraction, few-shot tasks are not distinguished and task-irrelevant features are obtained, resulting in the loss of task-specific, critical discriminative information. (b) During feature matching, information critical to the features within a task, i.e., self-information and mutual information, is ignored, so accuracy is degraded by redundant or irrelevant information. To overcome these two limitations, we propose a hierarchical task information mining (HiTIM) approach for few-shot action recognition that incorporates two key components: an inter-task learner and an attention-matching module with an intra-task learner. The inter-task learner learns knowledge across different tasks and builds a task-related feature space for obtaining task-specific features. The proposed matching module with the intra-task learner consists of two branches: spatiotemporal self-attention matching (STM) and correlated cross-attention matching (CM), which respectively reinforce key spatiotemporal information within features and mine regions with strong correlations between features. The shared intra-task learner further optimizes both STM and CM. Our method can use either a 2D convolutional neural network (CNN) or a 3D CNN as the embedding. In comparable experiments with the two embeddings on five-way one-shot and five-way five-shot tasks, the proposed method outperformed other state-of-the-art (SOTA) few-shot action recognition methods on the HMDB51 dataset and was comparable to SOTA methods on the UCF101 and Kinetics datasets. © 2023 by the authors. 
650 0 4 |a action recognition 
650 0 4 |a attention mechanism 
650 0 4 |a dynamic network 
650 0 4 |a few-shot learning 
700 1 0 |a Chen, P.  |e author 
700 1 0 |a Dang, Y.  |e author 
700 1 0 |a Huan, R.  |e author 
700 1 0 |a Jiang, L.  |e author 
700 1 0 |a Yu, J.  |e author 
773 |t Applied Sciences (Switzerland)
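
Note: the 520 abstract above describes a two-branch attention-matching design, with spatiotemporal self-attention matching (STM) and correlated cross-attention matching (CM). The PyTorch sketch below is only an illustrative reading of that description, not the authors' implementation: the class name TwoBranchAttentionMatching, the token shapes, the shared multi-head attention layers, and the cosine-similarity scoring are all assumptions made for exposition, and the inter-task and intra-task learners are omitted.

    # Illustrative sketch only; names, shapes, and the similarity metric are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TwoBranchAttentionMatching(nn.Module):
        """Toy two-branch matcher: a self-attention branch (STM-like) re-weights each
        video's own spatiotemporal tokens, and a cross-attention branch (CM-like)
        emphasizes regions correlated between support and query features."""
        def __init__(self, dim: int):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

        def forward(self, support: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
            # support, query: (B, T, D) sequences of spatiotemporal tokens per video.
            s_self, _ = self.self_attn(support, support, support)   # STM-like branch
            q_self, _ = self.self_attn(query, query, query)
            q_cross, _ = self.cross_attn(query, support, support)   # CM-like branch
            s_cross, _ = self.cross_attn(support, query, query)
            # Pool over tokens and score with cosine similarity (assumed metric).
            s_vec = (s_self + s_cross).mean(dim=1)
            q_vec = (q_self + q_cross).mean(dim=1)
            return F.cosine_similarity(q_vec, s_vec, dim=-1)        # (B,) matching scores

    if __name__ == "__main__":
        matcher = TwoBranchAttentionMatching(dim=64)
        support = torch.randn(5, 8, 64)   # toy 5-way support set, 8 tokens per video
        query = torch.randn(5, 8, 64)     # one query paired with each support class
        print(matcher(support, query).shape)  # torch.Size([5])

In the paper, the shared intra-task learner further guides both branches and the embedding may be a 2D or 3D CNN; those components are left out here for brevity.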