HiTIM: Hierarchical Task Information Mining for Few-Shot Action Recognition
Although the existing few-shot action recognition methods have achieved impressive results, they suffer from two major shortcomings. (a) During feature extraction, few-shot tasks are not distinguished and task-irrelevant features are obtained, resulting in the loss of task-specific critical discrimi...
Main Authors: | Chen, P.; Dang, Y.; Huan, R.; Jiang, L.; Yu, J. |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI 2023 |
Subjects: | action recognition; attention mechanism; dynamic network; few-shot learning |
Online Access: | View Fulltext in Publisher; View in Scopus |
LEADER | 02746nam a2200241Ia 4500 | ||
---|---|---|---|
001 | 10.3390-app13095277 | ||
008 | 230529s2023 CNT 000 0 und d | ||
020 | |a 20763417 (ISSN) | ||
245 | 1 | 0 | |a HiTIM: Hierarchical Task Information Mining for Few-Shot Action Recognition |
260 | 0 | |b MDPI |c 2023 | |
856 | |z View Fulltext in Publisher |u https://doi.org/10.3390/app13095277 | ||
856 | |z View in Scopus |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-85159370170&doi=10.3390%2fapp13095277&partnerID=40&md5=c44a211c5fc2b7659ef3bbbd00ce9f40 | ||
520 | 3 | |a Although the existing few-shot action recognition methods have achieved impressive results, they suffer from two major shortcomings. (a) During feature extraction, few-shot tasks are not distinguished and task-irrelevant features are obtained, resulting in the loss of task-specific critical discriminative information. (b) During feature matching, information critical to the features within the task, i.e., self-information and mutual information, is ignored, resulting in the accuracy being affected by redundant or irrelevant information. To overcome these two limitations, we propose a hierarchical task information mining (HiTIM) approach for few-shot action recognition that incorporates two key components: an inter-task learner ((Formula presented.)) and an attention-matching module with an intra-task learner ((Formula presented.)). The purpose of the (Formula presented.) is to learn the knowledge of different tasks and build a task-related feature space for obtaining task-specific features. The proposed matching module with (Formula presented.) consists of two branches: the spatiotemporal self-attention matching (STM) and correlated cross-attention matching (CM), which reinforce key spatiotemporal information in features and mine regions with strong correlations between features, respectively. The shared (Formula presented.) can further optimize STM and CM. In our method, we can use either a 2D convolutional neural network (CNN) or 3D CNN as embedding. In comparable experiments using two different embeddings in the five-way one-shot and five-way five-shot task, the proposed method achieved recognition accuracy that outperformed other state-of-the-art (SOTA) few-shot action recognition methods on the HMDB51 dataset and was comparable to SOTA few-shot action recognition methods on the UCF101 and Kinetics datasets. © 2023 by the authors. | |
650 | 0 | 4 | |a action recognition |
650 | 0 | 4 | |a attention mechanism |
650 | 0 | 4 | |a dynamic network |
650 | 0 | 4 | |a few-shot learning |
700 | 1 | 0 | |a Chen, P. |e author |
700 | 1 | 0 | |a Dang, Y. |e author |
700 | 1 | 0 | |a Huan, R. |e author |
700 | 1 | 0 | |a Jiang, L. |e author |
700 | 1 | 0 | |a Yu, J. |e author |
773 | |t Applied Sciences (Switzerland) |