Early Action Prediction With Generative Adversarial Networks

Action prediction aims to determine what action is occurring in a video as early as possible, which is crucial to many online applications, such as predicting a traffic accident before it happens or detecting malicious actions in a surveillance system. In this paper, we address this problem by developing an end-to-end architecture that improves the discriminability of features of partially observed videos by assimilating them to features from complete videos. To this end, a generative adversarial network is introduced to tackle the action prediction problem; it improves the recognition accuracy on partially observed videos by narrowing the gap between their features and those of complete videos. Specifically, the generator comprises two networks: a CNN for feature extraction and an LSTM for estimating the residual error between the features of partially observed videos and those of complete ones. The residual error from the LSTM is added to the features from the CNN, and the result is regarded as the enhanced feature used to fool a competing discriminator. Meanwhile, the generator is trained with an additional perceptual objective, which forces the enhanced features of partially observed videos to be discriminative enough for action prediction. Extensive experimental results on the UCF101, BIT, and UT-Interaction datasets demonstrate that our approach outperforms state-of-the-art methods, especially for videos in which less than 50% of the frames are observed.
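As an illustrative aid only, the generator design summarized in the abstract (a CNN feature extractor plus an LSTM that predicts a residual toward complete-video features, trained against a competing discriminator) might be sketched roughly as follows. All module names, layer sizes, and the use of PyTorch are assumptions made for illustration; this is not the authors' implementation.

```python
import torch
import torch.nn as nn


class Generator(nn.Module):
    """Enhance partial-video features toward complete-video features.

    The CNN backbone is assumed to have already produced per-frame
    features; only the residual-estimation stage is sketched here.
    """

    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.residual_head = nn.Linear(hidden, feat_dim)

    def forward(self, cnn_feats):
        # cnn_feats: (batch, time, feat_dim) CNN features of the
        # partially observed video.
        out, _ = self.lstm(cnn_feats)
        # Estimated gap between the partial and complete video features.
        residual = self.residual_head(out[:, -1])
        # Enhanced feature = CNN feature of the last observed frame
        # plus the LSTM-predicted residual.
        return cnn_feats[:, -1] + residual


class Discriminator(nn.Module):
    """Score features as real (complete video) vs. enhanced (partial)."""

    def __init__(self, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, feat):
        return self.net(feat)
```

During training, the enhanced features would be scored both by the discriminator (the adversarial term) and by an action classifier (the perceptual objective the abstract mentions), with the two losses combined in the generator update.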

Bibliographic Details
Main Authors: Dong Wang, Yuan Yuan, Qi Wang
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access
Subjects: Computer vision; video analysis; action prediction
Online Access: https://ieeexplore.ieee.org/document/8666721/
DOI: 10.1109/ACCESS.2019.2904857
ISSN: 2169-3536
Volume 7, 2019, pp. 35795-35804
Author Affiliation: School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an, China
ORCID (Qi Wang): https://orcid.org/0000-0002-7028-4956