Early Action Prediction With Generative Adversarial Networks

Action prediction aims to determine what action is occurring in a video as early as possible, which is crucial to many online applications, such as predicting a traffic accident before it happens or detecting malicious actions in a surveillance system. In this paper, we address this problem by developing an end-to-end architecture that improves the discriminability of features of partially observed videos by assimilating them to features from complete videos. To this end, a generative adversarial network is introduced to tackle the action prediction problem; it improves the recognition accuracy on partially observed videos by narrowing the gap between their features and those of complete videos. Specifically, the generator comprises two networks: a CNN for feature extraction and an LSTM for estimating the residual error between the features of partially observed videos and those of complete ones. The residual error from the LSTM is added to the features from the CNN, and the result is regarded as the enhanced feature used to fool a competing discriminator. Meanwhile, the generator is trained with an additional perceptual objective, which forces the enhanced features of partially observed videos to be discriminative enough for action prediction. Extensive experimental results on the UCF101, BIT, and UT-Interaction datasets demonstrate that our approach outperforms state-of-the-art methods, especially for videos in which less than 50% of the frames are observed.
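As an illustrative aid only, the generator design summarized in the abstract (a CNN feature extractor plus an LSTM that predicts a residual toward complete-video features, trained against a competing discriminator) might be sketched roughly as follows. All module names, layer sizes, and the use of PyTorch are assumptions made for illustration; this is not the authors' implementation.

```python
import torch
import torch.nn as nn


class Generator(nn.Module):
    """Enhance partial-video features toward complete-video features.

    The CNN backbone is assumed to have already produced per-frame
    features; only the residual-estimation stage is sketched here.
    """

    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.residual_head = nn.Linear(hidden, feat_dim)

    def forward(self, cnn_feats):
        # cnn_feats: (batch, time, feat_dim) CNN features of the
        # partially observed video.
        out, _ = self.lstm(cnn_feats)
        # Estimated gap between the partial and complete video features.
        residual = self.residual_head(out[:, -1])
        # Enhanced feature = CNN feature of the last observed frame
        # plus the LSTM-predicted residual.
        return cnn_feats[:, -1] + residual


class Discriminator(nn.Module):
    """Score features as real (complete video) vs. enhanced (partial)."""

    def __init__(self, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, feat):
        return self.net(feat)
```

During training, the enhanced features would be scored both by the discriminator (the adversarial term) and by an action classifier (the perceptual objective the abstract mentions), with the two losses combined in the generator update.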

Bibliographic Details
Main Authors: Dong Wang, Yuan Yuan, Qi Wang
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access
Subjects: Computer vision; video analysis; action prediction
Online Access: https://ieeexplore.ieee.org/document/8666721/
DOI: 10.1109/ACCESS.2019.2904857
ISSN: 2169-3536
Volume 7, 2019, pp. 35795-35804
Author Affiliation: School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an, China
ORCID (Qi Wang): https://orcid.org/0000-0002-7028-4956