End to End Alignment Learning of Instructional Videos with Spatiotemporal Hybrid Encoding and Decoding Space Reduction

We address the problem of densely aligning actions in videos at the frame level when only the order of the occurring actions is available, which avoids the time-consuming effort of accurately annotating the temporal boundaries of each action. We propose three task-specific innovations under this scenario...

Bibliographic Details
Main Authors:	Lin Wang, Xingfu Wang, Ammar Hawbani, Yan Xiong
Format:	Article
Language:	English
Published:	MDPI AG 2021-05-01
Series:	Applied Sciences
Subjects:	temporal video segmentation; temporal video alignment; connectionist temporal classification (CTC); transformer; convolutional neural networks (CNNs); computer vision
Online Access:	https://www.mdpi.com/2076-3417/11/11/4954
