Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation

Inspired by the recent spatio-temporal action localization efforts with tubelets (sequences of bounding boxes), we present a new spatio-temporal action localization detector Segment-tube, which consists of sequences of per-frame segmentation masks. The proposed Segment-tube detector can temporally p...

Bibliographic Details
Main Authors:	Le Wang, Xuhuan Duan, Qilin Zhang, Zhenxing Niu, Gang Hua, Nanning Zheng
Format:	Article
Language:	English
Published:	MDPI AG 2018-05-01
Series:	Sensors
Subjects:	action localization; action segmentation; 3D ConvNets; LSTM
Online Access:	http://www.mdpi.com/1424-8220/18/5/1657

id	doaj-57fd4fb741dd405abdb18b01dd8cdfd6
record_format	Article
spelling	doaj-57fd4fb741dd405abdb18b01dd8cdfd6 | 2020-11-24T21:18:58Z | eng | MDPI AG | Sensors | ISSN 1424-8220 | 2018-05-01 | vol. 18, no. 5, article 1657 | DOI 10.3390/s18051657 | s18051657 | Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation | Le Wang (Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China) | Xuhuan Duan (Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China) | Qilin Zhang (HERE Technologies, Chicago, IL 60606, USA) | Zhenxing Niu (Alibaba Group, Hangzhou 311121, China) | Gang Hua (Microsoft Research, Redmond, WA 98052, USA) | Nanning Zheng (Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China) | http://www.mdpi.com/1424-8220/18/5/1657 | action localization; action segmentation; 3D ConvNets; LSTM
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Le Wang, Xuhuan Duan, Qilin Zhang, Zhenxing Niu, Gang Hua, Nanning Zheng
author_facet	Le Wang, Xuhuan Duan, Qilin Zhang, Zhenxing Niu, Gang Hua, Nanning Zheng
author_sort	Le Wang
title	Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation
publisher	MDPI AG
series	Sensors
issn	1424-8220
publishDate	2018-05-01
description	Inspired by the recent spatio-temporal action localization efforts with tubelets (sequences of bounding boxes), we present a new spatio-temporal action localization detector Segment-tube, which consists of sequences of per-frame segmentation masks. The proposed Segment-tube detector can temporally pinpoint the starting/ending frame of each action category in the presence of preceding/subsequent interference actions in untrimmed videos. Simultaneously, the Segment-tube detector produces per-frame segmentation masks instead of bounding boxes, offering superior spatial accuracy to tubelets. This is achieved by alternating iterative optimization between temporal action localization and spatial action segmentation. Experimental results on three datasets validated the efficacy of the proposed method, including (1) temporal action localization on the THUMOS 2014 dataset; (2) spatial action segmentation on the Segtrack dataset; and (3) joint spatio-temporal action localization on the newly proposed ActSeg dataset. It is shown that our method compares favorably with existing state-of-the-art methods.
topic	action localization; action segmentation; 3D ConvNets; LSTM
url	http://www.mdpi.com/1424-8220/18/5/1657

Similar Items