Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation
Inspired by the recent spatio-temporal action localization efforts with tubelets (sequences of bounding boxes), we present a new spatio-temporal action localization detector Segment-tube, which consists of sequences of per-frame segmentation masks. The proposed Segment-tube detector can temporally p...
Main Authors: | Le Wang, Xuhuan Duan, Qilin Zhang, Zhenxing Niu, Gang Hua, Nanning Zheng |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2018-05-01 |
Series: | Sensors |
Subjects: | action localization, action segmentation, 3D ConvNets, LSTM |
Online Access: | http://www.mdpi.com/1424-8220/18/5/1657 |
id |
doaj-57fd4fb741dd405abdb18b01dd8cdfd6 |
---|---|
record_format |
Article |
spelling |
doaj-57fd4fb741dd405abdb18b01dd8cdfd6 2020-11-24T21:18:58Z eng MDPI AG Sensors 1424-8220 2018-05-01 18 5 1657 10.3390/s18051657 s18051657 Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation Le Wang [0], Xuhuan Duan [1], Qilin Zhang [2], Zhenxing Niu [3], Gang Hua [4], Nanning Zheng [5]. [0] Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China; [1] Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China; [2] HERE Technologies, Chicago, IL 60606, USA; [3] Alibaba Group, Hangzhou 311121, China; [4] Microsoft Research, Redmond, WA 98052, USA; [5] Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China. Inspired by the recent spatio-temporal action localization efforts with tubelets (sequences of bounding boxes), we present a new spatio-temporal action localization detector Segment-tube, which consists of sequences of per-frame segmentation masks. The proposed Segment-tube detector can temporally pinpoint the starting/ending frame of each action category in the presence of preceding/subsequent interference actions in untrimmed videos. Simultaneously, the Segment-tube detector produces per-frame segmentation masks instead of bounding boxes, offering superior spatial accuracy to tubelets. This is achieved by alternating iterative optimization between temporal action localization and spatial action segmentation. Experimental results on three datasets validated the efficacy of the proposed method, including (1) temporal action localization on the THUMOS 2014 dataset; (2) spatial action segmentation on the Segtrack dataset; and (3) joint spatio-temporal action localization on the newly proposed ActSeg dataset. It is shown that our method compares favorably with existing state-of-the-art methods. http://www.mdpi.com/1424-8220/18/5/1657 action localization; action segmentation; 3D ConvNets; LSTM |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Le Wang, Xuhuan Duan, Qilin Zhang, Zhenxing Niu, Gang Hua, Nanning Zheng |
spellingShingle |
Le Wang Xuhuan Duan Qilin Zhang Zhenxing Niu Gang Hua Nanning Zheng Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation Sensors action localization action segmentation 3D ConvNets LSTM |
author_facet |
Le Wang, Xuhuan Duan, Qilin Zhang, Zhenxing Niu, Gang Hua, Nanning Zheng |
author_sort |
Le Wang |
title |
Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation |
title_short |
Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation |
title_full |
Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation |
title_fullStr |
Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation |
title_full_unstemmed |
Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation |
title_sort |
segment-tube: spatio-temporal action localization in untrimmed videos with per-frame segmentation |
publisher |
MDPI AG |
series |
Sensors |
issn |
1424-8220 |
publishDate |
2018-05-01 |
description |
Inspired by the recent spatio-temporal action localization efforts with tubelets (sequences of bounding boxes), we present a new spatio-temporal action localization detector Segment-tube, which consists of sequences of per-frame segmentation masks. The proposed Segment-tube detector can temporally pinpoint the starting/ending frame of each action category in the presence of preceding/subsequent interference actions in untrimmed videos. Simultaneously, the Segment-tube detector produces per-frame segmentation masks instead of bounding boxes, offering superior spatial accuracy to tubelets. This is achieved by alternating iterative optimization between temporal action localization and spatial action segmentation. Experimental results on three datasets validated the efficacy of the proposed method, including (1) temporal action localization on the THUMOS 2014 dataset; (2) spatial action segmentation on the Segtrack dataset; and (3) joint spatio-temporal action localization on the newly proposed ActSeg dataset. It is shown that our method compares favorably with existing state-of-the-art methods. |
topic |
action localization, action segmentation, 3D ConvNets, LSTM |
url |
http://www.mdpi.com/1424-8220/18/5/1657 |
work_keys_str_mv |
AT lewang segmenttubespatiotemporalactionlocalizationinuntrimmedvideoswithperframesegmentation AT xuhuanduan segmenttubespatiotemporalactionlocalizationinuntrimmedvideoswithperframesegmentation AT qilinzhang segmenttubespatiotemporalactionlocalizationinuntrimmedvideoswithperframesegmentation AT zhenxingniu segmenttubespatiotemporalactionlocalizationinuntrimmedvideoswithperframesegmentation AT ganghua segmenttubespatiotemporalactionlocalizationinuntrimmedvideoswithperframesegmentation AT nanningzheng segmenttubespatiotemporalactionlocalizationinuntrimmedvideoswithperframesegmentation |
_version_ |
1726007475563397120 |
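
The description in this record contrasts tubelets (sequences of bounding boxes) with Segment-tubes (sequences of per-frame segmentation masks bounded by a starting and ending frame). The following is a minimal illustrative sketch of that data structure only, not the authors' implementation; the names `SegmentTube`, `mask_to_box`, and `box_fill_ratio`, and the toy diagonal mask, are assumptions introduced here. It shows how a bounding box derived from a mask can cover many pixels that the mask itself excludes.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SegmentTube:
    """Hypothetical container: one action instance stored as per-frame masks."""
    label: str
    start_frame: int      # index of the first frame of the action (inclusive)
    masks: np.ndarray     # boolean array of shape (num_frames, height, width)

    @property
    def end_frame(self) -> int:
        # Last frame of the action (inclusive).
        return self.start_frame + self.masks.shape[0] - 1


def mask_to_box(mask: np.ndarray) -> tuple:
    """Tightest axis-aligned box (x0, y0, x1, y1) around a non-empty mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def box_fill_ratio(mask: np.ndarray) -> float:
    """Fraction of the mask's bounding box that the mask actually covers."""
    x0, y0, x1, y1 = mask_to_box(mask)
    box_area = (x1 - x0 + 1) * (y1 - y0 + 1)
    return float(mask.sum()) / box_area


# Toy data (assumption): a thin diagonal "object" in three 4x4 frames.
masks = np.zeros((3, 4, 4), dtype=bool)
for t in range(3):
    masks[t, np.arange(4), np.arange(4)] = True

tube = SegmentTube(label="toy_action", start_frame=10, masks=masks)
ratios = [box_fill_ratio(m) for m in tube.masks]
print(tube.start_frame, tube.end_frame, ratios)
```

In this toy case the mask fills only a quarter of its bounding box in every frame, which is the kind of gap in spatial precision that the description attributes to bounding-box tubelets.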