User-Guided Clustering for Video Segmentation on Coarse-Grained Feature Extraction
Video segmentation is the task of temporally dividing a video into semantic sections, typically based on a specific concept or theme defined by the user's intention. However, previous studies of video segmentation have thus far not taken a user's intention into...
Main Authors: | Xinhui Peng, Rui Li, Jilong Wang, Hao Shang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2019-01-01 |
Series: | IEEE Access |
Subjects: | Clustering methods; dimension reduction; feature extraction; user centered design; video segmentation |
Online Access: | https://ieeexplore.ieee.org/document/8865015/ |
id |
doaj-4e78ef9df5cf42a48dcb12fe372a6824 |
record_format |
Article |
spelling |
doaj-4e78ef9df5cf42a48dcb12fe372a6824 2021-03-29T23:41:04Z eng IEEE IEEE Access 2169-3536 2019-01-01 Vol. 7, pp. 149820-149832 10.1109/ACCESS.2019.2946889 8865015 User-Guided Clustering for Video Segmentation on Coarse-Grained Feature Extraction Xinhui Peng (https://orcid.org/0000-0001-5795-0451) Rui Li (https://orcid.org/0000-0001-6448-5092) Jilong Wang Hao Shang College of Computer Science and Electronic Engineering, Hunan University, Changsha, China Video segmentation is the task of temporally dividing a video into semantic sections, typically based on a specific concept or theme defined by the user's intention. However, previous studies of video segmentation have thus far not taken a user's intention into consideration. In this paper, a two-stage user-guided video segmentation framework is presented, comprising dimension reduction and temporal clustering. In the dimension reduction stage, coarse-granularity feature extraction is performed by a deep convolutional neural network pre-trained on ImageNet. In the temporal clustering stage, the user's intention is used to segment videos in the time domain with a proposed operator that computes the similarity distance between dimension-reduced frames. To provide more insight into the videos, a hierarchical clustering method that allows users to segment videos at different granularities is proposed. Evaluation on the Open Video Scene Detection (OVSD) dataset shows that the proposed method achieves an average F-score of 0.72, even when coarse-grained feature extraction is adopted. The evaluation also demonstrates that the proposed method not only produces different segmentation results according to the user's intention, but also produces hierarchical segmentation results from a low abstraction level to a higher one. https://ieeexplore.ieee.org/document/8865015/ Clustering methods; dimension reduction; feature extraction; user centered design; video segmentation |
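The two-stage pipeline the abstract describes (CNN feature extraction for dimension reduction, then user-guided temporal clustering) can be sketched roughly as follows. This is a minimal illustration, not the paper's actual method: the cosine-based distance, the greedy boundary detection, and the threshold-as-granularity knob are assumptions standing in for the paper's proposed similarity operator, and the toy vectors stand in for dimension-reduced CNN frame features.

```python
import math

def cosine_distance(u, v):
    # A simple similarity distance between dimension-reduced frame features.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def temporal_segments(features, threshold):
    """Greedy temporal clustering: open a new segment whenever the distance
    between consecutive frame features exceeds `threshold`. The threshold acts
    as the user-intention knob: lowering it yields finer segments, so sweeping
    it produces a hierarchy from coarse to fine granularity."""
    boundaries = [0]
    for i in range(1, len(features)):
        if cosine_distance(features[i - 1], features[i]) > threshold:
            boundaries.append(i)
    boundaries.append(len(features))
    return [(boundaries[i], boundaries[i + 1])
            for i in range(len(boundaries) - 1)]

# Toy "dimension-reduced" frame features: two visually distinct scenes,
# each containing two near-identical frames.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

coarse = temporal_segments(frames, threshold=0.5)    # two scene-level segments
fine = temporal_segments(frames, threshold=0.005)    # one segment per frame
print(coarse)
print(fine)
```

Varying the threshold is one way to read the abstract's "low level to higher abstraction level" hierarchy: the coarse setting recovers the two scenes, while the fine setting splits on even small frame-to-frame changes.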
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Xinhui Peng, Rui Li, Jilong Wang, Hao Shang |
spellingShingle |
Xinhui Peng Rui Li Jilong Wang Hao Shang User-Guided Clustering for Video Segmentation on Coarse-Grained Feature Extraction IEEE Access Clustering methods dimension reduction feature extraction user centered design video segmentation |
author_facet |
Xinhui Peng, Rui Li, Jilong Wang, Hao Shang |
author_sort |
Xinhui Peng |
title |
User-Guided Clustering for Video Segmentation on Coarse-Grained Feature Extraction |
title_short |
User-Guided Clustering for Video Segmentation on Coarse-Grained Feature Extraction |
title_full |
User-Guided Clustering for Video Segmentation on Coarse-Grained Feature Extraction |
title_fullStr |
User-Guided Clustering for Video Segmentation on Coarse-Grained Feature Extraction |
title_full_unstemmed |
User-Guided Clustering for Video Segmentation on Coarse-Grained Feature Extraction |
title_sort |
user-guided clustering for video segmentation on coarse-grained feature extraction |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2019-01-01 |
description |
Video segmentation is the task of temporally dividing a video into semantic sections, typically based on a specific concept or theme defined by the user's intention. However, previous studies of video segmentation have thus far not taken a user's intention into consideration. In this paper, a two-stage user-guided video segmentation framework is presented, comprising dimension reduction and temporal clustering. In the dimension reduction stage, coarse-granularity feature extraction is performed by a deep convolutional neural network pre-trained on ImageNet. In the temporal clustering stage, the user's intention is used to segment videos in the time domain with a proposed operator that computes the similarity distance between dimension-reduced frames. To provide more insight into the videos, a hierarchical clustering method that allows users to segment videos at different granularities is proposed. Evaluation on the Open Video Scene Detection (OVSD) dataset shows that the proposed method achieves an average F-score of 0.72, even when coarse-grained feature extraction is adopted. The evaluation also demonstrates that the proposed method not only produces different segmentation results according to the user's intention, but also produces hierarchical segmentation results from a low abstraction level to a higher one. |
topic |
Clustering methods; dimension reduction; feature extraction; user centered design; video segmentation |
url |
https://ieeexplore.ieee.org/document/8865015/ |
work_keys_str_mv |
AT xinhuipeng userguidedclusteringforvideosegmentationoncoarsegrainedfeatureextraction AT ruili userguidedclusteringforvideosegmentationoncoarsegrainedfeatureextraction AT jilongwang userguidedclusteringforvideosegmentationoncoarsegrainedfeatureextraction AT haoshang userguidedclusteringforvideosegmentationoncoarsegrainedfeatureextraction |
_version_ |
1724189076390674432 |