User-Guided Clustering for Video Segmentation on Coarse-Grained Feature Extraction

Video segmentation is the task of temporally dividing a video into semantic sections, typically based on a specific concept or theme defined by the user's intention. However, previous studies of video segmentation have thus far not taken the user's intention into consideration. In this paper, a two-stage user-guided video segmentation framework is presented, consisting of dimension reduction and temporal clustering. In the dimension reduction stage, coarse-grained feature extraction is performed by a deep convolutional neural network pre-trained on ImageNet. In the temporal clustering stage, information about the user's intention is used to segment videos in the time domain with a proposed operator that calculates the similarity distance between dimension-reduced frames. To provide more insight into the videos, a hierarchical clustering method is proposed that allows users to segment videos at different granularities. Evaluation on the Open Video Scene Detection (OVSD) dataset shows that the proposed method achieves an average F-score of 0.72, even though coarse-grained feature extraction is adopted. The evaluation also demonstrates that the proposed method not only produces different segmentation results according to the user's intention, but also produces hierarchical segmentation results from a low abstraction level to a higher one.
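The two-stage pipeline the abstract describes (CNN features, then similarity-distance temporal clustering at a user-chosen granularity) can be sketched roughly as follows. This is an illustrative stand-in, not the paper's actual operator: `temporal_segments` is a hypothetical helper, the greedy adjacent-merge rule and length-weighted means are assumptions, and random vectors stand in for the ImageNet-pretrained CNN features.

```python
import numpy as np

def temporal_segments(features, n_segments):
    """Greedy agglomerative temporal clustering: start with one segment per
    frame and repeatedly merge the adjacent pair of segments whose mean
    feature vectors are closest, until n_segments remain.
    `features` is (n_frames, d); in the paper these would be
    dimension-reduced CNN features."""
    segs = [(i, i + 1) for i in range(len(features))]   # (start, end) frame spans
    means = [features[i].astype(float) for i in range(len(features))]
    while len(segs) > n_segments:
        # Similarity distance between each pair of temporally adjacent segments.
        dists = [np.linalg.norm(means[i] - means[i + 1])
                 for i in range(len(segs) - 1)]
        j = int(np.argmin(dists))
        la = segs[j][1] - segs[j][0]
        lb = segs[j + 1][1] - segs[j + 1][0]
        # Length-weighted mean of the merged segment's features.
        means[j] = (means[j] * la + means[j + 1] * lb) / (la + lb)
        segs[j] = (segs[j][0], segs[j + 1][1])
        del segs[j + 1], means[j + 1]
    return segs

# Two synthetic "scenes": frames 0-4 cluster near 0, frames 5-9 near 3.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 0.1, (5, 8)),
                   rng.normal(3.0, 0.1, (5, 8))])
print(temporal_segments(feats, 2))   # [(0, 5), (5, 10)]
```

Varying `n_segments` yields progressively coarser or finer segmentations of the same video, which is one simple way to realize the coarse-to-fine hierarchy the abstract mentions.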

Bibliographic Details
Main Authors: Xinhui Peng, Rui Li, Jilong Wang, Hao Shang
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access
Subjects: Clustering methods; dimension reduction; feature extraction; user centered design; video segmentation
Online Access: https://ieeexplore.ieee.org/document/8865015/
id: doaj-4e78ef9df5cf42a48dcb12fe372a6824
record_format: Article
Title: User-Guided Clustering for Video Segmentation on Coarse-Grained Feature Extraction
Authors: Xinhui Peng (https://orcid.org/0000-0001-5795-0451), Rui Li (https://orcid.org/0000-0001-6448-5092), Jilong Wang, Hao Shang; all with the College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
Citation: IEEE Access, vol. 7, pp. 149820-149832, 2019 (article 8865015)
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2946889
Abstract: Video segmentation is the task of temporally dividing a video into semantic sections, typically based on a specific concept or theme defined by the user's intention. However, previous studies of video segmentation have thus far not taken the user's intention into consideration. In this paper, a two-stage user-guided video segmentation framework is presented, consisting of dimension reduction and temporal clustering. In the dimension reduction stage, coarse-grained feature extraction is performed by a deep convolutional neural network pre-trained on ImageNet. In the temporal clustering stage, information about the user's intention is used to segment videos in the time domain with a proposed operator that calculates the similarity distance between dimension-reduced frames. To provide more insight into the videos, a hierarchical clustering method is proposed that allows users to segment videos at different granularities. Evaluation on the Open Video Scene Detection (OVSD) dataset shows that the proposed method achieves an average F-score of 0.72, even though coarse-grained feature extraction is adopted. The evaluation also demonstrates that the proposed method not only produces different segmentation results according to the user's intention, but also produces hierarchical segmentation results from a low abstraction level to a higher one.
Keywords: Clustering methods; dimension reduction; feature extraction; user centered design; video segmentation
URL: https://ieeexplore.ieee.org/document/8865015/
collection DOAJ