Learning Deconvolutional Network for Object Tracking

Object tracking can be tackled by sequentially learning a model of the target's appearance, so a robust appearance representation is critical to visual tracking. Recently, deep convolutional networks have demonstrated remarkable ability in visual tracking by leveraging robust...


Bibliographic Details
Main Authors: Xiankai Lu, Hong Huo, Tao Fang, Huanlong Zhang
Format: Article
Language: English
Published: IEEE 2018-01-01
Series: IEEE Access
Subjects: Object tracking; deep learning; deconvolution neural network; regression network
Online Access: https://ieeexplore.ieee.org/document/8326476/
id doaj-ecd0e6b6ea364ec88eb0ac7f6a93b6fb
record_format Article
spelling doaj-ecd0e6b6ea364ec88eb0ac7f6a93b6fb
Timestamp: 2021-03-29T21:00:08Z
Language: eng
Publisher: IEEE
Journal: IEEE Access (ISSN 2169-3536)
Published: 2018-01-01
Volume 6, pages 18032-18041
DOI: 10.1109/ACCESS.2018.2820004
IEEE document number: 8326476
Title: Learning Deconvolutional Network for Object Tracking
Xiankai Lu (https://orcid.org/0000-0002-9543-6960), Department of Automation, Shanghai Jiao Tong University, Shanghai, China
Hong Huo, Department of Automation, Shanghai Jiao Tong University, Shanghai, China
Tao Fang, Department of Automation, Shanghai Jiao Tong University, Shanghai, China
Huanlong Zhang, Zhengzhou University of Light Industry, Zhengzhou, China
URL: https://ieeexplore.ieee.org/document/8326476/
Keywords: Object tracking; deep learning; deconvolution neural network; regression network
collection DOAJ
sources DOAJ
author Xiankai Lu
Hong Huo
Tao Fang
Huanlong Zhang
title Learning Deconvolutional Network for Object Tracking
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2018-01-01
description Object tracking can be tackled by sequentially learning a model of the target's appearance, so a robust appearance representation is critical to visual tracking. Recently, deep convolutional networks have demonstrated remarkable ability in visual tracking by leveraging robust high-level features. To obtain these high-level features, a deep convolutional network alternates convolution and pooling operations. However, these operations produce feature maps with low spatial resolution, which degrades localization precision in tracking. Low-level features have sufficient spatial resolution, but their representation ability is insufficient. To mitigate this issue, we exploit a deconvolution network in visual tracking. This deconvolution network works as a learnable upsampling layer that takes low-resolution, high-level feature maps as input and outputs enlarged feature maps. Meanwhile, the low-level feature maps are fused with these high-level feature maps via a summation operation to better represent target appearance. We formulate network training as a regression problem and train the network end to end. Extensive experiments on two tracking benchmarks demonstrate the effectiveness of our method.
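The pipeline outlined in the description (upsample a coarse high-level map with a deconvolution, fuse it with a finer low-level map, score the result against a regression target) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The single-channel maps, the fixed 2x2 kernel, the stride of 2, the Gaussian target, and all function names are assumptions made for illustration, and the fusion step is read here as an element-wise sum. In the actual method the kernel weights would be learned end to end rather than fixed.

```python
import math


def transposed_conv2d(x, kernel, stride=2):
    """Deconvolution (transposed convolution) of a single-channel 2-D map.

    Each input value scatters a scaled copy of the kernel into the output,
    so the output is larger than the input.
    """
    h, w, k = len(x), len(x[0]), len(kernel)
    out_h = (h - 1) * stride + k
    out_w = (w - 1) * stride + k
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


def fuse_by_sum(high, low):
    """Element-wise sum of two maps with identical size."""
    return [[hv + lv for hv, lv in zip(hr, lr)] for hr, lr in zip(high, low)]


def gaussian_label(size, center, sigma=1.0):
    """Hypothetical regression target: a Gaussian peaked at the target position."""
    cy, cx = center
    return [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]


def l2_loss(pred, target):
    """Sum of squared differences between a predicted map and its target."""
    return sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr))


def peak(m):
    """(row, col) of the largest value in a map."""
    return max((v, (i, j)) for i, row in enumerate(m) for j, v in enumerate(row))[1]


# Coarse 2x2 high-level map: the target is somewhere in the bottom-right cell.
high = [[0.0, 0.0],
        [0.0, 1.0]]
# Fixed kernel for the sketch; a trained network would learn these weights.
kernel = [[1.0, 1.0],
          [1.0, 1.0]]
up = transposed_conv2d(high, kernel, stride=2)  # enlarged to 4x4

# Fine 4x4 low-level map: weak on its own, but spatially precise.
low = [[0.0] * 4 for _ in range(4)]
low[2][3] = 0.5

fused = fuse_by_sum(up, low)
peak_pos = peak(fused)
loss = l2_loss(fused, gaussian_label(4, peak_pos))
print(peak_pos)  # (2, 3)
```

In this toy case the upsampled high-level map only narrows the target down to a 2x2 block, and the low-level map alone is weak. Their sum has a single peak at the precise position, which is the motivation the description gives for fusing the two.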
topic Object tracking
deep learning
deconvolution neural network
regression network