Convolutional Network Representation for Visual Recognition

Image representation is a key component in visual recognition systems. In a visual recognition problem, the model must be able to learn and infer certain visual semantics from an image. It is therefore important that the model represent the input image in a way that the semantics of interest can be inferred easily and reliably. This thesis, written as a compilation of publications, examines Convolutional Network (ConvNet) representations in visual recognition problems from an empirical perspective. A Convolutional Network is a special class of neural network with a hierarchical structure, in which every layer's output (except for the last layer) is the input to the next. ConvNets have been shown to be powerful tools for learning a generic representation of an image. In this body of work, we first showed that this is indeed the case: a ConvNet representation with a simple classifier can outperform highly tuned pipelines based on hand-crafted features. Specifically, we first trained a ConvNet on a large dataset; then, for every image in another task with a small dataset, we fed the image forward through the ConvNet and took its activations at a certain layer as the image representation. Transferring the knowledge from the large dataset (source task) to the small dataset (target task) proved effective and outperformed baselines on a variety of visual recognition tasks. We also evaluated the presence of spatial visual semantics in the ConvNet representation and observed that a ConvNet retains significant spatial information even though it was never explicitly trained to preserve low-level semantics.

We then investigated the factors that affect the transferability of these representations. We studied various factors across a diverse set of visual recognition tasks and found a consistent correlation between the effect of those factors and the similarity of the target task to the source task. This intuition, alongside the experimental results, provides a guideline for improving the performance of visual recognition tasks using ConvNet features. Finally, we addressed the task of visual instance retrieval as an example of how these simple intuitions can massively increase the performance of the target task.

Bibliographic Details
Main Author: Sharif Razavian, Ali
Format: Doctoral Thesis
Language: English
Published: KTH, Robotik, perception och lärande, RPL, 2017
Series: TRITA-CSC-A, 1653-5723; 2017:01
ISBN: 978-91-7729-213-5
Subjects: Convolutional Network; Visual Recognition; Transfer Learning
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-197919
http://nbn-resolving.de/urn:isbn:978-91-7729-213-5
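The transfer-learning recipe the abstract describes is: freeze a ConvNet trained on a large source dataset, feed each target-task image forward, take the activations at a chosen layer as the image representation, and train a simple classifier on top. A minimal self-contained sketch of that pipeline follows; note that the fixed random filters, the toy bright-vs-dark target task, and the nearest-centroid classifier are illustrative stand-ins, whereas in practice one would extract features from a network pretrained on a large dataset such as ImageNet (e.g. via torchvision).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen ConvNet pretrained on a large source dataset.
# Fixed random 3x3 filters emulate a pretrained, frozen feature extractor
# so the example stays self-contained (no downloads, no deep-learning library).
FILTERS = rng.standard_normal((8, 3, 3))

def convnet_features(img):
    """Feed the image forward and take the activations at a chosen layer
    (here: conv -> ReLU -> global average pool) as its representation."""
    h, w = img.shape
    feats = []
    for f in FILTERS:
        out = np.zeros((h - 2, w - 2))          # "valid" convolution
        for i in range(h - 2):
            for j in range(w - 2):
                out[i, j] = (img[i:i + 3, j:j + 3] * f).sum()
        feats.append(np.maximum(out, 0).mean())  # ReLU + global average pool
    return np.array(feats)

# Small target task (hypothetical): "dark" (label 0) vs "bright" (label 1)
# 8x8 images, standing in for a dataset too small to train a ConvNet on.
def make_image(label):
    return rng.random((8, 8)) + 2.0 * label

train = [(make_image(y), y) for y in [0, 1] * 10]

# Simple classifier on top of the transferred features: nearest class centroid.
centroids = {c: np.mean([convnet_features(x) for x, y in train if y == c], axis=0)
             for c in (0, 1)}

def predict(img):
    z = convnet_features(img)
    return min(centroids, key=lambda c: np.linalg.norm(z - centroids[c]))
```

The key design point, matching the thesis, is that the feature extractor is never updated on the target task; only the cheap classifier on top is fit to the small dataset.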