Multiple-Cue Object Recognition for Interactionable Objects

Category-level object recognition is a fundamental capability for robots intended to assist humans in useful tasks. Numerous vision-based object recognition systems yield fast and accurate results in constrained environments. However, because they depend on visual...

Bibliographic Details
Main Author: Aboutalib, Sarah
Format: Others
Published: Research Showcase @ CMU 2010
Subjects: Computer Vision; Multi-modal; Human Interaction; Object Recognition
Online Access: http://repository.cmu.edu/dissertations/19
http://repository.cmu.edu/cgi/viewcontent.cgi?article=1019&context=dissertations
id ndltd-cmu.edu-oai-repository.cmu.edu-dissertations-1019
record_format oai_dc
spelling ndltd-cmu.edu-oai-repository.cmu.edu-dissertations-10192014-07-24T15:35:31Z Multiple-Cue Object Recognition for Interactionable Objects Aboutalib, Sarah Category-level object recognition is a fundamental capability for robots intended to assist humans in useful tasks. Numerous vision-based object recognition systems yield fast and accurate results in constrained environments. However, because they depend on visual cues, these techniques are susceptible to variations in object size, lighting, rotation, and pose, none of which can be avoided in real video data. Thus, object recognition remains very challenging. My thesis work builds upon the fact that robots can observe humans interacting with the objects in their environment. We refer to the set of objects that can be involved in such interactions as `interactionable' objects. Human interaction with these `interactionable' objects provides numerous non-visual cues to their identity. In this thesis, I introduce a flexible object recognition approach called Multiple-Cue Object Recognition (MCOR), which can use multiple cues of any predefined type, whether intrinsic to the object or provided by observation of a human. In pursuit of this goal, the thesis makes several contributions: a representation for the multiple cues, including an object definition that allows cues to be added flexibly; weights, derived with a probabilistic relational model, that reflect the varying strength of association between a particular cue and a particular object, together with object displacement values for localizing the information in an image; tools for defining visual features, segmentation, tracking, and the values of the non-visual cues; and, lastly, an object recognition algorithm for the incremental discrimination of potential object categories. We evaluate these contributions in several ways: simulation to demonstrate the learning of weights and recognition based on an analytical model; an analytical model that demonstrates the robustness of the MCOR framework; and recognition results on real video data from several datasets, including video taken from a humanoid robot (Sony QRIO), video captured in a meeting setting, scripted scenarios from outside universities, and unscripted TV cooking data. Using these datasets, we demonstrate the basic features of the MCOR algorithm, including its ability to use multiple cues of different types, and its applicability to an outside dataset. We show that MCOR achieves better recognition results than vision-only recognition systems, and that performance only improves as more cue types are added. 2010-12-08T08:00:00Z text application/pdf http://repository.cmu.edu/dissertations/19 http://repository.cmu.edu/cgi/viewcontent.cgi?article=1019&context=dissertations Dissertations Research Showcase @ CMU Computer Vision Multi-modal Human Interaction Object Recognition
collection NDLTD
format Others
sources NDLTD
topic Computer Vision
Multi-modal
Human Interaction
Object Recognition
spellingShingle Computer Vision
Multi-modal
Human Interaction
Object Recognition
Aboutalib, Sarah
Multiple-Cue Object Recognition for Interactionable Objects
description Category-level object recognition is a fundamental capability for robots intended to assist humans in useful tasks. Numerous vision-based object recognition systems yield fast and accurate results in constrained environments. However, because they depend on visual cues, these techniques are susceptible to variations in object size, lighting, rotation, and pose, none of which can be avoided in real video data. Thus, object recognition remains very challenging. My thesis work builds upon the fact that robots can observe humans interacting with the objects in their environment. We refer to the set of objects that can be involved in such interactions as `interactionable' objects. Human interaction with these `interactionable' objects provides numerous non-visual cues to their identity. In this thesis, I introduce a flexible object recognition approach called Multiple-Cue Object Recognition (MCOR), which can use multiple cues of any predefined type, whether intrinsic to the object or provided by observation of a human. In pursuit of this goal, the thesis makes several contributions: a representation for the multiple cues, including an object definition that allows cues to be added flexibly; weights, derived with a probabilistic relational model, that reflect the varying strength of association between a particular cue and a particular object, together with object displacement values for localizing the information in an image; tools for defining visual features, segmentation, tracking, and the values of the non-visual cues; and, lastly, an object recognition algorithm for the incremental discrimination of potential object categories. We evaluate these contributions in several ways: simulation to demonstrate the learning of weights and recognition based on an analytical model; an analytical model that demonstrates the robustness of the MCOR framework; and recognition results on real video data from several datasets, including video taken from a humanoid robot (Sony QRIO), video captured in a meeting setting, scripted scenarios from outside universities, and unscripted TV cooking data. Using these datasets, we demonstrate the basic features of the MCOR algorithm, including its ability to use multiple cues of different types, and its applicability to an outside dataset. We show that MCOR achieves better recognition results than vision-only recognition systems, and that performance only improves as more cue types are added.
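The abstract describes MCOR only at a high level: weighted cue-object associations and incremental discrimination among candidate categories. As a rough, hypothetical illustration of that idea only, and not the algorithm from the dissertation, the following Python sketch accumulates weighted evidence from a mix of visual and non-visual cues until one category clearly leads; all cue names, weight values, and the stopping rule are invented assumptions.

    # Hypothetical sketch only: weighted multi-cue evidence accumulation for
    # incremental category discrimination. Cue names, weights, and the
    # stopping rule are invented; this is not the dissertation's MCOR code.
    from collections import defaultdict

    # Assumed cue->category association weights (the kind of association
    # strengths a probabilistic relational model might supply).
    WEIGHTS = {
        ("red_blob", "cup"): 0.4,              # visual cue, weakly discriminative
        ("red_blob", "book"): 0.2,
        ("drinking_motion", "cup"): 0.9,       # non-visual cue from observing a human
        ("page_turning_motion", "book"): 0.8,
    }

    def update_scores(scores, observed_cue):
        """Add the evidence contributed by one observed cue to each category."""
        for (cue, category), weight in WEIGHTS.items():
            if cue == observed_cue:
                scores[category] += weight
        return scores

    def discriminate(cue_stream, margin=0.5):
        """Process cues one at a time; stop once a category leads by `margin`."""
        scores = defaultdict(float)
        for cue in cue_stream:
            update_scores(scores, cue)
            if not scores:
                continue
            best = max(scores, key=scores.get)
            rest = [s for c, s in scores.items() if c != best]
            if scores[best] - max(rest, default=0.0) >= margin:
                return best, dict(scores)
        return (max(scores, key=scores.get) if scores else None), dict(scores)

    if __name__ == "__main__":
        # The red region alone is ambiguous; the observed drinking motion
        # (a cue from human interaction) resolves the ambiguity toward "cup".
        print(discriminate(["red_blob", "drinking_motion"]))

In this toy run the visual cue alone leaves "cup" and "book" close, and only the observed human interaction pushes the score difference past the margin, mirroring the thesis's claim that non-visual interaction cues complement vision.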
author Aboutalib, Sarah
author_facet Aboutalib, Sarah
author_sort Aboutalib, Sarah
title Multiple-Cue Object Recognition for Interactionable Objects
title_short Multiple-Cue Object Recognition for Interactionable Objects
title_full Multiple-Cue Object Recognition for Interactionable Objects
title_fullStr Multiple-Cue Object Recognition for Interactionable Objects
title_full_unstemmed Multiple-Cue Object Recognition for Interactionable Objects
title_sort multiple-cue object recognition for interactionable objects
publisher Research Showcase @ CMU
publishDate 2010
url http://repository.cmu.edu/dissertations/19
http://repository.cmu.edu/cgi/viewcontent.cgi?article=1019&context=dissertations
work_keys_str_mv AT aboutalibsarah multiplecueobjectrecognitionforinteractionableobjects
_version_ 1716709344053035008