Recognition of visual object classes
Main Author: | Burl, Michael C. |
---|---|
Format: | Others (Ph.D. dissertation) |
Published: | 1997 |
Online Access: | https://thesis.library.caltech.edu/93/1/Burl_mc_1997.pdf |
Citation: | Burl, Michael C. (1997) Recognition of visual object classes. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/96P7-6E62. https://resolver.caltech.edu/CaltechETD:etd-01092008-094943 |
Description:
<p>Humans can look at a scene or a photograph and easily recognize objects. Outside my window I can see cars, people walking a dog on a brick pathway, trees, buildings, etc. This perception is so effortless that it belies the difficulty of the task. Visual perception begins with light that is reflected from the scene into the eye. The light impinges upon the retina and is transduced by a two-dimensional array of photoreceptors into noisy electrical signals. The brain must then accomplish the difficult task of transforming from this low-level representation to a higher-level understanding of the scene in terms of regions, surfaces, textures, and objects.</p>
<p>For computer vision the problem is the same, but the hardware is different. A camera approximates the function of the eye and retina; that is, the camera produces a two-dimensional array of numbers (pixel values) representing the intensity of light reflected from the scene. The fundamental question addressed in this thesis is the following: what mathematical processing should be applied to the pixel values in order for a computer to recognize objects? The methods we propose are not intended as a model of human brain function, although they may provide some insight. We are simply trying to solve the same visual recognition problems as the brain without concern for whether (or how) our algorithms could be realized in neuronal "hardware."</p>
<p>We have developed a new framework for recognizing visual object classes in which the class members consist of characteristic parts in a deformable spatial configuration. Human faces are an object class of this type, since faces consist of eyes, nose, and mouth arranged in a configuration that varies depending on expression and pose and also from one person to another. A second object class is cursive handwriting, which consists of loops, cusps, crossings, etc. arranged in a deformable pattern. In our approach, the allowed object deformations are represented through shape statistics, which are learned from examples. Instances of an object in an image are detected by finding the appropriate features in the correct spatial configuration. Our algorithm is robust with respect to partial occlusion, detector false alarms, and missed features.</p>
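The framework described above combines characteristic parts, a deformable spatial configuration, shape statistics learned from examples, and detection by finding features in the correct arrangement. A deliberately simplified sketch can illustrate these ideas, but it is not the algorithm from the thesis. It assumes a joint Gaussian over part coordinates measured relative to one anchor part, so only translation is factored out and rotation and scale are not. Each configuration is scored by its Gaussian log-likelihood, and every combination of candidate detections is tried. The function names, the three-part "face" template, and the synthetic data are hypothetical. The sketch does not handle missed features, which the abstract says the actual method tolerates.

```python
import itertools

import numpy as np

# Hypothetical simplification: translation-only normalization, a single
# Gaussian shape model, exhaustive search, and no missing-part handling.


def fit_shape_model(examples):
    """Learn a Gaussian over part positions from training examples.

    examples: array-like of shape (num_examples, num_parts, 2) holding the
    (x, y) position of each part in each example.
    """
    X = np.asarray(examples, dtype=float)
    # Remove translation by measuring every part relative to part 0 (the anchor).
    rel = X[:, 1:, :] - X[:, :1, :]
    flat = rel.reshape(len(X), -1)
    mean = flat.mean(axis=0)
    # Small ridge term keeps the covariance invertible with few examples.
    cov = np.cov(flat, rowvar=False) + 1e-6 * np.eye(flat.shape[1])
    return mean, cov


def shape_log_likelihood(config, mean, cov):
    """Gaussian log-density of one configuration (num_parts x 2 positions)."""
    c = np.asarray(config, dtype=float)
    v = (c[1:] - c[0]).reshape(-1) - mean
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis_sq = v @ np.linalg.solve(cov, v)
    return -0.5 * (mahalanobis_sq + logdet + v.size * np.log(2 * np.pi))


def best_configuration(candidates, mean, cov):
    """Try every combination of candidate detections (one per part) and
    return (score, configuration) for the most likely shape."""
    best = None
    for combo in itertools.product(*candidates):
        score = shape_log_likelihood(combo, mean, cov)
        if best is None or score > best[0]:
            best = (score, combo)
    return best


# Toy demo with a hypothetical three-part "face": left eye, right eye, mouth.
template = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])
rng = np.random.default_rng(0)
examples = (template
            + rng.normal(scale=0.5, size=(50, 3, 2))   # shape deformation
            + rng.uniform(-20, 20, size=(50, 1, 2)))   # random translation
mean, cov = fit_shape_model(examples)

true_parts = template + np.array([100.0, 50.0])
# Each (hypothetical) part detector returns the true location plus one false alarm.
candidates = [
    [true_parts[0], np.array([130.0, 90.0])],
    [true_parts[1], np.array([100.0, 80.0])],
    [true_parts[2], np.array([140.0, 50.0])],
]
score, combo = best_configuration(candidates, mean, cov)
print(np.array(combo))
```

A false alarm only scores well if it also fits the learned shape. In the demo, the spurious detections are therefore rejected and the true configuration is returned. Exhaustive search grows as the product of the number of candidates per part, so it is practical only for a few parts and candidates.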
<p>Potential applications include intelligent tools for finding objects in image databases, human-machine interfaces, user authentication, intelligent data gathering and compression, signature verification, and keyword spotting. Experimental results will be presented for two problems: (1) locating quasi-frontal views of human faces in cluttered scenes with occlusions, and (2) spotting keywords in on-line cursive handwriting data.</p>