Optimization of an Image-Based Talking Head System

This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, composed of a personalized 3D mask and a large database of mouth images together with their related information. The synthesis part generates natural-looking facial animations from phonetic transcripts of text. A critical issue of the synthesis is unit selection, which selects and concatenates appropriate mouth images from the database such that they match the spoken words of the talking head. Selection is based on lip synchronization and the similarity of consecutive images. This paper refines the unit selection and uses Pareto optimization to train it. Experimental results of subjective tests show that most people cannot distinguish our facial animations from real videos.
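
The unit selection described above can be read as a path search over candidate mouth images that balances a lip-synchronization (target) cost against an image-similarity (concatenation) cost. The sketch below is a minimal illustration of that idea under assumptions, not the authors' implementation: the feature vectors, the two cost functions, and the fixed weights w_sync and w_smooth are hypothetical stand-ins for quantities the paper trains via Pareto optimization.

```python
# Minimal sketch of unit selection as a Viterbi-style search (hypothetical
# features, costs, and weights; the paper trains the trade-off with Pareto
# optimization rather than fixed weights).
import numpy as np

def target_cost(candidate_feat, target_feat):
    # Lip-synchronization term: distance between a candidate mouth image's
    # articulation features and the features expected for the target frame.
    return np.linalg.norm(candidate_feat - target_feat)

def concat_cost(prev_feat, cur_feat):
    # Smoothness term: dissimilarity of consecutive mouth images.
    return np.linalg.norm(prev_feat - cur_feat)

def select_units(targets, candidates, w_sync=1.0, w_smooth=1.0):
    """targets: list of per-frame target feature vectors.
    candidates[t]: array (n_t, d) of database image features for frame t.
    Returns one candidate index per frame minimizing the combined cost."""
    T = len(targets)
    # cost[t][j]: best accumulated cost ending in candidate j at frame t.
    cost = [w_sync * np.array([target_cost(c, targets[0]) for c in candidates[0]])]
    back = [np.zeros(len(candidates[0]), dtype=int)]
    for t in range(1, T):
        tc = w_sync * np.array([target_cost(c, targets[t]) for c in candidates[t]])
        cc = np.array([[concat_cost(p, c) for c in candidates[t]]
                       for p in candidates[t - 1]])          # (n_prev, n_cur)
        total = cost[-1][:, None] + w_smooth * cc
        back.append(total.argmin(axis=0))
        cost.append(total.min(axis=0) + tc)
    # Trace back the cheapest path through the candidate lattice.
    path = [int(cost[-1].argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

In the paper, the balance between lip synchronization and smoothness is not hand-tuned as fixed weights; Pareto optimization is used to explore that trade-off when training the unit selection.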

Bibliographic Details
Main Authors: Liu Kang, Ostermann Joern
Format: Article
Language: English
Published: SpringerOpen, 2009-01-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
ISSN: 1687-4714, 1687-4722
Online Access: http://asmp.eurasipjournals.com/content/2009/174192