LIP-READING VIA DEEP NEURAL NETWORKS USING HYBRID VISUAL FEATURES

Lip-reading is commonly understood as visually interpreting a speaker's lip movements during speech. Experiments over many years have shown that speech intelligibility increases when visual facial information is available, an effect that becomes more pronounced in noisy environments. Automating this process raises several challenges, such as the coarticulation phenomenon, the choice of visual units, feature diversity, and inter-speaker dependency. While efforts have been made to overcome these challenges, a flawless lip-reading system remains under investigation. This paper seeks a lip-reading model with an efficiently designed arrangement of processing blocks that extracts highly discriminative visual features, highlighting the application of a properly structured Deep Belief Network (DBN)-based recognizer. Multi-speaker (MS) and speaker-independent (SI) tasks are performed on the CUAVE database, and phone recognition rates (PRRs) of 77.65% and 73.40% are achieved, respectively. The best word recognition rates (WRRs) achieved in the MS and SI tasks are 80.25% and 76.91%, respectively. The resulting accuracies demonstrate that the proposed method outperforms a conventional Hidden Markov Model (HMM) and competes well with state-of-the-art visual speech recognition work.
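The record's keywords point to a Deep Belief Network built from Restricted Boltzmann Machines (RBMs). As a rough, generic illustration only (not the paper's actual recognizer, whose architecture and hyperparameters are not given here), the following sketches an RBM trained with one-step contrastive divergence (CD-1), the standard building block from which DBNs are greedily pre-trained layer by layer:

```python
import numpy as np

# Minimal RBM sketch with CD-1 training. All class/variable names and
# hyperparameters are illustrative assumptions, not taken from the article.
class RBM:
    def __init__(self, n_visible, n_hidden, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible-unit biases
        self.b_h = np.zeros(n_hidden)    # hidden-unit biases
        self.lr = lr

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def hidden_probs(self, v):
        # P(h=1 | v) for a batch of visible vectors
        return self._sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        # P(v=1 | h)
        return self._sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        """One CD-1 update on a batch of binary visible vectors."""
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)          # one-step reconstruction
        h1 = self.hidden_probs(v1)
        batch = v0.shape[0]
        # Positive-phase minus negative-phase statistics
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / batch
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)
        return np.mean((v0 - v1) ** 2)              # reconstruction error
```

A DBN is then pre-trained greedily: one RBM is trained on the input features, its hidden-unit probabilities become the training data for the next RBM, and so on, before a supervised fine-tuning stage.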

Bibliographic Details
Main Authors: Fatemeh Vakhshiteh, Farshad Almasganj, Ahmad Nickabadi
Format: Article
Language: English
Published: Slovenian Society for Stereology and Quantitative Image Analysis, 2018-07-01
Series: Image Analysis and Stereology
Subjects: Deep Belief Networks; Hidden Markov Model; lip-reading; Restricted Boltzmann Machine
Online Access: https://www.ias-iss.org/ojs/IAS/article/view/1859
ISSN: 1580-3139, 1854-5165
Volume 37, No. 2 (2018), pp. 159-171
DOI: 10.5566/ias.1859
Affiliation: Amirkabir University of Technology - Tehran Polytechnic (all authors)