Lip Reading Sentences Using Deep Learning With Only Visual Cues

In this paper, a neural network-based lip reading system is proposed. The system is lexicon-free and uses purely visual cues. With only a limited number of visemes as classes to recognise, the system is designed to lip read sentences covering a wide range of vocabulary and to recognise words that may not be included in system training. The system has been tested on the challenging BBC Lip Reading Sentences 2 (LRS2) benchmark dataset. Compared with state-of-the-art work on lip reading sentences, the system achieves significantly improved performance, with a 15% lower word error rate. In addition, experiments with videos of varying illumination have shown that the proposed model is robust to varying levels of lighting. The main contributions of this paper are: 1) the classification of visemes in continuous speech using a specially designed transformer with a unique topology; 2) the use of visemes as a classification schema for lip reading sentences; and 3) the conversion of visemes to words using perplexity analysis. All three contributions serve to enhance the accuracy of lip reading sentences. The paper also provides an essential survey of the research area.
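
The record does not include any implementation detail, but the third contribution (viseme-to-word conversion via perplexity analysis) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the language model (an off-the-shelf GPT-2 via Hugging Face transformers), the phoneme-to-viseme map, and the candidate sentences are not the authors' actual components.

```python
# Minimal sketch of viseme-to-word disambiguation via perplexity analysis.
# The model, the viseme map, and the candidates are illustrative assumptions;
# the paper's own pipeline is not described in this record.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Hypothetical many-to-one phoneme-to-viseme map: /b/, /m/ and /p/ are
# visually indistinguishable (all bilabial), which is why viseme classes are
# few but leave word-level ambiguity that must be resolved afterwards.
PHONEME_TO_VISEME = {"b": "BILABIAL", "m": "BILABIAL", "p": "BILABIAL"}

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(sentence: str) -> float:
    """Perplexity of a sentence under the language model (lower = more plausible)."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean per-token negative log-likelihood
    return math.exp(loss.item())

# "mat", "bat" and "pat" share a viseme sequence; perplexity analysis selects
# the candidate sentence the language model finds most plausible.
candidates = [
    "the cat sat on the mat",
    "the cat sat on the bat",
    "the cat sat on the pat",
]
print(min(candidates, key=perplexity))  # expected: "the cat sat on the mat"
```

In the system itself the candidate words would come from the viseme classifier's output rather than a hand-written list; the sketch only shows why a lowest-perplexity criterion can resolve homovisemic ambiguity.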


Bibliographic Details
Main Authors: Souheil Fenghour (School of Engineering, London South Bank University, London, U.K.; ORCID: 0000-0002-6725-0405), Daqing Chen (School of Engineering, London South Bank University, London, U.K.; ORCID: 0000-0003-0030-1199), Kun Guo (Xi'an VANXUM Electronics Technology Company Ltd., Helizijun, China; ORCID: 0000-0002-1436-1742), Perry Xiao (School of Engineering, London South Bank University, London, U.K.)
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access, Vol. 8, pp. 215516-215530
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3040906
Subjects: Deep learning; lip reading; neural networks; perplexity analysis; speech recognition
Online Access: https://ieeexplore.ieee.org/document/9272286/