Distinctive Phonetic Features Modeling and Extraction Using Deep Neural Networks

Bibliographic Details
Main Authors:	Yasser Seddiq, Yousef A. Alotaibi, Sid-Ahmed Selouani, Ali Hamid Meftah
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Modern standard Arabic; distinctive phonetic features; speech processing; deep belief networks; restricted Boltzmann machine
Online Access:	https://ieeexplore.ieee.org/document/8742638/

Description
Summary:	Feature extraction is a critical stage of digital speech processing systems, and the quality of the features determines how solid a foundation the subsequent stages have. Distinctive phonetic features (DPFs) are among the most representative features of speech signals. Their significance lies in their ability to provide an abstract description of the places and manners of articulation of a language's phonemes. Each DPF element of a phoneme reflects unique articulatory information about that phoneme. Therefore, each DPF element needs to be investigated individually in order to reach a deeper understanding and to arrive at a descriptive model for it. Such fine-grained modeling respects the uniqueness of each DPF element. This paper tackles the problem of DPF modeling and extraction for Modern Standard Arabic. Deep neural networks (DNNs) initialized using deep belief networks (DBNs) have been remarkably successful in DSP applications and are capable of extracting highly representative features from raw data, so their modeling power is exploited to investigate and model the DPF elements. The DNN models are compared with classical multilayer perceptron (MLP) models. The representativeness of several acoustic cues for different DPF elements is also measured. The paper formalizes DPF modeling as a binary classification problem. Because the DPF element data are highly imbalanced, evaluating model quality is difficult; the paper therefore addresses evaluation measures suited to the imbalanced nature of the DPF elements. After each element is modeled individually, two top-level DPF extractors are designed: one MLP-based and one DNN-based. The results show the quality of the DNN models and their superiority over the MLPs, with accuracies of 89.0% and 86.7%, respectively.
ISSN:	2169-3536
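
Note on evaluating imbalanced binary labels: the summary does not name the evaluation measures the paper adopts, so the following is only an illustrative sketch, not the authors' method. It assumes a hypothetical DPF element that is present in 5 of 100 frames and a trivial classifier that always predicts "absent", and it compares plain accuracy with three commonly used imbalance-aware measures (balanced accuracy, F1, and the Matthews correlation coefficient). The function names and the toy data are assumptions made for illustration.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def imbalance_aware_scores(y_true, y_pred):
    """Return accuracy alongside measures that account for class imbalance."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Undefined ratios (zero denominators) are reported as 0.0 in this sketch.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical toy data: the element is present (1) in 5 of 100 frames.
y_true = [1] * 5 + [0] * 95
# A trivial model that always predicts "absent".
y_pred = [0] * 100

scores = imbalance_aware_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

On this toy data the trivial model reaches high accuracy while never detecting the element, which is the kind of effect that makes accuracy alone an unreliable measure for rare classes.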

Similar Items