Unsupervised Phoneme and Word Discovery From Multiple Speakers Using Double Articulation Analyzer and Neural Network With Parametric Bias

This paper describes a new unsupervised machine-learning method for simultaneous phoneme and word discovery from multiple speakers. Phoneme and word discovery from multiple speakers is a more challenging problem than that from one speaker, because the speech signals from different speakers exhibit d...


Bibliographic Details
Main Authors: Ryo Nakashima, Ryo Ozaki, Tadahiro Taniguchi
Format: Article
Language: English
Published: Frontiers Media S.A. 2019-10-01
Series: Frontiers in Robotics and AI
Subjects: word discovery; phoneme discovery; parametric bias; Bayesian model; neural network
Online Access: https://www.frontiersin.org/article/10.3389/frobt.2019.00092/full
id doaj-4b932ba456b74b6bb492cd16ee1b14af
record_format Article
spelling doaj-4b932ba456b74b6bb492cd16ee1b14af; 2020-11-25T01:34:57Z; eng; Frontiers Media S.A.; Frontiers in Robotics and AI; 2296-9144; 2019-10-01; 6; 10.3389/frobt.2019.00092; 458555; Unsupervised Phoneme and Word Discovery From Multiple Speakers Using Double Articulation Analyzer and Neural Network With Parametric Bias; Ryo Nakashima; Ryo Ozaki; Tadahiro Taniguchi; This paper describes a new unsupervised machine-learning method for simultaneous phoneme and word discovery from multiple speakers. Phoneme and word discovery from multiple speakers is a more challenging problem than that from one speaker, because the speech signals from different speakers exhibit different acoustic features. The existing method, a nonparametric Bayesian double articulation analyzer (NPB-DAA) with a deep sparse autoencoder (DSAE), only performed phoneme and word discovery from a single speaker. Extending NPB-DAA with DSAE to a multi-speaker scenario is, therefore, the research problem of this paper. This paper proposes employing a DSAE with parametric bias in the hidden layer (DSAE-PBHL) as a feature extractor for unsupervised phoneme and word discovery. DSAE-PBHL is designed to subtract speaker-dependent acoustic features and extract speaker-independent features by introducing parametric bias input to the DSAE hidden layer. An experiment demonstrated that DSAE-PBHL could subtract distributed representations of acoustic signals, enabling extraction based on the types of phonemes rather than the speakers. Another experiment demonstrated that a combination of NPB-DAA and DSAE-PBHL outperformed other available methods in phoneme and word discovery tasks involving speech signals with Japanese vowel sequences from multiple speakers.; https://www.frontiersin.org/article/10.3389/frobt.2019.00092/full; word discovery; phoneme discovery; parametric bias; Bayesian model; neural network
collection DOAJ
language English
format Article
sources DOAJ
author Ryo Nakashima
Ryo Ozaki
Tadahiro Taniguchi
title Unsupervised Phoneme and Word Discovery From Multiple Speakers Using Double Articulation Analyzer and Neural Network With Parametric Bias
publisher Frontiers Media S.A.
series Frontiers in Robotics and AI
issn 2296-9144
publishDate 2019-10-01
description This paper describes a new unsupervised machine-learning method for simultaneous phoneme and word discovery from multiple speakers. Phoneme and word discovery from multiple speakers is a more challenging problem than that from one speaker, because the speech signals from different speakers exhibit different acoustic features. The existing method, a nonparametric Bayesian double articulation analyzer (NPB-DAA) with a deep sparse autoencoder (DSAE), only performed phoneme and word discovery from a single speaker. Extending NPB-DAA with DSAE to a multi-speaker scenario is, therefore, the research problem of this paper. This paper proposes employing a DSAE with parametric bias in the hidden layer (DSAE-PBHL) as a feature extractor for unsupervised phoneme and word discovery. DSAE-PBHL is designed to subtract speaker-dependent acoustic features and extract speaker-independent features by introducing parametric bias input to the DSAE hidden layer. An experiment demonstrated that DSAE-PBHL could subtract distributed representations of acoustic signals, enabling extraction based on the types of phonemes rather than the speakers. Another experiment demonstrated that a combination of NPB-DAA and DSAE-PBHL outperformed other available methods in phoneme and word discovery tasks involving speech signals with Japanese vowel sequences from multiple speakers.
topic word discovery
phoneme discovery
parametric bias
Bayesian model
neural network
url https://www.frontiersin.org/article/10.3389/frobt.2019.00092/full
_version_ 1725069390305558528
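
Illustrative sketch (not part of the catalog record): the description above says DSAE-PBHL introduces a parametric bias input to the DSAE hidden layer so that speaker-dependent features can be separated from speaker-independent ones. The Python below is a minimal sketch of one way such an encoder layer could be wired. It is not the authors' implementation. The class name PBHLLayer, the one-hot speaker code, the tanh activation, the layer sizes, and the choice to connect the bias only to a "speaker-dependent" subset of hidden units are all assumptions made for illustration; the abstract does not specify them. Decoding, sparsity penalties, and training are omitted.

```python
import math
import random


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def init_matrix(rows, cols, rng):
    """Small random weights; the range is an arbitrary illustrative choice."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class PBHLLayer:
    """Hypothetical encoder layer with a parametric bias input (assumed design)."""

    def __init__(self, n_feat, n_speakers, n_indep, n_dep, seed=0):
        rng = random.Random(seed)
        self.n_speakers = n_speakers
        # Acoustic features feed both groups of hidden units.
        self.w_feat_indep = init_matrix(n_indep, n_feat, rng)
        self.w_feat_dep = init_matrix(n_dep, n_feat, rng)
        # Assumption: the parametric bias reaches only the speaker-dependent units.
        self.w_bias_dep = init_matrix(n_dep, n_speakers, rng)

    def encode(self, features, speaker_id):
        """Return (speaker_independent, speaker_dependent) hidden activations."""
        bias = [1.0 if i == speaker_id else 0.0 for i in range(self.n_speakers)]
        indep = [math.tanh(a) for a in matvec(self.w_feat_indep, features)]
        dep = [
            math.tanh(a + b)
            for a, b in zip(
                matvec(self.w_feat_dep, features),
                matvec(self.w_bias_dep, bias),
            )
        ]
        return indep, dep


if __name__ == "__main__":
    layer = PBHLLayer(n_feat=4, n_speakers=2, n_indep=3, n_dep=2, seed=1)
    frame = [0.1, -0.2, 0.3, 0.4]  # made-up acoustic feature vector
    indep_a, dep_a = layer.encode(frame, speaker_id=0)
    indep_b, dep_b = layer.encode(frame, speaker_id=1)
    print("speaker-independent units match:", indep_a == indep_b)
    print("speaker-dependent units differ:", dep_a != dep_b)
```

In this sketch the speaker-independent units never see the speaker code, so only that part of the hidden vector would be passed on as features to a downstream model such as NPB-DAA. In a trained network the separation would also depend on the reconstruction objective, which is not shown here.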