Cochlea modelling and its application to speech processing

Bibliographic Details
Main Author:	Pan, Shuokai
Other Authors:	Elliott, Stephen
Published:	University of Southampton 2018
Subjects:	620.2
Online Access:	https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.766799

Description
Summary:	Models of the cochlea are valuable both as a tool for better understanding its mechanics and as an inspiration for many speech processing algorithms. Realistic modelling of the cochlea can be computationally demanding, however, which limits its applicability in signal processing. To mitigate this issue, an efficient numerical method has been proposed for performing time domain simulations, based on a nonlinear state space formulation. This model has then been contrasted with another type of cochlear model, which is built from a cascade of digital filters. The responses of the two models have been compared in terms of their realism in simulating the measured nonlinear cochlear response to single tones and pairs of tones. Guided by these results, the filter cascade model is chosen for subsequent signal processing applications because it is significantly more efficient than the state space model while still producing realistic responses. Using this nonlinear filter cascade model as a front-end, two speech processing tasks have been investigated: voice activity detection and supervised speech separation. Both tasks are tackled within a machine learning framework, in which a neural network is trained to reproduce target outputs. The results are compared with those obtained using a number of simpler auditory-inspired analysis methods. Simulation results show that although the nonlinear filter cascade model can be more effective in many testing scenarios, its relative advantage over the other analysis methods is small. The incorporation of temporal context information and the engineering of the network structure are found to be more important in improving performance on these tasks. Once a suitable context expansion strategy has been selected, the difference between the various front-end processing methods considered is marginal.

Similar Items