Spoken Language Identification Using Deep Learning

Spoken language identification (SLID) is the process of detecting the language of an audio clip from an unknown speaker, regardless of the speaker's gender, age, or manner of speaking. The main challenge is to recognize features that distinguish between languages clearly and efficiently.


Bibliographic Details
Main Authors: Gundeep Singh, Sahil Sharma, Vijay Kumar, Manjit Kaur, Mohammed Baz, Mehedi Masud
Format: Article
Language: English
Published: Hindawi Limited 2021-01-01
Series: Computational Intelligence and Neuroscience
Online Access: http://dx.doi.org/10.1155/2021/5123671
id doaj-d605af44cdcf4ef1b3435770a82dff33
record_format Article
spelling doaj-d605af44cdcf4ef1b3435770a82dff33; 2021-10-04T01:57:28Z; eng; Hindawi Limited; Computational Intelligence and Neuroscience; 1687-5273; 2021-01-01; 2021; 10.1155/2021/5123671; Spoken Language Identification Using Deep Learning; Gundeep Singh (Computer Science and Engineering Department); Sahil Sharma (Computer Science and Engineering Department); Vijay Kumar (Computer Science and Engineering Department); Manjit Kaur (School of Engineering and Applied Sciences); Mohammed Baz (Department of Computer Engineering); Mehedi Masud (Department of Computer Science); http://dx.doi.org/10.1155/2021/5123671
collection DOAJ
language English
format Article
sources DOAJ
publisher Hindawi Limited
series Computational Intelligence and Neuroscience
issn 1687-5273
publishDate 2021-01-01
description Spoken language identification (SLID) is the process of detecting the language of an audio clip from an unknown speaker, regardless of the speaker's gender, age, or manner of speaking. The main challenge is to recognize features that distinguish between languages clearly and efficiently. The model takes audio files and converts them into spectrogram images. It then applies a convolutional neural network (CNN) to extract the main features used to predict the language. The main objective is to identify the language from among English, French, Spanish, German, Estonian, Tamil, Mandarin, Turkish, Chinese, Arabic, Hindi, Indonesian, Portuguese, Japanese, Latin, Dutch, Pushto, Romanian, Korean, Russian, Swedish, Thai, and Urdu. An experiment was conducted on audio files from the Kaggle dataset named spoken language identification. The audio files consist of utterances, each with a fixed duration of 10 seconds. The whole dataset is split into training and test sets. Preliminary results give an overall accuracy of 98%. Extensive and accurate testing shows an overall accuracy of 88%.
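The audio-to-spectrogram step mentioned in the description can be illustrated with a minimal sketch. This is not the authors' code: the record does not state the sample rate, window, frame length, or image scaling used, so the 16 kHz rate, the Hann window, the 512-sample frames with a 256-sample hop, and the helper names `log_spectrogram` and `to_image` are all assumptions chosen for illustration. A synthetic tone stands in for a 10-second utterance.

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=256, eps=1e-10):
    """Return a log-magnitude spectrogram of shape (n_freq_bins, n_frames).

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Zero-pad short clips so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Index matrix that slices the signal into overlapping frames,
    # giving an array of shape (n_frames, frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * window
    # The real FFT of each frame yields frame_len // 2 + 1 frequency bins.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps).T

def to_image(spec):
    """Scale a spectrogram to 8-bit grayscale pixel values in [0, 255]."""
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo) if hi > lo else np.zeros_like(spec)
    return np.round(255 * scaled).astype(np.uint8)

# Synthetic stand-in for a 10-second utterance: a 440 Hz tone at an
# assumed sample rate of 16 kHz.
sample_rate = 16000
t = np.arange(10 * sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 440.0 * t)
image = to_image(log_spectrogram(clip))
```

The resulting 2-D array has one row per frequency bin and one column per time frame, which is the kind of image-like input a CNN classifier can consume. A real pipeline would load recorded audio rather than a synthetic tone.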
url http://dx.doi.org/10.1155/2021/5123671