Spoken Language Identification Using Deep Learning
Spoken language identification (SLID) is the process of detecting the language of an audio clip from an unknown speaker, regardless of the speaker's gender, manner of speaking, or age. The main challenge is to identify features that distinguish between languages clearly and e...
Main Authors: | Gundeep Singh, Sahil Sharma, Vijay Kumar, Manjit Kaur, Mohammed Baz, Mehedi Masud |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2021-01-01 |
Series: | Computational Intelligence and Neuroscience |
Online Access: | http://dx.doi.org/10.1155/2021/5123671 |
id |
doaj-d605af44cdcf4ef1b3435770a82dff33 |
---|---|
record_format |
Article |
spelling |
doaj-d605af44cdcf4ef1b3435770a82dff33; 2021-10-04T01:57:28Z; eng; Hindawi Limited; Computational Intelligence and Neuroscience; 1687-5273; 2021-01-01; 2021; 10.1155/2021/5123671; Spoken Language Identification Using Deep Learning; Gundeep Singh (Computer Science and Engineering Department); Sahil Sharma (Computer Science and Engineering Department); Vijay Kumar (Computer Science and Engineering Department); Manjit Kaur (School of Engineering and Applied Sciences); Mohammed Baz (Department of Computer Engineering); Mehedi Masud (Department of Computer Science); http://dx.doi.org/10.1155/2021/5123671 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Gundeep Singh, Sahil Sharma, Vijay Kumar, Manjit Kaur, Mohammed Baz, Mehedi Masud |
spellingShingle |
Gundeep Singh, Sahil Sharma, Vijay Kumar, Manjit Kaur, Mohammed Baz, Mehedi Masud; Spoken Language Identification Using Deep Learning; Computational Intelligence and Neuroscience |
author_facet |
Gundeep Singh, Sahil Sharma, Vijay Kumar, Manjit Kaur, Mohammed Baz, Mehedi Masud |
author_sort |
Gundeep Singh |
title |
Spoken Language Identification Using Deep Learning |
title_short |
Spoken Language Identification Using Deep Learning |
title_full |
Spoken Language Identification Using Deep Learning |
title_fullStr |
Spoken Language Identification Using Deep Learning |
title_full_unstemmed |
Spoken Language Identification Using Deep Learning |
title_sort |
spoken language identification using deep learning |
publisher |
Hindawi Limited |
series |
Computational Intelligence and Neuroscience |
issn |
1687-5273 |
publishDate |
2021-01-01 |
description |
Spoken language identification (SLID) is the process of detecting the language of an audio clip from an unknown speaker, regardless of the speaker's gender, manner of speaking, or age. The main challenge is to identify features that distinguish between languages clearly and efficiently. The model converts audio files into spectrogram images and applies a convolutional neural network (CNN) to extract the main features used for classification. The main objective is to identify the language from among English, French, Spanish, German, Estonian, Tamil, Mandarin, Turkish, Chinese, Arabic, Hindi, Indonesian, Portuguese, Japanese, Latin, Dutch, Pushto, Romanian, Korean, Russian, Swedish, Thai, and Urdu. An experiment was conducted on audio files from the Kaggle dataset named spoken language identification. The audio files consist of utterances, each with a fixed duration of 10 seconds. The whole dataset is split into training and test sets. Preliminary results give an overall accuracy of 98%. Extensive and accurate testing shows an overall accuracy of 88%. |
url |
http://dx.doi.org/10.1155/2021/5123671 |
work_keys_str_mv |
AT gundeepsingh spokenlanguageidentificationusingdeeplearning AT sahilsharma spokenlanguageidentificationusingdeeplearning AT vijaykumar spokenlanguageidentificationusingdeeplearning AT manjitkaur spokenlanguageidentificationusingdeeplearning AT mohammedbaz spokenlanguageidentificationusingdeeplearning AT mehedimasud spokenlanguageidentificationusingdeeplearning |
_version_ |
1716844772844371968 |
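
The description above says that audio files are converted into spectrogram images before a CNN is applied. The record does not give the paper's preprocessing parameters, so the following is only a minimal sketch of what a log-magnitude spectrogram computation can look like, using the Python standard library. The function names `hann` and `log_spectrogram`, the frame length, the hop size, and the synthetic 1 kHz test tone are all illustrative assumptions, not the authors' settings.

```python
import cmath
import math


def hann(n):
    """Return an n-point symmetric Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def log_spectrogram(samples, frame_len=64, hop=32):
    """Return one list of log-magnitudes (bins 0..frame_len//2) per frame.

    frame_len and hop are assumed values chosen for illustration only.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    # Precompute the DFT basis once; a naive DFT is fine for small frames.
    basis = [
        [cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)]
        for k in range(n_bins)
    ]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        mags = [abs(sum(x * b for x, b in zip(frame, row))) for row in basis]
        # Small offset avoids log(0) on silent bins.
        frames.append([math.log(m + 1e-10) for m in mags])
    return frames


# Synthetic input (assumption): 0.1 s of a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]

spec = log_spectrogram(tone)
# Strongest frequency bin in the first frame, converted to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64
```

Each row of `spec` is one time frame and each column one frequency bin, so the nested list can be rendered as an image and passed to an image classifier. A practical pipeline would more likely rely on an FFT-based audio library than on the naive DFT used here, which is kept only to make the sketch self-contained.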