Literature on Applied Machine Learning in Metagenomic Classification: A Scoping Review

Applied machine learning in bioinformatics is growing as computer science slowly invades all research spheres. With the arrival of modern next-generation DNA sequencing algorithms, metagenomics is becoming an increasingly interesting research field as it finds countless practical applications exploi...

Bibliographic Details
Main Authors: Petar Tonkovic, Slobodan Kalajdziski, Eftim Zdravevski, Petre Lameski, Roberto Corizzo, Ivan Miguel Pires, Nuno M. Garcia, Tatjana Loncar-Turukalo, Vladimir Trajkovik
Format: Article
Language: English
Published: MDPI AG 2020-12-01
Series: Biology
Subjects: metagenomics; scoping review; classification; data preprocessing
Online Access: https://www.mdpi.com/2079-7737/9/12/453
id doaj-8d9540fbb55f4bf0b9eb128ecebc5230
record_format Article
spelling doaj-8d9540fbb55f4bf0b9eb128ecebc5230 2020-12-10T00:00:52Z eng MDPI AG Biology 2079-7737 2020-12-01 9 453 453 10.3390/biology9120453
Literature on Applied Machine Learning in Metagenomic Classification: A Scoping Review
Petar Tonkovic, Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, 1000 Skopje, Macedonia
Slobodan Kalajdziski, Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, 1000 Skopje, Macedonia
Eftim Zdravevski, Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, 1000 Skopje, Macedonia
Petre Lameski, Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, 1000 Skopje, Macedonia
Roberto Corizzo, Department of Computer Science, American University, Washington, DC 20016, USA
Ivan Miguel Pires, Instituto de Telecomunicações, Universidade da Beira Interior, 6200-001 Covilhã, Portugal
Nuno M. Garcia, Instituto de Telecomunicações, Universidade da Beira Interior, 6200-001 Covilhã, Portugal
Tatjana Loncar-Turukalo, Faculty of Technical Sciences, University of Novi Sad, 21102 Novi Sad, Serbia
Vladimir Trajkovik, Faculty of Computer Science and Engineering, Saints Cyril and Methodius University, 1000 Skopje, Macedonia
collection DOAJ
language English
format Article
sources DOAJ
author Petar Tonkovic
Slobodan Kalajdziski
Eftim Zdravevski
Petre Lameski
Roberto Corizzo
Ivan Miguel Pires
Nuno M. Garcia
Tatjana Loncar-Turukalo
Vladimir Trajkovik
title Literature on Applied Machine Learning in Metagenomic Classification: A Scoping Review
publisher MDPI AG
series Biology
issn 2079-7737
publishDate 2020-12-01
description Applied machine learning in bioinformatics is growing as computer science slowly invades all research spheres. With the arrival of modern next-generation DNA sequencing algorithms, metagenomics is becoming an increasingly interesting research field as it finds countless practical applications exploiting the vast amounts of generated data. This study aims to scope the scientific literature in the field of metagenomic classification in the time interval 2008–2019 and provide an evolutionary timeline of data processing and machine learning in this field. This study follows the scoping review methodology and PRISMA guidelines to identify and process the available literature. Natural Language Processing (NLP) is deployed to ensure efficient and exhaustive search of the literary corpus of three large digital libraries: IEEE, PubMed, and Springer. The search is based on keywords and properties looked up using the digital libraries’ search engines. The scoping review results reveal an increasing number of research papers related to metagenomic classification over the past decade. The research is mainly focused on metagenomic classifiers, identifying scope specific metrics for model evaluation, data set sanitization, and dimensionality reduction. Out of all of these subproblems, data preprocessing is the least researched with considerable potential for improvement.
topic metagenomics
scoping review
classification
data preprocessing
url https://www.mdpi.com/2079-7737/9/12/453