Bioinformatics methods for identification of amyloidogenic peptides show robustness to misannotated training data

Abstract Several disorders are related to amyloid aggregation of proteins, for example Alzheimer’s or Parkinson’s diseases. Amyloid proteins form fibrils of aggregated beta structures. This is preceded by formation of oligomers—the most cytotoxic species. Determining amyloidogenicity is tedious and...

Bibliographic Details
Main Authors:	Natalia Szulc, Michał Burdukiewicz, Marlena Gąsior-Głogowska, Jakub W. Wojciechowski, Jarosław Chilimoniuk, Paweł Mackiewicz, Tomas Šneideris, Vytautas Smirnovas, Malgorzata Kotulska
Format:	Article
Language:	English
Published:	Nature Publishing Group 2021-04-01
Series:	Scientific Reports
Online Access:	https://doi.org/10.1038/s41598-021-86530-6

id	doaj-1b581be9c50f488f8323d496d600981b
record_format	Article
spelling	doaj-1b581be9c50f488f8323d496d600981b; 2021-05-02T11:37:12Z; eng; Nature Publishing Group; Scientific Reports; 2045-2322; 2021-04-01; 111111; 10.1038/s41598-021-86530-6; Bioinformatics methods for identification of amyloidogenic peptides show robustness to misannotated training data; Natalia Szulc [0]; Michał Burdukiewicz [1]; Marlena Gąsior-Głogowska [2]; Jakub W. Wojciechowski [3]; Jarosław Chilimoniuk [4]; Paweł Mackiewicz [5]; Tomas Šneideris [6]; Vytautas Smirnovas [7]; Malgorzata Kotulska [8]; [0] Department of Biomedical Engineering, Wroclaw University of Science and Technology; [1] Medical University of Bialystok; [2] Department of Biomedical Engineering, Wroclaw University of Science and Technology; [3] Department of Biomedical Engineering, Wroclaw University of Science and Technology; [4] Faculty of Biotechnology, University of Wroclaw; [5] Faculty of Biotechnology, University of Wroclaw; [6] Life Sciences Center, Institute of Biotechnology, Vilnius University; [7] Life Sciences Center, Institute of Biotechnology, Vilnius University; [8] Department of Biomedical Engineering, Wroclaw University of Science and Technology
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Natalia Szulc Michał Burdukiewicz Marlena Gąsior-Głogowska Jakub W. Wojciechowski Jarosław Chilimoniuk Paweł Mackiewicz Tomas Šneideris Vytautas Smirnovas Malgorzata Kotulska
title	Bioinformatics methods for identification of amyloidogenic peptides show robustness to misannotated training data
publisher	Nature Publishing Group
series	Scientific Reports
issn	2045-2322
publishDate	2021-04-01
description	Abstract Several disorders are related to amyloid aggregation of proteins, for example Alzheimer’s or Parkinson’s diseases. Amyloid proteins form fibrils of aggregated beta structures. This is preceded by formation of oligomers—the most cytotoxic species. Determining amyloidogenicity is tedious and costly. The most reliable identification of amyloids is obtained with high resolution microscopies, such as electron microscopy or atomic force microscopy (AFM). More frequently, less expensive and faster methods are used, especially infrared (IR) spectroscopy or Thioflavin T staining. Different experimental methods are not always concurrent, especially when amyloid peptides do not readily form fibrils but oligomers. This may lead to peptide misclassification and mislabeling. Several bioinformatics methods have been proposed for in-silico identification of amyloids, many of them based on machine learning. The effectiveness of these methods heavily depends on accurate annotation of the reference training data obtained from in-vitro experiments. We study how robust are bioinformatics methods to weak supervision, encountering imperfect training data. AmyloGram and three other amyloid predictors were applied. The results proved that a certain degree of misannotation in the reference data can be eliminated by the bioinformatics tools, even if they belonged to their training set. The computational results are supported by new experiments with IR and AFM methods.
url	https://doi.org/10.1038/s41598-021-86530-6

Similar Items