Read quality-based trimming of the distal ends of public fungal DNA sequences is nowhere near satisfactory

Bibliographic Details
Main Authors: R. Henrik Nilsson, Marisol Sánchez-García, Martin K. Ryberg, Kessy Abarenkov, Christian Wurzbacher, Erik Kristiansson
Format: Article
Language: English
Published: Pensoft Publishers 2017-08-01
Series: MycoKeys
Online Access: https://mycokeys.pensoft.net/articles.php?id=14591
id doaj-5afb866d3f5e4f3fb40e241c637e5a30
record_format Article
spelling doaj-5afb866d3f5e4f3fb40e241c637e5a30; 2020-11-24T23:14:26Z; eng; Pensoft Publishers; MycoKeys; 1314-4057; 1314-4049; 2017-08-01; 26; 13; 24; 10.3897/mycokeys.26.14591; 14591
Read quality-based trimming of the distal ends of public fungal DNA sequences is nowhere near satisfactory
R. Henrik Nilsson (University of Gothenburg)
Marisol Sánchez-García (Clark University)
Martin K. Ryberg (Uppsala University)
Kessy Abarenkov (University of Tartu, Natural History Museum)
Christian Wurzbacher (University of Gothenburg)
Erik Kristiansson (Chalmers University of Technology)
collection DOAJ
language English
format Article
sources DOAJ
author R. Henrik Nilsson
Marisol Sánchez-García
Martin K. Ryberg
Kessy Abarenkov
Christian Wurzbacher
Erik Kristiansson
author_sort R. Henrik Nilsson
title Read quality-based trimming of the distal ends of public fungal DNA sequences is nowhere near satisfactory
publisher Pensoft Publishers
series MycoKeys
issn 1314-4057
1314-4049
publishDate 2017-08-01
description DNA sequences are increasingly used for taxonomic and functional assessment of environmental communities. In mycology, the nuclear ribosomal internal transcribed spacer (ITS) region is the most commonly chosen marker for such pursuits. Molecular identification is associated with many challenges, one of which is low read quality of the reference sequences used for inference of taxonomic and functional properties of the newly sequenced community (or single taxon). This study investigates whether public fungal ITS sequences are subjected to sufficient trimming in their distal (5’ and 3’) ends prior to deposition in the public repositories. We examined 86 species (and 10,584 sequences) across the fungal tree of life, and we found that on average 13.1% of the sequences were poorly trimmed in one or both of their 5’ and 3’ ends. Deposition of poorly trimmed entries was found to continue through 2016. Poorly trimmed reference sequences add noise and mask biological signal in sequence similarity searches and phylogenetic analyses, and we provide a set of recommendations on how to manage the sequence trimming problem.
url https://mycokeys.pensoft.net/articles.php?id=14591
_version_ 1725594431161106432