A scalable method for analysis and display of DNA sequences.

Comparative DNA sequence analysis provides insight into evolution and helps construct a natural classification reflecting the Tree of Life. The growing numbers of organisms represented in DNA databases challenge tree-building techniques and the vertical hierarchical classification may obscure relati...

Bibliographic Details
Main Authors: Lawrence Sirovich, Mark Y Stoeckle, Yu Zhang
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2009-10-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC2749217?pdf=render
id doaj-e5375cd57f4745758c2753cc037729ff
record_format Article
spelling doaj-e5375cd57f4745758c2753cc037729ff 2020-11-25T01:17:14Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2009-10-01 4 10 e7051 10.1371/journal.pone.0007051
collection DOAJ
language English
format Article
sources DOAJ
author Lawrence Sirovich
Mark Y Stoeckle
Yu Zhang
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2009-10-01
description Comparative DNA sequence analysis provides insight into evolution and helps construct a natural classification reflecting the Tree of Life. The growing numbers of organisms represented in DNA databases challenge tree-building techniques and the vertical hierarchical classification may obscure relationships among some groups. Approaches that can incorporate sequence data from large numbers of taxa and enable visualization of affinities across groups are desirable. Toward this end, we developed a procedure for extracting diagnostic patterns in the form of indicator vectors from DNA sequences of taxonomic groups. In the present instance the indicator vectors were derived from mitochondrial cytochrome c oxidase I (COI) sequences of those groups and further analyzed on this basis. In the first example, indicator vectors for birds, fish, and butterflies were constructed from a training set of COI sequences, then correlations with test sequences not used to construct the indicator vector were determined. In all cases, correlation with the indicator vector correctly assigned test sequences to their proper group. In the second example, this approach was explored at the species level within the bird grouping; this also gave correct assignment, suggesting the possibility of automated procedures for classification at various taxonomic levels. A false-color matrix of vector correlations displayed affinities among species consistent with higher-order taxonomy. The indicator vectors preserved DNA character information and provided quantitative measures of correlations among taxonomic groups. This method is scalable to the largest datasets envisioned in this field, provides a visually-intuitive display that captures relational affinities derived from sequence data across a diversity of life forms, and is potentially a useful complement to current tree-building techniques for studying evolutionary processes based on DNA sequence data.
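The abstract does not say how the indicator vectors are constructed, so the following Python sketch is only an illustration of the general idea (train on labeled sequences, correlate held-out sequences, tabulate vector correlations), not the authors' procedure. It assumes pre-aligned sequences of equal length, a one-hot encoding of A/C/G/T, and a simple centered group average as a stand-in for the indicator vector. All function names, group labels, and toy sequences are hypothetical and are not COI data.

```python
import math

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode an aligned DNA sequence as a flat 0/1 vector (four slots per site)."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == letter else 0.0 for letter in ALPHABET)
    return vec


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_indicators(training):
    """ASSUMPTION: a group's indicator is its mean one-hot vector, centered on
    the mean of all group means and scaled to unit length."""
    centroids = {g: mean_vector([one_hot(s) for s in seqs])
                 for g, seqs in training.items()}
    grand = mean_vector(list(centroids.values()))
    indicators = {g: normalize([c - m for c, m in zip(cen, grand)])
                  for g, cen in centroids.items()}
    return grand, indicators


def correlations(seq, grand, indicators):
    """Correlation (cosine after centering) of one sequence with each indicator."""
    centered = normalize([x - m for x, m in zip(one_hot(seq), grand)])
    return {g: dot(centered, v) for g, v in indicators.items()}


def assign(seq, grand, indicators):
    """Assign a sequence to the group whose indicator it correlates with most."""
    scores = correlations(seq, grand, indicators)
    return max(scores, key=scores.get)


def correlation_matrix(indicators):
    """Pairwise correlations among indicators: the numbers a false-color
    matrix display would map to colors."""
    groups = sorted(indicators)
    return groups, [[dot(indicators[a], indicators[b]) for b in groups]
                    for a in groups]


# Toy training set: made-up 10-base sequences, three hypothetical groups.
training = {
    "group_a": ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGAAC"],
    "group_b": ["TTGCATGCCA", "TTGCATGCCG", "TTGCTTGCCA"],
    "group_c": ["GGATCCATGG", "GGATCCATGA", "GGATCGATGG"],
}
grand, indicators = build_indicators(training)

# Test sequences that were not used to build the indicators.
for test_seq in ("ACGTACGTCC", "TTGCATGCCT", "GGATCCATGC"):
    print(test_seq, assign(test_seq, grand, indicators))

groups, matrix = correlation_matrix(indicators)
for name, row in zip(groups, matrix):
    print(name, [round(value, 2) for value in row])
```

Each step is a single pass over the data, which is the property that would let such a scheme grow with the number of sequences; the matrix returned by `correlation_matrix` could be handed to any heat-map routine to produce a false-color display.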