Analysis and optimization of bulk DNA sampling with binary scoring for germplasm characterization.

Bibliographic Details
Main Authors: M Humberto Reyes-Valdés, Amalio Santacruz-Varela, Octavio Martínez, June Simpson, Corina Hayano-Kanashiro, Celso Cortés-Romero
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2013-01-01
Series: PLoS ONE
Online Access:http://europepmc.org/articles/PMC3833943?pdf=render
ISSN: 1932-6203
Volume: 8, Issue: 11, Article: e79936
DOI: 10.1371/journal.pone.0079936

Description
The strategy of bulk DNA sampling has been a valuable method for studying large numbers of individuals through genetic markers. The application of this strategy for discrimination among germplasm sources was analyzed through information theory, considering the case of polymorphic alleles scored binarily for their presence or absence in DNA pools. We defined the informativeness of a set of marker loci in bulks as the mutual information between genotype and population identity, composed of two terms: diversity and noise. The first term is the entropy of bulk genotypes, whereas the noise term is measured through the conditional entropy of bulk genotypes given germplasm sources. Thus, optimizing marker information implies increasing diversity and reducing noise. Simple formulas were devised to estimate marker information per allele from a set of estimated allele frequencies across populations. As an example, they were used to optimize bulk size for SSR genotyping in maize, based on allele frequencies estimated in a sample of 56 maize populations. It was found that a sample of 30 plants from a random-mating population is adequate for maize germplasm SSR characterization. We analyzed the use of divided bulks to overcome the allele dilution problem in DNA pools, and concluded that samples of 30 plants divided into three bulks of 10 plants are efficient for characterizing maize germplasm sources through SSRs while keeping the dilution problem under good control. We estimated the informativeness of 30 SSR loci from the estimated allele frequencies in maize populations, and found wide variation in marker informativeness, which was positively correlated with the number of alleles per locus.
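The description above defines informativeness as diversity (entropy of bulk genotypes) minus noise (conditional entropy of bulk genotypes given the germplasm source). A minimal sketch of that decomposition for a single binary-scored allele follows. It is not the paper's own formulas: it assumes diploid random-mating populations, equal prior probability for every population, that an allele is scored present whenever at least one of the 2n gene copies in a bulk of n plants carries it (no dilution threshold), and that divided bulks are sampled independently. The function names `entropy`, `presence_prob` and `allele_information` and the toy frequencies are hypothetical.

```python
"""Per-allele information of binary bulk scores (illustrative sketch).

Assumptions (ours, not necessarily the paper's exact formulas):
  * diploid, random-mating populations in Hardy-Weinberg proportions;
  * all populations equally likely a priori;
  * an allele is scored present in a bulk of n plants whenever at least
    one of its 2n sampled gene copies carries it (no dilution threshold);
  * divided bulks are sampled independently given the population.
"""
from itertools import product
from math import log2, prod


def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)


def presence_prob(p, n):
    """P(allele scored present | frequency p, bulk of n diploid plants)."""
    return 1.0 - (1.0 - p) ** (2 * n)


def allele_information(freqs, n, bulks=1):
    """Return (diversity, noise, information) in bits for one allele.

    freqs: frequency of the allele in each population.
    n: plants per bulk; bulks: number of bulks scored per population.
    The bulk genotype is the vector of presence/absence scores.
    """
    k = len(freqs)
    qs = [presence_prob(p, n) for p in freqs]
    patterns = list(product((0, 1), repeat=bulks))
    # P(score vector | population); bulks are independent given the population.
    cond = [[prod(q if x else 1.0 - q for x in pat) for pat in patterns]
            for q in qs]
    # Marginal distribution of the score vector under equal priors.
    marginal = [sum(dist[i] for dist in cond) / k for i in range(len(patterns))]
    diversity = entropy(marginal)                # H(genotype)
    noise = sum(entropy(d) for d in cond) / k    # H(genotype | population)
    return diversity, noise, diversity - noise


if __name__ == "__main__":
    toy = [0.02, 0.05, 0.10, 0.30]  # hypothetical frequencies in 4 populations
    for n in (1, 5, 10, 20, 30, 50):
        print(n, round(allele_information(toy, n)[2], 3))
    # One bulk of 30 plants versus three bulks of 10 plants.
    print(allele_information(toy, 30)[2], allele_information(toy, 10, bulks=3)[2])
```

Under these assumptions, an allele present in every population carries almost no information in very small bulks (it is rarely detected anywhere) or very large ones (it is detected everywhere, so diversity collapses), which is the trade-off a bulk-size optimization balances. In this idealized model, three bulks of 10 can never carry less information than one bulk of 30, because the single-bulk score is the logical OR of the three sub-bulk scores. Allele dilution, the paper's reason for dividing bulks, is not modeled here.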