Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

Next-generation sequencing techniques empower the selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical ste...

Bibliographic Details
Main Authors: Wadim L. Matochko, Ratmir Derda
Format: Article
Language: English
Published: Hindawi Limited 2013-01-01
Series: Computational and Mathematical Methods in Medicine
Online Access: http://dx.doi.org/10.1155/2013/491612
id doaj-2647a00141164467bdbf22f3296cf910
record_format Article
spelling doaj-2647a00141164467bdbf22f3296cf910; 2020-11-24T23:49:24Z; eng; Hindawi Limited; Computational and Mathematical Methods in Medicine; 1748-670X; 1748-6718; 2013-01-01; 2013; 10.1155/2013/491612; 491612; Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing; Wadim L. Matochko (0); Ratmir Derda (1); (0) Department of Chemistry and Alberta Glycomics Centre, University of Alberta, Edmonton, AB, T6G 2G2, Canada; (1) Department of Chemistry and Alberta Glycomics Centre, University of Alberta, Edmonton, AB, T6G 2G2, Canada; http://dx.doi.org/10.1155/2013/491612
collection DOAJ
language English
format Article
sources DOAJ
author Wadim L. Matochko
Ratmir Derda
title Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing
publisher Hindawi Limited
series Computational and Mathematical Methods in Medicine
issn 1748-670X
1748-6718
publishDate 2013-01-01
description Next-generation sequencing techniques empower the selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe the library using a convenient linear vector and operator framework. We describe a phage library as an N×1 frequency vector n = (n_i), where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on n. Selection, amplification, or sequencing can be described as a product of an N×N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias or errors is Seq = Sa I_N, where I_N is the N×N identity (unity) matrix. Any bias in sequencing changes I_N to a nonidentity matrix. We identified a diagonal censorship matrix (CEN), which describes the elimination, or statistically significant downsampling, of specific reads during the sequencing process.
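The operator framework in the description can be illustrated with a small sketch. It is not the authors' implementation. It assumes that Sa keeps each copy of a clone independently with a fixed probability p (a binomial model chosen here for illustration) and that the biased sequencing operator has the form Seq = Sa CEN. The names sample_diagonal, apply_diagonal, sequence, p, and cen_diag are hypothetical. Diagonal matrices are stored as lists of their diagonal entries.

```python
import random


def sample_diagonal(n, p, rng):
    """Diagonal of a hypothetical sampling operator Sa (assumed model).

    Each copy of clone i is kept independently with probability p, so the
    i-th diagonal entry is (copies kept) / n[i], or 0.0 when n[i] == 0.
    """
    diag = []
    for copies in n:
        kept = sum(1 for _ in range(copies) if rng.random() < p)
        diag.append(kept / copies if copies else 0.0)
    return diag


def apply_diagonal(diag, n):
    """Multiply a diagonal matrix, given by its diagonal, by the vector n."""
    return [d * x for d, x in zip(diag, n)]


def sequence(n, p, cen_diag, rng):
    """Apply Seq = Sa CEN to the copy-number vector n.

    cen_diag is the diagonal of CEN: 1.0 leaves a clone untouched, 0.0
    eliminates it, and values in between downsample it. With cen_diag all
    ones, CEN is the identity and this reduces to Seq = Sa I_N.
    """
    # CEN acts first; round because copy numbers are whole counts.
    censored = [round(x) for x in apply_diagonal(cen_diag, n)]
    sa_diag = sample_diagonal(censored, p, rng)
    return [round(x) for x in apply_diagonal(sa_diag, censored)]


# Toy library with N = 5 possible sequences (illustrative numbers only).
library = [120, 45, 300, 0, 7]
identity = [1.0] * len(library)
censorship = [1.0, 1.0, 0.0, 1.0, 0.5]  # clone 2 is censored entirely

rng = random.Random(0)
unbiased_reads = sequence(library, 0.1, identity, rng)
biased_reads = sequence(library, 0.1, censorship, rng)
print("unbiased:", unbiased_reads)
print("censored:", biased_reads)
```

In this sketch a censored clone yields zero reads no matter how abundant it is in the library, while every other clone is still subject to ordinary sampling noise from Sa.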
url http://dx.doi.org/10.1155/2013/491612