Error-Informed Likelihood Calculations for More Realistic Genetic Analyses
Other Authors: | |
---|---|
Format: | Others |
Language: | English |
Published: | Florida State University |
Subjects: | |
Online Access: | http://purl.flvc.org/fsu/fd/FSU_2015fall_Bricker_fsu_0071N_12977 |
Summary: | Next-generation sequencing can analyze entire genomes in just hours. However, the sequencing process introduces errors that limit the accuracy of the reads obtained. Modern sequencing technologies attach to each read a quality score, derived from the sequencing procedure, that represents our confidence in each nucleotide of the sequence. Currently, these quality scores are used as criteria for removing or modifying reads in the data set. These methods discard information contained in those sequences and rely on somewhat arbitrary parameters, which may lead to a biased sample and inaccurate analyses. I propose an alternative method that incorporates sequencing error without discarding poor-quality reads, by including the error probabilities of the reads in the likelihood calculations used for sequence analysis. Despite introducing variability, the error-informed likelihood method improved analyses compared with those that ignored the error altogether. While this method will likely yield less definite results than analyses in which the data were treated with a preprocessing technique, its results use all of the provided data and are more grounded in reality because they take into account the uncertainty in the sequenced samples. === A Thesis submitted to the Department of Scientific Computing in partial fulfillment of the requirements for the degree of Master of Science. === Fall Semester 2015. === November 6, 2015. === error, likelihood, ngs, sequencing === Includes bibliographical references. === Peter Beerli, Professor Directing Thesis; Anke Meyer-Baese, Committee Member; Alan Lemmon, Committee Member. |
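The summary's central idea, folding per-base error probabilities into the likelihood rather than filtering reads, can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes the standard Phred convention (error probability 10^(-Q/10)) and, as a simplifying assumption, that a miscalled base is equally likely to be any of the three other nucleotides. The function names and the `base_freqs` parameter are hypothetical.

```python
import math

BASES = "ACGT"


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def tip_likelihoods(observed_base, q):
    """P(observed base | true base) for each possible true base.

    Assumption (not from the source): an error is spread evenly over
    the three bases that differ from the true one.
    """
    e = phred_to_error_prob(q)
    return {b: (1.0 - e) if b == observed_base else e / 3.0 for b in BASES}


def site_log_likelihood(observed_base, q, base_freqs):
    """Log-likelihood of one observed base, summing over the unknown true base.

    base_freqs maps each base to its prior frequency (hypothetical input).
    """
    tip = tip_likelihoods(observed_base, q)
    return math.log(sum(base_freqs[b] * tip[b] for b in BASES))


if __name__ == "__main__":
    uniform = {b: 0.25 for b in BASES}
    # A low-quality call spreads more weight onto the alternative bases.
    print(tip_likelihoods("A", 10))
    print(tip_likelihoods("A", 40))
    print(site_log_likelihood("A", 10, uniform))
```

Under these assumptions, a low-quality base contributes a flatter likelihood vector than a high-quality one, so it still informs the analysis but with less weight, instead of being removed by a quality threshold.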