Error-Informed Likelihood Calculations for More Realistic Genetic Analyses


Bibliographic Details
Other Authors: Bricker, Justin (author)
Format: Others
Language: English
Published: Florida State University
Subjects:
Online Access: http://purl.flvc.org/fsu/fd/FSU_2015fall_Bricker_fsu_0071N_12977
Description
Summary: Next-generation sequencing can analyze entire genomes in just hours. However, the sequencing process introduces errors that limit the accuracy of the reads obtained. Fortunately, modern sequencing technologies attach to their reads a quality score, derived from the sequencing procedure, that represents our confidence in each nucleotide in the sequence. Currently, these quality scores serve as criteria for removing or modifying reads in the data set. These methods discard information contained in those sequences and rely on somewhat arbitrary parameters, which may lead to a biased sample and inaccurate analyses. I propose an alternative method that accounts for sequencing error without discarding poor-quality reads: including the error probabilities of the reads in the likelihood calculations used for sequence analysis. Despite introducing variability, the error-informed likelihood method improved analyses compared with those that ignored the error altogether. Although this method will likely yield less definite results than analyses in which the data were treated with a preprocessing technique, its results use all of the provided data and are more grounded in reality, because they take into account the uncertainty in the sequenced samples.
A Thesis submitted to the Department of Scientific Computing in partial fulfillment of the requirements for the degree of Master of Science.
Fall Semester 2015.
November 6, 2015.
Keywords: error, likelihood, ngs, sequencing.
Includes bibliographical references.
Peter Beerli, Professor Directing Thesis; Anke Meyer-Baese, Committee Member; Alan Lemmon, Committee Member.
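The idea of carrying read error probabilities into the likelihood, as described in the summary, can be illustrated with a minimal sketch. This is not the thesis's implementation, which the record does not reproduce. It assumes that quality scores follow the standard Phred convention (Q = -10 log10 of the error probability) and, as a simplifying assumption, that a miscalled base is equally likely to be any of the other three nucleotides. The function names `phred_to_error_prob` and `tip_likelihoods` are hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability e = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def tip_likelihoods(observed_base, q):
    """Return P(observed base | true base) for each true base in A, C, G, T.

    Assumption (not from the source): when a base is miscalled, the error
    is spread evenly over the three other nucleotides.
    """
    e = phred_to_error_prob(q)
    return [
        1.0 - e if true_base == observed_base else e / 3.0
        for true_base in NUCLEOTIDES
    ]


# An error-ignoring analysis would use [1, 0, 0, 0] for an observed "A".
# Here a low-quality call (Q = 10) keeps some weight on the other bases,
# while a high-quality call (Q = 40) stays close to the error-free vector.
low_quality = tip_likelihoods("A", 10)
high_quality = tip_likelihoods("A", 40)
```

In a likelihood-based analysis, vectors like these would stand in for the usual 0/1 indicators at the observed data, so that poor-quality reads contribute less certainty instead of being discarded.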