Error-Informed Likelihood Calculations for More Realistic Genetic Analyses

Next-generation sequencing can analyze entire genomes in just hours. However, the sequencing process introduces errors that limit the accuracy of the reads obtained. Fortunately, modern sequencing technologies attach a quality score to their reads, derived...

Bibliographic Details
Other Authors: Bricker, Justin (author)
Format: Others
Language: English
Published: Florida State University
Subjects: Bioinformatics; Evolution (Biology); Applied mathematics
Online Access: http://purl.flvc.org/fsu/fd/FSU_2015fall_Bricker_fsu_0071N_12977
id ndltd-fsu.edu-oai-fsu.digital.flvc.org-fsu_291266
record_format oai_dc
spelling ndltd-fsu.edu-oai-fsu.digital.flvc.org-fsu_291266 2019-07-01T04:59:22Z Error-Informed Likelihood Calculations for More Realistic Genetic Analyses
Bricker, Justin (author); Beerli, Peter (professor directing thesis); Meyer-Baese, Anke (committee member); Lemmon, Alan R. (committee member); Florida State University (degree granting institution); College of Arts and Sciences (degree granting college); Department of Scientific Computing (degree granting department)
Text; Florida State University; English (eng); 1 online resource (42 pages); computer; application/pdf
Next-generation sequencing can analyze entire genomes in just hours. However, the sequencing process introduces errors that limit the accuracy of the reads obtained. Fortunately, modern sequencing technologies attach quality scores to their reads; these scores are derived from the sequencing procedure and represent our confidence in each nucleotide in the sequence. Currently, quality scores are used as a criterion for removing or modifying reads in the data set. These preprocessing methods discard the information contained in the affected sequences and rely on somewhat arbitrary parameters, which may lead to a biased sample and inaccurate analyses. I propose an alternative that accounts for sequencing error without discarding poor-quality reads: the error probabilities of the reads are included in the likelihood calculations used for sequence analysis. Despite introducing variability, the error-informed likelihood method improved analyses compared with those that ignored the error altogether. Although this method will likely give less definite results than analyses of data treated with a preprocessing technique, its results use all of the provided data and are more grounded in reality, because they take into account the uncertainty in the sequenced samples.
A Thesis submitted to the Department of Scientific Computing in partial fulfillment of the requirements for the degree of Master of Science. === Fall Semester 2015. === November 6, 2015. === error, likelihood, ngs, sequencing === Includes bibliographical references. === Peter Beerli, Professor Directing Thesis; Anke Meyer-Baese, Committee Member; Alan Lemmon, Committee Member.
Bioinformatics; Evolution (Biology); Applied mathematics
FSU_2015fall_Bricker_fsu_0071N_12977
http://purl.flvc.org/fsu/fd/FSU_2015fall_Bricker_fsu_0071N_12977
http://diginole.lib.fsu.edu/islandora/object/fsu%3A291266/datastream/TN/view/Error-Informed%20Likelihood%20Calculations%20for%20More%20Realistic%20Genetic%20Analyses.jpg
collection NDLTD
language English
format Others
sources NDLTD
topic Bioinformatics
Evolution (Biology)
Applied mathematics
author2 Bricker, Justin (author)
title Error-Informed Likelihood Calculations for More Realistic Genetic Analyses
publisher Florida State University
url http://purl.flvc.org/fsu/fd/FSU_2015fall_Bricker_fsu_0071N_12977
_version_ 1719217491889094656
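
Illustrative note: the abstract in this record proposes including the error probabilities of the reads in the likelihood calculation instead of filtering reads by quality score. The record does not give the thesis's actual formulation, so the following is only a minimal sketch of one common way to do this. It assumes the standard Phred relation Q = -10 log10(p) between a quality score and an error probability, and it assumes that an erroneous call is equally likely to be any of the other three bases. The function names (phred_to_error_prob, tip_likelihoods, site_likelihood) are hypothetical and are not taken from the thesis.

```python
BASES = "ACGT"


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability p = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def tip_likelihoods(observed_base, q):
    """Return P(observed call | true base) for each true base in BASES.

    Assumption (not from the thesis): an error produces each of the three
    other bases with equal probability, so a mismatch has probability p / 3.
    """
    eps = phred_to_error_prob(q)
    return [1.0 - eps if b == observed_base else eps / 3.0 for b in BASES]


def site_likelihood(reads, base_freqs=(0.25, 0.25, 0.25, 0.25)):
    """Likelihood of all calls at one site, summed over the unknown true base.

    reads: list of (base, phred_quality) pairs, assumed independent given
    the true base. base_freqs: prior frequency of each base in BASES.
    """
    total = 0.0
    for i, freq in enumerate(base_freqs):
        # Probability of every observed call if the true base is BASES[i].
        prob_given_true = 1.0
        for base, q in reads:
            prob_given_true *= tip_likelihoods(base, q)[i]
        total += freq * prob_given_true
    return total


if __name__ == "__main__":
    # Two confident 'A' calls and one low-quality 'C' call at the same site.
    reads = [("A", 30), ("A", 30), ("C", 10)]
    print(tip_likelihoods("C", 10))
    print(site_likelihood(reads))
```

In this sketch a low-quality call is kept but contributes a flatter likelihood vector, so it carries less weight than a confident call rather than being discarded by a threshold.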