Understanding and Improving Identification of Somatic Variants

It is important to understand the entire spectrum of somatic variants to gain more insight into mutations that occur in different cancers for development of better diagnostic, prognostic and therapeutic tools. This thesis outlines our work in understanding somatic variant calling, improving the iden...


Bibliographic Details
Main Author: Vijayan, Vinaya
Other Authors: Animal and Poultry Sciences
Format: Others
Published: Virginia Tech 2016
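Subjects: Somatic variants; Somatic variant callers; Somatic point mutations; Short tandem repeat variation; Lung squamous cell carcinoma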
Online Access: http://hdl.handle.net/10919/72969
id ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-72969
record_format oai_dc
spelling ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-72969
2020-11-25T05:37:30Z
Understanding and Improving Identification of Somatic Variants
Vijayan, Vinaya
Animal and Poultry Sciences
Zhang, Liqing
Wu, Xiaowei
Heath, Lenwood S.
Franck, Christopher T.
Somatic variants
Somatic variant callers
Somatic point mutations
Short tandem repeat variation
Lung squamous cell carcinoma
Ph. D.
2016-09-21T08:00:51Z
2016-09-21T08:00:51Z
2016-09-20
Dissertation
vt_gsexam:8848
http://hdl.handle.net/10919/72969
In Copyright
http://rightsstatements.org/vocab/InC/1.0/
ETD
application/pdf
Virginia Tech
collection NDLTD
format Others
sources NDLTD
topic Somatic variants
Somatic variant callers
Somatic point mutations
Short tandem repeat variation
Lung squamous cell carcinoma
description It is important to understand the entire spectrum of somatic variants to gain more insight into mutations that occur in different cancers for development of better diagnostic, prognostic and therapeutic tools. This thesis outlines our work in understanding somatic variant calling, improving the identification of somatic variants from whole genome and whole exome platforms and identification of biomarkers for lung cancer. Integrating somatic variants from whole genome and whole exome platforms poses a challenge as variants identified in the exonic regions of the whole genome platform may not be identified on the whole exome platform and vice-versa. Taking a simple union or intersection of the somatic variants from both platforms would lead to inclusion of many false positives (through union) and exclusion of many true variants (through intersection). We develop the first framework to improve the identification of somatic variants on whole genome and exome platforms using a machine learning approach by combining the results from two popular somatic variant callers. Testing on simulated and real data sets shows that our framework identifies variants more accurately than using only one somatic variant caller or using variants from only one platform. Short tandem repeats (STRs) are repetitive units of 2-6 nucleotides. STRs make up approximately 1% of the human genome and have been traditionally used as genetic markers in population studies. We conduct a series of in silico analyses using the exome data of 32 individuals with lung cancer to identify 103 STRs that could potentially serve as cancer diagnostic markers and 624 STRs that could potentially serve as cancer predisposition markers. Overall these studies improve the accuracy in identification of somatic variants and highlight the association of STRs to lung cancer. === Ph. D.
author2 Animal and Poultry Sciences
author_facet Animal and Poultry Sciences
Vijayan, Vinaya
author Vijayan, Vinaya
author_sort Vijayan, Vinaya
title Understanding and Improving Identification of Somatic Variants
publisher Virginia Tech
publishDate 2016
url http://hdl.handle.net/10919/72969