Genome Graphs

Whole-genome shotgun sequencing is an experimental technique used for obtaining information about a genome’s sequence, whereby it is broken up into many short (possibly overlapping) segments whose sequence is then determined. A long-standing use of sequencing is in genome assembly – the problem of d...

Bibliographic Details
Main Author: Medvedev, Paul
Other Authors: Brudno, Michael
Language: en_ca
Published: 2010
Subjects: bioinformatics
Online Access: http://hdl.handle.net/1807/26297
id ndltd-TORONTO-oai-tspace.library.utoronto.ca-1807-26297
record_format oai_dc
spelling ndltd-TORONTO-oai-tspace.library.utoronto.ca-1807-26297
2013-04-19T19:55:10Z
Genome Graphs
Medvedev, Paul
bioinformatics
0984
Whole-genome shotgun sequencing is an experimental technique used for obtaining information about a genome’s sequence, whereby it is broken up into many short (possibly overlapping) segments whose sequence is then determined. A long-standing use of sequencing is in genome assembly – the problem of determining the sequence of an unknown genome, which plays a central role in the sequencing of novel species. However, even within the same species, the genomes of two individuals differ, and though these variations are relatively small, they account for the observed variation in phenotypes. A large portion of these variations are copy number variants (CNVs), genomic segments that appear a different number of times in different individuals. The unifying theme of this thesis is the use of genome graphs for both CNV detection and genome assembly problems. Genome graphs, which have already been successfully used for alignment and assembly, capture the structure of a genome even when its sequence is not fully known, as is the case with sequencing data. In this thesis, we extend their uses in several ways, culminating in a method for CNV detection that is based on a novel genome graph model. First, we demonstrate how the double-stranded nature of DNA can be efficiently incorporated into genome graphs by using the technique of bidirected network flow. Furthermore, we show how genome graphs can be efficiently used for finding solutions that maximize the likelihood of the data, as opposed to the usual maximum parsimony approach. Finally, we show how genome graphs can be useful for CNV detection through a novel construction called the donor graph. These extensions are combined into a method for detecting CNVs, which we use on a Yoruban human individual, showing a high degree of accuracy and improvement over previous methods.
Brudno, Michael
Borodin, Allan
2010-11
2011-02-18T16:12:35Z
NO_RESTRICTION
2011-02-18T16:12:35Z
2011-02-18T16:12:35Z
Thesis
http://hdl.handle.net/1807/26297
en_ca
collection NDLTD
language en_ca
sources NDLTD
topic bioinformatics
0984
spellingShingle bioinformatics
0984
Medvedev, Paul
Genome Graphs
description Whole-genome shotgun sequencing is an experimental technique used for obtaining information about a genome’s sequence, whereby it is broken up into many short (possibly overlapping) segments whose sequence is then determined. A long-standing use of sequencing is in genome assembly – the problem of determining the sequence of an unknown genome, which plays a central role in the sequencing of novel species. However, even within the same species, the genomes of two individuals differ, and though these variations are relatively small, they account for the observed variation in phenotypes. A large portion of these variations are copy number variants (CNVs), genomic segments that appear a different number of times in different individuals. The unifying theme of this thesis is the use of genome graphs for both CNV detection and genome assembly problems. Genome graphs, which have already been successfully used for alignment and assembly, capture the structure of a genome even when its sequence is not fully known, as is the case with sequencing data. In this thesis, we extend their uses in several ways, culminating in a method for CNV detection that is based on a novel genome graph model. First, we demonstrate how the double-stranded nature of DNA can be efficiently incorporated into genome graphs by using the technique of bidirected network flow. Furthermore, we show how genome graphs can be efficiently used for finding solutions that maximize the likelihood of the data, as opposed to the usual maximum parsimony approach. Finally, we show how genome graphs can be useful for CNV detection through a novel construction called the donor graph. These extensions are combined into a method for detecting CNVs, which we use on a Yoruban human individual, showing a high degree of accuracy and improvement over previous methods.
author2 Brudno, Michael
author_facet Brudno, Michael
Medvedev, Paul
author Medvedev, Paul
author_sort Medvedev, Paul
title Genome Graphs
title_short Genome Graphs
title_full Genome Graphs
title_fullStr Genome Graphs
title_full_unstemmed Genome Graphs
title_sort genome graphs
publishDate 2010
url http://hdl.handle.net/1807/26297
work_keys_str_mv AT medvedevpaul genomegraphs
_version_ 1716581776078405633