The Effect of Sequencing Coverage on Mining Simple Sequence Repeats by Simulation


Bibliographic Details
Main Authors: Ying-Tsui Wang, 王瀅翠
Other Authors: Li-Yu Liu
Format: Others
Language: en_US
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/66611629874821750641
Description
Summary: Master's thesis, National Taiwan University, Graduate Institute of Agronomy, ROC academic year 100. Microsatellites, or simple sequence repeats (SSRs), are tandem repeats of 1- to 6-nucleotide motifs distributed across genomes. Because of their genomic abundance and high level of polymorphism, SSRs are widely used as molecular markers in a variety of studies. In recent years, rapidly developing next-generation sequencing technology (NGST) has changed the way SSRs are mined: NGST not only offers higher speed and lower cost but also provides opportunities to discover novel SSRs. In a pilot study, however, the budget may be limited, so that only low-coverage sequencing of the genome of interest is affordable, and the constraint is more severe when the genome is large. In this study, we used simulation to investigate the relationship between the number of mined SSRs and sequencing depth at low coverage for a genome whose sequence is not yet available. The simulation had two parts. First, we separated the whole rice genome to establish three databases. Second, we simulated a genome of approximately comparable complexity by recombining known rice genome subsequences. We then used 454sim to simulate 454 sequencing results at different coverage levels and mined SSRs from each simulated data set. The results showed that the number of mined SSRs increased with sequencing depth. More importantly, this procedure provides a means to estimate the number of minable SSRs without a whole-genome sequence, and hence helps to set a sequencing budget in advance.
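To make the coverage question concrete, the following is a minimal Python sketch of the general idea, not the thesis's pipeline. It assumes perfect repeats only, hypothetical per-motif repeat thresholds (MIN_REPEATS), a toy random genome with planted SSRs in place of recombined rice subsequences, and error-free fixed-length reads with uniform start positions in place of 454sim's 454 error model. An SSR counts as minable only when a single read spans it completely. All function names are hypothetical. expected_fraction is the standard Poisson approximation from Lander-Waterman-style coverage arguments, included only as a rough point of comparison.

```python
import bisect
import math
import random
import re

# Hypothetical minimum repeat counts per motif length (illustrative only;
# SSR-mining tools each define their own thresholds).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound(motif):
    """True if motif is itself a repeat of a shorter unit, e.g. 'AA' or 'ATAT'."""
    k = len(motif)
    return any(k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k))


def find_ssrs(seq):
    """Return sorted (start, end, motif) tuples for perfect 1- to 6-nt SSRs."""
    hits = []
    for k, n in MIN_REPEATS.items():
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            # Compound motifs are picked up under the shorter unit instead.
            if not is_compound(m.group(1)):
                hits.append((m.start(), m.end(), m.group(1)))
    return sorted(hits)


def random_genome(length, planted, rng):
    """Toy genome: uniform A/C/G/T background with SSR strings written in at
    evenly spaced positions (a stand-in, not a recombined rice genome)."""
    seq = [rng.choice("ACGT") for _ in range(length)]
    step = length // (len(planted) + 1)
    for i, ssr in enumerate(planted, start=1):
        seq[i * step:i * step + len(ssr)] = ssr
    return "".join(seq)


def minable_ssrs(genome_len, ssrs, coverage, read_len, rng):
    """Count SSR loci lying entirely inside at least one simulated read.

    Reads are error-free and fixed-length with uniform random starts;
    there is no 454 read-length or homopolymer error model here."""
    n_reads = int(coverage * genome_len / read_len)
    starts = sorted(rng.randrange(genome_len - read_len + 1) for _ in range(n_reads))
    count = 0
    for s, e, _ in ssrs:
        # A read starting at r spans [s, e) iff e - read_len <= r <= s.
        i = bisect.bisect_left(starts, e - read_len)
        if i < len(starts) and starts[i] <= s:
            count += 1
    return count


def expected_fraction(coverage, read_len, ssr_len):
    """Poisson approximation of P(some read fully spans a locus of ssr_len bp)."""
    return 1.0 - math.exp(-coverage * (read_len - ssr_len + 1) / read_len)


if __name__ == "__main__":
    rng = random.Random(42)
    motifs = ["AC", "AG", "AAT", "AGC", "AATG", "ACGTT", "AACCGT"]
    planted = [motifs[i % len(motifs)] * 8 for i in range(100)]
    genome = random_genome(200_000, planted, rng)
    ssrs = find_ssrs(genome)
    for cov in (0.1, 0.25, 0.5, 1.0, 2.0):
        found = minable_ssrs(len(genome), ssrs, cov, 400, rng)
        print(f"{cov:>4}x coverage: {found}/{len(ssrs)} SSRs minable, "
              f"~{expected_fraction(cov, 400, 24):.2f} expected for a 24-bp SSR")
```

Running the script prints, for each coverage level, how many detected SSR loci are spanned by at least one read. Under these assumptions the count tends to rise with coverage, in the same direction as the trend reported in the summary, but the actual numbers depend on genome composition, read length, and the chosen thresholds.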