Determining the Differences between and within Microbial Communities: A Computational Metagenomic Study

Bibliographic Details
Main Authors: Chien-Hao Su, 蘇建豪
Other Authors: 高成炎
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/25215136978951077684
Description
Summary: Doctoral dissertation, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, ROC academic year 101. Metagenomics enables the study of unculturable microorganisms in their original environments. Discriminating between the metagenomic compositions of diverse microbial communities is important but challenging. Each microbial community is usually represented by its taxonomic composition, so estimating that composition accurately is essential. We therefore propose a series of computational methods that use different mechanisms to discriminate the differences between and within distinct microbial communities. To discriminate the differences between distinct communities, we began by analyzing the strengths and limitations of three well-known distance functions for clustering samples. Their similar but distinguishable clustering accuracies motivated us to incorporate suitable normalizations and phylogenetic information into the distance functions. The results indicate that rank-based normalization combined with phylogenetic information significantly improves sample clustering, whether the samples come from real or synthetic microbiomes. Inspired by rank-based normalization, we further proposed MetaRank, which combines a series of statistical hypothesis tests with relative species abundance to reduce the noise introduced by sampling biases and obtain a better taxonomic estimate. We also found that existing methods discard a considerable proportion of low-similarity reads during the taxonomic assignment (binning) process. To overcome this limitation, we recovered the discarded reads by exploiting conserved gene adjacency. In addition, current binning tools do not apply data adjustment methods when assigning reads to their respective taxa and producing abundance profiles.
Hence, we developed a single platform that integrates several binning methods with data filters and normalization techniques to improve taxonomic assignment. While developing the platform, we observed that the choice of binning method itself is decisive in producing the species abundance profiles. We therefore proposed a novel method that integrates existing binning tools to obtain a better taxonomic estimate in metagenomic analysis. In conclusion, this study explores how several important factors influence the discrimination of differences between and within distinct microbial communities in metagenomic analysis. As sequencing data continue to accumulate, our approach can provide a clearer understanding of a wider range of microbial communities. The analyses presented in this thesis thus strengthen the use of metagenomics for understanding microbial communities.
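The summary does not specify the rank-based normalization or name the three distance functions. The idea of normalizing abundance profiles before computing sample distances can be illustrated with a minimal sketch under several assumptions. Here "rank-based normalization" is taken to mean replacing each taxon's abundance with its within-sample rank (ties averaged), and plain Euclidean distance stands in for the unnamed distance functions. The function names `rank_normalize`, `distance_matrix` and `nearest_neighbor` are hypothetical. The phylogenetic component described in the thesis is not modeled, and this is not the thesis's implementation.

```python
from math import sqrt

# Hypothetical sketch: "rank-based normalization" is ASSUMED here to mean
# replacing each taxon's abundance with its within-sample rank (ties averaged).
# The phylogenetic weighting described in the thesis is NOT modeled.


def rank_normalize(profile):
    """Return within-sample ranks (least abundant ranked lowest, ties averaged), divided by the number of taxa."""
    n = len(profile)
    order = sorted(range(n), key=lambda i: profile[i])  # taxon indices, least to most abundant
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of a run of tied abundances.
        while j + 1 < n and profile[order[j + 1]] == profile[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r / n for r in ranks]


def euclidean(a, b):
    """Plain Euclidean distance, an ASSUMED stand-in for the thesis's unnamed distance functions."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(profiles, normalize=rank_normalize, dist=euclidean):
    """Pairwise distances between samples after normalizing each abundance profile."""
    normed = {name: normalize(p) for name, p in profiles.items()}
    return {(a, b): dist(normed[a], normed[b]) for a in normed for b in normed}


def nearest_neighbor(dmat, name):
    """Return the other sample closest to `name`, a minimal building block for clustering."""
    others = [b for (a, b) in dmat if a == name and b != name]
    return min(others, key=lambda b: dmat[(name, b)])


if __name__ == "__main__":
    # Toy abundance profiles over the same four taxa (made-up numbers for illustration only).
    profiles = {
        "A": [500, 300, 150, 50],
        "A_biased": [900, 60, 30, 10],  # same taxon ordering as A, skewed toward the dominant taxon
        "B": [10, 50, 300, 640],
    }
    d = distance_matrix(profiles)
    print(d[("A", "A_biased")])  # 0.0: identical rank profiles
    print(nearest_neighbor(d, "A"))  # A_biased
```

Because only the ordering of taxa within each sample survives normalization, the toy profile `A_biased`, which over-represents the dominant taxon but keeps the same ordering, ends up at distance 0 from `A`. Insensitivity to such monotone distortions is one plausible reason a rank-based scheme can dampen sampling bias, at the cost of discarding magnitude information.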