Determining the Differences between and within Microbial Communities: A Computational Metagenomic Study

Bibliographic Details
Main Authors: Chien-Hao Su, 蘇建豪
Other Authors: 高成炎
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/25215136978951077684
id ndltd-TW-101NTU05392129
record_format oai_dc
spelling ndltd-TW-101NTU05392129 2015-10-13T23:10:18Z http://ndltd.ncl.edu.tw/handle/25215136978951077684 Determining the Differences between and within Microbial Communities: A Computational Metagenomic Study 以多源基因體計算法檢測微生物群落間與群落內之差異 Chien-Hao Su 蘇建豪 Doctoral === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 101 高成炎 2013 thesis 121 en_US
collection NDLTD
sources NDLTD
description Doctoral === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 101 === Metagenomics enables the study of unculturable microorganisms in their original environments. Discriminating the composition of metagenomes from diverse microbial communities is important and challenging. Each microbial community is usually represented by its taxonomic composition, so it is essential to estimate that composition accurately. We therefore propose a series of computational methods that use different mechanisms to discriminate the differences between and within distinct microbial communities. To discriminate the differences between distinct communities, we started by analyzing the strengths and limitations of three well-known distance functions in the clustering of samples. Their similar but distinguishable performance in clustering accuracy motivated us to incorporate suitable normalizations and phylogenetic information into the distance functions. The results indicate significant improvement in sample clustering over that derived by rank-based normalization with phylogenetic information, regardless of whether the samples are from real or synthetic microbiomes. Inspired by the rank-based normalization, we further proposed MetaRank, which employs a series of statistical hypothesis tests and the relative species abundance to reduce the noise from sampling biases and arrive at a better taxonomic estimation. We also found that existing methods discard a considerable proportion of low-similarity reads during the taxonomic assignment (binning) process. To overcome this limitation, we retrieved the discarded reads by using a conserved gene adjacency mechanism. In addition, current binning tools do not incorporate data adjustment methods when assigning reads to their respective taxa and producing abundance profiles. Hence, we developed a single platform that integrates several binning methods with data filters and normalization techniques to improve the taxonomic assignment. During the development of the platform, we observed that the binning method itself is decisive in producing the species abundance profiles. We thus proposed a novel method that integrates existing binning tools to obtain a better taxonomic estimation in metagenomic analysis. In conclusion, this study explores the influence of several important factors on discriminating the differences between and within distinct microbial communities in metagenomic analysis. As data from sequencing technology accumulate, our study can provide a clearer understanding of more microbial communities. Thus, the analyses presented in this thesis strengthen our understanding of microbial communities through metagenomics.