Summary: | Master's === National Tsing Hua University === Department of Electrical Engineering === 93 === In bioinformatics, one of the challenging issues is to determine the specific structure of each
gene from the 3 billion base pairs of human DNA sequence. The polyadenylation site is a
specific feature at the terminus of a gene, where the pre-mRNA undergoes endonucleolytic cleavage
followed by the addition of a poly(A) tail, which is found at the 3’ terminus of
the majority of mRNAs.
The factors involved in cleavage and polyadenylation must recognize the associated signals, i.e.,
the polyadenylation signal (PAS) and the downstream element (DSE). The PAS appears
10 to 30 nucleotides upstream of the cleavage and polyadenylation site and consists of the highly
conserved hexamer AAUAAA or its common variant AUUAAA in pre-mRNAs. The DSE lies
20 to 40 nucleotides downstream of the cleavage and polyadenylation site and consists of a
much less conserved U- or GU-rich sequence.
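As an illustration of the positional constraints described above, the following is a minimal sketch, not the method of this thesis. It assumes that the cleavage site is already known, that the PAS distance is measured from the first base of the hexamer to the cleavage site, and that the fraction of G and U bases is an acceptable crude proxy for a "U- or GU-rich" DSE. The function names and parameters are hypothetical.

```python
# Hypothetical helpers; names and distance conventions are assumptions.
PAS_HEXAMERS = ("AAUAAA", "AUUAAA")  # canonical hexamer and its common variant


def to_rna(sequence):
    """Upper-case the sequence and write T as U so DNA input also works."""
    return sequence.upper().replace("T", "U")


def find_pas_candidates(sequence, cleavage_site, min_up=10, max_up=30):
    """Return (start_index, hexamer) pairs for PAS hexamers whose first base
    lies min_up..max_up nucleotides upstream of cleavage_site (0-based)."""
    seq = to_rna(sequence)
    first = max(0, cleavage_site - max_up)
    last = cleavage_site - min_up
    hits = []
    for i in range(first, last + 1):
        hexamer = seq[i:i + 6]
        if hexamer in PAS_HEXAMERS:
            hits.append((i, hexamer))
    return hits


def dse_gu_fraction(sequence, cleavage_site, min_down=20, max_down=40):
    """Return the fraction of G or U bases in the window min_down..max_down
    nucleotides downstream of cleavage_site (a crude richness proxy)."""
    seq = to_rna(sequence)
    window = seq[cleavage_site + min_down:cleavage_site + max_down + 1]
    if not window:
        return 0.0
    return sum(base in "GU" for base in window) / len(window)
```

Calling `find_pas_candidates(seq, site)` together with `dse_gu_fraction(seq, site)` gives two simple features for a candidate site. A stochastic model such as the one proposed here would combine features of this kind probabilistically rather than apply hard windows.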
In this thesis, we construct a stochastic grammar of the 3’ terminus of human genes by
establishing dependency graphs, and their expanded Bayesian networks, for the features
in this region. Furthermore, we compare the performance of this stochastic grammar
with that of a PAS detector provided by previous researchers.
|