Extended Cytogenetic Query System with Integrated Databases and Consistency Analysis

Bibliographic Details
Main Authors: Kuo-ho Yen, 嚴國何
Other Authors: 李強
Format: Others
Language: en_US
Published: 2009
Online Access: http://ndltd.ncl.edu.tw/handle/82887164870856809114
Description
Summary: Doctoral dissertation === National Cheng Kung University (國立成功大學) === Department of Computer Science and Information Engineering, Master's and Doctoral Program === Academic year 97 === Staining human metaphase chromosomes reveals characteristic banding patterns known as cytogenetic bands, or cytobands. Using technologies based on metaphase chromosomes, researchers have accumulated much knowledge about the correlations between human diseases and specific cytoband aberrations, which indicate the presence of disease-associated genes in those bands. With the progress of the Human Genome Project and techniques such as fluorescence in situ hybridization, many genes have been assigned to cytobands and annotated in public databases, making it possible to find all genes in disease-related cytobands through database queries. However, finding genes in cytobands remains an imprecise process, partly because current methods for cytoband query are inadequate, especially those based on cytogenetic annotations. By transforming cytoband annotations into numerical segments, a new query method is developed that can accurately define any cytogenetic range in human chromosomes. A query system (designated CQS) is implemented using cytogenetic annotations in the public domain. In a performance test, CQS executed as accurately as expected using cytogenetic annotations from NCBI Map Viewer. The new method is scalable and can be applied to genomes of other species. Many human genes have problematic cytogenetic annotations in public databases. For example, the common housekeeping gene beta-actin (ACTB) has been mapped to 7p22, not to 7p15-p12 as annotated in Entrez Gene. The positional order of genes on the sequence map is generally believed to match the order of their cytogenetic locations. However, for many pairs of genes the cytogenetic annotations and sequence map positions disagree. A systematic search for such discrepancies should uncover problematic cytogenetic annotations genome-wide. 
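The idea of turning cytoband annotations into numerical segments can be illustrated with a minimal sketch. It is not the CQS implementation, whose details the abstract does not give. It assumes each band occupies one unit slot in its order along the chromosome, and that a coarser name such as 7p22 covers all of its sub-bands. The band list for the short arm of chromosome 7, the function names, and the two gene labels are hypothetical inputs chosen for the demonstration. Only simple annotations of the form 7p22 or 7p15-p12 are handled.

```python
import re

# Ordered band list (pter -> centromere) for the short arm of chromosome 7.
# Assumed input for this sketch; a real system would load a full ideogram
# for every chromosome from a public source.
BANDS = {
    "7": ["p22.3", "p22.2", "p22.1", "p21.3", "p21.2", "p21.1",
          "p15.3", "p15.2", "p15.1", "p14.3", "p14.2", "p14.1",
          "p13", "p12.3", "p12.2", "p12.1", "p11.2", "p11.1"],
}

# Matches "7p22" or "7p15-p12"; forms such as "7pter-p22" are not handled.
ANNOTATION = re.compile(r"^(\d+|X|Y)([pq]\d[\d.]*)(?:-([pq]\d[\d.]*))?$")


def band_segment(chrom, band):
    """Half-open index interval [start, end) covering a band and its sub-bands."""
    hits = [i for i, name in enumerate(BANDS[chrom]) if name.startswith(band)]
    if not hits:
        raise ValueError(f"unknown band {chrom}{band}")
    return hits[0], hits[-1] + 1


def annotation_segment(annotation):
    """Convert an annotation such as '7p15-p12' into (chromosome, start, end)."""
    match = ANNOTATION.match(annotation)
    if match is None:
        raise ValueError(f"unsupported annotation {annotation!r}")
    chrom, first = match.group(1), match.group(2)
    last = match.group(3) or first
    a, b = band_segment(chrom, first), band_segment(chrom, last)
    return chrom, min(a[0], b[0]), max(a[1], b[1])


def overlaps(q, g):
    """True when two (chromosome, start, end) segments share at least one slot."""
    return q[0] == g[0] and q[1] < g[2] and g[1] < q[2]


def query(region, gene_annotations):
    """Names of genes whose annotated segment overlaps the query region."""
    q = annotation_segment(region)
    return [name for name, ann in gene_annotations.items()
            if overlaps(q, annotation_segment(ann))]


# The two ACTB locations mentioned in the abstract, under hypothetical labels.
GENES = {
    "ACTB as annotated in Entrez Gene": "7p15-p12",
    "ACTB as mapped": "7p22",
}

print(annotation_segment("7p15-p12"))
print(query("7p22", GENES))
```

Once every annotation is a numeric interval, a range query reduces to an interval-overlap test, which is why the two ACTB locations never match the same narrow query.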
In current databases, many genes show inconsistencies between their cytogenetic annotations and sequence map positions. However, not all inconsistencies are alike. Some are problematic and should be corrected in the future, while others result from the imprecise nature of chromosomal banding and may be tolerable. It is therefore important to stratify cytogenetic position information into confidence groups that account for the imprecision of cytogenetic banding. When cytogenetic annotations are plotted against sequence map positions on a 2-D plane, consistent genes tend to form a compact linear distribution, while genes with inconsistent positions are more scattered. The areas where these two groups overlap are defined as tolerable imprecision zones by linear regression and distance analysis. The system was implemented using sequence information from NCBI Map Viewer Build 36.3 and cytogenetic annotations from NCBI Entrez Gene. Gene position information is classified into five confidence groups: inconsistent-intolerable, inconsistent-tolerable, consistent-imprecise, consistent-precise and consistent-rough. The percentages of these confidence groups are 1.4%, 7.1%, 54.0%, 35.4% and 2.2%, respectively. Using information from NCBI Map Viewer Build 36.3 and NCBI OMIM, the percentages are 3.6%, 17.0%, 49.0%, 19.0% and 11.4%, respectively. Combining these two results, a confidence table of gene position information was constructed. The Extended Cytogenetic Query System (ECQS) was built on a unified database integrating information from NCBI Entrez Gene, NCBI Map Viewer and NCBI OMIM. Inconsistencies between cytogenetic annotations and sequence mapping are analyzed by defining imprecision zones of cytogenetic banding. ECQS is a web-based application in which researchers retrieve gene information and the related inconsistency results by submitting a cytogenetic banding region as the query. 
The system can also automatically extend the query region to include genes in imprecision zones.
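The regression and distance analysis can be pictured with a small sketch. It is one possible reading of the abstract, not the procedure used in ECQS. It assumes a least-squares line is fitted through the consistent genes, that the imprecision zone is the strip around that line whose half-width equals the largest perpendicular distance of any consistent gene, and that an inconsistent gene inside the strip counts as tolerable. The sample points are synthetic, and all function names are hypothetical.

```python
import math


def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def distance_to_line(point, slope, intercept):
    """Perpendicular distance from a point to the line y = slope * x + intercept."""
    x, y = point
    return abs(slope * x - y + intercept) / math.hypot(slope, 1.0)


def zone_half_width(consistent, slope, intercept):
    """Largest distance of any consistent gene from the fitted line (assumed zone size)."""
    return max(distance_to_line(p, slope, intercept) for p in consistent)


def classify(point, slope, intercept, half_width):
    """Label an inconsistent gene by whether it falls inside the assumed zone."""
    inside = distance_to_line(point, slope, intercept) <= half_width
    return "inconsistent-tolerable" if inside else "inconsistent-intolerable"


def extend_region(start, end, margin):
    """Widen a numeric query segment by a margin, clamped at the chromosome start."""
    return max(0, start - margin), end + margin


# Synthetic (sequence position, cytogenetic position) pairs, for illustration only.
consistent = [(0.0, 0.0), (1.0, 1.2), (2.0, 1.8), (3.0, 3.0)]
slope, intercept = fit_line(consistent)
half_width = zone_half_width(consistent, slope, intercept)

print(classify((1.0, 1.1), slope, intercept, half_width))
print(classify((1.0, 3.0), slope, intercept, half_width))
print(extend_region(3, 9, 1))
```

Under these assumptions, the margin passed to extend_region plays the role of the imprecision zone when a query is widened automatically; how ECQS derives that margin is not stated in the abstract.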