Exhaustive assignment of compositional bias reveals universally prevalent biased regions: analysis of functional associations in human and Drosophila

Bibliographic Details
Main Author: Harrison Paul M
Format: Article
Language: English
Published: BMC 2006-10-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/7/441
id doaj-cafb1a9e15074a3483af6c6a6673b5db
record_format Article
spelling doaj-cafb1a9e15074a3483af6c6a6673b5db; 2020-11-25T00:16:19Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2006-10-01; 7; 441; 10.1186/1471-2105-7-441; Harrison Paul M; http://www.biomedcentral.com/1471-2105/7/441
collection DOAJ
language English
format Article
sources DOAJ
author Harrison Paul M
title Exhaustive assignment of compositional bias reveals universally prevalent biased regions: analysis of functional associations in human and Drosophila
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2006-10-01
description Abstract. Background: Compositionally biased (CB) regions are stretches of protein sequence composed mainly of a distinct subset of amino acid residues; such regions are frequently associated with a structural role in the cell, or with protein disorder. Results: We derived a procedure for the exhaustive assignment and classification of CB regions, and applied it to thirteen metazoan proteomes. Sequences are initially scanned for the lowest-probability subsequences (LPSs) for single amino-acid types; subsequently, an exhaustive search for LPSs for multiple residue types is performed iteratively until convergence, to define CB region boundaries. We analysed >40,000 CB regions with >20 million residues; strikingly, nine single- or double-residue biases are universally abundant, and are consistently highly ranked across both vertebrates and invertebrates. To home in on subpopulations of CB regions of interest in human and D. melanogaster, we analysed CB region lengths, conservation, inferred functional categories and predicted protein disorder, and filtered for coiled coils and protein structures. In particular, we found that some of the universally abundant CB regions have significant associations with transcription and nuclear localization in human and Drosophila, and are also predicted to be moderately or highly disordered. Focussing on Q-based biased regions, we found that these regions are typically well conserved only within mammals (appearing in 60–80% of orthologs), with shorter human transcription-related CB regions being unconserved outside of mammals; they are also preferentially linked to protein domains such as the homeodomain and glucocorticoid-receptor DNA-binding domain. In general, only ~40–50% of residues in these human and Drosophila CB regions have predicted protein disorder. Conclusion: These data are of use for the further functional characterization of genes, and for structural genomics initiatives.
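The single-residue scan mentioned in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the abstract does not specify the probability model, so the sketch assumes a binomial point probability with a fixed background frequency `f`, and the names `log_binom_pmf` and `lowest_probability_subsequence`, the minimum window length, and the example frequency of 0.05 are all hypothetical choices made for illustration.

```python
import math


def log_binom_pmf(k, n, f):
    """Natural log of the binomial probability of exactly k hits in n trials."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(f) + (n - k) * math.log(1.0 - f)


def lowest_probability_subsequence(seq, residue, f, min_len=5, max_len=None):
    """Find the window most enriched in `residue` under a binomial model.

    Assumption (not from the source): a window's score is the binomial point
    probability of its residue count given background frequency f.
    Returns (log_p, start, end) with `end` exclusive, or None if no window
    of at least `min_len` residues is enriched above the background.
    """
    n_seq = len(seq)
    if max_len is None:
        max_len = n_seq

    # Prefix counts let us get the residue count of any window in O(1).
    prefix = [0]
    for ch in seq:
        prefix.append(prefix[-1] + (ch == residue))

    best = None
    for start in range(n_seq):
        last_end = min(n_seq, start + max_len)
        for end in range(start + min_len, last_end + 1):
            n = end - start
            k = prefix[end] - prefix[start]
            if k <= f * n:
                continue  # skip windows that are not enriched
            log_p = log_binom_pmf(k, n, f)
            if best is None or log_p < best[0]:
                best = (log_p, start, end)
    return best


# Toy sequence with a 12-residue glutamine run (illustrative only).
toy = "MKTAYIAKER" + "Q" * 12 + "LSDEGHKLMN"
result = lowest_probability_subsequence(toy, "Q", f=0.05)
```

This brute-force scan is quadratic in sequence length, which is acceptable for a sketch. The multi-residue, iterate-until-convergence stage described in the abstract is not attempted here.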
url http://www.biomedcentral.com/1471-2105/7/441
_version_ 1725383290284670976