Computational identification of strain-, species- and genus-specific proteins

Abstract. Background: The identification of unique proteins at different taxonomic levels has both scientific and practical value. Strain-, species- and genus-specific proteins can provide insight into the criteria that define an organism and its relation...

Bibliographic Details
Main Authors: Thiagarajan Rathi, Murthy Sudhir, Natale Darren A, Mazumder Raja, Wu Cathy H
Format: Article
Language: English
Published: BMC 2005-11-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/6/279
id doaj-238f8aa390bf422ea93860ff1feea9ee
record_format Article
spelling doaj-238f8aa390bf422ea93860ff1feea9ee 2020-11-24T23:58:02Z eng BMC BMC Bioinformatics 1471-2105 2005-11-01 6 1 279 10.1186/1471-2105-6-279 Computational identification of strain-, species- and genus-specific proteins Thiagarajan Rathi; Murthy Sudhir; Natale Darren A; Mazumder Raja; Wu Cathy H http://www.biomedcentral.com/1471-2105/6/279
collection DOAJ
language English
format Article
sources DOAJ
author Thiagarajan Rathi
Murthy Sudhir
Natale Darren A
Mazumder Raja
Wu Cathy H
title Computational identification of strain-, species- and genus-specific proteins
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2005-11-01
description Abstract
Background: The identification of unique proteins at different taxonomic levels has both scientific and practical value. Strain-, species- and genus-specific proteins can provide insight into the criteria that define an organism and its relationship with close relatives. Such proteins can also serve as taxon-specific diagnostic targets.
Description: A pipeline using a combination of computational and manual analyses of BLAST results was developed to identify strain-, species-, and genus-specific proteins and to catalog the closest sequenced relative for each protein in a proteome. Proteins encoded by a given strain are preliminarily considered to be unique if BLAST, using a comprehensive protein database, fails to retrieve (with an e-value better than 0.001) any protein not encoded by the query strain, species or genus (for strain-, species- and genus-specific proteins respectively), or if BLAST, using the best hit as the query (reverse BLAST), does not retrieve the initial query protein. Results are manually inspected for homology if the initial query is retrieved in the reverse BLAST but is not the best hit. Sequences unlikely to retrieve homologs using the default BLOSUM62 matrix (usually short sequences) are re-tested using the PAM30 matrix, thereby increasing the number of retrieved homologs and increasing the stringency of the search for unique proteins. The above protocol was used to examine several food- and water-borne pathogens. We find that the reverse BLAST step filters out about 22% of proteins with homologs that would otherwise be considered unique at the genus and species levels. Analysis of the annotations of unique proteins reveals that many are remnants of prophage proteins, or may be involved in virulence. The data generated from this study can be accessed and further evaluated from the CUPID (Core and Unique Protein Identification) system web site (updated semi-annually) at http://pir.georgetown.edu/cupid.
Conclusion: CUPID provides a set of proteins specific to a genus, species or a strain, and identifies the most closely related organism.
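
The decision rule in the Description can be sketched in Python as follows. This is a minimal illustration, not the CUPID implementation: it works on BLAST hits that have already been parsed, and the names `Hit`, `classify` and `reverse_search` are hypothetical. It also rests on several assumptions that the abstract does not state: the "best hit" is the best-scoring hit from outside the query taxon, the same 0.001 cutoff applies to the reverse BLAST, the reverse query's hit to itself is ignored, and a query that comes back as the top reverse hit is treated as having a homolog. The PAM30 re-test for short sequences is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List

E_VALUE_CUTOFF = 1e-3  # "e-value better than 0.001", as stated in the abstract


@dataclass(frozen=True)
class Hit:
    """One parsed BLAST hit (hypothetical structure for this sketch)."""
    protein_id: str
    taxon: str      # label at the level being tested: strain, species or genus
    e_value: float


def classify(
    query_id: str,
    query_taxon: str,
    forward_hits: List[Hit],
    reverse_search: Callable[[str], List[Hit]],
) -> str:
    """Return 'unique', 'not_unique' or 'manual_inspection' for one query protein."""
    # Forward BLAST: keep significant hits to proteins outside the query taxon.
    outside = [
        h for h in forward_hits
        if h.e_value < E_VALUE_CUTOFF and h.taxon != query_taxon
    ]
    if not outside:
        return "unique"  # nothing retrieved outside the taxon

    # Reverse BLAST: use the best outside hit as the new query (assumption).
    best = min(outside, key=lambda h: h.e_value)
    reverse_hits = sorted(
        (
            h for h in reverse_search(best.protein_id)
            if h.e_value < E_VALUE_CUTOFF and h.protein_id != best.protein_id
        ),
        key=lambda h: h.e_value,
    )
    ranked_ids = [h.protein_id for h in reverse_hits]

    if query_id not in ranked_ids:
        return "unique"  # reverse BLAST does not retrieve the initial query
    if ranked_ids[0] == query_id:
        return "not_unique"  # reciprocal best hit: treated as a homolog (assumption)
    return "manual_inspection"  # retrieved, but not as the best hit
```

Here `reverse_search` stands in for whatever runs the reverse BLAST and parses its output; passing it as a callable keeps the rule itself independent of any particular BLAST wrapper.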
url http://www.biomedcentral.com/1471-2105/6/279