Orphan Genes Bioinformatics : Identification and properties of de novo created genes

Even today, many genes are without any known homolog. These "orphans" are found in all species, from Viruses to Prokaryotes and Eukaryotes. For a portion of these genes, we might simply not have enough data to find homologs yet. Some of them are imported from taxonomically distant organism...


Bibliographic Details
Main Author: Basile, Walter
Format: Doctoral Thesis
Language: English
Published: Stockholms universitet, Institutionen för biokemi och biofysik 2017
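Subjects: bioinformatics; de novo; orphans; evolutionary genetics; Biological Sciences; Biologiska vetenskaper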
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-149168
http://nbn-resolving.de/urn:isbn:978-91-7797-085-9
http://nbn-resolving.de/urn:isbn:978-91-7797-086-6
id ndltd-UPSALLA1-oai-DiVA.org-su-149168
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-su-149168; 2017-12-21T05:34:25Z; Orphan Genes Bioinformatics : Identification and properties of de novo created genes; eng; Basile, Walter; Stockholms universitet, Institutionen för biokemi och biofysik; Stockholm : Department of Biochemistry and Biophysics, Stockholm University; 2017; bioinformatics; de novo; orphans; evolutionary genetics; Biological Sciences; Biologiska vetenskaper; Doctoral thesis, comprehensive summary; info:eu-repo/semantics/doctoralThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-149168; urn:isbn:978-91-7797-085-9; urn:isbn:978-91-7797-086-6; application/pdf; info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Doctoral Thesis
sources NDLTD
topic bioinformatics
de novo
orphans
evolutionary genetics
Biological Sciences
Biologiska vetenskaper
description Even today, many genes have no known homolog. These "orphans" are found in all species, from viruses to prokaryotes and eukaryotes. For a portion of these genes, we might simply not have enough data to find homologs yet. Some of them are imported from taxonomically distant organisms via lateral transfer; others have homologs that have mutated beyond the point of recognition. However, a sizeable fraction of orphan genes is unambiguously created via "de novo" mechanisms. The study of such novel genes can contribute to our understanding of the emergence of functional novelty and the adaptation of species to new ecological niches. In this work, we first survey the field of orphan studies and illustrate some of the common issues. Next, we analyze some of the intrinsic properties of orphan proteins, including secondary structure elements and intrinsic structural disorder; specifically, we observe that in young proteins the relationship between these properties and the G+C content of their coding sequence is stronger than in older proteins. We then tackle some of the methodological problems often found in orphan studies. We find that using evolutionarily close species and sensitive, state-of-the-art homology recognition methods is instrumental to identifying a set of orphans enriched in de novo created ones. Finally, we compare how intrinsic disorder is distributed in bacteria versus eukaryotes. Eukaryotic proteins are longer and more disordered; the difference is attributable primarily to eukaryote-specific domains and linker regions. In these sections of the proteins, a higher frequency of the disorder-promoting amino acid serine can be observed in eukaryotes. At the time of the doctoral defense, the following papers were unpublished and had the following status: Paper 3: Submitted. Paper 4: Manuscript.
author Basile, Walter
title Orphan Genes Bioinformatics : Identification and properties of de novo created genes
publisher Stockholms universitet, Institutionen för biokemi och biofysik
publishDate 2017