Human protein-coding genes and gene feature statistics in 2019

Abstract Objective A well-known limit of genome browsers is that the large amount of genome and gene data is not organized in the form of a searchable database, hampering full management of numerical data and free calculations. Due to the continuous increase of data deposited in genomic repositories...


Bibliographic Details
Main Authors: Allison Piovesan, Francesca Antonaros, Lorenza Vitale, Pierluigi Strippoli, Maria Chiara Pelleri, Maria Caracausi
Format: Article
Language: English
Published: BMC 2019-06-01
Series: BMC Research Notes
Subjects: Human genes; Protein-coding genes; Gene statistics
Online Access: http://link.springer.com/article/10.1186/s13104-019-4343-8
id doaj-c1d93a9ee23f444a9b6554332c80364b
record_format Article
spelling doaj-c1d93a9ee23f444a9b6554332c80364b 2020-11-25T03:41:56Z eng BMC BMC Research Notes 1756-0500 2019-06-01 12115 10.1186/s13104-019-4343-8
Human protein-coding genes and gene feature statistics in 2019
Allison Piovesan (0), Francesca Antonaros (1), Lorenza Vitale (2), Pierluigi Strippoli (3), Maria Chiara Pelleri (4), Maria Caracausi (5)
(0-5) Unit of Histology, Embryology and Applied Biology, Department of Experimental, Diagnostic and Specialty Medicine (DIMES), University of Bologna
Abstract
Objective A well-known limit of genome browsers is that the large amount of genome and gene data is not organized in the form of a searchable database, hampering full management of numerical data and free calculations. Due to the continuous increase of data deposited in genomic repositories, their content revision and analysis is recommended. Using GeneBase, a software with a graphical interface able to import and elaborate National Center for Biotechnology Information (NCBI) Gene database entries, we provide tabulated spreadsheets updated to 2019 about human nuclear protein-coding gene data set ready to be used for any type of analysis about genes, transcripts and gene organization.
Results Comparison with previous reports reveals substantial change in the number of known nuclear protein-coding genes (now 19,116), the protein-coding non-redundant transcriptome space [now 59,281,518 base pair (bp), 10.1% increase], the number of exons (now 562,164, 36.2% increase) due to a relevant increase of the RNA isoforms recorded. Other parameters such as gene, exon or intron mean and extreme length appear to have reached a stability that is unlikely to be substantially modified by human genome data updates, at least regarding protein-coding genes. Finally, we confirm that there are no human introns shorter than 30 bp.
http://link.springer.com/article/10.1186/s13104-019-4343-8
Human genes; Protein-coding genes; Gene statistics
collection DOAJ
language English
format Article
sources DOAJ
author Allison Piovesan
Francesca Antonaros
Lorenza Vitale
Pierluigi Strippoli
Maria Chiara Pelleri
Maria Caracausi
title Human protein-coding genes and gene feature statistics in 2019
publisher BMC
series BMC Research Notes
issn 1756-0500
publishDate 2019-06-01
description Abstract Objective A well-known limit of genome browsers is that the large amount of genome and gene data is not organized in the form of a searchable database, hampering full management of numerical data and free calculations. Due to the continuous increase of data deposited in genomic repositories, their content revision and analysis is recommended. Using GeneBase, a software with a graphical interface able to import and elaborate National Center for Biotechnology Information (NCBI) Gene database entries, we provide tabulated spreadsheets updated to 2019 about human nuclear protein-coding gene data set ready to be used for any type of analysis about genes, transcripts and gene organization. Results Comparison with previous reports reveals substantial change in the number of known nuclear protein-coding genes (now 19,116), the protein-coding non-redundant transcriptome space [now 59,281,518 base pair (bp), 10.1% increase], the number of exons (now 562,164, 36.2% increase) due to a relevant increase of the RNA isoforms recorded. Other parameters such as gene, exon or intron mean and extreme length appear to have reached a stability that is unlikely to be substantially modified by human genome data updates, at least regarding protein-coding genes. Finally, we confirm that there are no human introns shorter than 30 bp.
topic Human genes
Protein-coding genes
Gene statistics
url http://link.springer.com/article/10.1186/s13104-019-4343-8