HGNChelper: identification and correction of invalid gene symbols for human and mouse [version 1; peer review: 2 approved, 1 approved with reservations]
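Main Authors: | Sehyun Oh, Jasmine Abdelnabi, Ragheed Al-Dulaimi, Ayush Aggarwal, Marcel Ramos, Sean Davis, Markus Riester, Levi Waldron |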
---|---|
Format: | Article |
Language: | English |
Published: | F1000 Research Ltd, 2020-12-01 |
Series: | F1000Research |
Online Access: | https://f1000research.com/articles/9-1493/v1 |

id | doaj-504a3c9ad1474c1eb9bc5387a7a21850 |
---|---|
record_format | Article |
spelling | doaj-504a3c9ad1474c1eb9bc5387a7a21850; 2021-02-09T14:42:55Z; eng; F1000 Research Ltd; F1000Research; ISSN 2046-1402; 2020-12-01; vol. 9; DOI 10.12688/f1000research.28033.1; 31006; Sehyun Oh (Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York, 10027, USA); Jasmine Abdelnabi (Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York, 10027, USA); Ragheed Al-Dulaimi (Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York, 10027, USA); Ayush Aggarwal (CSIR-Institute of Genomics and Integrative Biology, New Delhi, 110025, India); Marcel Ramos (Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York, 10027, USA); Sean Davis (Center for Cancer Research, National Cancer Institute, Maryland, 20892, USA); Markus Riester (Novartis Institutes for BioMedical Research Incorporation, Massachusetts, 02139, USA); Levi Waldron (Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York, New York, 10027, USA) |
collection | DOAJ |
sources | DOAJ |
author | Sehyun Oh, Jasmine Abdelnabi, Ragheed Al-Dulaimi, Ayush Aggarwal, Marcel Ramos, Sean Davis, Markus Riester, Levi Waldron |
issn | 2046-1402 |
description | Gene symbols are recognizable identifiers for gene names but are unstable and error-prone due to aliasing, manual entry, and unintentional conversion by spreadsheets to date format. Official gene symbol resources such as the HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome Informatics project (MGI) for mouse genes provide authoritative sources of valid, aliased, and outdated symbols, but they lack a programmatic interface and do not correct symbols converted by spreadsheets. We present HGNChelper, an R package that identifies known aliases and outdated gene symbols based on the HGNC human and MGI mouse gene symbol databases, in addition to common mislabeling introduced by spreadsheets, and provides corrections where possible. HGNChelper identified invalid gene symbols in the most recent Molecular Signatures Database (MSigDB 7.0) and in platform annotation files of the Gene Expression Omnibus, with prevalence ranging from ~3% in recent platforms to 30-40% in the earliest platforms from 2002-03. HGNChelper is installable from CRAN. |
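
The correction strategy summarized in the description (flag symbols that are not approved, map known aliases and previous symbols to approved ones, and undo spreadsheet date conversion) can be illustrated with a short Python sketch. This is not HGNChelper's R interface: the tiny symbol tables, the date pattern, and the `check_symbol` function are hypothetical, chosen only to show the idea. The sketch assumes Excel-style conversions such as SEPT2 becoming "2-Sep" and MARCH1 becoming "1-Mar", and maps those families to their current HGNC symbols (SEPTIN2, MARCHF1).

```python
"""Illustrative sketch of lookup-based gene symbol correction.

Hypothetical throughout: the symbol tables, the date pattern, and
check_symbol() are assumptions for illustration, not HGNChelper's R API
and not the HGNC or MGI databases.
"""

import re
from typing import Optional, Tuple

# Hypothetical excerpt of approved human symbols.
APPROVED = {"TP53", "SEPTIN2", "MARCHF1"}

# Hypothetical excerpt of an alias/previous-symbol map: invalid -> approved.
ALIAS_MAP = {"SEPT2": "SEPTIN2", "MARCH1": "MARCHF1"}

# Assumed spreadsheet artifacts: Excel displays SEPT2 as "2-Sep", MARCH1 as "1-Mar".
DATE_PATTERN = re.compile(r"^(\d{1,2})-(SEP|MAR)$")
MONTH_PREFIX = {"SEP": "SEPT", "MAR": "MARCH"}


def check_symbol(symbol: str) -> Tuple[bool, Optional[str]]:
    """Return (is_approved, suggested_symbol); suggestion is None if unmappable."""
    if symbol in APPROVED:
        return True, symbol
    # Most approved human symbols are upper case; exceptions such as
    # "orf" symbols are ignored in this sketch.
    candidate = symbol.strip().upper()
    # Undo a date conversion, e.g. "2-SEP" -> "SEPT2".
    match = DATE_PATTERN.match(candidate)
    if match:
        day, month = match.groups()
        candidate = MONTH_PREFIX[month] + day
    # Replace an alias or previous symbol with its approved symbol.
    candidate = ALIAS_MAP.get(candidate, candidate)
    return False, (candidate if candidate in APPROVED else None)


if __name__ == "__main__":
    for s in ["TP53", "tp53", "2-Sep", "1-Mar", "SEPT2", "FOO1"]:
        print(s, check_symbol(s))
```

Keeping the approval flag separate from the suggestion reflects the description's distinction between identifying invalid symbols and providing corrections only where possible. In the R package itself, the corresponding entry point is `checkGeneSymbols()`; its actual arguments and output columns should be taken from the package documentation.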