Assembly and annotation of an Ashkenazi human reference genome

Abstract Background Thousands of experiments and studies use the human reference genome as a resource each year. This single reference genome, GRCh38, is a mosaic created from a small number of individuals, representing a very small sample of the human population. There is a need for reference genom...

Bibliographic Details
Main Authors: Alaina Shumate, Aleksey V. Zimin, Rachel M. Sherman, Daniela Puiu, Justin M. Wagner, Nathan D. Olson, Mihaela Pertea, Marc L. Salit, Justin M. Zook, Steven L. Salzberg
Format: Article
Language: English
Published: BMC 2020-06-01
Series: Genome Biology
Online Access: http://link.springer.com/article/10.1186/s13059-020-02047-7
id doaj-34b6a666474a481a80ed46ff3264dec4
record_format Article
spelling doaj-34b6a666474a481a80ed46ff3264dec4 2020-11-25T03:28:59Z eng BMC Genome Biology 1474-760X 2020-06-01 21 1 1 18 10.1186/s13059-020-02047-7 Assembly and annotation of an Ashkenazi human reference genome
Alaina Shumate (Center for Computational Biology, Johns Hopkins University)
Aleksey V. Zimin (Center for Computational Biology, Johns Hopkins University)
Rachel M. Sherman (Center for Computational Biology, Johns Hopkins University)
Daniela Puiu (Center for Computational Biology, Johns Hopkins University)
Justin M. Wagner (National Institute of Standards and Technology)
Nathan D. Olson (National Institute of Standards and Technology)
Mihaela Pertea (Center for Computational Biology, Johns Hopkins University)
Marc L. Salit (Joint Initiative for Metrology in Biology, Stanford University)
Justin M. Zook (National Institute of Standards and Technology)
Steven L. Salzberg (Center for Computational Biology, Johns Hopkins University)
collection DOAJ
language English
format Article
sources DOAJ
author Alaina Shumate
Aleksey V. Zimin
Rachel M. Sherman
Daniela Puiu
Justin M. Wagner
Nathan D. Olson
Mihaela Pertea
Marc L. Salit
Justin M. Zook
Steven L. Salzberg
title Assembly and annotation of an Ashkenazi human reference genome
publisher BMC
series Genome Biology
issn 1474-760X
publishDate 2020-06-01
description Abstract. Background: Thousands of experiments and studies use the human reference genome as a resource each year. This single reference genome, GRCh38, is a mosaic created from a small number of individuals, representing a very small sample of the human population. There is a need for reference genomes from multiple human populations to avoid potential biases. Results: Here, we describe the assembly and annotation of the genome of an Ashkenazi individual and the creation of a new, population-specific human reference genome. This genome is more contiguous and more complete than GRCh38, the latest version of the human reference genome, and is annotated with highly similar gene content. The Ashkenazi reference genome, Ash1, contains 2,973,118,650 nucleotides as compared to 2,937,639,212 in GRCh38. Annotation identified 20,157 protein-coding genes, of which 19,563 are >99% identical to their counterparts on GRCh38. Most of the remaining genes have small differences. Forty of the protein-coding genes in GRCh38 are missing from Ash1; however, all of these genes are members of multi-gene families for which Ash1 contains other copies. Eleven genes appear on different chromosomes from their homologs in GRCh38. Alignment of DNA sequences from an unrelated Ashkenazi individual to Ash1 identified ~1 million fewer homozygous SNPs than alignment of those same sequences to the more-distant GRCh38 genome, illustrating one of the benefits of population-specific reference genomes. Conclusions: The Ash1 genome is presented as a reference for any genetic studies involving Ashkenazi Jewish individuals.
url http://link.springer.com/article/10.1186/s13059-020-02047-7