De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data
The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes tha...
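Main Authors: | Adam Ameur, Huiwen Che, Marcel Martin, Ignas Bunikis, Johan Dahlberg, Ida Höijer, Susana Häggqvist, Francesco Vezzi, Jessica Nordlund, Pall Olason, Lars Feuk, Ulf Gyllensten |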
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2018-10-01 |
Series: | Genes |
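Subjects: | de novo assembly, SMRT sequencing, GRCh38, human reference genome, human whole-genome sequencing, population sequencing, Swedish population |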
Online Access: | http://www.mdpi.com/2073-4425/9/10/486 |
id |
doaj-d5295c0ad00449b0b8f2592137db29a3 |
---|---|
record_format |
Article |
spelling |
doaj-d5295c0ad00449b0b8f2592137db29a3; 2020-11-25T00:16:49Z; eng; MDPI AG; Genes; 2073-4425; 2018-10-01; 9; 10; 486; 10.3390/genes9100486; genes9100486; De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data; Adam Ameur [0], Huiwen Che [1], Marcel Martin [2], Ignas Bunikis [3], Johan Dahlberg [4], Ida Höijer [5], Susana Häggqvist [6], Francesco Vezzi [7], Jessica Nordlund [8], Pall Olason [9], Lars Feuk [10], Ulf Gyllensten [11]; [0] Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden; [1] Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden; [2] Science for Life Laboratory, Department of Biochemistry and Biophysics (DBB), Stockholm University, 114 19 Stockholm, Sweden; [3] Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden; [4] Science for Life Laboratory, Department of Medical Sciences, Molecular Medicine, Uppsala University, 752 36 Uppsala, Sweden; [5] Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden; [6] Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden; [7] Science for Life Laboratory, Department of Biochemistry and Biophysics (DBB), Stockholm University, 114 19 Stockholm, Sweden; [8] Science for Life Laboratory, Department of Medical Sciences, Molecular Medicine, Uppsala University, 752 36 Uppsala, Sweden; [9] Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, 752 36 Uppsala, Sweden; [10] Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden; [11] Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden; The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.; http://www.mdpi.com/2073-4425/9/10/486; de novo assembly; SMRT sequencing; GRCh38; human reference genome; human whole-genome sequencing; population sequencing; Swedish population |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Adam Ameur Huiwen Che Marcel Martin Ignas Bunikis Johan Dahlberg Ida Höijer Susana Häggqvist Francesco Vezzi Jessica Nordlund Pall Olason Lars Feuk Ulf Gyllensten |
spellingShingle |
Adam Ameur Huiwen Che Marcel Martin Ignas Bunikis Johan Dahlberg Ida Höijer Susana Häggqvist Francesco Vezzi Jessica Nordlund Pall Olason Lars Feuk Ulf Gyllensten De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data Genes de novo assembly SMRT sequencing GRCh38 human reference genome human whole-genome sequencing population sequencing Swedish population |
author_facet |
Adam Ameur Huiwen Che Marcel Martin Ignas Bunikis Johan Dahlberg Ida Höijer Susana Häggqvist Francesco Vezzi Jessica Nordlund Pall Olason Lars Feuk Ulf Gyllensten |
author_sort |
Adam Ameur |
title |
De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data |
title_short |
De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data |
title_full |
De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data |
title_fullStr |
De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data |
title_full_unstemmed |
De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data |
title_sort |
de novo assembly of two swedish genomes reveals missing segments from the human grch38 reference and improves variant calling of population-scale sequencing data |
publisher |
MDPI AG |
series |
Genes |
issn |
2073-4425 |
publishDate |
2018-10-01 |
description |
The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data. |
topic |
de novo assembly SMRT sequencing GRCh38 human reference genome human whole-genome sequencing population sequencing Swedish population |
url |
http://www.mdpi.com/2073-4425/9/10/486 |
work_keys_str_mv |
AT adamameur denovoassemblyoftwoswedishgenomesrevealsmissingsegmentsfromthehumangrch38referenceandimprovesvariantcallingofpopulationscalesequencingdata AT huiwenche denovoassemblyoftwoswedishgenomesrevealsmissingsegmentsfromthehumangrch38referenceandimprovesvariantcallingofpopulationscalesequencingdata AT marcelmartin denovoassemblyoftwoswedishgenomesrevealsmissingsegmentsfromthehumangrch38referenceandimprovesvariantcallingofpopulationscalesequencingdata AT ignasbunikis denovoassemblyoftwoswedishgenomesrevealsmissingsegmentsfromthehumangrch38referenceandimprovesvariantcallingofpopulationscalesequencingdata AT johandahlberg denovoassemblyoftwoswedishgenomesrevealsmissingsegmentsfromthehumangrch38referenceandimprovesvariantcallingofpopulationscalesequencingdata AT idahoijer denovoassemblyoftwoswedishgenomesrevealsmissingsegmentsfromthehumangrch38referenceandimprovesvariantcallingofpopulationscalesequencingdata AT susanahaggqvist denovoassemblyoftwoswedishgenomesrevealsmissingsegmentsfromthehumangrch38referenceandimprovesvariantcallingofpopulationscalesequencingdata AT francescovezzi denovoassemblyoftwoswedishgenomesrevealsmissingsegmentsfromthehumangrch38referenceandimprovesvariantcallingofpopulationscalesequencingdata AT jessicanordlund denovoassemblyoftwoswedishgenomesrevealsmissingsegmentsfromthehumangrch38referenceandimprovesvariantcallingofpopulationscalesequencingdata AT pallolason denovoassemblyoftwoswedishgenomesrevealsmissingsegmentsfromthehumangrch38referenceandimprovesvariantcallingofpopulationscalesequencingdata AT larsfeuk denovoassemblyoftwoswedishgenomesrevealsmissingsegmentsfromthehumangrch38referenceandimprovesvariantcallingofpopulationscalesequencingdata AT ulfgyllensten denovoassemblyoftwoswedishgenomesrevealsmissingsegmentsfromthehumangrch38referenceandimprovesvariantcallingofpopulationscalesequencingdata |
_version_ |
1725382358665789440 |