Identification of intermediate-sized deletions and inference of their impact on gene expression in a human population

Abstract Background Next-generation sequencing has allowed for the identification of different genetic variations, which are known to contribute to diseases. Of these, insertions and deletions are the second most abundant type of variations in the genome, but their biological importance or disease a...


Bibliographic Details
Main Authors: Jing Hao Wong, Daichi Shigemizu, Yukiko Yoshii, Shintaro Akiyama, Azusa Tanaka, Hidewaki Nakagawa, Shu Narumiya, Akihiro Fujimoto
Format: Article
Language: English
Published: BMC 2019-07-01
Series: Genome Medicine
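Subjects: Intermediate-sized deletion; Expression quantitative trait loci (eQTL); Genomic imputation; Long-read sequencing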
Online Access: http://link.springer.com/article/10.1186/s13073-019-0656-4
id doaj-6e6ded06d44e426c815c8e48ffe41fe4
record_format Article
spelling doaj-6e6ded06d44e426c815c8e48ffe41fe4 2020-11-25T03:45:04Z eng BMC Genome Medicine 1756-994X 2019-07-01 111115 10.1186/s13073-019-0656-4
Identification of intermediate-sized deletions and inference of their impact on gene expression in a human population
Jing Hao Wong (Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine)
Daichi Shigemizu (Medical Genome Center, National Center for Geriatrics and Gerontology)
Yukiko Yoshii (Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine)
Shintaro Akiyama (Medical Genome Center, National Center for Geriatrics and Gerontology)
Azusa Tanaka (Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine)
Hidewaki Nakagawa (Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Science)
Shu Narumiya (Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine)
Akihiro Fujimoto (Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine)
Abstract
Background: Next-generation sequencing has allowed for the identification of different genetic variations, which are known to contribute to diseases. Of these, insertions and deletions are the second most abundant type of variations in the genome, but their biological importance or disease association is not well-studied, especially for deletions of intermediate sizes.
Methods: We identified intermediate-sized deletions from whole-genome sequencing (WGS) data of Japanese samples (n = 174) with a novel deletion calling method which considered multiple samples. These deletions were used to construct a reference panel for use in imputation. Imputation was then conducted using the reference panel and data from 82 publicly available Japanese samples with gene expression data. The accuracy of the deletion calling and imputation was examined with Nanopore long-read sequencing technology. We also conducted an expression quantitative trait loci (eQTL) association analysis using the deletions to infer their functional impacts on genes, before characterizing the deletions causal for gene expression level changes.
Results: We obtained a set of 4378 polymorphic, high-confidence deletions and constructed a reference panel. The deletions were successfully imputed into the Japanese samples with high accuracy (97.3%). The eQTL analysis identified 181 deletions (4.1%) suggested as causal for gene expression level changes. The causal deletion candidates were significantly enriched in promoters, super-enhancers, and transcription elongation chromatin states. Generation of deletions in a cell line with the CRISPR-Cas9 system confirmed that they were indeed causative variants for gene expression change. Furthermore, one of the deletions was observed to affect the expression level of a gene in which it was not located.
Conclusions: This paper reports an accurate deletion calling method for genotype imputation at the whole-genome level and shows the importance of intermediate-sized deletions in the human population.
http://link.springer.com/article/10.1186/s13073-019-0656-4
Intermediate-sized deletion
Expression quantitative trait loci (eQTL)
Genomic imputation
Long-read sequencing
collection DOAJ
language English
format Article
sources DOAJ
author Jing Hao Wong
Daichi Shigemizu
Yukiko Yoshii
Shintaro Akiyama
Azusa Tanaka
Hidewaki Nakagawa
Shu Narumiya
Akihiro Fujimoto
spellingShingle Jing Hao Wong
Daichi Shigemizu
Yukiko Yoshii
Shintaro Akiyama
Azusa Tanaka
Hidewaki Nakagawa
Shu Narumiya
Akihiro Fujimoto
Identification of intermediate-sized deletions and inference of their impact on gene expression in a human population
Genome Medicine
Intermediate-sized deletion
Expression quantitative trait loci (eQTL)
Genomic imputation
Long-read sequencing
author_facet Jing Hao Wong
Daichi Shigemizu
Yukiko Yoshii
Shintaro Akiyama
Azusa Tanaka
Hidewaki Nakagawa
Shu Narumiya
Akihiro Fujimoto
author_sort Jing Hao Wong
title Identification of intermediate-sized deletions and inference of their impact on gene expression in a human population
title_short Identification of intermediate-sized deletions and inference of their impact on gene expression in a human population
title_full Identification of intermediate-sized deletions and inference of their impact on gene expression in a human population
title_fullStr Identification of intermediate-sized deletions and inference of their impact on gene expression in a human population
title_full_unstemmed Identification of intermediate-sized deletions and inference of their impact on gene expression in a human population
title_sort identification of intermediate-sized deletions and inference of their impact on gene expression in a human population
publisher BMC
series Genome Medicine
issn 1756-994X
publishDate 2019-07-01
description Abstract
Background: Next-generation sequencing has allowed for the identification of different genetic variations, which are known to contribute to diseases. Of these, insertions and deletions are the second most abundant type of variations in the genome, but their biological importance or disease association is not well-studied, especially for deletions of intermediate sizes.
Methods: We identified intermediate-sized deletions from whole-genome sequencing (WGS) data of Japanese samples (n = 174) with a novel deletion calling method which considered multiple samples. These deletions were used to construct a reference panel for use in imputation. Imputation was then conducted using the reference panel and data from 82 publicly available Japanese samples with gene expression data. The accuracy of the deletion calling and imputation was examined with Nanopore long-read sequencing technology. We also conducted an expression quantitative trait loci (eQTL) association analysis using the deletions to infer their functional impacts on genes, before characterizing the deletions causal for gene expression level changes.
Results: We obtained a set of 4378 polymorphic, high-confidence deletions and constructed a reference panel. The deletions were successfully imputed into the Japanese samples with high accuracy (97.3%). The eQTL analysis identified 181 deletions (4.1%) suggested as causal for gene expression level changes. The causal deletion candidates were significantly enriched in promoters, super-enhancers, and transcription elongation chromatin states. Generation of deletions in a cell line with the CRISPR-Cas9 system confirmed that they were indeed causative variants for gene expression change. Furthermore, one of the deletions was observed to affect the expression level of a gene in which it was not located.
Conclusions: This paper reports an accurate deletion calling method for genotype imputation at the whole-genome level and shows the importance of intermediate-sized deletions in the human population.
topic Intermediate-sized deletion
Expression quantitative trait loci (eQTL)
Genomic imputation
Long-read sequencing
url http://link.springer.com/article/10.1186/s13073-019-0656-4
_version_ 1724511656439971840
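
The abstract describes an eQTL association analysis that relates deletion genotypes to gene expression levels. As a rough illustration only, the sketch below fits an ordinary least-squares line of expression on deletion dosage (0, 1, or 2 deleted copies) for a single deletion-gene pair. The record does not state which statistical model or software the authors used, so the additive dosage coding, the function name `eqtl_regression`, and the toy values are all assumptions made for this example.

```python
from math import sqrt


def eqtl_regression(dosages, expression):
    """Least-squares fit of expression on deletion dosage (hypothetical helper).

    dosages: number of deleted copies per sample (0, 1, or 2); assumed additive.
    expression: expression level of one gene in the same samples.
    Returns (slope, intercept, t_statistic) for the dosage effect.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")

    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n

    # Sum of squares of the genotype; zero means no variation to test.
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("deletion is monomorphic in these samples")

    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance gives the standard error of the slope.
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(dosages, expression))
    std_err = sqrt(rss / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else float("inf")
    return slope, intercept, t_stat


# Made-up toy data, not taken from the study: six samples, one gene.
dosages = [0, 0, 1, 1, 2, 2]
expression = [10.1, 9.9, 8.2, 7.8, 6.1, 5.9]
slope, intercept, t_stat = eqtl_regression(dosages, expression)
print(f"slope={slope:.2f} intercept={intercept:.2f} t={t_stat:.1f}")
```

In this toy data, a negative slope means each additional deleted copy is associated with lower expression. A genome-wide analysis would repeat such a test over many deletion-gene pairs, convert the statistics to p-values, and correct for multiple testing. None of that is shown here.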