IntSplice2: Prediction of the Splicing Effects of Intronic Single-Nucleotide Variants Using LightGBM Modeling

Prediction of the effect of a single-nucleotide variant (SNV) in an intronic region on aberrant pre-mRNA splicing is challenging except for an SNV affecting the canonical GU/AG splice sites (ss). To predict pathogenicity of SNVs at intronic positions −50 (Int-50) to −3 (Int-3) close to the 3’ ss, we...

Bibliographic Details
Main Authors: Jun-ichi Takeda, Sae Fukami, Akira Tamura, Akihide Shibata, Kinji Ohno
Format: Article
Language: English
Published: Frontiers Media S.A. 2021-07-01
Series: Frontiers in Genetics
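Subjects: splice acceptor site; aberrant splicing; single nucleotide variations; intronic mutations; LightGBM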
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2021.701076/full
id doaj-14aff9fb6e614f06b2297d27d3a5ff7a
record_format Article
spelling doaj-14aff9fb6e614f06b2297d27d3a5ff7a
2021-07-19T11:53:47Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2021-07-01
12
10.3389/fgene.2021.701076
701076
IntSplice2: Prediction of the Splicing Effects of Intronic Single-Nucleotide Variants Using LightGBM Modeling
Jun-ichi Takeda 0
Sae Fukami 1
Akira Tamura 2
Akihide Shibata 3
Akihide Shibata 4
Kinji Ohno 5
Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, Nagoya, Japan
Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, Nagoya, Japan
Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, Nagoya, Japan
Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, Nagoya, Japan
Department of Anesthesiology, Toranomon Hospital, Tokyo, Japan
Division of Neurogenetics, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, Nagoya, Japan
Prediction of the effect of a single-nucleotide variant (SNV) in an intronic region on aberrant pre-mRNA splicing is challenging except for an SNV affecting the canonical GU/AG splice sites (ss). To predict pathogenicity of SNVs at intronic positions −50 (Int-50) to −3 (Int-3) close to the 3’ ss, we developed light gradient boosting machine (LightGBM)-based IntSplice2 models using pathogenic SNVs in the Human Gene Mutation Database (HGMD) and ClinVar and common SNVs in dbSNP with 0.01 ≤ minor allele frequency (MAF) < 0.50. The LightGBM models were generated using features representing splicing cis-elements. The average recall/sensitivity and specificity of IntSplice2 by fivefold cross-validation (CV) of the training dataset were 0.764 and 0.884, respectively. The recall/sensitivity of IntSplice2 was lower than the average recall/sensitivity of 0.800 of IntSplice, which we previously built with support vector machine (SVM) modeling for the same intronic positions. In contrast, the specificity of IntSplice2 was higher than the average specificity of 0.849 of IntSplice. For benchmarking (BM) of IntSplice2 against IntSplice, we made a test dataset that was not used to train IntSplice. After excluding the test dataset from the training dataset, we generated IntSplice2-BM and compared it with IntSplice using the test dataset. IntSplice2-BM was superior to IntSplice in all seven statistical measures: accuracy, precision, recall/sensitivity, specificity, F1 score, negative predictive value (NPV), and Matthews correlation coefficient (MCC). The IntSplice2 web service is available at https://www.med.nagoya-u.ac.jp/neurogenetics/IntSplice2.
https://www.frontiersin.org/articles/10.3389/fgene.2021.701076/full
splice acceptor site
aberrant splicing
single nucleotide variations
intronic mutations
LightGBM
collection DOAJ
language English
format Article
sources DOAJ
author Jun-ichi Takeda
Sae Fukami
Akira Tamura
Akihide Shibata
Kinji Ohno
author_sort Jun-ichi Takeda
title IntSplice2: Prediction of the Splicing Effects of Intronic Single-Nucleotide Variants Using LightGBM Modeling
publisher Frontiers Media S.A.
series Frontiers in Genetics
issn 1664-8021
publishDate 2021-07-01
description Prediction of the effect of a single-nucleotide variant (SNV) in an intronic region on aberrant pre-mRNA splicing is challenging except for an SNV affecting the canonical GU/AG splice sites (ss). To predict pathogenicity of SNVs at intronic positions −50 (Int-50) to −3 (Int-3) close to the 3’ ss, we developed light gradient boosting machine (LightGBM)-based IntSplice2 models using pathogenic SNVs in the Human Gene Mutation Database (HGMD) and ClinVar and common SNVs in dbSNP with 0.01 ≤ minor allele frequency (MAF) < 0.50. The LightGBM models were generated using features representing splicing cis-elements. The average recall/sensitivity and specificity of IntSplice2 by fivefold cross-validation (CV) of the training dataset were 0.764 and 0.884, respectively. The recall/sensitivity of IntSplice2 was lower than the average recall/sensitivity of 0.800 of IntSplice, which we previously built with support vector machine (SVM) modeling for the same intronic positions. In contrast, the specificity of IntSplice2 was higher than the average specificity of 0.849 of IntSplice. For benchmarking (BM) of IntSplice2 against IntSplice, we made a test dataset that was not used to train IntSplice. After excluding the test dataset from the training dataset, we generated IntSplice2-BM and compared it with IntSplice using the test dataset. IntSplice2-BM was superior to IntSplice in all seven statistical measures: accuracy, precision, recall/sensitivity, specificity, F1 score, negative predictive value (NPV), and Matthews correlation coefficient (MCC). The IntSplice2 web service is available at https://www.med.nagoya-u.ac.jp/neurogenetics/IntSplice2.
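The seven measures named in the description can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' evaluation code. The function name `binary_metrics`, the helper `_ratio`, and the example counts are hypothetical and were chosen only for illustration; they are not results from the paper.

```python
import math


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute seven standard measures from confusion-matrix counts.

    tp/fn: pathogenic SNVs predicted pathogenic/benign.
    tn/fp: benign SNVs predicted benign/pathogenic.
    """
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)          # also called sensitivity
    specificity = _ratio(tn, tn + fp)
    npv = _ratio(tn, tn + fn)             # negative predictive value
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * precision * recall, precision + recall)
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "npv": npv,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only (not taken from the paper).
example = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Returning 0.0 for an undefined ratio is one common convention; other tools report such cases as undefined instead, so values at the extremes may differ between implementations.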
topic splice acceptor site
aberrant splicing
single nucleotide variations
intronic mutations
LightGBM
url https://www.frontiersin.org/articles/10.3389/fgene.2021.701076/full