DeepVISP: Deep Learning for Virus Site Integration Prediction and Motif Discovery

Abstract Approximately 15% of human cancers are estimated to be attributed to viruses. Virus sequences can be integrated into the host genome, leading to genomic instability and carcinogenesis. Here, a new deep convolutional neural network (CNN) model is developed with attention architecture, namely...

Bibliographic Details
Main Authors: Haodong Xu, Peilin Jia, Zhongming Zhao
Format: Article
Language: English
Published: Wiley 2021-05-01
Series: Advanced Science
Subjects:
EBV
HBV
HPV
Online Access: https://doi.org/10.1002/advs.202004958
id doaj-c930b648def6469c974ac77b3075eed6
record_format Article
spelling doaj-c930b648def6469c974ac77b3075eed6
2021-05-05T07:56:42Z
eng
Wiley
Advanced Science
2198-3844
2021-05-01
8
9
n/a
n/a
10.1002/advs.202004958
DeepVISP: Deep Learning for Virus Site Integration Prediction and Motif Discovery
Haodong Xu (0)
Peilin Jia (1)
Zhongming Zhao (2)
(0) Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston (UTHealth), Houston, TX 77030, USA
(1) Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston (UTHealth), Houston, TX 77030, USA
(2) Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston (UTHealth), Houston, TX 77030, USA
Abstract Approximately 15% of human cancers are estimated to be attributed to viruses. Virus sequences can be integrated into the host genome, leading to genomic instability and carcinogenesis. Here, a new deep convolutional neural network (CNN) model is developed with attention architecture, namely DeepVISP, for accurately predicting oncogenic virus integration sites (VISs) in the human genome. Using the curated benchmark integration data of three viruses, hepatitis B virus (HBV), human papillomavirus (HPV), and Epstein‐Barr virus (EBV), DeepVISP achieves high accuracy and robust performance for all three viruses through automatically learning informative features and essential genomic positions only from the DNA sequences. In comparison, DeepVISP outperforms conventional machine learning methods by 8.43–34.33% measured by area under the curve (AUC) value enhancement in three viruses. Moreover, DeepVISP can decode cis‐regulatory factors that are potentially involved in virus integration and tumorigenesis, such as HOXB7, IKZF1, and LHX6. These findings are supported by multiple lines of evidence in the literature. The clustering analysis of the informative motifs reveals that the representative k‐mers in clusters could help guide virus recognition of the host genes. A user‐friendly web server is developed for predicting putative oncogenic VISs in the human genome using DeepVISP.
https://doi.org/10.1002/advs.202004958
cancer
deep learning
EBV
HBV
HPV
viruses
collection DOAJ
language English
format Article
sources DOAJ
author Haodong Xu
Peilin Jia
Zhongming Zhao
title DeepVISP: Deep Learning for Virus Site Integration Prediction and Motif Discovery
title_sort deepvisp: deep learning for virus site integration prediction and motif discovery
publisher Wiley
series Advanced Science
issn 2198-3844
publishDate 2021-05-01
description Abstract Approximately 15% of human cancers are estimated to be attributed to viruses. Virus sequences can be integrated into the host genome, leading to genomic instability and carcinogenesis. Here, a new deep convolutional neural network (CNN) model is developed with attention architecture, namely DeepVISP, for accurately predicting oncogenic virus integration sites (VISs) in the human genome. Using the curated benchmark integration data of three viruses, hepatitis B virus (HBV), human papillomavirus (HPV), and Epstein‐Barr virus (EBV), DeepVISP achieves high accuracy and robust performance for all three viruses through automatically learning informative features and essential genomic positions only from the DNA sequences. In comparison, DeepVISP outperforms conventional machine learning methods by 8.43–34.33% measured by area under the curve (AUC) value enhancement in three viruses. Moreover, DeepVISP can decode cis‐regulatory factors that are potentially involved in virus integration and tumorigenesis, such as HOXB7, IKZF1, and LHX6. These findings are supported by multiple lines of evidence in the literature. The clustering analysis of the informative motifs reveals that the representative k‐mers in clusters could help guide virus recognition of the host genes. A user‐friendly web server is developed for predicting putative oncogenic VISs in the human genome using DeepVISP.
topic cancer
deep learning
EBV
HBV
HPV
viruses
url https://doi.org/10.1002/advs.202004958