Multiple genome pattern analysis and signature gene identification for the Caucasian lung adenocarcinoma patients with different tobacco exposure patterns

Bibliographic Details
Main Authors: Yan-mei Dong, Li-da Qin, Yi-fan Tong, Qi-en He, Ling Wang, Kai Song
Format: Article
Language: English
Published: PeerJ Inc. 2020-01-01
Series: PeerJ
Subjects: Multiple genome; Lung adenocarcinoma; Precision Medicine; Tobacco-related signature genes
Online Access:https://peerj.com/articles/8349.pdf
id doaj-efd6aab386d84b178bcd211ec55a53be
spelling PeerJ Inc., PeerJ (ISSN 2167-8359), 2020-01-01, vol. 8, article e8349, DOI 10.7717/peerj.8349 (DOAJ timestamp 2020-11-25T01:18:00Z)
Affiliations: Yan-mei Dong, Li-da Qin, Yi-fan Tong, Qi-en He and Kai Song, School of Chemical Engineering and Technology, Tianjin University, Tianjin, China; Ling Wang, The First Affiliated Hospital Oncology, Dalian Medical University, Dalian, Liaoning, China
collection DOAJ
sources DOAJ
issn 2167-8359
description Background: When considering therapies for lung adenocarcinoma (LUAD), the carcinogenic mechanisms in smokers are believed to differ from those in patients who have never smoked. Because the proportion of nonsmokers among LUAD patients is rising, these differences urgently need to be understood at the molecular level to support the development of precision medicine.
Methods: Three independent LUAD tumor sample sets (TCGA, SPORE and EDRN) were used. Genome-wide patterns of gene expression (GE), copy number variation (CNV) and methylation (ME) were compared between smokers and nonsmokers. Tobacco-related signature genes distinguishing the two groups were identified from the genome-wide GE, ME and CNV values. To this end, a novel iterative multi-step selection method based on the partial least squares (PLS) algorithm was proposed to cope with the high dimensionality and high noise of the data. The method evaluates the importance of each gene according to its statistical difference between groups, its biological function and its contribution to the tobacco exposure classification model. The kernel partial least squares (KPLS) method was then used to further improve the accuracy of the classification models.
Results: Forty-three, forty-eight and seventy-five genes were identified as GE, ME and CNV signatures, respectively, distinguishing smokers from nonsmokers. Using only the expression values of the 43 GE signature genes, the methylation values of the 48 ME signature genes or the copy numbers of the 75 CNV signature genes, classification accuracy exceeded 76% on the TCGA training set and on the independent SPORE and EDRN validation sets. More importantly, the focal amplification of telomerase reverse transcriptase (TERT) in nonsmokers, the broad deletion of chromosome Y in male nonsmokers and the greater amplification of MDM2 in female nonsmokers may explain why nonsmokers of both sexes develop LUAD. These patterns may have a clear biological interpretation in terms of the molecular mechanisms of tumorigenesis, and the identified signature genes may serve as potential drug targets for precision medicine in LUAD.
topic Multiple genome
Lung adenocarcinoma
Precision Medicine
Tobacco-related signature genes
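The abstract names an iterative, multi-step gene-selection method built on partial least squares (PLS) but does not give its details. The Python below is a minimal sketch under stated assumptions, not the authors' algorithm. It shows one common PLS-based variant: a PLS1 model fitted by NIPALS, genes ranked by variable importance in projection (VIP, a standard PLS heuristic swapped in here), and the lowest-ranked genes dropped over repeated refits. The function names, the 0/1 smoker coding with a 0.5 threshold, the 20% drop fraction and the synthetic data are all hypothetical. The biological-function filter and the KPLS refinement described in the abstract are not modelled.

```python
import numpy as np

# Hypothetical sketch, NOT the authors' published method: generic PLS1 with
# VIP-based backward elimination, demonstrated on synthetic data only.


def pls1_nipals(X, y, n_components):
    """PLS1 (one response) via NIPALS; X and y must already be mean-centred."""
    X, y = np.array(X, dtype=float), np.array(y, dtype=float)  # work on copies
    n, p = X.shape
    W = np.zeros((p, n_components))  # unit-norm weight vectors
    P = np.zeros((p, n_components))  # X loadings
    T = np.zeros((n, n_components))  # X scores
    q = np.zeros(n_components)       # y loadings
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        P[:, a] = X.T @ t / tt
        q[a] = y @ t / tt
        X -= np.outer(t, P[:, a])  # deflate X by the extracted component
        y -= q[a] * t              # deflate y likewise
        W[:, a], T[:, a] = w, t
    return W, P, q, T


def fit_pls(X, y, n_components=2):
    x_mean, y_mean = X.mean(axis=0), y.mean()
    W, P, q, T = pls1_nipals(X - x_mean, y - y_mean, n_components)
    B = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients in X space
    return {"x_mean": x_mean, "y_mean": y_mean, "B": B, "W": W, "q": q, "T": T}


def predict(model, X):
    return (X - model["x_mean"]) @ model["B"] + model["y_mean"]


def vip_scores(model):
    """Variable importance in projection: one score per column of X."""
    W, q, T = model["W"], model["q"], model["T"]
    ss = q**2 * np.sum(T**2, axis=0)  # y sum of squares explained per component
    p = W.shape[0]
    # W columns already have unit norm, so W**2 holds the normalised weights.
    return np.sqrt(p * ((W**2) @ ss) / ss.sum())


def iterative_vip_selection(X, y, n_keep, n_components=2, drop_frac=0.2):
    """Refit PLS and drop the lowest-VIP genes until n_keep genes remain."""
    selected = np.arange(X.shape[1])
    while selected.size > n_keep:
        vip = vip_scores(fit_pls(X[:, selected], y, n_components))
        n_drop = min(max(1, int(drop_frac * selected.size)), selected.size - n_keep)
        keep = np.argsort(vip)[n_drop:]  # discard the n_drop smallest VIPs
        selected = np.sort(selected[keep])
    return selected


# Synthetic demo: 80 samples x 200 "genes"; only genes 0-4 differ by class.
rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 40)  # 0 = nonsmoker, 1 = smoker (synthetic labels)
X = rng.normal(size=(80, 200))
X[:, :5] += 3.0 * y[:, None]
genes = iterative_vip_selection(X, y, n_keep=10)
model = fit_pls(X[:, genes], y)
acc = np.mean((predict(model, X[:, genes]) > 0.5) == (y == 1))
print(genes, acc)
```

Dropping a fixed fraction of genes per round keeps the number of refits small when starting from a genome-wide matrix; accuracy here is measured on the training data only, whereas the study reports accuracy on independent validation sets as well.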