Multiple genome pattern analysis and signature gene identification for the Caucasian lung adenocarcinoma patients with different tobacco exposure patterns
Background: When considering therapies for lung adenocarcinoma (LUAD) patients, the carcinogenic mechanisms in smokers are believed to differ from those in patients who have never smoked. The rising proportion of nonsmokers among LUAD patients urgently requires an understanding of such differences at a molecul...
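Main Authors: | Yan-mei Dong, Li-da Qin, Yi-fan Tong, Qi-en He, Ling Wang, Kai Song |
---|---|
Format: | Article |
Language: | English |
Published: | PeerJ Inc., 2020-01-01 |
Series: | PeerJ |
Subjects: | Multiple genome; Lung adenocarcinoma; Precision Medicine; Tobacco-related signature genes |
Online Access: | https://peerj.com/articles/8349.pdf |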
id |
doaj-efd6aab386d84b178bcd211ec55a53be |
---|---|
record_format |
Article |
spelling |
doaj-efd6aab386d84b178bcd211ec55a53be; 2020-11-25T01:18:00Z; eng; PeerJ Inc.; PeerJ; 2167-8359; 2020-01-01; 8; e8349; 10.7717/peerj.8349; Multiple genome pattern analysis and signature gene identification for the Caucasian lung adenocarcinoma patients with different tobacco exposure patterns; Yan-mei Dong [0], Li-da Qin [1], Yi-fan Tong [2], Qi-en He [3], Ling Wang [4], Kai Song [5]; [0] School of Chemical Engineering and Technology, Tianjin University, Tianjin, China; [1] School of Chemical Engineering and Technology, Tianjin University, Tianjin, China; [2] School of Chemical Engineering and Technology, Tianjin University, Tianjin, China; [3] School of Chemical Engineering and Technology, Tianjin University, Tianjin, China; [4] The First Affiliated Hospital Oncology, Dalian Medical University, Dalian, Liaoning, China; [5] School of Chemical Engineering and Technology, Tianjin University, Tianjin, China; Background: When considering therapies for lung adenocarcinoma (LUAD) patients, the carcinogenic mechanisms in smokers are believed to differ from those in patients who have never smoked. The rising proportion of nonsmokers among LUAD patients urgently requires an understanding of such differences at a molecular level for the development of precision medicine. Methods: Three independent LUAD tumor sample sets (TCGA, SPORE and EDRN) were used. Genome patterns of expression (GE), copy number variation (CNV) and methylation (ME) were reviewed to discover the differences between them for both smokers and nonsmokers. Tobacco-related signature genes distinguishing these two groups of LUAD were identified using the GE, ME and CNV values of the whole genome. To do this, a novel iterative multi-step selection method based on the partial least squares (PLS) algorithm was proposed to overcome the high variable dimension and high noise inherent in the data. This method can thoroughly evaluate the importance of genes according to their statistical differences, biological functions and contributions to the tobacco exposure classification model. The kernel partial least squares (KPLS) method was used to further optimize the accuracies of the classification models. Results: Forty-three, forty-eight and seventy-five genes were identified as GE, ME and CNV signatures, respectively, to distinguish smokers from nonsmokers. Using only the gene expression values of the 43 GE signature genes, the ME values of the 48 ME signature genes or the copy numbers of the 75 CNV signature genes, the accuracies on the TCGA training and SPORE/EDRN independent validation datasets all exceed 76%. More importantly, the focal amplicon in Telomerase Reverse Transcriptase in nonsmokers, the broad deletion in ChrY in male nonsmokers and the greater amplification of MDM2 in female nonsmokers may explain why nonsmokers of both genders tend to develop LUAD. These pattern analysis results may have a clear biological interpretation in the molecular mechanism of tumorigenesis. Meanwhile, the identified signature genes may serve as potential drug targets for the precision medicine of LUAD.; https://peerj.com/articles/8349.pdf; Multiple genome; Lung adenocarcinoma; Precision Medicine; Tobacco-related signature genes |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Yan-mei Dong, Li-da Qin, Yi-fan Tong, Qi-en He, Ling Wang, Kai Song |
author_sort |
Yan-mei Dong |
title |
Multiple genome pattern analysis and signature gene identification for the Caucasian lung adenocarcinoma patients with different tobacco exposure patterns |
publisher |
PeerJ Inc. |
series |
PeerJ |
issn |
2167-8359 |
publishDate |
2020-01-01 |
description |
Background: When considering therapies for lung adenocarcinoma (LUAD) patients, the carcinogenic mechanisms in smokers are believed to differ from those in patients who have never smoked. The rising proportion of nonsmokers among LUAD patients urgently requires an understanding of such differences at a molecular level for the development of precision medicine. Methods: Three independent LUAD tumor sample sets (TCGA, SPORE and EDRN) were used. Genome patterns of expression (GE), copy number variation (CNV) and methylation (ME) were reviewed to discover the differences between them for both smokers and nonsmokers. Tobacco-related signature genes distinguishing these two groups of LUAD were identified using the GE, ME and CNV values of the whole genome. To do this, a novel iterative multi-step selection method based on the partial least squares (PLS) algorithm was proposed to overcome the high variable dimension and high noise inherent in the data. This method can thoroughly evaluate the importance of genes according to their statistical differences, biological functions and contributions to the tobacco exposure classification model. The kernel partial least squares (KPLS) method was used to further optimize the accuracies of the classification models. Results: Forty-three, forty-eight and seventy-five genes were identified as GE, ME and CNV signatures, respectively, to distinguish smokers from nonsmokers. Using only the gene expression values of the 43 GE signature genes, the ME values of the 48 ME signature genes or the copy numbers of the 75 CNV signature genes, the accuracies on the TCGA training and SPORE/EDRN independent validation datasets all exceed 76%. More importantly, the focal amplicon in Telomerase Reverse Transcriptase in nonsmokers, the broad deletion in ChrY in male nonsmokers and the greater amplification of MDM2 in female nonsmokers may explain why nonsmokers of both genders tend to develop LUAD. These pattern analysis results may have a clear biological interpretation in the molecular mechanism of tumorigenesis. Meanwhile, the identified signature genes may serve as potential drug targets for the precision medicine of LUAD. |
topic |
Multiple genome; Lung adenocarcinoma; Precision Medicine; Tobacco-related signature genes |
url |
https://peerj.com/articles/8349.pdf |
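The description above mentions an iterative multi-step selection method based on partial least squares (PLS), but this record gives no algorithmic detail. The following is therefore only a generic sketch of one common way to do PLS-driven iterative feature elimination, not the authors' procedure. It assumes a binary smoker/nonsmoker label, ranks features by the widely used VIP (variable importance in projection) score, and drops the lowest-ranked half on each pass. The function names `pls1_vip`, `iterative_select` and `make_toy_data`, the drop fraction and the toy data are all hypothetical. The statistical and biological filtering steps and the KPLS classifier named in the description are not modeled here.

```python
import math
import random


def pls1_vip(X, y, n_components=2):
    """Fit single-response PLS (NIPALS) and return one VIP score per column."""
    n, p = len(X), len(X[0])
    # Center every column of X and the response y.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]

    weights, explained = [], []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance with the residual y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        if tt < 1e-12:
            break
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part captured by this component.
        for i in range(n):
            for j in range(p):
                E[i][j] -= t[i] * load[j]
            f[i] -= q * t[i]
        weights.append(w)
        explained.append(q * q * tt)  # y sum of squares explained

    total = sum(explained)
    if total == 0:
        return [0.0] * p
    return [
        math.sqrt(p * sum(s * wa[j] ** 2 for s, wa in zip(explained, weights)) / total)
        for j in range(p)
    ]


def iterative_select(X, y, n_keep, drop_fraction=0.5, n_components=2):
    """Refit PLS repeatedly, dropping the lowest-VIP columns until n_keep remain."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        sub = [[row[j] for j in kept] for row in X]
        vip = pls1_vip(sub, y, min(n_components, len(kept)))
        order = sorted(range(len(kept)), key=lambda k: vip[k], reverse=True)
        n_next = max(n_keep, int(len(kept) * (1 - drop_fraction)))
        n_next = min(n_next, len(kept) - 1)  # always make progress
        kept = sorted(kept[k] for k in order[:n_next])
    return kept


def make_toy_data(n_samples=60, n_features=30, informative=(0, 1, 2), seed=0):
    """Hypothetical data: Gaussian noise, with a class shift on a few columns."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]  # 1 = smoker, 0 = nonsmoker (toy labels)
    X = [
        [rng.gauss(0, 1) + (3.0 * y[i] if j in informative else 0.0)
         for j in range(n_features)]
        for i in range(n_samples)
    ]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("columns kept:", iterative_select(X, y, n_keep=3))
```

On this toy data the three shifted columns should be the ones that survive. Refitting after each elimination pass, rather than ranking once, lets the importance scores adjust as noisy columns are removed, which is the usual motivation for iterative schemes on high-dimensional, noisy data of the kind the description refers to.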