ConGEMs: Condensed Gene Co-Expression Module Discovery Through Rule-Based Clustering and Its Application to Carcinogenesis

Bibliographic Details
Main Authors:	Saurav Mallik, Zhongming Zhao
Format:	Article
Language:	English
Published:	MDPI AG 2017-12-01
Series:	Genes
Subjects:	gene co-expression modules; Limma; association rule mining; dynamic tree cut method; gene expression markers; lung squamous cell carcinoma
Online Access:	https://www.mdpi.com/2073-4425/9/1/7

Affiliations:	Saurav Mallik: Department of Computer Science & Engineering, Aliah University, Newtown, WB-700156, India; Zhongming Zhao: Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
DOI:	10.3390/genes9010007
Citation:	Genes, Volume 9, Issue 1, Article 7
collection	DOAJ
issn	2073-4425
description	A large body of microarray-based genomic data is available for transcriptomic analysis, much of it generated for cancer research. The typical analysis measures, for each transcript or gene, the difference between a cancer sample group and a matched control group. Association rule mining discovers interesting item sets through a rule-based methodology, so it can help find causal relationships between transcripts. In this work, we introduce two new rule-based similarity measures, the weighted rank-based Jaccard and Cosine measures, and propose a novel computational framework that detects condensed gene co-expression modules (ConGEMs) using an association rule-based learning system and the weighted similarity scores. In practice, the resulting list of condensed markers, which includes both singular and complex markers, is determined by the condensed gene sets in either the antecedent or the consequent of the rules in the resulting modules. In our evaluation, these markers could be supported by literature evidence, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways and Gene Ontology annotations. Specifically, we first identified differentially expressed genes using an empirical Bayes test. A recently developed algorithm, RANWAR, was then used to derive association rules from these genes. Based on these rules, we computed integrated similarity scores from the rule-based similarity measures for each rule pair and clustered the resulting scores to identify co-expressed rule modules. We applied our method to a gene expression dataset for lung squamous cell carcinoma and a genome methylation dataset for uterine cervical carcinogenesis. Our module discovery method produced better results than traditional gene-module discovery measures. In summary, the proposed rule-based method is useful for exploring biomarker modules in transcriptomic data.
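The abstract describes scoring every pair of association rules with rule-based similarity measures and then clustering those scores into rule modules. The sketch below illustrates only the shape of that pipeline, under loudly stated assumptions. It uses plain, unweighted Jaccard and cosine similarity over each rule's gene set instead of the paper's weighted rank-based measures, whose exact definitions are not given in this record. It averages the two as a hypothetical "integrated" score. It also substitutes single-linkage clustering cut at a fixed threshold for the dynamic tree cut method. `Rule`, `integrated_similarity` and `rule_modules` are hypothetical names, and `G1` to `G6` are placeholder genes.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations
from math import sqrt


@dataclass(frozen=True)
class Rule:
    """An association rule antecedent -> consequent over gene symbols."""
    antecedent: frozenset[str]
    consequent: frozenset[str]

    @property
    def genes(self) -> frozenset[str]:
        # All genes mentioned by the rule, on either side.
        return self.antecedent | self.consequent


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Unweighted Jaccard index |A & B| / |A | B| (not the paper's weighted variant)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a: frozenset[str], b: frozenset[str]) -> float:
    """Cosine similarity of the binary gene-indicator vectors of A and B."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def integrated_similarity(r1: Rule, r2: Rule) -> float:
    """Hypothetical integration: the plain mean of the Jaccard and cosine scores."""
    return (jaccard(r1.genes, r2.genes) + cosine(r1.genes, r2.genes)) / 2


def rule_modules(rules: list[Rule], threshold: float) -> list[list[int]]:
    """Group rule indices whose pairwise integrated similarity reaches `threshold`.

    Stand-in for dynamic tree cut: equivalent to cutting a single-linkage
    dendrogram at distance 1 - threshold, i.e. modules are the connected
    components of the thresholded similarity graph (found with union-find).
    """
    parent = list(range(len(rules)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(rules)), 2):
        if integrated_similarity(rules[i], rules[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(rules)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Usage with placeholder genes (not real markers).
rules = [
    Rule(frozenset({"G1"}), frozenset({"G2"})),
    Rule(frozenset({"G1", "G2"}), frozenset({"G3"})),
    Rule(frozenset({"G4"}), frozenset({"G5"})),
    Rule(frozenset({"G5"}), frozenset({"G4", "G6"})),
]
print(rule_modules(rules, threshold=0.5))  # [[0, 1], [2, 3]]
```

Raising the threshold splits modules and lowering it merges them; a dynamic tree cut instead adapts the cut height to the dendrogram's branch shapes, which this fixed-threshold sketch does not attempt.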

Similar Items