A NOVEL DIMENSIONALITY REDUCTION APPROACH TO IMPROVE MICROARRAY DATA CLASSIFICATION

Predicting and diagnosing cancer tumors at an early stage has become a necessity in cancer research, because it increases the chances of successful treatment. Recently, DNA microarray technology has become a powerful tool for cancer identification that can analyze the expression level of a different...


Bibliographic Details
Main Authors: Mohammed Hamim, Ismail El Mouden, Mounir Ouzir, Hicham Moutachaouik, Mustapha Hain
Format: Article
Language: English
Published: IIUM Press, International Islamic University Malaysia 2021-01-01
Series: International Islamic University Malaysia Engineering Journal
Subjects:
Online Access: https://journals.iium.edu.my/ejournal/index.php/iiumej/article/view/1447
id doaj-b4a92afcd745452a8b7c6b419c3c179b
record_format Article
spelling doaj-b4a92afcd745452a8b7c6b419c3c179b
2021-01-24T04:01:57Z
eng
IIUM Press, International Islamic University Malaysia
International Islamic University Malaysia Engineering Journal
1511-788X
2289-7860
2021-01-01
Volume 22, Issue 1
10.31436/iiumej.v22i1.1447
A NOVEL DIMENSIONALITY REDUCTION APPROACH TO IMPROVE MICROARRAY DATA CLASSIFICATION
Mohammed Hamim (I2SI2E Laboratory, ENSAM-Casablanca)
Ismail El Mouden (EVMS-Sentara Healthcare Analytics and Delivery Science Institute, Eastern Virginia Medical School, Norfolk, VA, USA)
Mounir Ouzir (Group of Research in Physiology and Physiopathology, Department of Biology, Faculty of Science, University Mohammed V, Rabat, Morocco)
Hicham Moutachaouik (I2SI2E Laboratory, ENSAM-Casablanca, University Hassan II, Casablanca, Morocco)
Mustapha Hain (I2SI2E Laboratory, ENSAM-Casablanca, University Hassan II, Casablanca, Morocco)
https://journals.iium.edu.my/ejournal/index.php/iiumej/article/view/1447
Gene Selection
Metaheuristic-Ant Colony Optimization
Feature Extraction
Pattern Recognition
Microarray Data Analysis
collection DOAJ
language English
format Article
sources DOAJ
author Mohammed Hamim
Ismail El Mouden
Mounir Ouzir
Hicham Moutachaouik
Mustapha Hain
author_sort Mohammed Hamim
title A NOVEL DIMENSIONALITY REDUCTION APPROACH TO IMPROVE MICROARRAY DATA CLASSIFICATION
publisher IIUM Press, International Islamic University Malaysia
series International Islamic University Malaysia Engineering Journal
issn 1511-788X
2289-7860
publishDate 2021-01-01
description Predicting and diagnosing cancer tumors at an early stage has become a necessity in cancer research because it increases the chances of successful treatment. DNA microarray technology has recently become a powerful tool for cancer identification, since it can analyze the expression levels of a huge number of different genes simultaneously. In microarray data, the large number of genes relative to the small number of records may degrade prediction performance. To handle this "curse of dimensionality" in microarray datasets while improving cancer identification performance, a dimensionality reduction phase is necessary. In this paper, we propose a framework that combines dimensionality reduction methods and machine learning algorithms to achieve the best cancer prediction performance on different microarray datasets. The dimensionality reduction phase combines feature selection and feature extraction techniques: Pearson correlation and Ant Colony Optimization are used to select the most important genes, and Principal Component Analysis and Kernel Principal Component Analysis are used to transform the selected genes, linearly and non-linearly respectively, into a new reduced space. In the cancer identification phase, we use four algorithms: C5.0, Logistic Regression, Artificial Neural Network, and Support Vector Machine. Experimental results demonstrate that the framework performs effectively and competitively compared with state-of-the-art methods.
ABSTRAK (translated from Malay): Predicting and diagnosing cancer tumors at an early stage has become a necessity in cancer research because it opens up the chance of greater treatment success. Recently, DNA microarray technology has become a powerful tool for identifying cancer, as it can analyze varied expression levels and a large number of genes simultaneously. In microarray data, this large number of genes, compared with the limited number of records, affects prediction performance. A dimensionality reduction phase is needed to handle the "curse of dimensionality" constraint of microarray datasets while strengthening the effectiveness of cancer identification. This study proposes a framework that combines dimensionality reduction methods and machine learning algorithms to achieve the best cancer prediction performance using various microarray datasets. In the dimensionality reduction phase, a combination of feature selection and feature extraction techniques is proposed: Pearson and Ant Colony Optimization to select the most important genes, and Principal Component Analysis and Kernel Principal Component Analysis to transform the selected genes, linearly and non-linearly, into a new reduced space. In the cancer identification phase, this study proposes four algorithms, namely C5.0, Logistic Regression, Artificial Neural Network, and Support Vector Machine. The findings show that the framework is effective and competitive compared with current methods.
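The two-stage reduction described in the abstract (select genes, then project them into a smaller space) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only a Pearson-based filter followed by the first principal component, and it omits Ant Colony Optimization, Kernel PCA, and the classifiers. The function names (`pearson`, `select_genes`, `first_principal_component`), the use of power iteration, and the toy expression matrix are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sd_x * sd_y)


def select_genes(X, y, k):
    """Indices of the k genes whose expression correlates most with the labels."""
    n_genes = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:k]


def first_principal_component(X, iters=200):
    """First principal direction and sample scores, found by power iteration."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Z = [[row[j] - means[j] for j in range(d)] for row in X]  # center columns
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit-length starting vector
    for _ in range(iters):
        proj = [sum(z[j] * v[j] for j in range(d)) for z in Z]  # Z v
        w = [sum(Z[i][j] * proj[i] for i in range(n)) for j in range(d)]  # Z^T Z v
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            break
        v = [c / norm for c in w]
    scores = [sum(z[j] * v[j] for j in range(d)) for z in Z]
    return v, scores


# Hypothetical toy data: 6 samples (rows) x 4 genes (columns), two classes.
X = [
    [1.0, 5.0, 4.0, 2.0],
    [1.2, 4.0, 4.1, 3.0],
    [0.9, 6.0, 3.9, 2.5],
    [3.1, 5.0, 1.0, 2.6],
    [2.9, 4.0, 1.2, 3.1],
    [3.0, 6.0, 0.8, 2.4],
]
y = [0, 0, 0, 1, 1, 1]

selected = select_genes(X, y, k=2)                  # feature selection
X_reduced = [[row[j] for j in selected] for row in X]
component, scores = first_principal_component(X_reduced)  # feature extraction

print("selected genes:", selected)
print("scores on first component:", [round(s, 3) for s in scores])
```

In this toy matrix, genes 0 and 2 track the class label while genes 1 and 3 do not, so the filter keeps the first pair and the single extracted component separates the two classes. A real microarray dataset has thousands of genes, and the resulting scores would be passed to a classifier such as those named in the abstract.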
topic Gene Selection
Metaheuristic-Ant Colony Optimization
Feature Extraction
Pattern Recognition
Microarray Data Analysis
url https://journals.iium.edu.my/ejournal/index.php/iiumej/article/view/1447