Deep learning identifies erroneous microarray-based, gene-level conclusions in literature

More than 110 000 publications have used microarrays to decipher phenotype-associated genes, clinical biomarkers and gene functions. Microarrays rely on digitally assaying the fluorescence signals of arrays. In this study, we retrospectively constructed raw images for 37 724 published microarray data,...


Bibliographic Details
Main Authors: Chen, X. (Author), Guan, Y. (Author), Qin, Y. (Author), Yi, D. (Author)
Format: Article
Language: English
Published: Oxford University Press 2021
Subjects: algorithm; article; contamination; deep learning; DNA microarray; mining; software
Online Access: View full text at publisher (https://doi.org/10.1093/nargab/lqab089)
LEADER 02048nam a2200253Ia 4500
001 10.1093-nargab-lqab089
008 220427s2021 CNT 000 0 und d
020 |a 26319268 (ISSN) 
245 1 0 |a Deep learning identifies erroneous microarray-based, gene-level conclusions in literature 
260 0 |b Oxford University Press  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1093/nargab/lqab089 
520 3 |a More than 110 000 publications have used microarrays to decipher phenotype-associated genes, clinical biomarkers and gene functions. Microarrays rely on digitally assaying the fluorescence signals of arrays. In this study, we retrospectively constructed raw images for 37 724 published microarray data and developed deep learning algorithms to automatically detect systematic defects. We report that an alarming 26.73% of the microarray-based studies are affected by serious imaging defects. By literature mining, we found that publications associated with these affected microarrays have reported disproportionately more biological discoveries on the genes in the contaminated areas than on other genes. In total, 28.82% of the gene-level conclusions reported in these publications were based on measurements falling into the contaminated area, indicating severe, systematic problems caused by such contamination. We provide the identified problematic published datasets, the affected genes and the imputed arrays, as well as software tools for scanning for such contamination, which will be essential for future studies that scrutinize and critically analyze microarray data. © 2021 The Author(s). Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. 
650 0 4 |a algorithm 
650 0 4 |a article 
650 0 4 |a contamination 
650 0 4 |a deep learning 
650 0 4 |a DNA microarray 
650 0 4 |a mining 
650 0 4 |a software 
700 1 |a Chen, X.  |e author 
700 1 |a Guan, Y.  |e author 
700 1 |a Qin, Y.  |e author 
700 1 |a Yi, D.  |e author 
773 |t NAR Genomics and Bioinformatics