A machine learned classifier that uses gene expression data to accurately predict estrogen receptor status.

BACKGROUND: Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard for determining this status, immunohistochemical analysis of formalin-fixed paraffin embedded samples, suffers from numerous technic...

Bibliographic Details
Main Authors: Meysam Bastani, Larissa Vos, Nasimeh Asgarian, Jean Deschenes, Kathryn Graham, John Mackey, Russell Greiner
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2013-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3846850?pdf=render
id doaj-5c6469c1f5424ab9b95e6e52aa4e3386
record_format Article
spelling doaj-5c6469c1f5424ab9b95e6e52aa4e3386
2020-11-25T00:44:09Z
eng
Public Library of Science (PLoS)
PLoS ONE
1932-6203
2013-01-01
8
12
e82144
10.1371/journal.pone.0082144
collection DOAJ
language English
format Article
sources DOAJ
author Meysam Bastani
Larissa Vos
Nasimeh Asgarian
Jean Deschenes
Kathryn Graham
John Mackey
Russell Greiner
title A machine learned classifier that uses gene expression data to accurately predict estrogen receptor status.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2013-01-01
description BACKGROUND: Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard for determining this status, immunohistochemical analysis of formalin-fixed paraffin embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER-status based on RNA expression can provide more objective, quantitative and reproducible test results.
METHODS: To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER-status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin fixed tumor.
RESULTS: This produced a three-gene classifier that can predict the ER-status of a novel tumor, with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER-status.
CONCLUSIONS: Our efficient and parsimonious classifier lends itself to high throughput, highly accurate and low-cost RNA-based assessments of ER-status, suitable for routine high-throughput clinical use. This analytic method provides a proof-of-principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions.
url http://europepmc.org/articles/PMC3846850?pdf=render