Identifying incident dementia by applying machine learning to a very large administrative claims dataset.
Main Authors: | Vijay S Nori, Christopher A Hane, David C Martin, Alexander D Kravetz, Darshak M Sanghavi |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2019-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0203246 |
id |
doaj-3dcd00da34b84ea3a078156c7b24df66 |
---|---|
record_format |
Article |
spelling |
doaj-3dcd00da34b84ea3a078156c7b24df66; record timestamp 2021-03-03T20:35:18Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2019-01-01; vol. 14, no. 7, e0203246; https://doi.org/10.1371/journal.pone.0203246 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Vijay S Nori, Christopher A Hane, David C Martin, Alexander D Kravetz, Darshak M Sanghavi |
title |
Identifying incident dementia by applying machine learning to a very large administrative claims dataset. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2019-01-01 |
description |
Alzheimer's disease and related dementias (ADRD) are highly prevalent conditions, and prior efforts to develop predictive models have relied on demographic and clinical risk factors analyzed with traditional logistic regression. We hypothesized that machine-learning algorithms applied to administrative claims data may offer a novel approach to predicting ADRD. Using a national de-identified dataset of more than 125 million patients and over 10,000 clinical, pharmaceutical, and demographic variables, we built a cohort to train a machine-learning model that predicts ADRD 4-5 years in advance. The Lasso algorithm selected a 50-variable model with an area under the curve (AUC) of 0.693. The top diagnosis codes (ICD-9-CM) in the model were memory loss (780.93), Parkinson's disease (332.0), mild cognitive impairment (331.83), and bipolar disorder (296.80), and the top pharmacy codes were for psychoactive drugs. Machine-learning algorithms can rapidly develop predictive models for ADRD from massive datasets without requiring hypothesis-driven feature engineering. |
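The record states only that the Lasso "selected a 50-variable model" with an AUC of 0.693; it does not describe the software, penalty grid, or validation scheme. To illustrate what L1-penalized variable selection and AUC mean, here is a minimal pure-Python sketch under stated assumptions: a logistic (binary-outcome) loss, a proximal-gradient (ISTA) solver, a hand-picked penalty grid, and small synthetic data. The helper names (`fit_lasso_logistic`, `select_model`, `auc`) and all parameters are hypothetical; this is not the authors' pipeline, and its training-set AUC on toy data is not comparable to the reported 0.693.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrinks v toward zero and sets small values to exactly 0.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_lasso_logistic(X, y, lam, lr=0.5, iters=200):
    """Hypothetical helper: L1-penalized logistic regression via ISTA.

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r
            for j in range(p):
                grad_w[j] += r * xi[j]
        b -= lr * grad_b / n
        # Gradient step on the log-loss, then the L1 proximal step on each weight.
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(p)]
    return w, b


def auc(scores, labels):
    # Mann-Whitney form of ROC AUC: P(positive scores higher than negative), ties count 1/2.
    # Assumes both classes are present.
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def select_model(X, y, max_vars, lams=(1.0, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01)):
    # Walk an assumed penalty grid from strong to weak; keep the last fit with at most
    # max_vars nonzero weights and stop at the first fit that exceeds that budget.
    best = None
    for lam in lams:
        w, b = fit_lasso_logistic(X, y, lam)
        if sum(1 for wj in w if wj != 0.0) > max_vars:
            break
        best = (lam, w, b)
    return best


# Synthetic data: only features 0 and 1 carry signal.
rng = random.Random(0)
n, p = 200, 6
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [1 if rng.random() < sigmoid(1.5 * x[0] - 1.0 * x[1]) else 0 for x in X]

lam, w, b = select_model(X, y, max_vars=2)
scores = [b + sum(wj * xj for wj, xj in zip(w, x)) for x in X]
print(lam, [round(wj, 2) for wj in w], round(auc(scores, y), 3))
```

The variable budget (`max_vars`) plays the role of the record's "50-variable model" in miniature. Because AUC depends only on the ranking of scores, the linear predictor can be used directly without converting it to probabilities.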
url |
https://doi.org/10.1371/journal.pone.0203246 |