Identifying incident dementia by applying machine learning to a very large administrative claims dataset.

Alzheimer's disease and related dementias (ADRD) are highly prevalent conditions, and prior efforts to develop predictive models have relied on demographic and clinical risk factors using traditional logistic regression methods. We hypothesized that machine-learning algorithms using administr...

Bibliographic Details
Main Authors:	Vijay S Nori, Christopher A Hane, David C Martin, Alexander D Kravetz, Darshak M Sanghavi
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2019-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0203246

id	doaj-3dcd00da34b84ea3a078156c7b24df66
record_format	Article
spelling	doaj-3dcd00da34b84ea3a078156c7b24df66 | 2021-03-03T20:35:18Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2019-01-01 | 14 | 7 | e0203246 | 10.1371/journal.pone.0203246 | Identifying incident dementia by applying machine learning to a very large administrative claims dataset. | Vijay S Nori | Christopher A Hane | David C Martin | Alexander D Kravetz | Darshak M Sanghavi | Alzheimer's disease and related dementias (ADRD) are highly prevalent conditions, and prior efforts to develop predictive models have relied on demographic and clinical risk factors using traditional logistic regression methods. We hypothesized that machine-learning algorithms using administrative claims data may represent a novel approach to predicting ADRD. Using a national de-identified dataset of more than 125 million patients including over 10,000 clinical, pharmaceutical, and demographic variables, we developed a cohort to train a machine learning model to predict ADRD 4-5 years in advance. The Lasso algorithm selected a 50-variable model with an area under the curve (AUC) of 0.693. Top diagnosis codes in the model were memory loss (780.93), Parkinson's disease (332.0), mild cognitive impairment (331.83) and bipolar disorder (296.80), and top pharmacy codes were psychoactive drugs. Machine learning algorithms can rapidly develop predictive models for ADRD with massive datasets, without requiring hypothesis-driven feature engineering. | https://doi.org/10.1371/journal.pone.0203246
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Vijay S Nori Christopher A Hane David C Martin Alexander D Kravetz Darshak M Sanghavi
spellingShingle	Vijay S Nori Christopher A Hane David C Martin Alexander D Kravetz Darshak M Sanghavi Identifying incident dementia by applying machine learning to a very large administrative claims dataset. PLoS ONE
author_facet	Vijay S Nori Christopher A Hane David C Martin Alexander D Kravetz Darshak M Sanghavi
author_sort	Vijay S Nori
title	Identifying incident dementia by applying machine learning to a very large administrative claims dataset.
title_sort	identifying incident dementia by applying machine learning to a very large administrative claims dataset.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2019-01-01
description	Alzheimer's disease and related dementias (ADRD) are highly prevalent conditions, and prior efforts to develop predictive models have relied on demographic and clinical risk factors using traditional logistic regression methods. We hypothesized that machine-learning algorithms using administrative claims data may represent a novel approach to predicting ADRD. Using a national de-identified dataset of more than 125 million patients including over 10,000 clinical, pharmaceutical, and demographic variables, we developed a cohort to train a machine learning model to predict ADRD 4-5 years in advance. The Lasso algorithm selected a 50-variable model with an area under the curve (AUC) of 0.693. Top diagnosis codes in the model were memory loss (780.93), Parkinson's disease (332.0), mild cognitive impairment (331.83) and bipolar disorder (296.80), and top pharmacy codes were psychoactive drugs. Machine learning algorithms can rapidly develop predictive models for ADRD with massive datasets, without requiring hypothesis-driven feature engineering.
url	https://doi.org/10.1371/journal.pone.0203246
work_keys_str_mv	AT vijaysnori identifyingincidentdementiabyapplyingmachinelearningtoaverylargeadministrativeclaimsdataset AT christopherahane identifyingincidentdementiabyapplyingmachinelearningtoaverylargeadministrativeclaimsdataset AT davidcmartin identifyingincidentdementiabyapplyingmachinelearningtoaverylargeadministrativeclaimsdataset AT alexanderdkravetz identifyingincidentdementiabyapplyingmachinelearningtoaverylargeadministrativeclaimsdataset AT darshakmsanghavi identifyingincidentdementiabyapplyingmachinelearningtoaverylargeadministrativeclaimsdataset
_version_	1714821626182238208

Similar Items