A natural language processing approach for identifying temporal disease onset information from mental healthcare text

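Abstract Receiving timely and appropriate treatment is crucial for better health outcomes, and research on the contribution of specific variables is essential. In the mental health domain, an important research variable is the date of psychosis symptom onset, as longer delays in treatment are associated with worse intervention outcomes. The growing adoption of electronic health records (EHRs) within mental health services provides an invaluable opportunity to study this problem at scale retrospectively. However, disease onset information is often only available in open text fields, requiring natural language processing (NLP) techniques for automated analyses. Since this variable can be documented at different points during a patient’s care, NLP methods that model clinical and temporal associations are needed. We address the identification of psychosis onset by: 1) manually annotating a corpus of mental health EHRs with disease onset mentions, 2) modelling the underlying NLP problem as a paragraph classification approach, and 3) combining multiple onset paragraphs at the patient level to generate a ranked list of likely disease onset dates. For 22/31 test patients (71%) the correct onset date was found among the top-3 NLP predictions. The proposed approach was also applied at scale, allowing an onset date to be estimated for 2483 patients.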

Bibliographic Details
Main Authors: Natalia Viani, Riley Botelle, Jack Kerwin, Lucia Yin, Rashmi Patel, Robert Stewart, Sumithra Velupillai
Format: Article
Language: English
Published: Nature Publishing Group 2021-01-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-020-80457-0
id doaj-f5a32a8957e043d9834a6be68505b20b
record_format Article
spelling doaj-f5a32a8957e043d9834a6be68505b20b | 2021-01-17T12:34:23Z | eng | Nature Publishing Group | Scientific Reports | 2045-2322 | 2021-01-01 | 11 | 1 | 1 | 12 | 10.1038/s41598-020-80457-0 | A natural language processing approach for identifying temporal disease onset information from mental healthcare text | Natalia Viani, Riley Botelle, Jack Kerwin, Lucia Yin, Rashmi Patel, Robert Stewart, Sumithra Velupillai (all IoPPN, King’s College London) | https://doi.org/10.1038/s41598-020-80457-0
collection DOAJ
language English
format Article
sources DOAJ
author Natalia Viani
Riley Botelle
Jack Kerwin
Lucia Yin
Rashmi Patel
Robert Stewart
Sumithra Velupillai
title A natural language processing approach for identifying temporal disease onset information from mental healthcare text
publisher Nature Publishing Group
series Scientific Reports
issn 2045-2322
publishDate 2021-01-01
description Abstract Receiving timely and appropriate treatment is crucial for better health outcomes, and research on the contribution of specific variables is essential. In the mental health domain, an important research variable is the date of psychosis symptom onset, as longer delays in treatment are associated with worse intervention outcomes. The growing adoption of electronic health records (EHRs) within mental health services provides an invaluable opportunity to study this problem at scale retrospectively. However, disease onset information is often only available in open text fields, requiring natural language processing (NLP) techniques for automated analyses. Since this variable can be documented at different points during a patient’s care, NLP methods that model clinical and temporal associations are needed. We address the identification of psychosis onset by: 1) manually annotating a corpus of mental health EHRs with disease onset mentions, 2) modelling the underlying NLP problem as a paragraph classification approach, and 3) combining multiple onset paragraphs at the patient level to generate a ranked list of likely disease onset dates. For 22/31 test patients (71%) the correct onset date was found among the top-3 NLP predictions. The proposed approach was also applied at scale, allowing an onset date to be estimated for 2483 patients.
url https://doi.org/10.1038/s41598-020-80457-0