Identification of clinical characteristics of large patient cohorts through analysis of free text physician notes

Thesis (S.M.)--Harvard-MIT Division of Health Sciences and Technology, 2005. Includes bibliographical references (p. 31-33). Background: A number of important applications in medicine and biomedical research, including quality of care surveillance and identification of prospective study subje...


Bibliographic Details
Main Author: Turchin, Alexander
Other Authors: Isaac S. Kohane.
Format: Others
Language: English
Published: Massachusetts Institute of Technology 2006
Subjects:
Online Access: http://hdl.handle.net/1721.1/33085
id ndltd-MIT-oai-dspace.mit.edu-1721.1-33085
record_format oai_dc
spelling ndltd-MIT-oai-dspace.mit.edu-1721.1-33085 2019-05-02T16:33:34Z Identification of clinical characteristics of large patient cohorts through analysis of free text physician notes Turchin, Alexander Isaac S. Kohane. Harvard University--MIT Division of Health Sciences and Technology. Thesis (S.M.)--Harvard-MIT Division of Health Sciences and Technology, 2005. Includes bibliographical references (p. 31-33). by Alexander Turchin. S.M. 2006-06-19T17:39:09Z 2006-06-19T17:39:09Z 2005 2005 Thesis http://hdl.handle.net/1721.1/33085 62172055 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 33 p. 1929512 bytes 1928597 bytes application/pdf Massachusetts Institute of Technology
collection NDLTD
language English
format Others
sources NDLTD
topic Harvard University--MIT Division of Health Sciences and Technology.
spellingShingle Harvard University--MIT Division of Health Sciences and Technology.
Turchin, Alexander
Identification of clinical characteristics of large patient cohorts through analysis of free text physician notes
description Thesis (S.M.)--Harvard-MIT Division of Health Sciences and Technology, 2005. Includes bibliographical references (p. 31-33).

Background: A number of important applications in medicine and biomedical research, including quality of care surveillance and identification of prospective study subjects, require identification of large cohorts of patients with specific clinical characteristics. Currently used conventional techniques are either labor-intensive or imprecise, while natural language processing-based applications are relatively slow and expensive.

Specific Aims: In this thesis we describe the design and formal evaluation of PACT, a suite of rapid, accurate, and easily portable software tools for identification of patients with specific clinical characteristics through analysis of the text of physician notes in the electronic medical record.

Methods: The PACT algorithm is based on sentence-level semantic analysis. The major steps involve identification of word tags (e.g. the name of the disease or medications used exclusively to treat the disease) specific for the clinical characteristics in the sentences of the physician notes. Sentences with word tags and negative qualifiers (e.g. "rule out diabetes") are excluded from consideration. PACT can also identify quantitative (e.g. blood pressure, height, weight) and semi-quantitative (e.g. compliance with medical treatment) clinical characteristics. PACT performance was evaluated against blinded manual chart review (the "gold standard") and currently used computational methods (analysis of billing data).

Results: Evaluation of PACT demonstrated it to be rapid and highly accurate. PACT processed 6.5 to 8.8 × 10⁵ notes/hour (1.0 to 1.4 GB of text/hour). When compared to the gold standard of manual chart review, PACT sensitivity ranged (depending on the patient characteristic being extracted from the notes) from 74 to 100%, and specificity from 86 to 100%. The kappa (κ) statistic for agreement between PACT and manual chart review ranged from 0.67 to 1.0 and in most cases exceeded 0.75, indicating excellent agreement. PACT accuracy substantially exceeded the performance of currently used techniques (billing data analysis). Finally, the index of patient non-compliance with physician recommendations computed by PACT was shown to correlate with the frequency of annual Emergency Department visits: patients in the highest quartile for the index of non-compliance had 50% as many annual visits as the patients in the lowest quartile.

Conclusion: PACT is a rapid, precise and easily portable suite of software tools for extracting focused clinical information out of free text clinical documents. It compares favorably with computational techniques currently available for the purpose (where such techniques exist). It represents an important advance in the field, and we plan to continue to develop this concept further to improve its performance and functionality.

by Alexander Turchin. S.M.
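The sentence-level approach and the evaluation measures named in the abstract can be illustrated with a minimal Python sketch. This is not the PACT implementation: the tag list, the negative-qualifier list, the naive sentence splitter, and the function names `split_sentences`, `note_has_characteristic`, and `agreement` are all assumptions made for illustration, and the record does not specify how PACT actually performs these steps.

```python
import re

# Hypothetical word tags and negative qualifiers, chosen for illustration only;
# the thesis's actual lists are not given in this record.
WORD_TAGS = ("diabetes", "metformin", "insulin")
NEGATIVE_QUALIFIERS = ("rule out", "no evidence of", "denies")


def split_sentences(note: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def note_has_characteristic(note: str) -> bool:
    """True if any sentence contains a word tag and no negative qualifier."""
    for sentence in split_sentences(note):
        lowered = sentence.lower()
        has_tag = any(tag in lowered for tag in WORD_TAGS)
        negated = any(q in lowered for q in NEGATIVE_QUALIFIERS)
        if has_tag and not negated:
            return True
    return False


def agreement(predicted: list[bool], reference: list[bool]) -> dict[str, float]:
    """Sensitivity, specificity and Cohen's kappa for two binary label lists.

    Assumes both classes occur in `reference` and that chance agreement is
    below 1; otherwise a division by zero is raised.
    """
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    tn = sum((not p) and (not r) for p, r in pairs)
    fp = sum(p and (not r) for p, r in pairs)
    fn = sum((not p) and r for p, r in pairs)
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (observed - expected) / (1 - expected),
    }


if __name__ == "__main__":
    notes = [
        "Patient has diabetes. Continue metformin.",
        "Rule out diabetes.",
        "No complaints today.",
    ]
    labels = [note_has_characteristic(n) for n in notes]
    print(labels)
    print(agreement(labels, [True, False, False]))
```

Plain substring matching keeps the sketch short, but it would also match tags inside longer words and would discard a whole sentence when a qualifier negates only part of it; a real system would need tokenization and scoped negation handling.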
author2 Isaac S. Kohane.
author_facet Isaac S. Kohane.
Turchin, Alexander
author Turchin, Alexander
author_sort Turchin, Alexander
title Identification of clinical characteristics of large patient cohorts through analysis of free text physician notes
title_short Identification of clinical characteristics of large patient cohorts through analysis of free text physician notes
title_full Identification of clinical characteristics of large patient cohorts through analysis of free text physician notes
title_fullStr Identification of clinical characteristics of large patient cohorts through analysis of free text physician notes
title_full_unstemmed Identification of clinical characteristics of large patient cohorts through analysis of free text physician notes
title_sort identification of clinical characteristics of large patient cohorts through analysis of free text physician notes
publisher Massachusetts Institute of Technology
publishDate 2006
url http://hdl.handle.net/1721.1/33085
work_keys_str_mv AT turchinalexander identificationofclinicalcharacteristicsoflargepatientcohortsthroughanalysisoffreetextphysiciannotes
_version_ 1719042979370369024