Identification of clinical characteristics of large patient cohorts through analysis of free text physician notes

Bibliographic Details
Main Author: Turchin, Alexander
Other Authors: Isaac S. Kohane.
Format: Others
Language: English
Published: Massachusetts Institute of Technology 2006
Online Access: http://hdl.handle.net/1721.1/33085
Description
Summary: Thesis (S.M.)--Harvard-MIT Division of Health Sciences and Technology, 2005.
Includes bibliographical references (p. 31-33).

Background: A number of important applications in medicine and biomedical research, including quality of care surveillance and identification of prospective study subjects, require identification of large cohorts of patients with specific clinical characteristics. Currently used conventional techniques are either labor-intensive or imprecise, while natural language processing-based applications are relatively slow and expensive.

Specific Aims: In this thesis we describe the design and formal evaluation of PACT, a suite of rapid, accurate, and easily portable software tools for identifying patients with specific clinical characteristics through analysis of the text of physician notes in the electronic medical record.

Methods: The PACT algorithm is based on sentence-level semantic analysis. The major steps involve identifying, in the sentences of the physician notes, word tags specific to the clinical characteristic (e.g. the name of the disease or medications used exclusively to treat the disease). Sentences that contain both a word tag and a negative qualifier (e.g. "rule out diabetes") are excluded from consideration. PACT can also identify quantitative (e.g. blood pressure, height, weight) and semi-quantitative (e.g. compliance with medical treatment) clinical characteristics. PACT performance was evaluated against blinded manual chart review (the "gold standard") and against a currently used computational method (analysis of billing data).

Results: Evaluation of PACT demonstrated it to be rapid and highly accurate. PACT processed 6.5 to 8.8 × 10⁵ notes/hour (1.0 to 1.4 GB of text/hour). When compared to the gold standard of manual chart review, PACT sensitivity ranged (depending on the patient characteristic being extracted from the notes) from 74 to 100%, and specificity from 86 to 100%.
The κ statistic for agreement between PACT and manual chart review ranged from 0.67 to 1.0 and in most cases exceeded 0.75, indicating excellent agreement. PACT accuracy substantially exceeded that of the currently used technique (billing data analysis). Finally, the index of patient non-compliance with physician recommendations computed by PACT was shown to correlate with the frequency of annual Emergency Department visits: patients in the highest quartile for the index of non-compliance had 50% more annual visits than patients in the lowest quartile.

Conclusion: PACT is a rapid, precise, and easily portable suite of software tools for extracting focused clinical information from free-text clinical documents. It compares favorably with the computational techniques currently available for the purpose (where such techniques exist). It represents an important advance in the field, and we plan to develop this concept further to improve its performance and functionality.

by Alexander Turchin.
S.M.
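
The summary's Methods section describes sentence-level matching of word tags, with sentences that also carry a negative qualifier excluded. A minimal Python sketch of that idea follows. It is an illustration only, not the PACT implementation: the tag list, the qualifier list (apart from "rule out", which the summary mentions), the naive sentence splitter, and all function names are assumptions introduced here.

```python
import re

# Hypothetical word tags for one characteristic (diabetes); not taken from the thesis.
WORD_TAGS = ("diabetes", "metformin", "insulin")

# Hypothetical negative qualifiers; the summary gives only "rule out" as an example.
NEGATIVE_QUALIFIERS = ("rule out", "no evidence of", "denies")


def split_sentences(note):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def contains_phrase(sentence, phrase):
    """Case-insensitive whole-word (or whole-phrase) match."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, sentence, re.IGNORECASE) is not None


def sentence_is_positive(sentence):
    """A sentence counts only if it has a word tag and no negative qualifier."""
    has_tag = any(contains_phrase(sentence, tag) for tag in WORD_TAGS)
    has_negation = any(contains_phrase(sentence, q) for q in NEGATIVE_QUALIFIERS)
    return has_tag and not has_negation


def note_has_characteristic(note):
    """Flag the note if at least one sentence is a non-negated tag match."""
    return any(sentence_is_positive(s) for s in split_sentences(note))
```

Under these assumptions, `note_has_characteristic("Will rule out diabetes.")` is rejected because the only tagged sentence also contains a negative qualifier, while a note containing "Continues insulin." is flagged. Checking negation per sentence rather than per note keeps an unrelated "denies chest pain" elsewhere in the note from suppressing a genuine match.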