Ontology-guided Health Information Extraction, Organization, and Exploration
Main Author: | Cui, Licong |
---|---|
Language: | English |
Published: | Case Western Reserve University School of Graduate Studies / OhioLINK, 2014 |
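Subjects: | Computer Science; information extraction; information retrieval; faceted search; patient cohort identification; multi-topic assignment; formal concept analysis; conjunctive exploratory navigation interface; crowdsourcing |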
Online Access: | http://rave.ohiolink.edu/etdc/view?acc_num=case1401709795 |
id | ndltd-OhioLink-oai-etd.ohiolink.edu-case1401709795 |
---|---|
record_format | oai_dc |
collection | NDLTD |
topic | Computer Science; information extraction; information retrieval; faceted search; patient cohort identification; multi-topic assignment; formal concept analysis; conjunctive exploratory navigation interface; crowdsourcing |
author | Cui, Licong |
datestamp | 2021-08-03T06:25:17Z |
abstract |
Unstructured and semi-structured electronic information in health and healthcare has been generated steadily for decades, and its volume has grown explosively since the recent adoption of electronic health records (EHRs). This textual information includes clinical notes recorded in hospitals as well as health-related information on the web, and it contains an extraordinary amount of underutilized biomedical knowledge. However, the proliferation of such data presents a myriad of challenges for information retrieval and access. Manually reviewing protected clinical documents to find patient cohorts of interest is time-consuming and cumbersome. Consumers, in turn, are overwhelmed by the ever-growing volume of public health information on the Internet: traditional keyword-based search engines such as Google can return hundreds of thousands of links, of which only a few may be relevant. Effective querying and exploration of both protected and public health data therefore require new approaches to information extraction, organization, and exploration.
This dissertation proposes an ontology-guided approach to health information extraction, organization, and exploration. The approach extracts key information from textual data, organizes it in structured formats, and provides interfaces for searching and exploring it effectively.
The approach is applied to two independent but related domains: (1) extracting complex epilepsy phenotypes from narrative clinical discharge summaries to query patient cohorts effectively; and (2) organizing consumer health questions in NetWellness, an online non-profit community service that provides high-quality health information, based on biomedical concepts extracted from those questions, to support effective consumer health information retrieval and exploration. For (1), a prototype Epilepsy Data Extraction and Annotation (EpiDEA) system is developed to process discharge summaries and automatically extract each patient's sex, age, epileptogenic zone, etiology, EEG pattern, current antiepileptic medication, and past antiepileptic medication. A second system, Phenotype Extraction in Epilepsy (PEEP), is developed to extract complex epilepsy phenotypes and their correlated anatomical locations from narrative discharge summaries and to store them as structured information. Both EpiDEA and PEEP use the Epilepsy and Seizure Ontology (EpSO) as their primary knowledge source for regular-expression-based recognition of epilepsy named entities. A parametric and dynamic faceted search interface (PaDyF) is developed for querying the extracted epilepsy data; it combines the benefits of faceted search, database queries, and ontological attributes and structures for exploring clinical patient data. Evaluated against manually created reference standards, EpiDEA achieves an overall precision of 0.936, recall of 0.840, and F1-measure of 0.885. For extracting epilepsy phenotypes, PEEP achieves a precision of 0.924, recall of 0.931, and F1-measure of 0.927; for extracting correlated phenotypes and anatomical locations, it achieves a precision of 0.852, recall of 0.859, and F1-measure of 0.856.
These evaluations demonstrate that EpiDEA is effective in extracting basic phenotypic characteristics and that PEEP is effective in extracting complex epilepsy phenotypes and correlated anatomical locations.
For (2), key biomedical concepts are extracted from NetWellness health questions and used to categorize the questions into multiple topics. A new multi-topic assignment method is introduced that combines Formal Concept Analysis (FCA) with semantic annotation using the Unified Medical Language System (UMLS). Evaluated against a manually created reference standard, the method attains an example-based precision of 0.849, recall of 0.774, and F1-measure of 0.782. A novel Conjunctive Exploratory Navigation Interface (CENI) is developed for exploring NetWellness health questions. It presents health topics as dynamic, searchable menus that complement keyword-based search. CENI is compared against mainstream search modalities in a search-interface evaluation crowdsourced through Amazon Mechanical Turk (AMT), a new and valuable method for collecting user evaluation data. In that evaluation, CENI is favored over Google and other search methods by a margin of nearly two to one. |
date | 2014-09-02 |
access | unrestricted |
rights | This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws. |
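The abstract states only that EpiDEA and PEEP perform regular-expression-based named entity recognition with EpSO as the primary knowledge source. It does not specify the patterns or the matching criteria used in evaluation. The following is therefore a minimal sketch of the general technique under stated assumptions. A small hypothetical lexicon stands in for ontology terms; its entries are illustrative and are not taken from EpSO. The names `LEXICON`, `extract`, `f1_score`, and `evaluate` are also hypothetical. The sketch also shows the standard set-based precision, recall, and F1 computation. As a check, plugging the reported EpiDEA precision (0.936) and recall (0.840) into `f1_score` reproduces the reported F1-measure of 0.885.

```python
import re

# HYPOTHETICAL lexicon standing in for ontology-derived terms: each preferred
# term maps to surface forms. Illustrative only, NOT taken from EpSO.
LEXICON: dict[str, tuple[str, ...]] = {
    "Generalized tonic-clonic seizure": ("generalized tonic-clonic seizure", "GTC seizure", "GTCS"),
    "Complex partial seizure": ("complex partial seizure", "CPS"),
    "Temporal lobe": ("temporal lobe",),
}


def build_pattern(lexicon: dict[str, tuple[str, ...]]) -> re.Pattern:
    """Compile one case-insensitive alternation of all surface forms.

    Longer forms are listed first, so the longest synonym wins at a given
    position. An optional trailing "s" tolerates simple plurals.
    """
    forms = sorted({f for fs in lexicon.values() for f in fs}, key=len, reverse=True)
    alternation = "|".join(re.escape(f) for f in forms)
    return re.compile(rf"\b(?P<term>{alternation})s?\b", re.IGNORECASE)


def extract(text: str, lexicon: dict[str, tuple[str, ...]]) -> list[tuple[str, int, int]]:
    """Return (preferred term, start, end) for each lexicon match in `text`."""
    lookup = {f.lower(): concept for concept, fs in lexicon.items() for f in fs}
    return [
        (lookup[m.group("term").lower()], m.start(), m.end())
        for m in build_pattern(lexicon).finditer(text)
    ]


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def evaluate(predicted: set, reference: set) -> tuple[float, float, float]:
    """Precision, recall, and F1 of predicted annotations against a reference standard."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall, f1_score(precision, recall)
```

For example, on the sentence "Patient had two generalized tonic-clonic seizures and frequent CPS arising from the left temporal lobe.", `extract` returns the three preferred terms in textual order. A production system would also need to handle negation, abbreviations that collide with ordinary words, and links between phenotypes and anatomical locations. This sketch attempts none of these.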
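The abstract does not describe how FCA is combined with UMLS annotation for multi-topic assignment. The following is therefore only a generic sketch, not the dissertation's method. It shows FCA's two derivation operators over a question-by-topic incidence table, a naive enumeration of formal concepts, and how the same operators can drive one conjunctive navigation step of the kind CENI's name suggests. Selecting topics narrows the result set to questions carrying all of them, and the remaining co-occurring topics form the next menu. The toy context, topic names, and functions (`extent`, `intent`, `formal_concepts`, `navigate`) are hypothetical.

```python
from collections import Counter

# HYPOTHETICAL toy formal context (NOT NetWellness data): each question
# (object) maps to the set of health topics (attributes) assigned to it.
CONTEXT: dict[str, frozenset[str]] = {
    "q1": frozenset({"diabetes", "diet"}),
    "q2": frozenset({"diabetes", "medication"}),
    "q3": frozenset({"diabetes", "diet", "exercise"}),
    "q4": frozenset({"heart disease", "exercise"}),
}


def all_topics(context: dict[str, frozenset[str]]) -> frozenset[str]:
    """The full attribute set M of the formal context."""
    return frozenset().union(*context.values())


def extent(topics: frozenset[str], context: dict[str, frozenset[str]]) -> frozenset[str]:
    """Derivation B -> B': questions tagged with every topic in `topics`."""
    return frozenset(q for q, attrs in context.items() if topics <= attrs)


def intent(questions: frozenset[str], context: dict[str, frozenset[str]]) -> frozenset[str]:
    """Derivation A -> A': topics shared by every question in `questions` (M if empty)."""
    shared = all_topics(context)
    for q in questions:
        shared &= context[q]
    return shared


def formal_concepts(context: dict[str, frozenset[str]]) -> set[tuple[frozenset[str], frozenset[str]]]:
    """Naive enumeration: concept intents are M plus all intersections of object intents."""
    intents = {all_topics(context)}
    for attrs in context.values():
        intents |= {attrs & existing for existing in intents}
    return {(extent(b, context), b) for b in intents}


def navigate(selected: frozenset[str], context: dict[str, frozenset[str]]) -> tuple[frozenset[str], Counter]:
    """One conjunctive navigation step: matching questions plus a refinement menu with counts."""
    matches = extent(selected, context)
    menu = Counter(t for q in matches for t in context[q] if t not in selected)
    return matches, menu
```

In this toy context, `navigate(frozenset({"diabetes"}), CONTEXT)` matches q1, q2, and q3 and offers a menu of diet (2), medication (1), and exercise (1). Every question about diet here also concerns diabetes, so the closure `intent(extent(frozenset({"diet"}), CONTEXT), CONTEXT)` is {diabetes, diet}. The intersection-based enumeration is exponential in the worst case. It is meant to make the definitions concrete, not to scale.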