Unified health database creation: 125 million brazilian cohort from information systems of hospital, outpatient, births, notifications and mortalities

ABSTRACT Objectives Our objectives were to unify and deduplicate databases of patient registration information from the SUS information systems in Brazil (Hospital, Outpatient, Births, Notifications and Mortalities) for the years 2008-2015, to obtain individualized data and plot patients’ l...

Bibliographic Details
Main Authors: Ramon Pereira; Leonardo Dias; Juliano Ávila; Núbia Santos; Eli Iola Gurgel; Mariangela Leal Cherchiglia; Francisco Acúrcio; Afonso Reis; Wagner Meira, Junior; Augusto Afonso Guerra, Junior
Format: Article
Language: English
Published: Swansea University 2017-04-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/225
id doaj-b9e0a56f14c64d8daf297085469c82a8
record_format Article
spelling doaj-b9e0a56f14c64d8daf297085469c82a8 (2020-11-24T23:56:45Z)
eng; Swansea University; International Journal of Population Data Science; ISSN 2399-4908; 2017-04-01; 1(1); doi:10.23889/ijpds.v1i1.225; 225
Unified health database creation: 125 million brazilian cohort from information systems of hospital, outpatient, births, notifications and mortalities
Ramon Pereira (DCC/UFMG)
Leonardo Dias (CCATES/UFMG)
Juliano Ávila (DAAED)
Núbia Santos (DAAED)
Eli Iola Gurgel (MEDICINA/UFMG)
Mariangela Leal Cherchiglia (MEDICINA/UFMG)
Francisco Acúrcio (FARMACIA/UFMG)
Afonso Reis (DEMAS)
Wagner Meira, Junior (DCC/UFMG)
Augusto Afonso Guerra, Junior (FARMACIA/UFMG)
https://ijpds.org/article/view/225
collection DOAJ
language English
format Article
sources DOAJ
author Ramon Pereira
Leonardo Dias
Juliano Ávila
Núbia Santos
Eli Iola Gurgel
Mariangela Leal Cherchiglia
Francisco Acúrcio
Afonso Reis
Wagner Meira, Junior
Augusto Afonso Guerra, Junior
title Unified health database creation: 125 million brazilian cohort from information systems of hospital, outpatient, births, notifications and mortalities
publisher Swansea University
series International Journal of Population Data Science
issn 2399-4908
publishDate 2017-04-01
description ABSTRACT
Objectives: We aimed to unify and deduplicate patient registration records from the SUS information systems in Brazil (Hospital, Outpatient, Births, Notifications and Mortalities) for the years 2008-2015, in order to obtain individual-level data and trace patients’ lines of care over the period, enabling pharmacoeconomic and epidemiological studies that parameterize the effectiveness and efficiency of public policies and embedded technologies.
Methods: A semantic analysis of the data was performed to describe and understand the different meanings of the fields in the studied databases. Four main procedures were then executed with database operation tools and the PL/SQL programming language: cleaning and standardization of the databases (document numbers were checked against the Brazilian national persons database, with an approximate string matching algorithm deciding whether a document number belonged to the record); extraction of registration information; deterministic deduplication; and probabilistic deduplication. The procedures were first performed on each database separately, and after the records were unified, deterministic deduplication was performed again. The exception was probabilistic deduplication, which was performed only on the final, deterministically deduplicated database. These procedures informed the choice of fields for the data model of the unified database. Nine representative patient-related fields were selected: patient's name; patient's mother’s name; sex; birth date; state; city; zip code; CPF and CNS (Brazilian documents).
Results: The unified registration database initially contained 705,599,785 records; deterministic deduplication reduced this to 198,400,762 records. This reduction is explained by the fact that the source databases are not fully integrated. Moreover, the systems’ semantics do not always agree, and in some cases the data format changes over the period within the same system. After probabilistic deduplication, the number of unique records decreased to 124,545,186, which is explained by pairs that the deterministic process had not linked. This result carries an estimated error of at most 3.3% false-positive pairs and at most 12.3% false-negative pairs.
Conclusion: The results show that data deduplication is necessary and should be carried out thoroughly. Where the databases held limited patient registration information, the technique made it possible to capture additional information on a more complete basis. Furthermore, it helped to identify and understand positive and negative aspects of the systems and to trace patients’ clinical conditions, enabling pharmacoeconomic and epidemiological studies that define the effectiveness and efficiency of public policies and embedded technologies. As future work, it is important to ensure the uniqueness of records and to link this database with earlier periods.
url https://ijpds.org/article/view/225
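
The two deduplication stages named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' PL/SQL implementation: the field names (`name`, `mother_name`, `sex`, `birth_date`), the sample records, the 0.9 threshold and the use of `difflib.SequenceMatcher` are all assumptions made for illustration, and the second stage is a simple similarity-threshold stand-in for probabilistic linkage rather than a weighted model.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Cleaning step: strip accents, uppercase, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text or "")
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.upper().split())


def deterministic_dedup(records):
    """Keep one record per exact match on the normalized key fields."""
    seen = {}
    for rec in records:
        key = (
            normalize(rec["name"]),
            normalize(rec["mother_name"]),
            rec["sex"],
            rec["birth_date"],
        )
        seen.setdefault(key, rec)  # the first record with this key wins
    return list(seen.values())


def similarity(a, b):
    """String similarity in [0, 1] on normalized values."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def probabilistic_dedup(records, threshold=0.9):
    """Greedy stand-in for probabilistic linkage (hypothetical rule):
    a record is dropped when an already kept record has the same sex and
    birth date and the mean name similarity reaches the threshold."""
    kept = []
    for rec in records:
        for other in kept:
            if rec["sex"] != other["sex"] or rec["birth_date"] != other["birth_date"]:
                continue
            score = (
                similarity(rec["name"], other["name"])
                + similarity(rec["mother_name"], other["mother_name"])
            ) / 2
            if score >= threshold:
                break  # treated as the same patient
        else:
            kept.append(rec)  # no sufficiently similar record was found
    return kept


# Invented sample records, for illustration only.
records = [
    {"name": "Maria da Silva", "mother_name": "Ana da Silva", "sex": "F", "birth_date": "1980-05-01"},
    {"name": "MARIA DA SILVA ", "mother_name": "Ana da  Silva", "sex": "F", "birth_date": "1980-05-01"},
    {"name": "Maria da Sliva", "mother_name": "Ana da Silva", "sex": "F", "birth_date": "1980-05-01"},
    {"name": "João Souza", "mother_name": "Rita Souza", "sex": "M", "birth_date": "1975-11-20"},
]
after_deterministic = deterministic_dedup(records)
after_probabilistic = probabilistic_dedup(after_deterministic)
print(len(records), len(after_deterministic), len(after_probabilistic))
```

In this toy data the exact-match stage removes the record that differs only in case and spacing, while the misspelled name survives until the similarity stage, which mirrors why the abstract reports a further reduction after probabilistic deduplication. A pairwise comparison like this would not scale to hundreds of millions of records without blocking or indexing.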