The SAIL databank: linking multiple health and social care datasets

Abstract. Background: Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and resear...

Bibliographic Details
Main Authors: Ford David V, Verplancke Jean-Philippe, Brooks Caroline J, John Gareth, Jones Kerina H, Lyons Ronan A, Brown Ginevra, Leake Ken
Format: Article
Language: English
Published: BMC 2009-01-01
Series: BMC Medical Informatics and Decision Making
Online Access: http://www.biomedcentral.com/1472-6947/9/3
id doaj-d8a1169d3a6a4618bbc2a68efb3bc408
record_format Article
spelling doaj-d8a1169d3a6a4618bbc2a68efb3bc408; 2020-11-25T00:12:01Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2009-01-01; 9; 1; 3; 10.1186/1472-6947-9-3; The SAIL databank: linking multiple health and social care datasets; Ford David V; Verplancke Jean-Philippe; Brooks Caroline J; John Gareth; Jones Kerina H; Lyons Ronan A; Brown Ginevra; Leake Ken; http://www.biomedcentral.com/1472-6947/9/3
collection DOAJ
language English
format Article
sources DOAJ
publisher BMC
series BMC Medical Informatics and Decision Making
issn 1472-6947
publishDate 2009-01-01
description Abstract
Background: Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and research. But in order to achieve this, it is essential that the ability to link the data at the individual record level be retained whilst adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage) databank has been established using disparate datasets, and over 500 million records from multiple health and social care service providers have been loaded to date, with further growth in progress.
Methods: Having established the infrastructure of the databank, the aim of this work was to develop and implement an accurate matching process to enable the assignment of a unique Anonymous Linking Field (ALF) to person-based records to make the databank ready for record-linkage research studies. An SQL-based matching algorithm (MACRAL, Matching Algorithm for Consistent Results in Anonymised Linkage) was developed for this purpose. Firstly the suitability of using a valid NHS number as the basis of a unique identifier was assessed using MACRAL. Secondly, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR), to assess the efficacy of this process, and the optimum matching technique.
Results: The validation of using the NHS number yielded specificity values > 99.8% and sensitivity values > 94.6% using probabilistic record linkage (PRL) at the 50% threshold, and error rates were < 0.2%. A range of techniques for matching datasets to the NHSAR were applied and the optimum technique resulted in sensitivity values of: 99.9% for a GP dataset from primary care, 99.3% for a PEDW dataset from secondary care and 95.2% for the PARIS database from social care.
Conclusion: With the infrastructure that has been put in place, the reliable matching process that has been developed enables an ALF to be consistently allocated to records in the databank. The SAIL databank represents a research-ready platform for record-linkage studies.
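The abstract reports sensitivity and specificity for matching at a 50% threshold but gives no implementation details of MACRAL. The Python sketch below is only an illustration of how such figures could be derived, not the authors' SQL algorithm. It swaps in a simplified weighted-agreement score for true probabilistic record linkage, and the record fields, the weights in `FIELD_WEIGHTS`, and all function names are hypothetical assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonRecord:
    """Hypothetical set of matching variables; not taken from the paper."""
    surname: str
    first_name: str
    date_of_birth: str  # ISO date string, e.g. "1970-01-31"
    postcode: str


# Assumed agreement weights in percentage points (they sum to 100).
# Real probabilistic linkage would estimate weights from the data.
FIELD_WEIGHTS = {
    "surname": 30,
    "first_name": 20,
    "date_of_birth": 30,
    "postcode": 20,
}


def match_score(a: PersonRecord, b: PersonRecord) -> int:
    """Sum the weights of all fields on which the two records agree exactly."""
    return sum(
        weight
        for name, weight in FIELD_WEIGHTS.items()
        if getattr(a, name) == getattr(b, name)
    )


def is_match(a: PersonRecord, b: PersonRecord, threshold: int = 50) -> bool:
    """Declare a match when the score reaches the threshold (50 by default)."""
    return match_score(a, b) >= threshold


def sensitivity_specificity(outcomes: list[tuple[bool, bool]]) -> tuple[float, float]:
    """Compute (sensitivity, specificity) from (predicted, actual) match pairs.

    Assumes the list holds at least one true match and one true non-match,
    otherwise a division by zero occurs.
    """
    tp = sum(1 for predicted, actual in outcomes if predicted and actual)
    fn = sum(1 for predicted, actual in outcomes if not predicted and actual)
    tn = sum(1 for predicted, actual in outcomes if not predicted and not actual)
    fp = sum(1 for predicted, actual in outcomes if predicted and not actual)
    return tp / (tp + fn), tn / (tn + fp)
```

Sensitivity here is the share of true matches that the rule finds, and specificity is the share of true non-matches that it correctly rejects. Integer weights are used so that the threshold comparison is not affected by floating-point rounding.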
url http://www.biomedcentral.com/1472-6947/9/3