Improving opportunities for data linkage within Children Looked After administrative records in Wales

Bibliographic Details
Published in: International Journal of Population Data Science
Main Authors: Grace Bailey, Alex Lee, Saira Ahmed, Ieuan Scanlon, Laura Cowley, Amy Stuart, Ian Farr, Caroline Brooks, Laura North, Lucy Griffiths
Format: Article
Language: English
Published: Swansea University 2025-02-01
Online Access: https://ijpds.org/article/view/2383
Summary: Introduction: Linkage of population-based administrative data is a powerful tool for studying important public issues. To address confidentiality and disclosure concerns, records are de-identified and allocated a unique identifier. Within the Secure Anonymised Information Linkage (SAIL) Databank, these identifiers are known as Anonymised Linking Fields (ALFs). Assigning an ALF enables linkage of individuals across multiple routinely collected datasets. Within the Children Looked After (CLA) Wales dataset, only 37% of children have an ALF, which limits linkage to other datasets and, as a result, potential research. There are also other known data issues, including discrepancies in week of birth, duplicate identifiers and year-on-year changes in identifiers. Objectives: To improve the accuracy and availability of ALFs in the CLA dataset, and thereby overall research quality. Methods: Using several datasets within the SAIL Databank, we developed a six-step CLA matching algorithm to improve the ALF matching rate and correct data errors. To assess the performance of our algorithm, we benchmarked it against routine ALFs already identified by the algorithm currently used by SAIL. Results: Our algorithm increased ALF matching by 25%, assigning 61% of individuals an ALF. Inconsistent weeks of birth and incorrect and duplicate identifiers were resolved. When benchmarked against the current ALF-assigning algorithm used by SAIL, our algorithm had an overall sensitivity of 90%. Conclusion: We have developed an algorithm that demonstrates ALF matching performance comparable to the current algorithm used within SAIL and that greatly improves ALF matching in the CLA dataset. This algorithm may help to overcome potential bias due to missing data and increases the potential for linkage to other datasets. Further development and refinement could allow the algorithm to be applied to other datasets in SAIL.
ISSN:2399-4908
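
The benchmarking step mentioned in the summary reports a sensitivity figure. As a rough illustration only, the sketch below shows one common way such a figure could be computed: the share of records with a reference identifier for which a candidate algorithm assigns the same identifier. The function name `benchmark_sensitivity`, the record keys and the toy data are all hypothetical, and the exact definition used by the authors is not given in this record.

```python
from typing import Mapping, Optional


def benchmark_sensitivity(
    reference: Mapping[str, str],
    candidate: Mapping[str, Optional[str]],
) -> float:
    """Share of reference-linked records that the candidate links identically.

    `reference` maps a record key to the identifier assigned by the existing
    (benchmark) method; `candidate` maps the same keys to the identifier
    assigned by the new method, or None when no match was found.
    This definition is an assumption made for illustration.
    """
    if not reference:
        raise ValueError("reference set is empty")
    # A true positive is a record where both methods agree on the identifier.
    true_positives = sum(
        1 for key, ident in reference.items() if candidate.get(key) == ident
    )
    return true_positives / len(reference)


# Made-up toy data, not taken from the study.
routine = {"r1": "A", "r2": "B", "r3": "C", "r4": "D"}
new = {"r1": "A", "r2": "B", "r3": "X", "r4": None, "r5": "E"}

sensitivity = benchmark_sensitivity(routine, new)
print(f"{sensitivity:.0%}")  # 50%
```

Records that only the candidate method links (such as `r5` above) do not affect this measure, which is why a sensitivity figure is usually read alongside the overall matching rate.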