Regression analysis with linked data: problems and possible solutions

Bibliographic Details
Main Authors: Andrea Tancredi, Brunero Liseo
Format: Article
Language: English
Published: University of Bologna, 2015-03-01
Series: Statistica
Online Access: http://rivista-statistica.unibo.it/article/view/5821
Description
Summary: In this paper we have described and extended some recent proposals on a general Bayesian methodology for performing record linkage and making inference using the resulting matched units. In particular, we have framed the record linkage process within a formal statistical model that comprises both the matching variables and the other variables included at the inferential stage. In this way, the researcher is able to account for the uncertainty of the matching process in inferential procedures based on probabilistically linked data and, at the same time, to generate a feedback propagation of information between the working statistical model and the record linkage stage. We have argued that this feedback effect is essential for eliminating potential biases that would otherwise affect inference based on the linked data, and that it can also improve record linkage performance. The practical implementation of the procedure is based on standard Bayesian computational techniques, such as Markov chain Monte Carlo (MCMC) algorithms. Although the methodology is quite general, for expository convenience we have restricted our analysis to the popular and important case of the multiple linear regression set-up.
ISSN: 0390-590X, 1973-2201
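The summary describes the approach only in words. As a rough illustration of how a regression model can feed back into the linkage step inside an MCMC sampler, the Python sketch below implements a deliberately simplified toy version; it is not the authors' specification. It assumes, hypothetically, two files of equal size in which every record of file B has exactly one match in file A, a single continuous matching variable observed with Gaussian error, a uniform prior over linkages, a flat prior on the regression coefficients, and known variances. The names `log_joint`, `draw_beta` and `link_and_regress` are illustrative only. Each iteration alternates Metropolis moves that swap two links, scored by both the matching variable and the regression residuals, with a conjugate draw of the coefficients given the current linkage.

```python
import numpy as np

# Toy sketch, NOT the paper's model: equal-size files, every record in file B
# has one true match in file A, one Gaussian-noise matching variable, a flat
# prior on the coefficients and known variances. All names are hypothetical.


def log_joint(perm, zA, zB, x, y, beta, sigma_z, sigma_y):
    """Log density (up to a constant) of linkage `perm` and coefficients `beta`.

    perm[j] is the index of the file-A record linked to file-B record j.
    The key term scores the matching variable; the regression term lets the
    working model feed back into which links are plausible.
    """
    key_term = -0.5 * np.sum((zB - zA[perm]) ** 2) / sigma_z**2
    resid = y - beta[0] - beta[1] * x[perm]
    reg_term = -0.5 * np.sum(resid**2) / sigma_y**2
    return key_term + reg_term


def draw_beta(perm, x, y, sigma_y, rng):
    """Conjugate draw of (intercept, slope) given the current linkage.

    With a flat prior and known sigma_y the posterior is
    N(beta_hat, sigma_y^2 (X'X)^{-1}).
    """
    X = np.column_stack([np.ones(len(y)), x[perm]])
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)
    return rng.multivariate_normal(beta_hat, sigma_y**2 * np.linalg.inv(XtX))


def link_and_regress(zA, zB, x, y, sigma_z, sigma_y, n_iter=300, burn_in=100, seed=0):
    """Metropolis-within-Gibbs over (linkage, beta); returns beta draws and last linkage."""
    rng = np.random.default_rng(seed)
    n = len(zB)
    # Start by pairing records according to the rank of their keys.
    perm = np.empty(n, dtype=int)
    perm[np.argsort(zB)] = np.argsort(zA)
    beta = draw_beta(perm, x, y, sigma_y, rng)
    current = log_joint(perm, zA, zB, x, y, beta, sigma_z, sigma_y)
    draws = []
    for it in range(n_iter):
        # One sweep of symmetric swap proposals: exchange the partners of two B-records.
        for _ in range(n):
            j, k = rng.choice(n, size=2, replace=False)
            proposal = perm.copy()
            proposal[j], proposal[k] = perm[k], perm[j]
            cand = log_joint(proposal, zA, zB, x, y, beta, sigma_z, sigma_y)
            if np.log(rng.uniform()) < cand - current:
                perm, current = proposal, cand
        # Gibbs step: redraw the coefficients given the current linkage.
        beta = draw_beta(perm, x, y, sigma_y, rng)
        current = log_joint(perm, zA, zB, x, y, beta, sigma_z, sigma_y)
        if it >= burn_in:
            draws.append(beta)
    return np.array(draws), perm


# Simulated example: x lives only in file A, y only in file B.
rng = np.random.default_rng(1)
n = 50
zA = rng.normal(0.0, 5.0, n)                             # matching variable in file A
x = rng.normal(0.0, 1.0, n)                              # covariate in file A
true_perm = rng.permutation(n)                           # unknown true linkage
zB = zA[true_perm] + rng.normal(0.0, 0.05, n)            # noisy key in file B
y = 1.0 + 2.0 * x[true_perm] + rng.normal(0.0, 1.0, n)  # response in file B

draws, perm = link_and_regress(zA, zB, x, y, sigma_z=0.05, sigma_y=1.0)
print("posterior mean (intercept, slope):", draws.mean(axis=0))
print("share of correct links in last draw:", np.mean(perm == true_perm))
```

Because the regression term enters the acceptance ratio, links that fit the regression badly are less likely to be retained, and averaging the coefficient draws over sampled linkages carries linkage uncertainty into the inference on the slope. A realistic implementation would also need to handle non-matching records, categorical matching variables and unknown variances.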