Detecting Fraudulent Interviewers by Improved Clustering Methods – The Case of Falsifications of Answers to Parts of a Questionnaire
Falsified interviews represent a serious threat to empirical research based on survey data. The identification of such cases is important to ensure data quality. Applying cluster analysis to a set of indicators helps to identify suspicious interviewers when a substantial share of all of their interv...
Main Authors: | De Haas Samuel, Winker Peter |
---|---|
Format: | Article |
Language: | English |
Published: | Sciendo, 2016-09-01 |
Series: | Journal of Official Statistics |
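Subjects: | survey data falsifications; partial falsifications; cluster analysis; constraint cluster analysis; bootstrap |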
Online Access: | https://doi.org/10.1515/jos-2016-0033 |
id |
doaj-751d198b697f48bab3bb10933f53aebe |
---|---|
record_format |
Article |
spelling |
doaj-751d198b697f48bab3bb10933f53aebe; 2021-09-06T19:40:52Z; eng; Sciendo; Journal of Official Statistics; 2001-7367; 2016-09-01; 32(3), 643-660; 10.1515/jos-2016-0033; jos-2016-0033; Detecting Fraudulent Interviewers by Improved Clustering Methods – The Case of Falsifications of Answers to Parts of a Questionnaire; De Haas Samuel; Winker Peter; University of Giessen, Chair of Industrial Organisation, Regulation and Antitrust, and Chair of Statistics and Econometrics, Licher Strasse 64, 35394 Giessen, Germany (both authors). Falsified interviews represent a serious threat to empirical research based on survey data. The identification of such cases is important to ensure data quality. Applying cluster analysis to a set of indicators helps to identify suspicious interviewers when a substantial share of all of their interviews are complete falsifications, as shown by previous research. This analysis is extended to the case when only a share of questions within all interviews provided by an interviewer is fabricated. The assessment is based on synthetic datasets with a priori set properties. These are constructed from a unique experimental dataset containing both real and fabricated data for each respondent. Such a bootstrap approach makes it possible to evaluate the robustness of the method when the share of fabricated answers per interview decreases. The results indicate a substantial loss of discriminatory power in the standard cluster analysis if the share of fabricated answers within an interview becomes small. Using a novel cluster method which allows imposing constraints on cluster sizes, performance can be improved, in particular when only few falsifiers are present. This new approach will help to increase the robustness of survey data by detecting potential falsifiers more reliably. https://doi.org/10.1515/jos-2016-0033; survey data falsifications; partial falsifications; cluster analysis; constraint cluster analysis; bootstrap |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
De Haas Samuel; Winker Peter |
title |
Detecting Fraudulent Interviewers by Improved Clustering Methods – The Case of Falsifications of Answers to Parts of a Questionnaire |
publisher |
Sciendo |
series |
Journal of Official Statistics |
issn |
2001-7367 |
publishDate |
2016-09-01 |
description |
Falsified interviews represent a serious threat to empirical research based on survey data. The identification of such cases is important to ensure data quality. Applying cluster analysis to a set of indicators helps to identify suspicious interviewers when a substantial share of all of their interviews are complete falsifications, as shown by previous research. This analysis is extended to the case when only a share of questions within all interviews provided by an interviewer is fabricated. The assessment is based on synthetic datasets with a priori set properties. These are constructed from a unique experimental dataset containing both real and fabricated data for each respondent. Such a bootstrap approach makes it possible to evaluate the robustness of the method when the share of fabricated answers per interview decreases. The results indicate a substantial loss of discriminatory power in the standard cluster analysis if the share of fabricated answers within an interview becomes small. Using a novel cluster method which allows imposing constraints on cluster sizes, performance can be improved, in particular when only few falsifiers are present. This new approach will help to increase the robustness of survey data by detecting potential falsifiers more reliably. |
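The idea of clustering interviewers under a constraint on cluster size, as mentioned in the description, can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a brute-force two-cluster split that minimises the within-cluster sum of squares while capping the size of the "suspicious" cluster. The function names, the `max_small` parameter, and the indicator values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def wcss(points):
    """Within-cluster sum of squared distances to the cluster mean."""
    if not points:
        return 0.0
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return sum((p[d] - mean[d]) ** 2 for p in points for d in range(dim))


def constrained_two_clusters(indicators, max_small):
    """Split interviewers into two clusters by exhaustive search.

    The candidate 'suspicious' cluster may hold between 1 and max_small
    interviewers; the split with the lowest total WCSS is returned.
    Only feasible for small numbers of interviewers (assumption).
    """
    names = list(indicators)
    best_cost, best_group = float("inf"), ()
    for size in range(1, max_small + 1):
        for group in combinations(names, size):
            rest = [n for n in names if n not in group]
            cost = (wcss([indicators[n] for n in group])
                    + wcss([indicators[n] for n in rest]))
            if cost < best_cost:
                best_cost, best_group = cost, group
    return set(best_group), best_cost


# Made-up indicator values (two indicators per interviewer), not real data.
example = {
    "A": (0.10, 0.20),
    "B": (0.12, 0.18),
    "C": (0.11, 0.22),
    "D": (0.09, 0.21),
    "E": (0.80, 0.90),
}
suspects, cost = constrained_two_clusters(example, max_small=2)
print(suspects, cost)
```

Capping the size of one cluster encodes the prior belief that falsifiers are few. An unconstrained two-cluster split has no such prior, which is plausibly why it loses power when the indicator values of falsifiers and honest interviewers overlap.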
topic |
survey data falsifications; partial falsifications; cluster analysis; constraint cluster analysis; bootstrap |
url |
https://doi.org/10.1515/jos-2016-0033 |