Detecting Fraudulent Interviewers by Improved Clustering Methods – The Case of Falsifications of Answers to Parts of a Questionnaire

Falsified interviews represent a serious threat to empirical research based on survey data. The identification of such cases is important to ensure data quality. Applying cluster analysis to a set of indicators helps to identify suspicious interviewers when a substantial share of all of their interv...

Bibliographic Details
Main Authors: De Haas Samuel, Winker Peter
Format: Article
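Language: English
Published: Sciendo 2016-09-01
Series: Journal of Official Statistics
Subjects: survey data falsifications, partial falsifications, cluster analysis, constraint cluster analysis, bootstrap
Online Access: https://doi.org/10.1515/jos-2016-0033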
id doaj-751d198b697f48bab3bb10933f53aebe
record_format Article
spelling doaj-751d198b697f48bab3bb10933f53aebe
2021-09-06T19:40:52Z
eng
Sciendo
Journal of Official Statistics
2001-7367
2016-09-01
volume 32, issue 3, pages 643-660
10.1515/jos-2016-0033
jos-2016-0033
Detecting Fraudulent Interviewers by Improved Clustering Methods – The Case of Falsifications of Answers to Parts of a Questionnaire
De Haas Samuel
Winker Peter
University of Giessen, Chair of Industrial Organisation, Regulation and Antitrust, and Chair of Statistics and Econometrics, Licher Strasse 64, 35394 Giessen, Germany (both authors).
Falsified interviews represent a serious threat to empirical research based on survey data. The identification of such cases is important to ensure data quality. Applying cluster analysis to a set of indicators helps to identify suspicious interviewers when a substantial share of all of their interviews are complete falsifications, as shown by previous research. This analysis is extended to the case when only a share of questions within all interviews provided by an interviewer is fabricated. The assessment is based on synthetic datasets with a priori set properties. These are constructed from a unique experimental dataset containing both real and fabricated data for each respondent. Such a bootstrap approach makes it possible to evaluate the robustness of the method when the share of fabricated answers per interview decreases. The results indicate a substantial loss of discriminatory power in the standard cluster analysis if the share of fabricated answers within an interview becomes small. Using a novel cluster method which allows imposing constraints on cluster sizes, performance can be improved, in particular when only few falsifiers are present. This new approach will help to increase the robustness of survey data by detecting potential falsifiers more reliably.
https://doi.org/10.1515/jos-2016-0033
survey data falsifications
partial falsifications
cluster analysis
constraint cluster analysis
bootstrap
collection DOAJ
language English
format Article
sources DOAJ
author De Haas Samuel
Winker Peter
title Detecting Fraudulent Interviewers by Improved Clustering Methods – The Case of Falsifications of Answers to Parts of a Questionnaire
publisher Sciendo
series Journal of Official Statistics
issn 2001-7367
publishDate 2016-09-01
description Falsified interviews represent a serious threat to empirical research based on survey data. The identification of such cases is important to ensure data quality. Applying cluster analysis to a set of indicators helps to identify suspicious interviewers when a substantial share of all of their interviews are complete falsifications, as shown by previous research. This analysis is extended to the case when only a share of questions within all interviews provided by an interviewer is fabricated. The assessment is based on synthetic datasets with a priori set properties. These are constructed from a unique experimental dataset containing both real and fabricated data for each respondent. Such a bootstrap approach makes it possible to evaluate the robustness of the method when the share of fabricated answers per interview decreases. The results indicate a substantial loss of discriminatory power in the standard cluster analysis if the share of fabricated answers within an interview becomes small. Using a novel cluster method which allows imposing constraints on cluster sizes, performance can be improved, in particular when only few falsifiers are present. This new approach will help to increase the robustness of survey data by detecting potential falsifiers more reliably.
topic survey data falsifications
partial falsifications
cluster analysis
constraint cluster analysis
bootstrap
url https://doi.org/10.1515/jos-2016-0033