A Greedy Algorithm for Representative Sampling: repsample in Stata

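Quantitative empirical analyses of a population of interest usually aim to estimate the causal effect of one or more independent variables on a dependent variable. However, only in rare instances is the whole population available for analysis. Researchers tend to estimate causal effects on a selected sample and generalize their conclusions to the whole population. The validity of this approach rests on the assumption that the sample is representative of the population on certain key characteristics. A study using a non-representative sample is lacking in external validity by failing to minimize population choice bias. When the sample is large and non-response bias is not an issue, a random selection process is adequate to ensure external validity. If that is not the case, however, researchers could follow a more deterministic approach to ensure representativeness on the selected characteristics, provided these are known, or can be estimated, in the parent population. Although such approaches exist for matched sampling designs, research on representative sampling and the similarity between the sample and the parent population seems to be lacking. In this article we propose a greedy algorithm for obtaining a representative sample and quantifying representativeness in Stata.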

Bibliographic Details
Main Author: Evangelos Kontopantelis
Format: Article
Language: English
Published: Foundation for Open Access Statistics 2013-11-01
Series: Journal of Statistical Software
Online Access: http://www.jstatsoft.org/index.php/jss/article/view/2110
id doaj-8be0ccf0150b438abe664c75b8d891ca
record_format Article
spelling doaj-8be0ccf0150b438abe664c75b8d891ca | 2020-11-24T23:24:37Z | eng | Foundation for Open Access Statistics | Journal of Statistical Software | 1548-7660 | 2013-11-01 | 55 | 1 | 1 | 19 | 10.18637/jss.v055.c01 | 714 | A Greedy Algorithm for Representative Sampling: repsample in Stata | Evangelos Kontopantelis | http://www.jstatsoft.org/index.php/jss/article/view/2110
collection DOAJ
language English
format Article
sources DOAJ
author Evangelos Kontopantelis
title A Greedy Algorithm for Representative Sampling: repsample in Stata
publisher Foundation for Open Access Statistics
series Journal of Statistical Software
issn 1548-7660
publishDate 2013-11-01
description Quantitative empirical analyses of a population of interest usually aim to estimate the causal effect of one or more independent variables on a dependent variable. However, only in rare instances is the whole population available for analysis. Researchers tend to estimate causal effects on a selected sample and generalize their conclusions to the whole population. The validity of this approach rests on the assumption that the sample is representative of the population on certain key characteristics. A study using a non-representative sample is lacking in external validity by failing to minimize population choice bias. When the sample is large and non-response bias is not an issue, a random selection process is adequate to ensure external validity. If that is not the case, however, researchers could follow a more deterministic approach to ensure representativeness on the selected characteristics, provided these are known, or can be estimated, in the parent population. Although such approaches exist for matched sampling designs, research on representative sampling and the similarity between the sample and the parent population seems to be lacking. In this article we propose a greedy algorithm for obtaining a representative sample and quantifying representativeness in Stata.
url http://www.jstatsoft.org/index.php/jss/article/view/2110