A Greedy Algorithm for Representative Sampling: repsample in Stata
Quantitative empirical analyses of a population of interest usually aim to estimate the causal effect of one or more independent variables on a dependent variable. However, only in rare instances is the whole population available for analysis. Researchers tend to estimate causal effects on a selecte...
Main Author: | Evangelos Kontopantelis |
---|---|
Format: | Article |
Language: | English |
Published: | Foundation for Open Access Statistics, 2013-11-01 |
Series: | Journal of Statistical Software |
Online Access: | http://www.jstatsoft.org/index.php/jss/article/view/2110 |
id |
doaj-8be0ccf0150b438abe664c75b8d891ca |
---|---|
doi |
10.18637/jss.v055.c01 |
author |
Evangelos Kontopantelis |
issn |
1548-7660 |
description |
Quantitative empirical analyses of a population of interest usually aim to estimate the causal effect of one or more independent variables on a dependent variable. However, only in rare instances is the whole population available for analysis. Researchers tend to estimate causal effects on a selected sample and generalize their conclusions to the whole population. The validity of this approach rests on the assumption that the sample is representative of the population on certain key characteristics. A study using a non-representative sample is lacking in external validity by failing to minimize population choice bias. When the sample is large and non-response bias is not an issue, a random selection process is adequate to ensure external validity. If that is not the case, however, researchers could follow a more deterministic approach to ensure representativeness on the selected characteristics, provided these are known, or can be estimated, in the parent population. Although such approaches exist for matched sampling designs, research on representative sampling and the similarity between the sample and the parent population seems to be lacking. In this article we propose a greedy algorithm for obtaining a representative sample and quantifying representativeness in Stata. |