Effect of Sample Size on IRT Equating of Uni-Dimensional Tests in Common Item Non-Equivalent Group Design: A Monte Carlo Simulation Study

Test equating is important to large-scale testing programs for two reasons: strict test security is a key concern for high-stakes tests, and fairness of test equating is important for test takers. The question of adequacy of sample size often arises in test equating. However, mos...


Bibliographic Details
Main Author:	Wang, Xiangrong
Other Authors:	Educational Research and Evaluation
Format:	Others
Published:	Virginia Tech 2014
Subjects:	sample size test equating IRT
Online Access:	http://hdl.handle.net/10919/37555 http://scholar.lib.vt.edu/theses/available/etd-04062012-155934/

id	ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-37555
record_format	oai_dc
spelling	ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-37555 2020-09-26T05:30:39Z Effect of Sample Size on IRT Equating of Uni-Dimensional Tests in Common Item Non-Equivalent Group Design: A Monte Carlo Simulation Study Wang, Xiangrong Educational Research and Evaluation Skaggs, Gary E. Creamer, Elizabeth G. Chang, Mido Kim, Inyoung sample size test equating IRT Ph. D. 2014-03-14T21:10:16Z 2014-03-14T21:10:16Z 2012-03-23 2012-04-06 2012-05-03 2012-05-03 Dissertation etd-04062012-155934 http://hdl.handle.net/10919/37555 http://scholar.lib.vt.edu/theses/available/etd-04062012-155934/ Wang_X_D_2012.pdf.pdf In Copyright http://rightsstatements.org/vocab/InC/1.0/ application/pdf Virginia Tech
collection	NDLTD
format	Others
sources	NDLTD
topic	sample size test equating IRT
spellingShingle	sample size test equating IRT Wang, Xiangrong Effect of Sample Size on IRT Equating of Uni-Dimensional Tests in Common Item Non-Equivalent Group Design: A Monte Carlo Simulation Study
description	Test equating is important to large-scale testing programs for two reasons: strict test security is a key concern for high-stakes tests, and fairness of test equating is important for test takers. The question of adequacy of sample size often arises in test equating. However, most recommendations in the existing literature are based on classical test equating. Very few studies have systematically investigated the minimal sample size that leads to reasonably accurate equating results based on item response theory (IRT). The main purpose of this study was to examine the minimal sample size for desired IRT equating accuracy for the common-item nonequivalent groups design under various conditions. Accuracy was determined by examining the relative magnitude of six accuracy statistics. Two IRT equating methods were carried out on simulated tests with combinations of test length, test format, group ability difference, similarity of form difficulty, and parameter estimation method for 14 sample sizes, using Monte Carlo simulations with 1,000 replications per cell. Observed score equating and true score equating were compared to the criterion equating to obtain the accuracy statistics. The results suggest that different sample size requirements exist for different test lengths, test formats, and parameter estimation methods. Additionally, the results show the following. First, the results for true score equating and observed score equating are very similar. Second, the longer test has less accurate equating than the shorter one at the same sample size, and the gap widens as the sample size decreases. Third, concurrent parameter estimation produced less equating error than separate estimation at the same sample size, and the difference increases as the sample size decreases. Fourth, the cases with different group ability have larger and less stable error than the base case and the cases with different test difficulty, especially when using separate parameter estimation with sample sizes below 750. Last, equating for the mixed-format test is more accurate than for the single-format test at the same sample size. Ph. D.
author2	Educational Research and Evaluation
author_facet	Educational Research and Evaluation Wang, Xiangrong
author	Wang, Xiangrong
author_sort	Wang, Xiangrong
title	Effect of Sample Size on IRT Equating of Uni-Dimensional Tests in Common Item Non-Equivalent Group Design: A Monte Carlo Simulation Study
title_sort	effect of sample size on irt equating of uni-dimensional tests in common item non-equivalent group design: a monte carlo simulation study
publisher	Virginia Tech
publishDate	2014
url	http://hdl.handle.net/10919/37555 http://scholar.lib.vt.edu/theses/available/etd-04062012-155934/
work_keys_str_mv	AT wangxiangrong effectofsamplesizeonirtequatingofunidimensionaltestsincommonitemnonequivalentgroupdesignamontecarlosimulationstudy
_version_	1719340688483549184


Similar Items