Evaluation of Statistical Data Quality in the Problem of Calculating the Integral Characteristic of a System for a Number of Observations

The construction of the composite index of a system can be considered as a problem of separating signal from noise. The signal in this case is the weight coefficients of the linear convolution of indicators. The weights to be determined should reflect the structure of the system being evaluated. How...


Bibliographic Details
Main Author: Tatyana Zhgun
Format: Article
Language: Russian
Published: The Fund for Promotion of Internet media, IT education, human development «League Internet Media» 2020-09-01
Series: Современные информационные технологии и IT-образование
Subjects: composite index; data quality; data errors; principal component analysis; method of finite differences
Online Access: http://sitito.cs.msu.ru/index.php/SITITO/article/view/633
id doaj-c8912d17f01a42528d9f5fa9373b2f99
record_format Article
spelling doaj-c8912d17f01a42528d9f5fa9373b2f99
2021-08-10T12:44:53Z
rus
The Fund for Promotion of Internet media, IT education, human development «League Internet Media»
Современные информационные технологии и IT-образование
ISSN 2411-1473
2020-09-01
volume 16, issue 2, pages 295-303
DOI 10.25559/SITITO.16.202002.295-303
Evaluation of Statistical Data Quality in the Problem of Calculating the Integral Characteristic of a System for a Number of Observations
Tatyana Zhgun
https://orcid.org/0000-0002-7518-6925
Yaroslav-the-Wise Novgorod State University
http://sitito.cs.msu.ru/index.php/SITITO/article/view/633
composite index
data quality
data errors
principal component analysis
method of finite differences
collection DOAJ
language Russian
format Article
sources DOAJ
author Tatyana Zhgun
title Evaluation of Statistical Data Quality in the Problem of Calculating the Integral Characteristic of a System for a Number of Observations
publisher The Fund for Promotion of Internet media, IT education, human development «League Internet Media»
series Современные информационные технологии и IT-образование
issn 2411-1473
publishDate 2020-09-01
description Constructing a composite index of a system can be treated as a problem of separating signal from noise. Here the signal is the set of weight coefficients of the linear convolution of indicators. The weights to be determined should reflect the structure of the system being evaluated. However, principal component analysis and factor analysis yield different structures of principal components and principal factors for different observations. One possible reason is the presence of unavoidable errors in the data used. Solving the problem requires a detailed understanding of how input data errors influence the parameters of the calculated model. The article discusses the use of the finite difference method for evaluating statistical data quality in the problem of calculating the integral characteristic of a system for a number of observations. For this technique to be applicable, the data must be approximable by polynomials of degree lower than the number of observations minus one. This assumption is tested empirically on a specific data set: 37 variables characterizing the quality of life of the population of Russia for 2010-2017. The dependence of the quality of data approximation on the degree of the polynomial regression is analyzed. The results of the numerical experiment support the legitimacy of evaluating data errors using the finite difference method. Applying the finite difference apparatus to the data reveals fatal errors ranging from 0.59% to 28.92%. Therefore, composite characteristics of objects derived from such data must take the presence of a fatal error into account. In particular, the number of parameters characterizing the system should be large enough for averaging to compensate for random errors.
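The finite-difference idea in the description can be illustrated with a minimal sketch. It is not the article's own procedure, which this record does not give. It relies on a standard fact: if an equally spaced series is a polynomial of degree below k plus independent noise of constant variance, its k-th differences contain only noise, with variance inflated by the binomial factor C(2k, k). The function names, the independence assumption, and the percentage normalization by the mean absolute level are all assumptions made for illustration.

```python
import numpy as np
from math import comb


def noise_std_from_differences(y, order):
    """Estimate the noise standard deviation of an equally spaced series.

    Assumes y = polynomial of degree < order, plus i.i.d. zero-mean noise.
    The order-th forward difference removes the polynomial part, and the
    variance of the differenced noise equals sigma^2 * C(2*order, order).
    """
    y = np.asarray(y, dtype=float)
    if not 1 <= order < y.size:
        raise ValueError("order must satisfy 1 <= order < len(y)")
    d = np.diff(y, n=order)  # order-th finite differences
    return float(np.sqrt(np.mean(d ** 2) / comb(2 * order, order)))


def relative_error_percent(y, order):
    """Hypothetical normalization: noise estimate as a percentage of the
    mean absolute level of the series (an assumption, not from the source)."""
    y = np.asarray(y, dtype=float)
    return 100.0 * noise_std_from_differences(y, order) / float(np.mean(np.abs(y)))


if __name__ == "__main__":
    # Eight yearly observations (as for 2010-2017): a synthetic quadratic
    # trend plus noise. The data are made up for the demonstration.
    rng = np.random.default_rng(0)
    t = np.arange(8)
    y = 50.0 + 2.0 * t + 0.3 * t ** 2 + rng.normal(scale=1.0, size=t.size)
    print("estimated noise std:", noise_std_from_differences(y, order=3))
    print("relative error, %:", relative_error_percent(y, order=3))
```

With only eight observations, few high-order differences are available, so an estimate of this kind is rough. That is consistent with the description's requirement that the data be approximable by polynomials of degree well below the number of observations minus one.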
topic composite index
data quality
data errors
principal component analysis
method of finite differences
url http://sitito.cs.msu.ru/index.php/SITITO/article/view/633