Impact of sample size on principal component analysis ordination of an environmental data set: effects on eigenstructure

In this study, we used bootstrap simulation of a real data set to investigate the impact of sample size (N = 20, 30, 40 and 50) on the eigenvalues and eigenvectors resulting from principal component analysis (PCA). For each sample size, 100 bootstrap samples were drawn from an environmental data matrix...
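
The resampling design summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the data matrix is synthetic random noise standing in for the 55 x 22 water quality matrix, PCA on the correlation matrix is an assumption (the record does not say whether correlation or covariance was used), and the function name `bootstrap_eigenvalues` and the fixed seeds are hypothetical. The cluster-analysis comparison with Ward's method is not shown.

```python
import numpy as np


def bootstrap_eigenvalues(data, n, n_boot=100, n_eig=10, seed=0):
    """Leading correlation-matrix eigenvalues for n_boot bootstrap samples of size n.

    Returns an array of shape (n_boot, n_eig), each row sorted in descending order.
    """
    rng = np.random.default_rng(seed)
    out = np.empty((n_boot, n_eig))
    for b in range(n_boot):
        # Draw n row indices with replacement (one bootstrap sample of stations).
        idx = rng.integers(0, data.shape[0], size=n)
        # Assumption: PCA is carried out on the correlation matrix of the variables.
        corr = np.corrcoef(data[idx], rowvar=False)
        # eigvalsh returns ascending eigenvalues for a symmetric matrix; reverse them.
        eig = np.linalg.eigvalsh(corr)[::-1]
        out[b] = eig[:n_eig]
    return out


# Synthetic stand-in for the real data: 55 stations by 22 variables.
data = np.random.default_rng(42).normal(size=(55, 22))

# One set of 100 bootstrap replicates per sample size named in the abstract.
results = {n: bootstrap_eigenvalues(data, n) for n in (20, 30, 40, 50)}

for n, eig in results.items():
    # Spread of the first eigenvalue across replicates at each sample size.
    print(n, eig[:, 0].mean(), eig[:, 0].std())
```

Comparing the printed spread of the first eigenvalue across the four sample sizes gives a rough picture of how stable the eigenstructure is; with real data, the same loop could also collect eigenvectors via `np.linalg.eigh`.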


Bibliographic Details
Main Authors: Shaukat S. Shahid, Rao Toqeer Ahmed, Khan Moazzam A.
Format: Article
Language: English
Published: Sciendo 2016-06-01
Series: Ekológia (Bratislava)
Subjects:
pca
Online Access: https://doi.org/10.1515/eko-2016-0014
id doaj-e1e8e13554fc47e7a9f812a8322392cb
record_format Article
spelling doaj-e1e8e13554fc47e7a9f812a8322392cb
2021-09-05T20:44:47Z
eng
Sciendo
Ekológia (Bratislava)
1337-947X
2016-06-01
35
2
173
190
10.1515/eko-2016-0014
eko-2016-0014
Impact of sample size on principal component analysis ordination of an environmental data set: effects on eigenstructure
Shaukat S. Shahid (0)
Rao Toqeer Ahmed (1)
Khan Moazzam A. (2)
(0) Institute of Environmental Studies, University of Karachi, Karachi-75270, Pakistan
(1) Department of Botany, Federal Urdu University of Arts, Sciences & Technology, Karachi-75300, Pakistan
(2) Institute of Environmental Studies, University of Karachi, Karachi-75270, Pakistan
In this study, we used bootstrap simulation of a real data set to investigate the impact of sample size (N = 20, 30, 40 and 50) on the eigenvalues and eigenvectors resulting from principal component analysis (PCA). For each sample size, 100 bootstrap samples were drawn from an environmental data matrix of water quality variables (p = 22) from a small data set comprising 55 samples (the stations from which water samples were collected). Because data sets in ecology and environmental sciences are invariably small owing to the high cost of collecting and analysing samples, we restricted our study to relatively small sample sizes. We focused on comparing the first 6 eigenvectors and the first 10 eigenvalues. Data sets were compared by agglomerative cluster analysis with Ward’s method, which does not require any stringent distributional assumptions.
https://doi.org/10.1515/eko-2016-0014
eigenstructure
environmental data
ordination
pca
collection DOAJ
language English
format Article
sources DOAJ
author Shaukat S. Shahid
Rao Toqeer Ahmed
Khan Moazzam A.
title Impact of sample size on principal component analysis ordination of an environmental data set: effects on eigenstructure
publisher Sciendo
series Ekológia (Bratislava)
issn 1337-947X
publishDate 2016-06-01
description In this study, we used bootstrap simulation of a real data set to investigate the impact of sample size (N = 20, 30, 40 and 50) on the eigenvalues and eigenvectors resulting from principal component analysis (PCA). For each sample size, 100 bootstrap samples were drawn from an environmental data matrix of water quality variables (p = 22) from a small data set comprising 55 samples (the stations from which water samples were collected). Because data sets in ecology and environmental sciences are invariably small owing to the high cost of collecting and analysing samples, we restricted our study to relatively small sample sizes. We focused on comparing the first 6 eigenvectors and the first 10 eigenvalues. Data sets were compared by agglomerative cluster analysis with Ward’s method, which does not require any stringent distributional assumptions.
topic eigenstructure
environmental data
ordination
pca
url https://doi.org/10.1515/eko-2016-0014