Principal Component Analysis of Process Datasets with Missing Values

Datasets with missing values arising from causes such as sensor failure, inconsistent sampling rates, and merging data from different systems are common in the process industry. Methods for handling missing data typically operate during data pre-processing, but can also occur during model building....

Bibliographic Details
Main Authors:	Severson, Kristen (Author), Molaro, Mark (Author), Braatz, Richard D (Author)
Other Authors:	Massachusetts Institute of Technology. Department of Chemical Engineering (Contributor)
Format:	Article
Language:	English
Published:	MDPI AG, 2020-06-02T18:39:46Z.
Subjects:	Article
Online Access:	https://hdl.handle.net/1721.1/125630


LEADER	01893 am a22001813u 4500
001	125630
042			|a dc
100	1	0	|a Severson, Kristen |e author
100	1	0	|a Massachusetts Institute of Technology. Department of Chemical Engineering |e contributor
700	1	0	|a Molaro, Mark |e author
700	1	0	|a Braatz, Richard D |e author
245	0	0	|a Principal Component Analysis of Process Datasets with Missing Values
260			|b MDPI AG, |c 2020-06-02T18:39:46Z.
856			|z Get fulltext |u https://hdl.handle.net/1721.1/125630
520			|a Datasets with missing values arising from causes such as sensor failure, inconsistent sampling rates, and merging data from different systems are common in the process industry. Methods for handling missing data typically operate during data pre-processing, but can also occur during model building. This article considers missing data within the context of principal component analysis (PCA), which is a method originally developed for complete data that has widespread industrial application in multivariate statistical process control. Due to the prevalence of missing data and the success of PCA for handling complete data, several PCA algorithms that can act on incomplete data have been proposed. Here, algorithms for applying PCA to datasets with missing values are reviewed. A case study is presented to demonstrate the performance of the algorithms and suggestions are made with respect to choosing which algorithm is most appropriate for particular settings. An alternating algorithm based on the singular value decomposition achieved the best results in the majority of test cases involving process datasets. Keywords: principal component analysis; missing data; process data analytics; chemometrics; machine learning; multivariable statistical process control; process monitoring; Tennessee Eastman problem
546			|a en
655	7		|a Article
773			|t Processes
