Mining Order-preserving Submatrices under Data Uncertainty: A Possible-world Approach and Efficient Approximation Methods

Given a data matrix D, a submatrix S of D is an order-preserving submatrix (OPSM) if there is a permutation of the columns of S, under which the entry values of each row in S are strictly increasing. OPSM mining is widely used in real-life applications such as identifying coexpressed genes and finding custo...

Bibliographic Details
Main Authors:	Cheng, J. (Author), Hao, X. (Author), Long, C. (Author), Ng, W. (Author), Qu, W. (Author), Wang, X. (Author), Yan, D. (Author)
Format:	Article
Language:	English
Published:	Association for Computing Machinery 2022
Subjects:	Data matrix; Data mining; Dynamic programming; Expected support; Matrix algebra; Mining algorithms; Mining order; OPSM; Order-preserving submatrices; Order-preserving submatrix; Poisson distribution; Possible world semantics; Probabilistic frequentness; Probabilistics; Semantics; Value distribution
Online Access:	View Fulltext in Publisher


LEADER	03226nam a2200397Ia 4500
001	10.1145-3524915
008	220718s2022 CNT 000 0 und d
020			\|a 03625915 (ISSN)
245	1	0	\|a Mining Order-preserving Submatrices under Data Uncertainty: A Possible-world Approach and Efficient Approximation Methods
260		0	\|b Association for Computing Machinery \|c 2022
856			\|z View Fulltext in Publisher \|u https://doi.org/10.1145/3524915
520	3		\|a Given a data matrix D, a submatrix S of D is an order-preserving submatrix (OPSM) if there is a permutation of the columns of S, under which the entry values of each row in S are strictly increasing. OPSM mining is widely used in real-life applications such as identifying coexpressed genes and finding customers with similar preferences. However, noise is ubiquitous in real data matrices due to variable experimental conditions and measurement errors, which makes conventional OPSM mining algorithms inapplicable. No previous work on OPSM has ever considered uncertain value intervals using the well-established possible world semantics. We establish two different definitions of significant OPSMs based on the possible world semantics: (1) expected support-based and (2) probabilistic frequentness-based. An optimized dynamic programming approach is proposed to compute the probability that a row supports a particular column permutation, with a closed-form formula derived to efficiently handle the special case of uniform value distribution and an accurate cubic spline approximation approach that works well with any uncertain value distribution. To efficiently check the probabilistic frequentness, several effective pruning rules are designed to prune insignificant OPSMs; two approximation techniques based on the Poisson and Gaussian distributions, respectively, are proposed for further speedup. These techniques are integrated into our two OPSM mining algorithms, based on prefix-projection and Apriori, respectively. We further parallelize our prefix-projection-based mining algorithm using PrefixFPM, a recently proposed framework for parallel frequent pattern mining, and we achieve a good speedup with the number of CPU cores. Extensive experiments on real microarray data demonstrate that the OPSMs found by our algorithms have a much higher quality than those found by existing approaches. © 2022 Association for Computing Machinery.
650	0	4	\|a Data matrix
650	0	4	\|a Data mining
650	0	4	\|a Dynamic programming
650	0	4	\|a Expected support
650	0	4	\|a Matrix algebra
650	0	4	\|a Mining algorithms
650	0	4	\|a Mining order
650	0	4	\|a OPSM
650	0	4	\|a Order-preserving submatrices
650	0	4	\|a Order-preserving submatrix
650	0	4	\|a Poisson distribution
650	0	4	\|a Possible world semantics
650	0	4	\|a Probabilistic frequentness
650	0	4	\|a Probabilistics
650	0	4	\|a Semantics
650	0	4	\|a Value distribution
700	1		\|a Cheng, J. \|e author
700	1		\|a Hao, X. \|e author
700	1		\|a Long, C. \|e author
700	1		\|a Ng, W. \|e author
700	1		\|a Qu, W. \|e author
700	1		\|a Wang, X. \|e author
700	1		\|a Yan, D. \|e author
773			\|t ACM Transactions on Database Systems \|x 03625915 (ISSN) \|g 47 2
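
Illustrative note (not part of the catalogued record): the abstract defines an OPSM through rows whose values are strictly increasing under some column permutation. The sketch below shows only that deterministic check on a small made-up matrix. It does not cover the uncertain-data setting, the dynamic programming approach, or the mining algorithms described in the article, and the names `supports` and `supporting_rows` are hypothetical helpers chosen for this example.

```python
from typing import List, Sequence


def supports(row: Sequence[float], perm: Sequence[int]) -> bool:
    """Return True if the row's values are strictly increasing
    when its columns are visited in the order given by perm."""
    return all(row[a] < row[b] for a, b in zip(perm, perm[1:]))


def supporting_rows(matrix: Sequence[Sequence[float]],
                    perm: Sequence[int]) -> List[int]:
    """Return the indices of all rows that support the column order perm."""
    return [i for i, row in enumerate(matrix) if supports(row, perm)]


# Made-up example data: 3 rows, 3 columns.
matrix = [
    [3.0, 1.0, 2.0],
    [9.0, 4.0, 7.0],
    [1.0, 5.0, 2.0],
]
perm = [1, 2, 0]  # visit column 1, then column 2, then column 0

rows = supporting_rows(matrix, perm)
print(rows)
```

Under this definition, the rows returned by `supporting_rows` together with the columns in `perm` form an order-preserving submatrix of the example matrix.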

Similar Items