Unsupervised Anomaly Detection in Numerical Datasets

Bibliographic Details
Main Author: Joshi, Vineet
Language: English
Published: University of Cincinnati / OhioLINK 2015
Subjects: Computer Science; Data Mining; Anomaly Detection; Outlier Detection; Subspaces
Online Access: http://rave.ohiolink.edu/etdc/view?acc_num=ucin1427799744
id ndltd-OhioLink-oai-etd.ohiolink.edu-ucin1427799744
record_format oai_dc
spelling ndltd-OhioLink-oai-etd.ohiolink.edu-ucin1427799744 2021-08-03T06:29:33Z Unsupervised Anomaly Detection in Numerical Datasets Joshi, Vineet Computer Science Data Mining Anomaly Detection Outlier Detection Subspaces
Anomaly detection is an important problem in data mining with diverse applications such as financial fraud detection, medical diagnosis and computer systems intrusion detection. Anomalies are data points that are substantially different from the rest of the population. These generally represent valuable information about the system for which the analyst is interested in detecting anomalies accurately and efficiently, and then taking appropriate actions in response. There are scenarios where tremendous impact can be made by detecting anomalies in a timely and accurate manner, e.g. early detection of spurious credit card transactions can prevent financial damages to a credit card holder as well as the banking institution that issued the credit card. Similarly, abnormal readings by a sensor monitoring an industrial plant can help detect system faults and avert damages. All these applications have led to an interest in finding efficient methods for detecting anomalies. Anomaly detection continues to be an active research area within data mining.
In this dissertation we investigate various aspects of the anomaly detection problem. To determine anomalies in a dataset, a concrete definition of anomalous behavior is required. There is no single universally applicable definition of anomalies because each definition presents a perspective of anomalous behavior which may not necessarily apply across diverse datasets. In this work we investigate a new definition of anomalous behavior. We compare this definition with an existing definition of outlier-ness and demonstrate the effectiveness of the new definition.
We further present a refinement of the metric of outlier-ness that we have mentioned above. We discovered that the metric initially proposed can be altered to yield a new metric of outlier-ness that accentuates the difference in the outlier-ness scores of strong outliers as compared to the non-anomalous data points. We compare this updated metric with the metric we first presented, and also with an established metric of outlier-ness.
As the number of attributes increases, the distances between the nearest and the farthest data points tend to converge, resulting in distance concentration. Thus the anomalies reported by most definitions of anomalous behavior tend to lose meaning with increasing numbers of attributes. It has been suggested that in such datasets, the anomalies are located in smaller subspaces of attributes. Hence, anomalies should be searched in subspaces of the attributes, instead of the complete attribute space. However, the number of subspaces increases very rapidly as the number of attributes increases. The number of possible subspaces for a given set of attributes in the dataset is a combinatorial number. This makes an exhaustive search through all possible subspaces infeasible. In this dissertation, after presenting a novel definition of anomalous behavior, we present an efficient method of exploring the possible subspaces arising from the attributes of a dataset.
The subspaces of attributes in any dataset can be arranged in a lattice. The anomalous behavior of data points as we traverse this lattice conveys meaningful information about the structure of the data. In the fourth problem that we address, we present a method that investigates the anomalous behavior of data points across the different subspaces in the lattice in which the same point displays anomalous behavior. Further, our method also computes the contiguous regions of the subspace lattice where the same data point demonstrates anomalous behavior.
2015-06-05 English text University of Cincinnati / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=ucin1427799744 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
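The abstract's remark that the number of attribute subspaces is combinatorial, and that the subspaces form a lattice, can be made concrete with a small sketch. This is an illustration only, not the dissertation's method: the function names (`count_subspaces`, `lattice_level`, `lattice_parents`) and the four-attribute example are assumptions introduced here. It relies on the standard fact that a set of d attributes has 2^d - 1 non-empty subsets.

```python
from itertools import combinations
from math import comb


def count_subspaces(num_attributes: int) -> int:
    """Number of non-empty attribute subsets of d attributes: 2^d - 1."""
    return 2 ** num_attributes - 1


def lattice_level(attributes, k):
    """All subspaces with exactly k attributes (one level of the lattice)."""
    return [frozenset(c) for c in combinations(attributes, k)]


def lattice_parents(subspace, attributes):
    """Subspaces one level up, formed by adding a single attribute."""
    return [subspace | {a} for a in attributes if a not in subspace]


# Hypothetical attribute names, chosen only for illustration.
attrs = ["a", "b", "c", "d"]

# Size of each lattice level: C(d, k) subspaces with k attributes.
level_sizes = [comb(len(attrs), k) for k in range(1, len(attrs) + 1)]

if __name__ == "__main__":
    print(level_sizes, sum(level_sizes), count_subspaces(len(attrs)))
    # Growth with d shows why exhaustive search becomes infeasible.
    for d in (10, 20, 30):
        print(d, count_subspaces(d))
```

Under these assumptions, 30 attributes already give more than a billion candidate subspaces, which is the growth the abstract cites as the reason an exhaustive search is infeasible.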
collection NDLTD
language English
sources NDLTD
topic Computer Science
Data Mining
Anomaly Detection
Outlier Detection
Subspaces
author Joshi, Vineet
title Unsupervised Anomaly Detection in Numerical Datasets
publisher University of Cincinnati / OhioLINK
publishDate 2015
url http://rave.ohiolink.edu/etdc/view?acc_num=ucin1427799744