Efficient analysis of data streams

Data streams provide a challenging environment for statistical analysis. Data points can arrive at a high velocity and may need to be deleted once they have been observed. Due to these restrictions, standard techniques may not be applicable to the data streaming scenario. This leads to the need for...

Bibliographic Details
Main Author: Davies, Rhian
Other Authors: Eckley, Idris ; Pavlidis, Nicos ; Mihaylova, Lyudmila
Published: Lancaster University 2017
Online Access: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.727163
id ndltd-bl.uk-oai-ethos.bl.uk-727163
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-727163 ; 2018-10-03T03:22:41Z ; Efficient analysis of data streams ; Davies, Rhian ; Eckley, Idris ; Pavlidis, Nicos ; Mihaylova, Lyudmila ; 2017 ; Lancaster University ; 10.17635/lancaster/thesis/137 ; https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.727163 ; http://eprints.lancs.ac.uk/88556/ ; Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
description Data streams provide a challenging environment for statistical analysis. Data points can arrive at high velocity and may need to be deleted once they have been observed. Because of these restrictions, standard techniques may not be applicable to the data streaming scenario, which creates a need for data summaries that represent the data stream. This thesis explores how data summaries can be used to perform clustering and classification on data streams across a broad range of applications. Spectral clustering is one clustering technique that, prior to this work, was not applicable to the data streaming setting because of the high computation involved. CluStream is an existing method that uses micro-clusters to summarise data streams. We present two algorithms that use these micro-cluster summaries to enable spectral clustering on data streams. The methods were tested on simulated data streams, as well as on textured images and handwritten digits. Distributed acoustic sensing is used to monitor oil flow at various depths throughout an oil well. Vibrations are recorded at very high resolution, up to 10,000 observations per second at each depth. Unfortunately, the signal can become corrupted, and engineers need to know where the corruption occurs. We develop a method that treats the multiple time series as a high-dimensional clustering problem and uses the cluster labels to identify changes within the signal. The final piece of work concerns identifying areas of activity within a video stream, in particular CCTV footage. This classification stage is more efficient when performed on a compressed version of the video stream, but a recovery algorithm is then needed to reconstruct the areas of activity in the original video. We compare the performance of two recovery algorithms and identify an ideal range for the compression ratio.
author2 Eckley, Idris ; Pavlidis, Nicos ; Mihaylova, Lyudmila
author_facet Eckley, Idris ; Pavlidis, Nicos ; Mihaylova, Lyudmila
Davies, Rhian
author Davies, Rhian
spellingShingle Davies, Rhian
Efficient analysis of data streams
author_sort Davies, Rhian
title Efficient analysis of data streams
title_sort efficient analysis of data streams
publisher Lancaster University
publishDate 2017
url https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.727163
work_keys_str_mv AT daviesrhian efficientanalysisofdatastreams
_version_ 1718758089005465600