Annotation-Enabled Interpretation and Analysis of Time-Series Data

Bibliographic Details
Main Author: Venugopal, Niveditha
Format: Others
Published: PDXScholar 2018
Online Access: https://pdxscholar.library.pdx.edu/open_access_etds/4708
https://pdxscholar.library.pdx.edu/cgi/viewcontent.cgi?article=5779&context=open_access_etds
Description
Summary: As we continue to produce large amounts of time-series data, the need for data analysis that yields insights from this data is growing rapidly. These insights form the foundation of data-driven decisions in many aspects of life. Data annotations are information about the data, such as comments, errors and provenance, which provide context for the underlying data and aid meaningful analysis in domains such as scientific research, genomics and ECG analysis. Storing such annotations in the database along with the data makes them available to help with analysis of the data. In this thesis, I propose a user-friendly technique for Annotation-Enabled Analysis through which a user can employ annotations to help query and analyze data without prior knowledge of the database schema or of any database programming language. The proposed technique receives the request for analysis as a high-level specification that hides the details of the schema, joins, etc.; it parses the specification, validates the input and converts it into SQL. This SQL query can then be executed in a relational database and the result returned to the user. I evaluate this technique using real-world data from a building-data platform containing data about Portland State University buildings, such as room temperature, air volume and CO2 level. This data is annotated with information such as class schedules, power outages and control modes (for example, day or night mode). I test my technique with three increasingly sophisticated levels of use cases drawn from this building science domain: (1) retrieve data with include or exclude annotation selection; (2) correlate data with include or exclude annotation selection; (3) align data based on include annotation selection to support aggregation over multiple periods.
I evaluate the technique by performing two kinds of tests: (1) to validate correctness, I generate synthetic datasets for which I know the expected result of these annotation-enabled analyses and compare the expected results with the results generated by my technique; (2) to assess performance, I measure the execution time in the database of the queries generated by this service and compare them with alternative SQL translations that I developed.
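
The summary describes a pipeline that takes a high-level request, validates it and converts it into SQL with include or exclude annotation selection. The Python sketch below illustrates that general idea only; it is not the thesis's implementation. The request format (a dict with sensor, annotation and mode keys), the table names readings and annotations, their columns, and the helper names spec_to_sql and run_demo are all hypothetical, chosen for illustration. The sketch covers only the first use case (retrieve data with include or exclude annotation selection) and expresses the selection as an EXISTS or NOT EXISTS subquery, which is one of several possible SQL translations.

```python
import sqlite3

# Hypothetical schema, assumed for this sketch only:
#   readings(sensor, ts, value)          one row per measurement
#   annotations(label, start_ts, end_ts) one row per annotated time interval
ALLOWED_SENSORS = {"room_temperature", "air_volume", "co2_level"}
MODE_TO_SQL = {"include": "EXISTS", "exclude": "NOT EXISTS"}


def spec_to_sql(spec):
    """Validate a high-level request and return (sql, params)."""
    sensor = spec.get("sensor")
    label = spec.get("annotation")
    mode = spec.get("mode")
    if sensor not in ALLOWED_SENSORS:
        raise ValueError(f"unknown sensor: {sensor!r}")
    if mode not in MODE_TO_SQL:
        raise ValueError(f"mode must be 'include' or 'exclude', got {mode!r}")
    if not isinstance(label, str) or not label:
        raise ValueError("annotation must be a non-empty string")
    # User-supplied values are passed as parameters, never spliced into SQL.
    sql = (
        "SELECT r.ts, r.value FROM readings AS r "
        "WHERE r.sensor = ? AND " + MODE_TO_SQL[mode] + " ("
        "SELECT 1 FROM annotations AS a "
        "WHERE a.label = ? AND r.ts >= a.start_ts AND r.ts < a.end_ts) "
        "ORDER BY r.ts"
    )
    return sql, (sensor, label)


def run_demo(mode):
    """Run the generated query against a tiny synthetic in-memory dataset."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, ts INTEGER, value REAL)")
    conn.execute(
        "CREATE TABLE annotations (label TEXT, start_ts INTEGER, end_ts INTEGER)"
    )
    # Six synthetic readings at ts = 0..5 and one annotated interval [2, 4).
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [("room_temperature", t, 20.0 + t) for t in range(6)],
    )
    conn.execute("INSERT INTO annotations VALUES ('power_outage', 2, 4)")
    sql, params = spec_to_sql(
        {"sensor": "room_temperature", "annotation": "power_outage", "mode": mode}
    )
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows
```

Calling run_demo("include") returns only the readings that fall inside the synthetic power_outage interval, and run_demo("exclude") returns the rest. Comparing such results against a known expected answer mirrors, in miniature, the synthetic-data correctness check the summary mentions.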