HotSpot: Anomaly Localization for Additive KPIs With Multi-Dimensional Attributes

Additive key performance indicators (KPIs) (such as page view (PV), revenue, and error count) with multi-dimensional attributes (such as ISP, Province, and DataCenter) are common and important in monitoring metrics in Internet companies. When an anomaly happens to an overall KPI, it is critical but...

Bibliographic Details
Main Authors: Yongqian Sun, Youjian Zhao, Ya Su, Dapeng Liu, Xiaohui Nie, Yuan Meng, Shiwen Cheng, Dan Pei, Shenglin Zhang, Xianping Qu, Xuanyou Guo
Format: Article
Language: English
Published: IEEE 2018-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/8288614/
id doaj-5b0cf8b9121e42cdbba61b7d560cf337
record_format Article
spelling doaj-5b0cf8b9121e42cdbba61b7d560cf337
Timestamp: 2021-04-05T16:58:02Z
Language: eng
Publisher: IEEE
Journal: IEEE Access, ISSN 2169-3536
Published: 2018-01-01
Volume: 6
Pages: 10909-10923
DOI: 10.1109/ACCESS.2018.2804764
IEEE document number: 8288614
Title: HotSpot: Anomaly Localization for Additive KPIs With Multi-Dimensional Attributes
Authors and affiliations:
Yongqian Sun (https://orcid.org/0000-0003-0266-7899), Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China
Youjian Zhao, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China
Ya Su, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China
Dapeng Liu, Department of Intelligent Operation, Baidu, Inc., Beijing, China
Xiaohui Nie, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China
Yuan Meng, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China
Shiwen Cheng, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China
Dan Pei, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China
Shenglin Zhang, School of Software, Nankai University, Tianjin, China
Xianping Qu, Department of Intelligent Operation, Baidu, Inc., Beijing, China
Xuanyou Guo, Department of Intelligent Operation, Baidu, Inc., Beijing, China
URL: https://ieeexplore.ieee.org/document/8288614/
Keywords: Anomaly localization; multi-dimensional attributes; huge search space; potential score; Monte Carlo Tree Search (MCTS); hierarchical pruning
collection DOAJ
language English
format Article
sources DOAJ
author Yongqian Sun
Youjian Zhao
Ya Su
Dapeng Liu
Xiaohui Nie
Yuan Meng
Shiwen Cheng
Dan Pei
Shenglin Zhang
Xianping Qu
Xuanyou Guo
title HotSpot: Anomaly Localization for Additive KPIs With Multi-Dimensional Attributes
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2018-01-01
description Additive key performance indicators (KPIs), such as page views (PV), revenue, and error count, with multi-dimensional attributes, such as ISP, Province, and DataCenter, are common and important monitoring metrics in Internet companies. When an anomaly occurs in an overall KPI, it is critical but challenging to localize the root cause, which is one or more combinations of attribute values in multiple dimensions. For example, is a decrease in total PV caused by a PV decrease from “Beijing”, from “China Mobile in Beijing”, or from “Beijing and Shanghai”? This task is challenging for two major reasons. First, the PVs of different combinations are interdependent; thus, a PV anomaly at the root cause can change many other PVs at different aggregation levels. Second, there can be tens of thousands of combinations to investigate in the multi-dimensional attribute space, so the root cause must be found in a huge search space. To address the first challenge, our approach, HotSpot, uses a novel potential score based on the ripple effect of anomaly propagation that we reveal. To address the second challenge, HotSpot adopts the Monte Carlo Tree Search algorithm and a hierarchical pruning strategy. Using real-world data from a top global search engine, we show that HotSpot greatly improves effectiveness and robustness: with HotSpot, 95% of all types of root cause cases achieve an F-score over 90%, compared with only 15% using existing approaches. Operational experience shows that HotSpot can reduce the localization time from more than 1 h of manual effort to less than 20 s.
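The interdependence described in the abstract can be illustrated with a small sketch. It is not the HotSpot algorithm and does not compute the paper's potential score; it only shows how an additive KPI aggregates over attribute combinations and how one anomalous leaf changes the values at several aggregation levels. The dimension names follow the abstract, while the PV numbers, `LEAVES`, `aggregate`, and `all_patterns` are hypothetical and chosen purely for illustration.

```python
from itertools import product

DIMS = ("ISP", "Province")  # dimension names taken from the abstract's examples

# Hypothetical leaf-level PVs: (ISP, Province) -> (forecast, actual).
# Only ("China Mobile", "Beijing") deviates from its forecast.
LEAVES = {
    ("China Mobile", "Beijing"): (100, 40),
    ("China Mobile", "Shanghai"): (80, 80),
    ("China Unicom", "Beijing"): (60, 60),
    ("China Unicom", "Shanghai"): (50, 50),
}


def aggregate(leaves, pattern):
    """Sum forecast and actual PV over all leaves matching the pattern.

    A pattern holds one entry per dimension; None acts as a wildcard.
    Summation is valid because the KPI is additive.
    """
    forecast = actual = 0
    for key, (f, a) in leaves.items():
        if all(p is None or p == k for p, k in zip(pattern, key)):
            forecast += f
            actual += a
    return forecast, actual


def all_patterns(leaves):
    """Enumerate every attribute combination, including wildcards."""
    per_dim = [
        sorted({key[i] for key in leaves}) + [None] for i in range(len(DIMS))
    ]
    return list(product(*per_dim))


patterns = all_patterns(LEAVES)

# Collect every combination whose aggregated actual differs from its forecast.
deviating = []
for p in patterns:
    f, a = aggregate(LEAVES, p)
    if f != a:
        deviating.append(p)

print(len(patterns), "combinations,", len(deviating), "deviate from forecast")
for p in deviating:
    print(p, aggregate(LEAVES, p))
```

With two dimensions of two values each there are already (2+1) x (2+1) = 9 combinations, and a single anomalous leaf makes four of them deviate. The count grows multiplicatively with each added dimension and value, which is consistent with the abstract's remark about a huge search space.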
topic Anomaly localization
multi-dimensional attributes
huge search space
potential score
Monte Carlo Tree Search (MCTS)
hierarchical pruning
url https://ieeexplore.ieee.org/document/8288614/