Automating telemetry- and trace-based analytics on large-scale distributed systems
Large-scale distributed systems, such as supercomputers, cloud computing platforms, and distributed applications, routinely suffer from slowdowns and crashes due to software and hardware problems, resulting in reduced efficiency and wasted resources. These large-scale systems typica...
Main Author: | |
---|---|
Other Authors: | |
Language: | en_US |
Published: | 2020 |
Subjects: | |
Online Access: | https://hdl.handle.net/2144/41472 |