Analysis and optimization of the input pipeline and the use of XLA for Tensorflow deep learning systems

Bibliographic Details
Main Authors: Yuan-Di Li, 李沅迪
Other Authors: 洪士灝
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/x2gvh9
Description
Summary: Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 106 === Deep learning is a subset of machine learning, with applications that include image detection and voice recognition. Developers of deep learning applications should focus not only on the design and accuracy of the neural network but also on the input pipeline used for inference in real-world deployments. In some cases, data preprocessing becomes a serious performance problem. To improve performance, developers need a profiling tool to analyze deep learning applications. However, current profiling tools such as Nvprof and TFprof cannot capture the full details of TensorFlow data preprocessing. This study develops a deep learning profiling tool, SOFA (Swarms of Functions Analysis), to solve this problem. To evaluate SOFA, four data preprocessing methods are implemented with five neural network models, and each is analyzed with SOFA. SOFA lets developers discover the performance bottleneck and its root cause. In these case studies, optimizing the data preprocessing pipeline substantially improves application performance: a 19.8x speedup is achieved with AlexNet and a 12.3x speedup with GoogLeNet. Once the CPU is no longer the performance bottleneck, XLA yields an additional speedup; with VGG11, for example, the speedup rises from 7.8x in the original version to 8.4x in the XLA version.
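The summary's central idea, that preprocessing should not stall the model, can be illustrated with a minimal sketch. This is not the thesis's code and does not use TensorFlow: it is a standard-library illustration of prefetching, in which a background thread prepares upcoming batches while the consumer works on the current one. The names `prefetch`, `preprocess`, `infer`, and `run` are hypothetical, and the two stand-in functions are placeholders for real decoding and model work.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the stream


def prefetch(source, buffer_size=2):
    """Yield items from `source`, producing them on a background thread.

    Up to `buffer_size` items are prepared ahead of the consumer.
    Simplification: an exception raised inside `source` is not
    propagated to the consumer in this sketch.
    """
    buf = queue.Queue(maxsize=buffer_size)

    def producer():
        for item in source:
            buf.put(item)  # blocks while the buffer is full
        buf.put(_SENTINEL)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is _SENTINEL:
            return
        yield item


def preprocess(raw):
    # Hypothetical stand-in for decode / resize / normalize.
    return [x / 255.0 for x in raw]


def infer(batch):
    # Hypothetical stand-in for the model's forward pass.
    return sum(batch)


def run(raw_batches):
    # The generator expression is consumed on the producer thread,
    # so preprocessing of later batches overlaps with inference.
    prepared = prefetch(preprocess(b) for b in raw_batches)
    return [infer(b) for b in prepared]
```

In pure Python the overlap only pays off when one side waits on I/O or releases the interpreter lock, so the sketch shows the structure of the technique rather than a measured speedup. Results are returned in input order because a single producer feeds a FIFO queue.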