Summary: | Master's === National Chiao Tung University === Institute of Electronics === 106 === We propose a high-accuracy, cost-effective array processor for Deep Convolutional Neural Network (DCNN) inference. The proposed Static Floating-Point (SFP) arithmetic allows MAC operations to operate only on the non-zero bits of the data, which guarantees both the energy efficiency and the accuracy of the proposed computing engine. Moreover, by applying the Scalable Universal Matrix Multiplication Algorithm (SUMMA), we avoid storing repeated data in local storage; instead, data are broadcast to the corresponding PEs. With the proposed simple stream interface unit (SIU), the design greatly reduces how often operands (data or weights) are read from or written to the central register file (CRF), minimizing power consumption. Simulation results show that the proposed SFP SUMMA array processor achieves approximately 56.47% top-1 accuracy while consuming only 167 mW. Synthesized in TSMC 90 nm CMOS technology, the proposed SFP SUMMA DIP achieves 0.45 TOPS/W. By contrast, MIT Eyeriss delivers only 0.3 TOPS/W (in 65 nm CMOS) when performing the same workload, the five convolutional layers of AlexNet.
|
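The abstract's key data-movement idea is the SUMMA broadcast pattern: at each step, one column block of A and one row block of B are broadcast to the PE array, so no PE has to keep a duplicated copy of the full operands. The following is a minimal single-process sketch of that pattern in NumPy; the function name `summa` and the parameters `grid` and `tile` are illustrative assumptions and do not correspond to the thesis's SFP hardware or its fixed array dimensions.

```python
import numpy as np

def summa(A, B, grid=4, tile=8):
    """Single-process simulation of SUMMA on a grid x grid PE array.

    At step k, the k-th column block of A is broadcast along each PE row
    and the k-th row block of B is broadcast along each PE column; every
    PE then accumulates a local outer-product contribution, so repeated
    operand copies never need to sit in local storage.
    """
    n = grid * tile
    assert A.shape == B.shape == (n, n)
    C = np.zeros((n, n))
    for k in range(grid):                       # one broadcast step per block
        a_col = A[:, k*tile:(k+1)*tile]         # broadcast along PE rows
        b_row = B[k*tile:(k+1)*tile, :]         # broadcast along PE columns
        for i in range(grid):                   # (i, j) models one PE's local MACs
            for j in range(grid):
                C[i*tile:(i+1)*tile, j*tile:(j+1)*tile] += (
                    a_col[i*tile:(i+1)*tile, :] @ b_row[:, j*tile:(j+1)*tile]
                )
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((32, 32))
    B = rng.standard_normal((32, 32))
    assert np.allclose(summa(A, B), A @ B)      # matches a direct matrix multiply
```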