Defense mechanism against adversarial attacks using density-based representation of images

Bibliographic Details
Main Authors: Huang, Chen-Wei, 黃辰瑋
Other Authors: Liao, Wen-Hung
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/u239p4
Description
Summary: Master's thesis === National Chengchi University === Department of Computer Science === 107 === Adversarial examples are slightly modified inputs devised to cause erroneous inference in deep learning models. Many methods have recently been proposed to counter adversarial attacks, yet new ways of generating attacks have surfaced accordingly. Protecting models against adversarial examples is a fundamental issue that must be addressed before the wide adoption of deep learning-based intelligent systems. In this research, we use input recharacterization to remove the perturbations found in adversarial examples and thereby preserve the performance of the original model. Input recharacterization typically consists of two stages: a forward transform and a backward reconstruction. Our hope is that, after this lossy two-way transformation, the purposely added 'noise' or 'perturbation' becomes ineffective. Although many transformations are possible, in this work we employ digital halftoning and inverse halftoning for input recharacterization. We also apply convolutional layer visualization to better understand the network architecture and its characteristics.

The data set used in this study is Tiny ImageNet, consisting of 260 thousand 128x128 grayscale images belonging to 200 classes. Most defense mechanisms rely on gradient masking, input transformation, or adversarial training. Among these strategies, adversarial training is widely regarded as the most effective, but it requires adversarial examples to be generated and included in the training set, which is impractical in most applications. The proposed approach is closer to input transformation: we convert each image from an intensity-based representation to a density-based representation using a halftone operation, which is expected to invalidate the attack by changing the image representation. We also investigate whether inverse halftoning can eliminate the adversarial perturbation. The proposed method requires no extra training on adversarial samples; only low-cost input pre-processing is needed.

On the VGG-16 architecture, the top-5 accuracy is 76.5% for the grayscale model, 80.4% for the halftone model, and 85.14% for the hybrid model (trained with both grayscale and halftone images). Under adversarial attacks generated with FGSM, I-FGSM, and PGD, the top-5 accuracy of the hybrid model remains at 80.97%, 78.77%, and 81.56%, respectively. Although accuracy is affected, the influence of adversarial examples is significantly reduced: the average improvement over existing input-transformation defenses is approximately 10%.
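The recharacterization described above (a forward halftone transform followed by a backward reconstruction) can be illustrated with the minimal sketch below. It uses classic Floyd-Steinberg error diffusion for the forward transform and a plain Gaussian low-pass filter as a stand-in for the backward reconstruction; the thesis does not spell out its inverse-halftoning model here, so the `naive_inverse_halftone` helper and the SciPy-based smoothing are assumptions for illustration only.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def floyd_steinberg_halftone(gray):
    """Convert a grayscale image (floats in [0, 1]) to a binary halftone
    using Floyd-Steinberg error diffusion (intensity -> density)."""
    img = gray.astype(np.float64).copy()
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 1.0 if old >= 0.5 else 0.0
            out[y, x] = new
            err = old - new
            # Diffuse the quantization error to unprocessed neighbours.
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x - 1 >= 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return out

def naive_inverse_halftone(halftone, sigma=1.5):
    """Crude inverse halftoning: low-pass filter the binary dot pattern to
    recover a continuous-tone approximation. A stand-in for a proper
    inverse-halftoning model, not the method used in the thesis."""
    return gaussian_filter(halftone, sigma=sigma)

def recharacterize(gray):
    """Forward transform (halftone) followed by backward reconstruction
    (inverse halftone); the lossy round trip is intended to destroy
    carefully crafted adversarial perturbations."""
    return naive_inverse_halftone(floyd_steinberg_halftone(gray))
```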
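The attacks used for evaluation (FGSM, I-FGSM, and PGD) follow their standard gradient-sign formulations. The sketch below shows them in PyTorch, assuming a differentiable classifier `model` and inputs scaled to [0, 1]; the function names, hyperparameters, and the choice of PyTorch are illustrative assumptions rather than details taken from the thesis.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step FGSM: x_adv = clip(x + eps * sign(grad_x L(f(x), y)))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def iterative_attack(model, x, y, eps, alpha, steps, random_start=False):
    """Iterative gradient-sign attack: I-FGSM when random_start=False,
    PGD when random_start=True (start from a random point in the eps-ball)."""
    x = x.clone().detach()
    x_adv = x.clone()
    if random_start:
        x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around the clean input and the valid range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```

In an evaluation loop, the defense would apply the recharacterization sketched earlier to each (possibly adversarial) input before passing it to the classifier, so that perturbations crafted against the intensity-based representation no longer match what the model actually sees.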