Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning

Activation functions are essential for deep learning methods to learn and perform complex tasks such as image classification. The Rectified Linear Unit (ReLU) has been widely used and has become the default activation function across the deep learning community since 2012. Although ReLU has been popular, ho...


Bibliographic Details
Main Authors:	Hock Hung Chieng, Noorhaniza Wahid, Ong Pauline, Sai Raj Kishore Perla
Format:	Article
Language:	English
Published:	Universitas Ahmad Dahlan 2018-07-01
Series:	IJAIN (International Journal of Advances in Intelligent Informatics)
Subjects:	Deep learning; Activation function; Flatten-T Swish; Fully connected neural networks
Online Access:	http://ijain.org/index.php/IJAIN/article/view/249

id	doaj-c059f16e1b594ad2a5519490acafa0bb
record_format	Article
spelling	doaj-c059f16e1b594ad2a5519490acafa0bb; 2020-11-25T00:06:59Z; eng; Universitas Ahmad Dahlan; IJAIN (International Journal of Advances in Intelligent Informatics); 2442-6571; 2548-3161; 2018-07-01; 4; 2; 76; 86; 10.26555/ijain.v4i2.249; 89; Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning; Hock Hung Chieng (0); Noorhaniza Wahid (1); Ong Pauline (2); Sai Raj Kishore Perla (3); Universiti Tun Hussein Onn Malaysia; Universiti Tun Hussein Onn Malaysia; Universiti Tun Hussein Onn Malaysia; Institute of Engineering and Management; Activation functions are essential for deep learning methods to learn and perform complex tasks such as image classification. The Rectified Linear Unit (ReLU) has been widely used and has become the default activation function across the deep learning community since 2012. Although ReLU is popular, its hard-zero property heavily hinders negative values from propagating through the network. Consequently, deep neural networks have not benefited from negative representations. In this work, an activation function called Flatten-T Swish (FTS) that leverages the benefit of negative values is proposed. To verify its performance, this study evaluates FTS against ReLU and several recent activation functions. Each activation function is trained on the MNIST dataset using five different deep fully connected neural networks (DFNNs) with depths varying from five to eight layers. For a fair evaluation, all DFNNs use the same configuration settings. Based on the experimental results, FTS with a threshold value of T=-0.20 has the best overall performance. Compared with ReLU, FTS (T=-0.20) improves MNIST classification accuracy by 0.13%, 0.70%, 0.67%, 1.07% and 1.15% on the wider 5-layer, slimmer 5-layer, 6-layer, 7-layer and 8-layer DFNNs, respectively. In addition, the study observed that FTS converges twice as fast as ReLU. Although other existing activation functions are also evaluated, this study uses ReLU as the baseline activation function.; http://ijain.org/index.php/IJAIN/article/view/249; Deep learning; Activation function; Flatten-T Swish; Fully connected neural networks
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Hock Hung Chieng Noorhaniza Wahid Ong Pauline Sai Raj Kishore Perla
spellingShingle	Hock Hung Chieng Noorhaniza Wahid Ong Pauline Sai Raj Kishore Perla Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning IJAIN (International Journal of Advances in Intelligent Informatics) Deep learning Activation function Flatten-T Swish Fully connected neural networks
author_facet	Hock Hung Chieng Noorhaniza Wahid Ong Pauline Sai Raj Kishore Perla
author_sort	Hock Hung Chieng
title	Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning
title_short	Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning
title_full	Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning
title_fullStr	Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning
title_full_unstemmed	Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning
title_sort	flatten-t swish: a thresholded relu-swish-like activation function for deep learning
publisher	Universitas Ahmad Dahlan
series	IJAIN (International Journal of Advances in Intelligent Informatics)
issn	2442-6571 2548-3161
publishDate	2018-07-01
description	Activation functions are essential for deep learning methods to learn and perform complex tasks such as image classification. The Rectified Linear Unit (ReLU) has been widely used and has become the default activation function across the deep learning community since 2012. Although ReLU is popular, its hard-zero property heavily hinders negative values from propagating through the network. Consequently, deep neural networks have not benefited from negative representations. In this work, an activation function called Flatten-T Swish (FTS) that leverages the benefit of negative values is proposed. To verify its performance, this study evaluates FTS against ReLU and several recent activation functions. Each activation function is trained on the MNIST dataset using five different deep fully connected neural networks (DFNNs) with depths varying from five to eight layers. For a fair evaluation, all DFNNs use the same configuration settings. Based on the experimental results, FTS with a threshold value of T=-0.20 has the best overall performance. Compared with ReLU, FTS (T=-0.20) improves MNIST classification accuracy by 0.13%, 0.70%, 0.67%, 1.07% and 1.15% on the wider 5-layer, slimmer 5-layer, 6-layer, 7-layer and 8-layer DFNNs, respectively. In addition, the study observed that FTS converges twice as fast as ReLU. Although other existing activation functions are also evaluated, this study uses ReLU as the baseline activation function.
topic	Deep learning; Activation function; Flatten-T Swish; Fully connected neural networks
url	http://ijain.org/index.php/IJAIN/article/view/249
work_keys_str_mv	AT hockhungchieng flattentswishathresholdedreluswishlikeactivationfunctionfordeeplearning AT noorhanizawahid flattentswishathresholdedreluswishlikeactivationfunctionfordeeplearning AT ongpauline flattentswishathresholdedreluswishlikeactivationfunctionfordeeplearning AT sairajkishoreperla flattentswishathresholdedreluswishlikeactivationfunctionfordeeplearning
_version_	1725420636612853760
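
The description field names Flatten-T Swish (FTS) and its threshold T=-0.20 but does not state the formula. The sketch below assumes the commonly cited form, FTS(x) = x * sigmoid(x) + T for x >= 0 and T otherwise; that form, and the function names `sigmoid`, `relu` and `flatten_t_swish`, are assumptions made for illustration and are not taken from this record. It is meant only to show how a non-zero floor T lets a negative value pass where ReLU outputs zero.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x: float) -> float:
    """ReLU baseline: negative inputs are clamped to zero."""
    return max(0.0, x)


def flatten_t_swish(x: float, t: float = -0.20) -> float:
    """Assumed FTS form (not stated in this record):
    x * sigmoid(x) + t for x >= 0, and the flat threshold t otherwise.
    The default t = -0.20 is the best-performing value per the abstract.
    """
    if x >= 0:
        return x * sigmoid(x) + t
    return t


if __name__ == "__main__":
    # Compare the two activations on a few sample inputs.
    for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):+.4f}  fts={flatten_t_swish(x):+.4f}")
```

Under this assumed form, every negative input maps to the constant T rather than to zero, so with T=-0.20 the unit still emits a (fixed) negative value, while positive inputs follow a Swish-like curve shifted by T.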


Similar Items