Fig. 1. Graphs of the activation functions (a) f1, (b) f2, (c) f3, (d) f4
Fig. 2. Graphs of the derivatives of (a) f1, (b) f2, (c) f3, (d) f4
Fig. 3. Test accuracy (a) and training time (b) of different activation functions using the ResNet18 network on CIFAR10
Fig. 4. Test accuracy (a) and training time (b) of different activation functions using the VGG16 network on CIFAR10
Fig. 5. Test accuracy (a) and training time (b) of different activation functions using the ResNet18 network on CIFAR100
Fig. 6. Test accuracy (a) and training time (b) of different activation functions using the VGG16 network on CIFAR100
Fig. 7. Test accuracy (a) and training time (b) of different activation functions using the ResNet18 network on Fer2013
Fig. 8. Test accuracy (a) and training time (b) of different activation functions using the VGG16 network on Fer2013
| Function | Function model |
| --- | --- |
| f1 | $f_1(x) = \begin{cases} x, & x \geqslant 0 \\ -x, & x < 0 \end{cases}$ |
| f2 | $f_2(x) = \begin{cases} x, & x \geqslant 0 \\ -\dfrac{2}{3}(-x)^{\frac{3}{2}}, & x < 0 \end{cases}$ |
| f3 | $f_3(x) = \begin{cases} x, & x \geqslant 0 \\ \dfrac{x}{1 - x}, & x < 0 \end{cases}$ |
| f4 | $f_4(x) = \begin{cases} x, & x \geqslant 0 \\ -\ln(1 - x), & x < 0 \end{cases}$ |
Table 1. Mathematical models of the four activation functions
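The piecewise definitions in Table 1 can be written directly as scalar functions. The following is a minimal sketch in plain Python, not the implementation used in the experiments. It assumes that the negative branch of f4 is $-\ln(1 - x)$, which is the form whose derivative matches the entry for f4 in Table 2. The function names simply mirror the table labels.

```python
import math


def f1(x: float) -> float:
    """f1: x for x >= 0, -x for x < 0."""
    return x if x >= 0 else -x


def f2(x: float) -> float:
    """f2: x for x >= 0, -(2/3) * (-x)^(3/2) for x < 0."""
    return x if x >= 0 else -(2.0 / 3.0) * (-x) ** 1.5


def f3(x: float) -> float:
    """f3: x for x >= 0, x / (1 - x) for x < 0."""
    return x if x >= 0 else x / (1.0 - x)


def f4(x: float) -> float:
    """f4: x for x >= 0, -ln(1 - x) for x < 0 (assumed reading of Table 1)."""
    return x if x >= 0 else -math.log(1.0 - x)
```

All four functions are the identity for non-negative inputs and differ only in how they map negative inputs.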
| Derivative | Function model |
| --- | --- |
| f1′ | $f_1'(x) = \begin{cases} 1, & x \geqslant 0 \\ -1, & x < 0 \end{cases}$ |
| f2′ | $f_2'(x) = \begin{cases} 1, & x \geqslant 0 \\ -\sqrt{-x}, & x < 0 \end{cases}$ |
| f3′ | $f_3'(x) = \begin{cases} 1, & x \geqslant 0 \\ \dfrac{1}{(1 - x)^2}, & x < 0 \end{cases}$ |
| f4′ | $f_4'(x) = \begin{cases} 1, & x \geqslant 0 \\ \dfrac{1}{1 - x}, & x < 0 \end{cases}$ |
Table 2. Derivatives of the four activation functions
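The derivatives in Table 2 can be checked against the definitions in Table 1 with a central finite difference. The sketch below is illustrative only. It assumes that the negative branch of f4 is $-\ln(1 - x)$, and the helper names `central_difference` and `compare` are hypothetical. For f1, f3 and f4 the listed and numerical derivatives agree. For f2, differentiating the Table 1 expression on $x < 0$ gives $+\sqrt{-x}$, whereas Table 2 lists $-\sqrt{-x}$, so the sketch reports both values without choosing between them.

```python
import math

# Activation functions as listed in Table 1
# (f4's negative branch is assumed to be -ln(1 - x)).
FUNCS = {
    "f1": lambda x: x if x >= 0 else -x,
    "f2": lambda x: x if x >= 0 else -(2.0 / 3.0) * (-x) ** 1.5,
    "f3": lambda x: x if x >= 0 else x / (1.0 - x),
    "f4": lambda x: x if x >= 0 else -math.log(1.0 - x),
}

# Derivatives exactly as listed in Table 2.
DERIVS = {
    "f1": lambda x: 1.0 if x >= 0 else -1.0,
    "f2": lambda x: 1.0 if x >= 0 else -math.sqrt(-x),
    "f3": lambda x: 1.0 if x >= 0 else 1.0 / (1.0 - x) ** 2,
    "f4": lambda x: 1.0 if x >= 0 else 1.0 / (1.0 - x),
}


def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def compare(x):
    """Return {name: (derivative from Table 2, numerical derivative)} at x."""
    return {
        name: (DERIVS[name](x), central_difference(FUNCS[name], x))
        for name in FUNCS
    }


if __name__ == "__main__":
    for name, (listed, numeric) in compare(-2.0).items():
        print(f"{name}: Table 2 = {listed:+.4f}, numerical = {numeric:+.4f}")
```

The comparison is only meaningful away from $x = 0$, where the piecewise branches meet and the symmetric quotient straddles both of them.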
| Method | CIFAR10 ACC | CIFAR10 T (h) | CIFAR100 ACC | CIFAR100 T (h) |
| --- | --- | --- | --- | --- |
| f1 | 93.11% | 1.332 | 74.82% | 1.332 |
| f2 | 93.03% | 1.335 | 74.27% | 1.335 |
| f3 | 93.66% | 1.290 | 75.23% | 1.290 |
| f4 | 93.78% | 1.262 | 75.87% | 1.262 |
| ReLU | 92.90% | 1.325 | 73.68% | 1.325 |
Table 3. Performance of different activation functions on the ResNet18 network
| Method | CIFAR10 ACC | CIFAR10 T (h) | CIFAR100 ACC | CIFAR100 T (h) |
| --- | --- | --- | --- | --- |
| f1 | 91.31% | 1.225 | 58.91% | 1.225 |
| f2 | 91.24% | 1.248 | 58.35% | 1.248 |
| f3 | 91.86% | 1.243 | 59.23% | 1.243 |
| f4 | 91.98% | 1.175 | 59.95% | 1.175 |
| ReLU | 91.15% | 1.238 | 56.24% | 1.238 |
Table 4. Performance of different activation functions on the VGG16 network
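The accuracy figures in Tables 3 and 4 are easier to compare as differences from the ReLU baseline. The short sketch below copies the reported test accuracies and computes each function's gain over ReLU in percentage points. The dictionary layout and the helper name `gains_over_relu` are illustrative assumptions, not part of the original experiments.

```python
# Test accuracy (%) copied from Table 3 (ResNet18) and Table 4 (VGG16).
ACCURACY = {
    ("ResNet18", "CIFAR10"): {"f1": 93.11, "f2": 93.03, "f3": 93.66, "f4": 93.78, "ReLU": 92.90},
    ("ResNet18", "CIFAR100"): {"f1": 74.82, "f2": 74.27, "f3": 75.23, "f4": 75.87, "ReLU": 73.68},
    ("VGG16", "CIFAR10"): {"f1": 91.31, "f2": 91.24, "f3": 91.86, "f4": 91.98, "ReLU": 91.15},
    ("VGG16", "CIFAR100"): {"f1": 58.91, "f2": 58.35, "f3": 59.23, "f4": 59.95, "ReLU": 56.24},
}


def gains_over_relu(scores):
    """Return each function's accuracy minus the ReLU accuracy, in percentage points."""
    baseline = scores["ReLU"]
    return {
        name: round(acc - baseline, 2)
        for name, acc in scores.items()
        if name != "ReLU"
    }


if __name__ == "__main__":
    for (network, dataset), scores in ACCURACY.items():
        print(network, dataset, gains_over_relu(scores))
```

On these numbers f4 shows the largest gain in every setting: 2.19 percentage points over ReLU for ResNet18 on CIFAR100 and 3.71 for VGG16 on CIFAR100.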