Activation Functions#

Notes:

  1. Activation functions introduce non-linearity into the network. Why is that needed?

  2. In my view, we need one common scale on which to compare the results of all nodes after learning, before passing the values on to the upcoming nodes.

  3. To make sense of the data and to build a mapping for approximation.

  4. To understand how changing the weights and biases affects the network/nodes. If there are only linear functions, the network can fit only linear data; on non-linear data such as a sine wave it will fail.

  5. If there is no activation function, the whole network is equivalent to a single linear node.

\(w^T(w^T(w^T x + b) + b) + b \dots = \text{output}\)
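
The collapse of stacked linear layers into one linear node can be checked numerically. The following is a minimal sketch, assuming two layers with random weights and biases; the names `W1`, `b1`, `W2`, `b2` and the layer sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear layers with no activation in between (hypothetical sizes: 3 -> 4 -> 2)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Forward pass through the two stacked layers
stacked = W2 @ (W1 @ x + b1) + b2

# The same mapping rewritten as one linear layer
W_single = W2 @ W1
b_single = W2 @ b1 + b2
single = W_single @ x + b_single

# Both computations agree up to floating-point rounding
print(np.allclose(stacked, single))
```

However many linear layers are stacked, the product of their weight matrices is still a single matrix, which is why a non-linear activation is needed between layers.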


[3]:
import numpy as np
import matplotlib.pyplot as plt

Sigmoid#

\(f(x) = \frac{1}{(1 + e^{-x})}\)

  • granular (continuous output)

  • output between 0 and 1

  • comparatively expensive calculation (needs an exponential)

[17]:
class ActivationSigmoid:
    """Sigmoid Activation Fx
    """
    def forward(self, inputs):
        """Apply Sigmoid to input
        """
        self.output = 1 / (1 + np.exp(-inputs))
[18]:
data = np.linspace(-10, 10, 100)
act = ActivationSigmoid()
act.forward(data)


plt.plot(data,act.output,'k.-')
plt.grid()
../_images/notebooks_activationfunc_6_0.png

Stepwise#

\(f(x) = 0\) | if \(x \leq 0\)

\(f(x) = 1\) | if \(x \gt 0\)

  • not granular (discrete output)

  • output is only 0 or 1

[19]:
class ActivationStepwise:
    """Stepwise Activation Fx
    """
    def forward(self, inputs):
        """Apply Stepwise to inputs

        Args:
            inputs (numpy.ndarray) : input matrix
        """
        self.inputs = inputs # save inputs
        self.output = (inputs > 0).astype('int') # calculate from inputs
[20]:
data = np.linspace(-10, 10, 100)
act = ActivationStepwise()
act.forward(data)


plt.plot(data,act.output,'k.-')
plt.grid()
../_images/notebooks_activationfunc_9_0.png

ReLU#

\(f(x) = 0\) | if \(x \leq 0\)

\(f(x) = x\) | if \(x \gt 0\)

  • granular

  • output is 0 for \(x \leq 0\) and x otherwise

  • easy calculation

  • almost linear, but rectified, so values below zero are not passed through. This slight non-linearity makes it eligible as an activation function, while it is inherently easier and faster to calculate than sigmoid.

[21]:
class ActivationReLU:
    """ReLU Activation Fx
    """

    def forward(self, inputs):
        """Apply ReLU to input

        Args:
            inputs (numpy.ndarray) : input matrix
        """
        self.inputs = inputs # save inputs
        self.output = np.maximum(0, inputs) # calculate from inputs

    def backward(self, dvalues):
        """Apply backward propagation

        Args:
            dvalues (numpy.ndarray) : gradients received from the previous layer in backward prop (the next layer in the forward pass)
        """
        self.dinputs = dvalues.copy()
        self.dinputs[self.inputs <= 0] = 0
[22]:
data = np.linspace(-10, 10, 100)
act = ActivationReLU()
act.forward(data)


plt.plot(data,act.output,'k.-')
plt.grid()
../_images/notebooks_activationfunc_12_0.png
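
The backward pass of ReLU only masks the incoming gradient. The sketch below restates the forward and backward logic as two standalone functions (`relu_forward` and `relu_backward` are hypothetical names used only for this illustration) and applies them to a small hand-picked input, assuming an upstream gradient of all ones.

```python
import numpy as np

def relu_forward(inputs):
    # Pass positive values through, clamp everything else to 0
    return np.maximum(0, inputs)

def relu_backward(inputs, dvalues):
    # The gradient flows through unchanged where the input was positive
    # and is set to 0 where the input was <= 0
    dinputs = dvalues.copy()
    dinputs[inputs <= 0] = 0
    return dinputs

inputs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
dvalues = np.ones_like(inputs)  # assumed upstream gradient

out = relu_forward(inputs)
grad = relu_backward(inputs, dvalues)

print(out)
print(grad)
```

Only the last two entries, where the input is positive, keep a non-zero output and a non-zero gradient.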

Leaky ReLU#

\(f(x) = 0.01x\) | if \(x \leq 0\)

\(f(x) = x\) | if \(x \gt 0\)

[34]:
class ActivationLeakyReLU:
    """Leaky ReLU Activation Fx
    """

    def forward(self, inputs):
        """Apply Leaky ReLU to input

        Args:
            inputs (numpy.ndarray) : input matrix
        """
        self.inputs = inputs # save inputs
        self.output = np.where(inputs > 0,inputs,0.01 * inputs)
[40]:
data = np.linspace(-100, 20, 100)
act = ActivationLeakyReLU()
act.forward(data)


plt.plot(data,act.output,'k.-')
plt.grid()
../_images/notebooks_activationfunc_15_0.png

Softplus#

A smooth approximation of the ReLU function.

\begin{align} f(x) &= \log{(1 + \exp(x))} \end{align}
[1]:
class Softplus:
    def forward(self, inputs):
        """Apply Softplus to input

        Args:
            inputs (numpy.ndarray) : input matrix
        """
        self.inputs = inputs # save inputs
        self.output = np.log(1 + np.exp(self.inputs))
[4]:
data = np.linspace(-10, 10, 100)
act = Softplus()
act.forward(data)


plt.plot(data,act.output,'k.-')
plt.grid()
../_images/notebooks_activationfunc_18_0.png
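
The direct formula `np.log(1 + np.exp(x))` overflows for large inputs, because `np.exp(x)` becomes `inf`. One possible workaround, shown here as a sketch rather than as part of the class above, is NumPy's `np.logaddexp`, which computes `log(exp(a) + exp(b))` without forming the exponentials explicitly. The function names `softplus_naive` and `softplus_stable` are hypothetical.

```python
import numpy as np

def softplus_naive(x):
    # Direct translation of the formula; exp(x) overflows for large x
    return np.log(1 + np.exp(x))

def softplus_stable(x):
    # log(exp(0) + exp(x)) == log(1 + exp(x)), evaluated in a stable way
    return np.logaddexp(0, x)

x = np.array([-10.0, 0.0, 10.0, 1000.0])

# Silence the overflow warning to compare the raw results
with np.errstate(over="ignore"):
    naive = softplus_naive(x)    # last entry overflows to inf

stable = softplus_stable(x)      # last entry stays finite

print(naive)
print(stable)
```

For moderate inputs both versions agree; for very large inputs softplus is approximately x itself, which the stable version reproduces.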

Hyperbolic Tangent (Tanh)#

\begin{align} f(x) &= \tanh(x)\\ &= \frac{2}{1 + e^{-2x}} - 1\\ &= \frac{1 - e^{-2x}}{1 + e^{-2x}}\\ &= \frac{e^x - e^{-x}}{e^x + e^{-x}} \end{align}
[48]:
class ActivationTanh:

    def forward(self, inputs):
        """Apply Tanh to input

        Args:
            inputs (numpy.ndarray) : input matrix
        """
        self.inputs = inputs # save inputs

#         # scratch
#         ez = np.exp(inputs)
#         e_z = np.exp(-inputs)
#         self.output = (ez - e_z)/(ez + e_z)

        self.output = np.tanh(inputs)
[51]:
data = np.linspace(-10, 10, 100)
act = ActivationTanh()
act.forward(data)


plt.plot(data,act.output,'k.-')
plt.grid()
../_images/notebooks_activationfunc_21_0.png
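
The equivalent forms of tanh given in this section can be checked numerically against `np.tanh`. This is a small sanity check on a hand-picked range of inputs, not a proof; the variable names are hypothetical.

```python
import numpy as np

x = np.linspace(-5, 5, 11)

# Form 1: rescaled and shifted sigmoid, 2 / (1 + e^{-2x}) - 1
form_sigmoid = 2 / (1 + np.exp(-2 * x)) - 1

# Form 2: (1 - e^{-2x}) / (1 + e^{-2x})
form_ratio = (1 - np.exp(-2 * x)) / (1 + np.exp(-2 * x))

# Form 3: (e^x - e^{-x}) / (e^x + e^{-x})
form_exp = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

reference = np.tanh(x)

print(np.allclose(form_sigmoid, reference))
print(np.allclose(form_ratio, reference))
print(np.allclose(form_exp, reference))
```

The first form also shows that tanh is a sigmoid stretched to the range from -1 to 1.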

Softmax#

\begin{align} \sigma(\mathbf{z})_i &= \frac{e^{z_i}}{\sum_{j=1}^m e^{z_j}} \\ \\ & \text{ for } i = 1, \dotsc , m \text{ and } \mathbf z =(z_1,\dotsc,z_m) \in \mathbb{R}^m\\ \\ \sigma &= \text{softmax}\\ \mathbf{z} &= \text{input vector}\\ e^{z_{i}} &= \text{standard exponential function applied to the } i\text{-th input}\\ m &= \text{number of classes in the multi-class classifier}\\ \sum_{j=1}^m e^{z_{j}} &= \text{normalization term, the sum of the exponentials of all inputs} \end{align}

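Here z is actually z = x - x.max(). Exponential values grow very fast, so np.exp(x) can overflow to inf for large x (a numerical overflow, not an out-of-memory error), which is why we can't use x directly.

When x - x.max() is done, the largest value is 0, so every exponential lies between 0 and 1 and the values cannot blow up. The result is unchanged, because the common factor \(e^{-x_{max}}\) cancels between the numerator and the denominator.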

[25]:
np.exp(1000) # like this
<ipython-input-25-87fe7bad57ec>:1: RuntimeWarning: overflow encountered in exp
  np.exp(1000) # like this
[25]:
inf
[26]:
class ActivationSoftmax:

    def forward(self, inputs):
        """Forward propagation calculation

        Args:
            inputs (numpy.ndarray) : input matrix
        """
        exp_values = np.exp(inputs - inputs.max(axis=1, keepdims=True))
        probabilities = exp_values / exp_values.sum(axis=1, keepdims=True)
        self.output = probabilities
[27]:
data = np.linspace(-10,100,100).reshape(1,100) #(1,100)
[28]:
t_exp = np.exp(data)
t_prob = t_exp / np.sum(t_exp, axis=1, keepdims=True)
plt.plot(data[0],t_prob[0],'k.-')
plt.grid()
../_images/notebooks_activationfunc_27_0.png
[29]:
act = ActivationSoftmax()
act.forward(data)

plt.plot((data - data.max(axis=1, keepdims=True))[0],act.output[0],'k.-')
plt.grid()
../_images/notebooks_activationfunc_28_0.png
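
The effect of subtracting the maximum can be seen by comparing a naive softmax with a shifted one. The sketch below uses two hypothetical helper functions, `softmax_naive` and `softmax_shifted`, and hand-picked inputs; it assumes 2-D input with one sample per row, as in the class above.

```python
import numpy as np

def softmax_naive(x):
    # Exponentiate the raw inputs; overflows for large values
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def softmax_shifted(x):
    # Subtract the row-wise maximum first, so the largest exponent is 0
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

small = np.array([[1.0, 2.0, 3.0]])
large = np.array([[1000.0, 1001.0, 1002.0]])

# For moderate inputs both versions give the same probabilities
agree = np.allclose(softmax_naive(small), softmax_shifted(small))

# For large inputs the naive version computes inf / inf, which is nan
with np.errstate(over="ignore", invalid="ignore"):
    naive_large = softmax_naive(large)

# The shifted version stays finite and still sums to 1
shifted_large = softmax_shifted(large)

print(agree)
print(naive_large)
print(shifted_large)
```

Because `large` is just `small` shifted by a constant, the shifted softmax returns the same probabilities for both inputs.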

Gaussian#

\begin{align} f(x) = \exp(-x^2) \end{align}
[10]:
class ActivationGaussian:

    def forward(self, inputs):
        """Forward propagation calculation

        Args:
            inputs (numpy.ndarray) : input matrix
        """
        self.output = np.exp(-(inputs**2))
[11]:
data = np.linspace(-10, 10, 100)
act = ActivationGaussian()
act.forward(data)

plt.plot(data,act.output,'k.-')
plt.grid()
../_images/notebooks_activationfunc_31_0.png