Architecture#

(architecture diagram)

Symbols & Naming Conventions#

\begin{align*} n &= \text{number of nodes}\\ l &= \text{layer number}\\ w,W &= \text{weights matrix}\\ b &= \text{bias matrix}\\ z,Z &= \text{hypothesis result (result before applying activation function)}\\ g(z) &= \text{activation function}\\ a,A &= \text{activation matrix (result after applying activation function)}\\ x,X &= \text{input to network}\\ \hat{y} &= \text{output of network}\\ \end{align*}

Values for forward propagation:

\begin{align*} \huge{n^{[l]}} &= \text{number of nodes in the layer}\\ \huge{z^{[l]}} &= \text{hypothesis result of the layer}\\ \huge{w^{[l]}} &= \text{weights of the layer}\\ \huge{b^{[l]}} &= \text{biases of the layer}\\ \huge{a^{[l]}} &= \text{activation results of the layer}\\ \end{align*}
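A minimal NumPy sketch of one forward step (the helper name `forward_layer` and the choice of sigmoid for \(g(z)\) are illustrative assumptions, not fixed by the notation above):

```python
import numpy as np

def sigmoid(z):
    # one possible g(z); any differentiable activation fits the same slot
    return 1.0 / (1.0 + np.exp(-z))

def forward_layer(A_prev, W, b, g=sigmoid):
    # hypothesis result: z^[l] = w^[l] a^[l-1] + b^[l]
    Z = W @ A_prev + b
    # activation result: a^[l] = g(z^[l])
    A = g(Z)
    return Z, A
```

Stacking this call over layers \(l = 1, \dots, L\), starting from \(A^{[0]} = X\), produces the network output \(\hat{y} = A^{[L]}\).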

Derivatives for backward propagation:

\begin{align*} \huge{dw^{[l]}} &= \frac{\partial L}{\partial w} \rightarrow \text{loss derivative with respect to the weights}\\ \huge{db^{[l]}} &= \frac{\partial L}{\partial b} \rightarrow \text{loss derivative with respect to the biases}\\ \huge{dz^{[l]}} &= \frac{\partial L}{\partial z} \rightarrow \text{loss derivative with respect to the hypothesis result}\\ \huge{da^{[l]}} &= \frac{\partial L}{\partial a} \rightarrow \text{loss derivative with respect to the activation result}\\ \end{align*}
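As a sketch, assuming the loss \(L\) is averaged over \(m\) examples in a batch and that \(dz^{[l]}\) has already been obtained via the chain rule (\(dz^{[l]} = da^{[l]} \cdot g'(z^{[l]})\)), the remaining derivatives follow as (the function name `backward_layer` is hypothetical):

```python
import numpy as np

def backward_layer(dZ, A_prev, W):
    m = A_prev.shape[1]  # number of examples in the batch
    # dw^[l] = (1/m) dz^[l] . a^[l-1]^T  -> same shape as w^[l]
    dW = (dZ @ A_prev.T) / m
    # db^[l] = (1/m) sum of dz^[l] over the examples -> same shape as b^[l]
    db = np.sum(dZ, axis=1, keepdims=True) / m
    # da^[l-1] = w^[l]^T . dz^[l] -> error signal passed to the previous layer
    dA_prev = W.T @ dZ
    return dW, db, dA_prev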

Flow#

(flow diagram)

Shapes#

\begin{align*} W^{[l]} &: (n^{[l]}, n^{[l-1]})\\ b^{[l]} &: (n^{[l]}, 1)\\ Z^{[l]}, A^{[l]} &: (n^{[l]}, m)\\ \end{align*}

where
\(l\) = layer number \(\ge 1\)
\(A^{[0]}\) = X
\(m\) = number of examples in the batch
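A quick NumPy check of these shapes (the layer sizes `n = [4, 5, 3]` and batch size `m = 8` are made-up values for illustration):

```python
import numpy as np

n = [4, 5, 3]  # hypothetical sizes: n[0] input features, n[1], n[2] nodes per layer
m = 8          # hypothetical batch size

rng = np.random.default_rng(0)
W = {l: rng.standard_normal((n[l], n[l - 1])) * 0.01 for l in range(1, len(n))}
b = {l: np.zeros((n[l], 1)) for l in range(1, len(n))}

A = rng.standard_normal((n[0], m))  # A^[0] = X, shape (n[0], m)
for l in range(1, len(n)):
    # (n[l], n[l-1]) @ (n[l-1], m) + (n[l], 1) broadcasts to (n[l], m)
    Z = W[l] @ A + b[l]
    A = np.tanh(Z)  # activation preserves the (n[l], m) shape
    assert Z.shape == A.shape == (n[l], m)
```

Note that \(b^{[l]}\) is stored as a column \((n^{[l]}, 1)\) and broadcasts across the \(m\) example columns during the forward step.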