Matrix Derivation for Neural Network Formula
I am learning the fundamentals of neural networks, but I have a problem with the matrix derivation for backpropagation. Assume that the vectorized formula for computing one layer of a neural network is $Z^{[i]} = W^{[i]}A^{[i - 1]} + B^{[i]}$.
For notation,
$Z^{[i]}$ is the score for layer $i$,
$W^{[i]}$ is the weight matrix for layer $i$,
$A^{[i - 1]}$ is the score for layer $i - 1$,
$B^{[i]}$ is the bias vector for layer $i$.
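To make the shapes concrete, here is a minimal NumPy sketch of this forward step; the layer sizes `n_prev` and `n_curr` and the batch size `m` are made-up values for illustration:

```python
import numpy as np

# Hypothetical sizes: n_prev units in layer i-1, n_curr units in layer i,
# m examples in the batch (all three values are illustrative).
n_prev, n_curr, m = 4, 3, 5

W = np.random.randn(n_curr, n_prev)   # W[i]   has shape (n_curr, n_prev)
A_prev = np.random.randn(n_prev, m)   # A[i-1] has shape (n_prev, m); one column per example
B = np.random.randn(n_curr, 1)        # B[i]   has shape (n_curr, 1); broadcast over columns

Z = W @ A_prev + B                    # Z[i]   has shape (n_curr, m)
print(Z.shape)                        # -> (3, 5)
```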
For backpropagation, we need to calculate $dZ$, $dW$, $dA$, and $dB$ with respect to a cost function $L = f(Z)$, a function of $Z$. In other words, we need to calculate $dL/dZ$, $dL/dW$, $dL/dA$, and $dL/dB$, respectively.
Calculating $dZ$ and $dB$ poses no problem, but computing $dW$ and $dA$ requires the chain rule, which creates some confusion. In more detail, the solution is stated as
$dW = \frac{dL}{dW} = \frac{dL}{dZ} \frac{dZ}{dW} = dZ A^{[i - 1]T}$
$dA = \frac{dL}{dA} = \frac{dL}{dZ} \frac{dZ}{dA} = W^{[i]T}dZ$
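As a sanity check, here is a small NumPy sketch that compares these two formulas against a finite-difference estimate; the sum-of-squares cost is an assumption I chose only because it makes $dZ = 2Z$ easy to write down:

```python
import numpy as np

np.random.seed(0)
n_prev, n_curr, m = 4, 3, 5           # illustrative sizes
W = np.random.randn(n_curr, n_prev)
A_prev = np.random.randn(n_prev, m)
B = np.random.randn(n_curr, 1)

def cost(Z):
    # Toy cost L = f(Z); with sum of squares, dL/dZ = 2 * Z.
    return np.sum(Z ** 2)

Z = W @ A_prev + B
dZ = 2 * Z

dW = dZ @ A_prev.T                    # claimed formula: dZ A^{[i-1]T}, shape (n_curr, n_prev)
dA = W.T @ dZ                         # claimed formula: W^{[i]T} dZ,   shape (n_prev, m)

# Finite-difference check of one entry of dW.
eps = 1e-6
W_pert = W.copy()
W_pert[0, 0] += eps
numeric_dW = (cost(W_pert @ A_prev + B) - cost(Z)) / eps
print(dW[0, 0], numeric_dW)           # the two numbers should agree closely

# Finite-difference check of one entry of dA.
A_pert = A_prev.copy()
A_pert[0, 0] += eps
numeric_dA = (cost(W @ A_pert + B) - cost(Z)) / eps
print(dA[0, 0], numeric_dA)
```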
The confusion is this: I do not understand why, in computing $dW$, $\frac{dZ}{dW}$ is $A^{[i - 1]T}$. Moreover, matrix multiplication is not commutative, i.e. $AB \neq BA$ in general. So why, in computing $dA$, can $dZ$, which comes from $\frac{dL}{dZ}$, stand to the right of $W^{[i]T}$?
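For reference, here is the element-wise bookkeeping as far as I can follow it (the indices $j$, $k$, $m$ are mine, with the layer superscripts dropped). Writing one entry of the product,

$Z_{jm} = \sum_k W_{jk} A_{km} + B_j$

so by the chain rule, entry by entry,

$\frac{dL}{dW_{jk}} = \sum_m \frac{dL}{dZ_{jm}} \frac{dZ_{jm}}{dW_{jk}} = \sum_m dZ_{jm} A_{km} = (dZ\, A^{T})_{jk}$

$\frac{dL}{dA_{km}} = \sum_j \frac{dL}{dZ_{jm}} \frac{dZ_{jm}}{dA_{km}} = \sum_j W_{jk}\, dZ_{jm} = (W^{T} dZ)_{km}$

which seems to suggest the transpose and the left/right placement are forced by matching indices, not by commuting any matrices, but I would like confirmation.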
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow