Gradients of Matrix Multiplication in Deep Learning
Deriving the gradients for the backward pass for matrix multiplication using tensor calculus
How does tensor parallelism work?
Deriving the gradient for the backward pass for the linear layer using tensor calculus
Obtaining the gradient of the layer normalization layer
Deriving the gradient for the backward pass using tensor calculus and index notation
A quick intro on backpropagation and multivariable calculus for deep learning