Demystifying Tensor Parallelism
How does tensor parallelism work?
Deriving the gradient for the backward pass for the linear layer using tensor calculus
Obtaining the gradient of the layer normalization layer
Deriving the gradient for the backward pass using tensor calculus and index notation
A quick intro on backpropagation and multivariable calculus for deep learning