Harsh Menon

WMMA and MFMA instructions on AMD GPUs

MFMA Instructions

Below, we show the layout for the $A$ matrix in the v_mfma_f32_32x32x8f16 instruction.
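
As a rough sketch of this layout in code, the snippet below maps an element $A[i][k]$ to the lane and per-lane slot that hold it. It assumes a 64-lane wavefront and the mapping as we understand it from AMD's matrix instruction calculator, where each lane holds four consecutive f16 values along $k$. The helper names `a_owner` and `a_element` are hypothetical, chosen only for illustration, and the mapping is worth checking against the calculator for your target architecture.

```python
# Illustrative sketch (assumed mapping, not copied from the ISA manual):
# A is 32x8 (M x K) in f16, spread over a 64-lane wavefront.
M, K = 32, 8
WAVE_SIZE = 64
VALUES_PER_LANE = (M * K) // WAVE_SIZE  # 4 f16 values, i.e. two 32-bit VGPRs


def a_owner(i, k):
    """Return the (lane, slot) assumed to hold A[i][k]."""
    lane = i + M * (k // VALUES_PER_LANE)  # lanes 0-31 cover k=0..3, lanes 32-63 cover k=4..7
    slot = k % VALUES_PER_LANE             # position among the lane's four f16 values
    return lane, slot


def a_element(lane, slot):
    """Inverse of a_owner: the (i, k) stored in a given lane and slot."""
    return lane % M, VALUES_PER_LANE * (lane // M) + slot


# Print which row and columns of A a few sample lanes hold.
for lane in (0, 1, 31, 32, 63):
    cols = [a_element(lane, s)[1] for s in range(VALUES_PER_LANE)]
    print(f"lane {lane:2d}: row {lane % M:2d}, columns {cols}")
```

Under these assumptions, each lane owns one row of $A$ and a contiguous run of four columns, with the two halves of the wavefront splitting $k$ between them.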

The layout for the $B$ matrix, shown below, is the transpose of the layout for the $A$ matrix above.
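
The transpose relationship can be sketched in code as well. Assuming a 64-lane wavefront with four f16 values per lane, and the mapping as we understand it from AMD's matrix instruction calculator, $B[k][j]$ lives in the lane and slot where $A[j][k]$ would live. The function names here are hypothetical.

```python
# Illustrative sketch (assumed mapping): B is 8x32 (K x N) in f16.
K, N = 8, 32
VALUES_PER_LANE = 4  # four f16 values per lane


def b_owner(k, j):
    """Return the (lane, slot) assumed to hold B[k][j]."""
    return j + N * (k // VALUES_PER_LANE), k % VALUES_PER_LANE


def a_owner(i, k):
    """Assumed A-operand mapping, repeated here only for comparison."""
    return i + 32 * (k // VALUES_PER_LANE), k % VALUES_PER_LANE


# B[k][j] and A[j][k] land in the same lane and slot, so the B layout
# is the A layout with the roles of row and column swapped.
same = all(b_owner(k, j) == a_owner(j, k) for k in range(K) for j in range(N))
print("B layout is the transposed A layout:", same)
```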

The layout for the $C$ and $D$ matrices is shown below.
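
A minimal sketch of the output layout, under the same caveats: we assume a 64-lane wavefront and the mapping as we understand it from AMD's matrix instruction calculator, in which each lane holds 16 f32 values of a single column, arranged as four groups of four consecutive rows. The names `d_owner` and `d_element` are hypothetical.

```python
# Illustrative sketch (assumed mapping): C and D are 32x32 in f32,
# so each of the 64 lanes holds 16 values, one per 32-bit VGPR.
M, N = 32, 32
WAVE_SIZE = 64
VGPRS_PER_LANE = (M * N) // WAVE_SIZE  # 16


def d_owner(i, j):
    """Return the (lane, vgpr) assumed to hold D[i][j]."""
    lane = j + N * ((i // 4) % 2)  # column picks the lane; row picks the half-wave
    vgpr = 4 * (i // 8) + (i % 4)  # groups of 4 rows, one group per 8 rows
    return lane, vgpr


def d_element(lane, vgpr):
    """Inverse of d_owner: the (i, j) stored in a given lane and VGPR."""
    i = 8 * (vgpr // 4) + 4 * (lane // N) + (vgpr % 4)
    return i, lane % N


# Rows held by lane 0 and lane 32 (both own column 0).
for lane in (0, 32):
    rows = [d_element(lane, v)[0] for v in range(VGPRS_PER_LANE)]
    print(f"lane {lane:2d}: column {lane % N}, rows {rows}")
```

Under this mapping, lane 0 holds rows 0-3, 8-11, 16-19 and 24-27 of column 0, and lane 32 holds the remaining rows of that column.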

If we look at the columns of this matrix, we can begin to label them as follows:

Similarly, if we look at the rows of this matrix, we can begin to label them as follows: